Hugo Huang
on 29 November 2023
When OpenAI released ChatGPT on November 30, 2022, no one could have anticipated that the following six months would usher in a dizzying transformation for human society with the arrival of a new generation of artificial intelligence. Since the emergence of deep learning in the early 2010s, artificial intelligence has entered its third wave of development. The introduction of the Transformer architecture in 2017 propelled deep learning into the era of large models. OpenAI built its GPT family on the decoder part of the Transformer.
ChatGPT quickly gained global popularity, astonishing people with its ability to engage in coherent and deep conversations, while also showing capabilities such as reasoning and logical thinking that resemble intelligence. Alongside the continuous development of large pre-trained models, ongoing innovation in generative algorithms, and the growing mainstream adoption of multimodal AI, Generative AI technologies represented by ChatGPT have become the fastest-moving direction in AI development. This acceleration is driving the next era of significant growth in AI and is poised to have a profound impact on economic and social development. CEOs may find detailed advice for adopting GenAI in my recently published Harvard Business Review article, What CEOs Need to Know About the Costs of Adopting GenAI.
Definition and Background of Generative AI Technology
Generative AI refers to the production of content through artificial intelligence technology. It involves training models to generate new content that resembles the training data. In contrast to traditional AI, which mainly focuses on recognizing and predicting patterns in existing data, Generative AI emphasizes creating new, original content. Its key principle is to learn the distribution of the training data and then generate new data with similar features. This technology finds applications in various domains such as images, text, audio, and video. Among these applications, ChatGPT stands out as a notable example. ChatGPT, a chatbot application developed by OpenAI based on the GPT-3.5 model, gained massive popularity. Within just two months of its release, it reportedly reached an estimated 100 million monthly active users, at the time the fastest growth of any consumer internet application. Generative AI technologies, represented by large language models and image generation models, have become platform-level technologies for the new generation of artificial intelligence, contributing to a leap in value across different industries.
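The principle of learning a data distribution and then sampling new data from it can be illustrated with a deliberately tiny sketch. It assumes the "model" is a single Gaussian and the training data is a hypothetical list of measurements; real generative models learn far richer distributions, but the two phases (fit, then sample) are the same.

```python
import random
import statistics


def fit_gaussian(samples):
    """'Training': estimate the mean and standard deviation of the data."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, std, count, rng):
    """'Generation': draw new values from the fitted distribution."""
    return [rng.gauss(mean, std) for _ in range(count)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical training data: 1,000 measurements centred near 170.
training_data = [rng.gauss(170.0, 8.0) for _ in range(1000)]

mean, std = fit_gaussian(training_data)
new_samples = generate(mean, std, 1000, rng)

# The generated values are new, but share the statistics of the training data.
print(round(mean, 1), round(statistics.mean(new_samples), 1))
```

The generated values were never in the training set, yet their mean and spread closely match it, which is what "new data with similar features" means in its simplest form.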
The explosion of Generative AI owes much to developments in three AI technology domains: generative algorithms, pre-training models, and multimodal technologies.
Generative Algorithms: With constant innovation in generative algorithms, AI is now capable of generating various types of content, including text, code, images, speech, and more. This marks a transition from Analytical AI, which focuses on analyzing, judging, and predicting existing data patterns, to Generative AI, which creates entirely new content based on what it has learned from data.
Pre-training Models: Pre-trained models, or large models, have significantly transformed the capabilities of Generative AI technology. In the past, researchers had to train a separate AI model for each task. A large pre-trained model can instead be adapted to many tasks, which has generalized Generative AI models and broadened their industrial applications. These large models have strong language understanding and content generation capabilities.
Multimodal AI Technology: Multimodal technology enables Generative AI models to generate content across various modalities, such as converting text into images or videos. This enhances the versatility of Generative AI models.
Foundational Technologies of Generative AI
Generative Adversarial Networks (GANs): GANs, introduced in 2014 by Ian Goodfellow and colleagues, are a form of generative model. They consist of two components: the Generator and the Discriminator. The Generator creates new data, while the Discriminator tries to tell the generated data apart from real data. The two are trained against each other, and through iterative training the Generator becomes adept at producing increasingly realistic data.
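The adversarial relationship can be made concrete by looking at the two loss functions. The sketch below is an illustration only: it skips the networks entirely and assumes hypothetical Discriminator scores, using the standard cross-entropy loss for the Discriminator and the commonly used non-saturating loss for the Generator.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: the Discriminator wants real -> 1, generated -> 0.

    d_real / d_fake are the Discriminator's probabilities that each real or
    generated sample is real.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: the Generator wants its samples scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical Discriminator outputs early and late in training.
early = {"d_real": [0.9, 0.8], "d_fake": [0.1, 0.2]}  # fakes are easy to spot
late = {"d_real": [0.6, 0.5], "d_fake": [0.5, 0.4]}   # fakes look realistic

for name, scores in (("early", early), ("late", late)):
    print(name,
          round(discriminator_loss(**scores), 3),
          round(generator_loss(scores["d_fake"]), 3))
```

As the generated samples become harder to distinguish, the Generator's loss falls while the Discriminator's loss rises, which is the tug-of-war that drives training.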
Variational Autoencoders (VAEs): VAEs are a probabilistic generative method. They use an Encoder and a Decoder to generate data. The Encoder maps input data to a distribution in a latent space; a point is then sampled from that distribution, and the Decoder maps it back to data space to produce new data.
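A minimal sketch of that encode, sample, decode path follows. The Encoder and Decoder here are toy functions with arbitrary coefficients standing in for learned neural networks, and the latent space is assumed to be one-dimensional; the sampling step and the KL regularizer follow the usual VAE formulation.

```python
import math
import random


def encode(x):
    """Toy Encoder: map an input to the mean and log-variance of a 1-D latent
    Gaussian. The coefficients are arbitrary stand-ins for learned weights."""
    mu = 0.5 * x
    log_var = -1.0
    return mu, log_var


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    sigma = math.exp(0.5 * log_var)
    return mu + sigma * rng.gauss(0.0, 1.0)


def decode(z):
    """Toy Decoder: map a latent point back to data space."""
    return 2.0 * z


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), the regularizer in the VAE objective."""
    return 0.5 * (mu ** 2 + math.exp(log_var) - 1.0 - log_var)


rng = random.Random(0)
x = 3.0
mu, log_var = encode(x)
reconstructions = [decode(sample_latent(mu, log_var, rng)) for _ in range(5)]
print([round(r, 2) for r in reconstructions])
print(round(kl_to_standard_normal(mu, log_var), 3))
```

Each run of the sampling step yields a slightly different output for the same input, which is where the variety in VAE-generated data comes from.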
Recurrent Neural Networks (RNNs): RNNs are neural network architectures designed for sequential data processing. They possess memory capabilities to capture temporal information within sequences. In generative AI, RNNs find utility in generating sequences such as text and music.
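The "memory" of an RNN is a hidden state that is updated at every step of the sequence. The sketch below assumes a single scalar hidden unit with arbitrary, hand-picked weights, which is far smaller than any practical network but shows the recurrence itself.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Scalar weights keep the sketch readable; the values are arbitrary.
    """
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence):
    """Feed a sequence through the cell and return every hidden state."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states


# Same values, different order: the final hidden state differs, because the
# state carries information about when each input arrived.
print([round(h, 3) for h in run_rnn([1.0, 0.0, 0.0])])
print([round(h, 3) for h in run_rnn([0.0, 0.0, 1.0])])
```

Because each state depends on the previous one, the order of the inputs matters, which is what makes the architecture suitable for text and music.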
Transformer Models: The Transformer architecture relies on a Self-Attention mechanism and has achieved significant breakthroughs in natural language processing. It’s applicable in generative tasks, such as text generation and machine translation.
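Self-attention lets every token in a sequence weigh every other token when building its new representation. The following sketch implements scaled dot-product self-attention in plain Python. As a simplifying assumption it uses the token vectors directly as queries, keys, and values, whereas a real Transformer first applies learned projection matrices and uses multiple attention heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of token vectors (lists of floats of equal length). Returns
    the attention weights and the new token representations.
    """
    dim = len(x[0])
    weights = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in x]
        weights.append(softmax(scores))
    outputs = [[sum(w * value[j] for w, value in zip(row, x))
                for j in range(dim)]
               for row in weights]
    return weights, outputs


# Three hypothetical 2-D token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and shows how much one token attends to the others; tokens with more similar vectors receive more weight.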
Applications and Use Cases of Generative AI
Text Generation
Natural language generation is a key application of Generative AI, capable of producing lifelike natural language text. Generative AI can compose articles, stories, poetry, and more, offering new creative avenues for writers and content creators. Moreover, it can enhance intelligent conversation systems, elevating the interaction experience between users and AI.
ChatGPT (short for Chat Generative Pre-trained Transformer) is an AI chatbot developed by OpenAI, introduced in November 2022. It is built on a large language model from the GPT-3.5 series and was fine-tuned using reinforcement learning from human feedback (RLHF). ChatGPT engages in text-based interactions and can perform various tasks, including automated text generation, question answering, and summarization.
Image Generation
Image generation stands as one of the most prevalent applications within Generative AI. Stability AI has released the Stable Diffusion model, significantly lowering the technical barriers to AI-generated art through rapid open-source iteration. Consumers can subscribe to its product, DreamStudio, to enter text prompts and generate artworks. This product has attracted over a million users across 50+ countries worldwide.
Audio-Visual Creation and Generation
Generative AI finds use in speech synthesis, where models learn the characteristics of human speech in order to generate lifelike speech for virtual assistants, voice translation, and more. Generative AI also applies to music generation: it can compose new pieces based on given styles and melodies, giving musicians fresh creative ideas. This helps musicians explore combinations of musical styles and elements, and is useful for both original composition and advertising music.
Film and Gaming
Generative AI can produce virtual characters, scenes, and animations, enriching creative possibilities in film and game production. Additionally, AI can generate personalized storylines and gaming experiences based on user preferences and behaviors.
Scientific Research and Innovation
Generative AI can help explore new theories and experimental methods in fields like chemistry, biology, and physics, aiding scientists in discovering new knowledge. Additionally, it can accelerate technological innovation and development in domains like drug design and materials science.
Code Generation
Having been trained on natural language and billions of lines of code, certain generative AI models are proficient in multiple programming languages, including Python, JavaScript, Go, Perl, PHP, Ruby, and more. They can generate corresponding code based on natural language instructions.
GitHub Copilot, developed by GitHub in collaboration with OpenAI, is an AI code generation tool. It suggests code based on names, comments, and the surrounding code in the editor. It has been trained on billions of lines of code from publicly available repositories on GitHub and supports most programming languages.
Content Understanding and Analysis
Bloomberg recently released a large language model (LLM) named BloombergGPT tailored for the financial sector. Similar to ChatGPT, it employs the Transformer architecture and large-scale pre-training techniques for natural language processing, and it has 50 billion parameters. BloombergGPT’s pre-training data draws mainly on news and financial data from Bloomberg, from which a dataset of 363 billion tokens was constructed to support various financial industry tasks.
BloombergGPT aims to enhance users’ understanding and analysis of financial data and news. It generates finance-related natural language text based on user inputs, such as news summaries, market analyses, and investment recommendations. Its potential applications span financial analysis, investment consulting, asset management, and more. For instance, in asset management, it could help forecast stock prices and trading volumes based on historical data and market conditions, providing investment recommendations and decision support for fund managers. In financial news, BloombergGPT can automatically generate news summaries and analytical reports based on market data and events, delivering timely financial information.
AI Agents
In April 2023, an open-source project named AutoGPT gained widespread attention on GitHub. As of April 16, 2023, the project had garnered over 70K stars. AutoGPT is powered by GPT-4 and is designed to pursue user-defined goals autonomously. When presented with a task, AutoGPT analyzes the problem, proposes an execution plan, and carries it out step by step until the user’s requirements are met.
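The analyze, plan, and execute cycle described above can be sketched as a simple loop. This is not AutoGPT’s actual implementation: the function names are hypothetical, and the language model and its tools are replaced by deterministic stand-ins so that the control flow is easy to follow.

```python
def run_agent(goal, plan, execute, is_done, max_steps=10):
    """Plan-act-check loop: keep going until the goal is met or steps run out.

    plan(goal, history)    -> the next action to take (in a real agent, an
                              LLM call would produce this)
    execute(action)        -> the result of carrying out that action
    is_done(goal, history) -> True once the goal is satisfied
    """
    history = []
    for _ in range(max_steps):
        if is_done(goal, history):
            break
        action = plan(goal, history)
        history.append((action, execute(action)))
    return history


# Deterministic stand-ins for the model and its tools, for illustration only.
def plan(goal, history):
    return f"collect item {len(history) + 1}"


def execute(action):
    return f"done: {action}"


def is_done(goal, history):
    return len(history) >= goal["items_needed"]


history = run_agent({"items_needed": 3}, plan, execute, is_done)
for action, result in history:
    print(action, "->", result)
```

The step limit is a deliberate design choice in this sketch: an autonomous loop needs a stopping condition other than "goal met", otherwise a goal the agent cannot reach would keep it running indefinitely.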
Apart from standalone AI agents, there is the possibility of a ‘Virtual AI Society’ composed of multiple AI agents. The paper “Generative Agents: Interactive Simulacra of Human Behavior”, by researchers from Stanford University and Google, describes a ‘virtual town’ in which 25 intelligent agents coexist.
Leading business consulting firms predict that by 2030, the generative AI market size will reach USD 110 billion.
Operations of GenAI
Operating GenAI requires an approach that covers the entire lifecycle of GenAI models, from development to deployment and ongoing maintenance. This includes data management, model training and optimization, model deployment and monitoring, and continuous improvement. MLOps is an essential practice for the success of GenAI projects: by adopting it, organizations can improve the reliability, scalability, and maintainability of their GenAI models and shorten their time-to-market.
Canonical’s MLOps offering is a comprehensive open-source solution that integrates tools such as Charmed Kubeflow, Charmed MLflow, and Charmed Spark. This spares professionals from grappling with tool compatibility issues, allowing them to concentrate on modeling. Charmed Kubeflow serves as the core of an expanding ecosystem, working with other tools tailored to individual user requirements and validated across diverse platforms, including any CNCF-compliant Kubernetes distribution and various cloud environments. Orchestrated through Juju, an open-source orchestration engine for software operators, Charmed Kubeflow facilitates deployment, integration, and lifecycle management of applications at any scale and on any infrastructure. Professionals can selectively deploy only the components they need from the bundle, which reflects the composability of Canonical’s MLOps tooling, an essential property when implementing machine learning in diverse environments. For instance, while Kubeflow comprises approximately 30 components, deploying just three (Istio, Seldon, and MicroK8s) suffices when operating at the edge, because edge and large-scale operations have distinct requirements.