What do you mean "AI"?
AI stands for artificial intelligence: the science and engineering of making machines that can perform tasks that usually require human intelligence, such as reasoning, learning, and creating. The most important takeaway from this article is that you get out what you put in: the more you understand how AI works, how to prompt it, and how to build, automate, and maintain it, the more benefits you will see.
Did you know that the term AI was coined in the 1950s?
AI has a long and fascinating history, divided into three main stages:
Now let me tell you a story...
Let's take a trip back to the 1950s and 60s...
Imagine you are a scientist in the 1950s and 60s, and you dream of making a machine that can think like a human. You have access to a HUGE computer that takes up an entire room, and you spend countless hours coding it with rules and logic to play tic-tac-toe. You make a program that can beat you in the game and start to feel a little worried. You also feel like a genius and a pioneer, thinking you have just created artificial intelligence. You call your colleagues and show them your fantastic invention.
They point out that tic-tac-toe is a simple game with limited moves and that your program doesn't really think but just follows a fixed set of instructions. You realize you have a long way to go…
Fast forward to the 1980s, and you are a computer scientist with a different approach. You want to make a machine that can learn from data instead of following fixed rules. You have access to a smaller, faster computer that can process large amounts of information, and you spend countless hours training it to recognize handwritten digits. You develop a program that can classify the digits correctly after seeing thousands of examples and adjusting its parameters. You feel like a master and a visionary, thinking you have just unlocked the power of machine learning. You call your colleagues and show them your amazing invention. They are impressed, but only for a short time. They point out that handwritten digits are a simple problem with a limited number of classes and that your program is not really learning but just memorizing patterns. You realize you still have a long way to go…
Finally, jump to the 2010s, and you are working on a new project. You want to make a machine that can generate new and original content. You have access to a powerful and sophisticated computer that can model complex and nonlinear relationships, and you spend countless hours making it create realistic images of faces. After learning from millions of photos and using billions of parameters, you finally manage to make a program that can produce high-resolution and diverse faces that look like real people. You feel like a wizard and a legend, thinking you have just witnessed the magic of deep learning. You call your colleagues and show them your fantastic invention. They point out that faces are a complex problem with a high degree of variation and that your program is not really creating but just remixing features it has seen.
But you don’t give up. You keep working on your dream, and you keep improving your machine. You make it play more challenging games, such as chess and Go. You make it recognize more complex objects, such as animals and cars. You make it generate more diverse content, such as music and speech. You make it do more amazing things, such as translating languages and writing stories. You make it think more like a human, or maybe even better?
And you wonder: what will you make it do next?
While the milestones above are the main ones, it's important for you to know that there were many other astonishing breakthroughs that got us to our current generative AI models, including revolutionary ideas such as generative adversarial networks (GANs).
This is the story of how AI evolved from artificial intelligence to machine learning to deep learning and how it enabled the creation of generative AI, which is the topic of this article. If you are interested and intrigued by this story and want to learn more about how generative AI works and what it can do, keep reading and join us on this journey.
How Generative AI Works
Generative AI is a branch of artificial intelligence that focuses on creating new and original content, such as images, text, music, and code, based on some input or data. Generative AI is one of the most exciting and rapidly evolving fields of AI, as it has the potential to unleash human creativity and innovation in unprecedented ways. But how does generative AI work, and what are the techniques and tools behind it? This article will explain generative AI's basic concepts and techniques, such as deep learning, neural networks, foundation models, and generative pre-trained transformers. We will also provide some examples of generative AI tools and models like ChatGPT, DALL-E, and StyleGAN.
What is Deep Learning?
Deep learning is a subset of machine learning and artificial intelligence. Machine learning teaches computers to learn from data and perform tasks without explicitly programming them. Deep learning is a more advanced and powerful form of machine learning, which uses multiple layers of artificial neurons, called neural networks, to learn from large and complex data sets. Neural networks are inspired by the structure and function of the human brain, which consists of billions of interconnected neurons that process and transmit information.
A neural network comprises three (or more) layers: an input layer, one or more hidden layers, and an output layer. The input layer receives the data, such as an image, a text, or a sound, and converts it into a numerical format that the network can understand. The hidden layers perform various computations and transformations on the data, such as detecting edges, shapes, colors, words, or patterns and extracting the relevant features and patterns for the task. The output layer produces the final result, such as a label, a prediction, a classification, or a generation, based on the data and the learned features and patterns.
A neural network learns by adjusting the weights and biases of its connections, which determine how much each neuron influences the next one. The weights and biases are initially set randomly and then updated through a process called backpropagation, which compares the network's output with the desired output and calculates the error or the difference between them. The error is then propagated backward through the network, and the weights and biases are adjusted accordingly to minimize the error and improve the network's performance. This process is repeated for many iterations or epochs until the network converges to an optimal state, where the error is minimized and the output is accurate.
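The training loop described above can be sketched in a few lines of Python. This is a toy illustration using NumPy, not any production framework: a tiny network with one hidden layer learns the XOR function by repeatedly comparing its output with the desired output and propagating the error backward to adjust the weights and biases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four training examples (the XOR function): inputs and desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases start out random (input -> hidden -> output).
W1 = rng.normal(size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1)); b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate: how big each weight adjustment is
for epoch in range(5000):
    # Forward pass: input layer -> hidden layer -> output layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Compare the network's output with the desired output: the error signal.
    d_out = out - y

    # Backpropagation: push the error backward and adjust weights and biases.
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out).ravel())  # should approach [0, 1, 1, 0]
```

After enough epochs, the error shrinks and the rounded outputs match the XOR targets, which is exactly the "converge to an optimal state" described above.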
What are Foundation Models?
Foundation models are large and powerful neural networks that are pre-trained on massive amounts of data, such as text, images, or audio, and can perform a wide range of tasks across different domains and modalities, such as natural language processing, computer vision, speech recognition, and generation. Foundation models, also called self-supervised learning models, learn from the data without requiring human labels or annotations. Foundation models leverage the inherent structure and patterns in the data, such as the syntax and semantics of the language, the shapes and colors of images, or the frequencies and amplitudes of sounds, to learn general and transferable representations of the data, which can then be fine-tuned or adapted for specific tasks or applications.
Foundation models are the backbone of generative AI, as they enable the creation of new and original content based on some input or data. Foundation models can generate content by sampling from their learned representations or by conditioning on some input, such as a prompt, a keyword, a caption, or a sketch. Using a common representation or a shared encoder-decoder architecture, foundation models can generate content across different modalities, such as text-to-image, image-to-text, text-to-speech, speech-to-text, or image-to-image.
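To make the fine-tuning idea concrete, here is a deliberately simplified sketch. Everything in it is a hypothetical stand-in: the "pretrained encoder" is just a frozen random projection, not a real foundation model. The point is the workflow: the learned representation stays fixed, and only a small task-specific head is trained on a handful of labeled examples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a foundation model's learned representation: a frozen
# (never-updated) nonlinear projection from 2 raw features to 16 features.
W_pretrained = rng.normal(size=(2, 16))

def encode(x):
    return np.tanh(x @ W_pretrained)

# A small labeled dataset for the downstream task: two well-separated clusters.
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# "Fine-tuning" stage: train only a small logistic-regression head
# on top of the frozen features.
feats = encode(X)
w = np.zeros(16)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # predicted probabilities
    grad = p - y                                 # cross-entropy gradient
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
acc = np.mean((p > 0.5) == y)
print(f"head-only accuracy: {acc:.2f}")
```

Real fine-tuning works on the same principle, just at vastly larger scale: the expensive pre-trained representation is reused, and only a comparatively tiny amount of task-specific training is needed on top.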
Some examples of foundation models are:
BERT: Short for Bidirectional Encoder Representations from Transformers, a foundation model for natural language processing. BERT is pre-trained on a large corpus of text, such as Wikipedia and books, and learns bidirectional representations of words and sentences, which capture both the left and the right context. BERT can be fine-tuned for various natural language processing tasks, such as question answering, sentiment analysis, text summarization, and text generation.
GPT: Short for Generative Pre-trained Transformer, a family of foundation models for natural language processing from OpenAI. You've already heard of Copilot and ChatGPT; both are powered by GPT models. (For scale: GPT-3 has 175 billion parameters, and OpenAI has not publicly disclosed GPT-4's size.) In practice, this is what lets you talk to it back and forth! :) GPT is pre-trained on a large corpus of text, such as the Web, and learns unidirectional representations of words and sentences, which capture only the left context. GPT can generate coherent and fluent text based on some input or prompt, such as a topic, a keyword, a sentence, or a paragraph. GPT can also perform various natural language processing tasks, such as text classification, text completion, and text translation, using a special token or a prefix to indicate the task.
ResNet: Short for Residual Network, a foundation model for computer vision. ResNet is pre-trained on a large dataset of images, such as ImageNet, and learns deep, residual representations of images, which capture both high-level and low-level features and patterns. ResNet can be fine-tuned for various computer vision tasks, such as image classification, object detection, face recognition, and image generation.
WaveNet: A foundation model for speech processing and generation, developed by DeepMind, that models raw audio waveforms directly. WaveNet is pre-trained on large audio datasets, such as speech and music, and learns probabilistic and autoregressive representations of audio, which capture temporal and spectral features and patterns. WaveNet can generate realistic and high-quality audio based on some input or condition, such as a text, a speaker, or a genre, and variants of it have been applied to various speech-processing tasks, such as speech recognition, synthesis, and translation.
Generative Pretrained Transformers
Generative pre-trained transformers are foundation models built on the transformer architecture. This network architecture uses attention mechanisms to learn the relationships and dependencies between the data elements, such as words, pixels, or sounds. Attention mechanisms allow the network to focus on the most relevant and essential parts of the data and to encode and decode the data in parallel rather than sequentially, improving the network's efficiency and performance. Generative pre-trained transformers are pre-trained on large and diverse datasets of text, images, or audio. They can generate new and original content based on some input or condition, such as a prompt, a keyword, a caption, or a sketch.
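The attention mechanism at the heart of the transformer can be sketched in a few lines of NumPy as scaled dot-product attention. This is a minimal illustration with made-up toy data; real transformers add learned projections, multiple attention heads, and masking on top of this core operation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query attends to every key,
    and the output is a weighted sum of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # relevance of every element to every other
    weights = softmax(scores)        # each row is a "focus" distribution
    return weights @ V, weights

# Three "tokens" (words, pixels, or sound frames), each a 4-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out, weights = attention(Q, K, V)
print(out.shape)            # (3, 4): one output vector per token
print(weights.sum(axis=1))  # each row of attention weights sums to 1
```

Because every token's output is computed from all the others in one matrix multiplication, the whole sequence can be processed in parallel, which is the efficiency gain mentioned above.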
ChatGPT: A powerful language model that can generate realistic and coherent text for various purposes, such as chatting, writing, and translating. ChatGPT is powered by OpenAI's GPT family of generative pre-trained transformers; OpenAI has not publicly disclosed the parameter count of its latest models. ChatGPT learns from large collections of text data, such as the Web, books, news, and social media, and can adapt to different tasks through prompting and special tokens or prefixes.
DALL-E: A creative image model that can produce original and high-quality images from natural language descriptions, such as "a blue cat wearing a hat." DALL-E 2 is an improved version of the original DALL-E, which was the first generative pre-trained transformer for text-to-image generation. DALL-E can combine concepts, attributes, and styles learned from a large and diverse dataset of text-image pairs, such as the Web, and can also perform various text-to-image tasks, such as editing and generating variations of an image.
StyleGAN: A realistic image model that can generate stunning and high-resolution images from various inputs, such as sketches, styles, or domains. Unlike the models above, StyleGAN is not a transformer; it is a generative adversarial network (GAN) from NVIDIA that uses a style-based generator architecture for image generation. StyleGAN can learn from a large and diverse dataset of images, such as faces, animals, and landscapes, and can also perform image-to-image tasks using techniques such as style mixing.
We've got a long way to go!
Generative AI is a fascinating and powerful field of artificial intelligence that enables the creation of new and original content, such as images, text, music, and code, based on some input or data. Generative AI uses deep learning, neural networks, foundation models, and generative pre-trained transformers, which learn from large and complex data sets. They generate content by sampling from their learned representations or by conditioning on some input. Generative AI has many applications and benefits for various domains and industries, such as healthcare, education, entertainment, and business, as well as many challenges and risks, such as data quality, bias, privacy, and security. Generative AI is also an excellent tool and partner for human creativity and innovation, as it can enhance, inspire, and collaborate with us to produce novel and diverse content.
Thanks for stopping by. Be sure to like/follow for more! If you ever have questions about one of my articles, please don't hesitate to contact me!
Take care and stay safe.
Additional Resources and Links
Never stop learning!
If you want to learn more about foundation models and generative AI, you can check out these web pages:
What are Foundation Models? - Foundation Models in Generative AI Explained - AWS
What is generative AI, what are foundation models, and why do they matter? - IBM
Explainer: What is a foundation model? | Ada Lovelace Institute
These pages provide more information and examples about the definition, characteristics, use cases, impact, and challenges of foundation models and generative AI.
Please note that Microsoft Copilot assists with all of my articles, whether text, images, videos, etc. While I write my own original content, I take pleasure in having Microsoft Copilot proofread for me. This saves me countless hours of going through my articles line by line, looking for a misplaced comma. :)