
Genaitable

Uploaded by

venkatesh k
Copyright
© © All Rights Reserved

Here’s a comprehensive table of the concepts, algorithms, and libraries used across Generative AI
(GenAI) domains such as text generation, image generation, video generation, speech synthesis, and
multi-modal systems. Each row covers one GenAI application or domain and lists its key algorithms,
popular libraries or models, and common use cases.

| GenAI Concept | Key Algorithms/Models | Popular Libraries/Frameworks | Primary Use Cases |
|---|---|---|---|
| Text Generation (LLMs) | Transformer (GPT, BERT, T5, LLaMA); Fine-tuning and RLHF (Reinforcement Learning from Human Feedback) | Hugging Face Transformers; OpenAI GPT; DeepSpeed; LangChain | Chatbots; Content creation; Text summarization; Question answering |
| Text-to-Text Transformation | Seq2Seq (T5, BART); Transformers with pre- and post-processing | Hugging Face Transformers; spaCy; NLTK | Translation; Paraphrasing; Summarization |
| Image Generation | Diffusion Models (Stable Diffusion, DALL-E 2); GANs (StyleGAN, BigGAN); Variational Autoencoders (VAEs); Transformer-based (VQ-VAE, VQ-GAN) | Hugging Face Diffusers; PyTorch; TensorFlow; StyleGAN2 | Art and illustration; Image inpainting; Super-resolution |
| Video Generation | 3D/Spatio-temporal GANs (MoCoGAN); Temporal Diffusion Models; Transformers for video (TimeSformer) | PyTorch3D; Deep Video Prior; Hugging Face Transformers | Video synthesis; Animation; Scene reconstruction |
| Audio Generation and Speech Synthesis | Tacotron; WaveNet; GAN-TTS; Diffusion-based audio models | PyTorch; Hugging Face Transformers; Google TTS | Text-to-speech; Audiobook narration; Voice cloning |
| Speech-to-Text | Convolutional Recurrent (DeepSpeech); Transformer-based (Wav2Vec 2.0, Whisper) | Hugging Face Transformers; SpeechBrain; OpenAI Whisper | Transcription; Voice assistants; Real-time speech processing |
| Image Captioning | Image Encoder + Text Decoder (CLIP, Flamingo); CNN-RNN hybrids; Vision Transformers with text output (ViT-GPT) | Hugging Face Transformers; OpenAI CLIP; PyTorch | Captioning for accessibility; Social media automation; Image indexing |
| Speech Recognition and Audio Classification | CNNs + RNNs (e.g., LSTM for audio sequences); Transformers (Audio Spectrogram Transformers) | PyTorch; Librosa; SpeechBrain | Audio analysis; Sentiment analysis; Transcription and voice analysis |
| Multi-modal Models (Image + Text) | CLIP (Contrastive Language–Image Pretraining); Flamingo; Unified Transformer models (e.g., OFA, BLIP) | Hugging Face Transformers; OpenAI CLIP; PyTorch | Visual question answering; Image-text retrieval; Enhanced search engines |
| Text-to-Image Generation (Text Prompts) | Diffusion Models (Stable Diffusion, DALL-E 2); GANs with text-conditioning (AttnGAN); Variational Autoencoders (VQ-VAE) | Hugging Face Diffusers; DALL-E mini; PyTorch | Illustration generation; Custom art; Concept visualization |
| Text-to-Video Generation | Temporal Diffusion Models; GANs for video (TGAN, MoCoGAN); Transformers (VideoGPT) | Video Diffusion models; PyTorch3D; Hugging Face Transformers | Marketing videos; Video synthesis; Storytelling |
| Text-to-3D Object Generation | 3D GANs (GANcraft, 3DGAN); Neural Radiance Fields (NeRF); Diffusion-based 3D synthesis | PyTorch3D; NVIDIA NeRF; Blender | 3D model generation; Game assets; AR/VR applications |
| Reinforcement Learning-based Generation | Q-learning; Actor-Critic methods (PPO, SAC); Multi-agent RL | Stable Baselines3; RLlib; OpenAI Gym | Game AI; Robotics control; Autonomous agents |
| Knowledge Retrieval and Augmentation (RAG) | Dense Passage Retrieval (DPR); Retrieval-Augmented Transformers (RAG); BM25 for retrieval | Haystack; Hugging Face Transformers; Pyserini | Document-based question answering; Knowledge-grounded chatbots; Real-time information lookup |
| Personalized Recommendations | Collaborative Filtering; Matrix Factorization (SVD, NMF); Deep Learning (NARRE, GRU4Rec) | TensorFlow; PyTorch; LightFM | Content recommendation; Product suggestions; Social media feeds |
| Large Language Model (LLM) Tuning and Optimization | RLHF; Transfer Learning; Data Augmentation (Backtranslation) | Hugging Face Transformers; DeepSpeed; OpenAI Gym | Fine-tuning for chatbots; Specialized task LLMs; Bias mitigation |

Notes on Key Components and Terminologies

- LLMs (Large Language Models): Foundation models like GPT-4 and T5 enable generative tasks,
including text generation, chatbots, and question answering, by learning language patterns and
semantics from a large corpus of data. Encoder-only models like BERT are used mainly for language
understanding rather than generation.

- Diffusion Models: This class of models, popular for high-quality image and video generation,
starts from random noise and progressively denoises it to create detailed images or video frames.

- GANs (Generative Adversarial Networks): Useful for images, audio, and even video, GANs
train two networks (a generator and a discriminator) in an adversarial manner to produce
high-fidelity, realistic outputs.

- Transformers for Vision and Video: With the development of Vision Transformers (ViT) and
TimeSformer, transformers have expanded into visual domains, handling tasks that depend on spatial
and temporal structure, such as image classification and video classification. Transformer models
such as VideoGPT extend this to video generation.

- Speech Models: Tacotron and WaveNet are prominent for text-to-speech, while models like
Wav2Vec 2.0 and Whisper by OpenAI have transformed speech-to-text tasks.

- CLIP and Multi-modal Models: CLIP and other multi-modal models embed text and images in a
shared space so the two can be compared, enabling applications like image-text retrieval, visual
question answering, and image captioning.

- RAG (Retrieval-Augmented Generation): Integrates retrieval methods with generative models
for tasks that require grounding in external documents or databases, such as question answering
over a specific knowledge base.
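
To make the RAG note above more concrete, here is a minimal sketch of the retrieve-then-generate flow in plain Python. It is an illustration under stated assumptions, not how Haystack or Pyserini implement it: the three-document knowledge base, the function names, and the `generate` stub are all hypothetical, and the scoring is one common textbook form of BM25 with default-style values for `k1` and `b`.

```python
import math
from collections import Counter

# Hypothetical three-document knowledge base, invented for this sketch.
DOCS = [
    "Whisper is a speech-to-text model released by OpenAI.",
    "Stable Diffusion generates images by denoising random noise.",
    "BM25 ranks documents by term frequency and document length.",
]


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def retrieve(query, docs, top_k=1):
    """Return the top_k documents ranked by BM25 score."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]


def build_prompt(query, passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def generate(prompt):
    """Hypothetical stand-in for a call to a generative model."""
    return f"[model output for a prompt of {len(prompt)} characters]"
```

A call such as `generate(build_prompt(q, retrieve(q, DOCS)))` runs the whole chain. In a real system the stub would be replaced by an actual model call, and the retriever could be swapped for a dense one such as DPR without changing the rest of the flow.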
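
The CLIP note above describes aligning text and image data. A rough sketch of that idea is the symmetric contrastive loss below, written in plain Python on tiny hand-made vectors. The embeddings and function names are hypothetical, the temperature is fixed here for illustration (CLIP learns it during training), and real encoders would produce the vectors.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embs, text_embs, temperature=0.07):
    """Cosine similarity of every image with every text, divided by temperature."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy.

    Pair i is assumed to be the matching (image, text) pair, so the
    target for row i and for column i is index i.
    """
    logits = similarity_matrix(image_embs, text_embs, temperature)
    n = len(logits)
    img_to_txt = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    txt_to_img = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


# Hypothetical 2-D embeddings: image i is meant to match text i.
IMAGES = [[1.0, 0.0], [0.0, 1.0]]
TEXTS_ALIGNED = [[0.9, 0.1], [0.1, 0.9]]
TEXTS_SWAPPED = [[0.1, 0.9], [0.9, 0.1]]
```

With these toy vectors, `contrastive_loss(IMAGES, TEXTS_ALIGNED)` comes out far smaller than `contrastive_loss(IMAGES, TEXTS_SWAPPED)`. Training pushes matching pairs together and mismatched pairs apart in exactly this sense, which is what makes retrieval by nearest embedding possible afterwards.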

This table and guide should give you a foundation across the spectrum of Generative AI, offering
insight into the methods and tools used to build diverse applications, from creative content
generation to personalized recommendations and beyond. Let me know if you want further detail on
any particular area!
