19/06/2025
Natural Language Processing
Spring 2025
Prof. Dr. M. Fasih Uddin Butt
Building Generative AI Applications
To Your Needs
➢ What is RAG?
➢ Why do we need RAG?
➢ Important Terms in RAG (Key Components)
➢ How RAG Works (Workflow)
➢ Types of RAG
➢ Comparison of RAG Types
➢ Fine-Tuning (An Alternative to RAG)
What is RAG?
➢ RAG stands for Retrieval-Augmented Generation.
➢ It combines a retrieval system, which finds relevant information in an
external knowledge source, with a generative AI model, which uses that
information to produce accurate and relevant responses.
➢ It is particularly useful for applications that require
up-to-date, fact-based, or domain-specific
responses.
Why do we need RAG?
➢ Hallucination (incorrect information): an AI model generates incorrect
or misleading output. This can happen in any type of AI model, including
natural language processing (NLP) and computer vision models. RAG reduces
this risk by grounding responses in retrieved source documents.
➢ Data staleness: the model cannot provide up-to-date information because
it was trained on a fixed dataset that does not include newer data. RAG
addresses this by retrieving current information at query time.
Important Terms in RAG (Key Components)
Retriever:
(Some preprocessing happens before retrieval; let's look at that first.)
➢ Searches for relevant information from external knowledge bases or
datasets.
Generator:
➢ Uses the retrieved information to create coherent and accurate
responses.
Feedback Loop (Optional):
➢ A mechanism to refine outputs iteratively.
Preprocessing Before Retrieval
1. Chunking
● What it is:
Breaking large documents or datasets into smaller, manageable
pieces (chunks).
● Why it’s needed:
○ Large text blocks are difficult to process efficiently.
○ Helps maintain context and relevance in retrieval.
● Example:
○ A 10,000-word article might be divided into 500-word chunks.
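The chunking step above can be sketched in Python as follows. This is a minimal illustration, assuming chunk size is measured in words (as in the 500-word example) and that neighboring chunks may optionally share a few words of overlap; the function name `chunk_words` and its `overlap` parameter are hypothetical, not taken from a particular library. Production pipelines often split on sentence or paragraph boundaries, or count tokens rather than words.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    The last `overlap` words of each chunk are repeated at the start of the
    next one, so text cut at a boundary keeps some of its surrounding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# A 10,000-word article split into 500-word chunks without overlap.
article = " ".join(f"word{i}" for i in range(10_000))
chunks = chunk_words(article, chunk_size=500)
print(len(chunks))  # 20
```

With `overlap > 0` the same article yields slightly more chunks, trading storage for context continuity across chunk boundaries.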
2. Tokenization
● What it is:
Splitting text into smaller units called tokens (e.g., words,
subwords, or characters).
● Why it’s needed:
○ Allows text to be processed numerically for embedding and search.
○ Prepares the text for the embedding model.
● Example:
○ "Retrieval-Augmented Generation" →
["Retrieval", "-", "Augmented", "Generation"]
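A toy word-level tokenizer that reproduces the example above, followed by a step that maps each token to an integer ID. This is only a sketch: the names `tokenize` and `build_vocab` are hypothetical, and real embedding models ship their own subword tokenizers (e.g., WordPiece or byte-pair encoding) with a fixed vocabulary, so their tokens and IDs will differ.

```python
import re

# Word-level pattern: a run of word characters, or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


tokens = tokenize("Retrieval-Augmented Generation")
print(tokens)                      # ['Retrieval', '-', 'Augmented', 'Generation']
vocab = build_vocab(tokens)
print([vocab[t] for t in tokens])  # [0, 1, 2, 3]
```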
3. Embedding
● What it is:
Converting text chunks into dense numerical vectors using pre-
trained models (e.g., Sentence Transformers, OpenAI Embedding
API).
● Why it’s needed:
○ Vectors represent semantic meaning, enabling efficient similarity
search.
○ These embeddings capture the context of the text.
● Where they are stored:
○ Embeddings are stored in vector databases or vector indexes (e.g.,
FAISS, Pinecone, Weaviate, ChromaDB).
○ These allow fast similarity search over large numbers of vectors.
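The sketch below shows the embed, store, and search cycle with an in-memory index. To stay runnable without downloading a model, it assumes a stand-in `embed` function that produces a sparse word-count vector instead of the dense semantic vector a real model such as Sentence Transformers would return; `InMemoryVectorStore` is likewise a hypothetical placeholder for FAISS, Pinecone, Weaviate, or ChromaDB. Ranking by cosine similarity is the part that carries over to real systems.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector.
    A real model would return a dense float vector that reflects meaning,
    so paraphrases with no shared words could still score as similar."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: keeps (text, vector)
    pairs and ranks them by cosine similarity to a query."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
for chunk in ["RAG retrieves documents before generating an answer.",
              "Fine-tuning updates model weights on new data.",
              "Paris is the capital of France."]:
    store.add(chunk)
print(store.search("How does RAG retrieve documents?", k=1)[0][0])
# RAG retrieves documents before generating an answer.
```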
Important Terms in RAG (Key Components)
Retriever
● The retriever is responsible for finding the most relevant information
from an external knowledge base, database, or document store.
● It uses methods like vector similarity search (e.g., FAISS,
Elasticsearch) or traditional keyword matching (e.g., BM25) to locate
data relevant to the input query.
● Why it’s important:
○ Ensures the generative model has access to accurate,
contextually appropriate information on which to base its response.
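As an example of the keyword-matching option mentioned above, here is a small retriever based on Okapi BM25, a standard keyword-ranking function. It is a self-contained sketch: the class name `BM25Retriever` and the sample documents are made up, and `k1 = 1.5`, `b = 0.75` are commonly used values rather than tuned ones. Search engines such as Elasticsearch use BM25 as their default ranking function.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class BM25Retriever:
    """Keyword-matching retriever that ranks documents with Okapi BM25."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.docs = docs
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.doc_lens) / len(docs) if docs else 0.0
        # Document frequency: how many documents contain each term.
        self.df = Counter(term for counts in self.term_counts for term in counts)

    def idf(self, term: str) -> float:
        n, total = self.df[term], len(self.docs)
        return math.log((total - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, i: int) -> float:
        counts, length = self.term_counts[i], self.doc_lens[i]
        s = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf:
                # Term-frequency saturation (k1) and length normalization (b).
                denom = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                s += self.idf(term) * tf * (self.k1 + 1) / denom
        return s

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: self.score(query, i), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


retriever = BM25Retriever([
    "The refund policy allows returns within 30 days.",
    "Our office is open Monday to Friday.",
    "Shipping is free for orders over 50 dollars.",
])
print(retriever.retrieve("What is the refund policy?", k=1))
# ['The refund policy allows returns within 30 days.']
```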
Generator
● The generator is a pre-trained generative language model (e.g., GPT,
T5, or an open model served through a provider such as Groq) that
creates responses by incorporating the retrieved information.
● It synthesizes retrieved data and transforms it into human-like,
coherent text.
● Why it’s important:
○ Acts as the "voice" of the system, converting raw retrieved data
into usable, conversational, or actionable outputs.
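A common way to let the generator incorporate the retrieved information is to paste the retrieved chunks into the prompt. The sketch below assumes that approach; the prompt wording and the names `build_prompt` and `generate` are illustrative, and `llm` is a hypothetical callable you would supply (for example, a wrapper around whichever model API you use), so no specific provider SDK is assumed.

```python
from typing import Callable


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into a single prompt."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    """`llm` is any function that sends a prompt to a language model and
    returns its text (hypothetical here; plug in your provider's client)."""
    return llm(build_prompt(question, chunks))


print(build_prompt("When was the policy updated?",
                   ["The travel policy was updated in March 2025."]))
```

Numbering the chunks makes it easy to ask the model to cite which chunk supports each statement.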
Feedback Loop (Optional)
● A mechanism to iteratively refine the output by re-querying the
retriever or adjusting the generator’s response based on user
feedback or model evaluation.
● Why it’s important:
○ Helps improve the accuracy and relevance of responses over
time.
○ Critical for applications requiring high precision, like healthcare
or legal advisory systems.
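One way to realize the feedback loop is to evaluate each answer and re-query the retriever with a wider search when the score is too low. In this sketch the retriever, generator, and evaluator are passed in as hypothetical functions, and both the doubling of `k` and the `threshold` of 0.7 are arbitrary choices for illustration; in practice the score might come from user ratings or an automatic evaluator.

```python
from typing import Callable


def answer_with_feedback(
    query: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str, list[str]], str],
    evaluate: Callable[[str, str], float],
    threshold: float = 0.7,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Re-query the retriever until the evaluator's score reaches `threshold`
    or `max_rounds` is used up. Returns (answer, rounds_used)."""
    k = 2
    answer = ""
    for round_no in range(1, max_rounds + 1):
        docs = retrieve(query, k)
        answer = generate(query, docs)
        if evaluate(query, answer) >= threshold:
            return answer, round_no
        k *= 2  # not good enough: widen the search and try again
    return answer, max_rounds


docs = ["Introduction.", "Background.", "Methods.", "The deadline is 30 June."]
answer, rounds = answer_with_feedback(
    "When is the deadline?",
    retrieve=lambda q, k: docs[:k],                          # hypothetical retriever
    generate=lambda q, ctx: " ".join(ctx),                   # hypothetical generator
    evaluate=lambda q, a: 1.0 if "deadline" in a else 0.0,   # hypothetical evaluator
)
print(rounds)  # 2
```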
How RAG Works
(Workflow diagram)
Standard RAG
➢ Combines retrieval with generation in a straightforward manner.
Workflow:
1. Input query.
2. Retrieve relevant documents.
3. Generate response using retrieved documents.
Use Case:
● Question answering using enterprise knowledge bases
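The three workflow steps map directly onto a short pipeline. This minimal sketch assumes a simple keyword-overlap ranking in place of a real vector search and a hypothetical `llm` callable; the knowledge-base sentences are invented examples.

```python
import re
from typing import Callable


def words(text: str) -> set[str]:
    """Lower-cased set of words in a text."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]


def standard_rag(query: str, docs: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    # Step 1 is the incoming query; step 2 retrieves; step 3 generates from the results.
    context = "\n".join(retrieve(query, docs, k))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)


kb = ["Employees get 20 days of annual leave per year.",
      "The VPN must be used on public Wi-Fi networks.",
      "Expense reports are due at the end of each month."]


def first_context_line(prompt: str) -> str:
    """Hypothetical model stand-in: returns the first context line verbatim."""
    return prompt.splitlines()[1]


print(standard_rag("How many days of annual leave do employees get?", kb,
                   first_context_line, k=1))
# Employees get 20 days of annual leave per year.
```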
Corrective RAG
➢ Enhances response accuracy by correcting errors in real time.
Workflow:
1. Generate an initial response.
2. Identify errors by checking the response against retrieved documents.
3. Correct errors based on retrieved facts.
Use Case:
● Customer support chatbots with high accuracy requirements.
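A sketch of the corrective workflow: draft an answer, retrieve evidence, and regenerate only if the draft is not supported. The error check here is deliberately crude (it assumes that a draft is wrong when it mentions a number absent from the retrieved text), and `toy_llm` is a hypothetical stand-in for a model; all names are illustrative.

```python
import re
from typing import Callable


def numbers(text: str) -> set[str]:
    """All numbers mentioned in a piece of text, e.g. {'24'}."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def supported(draft: str, evidence: list[str]) -> bool:
    """Toy error check: every number in the draft must appear in the evidence.
    A real system would more likely ask an LLM or an NLI model to verify claims."""
    found = set().union(*(numbers(doc) for doc in evidence))
    return numbers(draft) <= found


def corrective_rag(query: str,
                   retrieve: Callable[[str], list[str]],
                   llm: Callable[[str], str]) -> str:
    draft = llm(query)              # 1. initial response, generated without retrieval
    evidence = retrieve(query)      # 2. retrieve facts to check the draft against
    if supported(draft, evidence):
        return draft
    # 3. Regenerate, grounding the answer in the retrieved facts.
    prompt = ("Facts:\n" + "\n".join(evidence) +
              f"\n\nQuestion: {query}\nThis draft may be wrong: {draft}\nCorrected answer:")
    return llm(prompt)


def toy_llm(prompt: str) -> str:
    """Hypothetical model: guesses without facts, copies the first fact when given some."""
    if prompt.startswith("Facts:"):
        return prompt.splitlines()[1]
    return "The warranty lasts 12 months."


answer = corrective_rag("How long is the warranty?",
                        lambda q: ["The warranty lasts 24 months."], toy_llm)
print(answer)  # The warranty lasts 24 months.
```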
(Workflow diagram: Corrective RAG)
Speculative RAG
➢ Prioritizes efficiency by speculating which documents are relevant
without full retrieval.
Workflow:
1. The model predicts which documents are relevant without performing a full retrieval.
2. Generates speculative output.
Advantages:
● Faster responses, at the risk of reduced accuracy.
Use Case:
● Real-time conversational AI with high-speed requirements.
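The speculative workflow above is described only at a high level, so the following is one possible reading of it, not a specific published method: relevance is predicted from a precomputed keyword-to-documents map, and the slower search runs only when no prediction matches. `predicted_docs`, `full_retrieve`, and `llm` are hypothetical inputs.

```python
from typing import Callable


def speculative_rag(
    query: str,
    predicted_docs: dict[str, list[str]],
    full_retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> tuple[str, bool]:
    """Return (answer, used_full_retrieval).

    `predicted_docs` maps topic keywords to documents guessed in advance to be
    relevant (e.g. built from past queries). On a keyword match those documents
    are used directly and the slower search is skipped.
    """
    q = query.lower()
    # Speculative path: trust the predicted documents without searching.
    docs = next((d for keyword, d in predicted_docs.items() if keyword in q), None)
    used_full = docs is None
    if used_full:
        docs = full_retrieve(query)  # fallback: ordinary (slower) retrieval
    return llm("\n".join(docs) + f"\n\nQuestion: {query}"), used_full


predicted = {"opening hours": ["We are open 9am to 5pm, Monday to Friday."]}


def full_search(query: str) -> list[str]:
    return ["We ship to most countries in Europe."]  # hypothetical slow search


def first_line(prompt: str) -> str:
    return prompt.splitlines()[0]  # hypothetical model stand-in


print(speculative_rag("What are your opening hours?", predicted, full_search, first_line))
# ('We are open 9am to 5pm, Monday to Friday.', False)
```

The speed gain and the accuracy risk come from the same place: a predicted document is used without checking that it actually answers the query.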
(Workflow diagram: Speculative RAG)
Agentic RAG
➢ Adds decision-making capabilities to the RAG model.
Workflow:
1. Retrieve information.
2. Evaluate context and goals.
3. Generate adaptive and strategic responses.
Use Case:
● Virtual assistants for decision-making tasks.
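A minimal agent loop illustrating the decision-making step: at each turn a policy chooses between calling a calculator tool, retrieving, answering, or declining to answer. In a real agentic system an LLM would usually make this choice; here `decide` is a hypothetical rule-based stand-in, and the tool names, messages, and knowledge base are invented for illustration.

```python
import ast
import operator
import re
from typing import Callable

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculate(expr: str) -> float:
    """Calculator tool: evaluates + - * / arithmetic without using eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


def decide(query: str, context: list[str], searched: bool) -> str:
    """Hypothetical rule-based policy standing in for an LLM that picks the next action."""
    if re.fullmatch(r"[\d\s+\-*/().]+", query.strip()):
        return "calculate"   # pure arithmetic: documents would not help
    if not searched:
        return "retrieve"    # no information gathered yet
    if not context:
        return "give_up"     # searched but found nothing: avoid a made-up answer
    return "answer"


def agentic_rag(query: str, retrieve: Callable[[str], list[str]],
                llm: Callable[[str], str], max_steps: int = 3) -> str:
    context: list[str] = []
    searched = False
    for _ in range(max_steps):
        action = decide(query, context, searched)
        if action == "calculate":
            return str(calculate(query))
        if action == "retrieve":
            context, searched = retrieve(query), True
            continue
        if action == "give_up":
            return "I could not find relevant information."
        return llm("\n".join(context) + f"\n\nQuestion: {query}")
    return "Step limit reached."


kb = {"return": "Returns are accepted within 30 days."}


def search(query: str) -> list[str]:
    return [doc for key, doc in kb.items() if key in query.lower()]


def first_line(prompt: str) -> str:
    return prompt.splitlines()[0]  # hypothetical model stand-in


print(agentic_rag("12 * (3 + 4)", search, first_line))                 # 84
print(agentic_rag("What is the return window?", search, first_line))   # Returns are accepted within 30 days.
print(agentic_rag("Who won the 1998 World Cup?", search, first_line))  # I could not find relevant information.
```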
(Workflow diagram: Agentic RAG)
Comparison
| Technique | Focus | Strengths | Weaknesses |
|---|---|---|---|
| Standard RAG | Simplicity | Easy to implement | Limited adaptability |
| Corrective RAG | Accuracy | Error correction in real time | Slower responses |
| Speculative RAG | Efficiency | Faster responses | Risk of inaccuracies |
| Agentic RAG | Decision-making | Strategic outputs | Higher complexity |
Fine-Tuning versus RAG
| Aspect | Fine-Tuning | RAG |
|---|---|---|
| Definition | Modifies a pre-trained model by training it on new data. | Combines a pre-trained model with external knowledge retrieval. |
| Purpose | Customizes the model for a specific task. | Enhances responses dynamically with external information. |
| Data dependency | Requires training on task-specific data. | Uses external data stored in a vector database or index. |
| Flexibility | Requires retraining to incorporate updates or new data. | Reflects new or updated documents once they are indexed, without retraining. |
| Computational cost | High, due to additional training (substantial GPU/CPU requirements). | Lower, since no training is needed; the added cost is embedding, retrieval, and longer prompts at query time. |
| Example use case | Building a specialized application for a specific domain (e.g., healthcare). | Answering questions about frequently updated knowledge (e.g., news, chatbots). |