LangChain
0. Announcement Page 1
What is LangChain
29 January 2025 08:35
LangChain is an open-source framework for building LLM-based applications. It
provides modular components and end-to-end tools that help developers build complex AI
applications, such as chatbots, question-answering systems, retrieval-augmented generation
(RAG), autonomous agents, and more.
0. Announcement Page 2
Why LangChain first
29 January 2025 08:35
0. Announcement Page 3
Curriculum Structure
29 January 2025 08:36
0. Announcement Page 4
My Focus
29 January 2025 08:35
1. Updated information
2. Clarity
3. Conceptual understanding
4. The 80 percent approach
0. Announcement Page 5
Timeline
29 January 2025 08:37
0. Announcement Page 6
What is LangChain
07 January 2025 23:14
Example Queries
1. Explain page number 5 as if I am a 5-year-old
2. Generate a True/False exercise on Linear Regression
3. Generate notes for Decision Trees
Complete ecosystem
Conversational Chatbots
AI Knowledge Assistants
AI Agents
Workflow Automation
Summarization/Research Helpers
LlamaIndex
Haystack
In LangChain, “models” are the core interfaces through which you interact with AI models.
2. Role-Based Prompts
• Custom Memory: For advanced use cases, you can store specialized state (e.g.,
the user’s preferences or key facts about them) in a custom memory class.
3. Models Page 22
What are Models
07 January 2025 23:15
The Model Component in LangChain is a crucial part of the framework, designed to facilitate
interactions with various language models and embedding models.
It abstracts the complexity of working directly with different LLMs, chat models, and
embedding models, providing a uniform interface to communicate with them. This makes it
easier to build applications that rely on AI-generated text, text embeddings for similarity
search, and retrieval-augmented generation (RAG).
3. Models Page 23
Plan of Action
10 February 2025 09:21
3. Models Page 24
Language Models
Language Models are AI systems designed to process, generate, and understand natural
language text.
LLMs - General-purpose models used for raw text generation. They take a string (plain
text) as input and return a string (plain text). These are traditionally older models and are not
used much now.
Chat Models - Language models specialized for conversational tasks. They take a
sequence of messages as input and return chat messages as output (as opposed to using
plain text). These are traditionally newer models and are used far more than LLMs.
3. Models Page 25
Setup
10 February 2025 09:21
3. Models Page 26
Demo
1. OpenAI
2. Anthropic
3. Google
4. HuggingFace
3. Models Page 27
Open Source Models
11 February 2025 08:58
Open-source language models are freely available AI models that can be downloaded,
modified, fine-tuned, and deployed without restrictions from a central provider. Unlike closed-
source models such as OpenAI's GPT-4, Anthropic's Claude, or Google's Gemini, open-source
models allow full control and customization.
Disadvantages
3. Models Page 28
3. Models Page 29
Embedding Models
07 January 2025 23:17
3. Models Page 30
Recap
24 February 2025 09:42
4. Prompts Page 31
A mistake from my side!
24 February 2025 09:42
4. Prompts Page 32
What are Prompts
10 January 2025 08:37
Prompts are the input instructions or queries given to a model to guide its output.
4. Prompts Page 33
Static vs Dynamic Prompts
14 February 2025 00:01
paper_input = st.selectbox(
    "Select Research Paper Name",
    ["Select...", "Attention Is All You Need",
     "BERT: Pre-training of Deep Bidirectional Transformers",
     "GPT-3: Language Models are Few-Shot Learners",
     "Diffusion Models Beat GANs on Image Synthesis"],
)
length_input = st.selectbox(
    "Select Explanation Length",
    ["Short (1-2 paragraphs)", "Medium (3-5 paragraphs)", "Long (detailed explanation)"],
)
Please summarize the research paper titled "{paper_input}" with the following
specifications:
Explanation Style: {style_input}
Explanation Length: {length_input}
1. Mathematical Details:
- Include relevant mathematical equations if present in the paper.
- Explain the mathematical concepts using simple, intuitive code snippets
where applicable.
2. Analogies:
- Use relatable analogies to simplify complex ideas.
If certain information is not available in the paper, respond with: "Insufficient
information available" instead of guessing.
Ensure the summary is clear, accurate, and aligned with the provided style and
length.
This makes it reusable, flexible, and easy to manage, especially when working
with dynamic user inputs or automated workflows.
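The selections above can be dropped into one reusable template. A minimal sketch with plain Python string formatting (no LangChain required; the `build_prompt` helper and the style value passed to it are illustrative):

```python
# A dynamic prompt: one reusable template, filled at runtime from user input.
TEMPLATE = (
    'Please summarize the research paper titled "{paper_input}" with the following '
    "specifications:\n"
    "Explanation Style: {style_input}\n"
    "Explanation Length: {length_input}\n"
)

def build_prompt(paper_input: str, style_input: str, length_input: str) -> str:
    """Fill the template's placeholders with the user's selections."""
    return TEMPLATE.format(
        paper_input=paper_input,
        style_input=style_input,
        length_input=length_input,
    )

prompt = build_prompt(
    "Attention Is All You Need", "Beginner-Friendly", "Short (1-2 paragraphs)"
)
print(prompt)
```

LangChain's PromptTemplate wraps this same idea behind a uniform interface, so templates can be validated, saved, and composed with other components.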
4. Prompts Page 34
Prompt Template
10 January 2025 08:37
4. Prompts Page 35
Messages
14 February 2025 00:02
4. Prompts Page 36
Chat Prompt Templates
14 February 2025 00:02
4. Prompts Page 37
Message Placeholder
14 February 2025 00:09
4. Prompts Page 38
Recap
28 February 2025 00:12
In LangChain, structured output refers to the practice of having language models return
responses in a well-defined data format (for example, JSON), rather than free-form text. This
makes the model output easier to parse and work with programmatically.
Source - https://www.linkedin.com/pulse/structured-outputs-from-llms-langchain-output-parsers-vijay-chaudhary-wgjqc
Data Extraction
API building
Agents
TypedDict is a way to define a dictionary type in Python where you specify what keys and
values should exist. It helps ensure that your dictionary follows a specific structure.
• It tells Python what keys are required and what types of values they should have.
• It does not validate data at runtime (it just helps with type hints for better coding).
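A minimal example (the `Review` fields are hypothetical):

```python
from typing import TypedDict

class Review(TypedDict):
    """Structure we expect an LLM's parsed output to follow."""
    summary: str
    sentiment: str
    rating: int

# Type checkers (mypy, pyright) flag missing keys or wrong value types,
# but at runtime a TypedDict is just a plain dict -- no validation happens.
review: Review = {"summary": "Great phone", "sentiment": "positive", "rating": 5}
print(type(review))         # <class 'dict'>
print(review["sentiment"])  # positive
```

Because there is no runtime validation, Pydantic is the better choice when you need to actually enforce the structure on model output.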
Output Parsers in LangChain help convert raw LLM responses into structured formats like
JSON, CSV, Pydantic models, and more. They ensure consistency, validation, and ease of use in
applications.
The StrOutputParser is the simplest output parser in LangChain. It is used to parse the output
of a Language Model (LLM) and return it as a plain string.
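Conceptually, an output parser is just a function from raw model text to a structured value. A pure-Python sketch (not the actual LangChain classes) that extracts a JSON payload from a chatty reply:

```python
import json

def parse_json_output(raw: str) -> dict:
    """Extract the first JSON object embedded in a raw LLM reply."""
    start = raw.index("{")          # first opening brace
    end = raw.rindex("}") + 1       # last closing brace
    return json.loads(raw[start:end])

# Simulated raw LLM response with chatter around the JSON payload.
raw_reply = 'Sure! Here is the result:\n{"topic": "RAG", "confidence": 0.9}\nHope that helps.'
parsed = parse_json_output(raw_reply)
print(parsed["topic"])  # RAG
```

LangChain's parsers add what this sketch lacks: format instructions injected into the prompt, schema validation, and retry handling when the model's output does not parse.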
The StructuredOutputParser works by defining a list of fields (ResponseSchema) that the model
should return, ensuring the output follows a structured format.
7. Chains Page 56
What & Why
11 March 2025 17:26
7. Chains Page 57
Simple Chain
11 March 2025 17:26
7. Chains Page 58
Sequential Chain
11 March 2025 17:26
7. Chains Page 59
Parallel Chain
11 March 2025 17:26
7. Chains Page 60
Conditional Chain
11 March 2025 17:27
7. Chains Page 61
Recap
18 March 2025 16:55
8. Runnables Page 62
The Why
18 March 2025 16:59
8. Runnables Page 63
8. Runnables Page 64
8. Runnables Page 65
8. Runnables Page 66
The What
19 March 2025 00:12
8. Runnables Page 67
Recap
20 March 2025 10:08
9. LCEL Page 68
Plan of Action
20 March 2025 11:02
9. LCEL Page 69
1. RunnableSequence
20 March 2025 12:10
It is useful when you need to compose multiple runnables together in a structured workflow.
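The behaviour can be sketched in plain Python: a sequence is just function composition, with each step's output feeding the next step's input (a conceptual stand-in, not LangChain's implementation):

```python
class SequenceSketch:
    """Minimal stand-in for RunnableSequence: run steps left to right."""

    def __init__(self, *steps):
        self.steps = steps

    def invoke(self, value):
        # Each step receives the previous step's output.
        for step in self.steps:
            value = step(value)
        return value

chain = SequenceSketch(str.strip, str.lower, lambda s: s.split())
print(chain.invoke("  Hello LangChain World  "))  # ['hello', 'langchain', 'world']
```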
9. LCEL Page 70
2. RunnableParallel
20 March 2025 18:33
Each runnable receives the same input and processes it independently, producing a dictionary
of outputs.
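A conceptual stand-in in plain Python (not the LangChain class itself), showing the same input fanned out to every branch:

```python
class ParallelSketch:
    """Minimal stand-in for RunnableParallel: same input to every branch."""

    def __init__(self, **branches):
        self.branches = branches

    def invoke(self, value):
        # Each branch sees the identical input; results come back as a dict
        # keyed by branch name.
        return {name: fn(value) for name, fn in self.branches.items()}

both = ParallelSketch(upper=str.upper, length=len)
print(both.invoke("langchain"))  # {'upper': 'LANGCHAIN', 'length': 9}
```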
9. LCEL Page 71
3. RunnablePassthrough
20 March 2025 22:34
RunnablePassthrough is a special Runnable primitive that simply returns the input as output
without modifying it.
9. LCEL Page 72
4. RunnableLambda
20 March 2025 23:18
RunnableLambda is a runnable primitive that allows you to apply custom Python functions
within an AI pipeline.
9. LCEL Page 73
5. RunnableBranch
21 March 2025 08:02
It functions like an if/elif/else block for chains — where you define a set of condition functions,
each associated with a runnable (e.g., LLM call, prompt chain, or tool). The first matching
condition is executed. If no condition matches, a default runnable is used (if provided).
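The if/elif/else behaviour can be sketched in plain Python (a conceptual stand-in, not LangChain's class; the routing rule here is made up):

```python
class BranchSketch:
    """Minimal stand-in for RunnableBranch: first matching condition wins."""

    def __init__(self, *pairs, default):
        self.pairs = pairs      # (condition, runnable) tuples, checked in order
        self.default = default  # used when no condition matches

    def invoke(self, value):
        for condition, fn in self.pairs:
            if condition(value):
                return fn(value)
        return self.default(value)

# Hypothetical routing: long inputs get summarized, short ones answered directly.
router = BranchSketch(
    (lambda s: len(s) > 20, lambda s: f"summarize: {s[:20]}..."),
    default=lambda s: f"answer: {s}",
)
print(router.invoke("hi"))  # answer: hi
```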
9. LCEL Page 74
LCEL
21 March 2025 08:39
9. LCEL Page 75
Plan of Action
27 March 2025 10:59
Document loaders are components in LangChain used to load data from various sources into a
standardized format (usually as Document objects), which can then be used for chunking,
embedding, retrieval, and generation.
TextLoader is a simple and commonly used document loader in LangChain that reads plain text
(.txt) files and converts them into LangChain Document objects.
Use Case
• Ideal for loading chat logs, scraped text, transcripts, code snippets, or any plain text data
into a LangChain pipeline.
Limitation
PyPDFLoader is a document loader in LangChain used to load content from PDF files and
convert each page into a Document object.
Limitations:
• It uses the PyPDF library under the hood — not great with scanned PDFs or complex
layouts.
DirectoryLoader is a document loader that lets you load multiple documents from a directory
(folder) of files.
WebBaseLoader is a document loader in LangChain used to load and extract text content from
web pages (URLs).
It uses BeautifulSoup under the hood to parse HTML and extract visible text.
When to Use:
• For blogs, news articles, or public websites where the content is primarily text-based and
static.
Limitations:
• Doesn’t handle JavaScript-heavy pages well (use SeleniumURLLoader for that).
• Loads only static content (what's in the HTML, not what loads after the page renders).
Text Splitting is the process of breaking large chunks of text (like articles, PDFs, HTML pages, or
books) into smaller, manageable pieces (chunks) that an LLM can handle effectively.
Large Text → Chunks
• Downstream tasks - Text splitting improves nearly every LLM-powered task
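A length-based splitter fits in a few lines of plain Python (a conceptual sketch, not LangChain's splitter classes). Splitting on a fixed character count is what produces chunks that cut words mid-stream, which is why separator-aware splitters are usually preferred:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    step = chunk_size - chunk_overlap   # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

chunks = split_text("Space exploration has led to incredible discoveries.", 20, 5)
for c in chunks:
    print(repr(c))
```

The overlap means the tail of each chunk is repeated at the head of the next, so context is not lost at chunk boundaries.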
Original text:
Space exploration has led to incredible scientific discoveries. From landing on the Moon to exploring Mars, humanity continues to push the boundaries of what's possible beyond our planet. These missions have not only expanded our knowledge of the universe but have also contributed to advancements in technology here on Earth. Satellite communications, GPS, and even certain medical imaging techniques trace their roots back to innovations driven by space programs.

Resulting chunks (fixed character count - note the words cut mid-stream):
1. "Space exploration has led to incredible scientific discoveries. From landing on the Moon to explorin"
2. "g Mars, humanity continues to push the boundaries of what's possible beyond our planet. These missi"
3. "ons have not only expanded our knowledge of the universe but have also contributed to advancements in"
…
My name is Nitish
I am 35 years old
I live in Gurgaon
How are you
Farmers were working hard in the fields, preparing the soil and planting seeds for
the next season. The sun was bright, and the air smelled of earth and fresh grass.
The Indian Premier League (IPL) is the biggest cricket league in the world. People
all over the world watch the matches and cheer for their favourite teams.
Terrorism is a big danger to peace and safety. It causes harm to people and creates
fear in cities and villages. When such attacks happen, they leave behind pain and
sadness. To fight terrorism, we need strong laws, alert security forces, and support
from people who care about peace and safety.
Movie ID | Plot
M001 | In the present day, Farhan receives a call from Chatur, saying that Rancho is coming. Farhan is so excited that he fakes a heart attack to get off a flight and picks up Raju from his home (who forgets to wear his pants). Farhan and Raju meet Chatur at the water tower of their college ICE (Imperial College of Engineering), where Chatur informs them that he has found Rancho. Chatur taunts Farhan and Raju that now he has a mansion worth $3.5 million in the US, with a heated swimming pool, a maple wood floor living room and a Lamborghini for a car. Chatur reveals that Rancho is in Shimla…
M004 | In the peculiar town of Chanderi, India, the residents believe in the myth of an angry woman ghost, referred to as "Stree" (Hindi for woman) (Flora Saini), who stalks men during the Durga Puja festival. This is explained by the sudden disappearance of these men, leaving their clothes behind. She is said to stalk the men of the town, whispering their names and causing disappearances if they look back at her. The whole town protects itself from Stree during the 4 nights of Durga Puja by writing "OO Stree, Kal Aana" on their walls. Additionally, men are advised to avoid going out alone after 10 PM during the festival and to move in groups for safety. This practice reflects a societal parallel to the precautions typically advised to women for their own protection…
A vector store is a system designed to store and retrieve data represented as numerical
vectors.
Key Features
1. Storage – Ensures that vectors and their associated metadata are retained, whether in-
memory for quick lookups or on-disk for durability and large-scale use.
2. Similarity Search - Helps retrieve the vectors most similar to a query vector.
3. Indexing - Provide a data structure or method that enables fast similarity searches on
high-dimensional vectors (e.g., approximate nearest neighbor lookups).
4. CRUD Operations - Manage the lifecycle of data—adding new vectors, reading them,
updating existing entries, removing outdated vectors.
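The storage and similarity-search features above can be illustrated with a toy in-memory store in plain Python. Real stores use high-dimensional embeddings and approximate-nearest-neighbor indexes; the 2-D vectors and class name here are purely illustrative:

```python
import math

class TinyVectorStore:
    """Toy in-memory vector store: add vectors, search by cosine similarity."""

    def __init__(self):
        self.entries = []  # (vector, text, metadata) triples

    def add(self, vector, text, metadata=None):
        self.entries.append((vector, text, metadata or {}))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def similarity_search(self, query_vector, k=1):
        # Rank every stored vector by similarity to the query, highest first.
        scored = sorted(
            self.entries,
            key=lambda e: self._cosine(query_vector, e[0]),
            reverse=True,
        )
        return [text for _, text, _ in scored[:k]]

store = TinyVectorStore()
store.add([1.0, 0.0], "cricket news", {"source": "sports"})
store.add([0.0, 1.0], "stock update", {"source": "finance"})
print(store.similarity_search([0.9, 0.1], k=1))  # ['cricket news']
```

A production vector database adds the missing pieces: persistence, indexing for fast lookups at scale, CRUD on existing entries, and metadata filtering.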
Use-cases
1. Semantic Search
2. RAG
3. Recommender Systems
4. Image/Multimedia search
• Vector Store
○ Typically refers to a lightweight library or service that focuses on storing vectors
(embeddings) and performing similarity search.
○ May not include many traditional database features like transactions, rich query
languages, or role-based access control.
○ Examples: FAISS (where you store vectors and can query them by similarity, but you
handle persistence and scaling separately).
• Vector Database
○ A full-fledged database system designed to store and query vectors.
A vector database is effectively a vector store with extra database features (e.g.,
clustering, scaling, security, metadata filtering, and durability)
• Supported Stores: LangChain integrates with multiple vector stores (FAISS, Pinecone, Chroma,
Qdrant, Weaviate, etc.), giving you flexibility in scale, features, and deployment.
• Common Interface: A uniform Vector Store API lets you swap out one backend (e.g., FAISS) for
another (e.g., Pinecone) with minimal code changes.
• Metadata Handling: Most vector stores in LangChain allow you to attach metadata (e.g.,
timestamps, authors) to each document, enabling filter-based retrieval.
Chroma is a lightweight, open-source vector database that is especially friendly for local
development and small- to medium-scale production needs.
A Wikipedia Retriever is a retriever that queries the Wikipedia API to fetch relevant content for
a given query.
A Vector Store Retriever in LangChain is the most common type of retriever that lets you
search and fetch documents from a vector store based on semantic similarity using vector
embeddings.
• Indexing
• Retrieval
• Augmentation
• Generation
1. Document Ingestion - Load the source documents (PDFs, web pages, databases, etc.) into Document objects.
2. Text Chunking - Break large documents into small, semantically meaningful chunks.
3. Embedding Generation - Convert each chunk into a dense vector using an embedding model.
4. Storage in a Vector Store - Store the vectors along with the original chunk text + metadata in a vector database.
Augmentation - Augmentation refers to the step where the retrieved documents (chunks of
relevant context) are combined with the user’s query to form a new, enriched prompt for
the LLM.
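A minimal sketch of the augmentation step in plain Python (the helper name and prompt wording are illustrative, not a LangChain API):

```python
def augment(query: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context chunks with the user query into one enriched prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using ONLY the context below.\n"
        'If the context is insufficient, reply "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {query}"
    )

prompt = augment(
    "What is the IPL?",
    ["The Indian Premier League (IPL) is the biggest cricket league in the world."],
)
print(prompt)
```

The "use only the context" instruction is what grounds the answer in the retrieved documents rather than the model's parametric memory.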
1. UI based enhancements
2. Evaluation
a. Ragas
b. LangSmith
3. Indexing
a. Document Ingestion
b. Text Splitting
c. Vector Store
4. Retrieval
a. Pre-Retrieval
i. Query rewriting using LLM
ii. Multi-query generation
iii. Domain aware routing
b. During Retrieval
i. MMR
ii. Hybrid Retrieval
iii. Reranking
c. Post-Retrieval
i. Contextual Compression
5. Augmentation
a. Prompt Templating
b. Answer grounding
c. Context window optimization
6. Generation
a. Answer with Citation
b. Guard railing
7. System Design
a. Multimodal
b. Agentic
c. Memory based
A tool is just a Python function (or API) that is packaged in a way the LLM can understand
and call when needed.
A built-in tool is a tool that LangChain already provides for you — it's pre-built, production-ready, and requires minimal or no setup.
You don't have to write the function logic yourself — you just import and use it.
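The packaging idea can be sketched in plain Python. The `make_tool` helper below is hypothetical (it is not LangChain's API; LangChain's @tool decorator performs this kind of wrapping for you), but it shows what "packaged in a way the LLM can understand" means: a name, a description, and an argument signature alongside the callable:

```python
import inspect

def make_tool(fn):
    """Package a plain Python function with the metadata an LLM needs to call it."""
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "signature": str(inspect.signature(fn)),
        "run": fn,
    }

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the product."""
    return a * b

tool = make_tool(multiply)
print(tool["name"], tool["signature"])  # multiply (a: int, b: int) -> int
print(tool["run"](3, 4))                # 12
```

The name, description, and signature are what get shown to the model so it can decide when and how to call the tool; the `run` callable is what your code actually executes.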
All other tool types, like @tool and StructuredTool, are built on top of BaseTool.
A toolkit is just a collection (bundle) of related tools that serve a common purpose —
packaged together for convenience and reusability.
In LangChain:
ReAct is a design pattern used in AI agents that stands for Reasoning + Acting. It
allows a language model (LLM) to interleave internal reasoning (Thought) with
external actions (like tool use) in a structured, multi-step process.
Instead of generating an answer in one go, the model thinks step by step, deciding what it
needs to do next and optionally calling tools (APIs, calculators, web search, etc.) to help it.
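The loop can be sketched with a scripted stand-in for the model. In a real agent each Thought and Action comes from the LLM at runtime; the script and tool below are hypothetical and exist only to show the interleaving:

```python
# Scripted "LLM": alternates Thought/Action steps, then gives a final answer.
SCRIPT = [
    {"thought": "I need the product of 6 and 7.", "action": ("multiply", (6, 7))},
    {"thought": "I now know the answer.", "final": "6 x 7 = 42"},
]

TOOLS = {"multiply": lambda a, b: a * b}

def react_loop(script, tools):
    """Interleave reasoning (Thought) with tool calls (Action) until a final answer."""
    for step in script:
        print("Thought:", step["thought"])
        if "action" in step:
            name, args = step["action"]
            result = tools[name](*args)  # act: call the chosen tool
            print(f"Action: {name}{args} -> Observation: {result}")
        else:
            return step["final"]

print("Final:", react_loop(SCRIPT, TOOLS))
```

Each Observation would normally be appended to the model's context, so the next Thought can build on what the tool returned.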