Large Language Models
Introduction and Recent Advances
ELL881 · AIL821
Tanmoy Chakraborty
Associate Professor, IIT Delhi
https://tanmoychak.com/
Semester 1, 2024-2025
Course Instructors
• Tanmoy Chakraborty, IIT Delhi
• Yatin Nandwani, IBM Research
• Dinesh Raghu, IBM Research
• Sourish Dasgupta, DA-IICT
• Gaurav Pandey, IBM Research
• Manish Gupta, Microsoft

Course TA
• Anwoy Chatterjee, PhD student, IIT Delhi
Course Directives
• Slot H (Mon, Wed: 11-12; Thu: 12-13)
• Website: https://lcs2-iitd.github.io/ELL881-AIL821-2401/
• YouTube: https://www.youtube.com/@lcs2575
• Room: II-301
• Audit: B- (threshold to pass the course)
• Grading scheme: TBD

Marks distribution (tentative)
• Minor: 15%
• Major: 25%
• Quiz (2): 10%
• Assignment (1): 20%
• Mini-project: 30% (group-wise)
Course Project
• Some problem statements and datasets will be floated soon*
• Each group should consist of 1-2 students
• Best Project Award
• You need to: develop models, evaluate your models, prepare a presentation, and write a tech report
• Students are encouraged to publish their projects in good conferences/journals

Deliverables:
1. Final project report (15%): 8 pages, ACL format; posting to arXiv is encouraged
2. Repository of dataset and source code (5%)
3. Final project presentation (10%)

* You are welcome to propose a new idea if you find it fascinating; the instructor will decide whether it qualifies as a course project.
Do Not Plagiarize!
Academic integrity is of utmost importance. If anyone is found cheating/plagiarizing, it will
result in a negative penalty (and possibly more: an F grade or even referral to the Disciplinary Committee).

Collaborate. But do NOT cheat.
• Assignments are to be done individually.
• Do not share any part of your code.
• Do not copy any part of your report from online resources or published works.
• If you reuse others' work, always cite it.
• If you discuss the assignment with others, or discuss the project outside your group, mention their names in the report.
• Do not use GenAI tools (like ChatGPT).

We will check for pairwise plagiarism across all submitted assignment code files.
We will also check the probability of any submitted content being AI-generated.
Project reports will be checked for plagiarism against all web resources.
Course Content
• This is an advanced graduate course, and we will be teaching and discussing state-of-the-art papers on large language models.
• The course is mostly presentation- and discussion-based, and all students are expected to attend class regularly and participate in the discussions.
Course Content

Basics
• Introduction
• Intro to NLP
• Intro to Language Models (LMs)
• Word Embeddings (Word2Vec, GloVe)
• Neural LMs (CNN, RNN, Seq2Seq, Attention)

Architecture
• Intro to Transformer
• Decoder-only LM, Prefix LM, Decoding strategies
• Encoder-only LM, Encoder-decoder LM
• Advanced Attention
• Mixture of Experts

Learnability
• Scaling laws
• Instruction fine-tuning
• In-context learning
• Alignment
• Distillation and PEFT
• Efficient/Constrained LM inference

User Acceptability
• RAG
• Multilingual LMs
• Tool-augmented LMs
• Reasoning
• Vision Language Models
• Handling long context
• Model editing

Ethics and Misc.
• Bias, toxicity and hallucination
• Interpretability
• Beyond Transformer: State Space Models
Pre-Requisites
• Excitement about language!
• Willingness to learn

Mandatory
• Data Structures & Algorithms
• Machine Learning
• Python programming

Desirable
• NLP
• Deep learning

This course will NOT cover:
• Details of NLP (ELL884: https://sites.google.com/view/ell881), Machine Learning, and Deep Learning
• Coding practice
• Generative models for modalities other than text
Reading and Reference Materials
• Books (optional reading)
   • Speech and Language Processing, Dan Jurafsky and James H. Martin: https://web.stanford.edu/~jurafsky/slp3/
   • Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
   • Natural Language Processing, Jacob Eisenstein: https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
   • A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg: http://u.cs.biu.ac.il/~yogo/nnlp.pdf
• Journals
   • Computational Linguistics, Natural Language Engineering, TACL, JMLR, TMLR, etc.
• Conferences
   • ACL, EMNLP, NAACL, COLING, AAAI, IJCNLP, ICML, NeurIPS, ICLR, WWW, KDD, SIGIR, etc.
Research Papers Repository
• https://aclanthology.org/
• https://arxiv.org/list/cs.CL/recent
Acknowledgements (Non-exhaustive List)
• Advanced NLP, Graham Neubig http://www.phontron.com/class/anlp2022/
• Advanced NLP, Mohit Iyyer https://people.cs.umass.edu/~miyyer/cs685/
• NLP with Deep Learning, Chris Manning, http://web.stanford.edu/class/cs224n/
• Understanding Large Language Models, Danqi Chen https://www.cs.princeton.edu/courses/archive/fall22/cos597G/
• Natural Language Processing, Greg Durrett https://www.cs.utexas.edu/~gdurrett/courses/online-course/materials.html
• Large Language Models: https://stanford-cs324.github.io/winter2022/
• Natural Language Processing at UMBC, https://laramartin.net/NLP-class/
• Computational Ethics in NLP, https://demo.clab.cs.cmu.edu/ethical_nlp/
• Self-supervised models, CS 601.471/671: Self-supervised Models (jhu.edu)
• WING.NUS Large Language Models, https://wing-nus.github.io/cs6101/
• And many more…
What is a Language Model (LM)?
A language model gives a probability distribution over sequences of tokens.

Example, with vocabulary V = {arrived, delhi, have, is, monsoon, rains, the}:
• P(the monsoon rains have arrived) = 0.2
• P(monsoon the have rains arrived) = 0.001
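To make this concrete: by the chain rule, P(x1, …, xn) = ∏i P(xi | x<i), so a causal LM can score any sequence token by token. A minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint (any causal LM would do):

```python
# Minimal sketch: score a sentence with a causal LM via the chain rule.
# Assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_log_prob(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits          # (1, seq_len, vocab_size)
    # log P(x_i | x_<i): the logits at position i-1 predict token i
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    return token_lp.sum().item()

print(sequence_log_prob("the monsoon rains have arrived"))   # higher
print(sequence_log_prob("monsoon the have rains arrived"))   # much lower
```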
LMs can ‘Generate’ Text!
Given the input ‘the monsoon rains have’, an LM can calculate P(xi | the monsoon rains have), ∀ xi ∈ V.
For generation, the next token is sampled from this probability distribution (see the sketch below).
Auto-regressive LMs calculate this distribution efficiently, e.g., using ‘deep’ neural networks.
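A minimal sketch of this sampling step, again assuming transformers and the gpt2 checkpoint: take the model's scores at the last position, turn them into a distribution with softmax, and sample from it rather than taking the argmax:

```python
# Minimal sketch: sample the next token from P(x_i | context) with GPT-2
# (an assumption for illustration; any Hugging Face causal LM works the same way).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tokenizer("the monsoon rains have", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]            # scores for the next token
probs = torch.softmax(logits, dim=-1)            # P(x_i | the monsoon rains have) over V
next_id = torch.multinomial(probs, num_samples=1)  # sample rather than take argmax
print(tokenizer.decode(next_id))
```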
‘Large’ Language Models
The ‘large’ refers both to the model's size (number of parameters) and to the massive size of the training dataset.
Model sizes have increased by roughly 5000x over just the last four years!
Other recent models: PaLM (540B), OPT (175B), BLOOM (176B), Gemini-Ultra (1.56T), GPT-4 (1.76T).
Disclaimer: for API-based models like GPT-4/Gemini-Ultra, the number of parameters has not been announced officially; these are rumored numbers from the web.
Image source: https://hellofuture.orange.com/en/the-gpt-3-language-model-revolution-or-evolution/
LLMs in AI Landscape
Image source: https://www.manning.com/books/build-a-large-language-model-from-scratch
Evolution of (L)LMs
We will discuss many of them in this course!
Image source: https://synthedia.substack.com/p/a-timeline-of-large-language-model
Post-Transformers Era
The LLM Race
Google Designed Transformers: But Could It Take Advantage?
• BERT marked the beginning of the use of the Transformer as a language representation model.
• BERT achieved SOTA on 11 NLP tasks.
• Compressed variants soon followed: DistilBERT, TinyBERT, MobileBERT.
However, Someone Was Waiting for the Right Opportunity!
Guess who?
OpenAI Started Pushing the Frontier
• GPT-1: use of a decoder-only architecture
• The idea of generative pre-training over a large corpus
The Beginning of Scale
• GPT-1 (117M) → GPT-2 (1.5B): a 13x increase in the number of parameters
• Minimal architectural changes (some LayerNorms added, modified weight initialization)
• Increase in context length: GPT-1 (512 tokens) → GPT-2 (1024 tokens)
• Performance boosts across tasks
What Was Google Developing in Parallel?
• T5: a similar broad goal of converting all text-based language problems into a text-to-text format
• Used an encoder-decoder architecture
• Its pre-training strategy differs from GPT's and is more similar to BERT's
Was It Only Google vs OpenAI? Where Did Meta Stand?
RoBERTa:
• A replication study of BERT pre-training
• Measured the impact of many key hyperparameters and of training data size
• Found that BERT was significantly undertrained and can match or exceed the performance of every model published after it

XLM:
• Proposed methods to learn cross-lingual language models (XLMs)
• Obtained SOTA on cross-lingual classification and on unsupervised and supervised machine translation
OpenAI Continues to Scale
• GPT-3: 175B parameters!
• OpenAI stops open-sourcing!
Google Starts Scaling Too (But Is It Late?)
• PaLM: 540B parameters!
• Google follows OpenAI in stopping open-sourcing!
• It's now the “LLM Race”.
2021-2022: A Flurry of LLMs
• Megatron-Turing NLG
• Codex
Meta Promotes Open-Sourcing!
• OPT: a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters
• Open-sourced!
The ChatGPT Moment
                 November 30, 2022
2023: The Year of Rapid Pace
• Feb 2023: Google releases Bard
• Feb 2023: Meta releases its LLaMA family of open-source models
• March 2023: Anthropic, a start-up founded in 2021 by ex-OpenAI researchers, releases Claude
• March 2023: OpenAI releases GPT-4
• Sept 2023: Mistral AI releases its Mistral-7B model
• Nov 2023: xAI releases Grok
• Dec 2023: Google releases Gemini
And now, in 2024, we are seeing even more rapid advancements!
Why Does This Course Exist?
Why do we need a separate course on LLMs? What changes with the scale of LMs?

Emergence
Although the technical machinery is almost the same, ‘just scaling up’ these models results in new emergent behaviors, which lead to significantly different capabilities and societal impacts.
LLMs show emergent capabilities not observed previously in ‘small’ LMs.
• In-context learning: a pre-trained language model can be guided with prompts alone to perform different tasks, without separate task-specific fine-tuning (illustrated in the sketch below).
   • In-context learning is an example of emergent behavior.

LLMs are widely adopted in the real world.
• Research: LLMs have transformed the NLP research world, achieving state-of-the-art performance across a wide range of tasks such as sentiment classification, question answering, summarization, and machine translation.
• Industry: a very incomplete list of high-profile large language models used in production systems:
   • Google Search (BERT)
   • Facebook content moderation (XLM)
   • Microsoft's Azure OpenAI Service (GPT-3/3.5/4)
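As a toy illustration of what "guided with prompts alone" means, the sketch below packs two labeled demonstrations into the prompt and lets the model complete the third. The prompt format and the gpt2 stand-in are assumptions for illustration; a model this small follows such prompts only weakly, which is exactly the emergence point:

```python
# Illustrative sketch of in-context learning: the task is specified purely in
# the prompt (a few demonstrations), with no fine-tuning.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I wasted two hours of my life. Sentiment: negative\n"
    "Review: The plot kept me hooked till the end. Sentiment:"
)
out = generator(prompt, max_new_tokens=2, do_sample=False)
print(out[0]["generated_text"])
```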
With tremendous capabilities, LLM usage also carries various risks.
• Reliability & disinformation: LLMs often hallucinate, i.e., generate responses that seem correct but are not factually correct.
   • A significant challenge for high-stakes applications like healthcare
• Social bias: most LLMs show performance disparities across demographic groups, and their predictions can enforce stereotypes (probed in the sketch below).
   • P(He is a doctor) > P(She is a doctor)
   • Training data contains inherent bias
• Toxicity: LLMs can generate toxic/hateful content.
   • They are trained on a huge amount of Internet data (e.g., Reddit), which inevitably contains offensive content
   • A challenge for applications such as writing assistants or chatbots
• Security: LLMs are trained on a scrape of the public Internet, and anyone can put up a website that can enter the training data.
   • An attacker can perform a data poisoning attack.

Content credits: https://stanford-cs324.github.io/winter2022/
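The bias bullet above can be probed directly: compare the probability the model assigns to the same continuation under two contexts. A toy probe assuming gpt2 as a stand-in; an illustration, not a rigorous bias audit:

```python
# Toy probe of P(doctor | "He is a") vs. P(doctor | "She is a") under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def next_word_prob(context: str, word: str) -> float:
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
    word_id = tokenizer(" " + word).input_ids[0]  # first sub-token of " doctor"
    return probs[word_id].item()

print("P(doctor | He is a)  =", next_word_prob("He is a", "doctor"))
print("P(doctor | She is a) =", next_word_prob("She is a", "doctor"))
```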
We Will Cover Almost All of These in 5 Modules
• Module-1: Basics
   • A refresher on the basics of NLP required to understand and appreciate LLMs
   • How did we end up in neural NLP? We will discuss the transition and the foundations of neural NLP.
   • The basics of language modelling
   • Initial neural LMs
   • Topics: Intro to NLP; Intro to Language Models (LMs); Word Embeddings (Word2Vec, GloVe); Neural LMs (CNN, RNN, Seq2Seq, Attention)
• Module-2: Architecture
   • Workings of the vanilla Transformer
   • Different Transformer variants: how do their training strategies differ? How are masked LMs (like BERT) different from auto-regressive LMs (like GPT)?
   • Response generation (decoding) strategies
   • What makes modern open-source LLMs like LLaMA and Mistral more effective than the vanilla Transformer? An in-depth exploration of advanced attention mechanisms (the core operation is sketched below).
   • Mixture-of-Experts: an effective architectural choice in modern LLMs
   • Topics: Intro to Transformer; Decoder-only LM, Prefix LM, Decoding strategies; Encoder-only LM, Encoder-decoder LM; Advanced Attention; Mixture of Experts
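Since the whole module builds on one operation, here is a minimal sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, with the causal mask used by decoder-only LMs (an illustrative reimplementation, not the course's reference code):

```python
# Minimal sketch of scaled dot-product attention, the Transformer's core operation.
import torch

def attention(Q, K, V, causal=False):
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d_k**0.5          # (..., L_q, L_k)
    if causal:  # decoder-only LMs mask out future positions
        L_q, L_k = scores.shape[-2:]
        mask = torch.triu(torch.ones(L_q, L_k, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

# Toy usage: self-attention over 5 positions of dimension 16
x = torch.randn(5, 16)
out = attention(x, x, x, causal=True)
print(out.shape)  # torch.Size([5, 16])
```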
• Module-3: Learnability
   • Scaling laws: how does performance vary with the scale of LMs? When does ‘emergence’ kick in? (A toy power-law illustration follows this list.)
   • What makes modern LLMs so good at following user instructions?
   • What is in-context learning? What are its various facets?
   • How are LLMs made to generate responses preferred by humans? Does this remove toxicity from responses?
   • Efficiency is crucial in production systems: how are smaller LMs made capable using pre-trained LLMs? How are LLMs efficiently fine-tuned? How is the response generation latency of LLMs improved?
   • Topics: Scaling laws; Instruction fine-tuning; In-context learning; Alignment; Distillation and PEFT; Efficient/Constrained LM inference
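As a taste of the scaling-laws topic, the snippet below evaluates a Kaplan-style power law L(N) = (N_c / N)^α. The constants are the rough values reported by Kaplan et al. (2020) and are used here purely for illustration:

```python
# Illustrative Kaplan-style scaling law: test loss falls as a power law in the
# parameter count N. Constants are approximate published values, used as an
# assumption for illustration, not course-verified numbers.
N_c, alpha = 8.8e13, 0.076

def loss(n_params: float) -> float:
    return (N_c / n_params) ** alpha

for n in [1.17e8, 1.5e9, 1.75e11]:   # roughly GPT-1, GPT-2, GPT-3 sizes
    print(f"N = {n:.2e}  ->  predicted loss ~ {loss(n):.2f}")
```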
• Module-4: User Acceptability
   • How can we make LLMs aware of certain relevant facts during generation? (The retrieve-then-prompt idea is sketched below.)
   • Can LLMs operate in multiple languages?
   • Can LLMs reason?
   • Can the use of external tools help LLMs perform better?
   • Can LLMs handle multiple modalities, like images? What changes are required in their architecture to do so?
   • How long an input can LLMs handle? How can we increase their context length?
   • Can we edit model components to mitigate certain issues in LLMs?
   • Topics: Retrieval-Augmented Generation (RAG); Multilingual LMs; Tool-augmented LMs; Reasoning; Vision Language Models; Handling long context; Model editing
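The first question is the RAG idea in a nutshell: retrieve relevant passages and prepend them to the prompt. A self-contained toy sketch; the hash-based "embeddings" are a stand-in for a real sentence encoder:

```python
# Toy sketch of retrieval-augmented generation: embed documents and the query,
# retrieve the closest documents by similarity, and prepend them to the prompt.
import numpy as np

docs = [
    "The monsoon typically reaches Delhi by late June.",
    "Transformers were introduced in the 2017 paper 'Attention Is All You Need'.",
    "GPT-2 has 1.5 billion parameters.",
]

def embed(text: str) -> np.ndarray:           # toy bag-of-hashed-words encoder
    v = np.zeros(64)
    for w in text.lower().split():
        v[hash(w) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query: str, k: int = 1):
    q = embed(query)
    sims = [q @ embed(d) for d in docs]
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "How many parameters does GPT-2 have?"
context = " ".join(retrieve(query))
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"  # fed to any LLM
print(prompt)
```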
• Module-5: Ethics and Miscellaneous
   • A discussion of the ethical issues and risks of LLM usage
   • How are different emergent abilities in LLMs facilitated? A peek into the internal workings of LLMs to understand the source of their capabilities.
   • Can LMs based on alternate architectures match Transformer-based LLMs? State-Space Models (SSMs)
   • Topics: Bias, toxicity and hallucination; Interpretability; Beyond Transformer: State Space Models
Suggestions (For Effective Learning)
• To understand the concepts clearly, experiment with the models (Hugging Face makes life easier; a starter snippet follows).
• Smaller models (like GPT-2) can be run on Google Colab / Kaggle.
   • Even 7B models can be run with proper quantization.

Rule of thumb: never believe in any hypothesis until your experiments verify it!

Always get your hands dirty!
LLM research is all about implementing and experimenting with your ideas.
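A minimal starter along these lines, assuming only the transformers library (the quantization pointer at the end additionally needs bitsandbytes and a GPU):

```python
# Minimal "get your hands dirty" starter: generate text with GPT-2 on Colab/Kaggle.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("The monsoon rains have", max_new_tokens=20, do_sample=True)
print(out[0]["generated_text"])

# For ~7B models, 4-bit quantization keeps memory within free-GPU limits,
# e.g. via transformers' BitsAndBytesConfig(load_in_4bit=True)
# (requires the `bitsandbytes` package and a CUDA GPU).
```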