Large Language Models
Introduction and Recent Advances
ELL881 · AIL821
Tanmoy Chakraborty
Associate Professor, IIT Delhi
https://tanmoychak.com/
Semester 1, 2024-2025
Course Instructors
• Tanmoy Chakraborty, IIT Delhi
• Yatin Nandwani, IBM Research
• Dinesh Raghu, IBM Research
• Sourish Dasgupta, DA-IICT
• Gaurav Pandey, IBM Research
• Manish Gupta, Microsoft

Course TA
• Anwoy Chatterjee, PhD student, IIT Delhi
Course Directives
• Slot H (Mon, Wed: 11-12; Thu: 12-13)
• Website: https://lcs2-iitd.github.io/ELL881-AIL821-2401/
• YouTube: https://www.youtube.com/@lcs2575
• Room: II-301
• Audit: B- (threshold to pass the course)
• Grading scheme: TBD

Marks distribution (tentative)
• Minor: 15%
• Major: 25%
• Quiz (2): 10%
• Assignment (1): 20%
• Mini-project: 30% (group-wise)
Course Project
• Some problem statements and datasets will be floated soon*
• Each group should consist of 1-2 students
• Best Project Award
• You need to: develop models, evaluate your models, prepare a presentation, and write a tech report
• Students are encouraged to publish their projects in good conferences/journals

Deliverables:
1. Final project report (15%): 8 pages, ACL format; posting to arXiv is encouraged
2. Repository of dataset and source code (5%)
3. Final project presentation (10%)

* You are welcome to propose a new idea if you find it fascinating; the instructor will decide whether it qualifies as a course project.
Do Not Plagiarize!
Academic integrity is of utmost importance. If anyone is found cheating/plagiarizing, it will
result in a negative penalty (and possibly more: an F grade or even referral to the Disciplinary Committee).

Collaborate. But do NOT cheat.
• Assignments are to be done individually.
• Do not share any part of your code.
• Do not copy any part of your report from online resources or published works.
• If you reuse others' work, always cite it.
• If you discuss the assignment with others, or discuss the project outside your group, mention their names in the report.
• Do not use GenAI tools (like ChatGPT).

We will check for pairwise plagiarism across all submitted assignment code files.
We will also check the probability of any submitted content being AI-generated.
Project reports will be checked for plagiarism against all web resources.
Course Content
• This is an advanced graduate course, and we will be teaching and discussing state-of-the-art papers on large language models.
• The course is mostly presentation- and discussion-based, and all students are expected to attend class regularly and participate in the discussions.
Course Content

Basics
• Introduction
• Intro to NLP
• Intro to Language Models (LMs)
• Word Embeddings (Word2Vec, GloVe)
• Neural LMs (CNN, RNN, Seq2Seq, Attention)

Architecture
• Intro to Transformer
• Decoder-only LM, Prefix LM, Decoding strategies
• Encoder-only LM, Encoder-decoder LM
• Advanced Attention
• Mixture of Experts

Learnability
• Scaling laws
• Instruction fine-tuning
• In-context learning
• Alignment
• Distillation and PEFT
• Efficient/Constrained LM inference

User Acceptability
• RAG
• Multilingual LMs
• Tool-augmented LMs
• Reasoning
• Vision Language Models
• Handling long context
• Model editing

Ethics and Misc.
• Bias, toxicity and hallucination
• Interpretability
• Beyond Transformer: State Space Models
Pre-Requisites
• Excitement about language!
• Willingness to learn

Mandatory
• Data Structures & Algorithms
• Machine Learning
• Python programming

Desirable
• NLP
• Deep learning

This course will NOT cover:
• Details of NLP (ELL884: https://sites.google.com/view/ell881), Machine Learning, and Deep Learning
• Coding practice
• Generative models for modalities other than text
Reading and Reference Materials
• Books (optional reading)
   • Speech and Language Processing, Dan Jurafsky and James H. Martin: https://web.stanford.edu/~jurafsky/slp3/
   • Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
   • Natural Language Processing, Jacob Eisenstein: https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
   • A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg: http://u.cs.biu.ac.il/~yogo/nnlp.pdf
• Journals
   • Computational Linguistics, Natural Language Engineering, TACL, JMLR, TMLR, etc.
• Conferences
   • ACL, EMNLP, NAACL, COLING, AAAI, IJCNLP, ICML, NeurIPS, ICLR, WWW, KDD, SIGIR, etc.
Research Papers Repository
• https://aclanthology.org/
• https://arxiv.org/list/cs.CL/recent
Acknowledgements (Non-exhaustive List)
• Advanced NLP, Graham Neubig http://www.phontron.com/class/anlp2022/
• Advanced NLP, Mohit Iyyer https://people.cs.umass.edu/~miyyer/cs685/
• NLP with Deep Learning, Chris Manning, http://web.stanford.edu/class/cs224n/
• Understanding Large Language Models, Danqi Chen https://www.cs.princeton.edu/courses/archive/fall22/cos597G/
• Natural Language Processing, Greg Durrett https://www.cs.utexas.edu/~gdurrett/courses/online-course/materials.html
• Large Language Models: https://stanford-cs324.github.io/winter2022/
• Natural Language Processing at UMBC, https://laramartin.net/NLP-class/
• Computational Ethics in NLP, https://demo.clab.cs.cmu.edu/ethical_nlp/
• Self-supervised models, CS 601.471/671: Self-supervised Models (jhu.edu)
• WING.NUS Large Language Models, https://wing-nus.github.io/cs6101/
• And many more…
What is a Language Model (LM)?
A language model gives a probability distribution over sequences of tokens.

Example, with vocabulary V = {arrived, delhi, have, is, monsoon, rains, the}:
• P(the monsoon rains have arrived) = 0.2
• P(monsoon the have rains arrived) = 0.001
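To make this concrete: by the chain rule, P(x1, …, xn) = ∏i P(xi | x<i), so a causal LM can score any sequence token by token. A minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint (any causal LM would do):

```python
# Minimal sketch: score a sentence with a causal LM via the chain rule.
# Assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_log_prob(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits          # (1, seq_len, vocab_size)
    # log P(x_i | x_<i): the logits at position i-1 predict token i
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    return token_lp.sum().item()

print(sequence_log_prob("the monsoon rains have arrived"))   # higher
print(sequence_log_prob("monsoon the have rains arrived"))   # much lower
```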
LMs can ‘Generate’ Text!
Given the input ‘the monsoon rains have’, an LM can calculate P(xi | the monsoon rains have), ∀ xi ∈ V.
For generation, the next token is sampled from this probability distribution (see the sketch below).
Auto-regressive LMs calculate this distribution efficiently, e.g., using ‘deep’ neural networks.
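A minimal sketch of this sampling step, again assuming transformers and the gpt2 checkpoint: take the model's scores at the last position, turn them into a distribution with softmax, and sample from it rather than taking the argmax:

```python
# Minimal sketch: sample the next token from P(x_i | context) with GPT-2
# (an assumption for illustration; any Hugging Face causal LM works the same way).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tokenizer("the monsoon rains have", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]            # scores for the next token
probs = torch.softmax(logits, dim=-1)            # P(x_i | the monsoon rains have) over V
next_id = torch.multinomial(probs, num_samples=1)  # sample rather than take argmax
print(tokenizer.decode(next_id))
```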
‘Large’ Language Models
The ‘large’ refers both to the model's size (number of parameters) and to the massive size of the training dataset.
Model sizes have increased by roughly 5000x over just the last four years!
Other recent models: PaLM (540B), OPT (175B), BLOOM (176B), Gemini-Ultra (1.56T), GPT-4 (1.76T).
Disclaimer: for API-based models like GPT-4/Gemini-Ultra, the number of parameters has not been announced officially; these are rumored numbers from the web.
Image source: https://hellofuture.orange.com/en/the-gpt-3-language-model-revolution-or-evolution/
LLMs in AI Landscape
Image source: https://www.manning.com/books/build-a-large-language-model-from-scratch
Evolution of (L)LMs
We will discuss many of them in this course!
Image source: https://synthedia.substack.com/p/a-timeline-of-large-language-model
Post-Transformers Era
The LLM Race
Google Designed Transformers: But Could It Take Advantage?
• BERT marked the beginning of the use of the Transformer as a language representation model.
• BERT achieved SOTA on 11 NLP tasks.
• Compressed variants soon followed: DistilBERT, TinyBERT, MobileBERT.
However, Someone Was Waiting for the Right Opportunity!
Guess who?
OpenAI Started Pushing the Frontier
• GPT-1: use of a decoder-only architecture
• The idea of generative pre-training over a large corpus
The Beginning of Scale
• GPT-1 (117M) → GPT-2 (1.5B): a 13x increase in the number of parameters
• Minimal architectural changes (some LayerNorms added, modified weight initialization)
• Increase in context length: GPT-1 (512 tokens) → GPT-2 (1024 tokens)
• Performance boosts across tasks
What Was Google Developing in Parallel?
• T5: a similar broad goal of converting all text-based language problems into a text-to-text format
• Used an encoder-decoder architecture
• Its pre-training strategy differs from GPT's and is more similar to BERT's
Was It Only Google vs OpenAI? Where Did Meta Stand?
RoBERTa:
• A replication study of BERT pre-training
• Measured the impact of many key hyperparameters and of training data size
• Found that BERT was significantly undertrained and can match or exceed the performance of every model published after it

XLM:
• Proposed methods to learn cross-lingual language models (XLMs)
• Obtained SOTA on cross-lingual classification and on unsupervised and supervised machine translation
OpenAI Continues to Scale
• GPT-3: 175B parameters!
• OpenAI stops open-sourcing!
Google Starts Scaling Too (But Is It Late?)
• PaLM: 540B parameters!
• Google follows OpenAI in stopping open-sourcing!
• It's now the “LLM Race”.
2021-2022: A Flurry of LLMs
• Megatron-Turing NLG
• Codex
Meta Promotes Open-Sourcing!
• OPT: a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters
• Open-sourced!
The ChatGPT Moment
                 November 30, 2022
2023: The Year of Rapid Pace
• Feb 2023: Google releases Bard
• Feb 2023: Meta releases its LLaMA family of open-source models
• March 2023: Anthropic, a start-up founded in 2021 by ex-OpenAI researchers, releases Claude
• March 2023: OpenAI releases GPT-4
• Sept 2023: Mistral AI releases its Mistral-7B model
• Nov 2023: xAI releases Grok
• Dec 2023: Google releases Gemini
And now, in 2024, we are seeing even more rapid advancements!
Why Does This Course Exist?
Why do we need a separate course on LLMs? What changes with the scale of LMs?

Emergence
Although the technical machinery is almost the same, ‘just scaling up’ these models results in new emergent behaviors, which lead to significantly different capabilities and societal impacts.
LLMs show emergent capabilities not observed previously in ‘small’ LMs.
• In-context learning: a pre-trained language model can be guided with prompts alone to perform different tasks, without separate task-specific fine-tuning (illustrated in the sketch below).
   • In-context learning is an example of emergent behavior.

LLMs are widely adopted in the real world.
• Research: LLMs have transformed the NLP research world, achieving state-of-the-art performance across a wide range of tasks such as sentiment classification, question answering, summarization, and machine translation.
• Industry: a very incomplete list of high-profile large language models used in production systems:
   • Google Search (BERT)
   • Facebook content moderation (XLM)
   • Microsoft's Azure OpenAI Service (GPT-3/3.5/4)
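As a toy illustration of what "guided with prompts alone" means, the sketch below packs two labeled demonstrations into the prompt and lets the model complete the third. The prompt format and the gpt2 stand-in are assumptions for illustration; a model this small follows such prompts only weakly, which is exactly the emergence point:

```python
# Illustrative sketch of in-context learning: the task is specified purely in
# the prompt (a few demonstrations), with no fine-tuning.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I wasted two hours of my life. Sentiment: negative\n"
    "Review: The plot kept me hooked till the end. Sentiment:"
)
out = generator(prompt, max_new_tokens=2, do_sample=False)
print(out[0]["generated_text"])
```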
With tremendous capabilities, LLM usage also carries various risks.
• Reliability & disinformation: LLMs often hallucinate, i.e., generate responses that seem correct but are not factually correct.
   • A significant challenge for high-stakes applications like healthcare
• Social bias: most LLMs show performance disparities across demographic groups, and their predictions can enforce stereotypes (probed in the sketch below).
   • P(He is a doctor) > P(She is a doctor)
   • Training data contains inherent bias
• Toxicity: LLMs can generate toxic/hateful content.
   • They are trained on a huge amount of Internet data (e.g., Reddit), which inevitably contains offensive content
   • A challenge for applications such as writing assistants or chatbots
• Security: LLMs are trained on a scrape of the public Internet, and anyone can put up a website that can enter the training data.
   • An attacker can perform a data poisoning attack.

Content credits: https://stanford-cs324.github.io/winter2022/
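The bias bullet above can be probed directly: compare the probability the model assigns to the same continuation under two contexts. A toy probe assuming gpt2 as a stand-in; an illustration, not a rigorous bias audit:

```python
# Toy probe of P(doctor | "He is a") vs. P(doctor | "She is a") under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def next_word_prob(context: str, word: str) -> float:
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
    word_id = tokenizer(" " + word).input_ids[0]  # first sub-token of " doctor"
    return probs[word_id].item()

print("P(doctor | He is a)  =", next_word_prob("He is a", "doctor"))
print("P(doctor | She is a) =", next_word_prob("She is a", "doctor"))
```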
We Will Cover Almost All of These in 5 Modules
• Module-1: Basics
   • A refresher on the basics of NLP required to understand and appreciate LLMs
   • How did we end up in neural NLP? We will discuss the transition and the foundations of neural NLP.
   • The basics of language modelling
   • Initial neural LMs
   • Topics: Intro to NLP; Intro to Language Models (LMs); Word Embeddings (Word2Vec, GloVe); Neural LMs (CNN, RNN, Seq2Seq, Attention)
• Module-2: Architecture
   • Workings of the vanilla Transformer
   • Different Transformer variants: how do their training strategies differ? How are masked LMs (like BERT) different from auto-regressive LMs (like GPT)?
   • Response generation (decoding) strategies
   • What makes modern open-source LLMs like LLaMA and Mistral more effective than the vanilla Transformer? An in-depth exploration of advanced attention mechanisms (the core operation is sketched below).
   • Mixture-of-Experts: an effective architectural choice in modern LLMs
   • Topics: Intro to Transformer; Decoder-only LM, Prefix LM, Decoding strategies; Encoder-only LM, Encoder-decoder LM; Advanced Attention; Mixture of Experts
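Since the whole module builds on one operation, here is a minimal sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, with the causal mask used by decoder-only LMs (an illustrative reimplementation, not the course's reference code):

```python
# Minimal sketch of scaled dot-product attention, the Transformer's core operation.
import torch

def attention(Q, K, V, causal=False):
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d_k**0.5          # (..., L_q, L_k)
    if causal:  # decoder-only LMs mask out future positions
        L_q, L_k = scores.shape[-2:]
        mask = torch.triu(torch.ones(L_q, L_k, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

# Toy usage: self-attention over 5 positions of dimension 16
x = torch.randn(5, 16)
out = attention(x, x, x, causal=True)
print(out.shape)  # torch.Size([5, 16])
```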
• Module-3: Learnability
   • Scaling laws: how does performance vary with the scale of LMs? When does ‘emergence’ kick in? (A toy power-law illustration follows this list.)
   • What makes modern LLMs so good at following user instructions?
   • What is in-context learning? What are its various facets?
   • How are LLMs made to generate responses preferred by humans? Does this remove toxicity from responses?
   • Efficiency is crucial in production systems: how are smaller LMs made capable using pre-trained LLMs? How are LLMs efficiently fine-tuned? How is the response generation latency of LLMs improved?
   • Topics: Scaling laws; Instruction fine-tuning; In-context learning; Alignment; Distillation and PEFT; Efficient/Constrained LM inference
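As a taste of the scaling-laws topic, the snippet below evaluates a Kaplan-style power law L(N) = (N_c / N)^α. The constants are the rough values reported by Kaplan et al. (2020) and are used here purely for illustration:

```python
# Illustrative Kaplan-style scaling law: test loss falls as a power law in the
# parameter count N. Constants are approximate published values, used as an
# assumption for illustration, not course-verified numbers.
N_c, alpha = 8.8e13, 0.076

def loss(n_params: float) -> float:
    return (N_c / n_params) ** alpha

for n in [1.17e8, 1.5e9, 1.75e11]:   # roughly GPT-1, GPT-2, GPT-3 sizes
    print(f"N = {n:.2e}  ->  predicted loss ~ {loss(n):.2f}")
```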
• Module-4: User Acceptability
   • How can we make LLMs aware of certain relevant facts during generation? (The retrieve-then-prompt idea is sketched below.)
   • Can LLMs operate in multiple languages?
   • Can LLMs reason?
   • Can the use of external tools help LLMs perform better?
   • Can LLMs handle multiple modalities, like images? What changes are required in their architecture to do so?
   • How long an input can LLMs handle? How can we increase their context length?
   • Can we edit model components to mitigate certain issues in LLMs?
   • Topics: Retrieval-Augmented Generation (RAG); Multilingual LMs; Tool-augmented LMs; Reasoning; Vision Language Models; Handling long context; Model editing
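The first question is the RAG idea in a nutshell: retrieve relevant passages and prepend them to the prompt. A self-contained toy sketch; the hash-based "embeddings" are a stand-in for a real sentence encoder:

```python
# Toy sketch of retrieval-augmented generation: embed documents and the query,
# retrieve the closest documents by similarity, and prepend them to the prompt.
import numpy as np

docs = [
    "The monsoon typically reaches Delhi by late June.",
    "Transformers were introduced in the 2017 paper 'Attention Is All You Need'.",
    "GPT-2 has 1.5 billion parameters.",
]

def embed(text: str) -> np.ndarray:           # toy bag-of-hashed-words encoder
    v = np.zeros(64)
    for w in text.lower().split():
        v[hash(w) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query: str, k: int = 1):
    q = embed(query)
    sims = [q @ embed(d) for d in docs]
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "How many parameters does GPT-2 have?"
context = " ".join(retrieve(query))
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"  # fed to any LLM
print(prompt)
```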
• Module-5: Ethics and Miscellaneous
   • A discussion of the ethical issues and risks of LLM usage
   • How are different emergent abilities in LLMs facilitated? A peek into the internal workings of LLMs to understand the source of their capabilities.
   • Can LMs based on alternate architectures match Transformer-based LLMs? State-Space Models (SSMs)
   • Topics: Bias, toxicity and hallucination; Interpretability; Beyond Transformer: State Space Models
Suggestions (For Effective Learning)
• To understand the concepts clearly, experiment with the models (Hugging Face makes life easier; a starter snippet follows).
• Smaller models (like GPT-2) can be run on Google Colab / Kaggle.
   • Even 7B models can be run with proper quantization.

Rule of thumb: never believe in any hypothesis until your experiments verify it!

Always get your hands dirty!
LLM research is all about implementing and experimenting with your ideas.
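A minimal starter along these lines, assuming only the transformers library (the quantization pointer at the end additionally needs bitsandbytes and a GPU):

```python
# Minimal "get your hands dirty" starter: generate text with GPT-2 on Colab/Kaggle.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("The monsoon rains have", max_new_tokens=20, do_sample=True)
print(out[0]["generated_text"])

# For ~7B models, 4-bit quantization keeps memory within free-GPU limits,
# e.g. via transformers' BitsAndBytesConfig(load_in_4bit=True)
# (requires the `bitsandbytes` package and a CUDA GPU).
```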