UNIVERSIDADE FEDERAL DE MINAS GERAIS
Advanced Seminars on Large Language Models
Introduction to LLMs
Rodrygo L. T. Santos
rodrygo@dcc.ufmg.br
[Cover image by DALL·E 3: silhouettes of a human on the left and a humanoid AI on the right, a white wire connecting their brains through their mouths, symbolizing communication]
Language
A natural ability for humans
◦ Effortless use for communication
◦ Expressive of thoughts, emotions, instructions
A challenge for machines
◦ Ambiguity, context-dependency, nuanced semantics
A milestone towards AGI?
[Video by Sora, generated from the prompt: "Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee."]
Language model
A probability distribution over word sequences
◦ P("Today is Wednesday") ≈ 0.001
◦ P("Today Wednesday is") ≈ 0.0000000000001
◦ P("The eigenvalue is positive") ≈ 0.00001
Also a mechanism for “generating” text
◦ P("Wednesday" | "Today is") > P("blah" | "Today is")
Language model
Ideal (aka full dependence) model
◦ P(w_1 … w_n) = P(w_1) P(w_2 | w_1) … P(w_n | w_1 … w_{n-1})
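For example, applying the chain rule to the three-word sequence from the previous slide (a worked toy expansion, not on the original slide):

P("Today is Wednesday") = P("Today") · P("is" | "Today") · P("Wednesday" | "Today is")

Each factor conditions on the entire preceding history.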
Infeasible in practice
◦ Expensive computation
◦ Poor estimation (data sparsity)
Evolution of language models
Statistical LMs (1950s-1990s)
Tunable dependence via n-grams
3-gram ("trigram")
◦ P(w_1 … w_n) = P(w_1) P(w_2 | w_1) … P(w_n | w_{n-2}, w_{n-1})
2-gram ("bigram")
◦ P(w_1 … w_n) = P(w_1) P(w_2 | w_1) … P(w_n | w_{n-1})
1-gram ("unigram")
◦ P(w_1 … w_n) = P(w_1) P(w_2) … P(w_n)
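A minimal sketch of a bigram model estimated by maximum likelihood from a toy corpus (my own illustration; the corpus and the <s>/</s> boundary tokens are assumptions, not from the slides):

# Toy bigram language model estimated by maximum likelihood counts.
from collections import Counter, defaultdict

corpus = ["today is wednesday", "today is sunny", "yesterday was wednesday"]

bigram_counts = defaultdict(Counter)
unigram_counts = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, curr in zip(words, words[1:]):
        bigram_counts[prev][curr] += 1
        unigram_counts[prev] += 1

def p_sentence(sentence):
    """P(w_1 ... w_n) factorized into bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        prob *= bigram_counts[prev][curr] / unigram_counts[prev]
    return prob

print(p_sentence("today is wednesday"))   # ~0.33: a plausible word order
print(p_sentence("today wednesday is"))   # 0.0: contains bigrams never seen in training

The second sentence gets probability zero because it contains unseen bigrams, which is the sparsity problem that smoothing addresses below.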
Improved estimation via smoothing
[Plot: probability P(w) per word w, contrasting the maximum likelihood estimate with the smoothed estimate]
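A sketch of add-one (Laplace) smoothing, reusing the toy counts from the bigram sketch above (illustrative only; production n-gram models typically use stronger schemes such as Kneser-Ney):

# Add-one (Laplace) smoothing: (count(prev, curr) + 1) / (count(prev) + V).
# Reuses bigram_counts and unigram_counts from the previous sketch; V is the vocabulary size.
def p_bigram_laplace(curr, prev):
    V = len({w for c in bigram_counts.values() for w in c} | set(unigram_counts))
    return (bigram_counts[prev][curr] + 1) / (unigram_counts[prev] + V)

print(p_bigram_laplace("is", "today"))         # 0.3: seen bigram, mass shrinks slightly
print(p_bigram_laplace("wednesday", "today"))  # 0.1: unseen bigram now gets nonzero mass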
Evolution of language models
Statistical LMs (1950s-1990s) → Neural LMs (2013)

Neurons
[Diagram: input w_{1:n-1} → dense network → output ŵ_n]
Improved word-level representation
◦ From sparse to distributional semantics
◦ Better generalization to unseen data
Context still lacking
◦ Fixed-length input and output
◦ Non-sequential representation
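A rough numpy sketch of such a fixed-window neural LM forward pass (all sizes and the random weights are illustrative assumptions): the last k word embeddings are concatenated and fed through a dense layer, which is why the input length is fixed.

# Fixed-context-window neural LM forward pass (illustrative numpy sketch).
import numpy as np

rng = np.random.default_rng(0)
V, d, k, hidden = 10_000, 64, 4, 128      # vocab size, embedding dim, window, hidden units

E  = rng.normal(size=(V, d))              # word embedding table (dense vectors)
W1 = rng.normal(size=(k * d, hidden))     # dense layer over the concatenated window
W2 = rng.normal(size=(hidden, V))         # output projection to vocabulary logits

def next_word_distribution(context_ids):
    """P(w_n | w_{n-k:n-1}) for a fixed-length context of exactly k word ids."""
    x = E[context_ids].reshape(-1)        # concatenate k embeddings -> (k*d,)
    h = np.tanh(x @ W1)                   # nonlinearity
    logits = h @ W2
    exp = np.exp(logits - logits.max())   # softmax over the vocabulary
    return exp / exp.sum()

probs = next_word_distribution([11, 42, 7, 99])   # arbitrary word ids
print(probs.shape, probs.sum())                   # (10000,) 1.0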
Neurons… with recurrence!
[Diagram: a recurrent network (left) and its unrolled view (right); at each step t, the dense network takes w_t and the previous state h_{t-1}, producing ŵ_{t+1} and the new state h_t]
Neurons… with recurrence!
[Diagram: recurrent network mapping w_{n-1} and state h_{n-1} to ŵ_n]
Sequential blessing
◦ Dynamic state maintains linguistic context
◦ Enables handling variable-length sequences
Sequential curse
◦ Single state as information bottleneck
◦ Inherently non-parallelizable
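A minimal numpy sketch of the recurrence (shapes and random weights are illustrative): a single state vector h is updated token by token, so arbitrary-length input is handled, but all context must pass through that one vector and the loop cannot be parallelized.

# Recurrent LM step: h_t = tanh(x_t W_xh + h_{t-1} W_hh) (illustrative numpy sketch).
import numpy as np

rng = np.random.default_rng(0)
V, d, hidden = 10_000, 64, 128

E    = rng.normal(size=(V, d))            # word embeddings
W_xh = rng.normal(size=(d, hidden)) * 0.1
W_hh = rng.normal(size=(hidden, hidden)) * 0.1
W_hy = rng.normal(size=(hidden, V)) * 0.1

def rnn_lm(word_ids):
    """Process a variable-length sequence one token at a time."""
    h = np.zeros(hidden)                  # single state: the information bottleneck
    for wid in word_ids:                  # inherently sequential loop
        h = np.tanh(E[wid] @ W_xh + h @ W_hh)
    logits = h @ W_hy                     # distribution over the next word
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

print(rnn_lm([11, 42, 7]).shape)          # (10000,) -- works for any sequence length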
Evolution of language models
Statistical LMs (1950s-1990s) → Neural LMs (2013) → Pretrained LMs (2018)
Vaswani et al. (NIPS 2017)
Neurons… with attention!
[Diagram: input w_{1:n-1} → attention → output ŵ_n]
The animal didn't cross the street because it was too ______
Neurons… with attention!
[Diagram: input w_{1:n-1} → attention → output ŵ_n, with attention weights over the context]
The animal didn't cross the street because it was too scared
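A compact numpy sketch of scaled dot-product attention, the core operation behind these slides (toy dimensions and random projections, purely illustrative): every position computes a weighted average over all positions, which is how a token like "it" can draw on "animal" regardless of distance.

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V (illustrative numpy sketch).
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_k = 11, 32, 32              # e.g. the 11 tokens of the example sentence

X   = rng.normal(size=(n, d_model))       # token representations (stand-ins for embeddings)
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores  = Q @ K.T / np.sqrt(d_k)          # how much each token attends to every other
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output  = weights @ V                     # context-aware representation for every token

print(weights.shape, output.shape)        # (11, 11) (11, 32)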
Attention is (not) all you need
[Diagram: input w_{1:n-1} → attn → h → dense → output ŵ_n]
◦ Preparation: tokenize; mark position; encode
◦ Enrichment: attend to multiple contexts; add nonlinearities
◦ Prediction: select best output; decode
Transformer
[Diagram: decoder stack of n blocks (attn → h → dense) mapping w_{1:n-1} to ŵ_n]
Effective representation
◦ Can attend to entire context – no bottleneck
◦ Attention heads as representation subspaces
◦ Order retained via positional encoding
Efficient processing
◦ Parallelization across tokens and heads
◦ Much faster training and inference
◦ Scalability to massive training datasets
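Since attention itself is order-agnostic, position must be injected explicitly. A sketch of the sinusoidal positional encoding from the original Transformer paper (illustrative; many later LLMs use learned or rotary encodings instead):

# Sinusoidal positional encoding (Vaswani et al., 2017), illustrative numpy sketch.
import numpy as np

def positional_encoding(n_positions, d_model):
    pos = np.arange(n_positions)[:, None]                 # (n, 1)
    i   = np.arange(d_model // 2)[None, :]                # (1, d/2)
    angles = pos / np.power(10_000, 2 * i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions
    pe[:, 1::2] = np.cos(angles)                          # odd dimensions
    return pe

pe = positional_encoding(n_positions=128, d_model=512)
print(pe.shape)    # (128, 512) -- added to the token embeddings before the first block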
Transformer architectures
[Diagram: three stacks of n blocks each, producing ŵ_{1:m} (encoder-only) or ŵ_n (encoder-decoder, decoder-only) from their inputs]
◦ encoder-only (e.g. BERT (2018))
◦ encoder-decoder (e.g. T5 (2019))
◦ decoder-only (e.g. GPT (2018))
The power of transfer learning
Self-supervised pretraining (expensive)
◦ Standard language modeling objective
◦ Train on massive textual corpora
Supervised fine-tuning (cheap)
◦ Multiple task-specific objectives
◦ Improved performance downstream
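A brief sketch of the recipe with the Hugging Face transformers library (assuming it is installed; the checkpoint name and the two-label sentiment task are illustrative choices, and the actual training loop is omitted):

# Pretrain once (expensive, usually done by others), then fine-tune cheaply downstream.
# Illustrative sketch; checkpoint and task are assumptions, fine-tuning loop omitted.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")        # pretrained LM
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)                                # fresh sentiment head

batch = tokenizer(["I loved this film!"], return_tensors="pt")
logits = model(**batch).logits       # untrained head: fine-tuning would update these
print(logits.shape)                  # torch.Size([1, 2])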
Evolution of language models
Statistical LMs (1950s-1990s) → Neural LMs (2013) → Pretrained LMs (2018) → Large LMs (2020)
Model size vs. time
[Chart: model parameter counts over the 2018-2024 timeline]
◦ GPT-1 (2018): 117M parameters
◦ GPT-2 (2019): 1.5B parameters
◦ GPT-3 (2020): 175B parameters
◦ GPT-4 (2023): 1.76T* parameters
Enabled by
◦ Advent of the Transformer
◦ Availability of massive datasets
◦ Access to powerful computing
The power of scaling
LLMs show improved performance with scale
◦ Increased model size (in trillions of parameters)
◦ Increased training data (in trillions of tokens)
Improvements in next token prediction
◦ But also in unforeseen capabilities!
Instruction following
◦ Prompt: "Classify this review: I loved this film! Sentiment:"
◦ LLM completion: "Positive"
Instruction following
◦ Prompt: "Classify this review: I loved this film! Sentiment:"
◦ LLM completion: "received a very nice book review" (the instruction is not followed; the model merely continues the text)
In-context learning
◦ Prompt:
  Classify this review: I don't like this chair!
  Sentiment: Negative
  Classify this review: I loved this film!
  Sentiment:
◦ LLM completion: "Positive"
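In-context learning requires no weight updates, only a prompt that demonstrates the task. A minimal sketch of assembling such a few-shot prompt (llm_complete is a hypothetical stand-in for whatever completion API or local model is used):

# Building a few-shot (in-context) prompt; no parameters are updated.
def build_prompt(examples, query):
    """Concatenate labeled demonstrations followed by the unlabeled query."""
    parts = [f"Classify this review: {text}\nSentiment: {label}"
             for text, label in examples]
    parts.append(f"Classify this review: {query}\nSentiment:")
    return "\n\n".join(parts)

demos = [("I don't like this chair!", "Negative")]
prompt = build_prompt(demos, "I loved this film!")
print(prompt)
# completion = llm_complete(prompt)   # hypothetical LLM call; expected completion: "Positive"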
Basic, emergent, augmented capabilities!
The challenges of scaling
System challenges
◦ Substantial compute and energy consumption
◦ Continual learning and adaptation
Data challenges
◦ Data quality and representativeness
◦ Low-resource domains and languages
The challenges of scaling
Human challenges
◦ Responsible alignment
◦ Interpretability and explainability
◦ Privacy and security
Course goals
Understand the fundamentals of LLMs
Explore the capabilities and limitations of LLMs
Keep up with the current state of the field
Have a grasp of where the field is headed
Course scope
LLM architectures – Transformers and beyond
LLM lifecycle
◦ Pretraining: data preparation, objectives
◦ Adaptation: instruction, alignment, PEFT/MEFT
◦ Utilization: prompting, in-context, augmentation
◦ Evaluation: language, downstream
Course structure (tentative)
Intro lectures by instructor
Paper seminars by students
◦ 1 group per class (rotating every 2 weeks)
◦ 2 papers per group (30min + 20min discussion)
◦ 2 students per paper

Week   Mon  Wed
18/03  G1   G2
25/03  G3   G4
01/04  G1   G2
08/04  G3   G4
15/04  G1   G2
22/04  G3   G4
Course structure (tentative)
Final paper list and seminar schedule will be available later today for enrollment
Course grading
Seminar presentations
◦ 3x 20% = 60%
Seminar feedback
◦ 21x 1% = 21%
Class participation
◦ 21x 1% = 21%
Course attendance
"Credits for each course will only be granted to students who obtain at least a D grade and who demonstrate effective attendance in at least 75% (seventy-five percent) of the activities in which they are enrolled, with no excusing of absences permitted."
NGPG, art. 65
Course materials: books & surveys
Build a Large Language Model (from Scratch)
by Raschka (2024)
Large Language Models: A Survey
by Minaee et al. (2024)
A Comprehensive Overview of Large Language Models
by Naveed et al. (2024)
Course materials: books & surveys
Efficient Large Language Models: A Survey
by Wan et al. (2024)
A Survey of Large Language Models
by Zhao et al. (2023)
Course materials: courses and tutorials
Generative AI with Large Language Models
by DeepLearning.AI / AWS
Large Language Models
by Databricks
Neural Networks: Zero to Hero
by Karpathy
Pre-course survey
Fill in a short survey describing your past experience
and expectations related to the course
◦ https://forms.gle/7mcatGc5LtAFM2ta7
UNIVERSIDADE FEDERAL DE MINAS GERAIS
Coming next…
Architecture of LLMs
Rodrygo L. T. Santos
rodrygo@dcc.ufmg.br