DS 207: Introduction to
Natural Language Processing
Representing Words
Danish Pruthi
Quick feedback: how is the pace of teaching?
• Too slow 🥱
• Just about right 😎
• Too fast 🤕
Representing words
• Ideally, we want to represent the meaning (or idea) conveyed by that word
• Synonyms (and related words) should be represented similarly
• WordNet
WordNet: Hyponyms & Hypernyms
• Not scalable, subjective, labor-intensive, cannot compute similarities …
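For concreteness, WordNet's hand-curated relations can be queried with NLTK. This is a small illustrative sketch; it assumes the WordNet corpus has already been downloaded with nltk.download("wordnet").

```python
# Illustrative sketch of WordNet lookups via NLTK
# (assumes nltk is installed and the WordNet corpus has been downloaded).
from nltk.corpus import wordnet as wn

coffee = wn.synsets("coffee")[0]   # first sense of "coffee"
print(coffee.definition())         # hand-written gloss
print(coffee.hypernyms())          # more general concepts (e.g., a beverage)
print(coffee.hyponyms())           # more specific concepts
print(coffee.lemma_names())        # synonyms grouped in this synset
```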
"You shall know a word by the company it keeps"
John Rupert Firth, 1957
• I offered her some random
• It took some time for my random to brew
• I can not live without drinking random in the morning
Word {vectors / representations / embeddings}
coffee ≈ [-3.2, -2.9, 1.0, 2.2, 0.6, …]
tea ≈ [-3.8, -2.0, 1.1, 2.3, 0.5, …]
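Using only the five dimensions shown above (an illustrative fragment, not trained embeddings), one can check that the two vectors point in nearly the same direction:

```python
import numpy as np

# First five dimensions of the illustrative vectors above (not trained embeddings).
coffee = np.array([-3.2, -2.9, 1.0, 2.2, 0.6])
tea    = np.array([-3.8, -2.0, 1.1, 2.3, 0.5])

cosine = coffee @ tea / (np.linalg.norm(coffee) * np.linalg.norm(tea))
print(f"cosine(coffee, tea) = {cosine:.3f}")   # close to 1.0 => very similar
```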
Word2vec
• Method for learning word vectors
(Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, Jeffrey Dean. NeurIPS 2013)
• Key idea:
• Use a large corpus of text
• For a center word c (with vector v_c) and each context ("outside") word o (with vector u_o)
• Use the similarity of the vectors v_c and u_o to compute the probability of o occurring in
the context of c (and vice versa)
• Keep adjusting (aka learning) the word vectors to optimize the probability
Skipgram model: predicting context
I can not live without drinking coffee in the morning as …
I offered her a cup of coffee to drink
It took some time for my coffee to brew
Free supervision (aka self-supervision)!
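A minimal sketch of how such (center, context) training pairs can be read off raw text; whitespace tokenization and the window size m are simplifying assumptions:

```python
# Minimal sketch: extracting (center, context) skip-gram pairs from raw text.
# Whitespace tokenization and a fixed window size m are simplifying assumptions.
def skipgram_pairs(sentence, m=2):
    tokens = sentence.lower().split()
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

print(skipgram_pairs("It took some time for my coffee to brew"))
```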
Skipgram objective
$$L(\theta) = \prod_{t=1}^{T} P(\text{context} \mid w_t;\, \theta)$$

$$L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \neq 0}} P(w_{t+j} \mid w_t;\, \theta)$$
Skipgram objective
$P(w_{t+j} \mid w_t; \theta)$, also written $P(w_o \mid w_c)$ or simply $P(o \mid c)$

$$P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}$$
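A sketch of this softmax with randomly initialized vectors; the vocabulary size and embedding dimension below are arbitrary illustrative choices:

```python
import numpy as np

# Sketch of P(o | c) = exp(u_o^T v_c) / sum_w exp(u_w^T v_c).
# Vocabulary size and embedding dimension are arbitrary illustrative choices.
rng = np.random.default_rng(0)
V, d = 10_000, 100                 # vocabulary size, embedding dimension
U = rng.normal(size=(V, d))        # "outside" vectors u_w, one row per word
v_c = rng.normal(size=d)           # center word vector

scores = U @ v_c                   # u_w^T v_c for every word w
scores -= scores.max()             # subtract max for numerical stability
probs = np.exp(scores) / np.exp(scores).sum()

o = 42                             # index of some context word
print(probs[o], probs.sum())       # P(o | c); probabilities sum to 1
```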
Skipgram objective w/ negative sampling
• For efficiency, one can use negative sampling:
$$\arg\max_\theta \prod_{(c,o) \in D} P(D = 1 \mid c, o; \theta) \prod_{(c,o) \in D'} P(D = 0 \mid c, o; \theta)$$

$$\arg\max_\theta \prod_{(c,o) \in D} \sigma(u_o^\top v_c) \prod_{(c,o) \in D'} \left(1 - \sigma(u_o^\top v_c)\right)$$

$$\arg\max_\theta \sum_{(c,o) \in D} \log \sigma(u_o^\top v_c) + \sum_{(c,o) \in D'} \log \sigma(-u_o^\top v_c)$$
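A sketch of the negative-sampling objective for a single (c, o) pair; sampling negatives uniformly (rather than from the 3/4-power unigram distribution used by word2vec) is a simplification:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Negative-sampling log-likelihood for one positive (c, o) pair and k sampled negatives.
# Uniform negative sampling is a simplification; word2vec samples negatives from a
# unigram distribution raised to the 3/4 power.
def neg_sampling_logprob(v_c, u_o, U, k=5, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    neg_ids = rng.integers(0, U.shape[0], size=k)        # indices of "fake" context words
    pos_term = np.log(sigmoid(u_o @ v_c))                 # pull the real pair together
    neg_term = np.log(sigmoid(-U[neg_ids] @ v_c)).sum()   # push sampled pairs apart
    return pos_term + neg_term                            # quantity to maximize

# Example with random vectors (sizes are arbitrary for illustration)
rng = np.random.default_rng(0)
vocab_size, d = 1000, 50
U = rng.normal(size=(vocab_size, d))      # "outside" vectors
Vmat = rng.normal(size=(vocab_size, d))   # center-word vectors
print(neg_sampling_logprob(Vmat[3], U[7], U))
```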
Continuous Bag-of-words
(Mikolov et al. 2013)
• Predict word based on sum of surrounding embeddings
[Diagram: the embeddings of the context words "giving", "a", "at", "the" are looked up and summed; multiplying the sum by W gives scores over the vocabulary, a softmax turns the scores into probabilities, and the loss is computed against the target word "talk".]
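A sketch of this CBOW forward pass; the toy vocabulary and the randomly initialized matrices E (embeddings) and W (output weights) are illustrative assumptions:

```python
import numpy as np

# Sketch of the CBOW forward pass for "giving a *** at the" -> "talk".
# The vocabulary and the matrices E and W are illustrative.
vocab = {"giving": 0, "a": 1, "at": 2, "the": 3, "talk": 4}
V, d = len(vocab), 8
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))                      # input embeddings (lookup table)
W = rng.normal(size=(V, d))                      # output weights

context = ["giving", "a", "at", "the"]
h = sum(E[vocab[w]] for w in context)            # lookup + sum of context embeddings
scores = W @ h                                   # one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()                             # softmax over the vocabulary
loss = -np.log(probs[vocab["talk"]])             # cross-entropy against target "talk"
print(loss)
```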
Skip-gram
(Mikolov et al. 2013)
• Predict each word in the context given the word
[Diagram: the embedding of the center word "talk" is looked up and multiplied by W to get scores; a separate loss is computed for each context word "giving", "a", "at", "the".]
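A mirror-image sketch of the skip-gram forward pass, under the same illustrative assumptions (toy vocabulary, random E and W) as the CBOW sketch above:

```python
import numpy as np

# Sketch of the skip-gram forward pass: predict each context word from "talk".
# Vocabulary and weight matrices are illustrative, as in the CBOW sketch.
vocab = {"giving": 0, "a": 1, "at": 2, "the": 3, "talk": 4}
V, d = len(vocab), 8
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))                      # input embeddings
W = rng.normal(size=(V, d))                      # output weights

h = E[vocab["talk"]]                             # lookup of the center word only
scores = W @ h
probs = np.exp(scores - scores.max())
probs /= probs.sum()                             # softmax over the vocabulary

context = ["giving", "a", "at", "the"]
loss = -sum(np.log(probs[vocab[w]]) for w in context)   # sum of per-context-word losses
print(loss)
```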
After training … word analogies
• Man : Woman :: King : ??
• Big : Biggest :: Bad : ??
• Man : Programmer :: Woman : ??
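These analogies can be reproduced with pretrained vectors, e.g. via gensim; the model name below assumes the large (>1 GB) "word2vec-google-news-300" download from gensim-data is available:

```python
# Sketch using gensim's pretrained vectors; assumes the gensim-data
# "word2vec-google-news-300" model (>1 GB download) is available.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")

# man : woman :: king : ?   ->   vector(king) - vector(man) + vector(woman)
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# The same arithmetic also surfaces biases, e.g. man : programmer :: woman : ?
print(wv.most_similar(positive=["programmer", "woman"], negative=["man"], topn=1))
```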
After training …
Extensions to phrases
• w2v("British Airways") - w2v("Britain") + w2v("India") ≈ w2v("Air India")
• w2v("Steve Balmer") - w2v("Microsoft") + w2v("Google") ≈ w2v("Larry Page")
Also additive compositionality
Word similarities
WordSimilarity-353
Count-based word vectors
• We've studied two extremes so far:
• One-hot vectors, and
• Dense word embeddings
Co-occurrence matrices
Figure from Jurafsky & Martin
• Which similarity metric is a better choice? Dot product or cosine similarity?
sim(cherry, information) = 0.018
sim(digital, information) = 0.996
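A sketch of why cosine is preferable on raw counts: dot products grow with word frequency, while cosine normalizes away vector length. The counts below approximate the Jurafsky & Martin example and should be treated as illustrative:

```python
import numpy as np

# Approximate co-occurrence counts in the spirit of the Jurafsky & Martin example
# (columns: computer, data, result, pie, sugar); treat the exact numbers as illustrative.
cherry      = np.array([2, 8, 9, 442, 25])
digital     = np.array([1670, 1683, 85, 5, 4])
information = np.array([3325, 3982, 378, 5, 13])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Raw dot products are dominated by how frequent the words are ...
print(cherry @ information, digital @ information)
# ... while cosine normalizes away vector length:
print(cosine(cherry, information))   # ≈ 0.02, cf. sim(cherry, information) above
print(cosine(digital, information))  # ≈ 0.996, cf. sim(digital, information) above
```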
Point-wise Mutual Information
$$\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)}$$

Positive Point-wise Mutual Information

$$\mathrm{PPMI}(w, c) = \max\!\left(\log_2 \frac{P(w, c)}{P(w)\,P(c)},\; 0\right)$$
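A sketch of computing PPMI from a word-by-context co-occurrence count matrix; the tiny count matrix at the bottom is purely illustrative:

```python
import numpy as np

# Sketch: PPMI from a word-by-context co-occurrence count matrix (illustrative counts).
def ppmi(counts, eps=1e-12):
    total = counts.sum()
    p_wc = counts / total                       # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)       # marginal P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)       # marginal P(c)
    pmi = np.log2((p_wc + eps) / (p_w * p_c + eps))
    return np.maximum(pmi, 0)                   # PPMI: clip negative PMI values to 0

counts = np.array([[2.0, 8.0, 442.0],
                   [1670.0, 1683.0, 5.0]])
print(ppmi(counts))
```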
Point-wise Mutual Information
Values from Jurafsky & Martin, Speech & Language Processing
Summary of word representations
• We started with word identities
• Manual efforts to write down meanings and relationships among words (e.g., WordNet)
• Dense representations based on the idea that meanings of words can be inferred from
the context in which they occur
• Capture interesting semantic and syntactic relationships, but also biases!
• Other count-based methods (co-occurrence counts, PMI/PPMI)