CHAPTER – 2
Language Modeling and Part of Speech Tagging
Subject: NLP (Code: 3170723)
Prepared By: Asst. Prof. Chaitali Bhoi, CE, NIT
Language Model
• Predicting is difficult, especially about the future.
• But how about predicting something that seems much easier, like the next few words someone is going to say?
• Ex: "Hey, hi! How are ... you?" Most listeners can guess that the next word is "you".
Language Model
• In the following sections we will formalize this
intuition by introducing models that assign a
probability to each possible next word.
• The same models will also serve to assign a
probability to an entire sentence.
Language Model
• For example, a language model could predict that sequence (1) below has a much higher probability of appearing in a text than sequence (2):
• 1) all of a sudden I notice three guys standing on the sidewalk (makes sense)
• 2) on guys all I of notice sidewalk three a sudden standing the (word salad)
Applications
• speech recognition
• spelling correction
• grammatical error correction
• machine translation
Language Model
• Models that assign probabilities to sequences of words are called language models or LMs.
• Here we introduce the simplest model that assigns probabilities to sentences and sequences of words: the n-gram.
N-gram
• An n-gram is a sequence of n words.
• 2-gram (bigram): a two-word sequence of words.
• 3-gram (trigram): a three-word sequence of words.
• We use n-gram models to estimate the probability of the last word of an n-gram given the previous words, and also to assign probabilities to entire sequences (see the sketch below).
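A minimal sketch in Python of how such n-grams can be extracted from a token list (the function name extract_ngrams is illustrative, not from any library):

def extract_ngrams(words, n):
    # Slide a window of size n over the token list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = "all of a sudden I notice three guys".split()
print(extract_ngrams(tokens, 2))  # bigrams: ('all', 'of'), ('of', 'a'), ...
print(extract_ngrams(tokens, 3))  # trigrams: ('all', 'of', 'a'), ...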
Conditional Probability
• We will use the P(A|B) notation to represent the conditional probability of A given that the event B has occurred. B is the "conditioning event."
• P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
Conditional Probability Example
• Suppose that of all individuals buying a certain digital camera, 60% include an optional memory card in their purchase, 40% include an extra battery, and 30% include both a card and a battery. Given that the selected individual purchased an extra battery, what is the probability that an optional card was also purchased?
Conditional Probability solution
• A = {memory card purchased}
• B = {battery purchased}
• P(A) = 0.60
• P(B) = 0.40
• P(A ∩ B) = 0.30
• P(A|B) = P(A ∩ B) / P(B) = 0.30 / 0.40 = 0.75
• That is, of all those purchasing an extra battery, 75% purchased an optional memory card.
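The same arithmetic as a small Python sketch (the variable names are illustrative):

# Camera example: A = memory card purchased, B = battery purchased.
p_a, p_b, p_a_and_b = 0.60, 0.40, 0.30
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A ∩ B) / P(B)
print(p_a_given_b)  # 0.75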
Joint Probability
• The joint probability P(A ∩ B) is the probability that events A and B both occur.
• P(A ∩ B) = P(A|B) · P(B)
• Applied to word sequences, this gives the chain rule: P(w1 w2 ... wn) = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1 ... wn−1)
Joint Probability Example
• From the camera example: P(A ∩ B) = P(A|B) · P(B) = 0.75 × 0.40 = 0.30, matching the 30% who purchased both.
Bigram
• <s> I am Jack </s>
• <s> Jack am I </s>
• <s> Jack I like </s>
• <s> Jack I do like </s>
• <s> Do I like Jack </s>
• Assume we use a bigram model estimated from these five sentences.
• Find the most probable next word after each context (see the sketch below):
• 1) Jack ...   2) Jack I do ...
• 3) Jack I am Jack ...   4) do I like Jack ...
• Find the sentence probability of:
• 1) I like Jack
• 2) Jack like nothing
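A minimal sketch of an unsmoothed bigram model estimated from the five training sentences above; the helper names are illustrative. Tokens are lowercased so "Do" and "do" count as the same word.

from collections import Counter

corpus = [
    "<s> I am Jack </s>",
    "<s> Jack am I </s>",
    "<s> Jack I like </s>",
    "<s> Jack I do like </s>",
    "<s> do I like Jack </s>",
]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = sent.lower().split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w_prev, w):
    # Maximum-likelihood estimate: C(w_prev w) / C(w_prev)
    return bigrams[(w_prev, w)] / unigrams[w_prev]

# Most probable next word after "Jack" ("i" and "</s>" tie at 2/5):
print(max(unigrams, key=lambda w: p_bigram("jack", w)))

# Sentence probability of "I like Jack":
tokens = "<s> i like jack </s>".split()
p = 1.0
for w_prev, w in zip(tokens, tokens[1:]):
    p *= p_bigram(w_prev, w)
print(p)  # (1/5) * (2/5) * (1/3) * (2/5) ≈ 0.0107

Note that "Jack like nothing" gets probability 0 under this model, because "nothing" never occurs in the training data. This is exactly the problem smoothing addresses.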
Evaluating Language Models
• extrinsic evaluation
• intrinsic evaluation
extrinsic evaluation
• The best way to evaluate the performance of a
language model is to embed it in an
application and measure how much the
application improves. Such end-to-end
evaluation is called extrinsic evaluation.
intrinsic evaluation
• An intrinsic evaluation metric is one that measures the quality of a model independent of any application.
Smoothing Techniques
• Add-1 / Laplace
• Add-K
• Backoff and Interpolation
• Advanced:
• Good-Turing
• Kneser-Ney
Add-1 / Laplace Smoothing
• Add-1 (Laplace) smoothing adds one to every count.
• Unigram: P_Add-1(wi) = (C(wi) + 1) / (N + V), where N is the total number of tokens and V is the vocabulary size.
• Bigram: P_Add-1(wn|wn−1) = (C(wn−1 wn) + 1) / (C(wn−1) + V)
Add-K Smoothing
• Add-K smoothing adds a fractional count k instead of 1 (e.g. k = 0.5).
• Unigram: P_Add-k(wi) = (C(wi) + k) / (N + kV)
• Bigram: P_Add-k(wn|wn−1) = (C(wn−1 wn) + k) / (C(wn−1) + kV)
Example
• Training corpus (N = 16 tokens, V = 4 word types: I, live, in, India):
• I live in India
• I live in
• I live
• in India
• India I live
• I in
• Test sentence: I love to live in India
Unigram
• With k = 0.5, N = 16, V = 4:
• P(I) = (5 + 0.5) / (16 + 0.5×4) = 5.5/18 ≈ 0.31
• P(love) = (0 + 0.5) / 18 ≈ 0.03 (unseen word)
• P(to) = 0.5/18 ≈ 0.03 (unseen word)
• P(live) = (4 + 0.5) / 18 = 0.25
• P(in) = 4.5/18 = 0.25
• P(India) = (3 + 0.5) / 18 ≈ 0.19
Bigram
• P(love|I) = (0 + 0.5) / (5 + 0.5×4) = 0.5/7 ≈ 0.07
• P(to|love) = (0 + 0.5) / (0 + 2) = 0.25
• P(live|to) = (0 + 0.5) / (0 + 2) = 0.25
• P(in|live) = (2 + 0.5) / (4 + 2) = 2.5/6 ≈ 0.42
• P(India|in) = (2 + 0.5) / (4 + 2) ≈ 0.42
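A sketch that reproduces the add-k numbers above (k = 0.5; the helper names are illustrative). Following the slide, V counts only the 4 word types seen in training.

from collections import Counter

corpus = ["I live in India", "I live in", "I live",
          "in India", "India I live", "I in"]

k = 0.5
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

N = sum(unigrams.values())  # 16 tokens
V = len(unigrams)           # 4 types: I, live, in, India

def p_unigram(w):
    # (C(w) + k) / (N + kV); Counter returns 0 for unseen words.
    return (unigrams[w] + k) / (N + k * V)

def p_bigram(w_prev, w):
    # (C(w_prev w) + k) / (C(w_prev) + kV)
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * V)

print(p_unigram("I"))          # 5.5/18 ≈ 0.31
print(p_unigram("love"))       # 0.5/18 ≈ 0.03 (unseen word)
print(p_bigram("I", "love"))   # 0.5/7  ≈ 0.07
print(p_bigram("live", "in"))  # 2.5/6  ≈ 0.42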
Backoff and Interpolation
• Backoff
• Use the (n−1)-gram estimate instead of the n-gram whenever the n-gram probability is zero (i.e., the n-gram was never seen in training).
Backoff and Interpolation
• Interpolation
• Mix the trigram, bigram, and unigram estimates with weights λ:
• P̂(wn|wn−2 wn−1) = λ1 P(wn|wn−2 wn−1) + λ2 P(wn|wn−1) + λ3 P(wn)
• such that the λs sum to 1: Σi λi = 1
• Worked example on the corpus above, with λ1 = 0.2 (trigram), λ2 = 0.4 (bigram), λ3 = 0.4 (unigram); unigram probabilities are C(w)/16 (see the sketch below):
• P̂(to|I love) = 0.2×0 + 0.4×0 + 0.4×0 = 0 ("to" never occurs in the corpus)
• P̂(live|love to) = 0.2×0 + 0.4×0 + 0.4×(4/16) = 0.10
• P̂(in|to live) = 0.2×0 + 0.4×(2/4) + 0.4×(4/16) = 0.30
• P̂(India|live in) = 0.2×(1/2) + 0.4×(2/4) + 0.4×(3/16) = 0.375
• The same idea extends to 4-grams. Estimating P̂(India|to live in) with λ = (0.2, 0.2, 0.3, 0.3):
• 0.2×P(India|to live in) + 0.2×P(India|live in) + 0.3×P(India|in) + 0.3×P(India)
• = 0.2×0 + 0.2×(1/2) + 0.3×(2/4) + 0.3×(3/16) ≈ 0.31 (the 4-gram term is 0 because "to live in" never occurs in the corpus)
• With λ = (0.1, 0.1, 0.4, 0.4):
• = 0.1×0 + 0.1×(1/2) + 0.4×(2/4) + 0.4×(3/16) ≈ 0.33
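A sketch of linear interpolation over the same corpus (the λ values match the worked example; the helper names are illustrative):

from collections import Counter

corpus = ["I live in India", "I live in", "I live",
          "in India", "India I live", "I in"]

uni, bi, tri = Counter(), Counter(), Counter()
for sent in corpus:
    t = sent.split()
    uni.update(t)
    bi.update(zip(t, t[1:]))
    tri.update(zip(t, t[1:], t[2:]))

N = sum(uni.values())  # 16 tokens

def p1(w):        # unigram: C(w) / N
    return uni[w] / N

def p2(a, w):     # bigram: C(a w) / C(a), taken as 0 for an unseen context
    return bi[(a, w)] / uni[a] if uni[a] else 0.0

def p3(a, b, w):  # trigram: C(a b w) / C(a b), taken as 0 for an unseen context
    return tri[(a, b, w)] / bi[(a, b)] if bi[(a, b)] else 0.0

def interp(a, b, w, l1=0.2, l2=0.4, l3=0.4):
    # The lambdas must sum to 1.
    return l1 * p3(a, b, w) + l2 * p2(b, w) + l3 * p1(w)

print(interp("love", "to", "live"))   # 0.2*0 + 0.4*0 + 0.4*(4/16) = 0.10
print(interp("to", "live", "in"))     # 0.2*0 + 0.4*(2/4) + 0.4*(4/16) = 0.30
print(interp("live", "in", "India"))  # 0.2*(1/2) + 0.4*(2/4) + 0.4*(3/16) = 0.375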
Kneser-Ney Smoothing
• An advanced method that combines absolute discounting with a "continuation" probability: a word's unigram weight depends on how many different contexts it appears in, not just its raw frequency.
POS Tagging
• Part of Speech (POS) Tagging is defined as the process of assigning one of the parts of speech to a given word.
• POS tagging is the task of labelling each word in a sentence with its appropriate part of speech.
• Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their sub-categories.
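For a quick hands-on illustration, the NLTK library ships a ready-made tagger (assuming nltk is installed and its tokenizer/tagger models have been downloaded; exact model names can vary across NLTK versions):

import nltk
# One-time downloads, names as of recent NLTK releases:
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("Jane will run home")
print(nltk.pos_tag(tokens))
# e.g. [('Jane', 'NNP'), ('will', 'MD'), ('run', 'VB'), ('home', 'NN')]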
POS Tagging
• noun --> person / place / organization name
• modal verb --> will, can, could, should, might, must
• verb --> action: run, eat, speak, listen, do
• adjective --> describes a quality of a noun
Rule-based
• Rule-based taggers use a dictionary or lexicon to get the possible tags for each word. If a word has more than one possible tag, hand-written rules are used to identify the correct tag.
Stochastic
• Word frequency: tag each word with the tag it receives most often in a tagged corpus.
• Tag sequences: prefer the tag sequence that is most probable given the neighbouring tags.
Transformation-based
• Combination of the rule-based and stochastic approaches (e.g., the Brill tagger).
POS Tagging Example
• Collect labeled data (a corpus of tagged sentences).
• Create a lookup table: tag each word with its most common POS in the labeled data.
• Tag each word of the input sentence using the lookup table (see the sketch below).
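A minimal sketch of this lookup-table approach; the tiny labeled data set and tag names are made up for illustration:

from collections import Counter, defaultdict

# Step 1: labeled data as (word, tag) pairs, e.g. from a tagged corpus.
labeled = [("I", "PRON"), ("can", "MODAL"), ("run", "VERB"),
           ("the", "DET"), ("can", "NOUN"), ("can", "MODAL"),
           ("run", "NOUN"), ("fast", "ADV")]

# Step 2: build the lookup table with each word's most common tag.
tag_counts = defaultdict(Counter)
for word, tag in labeled:
    tag_counts[word.lower()][tag] += 1
lookup = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

# Step 3: tag a sentence; unseen words fall back to a default tag.
sentence = "I can run fast".split()
print([(w, lookup.get(w.lower(), "NOUN")) for w in sentence])
# [('I', 'PRON'), ('can', 'MODAL'), ('run', 'VERB'), ('fast', 'ADV')]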
Hidden Markov Model (HMM)
• An HMM tagger needs two things:
• Emission probabilities: how likely a word is given a tag, e.g. how likely "Jane" is to be emitted as a Noun, Modal, or Verb.
• Transition probabilities: how likely one tag is to follow another, e.g. how likely a Noun is to be followed by a Modal, which is followed by a Verb.
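A minimal sketch of how these two tables combine to score a tag sequence; the toy probabilities are made up for illustration (a real HMM tagger estimates them from a tagged corpus and uses the Viterbi algorithm to search over tag sequences):

emission = {    # P(word | tag)
    ("Jane", "NOUN"): 0.2, ("will", "MODAL"): 0.8,
    ("spot", "VERB"): 0.4, ("spot", "NOUN"): 0.1,
}
transition = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "NOUN"): 0.6, ("NOUN", "MODAL"): 0.5,
    ("MODAL", "VERB"): 0.7, ("MODAL", "NOUN"): 0.1,
}

def score(words, tags):
    # Multiply transition and emission probabilities along the sequence.
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= transition.get((prev, t), 0.0) * emission.get((w, t), 0.0)
        prev = t
    return p

words = ["Jane", "will", "spot"]
print(score(words, ["NOUN", "MODAL", "VERB"]))  # ≈ 0.0134
print(score(words, ["NOUN", "MODAL", "NOUN"]))  # ≈ 0.0005

The tagger prefers NOUN MODAL VERB here because both its transitions and its emissions are more probable.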
Morphology
• The study of how words are formed.
Different types of words:
- Have exact meaning on their own: pen, board, phone
- Combination of meaningful parts: showcase (show + case), useless (use + less)
- Have no meaning in isolation: ing, s, es
Morphology parsing
• Collect morphemes (the smallest meaningful units, which cannot be divided further) from words.
• Morpheme
– Stem word
– Affix (suffix: loved; prefix: reform; infix: passersby)
Morphology parser
• Lexicon – stored information such as which words are stem words and how affixes are formed.
• Morphotactics – rules about which morpheme may appear before, after, or between other morphemes.
– It is a set of rules.
– Ex: three morphemes: use, able, ness
– The only meaningful ordering is use + able + ness: "useableness" (not, e.g., use + ness + able)
Morphology parser
• Orthographic Rules – spelling rules used to adjust words when morphemes combine.
• Ex: lady + s = ladys (not a proper word)
• lady + s = ladies (proper word; see the sketch below)
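A tiny sketch of this orthographic rule in code (the rule set is deliberately simplified; real morphological analysers, often built on finite-state transducers, handle many more cases):

def pluralize(noun):
    # Orthographic rule: consonant + "y" -> "ies" (lady -> ladies);
    # otherwise simply attach the suffix "s".
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

print(pluralize("lady"))  # ladies (not "ladys")
print(pluralize("boy"))   # boys (vowel + "y" keeps the "y")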
Types of Morpheme
• 1) Free morpheme: an independent word.
• Two types of free morpheme:
A) Lexical: content words that can be pictured or visualised, like pen, book, yellow, eyes
B) Grammatical: function words like and, or, not
Types of Morpheme
• 2) Bound – combines with a free morpheme to make a meaningful word.
– Ex: love + ing = loving
• Two types of bound morpheme:
• A) Inflectional – an affix added to a free morpheme that does not change the POS tag.
– Ex: cat (Noun) + s = cats (Noun)
Types of Morpheme
B) Derivational
Class changing – the POS tag changes
Ex: danger (Noun) + ous = dangerous (Adj.)
Class maintaining – the word changes but not the POS
Ex: law (Noun) + yer = lawyer (Noun)