NLP Experiment 3

EXPERIMENT - 3

AIM: To construct a Part-of-Speech (POS) tagger using the Hidden Markov Model (HMM) and
implement the Viterbi algorithm to decode the most probable sequence of tags for a given sentence.

DESCRIPTION: Part-of-Speech (POS) tagging is the process of assigning a grammatical category
to each word in a sentence based on its role in context. These categories include noun, verb,
adjective, adverb, preposition, and others. POS tagging helps computers understand the structure of
language, making it easier for them to process and analyze text.

Example:
Sentence: “The cat is sitting on the mat.”
POS tags:
• The → Determiner
• cat → Noun
• is → Verb
• sitting → Verb
• on → Preposition
• the → Determiner
• mat → Noun
POS tagging with a Hidden Markov Model:
A Hidden Markov Model (HMM) is a statistical model used to understand systems where the actual
states are not directly visible, but we can see outcomes that depend on those hidden states. In
simple terms, an HMM helps us figure out what is going on behind the scenes based on what we
observe. For example, suppose you are trying to guess the weather each day, but you can't look
outside. Instead, you watch what people do. If someone is walking, shopping, or cleaning, these
actions give you clues about what the weather might be like. In POS tagging, the hidden states are
the tags and the observations are the words of the sentence.
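To make the weather example concrete, here is a minimal sketch of how it could be written down as
an HMM in Python. The probability values below are invented purely for illustration and are not
part of this experiment; the POS tagger built in the following steps learns its own probabilities
from training data.

# Hidden states and visible observations for the weather example.
# All probability values here are made-up illustrative numbers.
weather_states = ["Rainy", "Sunny"]
activities = ["walk", "shop", "clean"]

weather_start_p = {"Rainy": 0.6, "Sunny": 0.4}   # initial probabilities
weather_trans_p = {                              # transition probabilities
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
weather_emit_p = {                               # emission probabilities
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
# These tables have exactly the shape the viterbi() function in Step 3 expects.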
Step 1: Define the Training Data
We start by creating a small dataset of sentences. Each word is labeled with its correct part of
speech.

train_data = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("barked", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sat", "VERB")],
]

Step 2: Calculate Probabilities

This step builds the statistical foundation for the HMM. The model counts how often:
• A sentence starts with each tag (initial)
• One tag follows another (transition)
• A word appears with a tag (emission)
Then it converts these counts into probabilities, which the Viterbi algorithm uses later.

from collections import defaultdict

transition = defaultdict(lambda: defaultdict(int))  # tag -> next-tag counts
emission = defaultdict(lambda: defaultdict(int))    # tag -> word counts
start_prob = defaultdict(int)                       # sentence-initial tag counts
tag_counts = defaultdict(int)                       # overall tag counts

for sentence in train_data:
    prev_tag = None
    for i, (word, tag) in enumerate(sentence):
        tag_counts[tag] += 1
        emission[tag][word] += 1
        if i == 0:
            start_prob[tag] += 1
        else:
            transition[prev_tag][tag] += 1
        prev_tag = tag

def normalize(d):
    """Convert a dictionary of counts into a dictionary of probabilities."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

start_prob = normalize(start_prob)
for tag in emission:
    emission[tag] = normalize(emission[tag])
for prev in transition:
    transition[prev] = normalize(transition[prev])
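
As a quick sanity check (assuming the code above has just run), we can print some of the learned
probabilities. With the three training sentences given, every sentence starts with DET, DET is
always followed by NOUN, and NOUN by VERB, so those probabilities come out as 1.0, while the
emission probabilities reflect word frequencies:

print(start_prob)          # {'DET': 1.0}
print(transition["DET"])   # {'NOUN': 1.0}
print(transition["NOUN"])  # {'VERB': 1.0}
print(emission["NOUN"])    # {'cat': 0.333..., 'dog': 0.666...}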

Step 3: Define the Viterbi Algorithm

This is the Viterbi algorithm. It uses the probabilities from Step 2 to find the most likely
sequence of tags for a given sentence. At each word, it considers every possible tag and keeps,
for each tag, the path with the highest probability so far. It continues this process to the end
of the sentence and returns the best sequence of POS tags.
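
Concretely, the table entry V[t][state] built by the code below follows the standard Viterbi
recurrence; this is just the math behind the max(...) call in the code, where $w_t$ is the
$t$-th word, $\pi(s)$ the start probability, and $s'$ ranges over all tags:

$$V_0(s) = \pi(s)\, P(w_0 \mid s), \qquad V_t(s) = \max_{s'} \big[ V_{t-1}(s')\, P(s \mid s') \big]\, P(w_t \mid s)$$
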
def viterbi(sentence, states, start_p, trans_p, emit_p):
    # V[t][state] = highest probability of any tag path ending in `state`
    # at position t; path[state] remembers that best path itself.
    V = [{}]
    path = {}

    # Initialization: start in each state and emit the first word.
    for state in states:
        V[0][state] = start_p.get(state, 0) * emit_p[state].get(sentence[0], 1e-6)
        path[state] = [state]

    # Recursion: extend the best paths one word at a time.
    for t in range(1, len(sentence)):
        V.append({})
        new_path = {}

        for curr_state in states:
            # Choose the predecessor that maximizes the path probability.
            max_prob, prev_state = max(
                (V[t - 1][y0] * trans_p[y0].get(curr_state, 1e-6) *
                 emit_p[curr_state].get(sentence[t], 1e-6), y0)
                for y0 in states
            )
            V[t][curr_state] = max_prob
            new_path[curr_state] = path[prev_state] + [curr_state]
        path = new_path

    # Termination: pick the most probable path at the last position.
    n = len(sentence) - 1
    prob, final_state = max((V[n][y], y) for y in states)
    return path[final_state]
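
One design choice worth noting: the .get(..., 1e-6) calls assign a tiny fallback probability to
any word/tag or tag/tag pair that never appeared in training, instead of a hard zero. This is a
crude stand-in for proper smoothing, but it keeps unseen combinations from zeroing out an entire
path, which is what lets the tagger handle the new sentence in Step 4.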
Step 4: Run the Tagger on a New Sentence
Now we test the HMM model on a new sentence. Even though the model has not seen this exact
sentence before, it uses what it learned to predict the most likely part-of-speech tag for each
word.
test_s = ["a", "cat", "barked"]
states = list(tag_counts.keys())

predicted_tags = viterbi(test_s, states, start_prob, transition, emission)

print("Sentence:", test_s)
print("Predicted Tags:", predicted_tags)

OUTPUT:
Sentence: ['a', 'cat', 'barked']
Predicted Tags: ['DET', 'NOUN', 'VERB']
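
We can check this result by hand. For the winning path DET → NOUN → VERB, the Viterbi probability
is the product of the learned start, transition, and emission probabilities from Step 2:

$$P = \pi(\text{DET}) \cdot P(\text{a} \mid \text{DET}) \cdot P(\text{NOUN} \mid \text{DET}) \cdot P(\text{cat} \mid \text{NOUN}) \cdot P(\text{VERB} \mid \text{NOUN}) \cdot P(\text{barked} \mid \text{VERB}) = 1 \cdot \tfrac{1}{3} \cdot 1 \cdot \tfrac{1}{3} \cdot 1 \cdot \tfrac{1}{3} = \tfrac{1}{27} \approx 0.037$$

Every competing path either starts with a tag whose start probability is zero or multiplies in at
least one 1e-6 fallback, so this path wins by a wide margin.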
