1.pos Tagging 1

The document discusses Part-Of-Speech (POS) tagging in Natural Language Processing (NLP), detailing its importance for syntactic analysis and word sense disambiguation. It covers various POS tagsets, ambiguity in tagging, and different approaches to POS tagging, including rule-based and learning-based methods. Additionally, it introduces sequence labeling problems and probabilistic sequence models like Hidden Markov Models (HMM) for handling interdependent classifications.

Uploaded by

msddevprofessional


Natural Language Processing:

Part-Of-Speech Tagging,
Sequence Labeling, and
Hidden Markov Models (HMMs)

•NLP - POS and HMM 1

Part Of Speech Tagging

• Annotate each word in a sentence with a part-of-speech marker.
• Lowest level of syntactic analysis.
John saw the saw and decided to take it to the table.
NNP VBD DT NN CC VBD TO VB PRP IN DT NN
• Useful for subsequent syntactic parsing and word sense disambiguation.
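A tagged sentence like the one above is often stored as a list of (word, tag) pairs. The following is a minimal sketch of that representation; the helper name `pair_tags` is hypothetical, and the final period is dropped so that plain whitespace splitting is enough.

```python
# Minimal sketch: represent the tagged example sentence as (word, tag) pairs.
# pair_tags is an illustrative helper, not part of any library.

def pair_tags(words, tags):
    """Zip tokens with their POS tags, checking that the lengths match."""
    if len(words) != len(tags):
        raise ValueError("each token needs exactly one tag")
    return list(zip(words, tags))

# Assumption: whitespace tokenization, sentence-final period omitted.
words = "John saw the saw and decided to take it to the table".split()
tags = "NNP VBD DT NN CC VBD TO VB PRP IN DT NN".split()

tagged = pair_tags(words, tags)

# The same surface form "saw" receives two different tags in one sentence.
saw_tags = [tag for word, tag in tagged if word == "saw"]
```

Here `saw_tags` collects the tags assigned to the two occurrences of "saw", which is the ambiguity a tagger has to resolve from context.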


English POS Tagsets

• The original Brown corpus used a large set of 87 POS tags.
• Most common in NLP today is the Penn Treebank set of 45 tags.
– Tagset used in these slides.
– Reduced from the Brown set for use in the context of a parsed corpus (i.e., treebank).
• The C5 tagset used for the British National Corpus (BNC) has 61 tags.

English Parts of Speech
• Noun (person, place or thing)
– Singular (NN): dog, fork
– Plural (NNS): dogs, forks
– Proper (NNP, NNPS): John, Springfields
– Personal pronoun (PRP): I, you, he, she, it
– Wh-pronoun (WP): who, what
• Verb (actions and processes)
– Base, infinitive (VB): eat
– Past tense (VBD): ate
– Gerund (VBG): eating
– Past participle (VBN): eaten
– Non-3rd person singular present tense (VBP): eat
– 3rd person singular present tense (VBZ): eats
– Modal (MD): should, can
– To (TO): to (to eat)

English Parts of Speech (cont.)
• Adjective (modify nouns)
– Basic (JJ): red, tall
– Comparative (JJR): redder, taller
– Superlative (JJS): reddest, tallest
• Adverb (modify verbs)
– Basic (RB): quickly
– Comparative (RBR): quicker
– Superlative (RBS): quickest
• Preposition (IN): on, in, by, to, with
• Determiner:
– Basic (DT): a, an, the
– WH-determiner (WDT): which, that
• Coordinating Conjunction (CC): and, but, or
• Particle (RP): off (took off), up (put up)


Closed vs. Open Class

• Closed class categories are composed of a small, fixed set of grammatical function words for a given language.
– Pronouns, Prepositions, Modals, Determiners, Particles, Conjunctions
• Open class categories have a large number of words, and new ones are easily invented.
– Nouns (Googler, textlish), Verbs (Google), Adjectives (geeky), Adverbs (automagically)


Ambiguity in POS Tagging

• “Like” can be a verb or a preposition.
– I like/VBP candy.
– Time flies like/IN an arrow.
• “Around” can be a preposition, particle, or adverb.
– I bought it at the shop around/IN the corner.
– I never got around/RP to getting a car.
– A new Prius costs around/RB $25K.


POS Tagging Process
• Usually assumes a separate initial tokenization process that separates and/or disambiguates punctuation, including detecting sentence boundaries.
• Degree of ambiguity in English (based on the Brown corpus):
– 11.5% of word types are ambiguous.
– 40% of word tokens are ambiguous.
• Average POS tagging disagreement among expert human judges for the Penn Treebank was 3.5%.
– Based on correcting the output of an initial automated tagger, which was deemed to be more accurate than tagging from scratch.
• Baseline: picking the most frequent tag for each specific word type gives about 90% accuracy.
– 93.7% for the Penn Treebank tagset if a model for unknown words is used.
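The ambiguity figures and the most-frequent-tag baseline above can be illustrated on a toy corpus. This is only a sketch under stated assumptions: the three training sentences are hypothetical (two are adapted from the examples in these slides, with tags other than like/VBP and like/IN filled in by assumption), and unknown words simply default to NN rather than using a real unknown-word model.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data; far too small for real statistics).
train = [
    [("I", "PRP"), ("like", "VBP"), ("candy", "NN")],
    [("Time", "NN"), ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN")],
    [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("table", "NN")],
]

# Count how often each word type occurs with each tag.
tag_counts = defaultdict(Counter)
for sentence in train:
    for word, tag in sentence:
        tag_counts[word][tag] += 1

# A word type is ambiguous if it was seen with more than one tag.
ambiguous_types = {w for w, c in tag_counts.items() if len(c) > 1}
n_tokens = sum(sum(c.values()) for c in tag_counts.values())
n_ambiguous_tokens = sum(sum(tag_counts[w].values()) for w in ambiguous_types)
type_ambiguity = len(ambiguous_types) / len(tag_counts)
token_ambiguity = n_ambiguous_tokens / n_tokens

# Baseline tagger: always predict the most frequent tag of the word type.
most_frequent = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

def baseline_tag(words, default="NN"):
    """Tag each word with its most frequent training tag; unknown words get `default`."""
    return [most_frequent.get(w, default) for w in words]

predicted = baseline_tag(["Time", "flies", "like", "an", "arrow"])
```

On this toy data only "like" is ambiguous, yet it accounts for a much larger share of tokens than of types, which mirrors the gap between the type and token figures above. The baseline tags "like" as VBP in "Time flies like an arrow" because VBP is its most frequent tag, which is exactly the kind of error that context-sensitive models aim to fix.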


POS Tagging Approaches
• Rule-Based: Human-crafted rules based on lexical and other linguistic knowledge.
• Learning-Based: Trained on human-annotated corpora like the Penn Treebank.
– Statistical models: Hidden Markov Model (HMM), Maximum Entropy Markov Model (MEMM), Conditional Random Field (CRF)
– Rule learning: Transformation-Based Learning (TBL)
– Neural networks: Recurrent networks like Long Short-Term Memory networks (LSTMs)
• Generally, learning-based approaches have been found to be more effective overall, taking into account the total amount of human expertise and effort involved.

Classification Learning
• Typical machine learning addresses the problem
of classifying a feature-vector description into a
fixed number of classes.
• There are many standard learning methods for this
task:
– Decision Trees and Rule Learning
– Naïve Bayes and Bayesian Networks
– Logistic Regression / Maximum Entropy (MaxEnt)
– Perceptron and Neural Networks
– Support Vector Machines (SVMs)
– Nearest-Neighbor / Instance-Based


Beyond Classification Learning
• Standard classification problem assumes
individual cases are disconnected and independent
(i.i.d.: independently and identically distributed).
• Many NLP problems do not satisfy this
assumption and involve making many connected
decisions, each resolving a different ambiguity,
but which are mutually dependent.
• More sophisticated learning and inference
techniques are needed to handle such situations in
general.


Sequence Labeling Problem
• Many NLP problems can be viewed as sequence labeling.
• Each token in a sequence is assigned a label.
• Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors (not i.i.d.).

foo bar blam zonk zonk bar blam


Information Extraction
• Identify phrases in language that refer to specific types of entities and relations in text.
• Named entity recognition is the task of identifying names of people, places, organizations, etc. in text.
Entity types: people, organizations, places
– Michael Dell is the CEO of Dell Computer Corporation and lives in Austin Texas.
• Extract pieces of information relevant to a specific application, e.g., used car ads:
Fields: make, model, year, mileage, price
– For sale, 2002 Toyota Prius, 20,000 mi, $15K or best offer. Available starting July 30, 2006.
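One common way to cast named entity recognition as sequence labeling is BIO encoding, which gives every token exactly one label. The slides do not describe this scheme, so the sketch below is an assumed illustration: the function `bio_encode`, the type names PER, ORG and LOC, and the hand-annotated spans are all hypothetical.

```python
# Sketch of BIO encoding (an assumed scheme, not taken from the slides):
# B-X marks the first token of an entity of type X, I-X a continuation,
# and O marks tokens outside any entity.

def bio_encode(tokens, spans):
    """spans: list of (start, end, type) token offsets, with end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end, etype in spans:
        labels[start] = "B-" + etype
        for i in range(start + 1, end):
            labels[i] = "I-" + etype
    return labels

tokens = ("Michael Dell is the CEO of Dell Computer Corporation "
          "and lives in Austin Texas").split()

# Hand-annotated spans for this example only.
spans = [(0, 2, "PER"), (6, 9, "ORG"), (12, 14, "LOC")]
labels = bio_encode(tokens, spans)
```

Note that the token "Dell" is labeled I-PER in one position and B-ORG in another, so the correct label depends on the neighboring tokens and labels rather than on the word alone.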


Semantic Role Labeling
• For each clause, determine the semantic role played by each noun phrase that is an argument to the verb.
Roles: agent, patient, source, destination, instrument
– John drove Mary from Austin to Dallas in his Toyota Prius.
– The hammer broke the window.
• Also referred to as “case role analysis,” “thematic analysis,” and “shallow semantic parsing.”

Bioinformatics

• Sequence labeling is also valuable for labeling genetic sequences in genome analysis.
Labels: exon, intron
– AGCTAACGTTCGATACGGATTACAGCCT


Problems with Sequence Labeling as Classification
• Not easy to integrate information from the categories of tokens on both sides.
• Difficult to propagate uncertainty between decisions and “collectively” determine the most likely joint assignment of categories to all of the tokens in a sequence.


Probabilistic Sequence Models

• Probabilistic sequence models allow integrating uncertainty over multiple, interdependent classifications and collectively determining the most likely global assignment.
• Two standard models:
– Hidden Markov Model (HMM)
– Conditional Random Field (CRF)
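The "most likely global assignment" can be written compactly. The notation here is an assumption rather than something taken from the slides: $w_1, \dots, w_n$ are the observed tokens and $t_1, \dots, t_n$ their labels.

```latex
% Joint (global) decision made by a probabilistic sequence model:
\hat{t}_1, \dots, \hat{t}_n
  = \arg\max_{t_1, \dots, t_n} P(t_1, \dots, t_n \mid w_1, \dots, w_n)

% Independent per-token classification, for contrast:
\hat{t}_i = \arg\max_{t_i} P(t_i \mid w_1, \dots, w_n)
  \quad \text{for each } i \text{ separately}
```

The first form scores whole label sequences, so uncertainty about one label can influence the choice of its neighbors; the second commits to each label on its own.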


Markov Model / Markov Chain

• A finite state machine with probabilistic state transitions.
• Makes the Markov assumption that the next state depends only on the current state and is independent of previous history.
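Stated as a formula, with $q_t$ denoting the state at step $t$ (notation assumed here, not from the slides), the Markov assumption and its consequence for the probability of a whole state sequence are:

```latex
% Markov assumption: the next state depends only on the current state.
P(q_{t+1} \mid q_1, \dots, q_t) = P(q_{t+1} \mid q_t)

% Combined with the chain rule, a sequence probability factors into
% an initial probability and one transition probability per step.
P(q_1, \dots, q_T) = P(q_1) \prod_{t=1}^{T-1} P(q_{t+1} \mid q_t)
```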


Sample Markov Model for POS

[Figure: state-transition diagram with the states start, Det, Noun, PropNoun, Verb, and stop; each edge is labeled with a transition probability.]
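A Markov model over POS states can be stored as a table of transition probabilities, and the probability of a tag sequence is then the product of the transitions along its path. The sketch below uses the same state names as this slide, but the probability values are illustrative assumptions chosen so that each row sums to 1; they should not be read as the values of the original diagram. The function name `sequence_probability` is also hypothetical.

```python
# Transition probabilities P(next | current).
# ASSUMPTION: illustrative values only, not recovered from the slide's figure.
transitions = {
    "start":    {"Det": 0.5, "PropNoun": 0.4, "Verb": 0.1},
    "Det":      {"Noun": 0.95, "PropNoun": 0.05},
    "Noun":     {"Verb": 0.8, "Noun": 0.1, "stop": 0.1},
    "PropNoun": {"Verb": 0.8, "Noun": 0.1, "stop": 0.1},
    "Verb":     {"Det": 0.25, "PropNoun": 0.25, "Noun": 0.1, "stop": 0.4},
}

def sequence_probability(states):
    """Probability of visiting `states` in order, going from start to stop."""
    path = ["start", *states, "stop"]
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        # A transition that is absent from the table has probability 0.
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob

# start -> PropNoun -> Verb -> Det -> Noun -> stop
# multiplies 0.4 * 0.8 * 0.25 * 0.95 * 0.1 under the assumed table.
p = sequence_probability(["PropNoun", "Verb", "Det", "Noun"])
```

Because of the Markov assumption, each factor looks only at the current state and the next one, so the whole computation is a single pass over adjacent pairs in the path.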
Refer to the POS and basic HMM material and proceed with this example.
