Lecture15 Parsing

The document provides an overview of parsing in Natural Language Processing, detailing top-down and bottom-up parsing methods, their advantages and disadvantages, and specific algorithms like the Earley and CYK parsers. It also discusses probabilistic parsing and its benefits, particularly in the context of Indian languages and their unique grammatical structures. References to foundational works in the field are included, highlighting the complexity of parsing in diverse linguistic contexts.

|| Jai Sri Gurudev||

Sri Adichunchanagiri Shikshana Trust (R)

SJB INSTITUTE OF TECHNOLOGY


(Affiliated to Visvesvaraya Technological University, Belagavi & Approved by AICTE, New Delhi)
No. 67, BGS Health & Education City, Dr. Vishnuvardhan Road Kengeri, Bengaluru – 560 060

Subject: Natural Language Processing (18CS743)


By
CHETAN R, Assistant Professor
Semester / Section: 7A and B

Department of Information Science & Engineering


Academic Year: Odd Semester, 2021-22
PARSING

Overview
• The task that uses the rewrite rules of a grammar either to generate a
particular sequence of words or to reconstruct its derivation is termed
parsing.
• The following constraints guide the search process:
1. Input: the words in the input sentence. A valid parse is one that covers
all the words in the sentence; these words must constitute the leaves of
the final parse tree.
2. Grammar: the root of the final parse tree must be the start symbol
of the grammar.
Top-down parsing
Top-down search space
Bottom-up parsing
Pros and Cons
• The top-down parser starts generating trees with the start symbol of the
grammar, so it never wastes time exploring a tree leading to a different
root. However, it wastes time exploring S-rooted trees that eventually
result in words inconsistent with the input.
• The bottom-up parser never explores a tree that does not match the
input. However, it wastes time generating trees that have no chance of
leading to an S-rooted tree.
Basic Top-Down Parser
Derivation using top-down, depth first algorithm
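The derivation slide is an image that did not survive extraction. As a complement, here is a minimal recursive-descent recognizer illustrating top-down, depth-first search. The toy grammar, lexicon, and sentences are assumptions for illustration only, not content from the slides.

```python
# A minimal recursive-descent (top-down, depth-first) recognizer.
# The grammar and lexicon below are illustrative assumptions.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "Noun"], ["Pronoun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "the": "Det", "a": "Det", "dog": "Noun", "bone": "Noun",
    "she": "Pronoun", "ate": "Verb",
}

def parse(symbol, words, pos):
    """Try to expand `symbol` starting at words[pos].
    Returns the position after the matched span, or None on failure."""
    if symbol in GRAMMAR:                     # non-terminal: try each rule
        for rule in GRAMMAR[symbol]:
            p = pos
            for child in rule:
                p = parse(child, words, p)
                if p is None:
                    break
            else:
                return p                      # every child matched
        return None
    # pre-terminal: match one input word whose category is `symbol`
    if pos < len(words) and LEXICON.get(words[pos]) == symbol:
        return pos + 1
    return None

def recognize(sentence):
    words = sentence.split()
    return parse("S", words, 0) == len(words)

print(recognize("the dog ate a bone"))  # True
print(recognize("ate the dog"))         # False
```

For simplicity this sketch commits to the first expansion of each sub-constituent that matches; a full top-down parser would also backtrack over alternative split points, which is exactly the source of the wasted work described in the Pros and Cons above.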
Disadvantages
1. Left recursion: causes the search to get stuck in an infinite loop.
2. Structural ambiguity: occurs when the grammar assigns more than one parse
to a sentence.
3. Attachment ambiguity: occurs when a constituent fits in more than one
position in the parse tree.
4. Coordination ambiguity: occurs when it is not clear which phrases are being
combined with a conjunction like 'and'.
5. Local ambiguity: occurs when a part of a sentence is ambiguous even though
the sentence as a whole is not.
6. Repeated parsing: the parser often has to rebuild valid trees for portions of
the input that it discarded during backtracking.
EARLEY PARSER
• It implements an efficient parallel top-down search using
dynamic programming.
• It builds a table of sub-trees for each of the constituents in the
input.
• The most important component of this algorithm is the Earley
chart, which has n+1 entries, where n is the number of words in
the input.
• The algorithm makes a left-to-right scan of the input to fill the
entries in this chart.
State Information
1. A sub-tree corresponding to a grammar rule.
2. Information about the progress made in completing the
sub-tree.
3. The position of the sub-tree with respect to the input.
A state is represented as a dotted rule together with a pair of numbers
giving the starting position and the position of the dot:
A → X1 … • C … Xm, [i, j]
Algorithm
Predictor
• Generates new states representing potential expansions of the
non-terminal in the left-most derivation.
• It is applied to every state that has a non-terminal to the right of
the dot, when the category of that non-terminal is not a
part-of-speech category.
• If A → X1 … • C … Xm, [i, j], then for every rule of the form
C → α, the operation adds to chart[j] the state:
C → • α, [j, j]
Example
• When the generating state is S → • NP VP, [0, 0], the predictor
adds the following states to chart[0]:
NP → • Det Nominal, [0, 0]
NP → • Noun, [0, 0]
NP → • Pronoun, [0, 0]
NP → • Det Noun PP, [0, 0]
Scanner
• The scanner is used when a state has a part-of-speech category to
the right of the dot.
• It examines the input to see whether the part-of-speech category to
the right of the dot matches one of the parts-of-speech
associated with the current input word.
• If yes, it creates a new state using that rule.
• If the state is A → … • a …, [i, j] and 'a' is a part-of-speech
associated with the word wj, then it adds a → wj •, [j, j+1]
to chart[j+1].
Completer
• The completer is used when the dot reaches the right end of a
rule.
• The presence of such a state signifies successful completion of
the parse of some grammatical category.
• If A → … •, [j, k], then the completer adds
B → … A • …, [i, k] to chart[k] for all states
B → … • A …, [i, j] in chart[j].
CYK Parser
• Cocke-Younger-Kasami (CYK) is a dynamic programming parsing
algorithm.
• It follows a bottom-up approach: it builds the parse tree
incrementally, and each entry in the table is based on previous
entries.
• The CYK algorithm assumes the grammar to be in Chomsky Normal
Form (CNF). A CFG is in CNF if all its rules are of only two forms:
A → B C
A → w, where w is a word.
CYK Algorithm
Example
Sentence: “The girl wrote an essay”
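The worked table for this slide was an image that did not survive extraction. As a sketch, here is a minimal CYK recognizer run on the example sentence; the small CNF grammar below is an assumption invented to cover just this sentence.

```python
# A minimal CYK recognizer for a grammar in Chomsky Normal Form.
UNARY = {               # A -> w  (terminal rules)
    "the": {"Det"}, "an": {"Det"}, "girl": {"N"}, "essay": {"N"},
    "wrote": {"V"},
}
BINARY = {              # A -> B C (binary rules), keyed by (B, C)
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

def cyk(words):
    n = len(words)
    # table[i][j] holds the non-terminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(UNARY.get(w, ()))
    for span in range(2, n + 1):            # span length, shortest first
        for i in range(n - span + 1):       # span start
            j = i + span
            for k in range(i + 1, j):       # split point
                for B in table[i][k]:
                    for C in table[k][j]:
                        table[i][j] |= BINARY.get((B, C), set())
    return "S" in table[0][n]

print(cyk("the girl wrote an essay".split()))  # True
```

Filling shorter spans before longer ones guarantees that every entry the algorithm reads has already been computed, which is the dynamic programming property mentioned above.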
Probabilistic Parsing
• A statistical parser works by assigning probabilities to the possible
parses of a sentence and returning the most likely parse as the final one.
• More formally, given a grammar G, a sentence s, and the set of possible
parse trees of s, which we denote by τ(s), a probabilistic parser finds
the most likely parse t̂ of s as follows:
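The formula on this slide was an image that did not survive extraction; the standard form of this criterion, stated here as an assumption rather than the slide's own rendering, is:

```latex
\hat{t} \;=\; \operatorname*{argmax}_{t \in \tau(s)} P(t \mid s)
        \;=\; \operatorname*{argmax}_{t \in \tau(s)} P(t)
```

The second equality holds because every tree in τ(s) yields s, so P(t | s) is proportional to P(t).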
Advantages
1. A probabilistic parser offers a principled way of resolving ambiguity:
among competing parses, it returns the most likely one.
2. The search becomes more efficient, since low-probability sub-parses
can be pruned.
Probabilistic Context Free Grammar
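This slide's content was an image that was lost in extraction. As background, and stated as an assumption rather than the slide's own formulation: a PCFG attaches a probability to each rule such that the probabilities of all rules with the same left-hand side sum to one, and the probability of a parse tree is the product of the probabilities of the rules used in it:

```latex
\sum_{\beta} P(A \rightarrow \beta) = 1,
\qquad
P(t) = \prod_{(A \rightarrow \beta) \,\in\, t} P(A \rightarrow \beta)
```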
Example
Probability Estimation
Estimating Rule Probability
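The estimation formula on this slide was an image that was lost in extraction. The standard maximum-likelihood estimate of a rule probability from a treebank, given here as background rather than as the slide's own formula, is:

```latex
P(A \rightarrow \beta \mid A)
  \;=\; \frac{\mathrm{Count}(A \rightarrow \beta)}{\sum_{\gamma} \mathrm{Count}(A \rightarrow \gamma)}
  \;=\; \frac{\mathrm{Count}(A \rightarrow \beta)}{\mathrm{Count}(A)}
```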
Two Parse Trees
Parsing PCFGs
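The content of this slide was an image that was lost in extraction. As a sketch of one standard approach: the CYK algorithm extends naturally to PCFGs by storing, for each non-terminal and span, the probability of its best sub-parse (a Viterbi score). The grammar and its probabilities below are invented for illustration only.

```python
# A probabilistic CYK (Viterbi) sketch: each cell maps a non-terminal
# to the probability of its best sub-parse over that span.
UNARY = {"the": {"Det": 1.0}, "girl": {"N": 1.0},
         "wrote": {"V": 1.0}, "an": {"Det": 1.0}, "essay": {"N": 1.0}}
BINARY = {("Det", "N"): {"NP": 0.7}, ("V", "NP"): {"VP": 0.9},
          ("NP", "VP"): {"S": 1.0}}

def viterbi_cyk(words):
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        best[i][i + 1] = dict(UNARY.get(w, {}))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, pb in best[i][k].items():
                    for C, pc in best[k][j].items():
                        for A, pr in BINARY.get((B, C), {}).items():
                            p = pr * pb * pc      # rule prob x children
                            if p > best[i][j].get(A, 0.0):
                                best[i][j][A] = p
    return best[0][n].get("S", 0.0)

# best probability for S: 1.0 * 0.7 * (0.9 * 0.7) = 0.441
print(viterbi_cyk("the girl wrote an essay".split()))
```

Keeping only the best score per non-terminal and span is what makes the search efficient: lower-probability analyses of the same span are pruned as soon as they are dominated.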
Indian Languages
• The majority of Indian languages are free-word-order languages:
the order of words in a sentence can be changed without making the
sentence grammatically incorrect.
Contd..
• Extensive and productive use of complex predicates (CPs) is another property
that most Indian languages have in common.
• A complex predicate combines a noun, verb, or adjective with a light verb
to produce a new verb.
Parsing Indian Languages
• Bharti and Sangal described an approach to parsing Indian
languages based on the Paninian grammar formalism. It has two stages:
1. The first stage identifies word groups.
2. The second stage assigns a parse structure to the input sentence.
Karaka Chart
Constraint Graph
Constraints
Parse of the sentence
References
1. Bharti, Akshar and Rajeev Sangal, 1990, 'A Karaka-based approach to
parsing of Indian languages', Proceedings of the 13th Conference on
Computational Linguistics, Association for Computational Linguistics,
Vol. 3.
2. Chomsky, N., 1957, Syntactic Structures, Mouton, The Hague.