
Unit 2

Syntax Analysis
Unit 2 – Syntax Analysis
Context Free Grammars, Grammar Rules for English,
Top-Down Parsing, Bottom-Up Parsing, Ambiguity,
CYK Parsing, Dependency Parsing, Earley Parsing,
Probabilistic Context-Free Grammars
Context Free Grammars
Definition 1:
A context-free grammar consists of a set of rules or productions, each of
which expresses the ways that symbols of the language can be grouped and
ordered together, and a lexicon of words and symbols.

Definition 2:
A context-free grammar (CFG) is a list of rules that define the set of all well-
formed sentences in a language. Each rule has a left-hand side, which
identifies a syntactic category, and a right-hand side, which defines its
alternative component parts, reading from left to right.
Context Free Grammars
A context-free grammar G is a 4-tuple:

G = (V, T, S, P)

These parameters are as follows:
•V – Set of variables (also called non-terminal symbols)
•T – Set of terminal symbols (the lexicon)
•The symbols that refer to words in a language are called terminal symbols.
•The lexicon is the set of rules that introduce these symbols.
•S – Designated start symbol (one of the non-terminals, S ∈ V)
•P – Set of productions (also called rules).

Each rule in P is of the form A → s, where

•A is a non-terminal (variable) symbol.
•Each rule has exactly one non-terminal symbol on its left-hand side.
•s is a sequence of terminals and non-terminals, drawn from (T ∪ V)*, the infinite set of such strings.
•A grammar G generates a language L.
Context Free Grammars
•A leftmost derivation is a sequence of strings s1, s2, …, sn, where
• s1 = S, the start symbol
• sn includes only terminal symbols
• each step rewrites the leftmost non-terminal of the current string

Example

S
⇒ NP VP
⇒ DT N VP
⇒ the N VP
⇒ the child VP
…
⇒ the child ate the cake with the fork

Context Free Grammars
• ["the", "child", "ate", "the", "cake", "with", "the", "fork"]

• S -> NP VP

• NP -> DT N | NP PP

• PP -> PRP NP

• VP -> V NP | VP PP

• DT -> 'a' | 'the’

• N -> 'child' | 'cake' | 'fork’

• PRP -> 'with' | 'to’

• V -> 'saw' | 'ate'


CFG Rules & Leftmost Derivation
[Figure: CFG rules and the leftmost derivation tree for "the child ate the cake with the fork"]
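As a minimal sketch (not from the slides; it assumes the NLTK library is installed), the grammar above can be entered into NLTK and all of its parses enumerated. Because of the NP -> NP PP and VP -> VP PP rules, this sentence receives two parse trees: the fork as the instrument of eating, or the cake that comes with a fork.

import nltk

# The slide grammar, entered in NLTK's CFG notation
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT N | NP PP
PP -> PRP NP
VP -> V NP | VP PP
DT -> 'a' | 'the'
N -> 'child' | 'cake' | 'fork'
PRP -> 'with' | 'to'
V -> 'saw' | 'ate'
""")

# A chart parser enumerates every derivation licensed by the grammar
parser = nltk.ChartParser(grammar)
sentence = "the child ate the cake with the fork".split()
for tree in parser.parse(sentence):
    tree.pretty_print()   # prints both attachment readings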
Grammar Rules for English
•Sentence structure: A sentence must have a subject and a predicate.
•Parts of speech: Words are categorized into nouns, verbs, adjectives, adverbs, pronouns,
prepositions, conjunctions, and interjections.
•Agreement: Subject and verb must agree in number.
•Tense consistency: Verbs must be used consistently in the same tense.
•Pronoun agreement: Pronouns must agree with the antecedent in gender and number.
•Modifiers: Adjectives and adverbs must be placed correctly in sentences.
•Parallelism: Ideas in a list or comparison must be expressed in parallel form.
•Capitalization: The first word of a sentence and proper nouns must be capitalized.
•Punctuation: Proper use of punctuation is crucial for clear and concise writing.
•Spelling: Correct spelling is important for clarity and credibility.
•Word choice: Precise word choice is essential for effective communication.
•Tone: The tone of a piece of writing should match the purpose and audience.
Parsing
Parsing is the technique of examining a string of tokens to determine its grammatical structure according to a given grammar.

Parsing strategies are broadly classified into two types: top-down parsing and bottom-up parsing.
Parsing
The parser receives a string of tokens from the lexical analyser.

When the string of tokens can be generated by the grammar of the source language, the parse tree can be constructed.

Otherwise, it reports the syntax errors present in the source text.

After this, the generated parse tree is transferred to the next stage of the compiler.
Top-Down Parsing
•Top-down parsing is a parsing strategy used in Natural Language Processing (NLP) to analyze sentence structure based on a predefined grammar (such as a context-free grammar, CFG).

•It starts from the root (start symbol) and expands downward by applying grammar rules, attempting to match the input sentence.
How Top-Down Parsing works?
Start with the Start Symbol

• The parser begins with the root of the grammar (e.g., S for sentence).

Expand Using Grammar Rules

• Apply production rules (from a grammar) to break the sentence into smaller components.

Match Input Sentence

• Compare expanded rules with the actual input string.

Backtracking (if needed)

• If a rule fails, the parser backtracks and tries an alternative rule.


Top-Down Parsing
Grammar:
S → NP VP
NP → Det N | N
VP → V NP
Det → 'the' | 'a'
N → 'cat' | 'dog'
V → 'chased' | 'saw'

Sentence to Parse: "The cat chased a dog"

Parsing Process (Top-Down):
1. Start with S
2. Expand S → NP VP
3. Expand NP → Det N → "The cat"
4. Expand VP → V NP → "chased a dog"
The sentence matches the grammar!
Top-down Parsing
import nltk
from nltk import CFG

# Define the grammar
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased' | 'saw'
""")

# Create a recursive descent parser (top-down)
parser = nltk.RecursiveDescentParser(grammar)

# Sentence to parse
sentence = "the cat chased a dog".split()

# Parsing the sentence
for tree in parser.parse(sentence):
    print(tree)           # Print the parse tree
    tree.pretty_print()   # Display tree structure
Top-down Parsing
Advantages
✔ Easy to implement using recursive functions.
✔ Uses grammar rules directly for parsing.
✔ Good for small grammars in NLP.
Bottom-up Parsing
Bottom-up parsing is a parsing strategy used in Natural Language Processing (NLP)
where the parser starts from the input words (tokens) and builds up towards the
start symbol (S) using grammar rules.
How Bottom-up Parsing works?
Start from the Input Words
•Begin with individual words (tokens) in the sentence.
Apply Grammar Rules in Reverse
•Try to form higher-level constructs (phrases, clauses) by combining tokens.
Reduce to the Start Symbol (S)
•Continue merging rules until the full sentence is recognized as S.
Shift-Reduce Mechanism (Used in Bottom-Up Parsers)
•Shift: Read the next input token.
•Reduce: Apply a grammar rule if possible.
•Repeat until the full sentence is parsed.
Bottom-up Parsing
Grammar:
S → NP VP
NP → Det N | N
VP → V NP
Det → 'the' | 'a'
N → 'cat' | 'dog'
V → 'chased' | 'saw'

Sentence to Parse: "The cat chased a dog"

Parsing Process (Bottom-Up):
1. Start with words: [the, cat, chased, a, dog]
2. Apply Det → the, N → cat → Reduce to NP
3. Apply V → chased
4. Apply Det → a, N → dog → Reduce to NP
5. Apply VP → V NP
6. Apply S → NP VP
✅ Successfully parsed!
Bottom-up Parsing
import nltk
from nltk import CFG

# Define the grammar
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased' | 'saw'
""")

# Create a shift-reduce parser (bottom-up)
parser = nltk.ShiftReduceParser(grammar)

# Sentence to parse
sentence = "the cat chased a dog".split()

# Parsing the sentence
for tree in parser.parse(sentence):
    print(tree)           # Print the parse tree
    tree.pretty_print()   # Display tree structure
Bottom-up Parsing
Advantages of Bottom-Up Parsing
✔ Efficient for large grammars since it avoids unnecessary expansions.
✔ Handles left-recursive grammars naturally.
✔ No need to predict next rules like in top-down parsing.
Top Down Approaches
Method                     | Backtracking? | Handles Left-Recursion? | Efficiency
Recursive Descent Parsing  | ✅ Yes        | ❌ No                   | ❌ Slow (backtracking)
Backtracking Parser        | ✅ Yes        | ❌ No                   | ❌ Very slow
LL(1) Predictive Parsing   | ❌ No         | ❌ No                   | ✅ Fast (table-based)
Earley Parsing             | ❌ No         | ✅ Yes                  | ✅ Fast (dynamic programming)
Memoized Recursive Descent | ❌ No         | ✅ Yes                  | ✅ Faster than naive recursion
Bottom up Approaches
Method       | Lookahead?           | Handles Ambiguity?         | Efficiency
Shift-Reduce | ❌ No                | ❌ No                      | ⚡ Simple but can fail
LR(0)        | ❌ No                | ❌ No                      | ⚡ Basic
SLR(1)       | ✅ Yes (1)           | ❌ No                      | ✅ Better than LR(0)
LR(1)        | ✅ Yes (1)           | ✅ Yes                     | ✅ Strongest (large table)
LALR(1)      | ✅ Yes (1)           | ✅ Yes (efficient)         | ✅ Used in compilers
CYK          | ✅ Yes (table-based) | ❌ No (CNF only)           | ✅ NLP, dynamic programming
GLR          | ✅ Yes (multi-path)  | ✅ Yes (handles ambiguity) | ✅ NLP, complex parsing
Top-down vs Bottom-up
Basis for Comparison | Top-Down Parsing                                             | Bottom-Up Parsing
Initiates from       | Root                                                         | Leaves
Working              | Uses productions to derive the string and checks for a match | Starts from the tokens and reduces toward the start symbol
Uses                 | Backtracking (sometimes)                                     | Handle pruning (shift-reduce)
Strength             | Moderate                                                     | More powerful
Producing a parser   | Simple                                                       | Hard
Type of derivation   | Leftmost derivation                                          | Rightmost derivation (in reverse)
CYK Parsing
The Cocke-Younger-Kasami (CYK) algorithm is a bottom-up parsing
technique used to determine whether a given string belongs to the language of a
context-free grammar (CFG) and, if so, construct a parse tree.

Features

✔ Efficient for Context-Free Grammars (CFGs)


✔ Uses Dynamic Programming (DP) to parse a string in O(n³) time
✔ Requires the grammar to be in Chomsky Normal Form (CNF)
CYK Parsing
Before using CYK, the grammar must be in CNF, meaning:
•Every production rule is either
  A → BC (two non-terminals), or
  A → a (a single terminal).
•No empty (ε) productions or unit productions (A → B) are allowed.
(A conversion sketch follows below.)
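As an illustrative sketch of the conversion (the new symbols X1 and V1 below are arbitrary names introduced for the example, not part of any grammar in these slides):
•A long rule such as VP → V NP PP is binarized by introducing a new non-terminal: VP → V X1 and X1 → NP PP.
•A unit production such as NP → N is removed by copying N's expansions upward: NP → 'cat' | 'dog'.
•A terminal inside a long rule, e.g. VP → 'gave' NP NP, gets a new pre-terminal: V1 → 'gave', then VP → V1 NP NP, which is binarized as above.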
CYK Parsing
S → NP VP
NP → Det N | N
VP → V NP
Det → 'the' | 'a'
N → 'cat' | 'dog'
V → 'chased' | 'saw'

(Note: NP → N is a unit production, so this grammar would first need the CNF conversion sketched above before strict CYK can be applied.)
CYK Parsing Algorithm
1) Create a 2D table (parse table)
•Each cell stores the set of non-terminals that can derive one substring of the input (indexed by span start and span length).
2) Fill the table using dynamic programming
•Assign terminal (lexical) rules in the bottom row (spans of length 1).
•Combine non-terminals of adjacent spans using the binary grammar rules.
3) Check whether the start symbol (S) appears in the top cell
•If S is found in the cell covering the whole sentence, the sentence is valid.
(A minimal recognizer along these lines is sketched below.)
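To make the table-filling concrete, here is a minimal CYK recognizer sketch in plain Python (illustrative code, not from the slides; the grammar is hand-encoded in CNF, with the NP → N unit production already inlined so that NP also rewrites directly to the nouns):

# Lexical rules A -> word, and binary rules A -> B C
lexical = {
    "the": {"Det"}, "a": {"Det"},
    "cat": {"N", "NP"}, "dog": {"N", "NP"},
    "chased": {"V"}, "saw": {"V"},
}
binary = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
]

def cyk_recognize(words):
    n = len(words)
    # table[i][j] holds the non-terminals deriving words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length - 1              # span end
            for k in range(i, j):           # split point
                for a, b, c in binary:
                    if b in table[i][k] and c in table[k + 1][j]:
                        table[i][j].add(a)
    return "S" in table[0][n - 1]

print(cyk_recognize("the cat chased a dog".split()))   # True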
CYK Parsing
import nltk
from nltk.parse.chart import ChartParser

# Define a CFG (to be converted to Chomsky Normal Form for CYK)
cfg = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased' | 'saw'
""")

# Convert to CNF (for CYK parsing)
cnf_grammar = cfg.chomsky_normal_form()

# Create a chart parser (plays the CYK table-filling role in NLTK)
parser = ChartParser(cfg)

# Sentence to parse
sentence = "the cat chased a dog".split()

# Generate parse trees
for tree in parser.parse(sentence):
    print(tree)
    tree.pretty_print()
CYK Parsing
Example parse table for "the cat chased a dog" (simplified):

Word    the   cat   chased   a     dog
Row 1   Det   N     V        Det   N
Row 2   NP    -     VP       NP    -
Row 3   -     S     -        -     -
Dependency Parsing
It aims at breaking a sentence down according to the relationships between the
words rather than any predefined phrase-structure ruleset.

For example, the sentence "John sees Bill" is parsed with sees as the root, John as its subject, and Bill as its object.
Dependency Parsing
Dependency parsing is a technique used in Natural Language
Processing (NLP) to analyze the grammatical structure of a sentence by
establishing relationships (dependencies) between words.

It helps determine the syntactic structure of a sentence by identifying
the head (governing) words and their dependent words.
Dependency Parsing
Head and Dependent
Every word (except the root) is dependent on another word (its head).
Example: In "She eats an apple", eats is the head, and She, an, and apple depend on it.
Dependency Tree
A tree-like structure that represents word relationships.
Example dependency tree for "She eats an apple":

        eats
       /    \
    She     apple
              |
              an
Dependency Parsing
Dependency Relations
Words have specific grammatical relationships with their heads.
Common relations:
◦ nsubj (Nominal Subject) → "She" depends on "eats"
◦ dobj (Direct Object) → "apple" depends on "eats"
◦ det (Determiner) → "an" depends on "apple"
Dependency Parsing
import spacy

# Load English language model
nlp = spacy.load("en_core_web_sm")

# Sample sentence
sentence = "She eats an apple"

# Process the sentence
doc = nlp(sentence)

# Print dependencies
for token in doc:
    print(f"{token.text} --> {token.dep_} --> {token.head.text}")

Output:
She --> nsubj --> eats
eats --> ROOT --> eats
an --> det --> apple
apple --> dobj --> eats
Applications of Dependency Parsing
✅ Machine Translation – Understanding sentence structure improves
accuracy.
✅ Question Answering Systems – Extracts relationships between words.
✅ Sentiment Analysis – Identifies dependencies for sentiment-related words.
✅ Chatbots & Virtual Assistants – Helps in semantic understanding.
Earley Parsing
Earley Parsing is a chart-based, top-down parsing algorithm used for context-free
grammars (CFGs), including ambiguous and left-recursive grammars.
It was introduced by Jay Earley in 1970 and is efficient for parsing natural language
and programming languages.

Features of Earley Parsing


✔ Handles all CFGs, including left-recursive and ambiguous grammars.
✔ Works efficiently for both recognition and parsing.
✔ Best case: O(n) (deterministic grammars), O(n²) for unambiguous grammars, O(n³) in the worst case.
✔ Uses dynamic programming to store intermediate parsing results.
How Earley Parsing works?
Earley parsing uses a chart-based approach where parsing progresses through three
operations:
1.Prediction – Expands non-terminals using grammar rules.
2.Scanning – Matches input words with grammar rules.
3.Completion – Resolves dependencies and completes parsing.
It maintains a chart (table) where each entry stores parsing states.
Earley Parsing
Given Grammar:
S → NP VP
NP → Det N | N
VP → V NP
Det → 'the' | 'a'
N → 'cat' | 'dog'
V → 'chased' | 'saw'

Sentence to Parse: "The cat chased a dog"
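A minimal sketch of running this grammar through an Earley parser, assuming NLTK's EarleyChartParser class (illustrative, not from the slides):

import nltk
from nltk.parse import EarleyChartParser

# Grammar from the slide
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased' | 'saw'
""")

# trace=1 prints the predict/scan/complete steps as the chart fills
parser = EarleyChartParser(grammar, trace=1)
for tree in parser.parse("the cat chased a dog".split()):
    print(tree)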
Earley vs CYK vs Dependency Parsing

Feature            | Earley Parsing      | CYK Parsing         | Dependency Parsing
Type               | Top-down, dynamic   | Bottom-up, dynamic  | Word-to-word relations
Grammar Support    | All CFGs            | CNF only            | Non-CFGs possible
Handles Ambiguity? | Yes                 | No                  | Yes
Use Case           | General NLP, speech | Formal NLP, parsing | Dependency-based NLP
Probabilistic CFG
A Probabilistic Context-Free Grammar (PCFG) is an extension of a Context-Free
Grammar (CFG) where each production rule is assigned a probability. This
probability represents the likelihood of a rule being used in generating a sentence.

Features of PCFG
✔ Handles Ambiguity: Helps choose the most probable parse among multiple
possible parses.
✔ Improves NLP Applications: Used in speech recognition, machine translation, and
syntax parsing.
✔ Enables Probabilistic Parsing: Rather than just checking grammatical correctness,
PCFG predicts the most likely structure.
PCFG Definition
A PCFG is defined as (N, Σ, R, S, P):
N = Non-terminals (e.g., S, NP, VP)
Σ = Terminals (words like 'the', 'cat', 'chased')
R = Set of production rules
S = Start symbol
P = Probability assigned to each rule
PCFG Grammar
Given: "The cat chased a dog"

Grammar:
S → NP VP (1.0)
NP → Det N (0.6) | N (0.4)
VP → V NP (1.0)
Det → 'the' (0.8) | 'a' (0.2)
N → 'cat' (0.5) | 'dog' (0.5)
V → 'chased' (0.7) | 'saw' (0.3)

Possible Parse Tree:

            S (1.0)
           /       \
     NP (0.6)     VP (1.0)
      /   \        /    \
   Det     N      V     NP (0.6)
    |      |      |      /   \
   the    cat  chased  Det    N
                        |     |
                        a    dog

Note: Each rule's probabilities sum to 1 for a given non-terminal.
PCFG Grammar
Probability of this parse of "The cat chased a dog":

P(S → NP VP) = 1.0
P(NP → Det N) = 0.6
P(Det → the) = 0.8
P(N → cat) = 0.5
P(VP → V NP) = 1.0
P(V → chased) = 0.7
P(NP → Det N) = 0.6
P(Det → a) = 0.2
P(N → dog) = 0.5

Final probability: 1.0 × 0.6 × 0.8 × 0.5 × 1.0 × 0.7 × 0.6 × 0.2 × 0.5 = 0.01008
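A minimal sketch of this computation in code, assuming NLTK's PCFG and ViterbiParser classes (illustrative, not from the slides):

import nltk

# The slide grammar with its rule probabilities in NLTK's PCFG notation
pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.6] | N [0.4]
VP -> V NP [1.0]
Det -> 'the' [0.8] | 'a' [0.2]
N -> 'cat' [0.5] | 'dog' [0.5]
V -> 'chased' [0.7] | 'saw' [0.3]
""")

# The Viterbi parser returns the single most probable parse
parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the cat chased a dog".split()):
    print(tree)
    print(tree.prob())   # 0.01008, matching the hand computation above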
PCFG vs CFG vs Dependency Parsing
Feature            | PCFG                 | Regular CFG | Dependency Parsing
Type               | Probabilistic        | Rule-based  | Word-to-word relations
Handles Ambiguity? | ✅ Yes               | ❌ No       | ✅ Yes
Focus              | Syntax + probability | Syntax only | Dependencies
Used In            | Speech, AI, NLP      | Formal NLP  | Chatbots, AI models
Ambiguity
Ambiguity in parsing occurs when a sentence has multiple valid parse trees or
interpretations. This makes it difficult for a parser to determine the intended
meaning of a sentence.
Ambiguity is a major challenge in Natural Language Processing (NLP), compilers, and
AI-driven text analysis.
Types of Ambiguity
Syntactic Ambiguity
Lexical Ambiguity
Attachment Ambiguity
Coordination Ambiguity
Scope Ambiguity
Syntactic Ambiguity (Structural Ambiguity)
It occurs when a sentence has more than one grammatical structure, leading to
multiple valid parse trees.

Example:
💬 "I saw the man with the telescope."

👉 Possible Interpretations:
I used a telescope to see the man.
The man I saw had a telescope.

[Figure: two parse trees — (1) the PP attached to the VP (instrument reading), (2) the PP attached to the NP (the man had the telescope)]

✅ PCFG (Probabilistic Context-Free Grammar) can help by choosing the more likely
structure; a small sketch follows below.
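The two structures can be produced with an illustrative NLTK grammar written for this sentence (the rules below are assumptions for the demo, not taken from the slides):

import nltk

# Both PP-attachment sites are licensed: VP -> VP PP and NP -> NP PP
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)   # two trees: PP under VP vs. PP under NP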
Lexical Ambiguity (Word Sense Ambiguity)
Occurs when a word has multiple meanings in different contexts.
Example:
💬 "The bank is closed."
👉 Possible Interpretations:
Bank = Financial institution 🏦
Bank = Riverbank 🌊
🔹 Solution:
Word sense disambiguation (WSD) techniques (e.g., BERT, WordNet) can help
determine the most likely meaning based on context.
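For instance, a minimal sketch of looking up the recorded senses of "bank" with NLTK's WordNet interface (assumes the wordnet corpus has been downloaded):

from nltk.corpus import wordnet as wn   # run nltk.download('wordnet') once first

# List every sense WordNet records for "bank"
for synset in wn.synsets('bank'):
    print(synset.name(), '-', synset.definition())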
Attachment Ambiguity
Occurs when it's unclear which phrase a modifier (prepositional phrase or relative
clause) should attach to.
Example:
💬 "She hit the boy with the book."
👉 Possible Interpretations:
She used a book to hit the boy.
The boy who had a book was hit.
🔹 Solution:
Probabilistic or dependency parsing can determine the most probable attachment.
Coordination Ambiguity
Occurs when a conjunction (and, or) connects phrases in a way that allows multiple
interpretations.
Example:
💬 "She saw the dog and the cat on the roof."
👉 Possible Interpretations:
Both the dog and the cat were on the roof.
She saw the dog, and the cat was on the roof.
🔹 Solution:
Dependency parsing can clarify the correct attachment.
Scope Ambiguity
Occurs when it's unclear how far the scope of a quantifier or negation extends.
Example:
💬 "All students didn't pass the exam."
👉 Possible Interpretations:
No student passed.
Some students failed.
🔹 Solution:
Contextual NLP models like Transformer-based models (BERT, GPT) help resolve
scope ambiguity.
