UNIT III
SYNTAX ANALYSIS
Topics
Role of the parser
Writing Grammars
Context-Free Grammars
Top Down parsing
Recursive Descent Parsing
Predictive Parsing
Bottom-up parsing
Shift Reduce Parsing
Operator Precedence Parsing
LR Parsers
SLR Parser
Canonical LRParser
LALR Parser
The role of Parser
Lexical toke
Parse Rest of
Sourc Analyze n Parser Tree front end IC
e r getNextTok
G
progr en
am
Symbol
table
Parsing Techniques
⚫ There are two types of parsing
Top Down parsing
Recursive Descent Parsing
Predictive Parsing
Bottom-up parsing
Shift Reduce Parsing
Operator Precedence Parsing
LR Parsers
SLR Parser
Canonical LRParser
LALR Parser
Recursive –Descent Parser
⚫ Steps are
⚫ Check whether the
Grammar (G) has Left
Recursion / Left Factoring
⚫ If it has Eliminate LR /LF
⚫ Draw the transition Diagram
⚫ Minimize the transition
diagram
Elimination of left recursion
⚫ We will consider a grammar with
productions:
Left Recursion : A → Aα / β
where α and β are sequences of
terminals and nonterminals that do not
start with A.
⚫ Elimination :
A → βA’
A’ → αA/ ε
Example: An Expression
Grammar
⚫ Let's eliminate left
recursion from the
grammar below:
E→E+T|T
T→T*F |F
F→(E)|a
⚫ The left associativity for
+ is illustrated by the
parse tree for a + a + a:
Example: An Expression
Grammar
⚫ Eliminating left
recursion gives the
grammar below:
E → TE‘
E' → +TE' | ε
T → FT‘
T' → *FT' | ε
F→(E)|a
⚫ Note the right
associativity of + in the
parse tree for
Algorithm for Eliminating General
Left Recursion
Arrange nonterminals in some order A1, A2, ... , An.
for i := 1 to n do begin
for j := 1 to i - 1 do begin
Replace each production of the form Ai -> Ajβ by
the productions:
Ai → α1β | α2β | ... | αkβ
where Aj → α1 | α2 | . . . | αk are all the current Aj
productions.
end { for j }
Remove immediate left recursion from the Ai
productions, if necessary
end { for i }
Left Factoring
⚫ In general, if A → αβ1 | αβ2 are two
A-productions, and the input
begins with a nonempty string
derived from α, we do not know
whether to expand to αβ1 or to αβ2.
⚫ Instead, the grammar may be
changed. The formal technique is
to change the grammar to
A → αA'
A' → β1 | β2
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
⇓
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
⇓
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b
⇓
A → aA’ | b
A’ → d | ε | b | bc
⇓
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
Example
The grammar is as follows
⚫ E -> E + T | T
⚫ T-> T*F|F
⚫ F -> (E) | id
After removing left-recursion , left-factoring
The rules are :
Rules and their transition
diagrams
E T T’
⚫ E->T T’
START
ε
⚫ T’ -> +T T’ | ε T’ +
T T
⚫ T -> F T’’
T F T’’
⚫ T -> *F T’’ | ε ε
T’ +
T T
⚫ T -> (E) |id
T ( E )
id
Optimization
After optimization it yields the following DFA
like structures:
+ *
T F
STAR
T ε
FINAL ε
T ( E )
id
Definition of First(x)
⚫ Let X be a grammar symbol (a terminal
or nonterminal) or ε. Then the set First(X)
consisting of terminals, and possibly ε, is
defined as follows:
⚫ If X is a terminal or ε, then First(X) = {X}.
⚫ If X is a nonterminal, then for each
production
X → X1X2...Xn, First(X) contains First(X1) -
{ε}.
⚫ If also for some i < n, all the sets First(X1),…,
First(Xi) contain ε, then First(X) contains
First(Xi+1) - {ε}.
Definition of First(α)
⚫ Now, define First(α) for string α =
X1X2…Xn (a string of terminals and
nonterminals), as follows:
⚫ First(α) contains First(X1) - {ε}.
⚫ For each i = 2,…,n, if First(Xk)
contains ε for all k = 1,…i - 1, then
First(α) contains First(Xi) - {ε}.
⚫ Finally, if for all i = 1,…,n, First(Xi)
contains ε, then add ε to First(α).
Example: First Sets
⚫ First(a) = {a}, First(ε) = {ε}, First(b) = {b}
⚫ To calculate First(B), look at rules for B:
⚫ B → b ⇒ First(B) = First(b) = {b}.
⚫ To calculate First(A), look at rules for A:
S → AB
⚫ A → a ⇒ First(A) includes {a} A→a|ε
⚫ A → ε ⇒ First(A) includes {ε} B→b
⚫ We conclude that First(A) = {a, ε}
⚫ To calculate First(S), look at rule S → AB.
⚫ First(S) = First(AB), which includes First(A) – {ε} =
{a}
⚫ Since First(A) contains ε, First(AB) also includes
First(B) = {b}
⚫ We conclude that First(S) = {a, b}
Definition of Follow(A)
⚫ Given a nonterminal A, the set
Follow(A), consisting of terminals,
and possibly $ (the end of input
symbol), is defined as follows:
1. If A is the start symbol, then $ is
in Follow(A).
2. If there is a production B → α A γ,
then First(γ) - {ε} is in Follow(A).
3. If there is a production B → α A γ
such that ε is in First(γ), then
Example: Follow Sets
⚫ By rule 2, if there is a production B
→ α A γ, then First(γ) - {ε} is in
Follow(A). Here are examples
showing how right hand sides of
rules match the pattern above(red
is α, green is γ)
⚫ X → abYmZ α = ab A = Y γ = mZ
⚫ X → abYmZ α = abYm A = Z γ = ε
⚫ B → Axy α=ε A=A γ = xy
Example: Follow Sets
⚫ By rule 3, if there is a production B
→ α A γ such that ε is in First(γ),
then Follow(A) contains Follow(B).
⚫ Examples of right hand sides that
match this pattern.
⚫ A → aX α=a A=X γ=ε
⚫ A → aXY α=a A=X γ = Y (ε ∈
First(Y))
⚫A→Y α=ε A=Y γ=ε
LL(1) Parser
input buffer
⚫ our string to be parsed. We will assume that its end is marked
with a special symbol $.
output
⚫ a production rule representing a step of the derivation
sequence (left-most derivation) of the string in the input buffer.
stack
⚫ contains the grammar symbols
⚫ at the bottom of the stack, there is a special end marker symbol
$.
⚫ initially the stack contains only the symbol $ and the starting
symbol S. $S 🡺 initial stack
⚫ when the stack is emptied (ie. only $ left in the stack), the
parsing is completed.
parsing table
⚫ a two-dimensional array M[A,a]
⚫ each row is a non-terminal symbol
⚫ each column is a terminal symbol or the special symbol $
⚫ each entry holds a production rule.
LL(1) Parser – Parser Actions
⚫ The symbol at the top of the stack (say X) and the current
symbol in the input string (say a) determine the parser
action.
⚫ There are four possible parser actions.
1. If X and a are $ 🡺 parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
🡺 parser pops X from the stack, and moves the next symbol
in the input buffer.
3. If X is a non-terminal
🡺 parser looks at the parsing table entry M[X,a]. If M[X,a]
holds a production rule X→Y1Y2...Yk, it pops X from the
stack and pushes Yk,Yk-1,...,Y1 into the stack. The parser also
outputs the production rule X→Y1Y2...Yk to represent a step
of the derivation.
4. none of the above 🡺 error
⚫ all empty entries in the parsing table are errors.
⚫ If X is a terminal symbol different from a, this is also an error case.
LL(1) Parser – Example1
S → aBa a b $
B → bB | ε S S → aBa
B B→ε B → bB
stack input output
$S abba$ S → aBa
$aBa abba$
$aB bba$ B → bB
$aBb bba$
$aB ba$ B → bB
$aBb ba$
$aB a$ B→ε
$a a$
$ $ accept, successful completion
LL(1) Parser – Example1 (cont.)
Outputs: S → aBa B → bB B → bB B→ε
Derivation(left-most): S⇒aBa⇒abBa⇒abbBa⇒abba
S
parse tree
a B a
b B
b B
ε
LL(1) Parser – Example2
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
id + * ( ) $
E E→ E → TE’
TE’
E’ E’ → +TE’ E’ → ε E’ → ε
T T → FT’ T → FT’
T’ T’ → ε T’ → *FT’ T’ → ε T’ → ε
F F → id F → (E)
LL(1) Parser – Example2
stack input output
$E id+id$ E → TE’
$E’T id+id$ T → FT’
$E’ T’F id+id$ F → id
$ E’ T’id id+id$
$ E’ T’ +id$ T’ → ε
$ E’ +id$ E’ → +TE’
$ E’ T+ +id$
$ E’ T id$ T → FT’
$ E’ T’ F id$ F → id
$ E’ T’id id$
$ E’ T’ $ T’ → ε
$ E’ $ E’ → ε
$ $ accept
Constructing LL(1) Parsing
Tables
⚫ Two functions are used in the construction of
LL(1) parsing tables:
⚫ FIRST & FOLLOW
⚫ FIRST(α) is a set of the terminal symbols which
occur as first symbols in strings derived from α
where α is any string of grammar symbols.
⚫ if α derives to ε, then ε is also in FIRST(α) .
⚫ FOLLOW(A) is the set of the terminals which
*
occur immediately after (follow) the non-terminal
A in the strings derived* from the starting symbol.
⚫ a terminal a is in FOLLOW(A) if S ⇒ αAaβ
⚫ $ is in FOLLOW(A) if S ⇒ αA
Constructing LL(1) Parsing Table --
Algorithm
⚫ for each production rule A → α of a
grammar G
⚫ for each terminal a in FIRST(α)
🡺 add A → α to M[A,a]
⚫ If ε in FIRST(α)
🡺 for each terminal a in FOLLOW(A) add A
→ α to M[A,a]
⚫ If ε in FIRST(α) and $ in FOLLOW(A)
🡺 add A → α to M[A,$]
⚫ All other undefined entries of the parsing
table are error entries.
LL(1) Grammars
⚫ A grammar whose parsing table has no
multiply-defined entries is said to be LL(1)
grammar.
one input symbol used as a look-head symbol do
determine parser action
LL(1) left most derivation
input scanned from left to right
⚫ The parsing table of a grammar may
contain more than one production rule. In
this case, we say that it is not a LL(1)
grammar.
A Grammar which is not LL(1)
S→iCtSE | a FOLLOW(S) = { $,e }
E→eS | ε FOLLOW(E) = { $,e }
C→b FOLLOW(C) = { t }
FIRST(iCtSE) = {i}
a b e i t $
FIRST(a) = {a}
S→
FIRST(eS) = {e} S S → a iCtSE
FIRST(ε) = {ε} E→eS E→
E
FIRST(b) = {b} E→ε ε
two production rules for M[E,e]
C C→b
Problem 🡺 ambiguity
Properties of LL(1) Grammars
⚫ A grammar G is LL(1) if and only if the
following conditions hold for two
distinctive production rules A → α and
A→β
1. Both α and β cannot derive strings starting
with same terminals.
2. At most one of α and β can derive to ε.
3. If β can derive to ε, then α cannot derive to
any string starting with a terminal in
FOLLOW(A).
Bottom-Up Parsing
⚫ A bottom-up parser creates the parse tree of the
given input starting from leaves towards the root.
⚫ A bottom-up parser tries to find the right-most
derivation of the given input in the reverse order.
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most
derivation in the reverse order)
⚫ Bottom-up parsing is also known as shift-reduce
parsing because its two main actions are shift and
reduce.
⚫ At each shift action, the current symbol in the input
string is pushed to a stack.
⚫ At each reduction step, the symbols at the top of the
stack (this symbol sequence is the right side of a
production) will replaced by the non-terminal at the
left side of that production.
⚫ There are also two more actions: accept and error.
Shift-Reduce Parsing
⚫ A shift-reduce parser tries to reduce the given
input string into the starting symbol.
a string 🡺 the starting symbol
reduced to
⚫ At each reduction step, a substring of the input
matching to the right side of a production rule is
replaced by the non-terminal at the left side of that
production rule. *
rm
⚫ If the substring is chosen correctly, the right most
derivation of that string is created in the reverse
rm rm
order.
Rightmost Derivation: S⇒ω
Shift-Reduce Parser finds: ω ⇐ ... ⇐ S
Shift-Reduce Parsing --
Example
S → aABb input string: aaabb
A → aA | a aaAbb
B → bB | b aAbb ⇓ reduction
aABb
S
S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb
rm rm rm rm
Right Sentential Forms
⚫ How do we know which substring to be replaced at
each reduction step?
Handle
⚫ Informally, a handle of a string is a substring that
matches the right side of a production rule.
⚫ But not every substring matches the right side of a
production rule is handle
⚫ A handle of a right sentential form γ (≡ αβω) is
a production rule A → β and a position of γ
where the string β may be found and replaced by A to
produce
the previous* right-sentential form in a rightmost
derivation of
rm γ. rm
S ⇒ αAω ⇒ αβω
⚫ If the grammar is unambiguous, then every
right-sentential form of the grammar has exactly one
handle.
Handle Pruning
⚫ A right-most derivation in reverse can be
obtained by handle-pruning.
S=γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn-1 ⇒ γn= ω
rm rm rm rm rm
input string
⚫ Start from γn, find a handle An→βn in γn,
and replace βn in by An to get γn-1.
⚫ Then find a handle An-1→βn-1 in γn-1,
and replace βn-1 in by An-1 to get γn-2.
⚫ Repeat this, until we reach S.
A Shift-Reduce Parser
E → E+T | T Right-Most Derivation of id+id*id
T → T*F | F E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id
F → (E) | id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id
Right-Most Sentential FormReducing Production
id+id*id F → id
F+id*id T→F
T+id*id E→T
E+id*id F → id
E+F*id T→F
E+T*id F → id
E+T*F T → T*F
E+T E → E+T
E
Handles are red and underlined in the right-sentential
forms.
A Stack Implementation of A
Shift-Reduce Parser
⚫ There are four possible actions of a
shift-parser action:
1. Shift : The next input symbol is shifted onto
the top of the stack.
2. Reduce: Replace the handle on the top of the
stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: Parser discovers a syntax error, and calls
an error recovery routine.
⚫ Initial stack just contains only the
end-marker $.
A Stack Implementation of A
Shift-Reduce Parser
Stack Input Action
$ id+id*id$ shift
$id +id*id$ reduce by F → id Parse Tree
$F +id*id$ reduce by T → F
$T +id*id$ reduce by E → T E 8
$E +id*id$ shift
$E+ id*id$ shift E 3 + T 7
$E+id *id$ reduce by F → id
$E+F *id$ reduce by T → F T 2 T 5 * F6
$E+T *id$ shift
$E+T* id$ shift F 1 F 4 id
$E+T*id $ reduce by F → id
$E+T*F $ reduce by T → T*F id id
$E+T $ reduce by E → E+T
$E $ accept
LR Parsers
⚫ The most powerful shift-reduce parsing (yet
efficient) is:
LR(k) parsing.
left to right right-most k lookhead
scanning derivation (k is omitted 🡺 it is 1)
⚫ LR parsing is attractive because:
⚫ LR parsing is most general non-backtracking
shift-reduce parsing, yet it is still efficient.
⚫ The class of grammars that can be parsed using
LR methods is a proper superset of the class of
grammars that can be parsed with predictive
parsers.
LL(1)-Grammars ⊂ LR(1)-Grammars
⚫ An LR-parser can detect a syntactic error as soon
LR Parsers
⚫ LR-Parsers
⚫ covers wide range of grammars.
⚫ SLR – simple LR parser
⚫ LR – most general LR parser
⚫ LALR – intermediate LR parser (look-head
LR parser)
⚫ SLR, LR and LALR work same (they used
the same algorithm), only their parsing
tables are different.
LR Parsing Algorithm
input a1 ... ai ... an $
stack
Sm
Xm output
LR Parsing Algorithm
Sm-1
Xm-1
.
.
Action Table Goto Table
S1 terminals and $ non-terminal
X1 s s
t four different t each item is
S0 a actions a a state number
t t
e e
s s
A Configuration of LR Parsing
Algorithm
⚫ A configuration of a LR parsing is:
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )
Stack Rest of Input
⚫ Sm and ai decides the parser action by
consulting the parsing action table. (Initial
Stack contains just So )
⚫ A configuration of a LR parsing represents the
right sentential form:
X1 ... Xm ai ai+1 ... an $
Actions of A LR-Parser
1. shift s -- shifts the next input symbol and the
state s onto the stack
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $
)
2. reduce A→β (or rn where n is a production
number)
⚫ pop 2|β| (=r) items from the stack;
⚫ then push A and s where s=goto[sm-r,A]
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an
$)
⚫ Output is the reducing production reduce A→β
3. Accept – Parsing successfully completed
4. Error -- Parser detected an error (an empty entry
in the action table)
Reduce Action
⚫ pop 2|β| (=r) items from the stack; let us
assume that β = Y1Y2...Yr
⚫ then push A and s where s=goto[sm-r,A]
( So X1 S1 ... Xm-r Sm-r Y1 Sm-r ...Yr Sm, ai ai+1 ...
an $ )
🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )
⚫ In fact, Y1Y2...Yr is a handle.
X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm Y1...Yr ai ai+1
... an $
(SLR) Parsing Tables for Expression
Grammar Action Table Goto Table
1) E → E+T state id + * ( ) $ E T F
2) E→T 0 s5 s4 1 2 3
3) T → T*F 1 s6 acc
4) T→F
2 r2 s7 r2 r2
5) F → (E)
3 r4 r4 r4 r4
6) F → id
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Actions of A (S)LR-Parser --
Example
stack input action output
0 id*id+id$ shift 5
0id5 *id+id$ reduce by F→id F→id
0F3 *id+id$ reduce by T→F T→F
0T2 *id+id$ shift 7
0T2*7 id+id$ shift 5
0T2*7id5 +id$ reduce by F→id F→id
0T2*7F10 +id$ reduce by T→T*F T→T*F
0T2 +id$ reduce by E→T E→T
0E1 +id$ shift 6
0E1+6 id$ shift 5
0E1+6id5 $ reduce by F→id F→id
0E1+6F3 $ reduce by T→F T→F
0E1+6T9 $ reduce by E→E+T E→E+T
0E1 $ accept
Constructing SLR Parsing Tables –
LR(0) Item
⚫ An LR(0) item of a grammar G is a production of G
.
a dot at the some position of the right side.
.
⚫ Ex: A → aBb Possible LR(0) Items: A → aBb
.
(four different possibility) A → a Bb
.
A → aB b
A → aBb
⚫ Sets of LR(0) items will be the states of action and
goto table of the SLR parser.
⚫ A collection of sets of LR(0) items (the canonical
LR(0) collection) is the basis for constructing SLR
parsers.
⚫ Augmented Grammar:
G’ is G with a new production rule S’→S where S’ is