School of Computing
Department of Computer Science & Engineering
10211CS107 – COMPILER DESIGN
Category : Program Core
Credit : 3
Course Handling Faculty :
Dr. T SAJU RAJ
Associate Professor
COURSE CONTENT
UNIT I Introduction to Compilers 9
Compilers, Analysis of the Source Program, The Phases of a Compiler, Cousins of
the Compiler, The Grouping of Phases, Compiler-Construction Tools. LEXICAL
ANALYSIS: Need and role of lexical analyzer-Lexical errors, Input Buffering -
Specification of Tokens, Recognition of Tokens, Design of a Lexical Analyzer
Generator.
1/28/2025 Dr.T SAJU RAJ Compiler Design 2
COURSE CONTENT
UNIT II Syntax Analysis 9
Need and role of the parser- Context Free Grammars-Top Down parsing –
Recursive Descent Parser - Predictive Parser - LL (1) Parser -Shift Reduce Parser -
LR Parser - LR (0) item - Construction of SLR Parsing table -Introduction to LALR
Parser, YACC- Design of a syntax analyser for a sample language
1/28/2025 Dr.T SAJU RAJ Compiler Design 3
Phases of Compiler
Tokens
Parse Tree
Annotated parse tree
Three Address Code
Optimized Code
Machine Code
Target Program
1/28/2025 Department of Computer Science and Engineering 4
Phases of Compiler
Error Controller
Symbol Table
Record
Information
(name,Functi Handle Errors
on,Interfaces
,etc)
1/28/2025
Target Program
Department of Computer Science and Engineering 5
Where Are We?
• Source code: if (b==0) a = “Hi”;
Lexical Analysis
• Token Stream: if( b == 0) a = “Hi”;
Syntactic Analysis
• Abstract Syntax Tree if
• (AST) Semantic Analysis
== = ;
b 0 a “Hi”
Do tokens conform to the language syntax?
1/28/2025 Department of Computer Science and Engineering 6
Roles of the Lexical analyzer
➢ Read input characters from the source program
➢ Helps to identify token into the symbol table
➢ Removes white spaces and comments from the source program
➢ Correlates error messages with the source program-Lexical errors
and Project
Management
(SEPM)
1/28/2025 Department of Computer Science and Engineering 7
Syntax Analysis
• Syntax analysis recognizes the syntactic structure of
the programming language and transforms a string
of tokens into a tree of tokens and syntactic
categories
• Parser is the program that performs syntax analysis
1/28/2025 Department of Computer Science and Engineering 8
Outline
• Role of parser
• Context free grammars
• Top down parsing
• Bottom up parsing
• Parser generators
1/28/2025 Department of Computer Science and Engineering 9
The role of parser
token
Source Lexical Parse tree Rest of Front Intermediate
Parser
program Analyzer End representation
getNext
Token
Symbol
table
1/28/2025 Department of Computer Science and Engineering 10
Parsing
• A.K.A. Syntax Analysis
– Recognize sentences in a language.
– Discover the structure of a document/program.
– Construct (implicitly or explicitly) a tree (called as a
parse tree) to represent the structure.
– The above tree is used later to guide translation.
1/28/2025 Department of Computer Science and Engineering 11
Parsing During Compilation
regular
expressions errors
lexical token rest of intermediate
source parser parse
representation
program analyzer get next tree front end
token
symbol
• Collecting token
table information
• Perform type checking
• uses a grammar to check structure of tokens • Intermediate code
generation
• produces a parse tree
• syntactic errors and recovery
• recognize correct syntax
• report errors
1/28/2025 Department of Computer Science and Engineering 12
Parsing Responsibilities
Syntax Error Identification / Handling
Recall typical error types:
• Syntactic : Omission, wrong order of tokens
• if ((x<1) & (y>5)))
• Semantic Incompatible Types, Unidentified ID’s
• if ((x<1) & (y>5.0)))
• Logical : Infinite Loop/ recursive call
• Should be <= not <
• Lexical : Misspellings
• if x<1 then y = 5:
1/28/2025 Department of Computer Science and Engineering 13
Error Reporting is Not a Trivial Task
• Difficult to generate clear and accurate error messages.
Example
function foo () {
...
if (...) {
...
} else {
...
Missing } here
...
}
<eof> Not detected until here
Example
int myVarr;
...
x = myVar; Misspelled ID here
...
Not detected until here
1/28/2025 Department of Computer Science and Engineering 14
Error Recovery
• After first error recovered
– Compiler must go on!
• Restore to some state and process the rest of the input
• Error-Correcting Compilers
– Issue an error message
– Fix the problem
– Produce an executable
Example
Error on line 23: “myVarr” undefined.
“myVar” was used.
1/28/2025 Department of Computer Science and Engineering 15
Error Recovery Trigger More Errors!
• Inadequate recovery may introduce more errors
– Those were not programmers errors
• Example:
int myVar flag ;
...
Declaration of flag is discarded
x := flag;
...
... Variable flag is undefined
while (flag==0)
... Variable flag is undefined
Too many Error message may be obscuring
– May bury the real message
– Remedy: allow 1 message per token or per statement
• Quit after a maximum (e.g. 100) number of errors
1/28/2025
• Department of Computer Science and Engineering 16
Panic Mode – E.R.A
• Discard tokens until we see a “synchronizing” token.
Example
Skip to next occurrence of
} end ;
Resume by parsing the next statement
• The key...
– Good set of synchronizing tokens
– Knowing what to do then
• Advantage
– Simple to implement Commonly used
– Does not go into infinite loop
• Disadvantage
– May skip over large sections of source with some errors
1/28/2025 Department of Computer Science and Engineering 17
Phrase-Level Recovery - ERA
• Compiler corrects the program
by deleting or inserting tokens
...so it can proceed to parse from where it was.
Example
while (x==4) y:= a + b
Insert do to fix the statement
• The key...
Don’t get into an infinite loop
1/28/2025 Department of Computer Science and Engineering 18
Context Free Grammars (CFG)
• A context free grammar is a formal model that consists of:
• Terminals
Keywords
Token Classes
Punctuation
• Non-terminals
Any symbol appearing on the lefthand side of any rule
• Start Symbol
Usually the non-terminal on the lefthand side of the first rule
• Rules (or “Productions”)
BNF: Backus-Naur Form / Backus-Normal Form
Stmt ::= if Expr then Stmt else Stmt
1/28/2025 Department of Computer Science and Engineering 19
Rule Alternative Notations
1/28/2025 Department of Computer Science and Engineering 20
Context Free Grammars : A First Look
assign_stmt → id := expr ;
expr → expr operator term
expr → term
term → id
term → real
term → integer
operator → +
operator → -
Derivation: A sequence of grammar rule applications and
substitutions that transform a starting non-term into a sequence
of terminals / tokens.
1/28/2025 Department of Computer Science and Engineering 21
Derivation
Let’s derive: id := id + real – integer ; using production:
assign_stmt → id := expr ; assign_stmt
→ id := expr ;
expr → expr operator term
→id := expr operator term;
expr → expr operator term
→id := expr operator term operator term;
expr → term
→ id := term operator term operator term;
term → id
→ id := id operator term operator term;
operator → +
→ id := id + term operator term;
term → real
→ id := id + real operator term;
operator → -
→ id := id + real - term;
term → integer → id := id + real - integer;
Department of Computer Science and Engineering 22
Arithmetic Expressions
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + 9 Production rules
op → -
op → *
op → /
op →
Terminals: id + - * / ( )
Nonterminals: expr, op
Start symbol: expr
1/28/2025 Department of Computer Science and Engineering 23
Notational Conventions
• Terminals
– Lower-case letters early in the alphabet: a, b, c
– Operator symbols: +, -
– Punctuations symbols: parentheses, comma
– Boldface strings: id or if
• NonTerminals:
– Upper-case letters early in the alphabet: A, B, C
– The letter S (start symbol)
– Lower-case italic names: expr or stmt
• Upper-case letters late in the alphabet, such as X, Y, Z,
represent either nonterminals or terminals.
• Lower-case letters late in the alphabet, such as u, v, …, z,
represent strings of terminals.
1/28/2025 Department of Computer Science and Engineering 24
Notational Conventions
• Lower-case Greek letters, such as , , , represent strings of
grammar symbols. Thus A→ indicates that there is a single
nonterminal A on the left side of the production and a string of
grammar symbols to the right of the arrow.
• If A→ 1, A→ 2, …., A→ k are all productions with A on the
left, we may write A→ 1| 2| …. | k
• Unless otherwise started, the left side of the first production is
the start symbol.
E → E A E | ( E ) | -E | id
A→+|-| *| / |
1/28/2025 Department of Computer Science and Engineering 25
Derivations
Doesn’t contain nonterminals
1/28/2025 Department of Computer Science and Engineering 26
Derivation
1/28/2025 Department of Computer Science and Engineering 27
Derivation
1/28/2025 Department of Computer Science and Engineering 28
Leftmost Derivation
1/28/2025 Department of Computer Science and Engineering 29
Rightmost Derivation
1/28/2025 Department of Computer Science and Engineering 30
Parse Tree
1/28/2025 Department of Computer Science and Engineering 31
Parse Tree
1/28/2025 Department of Computer Science and Engineering 32
Parse Tree
1/28/2025 Department of Computer Science and Engineering 33
Parse Tree
1/28/2025 Department of Computer Science and Engineering 34
Ambiguous Grammar
1/28/2025 Department of Computer Science and Engineering 35
Ambiguous Grammar
• More than one Parse Tree for some sentence.
– The grammar for a programming language may be
ambiguous
– Need to modify it for parsing.
• Also: Grammar may be left recursive.
• Need to modify it for parsing.
1/28/2025 Department of Computer Science and Engineering 36
Elimination of Ambiguity
• Ambiguous
• A Grammar is ambiguous if there are multiple parse
trees for the same sentence.
• Disambiguation
• Express Preference for one parse tree over others
– Add disambiguating rule into thegrammar
1/28/2025 Department of Computer Science and Engineering 37
Parser
Parsing means determining the syntactic
structure of an expression
and Project
Management
(SEPM)
1/28/2025 Department of Computer Science and Engineering 38
Example of Parsing
X=(A-(D*F))
1/28/2025 Department of Computer Science and Engineering 39
Types of Parser
and Project
Management
(SEPM)
1/28/2025 Department of Computer Science and Engineering 40
Top Down Parser Vs Bottom Up Parser
Top-down Bottom-up
Parsing Parsing
Initiates from
Initiates from Leaves
Root
Production is used to derive
Starts from the token and then go
and check the similarity in the
to the start symbol.
string.
Type of derivation Leftmost Type of derivation Rightmost
derivation derivation
and Project
Management
(SEPM)
1/28/2025 Department of Computer Science and Engineering 41
Top-down parsers
Top-down parsing expands a parse tree from the start
symbol to the leaves
Always expand the leftmost non-terminal
id + id * id
1/28/2025 Department of Computer Science and Engineering 42
Top-down parsers
Apply E-> E+E E
id + id * id
1/28/2025 Department of Computer Science and Engineering 43
Top-down parsers
Apply E-> id E
id + id * id
1/28/2025 Department of Computer Science and Engineering 44
Top-down parsers
Apply E-> E*E E
E
E E
id + id * id
1/28/2025 Department of Computer Science and Engineering 45
Top-down parsers
Apply E-> id E
E
E E
id + id * id
1/28/2025 Department of Computer Science and Engineering 46
Top-down parsers
Apply E-> id E
E
E E
id + id * id
1/28/2025 Department of Computer Science and Engineering 47
Bottom-up Parser
Start at the leaves and grow toward root
And just as efficient
Builds on ideas in top-down parsing
Preferred method in practice
Also called LR parsing
L means that tokens are read left to right
R means that it constructs a rightmost derivation -->
1/28/2025 Department of Computer Science and Engineering 48
Bottom-up Parser
1/28/2025 Department of Computer Science and Engineering 49
Example of Bottom-up Parser
14
int + (int) + (int)
int + ( int ) + ( int )
1/28/2025 Department of Computer Science and Engineering 50
Example of Bottom-up Parser
15
int + (int) + (int)
E + (int) + (int)
int + ( int ) + ( int )
1/28/2025 Department of Computer Science and Engineering 51
Example of Bottom-up Parser
16
int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E E
int + ( int ) + ( int )
1/28/2025 Department of Computer Science and Engineering 52
Example of Bottom-up Parser
17
int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E E
int + ( int ) + ( int )
1/28/2025 Department of Computer Science and Engineering 53
Example of Bottom-up Parse
18
int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E
E E E
int + ( int ) + ( int )
1/28/2025 Department of Computer Science and Engineering 54
Example of Bottom-up Parse
19
int + (int) + (int) E
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E E
E E E
int + ( int ) + ( int )
1/28/2025 Department of Computer Science and Engineering 55
Classification of Parsers
Parsers
Top Down Bottom Up Parsers
Parsers (Shift Reduce Parsers)
with full without Operator Precedence
LR Parsers
Backtracking Backtracking Parsers
Brute Force Recursive
Method LR (0)
Descent Parsers
Non recursive Descent Simple LR or
Parsers or LL(1) SLR(1)
Canonical LR or
CLR(1) or LR(1)
LALR(1)
1/28/2025 Department of Computer Science and Engineering 56
TOP DOWN PARSING
➢ Find a left-most derivation
➢ Find (build) a parse tree
➢ St a r t building from the root and work down…
➢ As we search for a derivation
➢ Must make choices:
➢ Which rule to use
➢ Where to use it
➢ May r u n into problems!!
18
1/28/2025 Department of Computer Science and Engineering 57
TOP-DOWN PARSING
Recursive-Descent Parsing
Backtracking is needed (If a choice of a production rule
does not work, we backtrack to try other alternatives.)
It is a general parsing technique, but not widely used.
Not efficient
Predictive Parsing
No backtracking
Efficient
Needs a special form of grammars (LL(1) grammars).
Recursive Predictive Parsing is a special form of
Recursive Descent parsing without backtracking.
Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.
19
1/28/2025 Department of Computer Science and Engineering 58
EXAMPLE
Consider the grammar S → cAd
A→ab|a and the input string w = cad.
To construct a parse tree for this string using top- down approach,
Step 1:
Initially create a tree with single node labeled S. An input pointer points to
‘c’, the first symbol of w. Expand the tree with the production of S.
c A d
28-01-2025 Department of Computer Science and Engineering 59
EXAMPLE
Step2:
➢ The leftmost leaf ‘c’ matches the first symbol of w, so advance the
input pointer to the second symbol of w ‘a’ and consider the next leaf
‘A’.
➢ Expand A using the first alternative.
c A d
a b
28-01-2025 Department of Computer Science and Engineering 60
EXAMPLE
Step3:
➢ The second symbol ‘a’ of w also matches with second leaf of tree.
➢ So advance the input pointer to third symbol of w ‘d’.
➢ But the third leaf of tree is b which does not match with the input
symbol d.
➢ Hence discard the chosen production and reset the pointer to second
position. This is called backtracking.
28-01-2025 Department of Computer Science and Engineering 61
EXAMPLE
Step4:
Now try the second alternative for A.
S
c A d
Now we can halt and announce the successful completion of parsing.
28-01-2025 Department of Computer Science and Engineering 62
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 63
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 64
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 65
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 66
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 67
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 68
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 69
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 70
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 71
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 72
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 73
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 74
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 75
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 76
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 77
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
1/28/2025 Department of Computer Science and Engineering 78
R E C U R S I V E D E S C E N T PA R S I N G
(BACKTRACKING)
36
Successfully parsed!!
1/28/2025 Department of Computer Science and Engineering 79
R E C U R S I V E D E S C E N T PA R S I N G - Algorithm
A recursive-descent parsing program consists of a set of
procedures – one for each non-terminal
Execution begins with the procedure for the s ta r t symbol
Announces success if the procedure body scans the entire input
void A(){
for (j=1 to t){ /* assume there is t number of A-productions */
Choose a A-production, Aj→X1X2…Xk;
for (i=1 to k){
if (Xi is a non-terminal)
call procedure Xi();
else if (Xi equals the current input symbol a)
advance the input to the next symbol;
else backtrack in input and reset the pointer
}
} 37
}
1/28/2025 Department of Computer Science and Engineering 80
PREDICTIVE PARSING
8
1
1/28/2025 Department of Computer Science and Engineering 81
PREDICTIVE PARSING
LL(1) Grammars Left Recursion- X
Can do predictive parsing Non determinism- X
Can select the right rule
Looking at only the next 1 input symbol
First L : Left to Right Scanning
Second L: Leftmost derivation
1 : one input symbol look-ahead for predictive decision
LL(k) Grammars
Can do predictive parsing
Can select the right rule
Looking at only the next k input symbols
Techniques to modify the grammar:
Left Factoring
Removal of Left Recursion
LL(k) Language 5
Can be described with an LL(k) grammar
1/28/2025 Department of Computer Science and Engineering 82
FIRST FUNCTION
FIRST(E) = {(, id}
FIRST(E) = {+, }
FIRST(T) = {(, id}
• FIRST (E) = FIRST(T) = {*, }
• FIRST (E’) = FIRST(F) = {(, id}}
• FIRST (T) = 44
• FIRST (T’) =
1/28/2025 • FIRST (F) =Department of Computer Science and Engineering 83
COMPUTING FOLLOW(A)
1. If A is s t a r t symbol, put $ in FOLLOW(A)
2. Productions of the form B → A ,
Add FIRST() – {} to FOLLOW(A)
3. Productions of the form B → A or
B → A wh ere *
Add FOLLOW(B) to FOLLOW(A)
1/28/2025 Department of Computer Science and Engineering 84
EXAMPLE
FOLLOW(E) = {$}
FOLLOW(E) =
FOLLOW(T) =
FOLLOW(T) =
FOLLOW(F) =
FIRST(E) = {(, id} Using rule #1
FIRST(E) = {+, }
FIRST(T) = {(, id} 1. If A is start symbol, put $ in FOLLOW(A)
FIRST(T) = {*, }
FIRST(F) = {(, id}} 51
Assume the first non-terminal is the start symbol
1/28/2025 Department of Computer Science and Engineering 85
EXAMPLE
FOLLOW(E) = {$, )}
FOLLOW(E) =
FOLLOW(T) = {+}
FOLLOW(T) =
FOLLOW(F) = {*}
FIRST(E) = {(, id}
FIRST(E) = {+, } Using rule #2
FIRST(T) = {(, id} 2. Productions of the form B → A ,
Add FIRST() – {} to FOLLOW(A)
FIRST(T) = {*, }
FIRST(F) = {(, id}}
52
1/28/2025 Department of Computer Science and Engineering 86
EXAMPLE
FOLLOW(E) = {$, )}
FOLLOW(E) = FOLLOW(E)
= {$, )}
FOLLOW(T) = {+} FOLLOW(E)
= {+, $, )}
FOLLOW(T) = FOLLOW(T)
= {+, $, )}
FIRST(E) = {(, id}
FOLLOW(F) = {*} FOLLOW(T)
FIRST(E) = {+, } = {*, +, $, )}
FIRST(T) = {(, id}
Using rule #3
FIRST(T) = {*, }
3. Productions of the form B → A or
FIRST(F) = {(, id}}
B → A where *
53
Add FOLLOW(B) to FOLLOW(A)
1/28/2025 Department of Computer Science and Engineering 87
EXAMPLE
S → ( A) |
A→TE
E→&TE|
T→(A)|a|b|c
FOLLOW(S) ={$}
FIRST(T) ={(,a,b,c} FOLLOW(A) = { )}
FIRST(E) = {&, } FOLLOW(E) = FOLLOW(A) = { )}
FIRST(A) ={(,a,b,c} FOLLOW(T) = FIRST(E) FOLLOW(E)
FIRST(S) = {(,}
= {&, )}
55
1/28/2025 Department of Computer Science and Engineering 88
EXAMPLE
S→aSe|B FIRST(C) =
FOLLOW(C) =
B→bBCf|C FIRST(B) =
C→cCg|d| FIRST(S) = FOLLOW(B) =
FOLLOW(S) = {$}
Assume the first non-terminal is the start symbol
1. If A is start symbol, put $ in FOLLOW(A)
2. Productions of the form B → A , Add FIRST() – {} to FOLLOW(A)
3. Productions of the form B → A or B → A where *
Add FOLLOW(B) to FOLLOW(A)
56
1/28/2025 Department of Computer Science and Engineering 89
EXAMPLE
S→aSe|B FOLLOW(C) =
B→bBCf|C {f,g} FOLLOW(B)
C→cCg|d|
= {c,d,e,f,g,$}
FIRST(C) = {c,d,} FOLLOW(B) =
FIRST(B) = {b,c,d,} {c,d} FOLLOW(S)
FIRST(S) = {a,b,c,d,}
= {c,d,e,$}
FOLLOW(S) = {$, e }
57
1/28/2025 Department of Computer Science and Engineering 90
FOLLOW EXAMPLE
S→aSe|B FOLLOW(C) =
B→bBCf|C
C→cCg|d| FOLLOW(B) =
FIRST(C) = FOLLOW(S) = {$}
FIRST(B) = Assume the first non-terminal is the start symbol
FIRST(S) = 1. If A is start symbol, put $ in FOLLOW(A)
2. Productions of the form B → A
, Add FIRST() – {} to
FOLLOW(A)
3. Productions of the form B→ A or
B → A where *
Add FOLLOW(B) to FOLLOW(A) 2
1/28/2025 Department of Computer Science and Engineering 91
FOLLOW EXAMPLE
S→aSe|B FOLLOW(C) =
B→bBCf|C {f,g} FOLLOW(B)
C→cCg|d|
= {c,d,e,f,g,$}
FIRST(C) = {c,d,} FOLLOW(B) =
FIRST(B) = {b,c,d,} {c,d} FOLLOW(S)
FIRST(S) = {a,b,c,d,}
= {c,d,e,$}
FOLLOW(S) = {$, e }
9
2
1/28/2025 Department of Computer Science and Engineering 92
FOLLOW EXAMPLE
S→ aSe|B FOLLOW(C) =
B→ bBCf|C {f,g} FOLLOW(B)
C → c C g | d |
= {c,d,e,f,g,$}
FIRST(C) = {c,d,} FOLLOW(B) =
FIRST(B) = {b,c,d,} {c,d} FOLLOW(S)
FIRST(S) = {a,b,c,d,}
= {c,d,e,$}
FOLLOW(S) = {$, e }
9
3
1/28/2025 Department of Computer Science and Engineering 93
PREDICTIVE PARSING
9
4
1/28/2025 Department of Computer Science and Engineering 94
PREDICTIVE PARSING
LL(1) Grammars
Can do predictive parsing
Can select the right rule
Looking at only the next 1 input symbol
First L : Left to Right Scanning
Second L: Leftmost derivation
1 : one input symbol look-ahead for predictive decision
LL(k) Grammars
Can do predictive parsing
Can select the right rule
Looking at only the next k input symbols
Techniques to modify the grammar:
Left Factoring
Removal of Left Recursion
LL(k) Language 5
Can be described with an LL(k) grammar
1/28/2025 Department of Computer Science and Engineering 95
TABLE DRIVEN PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 96
PARSE TABLE CONSTRUCTION
1/28/2025 Department of Computer Science and Engineering 97
TABLE DRIVEN PREDICTIVE
PARSING
1/28/2025 Department of Computer Science and Engineering 98
TABLE DRIVEN PREDICTIVE
PARSING
1/28/2025 Department of Computer Science and Engineering 99
FOLLOW (DO IT YOURSELF)
DIY!!!!!!!!!!!
First & Follow of the Given G
1
0
0
1/28/2025 Department of Computer Science and Engineering 100
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 101
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 102
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 103
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 104
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 105
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 106
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 107
FIRST AND FOLLOW
1/28/2025 Department of Computer Science and Engineering 108
PREDICTIVE PARSING ALGORITHM
1/28/2025 Department of Computer Science and Engineering 109
PREDICTIVE PARSING ALGORITHM
Production Rule First Follow
E → TE’ { (, id } { $, )}
E’ → +TE’ | ϵ { +, ϵ } {$, ) }
T → FT’ {(, id } {+, $, )}
T’→*FT’ | ϵ { *, ϵ} {+, $, )}
F → (E)| id { (, id} { *, +, $, )}
1/28/2025 Department of Computer Science and Engineering 110
PREDICTIVE PARSING
Production First Follow
Rule
E → TE’ { (, id } { $, )}
E’ → +TE’ | ϵ { +, ϵ } {$, ) }
T → FT’ {(, id } {+, $, )}
T’→*FT’ | ϵ { *, ϵ} {+, $, )}
F → € | id { (, id} { *, +, $, )}
1/28/2025 Department of Computer Science and Engineering 111
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 112
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 113
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 114
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 115
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 116
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 117
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 118
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 119
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 120
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 121
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 122
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 123
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 124
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 125
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 126
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 127
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 128
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 129
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 130
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 131
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 132
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 133
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 134
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 135
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 136
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 137
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 138
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 139
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 140
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 141
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 142
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 143
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 144
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 145
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 146
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 147
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 148
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 149
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 150
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 151
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 152
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 153
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 154
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 155
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 156
PREDICTIVE PARSING
1/28/2025 Department of Computer Science and Engineering 157
RECONSTRUCTING THE PARSE TREE
1/28/2025 Department of Computer Science and Engineering 158
RECONSTRUCTING THE PARSE TREE
1/28/2025 Department of Computer Science and Engineering 159
RECONSTRUCTING THE PARSE TREE
1/28/2025 Department of Computer Science and Engineering 160
EXAMPLE: THE “DANGLING ELSE”
GRAMMAR
1/28/2025 Department of Computer Science and Engineering 161
EXAMPLE: THE “DANGLING ELSE”
GRAMMAR
1/28/2025 Department of Computer Science and Engineering 162
EXAMPLE: THE “DANGLING ELSE”
GRAMMAR
1/28/2025 Department of Computer Science and Engineering 163
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 164
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 165
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 166
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 167
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 168
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 169
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 170
EXAMPLE: THE “DANGLING ELSE”
1/28/2025 Department of Computer Science and Engineering 171
LL(1) GRAMMAR
LL(1) Grammar
LL(1) grammars
Are never ambiguous.
Will never have left recursion.
Furthermore...
If we are looking for an “A” and the next symbol is
“b”,
Then only one production must be possible
Although elimination of left recursion and left
factoring is easy.
Some grammar will never be a LL(1) grammar. 72
1/28/2025 Department of Computer Science and Engineering 172
PROPERTIES OF LL(1) GRAMMAR
A grammar G is LL(1) if an only if whenever A
→ are two distinct productions of G the
following conditions hold:
1. For no terminal a do both and derive strings
beginning with a.
2. At most one of and can derive the empty string.
3. If then * then does not derive any string
beginning with a terminal in FOLLOW(A).
1/28/2025 Department of Computer Science and Engineering 173
ERROR RECOVERY
When Do Errors Occur? Recall Predictive Parser Function:
a + b $ Input
Stack X Predictive Parsing Output
Program
Y
Z
Parsing Table
$ M[A,a]
1. If X is a terminal and it doesn’t match input.
2. If M[X, Input] is empty – No allowable actions 75
1/28/2025 Department of Computer Science and Engineering 174
ERROR RECOVERY
76
1/28/2025 Department of Computer Science and Engineering 175
ERROR RECOVERY: SKIP INPUT SYMBOLS
1/28/2025 Department of Computer Science and Engineering 176
ERROR RECOVERY: POP THE STACK
78
1/28/2025 Department of Computer Science and Engineering 177
PREDICTIVE PARSING
Production First Follow
Rule
E → TE’ { (, id } { $, )}
E’ → +TE’ | ϵ { +, ϵ } {$, ) }
T → FT’ {(, id } {+, $, )}
T’→*FT’ | ϵ { *, ϵ} {+, $, )}
F → € | id { (, id} { *, +, $, )}
synch synch
synch synch synch
synch synch synch synch
1/28/2025 Department of Computer Science and Engineering 178
ERROR RECOVERY: PANIC MODE
1/28/2025 Department of Computer Science and Engineering 179
ERROR RECOVERY - TABLE ENTRIES
5. Phrase Mode Error Recovery
1/28/2025 Department of Computer Science and Engineering 180
BOTTOM-UP PARSING
1/28/2025 Department of Computer Science and Engineering 181
RIGHTMOST DERIVATION IN REVERSE
1/28/2025 Department of Computer Science and Engineering 182
RIGHTMOST DERIVATION IN REVERSE
LR parsing corresponds to rightmost derivation in reverse
1/28/2025 Department of Computer Science and Engineering 183
REDUCTION
• A reduction step replaces a specific substring (matching the body of a production)
• Reduction is the opposite of derivation
• Bottom up parsing is a process of reducing a string to the start symbol S of the
grammar
1/28/2025 Department of Computer Science and Engineering 184
HANDLE
1/28/2025 Department of Computer Science and Engineering 185
HANDLE PRUNING
Grammar
Reduction of id + id * id
to start symbol E
Rightmost Derivation for
id + id * id
1/28/2025 Department of Computer Science and Engineering 186
SHIFT-REDUCE PARSING
1/28/2025 Department of Computer Science and Engineering 187
SHIFT-REDUCE PARSING
• It takes the given input string and builds a parse tree
• Starting from the bottom at the leaves.
• And growing the tree towards the top to the root.
1/28/2025 Department of Computer Science and Engineering 188
SHIFT-REDUCE PARSING
Data Structures-
Two data structures are required to implement a shift-
reduce parser
- A Stack is required to hold the grammar symbols.
- An Input buffer is required to hold the string to be
parsed.
1/28/2025 Department of Computer Science and Engineering 189
SHIFT-REDUCE PARSING
Working
Initially, shift-reduce parser is present in the following
configuration where-
-- Stack contains only the $ symbol.
-- Input buffer contains the input string with $ at its end.
1/28/2025 Department of Computer Science and Engineering 190
SHIFT-REDUCE PARSING
The parser works by
- Moving the input symbols on the top of the stack.
- Until a handle β appears on the top of the stack.
The parser keeps on repeating this cycle until
- An error is detected.
- Or stack is left with only the start symbol and the input buffer
becomes empty.
1/28/2025 Department of Computer Science and Engineering 191
SHIFT-REDUCE PARSING
After achieving this configuration,
- The parser stops / halts.
- It reports the successful completion of parsing.
1/28/2025 Department of Computer Science and Engineering 192
SHIFT-REDUCE PARSING
Possible Actions : A shift-reduce parser can possibly make
the following four actions-
1. Shift-
In a shift action,
The next symbol is shifted onto the top of the stack.
2. Reduce-
In a reduce action,
The handle appearing on the stack top is replaced with the
appropriate non-terminal symbol.
1/28/2025 Department of Computer Science and Engineering 193
SHIFT-REDUCE PARSING
3. Accept
In an accept action,
The parser reports the successful completion of parsing.
4. Error
In this state,
The parser becomes confused and is not able to make any
decision.
It can neither perform shift action nor reduce action nor accept
action.
1/28/2025 Department of Computer Science and Engineering 194
SHIFT-REDUCE PARSING
Rules To Remember
- It is important to remember the following rules while
performing the shift-reduce action-
- If the priority of incoming operator is more than the priority of
in stack operator, then shift action is performed.
- If the priority of incoming operator is same or less than the
priority of in stack operator, then reduce action is performed.
1/28/2025 Department of Computer Science and Engineering 195
SHIFT-REDUCE PARSING
Consider the following grammar-
E→E–E
E→ExE
E → id
Parse the input string id – id x id using a shift-reduce parser.
Solution-
The priority order is: id > x > –
1/28/2025 Department of Computer Science and Engineering 196
SHIFT-REDUCE PARSING
E→E–E
E→ExE
E → id
Stack Input Buffer Parsing Action
$ id – id x id $ Shift
$ id – id x id $ Reduce E → id
$E – id x id $ Shift
$E– id x id $ Shift
$ E – id x id $ Reduce E → id
$E–E x id $ Shift
$E–Ex id $ Shift
$ E – E x id $ Reduce E → id
$E–ExE $ Reduce E → E x E
$E–E $ Reduce E → E – E
$E $ Accept
1/28/2025 Department of Computer Science and Engineering 197
SHIFT-REDUCE PARSING
S→aTRe Remaining input: abbcde
T→Tbc|b
R→d
Rightmost derivation:
S ➔ aTRe
➔ aTde
➔ a Tb c de
➔abbcde
1/28/2025 Department of Computer Science and Engineering 198
SHIFT-REDUCE PARSING
S→aTRe Remaining input: bcde
T→Tbc|b
R→d
➔Shift a, Shift b
a b Rightmost derivation:
S ➔ aTRe
➔ aTde
➔ a Tb c de
➔abbcde
1/28/2025 Department of Computer Science and Engineering 199
SHIFT-REDUCE PARSING
S→aTRe Remaining input: bcde
T→Tbc|b
R→d
➔Shift a, Shift b
➔Reduce T → b
T
a b
Rightmost derivation:
S ➔ aTRe
➔ aTde
➔ a T b c de
➔ ab b c de
1/28/2025 Department of Computer Science and Engineering 200
SHIFT-REDUCE PARSING
S→aTRe Remaining input: de
T→Tbc|b
R→d
➔Shift a, Shift b
➔Reduce T → b
T
➔Shift b, Shift c
a b b c
Rightmost derivation:
S ➔ aTRe
➔ aTde
➔ a T b c de
➔ ab b c de
1/28/2025 Department of Computer Science and Engineering 201
SHIFT-REDUCE PARSING
S→aTRe Remaining input: de
T→Tbc|b
R→d
➔Shift a, Shift b T
➔Reduce T → b
T
➔Shift b, Shift c
➔Reduce T → T b c
a b b c
Rightmost derivation:
S ➔ aTRe
➔ aTde
➔ a T b c de
➔ ab b c de
1/28/2025 Department of Computer Science and Engineering 202
SHIFT-REDUCE PARSING
•S → a T R e T Remaining input: e
→Tbc|b R
→d T
T
• ➔Shift a, Shift b
• ➔Reduce T → b a b b c d
Rightmost derivation:
• ➔Shift b, Shift c S ➔ aTRe
• ➔Reduce T → T b c ➔ aTde
• ➔Shift d ➔ a T b c de
➔ ab b c de
1/28/2025 Department of Computer Science and Engineering 203
SHIFT-REDUCE PARSING
•S → a T R e T Remaining input: e
→Tbc|b R
→d T
T R
• ➔Shift a, Shift b
• ➔Reduce T → b a b b c d
Rightmost derivation:
• ➔Shift b, Shift c S ➔ aTRe
• ➔Reduce T → T b c ➔ aTde
• ➔Shift d ➔ a T b c de
➔ ab b c de
• ➔Reduce R → d
1/28/2025 Department of Computer Science and Engineering 204
SHIFT-REDUCE PARSING
•S → a T R e T Remaining input:
→Tbc|b R
→d T
T R
• ➔Shift a, Shift b
• ➔Reduce T → b a b b c d e
Rightmost derivation:
• ➔Shift b, Shift c S ➔ aTRe
• ➔Reduce T → T b c ➔ aTde
• ➔Shift d ➔ a T b c de
➔ ab b c de
• ➔Reduce R → d
• ➔Shift e
1/28/2025 Department of Computer Science and Engineering 205
SHIFT-REDUCE PARSING
S→aTRe Remaining input:
T→Tbc|b
R→d S
➔Shift a, Shift b T
➔Reduce T → b
T R
➔Shift b, Shift c
➔Reduce T → T b c
➔Shift d a b b c d e
Rightmost derivation:
➔Reduce R → d S ➔ aTRe
➔Shift e
➔ aTde
➔Reduce S → a T R e ➔ a T b c de
➔ ab b c de
1/28/2025 Department of Computer Science and Engineering 206
SHIFT-REDUCE PARSING - EXAMPLE
Consider the grammar:
Stack Input Action
$ id1 + id2$ shift
$id1 + id2$ reduce 6
$F + id2$ reduce4
$T + id2$ reduce2
$E + id2$ shift
$E + id2$ shift
$E + id2 reduce6
$E + F reduce 4
$E + T reduce 1
$E accept
1/28/2025 Department of Computer Science and Engineering 207
SHIFT-REDUCE PARSING
• Handle will always appear on Top of stack, never
inside
• Possible forms of two successive steps in any
rightmost derivation
STACK INPUT
• CASE 1:
S
$ yz$
A
After Reducing the
yz$
B handle
y $B
S * Az Byz
yz z Shifting from Input z$
rm rm rm
$By
Reduce the handle
$A z$
1/28/2025 Department of Computer Science and Engineering 208
SHIFT-REDUCE PARSING
STACK INPUT
• Case 2: $ xyz$
S After Reducing the handle
B A
$B xyz$
x y z
Shifting from Input
S * BxAz Bxyz
xyz $Bxy z$
rm rm rm
Reducing the handle
$BxA
z$
1/28/2025 Department of Computer Science and Engineering 209
SHIFT-REDUCE CONFLICT
1. stmt → id(parameter_list)
2. stmt → expr:=expr
3. parameter_list → parameter_list,
parameter
4. parameter_list → parameter
5. parameter_list → id
6. expr → id(expr_list)
7. expr → id
8. expr_list → expr_list, expr
9. expr_list → expr
STACK INPUT
….id ( id , id ) …$
We can’t decide which production will be used to
reduce id?
1/28/2025 Department of Computer Science and Engineering 210
SHIFT-REDUCE PARSERS
1. Operator-Precedence Parser
– simple, but only a small class of grammars.
CFG
LR SLR LALR
2. LR-Parsers
– covers wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookhead LR parser)
– SLR, LR and LALR work same, only their parsing tables are different.
1/28/2025 Department of Computer Science and Engineering 211
LR PARSERS
LR parsing is attractive because:
– LR parsing is most general non-backtracking shift-reduce
parsing,
yet it is still efficient.
– The class of grammars that can be parsed using LR methods is
a proper superset of the class of grammars that can be parsed
with predictive parsers.
LL(1)-Grammars LR(1)-Grammars
– An LR-parser can detect a syntactic error as soon as it is possible
to do so a left-to-right scan of the input.
– LR parsers can be constructed to recognize virtually all programming
language constructs for which CFG grammars canbe written
Drawback of LR method:
– Too much work to construct LR parser by hand
• Fortunately tools (LR parsers generators) are available
1/28/2025 Department of Computer Science and Engineering 212
LL VS. LR
LR (shift reduce) is more powerful than LL
(predictive parsing)
Can detect a syntactic error as soon as possible.
LR is difficult to do by hand (unlike LL)
1/28/2025 Department of Computer Science and Engineering 213
LR PARSING ALGORITHM
input a1 ... ai ... an $
stack
Sm
Xm
Sm-1 LR Parsing
Xm-1 Algorithm Output
.
.
S1 Action Table Goto Table
X1 terminals and $ non-terminal
S0 s four different s
t actions t each item is
a a a state number
t t
e e
s s
1/28/2025 Department of Computer Science and Engineering 214
CONFIGURATION OF LR PARSING
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )
Stack Rest of Input
• Sm and ai decides the parser action by consulting the
parsing action table. (Initial Stack
contains just So)
• A configuration of a LR parsing represents the right
sentential form:
X1 ... Xm ai ai+1 ... an $
1/28/2025 Department of Computer Science and Engineering 215
ACTIONS OF A LR-PARSER
1. shift s -- shifts the next input symbol and the state s onto the stack
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) € ( So X1
S1 ... Xm Sm ai s, ai+1 ... an $ )
2. reduce A→ (or rN where N is a production number)
– pop 2|| (=r) items from the stack;
– then push Aand s where s=goto[sm-r,A]
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) € ( So X1 S1 ... Xm-r
Sm-r A s, ai ... an $ )
– Output is the reducing production reduce A→
Accept – Parsing successfully completed
Error -- Parser detected an error (an empty entry in the action table)
1/28/2025 Department of Computer Science and Engineering 216
ACTIONS OF A LR-PARSER
Reduce Action
• pop 2||(=r)items from the stack;
• let us assume that = Y1Y2...Yr
• then push A and s where s=goto[sm-r,A]
( So X1 S1 ... Xm-r Sm-r Y1 Sm-r ...Yr Sm, ai ai+1 ... an $ )
€ ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )
• In fact, Y1Y2...Yr is a handle.
X1 ... Xm-r A ai ... an $ c X1 ... Xm Y1...Yr ai ai+1 ...
an $
1/28/2025 Department of Computer Science and Engineering 217
LR PARSER STACK(S)
1/28/2025 Department of Computer Science and Engineering 218
LR PARSER STACK(S)
1/28/2025 Department of Computer Science and Engineering 219
CONSTRUCTING SLR PARSING TABLES – LR(0) ITEM
• An item indicates how much of a production we have seen at a given point in the
parsing process
• For Example the item A → X • YZ
– We have already seen on the input a string derivable from X
– We hope to see a string derivable from YZ
• For Example the item A → • XYZ
– We hope to see a string derivable from XYZ
• For Example the item A → XYZ •
– We have already seen on the input a string derivable from XYZ
– It is possibly time to reduce XYZ toA
• Special Case:
Rule: A → yields only one item
A→•
1/28/2025 Department of Computer Science and Engineering 220
CONSTRUCTING SLR PARSING TABLES
• A collection of sets of LR(0) items (the canonical LR(0) collection) is
the basis for constructing SLR parsers.
• Canonical LR(0) collection provides the basis of constructing a
DFA called LR(0) automaton
– This DFA is used to make parsing decisions
• Each state of LR(0) automaton represents a set of
items in the canonical LR(0) collection
• To construct the canonical LR(0) collection for a grammar
– Augmented Grammar
– CLOSURE function
– GOTO function
1/28/2025 Department of Computer Science and Engineering 221
CONSTRUCTING SLR PARSING TABLES – LR(0) ITEM
• An LR(0) item of a grammar G is a production of G a dot at the some position of the
right side.
A → • aBb
• Ex: A → aBb
A → a • Bb
A → aB • b
Possible LR(0) Items: A → aBb •
(four different
possibility)
• Sets of LR(0) items will be the states of action and goto table of the SLR parser.
– States represent sets of “items”
• LR parser makes shift-reduce decision by maintaining states to keep track of where we
are in a parsing process
1/28/2025 Department of Computer Science and Engineering 222
GRAMMAR AUGMENTATION
1/28/2025 Department of Computer Science and Engineering 223
THE CLOSURE OPERATION
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of
LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → .B is in closure(I) and
B→ is a production rule of G;
then B→. will be in the closure(I).
We will apply this rule until no more new LR(0) items can be
added to closure(I).
1/28/2025 Department of Computer Science and Engineering 224
CONSTRUCTION OF THE CANONICAL LR(0) COLLECTION
• To create the SLR parsing tables for a grammar G, we will
create the canonical LR(0) collection of the grammar G’.
• Algorithm:
C is { closure({S'→ • S}) }
repeat the followings until no more set of LR(0) items can
be added to C.
for each I in C and each grammar symbol X
if GOTO(I,X) is not empty and not in C
add GOTO(I,X) to C
• GOTO function is a DFA on the sets in C.
1/28/2025 Department of Computer Science and Engineering 225
CONSTRUCTION OF THE CANONICAL LR(0) COLLECTION
I0: E’ → .E I1: E’ → E. I6: E → E+.T I9: E → E+T.
E → .E+T E → E.+T T → .T*F T → T.*F
E → .T T → .F
T → .T*F I2: E → T. F → .(E) I10: T → T*F.
T → T.*F F → .id
T → .F
F → .(E)
F → .id I3: T → F. I7: T → T*.F I11: F → (E).
F → .(E)
I4: F → (.E) F → .id
E → .E+T
E → .T I8: F → (E.)
T → .T*F E → E.+T
I5: F → id. T → .F
F → .id F → .(E)
1/28/2025 Department of Computer Science and Engineering 226
CONSTRUCTING SLR PARSING TABLE
1. Construct the canonical collection of sets of LR(0) items for G’.
C{I0,...,In}
2. Create the parsing action table as follows:
If a is a terminal, A→.a in Ii and goto(Ii,a)=Ij then action[i,a] is shift j.
If A→. is in Ii , then action[i,a] is reduce A→
for all a in FOLLOW(A) where AS’.
If S’→S. is in Ii , thenaction[i,$] is accept.
If any conflicting actions generated by these rules, the grammar is
not SLR(1).
3. Create the parsing goto table
for all non-terminals A,
if goto(Ii,A)=Ij then goto[i,A]=j
4. All entries not defined by (2) and (3) are errors.
5. Initial state of the parser contains S’→.S
1/28/2025 Department of Computer Science and Engineering 227
PARSING TABLES OF EXPRESSION GRAMMAR
Action Table Goto Table
state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
1/28/2025 Department of Computer Science and Engineering 228
PARSING TABLES OF EXPRESSION GRAMMAR
Action Table Goto Table
state id + * ( ) $ E T F
0 s5 s4 1 2 3
Key to Notation
1 s6 acc
S4=“Shift input symbol and 2 r2 s7 r2 r2
push state 4” 3 r4 r4 r4 r4
R5= “Reduce by rule 5” 4 s5 s4 8 2 3
Acc=Accept 5 r6 r6 r6 r6
(blank)=Syntax Error 6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
1/28/2025 Department of Computer Science and Engineering 229
1/28/2025 Dr.T SAJU RAJ 230
EXAMPLE LR PARSE: (ID+ID)*ID
1/28/2025 Department of Computer Science and Engineering 231
EXAMPLE LR PARSE: (ID+ID)*ID
1/28/2025 Department of Computer Science and Engineering 232
EXAMPLE LR PARSE: (ID+ID)*ID
1/28/2025 Department of Computer Science and Engineering 233
ACTIONS OF A (S)LR-PARSER -- EXAMPLE
stack input action output
0 id*id+id$ shift 5
0id5 *id+id$ reduce by F→id F→id
0F3 *id+id$ reduce by T→F T→F
0T2 *id+id shift 7
0T2*7 $ shift 5
id+id
$
0T2*7id5 +id$ reduce by F→id F→id
0T2*7F10 +id$ reduce by T→T*F T→T*F
0T2 +id$ reduce by E→T E→T
0E1 +i shift 6
0E1+6 d$ shift 5
0E1+6id5 id$ reduce by F→id F→id
$
0E1+6F3 $ reduce by T→F T→F
0E1+6T9 $ reduce by E→E+T E→E+
1/28/2025 Department of Computer Science and Engineering 234
LR PARSING ALGORITHM
1/28/2025 Department of Computer Science and Engineering 235
SLR GRAMMAR: REVIEW
An LR parser using SLR parsing tables for a
grammar G is called as the SLR parser for G.
If a grammar G has an SLR parsing table, it is called SLR grammar (or SLR
grammar in short).
Every SLR grammar is unambiguous, but every unambiguous grammar is not
a SLR grammar.
If the SLR parsing table of a grammar G has a conflict, we say that that
grammar is not SLR grammar.
1/28/2025 Department of Computer Science and Engineering 236
CONFLICT EXAMPLE
1/28/2025 Department of Computer Science and Engineering 237
CONFLICT EXAMPLE
1/28/2025 Department of Computer Science and Engineering 238
CONFLICT EXAMPLE
1/28/2025 Department of Computer Science and Engineering 239
TASK 01
1. Derive the (i) LL(1) parse table, (ii) LR(0)
automation, (iii) LR(0) parse table of following
grammar:
S-> SS+
S-> SS*
S-> a
2. Parse the following string using (i) LL (1) parse
table, (ii) LR (0) parse table:
“aa+a*a+”
2
1/28/2025 Department of Computer Science and Engineering 240
LR(1) ITEM
To avoid some of invalid reductions, the states need to carry more
information.
Extra information is put into a state by including a terminal symbol
as a second component in an item.
A LR(1) item is:
.
A → ,a where a is the look-head of the LR(1) item
(a is a terminal or end-marker.)
1/28/2025 Department of Computer Science and Engineering 241
LR(1) ITEM
1/28/2025 Department of Computer Science and Engineering 242
INTUITION BEHIND LR (1) ITEMS
1/28/2025 Department of Computer Science and Engineering 243
INTUITION BEHIND LR (1) ITEMS
1/28/2025 Department of Computer Science and Engineering 244
THE CLOSURE FUNCTION
1/28/2025 Department of Computer Science and Engineering 245
THE CLOSURE FUNCTION
1/28/2025 Department of Computer Science and Engineering 246
THE GOTO FUNCTION
1/28/2025 Department of Computer Science and Engineering 247
1/28/2025 Dr.T SAJU RAJ 248
LR (1) EXAMPLE
10
1/28/2025 Department of Computer Science and Engineering 249
LR (1) EXAMPLE
10
1/28/2025 Department of Computer Science and Engineering 250
LR (1) EXAMPLE
10
1/28/2025 Department of Computer Science and Engineering 251
LR (1) EXAMPLE
10
1/28/2025 Department of Computer Science and Engineering 252
LR (1) EXAMPLE
10
1/28/2025 Department of Computer Science and Engineering 253
CONSTRUCTION OF LR(1) PARSING
1. Construct the canonical collection of sets of LR(1) items for
G’. C{I0,...,In}
2. Create the parsing action table as follows
• .
If a is a terminal, A→ a ,b in Ii and goto(Ii,a)=Ij then action[i,a] is
shift j.
• .
If A→ ,a is in Ii , then action[i,a] is reduce A→ where AS’.
• .
If S’→S ,$ is in Ii , then action[i,$] is accept.
• If any conflicting actions generated by these rules, the grammar is not
LR(1).
3. Create the parsing goto table
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j
4. All entries not defined by (2) and (3) are errors.
12
5. Initial state of the parser contains S’→.S,$
1/28/2025 Department of Computer Science and Engineering 254
LALR Parsing Tables
• LALR stands for LookAhead LR.
• LALR parsers are often used in practice because
LALR parsing tables are smaller than LR(1) parsing
tables.
• The number of states in SLR and LALR parsing tables
for a grammar G are equal.
• But LALR parsers recognize more grammars than
SLR parsers.
• Astate of LALR parser will be again a set of LR(1)
items.
1/28/2025 Department of Computer Science and Engineering 255
CREATING LALR PARSING TABLES
Canonical LR(1) Parser € LALR Parser
shrink # of states
• This shrink process may introduce a reduce/reduce
conflict in the resulting LALR parser (so the grammar
is NOT LALR)
• But, this shrink process does not produce a
shift/reduce conflict.
1/28/2025 Department of Computer Science and Engineering 256
Reduce/Reduce Conflict
• But, we may introduce a reduce/reduce conflict during the
shrink process for the creation of the states of a LALR parser.
.
I1 : A → ,a I2: A → ,b.
.
B → ,b B → ,c .
.
I12: A → ,a/b € reduce/reduce conflict
.
B → ,b/c
1/28/2025 Department of Computer Science and Engineering 257
1/28/2025 Dr.T SAJU RAJ 258
Canonical LR or CLR(1) Parsing table
1/28/2025 Department of Computer Science and Engineering 259
1/28/2025 Dr.T SAJU RAJ 260
Merging of States
1/28/2025 Department of Computer Science and Engineering 261
Merging of States (contd.)
1/28/2025 Department of Computer Science and Engineering 262
Canonical Collection of LR(1) items
1/28/2025 Department of Computer Science and Engineering 263
Automatic construction of parsers (YACC)
• Two classical tools for compilers:
– Lex: A Lexical Analyzer Generator
– Yacc: “Yet Another Compiler Compiler” (Parser Generator)
• Lex creates programs that scan your tokens one by one.
• Yacc takes a grammar (sentence structure) and generates a
parser.
Lexical Rules Grammar Rules
Lex Yacc
Input yylex() yyparse() Parsed Input
1/28/2025 Department of Computer Science and Engineering 264
YACC specifications.
• Lex and Yacc generate C code for your analyzer & parser.
Lexical Rules Grammar Rules
C code Lex C code Yacc
Parsed
Input yylex() yyparse()
token Input
char C code stream C code
stream
LexicalAnalyzer Parser
(Tokenizer)
1/28/2025 Department of Computer Science and Engineering 265
Automatic construction of parsers (YACC)
• Often, instead of the standard Lex and Yacc,
Flex and Bison are used:
– Flex: A fast lexical analyzer
– (GNU) Bison: A drop-in replacement for (backwards
compatible with) Yacc
• Byacc is Berkeley implementation of Yacc (so it
is Yacc).
1/28/2025 Department of Computer Science and Engineering 106 266
Automatic construction of parsers (YACC)
• Yacc is not a new tool, and yet, it is still used in many
projects.
• Yacc syntax is similar to Lex/Flex at the top level.
• Lex/Flex rules were regular expression – action pairs.
• Yacc rules are grammar rule – action pairs.
declarations
%%
rules
%%
programs
1/28/2025 Department of Computer Science and Engineering 107 267
Automatic construction of parsers (YACC)
1/28/2025 Department of Computer Science and Engineering 108 268
Automatic construction of parsers (YACC)
• The Translation Rules Part
1/28/2025 Department of Computer Science and Engineering 110 269
Automatic construction of parsers (YACC)
1/28/2025 Department of Computer Science and Engineering 115 270
Design of a syntax analyzer
yacc –d bas.y # create y.tab.h, y.tab.c
lex bas.l # create lex.yy.c
cc lex.yy.c y.tab.c –o bas.exe # compile/link
1/28/2025 Department of Computer Science and Engineering 116 271
Design of a syntax analyzer
1/28/2025 Department of Computer Science and Engineering 120 272
Design of a syntax analyzer
1/28/2025 Department of Computer Science and Engineering 122 273
Design of a syntax analyzer
1/28/2025 Department of Computer Science and Engineering 127 274
Thank You
Department of Computer Science and Engineering