MATRUSRI
ENGINEERING COLLEGE
MATRUSRI ENGINEERING COLLEGE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SUBJECT NAME: Compiler Construction
FACULTY NAME: L. Raghavendar Raju, Assistant Professor
25-10-2020 Compiler Construction 1
MATRUSRI
ENGINEERING COLLEGE
Compiler Construction
COURSE OBJECTIVES:
➢ To introduce the steps in language translation pipeline and
runtime data structures used in translation
➢ To learn about Scanning (lexical analysis) process using
regular expressions and use of LEX to generate scanner
➢ To introduce different Parsing strategies including top-
down (e.g., recursive descent, Early parsing, or LL) and
bottom-up (e.g., backtracking or LR) techniques
➢ Describe semantic analyses using an attribute grammar
➢ To learn how to build symbol tables and generate
intermediate code.
➢ To introduce techniques of program analysis and code
25-10-2020
optimization Compiler Construction 2
MATRUSRI
ENGINEERING COLLEGE
Compiler Construction
COURSE OUTCOMES:
After completing this course, the student will be able to
1. Create lexical rules and grammars for a given language
2.Generate scanners and parsers from declarative
specifications.
3. Describe an abstract syntax tree for a small language.
4. Use program analysis techniques for code optimization
5. Develop the compiler for a subset of a given language
25-10-2020 Compiler Construction 3
MATRUSRI
ENGINEERING COLLEGE
UNIT-II
Context Free Grammars & Parsing: The parsing process
Context free grammars
Parse tree &Abstract syntax trees,
EBNF and syntax diagrams, and Properties of CFLs.
Top Down Parsing: Recursive descent parsing, LL (1) parsing,
First and follow sets, Recursive descent parser, and
Error recovery in top down parsers.
OUTCOMES:
1. Generate scanners and parsers from declarative specifications.
2. Describe an abstract syntax tree for a small language.
25-10-2020 Compiler Construction 4
MATRUSRI
ENGINEERING COLLEGE
MODULE-I
CONTENTS:
Context Free Grammars & Parsing:
The parsing process, Context free grammars,
Parse tree &Abstract syntax trees,
EBNF and syntax diagrams, and Properties of CFLs.
OUTCOMES:
Generate scanners and parsers from declarative specifications
25-10-2020 Compiler Construction 5
MATRUSRI
ENGINEERING COLLEGE
The role of parser
token
Source Lexical Parse tree Rest of Front Intermediate
Parser
program Analyzer End representation
getNext
Token
Symbol
table
25-10-2020 Compiler Construction 6
MATRUSRI
ENGINEERING COLLEGE
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
25-10-2020 Compiler Construction 7
MATRUSRI
ENGINEERING COLLEGE
Error handling
• Common programming errors
• Lexical errors
• Syntactic errors
• Semantic errors
• Lexical errors
• Error handler goals
• Report the presence of errors clearly and accurately
• Recover from each error quickly enough to detect subsequent errors
• Add minimal overhead to the processing of correct progrms
25-10-2020 Compiler Construction 8
MATRUSRI
ENGINEERING COLLEGE
Error-recover strategies
• Panic mode recovery
• Discard input symbol one at a time until one of designated set of
synchronization tokens is found
• Phrase level recovery
• Replacing a prefix of remaining input by some string that allows the parser to
continue
• Error productions
• Augment the grammar with productions that generate the erroneous constructs
• Global correction
• Choosing minimal sequence of changes to obtain a globally least-cost
correction
25-10-2020 Compiler Construction 9
MATRUSRI
ENGINEERING COLLEGE
Context free grammars
• Terminals
• Nonterminals expression -> expression + term
• Start symbol expression -> expression – term
• productions expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
25-10-2020 Compiler Construction 10
MATRUSRI
ENGINEERING COLLEGE
Derivations
Rightmost and
leftmost
derivations
Productions are treated • E -> E + E | E * E | -E |
as rewriting rules to (E) | id
generate a string • Derivations for –(id+id)
• E => -E => -(E) => -
(E+E) => -(id+E)=>-
(id+id)
25-10-2020 Compiler Construction 11
MATRUSRI
ENGINEERING COLLEGE
Parse trees
• -(id+id)
• E => -E => -(E) => -(E+E) => -(id+E)=>-
(id+id)
25-10-2020 Compiler Construction 12
MATRUSRI
ENGINEERING COLLEGE
Ambiguity
• For some strings there exist more than one parse tree
• Or more than one leftmost derivation
• Or more than one rightmost derivation
• Example: id+id*id
25-10-2020 Compiler Construction 13
MATRUSRI
ENGINEERING COLLEGE
Elimination of ambiguity
25-10-2020 Compiler Construction 14
MATRUSRI
ENGINEERING COLLEGE
Elimination of ambiguity (cont.)
• Idea:
• A statement appearing between a then and an else must be matched
25-10-2020 Compiler Construction 15
MATRUSRI
ENGINEERING COLLEGE
Elimination of left recursion
• A grammar is left recursive if it has a non-terminal A such that there
is a derivation A=> Aα +
• Top down parsing methods cant handle left-recursive grammars
• A simple rule for direct left recursion elimination:
• For a rule like:
• A -> A α|β
• We may replace it with
• A -> β A’
• A’ -> α A’ | ɛ
25-10-2020 Compiler Construction 16
MATRUSRI
ENGINEERING COLLEGE
Left recursion elimination (cont.)
• There are cases like following
• S -> Aa | b
• A -> Ac | Sd | ɛ
• Left recursion elimination algorithm:
• Arrange the nonterminals in some order A1,A2,…,An.
• For (each i from 1 to n) {
• For (each j from 1 to i-1) {
• Replace each production of the form Ai-> Aj γ by the production Ai -> δ1 γ | δ2 γ | …
|δk γ where Aj-> δ1 | δ2 | … |δk are all current Aj productions
• }
• Eliminate left recursion among the Ai-productions
• }
25-10-2020 Compiler Construction 17
MATRUSRI
ENGINEERING COLLEGE
Left factoring
• Left factoring is a grammar transformation that is useful for producing
a grammar suitable for predictive or top-down parsing.
• Consider following grammar:
• Stmt -> if expr then stmt else stmt
• | if expr then stmt
• On seeing input if it is not clear for the parser which production to use
• We can easily perform left factoring:
• If we have A->αβ1 | αβ2 then we replace it with
• A -> αA’
• A’ -> β1 | β2
25-10-2020 Compiler Construction 18
MATRUSRI
ENGINEERING COLLEGE
Left factoring (cont.)
• Algorithm
• For each non-terminal A, find the longest prefix α common to two or more of
its alternatives. If α<> ɛ, then replace all of A-productions A->αβ1 |αβ2 | … |
αβn | γ by
• A -> αA’ | γ
• A’ -> β1 |β2 | … | βn
• Example:
• S -> I E t S | i E t S e S | a
• E -> b
25-10-2020 Compiler Construction 19
MATRUSRI
ENGINEERING COLLEGE
MODULE-II
CONTENTS:
Top Down Parsing: Recursive descent parsing,
LL (1) parsing,
First and follow sets,
Recursive descent parser, and
Error recovery in top down parsers.
OUTCOMES:
Describe the parsing techniques for given languages
25-10-2020 Compiler Construction 20
MATRUSRI
ENGINEERING COLLEGE
Introduction
• A Top-down parser tries to create a parse tree from the root
towards the leafs scanning input from left to right
• It can be also viewed as finding a leftmost derivation for an
input string
• Example: id+id*id
E -> TE’ E
lm
E
lm
E
lm
E
lm
E
lm
E
E’ -> +TE’ | Ɛ T E’ T E’ T E’ T E’ T E’
T -> FT’
T’ -> *FT’ | Ɛ F T’ F T’ F T’ F T’ + T E’
F -> (E) | id id id Ɛ id Ɛ
25-10-2020 Compiler Construction 21
MATRUSRI
ENGINEERING COLLEGE
Recursive descent parsing
• Consists of a set of procedures, one for each nonterminal
• Execution begins with the procedure for start symbol
• A typical procedure for a non-terminal
void A() {
choose an A-production, A->X1X2..Xk
for (i=1 to k) {
if (Xi is a nonterminal
call procedure Xi();
else if (Xi equals the current input symbol a)
advance the input to the next symbol;
else /* an error has occurred */
}
25-10-2020 } Compiler Construction 22
MATRUSRI
ENGINEERING COLLEGE
Recursive descent parsing (cont)
• General recursive descent may require backtracking
• The previous code needs to be modified to allow
backtracking
• In general form it cant choose an A-production easily.
• So we need to try all alternatives
• If one failed the input pointer needs to be reset and another
alternative should be tried
• Recursive descent parsers cant be used for left-recursive
grammars
25-10-2020 Compiler Construction 23
MATRUSRI
ENGINEERING COLLEGE
Example
S->cAd
A->ab | a Input: cad
S S S
c A d c A d c A d
a b a
25-10-2020 Compiler Construction 24
MATRUSRI
ENGINEERING COLLEGE
First and Follow
• First() is set of terminals that begins strings derived from
*
• If α=>ɛ then is also in First(ɛ)
• In predictive parsing when we have A-> α|β, if First(α) and First(β) are
disjoint sets then we can select appropriate A-production by looking at the
next input
• Follow(A), for any nonterminal A, is set of terminals a that can appear
immediately after A in* some sentential form
• If we have S => αAaβ for some αand βthen a is in Follow(A)
• If A can be the rightmost symbol in some sentential form, then $ is in
Follow(A)
25-10-2020 Compiler Construction 25
MATRUSRI
ENGINEERING COLLEGE
Computing First
• To compute First(X) for all grammar symbols X, apply following rules
until no more* terminals or ɛ can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X->Y1Y2…Yk is a production for some k>=1,
then place a in First(X) if for some i a is in First(Yi) and ɛ is in all of
First(Y1),…,First(Yi-1) that is Y1…Yi-1 => ɛ. if ɛ is in First(Yj) for j=1,…,k
then add ɛ to First(X).
3. If X-> ɛ is a production then add ɛ to First(X)
*
• Example!
25-10-2020 Compiler Construction 26
MATRUSRI
ENGINEERING COLLEGE
Computing follow
• To compute First(A) for all nonterminals A, apply following rules until
nothing can be added to any follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A-> αBβ then everything in First(β) except ɛ is in
Follow(B).
3. If there is a production A->B or a production A->αBβ where First(β)
contains ɛ, then everything in Follow(A) is in Follow(B)
• Example!
25-10-2020 Compiler Construction 27
MATRUSRI
ENGINEERING COLLEGE
LL(1) Grammars
• Predictive parsers are those recursive descent parsers needing no
backtracking
• Grammars for which we can create predictive parsers are called LL(1)
• The first L means scanning input from left to right
• The second L means leftmost derivation
• And 1 stands for using one input symbol for lookahead
• A grammar G is LL(1) if and only if whenever A-> α|β are two distinct
productions of G, the following conditions hold:
• For no terminal a do α and β both derive strings beginning with a
• At most one of* α or βcan derive empty string
• If α=> ɛ then β does not derive any string beginning with a terminal in Follow(A).
25-10-2020 Compiler Construction 28
MATRUSRI
ENGINEERING COLLEGE
Construction of predictive parsing table
• For each production A->α in grammar do the following:
1. For each terminal a in First(α) add A-> in M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A) add A-> ɛ to M[A,b].
If ɛ is in First(α) and $ is in Follow(A), add A-> ɛ to M[A,$] as well
• If after performing the above, there is no production in M[A,a] then
set M[A,a] to error
25-10-2020 Compiler Construction 29
MATRUSRI
ENGINEERING COLLEGE
First
Example Follow
F {(,id} {+, *, ), $}
E -> TE’ {(,id} {+, ), $}
E’ -> +TE’ | Ɛ T
E {(,id} {), $}
T -> FT’
T’ -> *FT’ | Ɛ E’ {+,ɛ} {), $}
T’ {*,ɛ} {+, ), $}
F -> (E) | id
Input Symbol
Non -
terminal id + * ( ) $
E E -> TE’ E -> TE’
E’ E’ -> +TE’ E’ -> Ɛ E’ -> Ɛ
T T -> FT’ T -> FT’
T’ T’ -> Ɛ T’ -> *FT’ T’ -> Ɛ T’ -> Ɛ
25-10-2020 Compiler Construction 30
F F -> id F -> (E)
MATRUSRI
ENGINEERING COLLEGE
Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b
Input Symbol
Non -
terminal a b e i t $
S S -> a S -> iEtSS’
S’ S’ -> Ɛ S’ -> Ɛ
S’ -> eS
E E -> b
25-10-2020 Compiler Construction 31
MATRUSRI
ENGINEERING COLLEGE
Non-recursive predicting parsing
a + b $
Predictive
parsing output
stack X
Y program
Z
$
Parsing
Table
25-10-2020
M
Compiler Construction 32
MATRUSRI
ENGINEERING COLLEGE
Predictive parsing algorithm
Set ip point to the first symbol of w;
Set X to the top stack symbol;
While (X<>$) { /* stack is not empty */
if (X is a) pop the stack and advance ip;
else if (X is a terminal) error();
else if (M[X,a] is an error entry) error();
else if (M[X,a] = X->Y1Y2..Yk) {
output the production X->Y1Y2..Yk;
pop the stack;
push Yk,…,Y2,Y1 on to the stack with Y1 on top;
}
set X to the top stack symbol;
}
25-10-2020 Compiler Construction 33
MATRUSRI
ENGINEERING COLLEGE
Example
• id+id*id$
Matched Stack Input Action
E$ id+id*id$
25-10-2020 Compiler Construction 34
MATRUSRI
ENGINEERING COLLEGE
Error recovery in predictive parsing
• Panic mode
• Place all symbols in Follow(A) into synchronization set for nonterminal A: skip tokens until an
element of Follow(A) is seen and pop A from stack.
• Add to the synchronization set of lower level construct the symbols that begin higher level
constructs
• Add symbols in First(A) to the synchronization set of nonterminal A
• If a nonterminal can generate the empty string then the production deriving can be used as a
default
• If a terminal on top of the stack cannot be matched, pop the terminal, issue a message saying
that the terminal was insterted
25-10-2020 Compiler Construction 35
MATRUSRI
ENGINEERING COLLEGE
Non - Input Symbol
Example terminal
E
id
E -> TE’
+ * ( )
E -> TE’ synch
$
synch
E’ E’ -> +TE’ E’ -> Ɛ E’ -> Ɛ
T -> FT’ synch T -> FT’ synch synch
T
T’ -> Ɛ T’ -> *FT’ T’ -> Ɛ T’ -> Ɛ
T’
F F -> id synch synch F -> (E) synch synch
Stack Input Action
E$ )id*+id$ Error, Skip )
E$ id*+id$ id is in First(E)
TE’$ id*+id$
FT’E’$ id*+id$
idT’E’$ id*+id$
T’E’$ *+id$
*FT’E’$ *+id$
FT’E’$ +id$ Error, M[F,+]=synch
T’E’$ +id$ F has been poped
25-10-2020 Compiler Construction 36