Syntax Analysis
Part I
Position of a Parser/Syntax
Analyzer in the Compiler Model
[Figure: the lexical analyzer reads the source program and, on each "get next token" request, returns a token (token, tokenval) to the parser/syntax analyzer, which produces an intermediate representation for the rest of the front end; both components consult the symbol table. Lexical errors are reported by the lexical analyzer; syntax and semantic errors by the parser.]
The Parser
• A parser implements a C-F grammar as a
recognizer of strings
• The role of the parser in a compiler is twofold:
1. To check syntax (= string recognizer)
• And to report syntax errors accurately
2. To invoke semantic actions
• For static semantics checking, e.g. type checking of
expressions, functions, etc.
• For syntax-directed translation of the source code to an
intermediate representation
Syntax-Directed Translation
• One of the major roles of the parser is to produce
an intermediate representation (IR) of the source
program using syntax-directed translation
methods
• Possible IR output:
– Abstract syntax trees (ASTs)
– Control-flow graphs (CFGs) with triples, three-address
code, or register transfer list notation
– WHIRL (SGI Pro64 compiler) has 5 IR levels!
Error Handling
• A good compiler should assist in identifying and
locating errors
– Lexical errors: important, compiler can easily recover
and continue
– Syntax errors: most important for compiler, can almost
always recover
– Static semantic errors: important, can sometimes
recover
– Dynamic semantic errors: hard or impossible to detect
at compile time, runtime checks are required
– Logical errors: hard or impossible to detect
Viable-Prefix Property
• The viable-prefix property of parsers allows
early detection of syntax errors
– Goal: detection of an error as soon as possible
without further consuming unnecessary input
– How: detect an error as soon as the prefix of the
input does not match a prefix of any string in
the language
Examples (the error is detected at the end of the viable prefix):
… for (;) …            error is detected here
… DO 10 I = 1;0 …      error is detected here
Error Recovery Strategies
• Panic mode
– Discard input until a token in a set of designated
synchronizing tokens is found
• Phrase-level recovery
– Perform local correction on the input to repair the error
• Error productions
– Augment grammar with productions for erroneous
constructs
• Global correction
– Choose a minimal sequence of changes to obtain a
global least-cost correction
Grammars
• Context-free grammar is a 4-tuple
G = (N, T, P, S) where
– T is a finite set of tokens (terminal symbols)
– N is a finite set of nonterminals
– P is a finite set of productions of the form
A → α
where A ∈ N and α ∈ (N∪T)*
– S ∈ N is a designated start symbol
Notational Conventions Used
• Terminals
a,b,c,… ∈ T
specific terminals: 0, 1, id, +
• Nonterminals
A,B,C,… ∈ N
specific nonterminals: expr, term, stmt
• Grammar symbols
X,Y,Z ∈ (N∪T)
• Strings of terminals
u,v,w,x,y,z ∈ T*
• Strings of grammar symbols
α,β,γ ∈ (N∪T)*
Derivations
• The one-step derivation is defined by
αAβ⇒αγβ
where A → γ is a production in the grammar
• In addition, we define
– ⇒ is leftmost ⇒lm if α does not contain a nonterminal
– ⇒ is rightmost ⇒rm if β does not contain a nonterminal
– Transitive closure ⇒* (zero or more steps)
– Positive closure ⇒+ (one or more steps)
• The language generated by G is defined by
L(G) = {w ∈ T* | S ⇒+ w}
Derivation (Example)
Grammar G = ({E}, {+,*,(,),-,id}, P, E) with
productions P =
E → E + E
E → E * E
E → ( E )
E → - E
E → id
Example derivations:
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒ E * E ⇒ E * E + E ⇒ E * E + id ⇒ E * id + id ⇒ id * id + id
Chomsky Hierarchy: Language
Classification
• A grammar G is said to be
– Regular if it is right linear where each production is of
the form
A → w B or A → w
or left linear where each production is of the form
A → B w or A → w
– Context free if each production is of the form
A→α
where A ∈ N and α ∈ (N∪T)*
– Context sensitive if each production is of the form
αAβ→αγβ
where A ∈ N, α,γ,β ∈ (N∪T)*, |γ| > 0
– Unrestricted
Chomsky Hierarchy
L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)
Where L(T) = { L(G) | G is of type T }
That is: the set of all languages
generated by grammars G of type T
Examples:
Every finite language is regular! (construct a FSA for strings in L(G))
L1 = { a^n b^n | n ≥ 1 } is context free
L2 = { a^n b^n c^n | n ≥ 1 } is context sensitive
Parsing
• Universal (any C-F grammar)
– Cocke-Younger-Kasami
– Earley
• Top-down (C-F grammar with restrictions)
– Recursive descent (predictive parsing)
– LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (C-F grammar with restrictions)
– Operator precedence parsing
– LR (Left-to-right, Rightmost derivation) methods
• SLR, canonical LR, LALR
Top-Down Parsing
• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing
Grammar:        Leftmost derivation:
E → T + T       E ⇒lm T + T
T → ( E )         ⇒lm id + T
T → - E           ⇒lm id + id
T → id
[Figure: the parse tree for id + id, built top-down one leftmost expansion at a time.]
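The leftmost derivation above is exactly what a recursive-descent parser performs, with one procedure per nonterminal. A minimal Python sketch (the token representation as a list of strings is an assumption, not from the slides):

```python
# Recursive-descent recognizer for the slide's grammar:
#   E -> T + T
#   T -> ( E ) | - E | id

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):
        nonlocal pos
        if peek() == expected:
            pos += 1
        else:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")

    def E():                       # E -> T + T
        T()
        match('+')
        T()

    def T():                       # T -> ( E ) | - E | id
        if peek() == '(':
            match('('); E(); match(')')
        elif peek() == '-':
            match('-'); E()
        elif peek() == 'id':
            match('id')
        else:
            raise SyntaxError(f"unexpected token {peek()!r}")

    E()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return True
```

Each procedure chooses its production by looking at the next token only, which is why this grammar needs no backtracking.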
Recursive-Descent Parsing
Backtracking
[Figure: after a failed match the parser backtracks and retries, using the second production of A for matching the input string.]
Left Recursion
• Productions of the form
A → A α | β | γ
are left recursive
• When one of the productions in a grammar
is left recursive then a predictive parser
loops forever on certain inputs
Immediate Left-Recursion
Elimination
Rewrite every left-recursive production
A → A α | β | γ | A δ
into right-recursive productions:
A → β A’ | γ A’
A’ → α A’ | δ A’ | ε
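The rewrite above can be done mechanically. A minimal Python sketch (the tuple-of-symbols representation, ε as the empty tuple, and the A' naming are assumptions for illustration):

```python
# Immediate left-recursion elimination for one nonterminal A:
# split A's alternatives into left-recursive ones (A -> A alpha) and the
# rest (A -> beta), then rewrite with a fresh nonterminal A'.

def eliminate_immediate_left_recursion(A, rhss):
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == A]        # the alphas
    other = [rhs for rhs in rhss if not (rhs and rhs[0] == A)]          # the betas
    if not recursive:
        return {A: rhss}                         # nothing to rewrite
    A2 = A + "'"
    return {
        A:  [beta + (A2,) for beta in other],                # A  -> beta A'
        A2: [alpha + (A2,) for alpha in recursive] + [()],   # A' -> alpha A' | eps
    }
```

For example, A → A a | b | c | A d becomes A → b A' | c A' and A' → a A' | d A' | ε.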
A General Systematic Left
Recursion Elimination Method
Input: Grammar G with no cycles or ε-productions
Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
for j = 1, …, i-1 do
replace each
Ai → Aj γ
with
Ai → δ1 γ | δ2 γ | … | δk γ
where
Aj → δ1 | δ2 | … | δk
enddo
eliminate the immediate left recursion in Ai
enddo
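The algorithm above, sketched in Python (the grammar encoding is an assumption: a dict mapping each nonterminal to a list of right-hand-side tuples, with ε as the empty tuple):

```python
# General left-recursion elimination: substitute earlier nonterminals,
# then remove immediate left recursion, in the given nonterminal order.

def eliminate_left_recursion(grammar, order):
    g = {a: list(rhss) for a, rhss in grammar.items()}
    for i, Ai in enumerate(order):
        # replace Ai -> Aj gamma (j < i) with Ai -> delta gamma
        # for each production Aj -> delta
        for Aj in order[:i]:
            new_rhss = []
            for rhs in g[Ai]:
                if rhs and rhs[0] == Aj:
                    new_rhss.extend(delta + rhs[1:] for delta in g[Aj])
                else:
                    new_rhss.append(rhs)
            g[Ai] = new_rhss
        # eliminate the immediate left recursion in Ai
        recursive = [rhs[1:] for rhs in g[Ai] if rhs and rhs[0] == Ai]
        if recursive:
            others = [rhs for rhs in g[Ai] if not (rhs and rhs[0] == Ai)]
            Ai2 = Ai + "'"
            g[Ai] = [beta + (Ai2,) for beta in others]
            g[Ai2] = [alpha + (Ai2,) for alpha in recursive] + [()]
    return g
```

Run on the example from the next slide (S → A a | b, A → A c | S d | ε, order S, A) it produces A → b d A' | A' and A' → c A' | a d A' | ε.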
Example Left Recursion Elim.
S → A a | b
A → A c | S d | ε          Choose arrangement: S, A
i = 1: nothing to do
i = 2, j = 1: A → A c | S d | ε becomes
  A → A c | A a d | b d | ε
Eliminating the immediate left recursion in A:
  A → b d A’ | A’
  A’ → c A’ | a d A’ | ε
Left Factoring
• When a nonterminal has two or more productions
whose right-hand sides start with the same
grammar symbols, the grammar is not LL(1) and
cannot be used for predictive parsing
• Replace productions
A → α β1 | α β2 | … | α βn | γ
with
A → α A’ | γ
A’ → β1 | β2 | … | βn
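One left-factoring pass can be sketched as follows (the representation and the fresh-name scheme A', A'', … are assumptions; a full algorithm would repeat until no two alternatives share a common prefix):

```python
# Left-factor the alternatives of one nonterminal A: group alternatives
# by their first symbol, factor out the longest common prefix of each
# group, and move the differing tails to a fresh nonterminal.

def left_factor(A, rhss):
    groups = {}
    for rhs in rhss:
        groups.setdefault(rhs[:1], []).append(rhs)
    result = {A: []}
    fresh = 0
    for alts in groups.values():
        if len(alts) == 1:
            result[A].append(alts[0])     # nothing shared, keep as-is
            continue
        # longest common prefix alpha of the alternatives in this group
        alpha = alts[0]
        for rhs in alts[1:]:
            n = 0
            while n < min(len(alpha), len(rhs)) and alpha[n] == rhs[n]:
                n += 1
            alpha = alpha[:n]
        fresh += 1
        A2 = A + "'" * fresh
        result[A].append(alpha + (A2,))                 # A  -> alpha A'
        result[A2] = [rhs[len(alpha):] for rhs in alts]  # A' -> beta1 | ... | betan
    return result
```

For example, A → a b | a c | d becomes A → a A' | d with A' → b | c.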
Examples: Left Factoring
Example: Left Recursion
Predictive Parsing: Table-Driven
Parsing
• Given an LL(1) grammar G = (N, T, P, S)
construct a table M[A,a] for A ∈ N, a ∈ T
and use a stack
[Figure: the predictive parsing program reads the input (e.g. a + b $), keeps grammar symbols on a stack (X Y Z … $), consults the parsing table M, and produces output.]
Predictive Parsing Program
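The driver loop can be sketched in a few lines of Python (the table encoding as nested dicts and the toy grammar S → a S | b are assumptions for illustration):

```python
# Table-driven predictive parser: pop a symbol X from the stack;
# if X is a terminal, match it against the input; if X is a
# nonterminal, expand it using the production in M[X, a].

def predictive_parse(tokens, table, start):
    stack = ['$', start]
    tokens = tokens + ['$']
    pos = 0
    output = []                        # productions used, in leftmost order
    while stack:
        X = stack.pop()
        a = tokens[pos]
        if X == a:                     # match terminal (or the end marker $)
            pos += 1
        elif X in table:               # nonterminal: consult M[X, a]
            if a not in table[X]:
                raise SyntaxError(f"no entry M[{X},{a}]")
            rhs = table[X][a]
            output.append((X, rhs))
            stack.extend(reversed(rhs))  # push RHS, leftmost symbol on top
        else:
            raise SyntaxError(f"expected {X!r}, got {a!r}")
    return output

# Toy LL(1) grammar S -> a S | b and its parsing table:
M = {'S': {'a': ('a', 'S'), 'b': ('b',)}}
```

On input a a b the loop emits S → a S, S → a S, S → b, i.e. the leftmost derivation.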
Predictive Parsing
• Eliminate left recursion from grammar
• Left factor the grammar
• Compute FIRST and FOLLOW
• Two variants:
– Recursive (recursive-descent parsing)
– Non-recursive (table-driven parsing)
FIRST set (1)
• FIRST(α) = { the set of terminals that begin
strings derived from α }
FIRST(a) = {a} if a ∈ T
FIRST(ε) = {ε}
FIRST(A) = ∪A→α FIRST(α) for all A → α ∈ P
FIRST(X1X2…Xk) is computed as follows:
for each i = 1, …, k do
  if ε ∈ FIRST(Xj) for all j = 1, …, i-1 then
    add the non-ε symbols of FIRST(Xi) to FIRST(X1X2…Xk)
if ε ∈ FIRST(Xj) for all j = 1, …, k then
  add ε to FIRST(X1X2…Xk)
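Since FIRST sets refer to each other, they are computed by iterating to a fixed point. A Python sketch (the grammar encoding, with ε-productions as empty tuples and the 'eps' marker, is an assumption):

```python
# Fixed-point computation of FIRST for every nonterminal.
# grammar: dict nonterminal -> list of RHS tuples; any symbol that is
# not a key of the dict is treated as a terminal.

EPS = 'eps'

def first_sets(grammar):
    first = {A: set() for A in grammar}

    def first_of_seq(symbols):          # FIRST(X1 X2 ... Xk)
        out = set()
        for X in symbols:
            fx = first[X] if X in grammar else {X}
            out |= fx - {EPS}
            if EPS not in fx:           # Xi cannot vanish: stop here
                return out
        out.add(EPS)                    # every symbol can derive epsilon
        return out

    changed = True
    while changed:                      # repeat until no set grows
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first
```

On the usual expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id this yields FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }, FIRST(E') = { +, ε }, FIRST(T') = { *, ε }.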
FOLLOW set (1.1)
• FOLLOW(A) = { the set of terminals that can
immediately follow nonterminal A in some derivation }
FOLLOW(A) is computed by applying these rules until no FOLLOW set changes:
if A is the start symbol S then
  add $ to FOLLOW(A)
for all (B → α A β) ∈ P do
  add FIRST(β)\{ε} to FOLLOW(A)
for all (B → α A β) ∈ P with ε ∈ FIRST(β) do
  add FOLLOW(B) to FOLLOW(A)
for all (B → α A) ∈ P do
  add FOLLOW(B) to FOLLOW(A)
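These rules also iterate to a fixed point. A Python sketch (grammar encoding as before; the FIRST sets are passed in precomputed, an assumption to keep the block self-contained):

```python
# Fixed-point computation of FOLLOW, given precomputed FIRST sets.
# grammar: dict nonterminal -> list of RHS tuples; first: dict
# nonterminal -> FIRST set; EPS marks epsilon, '$' marks end of input.

EPS = 'eps'

def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add('$')                       # rule for the start symbol

    def first_of_seq(symbols):                   # FIRST of a symbol sequence
        out = set()
        for X in symbols:
            fx = first[X] if X in grammar else {X}
            out |= fx - {EPS}
            if EPS not in fx:
                return out
        out.add(EPS)
        return out

    changed = True
    while changed:
        changed = False
        for B, rhss in grammar.items():
            for rhs in rhss:
                for i, A in enumerate(rhs):
                    if A not in grammar:         # terminals have no FOLLOW
                        continue
                    beta = first_of_seq(rhs[i + 1:])
                    new = beta - {EPS}           # FIRST(beta) \ {eps}
                    if EPS in beta:              # beta can vanish (or is empty)
                        new |= follow[B]
                    if not new <= follow[A]:
                        follow[A] |= new
                        changed = True
    return follow
```

On the expression grammar this gives FOLLOW(E) = FOLLOW(E') = { ), $ }, FOLLOW(T) = FOLLOW(T') = { +, ), $ }, FOLLOW(F) = { +, *, ), $ }, matching the sets used in the error-recovery tables later in these slides.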
Constructing an LL(1) Predictive
Parsing Table
for each production A → α do
for each a ∈ FIRST(α) do
add A → α to M[A,a]
enddo
if ε ∈ FIRST(α) then
for each b ∈ FOLLOW(A) do
add A → α to M[A,b]
enddo
endif
enddo
Mark each undefined entry in M as error
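The construction above in Python (FIRST and FOLLOW are assumed precomputed; undefined entries are simply absent from the dict, and a duplicate entry means the grammar is not LL(1)):

```python
# Build the LL(1) parsing table M[A, a] from FIRST and FOLLOW.
# grammar: dict nonterminal -> list of RHS tuples; EPS marks epsilon.

EPS = 'eps'

def build_table(grammar, first, follow):
    def first_of_seq(symbols):               # FIRST(alpha) for an RHS
        out = set()
        for X in symbols:
            fx = first[X] if X in grammar else {X}
            out |= fx - {EPS}
            if EPS not in fx:
                return out
        out.add(EPS)
        return out

    table = {A: {} for A in grammar}
    for A, rhss in grammar.items():
        for rhs in rhss:
            f = first_of_seq(rhs)
            targets = f - {EPS}              # each a in FIRST(alpha)
            if EPS in f:                     # alpha can vanish:
                targets |= follow[A]         # each b in FOLLOW(A)
            for a in targets:
                if a in table[A]:            # two productions in one cell
                    raise ValueError(f"not LL(1): conflict at M[{A},{a}]")
                table[A][a] = rhs
    return table
```

Every absent entry plays the role of the "error" mark; the driver loop reports a syntax error when it looks one up.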
LL(1) Grammar
• A grammar G is LL(1) if it is not left recursive and
for each collection of productions
A → α1 | α2 | … | αn
for nonterminal A the following holds:
1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
2. if αi ⇒* ε then
   2.a. αj ⇒* ε must not hold for any j ≠ i
        (at most one alternative can derive ε)
   2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all j ≠ i
Non-LL(1) Examples
Grammar             Not LL(1) because:
S → S a | a         left recursive
S → a S | a         FIRST(a S) ∩ FIRST(a) ≠ ∅
S → a R | ε
R → S | ε           for R: both S ⇒* ε and ε ⇒* ε
S → a R a
R → S | ε           for R: FIRST(S) ∩ FOLLOW(R) ≠ ∅
Consider the Grammar: Construct the Predictive Parsing Table
S → i E t S S’ | a
S’ → e S | ε
E→b
LL(1) Grammars are
Unambiguous
A → α               FIRST(α)   FOLLOW(A)
S → i E t S S’      i          e $
S → a               a          e $
S’ → e S            e          e $
S’ → ε              ε          e $
E → b               b          t

Error: duplicate table entry at M[S’, e] — this dangling-else grammar is ambiguous, so it cannot be LL(1):

      a       b       e                    i                t    $
S     S → a                                S → i E t S S’
S’                    S’ → ε, S’ → e S                           S’ → ε
E             E → b
Panic Mode Recovery
Add synchronizing actions to undefined entries based on FOLLOW:
FOLLOW(E) = { ) $ }
FOLLOW(E’) = { ) $ }
FOLLOW(T) = { + ) $ }
FOLLOW(T’) = { + ) $ }
FOLLOW(F) = { + * ) $ }
Pro: can be automated
Con: error messages are still needed

      id         +           *           (         )      $
E     E → T E’                           E → T E’  synch  synch
E’               E’ → + T E’                       E’ → ε E’ → ε
T     T → F T’   synch                   T → F T’  synch  synch
T’               T’ → ε      T’ → * F T’           T’ → ε T’ → ε
F     F → id     synch       synch       F → ( E ) synch  synch

synch: the parser pops the current nonterminal A and skips input until a
synch token is found, or skips input until a token in FIRST(A) is found
Phrase-Level Recovery
Change the input stream by inserting missing tokens
For example: id id is changed into id * id (parsing can then continue)
Pro: can be fully automated
Con: recovery is not always intuitive

      id         +           *           (         )      $
E     E → T E’                           E → T E’  synch  synch
E’               E’ → + T E’                       E’ → ε E’ → ε
T     T → F T’   synch                   T → F T’  synch  synch
T’    insert *   T’ → ε      T’ → * F T’           T’ → ε T’ → ε
F     F → id     synch       synch       F → ( E ) synch  synch

insert *: the parser inserts the missing * and retries the production
Error Productions
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id

Add “error production” T’ → F T’ to ignore a missing *, e.g. in: id id
Pro: powerful recovery method
Con: productions must be added manually

      id          +           *           (         )      $
E     E → T E’                            E → T E’  synch  synch
E’                E’ → + T E’                       E’ → ε E’ → ε
T     T → F T’    synch                   T → F T’  synch  synch
T’    T’ → F T’   T’ → ε      T’ → * F T’           T’ → ε T’ → ε
F     F → id      synch       synch       F → ( E ) synch  synch
Construct the predictive parsing table for the following
grammar and show the stack moves for the input w = "t or f and t"
E → E or T | T
T → T and F | F
F → not F | ( E ) | t | f