Parsing review
Top-down parsers
start at the root of derivation tree and ll in
choosing production for nonterminal is the key
problem
LL (predictive) parsers use lookahead to choose
production
Bottom-up parsers
start at leaves and ll in
choosing right-hand side (rhs) of production is
the key problem
LR parsers use current stack and lookahead to
choose production
LR(k) parsers are more powerful than LL(k)
parsers because they can see the entire rhs before
choosing a production.
CPSC 434 Lecture 8, Page 1
Some denitions
For a grammar G, with start symbol S , any string
such that S is called a sentential form.
)
If contains only terminal symbols, is a
sentence in L(G).
If contains one or more non-terminals, it is
just a sentential form (not a sentence in L(G)).
A left-sentential form is a sentential form that
occurs in the leftmost derivation of some sentence.
A right-sentential form is a sentential form that
occurs in the rightmost derivation of some sentence.
CPSC 434 Lecture 8, Page 2
Bottom-up parsing
Goal:
Given an input string w and a grammar G,
construct a parse tree by starting at the leaves
and working to the root.
The parser repeatedly matches the right-hand side
rhs of a production against a substring in the
current right-sentential form.
At each match, it applies a reduction to build the
parse tree.
each reduction replaces the matched substring
with the nonterminal on the left-hand side lhs of
the production
each reduction adds an internal node to the
current parse tree
the result is another right-sentential form
The nal result is a rightmost derivation, in reverse.
CPSC 434 Lecture 8, Page 3
Example
Consider the grammar
1 goal ::= a A B
h i h i h i e
2 A ::= A b c
h i h i
3 b j
4 B ::= d
h i
and the input string abbcde.
Prod'n. Sentential Form Handle
| abbcde 3,2
3 a A bcde h i 2,4
2 a A de h i 4,3
4 aA Be h ih i 1,4
1 goal h i |
The trick appears to be scanning the input and
nding valid sentential forms.
CPSC 434 Lecture 8, Page 4
Handles
We trying to nd a substring of the current
right-sentential form where:
matches some production A ::=
reducing to A is one step in the reverse of a
rightmost derivation.
We will call such a string a handle.
Formally,
a handle of a right-sentential form
is a
production A ::= and a position in
where
may be found.
If (A ::= ;k) is a handle, then replacing the
in
at position k with A produces the previous
right-sentential form in a rightmost derivation
of
.
Because
is a right-sentential form, the substring
to the right of a handle contains only terminal
symbols.
CPSC 434 Lecture 8, Page 5
Handles
Provable fact:
If G is unambiguous, then every right-sentential
form has a unique handle.
Proof: (by denition)
1. G is unambiguous rightmost derivation is
)
unique.
2. a unique production A ::= applied to take
)
?1 to
i i
3. a unique position k at which A ::= is
)
applied
4. a handle (A ::= ;k)
)
CPSC 434 Lecture 8, Page 6
Example
The left-recursive expression grammar
(original form, before left factoring)
1 goal
h ::= expr
i h i
2 expr
h ::= expr + term
i h i h i
3 expr - term
j h i h i
4 term j h i
5 term ::= term * factor
h i h i h i
6 term / factor
j h i h i
7 factor j h i
8 factor ::= num
h i
9 j id
Prod'n. Sentential Form Handle
| hgoal i |
1 hexpr i 1,1
3 hexpr - term
i h i 3,3
5 hexpr - term * factor
i h i h i 5,5
9 hexpr - term * id
i h i 9,5
7 hexpr - factor * id
i h i 7,3
8 hexpr - num * id
i 8,3
4 hterm - num * id
i 4,1
7 hfactor - num * id
i 7,1
9 id - num * id 9,1
CPSC 434 Lecture 8, Page 7
Handle-pruning
The process we use to construct a bottom-up parse
is called handle-pruning.
To construct a rightmost derivation
S =
0
1
2
) ) ) )
?1 n )
= w;
n
we set i to n and apply the following simple
algorithm
do i = n
to 1 by -1
(1)
find the handle (A ::= ; k )
i iin i
i
(2)
replace i with A i to generate
?1
i
This takes 2n steps, where n is the length of the
derivation
CPSC 434 Lecture 8, Page 8
Shift-reduce parsing
One scheme to implement a handle-pruning,
bottom-up parser is called a shift-reduce parser.
Shift-reduce parsers use a stack and an input buer
1. initialize stack with $
2. Repeat until the top of the stack is the goal
symbol and the input token is \end of le"
a) nd the handle
if we don't have a handle on top of the stack,
shift an input symbol onto the stack
b) prune the handle
if we have a handle (A ::= ; k) on the stack,
reduce
i) pop symbols o the stack
j j
ii) push A onto the stack
CPSC 434 Lecture 8, Page 9
Back to \x "
- 2 * y
Stack Input Handle Action
$ id - num * id none shift
$id - num * id 9,1 reduce 9
$ factor
h i - num * id 7,1 reduce 7
$ term
h i - num * id 4,1 reduce 4
$ expr
h i - num * id none shift
$ expr -
h i num * id none shift
$ expr -
h i num * id 8,3 reduce 8
$ expr -
h i factor
h i * id 7,3 reduce 7
$ expr -
h i term
h i * id none shift
$ expr -
h i term *
h i id none shift
$ expr -
h i term * id
h i 9,5 reduce 9
$ expr -
h i term * factor
h i h i 5,5 reduce 5
$ expr -
h i term
h i 3,3 reduce 3
$ expr
h i 1,1 reduce 1
$ goal
h i none accept
1. Shift until top of stack is the right end of a
handle
2. Find the left end of the handle and reduce
5 shifts + 9 reduces + 1 accept
CPSC 434 Lecture 8, Page 10
Shift-reduce parsing
Shift-reduce parsers
are simple to understand
have a simple, table-driven, shift-reduce skeleton
encode grammatical knowledge in tables
A shift-reduce parser has just four canonical
actions:
1. shift | next input symbol is shifted onto the
top of the stack
2. reduce | right end of handle is on top of stack;
locate left end of handle within the stack;
pop handle o stack and push appropriate
non-terminal lhs
3. accept | terminate parsing and signal success
4. error | call an error recovery routine
CPSC 434 Lecture 8, Page 11
LR(1) grammars
Informally, we say that a grammar G is LR(1) if,
given a rightmost derivation
S =
0
1
2
) ) ) )
= w;
n
we can, for each right-sentential form in the
derivation,
1. isolate the handle of each right-sentential form,
and
2. determine the production by which to reduce
by scanning
from left to right, going at most 1
i
symbol beyond the right end of the handle of
. i
Formality will come later.
CPSC 434 Lecture 8, Page 12
Why study LR(1) grammars?
LR(1) grammars are often used to construct
shift-reduce parsers.
We call these parsers LR(1) parsers.
everyone's favorite parser (EFP)
virtually all context-free programming language
constructs can be expressed in an LR(1) form
LR grammars are the most general grammars
that can be parsed by a non-backtracking,
shift-reduce parser
ecient shift-reduce parsers can be implemented
for LR(1) grammars
LR parsers detect an error as soon as possible in
a left-to-right scan of the input
LR grammars describe a proper superset of the
languages recognized by LL (predictive) parsers
CPSC 434 Lecture 8, Page 13
Table-driven LR(1) parsing
A table-driven LR(1) parser looks like
stack
6
?
table-
source - scanner - driven - il
code parser
6
?
action
& goto
tables
Stack two items per state: state and symbol
Table building tools are readily available (yacc)
We'll learn how to build these tables by hand!
CPSC 434 Lecture 8, Page 14
LR(1) parsing
The skeleton parser:
token = next token()
repeat forever
s = top of stack
if action[s,token] = "shift s "'
i then
push token
push i s
token = next token()
else if action[s,token] =
"reduce A ::=
" then
pop 2
j
j symbols
s = top of stack
push A
push goto[s, ] A
else if action[s, token] = "accept " then
return
else error()
This takes k shifts, l reduces, and 1 accept, where k
is the length of the input string and l is the length
of the reverse rightmost derivation.
Note: Equivalent to Figure 4.30, Aho, Sethi, and Ullman
CPSC 434 Lecture 8, Page 15
Example tables
ACTION GOTO
id + * $ <expr> <term> <factor>
S0 s4 | | | 1 2 3
S1 | | | acc | | |
S2 | s5 | r3 | | |
S3 | r5 s6 r5 | | |
S4 | r6 r6 r6 | | |
S5 s4 | | | 7 2 3
S6 s4 | | | | 8 3
S7 | | | r2 | | |
S8 | r4 | r4 | | |
The Grammar
1 <goal> ::= <expr>
2 <expr> ::= <term> + <expr>
3 <term>
j
4 <term> ::= <factor> * <term>
5 <factor>
j
6 <factor> ::= id
CPSC 434 Lecture 8, Page 16