LR PARSERS
This section presents an efficient, bottom-up syntax analysis technique that
can be used to parse a large class of context-free grammars, The technique is
callcd LRtk) parsing; the "L" is for left-to-right scanning of the input, the
"R" for constructing a rightmost derivahn in reverse, and the k for the
number of input symbds of lookahead that arc used in making parsing deci-
sions. When (k) is omitted, k is assumed to be 1. LR 'parsing is attractive for
a variety of reasons.
LR parsers can be constructed to recognize virtually all programming-
language construm for which context-free grammars can be wrilten.
The LR parsing method is the most gcneral nonbxktracking shi ft-reduce
parsing methud known, yct it can be implemented as efficiently as other
shift-reduce methds.
The class of grammars that can be parsed using LR methods is a proper
superset of the class of grammars that can be p.~rsed with predidive
parsers. J
An LR parser can detect a syntactic error as soon as it is p.sib\c tu ciu rn
on a kft-to-right scan of the input.
The principal drawback of the merhod is that it is too much work to con-.
struct an LR parser by Rand for a typical programming-language grammar.
One needs a specialized tool - an LR parser generator. Fortunately. many
such generators are available. and we shall discuss the design and use of one.
Yacc, in Section 4.9. Wiih such a generator, one can write a context-free
grammar and have the generator automatically produce a parser for that
grammar. If the grammar contains ambiguities or other constructs that are
difficult to parse in a left-to-right scan of the input, then the parser generator
can locate thcw constructs and inform the compiler designer of their presence.
Aftcr discussing thc operation of an LR parser, we present three techniques
for constructing an LR parsing table for a grammar. The first method, called
simple LR (SLR for short), is the easiest tu implement, but the least powerful
of the three. It may fail to produce a parsing table for certain grammars on
which the other methds succeed. The second method, called canonical LR,
is the most powerful, and the most expensive. The third method, called Id-
ahead LR ILALR for short ), is intermediate in power and cost between the
uthcr two. The LALR method will work on most programming-language
grammars and, with some.eff~rt, can be implemented efficiently+ Some tech-
niques for compressing the size of an LR parsing table arc considered later in
this scction.
The LR Paning Al~rithm
The whematic form of an LR parw is shown in Fig+ 4.29. It consists of an
input, an output, a stack, a driver program, and a parsing tabk that has two
paris (artion and goto). The driver program is thc same for all LR pnrsers;
only the parsing table changes from one parser to another. The parsing pro-
gram reads characters from an input buffer one at a time. The program uses
a sqack to store a string of the form .r,,X ,a ,Xzs2 . + + X,S,~, where s, is on
top. Each Xi is a grammar symbol and each si is a symbol called a stu~c.,
Each state symbol summarizcs the information contained in the stack below it,
and the mmbina~ion of the state symbt an top of the stack and the current
input symbol art used to index the parsing table and determine the shift-
reduce parsing decision. In an implernentrrthn, the grammar symbols need
not appear on the stack; however. we shall always include them in our discus-
sions to help explain the khavior of an LR parser.
Thc parsing table consists of two perk, a parsing uction function ~~~firvr and
a goto function ptn. The program driving the LR parser behaves as fdlows,
It determines s,, the stale currently on top uf the stack, and u;, the current
input symbol. It then wnsults ucrionls,, ail, the parsing action table enlry For
state s,,, and input dl,, which can have one of four values:
I. shift s, where s is a state,
2. reduce by a grammar prduction A - p.
3, accept + and
4. error.
Canonical Collection of LR(0) items
An LR (0) item is a production G with dot at some position on the right side of the
production.
LR(0) items is useful to indicate that how much of the input has been scanned up to a
given point in the process of parsing.
In the LR (0), we place the reduce node in the entire row.
Example
Given grammar:
1. S → AA
2. A → aA | b
Add Augment Production and insert '•' symbol at the first position for every production in
G
1. S` → •S
2. S → •AA
3. A → •aA
4. A → •b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S
S → •AA
Add all productions starting with "A" in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S
S → •AA
A → •aA
A → •b
I1= Go to (I0, S) = closure (S` → S•) = S` → S•
Here, the Production is reduced so close the State.
I1= S` → S•
I2= Go to (I0, A) = closure (S → A•A)
Add all productions starting with A in to I2 State because "•" is followed by the non-
terminal. So, the I2 State becomes
I2 =S→A•A
A → •aA
A → •b
Go to (I2,a) = Closure (A → a•A) = (same as I3)
Go to (I2, b) = Closure (A → b•) = (same as I4)
I3= Go to (I0,a) = Closure (A → a•A)
Add productions starting with A in I3.
A → a•A
A → •aA
A → •b
Go to (I3, a) = Closure (A → a•A) = (same as I3)
Go to (I3, b) = Closure (A → b•) = (same as I4)
I4= Go to (I0, b) = closure (A → b•) = A → b•
I5= Go to (I2, A) = Closure (S → AA•) = SA → A•
I6= Go to (I3, A) = Closure (A → aA•) = A → aA•
Drawing DFA:
The DFA contains the 7 states I0 to I6.
LR(0) Table
o If a state is going to some other state on a terminal then it correspond to a shift
move.
o If a state is going to some other state on a variable then it correspond to go to
move.
o If a state contain the final item in the particular row then write the reduce node
completely.
Explanation:
o I0 on S is going to I1 so write it as 1.
o I0 on A is going to I2 so write it as 2.
o I2 on A is going to I5 so write it as 5.
o I3 on A is going to I6 so write it as 6.
o I0, I2and I3on a are going to I3 so write it as S3 which means that shift 3.
o I0, I2 and I3 on b are going to I4 so write it as S4 which means that shift 4.
o I4, I5 and I6 all states contains the final item because they contain • in the right
most end. So rate the production as production number.