Chapter three
parsing/Syntax Analysis
 COMPILER DESIGN – PHASES OF COMPILER
The compilation process is a sequence of various phases. Each phase takes
input from its previous stage, has its own representation of source program,
and feeds its output to the next phase of the compiler. Let us understand the
phases of a compiler
Parser
The parser is the phase of the compiler which takes a
token string as input and with the help of existing
grammar, converts it into the corresponding Intermediate
Representation.
Comparison with Lexical Analysis
The role of the Parser
Cont….
 Error handling
Common programming errors
   Lexical errors
   parser errors/ Syntactic errors
   Semantic errors
Error handler goals
    Report the presence of errors clearly and accurately
    Recover from each error quickly enough to detect
       subsequent errors.
    Add minimal overhead to the processing of correct
       programs.
 Syntax error/parser error can be detected
  at this level if the input is not in
  accordance with the grammar.
   Parsing or syntactic analysis is the process of analyzing a
    string of symbols, either in natural language or in
    computer languages, conforming to the rules of a formal
    grammar
   It is aided by using techniques based on formal grammar
    of the programming language
   It is analyze (a string or text) into logical syntactic
    components, typically in order to test conformability to a
    logical grammar.
   It takes the token produced by lexical analysis as input
    and generates a parse tree (or syntax tree). In this phase,
    token arrangements are checked against the source code
    grammar, i.e., the parser checks if the expression made by
    the tokens is syntactically correct.
   If the lexical analyzer finds a token invalid, it
    generates an error.
   The lexical analyzer works closely with the syntax
    analyzer. It reads character streams from the source
    code, checks for legal tokens, and passes the data to
    the syntax analyzer when it demands.
   The parser analyzes the source code token stream
    against the production rules to detect any errors in
    the code. The output of this phase is a parse tree.
       This way, the parser accomplishes two tasks, i.e., parsing the
       code, looking for errors and generating a parse tree as the output
       of the phase.
Types of Parser:
Types of Parser:
The parser is mainly classified into two categories, i.e.
Top-down Parser, and Bottom-up Parser. These are
explained below:
1- Top-Down Parser: The top-down parser is the
parser that generates parse for the given input string
with the help of grammar productions by expanding the
non-terminals i.e. it starts from the start symbol and
ends on the terminals. It uses left most derivation.
Further Top-down parser is classified into 2 types:
Recursive descent parser, and Non-recursive descent
parser.
Leftmost Derivation (LMD): If the sentential
form of an input is scanned and replaced from left
to right, it is called left-most derivation.
• The sentential form derived by the left-most
derivation is called the left-sentential form.
Example: Consider the G,
E → E + E | E * E | (E ) | - E | id
Derive the string id + id * id using leftmost
derivation
Recursive descent parser is also known as the
Brute force parser or the backtracking parser.
 It basically generates the parse tree by using
   brute force and backtracking.
Non-recursive descent parser is also known as
LL(1) parser or predictive parser or without
backtracking parser or dynamic parser.
 It uses a parsing table to generate the parse
   tree instead of backtracking.
2- Bottom-up Parser: Bottom-up Parser is the parser that
generates the parse tree for the given input string with the help of
grammar productions by compressing the nonterminal i.e. it starts
from non-terminals and ends on the start symbol.
It uses the reverse of the rightmost derivation. Further Bottom-
up parser is classified into two types: LR parser, and Operator
precedence parser, Shift Reduce Parsers
LR parser is the bottom-up parser that generates the parse tree
for the given string by using unambiguous grammar. It follows the
reverse of the rightmost derivation.
 types of Bottom-up Parser
 LR parser ( a- LR(0) b- SLR(1) c-LALR(1)
d-CLR(1))
 Operator precedence parser.
 Shift Reduce Parsers
LR parser is the bottom-up parser that generates the parse tree for the given
string by using unambiguous grammar. It follows the reverse of the
rightmost derivation.
Why LR parsing:
 • LR parsers can be constructed to recognize virtually all programming-
language constructs for which context-free grammars can be written.
• The LR parsing method is the most general non-backtracking shift-reduce
parsing method known, yet it can be implemented as efficiently as other shift-
reduce methods.
• The class of grammars that can be parsed using LR methods is a proper
subset of the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so
on a left-toright scan of the input.
 • The disadvantage is that it takes too much work to construct an LR parser
by hand for a typical programming-language grammar. But there are lots of
LR parser generators available to make this task easy.
Cont…
LR(0) parser
Also know as LR(k) parsing where "L" stands for left most derivation
scanning of the input. "R" stands for the construction of the right most
derivation in reverse. "K" stands for the number of input symbols of the look
ahead used to make number of parsing decisions.
This is non-recursive shift-reduce, bottom-up parsing.
It is used to parse a large class of grammars therefore making it the most
efficient syntax analysis technique.
SLR(1) (Simple LR Parsing)
It is similar to LR(0) parser but works on a reduced entry. It has few number
of states hence very small table.
 It is has simple and fast construction
Cont….
LALR(1) Parsing(Look-Ahead LR Parsing)
It attempts to reduce the number of states in LR parser by merging similar
states. This reduces the number of states to the same as SLR but retains
some power of LR look- aheads. It is the Look ahead LR parser. Works on
an intermediate size of grammar. It can has the same number of states as
the SLR(1).
CLR(1)(Canonical Look-Ahead LR Parsing)
 Parsing Refers to canonical look ahead. It uses the canonical collection of
LR(1) to construct its parsing table. It has more states compared to SLR
parsing. In this type of parsing the reduced node is replaced only in the look
ahead symbols
Operator precedence parser generates the parse tree form given grammar
and string but the only condition is two consecutive non-terminals and
epsilon never appear on the left and right-hand side of any production.
It is a kind of shift-reduce parsing applied to a small class of operator
grammars. A grammar is said to be operator precedence if;
   No two terminals are adjacent at the right side of production.
   Production does not contain E on its right side.
Operator precedence relations.
p < q --> p has less precedence than q
p > q --> p has more precedence than q
p = q --> p has equal precedence than q Limitations.
    Applies only to small class of grammars.
    An operator like minus can be unary or binary and therefore will have
different precedence in different statements.
Rightmost Derivation (RMD): If we scan and replace the input
with production rules, from right to left, it is known as right-most
derivation.
• The sentential form derived from the right-most derivation is
called the right sentential form.
Example 1: Consider the G, E → E + E | E * E | (E ) | - E | id
Derive the string id + id * id using rightmost derivation.
Cont….
 In bottom-up parsing, Constructing a parse tree for
  an input string beginning at the leaves and going
  towards the root. Bottom up parsers start from the
  sequence of terminal symbols and work their way
  back up to the start symbol by repeatedly replacing
  grammar rules' right hand sides by the
  corresponding non-terminal. This is the reverse of
  the derivation process, and is called "reduction". A
  general type of bottom-up parser is a shift-reduce
  parser.
Cont……
Shift Reduce Parsers
Shift-reduce parsing is a type of bottom -up parsing
that attempts to construct a parse tree for an input
string beginning at the leaves (the bottom) and
working up towards the root (the top).
 parser is sometimes called as Syntax
  Analyze . It constructs the parse tree. It takes
  all the tokens one by one and uses Context
  Free Grammar to construct the parse tree.
Why Grammar ?
 The rules of programming can be entirely
  represented in some few productions. Using
  these productions we can represent what the
  program actually is. The input has to be
  checked whether it is in the desired format or
  not.
Context-Free Grammars
Syntax analysis or parsing is the second phase of a compiler. In this chapter,
we shall learn the basic concepts used in the construction of a parser. We
have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax
of a given sentence due to the limitations of the regular expressions. Regular
expressions cannot check balancing tokens, such as parenthesis. Therefore,
this phase uses context-free grammar CFG, which is recognized by
pushdown automata.
It implies that every Regular Grammar is also context-free, but there exists some
problems, which are beyond the scope of Regular Grammar. CFG is a helpful tool in
describing the syntax of programming languages.
Cont…..
A grammar is a set of rules for putting strings together and so
corresponds to a language.
A context-free grammar is a quadruple (V, Z, P, S) where:
V is a set of variables (also called nonterminals), one of which is
  designated the start variable; It is customary to use upper-case
                                                 letters for variables;
Z (the alphabet) is a finite set of terminal symbols.
P is a finite set of rules( like production rules) ( A X). Where x
is string of variables and terminals S is a distinguished element
of V called the start symbol. The sets V and Z are assumed to
be disjoint.
Cont…..
Context-Free Languages
A language L is context-free IF AND ONLY IF there is a grammar G with L=L(G)
Classes of Grammars
In 1959 Noam Chomsky, a linguist, suggested a way of classifying grammars
according to complexity . The convention used below, and in the remaining chapters,
is that the term string includes the null string and that, in referring to grammars, the
following symbols will have particular meanings
 Generally, A context-free grammar (CFG) consists of a
set of productions that you use to replace a variable by
a string of variables and terminals. The language of a
grammar is the set of strings it generates. A language
is context-free if there is a CFG for it.
        Context free grammars
Example:
Derivation Order
Derivation example
Derivation Trees /parser tree
Derivation Trees /parser tree
Derivation Trees /parser tree
Derivation Trees /parser tree
Derivation Trees /parser tree
In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives original input string. A parse tree depicts associativity and
precedence of operators. The deepest sub-tree is traversed first, therefore the
operator in that sub-tree gets precedence over the operator which is in the parent
nodes
Ambiguity:
A grammar that produces more than one parse for some sentence is said to be
ambiguous grammar. i.e. An ambiguous grammar is one that produce more than one
leftmost or more than one rightmost derivation for the same sentence.
Ambiguous and un ambiguous grammar
thank you