Compiler Design 3
Compiler Design 3
5. Semantic analysis
                                              2
              Syntax Analysis & Parsing
          Just to recapitulate the definitions:
1. Syn-tax
   The way in which words are stringed together to form
   phrases, clauses and sentences
2. Syntax analysis:
   The task concerned with fitting a sequence of
   tokens into a specified syntax.
3. Parsing:
   To break a sentence down into its component parts of
   speech with
            . an explanation of the form, function, and
   syntactical relationship of each part
                                                          3
                      Syntax Analysis
 The step two of Front end in Compilation
 Every programming Language has RULES that prescribe
  the syntactic structure (i.e. the way in which words and
  phrases can be organized to form sentences.)
 These rules can be formally specified using :
      CONTEXT FREE GRAMMARS
          ( or GRAMMAR for Short)
                           A.k.a
             BACKUS – NAUR FORM ( BNF)
                                                        4
      Intro to Context Free Grammar (CFG)
 Essential formalism for describing the structure of the
  programs in a programming Language
 A recipe for constructing elements of a set of strings of
  symbols ( or tokens)
 Consists of a set production rules and a start symbol
 Each production rule:
   – Defines a named syntactic construct using Two parts
       1. An LHS (with a syntactic Construct) and
       2. An RHS (showing the possible form of the construct)
   – Connected by an Arrow symbol ( meaning “ is defined as” )
        sentence     → subject predicate ‘.’
                                                                5
             Importance of the Grammar
 Grammar offers significant advantages to both
   language Designers and Compiler writers:
1. Gives precise, yet easy-to-understand, syntactic
   specification of Programming Languages
2. Helps construct efficient parsers automatically ( not in
   all cases – of course)
3. Provides a good structure to the language which in
   turn helps generate correct object code
4. Makes language open ended easily amenable to add
   additional constructs at a later date
                                                          6
      Context Free Grammar – Formal Definition
A context-free grammar G = (T, N, S, P) consists of:
1. T, a set of terminals (scanner tokens - symbols that
   may not appear on the left side of rule,).
2. N, a set of nonterminals (syntactic variables generated
   by productions – symbols on left or right of rule).
3. S, a designated start nonterminal.
4. P, a set of productions (Rules). Each production has
   the form, A::=  , where A is a nonterminal and  is a
   sentential form , i.e., a string of zero or more grammar
   symbols (terminals / nonterminals)
                                                              7
                    CFG – An Example
 A grammar derives strings by
1. Beginning with the Start Symbol and
2. Repeatedly replacing a nonterminal by the right side
   of the production rule for that nonterminal
Example:
 expr  expr op expr  The terminal symbols are
 expr  ( expr )                id, num, (, ), +, –, *, /, 
 expr  – expr                 The non-terminal symbols
 expr  id | num                are expr op
 op  [ +, –, *, /,  ]  The start symbol is expr
                                                            8
                 CFG – Another Example
                                                              9
                    Notational Conventions
1. These Symbols are terminals
    1.   Lower case letters early in the alphabet ( like a, b, c)
    2.   Operator symbols such as +, - etc..
    3.   Punctuation symbols like ,(comma), (, ), etc
    4.   The digits 0, 1, ………..9
    5.   Boldface strings such as if, id etc…
2. These Symbols are nonterminals
    1. Uppercase letters early in the alphabet ( like A, B, C)
    2. The letter S when it appears, is usually the Start Symbol
3. Lower case letters late in the alphabet chiefly u, v, ….
   ……. z represent string of terminals
                                             ………. Contd.    10
                 Notational Conventions
4. Uppercase letters late in the alphabet such as X, Y, Z
   represent grammar symbols that is either terminal or
   nonterminal
5. Lower case Greek letters α, β, γ represent strings of
   grammar symbols
6. If A → α1 ,, A → α2 …….. A → αn are all Productions
   with A on the LHS ( known as A-productions), then, A
   → α1 | α2 …| αn (α1 , α2 … αn are alternatives of A)
7. Unless otherwise stated , the LHS of the first production
   is the start nonterminal s
                                                            11
     Use of Conventional Notation – Example
 Remember the previous Example:
               expr  expr op expr
               expr  ( expr )
               expr  - expr
               expr  id | num
               op  [ +, -, *, /,  ]
 Using the notations described so far, this grammar can
  be written as:
               E → E A E | ( E ) | id | num
               A→+|–|*|↑
                                                      12
  Grammar for a Tiny Language
 program ::= statement ; Statement
 statement ::= assignStmt | ifStmt
 assignStmt ::= id = expr ;
 ifStmt ::= if ( expr ) stmt
 expr ::= id | int | expr + expr
 Id ::= a | b | c | i | j | k | n | x | y | z
 int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
                                                  13
      Regular Expressions and Grammars
 EVERY construct that can be described by an RE can
   also be described by a grammar
 Example : RE ( a |b )* abb can be described as:
A0 → aA0 | bA0 | aA1 A1 → bA2 A2 → bA3 A3 → ε
 Then why use RE for Lex & CFG for syntax analysis ?
   – Lexical rules are frequently quite simple & we do not need
     powerful grammars to describe them
   – RE’s are more concise and easy to understand for tokens
   – RE’s provide for automatic construction of efficient Lexical
     Analyzers
                                                                    14
                           Derivation
 A way of showing how an input sentence is recognized
  with a grammar                 Now let us derive a structure
                                       ctr + ( a – 5) / 2
 Can also be thought of as a type of algebraic substitution
  using the grammar                         expr
 Remember the grammar:                 expr op expr
                                        expr + expr
1.       expr  expr op expr        Expr + expr op exr
2.       expr  ( expr )             expr + expr / expr
3.       expr  - expr               expr + expr / num
4.       expr  id | num            expr + (expr) / num
                                expr + (expr – expr) / num
5.       op  [ +, -, *, /,  ]
                                id = id + ( id – num) / num 15
           Derivations – Some definitions
 Uses productions to derive more complex Expressions
   The symbol:
   –     >       denotes “ derives in one step”.
   – *   > denotes “ derives in zero or more steps”( α *   >β )
   – +     denotes “ derives in one or more steps” ( α +   >β )
         >
 Given the grammar G with start symbol S, a string ω is
  part of the language L (G), If and only if S * > ω
 Further, if S * > α and α contains
   – Only terminals, then it is a – Sentence of G
   – nonterminals too, then it is a – Sentential form of G
                                                            16
              Derivation – Another Example
  Consider the following Grammar a simple ; terminated
   expression consisting of interspersed numbers and +
1: stmt → expr ‘;’             Stmt    > expr ;
2: expr → factor ‘ +’ factor
                                       > factor + factor ;
3:           | factor                  > Number + factor ;
4: factor → number
 Let us try and come up with          > Number + Number ;
  a derivation to prove that an  The process is called a
  input sentence (like 1 + 4 ;)   Leftmost Derivation because
  can be recognized from this     the left most nonterminal is
  grammar                         always replaced
                                                            17
                    Leftmost Derivation
 Start with the topmost symbol in the grammar
 Replace the leftmost nonterminal in the partially parsed
  sentence with the equivalent production’s RHS in each
  step
 The operator L > signifies left derivation
       L
 X      > Y means X derives Y by leftmost Derivation
 Each of the step (Partial Derivation) is called Viable
  Prefix
 Portion of the viable prefix that is replaced at each step is
  called the Handle
                                                            18
Leftmost Derivation Example
                              19
Leftmost Derivation Example
                              20
                    Rightmost Derivation
 Replace the rightmost nonterminal in the partially parsed
                R
   sentence         >   may signify Right Derivation
 The operator
      R
X        > Y means X derives Y by rightmost Derivation
 It is possible to start with the start symbol and do
   rightmost derivation
                                                          21
Rightmost Derivation – Another Example
                                     22
Rightmost Derivation – Another Example
                                     23
                            Parse Tree
 WHAT ?
  A graphical representation for a derivation sequence showing
  how to derive the string of a language from grammar starting
  from Start symbol
 WHY ?
   Each stage of the compiler has two purposes:
    – Detect and filter out some class of errors
    – Compute some new information or translate the
      representation of the program to make things easier for later
      stages
   Recursive structure of tree suits recursive structure of
   language definition
                                                                 25
                     Parse Tree (Contd.)
 HOW ?
  Tree nodes represent symbols of the grammar (nonterminals
  or terminals) and tree edges represent derivation steps
 Example:
  Given the production:                           expr
  expr  expr op expr
 The corresponding Parse tree is :
 It’s Properties are:
                                             expr      op   expr
   1. The root is labeled by a start symbol
   2. Each leaf is labeled by a token or by ε
   3. Each interior node is labeled by a nonterminal
                                                              26
                       Parse Tree Example
 Remember the grammar:                  The corresponding
                                            Parse tree is
       expr  expr
        expr →  expr  op expr
                   + expr
                                                expr
      expr
    expr     +( expr
         → expr  expr/ )expr
      expr
   expr     + -(expr)
        → expr   expr / expr            expr     +     expr
       expr 
expr →expr       id,–num
            + (expr   expr) / expr      Id     (expr ) /      Expr
       opid                           (ctr)
    id →   + ( id[ –+,num)
                       -, *, /,/ num
                                  ]                          num
                                           expr – expr
 And the String derived from it                              (2)
      ctr + ( a – 5) / 2                   Id          num
                                          (a)           (5)
 Using the derivation
                                                               27
          Parse Tree Example (Continued)
 the Yield of the tree
                                               expr (ctr)
   – The string derived or
     generated from the
                                       expr     +     expr
     nonterminal at the root of the
     tree                              Id     (expr ) /      Expr
   – Obtained by reading the          (ctr)
     leaves of the parse tree read                           num
                                          expr – expr
     from left to right                                      (2)
 In this Example                         Id          num
                                         (a)           (5)
 ctr = ctr + (a – 5) / 2
                                                              28
Parse Tree – Another Example
                               29
                  Another Derivation Example
Given the grammar : E  E + E | E  E | ( E ) | - E | num
      Find a derivation for the expression: 6 + 2  4
   E                E               E               E
              E     +   E       E   +   E       E   +   E
6 + (2 * 4 ) = 14                   E      E   6   E      E
E  E E  E E  E
(6 + 2) * 4 = 32 E + E E + E 4
                                                6       2
According to above grammar BOTH are                             30
                   Another Derivation Example                                   t
                                                                               u
         Given the grammar : E  E + E | E  E | ( Ey )in|p- E | id
                                                                        an
           Find a derivation for the expression: e6fo+r 2  4
                                                                t r e
     E                   E                        E
                                                          r s e           E
                                                        pa r.
                    E +         E           E +      Ee
                                                     n      m  a       E +   E
                                                  n o am
6 + (2 * 4 ) = 14                             haE sgr E
                                           e t                         6 E  E
                                         or uou
            Which derivation         s m tree
                                            b ig is                       2       4
                                u c e am
     E                   E d
                             o         correct?
                                       n          E
                         p  r e      a                                      E
                     a t       o b
                r thE    
                           d  t E           E  E                        E  E
            m a       s a i
                    s
 (6 +gra2)m * 4e=i 32                   E + E                         E + E 4
  A tenc
     s en                                                             6     2
According       to above grammar BOTH are                                           31
                 Ambiguity in Grammar
 A grammar that produces more than one parse tree for
                   any input sentence
   – Happens because the derivation can be either
              Leftmost or Rightmost
 Can also happen because of the way the grammar is
  defined
 Classic example — the if-then-else problem
   – Stmt → if Expr then Stmt
   –                 | if Expr then Stmt else Stmt
   –                 | … other stmts …
                                                      32
                             Stmt → If Expr then Stmt
      Ambiguous IF                 | if Expr then Stmt else Stmt
                                   | … other stmts
 This sentential form has two derivations
 if Expr1 then if Expr2 then Stmt1 else Stmt2
                                                             33
                             Stmt → If Expr then Stmt
      Ambiguous IF                     | if Expr then Stmt else Stmt
                                       | … other stmts
 This sentential form has two derivations
 if Expr1 then if Expr2 then Stmt1 else Stmt2
                                                                34
 Removing the ambiguity
1. Add precedence ; for Example
              This grammar is slightly larger
              Takes more rewriting to reach
               some of the terminal symbols
              Encodes expected precedence
              Produces same parse tree
              under leftmost & rightmost
               derivations
              Let’s see how it parses x - 2 * y
                                             35
Removing the ambiguity (Contd.)
                                  36
         Removing the ambiguity (Contd.)
                                   me    s
                                                                           s a ode
                                                                       t he enc
                                                                 i v e itly
                                                             s g l ic
                                                        i on        e x p
                                                     a t        d
                                                r iv an
                                             d e t ly
                                         s t ec                   e
                                     m o       d i r          n c
                                   t                        e
                            ir gh mar eced
                          nd am pr
                      t a       g r ed
                    os he              s i r
                f tm et             d e
             le       u s he
         t he eca             t
   o th n, b
 B sio
     re s
ex p
                                                                                     37
          Removing the ambiguity (Contd.)
2. Rewrite the grammar to avoid generating the problem
3. Example      The case of If then if then else :
Match each else to innermost unmatched if (common sense rule)
                                                            38
Removing the ambiguity (Contd.)
                                  40
           Removing the ambiguity (Contd.)
                                        ng                                              rt i
                                                                                 s   ta
                                                                             n s
                                                                      c t i o
                                                                   d u g.
                                                               pro torin
                                                       i pl e     fa c
                                                   ul t e ft
                                                 m        d  l
                                            a te alle
                                      m  i n      c
                                 e  i
                                   l ni         s
                            r to        o k e
                        m a       e   t
                  ra  m sa      m
               a g he
             g       h t
    r i t i n    wi t
   w
Re
                                                                                               41
                 Ambiguity - A Summary
 Ambiguity arises from two distinct sources
    – Confusion in the context-free syntax (if-then-else)
    – Confusion that requires context to resolve (overloading)
 Resolving ambiguity
    – To remove context-free ambiguity, rewrite the grammar
    – To handle context-sensitive ambiguity use precedence
         This is a language design problem
 Sometimes, the compiler writer accepts an ambiguous
  grammar
 Parsing techniques must “do the right thing” (i.e.,
  always select the same derivation)
                                                                 42
                  Recursion In Grammars
 Very Common in grammars
 Generally 3 types of recursions are Possible:
    1. A ::= Ab (Left Recursion)
    2. B ::= cB (Right Recursion)
    3. C ::= dCf (Middle Recursion or Self-embedding)
 Facts:
 If a grammar contains no middle recursion then the
  language it generates is regular.
 If there is no recursion at all in a grammar, then it is
  finite and therefore regular.
                                                             43
                      Left Recursion
 Consider E  E + T | T A top-down parser might loop
 the      T  T  F | F forever when parsing an
 grammar: F  ( E ) | id expression using this grammar
  E        E                 E                 E
        E + T            E + T             E + T
                     E   +   T         E   +   T
                                   E   +   T
 A grammar that has at least one production of the form
          A  A is a left recursive grammar
Top Down parsing cannot handle left-recursive grammars
                                                         44
                Eliminating left Recursion
 First group the A productions as        A A1 | A2| …..|
  Am|1 |2 |.. n |
  Where no i begins with an A.
 Then replace the A-productions by A 1 A’|2 A’|……
  n A’|
     A’ 1 A’| 2 A’|… n A’| 
 This eliminates all immediate left recursions from A and A’
  productions
                              BUT
 It does not eliminate left recursions involving derivations of
  two or more steps
                                                             45
                Left Recursion ( Contd.)
 Left-recursion can often be eliminated by rewriting the
  grammar
                                  EE+T|T
   This left-recursive grammar: T  T  F | F
                                  F  ( E ) | id
Can be re-written to eliminate the immediate left recursion:
                      E  TE’
                      E’  +TE’ | 
                      T  FT’
                      T’  FT’ | 
                      F  ( E ) | id                      46
       Properties of Grammars – A Summary
 Epsilon Productions
   – Has at least one production of the form “E → ε ”,
 Right( / Left) Linear Grammar:
   – Grammars where all the RHS of all productions has at most
     one non-terminal and that nonterminal appears rightmost
     (/ Leftmost.)
   – E → x E” or “E → x”.
 Left-Recursive Grammar:
    – A grammar that can generate “E → E + X” (for example).
 Similarly, “Right-recursive grammars.”
 Ambiguous Grammar
    – More than one parse tree is possible for a specific sentence47.
                  Limitations of CFG
 CFGs cannot express context conditions
 For Example:
1. Every name must be declared before it is used
2. The operands of an expression must have
   compatible types
 Possible solutions
 Use context-sensitive grammars
   – Too complicated
 Check context conditions during semantic analysis
                                                   48
                     Syntax Analysis
                  Problem Statement:
 To find a derivation sequence in a grammar G for the
  input token stream
                            or
 To conclude that none exists
                         Solution
 Build parse tree for the string ( of tokens) based on
  grammar rules
 The process is known as P A R S I N G
                                                          49
                            PARSING
    Constructs the syntax tree for a given sequence of
    tokens using grammar rules and productions such
    that:
    1. The top node is labeled with the start symbol of the
       grammar
    2. Leaf nodes are labeled with terminals and inner nodes
       with nonterminals
    3. the children of the inner node labeled N correspond to the
       members of an alternative of N, in the same order as they
       occur in that alternative
    4. The terminals labeling the leaf nodes correspond to the
       sequence of tokens, in the same order they occur in the
       input
                                                ………. Contd.
                                                         50
                    The Role of a Parser
         Lexical                                     Rest of the
         analysis
                               PARSER
                                                      front end
                        Symbol Table
 The parser continues the process of translation and
  validation started by the Lexical Analyzer:
    1. Analyzing the phrase structure of the program,
    2. Adding hierarchical (tree) structure to the flat stream of
       tokens produced by the lexical Analyzer,
    3. Outputting a data structure conveying the program's
       syntax to subsequent compile phases, AND
    4. Reporting errors when the program does not match the
       syntactic structure of the source language
                                                                    51
                       Parsing Methods
 Three Types of parsing methods in Vogue
1. Universal Parsing method
   – Too inefficient to use in any practical compilers ( hence not
     discussed any further)
   – EX: Coocke–Younger–Kasami (CYK) algorithm
2. Top – Down Parsing
   – Can be generated automatically or written manually
   – Ex: : Left-to-Right –Top –Down Parser ( A.k.a. LL parser)
3. Bottom – Up parsing
   – Can only be generated automatically
   – Ex : Left-to-Right –Bottom-Up Parser ( A.k.a. LR Parser)
                                                                 52
       Top – Down & Bottom – Up Methods
 While they are efficient;
 They can work only on restricted classes of Grammars,
              ( such as LL and LR Grammars )
                            BUT
 They are expressive enough to describe most syntactic
  constructs in Programming Languages.
                             SO
 Care must be taken while defining the grammar for the
  language to make sure that the grammar is
                       unambiguous
                                                          53
                     TOP – DOWN Parsing
 An attempt to construct a parse tree for an input string
  from the root and creating the nodes in PRE-ORDER (i.e.
  the top of the tree is constructed before its nodes)
 Can also be viewed as an attempt to find a leftmost
  derivation for an input string
 The way it works:
  1. The process starts at root node say N
  2. Repeat until the fringe of the parse tree matches the input string
      1. Determine the correct alternative for N The key Step
     2. Parser then proceeds to construct the left child of N
     3. The process of determining the correct alternative for the leftmost
        child continues till the left most child is a terminal
                                                                              54
                Top Down parsing example
 String to be parsed :array [ num dotdot num] of integer
                                     type
                             Predictive Parsing                             E
STACK:   E
         T
         E’
         $                       Program                               T        E’
         $
                 INPUT:       id + id  id $                           OUTPUT:
                                                                            E
   STACK:                                                            T                  E’
                             Predictive Parsing
     TE                          Program
      $
     E’
      $
                    NON-                          INPUT SYMBOL
              NON-                      INPUT SYMBOL
                  TERMINAL
            TERMINAL    id      id +        +*         (*      ) (    $    )      $
PARSING        E     E  TE’ E  TE’
                     E                             E  TE’    E  TE’
               E’    E’       E’  +TE’E’  +TE’            E’   E’ E’   E’  
 TABLE:        T     T
                     T  FT’ T  FT’               T  FT’     T  FT’
               T’                T’    T’  *FT’          T’   T’  
                     T’                   T’     T’  *FT’            T’   T’  
               F      F  id                        F  (E)
                      F       F  id                         F  (E)
                                                                                             63
                       A Predictive Parser
   The Working of the Table driven Predictive Parser
            INPUT:       id + id  id $                             OUTPUT:
                                                                         E
   STACK:                                                       T                E’
                        Predictive Parsing
     FT                     Program                         F         T’
     T’
     E’
     E’
      $
      $        NON-                           INPUT SYMBOL
              NON-                       INPUT SYMBOL
             TERMINAL
            TERMINAL   id id       +   +      *   *    (    ( )        )$      $
PARSING        EE    EETE’TE’                          E  TE’
                                                    E  TE’
               E’ E’          E’ E’+TE’
                                      +TE’                 E’   E’E’ E’  
 TABLE:        TT    TTFT’ FT’                    T  FT’
                                                         T  FT’
               T’                T’     T’  *FT’         T’   T’  
                  T’                 T’     T’  *FT’            T’   T’  
               F     F  id                         F  (E)
                 F       F  id                           F  (E)
                                                                                      64
                      A Predictive Parser
 The Working of the Table driven Predictive Parser
Action when Top(Stack) = input  $ : Pop stack, advance input.
           INPUT:       id + id  id $                              OUTPUT:
                                                                          E
 STACK:                                                         T               E’
                      Predictive Parsing
   id
    F                     Program                           F        T’
   T’
   E’                                                      id
    $         NON-                           INPUT SYMBOL
             NON-                       INPUT SYMBOL
            TERMINAL
           TERMINAL   id id       +   +      *   *    (    ( )        )$      $
PARSING       EE    EETE’TE’                          E  TE’
                                                   E  TE’
              E’ E’          E’ E’+TE’
                                     +TE’                 E’   E’E’ E’  
 TABLE:       TT    TTFT’ FT’                    T  FT’
                                                        T  FT’
              T’                T’     T’  *FT’         T’   T’  
                 T’                 T’     T’  *FT’            T’   T’  
              F     F  id                         F  (E)
                F       F  id                           F  (E)
                                                                                     65
                     A Predictive Parser
 The Working of the Table driven Predictive Parser
          Action when Top(Stack) = ε : Pop stack
          INPUT:       id + id  id $                                   OUTPUT:
                                                                         E
STACK:                                                         T               E’
                     Predictive Parsing
  ε
  T’                     Program                           F        T’
  E’
  $                                                       id        ε
             NON-                           INPUT SYMBOL
            NON-                       INPUT SYMBOL
           TERMINAL
          TERMINAL   id id       +   +      *   *    (    ( )        )$      $
PARSING      EE    EETE’TE’                          E  TE’
                                                  E  TE’
             E’ E’          E’ E’+TE’
                                    +TE’                 E’   E’E’ E’  
 TABLE:      TT    TTFT’ FT’                    T  FT’
                                                       T  FT’
             T’                T’     T’  *FT’         T’   T’  
                T’                 T’     T’  *FT’            T’   T’  
             F     F  id                         F  (E)
               F       F  id                           F  (E)
                                                                                    66
                      A Predictive Parser
 The Working of the Table driven Predictive Parser
Action when Top(Stack) = input  $ : Pop stack, advance input.
           INPUT:       id + id  id $                              OUTPUT:
                                                                          E
 STACK:                                                         T                 E’
                      Predictive Parsing
   E’
   +                                                        F        T’       +   T E’
                          Program
   T$
   E’                                                      id         ε
   $          NON-                           INPUT SYMBOL
             NON-                       INPUT SYMBOL
            TERMINAL
           TERMINAL   id id       +   +      *   *    (    ( )        )$      $
PARSING       EE    EETE’TE’                          E  TE’
                                                   E  TE’
              E’ E’          E’ E’+TE’
                                     +TE’                 E’   E’E’ E’  
 TABLE:       TT    TTFT’ FT’                    T  FT’
                                                        T  FT’
              T’                T’     T’  *FT’         T’   T’  
                 T’                 T’     T’  *FT’            T’   T’  
              F     F  id                         F  (E)
                F       F  id                           F  (E)
                                                                                       67
                     A Predictive Parser
 The Working of the Table driven Predictive Parser
          INPUT:       id + id  id $                              OUTPUT:
                                                                         E
 STACK:                                                        T                 E’
                     Predictive Parsing
   T                                                       F        T’       +   T E’
   E’                    Program
   $                                                      id         ε
             NON-                           INPUT SYMBOL
            NON-                       INPUT SYMBOL
           TERMINAL
          TERMINAL   id id       +   +      *   *    (    ( )        )$      $
PARSING      EE    EETE’TE’                          E  TE’
                                                  E  TE’
             E’ E’          E’ E’+TE’
                                    +TE’                 E’   E’E’ E’  
 TABLE:      TT    TTFT’ FT’                    T  FT’
                                                       T  FT’
             T’                T’     T’  *FT’         T’   T’  
                T’                 T’     T’  *FT’            T’   T’  
             F     F  id                         F  (E)
               F       F  id                           F  (E)
                                                                                      68
                      A Predictive Parser
The predictive parser proceeds in this fashion emitting the
following productions:
         T  FT’                               E
                                      T                 E’
         F  id
                                  F       T’       +    T             E’
        T’   FT’
                                 id               F         T’
        F  id                                                        
                                                   id       F T’
        T’  
        E’  
                                                             id   
  When Top(Stack) = input = $ the parser halts & accepts the input
                     string – ( id + id * id )                  69
             Algorithm for Predictive parser
 The program execution is controlled by TWO inputs
    – The input symbol a and the symbol on the top of the stack X
 There are THREE possibilities for the parser
    – If X = a = $ : halt & announce the successful completion of
      parsing
    – If X = a ≠ $ : Pop X off the stack and advance the input
      pointer to next input symbol
    – If X is a nonterminal : Consult entry M [ X, a ] in the Parsing
      table M and replace X on top of the stack with the RHS of
      the production
         If M [ X, a ] is error call error routine
                                                                   70
                     Algorithm for Predictive parser
                                                                                     c e              io n
 The program execution is controlled by TWO                                   s i n    inputs  v a t
                                                                             r             r  i
   – The input symbol a and the symbol onathe                          rse top tof-dethe stack X
                                                                ) P            o  s           o n .
 There are THREE possibilitiesLfor                         ( 1         f  t m           c ti
                                                         L theleparser               t a
                                                  a s          s   a          n e  x
   – If X = a = $ : halt & announce           w n      the o e
                                                             successful i ts      completion of
                                         n  o         d  d            e
      parsing                        o k          a n          o  o s
                                a ls         h t,         c h
   – If X = a ≠ $ :r Pop      s
                             i X off    r igthe       t
                                                  stack o  and advance the input
                       r s e        t o         e ad
      pointerpto     a nextleinputt
                                 f symbol   a h
          h  is             m           o l
   – IfTX is a nonterminal
                       fr o        m  b : Consult entry M [ X, a ] in the Parsing
                  e s          s y
      tablea  r s            1
                 M andp replace         X on top of the stack with the RHS of
      It p            s  u
   1. the production
              l o o k
          It
      2. If M [ X, a ] is error call error routine
                                                                                                      71
                     FIRST & FOLLOW
 The two functions represent a set of terminal symbols,
  that aid us in constructing Predictive parser by helping
  us fill the Parsing Table with valid entries
 Set of terminal symbols yielded by FOLLOW function
  can also be used in error recovery
 FIRST – if α is the string of grammar symbols then
  FIRST (α) is the set of terminals that begin the strings
  derived from α (If α * > ε , then ε is also in FIRST (α))
 FOLLOW (A) – the set of terminals a that can appear
  immediately to the right of A such that there exists a
  derivation of the form S * > α A a β for some α and β
                                                         72
               FIRST & FOLLOW (Contd.)
 The set of terminal symbols ( including ε )that can appear
  at the far left of any parse tree derived from a particular
  nonterminal is the FIRST set of that nonterminal
 The set of terminal symbols ( including ε ) That can
  follow a nonterminal in some derivation or the other, is
  called the FOLLOW set of that nonterminal
 A set of rules are followed to compute the FIRST and
  FOLLOW set
 These sets will be used in creating Parsing table
                                                             73
              Rules to Create FIRST
                      FIRST rules:                             GRAMMAR:
1. If X is a terminal, FIRST(X) = {X}
2. If X   , then   FIRST(X)
                                                             E  TE’
3. If X  aABC , then a  FIRST(X)
                                                             E’  +TE’ | 
                                                             T  FT’
4. If X →ABCD, then FIRST (A)  FIRST (X).
4a. Further, If A   , in the above production, then        T’  FT’ | 
       FIRST (B)  FIRST (X)........ [& so on recursively]
                                                             F  ( E ) | id
                                                                    80
                          LL (1) Parsing
 Grammars whose predictive parsing tables contain no
  multiple entries are called LL(1).
 LL(1)
    ● Left-to-right parse of input string
    ● Leftmost derivation of grammar
    ● 1 symbol look ahead (we know only what the next token is)
 LL (k) : we know what the next k tokens are
 Every LL(1) grammar is an LL(2) grammar and so on
                                                             81
                    Bottom Up Parsing
 Constructs the nodes in the syntax tree in the post order
  i.e. bottom up by
  Beginning at the leaves and working up till the root
 Suitable for automatic parser generation,
 Less restrictive in the sense that it can postpone the
  decision of which production to use till it has seen all the
  input tokens
 Hence it can handle a larger class of grammars ( LR
  grammars ) than LL parsers
 But, Efficiency is low
                                                           82
               Working of Bottom Up Parsing
 Repeatedly matches a right-sentential form from the
   language against the tree’s upper frontier.
 At each match, it applies a reduction to build on the
   frontier:
 Each reduction matches an upper frontier of the
   partially built tree to the RHS of some production
 Each reduction adds a node on top of the frontier
 The final result is a rightmost derivation, in reverse.
                                                            83
                     Bottom Up Parsing
 The main task is to find the leftmost node that has not
  yet been constructed, BUT all of whose children have
  been constructed.
 Such a sequence of symbols whose top needs to be
  constructed is called – The Handle
 Creating a parent node N for the handle based on
  grammar rules is called Reducing the handle to N
 Hence the problem can be stated as :
    1. To find the proper handle that leads to the expression &
    2. To locate the right grammar to reduce the handle to a
       parent node
                                                                  84
                 Bottom Up Parsing        S  aABe
                                          A  Abc | b
                    Consider the Grammar: B  d
We want to parse the input string abbcde.
                  INPUT:
                a b b c d e $
Production
S  aABe
 A  Abc        Bottom-Up Parsing
  Ab                Program
  Bd
                                                  85
                    Bottom Up Parsing     S  aABe
                                          A  Abc | b
                    Consider the Grammar: B  d
We want to parse the input string abbcde.
                                                       OUTPUT:
                      INPUT:
                    a A
                      b b c d e $
Production
S  aABe                                                 A
 A  Abc             Bottom-Up Parsing
                                                   A b c
  Ab                     Program
  Bd                                              b
                                                                    90
                             Conflict Situations                                     g
                                                                                  in
 Two possible conflict Scenarios                                             ars
                                                                           f P
                                                                        d o
1. Shift – Reduce Conflict:                                         K in
    – One rule implies shift while another implies reduce       this ( for E.g.)
                                                            for
    – Grammar Productions : S → if E then aSbl|eif E then S else S
                                                :
                                              N ns    u it
    – On stack: if E then S and TInput:     IO else u
                                      L U      a  e
                                                 r reduce by first rule?
    – Should shift trying for S2nd  O ars rule or
2. Reduce – Reduce Conflict             m m
                                   g ra
    – Can have two grammar     u ch rules with same RHS ( For E.g.)
                            s s
    – Grammar Productionsr a        : A → α and B → α
                      ma
    – On stack:     m
                  ra α
               e g
    – Which t h  Production to choose for Reduce A → α and B → α?
          e
       ang
    Ch                                                                           91
             Operator Precedence Parsing
 A simplified and efficient version of bottom up parsing
  technique that can be applied only to a small but
  important class of grammars – Operator Grammars
 Operator Grammars Require:
   – No right hand side of rule is empty ( ε )
    – No right hand side has 2 adjacent nonterminals
 Following Grammars represent same language;
 E  E + E mar                             EEAE
        | EGr–amE
        r E*E                                   |(E)
      to|
   era                                          | id
 Op | ( E )
        | id                              A|+|–|*|         92
            Operator Precedence Parsing
 Works on the basis of Operator Precedence:
 Uses three precedence relations
    – a <. b, a yields in precedence to b
    – a .> b, a takes precedence over b
    – a .= b, a has same precedence as b and
 Find handle as <. .> pattern at top of stack;
 Drawbacks
    – Small class of grammars qualify
    – Overloaded operators are hard (unary minus)
    – Parser correctness hard to prove
                                                    93
                        LR PARSERS
 An efficient Bottom Up parsing technique that can be
  used to parse a large class of context free grammars
 An attractive option because it:
    – Can be used with virtually any kind of grammar
    – Is the most general non-backtracking shift reduce parsing
      technique known
    – Can be implemented as efficiently as any other technique
    – Can detect syntax error as soon as possible to do so on a
      left-to-right scan of the input
 But it is hard to construct Manually; However,
 Tools are available to construct them automatically.
                                                                  94
                                 LR Parser
 Consists of :    1. A Driver Program 2. An Input 3. An Output
                   4. A stack and      5. A Parsing Table
The Stack
   – Stores a string of the form s0             Input
                                        S
      x1 s1 x2 s2 x3….xm sm with (sm at
                                        t
      the top) where each:                   LR Parsing
                                        a                   Output
                                              Program
   1. xi is a grammar symbol &          c
                                       k
   2. si is a symbol called State
      summarizing the information in            TABLEgoto
                                            action
      the stack below it
                The parsing table has 2 parts
                   1. A parsing action function and a
                   2. A goto function - goto                    95
                  LR Parser Functioning
 The parsing Program:
  – Reads input characters (a) from input buffer one at a time
  – Determines the current state on the top of the stack (s)
  – Consults the Parsing table based on current state s and
    current input symbol a & gets M [s, a] which can be either:
     1. Shift to state S
                           OR
                                2. Reduce by a grammar production
     3. Accept                  4. Error
  – The function goto takes a state and grammar symbol as
    arguments and produces a new state
  – The next move of parser is to read input a and state s and
    then consulting the entry action of parsing table
  – If action [s,a] is accept , parsing is complete
                                                                 96
                  LR Parsing Example
 Let us parse the input string : id + id  id                  $
                                   LR Parsing
STACK:     E
           0
                                    Program
                                   LR Parsing
STACK:    E5
                                    Program
          id
           0
                State              action                    goto
                        id   +    *    (      )       $     E T F      F
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                         99
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:     0
                                    Program
                                   LR Parsing
STACK:     E
           3
                                    Program
           F
           0                                                           T
                State              action                    goto
                        id   +    *    (      )       $     E T F      F
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                         101
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:     0
                                    Program
                                                                       T
                State              action                    goto
                        id   +    *    (      )       $     E T F      F
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                         102
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:     E
           2
                                    Program
           T
           0                                                           T
                State              action                    goto
                        id   +    *    (      )       $     E T F      F
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                         103
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:     E
           7
                                    Program
           
           2                                                           T
           T    State              action                    goto
           0            id   +    *    (      )       $     E T F      F
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                         104
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:    E5
                                    Program
          id
           7                                                           T         F
               State              action                    goto
           2            id   +    *    (      )       $     E T F      F     id
           T      0     s5            s4                    1 2 3
                  1          s6                       acc
           0      2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             105
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:     7
                                    Program
           
           2                                                           T         F
           T    State              action                    goto
           0            id   +    *    (      )       $     E T F      F     id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             106
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:   10
         E                                                                  T
                                    Program
         F
          7                                                            T        F
               State              action                    goto
          2             id   +    *    (      )       $     E T F      F        id
         T        0     s5            s4                    1 2 3
                  1          s6                       acc
          0       2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             107
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                   LR Parsing
STACK:   10
         E0                                                                 T
                                    Program
         F
          7                                                            T        F
               State              action                    goto
          2             id   +    *    (      )       $     E T F      F        id
         T        0     s5            s4                    1 2 3
                  1          s6                       acc
          0       2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             108
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     2                                                                T
                                    Program
           T
           0                                                           T        F
                State              action                    goto
                        id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc              id
                  2          r2   s7         r2       r2
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             109
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     0                                                                T
                                    Program
                                                                       T        F
                State              action                    goto
                        id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             110
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     1                                                                T
                                    Program
           E
           0                                                           T        F
                State              action                    goto
                        id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             111
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     6                                                                T
                                    Program
           +
           1                                                           T        F
           E    State              action                    goto
           0            id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             112
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     5                                                                T        F
                                    Program
          id
           6                                                           T        F   id
           +    State              action                    goto
           1            id   +    *    (      )       $     E T F      F        id
          E       0     s5            s4                    1 2 3
                  1          s6                       acc
           0      2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             113
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     6                                                                T        F
                                    Program
           +
           1                                                           T        F   id
           E    State              action                    goto
           0            id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             114
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E        T
                                   LR Parsing
STACK:     3                                                                T        F
                                    Program
           F
           6                                                           T        F   id
           +    State              action                    goto
           1            id   +    *    (      )       $     E T F      F        id
           E      0     s5            s4                    1 2 3
                  1          s6                       acc
           0      2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             115
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E
                                   LR Parsing
STACK:     6                                                                T        F
                                    Program
           +
           1                                                           T        F   id
           E    State              action                    goto
           0            id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             116
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )                                                                   E
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E   +    T
                                   LR Parsing
STACK:     9                                                                T        F
                                    Program
           T
           6                                                           T        F   id
           +    State              action                    goto
           1            id   +    *    (      )       $     E T F      F        id
           E      0     s5            s4                    1 2 3
                  1          s6                       acc
           0      2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             117
GRAMMAR:
(1) E  E + T
(2) E  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )                                                                   E
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E   +    T
                                   LR Parsing
STACK:     0                                                                T        F
                                    Program
                                                                       T        F   id
                State              action                    goto
                        id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             118
GRAMMAR:
(1) E  E + T
(2) E’  T
                             LR Parser Example
(3) T  T  F
                                                                       OUTPUT:
(4) T  F
(5) F  ( E )                                                                   E
                 INPUT: id             id        +    id    $
(6) F  id
                                                                            E   +    T
                                   LR Parsing
STACK:     1                                                                T        F
                                    Program
           E
           0                                                           T        F   id
                State              action                    goto
                        id   +    *    (      )       $     E T F      F        id
                  0     s5            s4                    1 2 3
                  1          s6                       acc
                  2          r2   s7         r2       r2
                                                                       id
                  3          r4   r4         r4       r4
                  4     s5             s4                   8 2   3
                  5          r6   r6         r6       r6
                  6     s5             s4                     9    3
                  7     s5             s4                         10
                  8          s6              s11
                  9          r1   s7          r1      r1
                 10          r3   r3          r3      r3
                 11          r5   r5          r5      r5                             119
               LR Parsing Observations
 All LR parsers use the same parsing program
     Independent of the table and hence of the Grammar.
 What differentiates the LR parsers are the action and the
  goto tables:
 These tables can be constructed using :
   1. Simple LR (SLR): succeds for the fewest grammars, but is the
      easiest to implement
   2. Canonical LR: succeds for the most grammars, but is the
      hardest to implement. It splits states when necessary to
      prevent reductions that would get the parser stuck
   3. Lookahead LR (LALR): succeeds for most common syntatic
      constructions used in programming languages, but produces
      LR tables much smaller than canonical LR
                                                              120
          SLR Parsing table Construction
 Parsing table construction is based on the key idea :
A handle should NOT be reduced to a nonterminal N
  if the next token (i.e. look ahead token ) can not
  follow N (thereby resulting in dead end and
  backtracking)
 So ensure that :
  Redction of item N → α● is done ONLY if the look
  ahead token is part of FOLLOW (N).
 In order to achieve this, we define a concept called;
               Dotted ITEM or ITEM for short
                                                     121
            DOTTED ITEM ( A.k.a. ITEM)
 A concept used to summarize the state of our search &
 Sets of items to represent sets of hypotheses about the
  next token An item with rightmost ● is called a REDUCE ITEM.
 Definition: All others are SHIFT ITEMS (or KERNEL ITEMS)
   – Consider a production N →αβ
   – An LR dotted Item N →α●β ( Note the ● in between)
     means that we maintain the hypothesis that:
   1. αβ is a possible handle and
   2. We have already seen the α part
   3. When ● reaches the rightmost point like in N →αβ●, we
      have identified the handle
                                                              122
              Dotted ITEM – An Example
 Consider the production A → XYZ
 This production yields four items:
                  1. A → ● XYZ
                    2. A → X ● YZ           Shift Items
                    3. A → XY ● Z
                    4. A → XYZ ●           Reduce Item
 The production A → ε yields only one item A → ●
 An item can be represented by a pair of integers
   – One representing the production number in the grammar &
   – Other indicating the position of the dot in the production
                                                              123
                   Dotted ITEM – An Example ave
 Consider the production A → XYZ                                              e h
                                                                           n w
 This production yields four items:                                c t io
                                                                 d u ss
                                                          p   r o ce
                             1. A → ● XYZ               a           r o
                                                     o f gp
                                                 c h      s i n Items
                                                             Shift
                             2. A → X ● m      u pa r
                                              YZ
                                           o w t he
                             3. A →     s h ●inZ
                                          XY
                                   a te oint
                               d ic p                     Reduce Item
                             in A i→
                             4.       e n XYZ  ●
                           em g    v
 The production         t
                        i At → a ε yields only one item A → ●
                      n
                     a na
                   ,
                 lycaneebe represented by a pair of integers
 An item t iv e      s
     t  i
       uOne representing the production number in the grammar &
  In
   –
    – Other indicating the position of the dot in the production
                                                                                 124
                      Closure Operation
 The operation of extending the context with items is
  called the closure operation
 Useful in identifying all “ITEMS” belonging to a state
 Example:
 If a state includes: A → ●B
    – Include all items that is start of B like B →●X Y Z
    – Informally: if I expect to see B next, I expect to start
      anything that B can start with: So Include X → ●a H I
 States are thus built by closure from individual items.
                                                                 125
                Parsing table construction
                    The Central Idea:
 To construct from the grammar a DFA to recognize
  “viable prefixes”
                     The method:
 Use the grammar to build a model of the DFA
    – Group items together into sets, which give rise to the states
      of the parser
    – Use two functions goto( s, X ) and closure( s )
         goto() is analogous to move() in the subset construction
         closure() adds information to round out a state
 Build up the states and transition functions of the DFA
 Use this info to fill in the ACTION & GOTO tables
                                                                     126
       Creating States in Parsing Table Gen
 Start with the initial state
 Keep moving DOT across the symbols
    – if it s a new state; If yes mark it .
    – Otherwise mark it to an existing state
    – Compute the closures, if any, in that state & include them
    – Keep moving the dot and marking the states until all viable
      states are identified and inter transitions are marked.
 The generated DFA is then used to construct the parsing
  table with Action and go to items
                                                                   127
Parsing Table Construction
                             128
E .E          0          E E.         1             E E+.T 6
E .E+T                   E E.+T                     T  .T * F
E  .T                                                T  .F
T  .T * F                E T.         2             F  .( E )
T  .F                    T  T.*F                    F  . id
F  .( E )                                             T  T * .F
                           T  F. 3                                   7
F  . id                                               F  .( E )
                             F  id .       5          F  . id
    4        F  (. E )
             E .E+T                                   E E+T. 9
                                                8
             E  .T           F (E . )                T  T.*F
             T  .T * F       E E.+T                                     10
                                                        T  T*F.
             T  .F
             F  .( E )                         F ( E ). 11
             F  . id                                               129
      Constructing Parsing Table from DFA
 Use the following rules :
1. An arc between two states labeled with a terminal is a
   shift action.
2. An arc between two states labeled with a non-
   terminal is a goto action.
3. if a state contains an item A → , (a reduce item)
   the action is to reduce by this production, for all
   terminals in Follow (A).
 If there are shift-reduce conflicts or reduce-reduce
   conflicts, more elaborate techniques are needed
                                                       130
     Transition Diagram for Expression
                 GrammarReduce 1
0     E        1       +           6         T            9     *   7
                                        F         3
                                         (            4
    Reduce 2                                 id           5
     T         2       *           7    F                  10
                       Reduce 3                                     1. E  E + T
                                             (
                                                   4                2. E  T
      F                                           id
    Reduce 4
               3                                          5         3. T  T * F
                   (
                        Reduce 5                                    4. T  F
       (                E           8         )           11        5. F  ( E )
               4
                   T        2                 +                     6. F  id
                    F                                     6
            id                  3
    id        5
    Reduce 6                                                                   131
               LR Parsing - A Summary
 Most general paser that can parse wide variety of
   grammars
 Parsing Program is one ; only the table changes
 Takes longer time to execute
 The table generation is tough, if manually done;
    – Use of automated tools a necessity
     y.tab.c             C
                                           a.out
                       compiler
                                                    135
                        In Summary
 We use CFGs and Backus-Naur form to describe
  context-free languages
 Grammars can be ambiguous
 Parsing techniques use these grammars
 The general CYK algorithm, for testing membership in
  a context-free language Works for any unambiguous
  grammar BUT Running time is O(n3), much too slow for
  practical use (n is the length of the input)
 The practical techniques are
   – LL ( recursive Descent) &
   – LR ( Shift – Reduce ) Techniques
                                        ………. Contd.
                                                136
                       In Summary
 LL Parsing ( A.k.a Recursive Descent parsing )
 LL stands for Left-to-right scan of input & Leftmost
  derivation
 Start by expanding the start symbol and then expand
  out nonterminals based on the input, building the parse
  tree from the top down
 Works only on grammars with some restrictions. ( LL
  Grammars with no left recursion and the first token
  uniquely identifies the Production)
 Can be easily built manually
                                             ………. Contd.
                                                       137
                       In Summary
 LR Parsing ( A.k.a. Shift – Reduce parsing )
 LR stands for Left-to-right scan of input & Rightmost
  derivation
 Bottom – up parsing techniques that starts from leaf
  nodes and builds the tree upwards by reducing the
  tokens to its LHS (recognize occurrence of b (the
  handle) having seen all of what is derived from b )
 Works on wide variety of grammars ( LR Grammars
  with no right recursion)
 Too complex to build manually
 Tools are available to build them automatically
                                                      138
     Any
Questions ????
  Thank you
                 139
        Augmented Grammar & its Functions
 As a pre-requisite, “to construct DFA to recognize
   viable prefixes from the grammar” we define:
1. Augmented grammar and
2. Two functions with it namely:
    1. The closure Operation – Closure (I) and
    2. The Goto Operation - goto (I, X)
 Augmented Grammar
    –   If G is the grammar with start symbol S then the
        Augmented grammar for G is
    –   G’ with new start symbol S’ such that S’ → S
    –   Used to indicate the parser when to stop parsing
                                                           140
                  1. The closure Operation
  If I is the set of items for a grammar G then Closure (I)
   is the set of items constructed from following 2 rules
      1. Initially every item in I is added to Closure (I)
      2. If A → α●Bβ is in the Closure (I) & B → γ is a production
         then add the item B → ●γ to I if not already included
 (This rule needs to be applied repeatedly until no more new items
    can be added to Closure (I) )                         E’ ●E
                                                          E  ●E + T
GRAMMAR                 If I is the set of one item       E  ●T
E’  E                                                    T  ●T  F
                                 { E’ ●E },
E E+T|T                                                  T  ●F
T  T  F | F then the Closure (I) set would              F ●( E )
F  ( E ) | id                      contain:               F ●id 141
                      2. goto Operation
 goto (I, X) ( where I is the set of items and X is the input
  grammar symbol) is defined as:
    – The closure of the set of all items [A →αX●β] such that [A
      →α ● Xβ] is in I.
 Intuitively, if I is the set of items that are valid for some
  viable prefix γ, then goto (I,X) is the set of items that are
  valid for the viable prefix γX
                                            E E+●T
 If I is the set of 2 items
                                          T  ●T  F
    – { E’ E● and E  E● + T} ;
                                          T  ●F
   then, goto ( I, X) contains:           F ●( E )
                                          F ●id              142