Compiler Syntax Analysis Guide

The document discusses the parser phase of a compiler. The parser takes a token string as input from lexical analysis and uses the grammar to generate an intermediate representation known as a parse tree. It checks that the token arrangements conform to the grammar rules and detects syntax errors. Top-down and bottom-up parsers are described, including LR parsers, operator precedence parsers, and shift-reduce parsers. The document also discusses context-free grammars and their use in parsing.

Uploaded by

mishamoamanuel574

Chapter Three

Parsing / Syntax Analysis
COMPILER DESIGN – PHASES OF COMPILER
The compilation process is a sequence of phases. Each phase takes
input from the previous phase, has its own representation of the source program,
and feeds its output to the next phase of the compiler. Let us understand the
phases of a compiler.
Parser
The parser is the phase of the compiler that takes a
token string as input and, with the help of the language's
grammar, converts it into the corresponding intermediate
representation.
Comparison with Lexical Analysis
The role of the Parser
Error handling
Common programming errors:
• Lexical errors
• Syntactic (parser) errors
• Semantic errors
Error handler goals:
• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct programs.
A syntax (parser) error is detected at this level if the input does not conform to the grammar.
• Parsing, or syntactic analysis, is the process of analyzing a string of symbols, in either a natural language or a computer language, to determine whether it conforms to the rules of a formal grammar.
• It relies on techniques based on the formal grammar of the programming language.
• To parse is to analyze a string or text into logical syntactic components, typically in order to test conformance to a grammar.
• The parser takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the grammar of the source language, i.e., the parser checks whether the expression formed by the tokens is syntactically correct.
• If the lexical analyzer finds a token invalid, it generates an error.
• The lexical analyzer works closely with the syntax analyzer. It reads character streams from the source code, checks for legal tokens, and passes them to the syntax analyzer on demand.
• The parser analyzes the token stream against the production rules to detect any errors in the code. The output of this phase is a parse tree.

In this way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a parse tree as the output of the phase.
Types of Parser:
Parsers are mainly classified into two categories: top-down parsers and bottom-up parsers. These are explained below.
1- Top-Down Parser: A top-down parser generates the parse tree for the given input string with the help of grammar productions by expanding non-terminals, i.e., it starts from the start symbol and ends at the terminals. It uses leftmost derivation.
Top-down parsers are further classified into two types: recursive descent parsers and non-recursive descent parsers.
Leftmost Derivation (LMD): If the leftmost non-terminal of the sentential form is the one replaced at every step, the derivation is called a leftmost derivation.
• A sentential form derived by a leftmost derivation is called a left-sentential form.
Example: Consider the grammar G,
E → E + E | E * E | ( E ) | - E | id
Derive the string id + id * id using leftmost derivation.
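The exercise above can be checked mechanically. The following is a minimal sketch in Python, not part of the original slides; the function name `derive_leftmost` and the particular sequence of productions are illustrative assumptions (the grammar is ambiguous, so other leftmost derivations of the same string exist).

```python
# Symbols are kept as list items so that "id" stays a single token.
NONTERMINALS = {"E"}

def derive_leftmost(start, choices):
    """Apply each chosen right-hand side to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        # Index of the leftmost non-terminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

# One possible choice of productions for id + id * id (an assumption).
choices = [
    ["E", "+", "E"],  # E => E + E
    ["id"],           # leftmost E => id
    ["E", "*", "E"],  # remaining E => E * E
    ["id"],
    ["id"],
]
steps = derive_leftmost("E", choices)
print(" => ".join(steps))
# E => E + E => id + E => id + E * E => id + id * E => id + id * id
```

Every intermediate line printed by the sketch is a left-sentential form.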
A recursive descent parser that relies on backtracking is also known as a
brute-force parser or backtracking parser.
• It generates the parse tree by trying productions one after another and
backtracking when a choice fails.
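A brute-force recursive descent recognizer can be sketched as follows. This is an illustration only: the expression grammar used in these slides is left-recursive, which would make a recursive descent parser loop forever, so the sketch assumes an equivalent right-recursive grammar (E → T + E | T, T → F * T | F, F → id | ( E )). The function names are hypothetical, and the code only accepts or rejects input rather than building a tree.

```python
def parse_E(tokens, i):
    """Yield every position at which an E starting at position i can end."""
    for j in parse_T(tokens, i):
        # Alternative 1: E -> T + E
        if j < len(tokens) and tokens[j] == "+":
            yield from parse_E(tokens, j + 1)
        # Alternative 2: E -> T (tried when alternative 1 is exhausted)
        yield j

def parse_T(tokens, i):
    for j in parse_F(tokens, i):
        # Alternative 1: T -> F * T
        if j < len(tokens) and tokens[j] == "*":
            yield from parse_T(tokens, j + 1)
        # Alternative 2: T -> F
        yield j

def parse_F(tokens, i):
    if i < len(tokens) and tokens[i] == "id":
        yield i + 1                      # F -> id
    elif i < len(tokens) and tokens[i] == "(":
        for j in parse_E(tokens, i + 1):  # F -> ( E )
            if j < len(tokens) and tokens[j] == ")":
                yield j + 1

def accepts(tokens):
    # Backtracking happens implicitly: each generator offers its next
    # alternative whenever an earlier one fails to consume all the input.
    return any(end == len(tokens) for end in parse_E(tokens, 0))

print(accepts(["id", "+", "id", "*", "id"]))  # True
print(accepts(["id", "+"]))                   # False
```

Trying every alternative is what makes the approach "brute force"; its running time can grow quickly on larger grammars.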
A non-recursive descent parser is also known as an
LL(1) parser, predictive parser, parser without
backtracking, or dynamic parser.
• It uses a parsing table to generate the parse
tree instead of backtracking.
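A table-driven predictive parser might look like the sketch below. It assumes the usual LL(1) form of the expression grammar (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id), which is not given in these slides, and a hand-written parsing table; the names `TABLE` and `ll1_accepts` are illustrative.

```python
# Parsing table: (non-terminal, lookahead) -> right-hand side to expand.
# An empty list stands for an epsilon production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_accepts(tokens):
    stack = ["$", "E"]          # end marker below the start symbol
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:     # empty table cell: syntax error
                return False
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:       # terminal on the stack matches the input
            i += 1
        else:
            return False
    return i == len(tokens)

print(ll1_accepts(["id", "+", "id", "*", "id"]))  # True
print(ll1_accepts(["id", "+", "*", "id"]))        # False
```

Each decision is made from the top of the stack and a single lookahead token, so no backtracking is needed.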
2- Bottom-up Parser: A bottom-up parser generates the parse tree for the given input string with the help of grammar productions by reducing substrings to non-terminals, i.e., it starts from the terminals (the input string) and ends at the start symbol.
It traces the reverse of a rightmost derivation. The bottom-up parsers covered here are LR parsers, operator precedence parsers, and shift-reduce parsers in general.
An LR parser is a bottom-up parser that generates the parse tree
for the given string by using an unambiguous grammar. It follows the
reverse of the rightmost derivation.
Types of Bottom-up Parser
• LR parsers: (a) LR(0), (b) SLR(1), (c) LALR(1), (d) CLR(1)
• Operator precedence parser
• Shift-reduce parsers
Why LR parsing:
• LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
• The LR parsing method is the most general non-backtracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
• The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
• The disadvantage is that it takes too much work to construct an LR parser by hand for a typical programming-language grammar. However, many LR parser generators are available to make this task easy.

LR(0) parser
LR(0) is the simplest member of the LR(k) family of parsers. In LR(k), "L" stands for left-to-right scanning of the input, "R" stands for the construction of a rightmost derivation in reverse, and "k" stands for the number of lookahead input symbols used to make parsing decisions. An LR(0) parser uses no lookahead.
This is non-recursive, shift-reduce, bottom-up parsing.
LR parsing handles a large class of grammars, which makes it an efficient and widely applicable syntax analysis technique.
SLR(1) (Simple LR Parsing)
It is similar to the LR(0) parser but has fewer reduce entries: a reduction by A → α is entered only under the terminals in FOLLOW(A). It has a small number of states, the same as LR(0), and hence a small table.
It is simple and fast to construct.

LALR(1) Parsing (Look-Ahead LR Parsing)

LALR(1) attempts to reduce the number of states of the canonical LR(1) parser by merging states that have the same core items. This brings the number of states down to that of SLR(1) while retaining some of the power of LR(1) lookaheads. The class of grammars it handles lies between those of SLR(1) and CLR(1).
CLR(1) (Canonical Look-Ahead LR Parsing)
CLR(1) refers to canonical LR parsing with one lookahead symbol. It uses the canonical collection of LR(1) items to construct its parsing table. It has more states than SLR parsing. In this type of parsing, reduce entries are placed only under the lookahead symbols of the completed item.
An operator precedence parser generates the parse tree from a given grammar
and input string, on the condition that no production has ε or two consecutive
non-terminals on its right-hand side.
It is a kind of shift-reduce parsing applied to a small class of operator
grammars. A grammar is said to be an operator grammar if:
• No two non-terminals are adjacent on the right side of any production.
• No production has ε on its right side.
Operator precedence relations:
p < q --> p has lower precedence than q
p > q --> p has higher precedence than q
p = q --> p has the same precedence as q
Limitations:
• It applies only to a small class of grammars.
• An operator like minus can be unary or binary and therefore has
different precedence in different contexts.
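The way precedence relations drive shift and reduce decisions can be sketched in a few lines. The example below assumes the operator grammar E → E + E | E * E | id, with * binding tighter than + and both left-associative; the relation table and the name `op_precedence_accepts` are illustrative assumptions, and the code recognizes input without building a tree.

```python
# PREC[(a, b)] is the relation between stack terminal a and input terminal b.
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}
HANDLES = (["id"], ["E", "+", "E"], ["E", "*", "E"])

def top_terminal(stack):
    return next(s for s in reversed(stack) if s != "E")

def op_precedence_accepts(tokens):
    stack, tokens, i = ["$"], tokens + ["$"], 0
    while True:
        a, b = top_terminal(stack), tokens[i]
        if a == "$" and b == "$":
            return stack == ["$", "E"]
        rel = PREC.get((a, b))
        if rel is None:              # no relation defined: syntax error
            return False
        if rel in "<=":              # shift the input symbol
            stack.append(b)
            i += 1
            continue
        # rel == ">": pop the handle, back to the terminal related by "<".
        handle = []
        while True:
            sym = stack.pop()
            handle.insert(0, sym)
            if sym != "E" and PREC.get((top_terminal(stack), sym)) == "<":
                break
        if stack[-1] == "E":         # an operand to the left joins the handle
            handle.insert(0, stack.pop())
        if handle not in HANDLES:
            return False
        stack.append("E")            # reduce the handle to E

print(op_precedence_accepts(["id", "+", "id", "*", "id"]))  # True
print(op_precedence_accepts(["id", "+"]))                   # False
```

Because * has higher precedence than + in the table, id * id is reduced before the addition, which matches the usual arithmetic reading.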
Rightmost Derivation (RMD): If the rightmost non-terminal of the sentential
form is the one replaced at every step, the derivation is called a rightmost
derivation.
• A sentential form derived by a rightmost derivation is
called a right-sentential form.
Example 1: Consider the grammar G, E → E + E | E * E | ( E ) | - E | id
Derive the string id + id * id using rightmost derivation.
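One rightmost derivation of this string can be reproduced with a short Python sketch. The helper name `derive_rightmost` and the chosen order of productions are assumptions made for illustration; since the grammar is ambiguous, this is not the only rightmost derivation.

```python
NONTERMINALS = {"E"}

def derive_rightmost(start, choices):
    """Apply each chosen right-hand side to the rightmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        # Index of the rightmost non-terminal in the current sentential form.
        pos = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

# One possible choice of productions for id + id * id (an assumption).
rm_steps = derive_rightmost("E", [
    ["E", "+", "E"],  # E => E + E
    ["E", "*", "E"],  # rightmost E => E * E
    ["id"],
    ["id"],
    ["id"],
])
print(" => ".join(rm_steps))
# E => E + E => E + E * E => E + E * id => E + id * id => id + id * id
```

Reading the printed steps from right to left gives the order of reductions a bottom-up parser would perform for this tree.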
• In bottom-up parsing, a parse tree for an input string is constructed
beginning at the leaves and working towards the root. Bottom-up parsers
start from the sequence of terminal symbols and work their way
back up to the start symbol by repeatedly replacing the right-hand
sides of grammar rules with the corresponding non-terminal. This is
the reverse of the derivation process and is called "reduction". A
general type of bottom-up parser is the shift-reduce parser.

Shift Reduce Parsers

Shift-reduce parsing is a type of bottom-up parsing
that attempts to construct a parse tree for an input
string beginning at the leaves (the bottom) and
working up towards the root (the top).
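The shift and reduce steps can be made visible with a small trace. The sketch below assumes a deliberately tiny grammar, E → E + id | id, for which a handle can be recognized just by looking at the top of the stack; real shift-reduce parsers decide with LR tables or precedence relations instead. The name `shift_reduce` is illustrative.

```python
def shift_reduce(tokens):
    """Return (accepted, trace) for the grammar E -> E + id | id."""
    stack, rest, trace = [], list(tokens), []
    while True:
        if stack[-3:] == ["E", "+", "id"]:
            stack[-3:] = ["E"]                 # reduce by E -> E + id
            trace.append(("reduce E -> E + id", list(stack)))
        elif stack == ["id"]:
            stack[:] = ["E"]                   # reduce by E -> id
            trace.append(("reduce E -> id", list(stack)))
        elif rest:
            stack.append(rest.pop(0))          # shift the next input token
            trace.append(("shift", list(stack)))
        else:
            break
    # Accept when all input is consumed and only the start symbol remains.
    return stack == ["E"], trace

ok, trace = shift_reduce(["id", "+", "id"])
for action, stack in trace:
    print(f"{action:22} stack: {' '.join(stack)}")
print("accepted:", ok)
```

The stack holds the part of the input already processed, and each reduce replaces a right-hand side on top of the stack with its left-hand side.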
• The parser is sometimes called the syntax analyzer. It constructs
the parse tree: it takes the tokens one by one and uses a context-free
grammar to construct the tree.
Why Grammar?
• The rules of a programming language can be represented entirely
by a small number of productions. Using these productions, we can
describe what a valid program is and check whether the input is in
the desired format.
Context-Free Grammars
Syntax analysis, or parsing, is the second phase of a compiler. In this chapter,
we shall learn the basic concepts used in the construction of a parser. We
have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax
of a given sentence because of the limitations of regular expressions. Regular
expressions cannot check balanced tokens, such as parentheses. Therefore,
this phase uses context-free grammars (CFGs), whose languages are recognized by
pushdown automata.
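The stack is what gives a pushdown automaton the memory a regular expression lacks. As an informal illustration (a stack-based checker, not a formal automaton), the following Python sketch tests whether brackets are balanced; the function name `balanced` is an assumption.

```python
def balanced(text):
    """Check bracket balance with an explicit stack, as a PDA would."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)              # push every opening bracket
        elif ch in pairs:
            # A closing bracket must match the most recent opening one.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                      # nothing may remain open

print(balanced("(a + (b * c))"))  # True
print(balanced("(a + b))"))       # False
```

The nesting depth is unbounded, which is why no finite automaton, and hence no regular expression, can perform this check.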

Every regular grammar is also context-free, but there exist some
problems that are beyond the scope of regular grammars. A CFG is a helpful tool for
describing the syntax of programming languages.

A grammar is a set of rules for putting strings together and so
corresponds to a language.
A context-free grammar is a quadruple (V, Σ, P, S) where:
• V is a finite set of variables (also called non-terminals), one of which is
designated the start variable. It is customary to use upper-case
letters for variables.
• Σ (the alphabet) is a finite set of terminal symbols.
• P is a finite set of rules (production rules) of the form A → x, where A is
a variable and x is a string of variables and terminals.
• S is a distinguished element of V called the start symbol.
The sets V and Σ are assumed to be disjoint.
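The four components of this definition map directly onto a small data structure. The sketch below is one possible Python encoding, with hypothetical field names, that also checks the conditions stated above (the start symbol is a variable, variables and terminals are disjoint, and every rule rewrites a variable into known symbols).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    variables: frozenset   # the set of variables (non-terminals)
    terminals: frozenset   # the alphabet of terminal symbols
    productions: tuple     # rules as (head, body) pairs, body is a tuple
    start: str             # the start symbol

    def is_well_formed(self):
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and self.variables.isdisjoint(self.terminals)
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )

# The expression grammar used in this chapter.
expr = Grammar(
    variables=frozenset({"E"}),
    terminals=frozenset({"+", "*", "(", ")", "-", "id"}),
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("-", "E")),
        ("E", ("id",)),
    ),
    start="E",
)
print(expr.is_well_formed())  # True
```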
Context-Free Languages
A language L is context-free if and only if there is a context-free grammar G with L = L(G).
Classes of Grammars
In 1959 Noam Chomsky, a linguist, suggested a way of classifying grammars
according to complexity. The convention used below, and in the remaining chapters,
is that the term string includes the null string and that, in referring to grammars, the
following symbols will have particular meanings
In general, a context-free grammar (CFG) consists of a
set of productions that are used to replace a variable by
a string of variables and terminals. The language of a
grammar is the set of strings it generates. A language
is context-free if there is a CFG for it.
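To make "the set of strings it generates" concrete, the sketch below enumerates the short strings of a small grammar by expanding sentential forms breadth-first. The grammar S → a S b | ε and the function name `generate` are illustrative assumptions, and the pruning step assumes that every recursive production adds at least one terminal.

```python
from collections import deque

def generate(productions, start, max_len):
    """Return the terminal strings of length <= max_len derivable from start."""
    seen, results = {(start,)}, set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Leftmost variable; a form without variables is a generated string.
        pos = next((i for i, s in enumerate(form) if s in productions), None)
        if pos is None:
            results.add("".join(form))
            continue
        for body in productions[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            # Prune forms that already contain too many terminals.
            n_terminals = sum(s not in productions for s in new)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=len)

# S -> a S b | epsilon generates the strings a^n b^n.
anbn = {"S": [("a", "S", "b"), ()]}
print(generate(anbn, "S", 4))  # ['', 'ab', 'aabb']
```

The language a^n b^n is a standard example of a context-free language that is not regular.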
Context free grammars
Example:
Derivation Order
Derivation example
Derivation Trees / Parse Trees
In a parse tree:
• All leaf nodes are terminals.
• All interior nodes are non-terminals.
• Reading the leaves from left to right (an in-order traversal) gives the original input string.
A parse tree depicts the associativity and precedence of operators. The deepest sub-tree is
traversed first, so the operator in that sub-tree takes precedence over the operators in its
parent nodes.
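These properties can be checked on a small tree. The following sketch builds, by hand, one parse tree for id + id * id under the grammar E → E + E | E * E | id and reads its leaves from left to right; the class `Node` and the function `leaves` are hypothetical names used only for illustration.

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

def leaves(node):
    """Collect leaf symbols from left to right (the yield of the tree)."""
    if not node.children:
        return [node.symbol]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out

# The * operator sits in the deeper sub-tree, so it is applied before +.
tree = Node("E", [
    Node("E", [Node("id")]),
    Node("+"),
    Node("E", [
        Node("E", [Node("id")]),
        Node("*"),
        Node("E", [Node("id")]),
    ]),
])
print(" ".join(leaves(tree)))  # id + id * id
```

All interior nodes of this tree are labeled E, and all leaves are terminals.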
Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be an
ambiguous grammar, i.e., an ambiguous grammar is one that produces more than one
leftmost derivation or more than one rightmost derivation for the same sentence.
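Ambiguity can be demonstrated by counting parse trees. The sketch below does this for the fragment E → E + E | E * E | id of the chapter's grammar (parentheses and unary minus are left out to keep it short); the function name `count_parse_trees` is an assumption. A count above 1 for any sentence shows that the grammar is ambiguous.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

The two trees correspond to grouping the sentence as id + (id * id) and as (id + id) * id.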
Ambiguous and unambiguous grammar
