0% found this document useful (0 votes)
72 views37 pages

Unit 1

The document provides an overview of compiler design, explaining that a compiler translates programs written in one language into an equivalent program in another language, and discusses the main phases of a compiler including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, code generation, symbol table management, and error handling. It also describes what each phase involves at a high level, such as breaking the source code into tokens in lexical analysis and checking syntax in syntax analysis.

Uploaded by

axar kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
72 views37 pages

Unit 1

The document provides an overview of compiler design, explaining that a compiler translates programs written in one language into an equivalent program in another language, and discusses the main phases of a compiler including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, code generation, symbol table management, and error handling. It also describes what each phase involves at a high level, such as breaking the source code into tokens in lexical analysis and checking syntax in syntax analysis.

Uploaded by

axar kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 37

15CS314J – COMPILER DESIGN

UNIT I -
INTRODUCTION
Prepared by:
Dr.R.I.Minu
Associate Professor

1
Overview and History
• Cause
• Software for early computers was written in assembly language
• The benefits of reusing software on different CPUs started to
become significantly greater than the cost of writing a compiler

• The first real compiler


• FORTRAN compilers of the late 1950s
• 18 person-years to build

2
So What is Compiler?
• A compiler is a program that can read a program in one language - the source
language - and translate it into an equivalent program in another language - the
target language.
• A compiler acts as a translator, transforming human-oriented programming
languages into computer-oriented machine languages.
• Ignore machine-dependent details for programmer
Compiler vs Interpreter
• An interpreter is another common kind of language processor. Instead of
producing a target program as a translation, an interpreter appears to
directly execute the operations specified in the source program on inputs
supplied by the user

• The machine-language target program produced by a compiler is usually


much faster than an interpreter at mapping inputs to outputs .
• An interpreter, however, can usually give better error diagnostics than a
compiler, because it executes the source program statement by statement
Structure of a Compiler
Structure of a Compiler

• Breaks the source program into pieces and fit into a grammatical structure
• If this part detect any syntactically ill formed or semantically unsound error it
Analysis is report to the user
• It collect the information about the source program and stored in a data
structure – Symbol Table

Synthesis • Construct the target program from the available symbol table and
intermediate representation
Compiler Front- and Back-end
Source program (character stream) Abstract syntax tree or
other intermediate form
Scanner
Machine-
(lexical analysis)
Independent Code
Tokens
Improvement
Front end

Back end
synthesis
Parser Modified intermediate form
analysis

(syntax analysis)
Target Code
Parse tree
Generation
Semantic Analysis Assembly or object code
and Intermediate
Machine-Specific
Code Generation
Code Improvement
Abstract syntax tree or
Modified assembly or object code
other intermediate form
COP4020 Spring 2014 8 3/27/2019
Lexical Analysis
Phases of Compiler-Lexical Analysis
• It is also called as scanning
• This phase scans the source code as a stream of characters and converts it
into meaningful lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the
form
• It passes on to the subsequent phase, syntax analysis.
This points to an entry in the
symbol table for this token.
It is an abstract Information from the symbol-table
symbol that is used <token-name, attribute-value> entry 'is needed for semantic
during syntax analysis and code generation
analysis
Phases of Compiler-Lexical Analysis
For example consider
• position is lexeme mapped to token <id,1>
• <Id> is an abstract symbol
• <1> points to the symbol table entry for position
• The assignment symbol = is a lexeme that is mapped into the token
<=>
• We can omit second component as it has no attribute value
• Similarly initial is mapped to <id,2>,rate <id,3>
• The lexeme + is mapped to <+>, * is mapped to <*> and 60 is mapped
to <60>
Scanner: Lexical Analysis
• Lexical analysis breaks up a program into tokens
• Grouping characters into non-separatable units (tokens)
• Changing a stream to characters to a stream of tokens
program gcd (input, output);
var i, j : integer;
begin
read (i, j);
while i <> j do
if i > j then i := i - j else j := j - i;
writeln (i)
end.

program gcd ( input , output ) ;


var i , j : integer ; begin
read ( i , j ) ; while
i <> j do if i > j
then i := i - j else j
:= i - i ; writeln ( i
) end .
Syntax Analysis
Phases of Compiler-Syntax Analysis
• This is the second phase, it is also called as parsing
• It takes the token produced by lexical analysis as input and generates
a parse tree (or syntax tree).
• In this phase, token arrangements are checked against the source
code grammar, i.e. the parser checks if the expression made by the
tokens is syntactically correct.
Semantic Analysis
Phases of Compiler-Semantic Analysis
• Semantic analysis checks whether the parse tree constructed follows
the rules of language.
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency
with the language definition.
• It also gathers type information and saves it in either the syntax tree
or the symbol table, for subsequent use during intermediate-code
generation.
• An important part of semantic analysis is type checking
Phases of Compiler-Semantic Analysis
• Suppose that position, initial, and rate have been declared to be
floating-point numbers and that the lexeme 60 by itself forms an
integer.
• The type checker in the semantic analyzer discovers that the operator
* is applied to a floating-point number rate and an integer 60.
• In this case, the integer may be converted into a floating-point
number.
Intermediate Code Generation
Phases of Compiler-Intermediate Code Generation
• After semantic analysis the compiler generates an intermediate code
of the source code for the target machine.
• It represents a program for some abstract machine.
• It is in between the high-level language and the machine language.
• This intermediate code should be generated in such a way that it
makes it easier to be translated into the target machine code.
Phases of Compiler-Intermediate Code Generation
• An intermediate form called three-address code were used
• It consists of a sequence of assembly-like instructions with three
operands per instruction. Each operand can act like a register.
Code Optimization
Phases of Compiler-Code Optimization
• The next phase does code optimization of the intermediate code.
• Optimization can be assumed as something that removes
unnecessary code lines, and arranges the sequence of statements in
order to speed up the program execution without wasting resources
(CPU, memory).
Code Generation
Phases of Compiler-Code Generation
• In this phase, the code generator takes the optimized representation
of the intermediate code and maps it to the target machine
language.
• If the target language is machine code, registers or memory locations
are selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of
machine instructions that perform the same task
Phases of Compiler-Code Generation
• For example, using registers R1 and R2, the intermediate code might
get translated into the machine code
• The first operand of each instruction specifies a destination. The F in
each instruction tells us that it deals with floating-point numbers.
Symbol Table Management
Phases of Compiler-Symbol Table Management
• Symbol table is a data structure holding information about all symbols
defined in the source program
• Not part of the final code, however used as reference by all phases of
a compiler
• Typical information stored there include name, type, size, relative
offset of variables
• Generally created by lexical analyzer and syntax analyzer
• Good data structures needed to minimize searching time
• The data structure may be flat or hierarchical
Error Handling and Recovery
Error Handling and Recovery
• An important criteria for judging quality of compiler
• For a semantic error, compiler can proceed
• For syntax error, parser enters into a erroneous state
• Needs to undo some processing already carried out by the parser for
recovery
• A few more tokens may need to be discarded to reach a descent
state from which the parser may proceed
• Recovery is essential to provide a bunch of errors to the user, so that
all of them may be corrected together instead of one-by-one
The Structure of a Compiler (8)
Code Generator
[Intermediate Code Generator]

Non-optimized Intermediate Code


Scanner
[Lexical Analyzer]

Tokens

Code Optimizer
Parser
[Syntax Analyzer]
Optimized Intermediate Code
Parse tree

Code Optimizer
Semantic Process
[Semantic analyzer] Target machine code

Abstract Syntax Tree w/ Attributes

30
The Structure of a Compiler (2)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Representation

Symbol and Optimizer


Attribute
Tables

(Used by all Phases of The Compiler)

Code
Generator
31
Target machine code
The Structure of a Compiler (3)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Scanner Representation

 The scanner begins the analysis of the source program by


reading the input, character by character, and grouping
Symbol and Optimizer
characters into individual words and symbols (tokens)
Attribute
Tables
 RE ( Regular expression )
 NFA ( Non-deterministic Finite Automata )
 (Used by
DFA ( Deterministic Finite Automata ) all
 LEX Phases of
The Compiler) Code
Generator
32
Target machine code
The Structure of a Compiler (4)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Parser Representation

 Given a formal syntax specification (typically as a context-


free grammar [CFG] ), the parse reads tokens and groups
Symbol and Optimizer
them into units as specified by the productions of the CFG
Attribute
being used.
Tables
 As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
(Used by all
tree.
Phases
 CFG ( Context-Free Grammar ) of
 BNF ( Backus-Naur Form )The Compiler) Code
 GAA ( Grammar Analysis Algorithms ) Generator
 LL, LR, SLR, LALR Parsers
 YACC 33
Target machine code
The Structure of a Compiler (5)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Semantic Routines Representation

 Perform two functions


 Check the static semantics of each construct
Symbol and
 Do the actual translation Optimizer
Attribute
 The heart of a compiler
Tables
 Syntax Directed Translation
 Semantic Processing Techniques
(Used
by all
 IR (Intermediate Representation)
Phases of
The Compiler) Code
Generator
34
Target machine code
The Structure of a Compiler (6)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Optimizer Representation

 The IR code generated by the semantic routines is


analyzed and transformed into functionally equivalent but
Symbol and Optimizer
improved IR code
Attribute
 This phase can be very complex and slow
Tables
 Peephole optimization
 loop optimization, register allocation, code scheduling
(Used by all
Phases of
 Register and Temporary Management
 Peephole Optimization The Compiler) Code
Generator
35
Target machine code
The Structure of a Compiler (7)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Code Generator Representation
 Interpretive Code Generation
 Generating Code from Tree/Dag
 Grammar-Based Code Generator
Optimizer

Code
Generator
Target machine code 36
https://github.com/app789/Compiler-Design-Lab

You might also like