UNIT-1
Overview of the Compiler
and its Structure
1
Language processor
A language processor is a software program designed or used to
perform tasks such as processing program code to machine code.
The programs are written mostly in high level languages like Java,
C++, Python etc. and are called source code.
These source code cannot be executed directly by the computer
and must be converted into machine language to be executed.
Hence, a special translator system software is used to translate
the program written in high-level language into machine code is
called Language Processor and the program after translated into
machine code (object program / object code).
2
Translator
A translator is a program that takes one form of program as input
and converts it into another form.
Types of translators are:
1. Compiler
2. Interpreter
3. Assembler
Source Translator Target
Program Program
Error
Messages (If any)
3
Compiler
A compiler is a program that reads a program written in source
language and translates it into an equivalent program in target
language.
void main() 0000 1100 0010
{ 0100
Source
int a=1,b=2,c; Compiler 0111 1000 0001
Target
c=a+b; Program 1111 0101 1110
Program
printf(“%d”,c); 1100 0000 1000
} 1011
Source Error Target
Program Messages (If any) Program
4
Interpreter
Interpreter is also program that reads a program written in source
language and translates it into an equivalent program in target
language line by line.
Void main() 0000 1100 0010
{ 0000
int a=1,b=2,c; Interpreter 1111 1100 0010
c=a+b;
1010 1100 0010
printf(“%d”,c);
0011 1100 0010
} 1111
Error
Source Target
Messages (If any)
Program Program
5
Assembler
• Assembler is a translator which takes the assembly code as an
input and generates the machine code as an output.
MOV id3, R1 0000 1100 0010
MUL #2.0, R1 0100
MOV id2, R2 0111 1000 0001
MUL R2, R1 Assembler 1111 0101 1110
MOV id1, R2 1100 0000 1000
ADD R2, R1 1011
MOV R1, id1 1100 0000 1000
Error
Assembly Code Messages (If any) Machine Code
6
Difference between compiler & interpreter
Compiler Interpreter
Scans the entire program and translates It translates program’s one statement at
it as a whole into machine code. a time.
It generates intermediate code. It does not generate intermediate code.
An error is displayed after entire An error is displayed for every
program is checked. instruction interpreted if any.
Requirement of memory is more. Requirement of memory is less.
Example: C compiler Example: Basic, Python, Ruby
7
Difference between assembler & interpreter
Assembler Interpreter
It converts low-level language to the It converts high-level language to the
machine language. machine language.
The program for an Assembler is written The program for an Interpreter is
for particular hardware. written for particular language.
It is one to one i.e. one instruction It is one to many i.e. one instruction
translates to only one instruction. translates to many instruction.
It translates entire program before It translates program instructions line by
running. line.
Errors are displayed before program is Errors are displayed for the every
running. interpreted instruction (if any).
It is used only one time to create an It is used everytime when the program
executable file. is running.
Requirement of memory is less. Requirement of memory is more.
8
Analysis synthesis model of compilation
There are two part of compilation.
1. Analysis Phase
2. Synthesis Phase
void main() Analysis Synthesis 0000 1100
{ Phase Phase 0111 1000
int a=1,b=2,c; 0001
c=a+b; 1111 0101
printf(“%d”,c); Intermediate 1000
} Representation 1011
Source Code Target Code
9
Analysis phase & Synthesis phase
Analysis Phase Synthesis Phase
Analysis part breaks up the The synthesis part constructs
source program into the desired target program
constituent pieces and from the intermediate
creates an intermediate representation.
representation of the source Synthesis phase consist of the
program. following sub phases:
Analysis phase consist of 1. Code optimization
three sub phases: 2. Code generation
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
10
Phases of compiler
Compiler
Analysis phase Synthesis phase
Lexical analysis
Intermediate Code
code optimization
Syntax analysis generation
Code
Semantic analysis generation
11
Lexical analysis
Lexical Analysis is also called linear analysis Position = initial + rate*60
or scanning.
Lexical analysis
Lexical Analyzer divides the given source
statement into the tokens.
id1 = id2 + id3 * 60
Ex: Position = initial + rate * 60 would be
grouped into the following tokens:
Position (identifier)
= (Assignment symbol)
initial (identifier)
+ (Plus symbol)
rate (identifier)
* (Multiplication symbol)
60 (Number)
12
Phases of compiler
Compiler
Analysis phase Synthesis phase
Lexical analysis
Intermediate Code
code optimization
Syntax analysis generation
Code
Semantic analysis generation
13
Syntax analysis
Syntax Analysis is also called Parsing or Position = initial + rate*60
Hierarchical Analysis. Lexical analysis
The syntax analyzer checks each line of
the code and spots every tiny mistake. id1 = id2 + id3 * 60
If code is error free then syntax Syntax analysis
analyzer generates the tree.
=
id1 +
id2 *
id3 60
14
Phases of compiler
Compiler
Analysis phase Synthesis phase
Lexical analysis
Intermediate Code
code optimization
Syntax analysis generation
Code
Semantic analysis generation
15
Semantic analysis
=
Semantic analyzer determines the
meaning of a source string. id1 +
It performs following operations: id2 * int to
1. matching of parenthesis in the real
id3 60
expression.
2. Matching of if..else statement. Semantic analysis
3. Performing arithmetic operation =
that are type compatible. id1 +
4. Checking the scope of operation.
id2 *
*Note: Consider id1, id2 and id3 are real id3 inttoreal
60
16
Phases of compiler
Compiler
Analysis phase Synthesis phase
Lexical analysis
Intermediate Code
code optimization
Syntax analysis generation
Code
Semantic analysis generation
17
Intermediate code generator
=
Two important properties of
intermediate code : id1 +
1. It should be easy to produce.
id2 *
2. Easy to translate into target
t3 id3 inttoreal
program. t2 t1
Intermediate form can be represented 60
Intermediate code
using “three address code”.
Three address code consist of a
t1= int to real(60)
sequence of instruction, each of which t2= id3 * t1
has at most three operands. t3= t2 + id2
id1= t3
18
Phases of compiler
Compiler
Analysis phase Synthesis phase
Lexical analysis
Intermediate Code
code optimization
Syntax analysis generation
Code
Semantic analysis generation
19
Code optimization
It improves the intermediate code.
This is necessary to have a faster Intermediate code
execution of code or less consumption
t1= int to real(60)
of memory. t2= id3 * t1
t3= t2 + id2
id1= t3
Code optimization
t1= id3 * 60.0
id1 = id2 + t1
20
Phases of compiler
Compiler
Analysis phase Synthesis phase
Lexical analysis
Intermediate Code
code optimization
Syntax analysis generation
Code
Semantic analysis generation
21
Code generation
The intermediate code instructions are
Code optimization
translated into sequence of machine
instruction. t1= id3 * 60.0
id1 = id2 + t1
Code generation
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2,R1
MOVF R1, id1
22
Phases of compiler
Source program
Analysis Phase
Lexical analysis
Syntax analysis
Semantic analysis
Symbol table Error detection
and recovery
Intermediate code
Sr. Variable Type Address Code optimization
Name
1 Position Float 0001
Code generation Synthesis Phase
2 Initial Float 0005
3 Rate Float 0009
Target Program
23
Exercise
Write output of all the phases of compiler for following
statements:
1. x = b-c*2
2. I=p*n*r/100
24
Context of compiler (Cousins of compiler)
Skeletal
Source Preprocessor
Program
Preprocessor
It performs the following functions:
1. Macro processing
2. File inclusion
3. Rational preprocessor
4. Language extensions
25
Context of compiler (Cousins of compiler)
Skeletal
Source Preprocessor
Program
Preprocessor
1. Macro processing: Allows user to define macros. Macro is shorthand
for longer constructs.
Ex: #define PI 3.14159265358979323846
2. File inclusion: A preprocessor may include the header file into the
program.
Ex: #include<stdio.h>
3. Rational preprocessor: It provides built in macro for construct like
while statement or if statement.
26
Context of compiler (Cousins of compiler)
Skeletal
Source Preprocessor
Program
Preprocessor
4. Language extensions: Add capabilities to the language by using
built-in macros.
• Ex: the language equal is a database query language
embedded in C.
• Statement beginning with ## are taken by preprocessor to be
database access statement unrelated to C and translated into
procedure call on routines that perform the database access.
27
Context of compiler (Cousins of compiler)
Skeletal
Source Preprocessor Compiler
Source
Program Program
Compiler
• A compiler is a program that reads a program written in source
language and translates it into an equivalent program in target
language.
28
Context of compiler (Cousins of compiler)
Skeletal
Preprocessor Compiler
Source Source Target Assembler
Program Program Assembly
Program
Assembler
• Assembler is a translator which takes the assembly program
(mnemonic) as an input and generates the machine code as an
output.
29
Context of compiler (Cousins of compiler)
Skeletal
Preprocessor Compiler
Source Source Target Assembler Relocatable
Program Program Assembly Object Code
Program
Linker / Loader
Libraries &
Linker Object Files
• Linker makes a single program from a several files of relocatable
machine code.
• These files may have been the result of several different compilation,
and one or more library files.
30
Context of compiler (Cousins of compiler)
Skeletal
Preprocessor Compiler
Source Source Target Assembler Relocatable
Program Program Assembly Object Code
Program
Linker / Loader
Libraries &
Loader Object Files
• The process of loading consists of: Absolute
1. Taking relocatable machine code. Machine
Code
2. Altering the relocatable address.
3. Placing the altered instructions and
data in memory at the proper
location.
31
Front end & back end (Grouping of phases)
Front end: Depends primarily on source language and largely independent of
the target machine.
It includes following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table
Back end: Depends on target machine and do not depend on source program.
It includes following phases:
6. Code optimization
7. Code generation phase
8. Error handling and symbol table operation
32
Pass structure
One complete scan of a source program is called pass.
Pass includes reading an input file and writing to the output file.
Compiler pass refers to the traversal of a compiler through the
entire program.
Compiler pass are two types: Single Pass Compiler, and Two Pass
Compiler or Multi Pass Compiler.
33
Single Pass Compiler
If we combine or group all the phases of compiler in
a single module known as single pass compiler.
A one pass/single pass compiler is that type of compiler that
passes through the part of each compilation unit exactly once.
Single pass compiler is faster and smaller than the multi pass
compiler.
As a disadvantage of single pass compiler is that it is less efficient
in comparison with multipass compiler.
Single pass compiler is one that processes the input exactly once,
so going directly from lexical analysis to code generator, and then
going back for the next read.
34
Two Pass compiler or Multi Pass compiler
A Two pass/multi-pass Compiler is a type of compiler that processes
the source code or abstract syntax tree of a program multiple times.
In multipass Compiler we divide phases in two pass as:
1. First Pass is refers as
• Front end
• Analytic part
• Platform independent
2. Second Pass: is refers as
• Back end
• Synthesis Part
• Platform Dependent
35
Pass structure
Forward reference: A forward reference of a program entity is a
reference to the entity which precedes its definition in the program.
This problem can be solved by postponing the generation of target
code until more information concerning the entity becomes
available.
It leads to multi pass model of compilation.
In Pass I: Perform analysis of the source program and note
relevant information.
In Pass II: Generate target code using information noted in pass I.
36
Types of compiler
1. One pass compiler
• It is a type of compiler that compiles whole process in one-pass.
2. Two pass compiler
• It is a type of compiler that compiles whole process in two-pass.
• It generates intermediate code.
3. Incremental compiler
• The compiler which compiles only the changed line from the source
code and update the object code.
4. Native code compiler
• The compiler used to compile a source code for a same type of
platform only.
5. Cross compiler
• The compiler used to compile a source code for a different kinds platform.
37