Unit – I
Introduction to Compilers
Analp Pathak
Assistant Professor
Computer Science Engineering Department
SRM IST, NCR Campus
2
Compiler Course Outline
(Unit – 1 to Unit 5)
• Ch. 1: Introduction
• Ch. 3: Lexical Analysis and Lex/Flex
• Ch. 4: Syntax Analysis
• Ch. 5: Syntax-Directed Translation
• Ch. 6: Type Checking
• Ch. 7: Run-Time Environments
• Ch. 8: Intermediate Code Generation
• Ch. 9: Code Generation
• Ch.10: Code Optimization
• Reference:
Aho, Ullman, and Sethi, “Compilers: Principles, Techniques, and Tools”
3
Textbook
• Compilers: Principles, Techniques, and Tools, 2/E.
– Alfred V. Aho, Columbia University
– Ravi Sethi, Avaya Labs
– Jeffrey D. Ullman, Stanford University
Commonly Known as
Dragon Book
2nd Edition with
Monica S. Lam,
Stanford University
4
Contents
• Overview and History
• What Do Compilers Do
• Compiler and Interpreters
• Phases of a compiler
• Analysis of source program / Translation
• Tex
• Features of Good Programming Language
• The Syntax and Semantics of Programming Language
• Computer Architecture and Compiler Design
• Compiler Construction Tools
5
Overview and History
• Cause
– Software for early computers was written in assembly language
– The benefits of reusing software on different CPUs started to
become significantly greater than the cost of writing a compiler
• The first real compiler
– FORTRAN compilers of the late 1950s
– 18 person-years to build
6
Overview and History (Contd…)
• Compiler technology
– is more broadly applicable and has been employed in
rather unexpected areas.
• Text-formatting languages, like nroff and troff;
preprocessor packages like eqn, tbl, pic
• Silicon compiler for the creation of VLSI circuits
• Command languages of OS
• Query languages of Database systems
7
What Do Compilers Do
• Compilers may generate three types of code:
– Pure Machine Code
• Machine instruction set without assuming the existence
of any operating system or library.
• Mostly being OS or embedded applications.
– Augmented Machine Code
• Code with OS routines and runtime support routines.
• More often
– Virtual Machine Code
• Virtual instructions, can be run on any architecture with
a virtual machine interpreter or a just-in-time compiler
• Ex. Java
8
What Do Compilers Do (Contd…)
• Another way that compilers differ from one
another is in the format of the target machine
code they generate:
– Assembly or other source format
– Relocatable binary
• Relative address
• A linkage step is required
– Absolute binary
• Absolute address
• Can be executed directly
9
Compilers and Interpreters
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
Input
Source Target
Compiler
Program Program
Error messages Output
10
Compilers
• Source languages: Fortran, Pascal, C, etc.
• Target languages: another PL, machine Lang
• Compilers:
– Single-pass
– Multi-pass
– Load-and-Go
– Debugging
– Optimizing
11
Compilers and Interpreters
(cont’d)
• “Interpretation”
– Performing the operations implied by the
source program
Source
Program
Interpreter Output
Input
Error messages
12
Other Tools that Use the
Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. doxygen)
• Static checkers (e.g. lint and splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
13
Preprocessors, Compilers,
Assemblers, and Linkers
Skeletal Source Program
Preprocessor
Source Program
Try for example:
Compiler gcc -v myprog.c
Target Assembly Program
Assembler
Relocatable Object Code
Linker Libraries and
Relocatable Object Files
Absolute Machine Code
14
The Analysis-Synthesis Model of
Compilation
• There are two parts to compilation:
– Analysis determines the operations implied by the
source program which are recorded in a tree
structure
– Synthesis takes the tree structure and translates the
operations therein into the target program
15
Phases of Compiler
16
The Structure of a Compiler
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Representation
Symbol and Optimizer
Attribute
Tables
(Used by all Phases of The Compiler)
Code
Generator
Target machine code
17
The Structure of a Compiler (Contd…)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Representation
Scanner
The scanner begins the analysis of the source program by
Symbol and Optimizer
reading the input, character by character, and grouping
Attribute
characters into individual words and symbols (tokens)
Tables
RE ( Regular expression )
(Used
NFA ( Non-deterministic Finite by all )
Automata
DFA ( Deterministic FinitePhases
Automataof
)
LEX The Compiler) Code
Generator
Target machine code
18
The Structure of a Compiler (Contd…)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Parser Representation
Given a formal syntax specification (typically as a context-
free grammar [CFG] ), the parse reads tokens and groups
Symbol and Optimizer
them into units as specified by the productions of the CFG
Attribute
being used.
Tables
As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
(Used by all
tree.
Phases
CFG ( Context-Free Grammar ) of
BNF ( Backus-Naur FormThe
) Compiler) Code
GAA ( Grammar Analysis Algorithms ) Generator
LL, LR, SLR, LALR Parsers
YACC
Target machine code
19
The Structure of a Compiler (Contd…)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Semantic Routines Representation
Perform two functions
Check the static semantics of each construct
Symbol and
Do the actual translation Optimizer
Attribute
The heart of a compiler
Tables
Syntax Directed Translation
Semantic Processing (Used by all
Techniques
Phases of
IR (Intermediate Representation)
The Compiler) Code
Generator
Target machine code
20
The Structure of a Compiler (Contd…)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Optimizer Representation
The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
Symbol and Optimizer
improved IR code
Attribute
This phase can be very complex and slow
Tables
Peephole optimization
loop optimization, register allocation, code scheduling
(Used by all
Phases
Register and Temporary of
Management
Peephole OptimizationThe Compiler) Code
Generator
Target machine code
21
The Structure of a Compiler (Contd...)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Code Generator Representation
Interpretive Code Generation
Generating Code from Tree/Dag
Grammar-Based Code Generator Optimizer
Code
Generator
Target machine code
22
Symbol-table Management
• To record the identifiers in source program
– Identifier is detected by lexical analysis and then is
stored in symbol table
• To collect the attributes of identifiers
(not by lexical analysis)
– Storage allocation : memory address
– Types
– Scope (where it is valid, local or global)
– Arguments (in case of procedure names)
• Arguments numbers and types
• Call by reference or address
• Return types
23
Symbol-table Management
• Semantic analysis uses type information check
the type consistence of identifiers
• Code generating uses storage allocation
information to generate proper relocation
address code
24
Error Detection and Reporting
• Syntax and semantic analysis handle a large fraction
of errors
• Some of the Errors may occur in compilation phase
– Lexical phase: could not form any token
• e.g. Misspelling or Juxtaposing of characters
– Syntax phase: tokens violate structure rules
• e.g. Unbalanced parenthesis, Missing punctuation
operators or Undeclared variables.
– Semantic phase: no meaning of operations
• Add an array name and a procedure name, Truncation
of results or Unreachable Code.
25
Translation of A Statement
26
Translation of A Statement
27
Translation of A Statement
28
Properties and Advantage of
Intermediate Code
• Properties:
– It should be easy to produce.
– It should be easy to translate into the target machine.
• Advantage (In current scenario, the internet era):
– Consistent and Continuous Programming Model
– Develop once and run anywhere
– Simplified deployment
– Wide platform reach
– Programming language integration
– Simplified code reuse
– Interoperability
29
Intermediate Code Representation
• Two kinds of representations
– Trees
• Parse Trees, Syntax Trees
– Linear representations
• Three-address code
30
Code Optimization
• It attempts to improve the intermediate code, to
produce more efficient code that will result a faster-
running machine code.
• Compiler Optimizations must meet the following
objectives:
– The optimization must be correct, that is, preserve the
meaning of the compiled program,
– The optimization must improve the performance of many
programs,
– The compilation time must be kept reasonable, and
– The engineering effort required must be manageable.
31
Factors influenced Code Generation
• Register Allocation
• Register Scheduling
• Code Selection
• Addressing Modes
• Instruction format
• Power of instructions
• Optimization at the machine code level
• Back patching
32
Analysis in Text Formatters:
• TeX is a typesetting system designed and mostly written by Donald
Knuth, and released in 1978.
• TeX was designed with two main goals:
– To allow anybody to produce high-quality books using a reasonably
minimal amount of effort .
– To provide a system that would give exactly the same results on all
computers, now and in the future.
• TeX is a popular means by which to typeset complex mathematical
formulae.
• It noted as one of the most sophisticated digital typographical systems in
the world.
• TeX is popular in academia, especially in mathematics, computer
science, economics, engineering, physics, statistics, and quantitative
psychology.
• Tex many other typesetting tasks, LaTeX, ConTeXt and other packages.
33
Analysis in Text Formatters (Contd…)
• \hbox {<list of boxes>}
• \hbox {\vbox{! 1} \vbox{@ 2}}
34
The Grouping of Phases
• Compiler front and back ends:
– Analysis (machine independent front end)
– Synthesis (machine dependent back end)
• Passes
– A collection of phases may be repeated only once
(single pass) or multiple times (multi pass)
– Single pass: usually requires everything to be defined
before being used in source program
– Multi pass: compiler may have to keep entire program
representation in memory
35
Features of a Good Programming Language
• Easy to understand, and Expressive Power
• Interoperability and Portability
• Good turnaround time
• Automatic error recovery and Good Error reporting
• Efficient memory usage and Garbage Collection
• Provision of good run-time environment
– Support for virtual machine
– Support for concurrent operation
– Support for unblock operations
• Ability to interface with foreign functions
• Ability to model real-world problems
• Ability to expose the functions for usage in other languages
36
The Syntax and Semantics of Programming Language
• A programming language must include the
specification of syntax (structure) and semantics
(meaning).
• Syntax typically means the context-free syntax
because of the almost universal use of context-free-
grammar (CFGs)
• Ex.
– a = b + c is syntactically legal
– b + c = a is illegal
37
The Syntax and Semantics of Programming
Language (Contd…)
• The semantics of a programming language are
commonly divided into two classes:
– Static semantics
• Semantics rules that can be checked at compiled time.
• Ex. The type and number of a function’s arguments
– Runtime semantics
• Semantics rules that can be checked only at run time
38
Computer Architecture and Compiler Design
• Compilers should exploit the hardware-specific feature and
computing capability to optimize code.
• The problems encountered in modern computing platforms:
– Instruction sets for some popular architectures are highly nonuniform.
– High-level programming language operations are not always easy to
support.
• Ex. exceptions, threads, dynamic heap access …
– Exploiting architectural features such as cache, distributed processors
and memory
– Effective use of a large number of processors
39
Compiler Design Considerations
• Debugging Compilers
– Designed to aid in the development and debugging of
programs.
• Optimizing Compilers
– Designed to produce efficient target code
• Retargetable Compilers
– A compiler whose target architecture can be changed
without its machine-independent components having to be
rewritten.
40
Compiler-Construction Tools
• Software development tools are available to
implement one or more compiler phases
– Scanner generators
– Parser generators
– Syntax-directed translation engines
– Automatic code generators
– Data-flow engines
– Compiler-construction toolkits