CS-411: Compiler Construction
1. Introduction
Why Compilers?
• Compiler
– A program that translates from one language to another
– It must preserve semantics of the source
– It should create an efficient version of the program
in the target language
• In the beginning, there was machine language
– Ugly – writing code, debugging
– Then came textual assembly
– High-level languages – Fortran, Pascal, C, C++
– Machine structures became too complex and software
management too difficult to continue with low-level
languages
Compiler learning
• Isn’t it an old discipline?
– Yes, it is a well-established discipline
– Algorithms, methods and techniques were researched and
developed in the early stages of computer science
– There are many compilers around and many tools to
generate them automatically
• So, why do we need to learn it?
– You may never write a full compiler
– But the techniques we learn are useful in many tasks,
such as writing an interpreter for a scripting language,
validation checking for forms, and so on
Why Study Compilers? (1)
• Become a better programmer(!)
– Insight into interaction between languages,
compilers, and hardware
– Understanding of implementation techniques
– Better intuition about what your code does
Why Study Compilers? (2)
• Compiler techniques are everywhere
– Parsing (little languages, interpreters, XML)
– Database engines, query languages
– AI: domain-specific languages
– Text processing
• TeX/LaTeX -> DVI -> PostScript -> PDF
– Hardware: VHDL; model-checking tools
– Mathematics (Mathematica, Matlab)
Why Study Compilers? (3)
• Fascinating blend of theory and engineering
– Direct applications of theory to practice
• Parsing, scanning, static analysis
– Some very difficult problems (NP-hard or
worse)
• Resource allocation, “optimization”, etc.
• Need to come up with good-enough
approximations/heuristics
Why Study Compilers? (4)
• Ideas from many parts of CS
– AI: Greedy algorithms, heuristic search
– Algorithms: graph algorithms, dynamic programming,
approximation algorithms
– Theory: Grammars, DFAs and PDAs, pattern matching,
fixed-point algorithms
– Systems: Allocation & naming, synchronization,
locality
– Architecture: pipelines, instruction set use, memory
hierarchy management
Why Study Compilers? (5)
• You might even write a compiler some day!
– You’ll almost certainly write parsers and
interpreters in some context if you haven’t
already
Course scope
• Aim:
– To learn techniques of a modern compiler
• Books:
– Des Watson, A Practical Approach to Compiler
Construction, Springer, 2017.
– Compilers – Principles, Techniques and Tools, Second
Edition by Alfred V. Aho, Monica S. Lam, Ravi Sethi,
Jeffrey D. Ullman
Prerequisites
• Problem Solving & Programming (C++,
Python)
• Data Structures
• Algorithm Design
• Theory of Automata
• Computer Architecture
• Assembly Programming
• Operating Systems
Topics
• High Level Languages
• Lexical analysis (Scanning)
• Syntax Analysis (Parsing)
• Syntax Directed Translation
• Intermediate Code Generation
• Run-time environments
• Code Generation
• Machine Independent Optimization
Grading policy
• Midterm (25 Marks)
• Final exam (50 Marks)
• Assignments (20 Marks)
• Quizzes (5 Marks)
Terminology
• Compiler:
– a program that translates an executable program in one
language into an executable program in another
language
– we expect the program produced by the compiler to be
better, in some way, than the original
• Interpreter:
– a program that reads an executable program and
produces the results of running that program
– usually, this involves executing the source program in
some fashion
• Our course is mainly about compilers but many of
the same issues arise in interpreters
Common Issues
• Compilers and interpreters both must read
the input – a stream of characters – and
“understand” it; analysis
w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0
) <nl> <tab> <tab>{ n P o s + + ; } <nl> <tab> }
Interpreter
• Interpreter
– Execution engine
– Program execution interleaved with analysis
running = true;
while (running) {
analyze next statement;
execute that statement;
}
– Usually need repeated analysis of statements
(particularly in loops, functions)
– But: immediate execution, good debugging &
interaction
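The interpreter loop above can be made concrete. Here is a minimal sketch in Python; the statement forms (`set`/`add`/`print` tuples) are an illustrative assumption, not a fixed format from the slides:

```python
# A minimal sketch of the interpreter loop: each statement is
# analyzed and then executed immediately, one at a time.
def interpret(program):
    env = {}                          # variable store
    for stmt in program:              # "analyze next statement"
        op = stmt[0]
        if op == "set":               # ("set", name, value)
            env[stmt[1]] = stmt[2]
        elif op == "add":             # ("add", name, amount)
            env[stmt[1]] += stmt[2]
        elif op == "print":           # ("print", name)
            print(env[stmt[1]])       # "execute that statement"
    return env

interpret([("set", "x", 1), ("add", "x", 2), ("print", "x")])  # prints 3
```

If the program contained a loop, each statement in its body would be re-analyzed on every iteration, which is exactly the repeated-analysis cost noted above.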
Compiler
• Read and analyze entire program
• Translate to semantically equivalent program in
another language
– Easier to execute or more efficient
– Should “improve” the program in some fashion
Typical Implementations
• Compilers
– FORTRAN, C, C++, Java, COBOL, etc. etc.
– Strong need for optimization in many cases
• Interpreters
– PERL, Python, Ruby, awk, sed, shells,
Scheme/Lisp/ML, postscript/pdf, Java VM
– Particularly effective if interpreter overhead is
low relative to execution cost of individual
statements
Hybrid approaches
• Well-known example: Java
– Compile Java source to byte codes – Java Virtual
Machine language (.class files)
– Execution
• Interpret byte codes directly, or
• Compile some or all byte codes to native code
– Just-In-Time compiler (JIT) – detect hot spots & compile on the
fly to native code – standard these days
• Variation: .NET
– Compilers generate MSIL
– All IL compiled to native code before execution
High-Level Languages
• In the 1940s it was clear that there was a
need for software tools to support the
programming process.
• Programming was done in machine code
– required considerable skill and was hard work
• Assembly language
– relieving the programmer from having to deal
with much of the low-level detail
– but require an assembler
High-Level Languages (Cont..)
• Development of high-level languages
gathered speed in the 1950s.
• Need for compilers and other tools for the
implementation of these languages.
• The importance of formal language
specifications was recognized
• The correspondence between particular
grammar types and implementation was
understood
High-Level Languages (Cont..)
• The extensive use of high-level languages
prompted the rapid development of a wide
range of new languages.
– COBOL for business applications
– FORTRAN for numerical computation
– PL/I general-purpose
Advantages of High-Level Languages
• Problem solving is significantly faster
• High-level language programs
– Easier to read
– Understand
– Maintain
• High-level languages are easier to learn.
• Programs are structured more easily
• Data structuring features
Advantages of High-Level Languages (Cont..)
• Object orientation
• Support for asynchronous processes and
parallelism
• Software portability
– Machine independence
– Java
Advantages of High-Level Languages (Cont..)
• Compile-time checking can remove many
bugs at an early stage
– variable declarations
– type checking
– variable initialization
– compatibility in function arguments
• The compiler can insert runtime code such
as array bounds checking
Disadvantages of High-Level Languages
• The program may need to perform some
low-level, hardware-specific operations
which do not correspond to a high-level
language feature.
• In most high-level languages there is no
way to express direct machine addressing
• Less efficiency in terms of execution speed
Abstract view
Source code -> Compiler -> Machine code
(errors)
• Recognizes legal (and illegal) programs
• Generate correct code
• Manage storage of all variables and code
• Agreement on format for object (or
assembly) code
Structure of a Compiler
• First approximation
– Front end: analysis
• Read source program and understand its structure
and meaning
– Back end: synthesis
• Generate equivalent target language program
Source -> Front End -> Back End -> Target
Implications
• Must recognize legal programs (& complain about
illegal ones)
• Must generate correct code
• Must manage storage of all variables/data
• Must agree with OS & linker on target format
More Implications
• Need some sort of Intermediate Representation(s)
(IR)
• Front end maps source into IR
• Back end maps IR to target machine code
• Often multiple IRs – higher level at first, lower
level in later phases
Front End
• Split into two parts
– Scanner: Responsible for converting character stream to
token stream
• Also strips out white space, comments
– Parser: Reads token stream; generates IR
• Both of these can be generated automatically
– Source language specified by a formal grammar
– Tools read the grammar and generate scanner & parser
(either table-driven or hard-coded)
source -> Scanner -> tokens -> Parser -> IR
Front end
• Scanner:
– Maps characters into tokens – the basic unit of syntax
• x = x + y becomes <id, x> = <id, x> + <id, y>
– Typical tokens: number, id, +, -, *, /, do, end
– Eliminate white space (tabs, blanks, comments)
• A key issue is speed, so instead of using a tool like
LEX it is sometimes necessary to write your own
scanner
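As a sketch of what a hand-written scanner does, here is a small Python tokenizer for the token kinds mentioned above; the token names and the use of the `re` module are illustrative choices, not the course's required implementation:

```python
import re

# One regular expression per token kind; order matters
# (e.g. number must be tried before id would matter for digits).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[+\-*/=]"),
    ("skip",   r"[ \t\n]+"),      # white space, to be eliminated
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source):
    tokens = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "skip":  # drop white space
            tokens.append((m.lastgroup, m.group()))
    return tokens

# x = x + y becomes <id,x> = <id,x> + <id,y>
print(scan("x = x + y"))
```

A production scanner would also track line numbers for error messages and handle comments, but the character-to-token mapping is the essential job.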
Front end
• Parser:
– Recognize context-free syntax
– Guide context-sensitive analysis
– Construct IR
– Produce meaningful error messages
– Attempt error correction
• There are parser generators like YACC which
automate much of the work
Front end
• Context-free grammars are used to represent
programming language syntax:
<expr> ::= <expr> <op> <term> | <term>
<term> ::= <number> | <id>
<op> ::= + | -
Front end
• A parser tries to map a
program to the syntactic
elements defined in the
grammar
• A parse can be
represented by a tree
called a parse or syntax
tree
• x+2-y
Front end
• A parse tree can be
represented more
compactly referred to as
Abstract Syntax Tree
(AST)
• AST is often used as IR
between front end and
back end
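As an illustration, the expression x+2-y could be held as an AST like this; the node classes are one possible encoding, not a fixed IR format:

```python
# A minimal AST for x+2-y, of the kind used as IR between the
# front end and the back end.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

# x+2-y parses as (x+2)-y under left associativity:
ast = BinOp("-", BinOp("+", Var("x"), Num(2)), Var("y"))
```

Unlike a full parse tree, the AST drops the intermediate grammar symbols (<expr>, <term>, <factor>) and keeps only the operators and operands.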
Back end
IR -> Instruction Selection -> Register Allocation -> Machine code
(errors)
• Translate IR into target machine code
• Choose instructions for each IR operation
• Decide what to keep in registers at each
point
• Ensure conformance with system interfaces
Back end
• Produce compact fast code
• Use available addressing modes
Back end
• Have a value in a register when used
• Limited resources
• Optimal allocation is difficult
Traditional three pass compiler
Source code -> Front end -> IR -> Middle end -> IR -> Back end -> Machine code
(errors)
• Code improvement analyzes and changes the IR
• The goal is to reduce runtime
Middle end (optimizer)
• Modern optimizers are usually built as a set
of passes
• Typical passes
– Constant propagation
– Common sub-expression elimination
– Redundant store elimination
– Dead code elimination
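A constant-propagation pass, the first in the list above, can be sketched over a tiny three-address IR; the tuple instruction format and opcode names are illustrative assumptions:

```python
# Constant propagation (with folding) over a tiny three-address IR.
# Instructions: ("const", dst, value) and ("add", dst, src1, src2).
def propagate_constants(code):
    known = {}                  # variables whose values are known constants
    out = []
    for instr in code:
        if instr[0] == "const":
            _, dst, value = instr
            known[dst] = value
            out.append(instr)
        elif instr[0] == "add":
            _, dst, a, b = instr
            if a in known and b in known:    # fold to a constant
                known[dst] = known[a] + known[b]
                out.append(("const", dst, known[dst]))
            else:
                known.pop(dst, None)         # dst is no longer known
                out.append(instr)
    return out

code = [("const", "a", 2), ("const", "b", 3), ("add", "c", "a", "b")]
print(propagate_constants(code))  # the add becomes ("const", "c", 5)
```

A real pass would handle branches and loops (where "known" facts must be merged at join points), which is why optimizers are built as analyses over control-flow graphs rather than straight-line scans like this one.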
Defining a Programming Language
• Important to users of the programming
language as well as to the compiler writer.
• It tells programmers how to write programs in the
language
• Syntax
– Defines the sequences of characters that could
form a valid program
• Semantics
– Defines the meaning of these programs
Defining a Programming Language
• The language definition should be
– Clear
– Precise
– Complete
– Unambiguous
– Understandable
• Programming languages have been defined
using natural language, which leads to ambiguities
• Defining the syntax is usually done using a more
formal approach
Backus–Naur Form (BNF)
• A metalanguage
• It became widely known through its use in the
definition of the syntax of ALGOL 60
• It is a very simple yet powerful metalanguage
• It has been used extensively to support the formal
definitions of a huge range of languages
Backus–Naur Form (BNF)
• A BNF specification consists of a set of rules
• Each rule defines a symbol
– non-terminal symbols
• written in angle brackets in BNF
– terminal symbols
• cannot be expanded further
• tokens
– a start symbol
• a designated non-terminal
A trivial language
• <sentence> ::= <subject> <verb> <object>
• <subject> ::= <article> <noun>
• <object> ::= <article> <noun> | <article>
<adjective> <noun>
• <verb> ::= watches | hears | likes
• <article> ::= a | the
• <noun> ::= man | woman | bicycle | book
• <adjective> ::= red | big | beautiful
Example
• <sentence>
• <subject><verb><object>
• <article><noun><verb><object>
• the <noun><verb><object>
• the woman <verb><object>
• the woman watches <object>
• the woman watches <article><adjective><noun>
• the woman watches a <adjective><noun>
• the woman watches a beautiful <noun>
• the woman watches a beautiful bicycle
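Because this trivial grammar has no recursion, the language it generates is finite and can be enumerated directly. A sketch in Python, with the rules transcribed from the slide (the list encoding is an illustrative choice):

```python
from itertools import product

# The trivial language's terminal alternatives, transcribed directly.
articles   = ["a", "the"]
nouns      = ["man", "woman", "bicycle", "book"]
adjectives = ["red", "big", "beautiful"]
verbs      = ["watches", "hears", "likes"]

def sentences():
    # <subject> ::= <article> <noun>
    subjects = [f"{a} {n}" for a, n in product(articles, nouns)]
    # <object> ::= <article> <noun> | <article> <adjective> <noun>
    objects = subjects + [f"{a} {j} {n}"
                          for a, j, n in product(articles, adjectives, nouns)]
    # <sentence> ::= <subject> <verb> <object>
    return [f"{s} {v} {o}" for s, v, o in product(subjects, verbs, objects)]

all_sents = sentences()
print(len(all_sents))  # 8 subjects * 3 verbs * 32 objects = 768 sentences
print("the woman watches a beautiful bicycle" in all_sents)  # True
```

This finiteness is exactly what recursion in the expression grammar (next slide) removes: once a rule refers to itself, the language becomes infinite.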
BNF for Expression
• Slightly more complicated example
– <expr> ::= <term> | <expr> + <term> |
<expr> - <term>
– <term> ::= <factor> | <term> * <factor> |
<term> / <factor>
– <factor> ::= <integer> | (<expr>)
– <integer> ::= <digit> | <integer> <digit>
– <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
– These rules generate expressions such as 2+3+4/5 and 2+3*(44-567)
BNF
• There is no upper limit to the length of
expressions that can be generated by these
particular rules.
• This is because these rules make use of
recursion
• Consider the generation or derivation of the
expression 1+2*3
BNF
• <expr>
• <expr> + <term>
• <term> + <term>
• <factor> + <term>
• 1 + <term>
• 1 + <term> * <factor>
• 1 + <factor> * <factor>
• 1 + 2 * <factor>
• 1+2*3
BNF
• Note particularly that in the expansion
– 1+2*3 expands via 1+ <term>
– so 2*3 is grouped as a single entity called a <term>
• BNF rules can be used to support the idea of
operator precedence
• The precedence of the * operator is higher than the
precedence of the + operator
• Parsing according to the BNF rules therefore allows
the precedence of operators to be specified.
BNF
• BNF rules can be used to express the
associativity of operators.
• Consider the generation of 1+2+3.
• Here, the 1+2 part of the expression is
grouped as a single entity called a <term>.
• Therefore, the expression 1+2+3 is
interpreted as (1+2)+3.
• The + operator is left-associative.
BNF
• If different associativity or precedence rules
are required, then the BNF could be
modified to express these different rules.
• It is perhaps surprising that such a simple
metalanguage can do all this.
Extended Backus–Naur Form (EBNF)
• There are several different variants of the
low-level syntax of BNF-like
metalanguages, but one variant became
popular after its use in the ISO Pascal
Standard.
• This variant was called Extended Backus–
Naur Form (EBNF)
• It retains the basic principles of BNF but the
syntactic detail is a little different.
Extended Backus–Naur Form (EBNF)
• expr = term | expr "+" term | expr "-" term.
• term = factor | term "*" factor | term "/"
factor.
• factor = integer | "(" expr ")".
• integer = digit | integer digit.
• digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" |
"7" | "8" | "9".
Extended Backus–Naur Form (EBNF)
• Some other key features.
– Parentheses can be used to indicate grouping in
the rule.
– There is a specific feature to indicate
optionality in a rule: [X] specifies zero or one
instance of X, in other words specifying that X
is optional.
– Repetition (not by recursion) is supported too:
{X} implies zero or more instances of X.
Extended Backus–Naur Form (EBNF)
• We can therefore write an essentially
equivalent set of rules:
• expr = term | expr ("+" | "-") term.
• term = factor | term ("*" | "/") factor.
• factor = integer | "(" expr ")".
• integer = digit {digit}.
• digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" |
"7" | "8" | "9".
Syntax Diagram
• Graphical representation of Syntax
• Complete syntax definition is made up of a
set of syntax diagrams
Grammars
• The term “grammar” has a wide range of
definitions
• We need a tight and formal definition for it
when used in the context of programming
languages.
• The idea of a set of BNF rules to represent
the grammar of a language has already been
discussed, but formally a little more is
required.
Grammars
• The grammar (G) of a language is defined
as a 4-tuple G = (N, T, S, P) where:
– N is the finite set of non-terminal symbols.
– T is the finite set of terminal symbols (N and T
are disjoint.)
– S is the starting symbol, S ∈ N. The starting
symbol is the unique non-terminal symbol that
is used as the starting point for the generation
of all the strings of the language.
Grammars
• P is the finite set of production rules
– A production rule defines a string transformation and it has the general
form
• α → β.
– This rule specifies that any occurrence of the string α in the string to
be transformed is replaced by the string β.
• There are constraints on the constitution of the strings α
and β.
– If U is defined by U = N ∪ T
• U is the set of all non-terminal and terminal symbols
• then α has to be a member of the set of all non-empty strings that
can be formed by the concatenation of members of U, and it has to
contain at least one member of N.
• β has to be a member of the set of all strings that can be formed by
the concatenation of members of U, including the empty string
– β ∈ U∗
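The 4-tuple definition can be transcribed directly into code. As an illustration, here is the earlier expression grammar encoded as G = (N, T, S, P); the set/list encoding is a sketch, not a standard representation:

```python
# The expression grammar as the 4-tuple G = (N, T, S, P).
N = {"expr", "term", "factor", "integer", "digit"}       # non-terminals
T = {"+", "-", "*", "/", "(", ")"} | set("0123456789")   # terminals
S = "expr"                                               # start symbol
P = {                                                    # productions
    "expr":    [["term"], ["expr", "+", "term"], ["expr", "-", "term"]],
    "term":    [["factor"], ["term", "*", "factor"], ["term", "/", "factor"]],
    "factor":  [["integer"], ["(", "expr", ")"]],
    "integer": [["digit"], ["integer", "digit"]],
    "digit":   [[d] for d in "0123456789"],
}

# The definition's side conditions: N and T are disjoint, and S is in N.
assert N.isdisjoint(T) and S in N
```

Because every left-hand side here is a single non-terminal, this grammar is Chomsky type 2 (context-free), which is why it could be written in BNF in the first place.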
Chomsky Hierarchy
Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0
(each grammar type is a restriction of the one containing it)
Chomsky Hierarchy
• Chomsky type 0
– a free grammar or an unrestricted grammar
– α → β
• Chomsky type 1
– a context-sensitive grammar
– αAβ → αγβ
Chomsky Hierarchy
• Chomsky type 2
– a context-free grammar
– A→γ
– where A is a single non-terminal symbol.
– These productions correspond directly to BNF
rules.
• Chomsky type 3
– a regular grammar or a finite-state grammar
– A → a or A → aB