IT3323E
COMPILER CONSTRUCTION
1
Why study compiler construction? (1)
1. understand what is going on inside the tools that you use.
2. the theoretical and practical knowledge that is needed to
implement a programming language, know the
characteristics of many programming languages.
3. design and implement your own domain-specific language.
4. a gentle introduction to formal methods that are used for
general purpose software design.
2
Why study compiler construction? (2)
5. A large variety of applications can be modelled after a
compiler (or some part thereof). Simulators, debuggers,
program analysis tools, editors, IDEs, RDBMSs
6. complicate techniques in text processing
7. optimization techniques
8. working with really big data structures and complex
interactions between algorithms
Help you out on your next big
programming project.
3
More details about the course
• How do computers work?
(instruction set, registers, addressing modes, run-time data
structures, ...)
• How do compilers work?
• What machine code is generated for certain language
constructs?
• What is good language design?
4
Course Outline
• Functions of a language processor
• The phases of a compiler
• Generative grammar
• BNF and syntax diagrams
• Scanner
• Top down parsing with Backtracking
• Predictive parsing
• LL(k) grammars
5
Course Outline
• Recursive Descent Parsing
• The Parser of KPL
• Semantic Analysis
• Stack calculator
• Intermediate Code Generation
• Object Code Generation
• Code optimization
6
Textbooks
• Aho.A.V, Sethi.R., Lam M., Ullman.J.D.
Compiler : Principles, Techniques and Tools.
Addison Wesley. 2007.
Vietnamese translation (1986 edition)
Trình biên dịch: Nguyên lý, Kỹ thuật và Công cụ
Nhà xuất bản Thống kê, 2000
7
Text books
• Andrew.W.Appel
Modern Compiler Implementation in Java
Princeton University.1998
• Bal.H. E.
Modern Compiler Design.
John Wiley & Sons Inc (2000)
• William Allan Wulf.
The Design of an Optimizing Compiler
Elsevier Science Ltd (1980)
• Charles N. Fischer.
Crafting a Compiler
Benjamin-Cummings Pub Co (1987)
8
Course Grade Components
Approximate percentage of grade
Midterm (Practice) 50
Final exam (Quiz) 50
Reward points 1: evaluation in labs.
9
Unit 1.
Functions of
a Language Processor
10
High Level Programming Languages
• Programming languages have taken 5 generation
• A language is considered high or low level depending on its
abstraction
A high level language may use natural language elements,
be easier to use, or more portable across platforms
Low level languages are closer to the hardware
11
The first and the second generation
• The first generation: machine language
• The second generation : Assembly
• Languages of the first and the second generation are low
level languages
12
The Third Generation
• Easier to read, write and maintain
• Allow declarations
• Most 3GLs supports structured programming
• Examples: Fortran, Cobol, C, C++, Basic . . . .
13
The Fourth Generation
• Designed with a specific purpose in mind, such as the
development of commercial business software
• Reduce programming effort, cost of software development
• May include form or report builder
• Examples :SQL, Visual Basic, Oracle (SQL plus, Oracle Form,
Oracle Report). . . .
14
The Fifth Generation
• Based around solving problems using constraints given to the
program, rather than using an algorithm written by a
programmer
• Are designed to make the computer solve a given problem
without the programmer
• Most constraint-based and logic programming languages and
some declarative languages are fifth-generation languages.
15
Characteristics of high-level languages
• Hardware independence
• Close to natural languages
• Easy to read, write and maintain
• Programs written in a high-level language must be translated
into machine language
• Often result in slower execution speed, higher memory
consumption
16
Syntax and Semantics of Programming Languages
• Syntax:The way symbols can be combined to create well-
formed sentence (program) in the language
• Semantics: The meaning of syntactically valid strings in a
language
17
Language Processors
• A program that performs tasks, such as translating and
interpreting, required for processing a specified programming
language.For example,
• Compiler
• Assembler
• Interpreter
• Compiler - Compiler
18
Compilers and Interpreters
• A Compiler is a program that translates code of a programming language into
machine code (assembly)
An interpreter translates some form of source code into a target representation that
it can immediately execute and evaluate
Modification of Interpreter :a program that implements or simulates a virtual machine using the
base set of instructions of a programming language as its machine language
19
Language Translation
• A compiler or an interpreter is a computer program (or set of
programs) that converts program written in high-level
language into machine code understood by the computer
• Native-code compiler: produces machine code
• Compiled languages: Pascal, C, C++, …
• Interpreter: translates into internal form and immediately
executes
• Interpreted languages: Javascript, PHP, Haskell…
• Hybrid approaches: VB.net, Python, Java
20
Language Compilation
• Compiler: program that translates a source language into a
target language
• Target language is often, but not always, the assembly language for a
particular machine
C
source code C ASM Assembler/ op codes
compiler Interpreter
21
Compilers
• Scans the entire program and translates it as a whole into
machine code.
• Takes large amount of time to analyze the source code but
the overall execution time is comparatively faster.
• Generates the error message only after scanning the whole
program.
Input
Source Target
Compiler
Program Program
Error messages Output
22
C translator: compiler
23
Compilers (cont’d)
• The most common reason for wanting to transform source
code is to create an executable program.
• To run a program, execute the compiler (and possibly an
assembler) to translate the source program into a machine
language program.
• Execute the resulting machine language program, supplying
appropriate input.
24
Checks During Compilation
• Syntactically invalid constructs
• Invalid type conversions
• A value is used in the “wrong” context, e.g., assigning a float to an int
• Static determination of type information is also used to
generate more efficient code
• Know what kind of values will be stored in a given memory region
during program execution
• Some programmer logic errors (not all compilers)
• Can be subtle: if (a = b) … instead of if (a == b) …
slide 25
Compilation Process
Syntax and static
type errors
Lexical tokens Syntax ASTs Intermediate IC ICopt Final code
analyzer + Optimizer gen
analyzer code gen
type checker
raw source ASM
code text
Assembler/
Preprocessor Interpreter
Source code with Machine code
preprocessor directives
26
Phases of Compilation
• Preprocessing: conditional macro text substitution
• Lexical analysis: convert keywords, identifiers, constants into a
sequence of tokens
• Syntactic analysis: check that token sequence is syntactically
correct
• Semantic Analysis: Generate abstract syntax trees (AST), check
types
• Intermediate code generation: “walk” the ASTs (or Parse Trees) and
generate intermediate code
• Apply optimizations to produce efficient code
• Target code generation: produce Assembly code
slide 27
Interpreters
• Translates program one statement at a time
• Takes less amount of time to analyze the source code but the overall
execution time is slower.
• Continues translating the program until the first error is met, in which
case it stops.
• Oversimplified view:
Source
Program Interpreter Output
Input
Error messages
• Accepts the source language program and the appropriate input
• Itself produces the output of the program.
28
Language Interpretation
• Parse the source code and perform its behaviour directly
• Translate source code into some efficient intermediate representation
of object code and immediately execute that
• Explicitly execute stored precompiled bytecode made by a compiler
and matched with the interpreter virtual machine.
• Read-eval-print loop (REPL)
• Read in an expression, translate into internal form
• Evaluate internal form
• This requires an abstract machine and a “run-time” component (usually a compiled
program that runs on the native machine)
input
• Print the result of evaluation expression REPL result
• Loop back to read the next expression interpreter
Interpreter
runtime
slide 29
Console of javascript
• JavaScript is an interpreted language
• It has no compilation step.
• An interpreter in the browser reads over the JavaScript code, interprets each
line, and runs it.
• The function console.log prints
“Hello world” to the console.
30
Write a program (.js file) with Kdevelop
31
Bytecode Compilation
• Combine compilation with interpretation
• Idea: remove inefficiencies of read-eval-print loop
• Bytecodes are conceptually similar to real machine opcodes,
but they represent compiled instructions to a virtual machine
instead of a real machine
• Source code statically compiled into a set of bytecodes
• Bytecode interpreter implements the virtual machine
Bytecode Bytecode
source compiler bytecodes interpreter result
program
Virtual machine
runtime
32
Python translator: hybrid solution
Python editor window
33
Python translator: hybrid solution
Python shell window
34
Pros and cons of compiled and interpreted languages
35
Interpreter as a part of compiler
• In a compiler implementations
• The source code is compiled to a machine language for an
idealized virtual machine
• The interpreter of accepts the codes and the input, produces
the output.
• This technique is quite popular to make the compiler
independent with the computer (portability of the compiler)
36
Cousins of the compiler
• Interpreter
• Assembler
• Linker
• Loader
• Preprocessor
• Editor
• Debugger
• Profiler
37
The context of a compiler in a language processor
Skeletal Source Program
Preprocessor
Source Program
Compiler
Target Assembly Program
Assembler/Interpreter
Relocatable Object Code
Linker+Loader Libraries and
Relocatable Object Files
Absolute Machine Code
38