19CSC62 - COMPILER DESIGN
              Syllabus
..\19CSC62 CD.docx
                 Outline
• Introduction of Compilation and
  Interpretation
• Disciplines involved
• Brief History
• Abstract view for a compiler
• Analysis of the source program
• Grouping of phases
• The phases of compiler
• Compiler Construction Tools
          Disciplines involved
•   Algorithms
•   Languages and machines
•   Operating systems
•   Computer architectures
1.1 Why? A Brief History
               Why Compiler
• Writing machine language (numeric codes) is time
  consuming and tedious
                  C7 06 0000 0002    (machine code)
                      Mov x, 2       (assembly language)
                        x = 2        (high-level language)
• The assembly language has a number of defects
  – Not easy to write
  – Difficult to read and understand
     Brief History of Compiler
• The first compiler was developed between 1954
  and 1957
  – The FORTRAN language and its compiler by a team at
    IBM led by John Backus
  – The structure of natural language was studied at about
    the same time by Noam Chomsky
      Brief History of Compiler
• The related theories and algorithms in the 1960s and 1970s
   – The classification of language: Chomsky hierarchy
   – The parsing problem was pursued:
       • Context-free language, parsing algorithms
   – The symbolic methods for expressing the structure of
     the words of a programming language:
       • Finite automata, Regular expressions
   – Methods were developed for generating efficient
     object code:
       • Optimization techniques, or code improvement techniques
     Brief History of Compiler
• Programs were developed to automate parts of
  compiler development, starting with parsing
  – Parser generators,
     • such as Yacc by Steve Johnson in 1975 for the
       Unix system
  – Scanner generators,
     • such as Lex by Mike Lesk for the Unix system at
       about the same time
     Brief History of Compiler
• Projects focused on automating the
  generation of other parts of a compiler
  – Automatic code generation was attempted during the
    late 1970s and early 1980s
  – These projects had less success, because these parts
    of a compiler are less well understood
      Brief History of Compiler
• Recent advances in compiler design
   – More sophisticated algorithms for inferring and/or
     simplifying the information contained in a program,
      • such as the unification algorithm of Hindley-Milner type
        checking
   – Window-based interactive development environments
     (IDEs),
      • which include editors, linkers, debuggers, and project
        managers
   – However, the basics of compiler design have not
     changed much in the last 20 years.
             Compiler learning
• Isn’t it an old discipline?
   – Yes, it is a well-established discipline
   – Its algorithms, methods and techniques were researched and
     developed in the early stages of computer science
   – There are many compilers around and many tools to
     generate them automatically
• So why do we need to learn it?
   – You may never write a full compiler
   – But the techniques we learn are useful in many tasks, such as
     writing an interpreter for a scripting language,
     validating form input, and so on
                Terminology
• Assembler
• Loaders And Linkers
• Macro processors
• Compiler
• Interpreter
Language Processing System
      Preprocessor & its Functions
    A preprocessor produces input to a compiler. It may perform the
    following functions.
• 1. Macro processing: A preprocessor may allow a user to
  define macros that are shorthand for longer constructs.
• 2. File inclusion: A preprocessor may include header
  files in the program text.
• 3. Rational preprocessors: These preprocessors augment
  older languages with more modern flow-of-control and
  data-structuring facilities.
• 4. Language extensions: These preprocessors attempt to
  add capabilities to the language by what amounts to
  built-in macros.
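The macro processing function above can be illustrated with a minimal sketch. It assumes a C-like `#define NAME value` syntax and handles only macros without arguments; the function name `expand_macros` is hypothetical, and real preprocessors do far more (macro arguments, conditionals, file inclusion).

```python
import re

def expand_macros(source: str) -> str:
    """Expand simple '#define NAME value' macros by whole-word substitution."""
    macros = {}
    output = []
    for line in source.splitlines():
        definition = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if definition:
            # Record the macro; the definition line itself is not emitted.
            macros[definition.group(1)] = definition.group(2).strip()
            continue
        # Replace every word that names a macro with the macro's body.
        line = re.sub(r"\b\w+\b",
                      lambda word: macros.get(word.group(0), word.group(0)),
                      line)
        output.append(line)
    return "\n".join(output)

program = "#define RATE 60\nposition = initial + rate * RATE"
print(expand_macros(program))  # position = initial + rate * 60
```

The compiler proper never sees `RATE`; it receives the already expanded text.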
                Compiler
• A compiler is a translator program that
  takes a program written in a high-level
  language (HLL), the source program, and
  translates it into an equivalent program in
  machine-level language (MLL), the target
  program.
              Assembler
• An assembler is a program written to automate
  the translation of assembly language into
  machine language.
• The input to an assembler is called the
  source program; the output is a machine
  language translation (the object program).
     Loader & its Functions
• A loader is a system program that takes
  the object code of a program as input and
  prepares it for execution.
• Loader functions:
  – Allocation - The loader determines and allocates
    the memory space the program requires to
    execute properly.
  – Linking - The loader analyses and resolves the
    symbolic references made in the object modules.
  – Relocation - The loader maps and relocates the
    address references to correspond to the newly
    allocated memory space during execution.
  – Loading - The loader loads the machine
    code corresponding to the object modules into the
    allocated memory space and makes the program
    ready to execute.
                Translator
1. Translating the HLL program input into an
   equivalent ML program.
2. Providing diagnostic messages wherever
   the programmer violates specification of
    the HLL.
Types of translators:
• Interpreter
• Compiler
• Preprocessor
               Abstract view
[Diagram: Source code -> Compiler -> Machine code, with errors reported]
•   Recognizes legal (and illegal) programs
•   Generates correct code
•   Manages storage of all variables and code
•   Agrees on a format for object (or
    assembly) code
 PHASES OF A COMPILER:
• Compilation has two parts
a. Analysis (machine independent / language
   dependent)
b. Synthesis (machine dependent / language
   independent)
• The compilation process is partitioned into a
   number of sub-processes called “phases”
     Front-end, Back-end division
[Diagram: Source code -> Front end -> IR -> Back end -> Machine code, with errors reported]
•   Front end maps legal code into IR
•   Back end maps IR onto target machine
•   Simplify retargeting
•   Allows multiple front ends
•   Multiple passes -> better code
Phases of a Compiler
                Example
• Suppose a source program contains the
  assignment statement
   position = initial + rate * 60
                 Front end
[Diagram: Source code -> Scanner -> tokens -> Parser -> IR, with errors reported]
•   Recognize legal code
•   Report errors
•   Produce IR
•   Preliminary storage maps
                       Front end
• Scanner:
   – Maps characters into tokens – the basic unit of syntax
      • x = x + y becomes <id, x> = <id, x> + <id, y>
   – Typical tokens: number, id, +, -, *, /, do, end
   – Eliminates white space (tabs, blanks) and comments
• A key issue is speed, so instead of using a tool like
  LEX it is sometimes necessary to write your own
  scanner
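A minimal hand-written scanner along these lines might look as follows. The token names follow the slide (`id`, `number`, operators, keywords `do` and `end`); the function name `scan` and the exact regular expressions are assumptions made for this sketch, and comments are not handled.

```python
import re

KEYWORDS = {"do", "end"}
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+\-*/]"),
    ("skip",   r"\s+"),          # blanks and tabs are discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def scan(text):
    """Map a character string to a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme        # keywords get their own token kind
        tokens.append((kind, lexeme))
    return tokens

print(scan("x = x + y"))
# [('id', 'x'), ('op', '='), ('id', 'x'), ('op', '+'), ('id', 'y')]
```

Illegal characters are reported as errors here, which matches the "errors" output of the scanner in the front-end picture.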
                     Front end
• Parser:
   –   Recognize context-free syntax
   –   Guide context-sensitive analysis
   –   Construct IR
   –   Produce meaningful error messages
   –   Attempt error correction
• There are parser generators, like YACC, which
  automate much of the work
                Front end
• Context-free grammars are used to describe
  programming language syntax:
<expr> ::= <expr> <op> <term> | <term>
<term> ::= <number> | <id>
<op> ::= + | -
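As a rough sketch, the grammar above can be recognized by a small hand-written parser. The left-recursive `<expr>` rule is handled with a loop, which yields a left-associative tree. The function name `parse_expr` and the nested-tuple tree format are assumptions for illustration; the input is taken as a list of token strings.

```python
def parse_expr(tokens):
    """Parse <expr> ::= <expr> <op> <term> | <term> into a nested tuple."""
    pos = 0

    def parse_term():
        # <term> ::= <number> | <id>
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input, expected a term")
        tok = tokens[pos]
        if tok.isdigit():
            node = ("number", tok)
        elif tok.isidentifier():
            node = ("id", tok)
        else:
            raise SyntaxError(f"expected number or id, found {tok!r}")
        pos += 1
        return node

    tree = parse_term()
    while pos < len(tokens):
        op = tokens[pos]                  # <op> ::= + | -
        if op not in ("+", "-"):
            raise SyntaxError(f"expected + or -, found {op!r}")
        pos += 1
        tree = (op, tree, parse_term())   # earlier operands nest deeper (left-assoc.)
    return tree

print(parse_expr("x + 2 - y".split()))
# ('-', ('+', ('id', 'x'), ('number', '2')), ('id', 'y'))
```

An input such as `x +` is rejected with a `SyntaxError`, which is the parser's way of reporting an illegal program.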
                Front end
• A parser tries to map a
  program to the syntactic
  elements defined in the
  grammar
• A parse can be
  represented by a tree
  called a parse tree or
  syntax tree
                Front end
• A parse tree can be
  represented more
  compactly as an
  Abstract Syntax Tree
  (AST)
• AST is often used as IR
  between front end and
  back end
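To get a feel for what an AST looks like, one option is to inspect the tree that Python's standard `ast` module builds for the example statement, which happens to be valid Python as well. This shows Python's own AST node types, used here only as a stand-in for the AST of the compiler discussed in these slides.

```python
import ast

tree = ast.parse("position = initial + rate * 60")
assign = tree.body[0]            # Assign node for the whole statement
target = assign.targets[0].id    # name on the left-hand side
expr = assign.value              # BinOp for initial + (rate * 60)

print(type(assign).__name__, target)
print(type(expr.op).__name__, expr.left.id)
print(type(expr.right.op).__name__, expr.right.left.id, expr.right.right.value)
```

Note that the multiplication sits below the addition in the tree, so operator precedence is encoded in the tree shape and no parentheses or grammar non-terminals need to be stored.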
                 Back end
[Diagram: IR -> Instruction selection -> Register allocation -> Machine code, with errors reported]
• Translate IR into target machine code
• Choose instructions for each IR operation
• Decide what to keep in registers at each
  point
• Ensure conformance with system interfaces
                 Back end
• Produce compact, fast code
• Use available addressing modes
                  Back end
• Have a value in a register when used
• Limited resources
• Optimal allocation is difficult
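The two back-end tasks can be sketched together on a toy scale. The code below walks an expression tree (nested tuples), picks one instruction per IR operation, and hands out registers from a small pool. The instruction names (`LOAD`, `LOADI`, `ADD`, `MUL`, `STORE`) describe a hypothetical register machine, and the allocation strategy is deliberately naive: when the pool is empty the sketch gives up rather than spilling to memory.

```python
def generate(tree, num_registers=4):
    """Emit code for a nested-tuple expression tree on a hypothetical machine."""
    code = []
    free = [f"R{i}" for i in reversed(range(num_registers))]  # free register pool

    def alloc():
        if not free:
            raise RuntimeError("out of registers (spilling not implemented)")
        return free.pop()

    def emit(node):
        kind = node[0]
        if kind == "id":                 # variable: load from memory
            reg = alloc()
            code.append(f"LOAD  {reg}, {node[1]}")
            return reg
        if kind == "number":             # constant: load immediate
            reg = alloc()
            code.append(f"LOADI {reg}, {node[1]}")
            return reg
        opcode = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}[kind]
        left = emit(node[1])
        right = emit(node[2])
        code.append(f"{opcode}   {left}, {left}, {right}")
        free.append(right)               # right operand's register is reusable
        return left

    return code, emit(tree)

tree = ("+", ("id", "initial"), ("*", ("id", "rate"), ("number", "60")))
code, result = generate(tree)
code.append(f"STORE position, {result}")
print("\n".join(code))
```

With `num_registers=2` the same tree cannot be compiled by this sketch, which illustrates the "limited resources" point: a real allocator would have to spill values or reorder the evaluation.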
   Traditional three pass compiler
[Diagram: Source code -> Front end -> IR -> Middle end -> IR -> Back end -> Machine code, with errors reported]
 • Code improvement analyzes and changes the IR
 • The goal is to reduce runtime
        Middle end (optimizer)
• Modern optimizers are usually built as a set
  of passes
• Typical passes
  –   Constant propagation
  –   Common sub-expression elimination
  –   Redundant store elimination
  –   Dead code elimination
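Two of the passes listed above can be sketched on straight-line three-address code. Each instruction is a tuple `(dest, op, a, b)`; this IR format, the `"copy"` operation, and the function names are assumptions for illustration. Real optimizers work on control-flow graphs and need data-flow analysis, which this sketch omits.

```python
def constant_propagation(code):
    """Replace uses of variables known to be constant and fold constant operations."""
    known = {}                                   # variable -> constant value
    result = []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if op in ("+", "*") and isinstance(a, int) and isinstance(b, int):
            a = a + b if op == "+" else a * b    # fold at compile time
            op, b = "copy", None
        if op == "copy" and isinstance(a, int):
            known[dest] = a
        else:
            known.pop(dest, None)                # dest is no longer a known constant
        result.append((dest, op, a, b))
    return result

def dead_code_elimination(code, live_out):
    """Drop assignments whose result is never used (one backward pass)."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return kept[::-1]

code = [
    ("t1", "copy", 60, None),
    ("t2", "*", "rate", "t1"),
    ("t3", "+", "initial", "t2"),
    ("unused", "+", "t1", "t1"),
    ("position", "copy", "t3", None),
]
optimized = dead_code_elimination(constant_propagation(code), {"position"})
for instruction in optimized:
    print(instruction)
```

Running the passes in this order matters: once `t1` has been replaced by `60` everywhere, its own assignment becomes dead and is removed along with `unused`.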
                Some Questions
• What is the difference between a compiler and an
  interpreter?
• What are the advantages of (a) a compiler over an
  interpreter (b) an interpreter over a compiler?
• What advantages are there to a language-processing
  system in which the compiler produces assembly language
  rather than machine language?
• A compiler that translates a high-level language into
  another high-level language is called a source-to-source
  translator. What advantages are there to using C as a target
  language for a compiler?
                Readings
• Chapter 1 of the book