Compiler Design
Dr. Sherif Eletriby
Textbook
❑Compilers: Principles, Techniques, and Tools, A. V. Aho, R.
 Sethi, J. D. Ullman.
❑Compiler Design: Theory, Tools, and Examples, Seth D.
 Bergmann.
  Welcome!
•Programming languages are notations for describing
   computations to people and to machines.
•Before a program can be run, it first must be translated
   into a form in which it can be executed by a computer.
•The software systems that do this translation are
   called compilers.
Why to Learn Compiler design?
 Computers are a balanced mix of software and hardware. Hardware is
 just a piece of mechanical device and its functions are being controlled
 by a compatible software. Hardware understands instructions in the form
 of electronic charge, which is the counterpart of binary language in
 software programming. Binary language has only two alphabets, 0 and
 1. To instruct, the hardware codes must be written in binary format,
 which is simply a series of 1s and 0s. It would be a difficult and
 cumbersome task for computer programmers to write such codes, which
 is why we have compilers to write such codes.
Planned Topics
  ⚫   Chapter 1: Introduction
  ⚫   Chapter 2: Lexical Analysis
  ⚫   Chapter 3: Syntax Analysis
  ⚫   Chapter 4: Top-Down Parsing
  ⚫   Chapter 5: Bottom-Up Parsing
  ⚫   Chapter 6: Code Generation
  ⚫   Chapter 7: Optimization
Chapter 1: Introduction
                     What is a Compiler?
• Recall from your study of assembly language or computer
  organization the kinds of instructions that the computer’s CPU is
  capable of executing.
   (1) add two numbers stored in memory,
   (2) move numbers from one location in memory to another,
   (3) move information between the CPU and memory.
• But, How computer can see like this instruction?
            ➢if (array6[loc]<MAX) sum=0; else array6[loc]=0;
                    It is needing Compiler
Definition: Compiler
     A compiler is a program which translates any program written in a
     high-level language (source code/ source program) into an equivalent
     program in the machine language (Target program).
     Input: A program (source file/ source program)
     Output: A program (object file), in machine language (Target program)
           Source
           program                                          Equivalent
          written by                                        program in
                                Compiler                     low level
            source
          language                                           language
       source program                                     Target program
                                 Compiler
• A compiler accepts as input a program written in a particular high-level language
  and produces as output an equivalent program in machine language.
• The input program is known as the source program, and its language is the source
  language.
• The output program is known as the object program, and its language is the object
  language.
• A compiler translates source language programs into equivalent object language
  programs.
                                                                                  9
If a portion of the input to a compiler looked like this:
                          A = B + C * D;
The output corresponding to this input might look
something like this:
The compiler must be smart enough to know that the multiplication
should be done before the addition even though the addition is read
first when scanning the input.
                       Compiler
• Show the output of a Java native code compiler, in any typical
  assembly language, for the following Java input string:
                   while (x<a+b) x = 2*x;
                                                                   11
Language Processing System
We have learnt that any
computer system is made of
hardware and software.
The hardware understands a
language, which humans cannot
understand.
So we write programs in high-
level language, which is easier
for us to understand and
remember. These programs are
then fed into a series of tools and
OS components to get the
desired code that can be used by
the machine. This is known as
Language Processing System.
    Let us first understand how a program, using compiler, is
    executed on a host machine.
•    User writes a program in language
     (high-level language).
•    The compiler, compiles the program and
     translates it to assembly program (low-
     level language).
•    An assembler then translates the
     assembly program into machine code
     (object).
•    A linker tool is used to link all the parts
     of the program together for execution
     (executable machine code).
•    A loader loads all of them into memory
     and then the program is executed.
 Example of compiler
             C Compiler
             C++ compiler for the Mac (Intel).
             Java compiler for the Sun (MIPS).
The major task of a linker is to search and locate
referenced module/routines in a program and to
determine the memory location where these codes
will be loaded, making the program instruction to
have absolute references.
       Example of compiler
                      Save
                      code
                                                       Object files
            Source files           Compiler               .Obj
              .cpp, .h     Run -
                .java      Build
                                                          Linker
⚫   Linker : computer program that takes one or more
    objects files generated by a compiler and             Object
    combines them into a single executable program       Program
                                                        .exe - com
   Example of compiler
       Examples:
              C++ compiler for the Mac (Intel).
Assembler:
 An assembler translates assembly language
 programs into machine code. The output of an
 assembler is called an object file, which contains a
 combination of machine instructions as well as the
 data required to place these instructions in memory.
       Example of compiler
                      A Java compiler for the Sun (MIPS).
    ▪ The java compiler compiles to 'byte code', an
      intermediate form for the java 'virtual machine'.
في بعض األحيان تتم عملية الترجمة على شكل خليط من عمل المترجم والمفسر معا كما هو الحال في معالج لغة  JAVAالذي يجمع
بين الترجمة والتفسير حيث تتم أوال ترجمة البرنامج المكتوب بلغة  JAVAإلى برنامج وسيط يسمى  bytecodesوالذي يتم
                                                           الحقا تفسيره بواسطة اآللة االفتراضية .Virtual Machine
الفائدة من هذا الترتيب هو أن  bytecodesالتي تمت ترجمتها على آلة معينة يمكن تفسيرها على آلة أخرى كما يمكن ذلك من
خالل شبكة .باإلضافة إلى المترجم فإن هناك مجموعة من البرامج ال بد من توفرها حتى تتم عملية إنتاج برنامج الهدف القابل
                                                                                                           للتنفيذ.
Example of compiler   A Java compiler for the Sun (MIPS).
                     Compiler and Interpreter
Interpreters: is software translate source program statement by statement
(or line by line) and then this statement is given to the CPU to be executed
before the next statement in being translated.
• The input to an interpreter is a program written in a high-level language, but rather
  than generating a machine language program, the interpreter actually carries out the
  computations specified in the source program.
  الذي يتلخص عمله في أنه بدال من إنتاج برنامج الهدفInterpreter هناك نوع آخر شائع من معالجات اللغة يسمى المفسر
  يقوم المفسر بإدخال البيانات في نفس الوقت الذي تجري فيه ترجمة البرنامج أي،أوال ثم إدخال البيانات الالزمة للتنفيذ بعد ذلك
                                                                                                          .بشكل متزامن
    Compiler and Interpreter
The output of a compiler is a program (Target program), whereas
the output of an interpreter is the source program’s output.
                      Compiler vs Interpreter
Sample Problem:
Show the compiler output and the interpreter output for the following source code:
for (i=1; i<=4; i++) cout << i*3;
                    Compiler output                 Interpreter output
Difference between compiler and Interpreter
The machine-language target program produced by a
compiler is usually much faster than an interpreter at
mapping inputs to outputs .
An interpreter, however, can usually give better error
diagnostics than a compiler, because it executes the
source program statement by statement.
        Compile-Time vs. Run-Time Errors
• Compile time: the time at which a source program is compiled.
• Run time: the time at which the resulting object program is loaded
  and executed.
                                                                       24
                               Compiler Language
      • It is important to remember that a compiler is a program, and it must
        be written in some language (machine, assembly, high-level).
      • In describing this program, we are dealing with three languages:
          (1) the source language, i.e. the input to the compiler,
          (2) the object language, i.e. the output of the compiler,
          (3) the language in which the compiler is written, or the language in which it
              exists,
                               A compiler which translates Java    A compiler which translates PC
A Java compiler for the Mac.
                               programs to Mac machine language,   machine language programs to
                               and which runs on a Sun machine.    Java, written in Ada
                                                                                            25
                      Compiler Language
Using the big C notation to show each of the following compilers:
1. An Ada compiler which runs on the PC and compiles to the PC machine
   language.
2. An Ada compiler which compiles to the PC machine language, but which is
   written in Ada.
3. An Ada compiler which compiles to th PC machine language, but which runs on
   a Sun.
                                                                             26
The Structure of a Compiler
The Phases of a Compiler
The Structure of a Compiler
The Structure of a Compiler
            The Phases of a Compiler
The compilation process        is   a
sequence of various phases.
Each phase takes input       from its
previous stage, has        its own
representation of source    program,
and feeds its output to     the next
phase of the compiler.
Let us now understand the phases
of a compiler.
The Phases of a Compiler
                  Lexical Analysis (Scanner)
• The first phase of a compiler is called lexical analysis which generate a
  stream of tokens as output from source program.
• Also, it build a symbol table to store all identifiers used in the source
  program.
   •   key words - while, void, if, for, ...
   •   identifiers - declared by the programmer
   •   operators - +, -, *, /, =, ==, ...
   •   numeric constants - numbers such as 124, 12.35, 0.09E-23, etc.
   •   character constants - single characters or strings of characters
   •   special characters - characters used as delimiters such as . ( ) , ; :
   •   comments - ignored by subsequent phases. These must be identified by the
       scanner, but are not included in the output.
          Lexical Analysis (Scanner) Example
• Show the token classes, or “words”, put out by the lexical analysis phase
  corresponding to this Java source input:
                      sum = sum + unit * /* accumulate sum */ 12 ;
•   Identifier          (sum)
•   operator            (=)
•   Identifier          (sum)
•   operator           (+)
•   identifier         (unit)
•   operator           (*)
•   numeric constant (12)
•   special characters (;)
                  Syntax Analysis Phase
• The syntax analysis phase is often called the parser.
• The parser will check for the program syntax error and determine the
  underlying structure of the source program.
• The output of this phase may be a stream of atoms or a collection of
  syntax trees.
• For example, (ADD, A, B, C): The meaning of the following atom
  would be to add A and B, and store the result into C.
                  Syntax Analysis Phase
Example:
• Show the atoms corresponding to the following Java statement:
                           a=b+c*d;
   (MULT, c, d, temp1)
   (ADD, b, temp1, temp2)
   (MOVE, temp2, a)
                   Syntax Analysis Phase
Example:
• Show the atoms corresponding to the following Java statement:
                      while (a <= b) a = a + 1;
(LBL, L1)
(Test, a, <=, b, L2)
(JMP, L3)
(LBL, L2)
(ADD, a, 1, a)
(JMP, L1)
(LBL, L3)
                 Syntax Analysis Phase
• Show a syntax tree for the Java statement:
                    if (a+3 < 400) a =0; else b = a*a;
• Assume that an if statement consists of three subtrees, one for the
  condition, one for the consequent statement, and one for the else
  statement, if necessary.
                             Global Optimization
   • The global optimization phase is optional. Its purpose is simply to make the object
     program more efficient in space and/or time.
   • Examining the sequence of atoms put out by the parser to find redundant or
     unnecessary instructions or inefficient code.
   • Since it is invoked before the code generator, this phase is often called machine-
     independent optimization.
                                        • Examples:
stmt1                 y=16;                                        x = Math.sqrt (y); // loop invariant
go to label1          for (i=1; i<=100000; i++)                    for (i=1; i<=100000; i++)
stmt2                 { x = Math.sqrt (y); // square root method   System.out.println (x+i);
stmt3                 System.out.println (x+i);
label1: stmt4         }
                                                                   This would eliminate 99,999 unnecessary
stmt2 and stmt3 can   the assignment to x need not be inside       calls to the sqrt method at run time.
never be executed.    the loop since y does not change as the
                      loop repeats
                                 Code Generation
• Java compilers produce an intermediate form, known as byte code, which can be interpreted by the
  Java run-time environment.
• But, in this course, we will be assuming that our compiler is to produce native code for a particular
  machine.
• So, It is assumed that you have had some experience with assembly language and machine
  language.
• For example, an ADD atom might be translated to three machine language instructions: (1) load the
  first operand into a register, (2) add the second operand to that register, and (3) store the result, as
  shown for the atom (ADD, a, b,temp):
                       • LOD r1,a              // Load a into reg. 1
                       • ADD r1,b              // Add b to reg. 1
                       • STO r1,temp           // Store reg. 1 in temp
                   Code Generation Example
• Show assembly language instructions corresponding to the following atom
  string:
   •   (ADD, a, b, temp1)
   •   (TEST, a, ==, b, L1)
   •   (MOVE, temp1, a)
   •   (LBL, L1)
   •   (MOVE, temp1, b)
    LOD r1,a
    ADD r1,b
    STO r1,temp1        // ADD, a, b, temp1
    CMP a,b
    BE L1                // TEST, A, ==, B, L1
    MOV a,temp1         // MOVE, temp1, a
L1: MOV b,temp1         // MOVE, temp1, b
                        Local Optimization
• It involves examining sequences of instructions put out by the code generator to
 find unnecessary or redundant instructions.
• For this reason, local optimization is often called machine-dependent optimization.
• the expression a + b + c in the source program might result in the following
 instructions as code generator output:
   LOD r1,a           // Load a into register 1
   ADD r1,b           // Add b to register 1
   STO r1,temp1       // Store the result in temp1*
   LOD r1,temp1       // Load result into reg 1*
   ADD r1,c           // Add c to register 1
   STO r1,temp2       // Store the result in temp2
                        Local Optimization
• It involves examining sequences of instructions put out by the code generator to
 find unnecessary or redundant instructions.
• For this reason, local optimization is often called machine-dependent optimization.
• the expression a + b + c in the source program might result in the following
 instructions as code generator output:
   LOD r1,a           // Load a into register 1
   ADD r1,b           // Add b to register 1
   STO r1,temp1       // Store the result in temp1*
   LOD r1,temp1       // Load result into reg 1*
   ADD r1,c           // Add c to register 1
   STO r1,temp2       // Store the result in temp2
                   Implementation Techniques
 • A new compiler, with all optimizations, could take over a person-year
   to implement.
 • For this reason, we are always looking for techniques or shortcuts
   which will speed up the development process.
 • So, we can generate compilers from other compilers that have been
   developed previously.
If the source program is a compiler which translates language A into language B, then the object
program will also be a compiler which translates language A into language B.
Implementation Techniques
The End
          Questions?