Atcd-Unit-5 (1) - 2
Atcd-Unit-5 (1) - 2
UNIT-5
                       Machine Dependent Phases
Syllabus:
UNIT-V: Machine Dependent Phases
Intermediate Code Generation: Intermediate code, three address code,
quadruples, triples, directed acyclic graph.
Code Optimization: Common sub expression elimination, copy propagation, dead
code elimination, constant folding, strength reduction, loop optimization.
Code Generation: Basic blocks & flow graphs, Peephole optimization, Register
allocation and assignment.
 Example: x = (a + b * c) / (a – b * c)
AST DAG
     A node in a Directed Acyclic Graph (DAG) may have more than one
      parent.
III IT-I SEM                   Regulation: R20                      ATCD: UNIT-5
Problem: Consider the following expression and construct a DAG for it-
               (((a+a)+(a+a))+((a+a)+(a+a)))
Solution-
Postfix Notation:
   Also known as reverse Polish notation or suffix notation.
   The ordinary (infix) way of writing the sum of a and b is with an
      operator in the middle: a + b The postfix notation for the same
      expression places the operator at the right end as ab+
   In general, if e1 and e2 are any postfix expressions, and + is any binary
      operator, the result of applying + to the values denoted by e1 and e2 is
      postfix notation by e1e2 +
   No parentheses are needed in postfix notation because the position and
      arity (number of arguments) of the operators permit only one way to
      decode a postfix expression.
   Unlike infix notation, postfix notation places the operator after the two
      operands. And, it doesn’t use any parenthesis to group expressions.
   In postfix notation, the operator follows the operand.
      Example 1: The postfix representation of the expression (a + b) * c is :
      ab + c *
III IT-I SEM                      Regulation: R20                     ATCD: UNIT-5
    Infix Expression: ( ( A + ( B * C ) ) / ( D + E ) )
    Postfix Expression: A B C* + D E + /
    Infix Expression: ( ( A + B ) * ( C + E ) )
    Postfix Expression: A B + C E + *
    Infix Expression: ( A * ( B * ( ( ( C + A) + B) * C ) ) )
    Postfix Expression: ABC A + B + C * * *
  Infix Expression: ( ( H * ( ( ( ( A + ( ( B + C ) * D ) ) * F ) * G ) * E ) ) + J)
  Postfix Expression: H A B C + D * + F * G * E * * J +
Three-Address Code (TAC)
   A statement involving no more than three addresses (two for operands
     and one for result) is known as a three address statement. A sequence of
     three address statements is known as a three address code.
   Three address statement is of form x = y op z, where x, y, and z will
     have address (memory location). Sometimes a statement might contain
     less than three references but it is still called a three address statement.
   Three address codes is a type of intermediate code which is easy to
     generate and can be easily converted to machine code.
   It makes use of at most three addresses and one operator to represent an
     expression and the value computed at each instruction is stored in
     temporary variable generated by compiler.
   The compiler decides the order of operation given by three address
     code.
III IT-I SEM                      Regulation: R20                   ATCD: UNIT-5
     General representation:
      a = b op c
     Where a, b or c represents operands like names, constants or compiler
     generated temporaries and op represents the operator
   Example: The three address code for the expression a + b * c + d is
        t1 = b * c
        t2 = a + t1
        t3 = t2 + d
      Where t1 , t2 , t3 are temporary variables.
Implementation of Three Address Codes:
 (Data structures for three address codes)
     There are 3 representations of three address code namely
              1. Quadruples
              2. Triples
              3. Indirect Triples
Quadruples:
   Quadruple is defined as a record structure used to represent a three-
     address statement.
   It consists of four fields. The first field contains the operator, the second
     and third fields contain the operand 1 and operand 2, respectively, and the
     last field contains the result of that three-address statement.
   Advantages:
          Easy to rearrange code for global optimization.
          One can quickly access value of temporary variables using
             symbol table.
   Disadvantage –
         Contain lot of temporaries.
         Temporary variable creation increases time and space complexity.
   Example: Consider expression a = b * – c + b * – c.
     The three address code is:
    t1 = uminus c
    t2 = b * t1
    t3 = uminus c
    t4 = b * t3
    t5 = t2 + t4
    a = t5
III IT-I SEM                        Regulation: R20                    ATCD: UNIT-5
Triples
    A triple is also defined as a record structure that is used to represent a
        three address statement.
       In triples, for representing any three-address statement three fields are
        used, namely, operator, operand 1 and operand 2, where operand 1 and
        operand 2 are pointers to either symbol table or they are pointers to the
        records (for temporary variables) within the triple representation itself.
        In this representation, the result field is removed to eliminate the use of
        temporary names referring to symbol table entries. Instead, we refer the
        results by their positions.
       The pointers to the triple structure are represented by parenthesized
        numbers, whereas the symbol-table pointers are represented by the names
        themselves.
       Note that instead of referring the temporary t by its name, we refer it by
        its position in the triple.
     Disadvantages:
                Temporaries are implicit and difficult to rearrange code.
                It is difficult to optimize because optimization involves moving
                 intermediate code. When a triple is moved, any other triple
                 referring to it must be updated also. With help of pointer one can
                 directly access symbol table entry.
     Example:Consider expression a = b * – c + b * – c
III IT-I SEM                   Regulation: R20                     ATCD: UNIT-5
Indirect Triples:
    An indirect triple representation consists of an additional array that
       contains the pointers to the triples in the desired order
      Its similar in utility as compared to quadruple representation but
       requires less space than it.
      Temporaries are implicit and easier to rearrange code.
Question: Write quadruple, triples and indirect triples for following expression
          : (x + y) * (y + z) + (x + y + z)
Explanation: The three address code is:
t1   =   x + y
t2   =   y + z
t3   =   t1 * t2
t4   =   t1 + z
t5   =   t3 + t4
III IT-I SEM                     Regulation: R20                      ATCD: UNIT-5
Problem:
Convert S = -z/a * (x + y) into three address code,where Unary minus is represented
by -z.
The following is the three-address code:
t1 = x + y
t2 = a * t1
t3 = - z
t4 = t3/t2
S = t4
III IT-I SEM                    Regulation: R20                     ATCD: UNIT-5
Problem: Construct Quadruples, Triples, and Indirect Triples for the expression
         -(a + b) * (c + d) - (a + b + c)
Solution
First of all this statement will be converted into Three Address Code as:
t1 = a + b
t2 = −t1
t3 = c + d
t4 = t2 ∗ t3
t5 = t1 + c
t6 = t4 − t5
III IT-I SEM   Regulation: R20   ATCD: UNIT-5
III IT-I SEM                            Regulation: R20                         ATCD: UNIT-5
Code Optimization
     Code Optimization is an approach to improve the code by eliminating
      unnecessary code lines and arranging the statement in such a sequence that
      speed up the program execution without changing the meaning of the
      program
     The code produced by the straight forward compiling algorithms can often be made to r
      faster or take less space, or both. This improvement is achieved by program transforma
      that are traditionally called optimizations. Compilers that apply code-improving
      transformations are called optimizing compilers.
   The process of code optimization involves:
        Eliminating the unwanted code lines
        Rearranging the statements of the code
   The criteria for code Optimization:
        The transformation must preserve the meaning of programs. That
           is, the optimization must not change the output produced by a
           program for a given input.
        A transformation must, on the average, speed up programs by a
           measurable amount.
        The transformation must be worth the effort. i.e., yield the most
           benefit for the least effort.
   Advantages of Code optimization:
        Optimized code has faster execution speed.
        Optimized code utilizes the memory efficiently.
        Optimized code gives better performance.
   Optimizations are classified into two categories. They are
        Machine independent optimizations
        Machine dependant optimizations
   Machine independent optimizations: Machine independent optimizations
    are program transformations that improve the target code without taking
    into consideration any properties of the target machine.
   Machine dependant optimizations: Machine dependant optimizations are
    based on register allocation and utilization of special machine-instruction
    sequences
Common Types of Code Optimization Techniques
 (Principle Sources of Optimization (or) Structure-Preserving Transformation)
Local Optimization:
  1. Common sub-expression elimination
  2. Dead Code Elimination
  3. Compile Time Evaluation
III IT-I SEM                        Regulation: R20                      ATCD: UNIT-5
Loop Optimization:
  4. Code Movement
  5. Strength Reduction
  6. Induction Variable elimination
Common Sub-Expression Elimination:
   An occurrence of an expression E is called a common sub-expression if E
     was previously computed, and the values of variables in E have not
     changed since the previous computation.
   We can avoid re-computing the expression if we can use the previously
     computed value.
   In this technique,
          As the name suggests, it involves eliminating the common sub
            expressions.
          The redundant expressions are eliminated to avoid their re-
            computation.
          The already computed result is used in the further program when
            required.
   Example 1:
      a:= b+c
      b:=a-d
      c:=b+c
      d:=a-d
     in the above code a-d is common sub expression but b+c is not common
     sub expression
     The above code can be optimized using the common sub-expression
     elimination as
      a:= b+c
      b:=a-d
      c:=b+c
      d:=b
   Example 2:
           Code Before Optimization             Code After Optimization
               i=0;
               if (i == 1)
               {                             i=0;
               a=x+5;
               }
           Here, ‘if’ statement is dead code because this condition will never get
           satisfied
Example 1:
Circumference of Circle = (22/7) x Diameter
Here,
       This technique evaluates the expression 22/7 at compile time.
       The expression is then replaced with its result 3.14.
       This saves the time at run time.
Example 2:
   For example, a=3.14157/2 can be replaced by a=1.570 thereby eliminating a
  division operation.
ii) Constant Propagation:
In this technique,
     If some variable has been assigned some constant value, then it replaces
       that variable with its constant value in the further program during
       compilation.
     The condition is that the value of variable must not get alter in between.
Example:
radius = 10
Area = pi x radius x radius
Here,
     This technique substitutes the value of variables ‘pi’ and ‘radius’ at
       compile time.
     It then evaluates the expression 3.14 x 10 x 10.
     The expression is then replaced with its result 314.
     This saves the time at run time.
 Code Movement or Frequency Reduction
  (Moving loop-invariant computation)
     An important modification that decreases the amount of code in a loop is
      code motion. This transformation takes an expression that yields the same
      result independent of the number of times a loop is executed ( a loop-
      invariant computation) and places the expression before the loop.
     This technique reduces the execution frequency of expression by moving
       the code.
     In this technique,
          As the name suggests, it involves movement of the code.
          The code present inside the loop is moved out if it does not matter
             whether it is present inside or outside.
          Such a code unnecessarily gets execute again and again with each
             iteration of the loop.
          This leads to the wastage of time at run time.
III IT-I SEM                           Regulation: R20                          ATCD: UNIT-5
Example1:
               Code Before Optimization           Code After Optimization
Strength Reduction:
In this technique,
     As the name suggests, it involves reducing the strength of expressions.
     This technique replaces the expensive and costly operators with the
       simple and cheaper ones.
     For example, x*x cheaper to implement as x+ x. x² is invariably cheaper
       to implement as x*x than as a call to an exponentiation routine. Fixed-
       point multiplication or division by a power of two is cheaper to
       implement as a shift. Floating-point division by a constant can be
       implemented as multiplication by a constant, which may be cheaper.
Example:
               B=Ax2                              B=A+A
     Here,
      The expression “A x 2” is replaced with the expression “A + A”.
      This is because the cost of multiplication operator is higher than that
         of addition operator.
Induction variable Elimination:
    A basic induction variable is a variable X whose only definitions within
     the loop are assignments of the form: X = X+c or X = X-c, where c is
     either a constant or a loop-invariant variable. (e.g., i)
III IT-I SEM                       Regulation: R20                   ATCD: UNIT-5
                                                              s=3*i + 1;
           s=3*i + 1;                                         While(s<31)
           While(i<10)                                        {
           {                                                  a[s] = a[s]-2;
           a[s] = a[s]-2;                                     s=s+6;
           i= i+2;                                            }
           s=s+6;
           }
Here s and i are induction variables. We can eliminate i from the loop.
Code Generation
     The final phase in compiler model is the code generator.
     It is machine dependent phase.
     It takes as input an intermediate representation of the source program and
      produces as output an equivalent target program.
     It is the most complex phase of a compiler, since it depends not only on
      the characteristics of the source language but also on detailed information
      about the target machine architecture, the structure of the run time
      environment and the operating system running on the target machine.
     The position of code generator in compilation process is illustrated by
      following figure.
III IT-I SEM                   Regulation: R20                     ATCD: UNIT-5
5. Register allocation:
    Instructions involving register operands are shorter and faster than those
      involving operands in memory.
    The use of registers is subdivided into two sub problems :
       Register allocation – the set of variables that will reside in registers at
          a point in the program is selected.
       Register assignment – the specific register that a variable will reside in
          is picked.
    Certain machine requires even-odd register pairs for some operands and
      results. For example , consider the division instruction of the form :
                    D x, y
           Where, x – dividend even register in even/odd register pair
                   y – divisor
                  even register holds the remainder
                  odd register holds the quotient
6. Evaluation order:
    The order in which the computations are performed can affect the
      efficiency of the target code.
    Some computation orders require fewer registers to hold intermediate
       results than others.
III IT-I SEM                      Regulation: R20                        ATCD: UNIT-5
 Example 2:
1) i = 1
2) j = 1
3) t1 = 10 * i
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
                B5 for statement 12
                B6 for statement 13-17.
III IT-I SEM   Regulation: R20   ATCD: UNIT-5
III IT-I SEM   Regulation: R20   ATCD: UNIT-5
III IT-I SEM   Regulation: R20   ATCD: UNIT-5
III IT-I SEM                  Regulation: R20                    ATCD: UNIT-5
     Advantage
         Heavily used values reside in registers
   Disadvantage
      Does not consider non-uniform distribution of uses
Global register allocation
   Local allocation does not take into account that some instructions (e.g.
      those in loops) execute more frequently. It forces us to store/load at basic
      block endpoints since each block has no knowledge of the context of
      others.
   To find out the live range(s) of each variable and the area(s) where the
      variable is used/defined global allocation is needed. Cost of spilling will
      depend on frequencies and locations of uses.
   Register allocation depends on:
          Size of live range
          Number of uses/definitions
          Frequency of execution
          Number of loads/stores needed.
          Cost of loads/stores needed.
Register Allocation by Usage Counts
   A slightly more sophisticated method for global register allocation is
      called usage counts.
   In this method, registers are allocated first to the variables that are used
      the most.
   Example :
            Consider the following loop:
            LOOP: X = 2 * E
                     Z=Y+X+1
                      IF some condition THEN
                     Y=Z+Y
                      D=Y-1
                     ELSE Z = X - Y
                      D=2
                      ENDIF
                      X=Z+D
                      Z=X
              ENDLOOP
            Here, there are five references to X, and Z, five references
            to Y, three references to D, and one to E. Thus, if there are three
            registers, a reasonable approach would be to allocate X and Z to
            two of them, saving the third for local computations.
III IT-I SEM                         Regulation: R20                           ATCD: UNIT-5
Tutorial Questions:
    1. What is role of intermediate Code generator in compilation process? Explain Various
       Forms Of Intermediate Codes Used By Compiler.
    2. Explain various methods of implementing three address statements with suitable
       examples.
       (or) Explain about quadruples, triples and indirect triples of three-address statements
       of intermediate code
    3. Write quadruples, triples and indirect triples for the expression:
           -(a*b)+(c+d)-(a+b+c+d)
        (or) For the given expression generate different kinds of three-address
        codes -(a*b)+(c+d)-(a+b+c+d)
    4. Write the short note on: (i) Abstract syntax tree (ii) Polish notation
       (iii) Three address code
III IT-I SEM                        Regulation: R20                         ATCD: UNIT-5
    5. Construct abstract syntax tree & DAG for the assignment statement
             x:=a*b+c-a*b+d
    6. Write the quadruple, triple, indirect triples for the statement
             a:= b * -c + b * -c
    7. Translate the assignment A:= -B*(C+D) into following
           i) Quadruple ii) Triples iii) Indirect triples
    8. Translate the expression -(a+b)*(c+d)+(a+b+c) into the following
           i) Quadruples ii) Triples iii) Indirect triples
    9. Convert the following arithmetic expressions into Abstract syntax tree, DAG, postfix
        notation and three-address code:
                    i)      b*-(a+b)
                    ii)     a+b*(a+b)+c+d
    10. What is code optimization? Compare machine dependent and independent code
        optimization techniques.
    11. What is code optimization? Explain about various levels and types of optimizations
        (or) Explain different principle sources of optimization techniques with suitable
        examples
    12. Discuss about principal sources of optimization.
    13. Write short note on
             a. Constant Folding
             b. Dead Code Elimination
             c. Code Motion
             d. Induction Variable Elimination
    14. Discuss briefly various loop optimization techniques.
    15. Define flow graph. Explain the optimization of Basic Blocks.
    16. Write about all issues in code generation. Describe it.
        (or) Explain the different issues in the design of a code generator
    17. Explain the peephole optimization Techniques?
        (or) Discuss the transformations that are characteristic of peephole optimizations.
        (or) What kinds of peephole techniques can be used to perform machine-dependent
        optimizations?+
    18. What is a basic block and flow graph? Explain how flow graph can be constructed for
        a given program.
    19. What is peephole optimization? How can it be performed? Give its role in code
        generation.
    20. Discuss about register allocation and assignment in target code generation.