Unit 5 Compiler Design
Unit 5 Compiler Design
Machine-independent Optimizations:
Introduction, The Principal sources of optimization, Optimization of basic blocks, Loops in flow graphs,
Introduction to global data-flow analysis, Iterative solution of data-flow equations, Code- improving
transformations, Dealing with aliases, Data-flow analysis of structured flow graphs, Efficient data-flow
algorithms, A tool for data-flow analysis, Estimation of types, Symbolic debugging of optimized code.
                            MODULE-4- CODE OPTIMIZATION
INTRODUCTION
 The code produced by the straight forward compiling algorithms can often be made to run
   faster or take less space, or both. This improvement is achieved by program transformations
   that are traditionally called optimizations. Compilers that apply code-improving
   transformations are called optimizingcompilers.
 Machine independent optimizations are program transformations that improve the targetcode
   without taking into consideration any properties of the targetmachine.
 Machine dependant optimizations are based on register allocation and utilization of special
   machine-instructionsequences.
 Simply stated, the best program transformations are those that yield the most benefit for the
   leasteffort.
 The transformation must preserve the meaning of programs. That is, the optimization must
   not change the output produced by a program for a given input, or cause an error such as
   division by zero, that was not present in the original source program. At all times we take the
   “safe” approach of missing an opportunity to apply a transformation rather than risk
   changing what the programdoes.
 The transformation must be worth the effort. It does not make sense for a compiler writer to
   expend the intellectual effort to implement a code improving transformation and to have the
   compiler expend the additional time compiling source programs if this effort is not repaid
   when the target programs are executed. “Peephole” transformations of this kind are simple
   enough and beneficial enough to be included in anycompiler.
Organization for an Optimizing Compiler:
    Atransformationofaprogramiscalledlocalifitcanbeperformedbylookingonlyatthe
      statements in a basic block; otherwise, it is calledglobal.
    Many transformations can be performed at both the local and global levels. Local
      transformations are usually performedfirst.
Function-Preserving Transformations
    There are a number of ways in which a compiler can improve a program without
       changing the function itcomputes.
    Thetransformations
      The above code can be optimized using the common sub-expression elimination as
             t1: = 4*i
             t2: = a [t1]
             t3: = 4*j
             t5: = n
             t6: = b [t1] +t5
      The common sub expression t4: =4*i is eliminated as its computation is alre ady in t1. And
      value of i is not been changed from definition to use.
 CopyPropagation:
    Assignments of the form f : = g called copy statements, or copies for short. The idea
       behind the copy-propagation transformation is to use g for f, whenever possible after the
       copy statement f: = g. Copy propagation means use of one variable instead of another.
       This may not appear to be an improvement, but as we shall see it gives us an opportunity
       to eliminatex.
    Forexample:
      x=Pi;
      ……
      A=x*r*r;
A=Pi*r*r;
 Dead-CodeEliminations:
    Avariableisliveatapointinaprogramifitsvaluecanbeusedsubsequently;otherwise,
      itisdeadatthatpoint.Arelatedideaisdeadoruselesscode,statementsthatcompute
      values that never get used. While the programmer is unlikely to introduce any dead code
      intentionally, it may appear as the result of previous transformations. An optimization can
      be done by eliminating dead code.
      Example:
      i=0;
      if(i=1)
      {
       a=b+5;
      }
Here, ‘if’ statement is dead code because this condition will never get satisfied.
 Constantfolding:
    We can eliminate both the test and printing from the object code. More generally,
      deducing at compile time that the value of an expression is a constant and using the
      constant instead is known as constantfolding.
    One advantage of copy propagation is that it often turns the copy statement into dead
      code.
    Forexample,
       a=3.14157/2 can be replaced by
       a=1.570 there by eliminating a division operation.
 LoopOptimizations:
    We now give a brief introduction to a very important place for optimizations, namely
       loops, especially the inner loops where programs tend to spend the bulk of their time. The
       running time of a program may be improved if we decrease the number of instructions in
       an inner loop, even if we increase the amount of code outside thatloop.
    Three techniques are important for loopoptimization:
 CodeMotion:
    An important modification that decreases the amount of code in a loop is code motion.
      This transformation takes an expression that yields the same result independent of the
      number of times a loop is executed ( a loop-invariant computation) and places the
      expression before the loop. Note that the notion “before the loop” assumes the existence
      of an entry for the loop. For example, evaluation of limit-2 is a loop-invariant
      computation in the followingwhile-statement:
 Induction Variables:
       Loops are usually processed inside out. For example consider the loop aroundB3.
       Note that the values of j and t 4remain in lock-step; every time the value of j decreases by 1,
         that of t4decreases by 4 because 4*j is assigned to t4. Such identifiers are called
         inductionvariables.
       When there are two or more induction variables in a loop, it may be possible to get rid of
         all but one, by the process of induction-variable elimination. For the inner loop around
         B3 in Fig. we cannot get rid of either j or t4completely; t4is used in B3 and j in B4.
         However, we can illustrate reduction in strength and illustrate a part of the process of
         induction-variableelimination.EventuallyjwillbeeliminatedwhentheouterloopofB2
         - B5 is considered.
          Example:
          As the relationship t4:=4*j surely holds after such an assignment to t4in Fig. and t4is not
          changed elsewhere in the inner loop around B3, it follows that just after the statement
          j:=j-1 the relationship t4:= 4*j-4 must hold. We may therefore replace the assignment t4:=
          4*j by t4:= t4-4. The only problem is that t4does not have a value when we enterblock B3
          forthefirsttime.Sincewemustmaintaintherelationshipt4=4*jonentrytotheblockB3, we place
          an initializations of t4at the end of the block where j itselfis
before after
 Reduction In Strength:
      Structure-PreservingTransformations
      AlgebraicTransformations
Structure-Preserving Transformations:
         Common sub-expressionelimination
         Dead codeelimination
         Renaming of temporaryvariables
         Interchange of two independent adjacentstatements.
 Common sub-expressionelimination:
  Common sub expressions need not be computed over and over again. Instead they can be
  computed once and kept in store from where it’s referenced when encountered aga in – of course
  providing the variable values in the expression still remain constant.
Example:
         a: =b+c
         b: =a-d
         c: =b+c
         d: =a-d
The 2ndand 4thstatements compute the same expression: b+c and a-d
        a: = b+c
        b: = a-d
        c: =a
        d: =b
Dead codeelimination:
          It’s possible that a large amount of dead (useless) code may exist in the program. This
  might be especially caused when introducing variables and procedures as part of construction or
  error-correction of a program – once declared and defined, one forgets to remove them in case
they serve no purpose. Eliminating these will definitely optimize the code.
 Twostatements
t1:=b+c
t2:=x+y
      can be interchanged or reordered in its computation in the basic block when value of t 1
      does not affect the value of t2.
Algebraic Transformations:
      a :=b+c
      e :=c+d+b
      a :=b+c
      t :=c+d
      e :=t+b
 Example:
Dominators:
       In a flow graph, a node d dominates node n, if every path from initial node of the flow
graph to n goes through d. This will be denoted byd dom n. Every initial node dominates all the
remaining nodes in the flow graph and the entry of a loop dominates all nodes in the loop.
Similarly every node dominates itself.
Example:
D(1)={1}
D(2)={1,2}
D(3)={1,3}
D(4)={1,3,4}
D(5)={1,3,4,5}
D(6)={1,3,4,6}
D(7)={1,3,4,7}
D(8)={1,3,4,7,8}
D(9)={1,3,4,7,8,9}
D(10)={1,3,4,7,8,10}
  Natural Loop:
   Oneapplicationof dominatorinformationisindeterminingtheloopsofaflowgraphsuitable
     forimprovement.
      A loop must have a single entry point, called the header. This entry point-dominates all
        nodes in the loop, or it would not be the sole entry to theloop.
      There must be at least one way to iterate the loop(i.e.)at least one path back to theheader.
   One way to find all the loops in a flow graph is to search for edges in the flow graph whose
     heads dominate their tails. If a→b is an edge, b is the head and a is the tail. These types of
     edges are called as backedges.
 Example:
7→4 4 DOM7
10→7 7 DOM10
4→3
8→3
9→1
Output:The set loop consisting of all nodes in the natural loop n→d.
  Method:Beginning with node n, we consider each node m*d that we know is in loop, to make
  sure that m’s predecessors are also placed in loop. Each node in loop, except for d, is placed once
  on stack, so its predecessors will be examined. Note that because d is put in the loop initially, we
  never examine its predecessors, and thus find only those nodes that reach n without going
  throughd.
  Procedureinsert(m);
  ifm is not inloopthen begin
        loop:=loopU {m};
        pushmontostack
  end;
  stack: = empty;
loop: = {d};
  insert(n);
whilestackis not emptydo begin
      popm, the first element ofstack, offstack;
      foreach predecessorpofmdoinsert(p)
end
Inner loop:
    If we use the natural loops as “the loops”, then we have the useful property that unless
      two loops have the same header, they are either disjointed or one is entirely contained in
      the other. Thus, neglecting loops with the same header for the moment, we have a natural
      notion of inner loop: one that contains no otherloop.
    When two natural loops have the same header, but neither is nested within the other, they
      are combined and treated as a singleloop.
Pre-Headers:
    The pre-header has only the header as successor, and all edges which formerly entered
       the header of L from outside L instead enter thepre-header.
 Initially the pre-header is empty, but transformations on L may place statements init.
        header
                                                     pre-header
           loop L
he der
loop L
    Reducible flow graphs are special flow graphs, for which several code optimization
       transformations are especially easy to perform, loops are unambiguously defined,
       dominators can be easily calculated, data flow analysis problems can also be solved
       efficiently.
 Definition:
     A flow graph G is reducible if and only if we can partition the edges into two disjoint
     groups,forwardedges andbackedges, with the following properties.
  The forward edges from an acyclic graph in which every node can be reached from initial
     node ofG.
 The back edges consist only of edges where heads dominate theirstails.
  If we know the relation DOM for a flow graph, we can find and remove all the back
     edges.
 If the forward edges form an acyclic graph, then we can say the flow graphreducible.
  In the above example remove the five back edges 4→3, 7→4, 8→3, 9→1 and 10→7
     whose heads dominate their tails, the remaining graph isacyclic.
  The key property of reducible flow graphs for loop analysis is that in such flow graphs
     everysetofnodesthatwewouldinformallyregardasaloopmustcontainaback edge.
PEEPHOLE OPTIMIZATION
         Redundant-instructionselimination
         Flow-of-controloptimizations
         Algebraicsimplifications
         Use of machineidioms
         UnreachableCode
Redundant Loads And Stores:
(1) MOVR0,a
(2) MOVa,R0
we can delete instructions (2) because whenever (2) is executed. (1) will ensure that the value of
ais already in register R 0.If (2) had a label we could not be sure that (1) was always executed
immediately before (2) and so we could not remove (2).
Unreachable Code:
 Anotheropportunityforpeepholeoptimizationsistheremovalofunreachableinstructions.
     An unlabeled instruction immediately following an unconditional jump may be removed.
     This operation can be repeated to eliminate a sequence of instructions. For example, for
     debugging purposes, a large program may have within it certain segments that are executed
     only if a variabledebugis 1. In C, the source code might look like:
#define debug 0
….
If ( debug ) {
If debug =1 goto L2
goto L2
L2:.........................................................................................(a)
 One obvious peephole optimization is to eliminate jumps over jumps .Thus no matter what
   the value ofdebug; (a) can be replacedby:
If debug≠1 goto L2
L2:...........................................................................................(b)
L2:..........................................................................................(c)
   As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by
      goto L2. Then all the statement that print debugging aids are manifestly unreachable and
      can be eliminated one at atime.
Flows-Of-Control Optimizations:
   The unnecessary jumps can be eliminated in either the intermediate code or the target code
      by the following types of peephole optimizations. We can replace the jumpsequence
gotoL1
….
L1: gotoL2
by the sequence
goto L2
….
L1: goto L2
   If there are now no jumps to L1, then it may be possible to eliminate the statement L1:goto
      L2 provided it is preceded by an unconditional jump .Similarly, thesequence
if a < b goto L1
….
L1: goto L2
can be replaced by
If a < b goto L2
….
L1: goto L2
   Finally,supposethereisonlyonejumptoL1andL1isprecededbyanunconditionalgoto.
      Then the sequence
goto L1
             ……..
L1: if a < b goto L2
             L3:..........................................................................(1)
 May be replacedby
If a < b goto L2
goto L3
…….
L3:............................................................................(2)
     While the number of instructions in (1) and (2) is the same, we sometimes skip the
       unconditional jump in (2), but never in (1).Thus (2) is superior to (1) in executiontime
Algebraic Simplification:
     There is no end to the amount of algebraic simplification that can be attempted through
       peephole optimization. Only a few algebraic identities occur frequently enough that it is
       worth considering implementing them .For example, statements suchas
x := x+0
Or
x := x * 1
Reduction in Strength:
     Reduction in strength replaces expensive operations by equivalent cheaper ones on the target
        machine. Certain machine instructions are considerably cheaper than others and can often be
        used as special cases of more expensiveoperators.
     For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation
        routine. Fixed-point multiplication or division by a power of two is cheaper to implement as
        a shift. Floating-point division by a constant can be implemented as multiplication by a
        constant, which may becheaper.
X2→X*X
     The target machine may have hardware instructions to implement certain specific operations
       efficiently. For example, some machines have auto-increment and auto-decrement addressing
       modes. These add or subtract one from an operand before or after using itsvalue.
    The use of these modes greatly improves the quality of code when pushing or popping a
       stack,asinparameterpassing.Thesemodescanalsobeusedincodeforstatementslikei:
       =i+1.
i:=i+1→i++ i:=i-1→i-
-
INTRODUCTION TO GLOBAL DATAFLOW ANALYSIS
    In order to do code optimization and a good job of code generation , compiler needs to
      collectinformationabouttheprogramasawholeandtodistributethisinformationto each
      block in the flowgraph.
    Data-flowinformationcanbecollectedbysettingupandsolvingsystemsofequationsof the
      form:
      This equation can be read as “ the information at the end of a statement is either generated
      within the statement , or enters at the beginning and is not killed as control flows through
      the statement.”
 The details of how data-flow equations are set and solved depend on threefactors.
    The notions of generating and killing depend on the desired information, i.e., on the data
      flowanalysisproblemtobesolved.Moreover,forsomeproblems,insteadofproceeding along
      with flow of control and defining out[s] in terms of in[s], we need to proceed backwards
      and define in[s] in terms ofout[s].
    Since data flows along control paths, data-flow analysis is affected by the constructs ina
      program. In fact, when we write out[s] we implicitly assume that there is unique end
      pointwherecontrolleavesthestatement;ingeneral,equationsaresetupatthelevelof basic
      blocks rather than statements, because blocks do have unique endpoints.
    Therearesubtletiesthatgoalongwithsuchstatementsasprocedurecalls,assignments
      through pointer variables, and even assignments to arrayvariables.
    Withinabasicblock,wetalkofthepointbetweentwoadjacentstatements,aswellasthe point
      before the first statement and after the last. Thus, block B1 has four points: one before
      any of the assignments and one after each of the threeassignments.
                                                                 B1
                    d1 : i :=m-1
                    d2: j :=n d3=u1
                                                           B2
                               d4 : I := i+1
                                                           B3
                               d5: j := j-1
B4
               B5                                                              B6
               d6 :a :=u2
 Nowletustakeaglobal viewandconsiderallthepointsinalltheblocks.Apathfromp 1
to pnis a sequence of points p1, p2,….,pnsuch that for each i between 1 and n-1, either
       P iis the point immediately preceding a statement and pi+1is the point immediately
          following that statement in the same block,or
 P iis the end of some block and pi+1is the beginning of a successorblock.
Reaching definitions:
       These statements certainly define a value for x, and they are referred to asunambiguous
          definitionsofx.Therearecertainkindsofstatementsthatmaydefineavalueforx;they are
          calledambiguousdefinitions. The most usual forms ofambiguousdefinitions of x are:
       An assignment through a pointer that could refer to x. For example, the assignment *q:=
          y is a definition of x if it is possible that q points to x. we must assume that an assignment
          through a pointer is a definition of every variable.
       We say a definition d reaches a point p if there is a path from the point immediately
          following d to p, such that d is not “killed” along that path. Thus a point can bereached
      by an unambiguous definition and an ambiguous definition of the same variable
      appearing later along one path.
   Flow graphs for control flow constructs such as do-while statements have a useful
      property:thereisasinglebeginningpointatwhichcontrolentersandasingleendpoint
      thatcontrolleavesfromwhenexecutionofthestatementisover.Weexploitthisproperty when
      we talk of the definitions reaching the beginning and the end of statements with the
      followingsyntax.
E id + id|id
   Expressionsinthislanguagearesimilartothoseintheintermediatecode,buttheflow graphs
      for statements have restrictedforms.
                                                                                               S1
              S1                        If E goto s1
              S2
                                          S1                    S2                      If E goto s1
S1 ; S2
   We define a portion of a flow graph called aregionto be a set of nodes N that includes a
      header, which dominates all other nodes in the region. All edges between nodes in N are
      in the region, except for some that enter theheader.
   The portion of flow graph corresponding to a statement S is a region that obeys the
      further restriction that control can flow to just one outside block when it leaves the
      region.
 We say that the beginning points of the dummy blocks at the entry and exit of a
    statement’s region are the beginning and end points, respectively, of the statement. The
    equations are inductive, or syntax-directed, definition of the sets in[S], out[S], gen[S],
    and kill[S] for all statementsS.
 gen[S] is the set of definitions “generated” by S while kill[S] is the set of definitions
    that never reach the end ofS.
 Consider the following data-flow equations for reaching definitions:
i)
                                              d:a:=b+c
                          S
                                 gen [S] = { d }
                                 kill [S] = Da– { d }
                                 out [S] = gen [S] U ( in[S] – kill[S] )
 Observe the rules for a single assignment of variable a. Surely that assignment is a
       definition of a, say d.Thus
Gen[S]={d}
Kill[S] = Da–{d}
Where, Dais the set of all definitions in the program for variable a.
ii )
S S1
S2
            gen[S]=gen[S2] U (gen[S1]-kill[S2])
            Kill[S] = kill[S2] U (kill[S1] – gen[S2])
            in [S1] = in [S]
            in [S2] = out [S1]
            out [S] = out[S2]
      Under what circumstances is definition d generated by S=S 1; S2? First of all, if it is
         generated by S2, then it is surely generated by S. if d is generated by S1, it will reach the
         end of S provided it is not killed by S2. Thus, wewrite
gen[S]=gen[S2] U (gen[S1]-kill[S2])
      There is a subtle miscalculation in the rules for gen and kill. We have made the
         assumption that the conditional expression E in the if and do statements are
         “uninterpreted”;thatis,thereexistsinputstotheprogramthatmaketheirbranchesgo
         eitherway.
      Weassumethatanygraph-theoreticpathintheflowgraphisalsoanexecutionpath,i.e., a path
         that is executed when the program is run with least one possibleinput.
      When we compare the computed gen with the “true” gen we discover that the true gen is
         alwaysasubsetofthecomputedgen.ontheotherhand,thetruekillisalwaysasuperset of the
         computedkill.
      These containments hold even after we consider the other rules. It is natural to wonder
         whether these differences between the true and computed gen and kill sets present a
         seriousobstacletodata-flowanalysis.Theanswerliesintheuseintendedforthesedata.
      Overestimating the set of definitions reaching a point does not seem serious; it merely
         stopsusfromdoinganoptimizationthatwecouldlegitimatelydo.Ontheotherhand,
         underestimatingthesetofdefinitionsisafatalerror;itcouldleadusintomakinga
         change in the program that changes what the program computes. For the case of reaching
         definitions, then, we call a set of definitions safe or conservative if the estimate is a
         superset of the true set of reaching definitions. We call the estimate unsafe, if it is not
         necessarily a superset of the truth.
      Returning now to the implications of safety on the estimation of gen and kill for reaching
         definitions, note that our discrepancies, supersets for gen and subsets for kill are both in
         thesafedirection.Intuitively,increasinggenaddstothesetofdefinitionsthatcanreacha point,
         and cannot prevent a definition from reaching a place that it truly reached. Decreasing
         kill can only increase the set of definitions reaching any givenpoint.
      However, there are other kinds of data-flow information, such as the reaching-definitions
         problem. It turns out that in is an inherited attribute, and out is a synthesized attribute
         depending on in. we intend that in[S] be the set of definitions reaching the beginning of
         S, taking into account the flow of control throughout the entire program, including
         statements outside of S or within which S isnested.
      Thesetout[S]isdefinedsimilarlyfortheendofs. itisimportanttonotethedistinction between
        out[S] and gen[S]. The latter is the set of definitions that reach the end of S without
        following paths outsideS.
      Consideringif-statementwehaveconservativelyassumedthatcontrolcanfolloweither
        branch, a definition reaches the beginning of S1or S2exactly when it reaches the
        beginning ofS.
      IfadefinitionreachestheendofSifandonlyifitreachestheendofoneor bothsub
        statements;i.e,
Out[S]=out[S1] U out[S2]
Representation of sets:
      Sets of definitions, such as gen[S] and kill[S], can be represented compactly using bit
        vectors.Weassignanumbertoeachdefinitionofinterestintheflowgraph.Thenbit vector
        representing a set of definitions will have 1 in position I if and only if the definition
        numbered I is in theset.
      The number of definition statement can be taken as the index of statement in an array
        holding pointers to statements. However, not all definitions may be of interest during
        globaldata-flowanalysis.Thereforethenumberofdefinitionsofinterestwilltypicallybe
        recorded in a separatetable.
      A bit vector representation for sets also allows set operations to be implemented
        efficiently.Theunion andintersectionoftwosets canbeimplementedbylogicalorand
        logical and, respectively, basic operations in most systems-orientedprogramming
languages.ThedifferenceA-BofsetsAandBcanbeimplementedbytakingthe complement of B
and then using logical and tocomputeA                              .
      Space for data-flow information can be traded for time, by saving information only at
        certain points and, as needed, recomputing information at intervening points. Basic
        blocksareusuallytreatedasaunitduringglobalflowanalysis,withattentionrestrictedto only
        those points that are the beginnings ofblocks.
      Since there are usually many more points than blocks, restricting our effort to blocks is a
        significant savings. When needed, the reaching definitions for all points in a block canbe
        calculated from the reaching definitions for the beginning of ablock.
Use-definition chains:
Evaluation order:
    The techniques for conserving space during attribute evaluation, also apply to the
       computation of data-flow information using specifications. Specifically, the only
       constraint on the evaluation order for the gen, kill, in and out sets for statements is that
       imposedbydependenciesbetweenthesesets.Havingchosenanevaluationorder,weare free to
       release the space for a set after all uses of it haveoccurred.
    Earliercirculardependenciesbetweenattributeswerenotallowed,butwehaveseenthat data-
       flow equations may have circulardependencies.
    Data-flow analysis must take all control paths into account. If the control paths are
       evident from the syntax, then data-flow equations can be set up and solved in asyntax-
       directedmanner.
    When programs can contain goto statements or even the more disciplined break and
       continuestatements,theapproachwehavetakenmustbemodifiedtotaketheactual control
       paths intoaccount.
    Several approaches may be taken. The iterative method works arbitrary flow graphs.
       Sincetheflowgraphsobtainedinthepresenceofbreakandcontinuestatementsare
       reducible, such constraints can be handled systematically using the interval-based
       methodsHowever,thesyntax-
       directedapproachneednotbeabandonedwhenbreakandcontinue statements
       areallowed.
 Global transformations are not substitute for local transformations; both must beperformed.
    The available expressions data-flow problem discussed in the last section allows us to
   determine if an expression at point p in a flow graph is a common sub-expression. The
   followingalgorithmformalizestheintuitiveideaspresentedforeliminatingcommonsub-
   expressions.
  METHOD: For every statement s of the form x := y+z 6such that y+z is available at the
  beginning of block and neither y nor r z is defined prior to statement s in that block,
  do the following.
        Todiscovertheevaluationsofy+zthatreachs’sblock,wefollow flowgraph
           edges,searchingbackwardfroms’sblock.However,wedonotgothrough any
           block that evaluates y+z. The last evaluation of y+z in each block
           encountered is an evaluation of y+z that reachess.
 The search in step(1) of the algorithm for the evaluations of y+z that reach statements
    can also be formulated as a data-flow analysis problem. However, it does not make sense
    to solve it for all expressions y+z and all statements or blocks because too much
    irrelevant information is gathered.
       Notallchangesmadebyalgorithmareimprovements.Wemight wishtolimitthe
           number of different evaluations reaching s found in step (1), probably toone.
 Algorithm will miss the fact that a*z and c*z must have the same valuein
a:=x+y c:=x+y
vs
b:=a*z d:=c*z
       Becausethissimpleapproachtocommonsubexpressionsconsidersonlytheliteral
           expressions themselves, rather than the values computed byexpressions.
Copy propagation:
       Various algorithms introduce copy statements such as x :=copies may also be generated
           directly by the intermediate code generator, although most of these involve temporaries
           localtooneblockand canberemovedbythedagconstruction.Wemaysubstitute yforx in all
           these places, provided the following conditions are met every such use u ofx.
       Oneverypathfromstoincludingpathsthatgothroughuseveraltimes,thereareno
           assignments toy.
       Condition(1)canbecheckedusingud-changinginformation.Weshallsetupanewdata- flow
           analysis problem in which in[B] is the set of copies s: x:=y such that every path from
           initial node to the beginning of B contains the statement s, and subsequent to the last
           occurrence of s, there are no assignments toy.
 ALGORITHM: Copypropagation.
           INPUT: a flow graph G, with ud-chains giving the definitions reaching block B, and
           with c_in[B] representing the solution to equations that is the set of copies x:=y that
           reach block B along every path, with no assignment to x or y following the last
           occurrence of x:=y on the path. We also need ud-chains giving the uses of each
           definition.
 Determine those uses of x that are reached by this definition of namely, s: x:=y.
       Determine whether for every use of x found in (1) , s is in c_in[B], where B is the
           blockofthisparticularuse,andmoreover,nodefinitionsofxoryoccurpriortothis use of x
           within B. Recall that if s is in c_in[B]then s is the only definition of x that reachesB.
If s meets the conditions of (2), then remove s and replace all uses of x found in(1) byy.
          INPUT: A loop L consisting of a set of basic blocks, each block containing sequence
          of three-address statements. We assume ud-chains are available for the individual
          statements.
          OUTPUT: the set of three-address statements that compute the same value each time
          executed, from the time control enters the loop L until control next leaves L.
       Mark “invariant” those statements whose operands are all either constant orhave
           all their reaching definitions outsideL.
 Repeat step (3) until at some repetition no new statements are marked“invariant”.
       Mark “invariant” all those statements not previously so marked all of whose
           operandseitherareconstant,havealltheirreachingdefinitionsoutsideL,orhave
           exactly one reaching definition, and that definition is a statement in L marked
           invariant.
       Having found the invariant statements within a loop, we can apply to some of them an
           optimizationknownascodemotion,inwhichthestatementsaremovedtopre-headerof the
           loop. The following three conditions ensure that code motion does not change what the
           program computes. Consider s: x:=y+z.
               ALGORITHM: Codemotion.
      INPUT: A loop L with ud-chaining information and dominator information.
      OUTPUT: A revised version of the loop with a pre-header and some statements
      moved to the pre-header.
METHOD:
iii) ThatallusesinLofxcanonlybereachedbythedefinitionofxinstatement s.
   To understand why no change to what the program computes can occur, condition(2i)
      and (2ii) of this algorithm assure that the value of x computed at s must be the value of x
      after any exit block of L. When we move s to a pre-header, s will still be the definition of
      x that reaches the end of any exit block of L. Condition (2iii) assures that any uses of x
      within L did, and will continue to, use the value of x computed by s.
   The condition (1) can be relaxed if we are willing to take the risk that we may actually
      increase the running time of the program a bit; of course, we never change what the
      programcomputes.Therelaxedversionofcodemotioncondition(1)isthatwemay move a
      statement s assigning x onlyif:
      1’. The block containing s either dominates all exists of the loop, or x is not used outside
          the loop. For example, if x is a temporary variable, we can be sure that the value will
          be used only in its own block.
   If code motion algorithm is modified to use condition (1’), occasionally the running time
      will increase, but we can expect to do reasonably well on the average. The modified
      algorithmmaymovetopre-headercertaincomputationsthatmaynotbeexecutedinthe
      loop. Not only does this risk slowing down the program significantly, it may also cause
      an error in certain circumstances.
   Even if none of the conditions of (2i), (2ii), (2iii) of code motion algorithm are met by an
      assignment x: =y+z, we can still take the computation y+z outside a loop. Create a new
      temporaryt,andsett:=y+zinthepre-header.Thenreplacex:=y+z byx:=tintheloop. In many
      cases we can propagate out the copy statement x: =t.
   Thetransformationsofcodemotionalgorithmdonotchangeud-chaininginformation,
      sincebycondition(2i),(2ii),and(2iii),allusesofthevariableassignedbyamoved
      statement s that were reached by s are still reached by s from its newposition.
   Definitions of variables used by s are either outside L, in which case they reach thepre-
      header, or they are inside L, in which case by step (3) they were moved topre-header
      ahead ofs.
   Weputthepointeroneachud-chaincontainings.Then,nomatterwherewemoves,we                    have
      only to change ps, regardless of how many ud-chains s is on.
   The dominator information is changed slightly by code motion. The pre-header isnow
      the immediate dominator of the header, and the immediate dominator of the pre-header is
      the node that formerly was the immediate dominator of the header. That is, thepre-header
      is inserted into the dominator tree as the parent of theheader.
    However,ourmethodsdealwithvariablesthatareincrementedordecrementedzero,one, two,
      or more times as we go around a loop. The number of changes to an induction variable
      may even differ at differentiterations.
    A common situation is one in which an induction variable, say i, indexes an array, and
      someotherinductionvariable,sayt,whosevalueisalinearfunctionofi,istheactual offset
      used to access the array. Often, the only use made of i is in the test for loop
      termination. We can then get rid of i by replacing its test by one ont.
    We shall look for basic induction variables, which are those variables i whoseonly
      assignments within loop L are of the form i := i+c or i-c, where c is aconstant.
METHOD:
  Consider each basic induction variable i whose only uses are to compute other
    induction variables in its family and in conditional branches. Take some j in i’s
    family, preferably one such that c and d in its triple are as simple as possibleand
    modify each test that i appears in to use j instead. We assume in the followingtat
    c is positive. A test of the form ‘if i relop x goto B’, where x is not an induction
    variable, is replacedby
r:=c*x /* r := x if c is 1.*/
r:=r+d /* omit if d is 0 */
if j relop r gotoB