Data Flow Analysis
Why:
       CS591-5 - Selected topics in Compiler Design                                         Provide information about a program manipulates its data.
                                    Data Flow Analysis
                                                                                            Study functions behavior.
                                                                                            To help build control flow information.
                                   V. Krishna Nandivada                                     Program understanding (a function sorts an array!).
                                                                                            Generating a model of the original program and verify the model.
                                          IIT Madras
                                                                                            The DFA should give information about that program that does not
                                                                                            misrepresent what the procedure being analyzed does.
                                                                                            Program validation.
                                                                                      V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                    2/1
Reaching Definitions                                                                 Different types of analysis
A particular definition of a variable is said to reach a given point if
      there is an execution path from the definition to that point
      the variable might may have the value assigned by the definition.
In general undecidable.                                                                     Intra procedural analysis.
                                                                                            Whole program (inter-procedural) analysis.
Our goal:
                                                                                            Generate intra procedural analysis and extend it to whole
      The analysis must be conservative – the analysis should not tell                      program.
      us that a particular definition does not reach a particular use, if it
      may reach.                                                                     We will study an iterative mechanism to perform such analyses.
      A ‘may’ conservative analysis gives us a larger set of reaching
      definitions than it might, if it could produce the minimal result.
To make maximum benefit from our analysis, we want the analysis to
be conservative, but be as aggressive as possible.
                                                                           *                                                                             *
V.Krishna Nandivada (IIT Madras)        CS591-5 - May 2018                     3/1    V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                    4/1
Iterative Dataflow Analysis                                                         Example program
                                                                                    1 int g(int m, inti);
       Build a collection of data flow equations – specifying which data
                                                                                    2 int f(int n)
       may flow to which variable.                                                  3 {
       Solve it iteratively.                                                        4    int i = 0, j = 1;
       Start from a conservative set of initial values – and continuously           5    if (n == 1) i = 2;
                                                                                    6    while (n > 0) {
       improve the precision.
                                                                                    7        j = i + 1;
       Disadvantage: We may be handling large data sets.                            8        n = g (n, i);
       Start from an aggressive set of initial values – and continuously            9    }
       improve the precision.                                                       10   return j;
       Advantage: Datasets are small to start with.                                 11 }
       Choice – depends on the problem at hand.
                                                                                           Does def of i in line 4 reach the uses in line 7 and 8?
                                                                                           Does def of j in line 7 reach the use in line 10?
                                                                          *                                                                                          *
 V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                        5/1    V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                               6/1
Definitions                                                                         Representation and Initialization
                                                                                                                                    Bit Pos   Definition        Basic Block
                                                                                                                                    1         m in node 1       B1
       GEN : GEN(b) returns the set of definitions generated in the basic                                                           2         f0 in node 2
       block b; assigned values in the block and not subsequently killed                                                            3         f1 in node 3
       in it.                                                                                                                       4         i in node 5       B3
       KILL : KILL(b) returns the set of definitions killed in the basic block                                                      5         f2 in node 8      B6
       b.                                                                                                                           6         f0 in node 9
                                                                                                                                    7         f1 in node 10
       IN : IN(b) returns the set of definitions reaching the basic block b.                                                        8         i in node 11
       OUT : OUT(b) returns the set of definitions going out of basic                                                                          Set rep          Bit vector
       block b.                                                                                                                     GEN(B1)    = {1, 2, 3}      h11100000i
       PRSV : Negation of KILL                                                                                                      GEN(B3)    = {4}            h00010000i
                                                                                                                                    GEN(B6)    = {5, 6, 7, 8}   h00001111i
                                                                                                                                    GEN(.)     = {}             h00000000i
                                                                          *                                                                                          *
 V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                        7/1    V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                               8/1
Populating PRSV, OUT and IN                                                                    Dataflow equations
                                                                                               A definition may reach the end of a basic block i:
                                                                                                                           OUT(i) = GEN(i) ∪ (IN(i) ∩ PRSV(i))
                                                                                               or with bit vectors:
                      Set rep                           Bit vector
 PRSV(B1)             = {4, 5, 8}                       h00011001i                                                         OUT(i) = GEN(i) ∨ (IN(i) ∧ PRSV(i))
 PRSV(B3)             = {1, 2, 3, 5, 6, 7}              h11101110i
                                                                                               A definition may reach the beginning of a basicblock i:
 PRSV(B6)             = {1}                             h10000000i
 PRSV(.)              = {1, 2, 3, 4, 5, 6, 7, 8}        h11111111i
                                                                                                                                              [
                                                                                                                                  IN(i) =               OUT(j)
                                                                                                                                            j∈Pred(i)
                                                                                                     GEN, PRSV and OUT are created in each basic block.
                                                                                                     OUT(i) = {} // initialization
                                                                                                     But IN needs to be initialized to something safe.
                                                                                  *                  IN(entry) = {}                                                *
 V.Krishna Nandivada (IIT Madras)         CS591-5 - May 2018                           9/1     V.Krishna Nandivada (IIT Madras)        CS591-5 - May 2018              10 / 1
Solving the Dataflow equations: example                                                        Dataflow equations: behavior
Itr 1:
                    OUT(entry)      = h00000000i       IN(entry)   = h00000000i
                    OUT(B1)         = h11100000i       IN(B1)      = h00000000i
                                                                                                     We specify the relationship between the data-flow values before
                    OUT(B2)         = h11100000i       IN(B2)      = h11100000i
                    OUT(B3)         = h11110000i       IN(B3)      = h11100000i
                                                                                                     and after a block – transfer or flow equations.
                    OUT(B4)         = h11110000i       IN(B4)      = h11110000i                             Forward: OUT(s) = f (IN(s), · · · )
                    OUT(B5)         = h11110000i       IN(B5)      = h11110000i                             Backward: IN(s) = f (OUT(s), · · · )
                    OUT(B6)         = h00001111i       IN(B6)      = h11110000i                      The rules never change a 1 to 0. They may only change a 0 to a 1.
                    OUT(entry)      = h11110000i       IN(exit)    = h11110000i
Itr 2:
                                                                                                     They are monotone.
                    OUT(entry)      = h00000000i       IN(entry)   = h00000000i                      Implication – the iteration process will terminate.
                    OUT(B1)         = h11100000i       IN(B1)      = h00000000i
                    OUT(B2)         = h11100000i       IN(B2)      = h11100000i                      Q: What good is reaching definitions? undefined variables.
                    OUT(B3)         = h11110000i       IN(B3)      = h11100000i                      Q: Why do the iterations produce an acceptable solution to the set
                    OUT(B4)         = h11111111i       IN(B4)      = h11111111i
                    OUT(B5)         = h11111111i       IN(B5)      = h11111111i
                                                                                                     of equations? – lattices and fixed points.
                    OUT(B6)         = h10001111i       IN(B6)      = h11111111i
                    OUT(entry)      = h11111111i       IN(exit)    = h11111111i
                                                                                  *                                                                                *
 V.Krishna Nandivada (IIT Madras)         CS591-5 - May 2018                          11 / 1   V.Krishna Nandivada (IIT Madras)        CS591-5 - May 2018              12 / 1
Lattice                                                                           Lattice definition
                                                                                  A lattice L consists of a set of values, and two operations called meet
                                                                                  (t) and join (u). Satisfies properties:
       What : Lattice is an algebraic structure                                          closure: For all x, y ∈ L, ∃ a unique z and w ∈ L, such that x u y = z
       Why : To represent abstract properties of variables, expressions,                 and x t y = w – each pair of elements have a unique lub and glb.
       functions, etc etc.                                                               commutative: For all x, y ∈ L, x u y = y u x, and x t y = y t x.
              Values
                                                                                         associative: For all x, y, z ∈ L, (x u y) u z = x u (y u z), and
              Attributes
              ...                                                                        (x t y) t z = x t (y t z)
       Why “abstract? Exact interpretation (execution) gives exact                       There exists two special elements of L called bottom (⊥), and top
       values, abstract interpretation gives abstract values.                            (>).
                                                                                         ∀x ∈ L, x u ⊥ = ⊥ and x t > = >.
                                                                                         distributive : (optional). ∀x, y, z ∈ L, x t (y u z) = (x t y) u (x t z), and
                                                                                         x u (y t z) = (x u y) t (x u z)
                                                                     *                                                                                           *
 V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                   13 / 1    V.Krishna Nandivada (IIT Madras)           CS591-5 - May 2018                     14 / 1
Lattice properties                                                                Monotones and fixed point
       Meet (and join) induce a partial order (v):
       ∀x, y ∈ L, x v y, iff x u y = x.                                                  A function f : L → L, is a monotone, if for all x, y ∈ L,
                                                                                         x v y ⇒ f (x) v f (y).
       Transitive, antisymmetry and reflexive.
                                                                                         Example: bit-vector lattice:
Example Lattices:                                                                               f (x1 x2 x3 ) = hx1 1x2 i
                                                                                                f (x1 x2 x3 ) = hx2 x3 x1 i
                                                                                         A flow function models the effect of a programming language
                                                                                         construct. as a mapping from the lattice for that particular analysis
                                                                                         to itself.
                                                                                         We want the flow functions to be monotones. Why?
                                                                                         A fixed point of a function f : L → L is an element z ∈ L, such that
                                                                                         f (z) = z.
                                                                                         For a set of data-flow equations, a fixed-point is a solution of the
                                                                                         set of equations – cannot generate any further refinement.
                                                                     *                                                                                           *
 V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                   15 / 1    V.Krishna Nandivada (IIT Madras)           CS591-5 - May 2018                     16 / 1
Meet Over All Paths solutions                                                     A worklist based implementation (a forward analysis)
      The value we wish to compute in solving data-flow equations is –
      meet over all paths (MOP) solution.
      Start with some prescribed information at the entry (or exit
      depending on forward or backward).
      Repeatedly apply the composition of the appropriate flow
      functions.
      For each node form the meet of the results.
                                                                     *                                                                    *
V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                    17 / 1   V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018       18 / 1
Example: Constant Propagation                                                     Constant Propagation lattice meet rules
Goal: Discover values that are constants on all possible executions of
a program and to propagate these constant values as far forward
through the program as possible                                                         ⊥ = Constant value cannot be guaranteed.
Conservative: Can discover only a subset of all the possible                            > = May be a constant, not yet determined.
constants.                                                                              ∀x
Lattice:
                                                                                               xu> = x
                                                                                               xu⊥ = ⊥
                                                                                               c1 u c1 = c1
                                                                                               c2 u c1 = ⊥
                                                                     *                                                                    *
V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                    19 / 1   V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018       20 / 1
Simple constant propagation                                                             Kildall’s algorithm
      Gary A. Kildall: A Unified Approach to Global Program                                    Start with an entry node in the program graph.
      Optimization - POPL 1973.
                                                                                                Process the entry node, and produce the constant propagation
      Reif, Lewis: Symbolic evaluation and the global value graph -                            information. Send it to all the immediate successors of the entry
      POPL 1977.                                                                               node.
      Simple constant Constants that can be proved to be constant                              At a merge point, get an intersection of the information.
      provided,
             no information is assumed about which direction branches will take.
                                                                                               If at any successor node, if for any variable the value is “reduced,
             Only one value of each variable is maintained along each path in                  the process the successor, similar to the processing done for entry
             the program.                                                                      node.
                                                                           *                                                                                            *
V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                          21 / 1    V.Krishna Nandivada (IIT Madras)    CS591-5 - May 2018                             22 / 1
Constant propagation - equations                                                        Constant propagation: equations (contd)
                                                                                               Start with the entry node.
                                                                                               If s is not an assignment statement, then fs is simply the identity
                                                                                               function.
      Let us assume that one basic block per statement.
                                                                                               If s is an assignment statement to variable v, then fs (m) = m0 ,
      Transfer functions set F - a set of transfer functions.                                  where:
      fs ∈ F is the transfer function for statement s.                                                For all v0 6= v, m0 (v0 ) = m(v0 ).
      The dataflow values are given by a map: m: Vars → ConstantVal                                   If the RHS of the statement is a constant c, then m0 (v) = c.
                                                                                                      If the RHS is an expression (say y op z),
      If m is the set of input dataflow values, then m0 = fs (m) gives the
                                                                                                                   
      output dataflow values.                                                                                       m(y) op m(z) if m(y) and m(z) are constant values
      Generate equations like before.                                                                     m0 (v) =     ⊥                  if either of m(y) and m(z) is ⊥
                                                                                                                       >                  Otherwise
                                                                                                                   
                                                                                                      If the RHS is an expression that cannot be evaluated, then
                                                                                                      m0 (v) = ⊥.
                                                                                               At a merge point, get a meet of the flow maps.
                                                                           *                                                                                            *
V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                          23 / 1    V.Krishna Nandivada (IIT Madras)    CS591-5 - May 2018                             24 / 1
Constant Propagation - example I                                                             Constant Propagation - example II
      x = 10;
      y = 1;                                                                                        x = 10;
      z = 5;                                                                                        y = 1;
      if  (cond) {                                                                                  z = 1;
         y = y / x;                                                                                 while (x > 1) {
         x = x - 1;                                                                                     y = x * y;
         z = z + 1;                                                                                     x = x - 1;
      } else {                                                                                          z = z * z;
         z = z + y;                                                                                 }
         y = 0;                                                                                     A[x] = y + z;
      }
      print x + y + z;
                                                                                *                                                                    *
 V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                              25 / 1   V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018       26 / 1
constant propagation: Non distributive
                                              Say f1 , f2 , and f3 represent the
                                              transfer functions of B1 , B2 and B3 ,
                                              respectively.
                                              f3 (f1 (m0 ) ∧ f2 (m0 )) v
                                              f3 (f1 (m0 )) ∧ f3 (f2 (m0 ))
 V.Krishna Nandivada (IIT Madras)   CS591-5 - May 2018                              27 / 1