Lecture 3
Foundation of Data Flow Analysis
I Semi-lattice (set of values, meet operator)
II Transfer functions
III Correctness, precision and convergence
IV Meaning of data flow solution
Reading: Chapter 9.3
Recapping Lecture 2: Data Flow Framework
Reaching Definitions Live Variables
Domain Sets of definitions Sets of variables
Direction forward: backward:
out[b] = fb(in[b]) in[b] = fb(out[b])
in[b] = Ù out[pred(b)] out[b] = Ù in[succ(b)]
Transfer function fb(x) = Genb È (x –Killb) fb(x) = Useb È (x -Defb)
Meet Operation (Ù) È È
Boundary Condition out[entry] = Æ in[exit] = Æ
Initial interior points out[b] = Æ in[b] = Æ
CS243: Foundation of Data Flow 2 M. Lam
Thought Problem 1. “Must-Reach” Definitions
• A definition D (a = b+c) must reach point P iff
– D appears at least once along on all paths leading to P
– a is not redefined along any path after last appearance of D
and before P
• How do we formulate the data flow algorithm for this problem?
MAY Reach MUST Reach
Domain Sets of definitions Sets of definitions
Direction forward: forward:
out[b] = fb(in[b]) out[b] = fb(in[b])
in[b] = Ù out[pred(b)] in[b] = Ù out[pred(b)]
Transfer function fb(x) = Genb È (x –Killb) fb(x) = Genb È (x –Killb)
Meet Operation (Ù) È ∩
Boundary Condition out[entry] = Æ out[entry] = Æ
Initial interior pts out[b] = Æ out[b] = U
CS243: Foundation of Data Flow 3 M. Lam
Problem 1: Example
d1: b = 1
• Quiz 1: Does the precise answer satisfy the equations?
• Quiz 2: Are there alternative solutions that satisfy the equations?
• Quiz 3: Are the alternative solutions incorrect?
• Quiz 4: How do you get the precise answer?
• Quiz 5: What is a wrong answer?
CS243: Foundation of Data Flow 4 M. Lam
Problem 2: A legal solution to (May) Reaching Def?
• Quiz 1: Is this a solution to the equation?
• Quiz 2: Will the worklist algorithm generate this answer?
• Quiz 3: Is this solution safe or incorrect?
CS243: Foundation of Data Flow 6 M. Lam
Summary
• There may be multiple (fixed-point) solutions to a set of equations
– Non-solutions are incorrect
– All solutions to the equations are considered safe, but maybe imprecise
– Initialization of the interior points gives different solutions
• There is a direction to the solutions
• Answer is precise only when the initialization of interior points is
– U for must-reach
– {} for may-reach
Precise Answer
Must Reach {} U Definitions
Safe Incorrect
Imprecise
Precise Answer
May Reach {} U Definitions
Incorrect Safe
Imprecise
CS243: Foundation of Data Flow 7 M. Lam
Problem 3. What are the algorithm properties?
• Correctness
• Precision: how good is the answer?
• Convergence: will the analysis terminate?
• Speed: how long does it take?
CS243: Foundation of Data Flow 8 M. Lam
I. Purpose of a Framework
• Purpose 1
– Prove properties of entire family of problems once and for all
• Will the program converge?
• What does the solution to the set of equations mean?
• Purpose 2:
– Aid in software engineering: re-use code
CS243: Foundation of Data Flow 9 M. Lam
The Data-Flow Framework
• Data-flow problems (F, V, Ù) are defined by
– A semi-lattice
• domain of values V
• meet operator Ù: V x V à V
– A family of transfer functions F: V à V
CS243: Foundation of Data Flow 10 M. Lam
Lecture Outline
I Semi-lattice (set of values, meet operator)
II Transfer functions
III Correctness, precision and convergence
IV Meaning of data flow solution
CS243: Foundation of Data Flow 11 M. Lam
Semi-lattice: Structure of the Domain of Values
• A semi-lattice S = <a set of values V, a meet operator Ù>
• Properties of the meet operator
– idempotent: x Ù x = x
– commutative: x Ù y = y Ù x
– associative: x Ù (y Ù z) = (x Ù y) Ù z
• Examples of meet operators ?
• Non-examples ?
CS243: Foundation of Data Flow 12 M. Lam
Example of a Semi-Lattice Diagram
• (V, Ù ) : V = {x | such that x Í {d1,d2,d3}}, Ù = U
{} (T)
{d1} {d2} {d3}
{d1,d2} {d1,d3} {d2,d3}
{d1,d2,d3} (^)
• x Ù y = first common descendant of x & y important
• A meet semi-lattice is bounded if there exists a top element T,
such that x Ù T = x for all x.
• A bottom element ^ exists, if x Ù ^ = ^ for all x.
CS243: Foundation of Data Flow 13 M. Lam
A Meet Operator Defines a Partial Order
y
• Partial order of a meet semi-lattice
(x Ù y = x) (x≤ y)
path
≡ ≡
≤ : x ≤ y if and only if x Ù y = x x
{} (T) (max)
• Meet operator: U
{d1} {d2} {d3}
Partial order ≤ :
{d1,d2} {d1,d3} {d2,d3}
{d1,d2,d3} (^) (min)
• Properties of meet operator guarantee that ≤ is a partial order
– Reflexive: x ≤ x
– Antisymmetric: if x ≤ y and y ≤ x then x = y
– Transitive: if x ≤ y and y ≤ z then x ≤ z
CS243: Foundation of Data Flow 14 M. Lam
Another Example
• Semi-lattice
– V = {x | such that x Í {d1, d2, d3}}
– Ù=∩
{d1,d2,d3} (T)
{d1,d2} {d1,d3} {d2,d3}
{d1} {d2} {d3}
{} (^)
– ≤ is
CS243: Foundation of Data Flow 15 M. Lam
Meet Semi-Lattices vs Partially Ordered Sets
• A meet-semilattice is a partially ordered set which
has a meet (or greatest lower bound) for any nonempty finite subset.
{} (T)
{d1} {d2} {d3}
{d1,d2} {d1,d3} {d2,d3}
{d1,d2,d3} (^)
• Greatest lower bound: x Ù y = First common descendant of x & y
• Largest: top element T, if x Ù T = x for all x.
• Smallest: bottom element ^, if x Ù ^ = ^ for all x.
CS243: Foundation of Data Flow 16 M. Lam
Drawing a Semi-Lattice Diagram
• (x < y) ≡ (x ≤ y) Ù (x ≠ y)
• A semi-lattice diagram:
– Set of nodes: set of values
– Set of edges {(y, x): x < y and ¬ $z s.t. (x < z) Ù (z < y)}
CS243: Foundation of Data Flow 17 M. Lam
Summary
Three ways to define a semi-lattice:
• Set of values + meet operator
– idempotent: x Ù x = x
– commutative: x Ù y = y Ù x
– associative: x Ù (y Ù z) = (x Ù y) Ù z
• Set of values
+ partial order with a greatest lower bound for any nonempty subset
– Reflexive: x ≤ x
– Antisymmetric: if x ≤ y and y ≤ x then x = y
– Transitive: if x ≤ y and y ≤ z then x ≤ z
• A semi-lattice diagram
CS243: Foundation of Data Flow 18 M. Lam
One Element at a Time
• A semi-lattice for data flow problems can get quite large:
2n elements for n var/definition
• A useful technique:
– define semi-lattice for 1 element
– product of semi-lattices for all elements
– <x1, x2> ≤ <y1, y2> iff x1 ≤ y1 and x2 ≤ y2
• Example: Union of definitions
– For each element
def1 def2 def1 x def2
{} {} {},{}
{d1} {d2}
{d1},{} {},{d2}
{d1},{d2}
CS243: Foundation of Data Flow 19 M. Lam
Descending Chain
• Definition
– The height of a lattice is the largest number of >
relations that will fit in a descending chain.
x 0 > x1 > …
• Height of values in reaching definitions?
• Important property: finite descending chains
CS243: Foundation of Data Flow 20 M. Lam
II. Transfer Functions
• A family of transfer functions F
• Basic Properties f : V à V
– Has an identity function
• $f such that f(x) = x, for all x.
– Closed under composition
• if f1,f2Î F, f1•f2Î F
CS243: Foundation of Data Flow 21 M. Lam
Monotonicity: 2 Equivalent Definitions
• A framework (F, V, Ù) is monotone iff
– x ≤ y implies f(x) ≤ f(y)
• Equivalently,
a framework (F, V, Ù) is monotone iff
– f(x Ù y) ≤ f(x) Ù f(y),
– meet inputs, then apply f
≤
apply f individually to inputs, then meet results
CS243: Foundation of Data Flow 22 M. Lam
Example
• Reaching definitions: f(x) = Gen U (x - Kill), Ù = U
– Definition 1:
• Let x1 ≤ x2,
f(x1): Gen U (x1 - Kill)
f(x2): Gen U (x2 - Kill)
– Definition 2:
• f(x1 Ù x2) = (Gen U ((x1 U x2) - Kill))
f(x1) Ù f(x2) = (Gen U (x1 - Kill) ) U (Gen U (x2 - Kill) )
CS243: Foundation of Data Flow 23 M. Lam
Distributivity
• A framework (F, V, Ù) is distributive if and only if
f(x Ù y)= f(x) Ù f(y),
meet input, then apply f is equal to
apply the transfer function individually then merge result
CS243: Foundation of Data Flow 24 M. Lam
Important Note
• Monotone framework does not mean that f(x) ≤ x
– e.g. Reaching definition for two definitions in program
– suppose: f: Gen = {d1} ; Kill = {d2}
– Quiz: What are the inputs and outputs of f?
CS243: Foundation of Data Flow 25 M. Lam
III. Properties of Iterative Algorithm
• Given
A monotone data flow framework
With finite descending chains
• The iterative algorithm where all interior points are initialized to T
– Converges
– To the Maximum Fixed Point (MFP) solution of equations
CS243: Foundation of Data Flow 26 M. Lam
Key Concept
• The answer is a set of values for all basic block boundaries:
{ in[b], out[b] | b in the program}
• We need to prove the invariant:
– Values assigned to the same in[b] or out[b]
cannot increase in each iteration of the algorithm
• The algorithm converges
if the semilattice has finite descending chains
• Given an initialization of T,
the answer is the MFP (Maximum Fixed Point),
because any larger value is not a solution.
CS243: Foundation of Data Flow 27 M. Lam
Sketch of Inductive Proof
The boundary is set to the correct answer and never changed
For each IN/OUT of an interior program point:
• Invariant: new value ≤ old value in any step
• Initialize interior points with T (largest value)
• Proof by induction
– Base case:
1st transfer function or meet operator: new value ≤ old value (T)
– Meet operation:
• Assume new inputs ≤ old inputs, new output ≤ old output
– Transfer function (in a monotone framework):
• Assume new inputs ≤ old inputs, new output ≤ old output
CS243: Foundation of Data Flow 28 M. Lam
IV. What Does the Solution Mean?
• IDEAL data flow solution
– Let f1, ..., fm : Î F, fi is the transfer function for node i
fp = fnk•… • fn1, p is a path through nodes n1, ..., nk
fp = identify function, if p is an empty path
– For each node n: Ùfpi (boundary value),
for all possibly executed paths pi reaching n
– Example
if sqr(y) >= 0
false true
x=0 x=1
• Determining all possibly executed paths is undecidable
CS243: Foundation of Data Flow 29 M. Lam
Meet-Over-Paths MOP
• Err in the conservative direction
• Meet-Over-Paths MOP
– Assume every edge is traversed
– For each node n:
– MOP(n) = Ùfpi (boundary value), for all paths pi reaching n
• Compare MOP with IDEAL
– MOP includes more paths than IDEAL
– MOP = IDEAL Ù Result(Unexecuted-Paths)
– MOP ≤ IDEAL
– MOP is a “smaller” solution, more conservative, safe
• MOP ≤ IDEAL
– Goal: as close to MOP from below as possible
CS243: Foundation of Data Flow 30 M. Lam
Solving Data Flow Equations
• What is the difference between MOP and MFP of data flow
equations?
F1 F2
MOP: f3 ( f1 (x)) Ù f2 ( f1 (x))
F3
Worklist algorithm: f3 ( f1(x) Ù f2 (x))
• Therefore
– FP ≤ MFP ≤ MOP ≤ IDEAL
– FP, MFP, MOP are safe
– If framework is distributive, FP ≤ MFP = MOP ≤ IDEAL
CS243: Foundation of Data Flow 31 M. Lam
Summary
• A data flow framework
– Semi-lattice
• set of values (top)
• meet operator
• finite descending chains?
– Transfer functions
• summarizes each basic block
• boundary conditions
• Properties of data flow framework:
– Monotone framework and finite descending chains
⇒ iterative algorithm converges
⇒ finds maximum fixed point (MFP)
⇒ FP ≤ MFP ≤ MOP ≤ IDEAL
– Distributive framework
⇒ FP ≤ MFP = MOP ≤ IDEAL
CS243: Foundation of Data Flow 32 M. Lam