D t Fl
Data Flow A
Analysis
l i 1
Compiler Design
Compiler
p Structure
Abstract Control
Source Object
Syntax Flow
code code
tree Graph
•Source code parsed to produce abstract syntax tree.
•Abstract syntax tree transformed to control flow graph.
•Data flow analysis
y operates
p on the control flow g
graph
p
(and other intermediate representations).
Abstract Syntax
y Tree (AST)
( )
•Programs
g are written in text
• as sequences of characters
• may be awkward to work with.
•First step: Convert to structured representation.
• Use lexer ((like lex)) to recognize
g tokens
• Use parser (like yacc) to group tokens structurally
• often produce to produce AST
Abstract Syntax
y Tree Example
p
x := a + b;; program
p g
y := a * b
= … while
w
While (y > a){
a := a +1; x + > block
x := a + b a b = …
y a
} a +
a 1
ASTs
• ASTs are abstract
• don’t contain all information in the program
• e.g., spacing, comments, brackets, parenthesis.
• Any ambiguity has been resolved
• e.g., a + b + c produces the same AST as
(a +b) + c.
Disadvantages
g of ASTs
•ASTs
ASTs have many similar forms
• e.g., for while, repeat , until, etc
• e.g.,
e g if,
if ?,
? switch
•Expressions in AST may be complex, nested
(42 * y) + ( z > 5 ? 12 * z : z +20)
•Want simpler representation for analysis
• … at least for dataflow analysis.
Control-Flow Graph
p ((CFG))
•A directed graph where
•Each node represents a statement
•Edges
g represent
p control flow
•Statements may be
•Assignments x = y op z or x = op z
•Copy
C statements
t t t x=y
•Branches goto L or if relop y goto L
•etc
Control-flow Graph
p Example
p
x := a + b;
y := a * b
While (y > a){
a := a +1;
x := a + b
}
Variations on CFGs
•Usually
Usually don’t
don t include declarations (e.g. int x;).
•May want a unique entry and exit point.
•May group statements into basic blocks.
• A basic
b i blblock
k is
i a sequence off instructions
i t ti with
ith no
branches into or out of the block.
Control-Flow Graph
p with Basic Blocks
X := a + b;
Y := a * b
While (y > a){
a := a +1;
x := a + b
•Can lead to more efficient implementations
• But more complicated to explain so…
•We will use single-statement blocks in lecture
CFG vs. AST
•CFGs are much simpler
p than ASTs
• Fewer forms, less redundancy, only simple
expressions
•But, ASTs are a more faithful representation
• CFGs introduce temporaries
• Lose block structure of program
•So for AST,
• Easier to report error + other messages
• Easier to explain to programmer
• Easier to unparse to produce readable code
Data Flow Analysis
y
• A framework for proving facts about program
• Reasons about lots of little facts
• Little or no interaction between facts
• Works best on properties about how program
computes
• Based on all paths through program
• including infeasible paths
Available Expressions
p
•An expression
p e = x op
p y is available at a p
program
g
point
i t p, if
• on every path from the entry node of the graph to node p, e
is computed at least once, and
• And there are no definitions of x or y since the most recent
occurance of e on the path
•Optimization
• If an expression is available
available, it need not be recomputed
• At least, if it is in a register somewhere
Data Flow Facts
•Is
I expression
i e available?
il bl ?
•Facts:
•a + b is available
•a * b is available
•a + 1 is available
Gen and Kill
What is the effect of each
statement on the set of facts?
stmt gen kill
x = a + b a + b
y = a * b a * b
a + b
a = a + 1 a * b
a + 1
Computing Available Expressions
{a + b}
{a + b, a * b}
{a + b}
{a + b, a * b} {a + b}
{a + b}
{a + b}
Terminology
gy
•A
A join point is a program point where two branches
meet
•Available expressions is a forward,
forward must problem
• Forward = Data Flow from in to out
• Must
M t = At joint
j i t point,
i t property
t mustt hold
h ld on all
ll
paths that are joined.
Data Flow Equations
q
• Let s be a statement
• succ(s) = {immediate successor statements of s}
• Pred(s) = {immediate predecessor statements of s}
• In(s) program point just before executing s
• Out(s) = program point just after executing s
• In(s)
( )=I s’ ∈ pred(s)
s Out(s’)
( )
• Out(s) = Gen(s) ∪ (In(s) – Kill(s))
• Note these are also called transfer functions
Liveness Analysis
y
•A variable v is live at a program point p if
• v will be used on some execution path
originating from p before v is overwritten
•Optimization
p
• If a variable is not live, no need to keep it
in a register
g
• If a variable is dead at assignment, can
eliminate assignment.
g
Data Flow Equations
q
• Available expressions is a forward must analysis
• Data flow propagate in same direction as CFG edges
• Expression is available if available on all paths
•Liveness is a backward may problem
• to kow if variable is live,
live need to look at future uses
• Variable is live if available on some path
• In(s) = Gen(s) ∪ (Out(s) – Kill(s))
• Out(s) = U s’’ ∈ succ(s)
( ) In(s’))
In(s
Gen and Kill
What is the effect of each
statement on the set of facts?
stmt gen kill
x = a + b a, b x
y = a * b a, b y
y > a a, y
a = a + 1 a
a
Computing
p g Live Variables
{x}
Computing
p g Live Variables
{x, y, a}
{x}
Computing
p g Live Variables
{x, y, a}
{x}
{x yy, a}
{x,
Computing
p g Live Variables
{x, y, a}
{x}
{y, a, b}
{x yy, a}
{x,
Computing
p g Live Variables
{x, y, a}
{y, a, b} {x}
{y, a, b}
{x yy, a}
{x,
Computing
p g Live Variables
{x, y, a, b}
{y, a, b} {x}
{y, a, b}
{x yy, a}
{x,
Computing
p g Live Variables
{x, y, a, b}
{y, a, b} {x}
{y, a, b}
{x y,
{x, y a,
a b}
Computing
p g Live Variables
{x, a, b}
{x, y, a, b}
{y, a, b} {x}
{y, a, b}
{x y,
{x, y a,
a b}
Computing
p g Live Variables
{a, b}
{x, a, b}
{x, y, a, b}
{y, a, b} {x}
{y, a, b}
{x y,
{x, y a,
a b}
Very
y Busy
y Expressions
p
•An expression
p e is very
y busy
y at point
p p if
• On every path from p, e is evaluated before the
value of e is changed
•Optimization
• Can hoist very
y busy
y expression
p computation
p
•What kind of problem?
• Forward or backward? Backward
• May or must? Must
Code Hoisting
g
• Code hoisting finds expressions that are always
evaluated following some point in a program,
regardless of the execution path and moves them
to the latest point beyond which they would always
be evaluated.
• It is a transformation that almost always reduces
the space occupied but that may affect its
execution time positively or not at all.
all
Reaching
g Definitions
•A definition of a variable v is an assignment
g to v
•A definition of variable v reaches point p if
• There
Th is
i no intervening
i t i assignment
i t to
t v
•Also called def-use information
•What kind of problem?
• Forward or backward? Forward
• May or must? may
Space
p of Data Flow Analyses
y
May Must
Reaching Available
F
Forward
d expressions
definitions
Backward Li
Live V
Very busy
b
Variables expressions
• Most data flow analyses can be classified this way
• A few don
don’tt fit: bidirectional
• Lots of literature on data flow analysis
Data Flow Facts and lattices
Typically data flow facts form a lattice
Typically,
Example, Available expressions
“top”
“bottom”
Partial Orders
•A partial order is a pair (P,
(P ) such that
• ⊆ P × P
• is reflexive: x x
y
• is anti-symmetric: x y and y x implies
p
x=y
• is transitive: x y and y z implies
p x z
Lattices
• A partial order is a lattice if and are defined so that
• is the meet or greatest lower bound operation
• x y x and x y y
• If z x and z y then z x y
• is the join or least upper bound operation
• x x y and y x y
• If x z and y z,
z then x y z
Lattices (cont.)
( )
A finite p
partial order is a lattice if meet and jjoin exist
f every pair
for i off elements
l t
A lattice has unique elements bot and top such that
x ⊥ =⊥ x ⊥ =x
x =x x =
In a lattice
x y iff x y = x
x y iff x y = y
Useful Lattices
• (2S , ⊆ ) forms a lattice for any set S.
• 2S is the powerset of S (set of all subsets)
• If (S, ) is a lattice, so is (S,≥ )
• i.e., lattices can be flipped
• The lattice for constant propagation
1 2 3 …
⊥
Forward Must Data Flow Algorithm
g
Out(s) = Gen(s) for all statements s
W = {all statements} (worklist)
Repeat
Take s from W
In(s) = I s’ ∈ pred(s) Out(s’)
Temp = Gen(s) ∪ (In(s) – Kill(s))
If (temp != Out (s)) {
Out(s) = temp
W = W ∪ succ(s)
Until W = ∅
Monotonicity
y
• A function f on a partial order is monotonic if
x y implies f(x) f(y)
• Easy to check that operations to compute In and
Out are monotonic
• In(s) = I s’ ∈ pred(s) Out(s’)
• Temp = Gen(s) ∪ (In(s) – Kill(s))
• Putting the two together
• Temp = fs (I s’ ∈ pred(s) Out(s’))
Termination
•We know algorithm terminates because
• The lattice has finite height
• The operations to compute In and Out are
monotonic
• On every iteration we remove a statement
from the worklist and/or move down the
l tti
lattice.