Code Optimization Techniques Guide
T-4
PART-A
Code Optimization
Introduction
• Concerns with machine-independent code optimization:
– Profile and optimize (user)
– Loop, procedure call, and address calculation improvement (compiler)
– Register usage, instruction choice, peephole optimization (compiler)
Introduction
• Organization of an optimizing compiler: the code optimizer
performs control flow analysis, then data flow analysis, then
transformation.
Classifications of
Optimization techniques
Peephole optimization
Local Optimization
Global Optimization
Inter-procedural
Intra-procedural
Loop Optimization
Factors influencing
Optimization
• The target machine: machine-dependent factors can be
parameterized to the compiler for fine tuning
• Machine Architecture
– Cache Size and type
– Cache/Memory transfer
rate
Themes behind
Optimization Techniques
• Avoid redundancy: something already computed need
not be computed again
• Smaller code: less work for CPU, cache, and memory!
• Fewer jumps: jumps interfere with code pre-fetch
• Code locality: code executed close together in time is
generated close together in memory, increasing locality of
reference
• Extract more information about code: more info means
better code generation
Redundancy
Elimination
• Redundancy elimination = determining that two computations
are equivalent and eliminating one.
– Value numbering
• Associates symbolic values to computations and
identifies expressions that have the same value
• Two ways:
– Constant folding
– Constant propagation
Compile-Time
Evaluation
• Constant folding: Evaluation of an expression with constant
operands to replace the expression with a single value
• Example:
area := (22.0/7.0) * r ** 2
area := 3.14286 * r ** 2
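Folding can be sketched as a small recursive pass over an expression tree. The tuple encoding, the `fold` name, and the operator table below are illustrative choices, not from the slides; a real compiler would run the equivalent pass on its own IR.

```python
# Constant folding on a toy expression tree: a node is either a leaf
# (number or variable name) or a tuple ('op', left, right).
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def fold(expr):
    """Replace every sub-expression with constant operands by its value."""
    if not isinstance(expr, tuple):
        return expr                          # leaf: constant or variable
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)          # both operands constant
    return (op, left, right)

# area := (22.0/7.0) * r ** 2  becomes  area := 3.14286... * r ** 2
print(fold(('*', ('/', 22.0, 7.0), 'r_squared')))
```

Here `r_squared` stands in for the r ** 2 part, which cannot be folded because r is unknown at compile time.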
Compile-Time
Evaluation
• Constant propagation: Replace a variable with the constant
that has been assigned to it earlier.
• Example:
pi := 3.14286
area := pi * r ** 2
area := 3.14286 * r ** 2
Constant
Propagation
• What does it mean?
– Given an assignment x = c, where c is a constant, replace
later uses of x with uses of c, provided there are no
intervening assignments to x.
• Similar to copy propagation
• Extra feature: It can analyze constant-value
conditionals to determine whether a branch should
be executed or not.
• When is it performed?
– Early in the optimization process.
• What is the result?
– Smaller code
– Fewer registers
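A minimal sketch of the pass over straight-line three-address code, assuming instructions are (dest, arg1, op, arg2) tuples with op = None for a plain copy; this representation and the function name are illustrative:

```python
# Constant propagation over one basic block: remember x = c bindings
# and substitute them into later operands until x is reassigned.
def propagate_constants(block):
    consts = {}                              # var -> known constant value
    result = []
    for dest, a1, op, a2 in block:
        a1 = consts.get(a1, a1)              # substitute known constants
        a2 = consts.get(a2, a2)
        if op is None and isinstance(a1, (int, float)):
            consts[dest] = a1                # record the binding dest = c
        else:
            consts.pop(dest, None)           # dest redefined: forget it
        result.append((dest, a1, op, a2))
    return result

block = [('pi', 3.14286, None, None),        # pi := 3.14286
         ('area', 'pi', '*', 'r2')]          # area := pi * r2
print(propagate_constants(block))            # the use of pi becomes 3.14286
```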
Common Sub-
expression Evaluation
• Identify common sub-expressions present in different
expressions, compute once, and use the result in all the
places.
– The definition of the variables involved should not
change
Example:
a := b * c temp := b * c
… a := temp
… …
x := b * c + 5 x := temp + 5
Common
Subexpression
Elimination
• Local common subexpression elimination
– Performed within basic blocks
– Algorithm sketch:
• Traverse BB from top to bottom
• Maintain table of expressions evaluated so far
– if any operand of the expression is redefined,
remove it from the table
Common
Subexpression
Elimination
• Modify applicable instructions as you go
– generate temporary variable, store the expression
in it and use the variable next time the
expression is encountered.
x = a + b        t = a + b
……               x = t
……               …..
y = a + b        y = t
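The table-driven algorithm above can be sketched as follows, over (dest, arg1, op, arg2) instructions; this representation and the temp-naming scheme are illustrative:

```python
# Local CSE within one basic block: keep a table mapping each
# evaluated expression to the temporary that holds it, and drop
# table entries whose operands get redefined.
def local_cse(block):
    table = {}                       # (arg1, op, arg2) -> temp holding it
    out, ntemp = [], 0
    for dest, a1, op, a2 in block:
        key = (a1, op, a2)
        if key in table:
            out.append((dest, table[key], None, None))   # dest = temp
        else:
            ntemp += 1
            temp = 't%d' % ntemp
            out.append((temp, a1, op, a2))               # temp = a1 op a2
            out.append((dest, temp, None, None))         # dest = temp
            table[key] = temp
        # expressions using the (re)defined variable are now stale
        table = {k: v for k, v in table.items()
                 if dest not in (k[0], k[2])}
    return out

print(local_cse([('x', 'a', '+', 'b'), ('y', 'a', '+', 'b')]))
```

The second `a + b` is replaced by a reuse of the temporary introduced for the first.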
Common
Subexpression
Elimination
c = a + b          t1 = a + b
d = m * n          c = t1
e = b + d          t2 = m * n
f = a + b          d = t2
g = -b             t3 = b + d
h = b + a          e = t3
a = j + a          f = t1
k = m * n          g = -b
j = b + d          h = t1 /* commutative */
a = -b             a = j + a
if m * n go to L   k = t2
                   j = t3
                   a = -b
                   if t2 go to L
the table contains quintuples:
(pos, opd1, opr, opd2, tmp)
Common
Subexpression
Elimination
• Global common subexpression elimination
– Performed on flow graph
– Requires available expression information
• In addition to finding what expressions are available
at the endpoints of basic blocks, we need to know
where each of those expressions was most recently
evaluated (which block and which position within
that block).
Common Sub-
expression Evaluation
(Figure: a flow graph. Block 1: x := a + b. Block 2: a := b.
Block 4: z := a + b + 10.)
“a + b” is not a common sub-expression in 1 and 4, because a is
redefined in block 2 along the path from 1 to 4.
Example:
if (a < b) then          if (a < b) then
  z = x * 2                temp = x * 2
                           z = temp
else                     else
  y = 10                   y = 10
                           temp = x * 2
g = x * 2                g = temp
Code
Motion
• Move an expression out of a loop if its evaluation
does not change inside the loop.
Example:
while ( i < (max - 2) ) …
Equivalent to:
t := max - 2
while ( i < t ) …
Code
Motion
• Safety of code movement:
Movement of an expression e from a basic block bi to
another block bj is safe if it does not introduce any new
occurrence of e along any path.
Strength Reduction
• Replace a costly operator with an equivalent cheaper one.
Example:
                       temp = 5;
for i=1 to 10 do       for i=1 to 10 do
  …                      …
  x = i * 5              x = temp
  …                      …
                         temp = temp + 5
end                    end
• Typical cases of strength reduction occur in address
calculation of array references.
• Applies to integer expressions involving induction variables
(loop optimization)
Dead Code
Elimination
• Dead code is a portion of the program that will not be
executed along any path of the program.
– It can be removed
• Examples:
– No control flows into a basic block
– A variable is dead at a point: its value is not used
anywhere in the program
– An assignment is dead: it assigns a value to a
dead variable
Dead Code
Elimination
• Example:
i = j;
…
x = i + 10
• The optimization can be performed by eliminating the
assignment statement i = j (once copy propagation has
replaced the use of i with j).
• This assignment statement is called a dead assignment.
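A sketch of the elimination itself, as a backward scan over one block. Instructions are simplified to (dest, list-of-used-vars) pairs and `live_out` is the set of variables needed after the block; these names and the encoding are illustrative:

```python
# Dead-assignment elimination: walk the block backwards, tracking
# which variables are still needed; drop assignments to dead ones.
def eliminate_dead(block, live_out):
    live = set(live_out)
    kept = []
    for dest, uses in reversed(block):
        if dest in live:             # the assigned value is consumed later
            live.discard(dest)
            live.update(uses)
            kept.append((dest, uses))
        # else: assignment to a dead variable, drop it
    return kept[::-1]

# Before copy propagation x uses i, so i = j must stay:
print(eliminate_dead([('i', ['j']), ('x', ['i'])], live_out=['x']))
# After copy propagation x uses j directly, so i = j is dead:
print(eliminate_dead([('i', ['j']), ('x', ['j'])], live_out=['x']))
```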
Copy
Propagation
• What does it mean?
– Given an assignment x = y, replace later uses of x
with uses of y, provided there are no intervening
assignments to x or y.
• When is it performed?
– At any level, but usually early in the optimization
process.
i := i + 1
ADD i, #1 INC i
Local
Optimization
Optimization of
Basic Blocks
• Many structure-preserving transformations can be
implemented by construction of DAGs of basic blocks
DAG Representation of a Basic Block (BB)
• Leaves are labeled with a unique identifier (var name or const)
• Interior nodes are labeled by an operator symbol
• Nodes optionally have a list of labels (identifiers)
• Edges relate operands to the operator (interior nodes are
operators)
• An interior node represents a computed value
– Identifiers in its label list are deemed to hold the value
Example:
DAG for BB
t1 := 4 * i
DAG: a * node over leaves 4 and i, labeled t1
t1 := 4 * i
t3 := 4 * i
t2 := t1 + t3
if (i <= 20) goto L1
DAG: one * node over 4 and i, labeled t1, t3; a + node whose
operands are both that * node, labeled t2; a <= node over i and 20,
with target (L1)
Construction of
DAGs for BB
• I/p: Basic block, B
• O/p: A DAG for B containing the following information:
1) A label for each node
2) For leaves the labels are ids or consts
3) For interior nodes the labels are operators
4) For each node a list of attached ids (possibly empty list,
no consts)
Construction of
DAGs for BB
• Data structure and functions:
– Node:
1) Label: label of the node
2) Left: pointer to the left child node
3) Right: pointer to the right child node
4) List: list of additional labels (empty for leaves)
– Node (id): returns the most recent node created for id.
Else return undef
– Create(id,l,r): create a node with label id with l as left
child and r as right child. l and r are optional params.
Construction of
DAGs for BB
• Method:
For each 3AC, A in B
A is of one of the following forms:
1. x := y op z
2. x := op y
3. x := y
1. if ((ny = node(y)) == undef)
     ny = Create(y);
   if (A == type 1) and ((nz = node(z)) == undef)
     nz = Create(z);
Construction of
DAGs for BB
2. If (A == type 1)
     Find a node labelled ‘op’ with left and right children ny and
     nz respectively [determination of common sub-expression]
     If (not found) n = Create(op, ny, nz);
   If (A == type 2)
     Find a node labelled ‘op’ with a single child ny
     If (not found) n = Create(op, ny);
   If (A == type 3) n = Node(y);
3. Remove x from Node(x).list
   Add x to n.list
   Node(x) = n;
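The method above can be sketched in Python. Instructions are (dest, op, operands...) with op = None for a copy; nodes are plain dicts and `node_of` plays the role of Node(id). All of this encoding is illustrative:

```python
# DAG construction for a basic block, reusing an existing operator
# node when one with the same label and children already exists
# (this is where common sub-expressions are detected).
def build_dag(block):
    nodes, node_of = [], {}          # all nodes; id -> node holding its value

    def leaf(x):                     # Node(x), creating a leaf if undefined
        if x not in node_of:
            node = {'label': x, 'kids': (), 'ids': []}
            nodes.append(node)
            node_of[x] = node
        return node_of[x]

    for dest, op, *args in block:
        if op is None:                           # form 3: x := y
            n = leaf(args[0])
        else:                                    # forms 1 and 2
            kids = tuple(leaf(a) for a in args)
            n = next((m for m in nodes
                      if m['label'] == op and m['kids'] == kids), None)
            if n is None:
                n = {'label': op, 'kids': kids, 'ids': []}
                nodes.append(n)
        if dest in node_of and dest in node_of[dest]['ids']:
            node_of[dest]['ids'].remove(dest)    # remove x from Node(x).list
        n['ids'].append(dest)                    # add x to n.list
        node_of[dest] = n                        # Node(x) = n
    return nodes

dag = build_dag([('t1', '*', 4, 'i'),
                 ('t2', '[]', 'a', 't1'),
                 ('t3', '*', 4, 'i')])
print([n['ids'] for n in dag if n['label'] == '*'])   # t1 and t3 share a node
```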
Example: DAG
construction from BB
t1 := 4 * i
DAG: a * node over leaves 4 and i, labeled t1

Example: DAG
construction from BB
t1 := 4 * i
t2 := a [ t1 ]
DAG: a [] node over leaf a and the * node, labeled t2

Example: DAG
construction from BB
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
DAG: the existing * node is reused; its label list becomes t1, t3

Example: DAG
construction from BB
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
DAG: a second [] node over leaf b and the shared * node, labeled t4

Example: DAG
construction from BB
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4
DAG: a + node over the two [] nodes, labeled t5

Example: DAG
construction from BB
t1 := 4 * i
t2 := a [ t1 ]
t3 := 4 * i
t4 := b [ t3 ]
t5 := t2 + t4
i := t5
DAG: the + node’s label list becomes t5, i
DAG of a
Basic Block
• Observations:
– A leaf node for the initial value of an id
– A node n for each statement s
– The children of node n are the last definition (prior to s)
of the operands of n
Optimization of
Basic Blocks
• Common sub-expression elimination: by construction of DAG
– Note: for common sub-expression elimination, we are
actually targeting expressions that compute the same
value.
a := b + c
b := b – d       “b + c” appears twice: common expressions,
c := c + d       but they do not generate the same result
e := b + c
Optimization of
Basic Blocks
• DAG representation identifies expressions that yield
the same result
a := b + c
b := b – d
c := c + d
e := b + c
DAG: leaves b0, c0, d0; a + node (b0, c0) labeled a; a – node
(b0, d0) labeled b; a + node (c0, d0) labeled c; a + node over the
new b and c, labeled e. Since e uses the new values of b and c, it
does not share a node with a.
Optimization of
Basic Blocks
• Dead code elimination: code generation from DAG
eliminates dead code.
a := b + c        a := b + c
b := a – d        d := a – d
d := a – d        c := d + c
c := d + c
(b is not live: the – node carries labels b and d; with b dead, only
d is generated, so the assignment to b disappears)
Loop
Optimization
Loop
Optimizations
• Most important set of optimizations
– Programs are likely to spend more time in loops
• Presumption: Loop has been identified
• Optimizations:
– Loop invariant code removal
– Induction variable strength reduction
– Induction variable reduction
Loops in
Flow Graph
• Dominators:
A node d of a flow graph G dominates a node n, if every
path in G from the initial node to n goes through d.
Corollaries:
Every node dominates itself.
The initial node dominates all nodes in G.
The entry node of a loop dominates all nodes in the loop.
Loops in
Flow Graph
• Each node n has a unique immediate dominator m, which is
the last dominator of n on any path in G from the initial
node to n.
(d ≠ n) && (d dom n) → d dom m
• Dominator tree (T):
A representation of the dominator information of flow graph G.
• The root node of T is the initial node of G
• A node d in T dominates all nodes in its sub-tree
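The dominator sets can be computed with a straightforward iterative fixed point, shown below; production compilers usually use the faster Lengauer-Tarjan algorithm instead. The predecessor-map encoding is an illustrative choice:

```python
# Iterative dominator computation: d dominates n iff d == n or d
# dominates every predecessor of n. Start from "everything dominates
# everything" and shrink to a fixed point.
def dominators(preds, entry):
    nodes = set(preds)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}                 # the entry is dominated only by itself
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond CFG: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4 (given as predecessors)
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
print(sorted(dominators(preds, 1)[4]))   # -> [1, 4]: neither 2 nor 3 dominates 4
```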
Example: Loops in
Flow Graph
(Figure: a flow graph on nodes 1 through 9, and its dominator tree
rooted at node 1.)
Loops in
Flow Graph
• Natural loops:
1. A loop has a single entry point, called the “header”.
The header dominates all nodes in the loop
2. There is at least one path back to the header from the
loop nodes (i.e. there is at least one way to iterate the
loop)
• Loop fission: break a loop into multiple loops over the same
index range but each taking only a part of the loop's body.
• Example: r2 is a derived induction variable.
Loop: r2 = r1 * 4
      r4 = r7 + 3
      r7 = r7 + 1
      r10 = *r2
      r3 = *r4
      r9 = r1 * r3
      r10 = r9 >> 4
      *r2 = r10
      r1 = r1 + 4
      if (r1 < 100) goto Loop
Induction Variable
Strength Reduction
• Create basic induction variables from derived induction
variables.
• Rules: (S: x := y op z)
– op is *, <<, +, or –
– y is an induction variable
– z is invariant
– No other statement modifies x
– x is not y or z
– x is a register
Induction Variable
Strength Reduction
• Transformation:
Insert the following into the bottom of the pre-header:
  new_reg = expression of target statement S
if (opcode(S)) is not add/sub, insert to the bottom of the
pre-header:
  new_inc = inc(y, op, z)
else
  new_inc = inc(x)
(Function inc() calculates the amount of increment for its
1st parameter.)
Insert the following at each update of y:
  new_reg = new_reg + new_inc
Change S to: x = new_reg
Example: Induction Variable
Strength Reduction
Before:              After (pre-header: new_reg = r4 * r9,
                            new_inc = r9):
r5 = r4 - 3          r5 = r4 - 3
r4 = r4 + 1          r4 = r4 + 1
                     new_reg += new_inc
r7 = r4 * r9         r7 = new_reg
r6 = r4 << 2         r6 = r4 << 2
Induction Variable
Elimination
• Remove unnecessary basic induction variables from the loop
by substituting uses with another basic induction variable.
• Rules:
– Find two basic induction variables, x and y
– x and y in the same family
• Incremented at the same place
– Increments are equal
– Initial values are equal
– x is not live at exit of loop
– For each BB where x is defined, there is no use of x
between the first and the last definition of y
Example: Induction
Variable Elimination
Before:              After (r1 eliminated, uses replaced by r2):
r1 = 0               r2 = 0
r2 = 0
r1 = r1 - 1          r2 = r2 - 1
r2 = r2 - 1
r9 = r2 + r4         r9 = r2 + r4
r7 = r1 * r9         r7 = r2 * r9
r4 = *(r1)           r4 = *(r2)
*r2 = r7             *r2 = r7
Induction Variable
Elimination
• Variants:
1. Trivial: induction variables that are never used except to
   increment themselves
• Complexity of elimination:
r4 := r2 + 8         r4 := r1 + rx
r3 := r1 + 4         r3 := r1 + 4
.                    .
.                    .
r1 := r1 + 4         r1 := r1 + 4
r2 := r2 + 4
Loop
Unrolling
• Replicate the body of a loop (N-1) times, resulting in N total
copies.
– Enable overlap of operations from different iterations
– Increase potential of instruction level parallelism (ILP)
• Variants:
– Unroll multiple of known trip counts
– Unroll with remainder loop
– While loop unroll
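The unroll-with-remainder variant can be illustrated on a simple array sum; the hand-unrolled source below stands in for what a compiler would generate:

```python
# Loop unrolled by a factor of 4 plus a remainder loop for the
# leftover iterations when the trip count is not a multiple of 4.
def sum_unrolled(a):
    s, i, n = 0, 0, len(a)
    while i + 4 <= n:            # main loop: four copies of the body
        s += a[i]
        s += a[i + 1]
        s += a[i + 2]
        s += a[i + 3]
        i += 4
    while i < n:                 # remainder loop: at most 3 iterations
        s += a[i]
        i += 1
    return s

print(sum_unrolled([1, 2, 3, 4, 5, 6, 7]))   # -> 28, same as sum()
```

The four straight-line additions in the main loop expose more instruction-level parallelism than a single-statement loop body.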
Optimization
Constant
Folding
• Evaluate constant expressions at compile time
• Only possible when side-effect freeness is guaranteed
c := 1 + 3        c := 4
Global Data
Flow Analysis
• Collect information about the whole program.
• Distribute the information to each block in the flow graph.
Data Flow
Analysis
• IMPORTANT!
– Data flow analysis should never tell us that a
transformation is safe when in fact it is not.
– When doing data flow analysis we must be
• Conservative
– Do not consider information that may
not
preserve the behavior of the program
• Aggressive
– Try to collect information that is as exact as
possible, so we can get the greatest benefit from
our optimizations.
Global Iterative Data
Flow Analysis
• Global:
– Performed on the flow graph
– Goal = to collect information at the beginning and end
of each basic block
• Iterative:
– Construct data flow equations that describe how
information flows through each basic block and solve
them by iteratively converging on a solution.
Global Iterative Data
Flow Analysis
• Components of data flow equations
– Sets containing collected information
• in set: information coming into the BB from outside
(following flow of data)
• gen set: information generated/collected within the
BB
• kill set: information that, due to action within the BB,
will affect what has been collected outside the BB
• out set: information leaving the BB
– Functions (operations on these sets)
• Transfer functions describe how information changes
as it flows through a basic block
• Meet functions describe how information from
multiple paths is combined.
Global Iterative Data
Flow Analysis
• Algorithm sketch
– Typically, a bit vector is used to store the information.
• For example, in reaching definitions, each bit
position corresponds to one definition.
– We use an iterative fixed-point algorithm.
– Depending on the nature of the problem we are solving,
we may need to traverse each basic block in a forward
(top-down) or backward direction.
• The order in which we "visit" each BB is not
important in terms of algorithm correctness, but is
important in terms of efficiency.
– In & Out sets should be initialized in a conservative and
aggressive way.
Typical
problems
• Reaching definitions
– For each use of a variable, find all definitions that reach
it.
• Upward exposed uses
– For each definition of a variable, find all uses that it
reaches.
• Live variables
– For a point p and a variable v, determine whether v is
live at p.
• Available expressions
– Find all expressions whose value is available at some
point p.
Global Data
Flow Analysis
• A typical data flow equation:
S: statement
out[S] = gen[S] ∪ (in[S] – kill[S])
in[S]: information that goes into S
kill[S]: information killed by S
gen[S]: new information generated by S
out[S]: information that goes out of S
Global Data
Flow Analysis
• The notion of gen and kill depends on the desired
information.
• In some cases, in may be defined in terms of out - equation
is solved as analysis traverses in the backward direction.
• Data flow analysis follows control flow graph.
– Equations are set at the level of basic blocks, or even for
a statement
Points and
Paths
• Point within a basic block:
– A location between two consecutive statements.
– A location before the first statement of the basic block.
– A location after the last statement of the basic block.
• Path: A path from a point p1 to pn is a sequence of points
p1, p2, … pn such that for each i : 1 ≤ i < n,
– pi is a point immediately preceding a statement and pi+1 is
the point immediately following that statement in the
same block, or
– pi is the last point of some block and pi+1 is the first point in
the successor block.
Example: Paths
and Points
(Figure: a flow graph with blocks B1 to B6.)
B1: d1: i := m – 1;  d2: j := n;  d3: a := u1
B2: d4: i := i + 1
B3: d5: j := j - 1
B5: d6: a := u2
Path: p1, p2, p3, p4, p5, p6 … pn
Reaching
Definition
• A definition of a variable x is a statement that assigns or may
assign a value to x.
– Unambiguous definition: a statement that certainly
assigns a value to x
• An assignment to x
• Reading a value from an I/O device into x
– Ambiguous definition: a statement that may assign a
value to x
• A call to a procedure with x as a parameter (call by
reference)
• A call to a procedure which can access x (x being in the
scope of the procedure)
• x is an alias for some other variable (aliasing)
• An assignment through a pointer that could refer to x
Reaching
Definition
• A definition d reaches a point p
– if there is a path from the point immediately following d
to p, and
– d is not killed along the path (i.e. there is no
redefinition of the same variable along the path)
• A definition of a variable is killed between two points when
there is another definition of that variable along the path.
Example: Reaching
Definition
(Figure: the flow graph from the previous slide.)
B1: d1: i := m – 1;  d2: j := n;  d3: a := u1
B2: d4: i := i + 1
B3: d5: j := j - 1
B5: d6: a := u2
The definition of i (d1) reaches p1 (the point just before d4).
It is killed by d4, so it does not reach p2 (the point just after d4).
The definition of i (d1) does not reach B3, B4, B5 and B6.
Reaching
Definition
• Conservative view: a definition is assumed to reach a point
even if it might not actually do so.
– Only an unambiguous definition kills an earlier definition
– All edges of the flow graph are assumed to be traversed.
if (a == b) then a = 2
else if (a == b) then a = 4
The definition “a = 4” is not reachable, but the analysis
conservatively assumes it may reach later points.
Data Flow analysis of a
Structured Program
• Region: A graph G’ = (N’, E’) which is a portion of the control
flow graph G.
– The set of nodes N’ is in G’ such that
• N’ includes a header h
• h dominates all nodes in N’
– The set of edges E’ is in G’ such that
• E’ contains all edges a → b such that a, b are in N’
Data Flow analysis of a
Structured Program:
Composition of
Regions
S → S1 ; S2
(Figure: the region for sequential composition: S1 followed by S2.)
Data Flow analysis of a
Structured Program:
Composition of
Regions
S → if E then S1 else S2
(Figure: the region branches on E to S1 or S2.)
Data Flow analysis of a
Structured Program:
Composition of
Regions
S → do S1 while E
(Figure: S1 followed by “if E goto S1”, a back edge to S1.)
Data Flow
Equations
• Each region (or NT) has four attributes:
– gen[S]: Set of definitions generated by the block S.
If a definition d is in gen[S], then d reaches the end
of block S.
– kill[S]: Set of definitions killed by block S.
If d is in kill[S], d never reaches the end of block S.
Every path from the beginning of S to the end of S
must have a definition for a (where a is defined
by d).
Data Flow
Equations
– in[S]: The set of definitions that are live at the entry
point of block S.
– out[S]: The set of definitions that are live at the exit
point of block S.
• The data flow equations are inductive or syntax
directed.
– gen and kill are synthesized attributes.
– in is an inherited attribute.
Data Flow
Equations
• gen[S] concerns a single basic block. It is the set of
definitions in S that reach the end of S.
• In contrast, out[S] is the set of definitions (possibly defined
in some other block) live at the end of S considering all
paths through S.
Data Flow Equations
Single statement, S: d: a := b + c
gen[S] = {d}
kill[S] = Da – {d}    (Da: the set of all definitions of a)
out[S] = gen[S] ∪ (in[S] – kill[S])
Data Flow Equations
Composition, S: S1 ; S2
gen[S] = gen[S2] ∪ (gen[S1] – kill[S2])
kill[S] = kill[S2] ∪ (kill[S1] – gen[S2])
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]
Data Flow Equations
if-then-else, S: if E then S1 else S2
gen[S] = gen[S1] ∪ gen[S2]
kill[S] = kill[S1] ∩ kill[S2]
in[S1] = in[S]
in[S2] = in[S]
out[S] = out[S1] ∪ out[S2]
Data Flow
Equations
Loop, S: do S1 while E
gen[S] = gen[S1]
kill[S] = kill[S1]
Data Flow
Analysis
• The attributes are computed for each region. The equations
can be solved in two phases:
– gen and kill can be computed in a single pass over a basic
block.
– in and out are computed iteratively.
• The initial condition for in for the whole program is the
empty set
• in can be computed top-down
• Finally out is computed
Dealing
with Loops
• Due to the back edge, in[S] cannot be used as in[S1]
• in[S1] and out[S1] are interdependent.
• The equation is solved iteratively.
• The general equations for in and out:
in[S] = ∪ (out[Y] : Y is a predecessor of S)
out[S] = gen[S] ∪ (in[S] – kill[S])
Reaching
definitions
• What is safe?
– To assume that a definition reaches a point even if it
turns out not to.
– The computed set of definitions reaching a point p
will be a superset of the actual set of definitions
reaching p
– Goal : make the set of reaching definitions as small
as possible (i.e. as close to the actual set as possible)
Reaching
definitions
• How are the gen and kill sets defined?
– gen[B] = {definitions that appear in B and reach the
end of B}
– kill[B] = {all definitions that never reach the end of
B}
• What is the direction of the analysis?
– forward
– out[B] = gen[B] ∪ (in[B] – kill[B])
Reaching
definitions
• What is the confluence operator?
– union
– in[B] = ∪ out[P], over the predecessors P of B
• How do we initialize?
– start small
• Why? Because we want the resulting set to be as
small as possible
– for each block B initialize out[B] = gen[B]
Computation of gen
and kill Sets
for each basic block BB do
  gen(BB) = ∅; kill(BB) = ∅;
  for each statement (d: x := y op z) in sequential order in BB, do
    kill(BB) = kill(BB) U G[x];
    G[x] = {d};
  endfor
  gen(BB) = U G[x] : for all ids x
endfor
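The same computation in Python, with a block given as a list of (definition-id, defined-variable) pairs and `all_defs_of` as the program-wide table of definitions per variable (a slight rephrasing of the G[x] table above; all names are illustrative):

```python
# gen: the last definition of each variable in the block reaches its
# end. kill: each definition d of x kills every other definition of x.
def gen_kill(block, all_defs_of):
    kill, last = set(), {}
    for d, x in block:
        kill |= all_defs_of[x] - {d}
        last[x] = d                      # most recent definition of x
    gen = set(last.values())
    return gen, kill

all_defs_of = {'i': {'d1', 'd4'}, 'j': {'d2', 'd5'}}
print(gen_kill([('d4', 'i')], all_defs_of))   # -> ({'d4'}, {'d1'})
```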
Computation of in
and out Sets
for all basic blocks BB
  in(BB) = ∅
for all basic blocks BB
  out(BB) = gen(BB)
change = true
while (change) do
  change = false
  for each basic block BB, do
    old_out = out(BB)
    in(BB) = U (out(Y)) : for all predecessors Y of BB
    out(BB) = gen(BB) + (in(BB) – kill(BB))
    if (old_out != out(BB)) then change = true
  endfor
endwhile
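The same fixed-point loop in Python, over per-block gen/kill sets and a predecessor map (the encoding is illustrative):

```python
# Iterative reaching-definitions solver: out = gen U (in - kill),
# in = union of predecessors' out, repeated until nothing changes.
def reaching_defs(blocks, preds, gen, kill):
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            IN[b] = set()
            for p in preds[b]:
                IN[b] |= OUT[p]
            new = gen[b] | (IN[b] - kill[b])
            if new != OUT[b]:
                OUT[b] = new
                change = True
    return IN, OUT

blocks = ['B1', 'B2']
preds = {'B1': [], 'B2': ['B1', 'B2']}       # B2 has a self-loop
gen = {'B1': {'d1'}, 'B2': {'d4'}}           # d1, d4 both define i
kill = {'B1': {'d4'}, 'B2': {'d1'}}
IN, OUT = reaching_defs(blocks, preds, gen, kill)
print(IN['B2'], OUT['B2'])   # both defs reach B2's entry; only d4 leaves
```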
Live Variable
(Liveness) Analysis
• Liveness: For each point p in a program and each variable
y, determine whether y can be used before being
redefined, starting at p.
• Attributes
– use = set of variables used in the BB prior to their
definition
– def = set of variables defined in the BB prior to any use of
the variable
– in = set of variables that are live at the entry point of a
BB
– out = set of variables that are live at the exit point of a
BB
Live Variable
(Liveness) Analysis
• Data flow equations:
in[B] = use[B] ∪ (out[B] – def[B])
out[B] = ∪ in[S] : S ∈ succ(B)
– 1st equation: a variable is live coming into the block if either
• it is used before redefinition in B, or
• it is live coming out of B and is not redefined in B
– 2nd equation: a variable is live coming out of B iff it is live
coming into one of its successors.
Example:
Liveness
r1 = r2 + r3     r2, r3, r4, r5 are all live here, as they are
r6 = r4 – r5     consumed later; r6 is dead, as it is redefined later
r4 = 4           r4 is dead here, as it is redefined.
r6 = 8           So is r6. r2, r3, r5 are live
r6 = r2 + r3
r7 = r4 – r5
What does this mean? r6 = r4 – r5 is useless: it produces a
dead value!! Get rid of it!
Computation of use
and def Sets
for each basic block BB do
  def(BB) = ∅; use(BB) = ∅;
  for each statement (x := y op z) in sequential order, do
    for each operand y, do
      if (y not in def(BB))
        use(BB) = use(BB) U {y};
    endfor
    def(BB) = def(BB) U {x};
  endfor
endfor
def is the union of all the LHSs;
use is all the ids used before being defined
Computation of in
and out Sets
for all basic blocks BB
  in(BB) = ∅;
change = true;
while (change) do
  change = false
  for each basic block BB do
    old_in = in(BB);
    out(BB) = U {in(Y) : for all successors Y of BB}
    in(BB) = use(BB) U (out(BB) – def(BB))
    if (old_in != in(BB)) then change = true
  endfor
endwhile
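Put together, use/def plus the backward equations give a complete little liveness solver; the set encoding and successor map are illustrative:

```python
# Backward liveness: out = union of successors' in,
# in = use U (out - def), iterated to a fixed point.
def liveness(blocks, succs, use, defs):
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            OUT[b] = set()
            for s in succs[b]:
                OUT[b] |= IN[s]
            new = use[b] | (OUT[b] - defs[b])
            if new != IN[b]:
                IN[b] = new
                change = True
    return IN, OUT

blocks = ['B1', 'B2']
succs = {'B1': ['B2'], 'B2': []}
use = {'B1': {'r2', 'r3'}, 'B2': {'r1'}}     # B1: r1 = r2 + r3
defs = {'B1': {'r1'}, 'B2': {'r7'}}          # B2: r7 = r1 * 2
IN, OUT = liveness(blocks, succs, use, defs)
print(sorted(IN['B1']), sorted(OUT['B1']))   # r2, r3 live in; r1 live out
```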
DU/UD
Chains
• Convenient way to access/use reaching definition
information.
• Def-Use chains (DU chains)
– Given a def, what are all the possible consumers of the
definition produced
• Use-Def chains (UD chains)
– Given a use, what are all the possible producers of the
definition consumed
Example:
DU/UD Chains
1: r1 = MEM[r2+0]
2: r2 = r2 + 1
3: r3 = r1 * r4
(two parallel successor blocks:)
4: r1 = r1 + 5        7: r7 = r6
5: r3 = r5 – r1       8: r2 = 0
6: r7 = r3 * 2        9: r7 = r7 + 1
(joining into:)
10: r8 = r7 + 5
11: r1 = r3 – r8
12: r3 = r1 * 2
DU chain of r1: (1) -> 3,4   (4) -> 5   (12) ->
DU chain of r3: (3) -> 11   (5) -> 11
UD chain of r1: (12) -> 11
UD chain of r7: (10) -> 6,9
Some Things to
Think About
• Liveness and Reaching definitions are basically the same
thing!
– All dataflow is basically the same with a few parameters
• Meaning of gen/kill (use/def)
• Backward / Forward
• All paths / some paths (must/may)
– So far, we have looked at may analysis algorithms
– How do you adjust to do must algorithms?
• Dataflow can be slow
– How to implement it efficiently?
– How to represent the info?
Generalizing
Dataflow Analysis
• Transfer function
– How information is changed by BB
out[BB] = gen[BB] + (in[BB] – kill[BB]) forward
analysis
in[BB] = gen[BB] + (out[BB] – kill[BB]) backward
analysis
• Meet/Confluence function
– How information from multiple paths is combined
in[BB] = U out[P] : P is pred of BB forward analysis
out[BB] = U in[P] : P is succ of BB backward
analysis
Generalized Dataflow
Algorithm
change = true;
while (change)
  change = false;
  for each BB
    apply meet function
    apply transfer function
    if any set changed, change = true;
Example: Liveness by
Upward Exposed Uses
for each basic block BB, do
  gen[BB] = upward exposed uses in BB
  kill[BB] = definitions in BB
(Figure: an example contrasting reaching and available definitions:)
3: r4 = 4          defs 1,2 reach; 1,2 available
4: r6 = 8          defs 1,3,4 reach; 1,3,4 available
5: r6 = r2 + r3
6: r7 = r4 – r5    defs 1,2,3,4 reach; 1 available
Available Definition
Analysis (Adefs)
• A definition d is available at a point p if along all paths
from d to p, d is not killed
• Remember, a definition of a variable is killed between 2
points when there is another definition of that variable
along the path
– r1 = r2 + r3 kills previous definitions of r1
• Algorithm:
– Forward dataflow analysis as propagation occurs from
defs downwards
– Use the Intersect function as the meet operator to
guarantee the all-path requirement
– gen/kill/in/out similar to reaching defs
• Initialization of in/out is the tricky part
Compute Adef
gen/kill Sets
for each basic block BB do
  gen(BB) = ∅; kill(BB) = ∅;
  for each statement (d: x := y op z) in sequential order in BB, do
    kill(BB) = kill(BB) U G[x];
    G[x] = {d};
  endfor
  gen(BB) = U G[x] : for all ids x
endfor
Exactly the same computation as for reaching definitions!
Compute Adef
in/out Sets
U = universal set of all definitions in the program
in(0) = ∅; out(0) = gen(0)
for each basic block BB, (BB != 0), do
  in(BB) = ∅; out(BB) = U – kill(BB)
change = true
while (change) do
  change = false
  for each basic block BB, do
    old_out = out(BB)
    in(BB) = ∩ out(Y) : for all predecessors Y of BB
    out(BB) = gen(BB) + (in(BB) – kill(BB))
    if (old_out != out(BB)) then change = true
  endfor
endwhile
Available Expression
Analysis (Aexprs)
• An expression is the RHS of an operation
– Ex: in “r2 = r3 + r4”, “r3 + r4” is an expression
• An expression e is available at a point p if along all paths
from e to p, e is not killed.
• An expression is killed between two points when one of its
source operands is redefined
– Ex: “r1 = r2 + r3” kills all expressions involving r1
• Algorithm:
– Forward dataflow analysis
– Use the Intersect function as the meet operator
to guarantee the all-path requirement
– Looks exactly like Adefs, except gen/kill/in/out are the
RHSs of operations rather than the LHSs
Available
Expression
• Input: A flow graph with e_kill[B] and e_gen[B]
• Output: in[B] and out[B]
• Method:
in[B1] := ∅; out[B1] := e_gen[B1];
for each basic block B != B1
  out[B] = U – e_kill[B];
change = true
while (change)
  change = false;
  for each basic block B != B1
    in[B] = ∩ out[P] : for all predecessors P of B
    old = out[B]
    out[B] = e_gen[B] ∪ (in[B] – e_kill[B])
    if (out[B] != old) change = true;
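A sketch of this method in Python; note the intersection meet, which is what makes it an all-paths ("must") analysis, in contrast to the union used for reaching definitions. The encoding is illustrative:

```python
# Available expressions: forward analysis with intersection as meet.
# out[B] starts "full" (U - e_kill) so the intersections do not
# collapse to empty before the fixed point is reached.
def available_exprs(blocks, preds, e_gen, e_kill, universe, entry):
    OUT = {b: universe - e_kill[b] for b in blocks}
    OUT[entry] = set(e_gen[entry])
    IN = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            if b == entry:
                continue
            IN[b] = set.intersection(*(OUT[p] for p in preds[b]))
            new = e_gen[b] | (IN[b] - e_kill[b])
            if new != OUT[b]:
                OUT[b] = new
                change = True
    return IN, OUT

universe = {'b+c', 'd+e'}
blocks = ['B1', 'B2', 'B3', 'B4']
preds = {'B1': [], 'B2': ['B1'], 'B3': ['B1'], 'B4': ['B2', 'B3']}
e_gen = {'B1': {'b+c'}, 'B2': {'d+e'}, 'B3': set(), 'B4': set()}
e_kill = {'B1': set(), 'B2': set(), 'B3': {'d+e'}, 'B4': set()}
IN, OUT = available_exprs(blocks, preds, e_gen, e_kill, universe, 'B1')
print(IN['B4'])   # d+e is killed on the B3 path, so only b+c is available
```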
Efficient Calculation
of Dataflow
• Order in which the basic blocks are visited is important
(faster convergence)
• Forward analysis – DFS order
– Visit a node only when all its predecessors have been
visited
• Backward analysis – Post DFS order
– Visit a node only when all of its successors have been
visited
Representing Dataflow
Information
• Requirements – Efficiency!
– Large amount of information to store
– Fast access/manipulation
• Bitvectors
– General strategy used by most compilers
– Bit positions represent defs (rdefs)
– Efficient set operations: union/intersect/isone
– Used for gen, kill, in, out for each BB
Optimization using
Dataflow
• Classes of optimization
1. Classical (machine independent)
• Reducing operation count (redundancy
elimination)
• Simplifying operations
2. Machine specific
• Peephole optimizations
• Take advantage of specialized
hardware features
3. Instruction Level Parallelism (ILP)
enhancing
• Increasing parallelism
• Possibly increase instructions
Types of Classical
Optimizations
• Operation-level – One operation in isolation
– Constant folding, strength reduction
– Dead code elimination (global, but 1 op at a time)
• Local – Pairs of operations in same BB
– May or may not use dataflow analysis
• Global – Again pairs of operations
– Pairs of operations in different BBs
• Loop – Body of a loop
Constant
Folding
• Simplify an operation based on the values of its source operands
– Constant propagation creates opportunities for this
• All constant operands
– Evaluate the op, replace with a move
• r1 = 3 * 4        r1 = 12
• r1 = 3 / 0 ??? Don’t evaluate excepting ops!
  What about FP?
– Evaluate conditional branches, replace with BRU or noop
• if (1 < 2) goto BB2        convert to goto BB2
• if (1 > 2) goto BB2        convert to a noop (dead code)
• Algebraic identities
– r1 = r2 + 0, r2 – 0, r2 | 0, r2 ^ 0, r2 << 0, r2 >> 0        r1 = r2
– r1 = 0 * r2, 0 / r2, 0 & r2        r1 = 0
– r1 = r2 * 1, r2 / 1        r1 = r2
Strength
Reduction
• Replace expensive ops with cheaper ones
– Constant propagation creates opportunities for this
• Power-of-2 constants
– Mult by power of 2:  r1 = r2 * 8        r1 = r2 << 3
– Div by power of 2:   r1 = r2 / 4        r1 = r2 >> 2
– Rem by power of 2:   r1 = r2 % 16       r1 = r2 & 15
• More exotic
– Replace multiply by a constant with a sequence of shifts and
adds/subs
• r1 = r2 * 6
– r100 = r2 << 2; r101 = r2 << 1; r1 = r100 + r101
• r1 = r2 * 7
– r100 = r2 << 3; r1 = r100 – r2
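The rewrites above are easy to sanity-check by brute force. Note that the shift/mask forms of division and remainder are only exact for non-negative integers; negative operands need extra care:

```python
# Strength-reduced forms of the operations listed above.
def mul6(x):
    return (x << 2) + (x << 1)   # x*6 = x*4 + x*2

def mul7(x):
    return (x << 3) - x          # x*7 = x*8 - x

def div4(x):
    return x >> 2                # x / 4, non-negative x only

def rem16(x):
    return x & 15                # x % 16, non-negative x only

for v in range(200):
    assert mul6(v) == v * 6
    assert mul7(v) == v * 7
    assert div4(v) == v // 4
    assert rem16(v) == v % 16
print("all strength-reduced forms agree")
```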
Dead Code
Elimination
• Remove statement d: x := y op z whose result is never
consumed.
• Rules:
– The DU chain for d is empty
– dest(d) is not live out of the block
Constant
Propagation
• Forward propagation of moves/assignments of the form
d: rx := L, where L is a literal
Forward Copy
Propagation
• Forward propagation of the RHS of assignments or movs.
r1 := r2             r1 := r2
.                    .
.                    .
.                    .
r4 := r1 + 1         r4 := r2 + 1
– Reduces chains of dependency
– Possibly creates dead code
Forward Copy
Propagation
• Rules:
Statement dS is the source of the copy propagation
Statement dT is the target of the copy propagation
– dS is a mov statement
– src(dS) is a register
– dT uses dest(dS)
– dS is an available definition at dT
– src(dS) is an available expression at dT
Backward Copy
Propagation
• Backward propagation of the LHS of an assignment.
dT: r1 := r2 + r3        r4 := r2 + r3
    r5 := r1 + r6        r5 := r4 + r6
dS: r4 := r1             (dead code)
• Rules:
– dT and dS are in the same basic block
– dest(dT) is a register
– dest(dT) is not live in out[B]
– dest(dS) is a register
– dS uses dest(dT)
– dest(dS) is not used between dT and dS
– dest(dS) is not defined between dT and dS
– There is no use of dest(dT) after the first definition of
dest(dS)
Local Common Sub-
Expression Elimination
• Benefits:
– Reduced computation
– Generates mov statements, which can get copy propagated
• Rules:
– dS and dT have the same expression
– src(dS) == src(dT) for all sources
– For all sources x, x is not redefined between dS and dT
Example:
dS: r1 := r2 + r3        dS: r1 := r2 + r3
                             r100 := r1
dT: r4 := r2 + r3        dT: r4 := r100
Global Common Sub-
Expression Elimination
• Rules:
– dS and dT have the same expression
– src(dS) == src(dT) for all sources of dS and dT
– The expression of dS is available at dT
Unreachable Code
Elimination