Lecture 11
Lecture 11
SMD163
Understanding Optimization:
Lecture 11: Introduction to optimization
1 2
In most cases, optimization has two principal No optimizing compiler can make up for poor
goals: choice of algorithms or data structures.
! Time: make programs run faster;
! Space: make programs use less memory.
3 4
Optimization is not Absolute: Optimization is not Free:
Some optimization techniques give clear wins in Some optimizations only apply in particular
both time and space. situations, and require time-consuming analysis
of the program.
Others may require us to trade one against the
other. Use of such optimizations is only justified for
programs that will be run often or for a long
The priorities that we attach to different time.
optimizations will depend on the application.
! In embedded systems, memory is often limited, and Optimization is appropriate in the construction
speed may be less of an issue (e.g., a VCR). and testing of products before wide
! In high performance systems, execution speed is release/distribution.
critical (e.g., video games).
5 6
Then Comp will compile any program that goes into an We will focus instead on techniques for
infinite loop without any output to the following, easily improving programs.
recognizable loop:
lab : jmp lab
But, following common usage, we will still refer
This is impossible, because a program that did this to each one as an “optimization.”
would be able to solve the halting problem. 7 8
Optimization by Transformation: Correctness is Essential!
Optimizations are program transformations. In all cases, it is essential that optimization
preserves meaning: the optimized program
Most apply only in particular circumstances. must have the same meaning/behavior as the
The effectiveness of an optimization depends on original program.
the program in which it is applied. Such transformations are often described as
! Optimization of a particular language feature will being safe. Better safe than sorry ... if an
have no impact on a program that does not use it. optimization isn’t safe, you shouldn't use it!
In some cases, an “optimization” may actually result
!
A slow program that gives the right answer is
in a slower or bigger program.
better than a program that gives the wrong
answer quickly.
9 10
If we can ensure that the values of x and y do not So this is only an optimization if we can
change on each iteration, then we can “optimize” it to: be sure that the loop will be executed at
z = x/y;
least once.
for (int i=0; i<N; i++) {
…z…
}
For example, some of the most common int f(int x) { int f(int x) {
optimization techniques try to remove: return x+1; return x+1;
! Code that serves no useful purpose; … }
! Code that repeats earlier computations; }
! Code that uses an inefficient method to calculate a
value; ! Code that appears in functions that are never called
! Code that carries an unnecessary overhead; can be eliminated. (This process is sometimes
! Etc… described as “tree-shaking”.)
19 20
Continued … But be Careful!
Code that has no effect can be eliminated. Items that have an effect cannot be eliminated:
! An assignment to a variable that will not be used ! An assignment to a variable that will not be used
again can be eliminated. again can be eliminated.
int f(int x) { int f(int x) { int f(int x) { int f(int x) {
int temp = x*x; return x+1; int temp = g(x); g(x);
return x+1; } return x+1; return x+1;
} } }
! An assignment to a variable that will be overwritten ! An assignment to a variable that will be overwritten
before it is used again can be eliminated. before it is used again can be eliminated.
x = y; x = z; x = f1()+f2(); f1();
x = z; x = z; f2();
x = z;
21 22
x = a + b; x = a + b; x = y; x = y;
y = a + b; y = x; z = x; z = y;
x = (a+b)*(a+b); t = a + b; x = 0; x = 0;
x = t * t;
z = x; z = 0;
But beware side effects!
What have we gained? Nothing directly, but if
x = f(a)*f(a); t = f(a); we manage to remove all references to x, then
x = t * t; the first assignment will become dead code.
23 24
Constant-Folding: Strength-Reduction:
“Do not put off until run-time what you can do Replace expensive operations with cheaper, but
at compile-time.” equivalent ones.
For example:
x = 2^8 – 1; x = 255;
x**2 = x*x
More generally: evaluate expressions involving 2*x = x+x
only constant values at compile-time. x/2 = x >> 1
x * 16 = x << 4
x % 128 = x & 127
25 26
x+y = y+x
a = b + c;
(But don’t forget the role that commutativity
played in our register allocator). t = a + d;
29 30
41 42
Other forms of instruction, for example, goto, In practice, three address code is often
unary operations, conditional branches, etc. are represented or described by quadruples
also used in practice. 51
(quads), of the form (op, dest, arg1, arg2). 52
Finding Basic Blocks: An Example:
A basic block is a sequence of statements where control Consider the following implementation of quicksort:
only enters at the first statement and leaves at the last.
void quicksort(int m, int n) { // sorts global array a
if (m<n) {
To partition a sequence of instructions into basic blocks,
int i = m - 1, j = n, p = a[n], t;
start by identifying the leaders: while (1) {
! The first statement is a leader. do { i = i + 1; } while (a[i] < p);
! An statement that is the target of a call or a goto (conditional or do { j = j - 1; } while (a[j] > p);
unconditional) is a leader. if (i >= j) break;
! Any statement that immediately follows a call or a goto is a t = a[i]; a[i] = a[j]; a[j] = t;
leader. }
t = a[i]; a[i] = a[n]; a[n] = t;
For each leader, there is a basic block consisting of the quicksort(m,j); quicksort(i+1,n);
leader and all following statements up to, but not }
}
a b c d B3
0 0 0 0
j := j-1 B6
t4 := 4*j
1) a := b + c a1 := b0 + c0 t6 := 4*i t5 := a[t4]
1 0 0 0 B5 t := a[t6]
t11 := 4*i
if t5>p goto B3 t := a[t11]
2) b := a - d b2 := a1 - d0 t7 := 4*i
1 2 0 0 t12 := 4*i
t8 := 4*j t13 := 4*n
3) c := b + c c3 := b2 + c0 t9 := a[t8] if i>=j goto B6
1 2 3 0 t14 := a[t13]
a[t7] := t9 a[t12] := t14
4) d := a - d d4 := a1 - d0 t10 := 4*j B4 t15 := 4*n
1 2 3 4 a[t10] := t a[t15] := t
goto B2
61 62
Example: B1 i := m-1
j := n
Local CSE on
blocks B5 and B6
Example: B1 i := m-1
j := n
Local CSE on
blocks B5 and B6
t1 := 4*n t1 := 4*n
p := a[t1] p := a[t1]
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t6 := 4*i t5 := a[t4] t6 := 4*i t5 := a[t4]
B5 t := a[t6]
t11 := 4*i B5 t := a[t6]
t11 := 4*i
if t5>p goto B3 t := a[t11] if t5>p goto B3 t := a[t11]
t7 := 4*i t12 := 4*i t7 := t6 t12 := t11
t8 := 4*j t13 := 4*n t8 := 4*j t13 := 4*n
t9 := a[t8] if i>=j goto B6 t14 := a[t13] t9 := a[t8] if i>=j goto B6 t14 := a[t13]
a[t7] := t9 a[t12] := t14 a[t7] := t9 a[t12] := t14
t10 := 4*j B4 t15 := 4*n t10 := t8 B4 t15 := t13
a[t10] := t a[t15] := t a[t10] := t a[t15] := t
goto B2 goto B2
63 64
Example: B1 i := m-1
j := n
Copy
Propagation for
Example: B1 i := m-1
j := n
Copy
Propagation for
t1 := 4*n t1 := 4*n
p := a[t1]
t7, t10, t12, t15 p := a[t1]
t7, t10, t12, t15
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t6 := 4*i t5 := a[t4] t6 := 4*i t5 := a[t4]
B5 t := a[t6]
t11 := 4*i B5 t := a[t6]
t11 := 4*i
if t5>p goto B3 t := a[t11] if t5>p goto B3 t := a[t11]
t7 := t6 t12 := t11 t7 := t6 t12 := t11
t8 := 4*j t13 := 4*n t8 := 4*j t13 := 4*n
t9 := a[t8] if i>=j goto B6 t14 := a[t13] t9 := a[t8] if i>=j goto B6 t14 := a[t13]
a[t7] := t9 a[t12] := t14 a[t6] := t9 a[t11] := t14
t10 := t8 B4 t15 := t13 t10 := t8 B4 t15 := t13
a[t10] := t a[t15] := t a[t8] := t a[t13] := t
goto B2 goto B2
65 66
Example: B1 i := m-1
j := n
Dead Code
elimination for
Example: B1 i := m-1
j := n
Dead Code
elimination for
t1 := 4*n t1 := 4*n
p := a[t1]
t7, t10, t12, t15 p := a[t1]
t7, t10, t12, t15
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t6 := 4*i t5 := a[t4] t6 := 4*i t5 := a[t4]
B5 t := a[t6]
t11 := 4*i B5 t := a[t6]
t11 := 4*i
if t5>p goto B3 t := a[t11] if t5>p goto B3 t := a[t11]
t7 := t6 t12 := t11
t8 := 4*j t13 := 4*n t8 := 4*j t13 := 4*n
t9 := a[t8] if i>=j goto B6 t14 := a[t13] t9 := a[t8] if i>=j goto B6 t14 := a[t13]
a[t6] := t9 a[t11] := t14 a[t6] := t9 a[t11] := t14
t10 := t8 B4 t15 := t13 B4
a[t8] := t a[t13] := t a[t8] := t a[t13] := t
goto B2 goto B2
67 68
Example: B1 i := m-1
j := n
Global CSE:
Example: B1 i := m-1
j := n
Global CSE:
t1 := 4*n t1 := 4*n
p := a[t1] p := a[t1]
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t6 := 4*i t5 := a[t4] t6 := t2 t5 := a[t4]
B5 t := a[t6]
t11 := 4*i B5 t := a[t6]
t11 := t2
if t5>p goto B3 t := a[t11] if t5>p goto B3 t := a[t11]
t8 := 4*j t13 := 4*n t8 := t4 t13 := t1
t9 := a[t8] if i>=j goto B6 t14 := a[t13] t9 := a[t8] if i>=j goto B6 t14 := a[t13]
a[t6] := t9 a[t11] := t14 a[t6] := t9 a[t11] := t14
B4 B4
a[t8] := t a[t13] := t a[t8] := t a[t13] := t
goto B2 goto B2
69 70
Example: B1 i := m-1
j := n
Copy
Propagation on
Example: B1 i := m-1
j := n
Copy
Propagation on
t1 := 4*n t1 := 4*n
p := a[t1]
t6, t8, t11, t13 p := a[t1]
t6, t8, t11, t13
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t6 := t2 t5 := a[t4] t6 := t2 t5 := a[t4]
B5 t := a[t6]
t11 := t2 B5 t := a[t2]
t11 := t2
if t5>p goto B3 t := a[t11] if t5>p goto B3 t := a[t2]
t8 := t4 t13 := t1 t8 := t4 t13 := t1
t9 := a[t8] if i>=j goto B6 t14 := a[t13] t9 := a[t4] if i>=j goto B6 t14 := a[t1]
a[t6] := t9 a[t11] := t14 a[t2] := t9 a[t2] := t14
B4 B4
a[t8] := t a[t13] := t a[t4] := t a[t1] := t
goto B2 goto B2
71 72
Example: B1 i := m-1
j := n
Dead Code
Elimination on
Example: B1 i := m-1
j := n
Dead Code
Elimination on
t1 := 4*n t1 := 4*n
p := a[t1]
t6, t8, t11, t13 p := a[t1]
t6, t8, t11, t13
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t6 := t2 t5 := a[t4] t5 := a[t4]
B5 t := a[t2]
t11 := t2 B5 t := a[t2]
if t5>p goto B3 t := a[t2] if t5>p goto B3 t := a[t2]
t8 := t4 t13 := t1
t9 := a[t4] if i>=j goto B6 t14 := a[t1] t9 := a[t4] if i>=j goto B6 t14 := a[t1]
a[t2] := t9 a[t2] := t14 a[t2] := t9 a[t2] := t14
B4 B4
a[t4] := t a[t1] := t a[t4] := t a[t1] := t
goto B2 goto B2
73 74
Example: B1 i := m-1
j := n
Global CSE
Example: B1 i := m-1
j := n
Global CSE
t1 := 4*n t1 := 4*n
p := a[t1] p := a[t1]
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t5 := a[t4] t5 := a[t4]
B5 t := a[t2]
B5 t := t3
if t5>p goto B3 t := a[t2] if t5>p goto B3 t := t3
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t5 := a[t4] t5 := a[t4]
B5 t := t3
B5 t := t3
if t5>p goto B3 t := t3 if t5>p goto B3 t := t3
Example: B1 i := m-1
j := n
Dead Code
Elimination on
Example: B1 i := m-1
j := n
Dead Code
Elimination on
t1 := 4*n t1 := 4*n
p := a[t1]
t, t9, t14 p := a[t1]
t, t9, t14
B2 i := i+1 B2 i := i+1
t2 := 4*i BEFORE … t2 := 4*i … AFTER
t3 := a[t2] t3 := a[t2]
if t3<p goto B2 if t3<p goto B2
B3 B3
j := j-1 B6 j := j-1 B6
t4 := 4*j t4 := 4*j
t5 := a[t4] t5 := a[t4]
B5 t := t3
B5
if t5>p goto B3 t := t3 if t5>p goto B3
Avoid duplicated computation and you duplicate Use algebraic identities to increase the size of
the savings instead! the loop invariant expressions. For example by
rewriting (x-z)+y as (x+y)-z, we might be able
We need to find loop invariants: expressions to extract x+y as an invariant expression.
that are guaranteed to have the same value on
each pass through the loop. But beware of aliasing! Expression a[i] is not
necessarily invariant just because a and i are.
83 84
Moving the Code: 2) Strength Reduction:
When we move the code, we might need to introduce Instead of using expensive multiplications, we can
new temporaries. obtain the same results using simple shifts, if one of the
operands is a constant power of 2.
For example, the C/C++ loop:
For example, in the quicksort code, we can replace t2 :=
for (i = 0; i < n * n; i++) 4*i and t4 := 4*j with the cheaper operations t2 := i << 2
... code which doesn't change n ... and t4 := j << 2 .
Can be rewritten as: The same idea can be used to simplify uses of other
t1 = n * n; expensive operations, including / and %.
for (i = 0; i < t1; i++)
... code which doesn't change n ... Strength reduction is however a much more general
technique, and more opportunities can be revealed if we
because the expression n*n is a loop invariant. are able to identify some induction variables.
85 86
B3
This is another form of strength reduction. j := j-1
t4 := 4*j
B6
t5 := a[t4]
B5
if t5>p goto B3
if i>=j goto B6
a[t2] := t5 a[t2] := p
B4
a[t4] := t3 a[t1] := t3
goto B2
89 90
B3 j := j-1
This code does N multiplications, and N
t4 := t4 - 4 additions. Multiplies are expensive: can we
B5 t5 := a[t4]
if t5>p goto B3 B6 eliminate them?
a[t2] := t5 a[t2] := p
a[t4] := t3 if i>=j goto B6 a[t1] := t3
goto B2
B4
91 92
Identifying Induction Variables: Differences of Differences:
As i takes on values 0, 1, 2, 3, 4, 5, 6, … We’ve seen that the difference from one value of (i*i) to
so i*i takes values 0, 1, 4, 9, 16, 25, 36, … the next is d = 2*i + 1.
Doesn’t our choice of intermediate code force us and in which the address calculation must be
to use the less efficient version? done explicitly …
! imull $4, %eax; movl a(%eax), %eax
95 96
i := m-1 i := m-1
Before: B1 j := n
t1 := 4*n
This is where we
would have got
After: B1 j := n
t1 := 4*n
Now we’ve used
an optimization
u1 := a + t1 to previously if u1 := a + t1 based on the
p := [u1]
we’d had only t2 := 4*i observation that
load instructions: u2 := a + t2 the addresses of
B2 i := i+1 u4 := u1
a[i] and a[j] are
t2 := 4*i v := [u] p := [u1]
induction
u2 := a + t2
t3 := [u2] and save B2 variables!
instructions: i := i+1
if t3<p goto B2 u2 := u2 + 4
[v] := u. t3 := [u2]
if t3<p goto B2
B3 j := j-1
t4 := 4*j
u4 := a + t4 B3 j := j-1
B5 t5 := [u4] B6 B5 u4 := u4 - 4
B6
t5 := [u4]
if t5>p goto B3
[u2] := t5 [u2] := t5 if t5>p goto B3
[u2] := p [u2] := p
[u4] := t3 [u1] := t3 [u4] := t3 [u1] := t3
goto B2 if i>=j goto B6 goto B2 if i>=j goto B6
B4 B4
97 98