
Lecture 11

The document discusses compiler optimization, focusing on improving program performance in terms of execution speed and memory usage. It outlines various optimization techniques, their applicability, and the importance of correctness, emphasizing that optimizations should preserve program behavior. Additionally, it highlights the challenges and considerations in applying optimizations, such as trade-offs and potential pitfalls.


Compiler Construction
SMD163
Lecture 11: Introduction to Optimization

Viktor Leijon & Peter Jonsson, with slides by Johan Nordlander
Contains material generously provided by Mark P. Jones

Understanding Optimization:

Goals of Optimization:
Optimization is about improving the target programs that are generated by a compiler. In most cases, optimization has two principal goals:
- Time: make programs run faster;
- Space: make programs use less memory.
Other applications of optimization include:
- Adapting code to particular architectures; etc.

Optimization is not Magic:
Optimizing compilers are just a tool; they work with what they're given. No optimizing compiler can make up for a poor choice of algorithms or data structures.
Optimization is not Absolute:
Some optimization techniques give clear wins in both time and space. Others may require us to trade one against the other. The priorities that we attach to different optimizations will depend on the application.
- In embedded systems, memory is often limited, and speed may be less of an issue (e.g., a VCR).
- In high performance systems, execution speed is critical (e.g., video games).

Optimization is not Free:
Some optimizations only apply in particular situations, and require time-consuming analysis of the program. Use of such optimizations is only justified for programs that will be run often or for a long time. Optimization is appropriate in the construction and testing of products before wide release/distribution.

Optimization is a Misnomer:
A compiler writer's job will never be done; there are always opportunities for new optimizations.
Proof: Suppose that there is an optimizing compiler Comp that can optimize any program to the shortest possible equivalent program. Then Comp will compile any program that goes into an infinite loop without any output to the following, easily recognizable loop:
    lab: jmp lab
This is impossible, because a program that did this would be able to solve the halting problem.

Terminology:
Terms like "program optimization" and "optimizing compiler" are firmly established. But we cannot build a truly optimizing compiler! We will focus instead on techniques for improving programs. Following common usage, we will still refer to each one as an "optimization."
Optimization by Transformation:
Optimizations are program transformations. Most apply only in particular circumstances, and the effectiveness of an optimization depends on the program in which it is applied.
- Optimization of a particular language feature will have no impact on a program that does not use it.
- In some cases, an "optimization" may actually result in a slower or bigger program.

Correctness is Essential!
In all cases, it is essential that optimization preserves meaning: the optimized program must have the same meaning/behavior as the original program. Such transformations are often described as being safe. Better safe than sorry ... if an optimization isn't safe, you shouldn't use it! A slow program that gives the right answer is better than a program that gives the wrong answer quickly.

An Example:
Suppose that we have a loop:
    for (int i=0; i<N; i++) {
      … x/y …
    }
If we can ensure that the values of x and y do not change on each iteration, then we can "optimize" it to:
    z = x/y;
    for (int i=0; i<N; i++) {
      … z …
    }
This is an example of code motion.

Take Care! (part one)
If N=0, then the "optimized" code will evaluate x/y once, but the original won't evaluate it at all! So this is only an optimization if we can be sure that the loop will be executed at least once.
Take Care! (part two)
If N=0, then the "optimized" program might raise a divide-by-zero exception where the original runs without fault. So the optimization is applicable only if:
- We know that y will never be zero; or
- We know that the loop will always be executed at least once, and that there are no other observable effects in the code between the new and old positions of x/y.

Caveat Optimizer:
Optimizations can be quite subtle, and may require detailed analysis of a program to determine:
- whether they are applicable;
- whether they are safe; and
- whether they will actually improve the program.
In general, these questions are undecidable. But we always have the option not to use a given optimization!
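The zero-trip hazard can also be handled by guarding the hoisted computation. A minimal sketch in C (the function names are illustrative):

```c
/* Original loop: evaluates x/y on every iteration (never if N == 0). */
static int sum_orig(int x, int y, int N) {
    int s = 0;
    for (int i = 0; i < N; i++)
        s += x / y;               /* x and y are loop-invariant */
    return s;
}

/* Hoisted version: the guard ensures x/y is only evaluated when the
 * original loop body would have evaluated it, so N == 0 with y == 0
 * stays safe. */
static int sum_hoisted(int x, int y, int N) {
    int s = 0;
    if (N > 0) {                  /* zero-trip guard makes the motion safe */
        int z = x / y;            /* code motion: computed once */
        for (int i = 0; i < N; i++)
            s += z;
    }
    return s;
}
```

With the guard, the hoisted version raises a divide-by-zero fault in exactly the cases where the original would.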

Combining Optimizations:
The task of an optimizing compiler is to choose and apply an appropriate sequence of optimizations:
    P0 --o1--> P1 --o2--> P2 --o3--> P3 --o4--> P4 --o5--> P5 --o6--> P6 --o7--> P7
Applying one optimization may create new opportunities to apply another; the order in which they are applied can make a difference. If intermediate steps preserve behavior, each program will be equivalent to the original.

Controlling Optimization:
Compilers often allow programmers to control the use of optimization techniques:
- To set priorities (for example, time or space?);
- To select/deselect particular kinds of optimization;
- To limit the time or the number of steps that are spent in optimization.
Optimization by Hand:
Programmers sometimes have an opportunity to optimize their code by hand. Beware:
- It's difficult to get right, it can obscure the code, it can make it less portable, and it can introduce bugs.
- It's hard to compete with a good optimizing compiler.
If performance is critical and you need to optimize by hand:
- Wait until the program is almost finished;
- Use a profiler to identify the "hot spots".

A Catalogue of Common Optimization Techniques:

An Overview:
Compiler writers and researchers have discovered many different techniques for optimization. For example, some of the most common optimization techniques try to remove:
- Code that serves no useful purpose;
- Code that repeats earlier computations;
- Code that uses an inefficient method to calculate a value;
- Code that carries an unnecessary overhead;
- Etc.

Dead Code Elimination:
Unreachable code can be eliminated.
- Code that follows a return, break, continue, or goto and has no label can be eliminated:
      int f(int x) {          int f(int x) {
        return x+1;     =>      return x+1;
        …                     }
      }
- Code that appears in functions that are never called can be eliminated. (This process is sometimes described as "tree-shaking".)
Continued …
Code that has no effect can be eliminated.
- An assignment to a variable that will not be used again can be eliminated:
      int f(int x) {          int f(int x) {
        int temp = x*x;   =>    return x+1;
        return x+1;           }
      }
- An assignment to a variable that will be overwritten before it is used again can be eliminated:
      x = y;      =>    x = z;
      x = z;

But be Careful!
Items that have an effect cannot be eliminated:
- If the discarded value comes from a call, the call itself must be kept:
      int f(int x) {          int f(int x) {
        int temp = g(x);  =>    g(x);
        return x+1;             return x+1;
      }                       }
- Likewise when an overwritten assignment has a side-effecting right-hand side:
      x = f1()+f2();    =>    f1();
      x = z;                  f2();
                              x = z;
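A minimal C sketch of the distinction (g and the calls counter are illustrative): the dead assignment to temp can go, but the call to g must stay because it has an observable effect.

```c
static int calls = 0;                 /* observable side effect */

static int g(int x) { calls++; return x * x; }

/* Before: the assignment to temp is dead, but g(x) has a side effect. */
static int f_before(int x) {
    int temp = g(x);                  /* temp is never used again */
    (void)temp;
    return x + 1;
}

/* After dead-code elimination: the assignment goes, the call stays. */
static int f_after(int x) {
    g(x);                             /* kept for its side effect */
    return x + 1;
}
```

Both versions return the same value and bump the counter the same number of times; dropping the call entirely would not be safe.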

Common-Subexpression Elimination:
The results of computations can be shared rather than duplicated.
    x = a + b;          x = a + b;
    y = a + b;    =>    y = x;

    x = (a+b)*(a+b);    =>    t = a + b;
                              x = t * t;
But beware side effects!
    x = f(a)*f(a);      =>    t = f(a);
                              x = t * t;

Copy and Constant Propagation:
An assignment of the form x = y; is called a copy instruction.
    x = y;              x = y;
    z = x;        =>    z = y;

    x = 0;              x = 0;
    z = x;        =>    z = 0;
What have we gained? Nothing directly, but if we manage to remove all references to x, then the first assignment will become dead code.
Constant-Folding:
"Do not put off until run-time what you can do at compile-time."
    x = 2^8 – 1;    =>    x = 255;
More generally: evaluate expressions involving only constant values at compile-time.

Strength-Reduction:
Replace expensive operations with cheaper, but equivalent ones. For example:
    x**2    = x*x
    2*x     = x+x
    x/2     = x >> 1
    x * 16  = x << 4
    x % 128 = x & 127
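These equivalences can be spot-checked in C. Note that they are only safe for non-negative (here unsigned) operands: C's integer division truncates toward zero, so x/2 and x>>1 disagree for odd negative x, and likewise for the remainder/mask pair.

```c
/* Strength-reduction identities checked for one unsigned value.
 * These hold for non-negative operands; a real compiler must prove
 * the sign (or use an unsigned type) before applying them. */
static int identities_hold(unsigned x) {
    return (x / 2    == x >> 1)
        && (x * 16   == x << 4)
        && (x % 128  == (x & 127))
        && (2 * x    == x + x);
}
```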

Algebraic Identities:
Standard algebraic identities can often be put to good use when only part of a program's data is known at compile-time. For example:
    x+0 = x
    x–0 = x
    x–x = 0
    x*1 = x
    x*0 = 0
    x/1 = x

Who writes x+0 in source code?
How many programmers would actually write an expression like x + 0 in their source code? Are these optimizations of any use in practice? Yes!
- Examples like this can occur in handwritten programs when symbolic constants are used.
- Examples like this can show up in the code that we generate for other language constructs. For example, the address of a[0] is: a + 4*0 = a + 0 = a.
Continued…
At first, some identities might not seem to have any significant applications. Examples include associativity:
    (x + y) + z = x + (y + z)
and commutativity:
    x+y = y+x
(But don't forget the role that commutativity played in our register allocator.)

Enabling Transformations:
Use of associativity and commutativity laws can open up opportunities for other optimizations.
    a = b+c;          a = b + c;            a = b + c;
    t = (c+d)+b;  =>  t = (b+c) + d;    =>  t = a + d;

Another Example:
Suppose that d is an array of Date objects, each with fields tag, day, month, and year. Now suppose that we want to access d[3].month; then we need to load the value at address:
    (d + 16*3) + 8
    = d + (16*3 + 8)    (associativity)
    = d + 56            (constant folding)

Identities for Floating Point:
Floating point numbers do not behave like real numbers … floating point operators do not satisfy many useful laws.
- Associativity? small+(big+(-big)) = small + 0 = small, but (small+big)+(-big) = big+(-big) = 0.
- Additive identities? NaN + 0 raises an exception; NaN on its own does not.
- Multiplicative zeroes? inf * 0 = NaN.
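The associativity failure is easy to demonstrate in C with single-precision floats (small = 1.0f and big = 1e20f are illustrative choices; any big whose precision gap exceeds small will do):

```c
/* Floating-point addition is not associative: adding a tiny value to
 * a huge one loses the tiny contribution entirely. */
static float assoc_right(float small, float big) {
    return small + (big + (-big));    /* = small + 0 = small */
}

static float assoc_left(float small, float big) {
    return (small + big) + (-big);    /* small is absorbed: result 0 */
}
```

This is why compilers may not reassociate floating-point expressions unless the user explicitly allows it.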
"What the Language Permits":
Even for integer values, algebraic identities can only be used within whatever scope the host language permits. For example, the definition of Fortran 77 states that the order of evaluation of expressions involving parentheses must respect the parentheses. (So Fortran 77 is a language in which parenthesized expressions would show up in the abstract syntax.)

Removing Overhead:
When we call a function, we spend some time constructing and then destroying the stack frame. When we execute a loop, we spend some time setting up and testing the loop variable. If the body of the function, or the body of the loop, is small, then the overhead associated with either of these will be quite large in proportion.

Function Inlining:
If we know the body of a function, we can use that instead of a call. For example, suppose we know
    int square(int x) { return x*x; }
Then we can rewrite a section of code:
    { … square(square(x)) … }
as:
    { … int t1 = x*x; int t2 = t1*t1; … t2 … }

Cautionary Notes:
Inlining a large function many times can increase the size of the compiled program. Naïve inlining (by copying text) can increase the amount of work to be done. For example, changing:
    … square(square(f(x))) …
to
    … (f(x)*f(x)) * (f(x)*f(x)) …
will require 3 multiplications, not 2, and will duplicate any side-effect of f(x).
Loop Unrolling:
For example, we can rewrite a section of code:
    for (int i=0; i<3; i++) {
      f(i);
    }
as:
    f(0);
    f(1);
    f(2);
This typically produces more code … but it is faster because we have eliminated a temporary variable, and all the operations on it.

Peephole Optimization:
It is often possible to implement useful optimizations by looking for simple patterns in small sections of generated assembly code: "looking at the code through a peephole." To a large extent, the choice of peephole optimizations depends on the target machine.

Examples for IA32:
- An instruction of the form addl $1,reg can be replaced by incl reg.
- An instruction of the form imul $2,reg can be replaced by addl reg,reg.
- In a sequence of instructions:
      movl reg,var
      movl var,reg
  the second instruction can be deleted, provided that it does not have a label.
- In a sequence of instructions:
      addl $4,%esp
      movl %ebp,%esp
  the first instruction is dead code.

Summary:
We have looked at:
- The basic goals – and limits – of optimization.
- A catalogue of standard optimization techniques:
  - Dead-code elimination;
  - Common subexpression elimination;
  - Constant and copy propagation;
  - Constant folding;
  - Strength reduction;
  - Algebraic identities;
  - Function inlining;
  - Loop unrolling;
  - Peephole optimization.
Next: Putting these techniques into practice …
Optimization using an Intermediate Language:

Source High, Target Low:
Source code is often too high-level to reveal opportunities for optimization.
- For example, the assignment a[i] = x + y requires an (implicit) calculation of the address of the array element a[i].
Target code is often too low-level to reveal opportunities for optimization.
- For example, temporary values have already been assigned to registers and it is more difficult to identify repeated or redundant computations.

Code Generation Example:
Here's part of a MiniJava program and the code that we might generate from it:

    class C {
      …
      int[] makeArray(int n) {
        int[] a;
        a = new int[200];
        a[3] = n;
        return a;
      }
    }

    C_makeArray:
        pushl %ebp
        movl %esp,%ebp
        subl $4,%esp
        pushl $800
        call _malloc
        addl $4,%esp
        movl %eax,-4(%ebp)
        movl 8(%ebp),%eax
        movl -4(%ebp),%ebx
        movl %eax,12(%ebx)
        movl -4(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret

There is an independent sequence of instructions for each statement …

Breaking Down Programs:
The constructs of a language, and the nature of the problem that is being solved, will lead a programmer to break down a program into a particular sequence of tasks. The output from a compiler is supposed to execute the same sequence of tasks. There is no reason, however, for it to use exactly the same breakdown as the programmer.
Code Generation Example:
The same program can be broken down into "basic blocks" (the instruction sequence above, partitioned at block boundaries). The correspondence with the source is less direct … but there are more opportunities for optimization. Still, working at the level of 386 assembly code is difficult!

Intermediate Code:
Intermediate code provides a compromise between the extremes of source and target code. It aims to be:
- Sufficiently low-level to capture single steps in the program.
- Sufficiently high-level to avoid machine dependencies, and premature code generation.
Intermediate codes are usually some kind of idealized machine code. As a useful side-benefit, intermediate code provides a degree of portability (e.g., RTL in gcc, Java bytecodes and the JVM).

A High-Level View:
Flat input is parsed into structure; from the structure we generate intermediate code; optimizations are applied repeatedly at the intermediate-code level; finally, flat output is emitted.

UNCOL … The Search Goes On:
UNCOL is an old (1958), and as yet unrealized, dream of compiler writers: the UNiversal Computer Oriented Language. It was hoped that a universal intermediate code could serve as a meeting point for all languages: one front end for each language, one back end for each machine, and smooth interoperability between languages. But no satisfactory UNCOL has been found yet; the range of programming languages is very diverse! There have been numerous attempts, some ongoing: ANDF, C, JVM, UVM, C--, …
Three-Address Code:
Three-address code: a simple UNCOL? Three-address code is primarily a sequence of statements of the general form:
    x := y op z
where x, y and z are names, constants or compiler-generated temporaries. For example, evaluation of x+y*z becomes:
    t1 := y * z
    t2 := x + t1

For Example:
In three-address code, the statement
    a[i] = a[i]+a[j]
can be expressed as:
    t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * j
    t4 := a[t3]
    t5 := t2 + t4
    t6 := 4 * i      <- the calculation of 4*i is duplicated!
    a[t6] := t5
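A toy sketch in C of three-address statements and a straight-line evaluator (the Quad struct and register-file layout here are assumptions for illustration, not the course's representation):

```c
/* A statement is (op, dest, arg1, arg2); variables and temporaries
 * are indices into a small value array. */
enum Op { MUL, ADD };
struct Quad { enum Op op; int dest, arg1, arg2; };

/* Evaluate a straight-line sequence of n quads over the values in v. */
static void run(const struct Quad *code, int n, int *v) {
    for (int i = 0; i < n; i++) {
        int a = v[code[i].arg1], b = v[code[i].arg2];
        v[code[i].dest] = (code[i].op == MUL) ? a * b : a + b;
    }
}
```

For the x+y*z example, with x, y, z in slots 0..2 and t1, t2 in slots 3..4, the program is t1 := y * z followed by t2 := x + t1.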

Intermediate Code as Trees:
Three-address code is really just a linear representation of the syntax tree for the intermediate code. For x+y*z:

        t2:+
       /    \
      x     t1:*
            /  \
           y    z

Other forms of instruction, for example goto, unary operations, conditional branches, etc., are also used in practice.

Quads:
The statement a = b * (-c) + b * (-c) is represented by the following three-address code:
    0) t1 := -c         (uminus, t1, c, _)
    1) t2 := b * t1     (mult, t2, b, t1)
    2) t3 := -c         (uminus, t3, c, _)
    3) t4 := b * t3     (mult, t4, b, t3)
    4) t5 := t2 + t4    (add, t5, t2, t4)
    5) a := t5          (save, a, t5, _)
In practice, three-address code is often represented or described by quadruples (quads), of the form (op, dest, arg1, arg2).
Finding Basic Blocks:
A basic block is a sequence of statements where control only enters at the first statement and leaves at the last. To partition a sequence of instructions into basic blocks, start by identifying the leaders:
- The first statement is a leader.
- Any statement that is the target of a call or a goto (conditional or unconditional) is a leader.
- Any statement that immediately follows a call or a goto is a leader.
For each leader, there is a basic block consisting of the leader and all following statements up to, but not including, the next leader or the end of the program.

An Example:
Consider the following implementation of quicksort:

    void quicksort(int m, int n) { // sorts global array a
      if (m<n) {
        int i = m - 1, j = n, p = a[n], t;
        while (1) {
          do { i = i + 1; } while (a[i] < p);
          do { j = j - 1; } while (a[j] > p);
          if (i >= j) break;
          t = a[i]; a[i] = a[j]; a[j] = t;
        }
        t = a[i]; a[i] = a[n]; a[n] = t;
        quicksort(m,j); quicksort(i+1,n);
      }
    }

We will focus on optimizing the highlighted section (the partitioning loop).
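The leader rules can be sketched in C; the instruction stream here is reduced to an array of jump targets, which is an assumption for illustration:

```c
/* Mark leaders in a sequence of n instructions. target[i] >= 0 means
 * instruction i jumps to that index; -1 means it only falls through. */
static void find_leaders(const int *target, int n, int *leader) {
    for (int i = 0; i < n; i++) leader[i] = 0;
    if (n > 0) leader[0] = 1;                 /* rule 1: first statement */
    for (int i = 0; i < n; i++) {
        if (target[i] >= 0) {
            leader[target[i]] = 1;            /* rule 2: jump target     */
            if (i + 1 < n) leader[i + 1] = 1; /* rule 3: after a jump    */
        }
    }
}
```

Applied to the three-address translation of quicksort that follows (30 statements, with jumps at statements 8, 12, 13 and 22), this marks leaders at statements 1, 5, 9, 13, 14 and 23, giving the six basic blocks described below.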

In Three-Address Code …
Translation of the core of quicksort into three-address code:

    1)  i := m-1           16) t7 := 4*i
    2)  j := n             17) t8 := 4*j
    3)  t1 := 4*n          18) t9 := a[t8]
    4)  p := a[t1]         19) a[t7] := t9
    5)  i := i+1           20) t10 := 4*j
    6)  t2 := 4*i          21) a[t10] := t
    7)  t3 := a[t2]        22) goto 5
    8)  if t3<p goto 5     23) t11 := 4*i
    9)  j := j-1           24) t := a[t11]
    10) t4 := 4*j          25) t12 := 4*i
    11) t5 := a[t4]        26) t13 := 4*n
    12) if t5>p goto 9     27) t14 := a[t13]
    13) if i>=j goto 23    28) a[t12] := t14
    14) t6 := 4*i          29) t15 := 4*n
    15) t := a[t6]         30) a[t15] := t

There are six basic blocks: 1–4, 5–8, 9–12, 13, 14–22, 23–30.

Flow Graphs:
We add directed edges between basic blocks to capture control flow. One basic block is distinguished as initial; it contains the first statement to be executed. For any pair of basic blocks B1 and B2, there is an edge from B1 to B2 if B2 can directly follow B1 in some execution sequence.
Example: the six basic blocks of the quicksort core, with flow edges B1->B2, B2->B2, B2->B3, B3->B3, B3->B4, B4->B5, B4->B6, B5->B2:

    B1: i := m-1           B5: t6 := 4*i
        j := n                 t := a[t6]
        t1 := 4*n              t7 := 4*i
        p := a[t1]             t8 := 4*j
                               t9 := a[t8]
    B2: i := i+1               a[t7] := t9
        t2 := 4*i              t10 := 4*j
        t3 := a[t2]            a[t10] := t
        if t3<p goto B2        goto B2

    B3: j := j-1           B6: t11 := 4*i
        t4 := 4*j              t := a[t11]
        t5 := a[t4]            t12 := 4*i
        if t5>p goto B3        t13 := 4*n
                               t14 := a[t13]
    B4: if i>=j goto B6        a[t12] := t14
                               t15 := 4*n
                               a[t15] := t

Program = Flow Graph + Blocks:
A program in this representation is simply the flow graph together with the three-address code for each block.

Optimizing Basic Block Code:
Our goal is to optimize programs expressed in this format by transforming their basic blocks. In general, there are two kinds of transformation that we might want to use:
- Local transformations, which can be applied to individual basic blocks, regardless of where they appear in the flowgraph.
- Global transformations, which typically make use of information about larger sections of the flowgraph.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.

Common Subexpression Elimination:
Consider the basic block:
    1) a := b + c
    2) b := a - d
    3) c := b + c
    4) d := a - d
The second and fourth statements calculate the same value, so this block can be rewritten as:
    1) a := b + c
    2) b := a - d
    3) c := b + c
    4) d := b
Note that, even though the same expression, b + c, appears on the right of both the first and third lines, it does not have the same value in each case.
A More Careful Analysis:
To see this more formally, annotate each variable on the right-hand side with the number of the step where it was last defined (0 means "defined before the block"):

                                      a  b  c  d
                                      0  0  0  0
    1) a := b + c    a1 := b0 + c0    1  0  0  0
    2) b := a - d    b2 := a1 - d0    1  2  0  0
    3) c := b + c    c3 := b2 + c0    1  2  3  0
    4) d := a - d    d4 := a1 - d0    1  2  3  4

Steps 2 and 4 both compute a1 - d0, so they yield the same value; steps 1 and 3 compute b0 + c0 and b2 + c0, which need not be equal. The following examples apply this kind of reasoning to the quicksort flow graph, transforming blocks B5 and B6 step by step.
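The "last defined" bookkeeping can be sketched as a toy local value-numbering pass in C (the struct Stmt layout and the function redundant are illustrative, not the course's implementation): an expression is keyed by its operator and the versions of its operands, so b+c at steps 1 and 3 get different keys while a-d at steps 2 and 4 share one.

```c
struct Stmt { int dest; char op; int s1, s2; };  /* vars 0..3 = a..d */

/* Returns a bitmask with bit i set when statement i (counting from 0)
 * recomputes a value that is still held in some variable. Versions
 * record the step at which each variable was last defined, so b+c over
 * different versions of b is NOT a repeat. */
static int redundant(const struct Stmt *code, int n) {
    int ver[4] = {0, 0, 0, 0};
    /* seen[k] = {op, s1, v1, s2, v2, result_var, result_ver} */
    int seen[16][7], ns = 0, mask = 0;
    for (int i = 0; i < n; i++) {
        int v1 = ver[code[i].s1], v2 = ver[code[i].s2];
        for (int k = 0; k < ns; k++)
            if (seen[k][0] == code[i].op && seen[k][1] == code[i].s1 &&
                seen[k][2] == v1 && seen[k][3] == code[i].s2 &&
                seen[k][4] == v2 && ver[seen[k][5]] == seen[k][6]) {
                mask |= 1 << i;       /* value already lives in a var */
                break;
            }
        ver[code[i].dest] = i + 1;    /* this statement defines dest  */
        seen[ns][0] = code[i].op; seen[ns][1] = code[i].s1;
        seen[ns][2] = v1;         seen[ns][3] = code[i].s2;
        seen[ns][4] = v2;         seen[ns][5] = code[i].dest;
        seen[ns][6] = i + 1;      ns++;
    }
    return mask;
}
```

On the four-statement block above, only the last statement is flagged: d := a - d repeats step 2's a1 - d0, whose value still lives in b.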

Example: Local CSE on blocks B5 and B6.
Within B5, 4*i and 4*j are each computed twice; within B6, 4*i and 4*n are each computed twice. Blocks B1–B4 are unchanged. After local CSE:

    B5: t6 := 4*i          B6: t11 := 4*i
        t := a[t6]             t := a[t11]
        t7 := t6               t12 := t11
        t8 := 4*j              t13 := 4*n
        t9 := a[t8]            t14 := a[t13]
        a[t7] := t9            a[t12] := t14
        t10 := t8              t15 := t13
        a[t10] := t            a[t15] := t
        goto B2
Example: Copy propagation for t7, t10, t12, t15.
Uses of the copies are replaced by their sources (a[t7] becomes a[t6], and so on):

    B5: t6 := 4*i          B6: t11 := 4*i
        t := a[t6]             t := a[t11]
        t7 := t6               t12 := t11
        t8 := 4*j              t13 := 4*n
        t9 := a[t8]            t14 := a[t13]
        a[t6] := t9            a[t11] := t14
        t10 := t8              t15 := t13
        a[t8] := t             a[t13] := t
        goto B2

Example: Dead code elimination for t7, t10, t12, t15.
The copies are no longer used, so they can be removed:

    B5: t6 := 4*i          B6: t11 := 4*i
        t := a[t6]             t := a[t11]
        t8 := 4*j              t13 := 4*n
        t9 := a[t8]            t14 := a[t13]
        a[t6] := t9            a[t11] := t14
        a[t8] := t             a[t13] := t
        goto B2
Example: Global CSE.
4*i is already available in t2 (from B2), 4*j in t4 (from B3), and 4*n in t1 (from B1):

    B5: t6 := t2           B6: t11 := t2
        t := a[t6]             t := a[t11]
        t8 := t4               t13 := t1
        t9 := a[t8]            t14 := a[t13]
        a[t6] := t9            a[t11] := t14
        a[t8] := t             a[t13] := t
        goto B2

Example: Copy propagation on t6, t8, t11, t13:

    B5: t6 := t2           B6: t11 := t2
        t := a[t2]             t := a[t2]
        t8 := t4               t13 := t1
        t9 := a[t4]            t14 := a[t1]
        a[t2] := t9            a[t2] := t14
        a[t4] := t             a[t1] := t
        goto B2
Example: Dead code elimination on t6, t8, t11, t13:

    B5: t := a[t2]         B6: t := a[t2]
        t9 := a[t4]            t14 := a[t1]
        a[t2] := t9            a[t2] := t14
        a[t4] := t             a[t1] := t
        goto B2

Example: Global CSE.
a[t2] was just loaded as t3 (in B2), a[t4] as t5 (in B3), and a[t1] as p (in B1):

    B5: t := t3            B6: t := t3
        t9 := t5               t14 := p
        a[t2] := t9            a[t2] := t14
        a[t4] := t             a[t1] := t
        goto B2
Example: Copy propagation on t, t9, t14:

    B5: t := t3            B6: t := t3
        t9 := t5               t14 := p
        a[t2] := t5            a[t2] := p
        a[t4] := t3            a[t1] := t3
        goto B2

Example: Dead code elimination on t, t9, t14.
All three copies are now unused, leaving only the swap stores:

    B5: a[t2] := t5        B6: a[t2] := p
        a[t4] := t3            a[t1] := t3
        goto B2
Summary:
A fairly complex process, but described using simple steps that are sequenced and repeated until we get a good result. What more can we do? Where should we focus our efforts?

Loop Optimization:
Loops are an obvious source of repeated computation, and good candidates for optimization. The most important loop optimizations are:
1) Code motion: move loop-invariant code outside the loop.
2) Strength reduction: replace expensive operations with cheaper ones.
3) Induction variables: recognize relationships between the values of variables on each pass through the body of a loop.

1) Code Motion:
If we can decrease the amount of code in the body of a loop, then we can also decrease the execution time for each iteration of the loop. Avoid duplicated computation and you duplicate the savings instead! We need to find loop invariants: expressions that are guaranteed to have the same value on each pass through the loop.

Loop Invariants:
Start by looking for variables that do not change … then extend to expressions. For example, if x and y are invariant, then so are x+y and x*y. Use algebraic identities to increase the size of the loop-invariant expressions. For example, by rewriting (x-z)+y as (x+y)-z, we might be able to extract x+y as an invariant expression. But beware of aliasing! Expression a[i] is not necessarily invariant just because a and i are.
Moving the Code:
When we move the code, we might need to introduce new temporaries. For example, the C/C++ loop:
    for (i = 0; i < n * n; i++)
      ... code which doesn't change n ...
can be rewritten as:
    t1 = n * n;
    for (i = 0; i < t1; i++)
      ... code which doesn't change n ...
because the expression n*n is a loop invariant.

2) Strength Reduction:
Instead of using expensive multiplications, we can obtain the same results using simple shifts, if one of the operands is a constant power of 2. For example, in the quicksort code, we can replace t2 := 4*i and t4 := 4*j with the cheaper operations t2 := i << 2 and t4 := j << 2. The same idea can be used to simplify uses of other expensive operations, including / and %. Strength reduction is, however, a much more general technique, and more opportunities can be revealed if we are able to identify some induction variables.

3) Induction Variables:
An induction variable takes values in some arithmetic progression as we step through a loop. Variables that are used to control a loop often show this behavior:
    for (int j = 10; j<20; j++) …
And it can happen to other variables in the loop too:
    for (int j = 10; j<20; j++) {
      int jthOdd = 2*j + 1;
      …
    }

Induction Variables:
In the quicksort example, notice that:
- Every time that i increases by 1, the value of t2 = 4*i increases by 4.
- Every time that j decreases by 1, the value of t4 = 4*j decreases by 4.
As a result:
- Every time that i increases by 1, the address of a[t2] increases by 4.
- Every time that j decreases by 1, the address of a[t4] decreases by 4.
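A quick C check (illustrative) that maintaining t2 incrementally stays in step with recomputing 4*i on every iteration:

```c
/* Replace t2 := 4*i inside the loop with t2 := t2 + 4: an induction-
 * variable update. Returns 1 if the two formulations agree on every
 * iteration, 0 otherwise. */
static int induction_ok(int start, int steps) {
    int i = start;
    int t2 = 4 * i;               /* initialized once, before the loop */
    for (int k = 0; k < steps; k++) {
        i = i + 1;
        t2 = t2 + 4;              /* strength-reduced update */
        if (t2 != 4 * i) return 0;
    }
    return 1;
}
```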
Another Strength Reduction:
Instead of using expensive multiplications, why not initialize t2 := 4*i (and t4 := 4*j) at the beginning of the loop and then increment t2 (and decrement t4) on each iteration? This is another form of strength reduction.

Before (this is where we had got to previously):

    B1: i := m-1           B5: a[t2] := t5
        j := n                 a[t4] := t3
        t1 := 4*n              goto B2
        p := a[t1]
                           B6: a[t2] := p
    B2: i := i+1               a[t1] := t3
        t2 := 4*i
        t3 := a[t2]
        if t3<p goto B2

    B3: j := j-1
        t4 := 4*j
        t5 := a[t4]
        if t5>p goto B3

    B4: if i>=j goto B6

After (induction variables t2 and t4, and strength reduction):

    B1: i := m-1           B5: a[t2] := t5
        j := n                 a[t4] := t3
        t1 := 4*n              goto B2
        p := a[t1]
        t2 := 4 * i        B6: a[t2] := p
        t4 := 4 * j            a[t1] := t3

    B2: i := i+1
        t2 := t2 + 4
        t3 := a[t2]
        if t3<p goto B2

    B3: j := j-1
        t4 := t4 - 4
        t5 := a[t4]
        if t5>p goto B3

    B4: if i>=j goto B6

More Strength Reduction:
Consider the following loop:
    for (int i = 0; i<N; i++) {
      t = t + (i*i);
    }
This code does N multiplications and N additions. Multiplies are expensive: can we eliminate them?
Identifying Induction Variables:
As i takes on values 0, 1, 2, 3, 4, 5, 6, … so i*i takes values 0, 1, 4, 9, 16, 25, 36, … Note that: (i+1)*(i+1) = i*i + 2*i + 1. So we could rewrite the loop as:
    int u = 0;
    for (int i = 0; i<N; i++) {
      t = t + u;
      u = u + 2*i + 1;    // the 2*i can be reduced to a shift
    }

Differences of Differences:
We've seen that the difference from one value of (i*i) to the next is d = 2*i + 1. What is the difference from one value of d to the next? It is the constant 2, so we can rewrite the loop again as:
    int u = 0;
    int d = 1;
    for (int i = 0; i<N; i++) {
      t = t + u;
      u = u + d;
      d = d + 2;
    }
Longer, but each multiply has been replaced by two adds!
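The rewritten loops compute the same sum; here is a C sketch comparing the multiplication form against the differences-of-differences form:

```c
/* Sum of squares, directly: N multiplications and N additions. */
static int squares_mul(int N) {
    int t = 0;
    for (int i = 0; i < N; i++)
        t = t + i * i;
    return t;
}

/* Sum of squares with second differences: u tracks i*i and d tracks
 * 2*i+1, so every multiply becomes two additions. */
static int squares_diff2(int N) {
    int t = 0, u = 0, d = 1;
    for (int i = 0; i < N; i++) {
        t = t + u;
        u = u + d;
        d = d + 2;
    }
    return t;
}
```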

Choosing an Intermediate Code:
Our intermediate code separates the process of accessing an array element into two stages:
- Multiply the index by 4;
- Look up the value in the array.
On a 386, we can use a single instruction to accomplish the same thing (and it's fast):
    movl a(,%eax,4), %eax
Doesn't our choice of intermediate code force us to use the less efficient version?
    imull $4, %eax; movl a(%eax), %eax

On the other hand …
By including indexed addressing, perhaps we've chosen an intermediate code that is too high-level. Suppose that we used an intermediate code that has only a simple load instruction
    x := [y]
and in which the address calculation must be done explicitly …
Before (this is where we would have got to previously if we'd had only load instructions v := [u] and save instructions [v] := u):

    B1: i := m-1           B5: [u2] := t5
        j := n                 [u4] := t3
        t1 := 4*n              goto B2
        u1 := a + t1
        p := [u1]          B6: [u2] := p
                               [u1] := t3
    B2: i := i+1
        t2 := 4*i
        u2 := a + t2
        t3 := [u2]
        if t3<p goto B2

    B3: j := j-1
        t4 := 4*j
        u4 := a + t4
        t5 := [u4]
        if t5>p goto B3

    B4: if i>=j goto B6

After (now we've used an optimization based on the observation that the addresses of a[i] and a[j] are induction variables!):

    B1: i := m-1           B5: [u2] := t5
        j := n                 [u4] := t3
        t1 := 4*n              goto B2
        u1 := a + t1
        t2 := 4*i          B6: [u2] := p
        u2 := a + t2           [u1] := t3
        u4 := u1
        p := [u1]

    B2: i := i+1
        u2 := u2 + 4
        t3 := [u2]
        if t3<p goto B2

    B3: j := j-1
        u4 := u4 - 4
        t5 := [u4]
        if t5>p goto B3

    B4: if i>=j goto B6

The Right Level of Abstraction?
Designing an intermediate code is hard because it is difficult to get the right level of abstraction:
- Too high-level, and you will hide opportunities for optimization.
- Too low-level, and it will be harder to utilize advanced target instructions and addressing modes.
Increasing level of abstraction:
    simple loads and saves only -> indexed loads and saves -> indexed and scaled loads and saves

Instruction Selection:
Finding the best match between an intermediate program and the available machine instructions is the act of instruction selection. Together with register allocation, instruction selection constitutes the proper code generation phase in a compiler that uses an intermediate representation. If the intermediate language is low-level, instruction selection might involve mapping several intermediate operations onto a single target machine instruction. More on instruction selection next week.
Summary:
In this lecture we have seen:
- How optimization techniques can be used in practice.
- The role of intermediate code, illustrated by three-address code.
- Using flow graphs to capture the control flow in a particular program.
- Using basic blocks to coalesce the effects of multiple program statements into a single unit.
