
Lecture 11

The document discusses compiler optimization, focusing on improving program performance in terms of execution speed and memory usage. It outlines various optimization techniques, their applicability, and the importance of correctness, emphasizing that optimizations should preserve program behavior. Additionally, it highlights the challenges and considerations in applying optimizations, such as trade-offs and potential pitfalls.


Compiler Construction
SMD163
Lecture 11: Introduction to Optimization

Viktor Leijon & Peter Jonsson, with slides by Johan Nordlander
Contains material generously provided by Mark P. Jones

Understanding Optimization:

Goals of Optimization:
Optimization is about improving the target programs that are generated by a compiler. In most cases, optimization has two principal goals:
- Time: make programs run faster;
- Space: make programs use less memory.
Other applications of optimization include:
- Adapting code to particular architectures; etc.

Optimization is not Magic:
Optimizing compilers are just a tool; they work with what they're given. No optimizing compiler can make up for a poor choice of algorithms or data structures.
Optimization is not Absolute:
Some optimization techniques give clear wins in both time and space. Others may require us to trade one against the other. The priorities that we attach to different optimizations will depend on the application.
- In embedded systems, memory is often limited, and speed may be less of an issue (e.g., a VCR).
- In high performance systems, execution speed is critical (e.g., video games).

Optimization is not Free:
Some optimizations only apply in particular situations, and require time-consuming analysis of the program. Use of such optimizations is only justified for programs that will be run often or for a long time. Optimization is appropriate in the construction and testing of products before wide release/distribution.

Optimization is a Misnomer:
A compiler writer's job will never be done; there are always opportunities for new optimizations.
Proof: Suppose that there is an optimizing compiler Comp that can optimize any program to the shortest possible equivalent program. Then Comp will compile any program that goes into an infinite loop without any output to the following, easily recognizable loop:
    lab: jmp lab
This is impossible, because a program that did this would be able to solve the halting problem.

Terminology:
Terms like "program optimization" and "optimizing compiler" are firmly established. But we cannot build a truly optimizing compiler! We will focus instead on techniques for improving programs. Following common usage, we will still refer to each one as an "optimization."
Optimization by Transformation:
Optimizations are program transformations. Most apply only in particular circumstances, and the effectiveness of an optimization depends on the program in which it is applied.
- Optimization of a particular language feature will have no impact on a program that does not use it.
- In some cases, an "optimization" may actually result in a slower or bigger program.

Correctness is Essential!
In all cases, it is essential that optimization preserves meaning: the optimized program must have the same meaning/behavior as the original program. Such transformations are often described as being safe. Better safe than sorry ... if an optimization isn't safe, you shouldn't use it! A slow program that gives the right answer is better than a program that gives the wrong answer quickly.

An Example:
Suppose that we have a loop:
    for (int i=0; i<N; i++) {
      … x/y …
    }
If we can ensure that the values of x and y do not change on each iteration, then we can "optimize" it to:
    z = x/y;
    for (int i=0; i<N; i++) {
      … z …
    }
This is an example of code motion.

Take Care! (part one)
If N=0, then the "optimized" code will evaluate x/y once, but the original won't evaluate it at all! So this is only an optimization if we can be sure that the loop will be executed at least once.
Take Care! (part two)
If N=0, then the "optimized" program might raise a divide-by-zero exception where the original runs without fault. So the optimization is applicable only if:
- We know that y will never be zero; or
- We know that the loop will always be executed at least once, and that there are no other observable effects in the code between the new and old positions of x/y.

Caveat Optimizer:
Optimizations can be quite subtle, and may require detailed analysis of a program to determine:
- whether they are applicable;
- whether they are safe; and
- whether they will actually improve the program.
In general, these questions are undecidable. But we always have the option not to use a given optimization!
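The zero-trip hazard can also be handled by guarding the hoisted computation. A minimal sketch in C (the function names are illustrative):

```c
/* Original loop: evaluates x/y on every iteration (never if N == 0). */
static int sum_orig(int x, int y, int N) {
    int s = 0;
    for (int i = 0; i < N; i++)
        s += x / y;               /* x and y are loop-invariant */
    return s;
}

/* Hoisted version: the guard ensures x/y is only evaluated when the
 * original loop body would have evaluated it, so N == 0 with y == 0
 * stays safe. */
static int sum_hoisted(int x, int y, int N) {
    int s = 0;
    if (N > 0) {                  /* zero-trip guard makes the motion safe */
        int z = x / y;            /* code motion: computed once */
        for (int i = 0; i < N; i++)
            s += z;
    }
    return s;
}
```

With the guard, the hoisted version raises a divide-by-zero fault in exactly the cases where the original would.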

Combining Optimizations:
The task of an optimizing compiler is to choose and apply an appropriate sequence of optimizations:
    P0 --o1--> P1 --o2--> P2 --o3--> P3 --o4--> P4 --o5--> P5 --o6--> P6 --o7--> P7
Applying one optimization may create new opportunities to apply another; the order in which they are applied can make a difference. If intermediate steps preserve behavior, each program will be equivalent to the original.

Controlling Optimization:
Compilers often allow programmers to control the use of optimization techniques:
- To set priorities (for example, time or space?);
- To select/deselect particular kinds of optimization;
- To limit the time or the number of steps that are spent in optimization.
Optimization by Hand:
Programmers sometimes have an opportunity to optimize their code by hand. Beware:
- It's difficult to get right, it can obscure the code, it can make it less portable, and it can introduce bugs.
- It's hard to compete with a good optimizing compiler.
If performance is critical and you need to optimize by hand:
- Wait until the program is almost finished;
- Use a profiler to identify the "hot spots".

A Catalogue of Common Optimization Techniques:

An Overview:
Compiler writers and researchers have discovered many different techniques for optimization. For example, some of the most common optimization techniques try to remove:
- Code that serves no useful purpose;
- Code that repeats earlier computations;
- Code that uses an inefficient method to calculate a value;
- Code that carries an unnecessary overhead;
- Etc.

Dead Code Elimination:
Unreachable code can be eliminated.
- Code that follows a return, break, continue, or goto and has no label can be eliminated:
      int f(int x) {          int f(int x) {
        return x+1;     =>      return x+1;
        …                     }
      }
- Code that appears in functions that are never called can be eliminated. (This process is sometimes described as "tree-shaking".)
Continued …
Code that has no effect can be eliminated.
- An assignment to a variable that will not be used again can be eliminated:
      int f(int x) {          int f(int x) {
        int temp = x*x;   =>    return x+1;
        return x+1;           }
      }
- An assignment to a variable that will be overwritten before it is used again can be eliminated:
      x = y;      =>    x = z;
      x = z;

But be Careful!
Items that have an effect cannot be eliminated:
- If the discarded value comes from a call, the call itself must be kept:
      int f(int x) {          int f(int x) {
        int temp = g(x);  =>    g(x);
        return x+1;             return x+1;
      }                       }
- Likewise when an overwritten assignment has a side-effecting right-hand side:
      x = f1()+f2();    =>    f1();
      x = z;                  f2();
                              x = z;
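A minimal C sketch of the distinction (g and the calls counter are illustrative): the dead assignment to temp can go, but the call to g must stay because it has an observable effect.

```c
static int calls = 0;                 /* observable side effect */

static int g(int x) { calls++; return x * x; }

/* Before: the assignment to temp is dead, but g(x) has a side effect. */
static int f_before(int x) {
    int temp = g(x);                  /* temp is never used again */
    (void)temp;
    return x + 1;
}

/* After dead-code elimination: the assignment goes, the call stays. */
static int f_after(int x) {
    g(x);                             /* kept for its side effect */
    return x + 1;
}
```

Both versions return the same value and bump the counter the same number of times; dropping the call entirely would not be safe.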

Common-Subexpression Elimination:
The results of computations can be shared rather than duplicated.
    x = a + b;          x = a + b;
    y = a + b;    =>    y = x;

    x = (a+b)*(a+b);    =>    t = a + b;
                              x = t * t;
But beware side effects!
    x = f(a)*f(a);      =>    t = f(a);
                              x = t * t;

Copy and Constant Propagation:
An assignment of the form x = y; is called a copy instruction.
    x = y;              x = y;
    z = x;        =>    z = y;

    x = 0;              x = 0;
    z = x;        =>    z = 0;
What have we gained? Nothing directly, but if we manage to remove all references to x, then the first assignment will become dead code.
Constant-Folding:
"Do not put off until run-time what you can do at compile-time."
    x = 2^8 – 1;    =>    x = 255;
More generally: evaluate expressions involving only constant values at compile-time.

Strength-Reduction:
Replace expensive operations with cheaper, but equivalent ones. For example:
    x**2    = x*x
    2*x     = x+x
    x/2     = x >> 1
    x * 16  = x << 4
    x % 128 = x & 127
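These equivalences can be spot-checked in C. Note that they are only safe for non-negative (here unsigned) operands: C's integer division truncates toward zero, so x/2 and x>>1 disagree for odd negative x, and likewise for the remainder/mask pair.

```c
/* Strength-reduction identities checked for one unsigned value.
 * These hold for non-negative operands; a real compiler must prove
 * the sign (or use an unsigned type) before applying them. */
static int identities_hold(unsigned x) {
    return (x / 2    == x >> 1)
        && (x * 16   == x << 4)
        && (x % 128  == (x & 127))
        && (2 * x    == x + x);
}
```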

Algebraic Identities:
Standard algebraic identities can often be put to good use when only part of a program's data is known at compile-time. For example:
    x+0 = x
    x–0 = x
    x–x = 0
    x*1 = x
    x*0 = 0
    x/1 = x

Who writes x+0 in source code?
How many programmers would actually write an expression like x + 0 in their source code? Are these optimizations of any use in practice? Yes!
- Examples like this can occur in handwritten programs when symbolic constants are used.
- Examples like this can show up in the code that we generate for other language constructs. For example, the address of a[0] is: a + 4*0 = a + 0 = a.
Continued…
At first, some identities might not seem to have any significant applications. Examples include associativity:
    (x + y) + z = x + (y + z)
and commutativity:
    x+y = y+x
(But don't forget the role that commutativity played in our register allocator.)

Enabling Transformations:
Use of associativity and commutativity laws can open up opportunities for other optimizations.
    a = b+c;          a = b + c;            a = b + c;
    t = (c+d)+b;  =>  t = (b+c) + d;    =>  t = a + d;

Another Example:
Suppose that d is an array of Date objects, each with fields tag, day, month, and year. Now suppose that we want to access d[3].month; then we need to load the value at address:
    (d + 16*3) + 8
    = d + (16*3 + 8)    (associativity)
    = d + 56            (constant folding)

Identities for Floating Point:
Floating point numbers do not behave like real numbers … floating point operators do not satisfy many useful laws.
- Associativity? small+(big+(-big)) = small + 0 = small, but (small+big)+(-big) = big+(-big) = 0.
- Additive identities? NaN + 0 raises an exception; NaN on its own does not.
- Multiplicative zeroes? inf * 0 = NaN.
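The associativity failure is easy to demonstrate in C with single-precision floats (small = 1.0f and big = 1e20f are illustrative choices; any big whose precision gap exceeds small will do):

```c
/* Floating-point addition is not associative: adding a tiny value to
 * a huge one loses the tiny contribution entirely. */
static float assoc_right(float small, float big) {
    return small + (big + (-big));    /* = small + 0 = small */
}

static float assoc_left(float small, float big) {
    return (small + big) + (-big);    /* small is absorbed: result 0 */
}
```

This is why compilers may not reassociate floating-point expressions unless the user explicitly allows it.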
"What the Language Permits":
Even for integer values, algebraic identities can only be used within whatever scope the host language permits. For example, the definition of Fortran 77 states that the order of evaluation of expressions involving parentheses must respect the parentheses. (So Fortran 77 is a language in which parenthesized expressions would show up in the abstract syntax.)

Removing Overhead:
When we call a function, we spend some time constructing and then destroying the stack frame. When we execute a loop, we spend some time setting up and testing the loop variable. If the body of the function, or the body of the loop, is small, then the overhead associated with either of these will be quite large in proportion.

Function Inlining:
If we know the body of a function, we can use that instead of a call. For example, suppose we know
    int square(int x) { return x*x; }
Then we can rewrite a section of code:
    { … square(square(x)) … }
as:
    { … int t1 = x*x; int t2 = t1*t1; … t2 … }

Cautionary Notes:
Inlining a large function many times can increase the size of the compiled program. Naïve inlining (by copying text) can increase the amount of work to be done. For example, changing:
    … square(square(f(x))) …
to
    … (f(x)*f(x)) * (f(x)*f(x)) …
will require 3 multiplications, not 2, and will duplicate any side-effect of f(x).
Loop Unrolling:
For example, we can rewrite a section of code:
    for (int i=0; i<3; i++) {
      f(i);
    }
as:
    f(0);
    f(1);
    f(2);
This typically produces more code … but it is faster because we have eliminated a temporary variable, and all the operations on it.

Peephole Optimization:
It is often possible to implement useful optimizations by looking for simple patterns in small sections of generated assembly code: "looking at the code through a peephole." To a large extent, the choice of peephole optimizations depends on the target machine.

Examples for IA32:
- An instruction of the form addl $1,reg can be replaced by incl reg.
- An instruction of the form imul $2,reg can be replaced by addl reg,reg.
- In a sequence of instructions:
      movl reg,var
      movl var,reg
  the second instruction can be deleted, provided that it does not have a label.
- In a sequence of instructions:
      addl $4,%esp
      movl %ebp,%esp
  the first instruction is dead code.

Summary:
We have looked at:
- The basic goals – and limits – of optimization.
- A catalogue of standard optimization techniques:
  - Dead-code elimination;
  - Common subexpression elimination;
  - Constant and copy propagation;
  - Constant folding;
  - Strength reduction;
  - Algebraic identities;
  - Function inlining;
  - Loop unrolling;
  - Peephole optimization.
Next: Putting these techniques into practice …
Optimization using an Intermediate Language:

Source High, Target Low:
Source code is often too high-level to reveal opportunities for optimization.
- For example, the assignment a[i] = x + y requires an (implicit) calculation of the address of the array element a[i].
Target code is often too low-level to reveal opportunities for optimization.
- For example, temporary values have already been assigned to registers and it is more difficult to identify repeated or redundant computations.

Code Generation Example:
Here's part of a MiniJava program and the code that we might generate from it:

    class C {
      …
      int[] makeArray(int n) {
        int[] a;
        a = new int[200];
        a[3] = n;
        return a;
      }
    }

    C_makeArray:
        pushl %ebp
        movl %esp,%ebp
        subl $4,%esp
        pushl $800
        call _malloc
        addl $4,%esp
        movl %eax,-4(%ebp)
        movl 8(%ebp),%eax
        movl -4(%ebp),%ebx
        movl %eax,12(%ebx)
        movl -4(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret

There is an independent sequence of instructions for each statement …

Breaking Down Programs:
The constructs of a language, and the nature of the problem that is being solved, will lead a programmer to break down a program into a particular sequence of tasks. The output from a compiler is supposed to execute the same sequence of tasks. There is no reason, however, for it to use exactly the same breakdown as the programmer.
Code Generation Example:
The same program can be broken down into "basic blocks" (the instruction sequence above, partitioned at block boundaries). The correspondence with the source is less direct … but there are more opportunities for optimization. Still, working at the level of 386 assembly code is difficult!

Intermediate Code:
Intermediate code provides a compromise between the extremes of source and target code. It aims to be:
- Sufficiently low-level to capture single steps in the program.
- Sufficiently high-level to avoid machine dependencies, and premature code generation.
Intermediate codes are usually some kind of idealized machine code. As a useful side-benefit, intermediate code provides a degree of portability (e.g., RTL in gcc, Java bytecodes and the JVM).

A High-Level View:
Flat input is parsed into structure; from the structure we generate intermediate code; optimizations are applied repeatedly at the intermediate-code level; finally, flat output is emitted.

UNCOL … The Search Goes On:
UNCOL is an old (1958), and as yet unrealized, dream of compiler writers: the UNiversal Computer Oriented Language. It was hoped that a universal intermediate code could serve as a meeting point for all languages: one front end for each language, one back end for each machine, and smooth interoperability between languages. But no satisfactory UNCOL has been found yet; the range of programming languages is very diverse! There have been numerous attempts, some ongoing: ANDF, C, JVM, UVM, C--, …
Three-Address Code:
Three-address code: a simple UNCOL? Three-address code is primarily a sequence of statements of the general form:
    x := y op z
where x, y and z are names, constants or compiler-generated temporaries. For example, evaluation of x+y*z becomes:
    t1 := y * z
    t2 := x + t1

For Example:
In three-address code, the statement
    a[i] = a[i]+a[j]
can be expressed as:
    t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * j
    t4 := a[t3]
    t5 := t2 + t4
    t6 := 4 * i      <- the calculation of 4*i is duplicated!
    a[t6] := t5
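A toy sketch in C of three-address statements and a straight-line evaluator (the Quad struct and register-file layout here are assumptions for illustration, not the course's representation):

```c
/* A statement is (op, dest, arg1, arg2); variables and temporaries
 * are indices into a small value array. */
enum Op { MUL, ADD };
struct Quad { enum Op op; int dest, arg1, arg2; };

/* Evaluate a straight-line sequence of n quads over the values in v. */
static void run(const struct Quad *code, int n, int *v) {
    for (int i = 0; i < n; i++) {
        int a = v[code[i].arg1], b = v[code[i].arg2];
        v[code[i].dest] = (code[i].op == MUL) ? a * b : a + b;
    }
}
```

For the x+y*z example, with x, y, z in slots 0..2 and t1, t2 in slots 3..4, the program is t1 := y * z followed by t2 := x + t1.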

Intermediate Code as Trees:
Three-address code is really just a linear representation of the syntax tree for the intermediate code. For x+y*z:

        t2:+
       /    \
      x     t1:*
            /  \
           y    z

Other forms of instruction, for example goto, unary operations, conditional branches, etc., are also used in practice.

Quads:
The statement a = b * (-c) + b * (-c) is represented by the following three-address code:
    0) t1 := -c         (uminus, t1, c, _)
    1) t2 := b * t1     (mult, t2, b, t1)
    2) t3 := -c         (uminus, t3, c, _)
    3) t4 := b * t3     (mult, t4, b, t3)
    4) t5 := t2 + t4    (add, t5, t2, t4)
    5) a := t5          (save, a, t5, _)
In practice, three-address code is often represented or described by quadruples (quads), of the form (op, dest, arg1, arg2).
Finding Basic Blocks:
A basic block is a sequence of statements where control only enters at the first statement and leaves at the last. To partition a sequence of instructions into basic blocks, start by identifying the leaders:
- The first statement is a leader.
- Any statement that is the target of a call or a goto (conditional or unconditional) is a leader.
- Any statement that immediately follows a call or a goto is a leader.
For each leader, there is a basic block consisting of the leader and all following statements up to, but not including, the next leader or the end of the program.

An Example:
Consider the following implementation of quicksort:

    void quicksort(int m, int n) { // sorts global array a
      if (m<n) {
        int i = m - 1, j = n, p = a[n], t;
        while (1) {
          do { i = i + 1; } while (a[i] < p);
          do { j = j - 1; } while (a[j] > p);
          if (i >= j) break;
          t = a[i]; a[i] = a[j]; a[j] = t;
        }
        t = a[i]; a[i] = a[n]; a[n] = t;
        quicksort(m,j); quicksort(i+1,n);
      }
    }

We will focus on optimizing the highlighted section (the partitioning loop).
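The leader rules can be sketched in C; the instruction stream here is reduced to an array of jump targets, which is an assumption for illustration:

```c
/* Mark leaders in a sequence of n instructions. target[i] >= 0 means
 * instruction i jumps to that index; -1 means it only falls through. */
static void find_leaders(const int *target, int n, int *leader) {
    for (int i = 0; i < n; i++) leader[i] = 0;
    if (n > 0) leader[0] = 1;                 /* rule 1: first statement */
    for (int i = 0; i < n; i++) {
        if (target[i] >= 0) {
            leader[target[i]] = 1;            /* rule 2: jump target     */
            if (i + 1 < n) leader[i + 1] = 1; /* rule 3: after a jump    */
        }
    }
}
```

Applied to the three-address translation of quicksort that follows (30 statements, with jumps at statements 8, 12, 13 and 22), this marks leaders at statements 1, 5, 9, 13, 14 and 23, giving the six basic blocks described below.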

In Three-Address Code …
Translation of the core of quicksort into three-address code:

    1)  i := m-1           16) t7 := 4*i
    2)  j := n             17) t8 := 4*j
    3)  t1 := 4*n          18) t9 := a[t8]
    4)  p := a[t1]         19) a[t7] := t9
    5)  i := i+1           20) t10 := 4*j
    6)  t2 := 4*i          21) a[t10] := t
    7)  t3 := a[t2]        22) goto 5
    8)  if t3<p goto 5     23) t11 := 4*i
    9)  j := j-1           24) t := a[t11]
    10) t4 := 4*j          25) t12 := 4*i
    11) t5 := a[t4]        26) t13 := 4*n
    12) if t5>p goto 9     27) t14 := a[t13]
    13) if i>=j goto 23    28) a[t12] := t14
    14) t6 := 4*i          29) t15 := 4*n
    15) t := a[t6]         30) a[t15] := t

There are six basic blocks: 1–4, 5–8, 9–12, 13, 14–22, 23–30.

Flow Graphs:
We add directed edges between basic blocks to capture control flow. One basic block is distinguished as initial; it contains the first statement to be executed. For any pair of basic blocks B1 and B2, there is an edge from B1 to B2 if B2 can directly follow B1 in some execution sequence.
Example: the six basic blocks of the quicksort core, with flow edges B1->B2, B2->B2, B2->B3, B3->B3, B3->B4, B4->B5, B4->B6, B5->B2:

    B1: i := m-1           B5: t6 := 4*i
        j := n                 t := a[t6]
        t1 := 4*n              t7 := 4*i
        p := a[t1]             t8 := 4*j
                               t9 := a[t8]
    B2: i := i+1               a[t7] := t9
        t2 := 4*i              t10 := 4*j
        t3 := a[t2]            a[t10] := t
        if t3<p goto B2        goto B2

    B3: j := j-1           B6: t11 := 4*i
        t4 := 4*j              t := a[t11]
        t5 := a[t4]            t12 := 4*i
        if t5>p goto B3        t13 := 4*n
                               t14 := a[t13]
    B4: if i>=j goto B6        a[t12] := t14
                               t15 := 4*n
                               a[t15] := t

Program = Flow Graph + Blocks:
A program in this representation is simply the flow graph together with the three-address code for each block.

Optimizing Basic Block Code:
Our goal is to optimize programs expressed in this format by transforming their basic blocks. In general, there are two kinds of transformation that we might want to use:
- Local transformations, which can be applied to individual basic blocks, regardless of where they appear in the flowgraph.
- Global transformations, which typically make use of information about larger sections of the flowgraph.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.

Common Subexpression Elimination:
Consider the basic block:
    1) a := b + c
    2) b := a - d
    3) c := b + c
    4) d := a - d
The second and fourth statements calculate the same value, so this block can be rewritten as:
    1) a := b + c
    2) b := a - d
    3) c := b + c
    4) d := b
Note that, even though the same expression, b + c, appears on the right of both the first and third lines, it does not have the same value in each case.
A More Careful Analysis:
To see this more formally, annotate each variable on the right-hand side with the number of the step where it was last defined (0 means "defined before the block"):

                                      a  b  c  d
                                      0  0  0  0
    1) a := b + c    a1 := b0 + c0    1  0  0  0
    2) b := a - d    b2 := a1 - d0    1  2  0  0
    3) c := b + c    c3 := b2 + c0    1  2  3  0
    4) d := a - d    d4 := a1 - d0    1  2  3  4

Steps 2 and 4 both compute a1 - d0, so they yield the same value; steps 1 and 3 compute b0 + c0 and b2 + c0, which need not be equal. The following examples apply this kind of reasoning to the quicksort flow graph, transforming blocks B5 and B6 step by step.
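The "last defined" bookkeeping can be sketched as a toy local value-numbering pass in C (the struct Stmt layout and the function redundant are illustrative, not the course's implementation): an expression is keyed by its operator and the versions of its operands, so b+c at steps 1 and 3 get different keys while a-d at steps 2 and 4 share one.

```c
struct Stmt { int dest; char op; int s1, s2; };  /* vars 0..3 = a..d */

/* Returns a bitmask with bit i set when statement i (counting from 0)
 * recomputes a value that is still held in some variable. Versions
 * record the step at which each variable was last defined, so b+c over
 * different versions of b is NOT a repeat. */
static int redundant(const struct Stmt *code, int n) {
    int ver[4] = {0, 0, 0, 0};
    /* seen[k] = {op, s1, v1, s2, v2, result_var, result_ver} */
    int seen[16][7], ns = 0, mask = 0;
    for (int i = 0; i < n; i++) {
        int v1 = ver[code[i].s1], v2 = ver[code[i].s2];
        for (int k = 0; k < ns; k++)
            if (seen[k][0] == code[i].op && seen[k][1] == code[i].s1 &&
                seen[k][2] == v1 && seen[k][3] == code[i].s2 &&
                seen[k][4] == v2 && ver[seen[k][5]] == seen[k][6]) {
                mask |= 1 << i;       /* value already lives in a var */
                break;
            }
        ver[code[i].dest] = i + 1;    /* this statement defines dest  */
        seen[ns][0] = code[i].op; seen[ns][1] = code[i].s1;
        seen[ns][2] = v1;         seen[ns][3] = code[i].s2;
        seen[ns][4] = v2;         seen[ns][5] = code[i].dest;
        seen[ns][6] = i + 1;      ns++;
    }
    return mask;
}
```

On the four-statement block above, only the last statement is flagged: d := a - d repeats step 2's a1 - d0, whose value still lives in b.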

Example: Local CSE on blocks B5 and B6.
Within B5, 4*i and 4*j are each computed twice; within B6, 4*i and 4*n are each computed twice. Blocks B1–B4 are unchanged. After local CSE:

    B5: t6 := 4*i          B6: t11 := 4*i
        t := a[t6]             t := a[t11]
        t7 := t6               t12 := t11
        t8 := 4*j              t13 := 4*n
        t9 := a[t8]            t14 := a[t13]
        a[t7] := t9            a[t12] := t14
        t10 := t8              t15 := t13
        a[t10] := t            a[t15] := t
        goto B2
Example: Copy propagation for t7, t10, t12, t15.
Uses of the copies are replaced by their sources (a[t7] becomes a[t6], and so on):

    B5: t6 := 4*i          B6: t11 := 4*i
        t := a[t6]             t := a[t11]
        t7 := t6               t12 := t11
        t8 := 4*j              t13 := 4*n
        t9 := a[t8]            t14 := a[t13]
        a[t6] := t9            a[t11] := t14
        t10 := t8              t15 := t13
        a[t8] := t             a[t13] := t
        goto B2

Example: Dead code elimination for t7, t10, t12, t15.
The copies are no longer used, so they can be removed:

    B5: t6 := 4*i          B6: t11 := 4*i
        t := a[t6]             t := a[t11]
        t8 := 4*j              t13 := 4*n
        t9 := a[t8]            t14 := a[t13]
        a[t6] := t9            a[t11] := t14
        a[t8] := t             a[t13] := t
        goto B2
Example: Global CSE.
4*i is already available in t2 (from B2), 4*j in t4 (from B3), and 4*n in t1 (from B1):

    B5: t6 := t2           B6: t11 := t2
        t := a[t6]             t := a[t11]
        t8 := t4               t13 := t1
        t9 := a[t8]            t14 := a[t13]
        a[t6] := t9            a[t11] := t14
        a[t8] := t             a[t13] := t
        goto B2

Example: Copy propagation on t6, t8, t11, t13:

    B5: t6 := t2           B6: t11 := t2
        t := a[t2]             t := a[t2]
        t8 := t4               t13 := t1
        t9 := a[t4]            t14 := a[t1]
        a[t2] := t9            a[t2] := t14
        a[t4] := t             a[t1] := t
        goto B2
Example: Dead code elimination on t6, t8, t11, t13:

    B5: t := a[t2]         B6: t := a[t2]
        t9 := a[t4]            t14 := a[t1]
        a[t2] := t9            a[t2] := t14
        a[t4] := t             a[t1] := t
        goto B2

Example: Global CSE.
a[t2] was just loaded as t3 (in B2), a[t4] as t5 (in B3), and a[t1] as p (in B1):

    B5: t := t3            B6: t := t3
        t9 := t5               t14 := p
        a[t2] := t9            a[t2] := t14
        a[t4] := t             a[t1] := t
        goto B2
Example: Copy propagation on t, t9, t14:

    B5: t := t3            B6: t := t3
        t9 := t5               t14 := p
        a[t2] := t5            a[t2] := p
        a[t4] := t3            a[t1] := t3
        goto B2

Example: Dead code elimination on t, t9, t14.
All three copies are now unused, leaving only the swap stores:

    B5: a[t2] := t5        B6: a[t2] := p
        a[t4] := t3            a[t1] := t3
        goto B2
Summary:
A fairly complex process, but described using simple steps that are sequenced and repeated until we get a good result. What more can we do? Where should we focus our efforts?

Loop Optimization:
Loops are an obvious source of repeated computation, and good candidates for optimization. The most important loop optimizations are:
1) Code motion: move loop-invariant code outside the loop.
2) Strength reduction: replace expensive operations with cheaper ones.
3) Induction variables: recognize relationships between the values of variables on each pass through the body of a loop.

1) Code Motion:
If we can decrease the amount of code in the body of a loop, then we can also decrease the execution time for each iteration of the loop. Avoid duplicated computation and you duplicate the savings instead! We need to find loop invariants: expressions that are guaranteed to have the same value on each pass through the loop.

Loop Invariants:
Start by looking for variables that do not change … then extend to expressions. For example, if x and y are invariant, then so are x+y and x*y. Use algebraic identities to increase the size of the loop-invariant expressions. For example, by rewriting (x-z)+y as (x+y)-z, we might be able to extract x+y as an invariant expression. But beware of aliasing! Expression a[i] is not necessarily invariant just because a and i are.
Moving the Code:
When we move the code, we might need to introduce new temporaries. For example, the C/C++ loop:
    for (i = 0; i < n * n; i++)
      ... code which doesn't change n ...
can be rewritten as:
    t1 = n * n;
    for (i = 0; i < t1; i++)
      ... code which doesn't change n ...
because the expression n*n is a loop invariant.

2) Strength Reduction:
Instead of using expensive multiplications, we can obtain the same results using simple shifts, if one of the operands is a constant power of 2. For example, in the quicksort code, we can replace t2 := 4*i and t4 := 4*j with the cheaper operations t2 := i << 2 and t4 := j << 2. The same idea can be used to simplify uses of other expensive operations, including / and %. Strength reduction is, however, a much more general technique, and more opportunities can be revealed if we are able to identify some induction variables.

3) Induction Variables:
An induction variable takes values in some arithmetic progression as we step through a loop. Variables that are used to control a loop often show this behavior:
    for (int j = 10; j<20; j++) …
And it can happen to other variables in the loop too:
    for (int j = 10; j<20; j++) {
      int jthOdd = 2*j + 1;
      …
    }

Induction Variables:
In the quicksort example, notice that:
- Every time that i increases by 1, the value of t2 = 4*i increases by 4.
- Every time that j decreases by 1, the value of t4 = 4*j decreases by 4.
As a result:
- Every time that i increases by 1, the address of a[t2] increases by 4.
- Every time that j decreases by 1, the address of a[t4] decreases by 4.
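A quick C check (illustrative) that maintaining t2 incrementally stays in step with recomputing 4*i on every iteration:

```c
/* Replace t2 := 4*i inside the loop with t2 := t2 + 4: an induction-
 * variable update. Returns 1 if the two formulations agree on every
 * iteration, 0 otherwise. */
static int induction_ok(int start, int steps) {
    int i = start;
    int t2 = 4 * i;               /* initialized once, before the loop */
    for (int k = 0; k < steps; k++) {
        i = i + 1;
        t2 = t2 + 4;              /* strength-reduced update */
        if (t2 != 4 * i) return 0;
    }
    return 1;
}
```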
Another Strength Reduction:
Instead of using expensive multiplications, why not initialize t2 := 4*i (and t4 := 4*j) at the beginning of the loop and then increment t2 (and decrement t4) on each iteration? This is another form of strength reduction.

Before (this is where we had got to previously):

    B1: i := m-1           B5: a[t2] := t5
        j := n                 a[t4] := t3
        t1 := 4*n              goto B2
        p := a[t1]
                           B6: a[t2] := p
    B2: i := i+1               a[t1] := t3
        t2 := 4*i
        t3 := a[t2]
        if t3<p goto B2

    B3: j := j-1
        t4 := 4*j
        t5 := a[t4]
        if t5>p goto B3

    B4: if i>=j goto B6

After (induction variables t2 and t4, and strength reduction):

    B1: i := m-1           B5: a[t2] := t5
        j := n                 a[t4] := t3
        t1 := 4*n              goto B2
        p := a[t1]
        t2 := 4 * i        B6: a[t2] := p
        t4 := 4 * j            a[t1] := t3

    B2: i := i+1
        t2 := t2 + 4
        t3 := a[t2]
        if t3<p goto B2

    B3: j := j-1
        t4 := t4 - 4
        t5 := a[t4]
        if t5>p goto B3

    B4: if i>=j goto B6

More Strength Reduction:
Consider the following loop:
    for (int i = 0; i<N; i++) {
      t = t + (i*i);
    }
This code does N multiplications and N additions. Multiplies are expensive: can we eliminate them?
Identifying Induction Variables:
As i takes on values 0, 1, 2, 3, 4, 5, 6, … so i*i takes values 0, 1, 4, 9, 16, 25, 36, … Note that: (i+1)*(i+1) = i*i + 2*i + 1. So we could rewrite the loop as:
    int u = 0;
    for (int i = 0; i<N; i++) {
      t = t + u;
      u = u + 2*i + 1;    // the 2*i can be reduced to a shift
    }

Differences of Differences:
We've seen that the difference from one value of (i*i) to the next is d = 2*i + 1. What is the difference from one value of d to the next? It is the constant 2, so we can rewrite the loop again as:
    int u = 0;
    int d = 1;
    for (int i = 0; i<N; i++) {
      t = t + u;
      u = u + d;
      d = d + 2;
    }
Longer, but each multiply has been replaced by two adds!
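The rewritten loops compute the same sum; here is a C sketch comparing the multiplication form against the differences-of-differences form:

```c
/* Sum of squares, directly: N multiplications and N additions. */
static int squares_mul(int N) {
    int t = 0;
    for (int i = 0; i < N; i++)
        t = t + i * i;
    return t;
}

/* Sum of squares with second differences: u tracks i*i and d tracks
 * 2*i+1, so every multiply becomes two additions. */
static int squares_diff2(int N) {
    int t = 0, u = 0, d = 1;
    for (int i = 0; i < N; i++) {
        t = t + u;
        u = u + d;
        d = d + 2;
    }
    return t;
}
```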

Choosing an Intermediate Code:
Our intermediate code separates the process of accessing an array element into two stages:
- Multiply the index by 4;
- Look up the value in the array.
On a 386, we can use a single instruction to accomplish the same thing (and it's fast):
    movl a(,%eax,4), %eax
Doesn't our choice of intermediate code force us to use the less efficient version?
    imull $4, %eax; movl a(%eax), %eax

On the other hand …
By including indexed addressing, perhaps we've chosen an intermediate code that is too high-level. Suppose that we used an intermediate code that has only a simple load instruction
    x := [y]
and in which the address calculation must be done explicitly …
Before (this is where we would have got to previously if we'd had only load instructions v := [u] and save instructions [v] := u):

    B1: i := m-1           B5: [u2] := t5
        j := n                 [u4] := t3
        t1 := 4*n              goto B2
        u1 := a + t1
        p := [u1]          B6: [u2] := p
                               [u1] := t3
    B2: i := i+1
        t2 := 4*i
        u2 := a + t2
        t3 := [u2]
        if t3<p goto B2

    B3: j := j-1
        t4 := 4*j
        u4 := a + t4
        t5 := [u4]
        if t5>p goto B3

    B4: if i>=j goto B6

After (now we've used an optimization based on the observation that the addresses of a[i] and a[j] are induction variables!):

    B1: i := m-1           B5: [u2] := t5
        j := n                 [u4] := t3
        t1 := 4*n              goto B2
        u1 := a + t1
        t2 := 4*i          B6: [u2] := p
        u2 := a + t2           [u1] := t3
        u4 := u1
        p := [u1]

    B2: i := i+1
        u2 := u2 + 4
        t3 := [u2]
        if t3<p goto B2

    B3: j := j-1
        u4 := u4 - 4
        t5 := [u4]
        if t5>p goto B3

    B4: if i>=j goto B6

The Right Level of Abstraction?
Designing an intermediate code is hard because it is difficult to get the right level of abstraction:
- Too high-level, and you will hide opportunities for optimization.
- Too low-level, and it will be harder to utilize advanced target instructions and addressing modes.
Increasing level of abstraction:
    simple loads and saves only -> indexed loads and saves -> indexed and scaled loads and saves

Instruction Selection:
Finding the best match between an intermediate program and the available machine instructions is the act of instruction selection. Together with register allocation, instruction selection constitutes the proper code generation phase in a compiler that uses an intermediate representation. If the intermediate language is low-level, instruction selection might involve mapping several intermediate operations onto a single target machine instruction. More on instruction selection next week.
Summary:
In this lecture we have seen:
- How optimization techniques can be used in practice.
- The role of intermediate code, illustrated by three-address code.
- Using flow graphs to capture the control flow in a particular program.
- Using basic blocks to coalesce the effects of multiple program statements into a single unit.
