Performance Evaluation
Why Performance Evaluation?
DESIGN
EVALUATION
2
Performance Evaluation Concepts
• Computer performance evaluation is based on:
– Throughput
– Execution time
• Processor performance evaluation
– Based on execution time of a program:
PerformanceX = 1/ Execution timeX
Relative performance: n = PerformanceX / PerformanceY
n > 1 => X is n times faster than Y
• Terminology
– Improve performance = increase performance
– Improve execution time = decrease execution time
Example
• Impact on throughput and Execution time
of:
– Using a faster processor
• Decrease in Execution time
• Increase in throughput
– Adding more processors to a system
• Increase in throughput
• Decrease in Execution time if the program parts can
run in parallel (if overhead is low)
CPU Performance
• CPU performance is measured in terms of time!
• CPU time deals with
– Number of instructions executed to complete a
job
– How many clock cycles are needed to execute a
single instruction
– The length of the clock cycle (clock cycle time)
5
CPU Performance Equation
• CPU time = CPU clock cycles * cycle time
= CPU clock cycles / clock rate
– CPU clock cycles = IC * CPI
• IC: instruction count (number of instructions per program)
• CPI: average cycles per instruction
CPU time = IC * CPI * cycle time
seconds instructions clock cycles seconds
* *
=
program program instruction clock cycle
Scope of Performance Sources
CPU time = IC * CPI * Cycle time
Program
Compiler
ISA
Microarchitecture
Hardware
Abstraction level interdependence
Scope of Performance Sources
CPU time = IC * CPI * Cycle time
Program
Compiler
ISA
Microarchitecture
Hardware
Abstraction level interdependence
Scope of Performance Sources
CPU time = IC * CPI * Cycle time
Program
Compiler
ISA
Microarchitecture
Hardware
Abstraction level interdependence
Example 1
• A program runs in 10 secs on a 3.5 GHz processor.
A designer wants to build a new computer that can
run the program in 6 secs by increasing the clock
frequency. However the average “new” CPI will be
1.2 times higher.
• What faster clock rate should the designer use?
10 IC * CPI / 3.5 GHz (current exec. time)
=
6 IC * 1.2 CPI / X GHz (target execution time)
Solve for X, to obtain X = ___ GHz
X= 1.2*3.5*10/6
CPU Performance Equation
• CPU clock cycles = Σi (CPIi * ICi)
– ICi : count of instructions of class i
– CPIi : cycles that takes to execute instructions of class i
Example 2
• Comparing two compiler code segments
Instruction class i CPIi for instruction class i
A 1
B 2
C 3
Code Instruction counts (ICi) for instruction class
sequence A B C
1 2 1 2
2 4 1 1
• Which will be faster? S1 = 2 . 1 + 1 . 2 + 2 . 3 = 10 cyc
S2 = 4 . 1 + 1 . 2 + 1 . 3 = 9 cyc
Example 3
• Suppose we have 2 implementations of the same
instruction set architecture. Computer A has a clock cycle
time of 10 nsec and a CPI of 2.0 for some program, and
computer B has a clock cycle time of 20 nsec and a CPI of
1.2 for the same program. Which machine is faster for this
program?
• Answer:
Assume the program requires I instructions to be executed:
CPU time A = I × 2.0 × 10 nsec = 20 × I nsec
CPU timeB = I × 1.2 × 20 nsec = 24 × I nsec
∴ Computer A is faster than computer B.
Example 4
• Machine A runs a program in 20 seconds
machine B runs the same program in 25
seconds
how many times faster is machine
A? 25 /20 = 1.25
Example 5
Assume that a benchmark has 100
instructions: 25 instructions are loads/stores
(each take 2 cycles) 50 instructions are adds
(each takes 1 cycle) 25 instructions are
square root (each takes 50 cycles)
What is the CPI for this benchmark?
CPI = ((0.25 * 2) + (0.50 * 1) + (0.25 * 50))
= 13.5
Amdahl’s Law
• Law of diminishing returns for acceleration
Execution time before improvement
Speedup =
Execution time after improvement
Example: Two cases - F1: 25% improved 2X, F2: 50% improved 2X
75 percent not improved F1
50 percent n.i. F2
Time →
1
Speedup =
(1 - fraction enhanced) + (fraction enhanced/factor of improvement)
Example
• A program runs in 10 seconds
• What is the speedup after a faster floating
point unit is incorporated?
FP Unit 5 times faster 5 seconds in FP operations
6 2
1
Speedup
Speedup = 1.9
5
(1 – Fraction FP) + FractionFP 1.8
4 5 1.7
1.6
3 1.5 1
Speedup =
2
1.4 0.5
1.3 0.5 + FP factor of improvement
1 1.2
Fraction of FP 1.1 FP Unit im provem ent
0 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Example
• A program runs in 10 seconds
• What is the speedup after a faster floating
point unit is incorporated?
FP Unit 5 times faster 5 seconds in FP operations
6 2
1
Speedup
Speedup = 1.9
5
(1 – Fraction FP) + FractionFP 1.8
4 5 1.7
1.6
3 1.5 1
Speedup =
2
1.4 0.5
1.3 0.5 + FP factor of improvement
1 1.2
Fraction of FP 1.1 FP Unit im provem ent
0 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Amdahl’s Law - example
• A program runs in 100 seconds on a computer, with
multiply operations responsible for 80 seconds of this time
• How much do I have to improve the speed of
multiplication, if I want my program to run 5 times faster?
• Timeimproved = Timeunaffected + Timeaffected/(Improvement
Factor)
• 20 s = 20 s + 80 s / n
• 0 = 80 s / n
• There is no amount by which we can improve multiply to
achieve a fivefold increase in performance!
19
Amdahl’s Law
• On a large system, suppose we can upgrade a
CPU to make it 50% faster for $10,000 or
upgrade its disk drives for $7,000 to make them
150% faster.
• Processes spend 70% of their time running in the
CPU and 30% of their time waiting for disk
service.
• An upgrade of which component would offer the
greater benefit for the lesser cost?
• The processor option offers a 30% speedup:
• And the disk drive option gives a 22% speedup:
• Each 1% of improvement for the processor costs
$333 (10000/30), and for the disk a 1%
improvement costs $318 (7000/22).