L08
02/03/2020
CS104: Computer Organization
Lecture 08, 2nd March 2020
Manojit Ghose
Computer Organization (IS F242)
Dip Sankar
Lecture Banerjee
1 : Introductory Thoughts
Dip Sankar Banerjee
dsb@hyderabad.bits-pilani.ac.in
Department
Indian Institute of CS &Technology,
of Information IS Guwahati
Jan-Apr 2020
L08
02/03/2020
CPU Performance Evaluation (CPI)
• Most computers run synchronously utilizing a CPU clock running at a
constant clock rate: Or clock frequency: f Clock cycle
where: Clock rate = 1 / clock cycle
f = 1 /C cycle 1 cycle 2 cycle 3
• The CPU clock rate depends on the specific CPU organization (design) and hardware
implementation technology (VLSI) used.
• A computer machine (ISA) instruction is comprised of a number of elementary or
micro operations which vary in number and complexity depending on the the
instruction and the exact CPU organization (Design).
– A micro operation is an elementary hardware operation that can be performed
during one CPU clock cycle.
– This corresponds to one micro-instruction in microprogrammed CPUs.
– Examples: register operations: shift, load, clear, increment, ALU operations:
add , subtract, etc.
• Thus: A single machine instruction may take one or more CPU cycles to complete
termed as the Cycles Per Instruction (CPI). Instructions Per Cycle = IPC = 1/CPI
• Average (or effective) CPI of a program: The average CPI of all instructions executed
in the program on a given CPU design.
Cycles/sec = Hertz = Hz
MHz = 106 Hz GHz = 109 Hz
L08
02/03/2020
Generic CPU Machine Instruction Processing Steps
Obtain instruction from program memory
Instruction
The Program Counter (PC) points to the instruction to be processed
Fetch
Instruction Determine required actions and instruction size
Decode
Operand Locate and obtain operand data
From data memory or registers
Fetch
Compute result value or status
Execute
Deposit results in storage (data memory or
Result
register) for later use
Store
Next Determine successor or next instruction
(i.e Update PC to fetch next instruction to be processed)
Instruction
CPI = Cycles per instruction
L08
02/03/2020
Computer Performance Measures:
Program Execution Time
• For a specific program compiled to run on a specific machine (CPU)
“A”, has the following parameters:
– The total executed instruction count of the program. I
– The average number of cycles per instruction (average CPI). CPI
– Clock cycle of machine “A” C/CC Or effective CPI
• How can one measure the performance of this machine (CPU) running this
program?
– Intuitively the machine (or CPU) is said to be faster or has better
performance running this program if the total execution time is
shorter.
– Thus the inverse of the total measured program execution time is a
possible performance measure or metric:
Seconds/program
Programs/second PerformanceA = 1 / Execution TimeA
How to compare performance of different machines?
What factors affect performance? How to improve performance?
L08
02/03/2020
Comparing Computer Performance Using
Execution Time
• To compare the performance of two machines (or CPUs) “A”, “B” running a
given specific program: The two CPUs may target
PerformanceA = 1 / Execution TimeA different ISAs provided
the program is written in a high
PerformanceB = 1 / Execution TimeB
level language (HLL)
• Machine A is n times faster than machine B means (or slower? if n < 1) :
PerformanceA Execution TimeB
Speedup = n = =
PerformanceB Execution TimeA
• Example: (i.e Speedup is ratio of performance, no units)
For a given program:
Execution time on machine A: ExecutionA = 1 second
Execution time on machine B: ExecutionB = 10 seconds
Speedup= PerformanceA / PerformanceB = Execution TimeB / Execution TimeA
= 10 / 1 = 10
The performance of machine A is 10 times the performance of
machine B when running this program, or: Machine A is said to be 10
times faster than machine B when running this program.
L08
02/03/2020
CPU Execution Time: The CPU Equation
• A program is comprised of a number of instructions executed , I
– Measured in: instructions/program AKA Dynamic Executed Instruction Count
• The average instruction executed takes a number of cycles per
instruction (CPI) to be completed.
Or Instructions Per Cycle (IPC):
– Measured in: cycles/instruction, CPI IPC = 1/CPI
• CPU has a fixed clock cycle time C = 1/clock rate C = 1/f
– Measured in: seconds/cycle
• CPU execution time is the product of the above three
parameters as follows: Executed
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
T = I x CPI x C
execution Time Number of Average CPI for program CPU Clock Cycle
per program in seconds instructions executed
(This equation is commonly known as the CPU performance equation)
L08
02/03/2020
CPU Average CPI/Execution Time
For a given program executed on a given machine (CPU):
CPI = Total program execution cycles / Instructions count
(i.e average or effective CPI) Executed (I)
CPU clock cycles = Instruction count x CPI
(executed, I)
CPU execution time =
= CPU clock cycles x Clock cycle
= Instruction count x CPI x Clock cycle
T = I x CPI x C
Average
execution Time Number of or effective CPU Clock Cycle
per program in seconds instructions executed CPI for
program
(This equation is commonly known as the CPU performance equation)
CPI = Cycles Per Instruction
L08
02/03/2020
CPU Execution Time: Example
• A Program is running on a specific machine (CPU) with the
following parameters: I
– Total executed instruction count: 10,000,000
instructions
– Average CPI for the program: 2.5 cycles/instruction.
i.e 5 nanoseconds
– CPU clock rate: 200 MHz. (clock cycle = C = 5x10-9
seconds)
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
• What is the execution time for this program:
CPU time = Instruction count x CPI x Clock cycle
= 10,000,000 x 2.5 x 1 / clock rate
= 10,000,000 x 2.5 x 5x10-9
= 0.125 seconds
Nanosecond = nsec =ns = 10-9 second
MHz = 106 Hz T = I x CPI x C
L08
02/03/2020
Factors Affecting CPU Performance
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
Average
T = I x CPI x C
Instruction Cycles per Clock Rate
Count Instruction (1/C)
Program
Compiler
Instruction Set
Architecture (ISA)
Organization
(CPU Design)
Technology
(VLSI)
T = I x CPI x C
L08
02/03/2020
Aspects of CPU Execution Time
CPU Time = Instruction count executed x CPI x Clock cycle
T = I x CPI x C Depends on:
Program Used
Compiler
ISA
Instruction Count I
(executed)
Depends on:
Program Used Depends on:
Compiler CPI Clock CPU Organization
ISA (Average Cycle Technology (VLSI)
CPU Organization
CPI) C
L08
02/03/2020
Performance Comparison: Example
• From the previous example: A Program is running on a specific machine
(CPU) with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz. Thus: C = 1/(200x106)= 5x10-9 seconds
• Using the same program with these changes:
– A new compiler used: New executed instruction count, I: 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHz
Thus: C = 1/(300x106)= 3.33x10-9 seconds
• What is the speedup with the changes?
Speedup = Old Execution Time = Iold x CPIold x Clock cycleold
New Execution Time Inew x CPInew x Clock Cyclenew
Speedup = (10,000,000 x 2.5 x 5x10-9) / (9,500,000 x 3 x 3.33x10-9 )
= .125 / .095 = 1.32
or 32 % faster after changes.
Clock Cycle = C = 1/ Clock Rate T = I x CPI x C
L08
02/03/2020
Instruction Types & CPI
• Given a program with n types or classes of instructions
executed on a given CPU with the following characteristics:
e.g ALU, Branch etc.
Ci = Count of instructions of typei executed
i = 1, 2, …. n
CPIi = Cycles per instruction for typei
Then: Depends on CPU Design
CPI = CPU Clock Cycles / Instruction Count I
i.e average or effective CPI
Where:
Executed
CPI
n
CPU clock cycles i
Ci
i 1
Executed Instruction Count I = S Ci
T = I x CPI x C
L08
02/03/2020
Instruction Types & CPI: An Example
• An instruction set has three instruction classes:
Instruction class CPI
A 1 For a specific
e.g ALU, Branch etc. B 2 CPU design
C 3
• Two code sequences have the following instruction counts:
Program Instruction counts for instruction class
Code Sequence A B C
1 2 1 2
2 4 1 1
• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
CPI for sequence 1 = clock cycles / instruction count
i.e average or effective CPI = 10 /5 = 2
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
CPI for sequence 2 = 9 / 6 = 1.5
CPI C
n
CPU clock cycles i i
CPI = CPU Cycles / I
i 1
L08
02/03/2020
Instruction Frequency & CPI
• Given a program with n types or classes of instructions
with the following characteristics:
i = 1, 2, …. n
Ci = Count of instructions of typei executed
CPIi = Average cycles per instruction of typei
Fi = Frequency or fraction of instruction typei executed
= Ci/ total executed instruction count = Ci/ I
Where: Executed Instruction Count I = S Ci
Then:
CPI CPI i F i
n
i 1
i.e average or effective CPI
Fraction of total execution time for instructions of type i = CPIi x Fi
CPI
T = I x CPI x C
L08
02/03/2020
Instruction Type Frequency & CPI
Program Profile or Executed Instructions Mix CPIi x Fi
CPI
Base Machine (Reg / Reg) Depends on CPU Design
Op Freq, Fi CPIi CPIi x Fi % Time
ALU 50% 1 .5 23% = .5/2.2
Given
Load 20% 5 1.0 45% = 1/2.2
Store 10% 3 .3 14% = .3/2.2
Branch 20% 2 .4 18% = .4/2.2
Sum = 2.2
Typical Mix
CPI CPI i F i
n
i.e average or effective CPI i 1
CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2 = 2.2
= .5 + 1 + .3 + .4
T = I x CPI x C
L08
02/03/2020
Metrics of Computer Performance
(Measures)
Application Execution time: Target workload,
SPEC, etc.
Programming
Language
Compiler
(millions) of Instructions per second – MIPS
(millions) of (F.P.) operations per second – MFLOP/s
ISA
Datapath
Control Megabytes per second.
Function Units
Transistors Wires Pins Cycles per second (clock rate).
Each metric has a purpose, and each can be misused.
L08
02/03/2020
Choosing Programs To Evaluate Performance
Levels of programs or benchmarks that could be used to evaluate
performance:
– Actual Target Workload: Full applications that run on the target
machine.
– Real Full Program-based Benchmarks:
• Select a specific mix or suite of programs that are typical of targeted
applications or workload (e.g SPEC95, SPEC CPU2000).
– Small “Kernel” Benchmarks: Also called synthetic benchmarks
• Key computationally-intensive pieces extracted from real programs.
– Examples: Matrix factorization, FFT, tree search, etc.
• Best used to test specific aspects of the machine.
– Microbenchmarks:
• Small, specially written programs to isolate a specific aspect of
performance characteristics: Processing: integer, floating point, local
memory, input/output, etc.
L08
02/03/2020
Types of Benchmarks
Pros Cons
• Very specific.
• Representative Actual Target Workload • Non-portable.
• Complex: Difficult
to run, or measure.
• Portable.
• Widely used. • Less representative
Full Application Benchmarks
• Measurements than actual workload.
useful in reality.
Small “Kernel” • Easy to “fool” by
• Easy to run, early in designing hardware
the design cycle. Benchmarks
to run them well.
• Peak performance
• Identify peak results may be a long
performance and Microbenchmarks
way from real application
potential bottlenecks. performance
L08
02/03/2020
SPEC: System Performance Evaluation Corporation
The most popular and industry-standard set of CPU benchmarks.
• SPECmarks, 1989: Programs application domain: Engineering and scientific computation
– 10 programs yielding a single number (“SPECmarks”).
• SPEC92, 1992:
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs).
• SPEC95, 1995:
– SPECint95 (8 integer programs):
• go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
– SPECfp95 (10 floating-point intensive programs):
• tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fppp, wave5
– Performance relative to a Sun SuperSpark I (50 MHz) which is given a score of SPECint95 =
SPECfp95 = 1
• SPEC CPU2000, 1999:
– CINT2000 (11 integer programs). CFP2000 (14 floating-point intensive programs)
– Performance relative to a Sun Ultra5_10 (300 MHz) which is given a score of SPECint2000 =
SPECfp2000 = 100
• SPEC CPU2006, 2006:
– CINT2006 (12 integer programs). CFP2006 (17 floating-point intensive programs)
– Performance relative to a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II
processor which is given a score of SPECint2006 = SPECfp2006 = 1
All based on execution time and give speedup over a reference CPU
L08
02/03/2020
Example Problem
• A 1 GHz processor takes 100 seconds to execute a program,
while consuming 70 W of dynamic power and 30 W of
leakage power. Does the program consume less energy
in Turbo boost mode when the frequency is increased to
1.2 GHz?
20
L08
02/03/2020
Example Problem
• A 1 GHz processor takes 100 seconds to execute a program,
while consuming 70 W of dynamic power and 30 W of
leakage power. Does the program consume less energy
in Turbo boost mode when the frequency is increased to
1.2 GHz?
Normal mode energy = 100 W x 100 s = 10,000 J
Turbo mode energy = (70 x 1.2 + 30) x 100/1.2 = 9,500 J
Note:
Frequency only impacts dynamic power, not leakage power.
We assume that the program’s CPI is unchanged when
frequency is changed, i.e., exec time varies linearly
with cycle time. 21