0% found this document useful (0 votes)
51 views15 pages

Lecture 3: Evaluating Computer Architectures: - Announcements - Last Time - Technology Back Ground

This lecture discusses evaluating computer performance. It introduces different metrics used to measure performance, including execution time, throughput, and cycles per instruction. When comparing two systems, it is important to consider whether latency or throughput is more important. The lecture also discusses what types of programs are best for performance evaluation, including actual workloads, full applications, kernel benchmarks, and microbenchmarks, each with their own pros and cons. A brief history of benchmarking is also provided. The lecture warns that direct comparisons between systems can be compromised if the benchmark code is not carefully designed to measure the intended aspects of performance.

Uploaded by

Fazal Jadoon
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views15 pages

Lecture 3: Evaluating Computer Architectures: - Announcements - Last Time - Technology Back Ground

This lecture discusses evaluating computer performance. It introduces different metrics used to measure performance, including execution time, throughput, and cycles per instruction. When comparing two systems, it is important to consider whether latency or throughput is more important. The lecture also discusses what types of programs are best for performance evaluation, including actual workloads, full applications, kernel benchmarks, and microbenchmarks, each with their own pros and cons. A brief history of benchmarking is also provided. The lecture warns that direct comparisons between systems can be compromised if the benchmark code is not carefully designed to measure the intended aspects of performance.

Uploaded by

Fazal Jadoon
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

Lecture 3: Evaluating Computer Architectures

Announcements
- Reminder: Homework 1 due Thursday 2/2

Last Time technology back ground


Computer elements
Circuits and timing
Virtuous cycle of the past and future?

Today
What is computer performance?
What programs do I care about?
Performance equations
Amdahls Law

UTCS CS352 Lecture 3 1

Software & Hardware: The Virtuous Cycle?

Faster Single
Processor
Frequency Scaling Larger, More
Capable Software
Managed Languages

?
More Cores Scalable Software
Scalable Apps +
Multi/Many Core
Scalable Runtime

UTCS CS352 Lecture 3 2

1
Performance Hype

AMD Performance Preview: Taking Phenom II to 4.2 GHz


Intel Core i78 processing threads They are the best
desktop processor family on the planet.
With 8 cores, each supporting 4 threads, the UltraSPARC T1 processor
executes 32 simultaneous threads within a design consuming only 72 watts of power.
improves throughput by up to 41x
speed up by 10-25% in many cases
about 2x in two cases
more than 10x in two small benchmarks
speedups of 1.2x to 6.4x on a variety of benchmarks
can reduce garbage collection time by 50% to 75%
demonstrating high efficiency and scalability
our prototype has usable performance
speedups. are very significant (up to 54-fold)
sometimes more than twice as fast
our . is better or almost as good as . across the board
UTCS CS352 Lecture 3 3

What Does this Graph Mean?


Performance Trends on SPEC Int 2000

UTCS CS352 Lecture 3 4

2
Computer Performance Evaluation

Metric = something we measure


Goal: Evaluate how good/bad a design is
Examples
Clock rate of computer
Power consumed by a program
Execution time for a program
Number of programs executed per second
Cycles per program instruction
How should we compare two computer systems?

UTCS CS352 Lecture 3 5

Tradeoff: latency vs. throughput

Pizza delivery
Do you want your pizza hot?

Or do you want your pizza to be inexpensive?

Two different delivery strategies for pizza company!

This course focuses primarily on latency (hot pizza)

Latency = execution time for a single task


Throughput = number of tasks per unit time

UTCS CS352 Lecture 3 6

3
Two notions of performance
Throughput
Plane DC to Paris Speed Passengers
(pmph)

Boeing 747 6.5 hours 610 mph 470 286,700

Concorde 3 hours 1350 mph 132 178,200

Which has plane higher performance?


Time to do the task (Execution Time)
execution time, response time, latency
Tasks per day, hour, week, sec, ns. .. (Performance)
throughput, bandwidth

UTCS CS352 Slide courtesy of D. Patterson Lecture 3 7

Definitions

Performance is in units of things-per-second


bigger is better
Response time of a system Y running program Z
performance (Y) = 1
execution time (Z on Y)
Throughput of system Y running many programs
performance (Y) = number of programs
unit time

" System X is n times faster than Y" means


n = performance(X)
performance(Y)

UTCS CS352 Slide courtesy of D. Patterson Lecture 3 8

4
Definitions

Performance is in units of things-per-second


bigger is better
Response time of a system Y running program Z
performance (Y) = 1
execution time (Z on Y)
Throughput of system Y running many programs
performance (Y) = number of programs
unit time

" System X is n times faster than Y" means


n = performance(X)
performance(Y)

UTCS CS352 Slide courtesy of D. Patterson Lecture 3 9

Which Programs Should I Measure?

UTCS CS352 Slide courtesy of D. Patterson Lecture 3 10

5
Which Programs Should I Measure?
Pros Cons

Actual Target Workload

Full Application Benchmarks

Small Kernel
Benchmarks

Microbenchmarks

UTCS CS352 Slide courtesy of D. Patterson Lecture 3 11

Which Programs Should I Measure?


Pros Cons
very specific
representative Actual Target Workload non-portable
difficult to run, or
measure
hard to identify cause
portable
widely used Full Application Benchmarks less representative
improvements
useful in reality

Small Kernel
easy to run, early in easy to fool
Benchmarks
design cycle

identify peak peak may be a long


capability and Microbenchmarks way from application
potential bottlenecks performance
UTCS CS352 Slide courtesy of D. Patterson Lecture 3 12

6
Brief History of Benchmarking

Early days (1960s) Real Applications (1989-now)


Single instruction execution SPEC CPU C/Fortran
time Scientific, Irregular
Average instruction time 89, 92, 95, 00, 07, ??
[Gibson 1970]
TPC C: Transaction Processing
Pure MIPS (1/AIT)
SPECWeb
WinBench: Desktop
Simple programs(early 70s)
Graphics C/C++
Synthetic benchmarks
Quake III, Doom 3
(Whetstone, etc.)
MediaBench
Kernels (Livermore Loops)
Java: SPECJVM98
Relative Performance (late 70s) Problem: Programming Language
Parallel?, Java, C#, JavaScript??
VAX 11/780 1-MIPS
DaCapo Java Benchmarks 06, 09
but was it?
Parsec: Parallel C/C++, 2008
MFLOPs

UTCS CS352 Lecture 3 13

How to Compromise a Comparison:


C programs running on two architectures

UTCS CS352 Lecture 3 14

7
The compiler reorganized the code!

Change the memory system performance


Matrix multiply cache blocking

After

Before

UTCS CS352 Lecture 3 15

There are lies, damn lies, and statistics


Desraeli

UTCS CS352 Lecture 3 16

8
benchmarks

There are lies, damn lies, and statistics
Desraeli

UTCS CS352 Lecture 3 17

Benchmarking Java Programs

Lets consider the performance of the DaCapo


Java Benchmarks
What do we need to think about when comparing
two computers running Java programs?

http://dacapo.anu.edu.au/regression/perf
/2006-10-MR2.html

UTCS CS352 Lecture 3 18

9
Pay Attention to Benchmarks & System

Benchmarks measure the Benchmark timings are


whole system sensitive
application alignment in cache
compiler, VM, memory location of data on disk
management values of data
operating system
architecture Danger of inbreeding or
implementation positive feedback
Popular benchmarks often if you make an operation
reflect yesterdays fast (slow) it will be used
programs more (less) often
what about the programs therefore you make it
people are running today? faster (slower)
need to design for and so on, and so on
tomorrows problems the optimized NOP
UTCS CS352 Lecture 3 19

Performance Summary so Far

Key concepts
Throughput and Latency

Best benchmarks are real programs


DaCapo, Spec, TPC, Doom3

Pitfalls
Whole system measurement
Workload may not match users
Compiler, VM, memory management

Next
Amdahls Law

UTCS CS352 Lecture 3 20

10
Improving Performance: Fundamentals

Suppose we have a machine with two instructions


Instruction A executes in 100 cycles
Instruction B executes in 2 cycles

We want better performance.


Which instruction do we improve?

UTCS CS352 Lecture 3 21

Speedup

Make a change to an architecture


Measure how much faster/slower it is

UTCS CS352 Lecture 3 22

11
Speedup when we know details about the change

Performance improvements depend on:


how good is enhancement (factor S)
how often is it used (fraction p)
Speedup due to enhancement E:
ExTime w/out E Perf w/ E
Speedup(E) = =
ExTime w/ E Perf w/out E

UTCS CS352 Lecture 3 23

Amdahls Law: Example

FP instructions improved by 2x
But.only 10% of instructions are FP

0.1
ExTimenew = ExTimeold 0.9 + = 0.95 ExTimeold
2

Amdahls Law: Speedup bounded by

UTCS CS352 Lecture 3 24

12
How Does Amdahls Law Apply to Multicore?

Given N cores what is our ideal speedup?

UTCS CS352 Lecture 3 25

How Does Amdahls Law Apply to Multicore?

Given N cores what is our ideal speedup?


ExTimenew = ExTimeold /N

Say 90% of the code is parallel and N = 16?


UTCS CS352 Lecture 3 26

13
How Does Amdahls Law Apply to Multicore?

Given N cores what is our ideal speedup?


ExTimenew = ExTimeold /N

Say 90% of the code is parallel and N = 16?


p
ExTimenew = ExTimeold (1 p) +
N
0.9
ExTimenew = ExTimeold 0.1+ = 0.15625 ExTimeold
16

1
Speeduptotal = = 6.2
0.15625

UTCS CS352 Lecture 3 27


How Does Amdahls Law Apply to Multicore?

UTCS CS352 Lecture 3 28

14
Performance Summary so Far

Amdahls law: Pay attention to what are you speeding up.

Next Time
More on Performance
Cycles per Instruction
Means
Start: Instruction Set Architectures (ISA)
Read: P&H 2.1 2.5
Turn in your homework at the beginning of class

UTCS CS352 Lecture 3 29

15

You might also like