Lecture 3: Evaluating Computer Architectures

This lecture discusses evaluating computer performance. It introduces different metrics used to measure performance, including execution time, throughput, and cycles per instruction. When comparing two systems, it is important to consider whether latency or throughput is more important. The lecture also discusses what types of programs are best for performance evaluation, including actual workloads, full applications, kernel benchmarks, and microbenchmarks, each with their own pros and cons. A brief history of benchmarking is also provided. The lecture warns that direct comparisons between systems can be compromised if the benchmark code is not carefully designed to measure the intended aspects of performance.

Uploaded by

Fazal Jadoon

Announcements
- Reminder: Homework 1 due Thursday 2/2

Last Time: technology background

Computer elements
Circuits and timing
Virtuous cycle of the past and future?

Today
What is computer performance?
What programs do I care about?
Performance equations
Amdahl's Law

UTCS CS352 Lecture 3 1

Software & Hardware: The Virtuous Cycle?

[Figure: the software/hardware virtuous cycle. Past: Frequency Scaling, Faster Single Processor, Larger, More Capable Software, Managed Languages. Future (?): Multi/Many Core, More Cores, Scalable Software (Scalable Apps + Scalable Runtime).]

Performance Hype

AMD Performance Preview: Taking Phenom II to 4.2 GHz

"Intel Core i7 ... 8 processing threads ... They are the best desktop processor family on the planet."
"With 8 cores, each supporting 4 threads, the UltraSPARC T1 processor executes 32 simultaneous threads within a design consuming only 72 watts of power."
"... improves throughput by up to 41x"
"... speed up by 10-25% in many cases ..."
"... about 2x in two cases ..."
"... more than 10x in two small benchmarks ..."
"... speedups of 1.2x to 6.4x on a variety of benchmarks"
"... can reduce garbage collection time by 50% to 75%"
"... demonstrating high efficiency and scalability"
"... our prototype has usable performance"
"... speedups ... are very significant (up to 54-fold)"
"... sometimes more than twice as fast"
"... our ... is better or almost as good as ... across the board"

What Does this Graph Mean?

Performance Trends on SPEC Int 2000

Computer Performance Evaluation

Metric = something we measure

Goal: Evaluate how good/bad a design is
Examples
Clock rate of computer
Power consumed by a program
Execution time for a program
Number of programs executed per second
Cycles per program instruction
How should we compare two computer systems?

Tradeoff: latency vs. throughput

Pizza delivery
Do you want your pizza hot?

Or do you want your pizza to be inexpensive?

Two different delivery strategies for pizza company!

This course focuses primarily on latency (hot pizza)

Latency = execution time for a single task

Throughput = number of tasks per unit time

Two notions of performance
Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200

Which plane has higher performance?

Time to do the task (Execution Time)
execution time, response time, latency
Tasks per day, hour, week, sec, ns, ... (Performance)
throughput, bandwidth
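
The two notions of performance in the table can be checked with a little arithmetic. The sketch below is illustrative only: the `Plane` class and its field names are made up for this example, and the numbers are the ones given in the table (pmph is taken to mean passenger-miles per hour, which matches the table's values).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    name: str
    trip_hours: float  # DC to Paris flight time (latency)
    speed_mph: float
    passengers: int

    @property
    def throughput_pmph(self) -> float:
        """Passenger-miles per hour: passengers x speed."""
        return self.passengers * self.speed_mph


boeing = Plane("Boeing 747", 6.5, 610, 470)
concorde = Plane("Concorde", 3.0, 1350, 132)

# Latency view: ratio of trip times (a smaller time is better).
latency_ratio = boeing.trip_hours / concorde.trip_hours

# Throughput view: ratio of passenger-miles per hour (bigger is better).
throughput_ratio = boeing.throughput_pmph / concorde.throughput_pmph

print(f"Concorde is {latency_ratio:.2f}x faster per trip")
print(f"747 has {throughput_ratio:.2f}x the throughput")
```

Which plane "wins" depends on the metric: the Concorde on latency, the 747 on throughput.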

UTCS CS352 Slide courtesy of D. Patterson Lecture 3 7

Definitions

Performance is in units of things-per-second

bigger is better
Response time of a system Y running program Z
$$\text{performance}(Y) = \frac{1}{\text{execution time}(Z \text{ on } Y)}$$
Throughput of system Y running many programs
$$\text{performance}(Y) = \frac{\text{number of programs}}{\text{unit time}}$$

"System X is n times faster than Y" means

$$n = \frac{\text{performance}(X)}{\text{performance}(Y)}$$
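
As a small worked example of the definition, the sketch below computes n from two execution times. The function names and the 10 s / 15 s timings are hypothetical, chosen only to illustrate the ratio.

```python
def performance(execution_time_s: float) -> float:
    """Performance for a single program: 1 / execution time."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that 'X is n times faster than Y'."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings: program Z takes 10 s on X and 15 s on Y.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is the reciprocal of execution time, n is equivalently the execution time on Y divided by the execution time on X.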

Which Programs Should I Measure?

Actual Target Workload
  Pros: representative
  Cons: very specific; non-portable; difficult to run or measure; hard to identify cause

Full Application Benchmarks
  Pros: portable; widely used; improvements useful in reality
  Cons: less representative

Small Kernel Benchmarks
  Pros: easy to run, early in design cycle
  Cons: easy to fool

Microbenchmarks
  Pros: identify peak capability and potential bottlenecks
  Cons: peak may be a long way from application performance

Brief History of Benchmarking

Early days (1960s)
  Single instruction execution time
  Average instruction time [Gibson 1970]
  Pure MIPS (1/AIT)

Simple programs (early 70s)
  Synthetic benchmarks (Whetstone, etc.)
  Kernels (Livermore Loops)

Relative Performance (late 70s)
  VAX 11/780 = 1-MIPS, but was it?
  MFLOPs

Real Applications (1989-now)
  SPEC CPU C/Fortran: Scientific, Irregular (89, 92, 95, 00, 07, ??)
  TPC C: Transaction Processing
  SPECWeb
  WinBench: Desktop
  Graphics C/C++: Quake III, Doom 3
  MediaBench
  Java: SPECJVM98
  Problem: Programming Language (Parallel?, Java, C#, JavaScript??)
  DaCapo Java Benchmarks 06, 09
  Parsec: Parallel C/C++, 2008

How to Compromise a Comparison:

C programs running on two architectures

The compiler reorganized the code!

Change the memory system performance

Matrix multiply cache blocking

[Figure: matrix multiply code before and after cache blocking]
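
The slide's actual code did not survive extraction, so the following is only a sketch of what a cache-blocking transformation of matrix multiply looks like. The function names and the `block` parameter are assumptions for illustration. Pure Python will not show the cache benefit; the point is the loop restructuring, which computes the same result while touching memory one tile at a time.

```python
def matmul_naive(a, b):
    """Textbook i-j-k matrix multiply on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same arithmetic, iterated tile by tile (cache blocking).

    The three outer loops walk over block x block tiles; the three
    inner loops do the multiply-accumulate within one tile.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, p, block):
            for kk in range(0, m, block):
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + block, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_naive(a, b))
print(matmul_blocked(a, b, block=2))
```

If a compiler applies this kind of transformation on one machine but not the other, the benchmark ends up comparing compilers and memory systems as much as architectures.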

There are lies, damn lies, and statistics

Disraeli

There are lies, damn lies, and benchmarks
(adapted from the quote attributed to Disraeli)

Benchmarking Java Programs

Let's consider the performance of the DaCapo Java Benchmarks
What do we need to think about when comparing two computers running Java programs?

http://dacapo.anu.edu.au/regression/perf/2006-10-MR2.html

Pay Attention to Benchmarks & System

Benchmarks measure the whole system
  application
  compiler, VM, memory management
  operating system
  architecture
  implementation

Popular benchmarks often reflect yesterday's programs
  what about the programs people are running today?
  need to design for tomorrow's problems

Benchmark timings are sensitive
  alignment in cache
  location of data on disk
  values of data

Danger of inbreeding or positive feedback
  if you make an operation fast (slow) it will be used more (less) often
  therefore you make it faster (slower)
  and so on, and so on
  the optimized NOP
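
Because timings are sensitive to factors outside the program, a single measurement says little. The sketch below is one minimal way to expose run-to-run variation, assuming a hypothetical `workload` function standing in for a real benchmark; it uses only `time.perf_counter` and `statistics.median` from the Python standard library.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Time fn() several times and return the list of wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report min, median, and max so the run-to-run spread is visible."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def workload():
    # Hypothetical stand-in for a real benchmark.
    return sum(i * i for i in range(100_000))


stats = summarize(time_runs(workload, repeats=5))
print(stats)
```

Reporting the spread rather than one number makes it easier to tell a real difference between two systems from measurement noise.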

Performance Summary so Far

Key concepts
Throughput and Latency

Best benchmarks are real programs

DaCapo, SPEC, TPC, Doom 3

Pitfalls
Whole system measurement
Workload may not match user's
Compiler, VM, memory management

Next
Amdahl's Law

Improving Performance: Fundamentals

Suppose we have a machine with two instructions

Instruction A executes in 100 cycles
Instruction B executes in 2 cycles

We want better performance.

Which instruction do we improve?
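
The answer depends on how often each instruction executes, which the slide leaves open. The sketch below works through two hypothetical instruction mixes (the counts are assumptions, not from the lecture) and compares the effect of making either instruction twice as fast.

```python
# Cycles per instruction type, as given on the slide.
CYCLES = {"A": 100, "B": 2}


def total_cycles(counts, cycles=CYCLES):
    """Total cycles = sum over instruction types of count x cycles each."""
    return sum(counts[op] * cycles[op] for op in counts)


def speedup_if_halved(op, counts):
    """Overall speedup from making one instruction type twice as fast."""
    improved = dict(CYCLES)
    improved[op] = CYCLES[op] / 2
    return total_cycles(counts) / total_cycles(counts, improved)


# Two hypothetical instruction mixes (counts are assumptions).
mostly_b = {"A": 1, "B": 1000}
mostly_a = {"A": 1000, "B": 1}

for name, mix in [("mostly B", mostly_b), ("mostly A", mostly_a)]:
    print(name, {op: round(speedup_if_halved(op, mix), 3) for op in CYCLES})
```

With the B-heavy mix, improving the 2-cycle instruction helps far more than improving the 100-cycle one; with the A-heavy mix the reverse holds.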

Speedup

Make a change to an architecture

Measure how much faster/slower it is

Speedup when we know details about the change

Performance improvements depend on:

how good the enhancement is (factor S)
how often it is used (fraction p)
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/out } E}{\text{ExTime w/ } E} = \frac{\text{Perf w/ } E}{\text{Perf w/out } E}$$

Amdahl's Law: Example

FP instructions improved by 2x
But only 10% of instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{old}$$

Amdahl's Law: Speedup bounded by $\frac{1}{1 - p}$, the reciprocal of the fraction of time that is not enhanced
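
The FP example can be reproduced numerically. This is a minimal sketch, with a hypothetical function name, that treats the 10% of instructions as 10% of execution time (an assumption the slide's arithmetic also makes) and uses p for the enhanced fraction and s for the enhancement factor.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of execution time is sped up by factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# The slide's example: 10% of execution time is FP, and FP gets 2x faster.
fp_speedup = amdahl_speedup(p=0.1, s=2.0)
print(f"overall speedup: {fp_speedup:.3f}")

# Even an arbitrarily fast FP unit cannot beat 1 / (1 - p).
bound = 1.0 / (1.0 - 0.1)
print(f"upper bound:     {bound:.3f}")
```

A 2x improvement to 10% of the work yields only about a 5% overall gain, and no FP improvement, however large, can exceed the bound.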

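How Does Amdahl's Law Apply to Multicore?

Given N cores what is our ideal speedup?

$$\text{ExTime}_{new} = \frac{\text{ExTime}_{old}}{N}$$

Say 90% of the code is parallel and N = 16?

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left((1 - p) + \frac{p}{N}\right)$$

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left(0.1 + \frac{0.9}{16}\right) = 0.15625 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{total} = \frac{1}{0.15625} = 6.4$$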
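
To see how the speedup behaves as cores are added, the sketch below sweeps N for a fixed parallel fraction. The function name and the chosen core counts are assumptions for illustration; p = 0.9 is the value used on the slide.

```python
def multicore_speedup(p: float, n: int) -> float:
    """Amdahl's Law for n cores: the parallel fraction p runs n times faster."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


p = 0.9  # fraction of execution time that is parallel, as on the slide
for n in (1, 2, 4, 16, 64, 1024):
    print(f"N = {n:5d}  speedup = {multicore_speedup(p, n):.2f}")
```

The serial 10% dominates as N grows, so the speedup flattens out below 1 / (1 - p) = 10 no matter how many cores are added.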

How Does Amdahl's Law Apply to Multicore?

Performance Summary so Far

Amdahl's Law: pay attention to what you are speeding up.

Next Time
More on Performance
Cycles per Instruction
Means
Start: Instruction Set Architectures (ISA)
Read: P&H 2.1-2.5
Turn in your homework at the beginning of class
