L7 Performance

Uploaded by

Hitin Chandra Reddy P

EC340 COA

Performance
• Speed up of execution
– Response time
• How long it takes to do a task
– Throughput
• Total work done per unit time
• Instruction set, Hardware
• Software – OS, Compilers

• CPU time = user CPU time + system CPU time

Page 22 COA August 2024

Understanding performance
• Algorithm
– Determines number of operations executed
• Programming language, compiler,
architecture
– Determine number of machine instructions
executed per operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed

Dept of E&C, NITK Surathkal 1

Understanding performance
• How programs are translated into the machine
language
– And how the hardware executes them
• The hardware/software interface
• What determines performance of a program
• How hardware designers improve performance
– parallel processing
• How to improve energy efficiency

Seven great ideas

• Use abstraction to simplify design
• Make the common case fast
• Performance via parallelism
• Performance via pipelining
• Performance via prediction
• Hierarchy of memories
• Dependability via redundancy

CPU performance
• Performance = 1/Execution Time
• CPU time = CPU clocks x clock cycle time (tc)
• CPU clocks = Instruction count x clocks/instruction
• CPU time = IC x CPI/clock frequency (fc)
– CPI – average clocks per instruction
• Determined by CPU hardware
• If different instructions have different CPI
– Average CPI affected by instruction mix
• compare two different implementations of the same ISA
– IC – Instruction count
• Determined by program, ISA and compiler
– ISA – Instruction set architecture
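The CPU time relation above can be sketched in a few lines of Python. This is only an illustration: the function name `cpu_time` and the sample program (1e9 instructions, CPI of 2, 4 GHz clock) are assumptions, not values from the slides.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time in seconds = IC x CPI / clock frequency."""
    return instruction_count * cpi / clock_hz

# Hypothetical program: 1e9 instructions, average CPI of 2, 4 GHz clock
# (a 4 GHz clock corresponds to a 250 ps clock cycle time).
t = cpu_time(1e9, 2.0, 4e9)
performance = 1 / t  # Performance = 1 / Execution time
print(f"CPU time = {t:.3f} s, performance = {performance:.1f} runs per second")
```

Halving either IC or CPI, or doubling the clock frequency, halves the CPU time in this model.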

Example
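Computer   tc       CPI   CPU time
A          250 ps   2     IC × 2 × 250 ps = 500 ps × IC
B          500 ps   1.2   IC × 1.2 × 500 ps = 600 ps × IC

Relative performance:

$$\frac{\text{Perf}_A}{\text{Perf}_B} = \frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{600}{500} = 1.2$$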

If different instruction classes take different numbers of cycles

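$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$

Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$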
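A minimal sketch of the weighted average CPI calculation follows. The helper name `weighted_cpi` and the three instruction classes with their counts are hypothetical, chosen only to keep the arithmetic easy to check by hand.

```python
def weighted_cpi(cpis, counts):
    """Return (average CPI, total clock cycles) for per-class CPIs and counts."""
    clock_cycles = sum(cpi * n for cpi, n in zip(cpis, counts))
    instruction_count = sum(counts)
    return clock_cycles / instruction_count, clock_cycles

# Hypothetical instruction mix: three classes with CPI 1, 2 and 3.
class_cpis = [1, 2, 3]
class_counts = [2, 1, 2]  # instructions executed per class

avg_cpi, cycles = weighted_cpi(class_cpis, class_counts)
print(f"clock cycles = {cycles}, average CPI = {avg_cpi}")
```

Here the cycles are 2×1 + 1×2 + 2×3 = 10 over 5 instructions, so the average CPI is 2.0.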

Power

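• Dynamic power: $\text{Power} \propto C V^2 f_c$
– Supply voltage reduced from 5 V to under 1 V
– Power grew about ×30 while clock frequency grew about ×1000
• Leakage
Courtesy- H&P, Computer Organisation, 6e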

Reducing power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$

• The power wall
– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?
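The power reduction example on this slide can be checked numerically. The sketch below assumes dynamic power scales as C × V² × f; the function name `dynamic_power_ratio` is hypothetical.

```python
def dynamic_power_ratio(c_scale, v_scale, f_scale):
    """Ratio P_new / P_old when capacitance, voltage and frequency are scaled."""
    return c_scale * v_scale ** 2 * f_scale

# New CPU: 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = dynamic_power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

Because voltage enters squared, lowering it has the largest effect of the three factors, which is why running out of room to reduce voltage matters so much.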

Processor performance

Constrained by power, instruction-level parallelism, memory latency
Courtesy- H&P, Computer Organisation, 6e

Multicore processors
• Requires explicitly parallel programming
– Compare with instruction-level parallelism, where the hardware executes
multiple instructions at once, hidden from the programmer
• Programming for performance
– Scheduling
– Load balancing
– Optimizing communication and synchronization

SPEC CPU Benchmark

• Programs used to measure performance
– Supposedly typical of actual workload
• Standard Performance Evaluation Corporation (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2017
– Elapsed time to execute a selection of programs
– Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Integer (10) and floating-point (13)
– Summarize as geometric mean of performance ratios

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
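As a rough illustration of the geometric-mean summary, the sketch below computes the n-th root of the product of ratios by averaging logarithms. The function name and the two sample ratios are hypothetical, not SPEC results.

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of the ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical execution time ratios (reference time / measured time).
sample_ratios = [2.0, 8.0]
summary = geometric_mean(sample_ratios)
print(f"geometric mean = {summary:.2f}")
```

For these two ratios the arithmetic mean would be 5.0, while the geometric mean is 4.0; the geometric mean gives the same relative ranking whichever machine is used as the reference.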

SPECspeed 2017 Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L

Courtesy- H&P, Computer Organisation, 6e

SPEC power benchmark

• Power consumption of server at different
workload levels
– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right)$$

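The overall ssj_ops per Watt metric from the SPEC power benchmark slide is a ratio of two sums over the eleven workload levels (i = 0 to 10). A minimal sketch, using made-up measurements rather than real SPECpower data:

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    return sum(ssj_ops) / sum(power_watts)

# Hypothetical measurements at 100%, 90%, ..., 0% target load (11 levels).
ssj_ops = [1000 * k for k in range(10, -1, -1)]   # throughput per level
power_watts = [50 + 5 * k for k in range(10, -1, -1)]  # average power per level

metric = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"overall ssj_ops per Watt = {metric:.1f}")
```

Note that the idle level (0% load) contributes no ssj_ops but still adds power to the denominator, so idle power lowers the overall score.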

SPECpower_ssj2008 for Xeon E5-2650L
Courtesy- H&P, Computer Organisation, 6e

Amdahl’s Law
• Pitfall: improving an aspect of a computer and expecting a
proportional improvement in overall performance
• Make the common case fast

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

• Example: multiply accounts for 80s/100s
– How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

– Can’t be done!
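The multiply example can be explored numerically. In the sketch below the function name `improved_time` and the chosen improvement factors are assumptions for illustration; the 80 s and 20 s figures come from the slide.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / factor + t_unaffected

T_MULTIPLY = 80.0   # seconds spent in multiply (affected part)
T_OTHER = 20.0      # seconds spent elsewhere (unaffected part)
T_TOTAL = T_MULTIPLY + T_OTHER

for n in (2, 10, 1000, 1e9):
    t = improved_time(T_MULTIPLY, T_OTHER, n)
    print(f"multiply {n:g}x faster: time = {t:.2f} s, overall speedup = {T_TOTAL / t:.3f}")
```

However large the improvement factor, the unaffected 20 s remains, so the overall speedup approaches 5× but never reaches it.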

Example
• Consider three different processors P1, P2, and P3 executing the
same instruction set. P1 has a 3 GHz clock rate and a CPI of
1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0
GHz clock rate and has a CPI of 2.2.
– Which processor has the highest performance expressed in instructions per
second?

Processor   Instns/sec
P1          3×10^9 / 1.5 = 2×10^9
P2          2.5×10^9 / 1 = 2.5×10^9
P3          4×10^9 / 2.2 ≈ 1.8×10^9
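The same comparison can be written as a short script. The dictionary layout and variable names are illustrative choices; the clock rates and CPIs are those given in the example.

```python
# (clock rate in Hz, CPI) for each processor in the example
processors = {
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}

# Instructions per second = clock rate / CPI
ips = {name: clock / cpi for name, (clock, cpi) in processors.items()}
fastest = max(ips, key=ips.get)

for name, value in ips.items():
    print(f"{name}: {value:.2e} instructions/s")
print(f"highest instruction rate: {fastest}")
```

P2 comes out ahead despite having the lowest clock rate, because its CPI is the lowest.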

Example
• Consider two different implementations of the same ISA. The
instructions can be divided into four classes according to their CPI
(class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2,
3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.
Given a program with a dynamic instruction count of 1.0E6 instructions
divided into classes as follows: 10% class A, 20% class B, 50% class C,
and 20% class D, which implementation is faster? What is the global
CPI for each implementation?

– Time = No. instr. x CPI/clock rate

Processor   Total Time       CPI
P1          10.4×10^-4 s     2.6
P2          6.67×10^-4 s     2

P2 is faster.
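A sketch of how the global CPI and total time in this example can be computed. The helper names are hypothetical; the instruction mix, per-class CPIs and clock rates are taken from the problem statement.

```python
INSTRUCTION_COUNT = 1.0e6
MIX = [0.10, 0.20, 0.50, 0.20]  # fraction of instructions in classes A, B, C, D

def global_cpi(class_cpis):
    """Weighted average CPI over the instruction mix."""
    return sum(frac * cpi for frac, cpi in zip(MIX, class_cpis))

def total_time(class_cpis, clock_hz):
    """Time = instruction count x CPI / clock rate."""
    return INSTRUCTION_COUNT * global_cpi(class_cpis) / clock_hz

cpi_p1, time_p1 = global_cpi([1, 2, 3, 3]), total_time([1, 2, 3, 3], 2.5e9)
cpi_p2, time_p2 = global_cpi([2, 2, 2, 2]), total_time([2, 2, 2, 2], 3.0e9)

print(f"P1: CPI = {cpi_p1:.2f}, time = {time_p1:.3e} s")
print(f"P2: CPI = {cpi_p2:.2f}, time = {time_p2:.3e} s")
```

P1 has the better CPI on class A only, and class A is just 10% of the mix, so P2's uniform CPI of 2 and higher clock rate win overall.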

Exercise
• A processor has CPIs of 1, 12, and 5, respectively, for arithmetic,
load/store, and branch instructions. Assume that
– On a single processor a program requires the execution of 2.56E9
arithmetic instructions, 1.28E9 load/store instructions, and 256
million branch instructions.
– Each processor has a 2 GHz clock frequency.
– As the program is parallelized to run over multiple cores, the
number of arithmetic and load/store instructions per processor is
divided by 0.7 x p (where p is the number of processors) but the
number of branch instructions per processor remains the same.
• Find the total execution time for this program on 1, 2, 4, and 8
processors, and show the relative speedup of the 2, 4, and 8
processor result relative to the single processor result.
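One possible way to set up this exercise in code is sketched below. It rests on an interpretation that should be checked against the intended solution: for a single processor the instruction counts are used as given, and the 0.7 × p divisor is applied only when p > 1. All names are hypothetical.

```python
CLOCK_HZ = 2e9
CPI = {"arith": 1, "load_store": 12, "branch": 5}
COUNT = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}

def execution_time(p):
    """Execution time in seconds on p processors."""
    # Assumption: the 0.7 x p divisor applies only to the parallelized case.
    divisor = 1.0 if p == 1 else 0.7 * p
    cycles = (COUNT["arith"] * CPI["arith"]
              + COUNT["load_store"] * CPI["load_store"]) / divisor
    # Branch instructions per processor do not shrink with p.
    cycles += COUNT["branch"] * CPI["branch"]
    return cycles / CLOCK_HZ

base = execution_time(1)
for p in (1, 2, 4, 8):
    t = execution_time(p)
    print(f"p = {p}: time = {t:.2f} s, speedup = {base / t:.2f}")
```

Under this interpretation the branch term is a fixed cost per processor, so the speedup grows more slowly than the processor count.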

Exercise
• A computer spends 30 percent of its time accessing memory, 20
percent performing multiplications, and 50 percent executing
other instructions. As a computer architect, you have to choose
between improving either the memory, multiplication hardware,
or execution of non-multiplication instructions. There is only
space on the chip for one improvement, and each of the
improvements will improve its associated part of the
computation by a factor of 2.
– Without performing any calculations, which improvement would
you expect to give the largest performance increase, and why?
– What speedup would making each of the three changes give?
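The second part of this exercise is a direct application of Amdahl's Law with time fractions. A minimal sketch, with hypothetical names and the three fractions from the problem statement:

```python
def overall_speedup(fraction, factor):
    """Speedup when `fraction` of the time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Fraction of execution time spent in each part of the computation.
fractions = {"memory": 0.30, "multiplication": 0.20, "other instructions": 0.50}

speedups = {part: overall_speedup(f, 2) for part, f in fractions.items()}
best = max(speedups, key=speedups.get)

for part, s in speedups.items():
    print(f"improve {part}: overall speedup = {s:.3f}")
print(f"largest gain: {best}")
```

Since every option offers the same factor of 2, the part that takes the largest share of the time gives the largest overall gain.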

MIPS as performance benchmark

• MIPS: Millions of Instructions Per Second
– Doesn’t account for
• Differences in ISAs between computers
• Differences in complexity between instructions

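$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

• CPI varies between programs on a given CPU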
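Both forms of the MIPS formula can be checked against each other. The function names and the sample program (1e9 instructions, CPI of 2 or 4, 4 GHz clock) are assumptions made for this sketch.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_clock(clock_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)

# Hypothetical program on a 4 GHz CPU.
ic, clock_hz = 1e9, 4e9
for cpi in (2.0, 4.0):
    exec_time = ic * cpi / clock_hz
    print(f"CPI = {cpi}: {mips_from_time(ic, exec_time):.0f} MIPS "
          f"(time form), {mips_from_clock(clock_hz, cpi):.0f} MIPS (clock form)")
```

The same CPU reports a different MIPS rating for each program's CPI, which is one reason MIPS alone is a weak basis for comparison.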

Summary
• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance
