
Lecture 13

 Today’s lecture:
— What about branches?
— Crystal ball
— Look at performance again

Adding hazard detection to the CPU
[Datapath diagram: the pipelined CPU with a hazard detection unit added. The unit's inputs are ID/EX.MemRead, ID/EX.RegisterRt, and the IF/ID Rs and Rt fields; its outputs are PCWrite, IF/ID Write, and the select for a mux that zeroes the control signals entering ID/EX. The forwarding unit, driven by EX/MEM.RegisterRd and MEM/WB.RegisterRd, is also shown.]

The hazard detection unit
 The hazard detection unit’s inputs are as follows.
— IF/ID.RegisterRs and IF/ID.RegisterRt, the source registers for the
current instruction.
— ID/EX.MemRead and ID/EX.RegisterRt, to determine if the previous
instruction is LW and, if so, which register it will write to.
 By inspecting these values, the detection unit generates three outputs.
— Two new control signals PCWrite and IF/ID Write, which determine
whether the pipeline stalls or continues.
— A mux select for a new multiplexer, which forces control signals for
the current EX and future MEM/WB stages to 0 in case of a stall.
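In other words, the stall check is a single comparison. A minimal sketch in Python-style pseudocode (the signal names follow the pipeline-register fields above; this illustrates the logic, it is not the actual hardware description):

  # Stall when the instruction in EX is a load (MemRead asserted) whose
  # destination register (Rt) is a source register of the instruction in ID.
  def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
      stall = id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)
      pc_write = not stall        # freeze the PC while stalled
      if_id_write = not stall     # freeze the IF/ID register while stalled
      bubble_select = stall       # mux select that forces the control signals to 0
      return pc_write, if_id_write, bubble_select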

Generalizing Forwarding/Stalling
 What if data memory access were so slow that we wanted to pipeline it
over 2 cycles?
[Pipeline diagram, clock cycles 1-6: IM, Reg, EX, a data memory access stretched over two cycles, then Reg.]

 How many bypass inputs would the muxes in EX have?


 Which instructions in the following require stalling and/or bypassing?

lw r13, 0(r11)
add r7, r8, r9
add r15, r7, r13

Branches in the original pipelined datapath
[Datapath diagram: the original pipelined datapath, annotated with the question "When are they resolved? — here." The branch target adder (PC + 4 plus the shifted offset) and the ALU's Zero output feed the PCSrc mux, so the branch outcome is not available until late in the pipeline.]

Branches
 Most of the work for a branch computation is done in the EX stage.
— The branch target address is computed.
— The source registers are compared by the ALU, and the Zero flag is set
or cleared accordingly.
 Thus, the branch decision cannot be made until the end of the EX stage.
— But we need to know which instruction to fetch next, in order to keep
the pipeline running!
— This leads to what’s called a control hazard.
[Pipeline diagram, clock cycles 1-8: beq $2, $3, Label occupies cycles 1-5, while the following instruction (???) must already be fetched (IM) in cycle 2, before the branch outcome is known.]

Stalling is one solution
 Again, stalling is always one possible solution.

[Pipeline diagram, clock cycles 1-8: beq $2, $3, Label occupies cycles 1-5, while the next instruction (???) is held in IM until cycle 4 and then proceeds through the remaining stages.]

 Here we just stall until cycle 4, after the branch decision has been made.

Branch prediction
 Another approach is to guess whether or not the branch is taken.
— In terms of hardware, it’s easier to assume the branch is not taken.
— This way we just increment the PC and continue execution, as for
normal instructions.
 If we’re correct, then there is no problem and the pipeline keeps going at
full speed.
How would you guess?

[Pipeline diagram, clock cycles 1-7: beq $2, $3, Label occupies cycles 1-5, and next instruction 1 and next instruction 2 are fetched and executed in the following cycles, as if the branch were not taken.]

Branch misprediction
 If our guess is wrong, then we would have already started executing two
instructions incorrectly. We’ll have to discard, or flush, those instructions
and begin executing the right ones from the branch target address, Label.

[Pipeline diagram, clock cycles 1-8: beq $2, $3, Label completes normally; next instruction 1 (IM, Reg) and next instruction 2 (IM) are flushed, and the instruction at Label is fetched in cycle 4 and runs to completion.]

Performance gains and losses
 Overall, branch prediction is worth it.
— Mispredicting a branch means that two clock cycles are wasted.
— But if our predictions are even just occasionally correct, then this is
preferable to stalling and wasting two cycles for every branch (a quick
calculation follows this list).
 All modern CPUs use branch prediction.
— Accurate predictions are important for optimal performance.
— Most CPUs predict branches dynamically—statistics are kept at
run-time to determine the likelihood of a branch being taken.
 The pipeline structure also has a big impact on branch prediction.
— A longer pipeline may require more instructions to be flushed
for a misprediction, resulting in more wasted time and lower
performance.
— We must also be careful that instructions do not modify
registers or memory before they get flushed.
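To make that trade-off concrete: if a fraction p of branches is predicted correctly, the average penalty is 2 × (1 − p) cycles per branch, versus a flat 2 cycles per branch when always stalling, so any prediction accuracy above zero comes out ahead (assuming the two-cycle penalty above).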

Implementing branches
 We can actually decide the branch a little earlier, in ID instead of EX.
— Our sample instruction set has only a BEQ.
— We can add a small comparison circuit to the ID stage, after the
source registers are read.
 Then we would only need to flush one instruction on a misprediction.
[Pipeline diagram, clock cycles 1-7: beq $2, $3, Label resolves in ID, so only next instruction 1 (IM) is flushed; the instruction at Label is fetched in cycle 3 and runs to completion.]

Implementing flushes
 We must flush one instruction (in its IF stage) if the previous instruction is
BEQ and its two source registers are equal.
 We can flush an instruction from the IF stage by replacing it in the IF/ID
pipeline register with a harmless nop instruction.
— MIPS uses sll $0, $0, 0 as the nop instruction.
— This happens to have a binary encoding of all 0s: 0000 .... 0000.
 Flushing introduces a bubble into the pipeline, which represents the one-
cycle delay in taking the branch.
 The IF.Flush control signal shown on the next page implements this idea,
but no details are shown in the diagram.
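A minimal sketch of the flush behavior in Python-style pseudocode (the function and variable names are illustrative; only the idea of overwriting IF/ID with the all-zero nop comes from the slide):

  NOP = 0x00000000  # sll $0, $0, 0 encodes as all zeros

  def next_if_id(if_flush, fetched_instruction):
      # When IF.Flush is asserted (branch taken, detected in ID), the
      # just-fetched instruction is replaced with a nop, creating a bubble.
      return NOP if if_flush else fetched_instruction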

Branching without forwarding and load stalls
[Datapath diagram: the branch comparison moved into the ID stage. A comparator (=) on the two register-file read ports and a dedicated target adder (PC + 4 plus the shifted offset) produce the branch decision and target in ID, driving PCSrc; the IF.Flush signal clears the IF/ID register. Forwarding and load-stall hardware are not shown.]

Timing
 If no prediction (assuming the extra comparison circuit in ID):
      IF  ID  EX  MEM WB
          IF  IF  ID  EX  MEM WB        --- 1 cycle lost

 If prediction:
— If correct:
      IF  ID  EX  MEM WB
          IF  ID  EX  MEM WB            --- no cycle lost
— If misprediction:
      IF  ID  EX  MEM WB
          IF0 IF1 ID  EX  MEM WB        --- 1 cycle lost

Summary of Pipeline Hazards
 Three kinds of hazards conspire to make pipelining difficult.
 Structural hazards result from not having enough hardware available to
execute multiple instructions simultaneously.
— These are avoided by adding more functional units (e.g., more adders
or memories) or by redesigning the pipeline stages.
 Data hazards can occur when instructions need to access registers that
haven’t been updated yet.
— Hazards from R-type instructions can be avoided with forwarding.
— Loads can result in a “true” hazard, which must stall the pipeline.
 Control hazards arise when the CPU cannot determine which instruction
to fetch next.
— We can minimize delays by doing branch tests earlier in the pipeline.
— We can also take a chance and predict the branch direction, to make
the most of a bad situation.

Performance

 Now we’ll discuss issues related to performance:


— Latency/Response Time/Execution Time vs. Throughput
— How do you make a reasonable performance comparison?
— The 3 components of CPU performance
— The 2 laws of performance

Why know about performance
 Purchasing Perspective:
— Given a collection of machines, which has the
• Best Performance?
• Lowest Price?
• Best Performance/Price?
 Design Perspective:
— Faced with design options, which has the
• Best Performance Improvement?
• Lowest Cost?
• Best Performance/Cost ?
 Both require
— Basis for comparison
— Metric for evaluation

Many possible definitions of performance
 Every computer vendor will select one that makes them look good. How
do you make sense of conflicting claims?

Q: Why do end users need a new performance metric?


A: End users who rely only on megahertz as an indicator for
performance do not have a complete picture of PC processor
performance and may pay the price of missed expectations.

Two notions of performance

Plane      DC to Paris   Speed       Passengers   Throughput (pmph)
747        6.5 hours     610 mph     470          286,700
Concorde   3 hours       1350 mph    132          178,200

 Which has higher performance?


— Depends on the metric
• Time to do the task (Execution Time, Latency, Response Time)
• Tasks per unit time (Throughput, Bandwidth)
— Response time and throughput are often in opposition

Some Definitions
 Performance is in units of things/unit time
— E.g., Hamburgers/hour
— Bigger is better

 If we are primarily concerned with response time

      Performance(X) = 1 / execution_time(X)

 Relative performance: "X is N times faster than Y"

      N = Performance(X) / Performance(Y) = execution_time(Y) / execution_time(X)

Some Examples

Plane      DC to Paris   Speed       Passengers   Throughput (pmph)
747        6.5 hours     610 mph     470          286,700
Concorde   3 hours       1350 mph    132          178,200

 Time of Concorde vs. 747?

 Throughput of Concorde vs. 747?
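Working these out from the table: the Concorde's flight time is shorter by a factor of 6.5 / 3 ≈ 2.2, so it is about 2.2 times faster by the execution-time metric; but the 747's throughput is higher by a factor of 286,700 / 178,200 ≈ 1.6, so it wins on the throughput metric.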

Basis of Comparison
 When comparing systems, need to fix the workload
— Which workload?

— Actual Target Workload
  Pros: Representative
  Cons: Very specific; non-portable; difficult to run/measure
— Full Application Benchmarks
  Pros: Portable; widely used; realistic
  Cons: Less representative
— Small "Kernel" or "Synthetic" Benchmarks
  Pros: Easy to run; useful early in design
  Cons: Easy to "fool"
— Microbenchmarks
  Pros: Identify peak capability and potential bottlenecks
  Cons: Real application performance may be much below peak

Benchmarking
 Some common benchmarks include:
— Adobe Photoshop for image processing
— BAPCo SYSmark for office applications
— Unreal Tournament 2003 for 3D games
— SPEC2000 for CPU performance

 The best way to see how a system performs for a variety of programs is to
just show the execution times of all of the programs.
 Here are execution times for several different Photoshop 5.5 tasks, from
http://www.tech-report.com

Summarizing performance
 Summarizing performance with a single number can be misleading—just
like summarizing four years of school with a single GPA!
 If you must have a single number, you could sum the execution times. This
example graph displays the total execution time of the individual tests
from the previous page.
 A similar option is to find the average of all the execution times. For
example, the 800MHz Pentium III (in yellow) needed 227.3 seconds to run 21
programs, so its average execution time is 227.3/21 = 10.82 seconds.
 A weighted sum or average is also possible, and lets you emphasize some
benchmarks more than others.

The components of execution time
 Execution time can be divided into two parts.
— User time is spent running the application program itself.
— System time is when the application calls operating system code.
 The distinction between user and system time is not always clear,
especially under different operating systems.
 The Unix time command shows both.

salary.125 > time distill 05-examples.ps
Distilling 05-examples.ps (449,119 bytes)
10.8 seconds (0:11)
449,119 bytes PS => 94,999 bytes PDF (21%)
10.61u 0.98s 0:15.15 76.5%

In the last line, 10.61u is the user time, 0.98s is the system time, 0:15.15
is the "wall clock" time (including other processes), and 76.5% is the CPU
usage = (User + System) / Total.
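As a quick check on those numbers, (10.61 + 0.98) / 15.15 ≈ 0.765, which matches the reported 76.5% CPU usage.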

Three Components of CPU Performance

CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX

(CPI = Cycles Per Instruction)
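The equation also translates directly into code. A minimal sketch (the function name and the example numbers below are made up purely for illustration):

  def cpu_time(instructions_executed, cpi, clock_cycle_time):
      # seconds = instructions * (cycles / instruction) * (seconds / cycle)
      return instructions_executed * cpi * clock_cycle_time

  # Hypothetical example: 1 billion instructions, CPI of 2, 1 GHz clock (1 ns cycle)
  print(cpu_time(1_000_000_000, 2.0, 1e-9))   # 2.0 seconds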

Instructions Executed
 Instructions executed:
— We are not interested in the static instruction count, or how many
lines of code are in a program.
— Instead we care about the dynamic instruction count, or how many
instructions are actually executed when the program runs.
 There are three lines of code below, but the number of instructions
executed would be 2001.

li $a0, 1000
Ostrich: sub $a0, $a0, 1
bne $a0, $0, Ostrich
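The dynamic count comes from one li plus 1000 iterations of the two-instruction loop body: 1 + 2 × 1000 = 2001.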

CPI
 The average number of clock cycles per instruction, or CPI, is a function
of the machine and program.
— The CPI depends on the actual instructions appearing in the program—
a floating-point intensive application might have a higher CPI than an
integer-based program.
— It also depends on the CPU implementation. For example, a Pentium
can execute the same instructions as an older 80486, but faster.
 In CS231, we assumed each instruction took one cycle, so we had CPI = 1.
— The CPI can be >1 due to memory stalls and slow instructions.
— The CPI can be <1 on machines that execute more than 1 instruction
per cycle (superscalar).

Clock cycle time

 One “cycle” is the minimum time it takes the CPU to do any work.
— The clock cycle time or clock period is just the length of a cycle.
— The clock rate, or frequency, is the reciprocal of the cycle time.
 Generally, a higher frequency is better.
 Some examples of typical frequencies and their cycle times:
— A 500MHz processor has a cycle time of 2ns.
— A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns (500ps).
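Both follow from taking the reciprocal: 1 / 500 MHz = 2 ns, and 1 / 2 GHz = 0.5 ns.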

Execution time, again
CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX

 The easiest way to remember this is to match up the units:

      Seconds / Program = (Instructions / Program) * (Clock cycles / Instruction) * (Seconds / Clock cycle)

 Make things faster by making any component smaller!!

[Table: how the Program, Compiler, ISA, Organization, and Technology each influence the three components: Instructions Executed, CPI, and Clock Cycle Time.]

 Often easy to reduce one component by increasing another

Example 1: ISA-compatible processors
 Let’s compare the performance of two 8086-based processors.
— An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor.
— A 1GHz Pentium III with a CPI of 1.5 for the same program.
 Compatible processors implement identical instruction sets and will use
the same executable files, with the same number of instructions.
 But they implement the ISA differently, which leads to different CPIs.

CPU timeAMD,P = InstructionsP * CPIAMD,P * Cycle timeAMD
             = InstructionsP * 1.2 * 1.25 ns
             = 1.5 * InstructionsP ns

CPU timeP3,P = InstructionsP * CPIP3,P * Cycle timeP3
            = InstructionsP * 1.5 * 1.0 ns
            = 1.5 * InstructionsP ns

So for this program, the two processors end up with exactly the same CPU
time, even though their clock rates and CPIs differ.

Example 2: Comparing across ISAs
 Intel’s Itanium (IA-64) ISA is designed to facilitate executing multiple
instructions per cycle. If an Itanium processor achieves an average CPI of
.3 (roughly 3 instructions per cycle), how much faster is it than a Pentium4
(which uses the x86 ISA) with an average CPI of 1?

a) Itanium is three times faster


b) Itanium is one third as fast
c) Not enough information

Improving CPI
 Many processor design techniques we’ll see improve CPI
— Often they only improve CPI for certain types of instructions

CPI = Σ (i = 1 to n) CPIi * Fi,   where Fi = Ii / Instruction Count

 Fi = Fraction of instructions of type i

 First Law of Performance:

Make the common case fast

Example: CPI improvements

 Base Machine:

Op Type    Freq (fi)   Cycles   CPIi
ALU        50%         3
Load       20%         5
Store      10%         3
Branch     20%         2

 How much faster would the machine be if:


— we added a cache to reduce average load time to 3 cycles?
— we added a branch predictor to reduce branch time by 1 cycle?
— we could do two ALU operations in parallel?
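A sketch of the arithmetic in Python (treating "two ALU operations in parallel" as halving the effective ALU cycle count is my assumption; everything else follows the table above):

  freq   = {'ALU': 0.50, 'Load': 0.20, 'Store': 0.10, 'Branch': 0.20}
  cycles = {'ALU': 3,    'Load': 5,    'Store': 3,    'Branch': 2}

  def cpi(cyc):
      return sum(freq[op] * cyc[op] for op in freq)

  base = cpi(cycles)   # 0.5*3 + 0.2*5 + 0.1*3 + 0.2*2 = 3.2

  for name, new_cycles in [('cache (loads take 3)',         {**cycles, 'Load': 3}),
                           ('branch predictor (2 -> 1)',    {**cycles, 'Branch': 1}),
                           ('two ALUs (assumed 3 -> 1.5)',  {**cycles, 'ALU': 1.5})]:
      new = cpi(new_cycles)
      print(f'{name}: CPI {new:.2f}, speedup {base / new:.2f}x')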

Amdahl’s Law
 Amdahl’s Law states that optimizations are limited in their effectiveness.

Execution time after improvement =
    (Time affected by improvement / Amount of improvement) + Time unaffected by improvement

 For example, doubling the speed of floating-point operations sounds like a
great idea. But if only 10% of the program execution time T involves
floating-point code, then the overall performance improves by just 5%.

Execution time after improvement = (0.10 T / 2) + 0.90 T = 0.95 T

 What is the maximum speedup from improving floating point?
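Following the same formula: even if floating point became infinitely fast, the program would still take 0.90 T, so the maximum overall speedup is 1 / 0.90 ≈ 1.11 (about 11%).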

 Second Law of Performance:

Make the fast case common


Summary
 Performance is one of the most important criteria in judging systems.
 There are two main measurements of performance.
— Execution time is what we’ll focus on.
— Throughput is important for servers and operating systems.
 Our main performance equation explains how performance depends on
several factors related to both hardware and software.

CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX

 It can be hard to measure these factors in real life, but this is a useful
guide for comparing systems and designs.
 Amdahl’s Law tells us how much improvement we can expect from specific
enhancements.
 The best benchmarks are real programs, which are more likely to reflect
common instruction mixes.
