PS4 Solution
Uploaded by

Kareem Hamdy

COE 501: Computer Architecture

Problem Set 4: Pipelining Basic and Intermediate Concepts

1) (15 pts) Use the following code fragment:

I1: LD R1, 0(R2) ; Load R1 = Memory(R2)
I2: DADDI R1, R1, 1 ; R1 = R1 + 1
I3: SD R1, 0(R2) ; Store Memory(R2) = R1
I4: DADDI R2, R2, 8 ; R2 = R2 + 8
I5: DADDI R4, R4, -1 ; R4 = R4 – 1
I6: BNE R4, R0, I1 ; Branch if R4 != 0

Assume that the initial value of R4 is 100.

a) (2 pts) List all the true data dependences in the code above within one loop iteration. Record
the register, source instruction, and destination instruction.

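Data Dependences (within one loop iteration):

Register R1: I1 (LD) → I2 (DADDI)
Register R1: I2 (DADDI) → I3 (SD)
Register R4: I5 (DADDI) → I6 (BNE)

R2 carries no true dependence within an iteration: I4 writes R2 only after I1 and I3 have read it.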
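A true (RAW) dependence exists when an instruction reads a register that an earlier instruction in the same iteration wrote. The short Python sketch below finds these pairs by remembering the most recent writer of each register. The read and write sets of each instruction are decoded by hand from the loop, and the name `true_dependences` is illustrative only. Loop-carried dependences (for example, I4 producing R2 for the next iteration's I1) are deliberately not reported.

```python
# Each entry: (label, register written or None, registers read).
# Decoded by hand from the loop in Problem 1.
LOOP = [
    ("I1", "R1", ["R2"]),          # LD    R1, 0(R2)
    ("I2", "R1", ["R1"]),          # DADDI R1, R1, 1
    ("I3", None, ["R1", "R2"]),    # SD    R1, 0(R2): reads data and base
    ("I4", "R2", ["R2"]),          # DADDI R2, R2, 8
    ("I5", "R4", ["R4"]),          # DADDI R4, R4, -1
    ("I6", None, ["R4", "R0"]),    # BNE   R4, R0, I1
]


def true_dependences(instrs):
    """Return (register, producer, consumer) for every RAW dependence in
    one pass over instrs; loop-carried dependences are ignored."""
    last_writer = {}   # register -> label of its most recent writer
    deps = []
    for label, dest, srcs in instrs:
        for reg in srcs:
            if reg in last_writer:
                deps.append((reg, last_writer[reg], label))
        if dest is not None:
            last_writer[dest] = label
    return deps


for reg, producer, consumer in true_dependences(LOOP):
    print(f"Register {reg}: {producer} -> {consumer}")
```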

b) (4 pts) Show the timing of the above instruction sequence for the 5-stage MIPS pipeline
without any forwarding hardware. Use a pipeline timing chart to show all stall cycles. Assume
that the branch is handled by predicting it as NOT taken. If the branch outcome is TAKEN, it
kills the next two instructions in the pipeline. How many cycles does this loop take to execute?
What is the average CPI?

No forwarding hardware. Taken branch kills next two instructions.

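Cycle   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22
LD      IF ID EX M  WB
DADDI      IF s  s  s  ID EX M  WB
SD                     IF s  s  s  ID EX M
DADDI                              IF ID EX M  WB
DADDI                                 IF ID EX M  WB
BNE                                      IF s  s  s  ID EX
next1                                                IF
next2                                                   IF
LD                                                         IF ID EX M  WB

17 cycles per iteration; the next iteration starts in cycle 18.
Without forwarding, a dependent instruction reads its operand in ID one cycle after the producer's WB.
next1 and next2 are fetched on the not-taken path and killed when BNE resolves as taken at the end of cycle 17.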

Total cycles = 17 × 100 = 1700 cycles

(The last iteration does not branch, so the next two instructions are not killed; this small difference is ignored for simplicity.)
Average CPI = 17 cycles / 6 instructions = 2.83

Prepared by Dr. Muhamed Mudawar Page 1 of 6

c) (5 pts) Assuming delayed branching, rewrite the above code to take advantage of the branch
delay slot. Show the timing of the above instruction sequence for the 5-stage MIPS pipeline
with full forwarding hardware. How many cycles does this loop take to execute? What is the
average CPI?

Delayed Branching + Forwarding hardware.

The code can be rewritten as follows to take advantage of the branch delay slot. I4 can move into the delay slot because I5 and I6 do not use R2, and I1 and I3 still read R2 before I4 updates it:

I1: LD R1, 0(R2) ; Load R1 = Memory(R2)
I2: DADDI R1, R1, 1 ; R1 = R1 + 1
I3: SD R1, 0(R2) ; Store Memory(R2) = R1
I5: DADDI R4, R4, -1 ; R4 = R4 – 1
I6: BNE R4, R0, I1 ; Branch if R4 != 0
I4: DADDI R2, R2, 8 ; R2 = R2 + 8 (branch delay slot)

Cycle   1  2  3  4  5  6  7  8  9  10 11 12 13
LD      IF ID EX M  WB
DADDI      IF s  ID EX M  WB
SD               IF ID EX M
DADDI               IF ID EX M  WB
BNE                    IF ID EX
DADDI                     IF ID EX M  WB
next                         IF
LD                              IF ID EX M  WB

8 cycles per iteration; the next iteration starts in cycle 9.

Total cycles = 8 × 100 = 800 cycles

Average CPI = 8 cycles / 6 instructions = 1.33

d) (4 pts) Cache memory stages sometimes take longer to access than other pipeline stages.
Consider a 7-stage pipeline: IF1, IF2, ID, EX, MEM1, MEM2, WB, where instruction fetch is split
into two stages: IF1 and IF2, and the data memory is also split into two stages: MEM1 and
MEM2. Show the timing of the above instruction sequence for the 7-stage pipeline with full
forwarding hardware. Assume that the branch is handled by predicting it as always TAKEN
with zero delay in the IF1 stage. How many cycles does this loop take to execute? What is the
average CPI?

Cycle   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
LD      IF1 IF2 ID  EX  M1  M2  WB
DADDI       IF1 IF2 s   s   ID  EX  M1  M2  WB
SD              IF1 s   s   IF2 ID  EX  M1  M2
DADDI                       IF1 IF2 ID  EX  M1  M2  WB
DADDI                           IF1 IF2 ID  EX  M1  M2  WB
BNE                                 IF1 IF2 ID  EX
LD                                      IF1 IF2 ID  EX  M1  M2  WB

8 cycles per iteration; the next iteration starts in cycle 9.

Total cycles = 8 × 100 = 800 cycles

Average CPI = 8 cycles / 6 instructions = 1.33
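The cycle counts in parts (b), (c), and (d) can be cross-checked with a small in-order timing model. This is a sketch under simplifying assumptions, not the method used in the charts: instructions enter ID one per cycle in program order, a stalled instruction holds everything behind it, every stage is described by its distance in cycles from ID, and in part (b) a register written in WB is read in ID only in the following cycle (the assumption the 17-cycle chart implies). The names `Instr` and `iteration_cycles` and all their parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str]      # register written (None for SD and BNE)
    srcs: Tuple[str, ...]    # registers read
    is_load: bool = False
    is_branch: bool = False


def iteration_cycles(body, alu_ready, load_ready, need, branch_gap):
    """Distance in cycles between the ID stages of the first instruction
    of two consecutive iterations.

    alu_ready, load_ready: cycles after ID at whose end an ALU or load
        result becomes usable by a dependent instruction.
    need: cycles after ID at which a dependent instruction uses the value.
    branch_gap: minimum ID-to-ID distance from the branch to the first
        instruction of the next iteration.
    """
    id_cycle = []   # ID cycle of each instruction, first one at cycle 0
    writer = {}     # register -> index of its most recent producer
    branch_id = None
    for ins in body:
        t = id_cycle[-1] + 1 if id_cycle else 0    # in-order, one per cycle
        for reg in ins.srcs:
            if reg in writer:                       # RAW hazard
                p = writer[reg]
                ready = load_ready if body[p].is_load else alu_ready
                t = max(t, id_cycle[p] + ready + 1 - need)
        id_cycle.append(t)
        if ins.dest is not None:
            writer[ins.dest] = len(id_cycle) - 1
        if ins.is_branch:
            branch_id = t
    next_start = id_cycle[-1] + 1
    if branch_id is not None:
        next_start = max(next_start, branch_id + branch_gap)
    return next_start


LD = Instr("LD R1, 0(R2)", "R1", ("R2",), is_load=True)
INC = Instr("DADDI R1, R1, 1", "R1", ("R1",))
SD = Instr("SD R1, 0(R2)", None, ("R1", "R2"))
ADR = Instr("DADDI R2, R2, 8", "R2", ("R2",))
CNT = Instr("DADDI R4, R4, -1", "R4", ("R4",))
BNE = Instr("BNE R4, R0, I1", None, ("R4", "R0"), is_branch=True)

original = [LD, INC, SD, ADR, CNT, BNE]
delay_slot = [LD, INC, SD, CNT, BNE, ADR]   # I4 moved into the delay slot

# (b) 5-stage, no forwarding: operands are read in ID (need=0) in the cycle
#     after the producer's WB (3 cycles after its ID); branch resolved in EX.
b = iteration_cycles(original, alu_ready=3, load_ready=3, need=0, branch_gap=3)
# (c) 5-stage, full forwarding (store data simplified as needed in EX).
c = iteration_cycles(delay_slot, alu_ready=1, load_ready=2, need=1, branch_gap=3)
# (d) 7-stage, full forwarding; load data ready after MEM2; taken branch
#     predicted in IF1 with no bubble.
d = iteration_cycles(original, alu_ready=1, load_ready=3, need=1, branch_gap=1)

for part, cycles in (("b", b), ("c", c), ("d", d)):
    print(f"({part}) {cycles} cycles/iteration, CPI = {cycles / len(original):.2f}")
```

The model only tracks ID cycles, so it reports the steady-state distance between iterations rather than drawing the full chart; the penalty of each branch scheme is folded into `branch_gap`.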


2) (5 pts) Consider the following branch and jump frequencies. Assume there is NO branch target
buffer (BTB) in the first stage and that branches and jumps are not resolved until later stages
in the pipeline.

Conditional branches = 20%
Unconditional Jumps and Calls = 3%
70% of conditional branches are taken

a) (2 pts) We are examining a 5-stage processor pipeline where the unconditional jump and call
instructions are resolved at the end of the second stage, and the conditional branches are
resolved at the end of the third stage. Ignoring other pipeline stalls, how much faster would
the processor pipeline be without any control hazards?

Unconditional Jump and Call Delay = 1 cycle (Kills next instruction)
Taken Conditional Branch Delay = 2 cycles (Kills next 2 instructions)
Untaken Conditional Branch kills 0 instructions

CPI with control hazards = 1 + 0.03 × 1 + 0.2 × 0.7 × 2 = 1.31

CPI without control hazards = 1
Speedup = 1.31 / 1 = 1.31

b) (3 pts) Now assume a 10-stage deep pipeline, where unconditional jumps and calls are
resolved at the end of the fourth stage and conditional branches are resolved at the end of
the seventh stage. Ignoring other pipeline stalls, how much faster would the processor
pipeline be without any control hazards?

Unconditional Jump and Call Delay = 3 cycles (Kills next 3 instructions)
Taken Conditional Branch Delay = 6 cycles (Kills next 6 instructions)
Untaken Conditional Branch kills 0 instructions

CPI with control hazards = 1 + 0.03 × 3 + 0.2 × 0.7 × 6 = 1.93

CPI without control hazards = 1
Speedup = 1.93 / 1 = 1.93
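Both parts apply the same rule: an instruction resolved at the end of stage k has k - 1 younger instructions behind it, and all of them are killed. The sketch below, using a hypothetical helper `cpi_with_control_hazards`, reproduces both CPIs under the same assumptions as above (base CPI of 1, no other stalls, untaken branches cost nothing).

```python
def cpi_with_control_hazards(jump_freq, branch_freq, taken_rate,
                             jump_resolve_stage, branch_resolve_stage):
    """Base CPI of 1 plus the cycles lost to jumps and taken branches.

    An instruction resolved at the end of stage k has k - 1 younger
    instructions behind it in the pipeline, and all of them are killed.
    Untaken branches cost nothing because fetch was already sequential.
    """
    jump_penalty = jump_resolve_stage - 1
    branch_penalty = branch_resolve_stage - 1
    return (1 + jump_freq * jump_penalty
            + branch_freq * taken_rate * branch_penalty)


cpi_5 = cpi_with_control_hazards(0.03, 0.20, 0.70, 2, 3)    # part (a)
cpi_10 = cpi_with_control_hazards(0.03, 0.20, 0.70, 4, 7)   # part (b)
# With an ideal CPI of 1 and the same clock, the speedup equals the CPI.
print(f"5-stage:  CPI = {cpi_5:.2f}, speedup = {cpi_5 / 1:.2f}")
print(f"10-stage: CPI = {cpi_10:.2f}, speedup = {cpi_10 / 1:.2f}")
```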

3) (7 pts) In this problem, we will explore how a deep processor pipeline affects performance in
two ways: faster clock cycle and increased stalls due to data and control hazards. Assume that
the original processor is a 5-stage pipeline with a 1 ns clock cycle. The second processor is a
12-stage pipeline with a 0.5 ns clock cycle. The 5-stage pipeline experiences one stall cycle due
to a data hazard every 5 instructions, whereas the 12-stage pipeline experiences 3 stall cycles
every 8 instructions. In addition, branches constitute 20% of the instruction count, and the
misprediction rate for both pipelines is 5%.

a) (3 pts) What is the speedup of the 12-stage pipeline over the 5-stage pipeline, taking into
account only data hazards?

Average CPI (5-stage pipeline) = 1 + 1/5 = 6/5 (Data hazard stalls only)
Average CPI (12-stage pipeline) = 1 + 3/8 = 11/8 (Data hazard stalls only)
Speedup = (6/5 × 1 ns) / (11/8 × 0.5 ns) = 1.745 (Data hazards only)


b) (4 pts) If the branch misprediction penalty is 2 cycles for the 5-stage pipeline, but 6 cycles for
the 12-stage pipeline, what are the CPIs of each, taking into account the stalls of the data
hazards and branch hazards?

Average CPI (5-stage pipeline) = 1 + 1/5 + 0.2 × 0.05 × 2 = 1.22 (Data + Branch Hazards)
Average CPI (12-stage pipeline) = 1 + 3/8 + 0.2 × 0.05 × 6 = 1.435 (Data + Branch Hazards)
Speedup = (1.22 × 1 ns) / (1.435 × 0.5 ns) = 1.70 (Data + Branch Hazards)
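Parts (a) and (b) differ only in whether misprediction stalls are added to the CPI. The Python sketch below (helper names are illustrative) computes both CPIs and the speedup as a ratio of execution times, CPI × clock cycle, for the same instruction count.

```python
def avg_cpi(data_stalls_per_instr, branch_frac=0.0, mispredict_rate=0.0,
            mispredict_penalty=0):
    """CPI = 1 + data-hazard stalls + branch-misprediction stalls."""
    return 1 + data_stalls_per_instr + branch_frac * mispredict_rate * mispredict_penalty


def speedup(cpi_old, cycle_old_ns, cpi_new, cycle_new_ns):
    """Ratio of execution times for the same instruction count."""
    return (cpi_old * cycle_old_ns) / (cpi_new * cycle_new_ns)


# (a) data hazards only: 1 stall per 5 instructions vs 3 stalls per 8
cpi5_a, cpi12_a = avg_cpi(1 / 5), avg_cpi(3 / 8)
s_a = speedup(cpi5_a, 1.0, cpi12_a, 0.5)

# (b) add misprediction stalls: 20% branches, 5% mispredicted
cpi5_b = avg_cpi(1 / 5, 0.20, 0.05, 2)
cpi12_b = avg_cpi(3 / 8, 0.20, 0.05, 6)
s_b = speedup(cpi5_b, 1.0, cpi12_b, 0.5)

print(f"(a) CPI {cpi5_a:.3f} vs {cpi12_a:.3f}, speedup {s_a:.3f}")
print(f"(b) CPI {cpi5_b:.3f} vs {cpi12_b:.3f}, speedup {s_b:.3f}")
```

The deeper pipeline still wins in both cases, but its advantage shrinks once its larger misprediction penalty is counted.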

4) (13 pts) We will now add support for register-memory ALU operations to the classic five-stage
MIPS pipeline. To simplify the problem, all memory addressing will be restricted to register
indirect: every address is simply the value held in a register, and no displacement may be added to
the register value. For example, ADD R4, R5, (R8) means R4 = R5 + Memory(R8). At most one
operand can come from memory, and it can only be read, not written; writing memory still
requires a store instruction. Register-register ALU operations are unchanged. For example, the instruction
ADD R4, R5, R8 means R4 = R5 + R8.

a) (2 pts) List a rearranged order of the five traditional stages of the MIPS pipeline that will
support register-memory operations implemented exclusively by register indirect addressing.

IF = Instruction Fetch (as before)
ID = Instruction Decode (as before)
MEM = Memory Stage (comes before the Execute stage)
EX = Execute (comes after the Memory stage)
WB = Write Back stage (as before)

The memory stage should come before the execute stage to allow a memory operand to be
read from memory before execution.

b) (5 pts) Describe what forwarding paths are needed for the rearranged pipeline by stating the
source stage, destination stage, and information transferred on each needed new path. Give
an instruction sequence showing each data hazard that can be resolved by forwarding data
between stages. Draw a timing diagram showing the forwarding between stages.

Forwarding from MEM back to MEM stage:

LD R7, (R6) ; Load R7 = Memory(R6)
LD R8, (R7) ; Load R8 = Memory(R7)
The value of R7 should be forwarded from the output of MEM back to the address input of MEM.

LD R8, (R6) ; Load R8 = Memory(R6)
SD R8, (R7) ; Store Memory(R7) = R8
The value of R8 should be forwarded from the output of MEM back to the data input of MEM.

Forwarding from WB and EX stages back to the EX stage:

ADD R4, R5, (R6) ; R4 = R5 + Memory(R6)
SUB R7, R5, (R8) ; R7 = R5 – Memory(R8)
AND R9, R4, R7 ; R9 = R4 & R7
Values of R4 and R7 should be forwarded from WB and EX stages back to the EX stage.


Forwarding from WB and EX stages back to the MEM stage:
DADD R4, R5, (R6) ; R4 = R5 + Memory(R6)
DSUB R7, R5, (R8) ; R7 = R5 – Memory(R8)
SD R4, (R9) ; Memory(R9) = R4
AND R3, R3, (R4) ; R3 = R3 & Memory(R4)
The value of R4 should be forwarded from the output of the EX stage back to the data input of the
MEM stage (needed by SD). In addition, the value of R4 should be forwarded from the WB stage
back to the address input of the MEM stage (needed by AND).

Cycle               1   2   3   4   5   6   7   8
LD R7, (R6)         IF  ID  MEM EX  WB
LD R8, (R7)             IF  ID  MEM EX  WB

LD R8, (R6)         IF  ID  MEM EX  WB
SD R8, (R7)             IF  ID  MEM

ADD R4, R5, (R6)    IF  ID  MEM EX  WB
SUB R7, R5, (R8)        IF  ID  MEM EX  WB
AND R9, R4, R7              IF  ID  MEM EX  WB

DADD R4, R5, (R6)   IF  ID  MEM EX  WB
DSUB R7, R5, (R8)       IF  ID  MEM EX  WB
SD R4, (R9)                 IF  ID  MEM
AND R3, R3, (R4)                IF  ID  MEM EX  WB

c) (3 pts) For the reordered stages of the pipeline, what data hazards cannot be forwarded and
cause stall cycles? Give an instruction sequence showing each data hazard that causes stall
cycles. Draw a timing diagram showing the stall cycles caused by each data hazard.

Because the EX stage now follows the MEM stage, an ALU result is produced one cycle too late
for the MEM stage of the next instruction. Some RAW data hazards therefore cause stall cycles:
whenever the next instruction needs that result as store data or as a memory address.
DADD R4, R5, (R6) ; R4 = R5 + Memory(R6)
SD R4, (R9) ; Memory(R9) = R4
Stall 1 cycle until the value of R4 is computed in the EX stage. The MEM stage of SD must wait
for the store data to be computed in the EX stage of DADD.

DADDI R7, R7, 16 ; R7 = R7 + 16
DSUB R8, R8, (R7) ; R8 = R8 – Memory(R7)
Stall 1 cycle until the value of R7 is computed in the EX stage. The MEM stage of DSUB must wait
for the address to be computed in the EX stage of DADDI.

Cycle               1     2     3     4     5     6     7
DADD R4, R5, (R6)   IF    ID    MEM   EX    WB
SD R4, (R9)               IF    ID    stall MEM

DADDI R7, R7, 16    IF    ID    MEM   EX    WB
DSUB R8, R8, (R7)         IF    ID    stall MEM   EX    WB
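The forwarding and stall cases in parts (b) and (c) follow a single rule: with full forwarding, a consumer stalls only if it reaches the stage where it needs a value before the cycle after the producer's value exists. The sketch below encodes stage positions for both the classic and the reordered pipeline. The stage names come from the problem, while `stall_cycles` and its arguments are hypothetical, and the model assumes a forwarding path exists for every case that needs one.

```python
# Stage positions counted from IF = 0.
CLASSIC = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}
REORDERED = {"IF": 0, "ID": 1, "MEM": 2, "EX": 3, "WB": 4}


def stall_cycles(pipe, produced_in, needed_in, distance=1):
    """Stall cycles for a RAW hazard, assuming full forwarding.

    produced_in: stage at whose end the producer's value exists
                 ("MEM" for a load, "EX" for any ALU operation)
    needed_in:   stage in which the consumer must already have the value
    distance:    1 for adjacent instructions, 2 if one instruction sits
                 between them, and so on
    """
    # Without stalls the consumer enters each stage `distance` cycles after
    # the producer; the value is usable from the cycle after it is produced.
    ready = pipe[produced_in] + 1
    needed = distance + pipe[needed_in]
    return max(0, ready - needed)


cases = [
    ("LD R7,(R6) -> LD R8,(R7)", REORDERED, "MEM", "MEM", 1),
    ("LD R8,(R6) -> ALU op using R8", REORDERED, "MEM", "EX", 1),
    ("DADD R4 -> SD R4,(R9)", REORDERED, "EX", "MEM", 1),
    ("DADDI R7 -> DSUB R8,R8,(R7)", REORDERED, "EX", "MEM", 1),
    ("DADD R4, DSUB, SD R4,(R9)", REORDERED, "EX", "MEM", 2),
    ("SUB R7 -> AND R9,R4,R7", REORDERED, "EX", "EX", 1),
    ("classic LD -> dependent ALU op", CLASSIC, "MEM", "EX", 1),
]
for name, pipe, prod, need, dist in cases:
    print(f"{name}: {stall_cycles(pipe, prod, need, dist)} stall(s)")
```

The cases show the trade-off of the reordering: a loaded value can now feed the next instruction's EX stage without the classic load-use stall, but an ALU result that feeds the next instruction's address or store data costs one stall.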


d) (1 pt) What is the penalty of the branch instruction in the new pipeline?

Assuming branches are still resolved in the EX stage, which is now the fourth stage in the pipeline,
a taken branch kills three instructions instead of two, so the branch penalty increases from 2 cycles to 3 cycles.

e) (2 pts) List all of the ways that the new pipeline with register-memory ALU operations can
have a different instruction count for a given program than the original pipeline (that supports
register-register ALU operations only). Give specific instruction sequences, one for the original
pipeline and one for the rearranged pipeline, to illustrate each way.

Because register-memory operations are supported, the number of load instructions can be
reduced. For example, translating A = B + C requires 4 instructions in the original MIPS
architecture, but only 3 when register-memory operations are supported.

A = B + C (No register-memory ALU operations; R4, R5, and R6 contain the addresses of A, B, and C):

LD R10, (R5) ; Load R10 = Memory(R5) = B
LD R11, (R6) ; Load R11 = Memory(R6) = C
DADD R12, R10, R11 ; R12 = B + C
SD R12, (R4) ; Store A = Memory(R4) = R12

A = B + C (Register-memory ALU operations are supported):

LD R10, (R5) ; Load R10 = Memory(R5) = B
DADD R12, R10, (R6) ; R12 = B + C
SD R12, (R4) ; Store A = Memory(R4) = R12

Because only register-indirect addressing is available (there is no displacement addressing),
additional ALU instructions are required to calculate memory addresses. For example, the
following instruction sequence:

LD R10, 8(R4)
LD R11, 16(R4)
LD R12, 24(R4)

must be rewritten as follows when only register-indirect addressing is supported, which
increases the instruction count from 3 to 6:

DADDI R5, R4, 8
DADDI R6, R4, 16
DADDI R7, R4, 24
LD R10, (R5)
LD R11, (R6)
LD R12, (R7)
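The rewrite is mechanical: each load or store with a non-zero displacement becomes a DADDI into a scratch register followed by a register-indirect access. The Python sketch below applies that rule to textual MIPS instructions. The instruction format accepted by the regular expression, the function name `lower_displacement`, and the choice of scratch registers are assumptions for illustration. It interleaves each DADDI with its load instead of hoisting all three additions first, which gives the same instruction count.

```python
import re

# Accepts e.g. "LD R10, 8(R4)" or "SD R1, -16(R2)"; other lines pass through.
DISPLACEMENT = re.compile(r"^(LD|SD)\s+(R\d+),\s*(-?\d+)\((R\d+)\)$")


def lower_displacement(program, scratch_regs):
    """Rewrite displacement loads/stores into register-indirect form.

    Every non-zero displacement costs one extra DADDI into a scratch
    register; scratch_regs must not be used elsewhere in the program.
    """
    scratch = iter(scratch_regs)
    out = []
    for line in program:
        m = DISPLACEMENT.match(line.strip())
        if m is None:                       # not a displacement access
            out.append(line)
            continue
        op, reg, offset, base = m.groups()
        if int(offset) == 0:                # 0(Rb) is already register indirect
            out.append(f"{op} {reg}, ({base})")
        else:
            tmp = next(scratch)             # StopIteration if none are left
            out.append(f"DADDI {tmp}, {base}, {offset}")
            out.append(f"{op} {reg}, ({tmp})")
    return out


before = ["LD R10, 8(R4)", "LD R11, 16(R4)", "LD R12, 24(R4)"]
after = lower_displacement(before, ["R5", "R6", "R7"])
print("\n".join(after))
print(f"{len(before)} instructions -> {len(after)} instructions")
```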
