CS-506 Advanced Computer Systems
Architecture
Lecture#10-11 ILP
Gul Munir Ujjan
Assistant Professor
CISE Department, NEDUET
Karachi
Data Hazards
One way to solve this problem is a simple hardware technique called forwarding (also called bypassing).
Forwarding can be generalized to include passing a result directly to the functional unit that requires it.
In the previous example, the result is not actually needed by the DSUB instruction until after the DADD instruction produces it.
If the result can be moved from the pipeline register where DADD stores it to where DSUB needs it, the need for a stall can be avoided.
Data Hazards
Where to find the ALU result?
The ALU result generated in the EX stage is passed
through the pipeline registers to the MEM and WB
stages, before it is finally written to the register file.
Since the pipeline registers already contain the ALU
result, we could just forward that value to subsequent
instructions, to prevent data hazards.
Forwarding Unit
Forwarding unit selects the correct ALU inputs for the
EX stage.
If there is no hazard, the ALU’s operands will come from the
register file, just like before.
If there is a hazard, the operands will come from either the
EX/MEM or MEM/WB pipeline registers instead.
The ALU sources will be selected by two new multiplexers,
with control signals named ForwardA and ForwardB.
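The selection logic just described can be sketched as a small Python model (a hypothetical sketch, not the actual datapath; the two-bit mux encodings follow the common textbook convention and may differ from a specific implementation):

```python
def forward_select(rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Mux control for one ALU source register rs (ForwardA or ForwardB):
    '10' -> forward from EX/MEM, '01' -> forward from MEM/WB,
    '00' -> read the register file as usual."""
    # EX/MEM has priority: it holds the most recently produced value.
    # Register 0 is hardwired to zero and is never forwarded.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "10"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "01"
    return "00"
```

Giving EX/MEM priority over MEM/WB matters when both pipeline registers hold a value for the same destination: the younger (EX/MEM) result is the one the consuming instruction must see.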
Detecting Hazards
Data hazard
A second way to resolve data hazards is to insert no-op instructions.
Data hazard
Solution (delayed-load approach): insert no-operation instructions to avoid the data conflict.
ADD R1, R2, R3
No-Op
No-Op
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
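The no-op insertion above can be sketched as a tiny scheduling pass (a hypothetical Python model; instructions are (dest, sources) register tuples, and DELAY = 2 matches the two No-Ops in the example):

```python
DELAY = 2            # assumed slots a consumer must wait after its producer
NOP = (None, ())

def insert_nops(prog):
    """Insert NOPs so no instruction reads a register written
    within the previous DELAY slots."""
    out = []
    for dest, srcs in prog:
        # pad with NOPs while a recent instruction's destination
        # is still one of our sources
        while any(out[-k][0] in srcs
                  for k in range(1, min(DELAY, len(out)) + 1)):
            out.append(NOP)
        out.append((dest, srcs))
    return out
```

On the program above (ADD writes R1; SUB, AND, OR, XOR all read R1), only the SUB needs padding: once two No-Ops separate it from the ADD, the later readers of R1 are already far enough away.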
Data hazard
FO: fetch data value; WO: store the executed value
Data Hazard Classification
Three types of data hazards:
RAW : Read After Write
WAW : Write After Write
WAR : Write After Read
RAR : Read After Read ► Is this a hazard? (No: neither instruction writes the location, so the order of the two reads does not matter.)
Before discussing data hazard classification, let us take a brief look at dependences.
Dependences
Determining how one instruction depends on another is critical
to determining how much parallelism exists in a program and
how that parallelism can be exploited.
If two instructions are dependent, they are not parallel and
must be executed in order, although they may often be partially
overlapped.
There are three different types of dependences:
data dependences (also called true data dependences)
name dependences
control dependences
Data Dependence
An instruction y is data dependent on instruction x if
either of the following holds:
1. Instruction x produces a result that may be used by
instruction y.
2. Instruction y is data dependent on instruction z, and
instruction z is data dependent on instruction x.
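This two-part definition can be checked mechanically. The sketch below (hypothetical Python; instructions as (dest, sources) tuples indexed by program order) applies rule 1 directly and rule 2 through the chain; for simplicity it ignores intervening redefinitions of the register, which a real analysis would track:

```python
def depends(prog, x, y):
    """True if instruction y is data dependent on instruction x (x < y)."""
    dest_x = prog[x][0]
    # Rule 1: x's result is read directly by y
    if dest_x in prog[y][1]:
        return True
    # Rule 2: y depends on some intermediate z, and z depends on x
    return any(dest_x in prog[z][1] and depends(prog, z, y)
               for z in range(x + 1, y))
```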
Name Dependence
A name dependence occurs when two instructions use the same
register or memory location, called a name, but there is no flow
of data between the instructions associated with that name.
There are two types of name dependences between an
instruction x that precedes instruction y in program order:
1. An antidependence between instruction x and instruction y occurs
when instruction y writes a register or memory location that
instruction x reads. The original ordering must be preserved to ensure
that x reads the correct value.
2. An output dependence occurs when instruction x and instruction y
write the same register or memory location. The ordering between the
instructions must be preserved to ensure that the value finally written
corresponds to instruction y.
Name Dependence
1. Antidependence
S.D F4, 0(R1)
DADDIU R1, R1, #-8
2. Output Dependence
DSUBU R4, R5, R6
DADDU R4, R7, R8
Register Renaming: because a name dependence is not a true dependence, instructions involved in a name dependence can execute simultaneously or be reordered, if the register number used in the instructions is changed so that the instructions do not conflict.
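Register renaming itself can be sketched as follows (hypothetical Python; physical register numbers from 32 upward are an assumption standing in for a larger physical register file behind the 32 architectural names):

```python
def rename(prog):
    """Give every write a fresh physical register; rewrite reads to the
    most recent physical name. Removes WAR and WAW name dependences
    while preserving true (RAW) dependences."""
    mapping = {}             # architectural reg -> current physical reg
    next_phys = 32           # assumed start of the extra physical registers
    out = []
    for dest, srcs in prog:
        srcs = tuple(mapping.get(s, s) for s in srcs)  # read current names
        mapping[dest] = next_phys                      # fresh name per write
        out.append((next_phys, srcs))
        next_phys += 1
    return out
```

After renaming, the output-dependent pair DSUBU R4/DADDU R4 writes two different physical registers, so the two instructions no longer conflict.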
Control Dependence
A control dependence is a situation in which a program instruction executes only if a previous instruction evaluates in a way that allows its execution.
In other words, the execution of one program statement is directly influenced by the outcome of a previous statement, specifically a conditional statement or a loop condition.
Read After Write (RAW)
True data dependence
Example:
ADD R1, R2, R3
SUB R4, R1, R5
A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved.
This can occur because an instruction is executed after a previous instruction, and the previous instruction has not been completely processed through the pipeline.
Solution: program order must be preserved by inserting stalls (or by forwarding) to ensure that the second instruction receives the value from the first.
Write After Read (WAR)
Antidependence
Example:
x: R4 ← R1 + R5
y: R5 ← R1 + R2
Instruction y tries to write a destination before it is read by x, so x incorrectly gets the new value.
This hazard arises from an antidependence (a name dependence): the dependence is on the name R5, not on actual dataflow.
WAR hazards cannot occur in most static-issue pipelines, where reads happen early and writes happen late.
Solution: using a different destination register for instruction y (register renaming) removes the hazard.
Write After Write (WAW)
Output dependence
Example:
x: R2 ← R4 + R7
y: R2 ← R1 + R3
Instruction y tries to write an operand before it is written by x. The writes end up being performed in the wrong order, leaving the value written by x, rather than the value written by y, in the destination.
This hazard corresponds to an output dependence. WAW hazards are present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled.
Solution: delay the WB (write-back) of the second instruction until the first instruction has completed its write.
Handling Data Hazards
There are three techniques commonly used to handle data hazards in pipelining:
Forwarding (adding special circuitry to the pipeline)
Insertion of stalls
Re-ordering of the code
Control/ Branch Hazards
Branch hazards can cause a greater performance loss for pipelines than data hazards do.
When a branch instruction is executed, it may or may not change the PC.
If a branch changes the PC to its target address, it is a taken branch; otherwise, it is untaken.
Control/ Branch Hazards
There are FOUR schemes to handle branch hazards:
Freeze scheme
Predict-untaken scheme
Predict-taken scheme
Delayed branch
Note: we will compare them for both cases, branch taken and branch untaken.
Control/ Branch Hazards
The simplest scheme to handle branches is to freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known.
Predict not taken: in terms of hardware, it is easier to assume that the branch will not be taken. Thus, we just need to increment the PC and continue execution, as for normal instructions.
If we are correct, then there is no instruction flushing and the pipeline keeps going without any performance penalty.
Control/ Branch Hazards
Predicted Taken: an alternative scheme is to treat every branch as taken. As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target.
Delayed Branch: the compiler rearranges the code so that a branch instruction's effect is delayed by however many cycles are required to discharge the dependences.
Instructions that will execute regardless of whether the branch is taken or not are placed after the branch instruction.
Such instructions are said to be in the branch delay slot.
Control/ Branch Hazards
5-Stage Pipelining
Control/ Branch Hazards
Branch Untaken (Freeze approach)
The simplest method of dealing with branches is to redo the fetch following
a branch
Control/ Branch Hazards
Branch Taken (Freeze approach)
The simplest method of dealing with branches is to redo the fetch following
a branch
Control/ Branch Hazards
Branch Taken (Freeze approach)
The simplest scheme to handle branches is to freeze the pipeline, holding or deleting any instructions after the branch until the branch destination is known.
The attractiveness of this solution lies primarily in its simplicity, both for hardware and software.
Control/ Branch Hazards
Branch Untaken (Predicted-untaken)
A higher-performance, and only slightly more complex, scheme is to treat every branch as not taken.
It is implemented by continuing to fetch instructions as if the branch were a normal instruction.
The pipeline looks the same if the branch is not taken.
If the branch is taken, we need to redo the fetch, restarting at the branch target.
Control/ Branch Hazards
Branch Untaken (Predicted-untaken)
Control/ Branch Hazards
Branch Taken (Predicted-untaken)
Control/ Branch Hazards
Branch Untaken (Predicted-Taken)
Control/ Branch Hazards
Branch Taken (Predicted-Taken)
An alternative scheme is to treat every branch as taken.
As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target.
Control/ Branch Hazards
Branch Taken (Predicted-Taken)
Control/ Branch Hazards
Delayed Branch
A fourth scheme, in use in some processors, is called delayed branch.
It is done at compile time: the compiler modifies the code.
The general format is:
branch instruction
delay slot
branch target if taken
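One simple compile-time policy for filling the delay slot can be sketched as follows (hypothetical Python; it only considers moving the instruction immediately before the branch, the common "from before" strategy, and falls back to a no-op when that instruction produces a value the branch needs):

```python
def fill_delay_slot(before, branch):
    """before: list of ('op', dest, srcs) instructions preceding the branch.
    branch:  ('beq', None, srcs).
    Returns (remaining instructions, branch, delay-slot instruction)."""
    if before and before[-1][1] not in branch[2]:
        # safe to move: the branch does not read this instruction's result,
        # and it executes whether or not the branch is taken
        return before[:-1], branch, before[-1]
    return before, branch, ('nop', None, ())   # nothing safe: insert a no-op
```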
Branch Prediction
The idea is to guess the outcome of the branch's condition test (i.e., whether or not the branch will be taken).
In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively.
Branch Prediction
The purpose of the branch predictor is to improve the flow in
the instruction pipeline. Branch predictors play a critical role in
achieving high effective performance in many modern pipelined
microprocessor architectures.
It is not known for certain whether a conditional jump will be
taken or not taken until the condition has been calculated and
the conditional jump has passed the execution stage in the
instruction pipeline.
Branch Prediction
Without branch prediction, the processor would have
to wait until the conditional jump instruction has
passed the execute stage before the next instruction
can enter the fetch stage in the pipeline.
The branch predictor attempts to avoid this waste of
time by trying to guess whether the conditional jump
is most likely to be taken or not taken.
Branch Prediction
1. Static Branch Prediction
When branch prediction is not based on branch behavior, i.e., the predictor always predicts the branch in the same direction, either
Taken (MIPS, Stanford) or
Not-taken (Motorola MC88000),
such prediction is called static branch prediction.
It is static because the prediction is already known before the program is executed.
Branch Prediction
1. Static Branch Prediction
Some of the static prediction schemes include:
Predict all branches to be taken. This makes use of the observation that a majority of branches are taken. This primitive mechanism yields 60% to 70% accuracy.
Predict backward branches (branches that decrease the PC) to be taken and forward branches (branches that increase the PC) not to be taken.
Profiling can also be used to predict the outcome of a branch.
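The backward-taken/forward-not-taken heuristic reduces to a single address comparison (a sketch; in real ISAs the target is usually encoded as a PC-relative offset, so checking the sign of the offset gives the same answer):

```python
def static_predict(pc, target):
    """Backward branch (target below the branch's own PC, typically a
    loop back-edge) -> predict taken; forward branch -> predict not taken."""
    return target < pc      # True means predict taken
```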
Branch Prediction
2. Dynamic Branch Prediction
The goal of dynamic branch prediction is to make use of the run-time behavior of a branch to more accurately predict which direction (taken or not-taken) the branch will follow.
There are several dynamic branch predictors in use or under research today.
The simplest dynamic branch-prediction scheme is a branch-prediction buffer, or Branch History Table (BHT).
A branch-prediction buffer is a small memory indexed by the lower portion of the address (PC) of the branch instruction.
Each entry contains a single bit that says whether the branch was recently taken or not.
Branch Prediction
2. Dynamic Branch Prediction
Branch History Table or Branch Prediction Buffer
With such a buffer, we do not know whether the prediction is correct; the bit may have been put there by another branch that has the same low-order address bits.
The prediction is just a hint: if the hint turns out to be wrong, the prediction bit is inverted and stored back.
[1-bit predictor state diagram: state 0 (Predict Not-Taken) and state 1 (Predict Taken); a Taken outcome moves to state 1, a Not-Taken outcome moves to state 0.]
Branch Prediction
2. Dynamic Branch Prediction
This simple 1-bit prediction scheme has a performance shortcoming:
Even if a branch is almost always taken, we will likely predict incorrectly twice, rather than once: once upon loop exit and then again on loop re-entry.
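The double misprediction can be demonstrated with a small 1-bit predictor model (hypothetical Python; state 1 means predict taken, and the bit is simply flipped to the last observed outcome):

```python
def run_1bit(outcomes, state=1):
    """Count mispredictions of a 1-bit predictor over a list of
    branch outcomes (True = taken). Initial state predicts taken."""
    miss = 0
    for taken in outcomes:
        if (state == 1) != taken:
            miss += 1
        state = 1 if taken else 0   # bit always becomes the last outcome
    return miss

# a loop branch: taken 9 times, then not taken once at loop exit
loop = [True] * 9 + [False]
```

Running the loop repeatedly shows 2 misses per execution in steady state (one at the exit, one at the next entry), even though the branch is taken 90% of the time.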
Branch Prediction
2. Dynamic Branch Prediction
To remedy this weakness, 2-bit prediction
schemes are often used. In a 2-bit scheme,
a prediction must miss twice before it is
changed.
Branch Prediction
2. Dynamic Branch Prediction
The 2-bit scheme is actually a specialization of a more general scheme that has an n-bit saturating counter for each entry in the prediction buffer.
Saturating counter: each time the branch is taken, the counter is incremented; if the maximum value has already been reached, nothing is done. Each time the branch is not taken, the counter is decremented; if the minimum value 0 has been reached, it stays the same.
The first half of the states, i.e., from 0 to 2^(n-1) - 1, is treated as predict not-taken, whereas the upper half, from 2^(n-1) to 2^n - 1, is treated as predict taken.
Studies of n-bit predictors have shown that 2-bit predictors do almost as well.
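A minimal sketch of the n-bit saturating counter described above (hypothetical Python; predict() returns True, i.e. taken, for the upper half of the states):

```python
class SatCounter:
    """n-bit saturating counter in [0, 2**n - 1].
    States >= 2**(n-1) predict taken; the rest predict not-taken."""

    def __init__(self, n=2, value=0):
        self.max = 2 ** n - 1
        self.threshold = 2 ** (n - 1)
        self.value = value

    def predict(self):
        return self.value >= self.threshold   # True -> predict taken

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, self.max)   # saturate at max
        else:
            self.value = max(self.value - 1, 0)          # saturate at 0
```

With n = 2, a prediction must miss twice before it flips, which is exactly the 2-bit scheme: one not-taken outcome moves a strongly-taken counter (3) only to weakly-taken (2).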
Correlating Branches
Recent branches are correlated; that is, behaviour of recently-executed
branches affects prediction of current branch
if (d == 0)
d = 1;
if (d == 1)
…
OR
if (aa==2) -- branch b1
aa = 0;
if (bb==2) --- branch b2
bb = 0;
if(aa!=bb) { … --- branch b3
branch b3 – Clearly depends on the results of b1 and b2
If the first two branches are not taken then the third will be taken
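The correlation in the second example can be checked directly (hypothetical Python; "taken" here means the compiled conditional branch skips the if-body, as in the usual compiled form of these statements):

```python
def branch_outcomes(aa, bb):
    """Return (b1, b2, b3) taken/not-taken for the aa/bb example.
    b1 taken  <=> aa != 2 (skip 'aa = 0')
    b2 taken  <=> bb != 2 (skip 'bb = 0')
    b3 taken  <=> aa == bb (skip the 'aa != bb' body)"""
    b1 = aa != 2
    if not b1:
        aa = 0
    b2 = bb != 2
    if not b2:
        bb = 0
    b3 = aa == bb
    return b1, b2, b3
```

Whenever b1 and b2 are both not taken, both variables are cleared to 0, so b3 is always taken; a predictor that remembers the outcomes of b1 and b2 can therefore predict b3 perfectly in that case, which is exactly what a correlating (two-level) predictor exploits.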