Computer Architecture
Prof. Madhu Mutyam
Department of Computer Science And Engineering
Indian Institute of Technology, Madras
Module – 05
Lecture – 16
Handling Pipeline Hazards (Part 2)
So in the last module, we discussed the various pipeline hazards, and we saw that among all of them, control hazards create the most significant performance penalty. As a result, we need to look at techniques to minimize the penalty associated with control hazards. We know that control hazards happen because of branch instructions, which alter the normal flow of execution of the program.
(Refer Slide Time: 00:46)
A simple technique one can use to deal with control instructions is flushing the pipeline. Here the processor treats branch instructions like any other instruction, and as soon as the target address is computed for a branch instruction, if the target address differs from the address of the instruction following the branch, we flush the pipeline. Otherwise, we continue executing the instructions as they are.
This is a simple technique that adds no extra hardware overhead to the processor, but it creates a significant performance penalty if our applications have a large number of branches and the branches are taken most of the time. So, although flushing the pipeline is simple, it may not be efficient from the performance point of view.
Rather than flushing the pipeline, we can consider other techniques, where we take the help of both the hardware and the software. Our underlying hardware can implement a specific technique, and the software, that is, the compiler, can exploit this underlying feature and reorganize the code to minimize the penalties associated with branches.
The first technique is Predict Not Taken. Here the hardware, the processor, always assumes that branches are not going to be taken. With predict not taken, after a branch instruction the processor automatically fetches the next instruction in sequence, in program order. We know that some time after the branch is fetched, the branch target is computed; if this particular branch turns out to be taken, execution must continue from the new target address, and at that point our prediction is wrong, so we have to flush the pipeline.
But as long as the branch is not taken, we get the benefit of this technique. In other words, this is a simple technique in which the hardware always assumes that branches are not taken. Note that this is independent of any particular branch instruction: an application can have multiple branch instructions, but once the processor is designed to predict not taken, it assumes that none of the branches will be taken.
As a result, for any branch instruction in the program, the processor always starts fetching the next instruction after the branch, and the control flow is altered only when our prediction is wrong. If the prediction is wrong, we flush the pipeline at that point and start fetching instructions from the target address. In other words, in the Predict Not Taken technique, the processor state should not be changed until the branch outcome is definitely known.
Our compiler can exploit this behavior: knowing that the processor handles all branches in a Predict Not Taken fashion, it can reorganize the code so that the not-taken scenario is favored. If the application's branch instructions are not taken most of the time, this technique is efficient from the performance point of view.
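As a rough sketch (not from the lecture; the program encoding and the one-cycle penalty are assumptions for illustration), the predict-not-taken policy can be modeled in a few lines of Python:

```python
# Minimal sketch of predict-not-taken fetch logic (illustrative only).
# Each "instruction" is (pc, is_branch, taken, target).
def run_predict_not_taken(program):
    pc, cycles, flushes = 0, 0, 0
    while pc < len(program):
        _, is_branch, taken, target = program[pc]
        cycles += 1
        if is_branch and taken:
            flushes += 1      # the sequentially fetched instruction is squashed
            cycles += 1       # one-cycle penalty: refetch from the target
            pc = target
        else:
            pc += 1           # prediction correct: continue in program order
    return cycles, flushes

# A taken branch at pc 1 jumping to pc 3 costs one extra cycle.
prog = [(0, False, False, 0), (1, True, True, 3),
        (2, False, False, 0), (3, False, False, 0)]
print(run_predict_not_taken(prog))   # (4, 1): 3 useful instructions, 1 flush
```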
The next technique is Predict Taken. For any branch instruction, the processor assumes that the branch is going to be taken, and in order to execute instructions from the target address, we first have to know that address. Effectively, once a branch instruction is fetched and decoded, we compute the target address, and only then can we start fetching instructions from it.
For example, in the five-stage pipeline we considered in the previous module, we cannot get any benefit from Predict Taken: this technique helps only when the target address is known before the actual branch outcome is resolved. As a result, of these two, it is Predict Not Taken that is beneficial for improving performance with compiler support.
Finally, we can also consider another technique called the delayed branch. Here, we insert delay slots between the branch instruction and the target instruction, or the next instruction following the branch. Effectively, we insert a few delay slots, and these delay slots are filled with useful instructions; again, we take the help of the compiler, and it is the compiler that places useful instructions in these delay slots.
We can consider any number of delay slots after the branch instruction, but to keep the design simple, we consider one delay slot, immediately following the branch instruction. As a result, with the Delayed Branch technique, the sequence of instructions in our code becomes: first the branch instruction, then its sequential successor (this is the delay slot), and then the branch target if taken.
Once we have this delay slot, and if we keep a useful instruction in it, then while the processor is busy executing that instruction, we compute the target address for the branch. By the next cycle we already know the target address, so if the branch is taken, we discard whatever instruction was in the delay slot and continue from the target address. And if the branch is not taken, and our delay slot holds the next sequential instruction after the branch, then we do not have to do anything; we just continue executing the instructions in program order.
Effectively, we can minimize the penalty associated with branches by using these Predict Not Taken, Predict Taken, and Delayed Branch mechanisms. For all three techniques, we take the help of the compiler, which reorganizes the code based on the technique we implement. With the support of both hardware and software, we can minimize the penalties associated with branches and thereby improve overall performance. Note that in most benchmark applications, branch instructions contribute 22 to 25 percent of the instructions, and in a deeply pipelined processor, if we do not take care of these branches, we incur a significant performance penalty.
As a result, we need to look at efficient branch penalty minimization techniques to improve overall performance. By the way, in the delayed branch technique, we are not supposed to insert a branch instruction in the delay slot. The main reason is that if we keep another branch instruction in the delay slot, we do not yet know the outcome of the first branch, whether it is taken or not, and this complicates the overall design. That is why we typically disallow keeping a branch instruction in the delay slot; any instruction other than a branch can go there. Now, we will discuss Predict Not Taken with an example.
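Before the example, here is a minimal Python sketch (illustrative, not from the lecture) of the one-delay-slot execution order: the delay-slot instruction is issued right after the branch, and only then does control transfer to the target if the branch is taken.

```python
# Minimal sketch of one-delay-slot semantics (illustrative only).
# program[pc] = (name, is_branch, taken, target)
def run_delayed_branch(program):
    order, pc = [], 0
    while pc < len(program):
        name, is_branch, taken, target = program[pc]
        order.append(name)
        if is_branch:
            order.append(program[pc + 1][0])   # the delay slot always executes
            pc = target if taken else pc + 2
        else:
            pc += 1
    return order

prog = [("i",            False, False, 0),
        ("branch",       True,  True,  4),
        ("delay-slot",   False, False, 0),     # issued even though taken
        ("fall-through", False, False, 0),
        ("target",       False, False, 0)]
print(run_delayed_branch(prog))   # ['i', 'branch', 'delay-slot', 'target']
```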
(Refer Slide Time: 09:32)
Consider a scenario where we have a not-taken branch instruction flowing through the five-stage pipeline: instruction fetch, instruction decode, execute, memory access, and write back. Because our underlying processor implements the Predict Not Taken technique, for any branch instruction in the program it always assumes that the branch is not going to be taken.
It can therefore start fetching the instruction immediately after the branch. So, after the branch instruction is issued at time t, at time t+1 we fetch the next instruction following the branch, that is, instruction i+1. That instruction is in the instruction fetch stage while the branch is in instruction decode. At the end of instruction decode, we know the branch is not taken, and we continue with instructions i+2, i+3, i+4, and so on.
As a result, there is no problem as long as the branch is not taken and our underlying processor assumes all branches are not taken: there is no performance penalty, and we get very good performance with this code. But consider another scenario. The processor still assumes that branches are not taken, but unfortunately the branch is taken, so execution has to go to a different target address rather than the next address in sequence.
Because we issued the branch instruction at time t, while the branch is in the ID stage we fetch the instruction following it in program order, that is, instruction i+1. But at the end of the branch's ID stage, we come to know that the branch is taken, and by then we have also computed its target address. Now we have to flush the pipeline, which currently holds instruction i+1. In other words, we treat instruction i+1 as a no-op and re-fetch instructions starting from the target address computed in the ID stage.
As a result, while the branch instruction is in the execute stage, we fetch the new instruction specified by the branch target address, which is then in the instruction fetch stage, and we continue with the subsequent instructions from the branch target address.
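The taken-branch case just described can be summarized in a small Python sketch (illustrative; it assumes, as in this example, that the outcome and target are known at the end of ID, giving a one-cycle penalty):

```python
# Stage-by-stage timing for a taken branch under predict-not-taken
# (illustrative). The branch resolves at the end of ID, so the one
# sequentially fetched instruction (i+1) is squashed as a no-op and the
# target instruction starts one cycle later than in the ideal case.
rows = {
    "branch": ["IF", "ID", "EX", "MEM", "WB"],
    "i+1":    ["",   "IF", "--", "--",  "--"],   # squashed on the taken branch
    "target": ["",   "",   "IF", "ID",  "EX"],   # re-fetched from the target
}
print(f"{'cycle':8}" + "".join(f"{c:5}" for c in ["t", "t+1", "t+2", "t+3", "t+4"]))
for name, stages in rows.items():
    print(f"{name:8}" + "".join(f"{s:5}" for s in stages))
```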
In other words, when we always use the Predict Not Taken technique and a branch is taken, we incur a penalty. That penalty depends on when the branch instruction is issued, when the branch outcome is resolved, and when the branch target address is computed. As a result, we have to resolve the target address and the branch outcome as quickly as possible after the branch is issued, so that the penalty is minimized. Now we will consider another example, which illustrates the delayed branch technique.
(Refer Slide Time: 13:10)
First, suppose the branch is not taken: let us assume the branch instruction is issued at time t and is not taken. When the branch is in the ID stage, we fetch the instruction in the delay slot. We assume the delay slot holds the instruction that follows the branch in sequence, that is, instruction i+1; we fetch it and continue, and at the end of the branch's ID stage, we come to know that the branch is not taken.
As a result, we do not have to alter the program order; we just continue fetching the subsequent instructions, which go through the remaining pipeline stages. There are no stalls, so we incur no performance penalty. Now consider the other scenario, where the branch is taken. Again we have kept in the delay slot the instruction that follows the branch, and at the end of the branch's ID stage we come to know that the branch is taken.
As a result, the instruction we fetched into the delay slot is now void, and we continue with the instruction at the target address. But remember, if our delay slot instead holds an instruction that is independent of the branch condition and is also a useful instruction in the program, then we can complete its execution without discarding it. In that case we do not waste this one pipeline cycle, and we improve performance.
So the catch is that in the delayed branch technique, to get a real performance improvement, we have to fill the delay slot with useful instructions. Since we take the help of the compiler to fill these delay slots, if the compiler can identify instructions that are both independent and useful, we can eliminate even the wastage of this one pipeline cycle and get a significant performance improvement.
In other words, among the three techniques, as long as the compiler can find useful instructions in the program, the delayed branch technique improves overall performance compared to the other two, Predict Not Taken and Predict Taken. As mentioned earlier, if we can keep a useful instruction in the delay slot, we can minimize the penalties associated with control hazards significantly. But how do we select a useful instruction? We will consider three scenarios; the first scenario is as follows.
(Refer Slide Time: 16:12)
We have an ADD instruction that reads the register contents R2 and R3, performs the add operation, and stores the result in R1. Following this ADD instruction, we have a compare-and-branch instruction that compares the content of R2 with 0: if R2 equals 0, we go to the target address; otherwise, we execute the delay slot instruction. So how can the compiler reorganize this code such that the delay slot is filled with a useful instruction?
Looking at this code, we know that the ADD instruction is independent of the branch instruction: both instructions use R2, but R2 is a source operand of the ADD, whereas the branch only reads R2 for the comparison. Effectively, even if we move the ADD instruction into the delay slot, we do not introduce any functional incorrectness. As a result, we can rearrange the code so that the delay slot holds the ADD instruction, as sketched below.
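A minimal sketch of this transformation (the instruction syntax is illustrative; the lecture describes the instructions only in prose):

```python
# Scenario 1: fill the delay slot from before the branch (illustrative).
# The ADD writes R1 and only reads R2, while the branch only tests R2,
# so moving the ADD below the branch cannot change the branch outcome.
before = [
    "ADD  R1, R2, R3",    # R1 = R2 + R3
    "BEQZ R2, target",    # branch if R2 == 0
    "<empty delay slot>",
]
after = [
    "BEQZ R2, target",
    "ADD  R1, R2, R3",    # delay slot: executes whether or not the branch
]                         # is taken, so the slot always does useful work
```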
Now, whether the condition (R2 equal to 0) is true, and hence whether the branch is taken, is known only in the second stage of execution, and by that time we have already fetched the instruction in the delay slot. But now the delay slot holds a useful instruction, because according to our original code we are supposed to perform this add operation anyway. As a result, the delay slot is not wasted; it performs useful computation.
So even if the branch is taken, we are not wasting the delay slot; we are performing useful computation, and after the one-cycle delay we know the target address and start fetching instructions from it. In the other case, if the branch is not taken, we first execute the ADD instruction in the delay slot and then proceed with the remaining instructions after the branch. In both cases we lose no performance, so if we can find a useful instruction to keep in the delay slot, we gain a significant performance improvement.
But we will not always have the luxury of finding independent instructions; the compiler may not always find instructions that are both useful and independent. If it cannot, we go to the next option. Consider another example: we have a subtract instruction that operates on the contents of R5 and R6 and stores the result in R4, followed by an add instruction that adds the contents of R2 and R3 and stores the result in R1.
Then there is a branch instruction that checks whether R1 equals 0. If it does, we branch back to the subtract instruction and repeat; if the condition is false, we simply follow the instructions after the branch. Now we have a delay slot after this branch instruction, and we have to fill it with a useful instruction. Which instruction can we keep there?
We can place the subtract instruction in the delay slot, as sketched below. Whether the condition is true is known only in the ID stage of the branch, and by that time we have already fetched the instruction in the delay slot, which is the subtract. If the branch is taken, we go to the target address; previously the target address pointed to the subtract instruction in the original code, but now we point it to the instruction following the subtract, since we have already executed the subtract in the delay slot.
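A minimal sketch of this transformation (syntax and labels are illustrative):

```python
# Scenario 2: fill the delay slot from the branch target (illustrative).
# Useful when the branch is taken with high probability, e.g., a loop.
before = [
    "loop:  SUB  R4, R5, R6",
    "       ADD  R1, R2, R3",
    "       BEQZ R1, loop",      # branch back to the SUB if R1 == 0
    "       <empty delay slot>",
]
after = [
    "loop:  SUB  R4, R5, R6",
    "body:  ADD  R1, R2, R3",
    "       BEQZ R1, body",      # retargeted past the copied SUB
    "       SUB  R4, R5, R6",    # delay slot: copy of the target instruction;
]                                # wasted (and must be harmless) if not taken
```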
Effectively, functionality-wise both versions do the same thing, and we do not waste the delay slot when the branch is taken. But now consider the scenario where the branch is not taken. According to the original code, if the branch is not taken, we are supposed to execute the instruction following the branch and not the subtract. After the code is transformed, however, even when the branch is not taken, the delay slot immediately after the branch still fetches this subtract instruction, and that wastes one pipeline cycle.
So effectively, if the branch is not taken in this scenario, we waste one pipeline cycle, but if the branch is taken, we waste no cycles and get useful computation. Compared to the first technique, this one does not give as much performance improvement, but it does help as long as branches are taken. If most of our branches are taken with high probability, we can apply this type of transformation to our code.
Consider the third case: we have an ADD instruction that adds R2 and R3 and stores the result in R1, and a branch instruction that compares the content of R1 with 0. If the condition is true, we execute the subtract instruction; otherwise, we execute the OR instruction. Now notice that the branch instruction uses as its operand the destination operand of the ADD instruction.
Similarly, in the previous case too, the branch instruction takes as its operand the destination operand of the add instruction. Effectively, in both cases the branch depends on the immediately preceding instruction, so we cannot keep that instruction in the delay slot. In this third example, what we can do is keep the OR instruction in the delay slot, as sketched below. This is helpful in scenarios where branches are mostly not taken: if branches are not taken with high probability, we can fill the delay slot with the instruction that follows the branch.
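A minimal sketch of this transformation (syntax and the OR operands are illustrative):

```python
# Scenario 3: fill the delay slot from the fall-through path (illustrative).
# Useful when the branch is not taken with high probability.
before = [
    "ADD  R1, R2, R3",
    "BEQZ R1, taken_path",    # the SUB executes on the taken path
    "<empty delay slot>",
    "OR   R7, R8, R9",        # fall-through (not-taken) path
]
after = [
    "ADD  R1, R2, R3",
    "BEQZ R1, taken_path",
    "OR   R7, R8, R9",        # delay slot: useful work if not taken,
]                             # one wasted cycle if taken
```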
Now you can see clearly: if the branch condition is false, that is, the branch is not taken, then the delay slot holds the instruction from the not-taken path of the condition, so we perform useful computation. And if the branch is taken, that is, we follow the true path, we were supposed to execute the subtract instruction, but we executed the OR instruction instead, so we incur a performance penalty of one cycle. So effectively, this third technique is helpful in cases where branches are not taken with high probability, while the second technique is helpful in cases where branches are taken with high probability.
And the first case is helpful when the compiler can find useful, independent instructions in the program. So effectively, given the underlying hardware implementation for handling branches, the software, that is, the compiler, can rearrange the program code in such a way that it fills the delay slot with useful instructions and thereby minimizes the penalty associated with branches.
And as I mentioned earlier, because the branch penalty is significant and critical for overall performance, we have to minimize branch penalties; once we do that, we get a significant performance improvement. We already discussed in the previous module that the speedup we achieve with a pipelined design is
Speedup_pipeline = Pipeline Depth / (1 + Branch Frequency × Branch Penalty)
assuming that we do not have any data hazards or structural hazards. In that scenario, if we minimize the branch penalty, we get a significant performance improvement with the pipelined design compared to a non-pipelined design.
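As a quick worked example (the numbers are assumed for illustration, consistent with the branch frequency range quoted above): with a pipeline depth of 5, a branch frequency of 0.25, and an average branch penalty of 1 cycle, the speedup is 5 / (1 + 0.25 × 1) = 5 / 1.25 = 4, instead of the ideal speedup of 5.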
So far we have discussed techniques that are static in nature: the processor is designed so that it either always assumes all branches are not taken, always assumes all branches are taken, does nothing special with branches, or provides a delay slot that the compiler fills with a useful instruction. In all these techniques, we are not concerned with the actual outcome of each branch; we treat all branches equally. But in reality that is not the case.
In reality, the same branch instruction, occurring several times during program execution, can behave differently: one time it may take the branch, the next time it may not, and so on. As a result, the branch outcome can be true in one instance and false in another. Once we recognize that, we have to exploit this behavior and come up with efficient mechanisms to deal with branch penalties.
In other words, these static techniques may not always help: as we change the input to the program, the branch outcomes can change significantly. As a result, we need dynamic mechanisms that adapt to run-time conditions. Several such techniques have been proposed in the literature, and they all belong to the class of dynamic branch prediction: we predict the outcome of a branch at run time. For that, we take the help of hardware and provide extra hardware components, one for predicting the outcome of the branch and one for storing the target address.
Effectively, in dynamic branch prediction, we exploit the outcomes of a branch observed at different times during program execution, use this history of previous outcomes to predict the next outcome of the branch, and accordingly fetch the next instruction either from the target address or from the instruction following the branch.
(Refer Slide Time: 27:42)
Consider a sequence of branch outcomes for a particular branch instruction, where T indicates the branch is taken and N indicates it is not taken. Let us assume there is a branch instruction that occurred 11 times in the past, with outcomes such as: taken the first time, not taken the next, taken again, and so on. If we can remember this history, we can use it to predict the next outcome when the same branch occurs again. That is what dynamic branch predictors do.
Now, how much history do we have to remember? Do we take the decision based only on the immediately preceding outcome, or do we consider the last few outcomes? If we remember a long history of outcomes for each specific branch instruction, we incur significant hardware overhead, and because the branch predictor logic is placed in the processor, it occupies chip area; it is not a good idea to increase the overall overhead of the chip.
As a result, we have to come up with efficient branch predictors. When I say efficient, the predictor must provide high accuracy on whether the branch is taken or not, and at the same time it should not consume significant area overhead. Now consider a simple case: let us assume we remember only the last outcome of the branch and take the decision based on that. That is, we predict the next outcome of the branch based on its present outcome.
For example, suppose at some point in the sequence the present outcome is not taken. The next time the same branch comes, based on this outcome we predict that the branch will again not be taken. But the branch is actually taken, so there is a misprediction; and if after that the branch is again not taken, there is another misprediction, and so on. So effectively, if we predict the next outcome based only on the present outcome, we may not get a high-accuracy branch predictor. This is called a one-bit branch predictor, and its state diagram looks like this.
It has two states: Predict Not Taken, represented by 0, and Predict Taken, represented by 1. Initially, assume we are in state 0; because it says predict not taken, we assume the branch will not be taken and fetch instructions accordingly. After the branch outcome is resolved, if the branch is actually not taken, we remain in the same state. But if the outcome says the branch is taken (taken transitions are shown with the dotted arrow), there is a state change from predict not taken to predict taken, state 1.
The next time the same branch occurs, we look at the state diagram; since the state is 1, we assume the branch will be taken and fetch instructions from the predicted target address. If the prediction is wrong, we go back to state 0, and so on. Effectively, with 1 bit of history and a branch outcome pattern like the one above, we oscillate between predict not taken and predict taken. This one-bit branch predictor does not provide high accuracy, so we cannot rely on it.
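A minimal Python sketch of the one-bit scheme (illustrative; the function name and encoding are assumptions) shows the oscillation:

```python
# Minimal 1-bit branch predictor (illustrative): remember only the last
# outcome of the branch and predict that it repeats.
def one_bit_accuracy(outcomes):
    state, correct = 0, 0                  # 0 = predict not taken, 1 = predict taken
    for taken in outcomes:
        correct += (state == 1) == taken   # did the prediction match the outcome?
        state = 1 if taken else 0          # update the single history bit
    return correct / len(outcomes)

# An alternating branch defeats the scheme: every prediction is wrong.
print(one_bit_accuracy([True, False] * 5))   # 0.0
```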
In order to build a branch predictor with reasonable accuracy, we can go for the two-bit branch predictor, which has four states, and the prediction decision changes only when we mispredict twice consecutively. For example, the four states are 00, 01, 11, and 10. Assume we are in state 00, which indicates predict not taken; we assume the branch will not be taken and fetch instructions accordingly.
But if, after we compute the branch outcome, the branch is actually taken, we go via the dotted arrow to state 01, which still says predict not taken. Because state 01 also predicts not taken, the next time the same branch occurs we again assume it will not be taken and proceed. But if the branch is taken here as well, that means we have mispredicted twice, and we go to state 11, which is treated as predict taken.
So effectively, starting from predict not taken, if two mispredictions happen, we go to predict taken. There, we assume the branch will be taken, and the next time the branch instruction comes, we treat it as taken, fetch instructions from the predicted target address, and proceed. If the next time the branch is not taken, we move via the bold not-taken arrow to state 10, which still predicts taken, and we continue there.
Then, only if the same branch occurs again and is again not taken do we go back to predict not taken. Effectively, from either side, it takes two mispredictions to change the decision. The prediction decision changes only when two mispredictions occur consecutively for a given branch instruction.
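A Python sketch of this state machine (illustrative; the transitions out of the weak states on a correct prediction are assumed to return to the corresponding strong state, which the slide's diagram does not spell out):

```python
# The lecture's two-bit predictor as a state machine (illustrative).
# States 00 and 01 predict not taken; 11 and 10 predict taken. The
# predicted direction flips only after two consecutive mispredictions.
NEXT = {  # state -> (next state if taken, next state if not taken)
    "00": ("01", "00"), "01": ("11", "00"),
    "11": ("11", "10"), "10": ("11", "00"),
}

def two_bit_accuracy(outcomes, state="00"):
    correct = 0
    for taken in outcomes:
        correct += (state in ("11", "10")) == taken
        state = NEXT[state][0 if taken else 1]
    return correct / len(outcomes)

# A loop branch taken 9 times and then not taken once mispredicts only
# 3 times: twice while leaving the not-taken states, once at loop exit.
print(two_bit_accuracy([True] * 9 + [False]))   # 0.7
```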
We can also build an n-bit branch predictor, using n bits rather than two, to get higher accuracy. But experimental results show that the two-bit predictor already gives reasonable accuracy, and the extra hardware overhead we incur for an n-bit predictor with n greater than 2 is not justified by the accuracy benefit. As a result, we can go with the two-bit branch predictor in our processor designs.
To implement these branch predictors, we can associate a small, dedicated cache with the processor design, where this cache stores the branch history information. It remembers the previous outcomes of the branches, that is, the state of the state machine; the next time the same branch occurs, we index into this cache and take the decision based on the stored state. Alternatively, we can implement branch prediction using extra bits attached to the cache block: the branch instruction is part of a cache block in the instruction cache, and that block is associated with a state from the state diagram. Using that state, we can predict whether the branch will be taken and proceed accordingly.
So effectively, the branch history table can be implemented either by using a special cache or by appending a set of bits to the cache block holding the branch instruction. In the case of a special cache, we store the history information there, and whenever a branch instruction comes, we index into the cache, obtain the predicted outcome, and proceed based on that.
This tells us whether the branch is predicted to be taken, but once a branch is predicted taken, we also have to know the target address. For that, we associate another table with our processor, called the branch target buffer (BTB), which stores the predicted target address: when the same branch occurred previously, what target address did it take? That address is stored in the BTB, and the next time the branch is predicted taken, we index into the BTB and fetch the instruction from the predicted address.
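Putting the two structures together, here is a minimal Python sketch (illustrative; the class name, the dictionary-based indexing, and the saturating-counter update are assumptions, not the lecture's exact design):

```python
# Minimal branch-history-table + branch-target-buffer front end (illustrative).
class BranchUnit:
    def __init__(self):
        self.bht = {}   # branch PC -> 2-bit counter (0..3); >= 2 predicts taken
        self.btb = {}   # branch PC -> most recent taken-target address

    def predict(self, pc):
        counter = self.bht.get(pc, 0)
        if counter >= 2 and pc in self.btb:
            return self.btb[pc]        # predicted taken: fetch from stored target
        return pc + 1                  # predicted not taken: fetch sequentially

    def update(self, pc, taken, target):
        counter = self.bht.get(pc, 0)
        self.bht[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        if taken:
            self.btb[pc] = target      # remember the most recent target

bu = BranchUnit()
bu.update(pc=40, taken=True, target=16)
bu.update(pc=40, taken=True, target=16)
print(bu.predict(40))   # 16: predicted taken, with the target from the BTB
```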
So effectively, in order to implement dynamic branch prediction, we need hardware components for the prediction logic as well as for the branch target buffer, which stores the target addresses. With that, I am concluding this module; in the next module, we are going to discuss a simple pipeline implementation of the MIPS ISA.
Thank you