Stage delay 1
A 4 stage pipeline has stage delays as 150, 120, 160 and 140 ns. Latches are
used between stages and have a delay of 5 ns each. Assuming constant clock
rate, total time taken to process 1000 data items will be?
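One way to check an answer here is a small sketch, assuming the clock period is set by the slowest stage plus one latch delay and that the pipeline never stalls. The function name and parameters are illustrative only.

```python
def pipeline_total_time(stage_delays_ns, latch_ns, n_items):
    """Total time in ns to push n_items through a constant-clock pipeline."""
    # Assumption: the clock period is the slowest stage plus one latch delay.
    cycle_ns = max(stage_delays_ns) + latch_ns
    # The first item needs one cycle per stage; every later item
    # completes one cycle after the one before it.
    cycles = len(stage_delays_ns) + n_items - 1
    return cycles * cycle_ns

total_ns = pipeline_total_time([150, 120, 160, 140], latch_ns=5, n_items=1000)
print(total_ns)  # 165495 ns, about 165.5 microseconds
```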
Stage delay 2
The stage delays in a 4 stage pipeline are 5, 6, 4 and 5 ns. The second stage
(delay 6 ns) is replaced with 2 functionally equivalent stages, of delay 3 and 4 ns. The
throughput increase of the pipeline is?
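A possible way to reason about this: throughput is the reciprocal of the cycle time, and the cycle time is the slowest stage. The sketch below assumes zero latch delay, since the exercise does not mention one; the helper name is illustrative.

```python
def throughput_gain_percent(old_delays_ns, new_delays_ns):
    """Percentage increase in throughput when the stage list changes."""
    # Assumption: no latch overhead, so the cycle time is the slowest stage.
    old_cycle = max(old_delays_ns)
    new_cycle = max(new_delays_ns)
    # Throughput is 1 / cycle, so the ratio of throughputs is old_cycle / new_cycle.
    return (old_cycle / new_cycle - 1) * 100

gain = throughput_gain_percent([5, 6, 4, 5], [5, 3, 4, 4, 5])
print(round(gain, 2))  # 20.0
```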
Stage delay 3
Consider the following processors. Assume latches have 0 latency. Which
processor has the highest clock frequency?
(i) 4 stage pipeline with stage latencies 1, 2, 2, 1 ns
(ii) 4 stage pipeline with stage latencies 1, 1.5, 1.5, 1.5 ns
(iii) 5 stage pipeline with stage latencies 0.5, 1, 1, 0.6, 1 ns
(iv) 5 stage pipeline with stage latencies 0.5, 0.5, 1, 1, 1.1 ns
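Since the latches have zero latency, the clock period of each design is simply its slowest stage. A minimal sketch of that comparison follows; the dictionary keys just mirror the labels above.

```python
designs = {
    "i": [1, 2, 2, 1],
    "ii": [1, 1.5, 1.5, 1.5],
    "iii": [0.5, 1, 1, 0.6, 1],
    "iv": [0.5, 0.5, 1, 1, 1.1],
}

def max_clock_ghz(stage_latencies_ns):
    """Highest clock frequency in GHz: 1 / (slowest stage in ns)."""
    return 1.0 / max(stage_latencies_ns)

freqs = {name: max_clock_ghz(lat) for name, lat in designs.items()}
fastest = max(freqs, key=freqs.get)
print(fastest, freqs[fastest])  # iii 1.0
```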
Stage delay 4
Calculate cycle time, latency, throughput of pipelined vs. non-pipelined
processors. Latch latency is 20 ps. If you could split any one stage of the pipelined
processor into two equal halves, which one would you choose? What would be the new cycle
time, latency, throughput?
     fetch    decode   execute  memory   writeback
a.   300ps    400ps    350ps    550ps    100ps
b.   200ps    150ps    100ps    190ps    140ps
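The three quantities can be tabulated with a short sketch. It assumes that the non-pipelined design pays no latch overhead (some texts add one latch delay there too), that the pipelined cycle is the slowest stage plus one latch, and that splitting the slowest stage is the sensible choice. All function names are illustrative.

```python
LATCH_PS = 20  # latch latency given in the exercise

def pipelined(stages_ps, latch_ps=LATCH_PS):
    """Return (cycle time, latency) in ps for a pipelined design."""
    cycle = max(stages_ps) + latch_ps      # slowest stage plus one latch
    return cycle, cycle * len(stages_ps)   # an instruction crosses every stage

def non_pipelined(stages_ps):
    """Return (cycle time, latency) in ps, assuming no latch overhead."""
    total = sum(stages_ps)
    return total, total

def split_slowest(stages_ps):
    """Replace the slowest stage by two stages of half its delay."""
    i = stages_ps.index(max(stages_ps))
    half = stages_ps[i] / 2
    return stages_ps[:i] + [half, half] + stages_ps[i + 1:]

processors = {
    "a": [300, 400, 350, 550, 100],
    "b": [200, 150, 100, 190, 140],
}
for name, stages in processors.items():
    for label, (cycle, latency) in [
        ("non-pipelined", non_pipelined(stages)),
        ("pipelined", pipelined(stages)),
        ("split slowest", pipelined(split_slowest(stages))),
    ]:
        # Throughput is one instruction per cycle in steady state.
        print(name, label, cycle, latency, f"{1e12 / cycle:.3g} instr/s")
```

Note that splitting a stage shortens the cycle only down to the next slowest stage, which is why the latency of the split design is not simply halved.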
Stage delay 5
A non-pipelined processor has a clock rate of 2.5 GHz and average CPI of 4. The
processor is now pipelined to have 5 stages, but the clock speed is reduced to 2
GHz. Assuming no stalls in the pipelined processor, what is its speedup over
non-pipelined?
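A quick sketch of the arithmetic, assuming the stall-free pipeline reaches a CPI of 1 and that both processors execute the same instruction count. The helper name is illustrative.

```python
def time_per_instruction_ns(clock_ghz, cpi):
    """Average time per instruction in ns: CPI divided by clock rate in GHz."""
    return cpi / clock_ghz

non_pipelined_ns = time_per_instruction_ns(2.5, 4)  # 4 cycles at 0.4 ns each
pipelined_ns = time_per_instruction_ns(2.0, 1)      # assumed CPI of 1, no stalls
speedup = non_pipelined_ns / pipelined_ns
print(round(speedup, 2))  # 3.2
```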
Stage delay 6
What is the speedup of this pipeline under ideal conditions in steady state,
compared to non-pipelined?
Unit    Latch  Stage  Latch  Stage  Latch  Stage  Latch  Stage
Name    l1     s1     l2     s2     l3     s3     l4     s4
Delay   1ns    5ns    1ns    6ns    1ns    11ns   1ns    8ns
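A hedged sketch of the usual steady-state comparison: the non-pipelined machine is assumed to need the sum of the stage delays with no latches, while the pipelined machine completes one instruction per cycle, with the cycle set by the slowest stage plus its latch.

```python
stage_ns = [5, 6, 11, 8]
latch_ns = 1

# Assumption: the non-pipelined design has no latches between stages.
non_pipelined_ns = sum(stage_ns)               # 30 ns per instruction
pipelined_cycle_ns = max(stage_ns) + latch_ns  # 12 ns per instruction in steady state

speedup = non_pipelined_ns / pipelined_cycle_ns
print(speedup)  # 2.5
```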
Stage delay 7
D1 has 5 pipeline stages with delays 3, 2, 4, 2, 3 ns. D2 has 8 pipeline stages
each with 2 ns execution time. How much time can D2 save over D1, in executing
100 instructions?
Stage delay 8
A non-pipelined single cycle processor operating at 100 MHz, is converted to a
synchronous pipelined processor with 5 stages requiring 2.5, 1.5, 2, 1.5 and 2.5
ns. The delay of a latch is 0.5 ns. Find the speedup of the pipelined processor for
a large number of instructions.
Data hazard 1
lw R15, 0(R2)
add R14, R15, R15
lw R16, 4(R2)
add R17, R16, R16
Where do we need NOPs in a standard 5 stage pipeline with forwarding? Can the
compiler do any instruction reordering to reduce NOPs?
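To experiment with orderings, a small sketch can count load-use NOPs. It assumes a classic 5-stage pipeline with full forwarding, where the only remaining stall is one cycle when an instruction reads a register loaded by the instruction immediately before it. The tuple encoding of instructions is invented for this illustration.

```python
def count_load_use_nops(program):
    """Count NOPs needed, given (opcode, dest, sources) tuples in program order."""
    nops = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # Assumption: with forwarding, only a load followed directly
        # by a consumer of its result needs one NOP.
        if prev_op == "lw" and prev_dest in curr_sources:
            nops += 1
    return nops

original = [
    ("lw", "R15", ["R2"]),
    ("add", "R14", ["R15", "R15"]),
    ("lw", "R16", ["R2"]),
    ("add", "R17", ["R16", "R16"]),
]
# Hoist the second load above the first add; the two pairs are independent.
reordered = [original[0], original[2], original[1], original[3]]

print(count_load_use_nops(original), count_load_use_nops(reordered))  # 2 0
```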
Data hazard 2
An otherwise ideal 5-stage pipeline has a 1 cycle load-to-use delay (that is, you
have to wait 1 cycle between a load, and an ALU operation that uses the result of
the load). A program has 20% loads, but the compiler can find independent
instructions to put after the load, for only half of them. What is the slowdown due
to NOPs?
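The slowdown can be sketched as an effective CPI, assuming the ideal pipeline has a CPI of 1 and each load whose slot the compiler cannot fill costs exactly one NOP. Variable names are illustrative.

```python
load_fraction = 0.20       # share of instructions that are loads
unfilled_fraction = 0.5    # share of loads with no independent instruction to schedule
nop_cycles_per_stall = 1   # one-cycle load-to-use delay

# Assumption: ideal CPI is 1; every unfilled load adds one NOP cycle.
extra_cpi = load_fraction * unfilled_fraction * nop_cycles_per_stall
effective_cpi = 1 + extra_cpi
print(round(effective_cpi, 2))  # 1.1
```

An effective CPI of 1.1 corresponds to a 10% increase in execution time over the ideal pipeline.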
Data hazard 3
How long will the following code take on a 5 stage pipeline, with appropriate
forwarding and NOPs?
add R5, R5, R7
lw R6, 100(R7)
sub R7, R6, R8
Data hazard 4
Fill in the pipeline, with and without forwarding:
add R4, R5, R2
lw R15, 0(R4)
sw R15, 4(R2)
Control hazard 1
What states do 1-bit and 2-bit branch predictors go through for the following
branch patterns? Assume that they start in strongly not taken 00 state.
History: NNNTNNN
History: NTNTNTN
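The state sequences can be traced with a short simulation. This sketch assumes the 1-bit predictor simply remembers the last outcome, and the 2-bit predictor is a saturating counter (00 and 01 predict not taken, 10 and 11 predict taken); some textbooks draw a slightly different 2-bit state machine. Function names are illustrative.

```python
def one_bit_trace(history, state=0):
    """Return (state before each branch, mispredictions) for a 1-bit predictor."""
    states, misses = [], 0
    for outcome in history:
        taken = outcome == "T"
        states.append(state)
        if (state == 1) != taken:
            misses += 1
        state = 1 if taken else 0  # remember only the last outcome
    return states, misses

def two_bit_trace(history, state=0):
    """Return (state before each branch, mispredictions) for a 2-bit saturating counter."""
    states, misses = [], 0
    for outcome in history:
        taken = outcome == "T"
        states.append(format(state, "02b"))
        if (state >= 2) != taken:  # 10 and 11 predict taken
            misses += 1
        # Count up on taken, down on not taken, saturating at 00 and 11.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return states, misses

for history in ("NNNTNNN", "NTNTNTN"):
    print(history, one_bit_trace(history), two_bit_trace(history))
```

Under these assumptions the 1-bit predictor mispredicts 2 and 6 times on the two patterns, while the 2-bit counter mispredicts 1 and 3 times.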