0% found this document useful (0 votes)
10 views43 pages

Pipe Lining

Uploaded by

bader.k.othman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views43 pages

Pipe Lining

Uploaded by

bader.k.othman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

Pipelining

▪ Assuming you’ve got:


— One washer (takes 30 minutes)

— One drier (takes 40 minutes)

— One “folder” (takes 20 minutes)

▪ It takes 90 minutes to wash, dry, and fold 1 load of laundry.


— How long does 4 loads take?

1
The slow way

6 PM 7 8 9 10 11 Midnight

Time

30 40 20 30 40 20 30 40 20 30 40 20

▪ If each load is done sequentially it takes 6 hours

2
Laundry Pipelining
▪ Start each load as soon as possible
— Overlap loads
6 PM 7 8 9 10 11 Midnight

Time

30 40 40 40 40 20

▪ Pipelined laundry takes 3.5 hours


3
Pipelining Lessons

▪ Pipelining doesn’t help latency of


6 PM 7 8 9
single load, it helps throughput of
Time entire workload
▪ Pipeline rate limited by slowest
30 40 40 40 40 20 pipeline stage
▪ Multiple tasks operating
simultaneously using different
resources
▪ Potential speedup = Number pipe
stages

4
Instruction execution review
▪ Executing a MIPS instruction can take up to five steps.
Step Name Description
Instruction Fetch IF Read an instruction from memory.
Instruction Decode ID Read source registers and generate control signals.
Execute EX Compute an R-type result or a branch outcome.
Memory MEM Read or write the data memory.
Writeback WB Store a result in the destination register.

▪ However, as we saw, not all instructions need all five steps.


Instruction Steps required
beq IF ID EX
R-type IF ID EX WB
sw IF ID EX MEM
lw IF ID EX MEM WB

5
Single-cycle datapath diagram

0
M
PC Add u
x
4
Add 1
Shift
1ns left 2
PCSrc
RegWrite 2ns
Read Instruction
address [31-0]
I [25 - 21]
Read Read
2ns MemWrite MemToReg

register 1 data 1 Read Read


ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Data 0
u register Write
Registers memory
I [15 - 11] x data
2ns 1
Write
data
1 ALUOp

MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

6
Single-cycle review
▪ All five execution steps occur in one clock cycle.
▪ This means the cycle time must be long enough to accommodate all the
steps of the most complex instruction—a “lw” in our instruction set.
— If the register file has a 1ns latency and the memories and ALU have a
2ns latency, “lw” will require 8ns.
— Thus all instructions will take 8ns to execute.
▪ Each hardware element can only be used once per clock cycle.
— A “lw” or “sw” must access memory twice (in the IF and MEM stages),
so there are separate instruction and data memories.
— There are multiple adders, since each instruction increments the PC
(IF) and performs another computation (EX). On top of that, branches
also need to compute a target address.

7
Example: Instruction Fetch (IF)
▪ Let’s quickly review how lw is executed in the single-cycle datapath.
▪ We’ll ignore PC incrementing and branching for now.
▪ In the Instruction Fetch (IF) step, we read the instruction memory.

RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

8
Instruction Decode (ID)
▪ The Instruction Decode (ID) step reads the source registers from the
register file.

RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

9
Execute (EX)
▪ The third step, Execute (EX), computes the effective memory address
from the source register and the instruction’s constant field.

RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

10
Memory (MEM)
▪ The Memory (MEM) step involves reading the data memory, from the
address computed by the ALU.

RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

11
Writeback (WB)
▪ Finally, in the Writeback (WB) step, the memory value is stored into the
destination register.

RegWrite
MemWrite MemToReg
Read Instruction I [25 - 21]
Read Read
address [31-0]
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
register 2 Read 0 Write u
memory 0 Result
data 2 M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

12
A bunch of lazy functional units
▪ Notice that each execution step uses a different functional unit.
▪ In other words, the main units are idle for most of the 8ns cycle!
— The instruction RAM is used for just 2ns at the start of the cycle.
— Registers are read once in ID (1ns), and written once in WB (1ns).
— The ALU is used for 2ns near the middle of the cycle.
— Reading the data memory only takes 2ns as well.
▪ That’s a lot of hardware sitting around doing nothing.

13
Putting those slackers to work
▪ We shouldn’t have to wait for the entire instruction to complete before
we can re-use the functional units.
▪ For example, the instruction memory is free in the Instruction Decode
step as shown below, so...

Idle Instruction Decode (ID)


RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

14
Decoding and fetching together
▪ Why don’t we go ahead and fetch the next instruction while we’re
decoding the first one?

Fetch 2nd Decode 1st instruction


RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

15
Executing, decoding and fetching
▪ Similarly, once the first instruction enters its Execute stage, we can go
ahead and decode the second instruction.
▪ But now the instruction memory is free again, so we can fetch the third
instruction!

Fetch 3rd Decode 2nd Execute 1st


RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

16
Making Pipelining Work
▪ We’ll make our pipeline 5 stages long, to handle load instructions as they
were handled in the multi-cycle implementation
— Stages are: IF, ID, EX, MEM, and WB
▪ We want to support executing 5 instructions simultaneously: one in each
stage.

17
Break datapath into 5 stages
▪ Each stage has its own functional units.
▪ Each stage can execute in 2ns
— Just like the multi-cycle implementation

IF ID EXE MEM WB
RegWrite

Read Instruction MemWrite MemToReg


I [25 - 21]
address [31-0] Read Read
register 1 data 1 Read Read
ALU 1
I [20 - 16] address data
Instruction Read Zero M
memory register 2 Read 0 Write u
0 data 2 Result
M address x
M Write
u Write Data 0
u register
Registers memory
I [15 - 11] x ALUOp data
Write 1
1 data
MemRead
ALUSrc
RegDst
I [15 - 0] Sign
extend

2ns 2ns 2ns 2ns


18
Pipelining Loads
Clock cycle
1 2 3 4 5 6 7 8 9
lw $t0, 4($sp) IF ID EX MEM WB
lw $t1, 8($sp) IF ID EX MEM WB
lw $t2, 12($sp) IF ID EX MEM WB
lw $t3, 16($sp) IF ID EX MEM WB
lw $t4, 20($sp) IF ID EX MEM WB

6 PM 7 8 9
Time

30 40 40 40 40 20

19
A pipeline diagram
Clock cycle
1 2 3 4 5 6 7 8 9
lw $t0, 4($sp) IF ID EX MEM WB
sub $v0, $a0, $a1 IF ID EX MEM WB
and $t1, $t2, $t3 IF ID EX MEM WB
or $s0, $s1, $s2 IF ID EX MEM WB
add $sp, $sp, -4 IF ID EX MEM WB

▪ A pipeline diagram shows the execution of a series of instructions.


— The instruction sequence is shown vertically, from top to bottom.
— Clock cycles are shown horizontally, from left to right.
— Each instruction is divided into its component stages. (We show five
stages for every instruction, which will make the control unit easier.)
▪ This clearly indicates the overlapping of instructions. For example, there
are three instructions active in the third cycle above.
— The “lw” instruction is in its Execute stage.
— Simultaneously, the “sub” is in its Instruction Decode stage.
— Also, the “and” instruction is just being fetched.
20
Pipeline terminology
Clock cycle
1 2 3 4 5 6 7 8 9
lw $t0, 4($sp) IF ID EX MEM WB
sub $v0, $a0, $a1 IF ID EX MEM WB
and $t1, $t2, $t3 IF ID EX MEM WB
or $s0, $s1, $s2 IF ID EX MEM WB
add $sp, $sp, -4 IF ID EX MEM WB

filling full emptying

▪ The pipeline depth is the number of stages—in this case, five.


▪ In the first four cycles here, the pipeline is filling, since there are unused
functional units.
▪ In cycle 5, the pipeline is full. Five instructions are being executed
simultaneously, so all hardware units are in use.
▪ In cycles 6-9, the pipeline is emptying.

21
Pipelining Performance

Speedup=2400/1400~2
22
Pipeline Datapath: Resource Requirements
Clock cycle
1 2 3 4 5 6 7 8 9
lw $t0, 4($sp) IF ID EX MEM WB
lw $t1, 8($sp) IF ID EX MEM WB
lw $t2, 12($sp) IF ID EX MEM WB
lw $t3, 16($sp) IF ID EX MEM WB
lw $t4, 20($sp) IF ID EX MEM WB

▪ We need to perform several operations in the same cycle.


— Increment the PC and add registers at the same time.
— Fetch one instruction while another one reads or writes data.
▪ Thus, like the single-cycle datapath, a pipelined processor duplicates
hardware elements that are needed several times in the same clock
cycle.

23
Pipelining other instruction types

▪ R-type instructions only require 4 stages: IF, ID, EX, and WB


— We don’t need the MEM stage
▪ What happens if we try to pipeline loads with R-type instructions?

Clock cycle
1 2 3 4 5 6 7 8 9
add $sp, $sp, -4 IF ID EX WB
sub $v0, $a0, $a1 IF ID EX WB
lw $t0, 4($sp) IF ID EX MEM WB
or $s0, $s1, $s2 IF ID EX WB
lw $t1, 8($sp) IF ID EX MEM WB

24
Important Observation

▪ Each functional unit can only be used once per instruction


▪ Each functional unit must be used at the same stage for all instructions.
See the problem if:
— Load uses Register File’s Write Port during its 5th stage
— R-type uses Register File’s Write Port during its 4th stage

Clock cycle
1 2 3 4 5 6 7 8 9
add $sp, $sp, -4 IF ID EX WB
sub $v0, $a0, $a1 IF ID EX WB
lw $t0, 4($sp) IF ID EX MEM WB
or $s0, $s1, $s2 IF ID EX WB
lw $t1, 8($sp) IF ID EX MEM WB

25
A solution: Insert NOP stages

▪ Enforce uniformity
— Make all instructions take 5 cycles.
— Make them have the same stages, in the same order
• Some stages will do nothing for some instructions
R-type IF ID EX NOP WB

Clock cycle
1 2 3 4 5 6 7 8 9
add $sp, $sp, -4 IF ID EX NOP WB
sub $v0, $a0, $a1 IF ID EX NOP WB
lw $t0, 4($sp) IF ID EX MEM WB
or $s0, $s1, $s2 IF ID EX NOP WB
lw $t1, 8($sp) IF ID EX MEM WB

• Stores and Branches have NOP stages, too…


store IF ID EX MEM NOP
branch IF ID EX NOP NOP

26
Summary

▪ Pipelining attempts to maximize instruction throughput by overlapping


the execution of multiple instructions.
▪ Pipelining offers amazing speedup.
— In the best case, one instruction finishes on every cycle, and the
speedup is equal to the pipeline depth.
▪ The pipeline datapath is much like the single-cycle one, but with added
pipeline registers
— Each stage needs is own functional units
▪ Now we’ll see the pipelined data path and control with an
example execution.

27
Pipelined datapath
1

PCSrc

IF/ID ID/EX EX/MEM MEM/WB


4
Add
P
C Add
Shift
RegWrite left 2

Read Read
register 1 data 1 ALU MemWrite
Read Instruction Zero
address [31-0] Read Read
0 Result Address
register 2 data 2
Write Data
Instruction register 1 MemToReg
memory
memory Registers ALUOp
Write
data ALUSrc Write Read
data data 1

Instr [15 - 0] Sign


RegDst
extend MemRead
0
Instr [20 - 16]
0
Instr [15 - 11]
1

28
What about control signals?
▪ The control signals are generated in the same way as in the single-cycle
processor—after an instruction is fetched, the processor decodes it and
produces the appropriate control values.
▪ But just like before, some of the control signals will not be needed until
some later stage and clock cycle.
▪ These signals must be propagated through the pipeline until they reach
the appropriate stage. We can just pass them in the pipeline registers,
along with the other data.
▪ Control signals can be categorized by the pipeline stage that uses them.

Stage Control signals needed


EX ALUSrc ALUOp RegDst
MEM MemRead MemWrite PCSrc
WB RegWrite MemToReg

29
Pipelined datapath and control
1

0
ID/EX
WB EX/MEM
PCSrc
Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
C Add
Shift
RegWrite left 2

Read Read
register 1 data 1 ALU MemWrite
Read Instruction Zero
address [31-0] Read Read
0 Result Address
register 2 data 2
Write Data
Instruction register 1 MemToReg
memory
memory Registers ALUOp
Write
data ALUSrc Write Read
data data 1

Instr [15 - 0] Sign


RegDst
extend MemRead
0
Instr [20 - 16]
0
Instr [15 - 11]
1

30
An example execution sequence
▪ Here’s a sample sequence of instructions to execute.
1000: lw $8, 4($29)
addresses in 1004: sub $2, $4, $5
decimal 1008: and $9, $10, $11
1012: or $16, $17, $18
1016: add $13, $14, $0

▪ We’ll make some assumptions, just so we can show actual data values.
— Each register contains its number plus 100. For instance, register $8
contains 108, register $29 contains 129, and so forth.
— Every data memory location contains 99.
▪ Our pipeline diagrams will follow some conventions.
— An X indicates values that aren’t important, like the constant field of
an R-type instruction.
— Question marks ??? indicate values we don’t know, usually resulting
from instructions coming before and after the ones in our example.

31
Cycle 1 (filling)
IF: lw $8, 4($29) ID: ??? EX: ??? MEM: ??? WB: ???

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
1004
C Add
Shift
RegWrite (?) left 2

??? ??? ???


1000 Read Read
register 1 data 1 ALU MemWrite (?)
Read Instruction Zero
??? ???
address [31-0] Read Read ??? ???
0 Result Address
register 2 data 2
??? MemToReg
??? Write Data
Instruction register 1 (?)
memory
memory ??? Registers ALUOp (???)
Write ???
data ALUSrc (?) ??? Write Read
data data 1

??? Sign ???


RegDst (?)
extend MemRead (?) ???
0
??? ???
0 ??? ??? ???
??? ???
1

???

32
Cycle 2
IF: sub $2, $4, $5 ID: lw $8, 4($29) EX: ??? MEM: ??? WB: ???

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
C 1008
Add
Shift
RegWrite (?) left 2

29 ???
1004 Read Read
register 1 data 1 ALU MemWrite (?)
Read Instruction Zero
X X ???
address [31-0] Read Read ???
0 Result Address
register 2 data 2
Write ??? MemToReg
??? Data
Instruction register 1 (?)
memory
memory ??? Registers ALUOp (???)
Write ???
data ALUSrc (?) ??? Write Read
data data 1

4 Sign ???
RegDst (?) ???
extend MemRead (?)
0
8 ???
0 ??? ??? ???
X ???
1

???

33
Cycle 3
IF: and $9, $10, $11 ID: sub $2, $4, $5 EX: lw $8, 4($29) MEM: ??? WB: ???

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
1012
C Add
Shift
RegWrite (?) left 2

4 104 129
1008 Read Read
register 1 data 1 ALU MemWrite (?)
Read Instruction Zero
5 X
address [31-0] Read Read 105 ???
0 Result Address
register 2 data 2
??? Write 133 MemToReg
Data
Instruction register 1 (?)
memory
memory ??? Registers ALUOp (add)
Write
??? Write Read ???
data ALUSrc (1)

X Sign 4
RegDst (0)
extend MemRead (?) ???
0
X 8
0 8 ??? ???
2 X
1

???

34
Cycle 4
IF: or $16, $17, $18 ID: and $9, $10, $11 EX: sub $2, $4, $5 MEM: lw $8, 4($29) WB: ???

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
1016
C Add
Shift
RegWrite (?) left 2

10 110 104
1012 Read Read
register 1 data 1 ALU MemWrite (0)
Read Instruction Zero
11 105
address [31-0] Read Read 111 133
0 Result Address
register 2 data 2
–1
??? Write Data MemToReg
Instruction register 1 (?)
memory
memory ??? Registers ALUOp (sub)
Write
data ALUSrc (0) X Write Read 99 ???
data data 1

X Sign X
RegDst (1)
extend MemRead (1) ???
0
X X 0 2 8 ???
9 2
1

???

20
Cycle 5 (full)
IF: add $13, $14, $0 ID: or $16, $17, $18 EX: and $9, $10, $11 MEM: sub $2, $4, $5 WB:
lw $8, 4($29)
1

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
C 1020
Add
Shift
RegWrite (1) left 2

17 117 110
1016 Read Read
register 1 data 1 ALU MemWrite (0)
Read Instruction Zero
18 111
address [31-0] Read Read 118 -1
0 Result Address
register 2 data 2
8 Write 110 Data MemToReg
Instruction register 1 (1)
memory
memory 99 Registers ALUOp (and)
Write
data ALUSrc (0) 105 Write Read X 99
data data 1

X Sign X
RegDst (1)
extend MemRead (0) 133
0
X X 0 9 2 8
16 9
1

99

36
Cycle 6 (emptying)
IF: ??? ID: add $13, $14, $0 EX: or $16, $17, $18 MEM: and $9, $10, $11 WB: sub
$2, $4, $5
1

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
???
C Add
Shift
RegWrite (1) left 2
14 114
117
1020 Read Read
register 1 data 1 ALU MemWrite (0)
Zero
Read Instruction 0 118
Read Read 0 110
address [31-0] 0
register 2 data 2 Result Address
2 119 MemToReg
Write
1 Data
Instruction register (0)
-1 memory
memory Registers ALUOp (or)
Write X
data ALUSrc (0) 111 Write Read
1
data data
X Sign X
RegDst (1)
extend MemRead (0)
0
X X 16 9
0
13 16
1

37
Cycle 7
IF: ??? ID: ??? EX: add $13, $14, $0 MEM: or $16, $17, $18 WB: and
$9, $10, $11
1

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
C ???
Add
Shift
RegWrite (1) left 2

??? ??? 114


??? Read Read
register 1 data 1 ALU MemWrite (0)
Read Instruction Zero
??? 0
address [31-0] Read Read ??? 119
0 Result Address
register 2 data 2
9 Write 114 Data MemToReg
Instruction register 1 (0)
memory
memory 110 Registers ALUOp (add)
Write
data ALUSrc (0) 118 Write Read X X

??? Sign X
RegDst (1)
extend MemRead (0) 110
0
??? X 0 13 16 9
??? 13
1

110

38
Cycle 8
IF: ??? ID: ??? EX: ??? MEM: add $13, $14, $0 WB: or $16,
$17, $18
1

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
C ???
Add
Shift
RegWrite (1) left 2

??? ??? ???


??? Read Read
register 1 data 1 ALU MemWrite (0)
Read Instruction Zero
??? ???
address [31-0] Read Read ??? 114
0 Result Address
register 2 data 2
16 Write ??? Data MemToReg
Instruction register 1 (0)
memory
memory 119 Registers ALUOp (???)
Write
data ALUSrc (?) 0 Write Read X X

??? Sign ???


RegDst (?)
extend MemRead (0) 119
0
??? ???
0 ??? 13 16
??? ???
1

119

39
Cycle 9
IF: ??? ID: ??? EX: ??? MEM: ??? WB: add
$13, $14, $0
1

0 ID/EX
WB EX/MEM
PCSrc Control M WB MEM/WB
IF/ID EX M WB
4
Add
P
C ???
Add
Shift
RegWrite (1) left 2

??? ??? ???


??? Read Read
register 1 data 1 ALU MemWrite (?)
Read Instruction Zero
??? ???
address [31-0] Read Read ??? ???
0 Result Address
register 2 data 2
13 Write ??? Data MemToReg
Instruction register 1 (0)
memory
memory 114 Registers ALUOp (???)
Write
data ALUSrc (?) ? Write Read X X
data data 1

??? Sign ???


RegDst (?)
extend MemRead (?) 114
0
??? ???
0 ??? ??? 13
??? ???
1

114

40
Performance Revisited
▪ Assuming the following functional unit latencies:

3ns 2ns 2ns 3ns 2ns


Inst Reg
Data Reg

ALU
mem Read
Mem Write

▪ What is the cycle time of a single-cycle implementation?


Add them (3+2+2+3+2) = 12 ns
— What is its throughput?

▪ What is the cycle time of a ideal pipelined implementation?


Take max. time = 3ns
(even if we don’t have number of stages)
— What is its steady-state throughput?

41
The pipelining paradox

lw $t0, 4($sp) Clock cycle


sub $v0, $a0, $a1 1 2 3 4 5 6 7 8 9
IF ID EX MEM WB
and $t1, $t2, $t3 IF ID EX MEM WB
or $s0, $s1, $s2 IF ID EX MEM WB
add $sp, $sp, -4 IF ID EX MEM WB
IF ID EX MEM WB
▪ Pipelining does not improve the execution time of any single instruction. Each
instruction here actually takes longer to execute than in a single- cycle datapath
(15ns vs. 12ns)!
▪ Instead, pipelining increases the throughput, or the amount of work done per unit
time. Here, several instructions are executed together in each clock cycle.
▪ The result is improved execution time for a sequence of instructions, such as an
entire program.

42
Pipeline Hazards

• Data hazards
– Dependency: Instruction depends on the result of a previous instruction
still in the pipeline
Add $s0, $t0, $t1

Sub $t2, $s0, $t3


– Stall: add three bubbles (no-ops) to the pipeline
• Structural
– Different instructions trying to use the same functional unit (e.g. memory,
register file)
• Control (branches)
– Target address known only at the end of 3rd cycle => STALLS

You might also like