The Pipeline
Module Outline
●
Why Pipeline?
– How to pipeline?
●
Speedup of the pipeline
●
Pipelined datapath
– Execution of instructions
– Pipeline Timing diagram
●
Dependences, Hazards
– Structural, Data, Control
– Stalling, Forwarding
●
Branch prediction
Car Assembly Line Example
●
Unpipelined Assembly Line
– Team of engineers build a full car
[Timeline: Car 1, Car 2, Car 3, Car 4 built back to back, one car per 24 hours; time axis ticks at 24, 48, 72, 96]
Car Assembly Line Example
●
Pipelined Assembly Line
– 3 (equal) stages to build a Car
– Stages don’t interfere
– Stage 2 consumes previous stage’s output
– Split the team into 3. Each team works on one stage only
– 8 hours per stage
– Car 1 enters Stage 1 (C1-S1)
– Car 1 moves to Stage 2
– Team 1 is free and takes up Car 2; Teams 1 and 2 are working
– 3 cars are in different stages of production
– Car 1 rolls out
– Steady state: one car rolls out every 8 hours (24/3)
C1-S1 C1-S2 C1-S3
      C2-S1 C2-S2 C2-S3
            C3-S1 C3-S2 C3-S3
                  C4-S1 C4-S2 ...
                        C5-S1 ...
Car Assembly Line Example
●
Non-pipelined vs. Pipelined implementation
●
Speedup
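The comparison can be checked with a few lines of Python. This is a minimal sketch using the numbers from the car example (3 stages, 8 hours per stage); the function names are illustrative and not part of the slides.

```python
STAGES = 3           # stages per car (from the example)
HOURS_PER_STAGE = 8  # hours per stage (from the example)

def unpipelined_hours(cars: int) -> int:
    """One team builds each car start to finish before starting the next."""
    return cars * STAGES * HOURS_PER_STAGE

def pipelined_hours(cars: int) -> int:
    """The first car passes through all stages; each later car
    rolls out one stage-time after the previous one."""
    if cars == 0:
        return 0
    return (STAGES + cars - 1) * HOURS_PER_STAGE

for cars in (1, 4, 100):
    u, p = unpipelined_hours(cars), pipelined_hours(cars)
    print(f"{cars:3d} cars: unpipelined {u} h, pipelined {p} h, speedup {u / p:.2f}")
```

For a single car the two lines take the same 24 hours; the speedup approaches 3 (the number of stages) only as the number of cars grows.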
The Processor Datapath
Execution and Timing – Loads
– IF: Instruction Fetch
– ID: Instruction decode / Register file read
– EX: Execution / Address Calculation
– MEM: Memory Access
– WB: Write Back
Datapath Stages
IF: Instruction Fetch → ID: Instruction decode / Register file read → EX: Execution / Address Calculation → MEM: Memory Access → WB: Write Back
RISC-V Datapath
IF ID EX MEM WB
●
Where are the clocked elements?
[Figure: RISC-V datapath with the clocked elements marked]
Datapath – Observations
●
During instruction fetch (IF), other components
are idle
– During ID, ....
●
Improve hardware utilization
– The entire datapath should be busy
Pipelining the Datapath
●
Pipeline the datapath
IF ID EX MEM WB
Pipelining – Desired Effect
Instr i    IF  ID  EX  MEM WB
Instr i+1      IF  ID  EX  MEM WB
Instr i+2          IF  ID  EX  MEM WB
Instr i+3              IF  ID  EX  MEM WB
Instr i+4                  IF  ID  EX  MEM
Instr i+5                      IF  ID  EX
Instr i+6                          IF  ID
Steady state: all five stages are busy in every cycle
Need for Clocked Elements
IF ID EX MEM WB
0.5ns
ld x10, 4(x11)
add x12, x13, x14
Clocked Elements in Pipeline
●
Problem: Signal overwrite/interference
[Figure: Stage i → Stage i+1]
Flip Flop Waveforms
[Figure: waveforms of Clock, D, and Q, showing the Clk-Q delay]
Clocked Elements in Pipeline
●
Problem: Signal overwrite/interference
●
Solution: Clocked elements (FF or a Latch)
[Figure: Stage i → Stage i+1]
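The role of the clocked element can be illustrated with a small behavioural model. This is a sketch of an idealised positive-edge-triggered D flip-flop with zero Clk-Q delay; the class name `DFlipFlop` and the sample values are assumptions made for illustration only.

```python
class DFlipFlop:
    """Idealised positive-edge-triggered D flip-flop (Clk-Q delay ignored)."""

    def __init__(self, q=None):
        self.q = q            # value currently seen by the next stage
        self._prev_clk = 0    # previous clock level, used to detect a rising edge

    def tick(self, clk, d):
        # Q copies D only on a 0 -> 1 clock transition; otherwise Q holds.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q

clock = [0, 1, 0, 1, 0, 1]
data = ["A", "A", "B", "B", "C", "C"]  # output of Stage i over time

ff = DFlipFlop()
trace = [ff.tick(c, d) for c, d in zip(clock, data)]
print(trace)
```

In this trace, Stage i changes its output from "A" to "B" while the clock is low, but Q (the input of Stage i+1) keeps "A" until the next rising edge, so the later stage is not disturbed by the earlier one.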
Clock Speed – Non-Pipelined vs. Pipelined
IF ID EX MEM WB
1.0 ns 0.5 ns 0.8 ns 1.0 ns 0.5 ns
●
Non-pipelined: $T = 3.8$ ns
●
Pipelined: $T = T_c + T_{ff}$ = 1.0 ns + 0.1 ns = 1.1 ns
●
CPI = 1.0 in both cases
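The two clock periods can be reproduced from the stage delays. The sketch below assumes that the non-pipelined period is the sum of the stage delays and that the pipelined period is the slowest stage plus the 0.1 ns flip-flop delay; the variable names are illustrative.

```python
# Stage delays in ns, taken from the slide.
stage_delay_ns = {"IF": 1.0, "ID": 0.5, "EX": 0.8, "MEM": 1.0, "WB": 0.5}
T_FF_NS = 0.1  # flip-flop delay in ns, taken from the slide

# Non-pipelined: one long cycle covers all five stages.
t_unpipelined = sum(stage_delay_ns.values())

# Pipelined: the cycle must fit the slowest stage plus the flip-flop delay.
t_pipelined = max(stage_delay_ns.values()) + T_FF_NS

print(f"non-pipelined: T = {t_unpipelined:.1f} ns, f = {1000 / t_unpipelined:.0f} MHz")
print(f"pipelined:     T = {t_pipelined:.1f} ns, f = {1000 / t_pipelined:.0f} MHz")
```

Because the stages are not equal, the pipelined period is set by the 1.0 ns stages rather than by 3.8/5 ns, which is why the clock speeds up by less than a factor of 5.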
Pipelined Datapath
[Figure: pipelined datapath with stages IF (Instruction Fetch), ID (Instruction decode / Register file read), EX (Execution / Address Calculation), MEM (Memory Access), WB (Write Back)]
Pipelined Execution – Load
– IF: Instruction Fetch
– ID: Instruction decode / Register file read
– EX: Execution / Address Calculation
– MEM: Memory Access
– WB: Write Back
Pipelined Datapath
Pipelined Control
Execution Sequence
Execution Sequence – Non-pipelined
Time (clock cycles)
ld  IF ID EX MA WB
sub                IF ID EX MA WB
add                               IF ID EX ...
(the remaining ld and add follow in the same way)
Execution Sequence – Pipelined
Time (clock cycles)
ld  IF ID EX MA WB
sub    IF ID EX MA WB
add       IF ID EX MA WB
ld           IF ID EX MA WB
add             IF ID EX MA WB
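A pipelined timing diagram of this kind can be generated mechanically. The following is a minimal sketch that assumes an ideal pipeline with no hazards or stalls, so instruction i simply enters IF in cycle i; the helper name `timing_diagram` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def timing_diagram(instructions):
    """Return one text row per instruction for an ideal (stall-free) pipeline."""
    rows = []
    for i, name in enumerate(instructions):
        # Instruction i waits i cycles, then occupies one stage per cycle.
        cells = ["  "] * i + STAGES
        rows.append(f"{name:<4}" + " ".join(cells))
    return rows

program = ["ld", "sub", "add", "ld", "add"]
for row in timing_diagram(program):
    print(row)

# The first instruction needs len(STAGES) cycles; each later one adds one cycle.
total_cycles = len(program) + len(STAGES) - 1
print("total cycles:", total_cycles)
```

For the five instructions used here the ideal pipeline finishes in 9 cycles, compared with 25 cycles if each instruction ran through all five stages before the next one started.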
Pipelined vs. Nonpipelined Implementation
●
Ratio of execution times?
– For $10^6$ instructions?
Speedup of the Pipeline
●
The speedup of a k-stage pipelined processor over an unpipelined processor:
$$S_k = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{n \cdot k}{n + (k - 1)} \approx k$$
n: number of instructions in the program.
k: number of pipeline stages
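The formula can be evaluated for a few program sizes to see how quickly the speedup approaches k. This sketch counts clock cycles only and assumes both processors use the same cycle time; the function name is illustrative.

```python
def pipeline_speedup(n: int, k: int) -> float:
    """Ideal speedup S_k = n*k / (n + (k - 1)), counted in clock cycles."""
    return (n * k) / (n + (k - 1))

for n in (1, 10, 1_000, 1_000_000):
    print(f"n = {n:>9,}: S_5 = {pipeline_speedup(n, 5):.4f}")
```

With a single instruction there is no speedup at all; the approximation to k holds only when n is much larger than k.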
Pipelined vs. Nonpipelined Implementation
●
Pipelining increases instruction throughput; it does not reduce the execution time of an individual instruction.
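The distinction between throughput and per-instruction time can be made concrete with the clock periods from the Clock Speed slide (3.8 ns non-pipelined, 1.1 ns pipelined). This is a rough sketch assuming a 5-stage pipeline, CPI = 1.0, and no stalls; all names are illustrative.

```python
N = 10**6                 # instructions in the program
K = 5                     # pipeline stages
T_UNPIPELINED_NS = 3.8    # non-pipelined clock period (from the slides)
T_PIPELINED_NS = 1.1      # pipelined clock period (from the slides)

# Time for one instruction to pass through the whole datapath.
latency_unpipelined = T_UNPIPELINED_NS
latency_pipelined = K * T_PIPELINED_NS

# Time for the whole program: the pipeline needs K - 1 extra cycles to fill.
time_unpipelined = N * T_UNPIPELINED_NS
time_pipelined = (N + K - 1) * T_PIPELINED_NS
ratio = time_unpipelined / time_pipelined

print(f"latency per instruction: {latency_unpipelined:.1f} ns vs {latency_pipelined:.1f} ns")
print(f"total time: {time_unpipelined / 1e6:.2f} ms vs {time_pipelined / 1e6:.2f} ms")
print(f"ratio of execution times: {ratio:.2f}")
```

Under these assumptions each instruction takes longer in the pipelined design (5.5 ns instead of 3.8 ns), yet the program as a whole finishes roughly 3.45 times faster because one instruction completes every 1.1 ns.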