361
Computer Architecture
Lecture 12: Designing a Pipeline Processor
pipeline.1
Overview of a Multiple Cycle Implementation
The root of the single cycle processor's problems:
The cycle time has to be long enough for the slowest instruction
Solution:
Break the instruction into smaller steps
Execute each step (instead of the entire instruction) in one cycle
- Cycle time: time it takes to execute the longest step
- Keep all the steps of similar length
This is the essence of the multiple cycle processor
The advantages of the multiple cycle processor:
Cycle time is much shorter
Different instructions take different numbers of cycles to complete
- Load takes five cycles
- Jump only takes three cycles
Allows a functional unit to be used more than once per instruction
Multiple Cycle Processor
MCP: A functional unit can be used more than once per instruction
[Figure: multiple cycle datapath: PC, a single Ideal Memory (RAdr, WrAdr, Din, Dout) addressed through the IorD mux, Instruction Reg, Reg File (Ra = Rs, Rb = Rt, Rw = Rt or Rd via RegDst, busA, busB, busW), Extend (ExtOp) and << 2, ALU with ALU Control, branch Target register, and muxes. Control signals: PCWr, PCWrCond, Zero, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg, BrWr, PCSrc.]
Outline of Today's Lecture
Recap and Introduction
Introduction to the Concept of a Pipelined Processor
Pipelined Datapath and Pipelined Control
How to Avoid Race Conditions in a Pipeline Design?
Pipeline Example: Instruction Interaction
Summary
Pipelining is Natural!
Laundry Example
Sammy, Marc, Griffy, Albert
each have one load of clothes
to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 30 minutes
Folder takes 30 minutes
Stasher takes 30 minutes
to put clothes into drawers
Sequential Laundry
[Figure: time axis from 6 PM to 2 AM in 30-minute slots; in task order, each load goes through all four 30-minute steps before the next load starts (loads C and D come last)]
Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
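[Figure: time axis from 6 PM in 30-minute slots; in task order, loads A, B, C, and D each start 30 minutes after the previous one, so different loads use the washer, dryer, folder, and stasher at the same time]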
Pipelined laundry takes 3.5 hours for 4 loads!
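The two laundry totals follow from simple arithmetic. Below is a minimal Python sketch that assumes, as the slides do, four loads and four steps of 30 minutes each. The function names are illustrative and not from the lecture.

```python
STEP_MIN = 30  # washer, dryer, folder, and stasher each take 30 minutes
STEPS = 4      # steps per load
LOADS = 4      # Sammy, Marc, Griffy, Albert


def sequential_minutes(loads: int, steps: int, step_min: int) -> int:
    """Each load goes through every step before the next load starts."""
    return loads * steps * step_min


def pipelined_minutes(loads: int, steps: int, step_min: int) -> int:
    """The first load needs `steps` slots; each later load finishes one slot after the previous one."""
    return (steps + loads - 1) * step_min


if __name__ == "__main__":
    print(sequential_minutes(LOADS, STEPS, STEP_MIN) / 60)  # 8.0 (hours)
    print(pipelined_minutes(LOADS, STEPS, STEP_MIN) / 60)   # 3.5 (hours)
```

The pipelined total is 7 slots of 30 minutes: 4 slots to finish the first load, then one more slot for each of the remaining 3 loads.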
Pipelining Lessons
[Figure: the pipelined laundry schedule from 6 PM with 30-minute steps; loads in task order overlap in time]
Pipelining doesn't help latency of a single task; it helps throughput of the entire workload
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
Stall for dependences
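A small model makes these lessons concrete: speedup relative to the number of stages, the limit set by the slowest stage, and the cost of fill and drain time. This Python sketch assumes an idealized pipeline with no stalls, where the step period equals the slowest stage. The 40-minute dryer below is a hypothetical variation and is not part of the slide's example.

```python
def pipelined_time(n_tasks: int, stage_times: list[float]) -> float:
    """Fill time for the first task plus one period per additional task.

    The period is set by the slowest stage, so unbalanced stages waste time."""
    period = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * period


def unpipelined_time(n_tasks: int, stage_times: list[float]) -> float:
    """Every task runs all of its stages before the next task starts."""
    return n_tasks * sum(stage_times)


def speedup(n_tasks: int, stage_times: list[float]) -> float:
    return unpipelined_time(n_tasks, stage_times) / pipelined_time(n_tasks, stage_times)


if __name__ == "__main__":
    balanced = [30, 30, 30, 30]
    unbalanced = [30, 40, 30, 30]  # hypothetical slower dryer
    for n in (1, 4, 1000):
        print(n, round(speedup(n, balanced), 2), round(speedup(n, unbalanced), 2))
```

With balanced stages, the speedup approaches 4 (the number of stages) only as the number of tasks grows. With the slower stage, it can never exceed 130 / 40 = 3.25.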
Why Pipeline?
Suppose we execute 100 instructions
Single Cycle Machine
45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle Machine
10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
Ideal pipelined machine
10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
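All three totals use execution time = cycle time x CPI x instruction count. The ideal pipeline adds the 4 drain cycles of a 5-stage pipeline. Here is a small Python check of the slide's numbers; the helper name is illustrative.

```python
def exec_time_ns(cycle_ns: float, total_cycles: float) -> float:
    """Execution time = clock period x number of clock cycles."""
    return cycle_ns * total_cycles


N = 100  # instructions

single_cycle = exec_time_ns(45, 1 * N)   # 45 ns cycle, CPI = 1
multi_cycle = exec_time_ns(10, 4.6 * N)  # 10 ns cycle, CPI = 4.6 from the instruction mix
pipelined = exec_time_ns(10, 1 * N + 4)  # 10 ns cycle, CPI = 1, plus 4 drain cycles

if __name__ == "__main__":
    for name, t in [("single", single_cycle), ("multi", multi_cycle), ("pipelined", pipelined)]:
        print(f"{name:10s} {t:7.0f} ns")
```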
Timing Diagram of a Load Instruction
[Figure: timing diagram of a load in the single cycle processor. The sequence of delays after the Clk edge is:
PC changes after its Clk-to-Q delay.
After the Instruction Memory Access Time, Rs, Rt, Rd, Op, and Func are valid.
After the Delay through Control Logic, ALUctr, ExtOp, ALUSrc, and RegWr change.
After the Register File Access Time, busA and busB are valid.
Then come the Delay through Extender & Mux, the ALU Delay (Address valid), the Data Memory Access Time (busW valid), and the Register File Write Time.
Phases: Instruction Fetch, Instr Decode / Reg. Fetch, Address, Data Memory, Reg Wr.]
The Five Stages of Load
Cycle:   1        2        3        4        5
Load:    Ifetch   Reg/Dec  Exec     Mem      Wr
Ifetch: Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file
Pipelining the Load Instruction
Cycle:   1        2        3        4        5        6        7
1st lw   Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw            Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                     Ifetch   Reg/Dec  Exec     Mem      Wr
The five independent functional units in the pipeline datapath are:
Instruction Memory for the Ifetch stage
Register File's read ports (busA and busB) for the Reg/Dec stage
ALU for the Exec stage
Data Memory for the Mem stage
Register File's write port (busW) for the Wr stage
One instruction enters the pipeline every cycle
One instruction comes out of the pipeline (complete) every cycle
The Effective Cycles per Instruction (CPI) is 1
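The diagram can also be built programmatically to check three claims: one instruction enters every cycle, one completes every cycle, and each of the five functional units listed above serves only one instruction per cycle. This is a minimal Python sketch. It assumes a hazard-free stream in which instruction i (0-based) enters Ifetch in cycle i + 1. The names are hypothetical.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

# Functional unit used by each stage (from the list of five units).
UNIT = {
    "Ifetch": "Instruction Memory",
    "Reg/Dec": "Register File read ports",
    "Exec": "ALU",
    "Mem": "Data Memory",
    "Wr": "Register File write port",
}


def schedule(n_instr: int) -> dict[int, dict[int, str]]:
    """Map each cycle to {instruction index: stage} for a stall-free pipeline."""
    table: dict[int, dict[int, str]] = {}
    for i in range(n_instr):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, {})[i] = stage
    return table


def units_conflict_free(table: dict[int, dict[int, str]]) -> bool:
    """True if no functional unit is used by two instructions in the same cycle."""
    for occupants in table.values():
        units = [UNIT[stage] for stage in occupants.values()]
        if len(units) != len(set(units)):
            return False
    return True


def cpi(n_instr: int) -> float:
    """Total cycles divided by instructions; approaches 1 as n_instr grows."""
    return len(schedule(n_instr)) / n_instr


if __name__ == "__main__":
    for cycle, occupants in sorted(schedule(3).items()):
        print(cycle, occupants)
```

For 100 instructions, the sketch reports 104 cycles (CPI 1.04). The extra 4 cycles are the fill time, which becomes negligible for long instruction streams.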
Conventional Pipelined Execution Representation
Time ->
IFetch  Dcd     Exec    Mem     WB
        IFetch  Dcd     Exec    Mem     WB
                IFetch  Dcd     Exec    Mem     WB
                        IFetch  Dcd     Exec    Mem     WB
                                IFetch  Dcd     Exec    Mem     WB
                                        IFetch  Dcd     Exec    Mem     WB
(Program Flow runs from top to bottom)
Single Cycle, Multiple Cycle, vs. Pipeline
Single Cycle Implementation (one long Clk cycle per instruction):
Cycle 1: Load
Cycle 2: Store, followed by Waste (the cycle is as long as Load needs)
Multiple Cycle Implementation (Cycles 1 to 10):
Load (Cycles 1 to 5): Ifetch, Reg, Exec, Mem, Wr
Store (Cycles 6 to 9): Ifetch, Reg, Exec, Mem
R-type (Cycle 10): Ifetch, ...
Pipeline Implementation:
Load (Cycles 1 to 5): Ifetch, Reg, Exec, Mem, Wr
Store (Cycles 2 to 6): Ifetch, Reg, Exec, Mem, Wr
R-type (Cycles 3 to 7): Ifetch, Reg, Exec, Mem, Wr
Why Pipeline? Because the resources are there!
[Figure: Inst 0 to Inst 4 in instruction order over time (clock cycles), each passing through Im, Reg, ALU, Dm, Reg and starting one cycle after the previous one; in any cycle, each resource is busy with a different instruction]
Can pipelining get us into trouble?
Yes: Pipeline Hazards
structural hazards: attempt to use the same resource two
different ways at the same time
- E.g., combined washer/dryer would be a structural hazard
or folder busy doing something else (watching TV)
data hazards: attempt to use item before it is ready
- E.g., one sock of a pair is in the dryer and the other in the washer; can't
fold until the sock from the washer gets through the dryer
- instruction depends on result of prior instruction still in
the pipeline
control hazards: attempt to make a decision before the condition is
evaluated
- E.g., washing football uniforms and needing the proper
detergent level; must see the result after the dryer before putting the next load in
- branch instructions
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards
Single Memory is a Structural Hazard
[Figure: Load and Instr 1 to Instr 4 in instruction order over time (clock cycles), each using Mem, Reg, ALU, Mem, Reg; with a single memory, the Load's data access and Instr 3's instruction fetch need the memory in the same cycle]
Detection is easy in this case! (right half highlight means read, left half write)
Structural Hazards limit performance
Example: if there are 1.3 memory accesses per instruction and only one memory
access per cycle, then
average CPI ≥ 1.3
otherwise the resource is more than 100% utilized
More on Hazards later
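The bound is just the resource's demand divided by its capacity. This tiny Python sketch assumes each memory port serves one access per cycle and that a pipeline cannot do better than CPI = 1. The function name is hypothetical.

```python
def min_cpi(accesses_per_instr: float, accesses_per_cycle: int = 1) -> float:
    """Lower bound on average CPI imposed by one shared resource.

    N instructions need N * accesses_per_instr accesses, but only
    accesses_per_cycle can happen per cycle, so cycles >= N * accesses / capacity."""
    return max(1.0, accesses_per_instr / accesses_per_cycle)


if __name__ == "__main__":
    print(min_cpi(1.3))     # one memory access per cycle
    print(min_cpi(1.3, 2))  # two accesses per cycle
```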
Pipelining the R-type and Load Instruction
Cycle:   1        2        3        4        5        6        7        8        9
R-type   Ifetch   Reg/Dec  Exec     Wr
R-type            Ifetch   Reg/Dec  Exec     Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Wr
R-type                                       Ifetch   Reg/Dec  Exec     Wr
Oops! We have a problem! (Cycle 7: the Load and the R-type after it both reach Wr)
We have a problem:
Two instructions try to write to the register file at the same time!
The Four Stages of R-type
Cycle:   1        2        3        4
R-type:  Ifetch   Reg/Dec  Exec     Wr
Ifetch: Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: ALU operates on the two register operands
Wr: Write the ALU output back to the register file
Important Observation
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions:
Load uses the Register File's Write Port during its 5th stage
Stage:   1        2        3        4        5
Load:    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type uses the Register File's Write Port during its 4th stage
Stage:   1        2        3        4
R-type:  Ifetch   Reg/Dec  Exec     Wr
Solution 1: Insert Bubble into the Pipeline
[Figure: Cycles 1 to 9. An instruction, then a Load, three R-type instructions, and one more instruction enter the pipeline in turn. A Pipeline Bubble is inserted so that the R-type right behind the Load reaches Wr one cycle after the Load instead of in the same cycle; the instructions behind it are delayed by one cycle as well.]
Insert a bubble into the pipeline to prevent 2 writes in the same cycle
The control logic can be complex
No instruction is completed during Cycle 5:
The Effective CPI for load is >1
Solution 2: Delay R-type's Write by One Cycle
Delay R-type's register write by one cycle:
Now R-type instructions also use the Reg File's write port at Stage 5
The Mem stage is a NOOP stage for R-type: nothing is being done
Stage:   1        2        3        4        5
R-type:  Ifetch   Reg/Dec  Exec     Mem      Wr
Cycle:   1        2        3        4        5        6        7        8        9
R-type   Ifetch   Reg/Dec  Exec     Mem      Wr
R-type            Ifetch   Reg/Dec  Exec     Mem      Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                       Ifetch   Reg/Dec  Exec     Mem      Wr
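Computing the cycle in which each instruction writes the register file reproduces both the write-port conflict and the effect of Solution 2. This Python sketch assumes one instruction is fetched per cycle with no other stalls. It uses the sequence R-type, R-type, Load, R-type, R-type from these slides. The dictionary and function names are hypothetical.

```python
# Stage (counting from 1) in which each instruction class writes the register file.
WR_STAGE_ORIGINAL = {"R-type": 4, "Load": 5}  # R-type has only four stages
WR_STAGE_DELAYED = {"R-type": 5, "Load": 5}   # Solution 2: R-type idles in Mem

PROGRAM = ["R-type", "R-type", "Load", "R-type", "R-type"]


def write_cycles(program: list[str], wr_stage: dict[str, int]) -> list[int]:
    """Instruction i is fetched in cycle i + 1, so it writes in cycle i + wr_stage."""
    return [i + wr_stage[kind] for i, kind in enumerate(program)]


def write_conflicts(program: list[str], wr_stage: dict[str, int]) -> list[tuple[int, int, int]]:
    """Return (cycle, earlier instr, later instr) for every double use of the write port."""
    first_writer: dict[int, int] = {}
    conflicts = []
    for i, cycle in enumerate(write_cycles(program, wr_stage)):
        if cycle in first_writer:
            conflicts.append((cycle, first_writer[cycle], i))
        else:
            first_writer[cycle] = i
    return conflicts


if __name__ == "__main__":
    print(write_conflicts(PROGRAM, WR_STAGE_ORIGINAL))  # Load and the next R-type collide
    print(write_conflicts(PROGRAM, WR_STAGE_DELAYED))   # no collisions
```

With 4-stage R-types, the Load (instruction 2) and the R-type after it both write in cycle 7. Padding R-type to five stages gives every instruction its own write cycle, at the cost of one idle stage per R-type.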
The Four Stages of Store
Cycle:   1        2        3        4
Store:   Ifetch   Reg/Dec  Exec     Mem      Wr (idle)
Ifetch: Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Write the data into the Data Memory
The Four Stages of Beq
Cycle:   1        2        3        4
Beq:     Ifetch   Reg/Dec  Exec     Mem      Wr (idle)
Ifetch: Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: ALU compares the two register operands
Adder calculates the branch target address
Mem: If the registers compared in the Exec stage are the same,
write the branch target address into the PC
A Pipelined Datapath
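[Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, and Wr are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr Registers, all clocked by Clk. Stage by stage:
IUnit: holds the PC and passes PC+4 along.
RFile: read with Ra = Rs and Rb = Rt; its write port (Rw, Di) is driven from the Wr stage.
Exec Unit: uses busA, busB, Imm16, and PC+4; RegDst chooses Rt or Rd.
Data Mem (RA, WA, Di, Do): sits in the Mem stage.
Write-back mux: picks the value written back to the register file.
Control signals: ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr.]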
The Instruction Fetch Stage
Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]
You are here!
[Figure: pipelined datapath with the Ifetch stage highlighted. At the end of the cycle, IF/ID holds lw $1, 0x100($2) and PC = 14.]
A Detail View of the Instruction Unit
Location 10: lw $1, 0x100($2)
You are here!
[Figure: instruction unit. The PC (10) addresses the Instruction Memory, which loads lw $1, 0x100($2) into IF/ID; an Adder produces PC + 4, and a mux (inputs 0 and 1) selects the next PC, here PC = 14.]
The Decode / Register Fetch Stage
Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]
You are here!
[Figure: pipelined datapath with the Reg/Dec stage highlighted. Rs and Rt read the register file; at the end of the cycle, ID/Ex holds the value of register 2 and 0x100.]
Load's Address Calculation Stage
Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]
You are here!
[Figure: pipelined datapath with the Exec stage highlighted. Control: ALUOp=Add, ExtOp=1, ALUSrc=1, RegDst=0. At the end of the cycle, Ex/Mem holds the load's address.]
A Detail View of the Execution Unit
You are here!
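[Figure: execution unit. busA and the 32-bit extended imm16 (Extender with ExtOp=1, mux with ALUSrc=1) feed the ALU, which ALU Control (ALUOp=Add, ALUctr) drives to produce ALUout and Zero. A separate Adder adds PC+4 and imm16 << 2 to form the branch Target. At the end of the cycle, Ex/Mem holds the load's memory address.]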
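The Exec unit performs two computations in this walk-through: the load address busA + SignExt(imm16), and the branch target (PC+4) + (SignExt(imm16) << 2). The Python sketch below models both. It assumes 32-bit wraparound and supports only the Add and Sub ALU operations used in this lecture's examples. The function names and the register value in the usage lines are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit datapath values


def sign_extend16(imm16: int) -> int:
    """ExtOp = 1: interpret the low 16 bits as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def exec_stage(bus_a: int, bus_b: int, imm16: int, pc_plus_4: int,
               alu_src: int, alu_op: str) -> tuple[int, bool, int]:
    """Return (ALUout, Zero, Target) for one pass through the Exec unit."""
    # ALUSrc mux: 0 selects busB, 1 selects the sign-extended immediate.
    operand_b = (sign_extend16(imm16) & MASK32) if alu_src == 1 else bus_b
    if alu_op == "Add":
        alu_out = (bus_a + operand_b) & MASK32
    elif alu_op == "Sub":
        alu_out = (bus_a - operand_b) & MASK32
    else:
        raise ValueError(f"ALUOp {alu_op!r} is not modeled in this sketch")
    # Separate adder: branch target = (PC+4) + (SignExt(imm16) << 2).
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    return alu_out, alu_out == 0, target


if __name__ == "__main__":
    # lw $1, 0x100($2) with a made-up register value $2 = 0x2000 and PC+4 = 0x14
    alu_out, zero, target = exec_stage(0x2000, 0, 0x100, 0x14, alu_src=1, alu_op="Add")
    print(hex(alu_out))  # 0x2100
```

For a beq, the same function with alu_src=0 and alu_op="Sub" sets Zero when the two registers are equal. The Mem stage then uses Zero to decide whether Target goes into the PC.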
Load's Memory Access Stage
Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]
You are here!
[Figure: pipelined datapath with the Mem stage highlighted. Control: Branch=0, MemWr=0. At the end of the cycle, Mem/Wr holds the load's data.]
Load's Write Back Stage
Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]
You are somewhere out there!
[Figure: pipelined datapath with the Wr stage highlighted. Control: RegWr=1, MemtoReg=1; the load's data from Mem/Wr is written into the register file.]
How About Control Signals?
Key Observation: Control Signals at Stage N = Func(Instr. at Stage N)
N = Exec, Mem, or Wr
Example: Control Signals at Exec Stage = Func(Load's Exec)
[Figure: pipelined datapath with the load in its Exec stage; the Exec controls ALUOp=Add, ExtOp=1, ALUSrc=1, and RegDst=0 depend only on the load, and Ex/Mem receives the load's address.]
Pipeline Control
The Main Control generates the control signals during Reg/Dec
Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
Control signals for Mem (MemWr, Branch) are used 2 cycles later
Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control decodes the instruction in the IF/ID Register during Reg/Dec. All of its control signals enter the ID/Ex Register, and then:
Exec signals (ExtOp, ALUSrc, ALUOp, RegDst): consumed in Exec.
Mem signals (MemWr, Branch): travel on through the Ex/Mem Register.
Wr signals (MemtoReg, RegWr): travel on through the Ex/Mem and Mem/Wr Registers.]
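Control signals ride along in the pipeline registers. Each cycle, a shrinking set of signals shifts from ID/Ex to Ex/Mem to Mem/Wr, and the Python sketch below models that shift. It assumes the control values shown on this lecture's example slides (None stands for x, don't care) and one instruction fetched per cycle with no stalls. MAIN_CONTROL, pick, and run are hypothetical names.

```python
EX = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")  # used in Exec
MEM = ("MemWr", "Branch")                    # used in Mem
WR = ("MemtoReg", "RegWr")                   # used in Wr

# Main Control output per instruction class (None = don't care).
MAIN_CONTROL = {
    "lw": dict(ExtOp=1, ALUSrc=1, ALUOp="Add", RegDst=0,
               MemWr=0, Branch=0, MemtoReg=1, RegWr=1),
    "sw": dict(ExtOp=1, ALUSrc=1, ALUOp="Add", RegDst=None,
               MemWr=1, Branch=0, MemtoReg=None, RegWr=0),
    "R-type": dict(ExtOp=None, ALUSrc=0, ALUOp="R-type", RegDst=1,
                   MemWr=0, Branch=0, MemtoReg=0, RegWr=1),
    "beq": dict(ExtOp=1, ALUSrc=0, ALUOp="Sub", RegDst=None,
                MemWr=0, Branch=1, MemtoReg=None, RegWr=0),
}


def pick(ctrl, names):
    """Keep only the named signals; an empty pipeline slot stays None."""
    return None if ctrl is None else {n: ctrl[n] for n in names}


def run(program, cycles):
    """Yield (cycle, Exec signals, Mem signals, Wr signals) for each clock cycle."""
    if_id = id_ex = ex_mem = mem_wr = None
    for cycle in range(1, cycles + 1):
        # Each stage is driven by the pipeline register in front of it.
        yield cycle, pick(id_ex, EX), pick(ex_mem, MEM), pick(mem_wr, WR)
        # Clock edge: Wr signals move on, Mem + Wr signals move on, Main Control
        # decodes the instruction in IF/ID, and the next instruction is fetched.
        mem_wr = pick(ex_mem, WR)
        ex_mem = pick(id_ex, MEM + WR)
        id_ex = None if if_id is None else MAIN_CONTROL[if_id]
        if_id = program[cycle - 1] if cycle - 1 < len(program) else None


if __name__ == "__main__":
    for cycle, ex, mem, wr in run(["lw", "R-type", "sw", "beq"], 8):
        print(cycle, ex, mem, wr)
```

In cycle 5, the trace shows Exec driven by the store, Mem by the R-type, and Wr by the load, which matches the stage-N rule above. Empty slots are modeled as None rather than as explicit zeros.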
Beginning of the Wr Stage: A Real World Problem
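[Figure: the Ex/Mem register drives WrAdr, MemWr, and Data into the Data Memory, and the Mem/Wr register drives RegAdr, RegWr, and Data into the Reg File. Both registers are clocked by Clk, and each output (RegWr, MemWr, RegAdr, WrAdr) changes after its own Clk-to-Q delay.]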
At the beginning of the Wr stage, we have a problem if:
RegAdr's (Rd or Rt) Clk-to-Q > RegWr's Clk-to-Q
Similarly, at the beginning of the Mem stage, we have a problem if:
WrAdr's Clk-to-Q > MemWr's Clk-to-Q
We have a race condition between Address and Write Enable!
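The condition above compares two clock-to-Q delays. The Python sketch below is deliberately simplified: it assumes both signals leave the same pipeline register at the clock edge and ignores wire delay, set-up, and hold times. The function name and delay values are hypothetical.

```python
from typing import Optional, Tuple


def unsafe_window(addr_clk_to_q: float, wren_clk_to_q: float) -> Optional[Tuple[float, float]]:
    """Interval (ns after the clock edge) during which the new write enable is
    already asserted while the address still shows the previous cycle's value.

    Returns None when the address settles no later than the write enable."""
    if addr_clk_to_q > wren_clk_to_q:
        return (wren_clk_to_q, addr_clk_to_q)
    return None


if __name__ == "__main__":
    print(unsafe_window(addr_clk_to_q=1.2, wren_clk_to_q=0.8))  # race: write may hit the old address
    print(unsafe_window(addr_clk_to_q=0.8, wren_clk_to_q=1.2))  # address stable first
```

A nonempty window means the register file or data memory may briefly write to the wrong location.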
The Pipeline Problem
Multiple Cycle design prevents a race condition between Addr and WrEn:
Make sure Address is stable by the end of Cycle N
Assert WrEn during Cycle N + 1
This approach can NOT be used in the pipeline design because:
Must be able to write the register file every cycle
Must be able to write the data memory every cycle
Clock ->
Store    Ifetch   Reg/Dec  Exec     Mem      Wr
Store             Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                     Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Mem      Wr
Synchronize Register File & Synchronize Memory
Solution: AND the Write Enable signal with the Clock
This is the ONLY place where gating the clock is used
MUST consult a circuit expert to ensure no timing violation:
- Example: Clock High Time > Write Access Delay
[Figure: Synchronize Memory and Register File. Registers clocked by Clk capture I_Addr, I_Data, and I_WrEn, producing Address, Data, and WrEn; WrEn is ANDed with Clk to form C_WrEn, which writes the Reg File or Memory.]
Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge
Write occurs in the cycle following the clock edge that captures the signals
A More Extensive Pipelining Example
Cycle:     1        2        3        4        5        6        7        8
0: Load    Ifetch   Reg/Dec  Exec     Mem      Wr
4: R-type           Ifetch   Reg/Dec  Exec     Mem      Wr
8: Store                     Ifetch   Reg/Dec  Exec     Mem      Wr
12: Beq                               Ifetch   Reg/Dec  Exec     Mem      Wr
(Beq's target is 1000; snapshots are taken at the end of Cycles 4, 5, 6, and 7)
End of Cycle 4: Load's Mem, R-type's Exec, Store's Reg, Beq's Ifetch
End of Cycle 5: Load's Wr, R-type's Mem, Store's Exec, Beq's Reg
End of Cycle 6: R-type's Wr, Store's Mem, Beq's Exec
End of Cycle 7: Store's Wr, Beq's Mem
Pipelining Example: End of Cycle 4
0: Load's Mem    4: R-type's Exec    8: Store's Reg    12: Beq's Ifetch
[Figure: pipelined datapath at the end of Cycle 4.
IF/ID: Beq instruction; PC = 16
ID/Ex: Store's busA & busB
Ex/Mem: R-type's result
Mem/Wr: Load's Dout
Exec controls: ALUOp=R-type, ExtOp=x, RegDst=1, ALUSrc=0
Mem controls: Branch=0, MemWr=0
Wr controls: RegWr=0, MemtoReg=x]
Pipelining Example: End of Cycle 5
0: Load's Wr    4: R-type's Mem    8: Store's Exec    12: Beq's Reg    16: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 5.
IF/ID: instruction @ 16; PC = 20
ID/Ex: Beq's busA & busB
Ex/Mem: Store's address
Mem/Wr: R-type's result
Exec controls: ALUOp=Add, ExtOp=1, RegDst=x, ALUSrc=1
Mem controls: Branch=0, MemWr=0
Wr controls: RegWr=1, MemtoReg=1]
Pipelining Example: End of Cycle 6
4: R-type's Wr    8: Store's Mem    12: Beq's Exec    16: R-type's Reg    20: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 6.
IF/ID: instruction @ 20; PC = 24
ID/Ex: R-type's busA & busB
Ex/Mem: Beq's results
Mem/Wr: nothing for Store
Exec controls: ALUOp=Sub, ExtOp=1, RegDst=x, ALUSrc=0
Mem controls: Branch=0, MemWr=1
Wr controls: RegWr=1, MemtoReg=0]
Pipelining Example: End of Cycle 7
8: Store's Wr    12: Beq's Mem    16: R-type's Exec    20: R-type's Reg    24: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 7.
IF/ID: instruction @ 24; PC = 1000
ID/Ex: R-type's busA & busB
Ex/Mem: R-type's results
Mem/Wr: nothing for Beq
Exec controls: ALUOp=R-type, ExtOp=x, RegDst=1, ALUSrc=0
Mem controls: Branch=1, MemWr=0
Wr controls: RegWr=0, MemtoReg=x]
The Delay Branch Phenomenon
Cycle:             4        5        6        7        8        9        10       11
12: Beq            Ifetch   Reg/Dec  Exec     Mem      Wr
16: R-type                  Ifetch   Reg/Dec  Exec     Mem      Wr
20: R-type                           Ifetch   Reg/Dec  Exec     Mem      Wr
24: R-type                                    Ifetch   Reg/Dec  Exec     Mem      Wr
1000: Target of Br                                     Ifetch   Reg/Dec  Exec     Mem      Wr
(Beq's target is 1000)
Although Beq is fetched during Cycle 4:
Target address is NOT written into the PC until the end of Cycle 7
Branch's target is NOT fetched until Cycle 8
3-instruction delay before the branch takes effect
This is referred to as a Branch Hazard:
Clever design techniques can reduce the delay to ONE instruction
The Delay Load Phenomenon
Cycle:   1        2        3        4        5        6        7        8
I0: Load Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 1            Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 2                     Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 3                              Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 4                                       Ifetch   Reg/Dec  Exec     Mem      Wr
Although Load is fetched during Cycle 1:
The data is NOT written into the Reg File until the end of Cycle 5
We cannot read this value from the Reg File until Cycle 6
3-instruction delay before the load takes effect
This is referred to as a Data Hazard:
Clever design techniques can reduce the delay to ONE instruction
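Both 3-instruction delays, the branch's and the load's, come from the same counting argument. A result produced at the end of stage p is usable only from the next cycle, and a later instruction needs it during its stage c. The brute-force Python sketch below counts the affected instructions under exactly those assumptions (no forwarding, no early branch resolution). The function name is hypothetical.

```python
def delay_instructions(produce_stage: int, consume_stage: int) -> int:
    """Count the instructions right after the producer that still miss its result.

    The producer is fetched in cycle 1 and finishes `produce_stage` at the end of
    cycle `produce_stage`, so the result is usable from cycle produce_stage + 1.
    The d-th following instruction is fetched in cycle 1 + d and reaches
    `consume_stage` in cycle d + consume_stage."""
    ready_cycle = produce_stage + 1
    d = 1
    while d + consume_stage < ready_cycle:
        d += 1
    return d - 1  # followers 1 .. d-1 are too early


if __name__ == "__main__":
    print(delay_instructions(produce_stage=4, consume_stage=1))  # Beq: PC written in Mem, read by Ifetch
    print(delay_instructions(produce_stage=5, consume_stage=2))  # Load: written in Wr, read in Reg/Dec
```

Either producing the value in an earlier stage or consuming it in a later stage shrinks the count. The sketch does not model how the lecture's "clever design techniques" reach a delay of one.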
Summary
Disadvantages of the Single Cycle Processor
Long cycle time
Cycle time is too long for all instructions except the Load
Multiple Clock Cycle Processor:
Divide the instructions into smaller steps
Execute each step (instead of the entire instruction) in one cycle
Pipeline Processor:
Natural enhancement of the multiple clock cycle processor
Each functional unit can only be used once per instruction
If an instruction is going to use a functional unit:
- it must use it at the same stage as all other instructions
Pipeline Control:
- Each stage's control signals depend ONLY on the instruction
that is currently in that stage