09 Hazards W
09 Hazards W
Hakim Weatherspoon
CS 3410, Spring 2013
Computer Science
Cornell University
See P&H Chapter 4.7
Goals for Today
Data Hazards
• Revisit Pipelined Processors
• Data dependencies
• Problem, detection, and solutions
– (delaying, stalling, forwarding, bypass, etc)
• Hazard detection unit
• Forwarding unit
Next time
• Control Hazards
What is the next instruction to execute if
a branch is taken? Not taken?
MIPS Design Principles
Simplicity favors regularity
• 32 bit instructions
Smaller is faster
• Small register file
I-type op rs rt immediate
6 bits 5 bits 5 bits 16 bits
Memory Access
• load/store between registers and memory
• word, half-word and byte operations
Control flow
• conditional branches: pc-relative addresses
• jumps: fixed offsets, register absolute
Recall: MIPS Instruction Types
Arithmetic/Logical
• ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU
• ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRLV, SRAV,
SLTI, SLTIU
• MULT, DIV, MFLO, MTLO, MFHI, MTHI
Memory Access
• LW, LH, LB, LHU, LBU, LWL, LWR
• SW, SH, SB, SWL, SWR
Control flow
• BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ
• J, JR, JAL, JALR, BEQL, BNEL, BLEZL, BGTZL
Special
• LL, SC, SYSCALL, BREAK, SYNC, COPROC
Pipelined Processor
+4
addr
PC din dout
control
memory
compute
new jump/branch
extend targets
pc
A
memory register
D
alu
file
B
+4
addr
inst
PC din dout
M
B
control
memory
compute
new jump/branch
imm
extend targets
pc
ctrl
ctrl
Fetch Decode Execute Memory Back
IF/ID ID/EX EX/MEM MEM/WB
Example: : Sample Code (Simple)
add r3, r1, r2;
nand r6, r4, r5;
lw r4, 20(r2);
add r5, r2, r5;
sw r7, 12(r3);
Example: Sample Code (Simple)
Assume eight-register machine
Run the following code on a pipelined datapath
add r3 r1 r2 ; reg 3 = reg 1 + reg 2
nand r6 r4 r5 ; reg 6 = ~(reg 4 & reg 5)
lw r4 20 (r2) ; reg 4 = Mem[reg2+20]
add r5 r2 r5 ; reg 5 = reg 2 + reg 5
sw r7 12(r3) ; Mem[reg3+12] = reg 7
add IF ID EX MEM WB
nand IF ID EX MEM WB
lw IF ID EX MEM WB
add IF ID EX MEM WB
sw IF ID EX MEM WB
Latency:
Throughput:
Concurrency: CPI =
M
U
X
4 target
+ PC+4 PC+4
R0 0
R1 ALU
regA
M
instruction
regB
R2 result
R3 valA U
PC Inst A ALU X
Register file
R4
L mdata
mem result
R5 U
valB M Data
R6
U mem
R7
X data
imm dest
extend
valB
Bits 11-15
Rd M
Bits 16-20 Rt U dest dest
X
Bits 26-31
op op op
data
dest
extend
0 M
U
0 X
4 0
+ 4 /0 4
R0 0
R1 36 0
0 M
add 3 1 2
R2 9
Inst
R3 12 /0 36 A
U
PC X
Register file
mem
R4 18 L 0 0
R5 7 /0 9 M U Data
R6 41 U mem
R7 22 X data
0 dest
extend
0
Fetch: Bits 11-15 /0 3 M
add 3 1 2 Bits 16-20 /0 2 U 0 0
X
Bits 26-31
/ add
nop nop nop
Time: 1/ 2
IF/ID ID/EX EX/MEM MEM/WB
nand 6 4 5 add 3 1 2
M
U
X
4 /0 4
+ 8 /4 8
R0 0
R1 36 0
1 0
nand 6 4 5
R2 9 36 M
2 36 U
Inst
R3 12 / 18 A
PC X
Register file
mem
R4 18 L /0 45 0
R5 7 9 U
/9 7 M Data
R6 41 U mem
R7 22 X data
3 dest
extend
/0 9
Fetch: Bits 11-15 /3 6 M 3
nand 6 4 5 Bits 16-20 /2 5 U
X
/0 3 0
Bits 26-31
/ nand
add / add
nop nop
Time: 2/ 3
IF/ID ID/EX EX/MEM MEM/WB
lw 4 20(2) nand 6 4 5 add 3 1 2
M
U nand (18 ∙ 7)
X
18 = 01 0010
4 7 = 00 0111 /4 8
+ 12 8 ------------------
R0 0 -3 = 11 1101
R1 36 0
4
/0 45 M
lw 4 20(2)
R2 9 / 18
36
5 18 U
R3 12 A
PC Inst X
Register file
mem
R4 18 L / -3
45 0
R5 7 /9 7
7 M U Data
R6 41 U mem
R7 22 X data
6 dest
extend
/9 7
Fetch: Bits 11-15
6 3 M 3
lw 4 20(2) Bits 16-20 5 2 U
X
/3 6 /0 3
Bits 26-31
nand / nand
add / add
nop
Time: 3/ 4
IF/ID ID/EX EX/MEM MEM/WB
add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2
M
U
X
4 8
+ 16 12
R0 0
R1 36 0
2 45 M
add 5 2 5
4
R2 9 18
R3 12 9 U
PC Inst A X
Register file
mem
R4 18 L -3 45 0
7
R5 7 18 M U Data
R6 41 U mem
R7 22 X data
20 dest
extend
7
Fetch: Bits 11-15
0 6 M 6 3
add 5 2 5 Bits 16-20 4 5 U 6 3
X
Bits 26-31
lw nand add
Time: 4
IF/ID ID/EX EX/MEM MEM/WB
sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 6 4 5 add 3 1 2
M
U
X
4 12
+ 20 16
R0 0
R1 36 0 45
2 -3 M
sw 7 12(3)
5
R2 9 9
R3 45 9 U
PC Inst A X
Register file
mem
R4 18 L 29 -3 0
R5 7 7 M U Data
R6 41 U mem
R7 22 20 X data
5 dest
extend
18
Fetch: Bits 11-15
5 0 M 4 6 3
sw 7 12(3) Bits 16-20 5 4 U 4 6
X
Bits 26-31
add lw nand
Time: 5
IF/ID ID/EX EX/MEM MEM/WB
sw 7 12(3) add 5 2 5 lw 4 20(2) nand 6 4 5
M
U
X
4 16
+ 20
R0 0
R1 36 0 -3
3 29
R2 9 9 M
7 45 U
R3 45 A
PC Inst X
Register file
mem
R4 18 L 16 29 99
7
R5 7 22 M U Data
R6 -3 U mem
R7 22 X data
12 dest
extend
7
No more Bits 11-15
0 5 M 5 4 6
instructions Bits 16-20 7 5 U 5 4
X
Bits 26-31
sw add lw
Time: 6
IF/ID ID/EX EX/MEM MEM/WB
nop nop sw 7 12(3) add 5 2 5 lw 4 20(2)
M
U
X
4 20
+
R0 0
R1 36 0
R2 9 45 16 M
R3 45 U
PC Inst A 99 X
Register file
mem
R4 99 L 57 16 0
R5 7 M U Data
R6 -3 U mem
R7 22 12 X data
dest
extend
22
No more Bits 11-15 0 M 7 5 4
instructions Bits 16-20 7 U 7 5
X
Bits 26-31
sw add
Time: 7
IF/ID ID/EX EX/MEM MEM/WB
nop nop nop sw 7 12(3) add 5 2 5
M
U
X
4
+
R0 0
R1 36 16
R2 9 57 M
R3 45 U
PC Inst A X
Register file
mem
R4 99 L 57 22 0
R5 16 M U Data
R6 -3 U mem
R7 22 X data
22 dest
extend
Time: 8
IF/ID ID/EX EX/MEM MEM/WB
Slides thanks to Sally McKee
nop nop nop nop sw 7 12(3)
M
U
X
4
+
R0 0
R1 36
R2 9 M
R3 45 U
PC Inst A X
Register file
mem
R4 99 L
R5 16 M U Data
R6 -3 U mem
R7 22 X data
dest
extend
Time: 9
IF/ID ID/EX EX/MEM MEM/WB
Takeaway
Pipelining is a powerful technique to mask
latencies and increase throughput
• Logically, instructions execute one at a time
• Physically, instructions execute in parallel
– Instruction level parallelism
IF ID MEM WB
add r3, r1, r2
A
A
add r3, r1, r2 Rd
sub inst
r5, r3, r5
D
or r6, r3, r4
D
mem B
B
add r6, r3, r8
inst
Ra Rb addr
imm
M
B
din dout
+4
mem
detect Rt Rd PC+4
PC+4
PC hazard
Rd
Rd
OP
OP
OP
IF/ID ID/EX EX/MEM MEM/WB
Takeaway
Data hazards occur when a operand (register) depends on
the result of a previous instruction that may not be
computed yet. A pipelined processor needs to detect data
hazards.
Next Goal
What to do if data hazard detected?
Stalling
How to stall an instruction in ID stage
• prevent IF/ID pipeline register update
– stalls the ID stage instruction
• convert ID stage instr into nop for later stages
– innocuous “bubble” passes through pipeline
• prevent PC update
– stalls the next (IF stage) instruction
Detecting Data Hazards
A
A
add r3, r1, r2 Rd
sub inst
r5, r3, r5
D
or r6, r3, r4
D
mem B
B
add r6, r3, r8
inst
Ra Rb addr
imm
M
B
din dout
+4
mem
detect Rt Rd PC+4
PC+4
PC hazard
Rd
Rd
If detect hazard
OP
OP
OP
MemWr=0
RegWr=0
WE=0
IF/ID ID/EX EX/MEM MEM/WB
Stalling
Clock cycle
time 1 2 3 4 5 6 7 8
or r6, r3, r4
r3 = 10
add r3, r1, r2
r3 = 20
or r6, r3, r4
rA rB data
B mem M
+4
Rd
Rd
Rd
(MemWr=0
WE
RegWr=0)
WE
WE
PC
nop
Op
Op
Op
sub r5,r3,r5 add r3,r1,r2
or r6,r3,r4 (WE=0)
/stall
NOP = If(IF/ID.rA ≠ 0 &&
(IF/ID.rA==ID/Ex.Rd
IF/ID.rA==Ex/M.Rd
IF/ID.rA==M/W.Rd))
Stalling
A A
D D D
inst rD B B
mem
inst
rA rB data
B mem M
+4
Rd
Rd
Rd
(MemWr=0
WE
RegWr=0)
WE
WE
PC
nop (MemWr=0
Op
Op
Op
RegWr=0)
sub r5,r3,r5 nop add r3,r1,r2
or r6,r3,r4 (WE=0)
/stall
NOP = If(IF/ID.rA ≠ 0 &&
(IF/ID.rA==ID/Ex.Rd
IF/ID.rA==Ex/M.Rd
IF/ID.rA==M/W.Rd))
Stalling
A A
D D D
inst rD B B
mem
inst
rA rB data
B mem M
+4
Rd
Rd
Rd
(MemWr=0
WE
RegWr=0)
WE
WE
PC
nop (MemWr=0 (MemWr=0
Op
Op
Op
RegWr=0) RegWr=0)
sub r5,r3,r5 nop nop add r3,r1,r2
or r6,r3,r4 (WE=0)
/stall
NOP = If(IF/ID.rA ≠ 0 &&
(IF/ID.rA==ID/Ex.Rd
IF/ID.rA==Ex/M.Rd
IF/ID.rA==M/W.Rd))
Stalling
How to stall an instruction in ID stage
• prevent IF/ID pipeline register update
– stalls the ID stage instruction
• convert ID stage instr into nop for later stages
– innocuous “bubble” passes through pipeline
• prevent PC update
– stalls the next (IF stage) instruction
Takeaway
Data hazards occur when a operand (register) depends on
the result of a previous instruction that may not be
computed yet. A pipelined processor needs to detect data
hazards.
A A
D D D
inst B B
mem data
imm B mem M
Rd
Rd
detect
Rb
MC WE
MC WE
hazard forward
Ra
unit
A A
D D D
inst B B
mem data
imm B mem M
Rd
Rd
detect
Rb
MC WE
MC WE
hazard forward
Ra
unit
A
D
inst B
mem data
mem
A
D
inst B
mem data
mem
A
D
inst B
mem data
mem
r3 = 10
add r3, r1, r2
r3 = 20
or r6, r3, r4
lw r6, 4(r3)
or r5, r3, r5
sw r6, 12(r3)
Tricky Example
A
D
inst B
mem data
mem
OR r1, r4, r1
Forwarding Datapath
A A
D D D
inst B B
mem data
imm B mem M
Rd
Rd
detect
Rb
MC WE
MC WE
hazard forward
Ra
unit
• Closed Book
• Cannot use electronic device or outside material
• Practice prelims are online in CMS
• Material covered everything up to end of this week
• Appendix C (logic, gates, FSMs, memory, ALUs)
• Chapter 4 (pipelined [and non-pipeline] MIPS processor with hazards)
• Chapters 2 (Numbers / Arithmetic, simple MIPS instructions)
• Chapter 1 (Performance)
• HW1, HW2, Lab0, Lab1, Lab2
Administrivia
HW2 is due tomorrow
• Fill out Survey online. Receive credit/points on homework for survey:
• Should have received email from Kathryn Dimiduk
• Survey is anonymous
Late Policy
• Each person has a total of four “slip days”
• Max of two slip days for any individual assignment
• Slip days deducted first for any late assignment,
cannot selectively apply slip days
• For projects, slip days are deducted from all partners
• 25% deducted per day late after slip days are exhausted
Regrade policy
• Submit written request to lead TA,
and lead TA will pick a different grader
• Submit another written request,
lead TA will regrade directly
• Submit yet another written request for professor to regrade.
Quiz
Find all hazards, and say how they are resolved:
A
D
inst B
mem data
mem
lw r4, 20(r8)
Stall
• Pause current and all subsequent instructions
Forward/Bypass
• Try to steal correct value from elsewhere in pipeline
• Otherwise, fall back to stalling or require a delay slot
Tradeoffs?