Module 2 Wrap-up and OoO
Colin Schmidt
CS 152 Section 7
3/3/2016
Agenda
• PS2 Review
• Quiz 2 Prep
• Out of Order Execution
• Lab 3 Info
Q1: Caches
• 1.{A,B}: Table questions?
• 1.C: Modeling a cache questions?
• 1.D: Average latency questions?
Q2: Coding for caches
• 2.{A,B,C}: Calculate cache misses questions?
– Cache access pattern
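One way to check a hand-computed miss count is to replay the access pattern through a small cache model. The sketch below assumes a direct-mapped cache; the cache size, line size, array shape, and the function names `count_misses` and `matrix_addresses` are illustrative assumptions, not the PS2 parameters.

```python
def count_misses(addresses, num_lines=4, line_bytes=16):
    """Count misses in a direct-mapped cache (hypothetical parameters)."""
    tags = [None] * num_lines          # one tag per cache line, None = invalid
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this byte
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # upper bits that identify the block
        if tags[index] != tag:
            misses += 1
            tags[index] = tag          # fill the line on a miss
    return misses


def matrix_addresses(rows, cols, row_major, elem_bytes=4):
    """Byte addresses touched when walking an array that is stored row-major."""
    if row_major:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    return [(r * cols + c) * elem_bytes for r, c in order]


# Same 8x8 array, two traversal orders.
row_misses = count_misses(matrix_addresses(8, 8, row_major=True))
col_misses = count_misses(matrix_addresses(8, 8, row_major=False))
print(row_misses, col_misses)
```

With these assumed sizes the row-order walk misses once per line, while the column-order walk keeps evicting lines before they are reused.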
Q3: New Cache Design
• 3.A: Same cycle time as before questions?
• 3.B: AMAT questions?
• 3.C: 3C’s questions?
• 3.D: Virtual aliasing questions?
• 3.E: Preventing aliasing questions?
Q4: Victim Cache
• 4.A: Access time questions?
• 4.B: Cache behavior questions?
• 4.C: AMAT questions?
Q5: 3C’s
• doubling associativity?
• halving line size?
• doubling sets?
• adding prefetching?
Q6: Memory Hierarchy
• Hit Time
• Miss Rate
• Miss Penalty
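These three terms combine into average memory access time: AMAT = hit time + miss rate x miss penalty. A minimal sketch follows; the function names and the example numbers are assumptions chosen for illustration, not values from the problem set.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """The L1 miss penalty is itself the AMAT of the L2 cache."""
    l1_miss_penalty = amat(l2_hit, l2_local_miss_rate, mem_penalty)
    return amat(l1_hit, l1_miss_rate, l1_miss_penalty)


# Illustrative numbers only (cycles and miss fractions).
single = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
two_level = amat_two_level(1, 0.05, 10, 0.2, 100)
print(single, two_level)
```

Each optimization in a 3C's question can then be read as moving one of the three inputs.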
Topics
• Caches
– 3 C’s
– associativity
– replacement policy
– write policy
– access time
– AMAT
– Writing good code
Translation/Protection
• Virtual Memory
• TLB
– Cache interaction
– Aliasing
• Page Tables
– Entries
– Organization
• Protection
Complex Pipelines
• Adding long-latency operations is what causes
complications
• Why wasn’t this a problem with loads?
• How do we prevent this?
Data Hazards: An Example
I1 FDIV.D f6, f6, f4
I2 FLD f2, 45(x3)
I3 FMUL.D f0, f2, f4
I4 FDIV.D f8, f6, f2
I5 FSUB.D f10, f0, f6
I6 FADD.D f6, f8, f2
RAW Hazards
WAR Hazards
WAW Hazards
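The hazards in this sequence can be enumerated mechanically by comparing destination and source registers pairwise. The sketch below is an assumption-laden helper (`find_hazards` and the tuple encoding are not from the slides); it reports every register-name dependence between an earlier and a later instruction.

```python
# (name, destination, sources) for the six-instruction example.
# x3 is an integer register and takes no part in the floating-point hazards.
PROGRAM = [
    ("I1", "f6", ["f6", "f4"]),    # FDIV.D f6, f6, f4
    ("I2", "f2", ["x3"]),          # FLD    f2, 45(x3)
    ("I3", "f0", ["f2", "f4"]),    # FMUL.D f0, f2, f4
    ("I4", "f8", ["f6", "f2"]),    # FDIV.D f8, f6, f2
    ("I5", "f10", ["f0", "f6"]),   # FSUB.D f10, f0, f6
    ("I6", "f6", ["f8", "f2"]),    # FADD.D f6, f8, f2
]


def find_hazards(program):
    """Return {kind: [(earlier, later, register), ...]} for all instruction pairs."""
    hazards = {"RAW": [], "WAR": [], "WAW": []}
    for i, (early, e_dest, e_srcs) in enumerate(program):
        for late, l_dest, l_srcs in program[i + 1:]:
            if e_dest in l_srcs:       # later instruction reads what earlier writes
                hazards["RAW"].append((early, late, e_dest))
            if l_dest in e_srcs:       # later instruction overwrites an earlier source
                hazards["WAR"].append((early, late, l_dest))
            if l_dest == e_dest:       # both write the same register
                hazards["WAW"].append((early, late, l_dest))
    return hazards


hazards = find_hazards(PROGRAM)
for kind, pairs in hazards.items():
    print(kind, pairs)
```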
2/29/2016 CS152, Spring 2016 12
Scoreboard for In-order Issues
Busy[FU#] : a bit-vector to indicate FU’s availability.
(FU = Int, Add, Mult, Div)
These bits are hardwired to FU's.
WP[reg#] : a bit-vector to record the registers for which writes
are pending.
These bits are set by Issue stage and cleared by WB stage
Issue checks the instruction (opcode dest src1 src2)
against the scoreboard (Busy & WP) to dispatch
FU available? Busy[FU#]
RAW? WP[src1] or WP[src2]
WAR? cannot arise
WAW? WP[dest]
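The four checks above translate almost directly into code. This is a minimal sketch, assuming `Busy` is a dict keyed by functional unit and `WP` is a set of register names; the function names `can_issue` and `issue` are hypothetical.

```python
def can_issue(fu, dest, srcs, busy, wp):
    """Return True if the scoreboard lets the instruction issue this cycle."""
    if busy[fu]:                            # FU available? Busy[FU#]
        return False
    if any(src in wp for src in srcs):      # RAW? WP[src1] or WP[src2]
        return False
    if dest in wp:                          # WAW? WP[dest]
        return False
    return True                             # WAR cannot arise with in-order issue


def issue(fu, dest, busy, wp):
    """Record the issue: the FU becomes busy and a write to dest is pending."""
    busy[fu] = True
    wp.add(dest)


busy = {"Int": False, "Add": False, "Mult": False, "Div": False}
wp = set()
issue("Div", "f6", busy, wp)                               # I1 FDIV.D f6, f6, f4
print(can_issue("Int", "f2", ["x3"], busy, wp))            # I2 has no conflict
print(can_issue("Div", "f8", ["f6", "f2"], busy, wp))      # I4 must wait for f6
```

Clearing `Busy` and `WP` at writeback is left out here to keep the check itself in focus.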
Scoreboard Dynamics
Functional Unit Status (Int through WB) and Registers Reserved for Writes
Time  Issue  Int(1)  Add(1)  Mult(3)  Div(4)  WB   Registers Reserved for Writes  Completes
t0    I1                              f6           f6
t1    I2     f2                       f6           f6, f2
t2                                    f6      f2   f6, f2                         I2
t3    I3                     f0       f6           f6, f0
t4                           f0               f6   f6, f0                         I1
t5    I4                     f0       f8           f0, f8
t6                                    f8      f0   f0, f8                         I3
t7    I5             f10              f8           f8, f10
t8                                    f8      f10  f8, f10                        I5
t9                                            f8   f8                             I4
t10   I6             f6                            f6
t11                                           f6   f6                             I6
I1 FDIV.D f6, f6, f4
I2 FLD f2, 45(x3)
I3 FMUL.D f0, f2, f4
I4 FDIV.D f8, f6, f2
I5 FSUB.D f10, f0, f6
I6 FADD.D f6, f8, f2
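The issue and writeback times for this sequence can be recomputed with a small timing model. The sketch below rests on several assumptions that go beyond the slide: one instruction issues per cycle, each functional unit is busy for its full latency, writeback happens in the cycle after the last execute cycle, a pending write clears at the end of its writeback cycle, and writeback-port conflicts are not modeled (none occur in this sequence).

```python
LATENCY = {"Int": 1, "Add": 1, "Mult": 3, "Div": 4}

# (name, functional unit, destination, sources)
PROGRAM = [
    ("I1", "Div", "f6", ["f6", "f4"]),    # FDIV.D f6, f6, f4
    ("I2", "Int", "f2", ["x3"]),          # FLD    f2, 45(x3)
    ("I3", "Mult", "f0", ["f2", "f4"]),   # FMUL.D f0, f2, f4
    ("I4", "Div", "f8", ["f6", "f2"]),    # FDIV.D f8, f6, f2
    ("I5", "Add", "f10", ["f0", "f6"]),   # FSUB.D f10, f0, f6
    ("I6", "Add", "f6", ["f8", "f2"]),    # FADD.D f6, f8, f2
]


def simulate(program):
    """Return {name: (issue_cycle, writeback_cycle)} for in-order issue."""
    fu_free_at = {fu: 0 for fu in LATENCY}   # first cycle each FU is free again
    reg_free_at = {}                         # first cycle with no pending write
    next_issue = 0                           # in-order: never before the previous issue
    timeline = {}
    for name, fu, dest, srcs in program:
        t = max(
            next_issue,
            fu_free_at[fu],                              # Busy[FU#]
            reg_free_at.get(dest, 0),                    # WAW: WP[dest]
            *(reg_free_at.get(s, 0) for s in srcs),      # RAW: WP[src]
        )
        wb = t + LATENCY[fu]
        fu_free_at[fu] = wb
        reg_free_at[dest] = wb + 1
        timeline[name] = (t, wb)
        next_issue = t + 1
    return timeline


timeline = simulate(PROGRAM)
for name, (t_issue, t_wb) in timeline.items():
    print(f"{name}: issue t{t_issue}, writeback t{t_wb}")
```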
Issue Limitations: In-Order and Out-of-Order
#  Instruction            Latency
1  FLD    f2, 34(x2)      1
2  FLD    f4, 45(x3)      long
3  FMUL.D f6, f4, f2      3
4  FSUB.D f8, f2, f2      1
5  FDIV.D f4’, f2, f8     4
6  FADD.D f10, f6, f4’    1
[Figure: data-dependence graph of instructions 1-6]
In-order: 1 (2,1) . . . . . . 2 3 4 4 3 5 . . . 5 6 6
Out-of-order: 1 (2,1) 4 4 5 . . . 2 (3,5) 3 6 6
Any antidependence can be eliminated by renaming.
(renaming => additional storage)
Can it be done in hardware? yes!
Register Renaming
[Figure: pipeline IF -> ID -> Issue -> functional units (ALU/Mem, Fadd, Fmul) -> WB]
• Decode does register renaming and adds instructions to the
issue-stage instruction reorder buffer (ROB)
⇒ renaming makes WAR or WAW hazards impossible
• Any instruction in ROB whose RAW hazards have been satisfied
can be issued.
⇒ Out-of-order or dataflow execution
3/2/2016 CS152, Spring 2016 16
IBM 360/91 Floating-Point Unit
R. M. Tomasulo, 1967
[Figure: floating-point unit datapath]
• Load buffers (from memory): 6 entries, each <p, tag/data>
• Floating-Point Regfile: 4 entries, each <p, tag/data>
• Distribute instruction templates by functional units
– Adder: 3 templates, each with two <p, tag/data> operands
– Mult: 2 templates, each with two <p, tag/data> operands
• Store buffers (to memory): 3 entries, each <p, tag/data>
• Common bus carries < tag, result >
– Common bus ensures that data is made available immediately to all the instructions waiting for it.
– Match tag, if equal, copy value & set presence “p”.
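The tag-match step on the common bus can be sketched as follows. This is an illustrative model, not the 360/91 implementation: the class names `Operand` and `Template`, the function `broadcast`, and the tag strings such as "load1" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    present: bool = False            # the "p" bit
    tag: Optional[str] = None        # producer to wait for while p is clear
    value: Optional[float] = None    # data, valid once p is set

    def snoop(self, tag, result):
        """Match tag; if equal, copy value and set presence."""
        if not self.present and self.tag == tag:
            self.value = result
            self.present = True
            self.tag = None


@dataclass
class Template:
    """One instruction template waiting at a functional unit."""
    op: str
    src1: Operand
    src2: Operand

    def ready(self):
        return self.src1.present and self.src2.present


def broadcast(tag, result, templates):
    """Drive <tag, result> on the common bus; every waiting operand snoops it."""
    for template in templates:
        template.src1.snoop(tag, result)
        template.src2.snoop(tag, result)


adder = [
    Template("FADD", Operand(tag="load1"), Operand(present=True, value=2.0)),
    Template("FSUB", Operand(present=True, value=1.0), Operand(tag="mult1")),
]
broadcast("load1", 3.5, adder)
print([t.ready() for t in adder])
```

Because every waiting operand sees the same broadcast, a single result can wake several instructions in the same cycle.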
Phases of Instruction Execution
[Figure: PC -> I-cache -> Fetch Buffer -> Decode/Rename -> Issue Buffer -> Functional Units -> Result Buffer -> Commit -> Architectural State]
• Fetch: Instruction bits retrieved from cache.
• Decode: Instructions dispatched to appropriate issue-stage buffer.
• Execute: Instructions and operands issued to execution units. When execution completes, all results and exception flags are available.
• Commit: Instruction irrevocably updates architectural state (aka “graduation”).
Unified Physical Register File
(MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy Bridge)
• Rename all architectural registers into a single physical register
file during decode, no register values read
– x1 -> P1
• Functional units read and write from single unified register file
holding committed and temporary registers in execute
• Commit only updates mapping of architectural register to
physical register, no data movement
[Figure labels: Decode Stage, Register Mapping, Committed Register Mapping, Unified Physical Register File, Functional Units; read operands at issue, write results at completion]
Physical Register Management
Rename Table: x1 -> P8, x3 -> P7, x6 -> P5, x7 -> P6 (x0, x2, x4, x5 shown empty)
Physical Regs: P5 = <x6> p, P6 = <x7> p, P7 = <x3> p, P8 = <x1> p (P0-P4 shown empty, continuing to Pn)
Free List: P0, P1, P3, P2, P4
Instructions:
ld x1, 0(x3)
addi x3, x1, #4
sub x6, x7, x6
add x3, x3, x6
ld x6, 0(x1)
ROB fields: use, ex, op, p1, PR1, p2, PR2, Rd, LPRd, PRd
(LPRd requires third read port on Rename Table for each instruction)
Lifetime of Physical Registers
• Physical regfile holds committed and speculative values
• Physical registers decoupled from ROB entries (no data in ROB)
Original code         After rename
ld x1, (x3)           ld P1, (Px)
addi x3, x1, #4       addi P2, P1, #4
sub x6, x7, x9        sub P3, Py, Pz
add x3, x3, x6        add P4, P2, P3
ld x6, (x1)           ld P5, (P1)
add x6, x6, x3        add P6, P5, P4
sd x6, (x1)           sd P6, (P1)
ld x6, (x11)          ld P7, (Pw)
When can we reuse a physical register?
When the next write of the same architectural register commits.
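The renaming of this sequence, together with the reuse rule, can be sketched with a rename table and a free list. The encoding below is an assumption: the function name `rename` is hypothetical, the initial mappings Px, Py, Pz, Pw come from the example, and the starting mappings of x1 and x6 are not given on the slide, so they appear as `None`. The last field of each renamed tuple is the previous mapping of the destination (the LPRd idea), which becomes free once that instruction commits.

```python
from collections import deque

# (op, destination or None for a store, sources)
PROGRAM = [
    ("ld", "x1", ["x3"]),
    ("addi", "x3", ["x1"]),
    ("sub", "x6", ["x7", "x9"]),
    ("add", "x3", ["x3", "x6"]),
    ("ld", "x6", ["x1"]),
    ("add", "x6", ["x6", "x3"]),
    ("sd", None, ["x6", "x1"]),
    ("ld", "x6", ["x11"]),
]


def rename(program, rename_table, free_list):
    """Return [(op, new_dest, phys_srcs, freed_at_commit), ...]."""
    table = dict(rename_table)
    free = deque(free_list)
    renamed = []
    for op, dest, srcs in program:
        phys_srcs = [table[s] for s in srcs]    # read sources before remapping dest
        if dest is None:                        # stores allocate no register
            renamed.append((op, None, phys_srcs, None))
            continue
        previous = table.get(dest)              # old mapping, reusable after commit
        new = free.popleft()                    # allocate from the free list
        table[dest] = new
        renamed.append((op, new, phys_srcs, previous))
    return renamed


initial = {"x3": "Px", "x7": "Py", "x9": "Pz", "x11": "Pw"}
renamed = rename(PROGRAM, initial, ["P1", "P2", "P3", "P4", "P5", "P6", "P7"])
for op, dest, srcs, freed in renamed:
    print(op, dest, srcs, "frees", freed)
```

For example, P2 (the first x3 result) is released only when the later `add x3, x3, x6` commits, since no instruction after that point can still name it.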
Lab 3
• Not ready yet…
• Later this week/weekend
• Info at http://ccelio.github.io/riscv-boom-doc/
Questions