Pipelining

▪ Assuming you’ve got:

— One washer (takes 30 minutes)

— One drier (takes 40 minutes)

— One “folder” (takes 20 minutes)

▪ It takes 90 minutes to wash, dry, and fold 1 load of laundry.

— How long does 4 loads take?

1
The slow way

[Figure: timeline from 6 PM to midnight. The four loads run back to back, each taking 30 + 40 + 20 minutes.]

▪ If each load is done sequentially it takes 6 hours

2
Laundry Pipelining
▪ Start each load as soon as possible
— Overlap loads
[Figure: timeline starting at 6 PM. The overlapped loads occupy slots of 30, 40, 40, 40, 40, and 20 minutes.]

▪ Pipelined laundry takes 3.5 hours
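
The laundry arithmetic can be checked with a short Python sketch. It assumes each machine handles one load at a time and that a finished load can wait (in a basket, say) until the next machine is free; the function names are illustrative and not from the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Total time when each load finishes completely before the next starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Total time when every load starts each stage as soon as it can."""
    free_at = [0] * len(stage_minutes)  # when each machine next becomes free
    finish = 0
    for _ in range(loads):
        ready = 0  # when this load is ready for its next stage
        for stage, duration in enumerate(stage_minutes):
            start = max(ready, free_at[stage])
            ready = start + duration
            free_at[stage] = ready
        finish = ready
    return finish


STAGES = [30, 40, 20]  # washer, drier, folder (minutes)
print(sequential_minutes(STAGES, 4) / 60)  # hours for the slow way
print(pipelined_minutes(STAGES, 4) / 60)   # hours for the pipelined way
```

The pipelined total works out to 30 + 4 × 40 + 20 minutes: after the first wash, the 40-minute drier is the stage every load has to wait for.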

3
Pipelining Lessons

▪ Pipelining doesn’t help the latency of a single load; it helps the throughput of the
entire workload
▪ Pipeline rate is limited by the slowest pipeline stage
▪ Multiple tasks operate simultaneously using different resources
▪ Potential speedup = number of pipeline stages

[Figure: pipelined laundry timeline from 6 PM, with slots of 30, 40, 40, 40, 40, and 20 minutes.]

4
Instruction execution review
▪ Executing a MIPS instruction can take up to five steps.
Step                 Name   Description
Instruction Fetch    IF     Read an instruction from memory.
Instruction Decode   ID     Read source registers and generate control signals.
Execute              EX     Compute an R-type result or a branch outcome.
Memory               MEM    Read or write the data memory.
Writeback            WB     Store a result in the destination register.

▪ However, as we saw, not all instructions need all five steps.

Instruction   Steps required
beq           IF  ID  EX
R-type        IF  ID  EX       WB
sw            IF  ID  EX  MEM
lw            IF  ID  EX  MEM  WB

5
Single-cycle datapath diagram

[Figure: single-cycle MIPS datapath. Latencies marked on the diagram: instruction memory 2ns, register file 1ns, ALU 2ns, data memory 2ns. Control signals shown: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]

6
Single-cycle review
▪ All five execution steps occur in one clock cycle.
▪ This means the cycle time must be long enough to accommodate all the
steps of the most complex instruction—a “lw” in our instruction set.
— If the register file has a 1ns latency and the memories and ALU have a
2ns latency, “lw” will require 8ns.
— Thus all instructions will take 8ns to execute.
▪ Each hardware element can only be used once per clock cycle.
— A “lw” or “sw” must access memory twice (in the IF and MEM stages),
so there are separate instruction and data memories.
— There are multiple adders, since each instruction increments the PC
(IF) and performs another computation (EX). On top of that, branches
also need to compute a target address.
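
The 8ns figure can be reproduced with a small Python sketch. It assumes the latencies stated above (1ns per register-file access, 2ns for each memory and the ALU) and the steps-per-instruction table from the instruction execution review; the dictionary and function names are illustrative.

```python
# Latency of the unit used in each step, in ns (from the slide's assumptions).
STEP_LATENCY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Steps each instruction type actually needs.
STEPS_REQUIRED = {
    "beq":    ["IF", "ID", "EX"],
    "R-type": ["IF", "ID", "EX", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
}


def instruction_latency_ns(kind):
    """Time the instruction needs if it only paid for the steps it uses."""
    return sum(STEP_LATENCY_NS[step] for step in STEPS_REQUIRED[kind])


# A single-cycle clock must fit the slowest instruction.
cycle_time_ns = max(instruction_latency_ns(kind) for kind in STEPS_REQUIRED)

for kind in STEPS_REQUIRED:
    print(kind, instruction_latency_ns(kind), "ns needed,", cycle_time_ns, "ns charged")
```

Every instruction is charged the full cycle time, even those that would need less.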

7
Example: Instruction Fetch (IF)
▪ Let’s quickly review how lw is executed in the single-cycle datapath.
▪ We’ll ignore PC incrementing and branching for now.
▪ In the Instruction Fetch (IF) step, we read the instruction memory.

[Figure: single-cycle datapath during the IF step.]

8
Instruction Decode (ID)
▪ The Instruction Decode (ID) step reads the source registers from the
register file.

[Figure: single-cycle datapath during the ID step.]

9
Execute (EX)
▪ The third step, Execute (EX), computes the effective memory address
from the source register and the instruction’s constant field.

[Figure: single-cycle datapath during the EX step.]

10
Memory (MEM)
▪ The Memory (MEM) step involves reading the data memory, from the
address computed by the ALU.

[Figure: single-cycle datapath during the MEM step.]

11
Writeback (WB)
▪ Finally, in the Writeback (WB) step, the memory value is stored into the
destination register.

[Figure: single-cycle datapath during the WB step.]

12
A bunch of lazy functional units
▪ Notice that each execution step uses a different functional unit.
▪ In other words, the main units are idle for most of the 8ns cycle!
— The instruction RAM is used for just 2ns at the start of the cycle.
— Registers are read once in ID (1ns), and written once in WB (1ns).
— The ALU is used for 2ns near the middle of the cycle.
— Reading the data memory only takes 2ns as well.
▪ That’s a lot of hardware sitting around doing nothing.

13
Putting those slackers to work
▪ We shouldn’t have to wait for the entire instruction to complete before
we can re-use the functional units.
▪ For example, the instruction memory is free in the Instruction Decode
step as shown below, so...

[Figure: single-cycle datapath, labeled "Idle" and "Instruction Decode (ID)".]

14
Decoding and fetching together
▪ Why don’t we go ahead and fetch the next instruction while we’re
decoding the first one?

[Figure: datapath, labeled "Fetch 2nd" and "Decode 1st instruction".]

15
Executing, decoding and fetching
▪ Similarly, once the first instruction enters its Execute stage, we can go
ahead and decode the second instruction.
▪ But now the instruction memory is free again, so we can fetch the third
instruction!

[Figure: datapath, labeled "Fetch 3rd", "Decode 2nd", and "Execute 1st".]

16
Making Pipelining Work
▪ We’ll make our pipeline 5 stages long, to handle load instructions as they
were handled in the multi-cycle implementation
— Stages are: IF, ID, EX, MEM, and WB
▪ We want to support executing 5 instructions simultaneously: one in each
stage.

17
Break datapath into 5 stages
▪ Each stage has its own functional units.
▪ Each stage can execute in 2ns
— Just like the multi-cycle implementation

[Figure: datapath divided into the IF, ID, EXE, MEM, and WB stages, with 2ns stage latencies marked.]

18
Pipelining Loads
Clock cycle
                    1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB

[Figure: pipelined laundry timeline from 6 PM, with slots of 30, 40, 40, 40, 40, and 20 minutes, shown for comparison.]

19
A pipeline diagram
Clock cycle
                    1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $sp, $sp, -4                        IF   ID   EX   MEM  WB

▪ A pipeline diagram shows the execution of a series of instructions.

— The instruction sequence is shown vertically, from top to bottom.
— Clock cycles are shown horizontally, from left to right.
— Each instruction is divided into its component stages. (We show five
stages for every instruction, which will make the control unit easier.)
▪ This clearly indicates the overlapping of instructions. For example, there
are three instructions active in the third cycle above.
— The “lw” instruction is in its Execute stage.
— Simultaneously, the “sub” is in its Instruction Decode stage.
— Also, the “and” instruction is just being fetched.
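
A pipeline diagram like this follows a simple rule: the instruction issued in cycle i reaches stage k in cycle i + k. The Python sketch below encodes that rule for an ideal five-stage pipeline with no stalls; the helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage occupied in a 1-based clock cycle by the instruction at a
    0-based position in the program, or None if it is not in the pipeline."""
    offset = cycle - 1 - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def active_in_cycle(program, cycle):
    """Map each instruction that is in flight during `cycle` to its stage."""
    active = {}
    for index, instr in enumerate(program):
        stage = stage_at(index, cycle)
        if stage is not None:
            active[instr] = stage
    return active


PROGRAM = [
    "lw $t0, 4($sp)",
    "sub $v0, $a0, $a1",
    "and $t1, $t2, $t3",
    "or $s0, $s1, $s2",
    "add $sp, $sp, -4",
]
print(active_in_cycle(PROGRAM, 3))
```

For cycle 3 this reports the same three active instructions as the bullets above: the lw in EX, the sub in ID, and the and in IF.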
20
Pipeline terminology
Clock cycle
                    1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $sp, $sp, -4                        IF   ID   EX   MEM  WB

Cycles 1-4: filling     Cycle 5: full     Cycles 6-9: emptying

▪ The pipeline depth is the number of stages—in this case, five.

▪ In the first four cycles here, the pipeline is filling, since there are unused
functional units.
▪ In cycle 5, the pipeline is full. Five instructions are being executed
simultaneously, so all hardware units are in use.
▪ In cycles 6-9, the pipeline is emptying.
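
The nine cycles in the diagram follow from the depth and the instruction count: the first instruction needs `depth` cycles, and each later one finishes one cycle after its predecessor. A minimal Python sketch, assuming an ideal pipeline with no stalls (the function names are illustrative):

```python
def total_cycles(depth, n_instructions):
    """Cycles needed to run n instructions through a stall-free pipeline."""
    return depth + n_instructions - 1


def in_flight(depth, n_instructions, cycle):
    """Number of instructions occupying a stage during a 1-based cycle."""
    entered = min(cycle, n_instructions)  # instructions fetched so far
    retired = max(0, cycle - depth)       # instructions already past WB
    return max(0, entered - retired)


def phase(depth, n_instructions, cycle):
    """Classify a cycle as 'filling', 'full', or 'emptying'."""
    if in_flight(depth, n_instructions, cycle) == depth:
        return "full"
    if cycle <= n_instructions:
        return "filling"  # new instructions are still entering
    return "emptying"


for cycle in range(1, total_cycles(5, 5) + 1):
    print(cycle, in_flight(5, 5, cycle), phase(5, 5, cycle))
```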

21
Pipelining Performance

Speedup = 2400/1400 ≈ 1.7
22
Pipeline Datapath: Resource Requirements
Clock cycle
                    1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB

▪ We need to perform several operations in the same cycle.

— Increment the PC and add registers at the same time.
— Fetch one instruction while another one reads or writes data.
▪ Thus, like the single-cycle datapath, a pipelined processor duplicates
hardware elements that are needed several times in the same clock
cycle.

23
Pipelining other instruction types

▪ R-type instructions only require 4 stages: IF, ID, EX, and WB

— We don’t need the MEM stage
▪ What happens if we try to pipeline loads with R-type instructions?

Clock cycle
                    1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

24
Important Observation

▪ Each functional unit can only be used once per instruction

▪ Each functional unit must be used at the same stage for all instructions.
See the problem if:
— Load uses Register File’s Write Port during its 5th stage
— R-type uses Register File’s Write Port during its 4th stage

Clock cycle
                    1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

25
A solution: Insert NOP stages

▪ Enforce uniformity
— Make all instructions take 5 cycles.
— Make them have the same stages, in the same order
• Some stages will do nothing for some instructions
R-type IF ID EX NOP WB

Clock cycle
                    1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   NOP  WB
sub $v0, $a0, $a1        IF   ID   EX   NOP  WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   NOP  WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

• Stores and Branches have NOP stages, too…

store IF ID EX MEM NOP
branch IF ID EX NOP NOP
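
The write-port conflict, and how the NOP stage removes it, can be checked mechanically. The Python sketch below assumes one instruction is issued per cycle and simply records the cycle in which each instruction reaches WB; the names are illustrative.

```python
from collections import defaultdict

LOAD = ["IF", "ID", "EX", "MEM", "WB"]
RTYPE_SHORT = ["IF", "ID", "EX", "WB"]          # 4-stage R-type
RTYPE_PADDED = ["IF", "ID", "EX", "NOP", "WB"]  # R-type with a NOP stage


def write_port_conflicts(stage_lists):
    """Return {cycle: [instruction indices]} for cycles in which more than
    one instruction uses the register file's write port (its WB stage).
    stage_lists[i] is the stage sequence of the instruction issued in cycle i + 1."""
    users = defaultdict(list)
    for index, stages in enumerate(stage_lists):
        if "WB" in stages:
            wb_cycle = index + 1 + stages.index("WB")
            users[wb_cycle].append(index)
    return {cycle: idx for cycle, idx in users.items() if len(idx) > 1}


# add, sub, lw, or, lw: the sequence from the diagrams.
mixed = [RTYPE_SHORT, RTYPE_SHORT, LOAD, RTYPE_SHORT, LOAD]
padded = [RTYPE_PADDED, RTYPE_PADDED, LOAD, RTYPE_PADDED, LOAD]

print(write_port_conflicts(mixed))   # the first lw and the or collide
print(write_port_conflicts(padded))  # no collisions once stages are uniform
```

With the 4-stage R-type, the first lw and the or both reach WB in cycle 7; with the NOP stage every instruction writes back exactly four cycles after it is fetched, so no two can collide.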

26
Summary

▪ Pipelining attempts to maximize instruction throughput by overlapping
the execution of multiple instructions.
▪ Pipelining can offer substantial speedup.
— In the best case, one instruction finishes on every cycle, and the
speedup is equal to the pipeline depth.
▪ The pipelined datapath is much like the single-cycle one, but with added
pipeline registers.
— Each stage needs its own functional units.
▪ Now we’ll see the pipelined datapath and control with an
example execution.
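
The best-case claim can be made concrete with a short Python sketch. It assumes perfectly balanced stages (so the single-cycle clock is `depth` times the pipelined clock) and no stalls; both are idealizations, and the function name is illustrative.

```python
def pipelined_speedup(depth, n_instructions):
    """Speedup over a single-cycle design for n instructions.

    Time is measured in pipelined clock cycles: the single-cycle design
    spends `depth` of them per instruction, while the pipeline needs
    depth + n - 1 in total.
    """
    single_cycle_time = n_instructions * depth
    pipelined_time = depth + n_instructions - 1
    return single_cycle_time / pipelined_time


for n in (1, 5, 100, 1_000_000):
    print(n, round(pipelined_speedup(5, n), 3))
```

For a single instruction there is no speedup at all; the ratio approaches the pipeline depth only as the instruction count grows and the filling and emptying cycles become negligible.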

27
Pipelined datapath
[Figure: pipelined datapath. The pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate the five stages. Control signals shown: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]

28
What about control signals?
▪ The control signals are generated in the same way as in the single-cycle
processor—after an instruction is fetched, the processor decodes it and
produces the appropriate control values.
▪ But just like before, some of the control signals will not be needed until
some later stage and clock cycle.
▪ These signals must be propagated through the pipeline until they reach
the appropriate stage. We can just pass them in the pipeline registers,
along with the other data.
▪ Control signals can be categorized by the pipeline stage that uses them.

Stage   Control signals needed
EX      ALUSrc     ALUOp      RegDst
MEM     MemRead    MemWrite   PCSrc
WB      RegWrite   MemToReg
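
One way to picture this propagation is as a bundle of control groups that shrinks by one group per stage. The Python sketch below is only an illustration: the values for lw are the ones shown in this deck's cycle-by-cycle example, PCSrc = 0 is an assumption (lw is not a branch), and `advance` is a hypothetical helper rather than real hardware.

```python
# Control values generated in ID for lw, grouped by the stage that uses them.
LW_CONTROL = {
    "EX":  {"ALUSrc": 1, "ALUOp": "add", "RegDst": 0},
    "MEM": {"MemRead": 1, "MemWrite": 0, "PCSrc": 0},  # PCSrc = 0 is assumed
    "WB":  {"RegWrite": 1, "MemToReg": 1},
}


def advance(control, consumed_stage):
    """Control bundle stored in the next pipeline register, after
    `consumed_stage` has used its own signals and dropped them."""
    return {stage: signals for stage, signals in control.items()
            if stage != consumed_stage}


id_ex = LW_CONTROL              # ID/EX carries the EX, MEM, and WB groups
ex_mem = advance(id_ex, "EX")   # EX/MEM carries MEM and WB
mem_wb = advance(ex_mem, "MEM") # MEM/WB carries only WB

print(sorted(id_ex), sorted(ex_mem), sorted(mem_wb))
```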

29
Pipelined datapath and control
[Figure: pipelined datapath with control. The Control unit's outputs are stored in the ID/EX register as WB, M, and EX groups; EX/MEM carries the WB and M groups, and MEM/WB carries the WB group.]

30
An example execution sequence
▪ Here’s a sample sequence of instructions to execute.
1000: lw $8, 4($29)
1004: sub $2, $4, $5
1008: and $9, $10, $11
1012: or $16, $17, $18
1016: add $13, $14, $0
(addresses in decimal)

▪ We’ll make some assumptions, just so we can show actual data values.
— Each register contains its number plus 100. For instance, register $8
contains 108, register $29 contains 129, and so forth.
— Every data memory location contains 99.
▪ Our pipeline diagrams will follow some conventions.
— An X indicates values that aren’t important, like the constant field of
an R-type instruction.
— Question marks ??? indicate values we don’t know, usually resulting
from instructions coming before and after the ones in our example.
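
Under these assumptions the final register values can be worked out ahead of time. The Python sketch below runs the five instructions one after another (none of them depends on an earlier one, so the order of completion does not matter). It treats $0 as hardwired to zero, as in MIPS, rather than as 100; the function names are illustrative.

```python
def initial_registers():
    """Each register holds its number plus 100; $0 is always zero."""
    regs = {n: n + 100 for n in range(1, 32)}
    regs[0] = 0
    return regs


def load_word(address):
    """Every data memory location contains 99."""
    return 99


def run_example(regs):
    regs = dict(regs)                      # work on a copy
    regs[8] = load_word(regs[29] + 4)      # lw  $8, 4($29)
    regs[2] = regs[4] - regs[5]            # sub $2, $4, $5
    regs[9] = regs[10] & regs[11]          # and $9, $10, $11
    regs[16] = regs[17] | regs[18]         # or  $16, $17, $18
    regs[13] = regs[14] + regs[0]          # add $13, $14, $0
    return regs


final = run_example(initial_registers())
for reg in (8, 2, 9, 16, 13):
    print(f"${reg} = {final[reg]}")
```

The lw computes the address 129 + 4 = 133, and the written values are the ones to look for in the WB stage of the cycle-by-cycle diagrams.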

31
Cycle 1 (filling)
IF: lw $8, 4($29) ID: ??? EX: ??? MEM: ??? WB: ???

[Figure: pipelined datapath in cycle 1. IF: PC = 1000, PC + 4 = 1004. All values in the later stages are unknown (???).]

32
Cycle 2
IF: sub $2, $4, $5 ID: lw $8, 4($29) EX: ??? MEM: ??? WB: ???

[Figure: pipelined datapath in cycle 2. IF: PC = 1004, PC + 4 = 1008. ID: lw reads register 29; constant field 4; Instr [20 - 16] = 8. Later stages unknown (???).]

33
Cycle 3
IF: and $9, $10, $11 ID: sub $2, $4, $5 EX: lw $8, 4($29) MEM: ??? WB: ???

[Figure: pipelined datapath in cycle 3. IF: PC = 1008, PC + 4 = 1012. ID: sub reads registers 4 and 5 (104 and 105). EX: lw computes 129 + 4 = 133, with ALUSrc = 1, ALUOp = add, RegDst = 0 (destination register 8). Later stages unknown (???).]

34
Cycle 4
IF: or $16, $17, $18 ID: and $9, $10, $11 EX: sub $2, $4, $5 MEM: lw $8, 4($29) WB: ???

[Figure: pipelined datapath in cycle 4. IF: PC = 1012, PC + 4 = 1016. ID: and reads registers 10 and 11 (110 and 111). EX: sub computes 104 - 105 = -1, with ALUSrc = 0, ALUOp = sub, RegDst = 1 (destination register 2). MEM: lw reads address 133 and gets 99, with MemRead = 1, MemWrite = 0. WB unknown (???).]

35
Cycle 5 (full)
IF: add $13, $14, $0 ID: or $16, $17, $18 EX: and $9, $10, $11 MEM: sub $2, $4, $5 WB: lw $8, 4($29)

[Figure: pipelined datapath in cycle 5. IF: PC = 1016, PC + 4 = 1020. ID: or reads registers 17 and 18 (117 and 118). EX: and computes 110 AND 111 = 110, with ALUSrc = 0, ALUOp = and, RegDst = 1 (destination register 9). MEM: sub result -1, with MemRead = 0, MemWrite = 0. WB: lw writes 99 into register 8, with RegWrite = 1, MemToReg = 1.]

36
Cycle 6 (emptying)
IF: ??? ID: add $13, $14, $0 EX: or $16, $17, $18 MEM: and $9, $10, $11 WB: sub $2, $4, $5

[Figure: pipelined datapath in cycle 6. IF: PC = 1020, instruction unknown (???). ID: add reads registers 14 and 0 (114 and 0). EX: or computes 117 OR 118 = 119, with ALUSrc = 0, ALUOp = or, RegDst = 1 (destination register 16). MEM: and result 110, with MemRead = 0, MemWrite = 0. WB: sub writes -1 into register 2, with RegWrite = 1, MemToReg = 0.]

37
Cycle 7
IF: ??? ID: ??? EX: add $13, $14, $0 MEM: or $16, $17, $18 WB: and $9, $10, $11

[Figure: pipelined datapath in cycle 7. IF and ID unknown (???). EX: add computes 114 + 0 = 114, with ALUSrc = 0, ALUOp = add, RegDst = 1 (destination register 13). MEM: or result 119, with MemRead = 0, MemWrite = 0. WB: and writes 110 into register 9, with RegWrite = 1, MemToReg = 0.]

38
Cycle 8
IF: ??? ID: ??? EX: ??? MEM: add $13, $14, $0 WB: or $16, $17, $18

[Figure: pipelined datapath in cycle 8. IF, ID, and EX unknown (???). MEM: add result 114, with MemRead = 0, MemWrite = 0. WB: or writes 119 into register 16, with RegWrite = 1, MemToReg = 0.]

39
Cycle 9
IF: ??? ID: ??? EX: ??? MEM: ??? WB: add $13, $14, $0

[Figure: pipelined datapath in cycle 9. IF, ID, EX, and MEM unknown (???). WB: add writes 114 into register 13, with RegWrite = 1, MemToReg = 0.]

40
Performance Revisited
▪ Assuming the following functional unit latencies:

Inst mem   Reg Read   ALU   Data Mem   Reg Write
3ns        2ns        2ns   3ns        2ns

▪ What is the cycle time of a single-cycle implementation?

Add them: 3 + 2 + 2 + 3 + 2 = 12ns
— What is its throughput?

▪ What is the cycle time of an ideal pipelined implementation?

Take the maximum latency = 3ns
(this holds even if we don’t know the number of stages)
— What is its steady-state throughput?
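
The two throughput questions can be answered with a few lines of Python. The sketch assumes one pipeline stage per functional unit (five stages) and an ideal pipeline that completes one instruction per cycle in steady state; the variable names are illustrative.

```python
from fractions import Fraction

LATENCY_NS = {
    "Inst mem": 3,
    "Reg Read": 2,
    "ALU": 2,
    "Data Mem": 3,
    "Reg Write": 2,
}

# Single-cycle: one instruction passes through every unit in one long cycle.
single_cycle_ns = sum(LATENCY_NS.values())
single_throughput = Fraction(1, single_cycle_ns)  # instructions per ns

# Pipelined: the clock must fit the slowest stage.
pipelined_cycle_ns = max(LATENCY_NS.values())
pipelined_throughput = Fraction(1, pipelined_cycle_ns)  # steady state

# Latency of one instruction through all five stages of the pipeline.
pipelined_latency_ns = len(LATENCY_NS) * pipelined_cycle_ns

print(single_cycle_ns, single_throughput)        # cycle time, instr/ns
print(pipelined_cycle_ns, pipelined_throughput)  # cycle time, instr/ns
print(pipelined_latency_ns)                      # ns per instruction
```

So the single-cycle design completes one instruction every 12ns, the pipeline one every 3ns in steady state, and each individual instruction takes 5 × 3 = 15ns to pass through the pipeline.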

41
The pipelining paradox

Clock cycle
                    1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $sp, $sp, -4                        IF   ID   EX   MEM  WB

▪ Pipelining does not improve the execution time of any single instruction. Each
instruction here actually takes longer to execute than in a single-cycle datapath
(15ns vs. 12ns)!
▪ Instead, pipelining increases the throughput, or the amount of work done per unit
time. Here, several instructions are executed together in each clock cycle.
▪ The result is improved execution time for a sequence of instructions, such as an
entire program.

42
Pipeline Hazards

• Data hazards
– Dependency: an instruction depends on the result of a previous instruction
still in the pipeline
add $s0, $t0, $t1
sub $t2, $s0, $t3
– Stall: add three bubbles (no-ops) to the pipeline
• Structural hazards
– Different instructions try to use the same functional unit (e.g. memory,
register file) in the same cycle
• Control hazards (branches)
– Target address known only at the end of the 3rd cycle => STALLS
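
The three bubbles in the data-hazard example can be derived from the stage timing. The Python sketch below assumes no forwarding and that a register written in WB can only be read in a later cycle; under those assumptions the consumer's ID stage must come after the producer's WB stage. The parser only handles three-register instructions such as the two in the example, and all names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def parse(instr):
    """Split 'op rd, rs, rt' into (op, destination, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [reg.strip() for reg in operands.split(",")]
    return op.lower(), regs[0], regs[1:]


def bubbles_needed(producer, consumer, distance=1):
    """Bubbles to insert so the consumer reads the producer's result.

    `distance` is how many instructions after the producer the consumer
    is issued (1 means it is the very next instruction).
    """
    _, dest, _ = parse(producer)
    _, _, sources = parse(consumer)
    if dest not in sources:
        return 0  # no dependency, no stall
    producer_wb = 1 + STAGES.index("WB")             # cycle of the write
    consumer_id = 1 + distance + STAGES.index("ID")  # cycle of the read
    return max(0, producer_wb + 1 - consumer_id)


print(bubbles_needed("add $s0, $t0, $t1", "sub $t2, $s0, $t3"))
```

The add writes $s0 in cycle 5 while the sub would read it in cycle 3, so the read has to slip to cycle 6. Designs that forward results, or that write the register file in the first half of a cycle and read it in the second, need fewer bubbles.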
