1 Processor Pipeline

The document outlines the concept of pipelining in processor architecture, explaining its benefits and implementation through examples like a car assembly line. It covers the pipelined datapath, execution timing, hazards, and branch prediction, emphasizing the speedup achieved by pipelining compared to non-pipelined execution. Additionally, it discusses the importance of clocked elements and the overall impact on instruction throughput.

Uploaded by

chanddank10

The Pipeline

Module Outline

Why Pipeline?
– How to pipeline?

Speedup of the pipeline

Pipelined datapath
– Execution of instructions
– Pipeline Timing diagram

Dependences, Hazards
– Structural, Data, Control
– Stalling, Forwarding

Branch prediction
Car Assembly Line Example

Unpipelined Assembly Line
– A team of engineers builds a full car

Hours    0–24    24–48   48–72   72–96
         Car 1   Car 2   Car 3   Car 4
Car Assembly Line Example

Pipelined Assembly Line
– 3 (equal) stages to build a car
– Stages don't interfere
– Each stage consumes the previous stage's output
– Split the team into 3; each team works on one stage only
– 8 hours per stage
Car Assembly Line Example

Pipelined Assembly Line
– Car 1 starts in Stage 1
– Car 1 moves to Stage 2; Team 1 is free and takes up Car 2
– Teams 1 and 2 are working
– 3 cars are in different stages of production
– Car 1 rolls out
– Steady state: one car rolls out every 8 hours (24/3)

Hours    0–8     8–16    16–24   24–32   32–40
Car 1    C1-S1   C1-S2   C1-S3
Car 2            C2-S1   C2-S2   C2-S3
Car 3                    C3-S1   C3-S2   C3-S3
Car 4                            C4-S1   C4-S2   ...
Car 5                                    C5-S1   ...
Car Assembly Line Example

Non-pipelined vs. Pipelined implementation

Speedup
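
The speedup for the car example can be worked out from the numbers on the earlier slides (24 hours per car unpipelined, 3 stages of 8 hours each). The following is a minimal sketch in Python; the function names are illustrative and do not come from the slides.

```python
def unpipelined_hours(cars, hours_per_car=24):
    """One team builds each car start to finish, one car after another."""
    return cars * hours_per_car


def pipelined_hours(cars, stages=3, hours_per_stage=8):
    """The first car needs all stages; each later car rolls out one stage-time later."""
    if cars == 0:
        return 0
    return (stages + cars - 1) * hours_per_stage


for cars in (1, 4, 100):
    slow = unpipelined_hours(cars)
    fast = pipelined_hours(cars)
    print(f"{cars:>3} cars: {slow} h vs {fast} h, speedup {slow / fast:.2f}")
```

For a single car there is no gain; as the number of cars grows, the ratio approaches the number of stages (3).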
The Processor Datapath
Execution and Timing – Loads

– IF: Instruction Fetch
– ID: Instruction decode / Register file read
– EX: Execution / Address Calculation
– MEM: Memory Access
– WB: Write Back
Datapath Stages

IF                  ID                    EX                    MEM             WB
Instruction Fetch   Instruction decode/   Execution/            Memory Access   Write Back
                    Register file read    Address Calculation
RISC-V Datapath

IF ID EX MEM WB

Where are the clocked elements?
Datapath – Observations

During instruction fetch (IF), other components are idle
– During ID, ....

Improve hardware utilization
– The entire datapath should be busy
Pipelining the Datapath

Pipeline the datapath

IF ID EX MEM WB
Pipelining – Desired Effect

Cycle      1    2    3    4    5    6    7    8
Instr i    IF   ID   EX   MEM  WB
Instr i+1       IF   ID   EX   MEM  WB
Instr i+2            IF   ID   EX   MEM  WB
Instr i+3                 IF   ID   EX   MEM  WB
Instr i+4                      IF   ID   EX   MEM
Instr i+5                           IF   ID   EX
Instr i+6                                IF   ID

Steady state: from cycle 5 onward, all five stages are busy in every cycle.
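
The staggered pattern above can be generated mechanically: instruction j enters IF one cycle after instruction j-1. The following is a rough sketch, assuming an ideal pipeline with no stalls; `timing_diagram` and `total_cycles` are hypothetical helper names, not part of the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def timing_diagram(labels, stages=STAGES):
    """Return one text row per instruction; instruction j enters IF in cycle j+1."""
    width = max(len(s) for s in stages) + 1
    name_width = max(len(label) for label in labels) + 2
    rows = []
    for j, label in enumerate(labels):
        # Shift each instruction right by one column per earlier instruction.
        cells = " " * (width * j) + "".join(s.ljust(width) for s in stages)
        rows.append((label.ljust(name_width) + cells).rstrip())
    return rows


def total_cycles(n, k=len(STAGES)):
    """Cycles until the last of n instructions leaves a k-stage pipeline."""
    return n + (k - 1) if n > 0 else 0


labels = ["Instr i"] + [f"Instr i+{j}" for j in range(1, 4)]
for row in timing_diagram(labels):
    print(row)
print("cycles:", total_cycles(len(labels)))
```

Four instructions finish in 4 + (5 - 1) = 8 cycles, since only the first instruction pays the full five-cycle fill time.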
Need for Clocked Elements

IF ID EX MEM WB
0.5ns

ld x10, 4(x11)
add x12, x13, x14
Clocked Elements in Pipeline

Problem: Signal overwrite/interference

Stage i → Stage i+1
Flip Flop Waveforms

[Waveform figure: Clock, D, and Q signals, with the Clk-to-Q delay marked]
Clocked Elements in Pipeline

Problem: Signal overwrite/interference

Solution: Clocked elements (FF or a Latch)

Stage i → Stage i+1
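
To see why a clocked element prevents overwrite, it can help to model one. The sketch below is a simplified illustration, not the slides' circuit: `PipelineRegister` is a hypothetical class, and it ignores the Clk-to-Q delay and setup/hold timing.

```python
class PipelineRegister:
    """Edge-triggered register: the output changes only on a clock edge."""

    def __init__(self, initial=None):
        self.d = initial  # input driven by stage i during the cycle
        self.q = initial  # output seen by stage i+1

    def clock_edge(self):
        """Capture the input; until the next edge, q stays fixed."""
        self.q = self.d


reg = PipelineRegister()
reg.d = "ld x10, 4(x11)"     # stage i finishes its work on the load
reg.clock_edge()             # edge: stage i+1 now sees the load
reg.d = "add x12, x13, x14"  # stage i starts on the next instruction
print(reg.q)                 # stage i+1 still sees the load
```

Without the register, the new value produced by stage i would reach stage i+1 while it was still working on the previous instruction.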
Clock Speed – Non-Pipelined vs. Pipelined

IF      ID      EX      MEM     WB
1.0 ns  0.5 ns  0.8 ns  1.0 ns  0.5 ns

$T = 3.8\ \text{ns}$
Clock Speed – Non-Pipelined vs. Pipelined

IF ID EX MEM WB

$T_c = 1.0\ \text{ns}$, $T_{ff} = 0.1\ \text{ns}$

$T = T_c + T_{ff} = 1.1\ \text{ns}$

CPI = 1.0 in both cases
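
The two clock periods can be reproduced from the per-stage delays. This sketch assumes that T_c is the delay of the slowest stage and that the 0.1 ns figure is the flip-flop overhead T_ff; the slides give the numbers but do not spell out that mapping.

```python
stage_delay_ns = {"IF": 1.0, "ID": 0.5, "EX": 0.8, "MEM": 1.0, "WB": 0.5}
t_ff_ns = 0.1  # assumed: flip-flop overhead added to every pipelined cycle

# Non-pipelined: one long cycle covers all five stages.
t_unpipelined = sum(stage_delay_ns.values())

# Pipelined: the slowest stage sets T_c, and each cycle also pays T_ff.
t_c = max(stage_delay_ns.values())
t_pipelined = t_c + t_ff_ns

print(f"non-pipelined T = {t_unpipelined:.1f} ns")
print(f"pipelined     T = {t_pipelined:.1f} ns")
print(f"steady-state speedup = {t_unpipelined / t_pipelined:.2f}")
```

Because the stages are unequal and every cycle pays the flip-flop overhead, the clock-period ratio is below the ideal factor of 5.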
Pipelined Datapath

IF                  ID                    EX                    MEM             WB
Instruction Fetch   Instruction decode/   Execution/            Memory Access   Write Back
                    Register file read    Address Calculation
Pipelined Execution – Load

– IF: Instruction Fetch
– ID: Instruction decode / Register file read
– EX: Execution / Address Calculation
– MEM: Memory Access
– WB: Write Back
Pipelined Datapath
Pipelined Control
Execution Sequence – Non-pipelined

Time (clock cycles)

ld    IF ID EX MA WB
sub                  IF ID EX MA WB
add                                 IF ID EX
ld
add
Execution Sequence – Pipelined

Time (clock cycles)

ld    IF ID EX MA WB
sub      IF ID EX MA WB
add         IF ID EX MA WB
ld             IF ID EX MA WB
add               IF ID EX MA WB
Pipelined vs. Nonpipelined Implementation

Ratio of execution times?
– For $10^6$ instructions?
Speedup of the Pipeline

The speedup of a k-stage pipelined processor over an unpipelined processor:

$$S_k = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{n \cdot k}{n + (k - 1)} \approx k \quad (n \gg k)$$

n: number of instructions in the program
k: number of pipeline stages
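
A quick numerical check shows how the formula approaches k as n grows. This is a minimal sketch: `pipeline_speedup` is an illustrative name, and times are counted in clock cycles, as the formula implies (k cycles per instruction unpipelined, n + (k - 1) cycles in total when pipelined).

```python
def pipeline_speedup(n, k):
    """S_k = n*k / (n + (k - 1)), measured in clock cycles."""
    return (n * k) / (n + (k - 1))


k = 5  # IF, ID, EX, MEM, WB
for n in (1, 10, 1_000, 10**6):
    print(f"n = {n:>7}: S_k = {pipeline_speedup(n, k):.5f}")
```

With a single instruction there is no speedup, while for a million instructions the ratio is within a rounding error of k = 5.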
Pipelined vs. Nonpipelined Implementation

Pipelining increases instruction throughput; it does not reduce the execution time of an individual instruction.
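
The distinction between throughput and per-instruction time can be illustrated with the clock periods from the earlier slides (3.8 ns unpipelined, 1.1 ns pipelined, 5 stages). This is a sketch under those numbers only, assuming an ideal pipeline with no stalls.

```python
K = 5                  # pipeline stages
T_UNPIPELINED = 3.8    # ns per instruction without pipelining
T_PIPELINED = 1.1      # ns per clock cycle with pipelining

# Latency: time for one instruction to pass through the whole datapath.
latency_unpipelined = T_UNPIPELINED
latency_pipelined = K * T_PIPELINED

# Throughput in steady state: instructions completed per nanosecond.
throughput_unpipelined = 1 / T_UNPIPELINED
throughput_pipelined = 1 / T_PIPELINED

print(f"latency:    {latency_unpipelined:.1f} ns -> {latency_pipelined:.1f} ns")
print(f"throughput: {throughput_unpipelined:.2f} -> {throughput_pipelined:.2f} instr/ns")
```

Under these numbers a single instruction actually takes longer in the pipelined design, while the rate at which instructions complete more than triples.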