HRY-312 Computer Organization Lecture 6 Introduction to Pipelining
Recall: Performance Evaluation
What is the average CPI?
  The state diagram gives the CPI for each instruction type; the workload gives the frequency of each type.

  Type          CPI_i   Frequency   CPI_i x Freq_i
  Arith/Logic     4       40%           1.6
  Load            5       30%           1.5
  Store           4       10%           0.4
  Branch          3       20%           0.6

Average CPI: 4.1
Can we get CPI < 4.1? There seems to be a lot of idle hardware.
Why not overlap instructions?
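As a quick check, a minimal Python sketch of the weighted-CPI calculation above (the per-type CPIs and frequencies come from the table; the variable names are just for illustration):

# Weighted average CPI for the multicycle machine, using the mix above.
instruction_mix = {
    # type: (CPI for that type, fraction of the workload)
    "arith/logic": (4, 0.40),
    "load":        (5, 0.30),
    "store":       (4, 0.10),
    "branch":      (3, 0.20),
}

average_cpi = sum(cpi * freq for cpi, freq in instruction_mix.values())
print(f"{average_cpi:.1f}")   # 4.1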
[Figure: the multicycle datapath from previous lectures -- PC, instruction register, register file, ALU, and a single ideal memory, connected through muxes and driven by the control signals PCWr, PCWrCond, PCSrc, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, and MemtoReg.]
The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control and Datapath), Memory, Input, Output
Next Topics:
  Pipelining by analogy
  Pipeline hazards
Pipelining is Natural! Laundry Example
  Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
  Washer takes 30 minutes
  Dryer takes 40 minutes
  Folder takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight showing loads A, B, C, D done one after another, each taking 30 + 40 + 20 minutes.]
  Sequential laundry takes 6 hours for 4 loads.
  If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start Work ASAP
[Figure: timeline from 6 PM onward showing loads A, B, C, D overlapped -- a new load enters the dryer every 40 minutes.]
  Pipelined laundry takes 3.5 hours for 4 loads.
Pipelining Lessons
[Figure: the pipelined laundry timeline again, for reference.]
  Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
  Pipeline rate is limited by the slowest pipeline stage.
  Multiple tasks operate simultaneously using different resources.
  Potential speedup = number of pipe stages.
  Unbalanced lengths of pipe stages reduce the speedup.
  Time to fill the pipeline and time to drain it reduce the speedup.
  Stall for dependences.
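To make the 6-hour vs. 3.5-hour numbers concrete, here is a minimal sketch (the stage times come from the slide; the simple "first load plus slowest stage per extra load" model is an assumption that happens to match this example):

# Laundry pipeline: wash 30 min, dry 40 min, fold 20 min, 4 loads.
stage_times = [30, 40, 20]   # minutes per stage
loads = 4

# Sequential: every load runs all stages back to back.
sequential = loads * sum(stage_times)                          # 4 * 90 = 360 min = 6 hours

# Pipelined: the first load takes the full 90 minutes, then each later
# load finishes one slowest-stage time (40 min) behind the previous one.
pipelined = sum(stage_times) + (loads - 1) * max(stage_times)  # 90 + 3*40 = 210 min = 3.5 hours

print(sequential / 60, pipelined / 60)   # 6.0 3.5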
The Five Stages of Load
[Figure: a load instruction flowing through five cycles -- Ifetch, Reg/Dec, Exec, Mem, Wr.]
  Ifetch:  fetch the instruction from the Instruction Memory
  Reg/Dec: register fetch and instruction decode
  Exec:    calculate the memory address
  Mem:     read the data from the Data Memory
  Wr:      write the data back to the register file
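The same breakdown as data, in case you want to step through it in code (the stage names and actions are from the slide; the structure is just one way to write it down):

# The five pipeline stages a load (lw) passes through, one per cycle.
LOAD_STAGES = [
    ("Ifetch",  "fetch the instruction from instruction memory"),
    ("Reg/Dec", "read source registers and decode the instruction"),
    ("Exec",    "compute the effective memory address (base + offset)"),
    ("Mem",     "read the data from data memory"),
    ("Wr",      "write the loaded data back to the register file"),
]

for cycle, (stage, action) in enumerate(LOAD_STAGES, start=1):
    print(f"Cycle {cycle}: {stage:7s} - {action}")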
Note: These 5 stages were there all along.
[State diagram of the multicycle control, organized by stage:]
  Fetch     (0000): IR <= MEM[PC]; PC <= PC + 4
  Decode    (0001): ALUout <= PC + SX
  Execute:
    R-type  (0100): ALUout <= A fun B
    ORi     (0110): ALUout <= A op ZX
    LW      (1000): ALUout <= A + SX
    SW      (1011): ALUout <= A + SX
    BEQ     (0010): if A = B then PC <= ALUout
  Memory:
    LW      (1001): M <= MEM[ALUout]
    SW      (1100): MEM[ALUout] <= B
  Write-back:
    R-type  (0101): R[rd] <= ALUout
    ORi     (0111): R[rt] <= ALUout
    LW      (1010): R[rt] <= M
Pipelining: Improve performance by increasing throughput
  Ideal speedup is the number of stages in the pipeline. Do we achieve this?
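As a rough model (not stated on the slide, but implied by the fill/drain remark in the laundry lessons): with k equal-length stages of time t and n instructions,

T_{\text{seq}} = n \cdot k \cdot t, \qquad
T_{\text{pipe}} = (k + n - 1) \cdot t, \qquad
\text{Speedup} = \frac{n k}{k + n - 1} \;\longrightarrow\; k \text{ as } n \to \infty

so the ideal speedup of k is approached only for long instruction sequences and perfectly balanced stages.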
Basic Idea
What do we need to add to split the datapath into stages?
Graphically Representing Pipelines
Can help with answering questions like:
  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
  Use this representation to help understand datapaths.
Conventional Pipelined Execution Representation
[Figure: staircase diagram -- successive instructions each pass through IFetch, Dcd, Exec, Mem, WB, with each instruction starting one cycle after the previous one. Program flow runs down the page; time runs to the right.]
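A minimal sketch that prints this kind of staircase chart (the stage names match the slide; the ideal one-instruction-per-cycle issue with no stalls is a simplifying assumption):

# Print an ideal 5-stage pipeline occupancy chart: instruction i enters
# the pipeline at cycle i and advances one stage per cycle, no stalls.
STAGES = ["IFetch", "Dcd", "Exec", "Mem", "WB"]

def pipeline_chart(num_instructions: int) -> None:
    total_cycles = num_instructions + len(STAGES) - 1
    header = "".join(f"{'C' + str(c + 1):>8}" for c in range(total_cycles))
    print(" " * 8 + header)
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage          # instruction i is in stage s at cycle i + s
        print(f"inst {i:2d} " + "".join(f"{cell:>8}" for cell in row))

pipeline_chart(4)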
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: the same three instructions (Load, Store, R-type) on three implementations.]
  Single-cycle implementation: each instruction takes one long clock cycle; the cycle must be long enough for a load, so shorter instructions waste time.
  Multicycle implementation: Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; the R-type begins its Ifetch only afterwards.
  Pipeline implementation: Load, Store, and the R-type overlap, each starting one cycle after the previous instruction.
Why Pipeline? Suppose we execute 100 instructions.
  Single-cycle machine:
    45 ns/cycle x 1 CPI x 100 inst = 4500 ns
  Multicycle machine:
    10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
  Ideal pipelined machine:
    10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
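The same arithmetic as a minimal Python sketch (all numbers are from the slide):

# Execution time for 100 instructions on three machines.
insts = 100

single_cycle = 45 * 1 * insts        # 45 ns/cycle, CPI = 1              -> 4500 ns
multicycle   = 10 * 4.6 * insts      # 10 ns/cycle, CPI = 4.6 (inst mix) -> 4600 ns
pipelined    = 10 * (1 * insts + 4)  # 10 ns/cycle, CPI = 1, +4 cycles to drain -> 1040 ns

print(single_cycle, multicycle, pipelined)   # 4500 4600.0 1040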
Why pipeline (cont.)?
[Figure: Inst 0 through Inst 4 flowing through Im (instruction memory), Reg, ALU, Dm (data memory), Reg -- in any one clock cycle, every stage of the datapath is busy working on a different instruction.]
Can pipelining get us into trouble? Yes: Pipeline Hazards
  Structural hazards: attempt to use the same resource in two different ways at the same time
    - e.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
  Control hazards: attempt to make a decision before the condition is evaluated
    - e.g., washing football uniforms and needing the proper detergent level: you must see the result after the dryer before putting the next load in
    - branch instructions
  Data hazards: attempt to use an item before it is ready
    - e.g., one sock of a pair is in the dryer and one is in the washer; you can't fold until the sock gets from the washer through the dryer
    - an instruction depends on the result of a prior instruction still in the pipeline
Can always resolve hazards by waiting:
  pipeline control must detect the hazard
  and take action (or delay action) to resolve it
Single Memory is a Structural Hazard
[Figure: Load followed by Instr 1-4, each using Mem for instruction fetch and Mem again for data access -- with a single memory, the Load's data access and a later instruction's fetch need the memory in the same cycle.]
Detection is easy in this case! (right half highlight means read, left half write)
Structural hazards limit performance. Example: if there are 1.3 memory accesses per instruction and only one memory access is possible per cycle, then
  average CPI >= 1.3; otherwise the memory would be more than 100% utilized.
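A minimal sketch of that bound (the 1.3 accesses per instruction and the one-access-per-cycle limit are from the slide; the formula is the utilization argument spelled out):

# Lower bound on CPI imposed by a single memory port.
mem_accesses_per_inst = 1.3   # instruction fetch plus ~30% loads/stores
mem_ports_per_cycle = 1

# Each cycle can service at most one access, so the machine needs at
# least this many cycles per instruction on average.
min_cpi = mem_accesses_per_inst / mem_ports_per_cycle
print(min_cpi)   # 1.3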
Control Hazard Solution #1: Stall
[Figure: Add, Beq, Load -- the Load after the Beq cannot be fetched until the branch decision is known, leaving bubbles ("lost potential") in the pipeline.]
Stall: wait until the decision is clear.
  Impact: 2 lost cycles (i.e., 3 clock cycles per branch instruction) => slow
  Move the decision to the end of decode:
    saves 1 cycle per branch
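To quantify the stall penalty, a minimal sketch reusing the 20% branch frequency quoted on the next slide (that frequency is borrowed, not stated here):

# CPI impact of stalling 2 cycles on every branch.
branch_freq = 0.20     # fraction of instructions that are branches (from the prediction slide)
branch_cpi  = 3        # 1 base cycle + 2 stall cycles
other_cpi   = 1

cpi_with_stall = branch_freq * branch_cpi + (1 - branch_freq) * other_cpi
print(f"{cpi_with_stall:.1f}")   # 1.4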
Control Hazard Solution #2: Predict
[Figure: Add, Beq, Load -- the Load is fetched immediately after the Beq on the predicted path and is squashed if the prediction was wrong.]
Predict: guess one direction, then back up if wrong.
  Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right ~50% of the time)
  Need to squash and restart the following instruction if wrong
  Produces a CPI on branches of (1 x 0.5 + 2 x 0.5) = 1.5
  Total CPI might then be: 1.5 x 0.2 + 1 x 0.8 = 1.1 (20% branches)
More dynamic scheme: keep a history per branch (~90% right)
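The same calculation as a minimal sketch (the 50% accuracy, 1-cycle misprediction penalty, and 20% branch frequency are from the slide):

# CPI with a simple predict-and-squash scheme.
predict_accuracy = 0.5     # static prediction is right about half the time
mispredict_penalty = 1     # one squashed instruction
branch_freq = 0.20

branch_cpi = 1 * predict_accuracy + (1 + mispredict_penalty) * (1 - predict_accuracy)  # 1.5
total_cpi = branch_cpi * branch_freq + 1 * (1 - branch_freq)                           # 1.1
print(round(branch_cpi, 2), round(total_cpi, 2))   # 1.5 1.1

# With a dynamic predictor that is ~90% accurate, branch CPI drops further:
dynamic_branch_cpi = 1 * 0.9 + 2 * 0.1   # 1.1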
Control Hazard Solution #3: Delayed Branch
[Figure: Add, Beq, Misc, Load -- the Misc instruction in the delay slot executes regardless of the branch outcome, and the Load is fetched once the branch decision is known.]
Delayed branch: redefine branch behavior so the branch takes effect after the next instruction.
  Impact: 0 clock cycles per branch instruction if the compiler can find an instruction to put in the delay slot (~50% of the time)
  As we launch more instructions per clock cycle, the delay slot becomes less useful.
Data Hazard on r1
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
Data Hazard on r1: Dependencies backwards in time are hazards
[Figure: the five instructions above in the pipeline (stages IF, ID/RF, EX, MEM, WB). The add writes r1 in its WB stage, but the sub, and, and or read r1 in their ID/RF stages before that write completes -- the dependence arrows point backwards in time. The xor reads r1 late enough to get the new value.]
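A minimal sketch of how you might spot these read-after-write hazards on paper (the (dest, sources) encoding and the 3-instruction hazard window are illustrative assumptions, not the hardware mechanism; the window assumes the register file does not write and read in the same cycle):

# Flag read-after-write (RAW) hazards in a short instruction sequence.
# Each instruction is (text, destination register, source registers).
program = [
    ("add r1,r2,r3",   "r1",  ["r2", "r3"]),
    ("sub r4,r1,r3",   "r4",  ["r1", "r3"]),
    ("and r6,r1,r7",   "r6",  ["r1", "r7"]),
    ("or  r8,r1,r9",   "r8",  ["r1", "r9"]),
    ("xor r10,r1,r11", "r10", ["r1", "r11"]),
]

# In a 5-stage pipeline with writes in WB and reads in ID, any consumer
# within 3 instructions of the producer reads the old register value.
HAZARD_WINDOW = 3

for i, (name, dest, _) in enumerate(program):
    for j in range(i + 1, min(i + 1 + HAZARD_WINDOW, len(program))):
        later_name, _, sources = program[j]
        if dest in sources:
            print(f"RAW hazard: '{later_name}' reads {dest} written by '{name}'")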
Data Hazard Solution: Forward result from one stage to another
[Figure: the same five instructions, but now the add's result is forwarded from the pipeline registers after EX and MEM directly to the ALU inputs of the sub, and, and or, so they see the new value of r1 without waiting for WB.]
The or is OK even without forwarding if we define register read/write properly (write in the first half of the cycle, read in the second half).
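A minimal sketch of the usual forwarding conditions (this follows the textbook EX-hazard / MEM-hazard formulation; the signal names such as ex_mem_rd are illustrative, not names defined on the slide):

# Forwarding-unit conditions for one ALU operand in a 5-stage pipeline.
# Forward from EX/MEM (the newest result) in preference to MEM/WB.
def forward_select(id_ex_rs, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return where the ALU operand should come from."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"    # result computed last cycle, not yet written back
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"    # result two cycles old, being written back now
    return "REGFILE"       # no hazard: use the value read in ID

# Example: sub r4,r1,r3 right after add r1,r2,r3 -> r1 comes from EX/MEM.
print(forward_select(id_ex_rs=1, ex_mem_regwrite=True, ex_mem_rd=1,
                     mem_wb_regwrite=False, mem_wb_rd=0))   # EX/MEM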
Forwarding (or Bypassing): What about Loads? Dependencies backwards in time are hazards
[Figure: lw r1,0(r2) followed immediately by sub r4,r1,r3 -- the loaded value is not available until the end of the lw's Mem stage, but the sub needs it at the start of its EX stage.]
  Can't solve with forwarding alone: must delay/stall the instruction that depends on the load.
Forwarding (or Bypassing): What about Loads?
[Figure: the same lw/sub pair, but now the sub is stalled for one cycle (a bubble is inserted) so the loaded value can be forwarded from the lw's Mem stage to the sub's EX stage.]
  Can't solve with forwarding alone: must delay/stall the instruction that depends on the load.
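A minimal sketch of the load-use stall check (this is the textbook hazard-detection-unit condition; the signal names are illustrative):

# Hazard detection for the load-use case: if the instruction in EX is a
# load and its destination matches a source of the instruction in ID,
# stall the pipeline for one cycle (and then forward from MEM).
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    return id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt)

# Example: lw r1,0(r2) in EX, sub r4,r1,r3 in ID -> stall one cycle.
print(must_stall(id_ex_memread=True, id_ex_rt=1, if_id_rs=1, if_id_rt=3))   # True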
Designing a Pipelined Processor
  Go back and examine your datapath and control diagram
  Associate resources with states
  Ensure that flows do not conflict, or figure out how to resolve conflicts
  Assert control in the appropriate stage
Summary: Pipelining reduces CPI by overlapping many instructions
  Average throughput of approximately 1 CPI with a fast clock
Utilize the capabilities of the datapath:
  start the next instruction while working on the current one
  limited by the length of the longest stage (plus fill/flush)
  detect and resolve hazards
What makes it easy?
  all instructions are the same length
  just a few instruction formats
  memory operands appear only in loads and stores
What makes it hard?
  structural hazards: suppose we had only one memory
  control hazards: need to worry about branch instructions
  data hazards: an instruction depends on a previous instruction