
Lecture: Pipelining Basics

• Topics: Performance equations wrap-up, basic pipelining implementation

 Video 1: What is pipelining?
 Video 2: Clocks and latches
 Video 3: An example 5-stage pipeline
 Video 4: Loads/Stores and RISC/CISC

 Turn in HW1
 Guest teacher, Manju Shevgoor, on Monday

An Alternative Perspective - I

• Each program is assumed to run for an equal number of cycles, so we’re fair to each program

• The number of instructions executed per cycle is a measure of how well a program is doing on a system

• The appropriate summary measure is the sum of IPCs, or the AM of IPCs = (1.2 instr/cyc + 1.8 instr/cyc + 0.5 instr/cyc) / 3

• This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

An Alternative Perspective - II

• Each program is assumed to run for an equal number of instructions, so we’re fair to each program

• The number of cycles required per instruction is a measure of how well a program is doing on a system

• The appropriate summary measure is the sum of CPIs, or the AM of CPIs = (0.8 cyc/instr + 0.6 cyc/instr + 2.0 cyc/instr) / 3

• This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

AM vs. GM

• GM of IPCs = 1 / GM of CPIs

• AM of IPCs represents throughput for a workload where each program runs sequentially for 1 cycle each; but high-IPC programs contribute more to the AM

• GM of IPCs does not represent run-time for any real workload (what does it mean to multiply instructions?); but every program’s IPC contributes equally to the final measure

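A quick numerical check of these two points, as a small Python sketch (not part of the original slides; the helper names am and gm are made up here). It uses the IPCs from the Perspective-I example and their reciprocals as the CPIs:

ipcs = [1.2, 1.8, 0.5]                # instr/cyc, from the earlier example
cpis = [1.0 / x for x in ipcs]        # cyc/instr for the same programs

def am(xs):                           # arithmetic mean
    return sum(xs) / len(xs)

def gm(xs):                           # geometric mean
    prod = 1.0
    for x in xs:
        prod *= x
    return prod ** (1.0 / len(xs))

print(am(ipcs), 1.0 / am(cpis))       # differ: AM of IPCs != 1 / AM of CPIs
print(gm(ipcs), 1.0 / gm(cpis))       # match:  GM of IPCs == 1 / GM of CPIs
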
Problem 6

• My new laptop has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. Their IPCs are listed below. I run the binaries such that each binary gets an equal share of CPU time. What speedup is my new laptop providing?

          P1     P2     P3     AM     GM
Old-IPC   1.2    1.6    2.0    1.6    1.57
New-IPC   1.6    1.6    1.6    1.6    1.6

AM of IPCs is the right measure. Could have also used GM. Speedup with AM would be 1.3.

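A minimal Python check of this answer (a sketch, not from the slides), assuming each program's throughput is IPC × clock speed and each binary gets an equal share of CPU time:

old_ipcs = [1.2, 1.6, 2.0]
new_ipcs = [1.6, 1.6, 1.6]
clock_ratio = 1.3                        # new clock is 30% faster

am_old = sum(old_ipcs) / len(old_ipcs)   # 1.6
am_new = sum(new_ipcs) / len(new_ipcs)   # 1.6

speedup = (am_new * clock_ratio) / am_old
print(speedup)                           # 1.3
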
Speedup Vs. Percentage

• “Speedup” is a ratio = old exec time / new exec time

• “Improvement”, “Increase”, “Decrease” usually refer to percentage relative to the baseline = (new perf – old perf) / old perf

• A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop
 What is the speedup? (1/70) / (1/100) ≈ 1.43
 What is the percentage increase in performance? (1/70 – 1/100) / (1/100) ≈ 43%
 What is the reduction in execution time? 30%
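
A short Python check of these three numbers (a sketch, not part of the slides):

old_time, new_time = 100.0, 70.0                                 # seconds

speedup = old_time / new_time                                    # = (1/70) / (1/100) ≈ 1.43
perf_increase = (1 / new_time - 1 / old_time) / (1 / old_time)   # ≈ 0.43, i.e. ~43%
time_reduction = (old_time - new_time) / old_time                # 0.30, i.e. 30%

print(speedup, perf_increase, time_reduction)
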
Building a Car

Unpipelined: start and finish a job before moving to the next

[Figure: unpipelined execution, jobs plotted against time, one job completes before the next begins]

The Assembly Line

Pipelined: break the job into smaller stages

[Figure: pipelined execution, each job passes through stages A, B, C in turn, and successive jobs overlap in time]

Clocks and Latches

[Figure: two pipeline stages (Stage 1, Stage 2), first shown back-to-back, then with a latch (L) after each stage, all latches driven by a common clock (Clk)]

Some Equations

• Unpipelined: time to execute one instruction = T + Tovh
• For an N-stage pipeline, time per stage = T/N + Tovh
• Total time per instruction = N (T/N + Tovh) = T + N Tovh
• Clock cycle time = T/N + Tovh
• Clock speed = 1 / (T/N + Tovh)
• Ideal speedup = (T + Tovh) / (T/N + Tovh)
• Cycles to complete one instruction = N
• Average CPI (cycles per instr) = 1

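A small Python helper (a sketch, not part of the slides; the function name pipeline_metrics is made up here) that evaluates the equations above for a logic delay T, latch overhead Tovh, and N stages, with all times in ns:

def pipeline_metrics(T, Tovh, N):
    cycle = T / N + Tovh                      # clock cycle time
    return {
        "cycle_time_ns": cycle,
        "clock_GHz": 1.0 / cycle,             # 1/ns = GHz
        "time_per_instr_ns": N * cycle,       # = T + N * Tovh
        "ideal_speedup": (T + Tovh) / cycle,  # vs. the unpipelined design
        "avg_cpi": 1.0,                       # one instr completes per cycle, no stalls
    }

# Hypothetical numbers, just to show the shape of the output;
# the unpipelined case is simply N = 1.
print(pipeline_metrics(T=10.0, Tovh=0.5, N=1))
print(pipeline_metrics(T=10.0, Tovh=0.5, N=4))
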
Problem 1

• An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline.

 What are the cycle times in the two processors?
 What are the clock speeds?
 What are the IPCs?
 How long does it take to finish one instr?
 What is the speedup from pipelining?

Problem 1

• An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline.

 What are the cycle times in the two processors? 5.2ns and 1.2ns
 What are the clock speeds? 192 MHz and 833 MHz
 What are the IPCs? 1 and 1
 How long does it take to finish one instr? 5.2ns and 6ns
 What is the speedup from pipelining? 833/192 = 4.34

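A quick Python sanity check of these answers (a sketch, not from the slides); the 4.34 on the slide comes from first rounding the clock speeds to MHz, the exact ratio is 5.2/1.2 ≈ 4.33:

T, Tovh, N = 5.0, 0.2, 5                    # ns of logic, ns of latch overhead, stages
unpipelined_cycle = T + Tovh                # 5.2 ns
pipelined_cycle = T / N + Tovh              # 1.2 ns

print(1e3 / unpipelined_cycle)              # ~192 MHz
print(1e3 / pipelined_cycle)                # ~833 MHz
print(N * pipelined_cycle)                  # 6 ns to finish one instruction
print(unpipelined_cycle / pipelined_cycle)  # speedup ≈ 4.33
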
Problem 2

• An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 sequential pipeline stages. The stages have the following lengths: 1ns; 0.6ns; 1.2ns; 1.4ns; 0.8ns. Answer the following, assuming that there are no stalls in the pipeline.

 What is the cycle time in the new processor?
 What is the clock speed?
 What is the IPC?
 How long does it take to finish one instr?
 What is the speedup from pipelining?
 What is the max speedup from pipelining?
Problem 2

• An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 sequential pipeline stages. The stages have the following lengths: 1ns; 0.6ns; 1.2ns; 1.4ns; 0.8ns. Answer the following, assuming that there are no stalls in the pipeline.

 What is the cycle time in the new processor? 1.6ns
 What is the clock speed? 625 MHz
 What is the IPC? 1
 How long does it take to finish one instr? 8ns
 What is the speedup from pipelining? 625/192 = 3.26
 What is the max speedup from pipelining? 5.2/0.2 = 26
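
The same kind of Python check for the unequal-stage case (a sketch, not from the slides); the clock now has to accommodate the slowest stage, and the 3.26 on the slide comes from rounding the clock speeds to MHz first:

stages = [1.0, 0.6, 1.2, 1.4, 0.8]         # stage logic delays in ns
latch_overhead = 0.2                       # ns
unpipelined_cycle = 5.0 + latch_overhead   # 5.2 ns

cycle_time = max(stages) + latch_overhead  # 1.6 ns
print(1e3 / cycle_time)                    # 625 MHz
print(len(stages) * cycle_time)            # 8 ns to finish one instruction
print(unpipelined_cycle / cycle_time)      # speedup = 3.25
print(unpipelined_cycle / latch_overhead)  # max speedup = 26 (stage logic shrinks to 0)
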
A 5-Stage Pipeline

[Figure: classic 5-stage pipeline datapath. Source: H&P textbook]

 Fetch (IF): use the PC to access the I-cache and increment PC by 4
 Register Read (RR): read registers, compare registers, compute branch target; for now, assume branches take 2 cyc (there is enough work that branches can easily take more)
 ALU: ALU computation, effective address computation for load/store
 Data Memory (DM): memory access to/from data cache, stores finish in 4 cycles
 Register Write (RW): write result of ALU computation or load into register file

RISC/CISC Loads/Stores

Problem 3
• For the following code sequence, show how the instrs flow through the pipeline:
ADD R1, R2 → R3
BEZ R4, [R5]
LD  [R6] → R7
ST  [R8] → R9

Pipeline Summary

                     RR                          ALU      DM         RW
ADD R1, R2 → R3      Rd R1, R2                   R1+R2    --         Wr R3
BEZ R1, [R5]         Rd R1, R5; compare, set PC  --       --         --
LD 8[R3] → R6        Rd R3                       R3+8     Get data   Wr R6
ST 8[R3] → R6        Rd R3, R6                   R3+8     Wr data    --

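A minimal Python sketch (not from the slides) that prints an idealized flow for the Problem 3 instructions, assuming one instruction enters the pipeline per cycle and ignoring the 2-cycle branch and any other stalls; the stage names follow the summary table above (IF, RR, ALU, DM, RW):

STAGES = ["IF", "RR", "ALU", "DM", "RW"]

def pipeline_diagram(instrs):
    # One row per instruction; column c shows what that instruction does in cycle c.
    n_cycles = len(instrs) + len(STAGES) - 1
    for i, instr in enumerate(instrs):
        row = ["  . "] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:>4}"       # instruction i is in stage s during cycle i+s
        print(f"{instr:<18}" + " ".join(row))

pipeline_diagram(["ADD R1, R2 -> R3",
                  "BEZ R4, [R5]",
                  "LD  [R6] -> R7",
                  "ST  [R8] -> R9"])
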
Problem 4

• Convert this C code into equivalent RISC assembly instructions

a[i] = b[i] + c[i];

Problem 4

• Convert this C code into equivalent RISC assembly instructions

a[i] = b[i] + c[i];

LD [R1], R2 # R1 has the address for variable i
MUL R2, 8, R3 # the offset from the start of the array (assuming 8-byte elements)
ADD R4, R3, R7 # R4 has the address of a[0]
ADD R5, R3, R8 # R5 has the address of b[0]
ADD R6, R3, R9 # R6 has the address of c[0]
LD [R8], R10 # Bringing b[i]
LD [R9], R11 # Bringing c[i]
ADD R10, R11, R12 # Sum is in R12
ST [R7], R12 # Putting result in a[i]

Problem 5

• Design your own hypothetical 8-stage pipeline.

