III Semester
[CSE]
Computer Architecture and Organization (CST222)
UNIT-6
PROF. W.H. BISEN,
DEPARTMENT OF CSE,
SHRI RAMDEOBABA COLLEGE OF ENGINEERING &
MANAGEMENT, NAGPUR
Pipelining
CONTENTS: PIPELINING: BASIC CONCEPTS OF PIPELINING,
THROUGHPUT AND SPEEDUP, INTRODUCTION OF PARALLEL
COMPUTING: SISD, MISD, SIMD, MIMD
Overview
Pipelining is widely used in modern processors.
Pipelining improves system performance in terms of throughput.
Pipelined organization requires sophisticated compilation techniques.
Basic Concepts
Making the Execution of Programs Faster
Use faster circuit technology to build the processor and the main
memory.
Arrange the hardware so that more than one operation can be
performed at the same time.
With the latter approach, the number of operations performed per second is
increased, even though the elapsed time needed to perform any one
operation is not changed.
Traditional Concept
Laundry Example
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes
to wash, dry, and fold.
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes
Traditional Concept
[Figure: sequential laundry timeline from 6 PM to midnight; each load takes 30 + 40 + 20 minutes, one load after another]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?
Traditional Pipeline Concept
[Figure: pipelined laundry timeline from 6 PM onward; after the first 30-minute wash, a new load finishes every 40 minutes, limited by the dryer]
Pipelined laundry takes 3.5 hours for 4 loads.
Traditional Pipeline Concept
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
Pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to “fill” the pipeline and time to “drain” it reduces speedup.
Stall for Dependences
Role of Cache Memory
Each pipeline stage is expected to complete in one clock
cycle.
The clock period should be long enough to let the slowest
pipeline stage complete.
Faster stages can only wait for the slowest one to
complete.
Since main memory is very slow compared to the
execution units, if every instruction had to be fetched from
main memory, the pipeline would be almost useless.
Fortunately, we have cache.
Pipeline Performance
The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
However, this increase would be achieved only if all pipeline stages
require the same time to complete, and there is no interruption
throughout program execution.
Unfortunately, this is not true.
Pipeline Performance
A pipeline that is forced to wait in this way is said to have been stalled, for example for two clock cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard – any condition in which either the source or the
destination operands of an instruction are not available at the time
expected in the pipeline. So some operation has to be delayed, and the
pipeline stalls.
Instruction (control) hazard – a delay in the availability of an instruction
causes the pipeline to stall.
Structural hazard – the situation when two instructions require the use
of a given hardware resource at the same time.
Pipeline Performance
Again, pipelining does not result in individual instructions
being executed faster; rather, it is the throughput that
increases.
Throughput is measured by the rate at which instruction
execution is completed.
Pipeline stall causes degradation in pipeline performance.
We need to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their impact.
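The standard model behind the examples that follow: with k stages of cycle time Tp, the first task takes k cycles to fill the pipeline, and each subsequent task completes one cycle later, so n tasks take (k + n − 1)·Tp, versus n·Tn without pipelining. A small sketch (function names are illustrative):

```python
# Sketch of the ideal k-stage pipeline timing model (no stalls).
def pipeline_time(n_tasks, stages, cycle_time):
    # First task fills the pipeline (stages cycles); the remaining
    # n - 1 tasks then complete one per cycle.
    return (stages + n_tasks - 1) * cycle_time

def speedup(n_tasks, non_pipeline_time, stages, cycle_time):
    # Ratio of sequential time (n * Tn) to pipelined time ((k + n - 1) * Tp).
    return (n_tasks * non_pipeline_time) / pipeline_time(n_tasks, stages, cycle_time)

# As n grows, the speedup approaches non_pipeline_time / cycle_time.
print(round(speedup(10**6, 280, 4, 100), 2))  # approaches 280 / 100 = 2.8
```

Note that for small n the fill time matters, which is why the worked examples below get speedups below this limit.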
Example-1
Non-pipelined execution of one task takes 40 ns. A 4-stage pipeline has a
cycle time of 5 ns. What is the speed-up ratio of the pipeline for 100 tasks?
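Working the numbers (a quick check, using speedup = n·Tn / ((k + n − 1)·Tp) with Tn = 40 ns, k = 4 stages, Tp = 5 ns, n = 100 tasks):

```python
n, k = 100, 4           # tasks, pipeline stages
t_non, t_cyc = 40, 5    # ns: non-pipelined time per task, pipeline cycle time
sequential = n * t_non            # 4000 ns without the pipeline
pipelined = (k + n - 1) * t_cyc   # 515 ns with the pipeline
print(round(sequential / pipelined, 2))  # ≈ 7.77
```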
Example-2
Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns.
Given latch delay is 10 ns. Calculate-
Pipeline cycle time
Non-pipeline execution time
Speed up ratio
Pipeline time for 1000 tasks
Sequential time for 1000 tasks
Throughput
Example-2
Pipeline Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Non-Pipeline Execution Time-
Non-pipeline execution time for one instruction
= 60 ns + 50 ns + 90 ns + 80 ns
= 280 ns
Example-2
Speed Up Ratio-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8
Pipeline Time For 1000 Tasks-
Pipeline time for 1000 tasks
= Time taken for 1st task + Time taken for remaining 999 tasks
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 100 ns + 999 x 100 ns
= 400 ns + 99900 ns
= 100300 ns
Example-2
Sequential Time For 1000 Tasks-
Non-pipeline time for 1000 tasks
= 1000 x Time taken for one task
= 1000 x 280 ns
= 280000 ns
Throughput-
Throughput for pipelined execution
= Number of instructions executed per unit time
= 1000 tasks / 100300 ns
≈ 9.97 × 10⁶ instructions per second
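The arithmetic for Example-2 can be checked in a few lines (a sketch; all times in nanoseconds):

```python
stage_delays = [60, 50, 90, 80]          # ns
latch_delay = 10                         # ns
cycle = max(stage_delays) + latch_delay  # 100 ns
non_pipe = sum(stage_delays)             # 280 ns for one task
pipe_1000 = (len(stage_delays) + 1000 - 1) * cycle  # 100300 ns
seq_1000 = 1000 * non_pipe                          # 280000 ns
throughput = 1000 / (pipe_1000 * 1e-9)   # tasks per second
print(cycle, non_pipe, pipe_1000, seq_1000)
print(round(throughput / 1e6, 2), "million tasks/second")  # ≈ 9.97
```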
Example-3
A four stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively.
Registers are used between the stages and have a delay of 5 ns each. Assuming
constant clocking rate, the total time taken to process 1000 data items on the
pipeline will be?
Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 150, 120, 160, 140 } + 5 ns
= 160 ns + 5 ns
= 165 ns
Pipeline Time To Process 1000 Data Items-
Pipeline time to process 1000 data items
= Time taken for 1st data item + Time taken for remaining 999 data items
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 165 ns + 999 x 165 ns
= 660 ns + 164835 ns
= 165495 ns
= 165.5 μs
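The same pattern verifies Example-3 (a sketch; times in nanoseconds):

```python
stage_delays = [150, 120, 160, 140]   # ns
register_delay = 5                    # ns
cycle = max(stage_delays) + register_delay      # 165 ns
total = (len(stage_delays) + 1000 - 1) * cycle  # (4 + 999) x 165 ns
print(cycle, total)  # 165 165495 (about 165.5 microseconds)
```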
Example-4
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and
average cycles per instruction of 4. The same processor is upgraded to a
pipelined processor with five stages but due to the internal pipeline
delay, the clock speed is reduced to 2 gigahertz. Assume there are no
stalls in the pipeline. The speed up achieved in this pipelined processor
is?
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 2.5 gigahertz
Cycle time
= 1 / frequency
= 1 / (2.5 gigahertz)
= 1 / (2.5 x 10⁹ hertz)
= 0.4 ns
Non-Pipeline Execution Time-
Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 4 clock cycles
= 4 x 0.4 ns
= 1.6 ns
Example-4
Cycle Time in Pipelined Processor-
Frequency of the clock = 2 gigahertz
Cycle time
= 1 / frequency
= 1 / (2 gigahertz)
= 1 / (2 x 10⁹ hertz)
= 0.5 ns
Pipeline Execution Time-
Since there are no stalls in the pipeline, ideally one instruction is executed per clock cycle. So,
Pipeline execution time
= 1 clock cycle
= 0.5 ns
Example-4
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline
execution time
= 1.6 ns / 0.5 ns
= 3.2
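Checking Example-4 (a sketch; clock rates in hertz, so the per-instruction times come out in seconds):

```python
cpi_non, f_non = 4, 2.5e9    # non-pipelined: 4 cycles/instruction at 2.5 GHz
cpi_pipe, f_pipe = 1, 2.0e9  # pipelined, no stalls: 1 cycle/instruction at 2 GHz
t_non = cpi_non / f_non      # 1.6e-9 s = 1.6 ns per instruction
t_pipe = cpi_pipe / f_pipe   # 0.5e-9 s = 0.5 ns per instruction
print(round(t_non / t_pipe, 2))  # 3.2
```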
Example-5
Consider an instruction pipeline with four stages (S1, S2, S3 and S4),
each with combinational circuit only. The pipeline registers are required
between each stage and at the end of the last stage. Delays for the
stages and for the pipeline registers are as given in the figure.
What is the approximate speed up of the pipeline in steady state under ideal
conditions when compared to the corresponding non-pipeline implementation?
PARALLEL COMPUTING
Parallel computing is computing in which a job is broken into
discrete parts that can be executed concurrently.
Each part is further broken down into a series of instructions.
Instructions from each part execute simultaneously on different CPUs.
It is the simultaneous use of multiple computer resources, which can include a
single computer with multiple processors, a number of computers
connected by a network to form a parallel processing cluster, or a
combination of both.
The objective of parallel processing is execution speed improvement:
● The manner is by doing several things at once.
● The instruments are
– Architecture,
– Languages,
– Algorithms.
Classification/Flynn’s classification
Based on the number of instruction and data streams that can be processed
simultaneously, computing systems are classified into four major categories:
Flynn’s Taxonomy
In 1966, Flynn proposed a taxonomy based on Instruction Streams
and Data Streams:
◦ SISD: Single Instruction Stream, Single Data Stream – this is a conventional
sequential computer
◦ SIMD: Single Instruction Stream, Multiple Data Stream – a single control
unit dispatches instructions to multiple processing units
◦ MISD: Multiple Instruction Stream, Single Data Stream – no architectures in
this group (although some people include pipelined architectures)
◦ MIMD: Multiple Instruction Stream, Multiple Data Stream – each
processing unit is able to execute instructions from a different control unit
Single-instruction, single-data (SISD) systems
An SISD computing system is a uniprocessor machine which is capable
of executing a single instruction, operating on a single data stream.
Machine instructions are processed in a sequential manner, and
computers adopting this model are popularly called sequential
computers.
All the instructions and data to be processed have to be stored in
primary memory.
Advantages of SISD
•It requires less power.
•There is no issue of a complex communication protocol between multiple cores.
Disadvantages of SISD
•The speed of SISD architecture is limited, just like single-core processors.
•It is not suitable for larger applications.
Single-instruction, multiple-data (SIMD) systems
An SIMD system is a multiprocessor machine capable of executing the same
instruction on all the CPUs but operating on different data streams.
Machines based on an SIMD model are well suited to scientific computing
since they involve lots of vector and matrix operations.
Advantages of SIMD
•The same operation on multiple elements can be performed using one instruction only.
•Throughput of the system can be increased by increasing the number of cores of
the processor.
•Processing speed is higher than SISD architecture.
Disadvantages of SIMD
•There is complex communication between the cores of the processor.
•The cost is higher than SISD architecture.
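A toy illustration of the SISD/SIMD distinction (plain Python, purely conceptual — real SIMD hardware uses vector registers and a single vector instruction, not library calls):

```python
data_a = [1, 2, 3, 4]
data_b = [10, 20, 30, 40]

# SISD style: one instruction stream walks the data one element at a time.
sisd = []
for a, b in zip(data_a, data_b):
    sisd.append(a + b)

# SIMD style: a single "add" operation is conceptually applied to whole
# data streams at once; all four additions happen under one instruction.
simd = list(map(lambda ab: ab[0] + ab[1], zip(data_a, data_b)))

print(sisd == simd)  # both compute [11, 22, 33, 44]
```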
Multiple-instruction, single-data (MISD) systems
An MISD computing system is a multiprocessor machine capable of executing
different instructions on different processors, but all of them operating on the same
data set.
The system performs different operations on the same data set. Machines built using
the MISD model are not useful in most applications; a few machines have been built,
but none of them are available commercially.
Multiple-instruction, multiple-data (MIMD) systems
An MIMD system is a multiprocessor machine which is capable of executing
multiple instructions on multiple data sets.
The MIMD model has separate instruction and data streams; therefore, machines
built using this model are suited to any kind of application.
MIMD machines are broadly categorized into
shared-memory MIMD and distributed-memory MIMD.
In the shared-memory MIMD model, all the processing elements (PEs) are connected
to a single global memory and they all have access to it. The communication between
PEs in this model takes place through the shared memory; a modification of the data
stored in the global memory by one PE is visible to all other PEs.
In distributed-memory MIMD machines, all PEs have a local memory. The
communication between PEs in this model takes place through the interconnection
network (the inter-process communication channel, or IPC).
HAPPY LEARNING