Introduction to Computer Architecture
(Parallel and Pipeline processors)
                KR Chowdhary
               Professor & Head
       Email: kr.chowdhary@gmail.com
         webpage: krchowdhary.com
    Department of Computer Science and Engineering
          MBM Engineering College, Jodhpur
              November 22, 2013
Classification of Parallel Processors
     ◮   One classification, by M. J. Flynn, considers the organization of a
         computer system by the number of instruction and data streams that
         can be manipulated simultaneously.
     ◮   The sequence of instructions read from memory constitutes an
         instruction stream.
     ◮   The operations performed on the data in the processor constitute a
         data stream.
     ◮   Parallel processing may occur in the instruction stream, in the data
         stream, or in both.
     1. Single-instruction single-data stream (SISD): Instructions are
        executed sequentially.
     2. Single-instruction multiple-data streams (SIMD): All processors
        receive the same instruction from the control unit, but operate on
        different sets of data.
     3. Multiple-instruction single-data streams (MISD): Of theoretical
        interest only, as no practical organization can be constructed using
        it.
     4. Multiple-instruction multiple-data streams (MIMD): Several programs
        can execute at the same time. Most multiprocessors come in this
        category.
Flynn’s taxonomy of computer architecture (1966)
   Figure 1: SISD and SIMD architectures. SISD: the control unit sends an
   instruction stream to a single processor P, which exchanges a data stream
   with memory M. SIMD: one control unit (CU) broadcasts a single instruction
   stream to processors P1 . . . Pn, each exchanging its own data stream with
   its own memory M1 . . . Mn; the program is loaded from a front end.

   Figure 2: MIMD architecture. Each of the n control units issues its own
   instruction stream to its own processor, which exchanges a data stream
   with its own memory.
   One class of parallel processing that does not fit into this classification
   is pipeline processing.
Parallel processing
         Parallel Processing:
     ◮   Increasing speed by doing many things in parallel.
     ◮   Let P be a sequential processor executing a task T sequentially. If T
         is partitioned into n subtasks T1, T2, . . . , Tn of approximately the
         same size, then a machine P′ (say) having n processors can be
         programmed so that all the subtasks of T execute in parallel.
     ◮   Then P′ executes up to n times faster than P.
     ◮   A CPU failure is fatal to a sequential processor, but not necessarily
         to a parallel processor.
     ◮   Some applications of parallel computers (processors) are:
          1. Expert systems for AI
          2. Fluid flow analysis
          3. Seismic data analysis
          4. Long-range weather forecasting
          5. Computer-assisted tomography
          6. Nuclear reactor modeling
          7. Visual image processing
          8. VLSI design
     ◮   The typical characteristics of parallel computing are: a vast amount
         of computation, floating-point arithmetic, and a vast number of
         operands.
Computational Models
    ◮   We assume that a given computation can be divided into concurrent tasks
        for execution on the multiprocessor.
     ◮   Equal Duration Model: A given task can be divided into n equal
         subtasks, each of which can be executed by one processor. If ts is
         the execution time of the whole task using a single processor, then
         the time taken by each processor to execute its subtask is
         tm = ts/n. Since, according to this model, all processors execute
         their subtasks simultaneously, the time taken to execute the whole
         task is also tm = ts/n.
     ◮   The speedup factor of a parallel system can be defined as the ratio
         of the time taken by a single processor to solve a given problem
         instance to the time taken by a parallel system of n processors to
         solve the same problem instance.
                         S(n) = speedup factor
                              = ts / tm
                              = ts / (ts/n) = n
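
     ◮   As a concrete illustration of the equal-duration model, the sketch
         below splits one task (summing an array) into n equal subtasks, one
         per POSIX thread. The array size, thread count, and names are
         illustrative assumptions, not from the slides.

          /* Equal-duration model sketch: one task, NPROC equal subtasks,
           * one subtask per thread. Compile with: cc -pthread sum.c */
          #include <pthread.h>
          #include <stdio.h>

          #define N     1000000          /* total task size (illustrative) */
          #define NPROC 4                /* number of "processors"         */

          static double a[N];
          static double partial[NPROC];  /* one partial result per subtask */

          static void *subtask(void *arg) {
              long id = (long)arg;
              long lo = id * (N / NPROC), hi = lo + N / NPROC;
              double s = 0.0;
              for (long i = lo; i < hi; i++)
                  s += a[i];             /* each subtask takes about ts/n  */
              partial[id] = s;
              return NULL;
          }

          int main(void) {
              pthread_t t[NPROC];
              double total = 0.0;
              for (long i = 0; i < N; i++) a[i] = 1.0;
              for (long i = 0; i < NPROC; i++)
                  pthread_create(&t[i], NULL, subtask, (void *)i);
              for (long i = 0; i < NPROC; i++) {
                  pthread_join(t[i], NULL);
                  total += partial[i];   /* combine the n partial results  */
              }
              printf("sum = %.0f\n", total);  /* expect 1000000 */
              return 0;
          }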
Pipelining
      ◮   A typical example of parallel processing is a one-dimensional array
          of processors: there are n identical processors P1 . . . Pn, each
          having its own local memory. The processors communicate by message
          passing (send/receive).
                       Figure 3: Pipeline processing. Processors P1, P2, . . . , Pn
                       are connected in a linear array and communicate by serial
                       I/O operations (send/receive).
      ◮ In total, n operations go on in parallel.
      ◮ A pipeline constitutes a sequence of processing circuits, called
        segments or stages.
      ◮ An m-stage pipeline has the same throughput as m separate units.
                      Figure 4: Pipeline segments. Segment i consists of a buffer
                      register Ri feeding a computing element Ci (Ri: buffer
                      register, Ci: computing element); the output of the last
                      segment Cm is latched in register R.
Pipeline Processors
     1. Instruction pipeline: the transfer of instructions through the various
        stages of the CPU during the instruction cycle: fetch, decode,
        execute. Thus, there can be three different instructions in different
        stages of execution: one being fetched, the previous one being
        decoded, and the one before that being executed.
    2. Arithmetic pipeline: The data is computed through different stages.
                      Figure 5: Instruction and data pipeline examples. Top: a
                      3-segment instruction pipeline (segment 1: instruction
                      fetch, segment 2: instruction decode, segment 3:
                      instruction execute) that takes instruction addresses in
                      and delivers instruction results. Bottom: a 4-segment
                      floating-point adder computing X + Y for X = (XM, XE) and
                      Y = (YM, YE); segment 1 compares exponents, segment 2
                      aligns the mantissas, segment 3 adds the mantissas, and
                      segment 4 normalizes the result.
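
      ◮   The four adder segments above can be sketched in C using
          frexp/ldexp to expose the mantissa and exponent of a double; the fp
          struct and the input values are illustrative assumptions (link with
          -lm).

          /* Sketch of the 4-segment floating-point addition: compare
           * exponents, align mantissas, add mantissas, normalize. */
          #include <stdio.h>
          #include <math.h>

          typedef struct { double m; int e; } fp;  /* m in [0.5, 1) for x > 0 */

          int main(void) {
              double x = 3.25, y = 10.5;
              fp X, Y;
              X.m = frexp(x, &X.e);
              Y.m = frexp(y, &Y.e);

              int e = X.e > Y.e ? X.e : Y.e;      /* seg 1: compare exponents */
              double xm = ldexp(X.m, X.e - e);    /* seg 2: align mantissas   */
              double ym = ldexp(Y.m, Y.e - e);
              double m = xm + ym;                 /* seg 3: add mantissas     */
              int adj;
              m = frexp(m, &adj);                 /* seg 4: normalize result  */
              printf("%g\n", ldexp(m, e + adj));  /* prints 13.75             */
              return 0;
          }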
Pipe-lining Example
      ◮   Consider an example: compute Ai ∗ Bi + Ci for i = 1, 2, 3, 4, 5. The
          pipeline uses registers R1–R5, a multiplier, and an adder unit.
              Figure 6: A segment comprising registers and computing elements.
              R1 and R2 feed the multiplier, whose output is latched in R3; R4
              holds Ci; the adder's output is latched in R5.
          R1 ← Ai , R2 ← Bi        ; input Ai and Bi
          R3 ← R1 ∗ R2 , R4 ← Ci   ; multiply, and input Ci
          R5 ← R3 + R4             ; add Ci to the product
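
      ◮   The register transfers above can be played out directly in C: each
          loop iteration below acts as one clock pulse, and the stages are
          updated in reverse order so that every stage reads the values
          latched on the previous pulse. This is a sketch of the behaviour,
          not of the hardware; the sample values of Ai, Bi, Ci are assumed.

          /* 3-segment pipeline for Ai*Bi + Ci: one iteration per clock pulse. */
          #include <stdio.h>

          int main(void) {
              double A[] = {1, 2, 3, 4, 5}, B[] = {6, 7, 8, 9, 10},
                     C[] = {1, 1, 1, 1, 1};
              int n = 5, k = 3;
              double R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;

              for (int t = 1; t <= k + (n - 1); t++) {  /* 7 clock pulses */
                  /* reverse order: each stage reads last pulse's values  */
                  if (t >= k)
                      R5 = R3 + R4;                             /* segment 3 */
                  if (t >= 2 && t <= n + 1)
                      { R3 = R1 * R2; R4 = C[t - 2]; }          /* segment 2 */
                  if (t <= n)
                      { R1 = A[t - 1]; R2 = B[t - 1]; }         /* segment 1 */
                  if (t >= k)
                      printf("pulse %d: R5 = %g\n", t, R5);
              }
              return 0;
          }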
Pipe-lining Example
   Table 1: Computation of expression Ai ∗ Bi + Ci in space and time in 3-stage pipeline.
                 Clock pulse    Segment 1    Segment 2     Segment 3
                 no.            R1 , R2      R3 , R4       R5
                 1              A1 , B1      −, −          −
                 2              A2 , B2      A1 ∗ B1 , C1  −
                 3              A3 , B3      A2 ∗ B2 , C2  A1 ∗ B1 + C1
                 4              A4 , B4      A3 ∗ B3 , C3  A2 ∗ B2 + C2
                 5              A5 , B5      A4 ∗ B4 , C4  A3 ∗ B3 + C3
                 6              −, −         A5 ∗ B5 , C5  A4 ∗ B4 + C4
                 7              −, −         −, −          A5 ∗ B5 + C5
      ◮   Any operation that can be decomposed into a sequence of
          sub-operations of about the same complexity can be implemented by a
          pipeline processor.
      ◮   Consider a k-segment pipeline with clock cycle time tp sec., on
          which a total of n tasks (T1 , T2 , . . . , Tn ) are to be executed.
      ◮   T1 requires time k·tp sec. The remaining n − 1 tasks emerge from the
          pipeline at the rate of one task per clock cycle and are completed
          in time (n − 1)·tp sec, so the total number of clock cycles required
          is k + (n − 1).
      ◮   For k = 3 segments and n = 5 tasks this is 3 + (5 − 1) = 7, as is
          clear from Table 1.
Computational Models
     ◮   Consider a non-pipelined unit that performs the same operation and
         takes time tu to complete each task. The total time for n tasks is
         n·tu. The speedup for k segments and clock period tp is:

                         S(n) = n·tu / ((k + (n − 1))·tp)              (1)

     ◮   For a large number of tasks, n ≫ k − 1, so k + n − 1 ≈ n and

                         S(n) = n·tu / (n·tp)                          (2)
                              = tu / tp                                (3)
     ◮   Instruction pipelining is similar to the use of an assembly line in
         a manufacturing plant.
     ◮   An instruction's execution is broken into many steps, which
         indicates the scope for pipelining.
     ◮   Pipelining requires registers to store data between stages.
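
     ◮   A quick numeric check of equations (1)–(3). It assumes,
         illustratively, that the non-pipelined unit needs the full k stages
         for each task, i.e. tu = k·tp, so S(n) approaches tu/tp = k for
         large n.

          /* Pipeline speedup, eq. (1): S(n) = n*tu / ((k + n - 1) * tp). */
          #include <stdio.h>

          double speedup(int k, int n, double tu, double tp) {
              return (n * tu) / ((k + n - 1) * tp);
          }

          int main(void) {
              int k = 3;            /* pipeline segments                     */
              double tp = 1.0;      /* clock period (illustrative units)     */
              double tu = k * tp;   /* assumed: non-pipelined time per task  */
              printf("n = 5:    S = %.3f\n", speedup(k, 5, tu, tp));
              /* 15/7 = 2.143 */
              printf("n = 1000: S = %.3f\n", speedup(k, 1000, tu, tp));
              /* 3000/1002 = 2.994, approaching tu/tp = 3 */
              return 0;
          }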
Computational Models
         Parallel computation with serial section model:
     ◮   It is assumed that a fraction f of a given task (computation) cannot
         be divided into concurrent subtasks. The remaining fraction (1 − f)
         is assumed to be divisible (for example, f may correspond to data
         input).
    ◮   The time required to execute the task on n processors is:
                         tm = f·ts + (1 − f)·(ts/n)                    (4)
    ◮   The speedup is therefore,
                         S(n) = ts / (f·ts + (1 − f)·(ts/n))           (5)
                              = n / (1 + (n − 1)·f)                    (6)
     ◮   So, S(n) is primarily determined by the fraction of the code that
         cannot be divided.
     ◮   If the task is completely serial (f = 1), then no speedup can be
         achieved even by a parallel processor.
     ◮   For n → ∞,
                         S(n) = 1/f                                    (7)
         which is the maximum speedup.
Computational Models
     ◮   The improvement in performance (speed) of a parallel algorithm over
         a sequential one is limited not by the number of processors but by
         the fraction of the algorithm (code) that cannot be parallelized
         (Amdahl's law).
    ◮   Considering the communication overhead:
                         S(n) = ts / (f·ts + (1 − f)·(ts/n) + tc)      (8)
                              = n / (f·(n − 1) + 1 + n·(tc/ts))        (9)
    ◮   For n → ∞,
                         S(n) = n / (f·(n − 1) + 1 + n·(tc/ts))       (10)
                              = 1 / (f + tc/ts)                       (11)
     ◮   Thus, S(n) depends also on the communication overhead tc.
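
     ◮   Equations (6) and (9)–(11) can be checked numerically; the serial
         fraction f = 0.05 and overhead ratio tc/ts = 0.01 below are
         illustrative assumptions.

          /* Amdahl's law, eq. (6), and its communication-overhead form,
           * eq. (9); f = serial fraction, r = tc/ts. */
          #include <stdio.h>

          double amdahl(double n, double f) {
              return n / (1 + (n - 1) * f);
          }

          double amdahl_comm(double n, double f, double r) {
              return n / (f * (n - 1) + 1 + n * r);
          }

          int main(void) {
              double f = 0.05, r = 0.01;
              printf("n=100, no overhead: S = %.2f\n", amdahl(100, f));
              /* 100/5.95 = 16.81 */
              printf("limit n->inf:       S = %.2f\n", 1 / f);
              /* eq. (7): 20.00 */
              printf("n=100, tc/ts=0.01:  S = %.2f\n", amdahl_comm(100, f, r));
              /* 100/6.95 = 14.39 */
              printf("limit, eq. (11):    S = %.2f\n", 1 / (f + r));
              /* 16.67 */
              return 0;
          }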
Pipe-lining Processors
   Instruction Pipe-lining: the typical stages of the pipeline are:
     1. FI (fetch instruction)
     2. DI (decode Instruction)
     3. CO (calculate operands)
     4. FO (fetch operands)
     5. EI (execute instruction)
     6. WO (write operands)
Instruction Pipe-lining
      ◮    Nine different instructions are to be executed
       ◮    The six-stage pipeline can reduce the execution time for 9
            instructions from 54 time units to 14 time units.
   time units →
     Instruc.   1    2     3    4    5     6        7        8        9        10           11   12   13    14
       In1      FI   DI   CO   FO   EI    WO
       In2           FI   DI   CO   FO    EI        WO
       In3                FI   DI   CO    FO        EI     WO
       In4                     FI   DI    CO        FO     EI        WO
       In5                          FI    DI        CO     FO        EI       WO
       In6                                FI        DI     CO        FO       EI        WO
       In7                                          FI     DI        CO       FO        EI       WO
       In8                                                 FI        DI       CO        FO       EI   WO
       In9                                                           FI       DI        CO       FO   EI   WO
       ◮    The diagram assumes that each instruction goes through all 6
            stages of the pipeline.
       ◮    But, for example, a load instruction does not need WO.
       ◮    It is also assumed that there are no memory conflicts, even
            though, for example, FI, FO, and WO all require memory access
            (possibly together).
       ◮    The value may be in cache, or FO/WO may be null.
       ◮    The six stages may not be of equal duration, and a conditional
            branch or interrupt instruction may invalidate several fetches.
       ◮    After which stage should the pipeline check for a conditional
            branch/interrupt?
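
       ◮    The diagram can be generated mechanically, since in the ideal
            case (no stalls) instruction i simply enters stage s at time unit
            i + s; a sketch:

          /* Prints the ideal 6-stage timing diagram: instruction i enters
           * stage s at time unit i + s (no hazards or stalls assumed). */
          #include <stdio.h>

          int main(void) {
              const char *stage[] = {"FI", "DI", "CO", "FO", "EI", "WO"};
              int n = 9, k = 6;
              for (int i = 0; i < n; i++) {
                  printf("In%d:", i + 1);
                  for (int t = 0; t < i; t++) printf("   ");  /* idle slots */
                  for (int s = 0; s < k; s++) printf(" %s", stage[s]);
                  printf("\n");
              }
              printf("total time units: %d\n", k + n - 1);    /* 6 + 8 = 14 */
              return 0;
          }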
Factors in Instruction Pipe-lining
      ◮   There is overhead in each stage of the pipeline for
          buffer-to-buffer data movement.
      ◮   The amount of control logic needed to handle memory/register
          dependencies increases with the size of the pipeline.
      ◮   It takes time for the buffers to operate.
      ◮   Pipeline hazards occur when the pipeline, or a portion of it,
          stalls:
           1. Resource hazards: two or more instructions in the pipeline
              require the same resource, say the ALU or a register (also
              called structural hazards).
           2. Data hazards: a conflict in memory/register access.
           3. Control hazards (also called branch hazards): a wrong decision
              in branch prediction.
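
      ◮   A small illustration of a data hazard, specifically a
          read-after-write (RAW) dependency; the register names are
          illustrative:

          /* RAW (read-after-write) data hazard, in pseudo-assembly:
           *
           *   ADD R1, R2, R3   ; R1 <- R2 + R3
           *   SUB R4, R1, R5   ; reads R1 before ADD has written it back
           *
           * Without operand forwarding, SUB must stall until ADD completes
           * its write stage. The equivalent C dependency chain: */
          int raw_example(int r2, int r3, int r5) {
              int r1 = r2 + r3;   /* producer                               */
              int r4 = r1 - r5;   /* consumer: needs r1 before it can start */
              return r4;
          }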
Vector Processing
      ◮   In many computational applications, a problem can be formulated in
          terms of vectors and matrices. Processing these with a special
          computer is called vector processing.
      ◮   A vector is V = [V1 V2 V3 . . . Vn ]. The element Vi is represented
          as V [i]. A program for adding two vectors A and B of length 100,
          to produce vector C , is given below.
      ◮   Scalar concept:

           for(i=0; i < 100; i++)
            c[i]=b[i]+a[i];

      ◮   In machine language we write it as:

                 mvi  i, 0      ; initialize index
           loop: read a[i]
                 read b[i]
                 add            ; c[i] = a[i] + b[i]
                 store c[i]
                 inc  i         ; i = i + 1
                 cmp  i, 100
                 jnz  loop

      ◮   This loop accesses the arrays A and B, and only the counter needs
          to be updated. A vector-processing computer eliminates the repeated
          fetching and decoding of the loop instructions: they are fetched
          once and decoded once, but executed 100 times. This allows the
          operation to be specified simply as:

           C (1 : 100) = A(1 : 100) + B(1 : 100)
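
      ◮   On current hardware the same idea appears as SIMD instructions. A
          sketch using the standard x86 SSE2 intrinsics (this illustrates the
          principle on a modern CPU; it is not the slides' vector machine):

          /* C(0:99) = A(0:99) + B(0:99) with SSE2: one _mm_add_pd adds two
           * doubles at once, so the loop body runs 50 times instead of 100. */
          #include <emmintrin.h>
          #include <stdio.h>

          int main(void) {
              double a[100], b[100], c[100];
              for (int i = 0; i < 100; i++) { a[i] = i; b[i] = 2 * i; }

              for (int i = 0; i < 100; i += 2) {
                  __m128d va = _mm_loadu_pd(&a[i]);   /* two elements of A */
                  __m128d vb = _mm_loadu_pd(&b[i]);   /* two elements of B */
                  _mm_storeu_pd(&c[i], _mm_add_pd(va, vb));
              }
              printf("c[99] = %g\n", c[99]);          /* 99 + 198 = 297 */
              return 0;
          }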
Vector Processing
      ◮   A vector instruction includes the initial addresses of the
          operands, the length of the vectors, and the operation to be
          performed, all in one composite instruction. The addition is done
          with a pipelined floating-point adder. It is possible to design the
          vector processor to store all operands in registers in advance.
      ◮   It can be applied in matrix multiplication, for [l × m] × [m × n].
RISC v/s CISC
     ◮   CISC (Complex Instruction Set Computer):
          1. Complex programs motivated complex and powerful HLLs. This
             produced a semantic gap between HLLs and machine languages, and
             required more effort in compiler construction.
          2. The attempt to reduce the semantic gap and simplify compiler
             construction motivated more powerful instruction sets (CISC:
             Complex Instruction Set Computing).
          3. CISC provides better support for HLLs.
          4. Fewer instructions per program, thus smaller program size, less
             memory, and faster access.
     ◮   RISC (Reduced Instruction Set Computer):
          1. A large number of general-purpose registers, and use of compiler
             technology to optimize register usage.
          2. Register-to-register operations (advantage?)
          3. Simple addressing modes.
          4. A limited and simple instruction set (one instruction per
             machine cycle).
          5. Advantage of simple instructions?
          6. Optimizing the pipeline.
Exercises
     1. Design a pipeline configuration to carry out the task to compute:
                                       (Ai + Bi )/(Ci + Di )
      2. Construct a pipeline to add 100 floating-point numbers, i.e., find
         the result of x1 + x2 + · · · + x100 .
     3. 3.1 List the advantages of designing a floating point processor in the form of a
             k-segment pipeline rather than a k-unit parallel processor.
          3.2 A floating-point pipeline has four segments S1 , S2 , S3 , S4 ,
              whose delays are 100, 90, 100, and 110 nanoseconds,
              respectively. What is the pipeline's maximum throughput in
              MFLOPS?
      4. It is frequently argued that large (super)computers are approaching
         their performance limits, and that future advances in large
         computers will depend on interconnecting large numbers of
         inexpensive computers. List the arguments for and against.
      5. List the features to be added to sequential programming languages in
         order to use them on a large interconnection of small, inexpensive
         computers.
      6. Let S1 , S2 , . . ., Sk denote a sequence of k operations in a
         program. Suppose that executing these operations on a uniprocessor
         produces the same results regardless of the order of execution of
         the k Si 's. Show that this does not imply that the Si 's are
         parallelizable on a multiprocessor.
Exercises
      7. Prove that the general problem of determining whether two program
         segments S1 , S2 are parallelizable is undecidable, by showing that
         a solution to this problem implies a solution to the halting problem
         for Turing machines. (Hint: assume that an algorithm A to determine
         parallelizability exists, and consider applying A to a program
         containing the statement: if Turing machine T halts after at most n
         steps then S1 else S2 .)
Bibliography
        M. Morris Mano, “Computer System Architecture”, 3rd Edition, Pearson,
        2006.
        William Stallings, “Computer Organization and Architecture”, 8th
        Edition, Pearson, 2010.
        John P. Hayes, “Computer Architecture and Organization”, 2nd Edition,
        McGraw-Hill International Edition, 1988.