UNIT-IV
2. Pipelining
Contents
• Basic concepts of pipelining
• Throughput and speedup
• Pipeline Hazards
Pipelining
• Pipelining is
– a technique of decomposing a sequential process into SUB-
OPERATIONS,
– with each sub-process being executed in a special DEDICATED
SEGMENT that operates concurrently with all other segments
• A pipeline is
– a collection of processing segments through which binary
information flows
– Each segment performs partial processing dictated by the way the
task is partitioned
– Result obtained in each segment is transferred to the next segment
in the pipeline
– Final result is obtained after the data have passed through all
segments
• Several computations can be in progress in distinct segments
at the SAME TIME
• Each segment consists of an input register followed by a combinational
circuit
– register holds the data
– combinational circuit performs the SUB-OPERATION
– output of the combinational circuit in a given segment is applied to the next
segment’s input register
– A CLOCK is applied to all registers after enough time has elapsed to perform all
segment activity
• Registers provide isolation between each segment
– So, operate on distinct data simultaneously
– overlapping of computation is made
• Demonstration pipeline organization
– Suppose that we want to perform the
combined multiply and add operations
with a stream of numbers
Ai * Bi + Ci for i = 1, 2, 3, . . . , 7
– Each sub-operation is to be
implemented in a segment within a
pipeline
– Each segment has one or two registers
and a combinational circuit (shown in
Fig.)
• R1 through R5 are registers that receive
new data with every clock pulse
• multiplier and adder are combinational
circuits
– SUB-OPERATIONS performed in EACH
SEGMENT of the pipeline are as
follows:
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply, and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to the product)
Figure: Example of pipeline processing
• Effect of Each Clock is shown in Table
– It takes THREE clock pulses to fill up the pipe and retrieve the first output
from R5
– From there on, each clock produces a new output and moves the data one
step down the pipeline
TABLE: Content of Registers in Pipeline Example

Clock     Segment 1       Segment 2          Segment 3
Pulse     R1     R2       R3       R4        R5
1         A1     B1       --       --        --
2         A2     B2       A1*B1    C1        --
3         A3     B3       A2*B2    C2        A1*B1 + C1
4         A4     B4       A3*B3    C3        A2*B2 + C2
5         A5     B5       A4*B4    C4        A3*B3 + C3
6         A6     B6       A5*B5    C5        A4*B4 + C4
7         A7     B7       A6*B6    C6        A5*B5 + C5
8         --     --       A7*B7    C7        A6*B6 + C6
9         --     --       --       --        A7*B7 + C7
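The clock-by-clock behavior in the table can be reproduced with a short simulation sketch (illustrative only, not from the text; register loads are modeled as edge-triggered, computed from the previous clock's contents, and Ci is applied one clock after Ai and Bi, as in the table):

```python
def pipeline_multiply_add(A, B, C):
    """Simulate the three-segment pipeline computing Ai * Bi + Ci.
    Registers are edge-triggered: new contents are computed from the
    OLD contents, so all segments work concurrently."""
    n = len(A)
    R1 = R2 = R3 = R4 = None
    results = []
    for t in range(n + 2):                    # n inputs + 2 clocks to drain
        new_R5 = (R3 + R4) if R3 is not None else None   # segment 3: add
        new_R3 = (R1 * R2) if R1 is not None else None   # segment 2: multiply
        new_R4 = C[t - 1] if 1 <= t <= n else None       # Ci, one clock late
        new_R1 = A[t] if t < n else None                 # segment 1: inputs
        new_R2 = B[t] if t < n else None
        R1, R2, R3, R4 = new_R1, new_R2, new_R3, new_R4
        if new_R5 is not None:
            results.append(new_R5)
    return results

print(pipeline_multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # [11, 18, 27]
```

Note that the first result appears only after the pipe is filled (three clock pulses), matching the table.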
General Considerations
• A task is defined as the total operation performed in going through all the
segments of the pipeline
• BEHAVIOUR OF A PIPELINE can be illustrated with a SPACE-TIME DIAGRAM.
– It shows the segment utilization as a function of time
• SPACE-TIME DIAGRAM of a FOUR-SEGMENT PIPELINE is demonstrated in
below Fig.
– horizontal axis displays the time in clock cycles
– vertical axis gives the segment number (e.g., fetch, decode, read, and execute)
Figure: Space-time diagram for pipeline
• SIX tasks (T1 through T6) are executed in FOUR segments
(each task, for example, an addition Ai + Bi for i = 1, …, 6)
• In first clock cycle, task 1 is handled by segment 1
• In second clock cycle,
– segment 2 is busy with T1,
– segment 1 is busy with task T2
• Continuing in this manner,
– first task T1 is completed after the fourth clock cycle
– from then on, the pipe completes a task EVERY CLOCK CYCLE
– That is, once the pipeline is full, it takes ONLY ONE CLOCK PERIOD to obtain an output
Figure: Space-time diagram for pipeline (segments F = fetch, D = decode,
R = read, E = execute; the pipeline is full from clock cycle 4 onward)
• Consider,
– a ‘k’-segment pipeline with a clock cycle time ‘tp’, is used to
execute ‘n’ tasks
– first task T1 requires a time = ‘ktp’ to complete its operation
– remaining n-1 tasks emerge from the pipe at the rate of one task
per clock cycle and completed after a time = (n-1)tp
– Therefore, to complete ‘n’ tasks using a k-segment pipeline requires
k + (n - 1) clock cycles
For example,
– Above Fig. shows four segments and six tasks
– The time required to complete all the operations is
4 + (6 - 1) = 9 clock cycles
Numerical example
• Let,
– TIME it takes to process a sub-operation in each segment, tp = 20 ns
– No. of SEGMENTS in pipeline k = 4
• No. of tasks to be executed in sequence n = 100
– Time taken to complete 100 tasks in PIPELINE system
(k + n - 1)tp = (4 + 99) x 20 = 2060 ns
• Assuming that
– TIME it takes to process a task is the SAME in the PIPELINE and NON-PIPELINE
circuits then
tn = ktp = 4 x 20 = 80 ns
– So, Time taken to complete 100 tasks in NON-PIPELINE system
nktp = 100 x 80 = 8000 ns
• Then, speedup ratio = 8000/2060 = 3.88
• As the number of tasks increases, the speedup will approach 4 (= No. of
SEGMENTS in pipeline)
• If tn = 60 ns, the speedup becomes tn/tp = 60/20 = 3
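The relationships above can be checked with a short sketch (the function names are illustrative, not from the text):

```python
def pipeline_cycles(k, n):
    """Clock cycles needed to complete n tasks in a k-segment pipeline."""
    return k + (n - 1)

def speedup(k, n, tp, tn=None):
    """Speedup of a k-segment pipeline (cycle time tp) over a
    non-pipelined circuit with task time tn (defaults to k*tp)."""
    if tn is None:
        tn = k * tp                       # same task time in both circuits
    return (n * tn) / (pipeline_cycles(k, n) * tp)

print(pipeline_cycles(4, 6))              # 9 clock cycles, as in the figure
print(round(speedup(4, 100, 20), 2))      # 3.88, the ratio 8000/2060
```

As n grows, the speedup approaches tn/tp; with tn = k·tp that limit is k, the number of segments.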
• There are two areas of computer design where the
pipeline organization is applicable
• Arithmetic Pipeline
– divides an arithmetic operation into sub-operations for
execution in the pipeline segments
• Instruction Pipeline
– operates on a stream of instructions by overlapping the
fetch, decode, and execute phases of the instruction
cycle
Arithmetic Pipeline
• Used in very high speed computers
• Used to implement
– floating-point operations
– multiplication of fixed-point numbers
– similar computations encountered in scientific problems
• Example : Pipeline Unit for floating-point addition and subtraction
– inputs to the floating-point adder pipeline are two normalized floating-
point binary numbers
X = A x 2^a
Y = B x 2^b
– A and B are two fractions that represent the mantissas
– a and b are the exponents
• Floating-point addition and subtraction can be performed in four
segments, as shown in Fig.
Figure: Pipeline for floating-point addition and subtraction
(inputs: exponents a, b and mantissas A, B; registers R between segments;
Segment 1: compare exponents by subtraction; Segment 2: choose exponent,
align mantissa; Segment 3: add or subtract mantissas; Segment 4: adjust
exponent and normalize result)

• Registers (R) are placed between the segments to store intermediate
results
• Sub-operations that are performed in the FOUR segments are:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result
• Numerical Example clarify the sub-operations performed in each segment
• Consider the two normalized floating-point numbers:
X = 0.9504 x 10^3
Y = 0.8200 x 10^2
• In FIRST SEGMENT:
– two exponents are subtracted to obtain 3 - 2 = 1
– The larger exponent 3 is chosen as the exponent of the result.
• In SECOND SEGMENT:
– shifts the mantissa of Y to the right to obtain
X = 0.9504 x 10^3
Y = 0.0820 x 10^3
– This aligns the two mantissas under the same exponent.
• In THIRD SEGMENT:
– Addition of the two mantissas produces the sum Z = 1.0324 x 10^3
• In FOURTH SEGMENT:
– Sum is adjusted by normalizing the result (fraction with a non-zero first digit)
– This is done by shifting the mantissa once to the right and incrementing the exponent by
one to obtain the Normalized Sum
Z = 0.10324 x 10^4
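The four sub-operations can be mirrored in a minimal decimal sketch (an illustration only; the actual pipeline segments are binary combinational circuits, and the function name is not from the text):

```python
def fp_add(a_mant, a_exp, b_mant, b_exp, digits=5):
    """Four-step floating-point addition on decimal (mantissa, exponent)
    pairs, mirroring the pipeline segments of the example."""
    # Segment 1: compare the exponents (by subtraction); keep the larger
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)
    # Segment 2: align the mantissas by shifting the smaller operand right
    if diff > 0:
        b_mant = b_mant / (10 ** diff)
    else:
        a_mant = a_mant / (10 ** -diff)
    # Segment 3: add the mantissas
    mant = a_mant + b_mant
    # Segment 4: normalize so the fraction has a non-zero first digit
    while abs(mant) >= 1.0:
        mant /= 10.0
        exp += 1
    while mant != 0 and abs(mant) < 0.1:
        mant *= 10.0
        exp -= 1
    return round(mant, digits), exp

print(fp_add(0.9504, 3, 0.8200, 2))   # (0.10324, 4), as in the example
```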
• Comparator, shifter, adder-subtractor, incrementer, and decrementer
in the floating-point pipeline are implemented with combinational
circuits
• Suppose that the
– time delays of the four segments are t1 = 60ns, t2 = 70ns,
t3 = 100ns, t4 = 80 ns
– interface registers have a delay of tr = 10 ns
• Clock Cycle is chosen to be
tp = t3 + tr = 110 ns (the largest segment delay, t3, determines the cycle)
• An equivalent NON-PIPELINE floating-point adder-subtractor will have a
delay time
tn = t1 + t2 + t3 + t4 + tr = 320 ns
• In this case the Pipelined Adder has a SPEEDUP over the Non-pipelined
Adder is
tn /tp =320/110 = 2.9
Instruction Pipeline
• An instruction pipeline reads consecutive
instructions from memory while previous
instructions are being executed in other segments
• In the most general case, the computer needs to
process each instruction with the following
sequence of steps:
1. Fetch the instruction from memory
2. Decode the instruction
3. Calculate the effective address
4. Fetch the operands from memory
5. Execute the instruction
6. Store the result in the proper place
• Difficulties that will prevent the instruction pipeline
from operating at its maximum rate
– Different segments may take different times to operate
on the incoming information
– Some segments are skipped for certain operations
• For example: a register mode instruction DOES NOT need an
effective address calculation.
– Two or more segments may require memory access at
the same time, causing one segment to WAIT until
another is finished with the memory
• Memory access conflicts are sometimes resolved by using TWO
memory buses for accessing instructions and data in separate
memory modules
Example: Four-Segment Instruction Pipeline
• Assume, decoding of the instruction can be combined
with the calculation of the effective address into one
segment
• Instruction execution and storing of the result can be
combined into one segment
(as for most instructions the result is saved in a
processor register)
• This reduces the instruction pipeline into FOUR segments
• Figure shows how the instruction cycle in the CPU can be processed with a
Four-segment Pipeline
• While the FIRST instruction is being executed in SEGMENT 4,
• the SECOND instruction in sequence is busy fetching an operand from
memory in SEGMENT 3,
• the effective address may be calculated in a separate arithmetic circuit
for the THIRD instruction,
• and, whenever the memory is available, the FOURTH and ALL subsequent
instructions can be fetched and placed in an instruction FIFO

Figure: Four-Segment CPU Pipeline
(Segment 1: fetch instruction from memory; Segment 2: decode instruction
and calculate effective address, with a branch test; Segment 3: fetch
operand from memory; Segment 4: execute instruction, with an interrupt
test; on a branch or interrupt, update PC and empty the pipe)
• Figure shows the operation of the instruction pipeline
– Time in the horizontal axis is divided into steps of equal duration
– Four segments are represented in the diagram with an abbreviated symbol
1. FI is the segment that fetches an instruction
2. DA is the segment that decodes the instruction and calculates the
effective address
3. FO is the segment that fetches the operand
4. EX is the segment that executes the instruction
Figure: Timing of instruction pipeline
• Assume, processor has SEPARATE Instruction and DATA Memories
– so that operation in FI and FO can proceed at the Same time
• In the ABSENCE of a branch instruction, each segment operates on
different instructions
– in step 4, instruction 1 is being executed in segment EX;
– the operand for instruction 2 is being fetched in segment FO;
– instruction 3 is being decoded in segment DA;
– and instruction 4 is being fetched from memory in segment FI
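The segment occupancy shown in the timing diagram can be generated with a small sketch (assuming no branches and separate instruction and data memories, as the text does; names are illustrative):

```python
def pipeline_timing(n_instructions, segments=("FI", "DA", "FO", "EX")):
    """Return, for each step, the instruction number occupying each
    segment of a four-segment instruction pipeline."""
    k = len(segments)
    steps = []
    for step in range(1, n_instructions + k):
        row = {}
        for s, name in enumerate(segments):
            instr = step - s              # instruction in this segment
            if 1 <= instr <= n_instructions:
                row[name] = instr
        steps.append(row)
    return steps

print(pipeline_timing(4)[3])  # step 4: {'FI': 4, 'DA': 3, 'FO': 2, 'EX': 1}
```

Step 4 reproduces the situation described above: instruction 1 in EX, 2 in FO, 3 in DA, 4 in FI.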
• Assume, instruction 3 is a BRANCH INSTRUCTION
– this instruction is decoded in segment DA in step 4,
– the transfer from FI to DA of the other instructions is HALTED until the branch
instruction is executed in step 6
– If the branch is TAKEN, a NEW Instruction is fetched in step 7
– If the branch is NOT Taken, the instruction fetched PREVIOUSLY in step 4 can be used
– The pipeline then continues until a new branch instruction is encountered
• Another DELAY may occur in the pipeline
– if the EX segment needs to store the result of the operation in the data memory
while FO segment needs to fetch an operand
– So, segment FO must WAIT until segment EX has finished its operation
• In general, there are THREE major difficulties
(Hazards) that cause the instruction pipeline to
DEVIATE from its normal operation
1. Resource Conflicts caused by access to memory by
two segments at the same time
- Resolved by using separate instruction and data
memories
2. Data Dependency Conflicts arise when an instruction
depends on the result of a previous instruction, but
this result is not yet available
3. Branch Difficulties arise from branch and other
instructions that change the value of PC
Data Dependency
• A data dependency occurs when an instruction needs data that are not
yet available
• Degradation of performance in an instruction pipeline is due to possible
COLLISION of Data or Address
For example,
– an instruction in the FO segment may need to fetch an operand that is being
generated at the same time by the previous instruction in segment EX
– Therefore, the second instruction must WAIT for the data to be made
available by the first instruction.
• Similarly, an ADDRESS dependency may occur when an operand address
CANNOT be calculated because the information needed by the
addressing mode is NOT available.
For example,
– an instruction with register indirect mode can NOT proceed to fetch the
operand if the previous instruction is loading the address into the register
– Therefore, the operand access to memory must be DELAYED until the
required address is available
• Pipelined computers deal with such conflicts caused by
DATA DEPENDENCIES in a variety of ways:
– hardware interlocks
– operand forwarding
– delayed load
• Insert HARDWARE interlocks
– An INTERLOCK is a CIRCUIT that detects instructions
whose source operands are destinations of instructions
farther up in the pipeline
– The INSTRUCTION whose source is not available is
DELAYED by enough clock cycles to resolve the conflict
• Operand forwarding
– uses SPECIAL HARDWARE to detect a conflict and then
avoid it by routing the data through special
(additional) paths between pipeline segments
For example,
– instead of transferring an ALU result into a destination
register, the HARDWARE checks the destination
operand, and if it is needed as a source in the next
instruction, it passes the result DIRECTLY into the ALU
Input, BYPASSING the register file.
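This bypass check can be sketched as follows (a simplified model of the forwarding hardware, not an actual implementation; all names are illustrative):

```python
def read_operand(reg, regfile, ex_dest, ex_result):
    """Fetch an operand with forwarding: if the register being read is
    the destination of the instruction still in the EX segment, bypass
    the register file and route the ALU result directly to the ALU input."""
    if reg == ex_dest:
        return ex_result          # special forwarding path
    return regfile[reg]           # normal register-file read

regs = {"R1": 5, "R2": 0}         # R2 still holds a stale value
print(read_operand("R2", regs, ex_dest="R2", ex_result=7))  # 7 (forwarded)
print(read_operand("R1", regs, ex_dest="R2", ex_result=7))  # 5 (from file)
```

Without the bypass, the second instruction would read the stale value 0 from R2 and produce a wrong result.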
• Delayed load
– COMPILER is designed to detect a data conflict and
REORDER the instructions as necessary to delay the
loading of the conflicting data by inserting
NO-OPERATION instructions.
Handling of Branch Instructions
• A branch instruction can be conditional or unconditional
• An unconditional branch always alters the sequential program
flow by loading the program counter with target address
• In a conditional branch, the control selects the target
instruction if the condition is satisfied
• Various HARDWARE techniques to minimize the performance
degradation caused by instruction branching:
– Pre-fetch target instruction
– branch target buffer
– loop buffer
– branch prediction
– delayed branch
• Pre-fetch target instruction
– One way of handling a conditional branch is to prefetch the target
instruction and instruction following the branch
– Both are saved until the branch is executed
– If the branch condition is successful, the pipeline continues from
the branch target instruction; an extension of this procedure is to
continue fetching instructions from both places until the branch
decision is made.
• Branch Target Buffer (BTB)
– BTB is an associative memory included in the fetch segment of the
pipeline
– Each entry in the BTB consists of the address of a previously
executed branch instruction and the target instruction for that
branch
– When the pipeline decodes a branch instruction
• If instruction is in the BTB, the instruction is available directly and
prefetch continues from the new path
• If the instruction is NOT in the BTB, the pipeline shifts to a new instruction
stream and stores the target instruction in the BTB
– Advantage: branch instructions that have occurred previously are
readily available in the pipeline without interruption.
• Loop buffer
– A variation of the BTB is the loop buffer.
– This is a small very high speed register file maintained by the instruction fetch
segment of the pipeline
– When a program loop is detected in the program, it is stored in the loop buffer
– and executed directly without having to access memory
• Branch prediction
– A pipeline with branch prediction uses some additional logic to guess the outcome
of a conditional branch instruction before it is executed
– The pipeline then begins prefetching the instruction stream from the predicted path
– A correct prediction eliminates the wasted time caused by branch penalties
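The text does not specify the guessing logic; one widely used scheme, shown here purely as an assumption, is a 2-bit saturating counter kept per branch:

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. One wrong outcome does not immediately
    flip a strongly held prediction (hysteresis)."""
    def __init__(self):
        self.state = 0                 # start as strongly not-taken

    def predict(self):
        return self.state >= 2         # True means "predict taken"

    def update(self, taken):
        """Move the counter toward the actual branch outcome."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for _ in range(3):
    p.update(True)                     # branch taken three times
print(p.predict())                     # True
p.update(False)                        # a single not-taken outcome...
print(p.predict())                     # True: prediction survives it
```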
• Delayed Branch
– This procedure employed in most RISC Processors
– COMPILER detects the Branch instructions and REARRANGES the machine language
CODE SEQUENCE by INSERTING useful INSTRUCTIONS that keep the pipeline
operating without interruptions.
Example:
– Insertion of a NO-OPERATION instruction after a Branch instruction.
– This causes the computer to fetch the target instruction during the execution of the
no-operation instruction, allowing a continuous flow of the pipeline
RISC Pipeline
Example: Three-Segment Instruction Pipeline
• Instruction cycle is divided into three sub-operations and implemented in
three segments:
I: Instruction fetch
A: ALU operation
E: Execute instruction
• I segment
– fetches the instruction from program memory
• A segment
– instruction is decoded and an ALU operation is performed
– It performs an operation for a data manipulation instruction
– it evaluates the effective address for a load or store instruction, (OR) it calculates the
branch address for a program control instruction
• E segment
– directs the output of the ALU to one of three destinations
– It transfers
• the result of the ALU operation into a destination register, (OR)
• the effective address to the data memory for a load or store, (OR)
• the branch address to the program counter
Delayed Load
• Consider operation of the following four
instructions:
1. LOAD:  R1 ← M[address 1]
2. LOAD:  R2 ← M[address 2]
3. ADD:   R3 ← R1 + R2
4. STORE: M[address 3] ← R3
• If the three-segment pipeline proceeds without interruptions,
– there will be a data conflict in instruction 3 because the operand in R2 is not yet available
in the A segment
• ‘E’ segment in clock cycle 4 is in the process of placing the memory data into R2.
• ‘A’ segment in clock cycle 4 is using the data from R2,
– but the value in R2 will NOT be the correct value since it has NOT yet been transferred from
memory
• COMPILER inserts a NO-OP (no-operation) instruction
– This instruction is fetched from memory but has NO OPERATION,
– thus wasting a clock cycle.
Figure: Three-segment pipeline timing with DATA CONFLICT
• Figure (b) shows the same program with a NO-OP instruction inserted after the load to R2
instruction
– The data is loaded into R2 in clock cycle 4
– The add instruction uses the value of R2 in step 5
• NO-OP instruction is used to advance one clock cycle in order to compensate for the DATA CONFLICT
in the pipeline
– NOP is performed in segment A during clock cycle 4 OR segment E during clock cycle 5
• Concept of delaying the use of the data loaded from memory is referred to as DELAYED LOAD
• Advantage:
– DATA DEPENDENCY is taken care of by the COMPILER rather than HARDWARE.
Figure: Three-segment pipeline timing with DELAYED LOAD
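The compiler-side fix can be sketched as a pass over the instruction list (a simplified model assuming a tuple representation of (operation, destination, sources); a real compiler would first try to move a useful instruction into the slot before falling back to a NO-OP):

```python
def insert_delayed_load_nops(program):
    """program: list of (op, dest, srcs) tuples. Insert a NOP after any
    LOAD whose destination register is used as a source by the very next
    instruction, resolving the data conflict in the three-segment pipeline."""
    out = []
    for i, (op, dest, srcs) in enumerate(program):
        out.append((op, dest, srcs))
        if op == "LOAD" and i + 1 < len(program):
            next_srcs = program[i + 1][2]
            if dest in next_srcs:
                out.append(("NOP", None, ()))   # delay the conflicting use
    return out

prog = [("LOAD", "R1", ()),
        ("LOAD", "R2", ()),
        ("ADD", "R3", ("R1", "R2")),
        ("STORE", None, ("R3",))]
print(insert_delayed_load_nops(prog))
```

Only the load to R2 gets a NOP after it: R1 is loaded two cycles before the ADD, so it causes no conflict, exactly as in the figure.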
Delayed Branch
• A branch instruction delays the pipeline
operation until the instruction at the branch
address is fetched
– Most RISC processors rely on the COMPILER to
REDEFINE the branches
– This method is referred to as DELAYED BRANCH
• Delayed branches
– analyze the instructions before and after the branch
– and REARRANGE the PROGRAM SEQUENCE by
inserting useful instructions in the delay steps
• The program for this EXAMPLE consists of FIVE INSTRUCTIONS:
– Load from memory to R1
– Increment R2
– Add R3 to R4
– Subtract R5 from R6
– Branch to address X
• In Fig(a) the COMPILER inserts two NO-OP instructions after the branch
• Branch address X is TRANSFERRED to PC in clock cycle 7
• Fetching of the instruction at X is delayed by two clock cycles by the no-op instructions
• After the program counter PC has been UPDATED, the instruction in X starts the fetch
phase at clock cycle 8
Figure: Example of delayed branch using NO-OP instructions
• The program in Fig.(b) is REARRANGED by placing the add and subtract
instructions after the branch instruction
• PC is updated to the value of X in clock cycle 5,
• add and subtract instructions are fetched from memory and executed in the
proper sequence
– if the load instruction is at address 101 and X is equal to 350,
– the branch instruction is fetched from address 103
– add instruction is fetched from address 104 and executed in clock cycle 6
– subtract instruction is fetched from address 105 and executed in clock cycle 7
– Since the value of X is transferred to PC in clock cycle 5 in the E segment,
– the instruction at the branch address (address 350) is fetched in
clock cycle 6
Figure: Example of delayed branch with rearranged instructions
(program addresses 101-105; branch target address 350 loaded into PC)