0% found this document useful (0 votes)
62 views36 pages

Pipe Lining

The document describes pipelining by dividing a task into stages that are executed synchronously, with the output of one stage becoming the input of the next. It provides examples of arithmetic pipelining for fixed-point and floating-point operations, breaking down operations like addition into stages like loading operands, performing the operation, and storing the result. The document also discusses concepts like pipeline performance, throughput, and ideal speedup that can be achieved through pipelining.

Uploaded by

Akshay Mahajan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPSX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
62 views36 pages

Pipe Lining

The document describes pipelining by dividing a task into stages that are executed synchronously, with the output of one stage becoming the input of the next. It provides examples of arithmetic pipelining for fixed-point and floating-point operations, breaking down operations like addition into stages like loading operands, performing the operation, and storing the result. The document also discusses concepts like pipeline performance, throughput, and ideal speedup that can be achieved through pipelining.

Uploaded by

Akshay Mahajan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPSX, PDF, TXT or read online on Scribd
You are on page 1/ 36

Pipelining

Unit 2
R1←An
R2←Bn Load An & Bn An * Bn + Cn for n = 1,2,3,4

R3←R1 *R2 Multiply R1 & R2 Memory


R4 ←Cn Load Cn An Bn Cn

R5 ←R3 + R4 Add R3 & R4 S


t
a
g
R1 R2
e
1
Clock Stage 1 Stage 2 Stage 3
Pulse
R1 R2 R3 R4 R5 MULTIPLIER
S
1 A1 B1 t
a
g
2 A2 B2 A1*B1 C1 e R3 R4
2
3 A3 B3 A2*B2 C2 A1*B1+C1
S
4 A4 B4 A3*B3 C3 A2*B2+C2 t
ADDER
a
g
5 A4*B4 C4 A3*B3+C3 e
3
6 A4*B4+C4 R5
Synchronous Pipelining

Stage Stage Stage


Bufer

Bufer

Bufer

Bufer

Bufer
1 2 n

• A task is divided into different subtasks which are performed synchronously by


diferent hardware blocks known as stages
• Stages -> perform operations & result produced at each stage stored in latches or
buffer.
• After the arrival of clock pulse, all the latches will transfer data to the next stage
simultaneously.
• All stages will have some delay, therefore the maximum delay time of a stage will define
clock period of pipeline
• Clock period t of pipeline is equal to sum of maximum stage delay time and latch delay
time d i.e t = max(ts) + d
• Five tasks are being executed by dividing each task into four subtasks
• So how many stages???
• T12 -> Task 1 in Stage 2

1 2 3 4 5 6 7 8 Time Cycle

Task 1 T11 T12 T13 T14

Task 2 T21 T22 T23 T24

Task 3 T31 T32 T33 T34

Task 4 T41 T42 T43 T44

Task 5 T51 T52 T53 T54


Asynchronous Pipelining

Input

Ready Stage 1 Ready Stage 2 Ready Stage k Ready


Ack Ack Ack Ack

• Data flow between adjacent stages controlled by handshaking protocol.


• When a stage is ready to transmit data, it sends ready signal to the next stage.
• After receiving the data, acknowledge (ack) will be sent to previous stage.
• Variable throughput due to amount of delay experienced at diferent stages.
Pipeline Performance

• Consider execution of m tasks (instructions) using n stages (subtasks).


• Time for execution of task in non pipeline implementation = m * n * t
• In n-stage pipeline processor, first task will be executed in n clock cycles.
• Remaining m-1 tasks will be executed in every one clock cycle due to
overlap
• Time units required to execute m tasks pipeline implementation
= ( n + m - 1)* t

Speedup S(n)
= =
When no of tasks (instructions) are very high m→∞
Lim S(n) = n = ideal speedup= maximum speedup
m→∞
The ideal speedup is equal to the number of pipeline stages. That is, when m is
very large, a pipelined processor can produce output approximately n times
faster than a non pipelined processor.
•• Efficiency of Pipeline E(n)

E(n)=
=
=

• Throughput U(n): Number of tasks executed
per unit time.
U(n) = =
Types of Pipeline
• Arithmetic Pipeline
└ Fixed –Point Arithmetic Pipeline
Float –Point Arithmetic Pipeline
• Instruction Pipeline
└ Two Stage Instruction Pipeline
└ Four Stage Instruction Pipeline
└ Six Stage Instruction Pipeline
ARITHMETIC PIPELINING

• Pipelining used for arithmetic computation are called


Arithmetic pipeline.
• Arithmetic pipeline constructed to perform simple fixed point
and complex floating arithmetic operations.
• Basic operation like addition, subtraction , division &
multiplication can be efficiently partitioned into subtasks for
pipeline stages
• FIXED POINT ARTHMETIC PIPELINE
• FLOATING POINT ARTHMETIC PIPELINE
Adders
• Two types of adders are used
• Carry propagation adder– Add numbers such that carry generated is propagated

eg 0101 101
+0100 + 1100
----------------- -------------------
0 1
0001 0001

• Carry save adder-Add numbers such that carry generated is propagated rather
carries are saved in carry vector
1 1 1 1

eg 1010 101
+ 0110 + 11
+ 1111 + 1
----------------- -------------------
Sum 00011 Sum 111
Carry 1 1 1 0 0 Carry 010
n + 1 bit max bit no
FIXED POINT ARTHMETIC PIPELINE
8 bit 8 bit

8 bit 8 bit
Multiplier Recording Logic
8 9 10 11 12 13 14 15

8 9 10 11 12 13
Carry Save Carry Save 14 15
Adder Adder
10 13 13
10
Carry Save Adder Carry Save Adder
13 13 15 15

13 13 15
15
Carry Save Adder
15
15
Carry Save Adder
16 16

16 16
Carry Propagate Adder
16

16
Pipeline Floating Point Adder
Steps to add to two floating point number
n1= (f1,e1) eg n1= 0.5 x 103
n2= (f2,e2) eg n2= 0.3 x 105
Step 1 : Compare the exponents (e1, e2 ) of two numbers (n1, n2 ).Pickup fraction of number with smaller
exponent. Pickup diference k=│ e1 - e2 │ and also larger exponent
n1= 0.5 x 103 n2= 0.3 x 105
Fraction of Smaller exponent= 0.5 Larger exponent= 105
k=│ e1 - e2 │= │ 3 - 5 │= 2

Step 2 : Right shift the fraction with smaller exponent by k positions


i.e.By right shifting n1= 0.5 x 103 by k=2 positions we get, n1= 0.005 x 105

Step 3 : Add the fraction f1 + f2 =0.305


Fraction part= 0.305 Exponent part = 105

Step 4: Find M = Compute no of leading zero’s in the mantissa of the result.


i.e no of leading zero’s in 0.305 is 3 . Therefore ,M=3

Step 5 : Subtract the number M from exponent and left shift the mantissa of the result by M positions
M=3
Exponent part 105-3 = 102 Left shift result mantissa by 3 positions=305
Fraction Part = 305 & Exponent part = 102
e1 f1 e2 f2

BUFFER e2 BUFFER
S e1 f2
t f1
a
g EXPONENT COMPARATOR FRACTION SELECTOR
e Fraction with smaller exp
1 k
Min(e1,e2) RIGHT SHIFTER
Max(e1,e2) Fraction with
larger exp
BUFFER BUFFER BUFFER
S
t Fraction with larger exp Fraction right shifted
a
g
Max(e1,e2) ADDER
e
2
BUFFER BUFFER
f1 + f 2

LEADING ZERO COUNTER


S
t Max(e1,e2) M
a
g LEFT SHIFT
e
3 EXPONENT SUBSTRACTOR

BUFFER
BUFFER
Exponent Fraction
INSTRUCTION PIPELINING

• A typical instruction processing involves following steps


1. Instruction Fetch
2. Instruction Decode
3. Address generation of operand
4. Operand fetch
5. Instruction Execution
6. Writing of Result
Two Stage Instruction Pipeline
• Subdividing instruction processing into two stages
1. Fetch Instruction
2. Execute Instruction

Stage Stage
Bufer

Bufer
Bufer
1 2

Instruction
1 2 3 4 5 6 7 8 Time Cycle

Instruction 1 EX
IF
Instruction 2 EX
IF
Instruction3 EX
IF
Instruction4 EX
IF
Four Stage Instruction Pipeline
• Subdividing instruction processing into two stages
1. Instruction Fetch IF
2. Instruction Decode ID
3. Operand fetch OF
4. Instruction Execution EX

Instruction 1 2 3 4 5 6 7 8 Time Cycle

Instruction 1 ID OF EX
IF
Instruction 2 ID OF EX
IF
Instruction 3 ID OF EX
IF
Instruction 4 ID OF EX
IF
Six Stage Instruction Pipeline
• A typical instruction processing involves following steps
1. Instruction Fetch IF
2. Instruction Decode ID
3. Calculate operands CO
4. Operand fetch OF
5. Instruction Execution EX
6. Write Operand WO

Instruction 1 2 3 4 5 6 7 8 9 Time Cycle

Instruction 1 ID CO OF EX WO
IF

Instruction 2 ID CO OF EX WO
IF

Instruction 3 ID CO OF EX WO
IF

Instruction 4 ID CO OF EX WO
IF
Six Stage Instruction Pipeline Time Cycle
Instruction 1 2 3 4 5 6 7 8 10 11 12 13 14

Instruction 1 ID CO OF EX WO
IF

Instruction 2 ID CO OF EX WO
IF

Instruction 3 ID CO OF EX WO
IF

Instruction 4 ID CO OF EX
IF

Instruction 5 ID CO OF
IF

Instruction 6 ID CO
IF

Instruction 7 ID
IF

Instruction 15 ID CO OF EX WO
IF

Instruction 16 ID CO OF EX WO
IF
Pipeline Hazards
• Pipeline hazards prevent the next instruction in the instruction
stream from executing during in its designated clock cycle
• Instruction is stalled & all instructions later in the pipeline are
also stalled.
• No new instructions are fetched during the stall.

• Classified into following three major types:


1. Structural hazards
2. Data Hazards
3. Control Hazards
Structural hazards
• Occurs when certain resource is requested by more than one instruction
at the same time.
• Arise from resource conflicts
• HW cannot support all possible combinations of instructions
• No separate memory available for instruction fetch and data
memory

Instruction 1 2 3 4 5 6 7 8

Instruction 1 ID EX MEM WR
IF

Instruction 2 ID EX MEM WR
IF

Instruction 3 ID EX MEM WR
IF

Instruction 4 ID EX MEM WR
IF

Instruction 4 ID EX MEM WR
IF

Techniques to eliminate hazards


Duplicate resources
Reorder the instruction
Data Hazards
• Dependency among instructions i.e data dependency
• Occurs when there is an instruction in the pipeline that
afects the result of another instruction in the pipeline
• I1 : ADD R3, R2 R3←R3+R2
• I2 : MUL R1, R3 R1←R1*R3

Instruction 1 2 3 4 5 6 7 8

Instruction 1 ID CO OF EX WO
IF

Instruction 2 ID CO stall stall OF EX WO


IF

Instruction 3 ID CO OF EX WO
IF
• Consider two instructions, A and B. A occurs before B.
Data Dependencies

• True Data Dependency


• Read-After-Write (RAW) Hazard
• Occurs when the value produced by an instruction is required by a
subsequent instruction.
• I1 : ADD R3, R2,R1 R3←R3+R1
• I2 : SUB R4, R3 ,1 R4←R3-1
• This is common, and forwarding helps to solve it.
• B tries to read a register before A has written it and gets the old
value.
Data Dependencies
• Name Dependency
• Anti-Dependencies
• Write-After-Read (RAW) Hazard

• Occurs when an instruction writes a location which has been read by a previous instruction.
• i+1 instruction tries to write an operand before it is read by ith instruction. So, ith instruction incorrectly
gets the new value
• I1 : ADD R3, R2,R1 R3←R3+R1
• I2 : SUB R4, R5 ,1 R4←R5-1

• WAR (write after read)


– B tries to write a register before A has read it.

– In this case, A uses the new (incorrect) value.
• Output Dependency
• Write-After-Write (WAW) Hazard
• Occurs when a location is written by two instructions.
• i+1 instruction tries to write an operand before it is written by ith instruction.
So, writes end up being performed in the wrong order.

• B tries to write an operand before A has written it.


• After instruction B has executed, the value of the register should be B's result, but A's result is
stored instead.

• I1 : ADD R3, R2,R1 R3←R3+R1


• I2 : SUB R2, R3 ,1 R2←R3-1
• I3 : ADD R3, R2,R5 R3←R2+R5

You might also like