
UNIT-3

Computer Instructions

Computer instructions are a set of machine language instructions that a particular processor
understands and executes. A computer performs tasks on the basis of the instructions provided.

An instruction comprises groups of bits called fields. These fields include:

o The Operation code (Opcode) field which specifies the operation to be performed.
o The Address field which contains the location of the operand, i.e., register or memory
location.
o The Mode field which specifies how the operand will be located.

A basic computer has three instruction code formats which are:

1. Memory-reference instruction


2. Register-reference instruction
3. Input-Output instruction

Memory-reference instruction

In a memory-reference instruction, 12 bits specify the memory address and one bit specifies the
addressing mode 'I' (I = 0 for direct and I = 1 for indirect addressing).

Register-reference instruction

The Register-reference instructions are represented by the Opcode 111 with a 0 in the leftmost bit
(bit 15) of the instruction.

Note: The Operation code (Opcode) of an instruction refers to a group of bits that define arithmetic
and logic operations such as add, subtract, multiply, shift, and complement.
A Register-reference instruction specifies an operation on or a test of the AC (Accumulator)
register.

Input-Output instruction

Just like a register-reference instruction, an input-output instruction does not need a reference to
memory and is recognized by the operation code 111 with a 1 in the leftmost bit of the instruction.

These instructions provide communication between the computer and the outside environment. IR(14–12)
is 111 (which distinguishes it from a memory-reference instruction) and IR(15) is 1 (which distinguishes
it from a register-reference instruction). The remaining 12 bits specify the type of input-output
operation or test to be performed.

Note
o The three operation code bits in positions 12 through 14 should be equal to 111. Otherwise,
the instruction is a memory-reference type, and the bit in position 15 is taken as the
addressing mode I.
o When the three operation code bits are equal to 111, the control unit inspects the bit in position
15. If the bit is 0, the instruction is a register-reference type. Otherwise, the instruction is an
input-output type with a 1 in bit position 15.
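To make this decoding rule concrete, here is a minimal Python sketch that classifies a 16-bit instruction word of the basic computer described above; the function name, the sample word and the returned tuple layout are assumptions made only for illustration.

def decode(instruction):
    # Classify a 16-bit basic-computer instruction word (illustrative sketch).
    opcode = (instruction >> 12) & 0b111    # bits 14-12
    bit15 = (instruction >> 15) & 0b1       # leftmost bit
    address = instruction & 0x0FFF          # bits 11-0

    if opcode != 0b111:
        # Memory-reference: bit 15 is the addressing-mode bit I
        mode = "indirect" if bit15 else "direct"
        return ("memory-reference", mode, address)
    elif bit15 == 0:
        # Opcode 111 with bit 15 = 0: register-reference
        return ("register-reference", None, address)
    else:
        # Opcode 111 with bit 15 = 1: input-output
        return ("input-output", None, address)

print(decode(0x7800))   # opcode 111, bit 15 = 0 -> register-reference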

Instruction Set Completeness


A set of instructions is said to be complete if the computer includes a sufficient number of
instructions in each of the following categories:

o Arithmetic, logical and shift instructions


o A set of instructions for moving information to and from memory and processor registers.
o Instructions that control the program, together with instructions that check status
conditions.
o Input and Output instructions

Arithmetic, logic and shift instructions provide computational capabilities for processing the type of
data the user may wish to employ.

A huge amount of binary information is stored in the memory unit, but all computations are done in
processor registers. Therefore, one must possess the capability of moving information between
these two units.

Program control instructions such as branch instructions are used to change the sequence in which the
program is executed.

Input and Output instructions act as an interface between the computer and the user. Programs and
data must be transferred into memory, and the results of computations must be transferred back to
the user.
Types of instructions
Depending on the operation they perform, instructions can be divided into several groups:

• Arithmetic Instructions
• Branch Instructions
• Data Transfer Instructions
• Logic Instructions
• Bit-oriented Instructions
Arithmetic instructions
Arithmetic instructions perform basic operations such as addition, subtraction, multiplication and
division. After execution, the result is stored in the first operand. For example: ADD A,R1
– the result of the addition (A+R1) will be stored in the accumulator.
Branch Instructions
There are two kinds of branch instructions. Unconditional jump instructions: execution always jumps to
a new location, from where the program continues. Conditional jump instructions: a jump to a new
program location is taken only if a specified condition is met; otherwise, the program proceeds normally
with the next instruction.
Data Transfer Instructions
Data transfer instructions move the content of one register to another; the source register remains
unchanged. If an instruction has the suffix “X” (e.g., MOVX), the data is exchanged
with external memory.
Logic Instructions
Logic instructions perform logic operations upon corresponding bits of two registers. After
execution, the result is stored in the first operand.
Bit-oriented Instructions
Similar to logic instructions, bit-oriented instructions perform logic operations. The difference is
that these are performed upon single bits.
The 8086 microprocessor supports 8 types of instructions −

• Data Transfer Instructions


• Arithmetic Instructions
• Bit Manipulation Instructions
• String Instructions
• Program Execution Transfer Instructions (Branch & Loop Instructions)
• Processor Control Instructions
• Iteration Control Instructions
• Interrupt Instructions
Arithmetic Instructions -
Instructions to perform addition-
• ADD − Used to add the provided byte to byte/word to word.

• ADC − Used to add with carry.

• INC − Used to increment the provided byte/word by 1.


Instructions to perform subtraction
• SUB − Used to subtract the byte from byte/word from word.

• SBB − Used to perform subtraction with borrow.

• DEC − Used to decrement the provided byte/word by 1.


• CMP − Used to compare two provided bytes/words.
Instruction to perform multiplication
• MUL − Used to multiply unsigned byte by byte/word by word.

• IMUL − Used to multiply signed byte by byte/word by word.


Instructions to perform division
• DIV − Used to divide the unsigned word by byte or unsigned double word by word.

• IDIV − Used to divide the signed word by byte or signed double word by word.

Bit Manipulation Instructions


Instructions to perform logical operation
• NOT − Used to invert each bit of a byte or word.

• AND − Used to AND each bit in a byte/word with the corresponding bit in another
byte/word.

• OR − Used to OR each bit in a byte/word with the corresponding bit in another
byte/word.

• XOR − Used to perform Exclusive-OR operation over each bit in a byte/word with the
corresponding bit in another byte/word.
Instructions to perform shift operations
• SHL/SAL − Used to shift bits of a byte/word towards the left and put zeros in the LSBs.

• SHR − Used to shift bits of a byte/word towards the right and put zeros in the MSBs.
Instructions to perform rotate operations
• ROL − Used to rotate bits of byte/word towards the left, i.e. MSB to LSB and to Carry Flag
[CF].

• ROR − Used to rotate bits of byte/word towards the right, i.e. LSB to MSB and to Carry
Flag [CF].

• RCR − Used to rotate bits of byte/word towards the right, i.e. LSB to CF and CF to MSB.

• RCL − Used to rotate bits of byte/word towards the left, i.e. MSB to CF and CF to LSB.
Program Execution Transfer Instructions (Branch and Loop Instructions)
Instructions to transfer control during execution without any condition −

• CALL − Used to call a procedure and save its return address on the stack.

• RET − Used to return from the procedure to the main program.

• JMP − Used to jump to the provided address to proceed to the next instruction.

Processor Control Instructions


These instructions are used to control the processor action by setting/resetting the flag values.
Following are the instructions under this group −

• STC − Used to set carry flag CF to 1

• CLC − Used to clear/reset carry flag CF to 0

• CMC − Used to complement the state of the carry flag CF.

• STD − Used to set the direction flag DF to 1

• CLD − Used to clear/reset the direction flag DF to 0

• STI − Used to set the interrupt enable flag to 1, i.e., enable INTR input.

• CLI − Used to clear the interrupt enable flag to 0, i.e., disable INTR input.

Iteration Control Instructions


These instructions are used to execute a group of instructions a number of times. Following is the
list of instructions under this group −

• LOOP − Used to loop over a group of instructions until CX becomes 0.

Interrupt Instructions
These instructions are used to call the interrupt during program execution.

• INT − Used to interrupt the program during execution and call the specified interrupt service.

• INTO − Used to interrupt the program during execution if OF = 1

• IRET − Used to return from interrupt service to the main program


Q. Explain instruction cycle.

Instruction Cycle

A program residing in the memory unit of a computer consists of a sequence of instructions. These
instructions are executed by the processor by going through a cycle for each instruction.

In a basic computer, each instruction cycle consists of the following phases:

1. Fetch instruction from memory.


2. Decode the instruction.
3. Read the effective address from memory.
4. Execute the instruction.
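As a rough illustration of these four phases, the following Python sketch runs one pass of the cycle over a toy memory; the tiny one-instruction "ISA", the memory contents and the register names are invented purely for illustration.

# One pass of the instruction cycle (illustrative sketch, not a real ISA).
memory = {0: ("LOAD", True, 10), 10: 20, 20: 99}   # made-up contents
pc, ac = 0, 0

instr = memory[pc]                 # 1. fetch instruction from memory
pc += 1
op, indirect, addr = instr         # 2. decode the instruction
if indirect:
    addr = memory[addr]            # 3. read the effective address from memory
if op == "LOAD":                   # 4. execute the instruction
    ac = memory[addr]

print(pc, ac)                      # 1 99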
Instruction Formats (Zero, One, Two and Three Address Instructions)
A computer performs tasks on the basis of the instructions provided. An instruction comprises groups of
bits called fields. These fields contain different pieces of information; since, for a computer, everything
is in 0s and 1s, each field has a different significance, on the basis of which the CPU decides what
operation to perform. The most common fields are:

• Operation field which specifies the operation to be performed like addition.


• Address field which contains the location of the operand, i.e., a register or memory location.
• Mode field which specifies how the operand is to be located.
An instruction can be of various lengths depending upon the number of addresses it contains. Generally,
CPU organizations are of three types on the basis of the number of address fields:
1. Single Accumulator organization
2. General register organization
3. Stack organization
Note that we will use the expression X = (A+B)*(C+D) to showcase the procedure.

• Zero Address Instructions –


A stack-based computer does not use an address field in its computational instructions. To evaluate an
expression, it is first converted to Reverse Polish Notation, i.e., postfix notation (a short code sketch of
this stack evaluation appears after the three-address example below).
Expression: X = (A+B)*(C+D)
Postfix : X = AB+CD+*
TOP means top of stack
M[X] is any memory location

PUSH A TOP = A
PUSH B TOP = B
ADD TOP = A+B
PUSH C TOP = C
PUSH D TOP = D
ADD TOP = C+D
MUL TOP = (C+D)*(A+B)
POP X M[X] = TOP
• One Address Instructions –
These use an implied ACCUMULATOR register for data manipulation. One operand is in the
accumulator and the other is in a register or memory location. Implied means that the CPU already
knows that one operand is in the accumulator, so there is no need to specify it.

Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location

LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
• Two Address Instructions –
This is common in commercial computers. Here two addresses can be specified in the
instruction. Unlike the one-address instruction, where the result was stored in the accumulator,
here the result can be stored at a different location rather than just the accumulator, but this
requires more bits to represent each address.

Here the destination address can also contain an operand.


Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location

MOV R1, A R1 = M[A]


ADD R1, B R1 = R1 + M[B]
MOV R2, C R2 = M[C]
ADD R2, D R2 = R2 + M[D]
MUL R1, R2 R1 = R1 * R2
MOV X, R1 M[X] = R1

• Three Address Instructions –


These have three address fields to specify a register or a memory location. The programs created are
much shorter in size, but the number of bits per instruction increases. These instructions make the
creation of programs much easier, but it does not mean that the programs will run much faster,
because each instruction now only contains more information; each micro-operation (changing the
content of a register, loading an address onto the address bus, etc.) is still performed in one cycle.
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location

ADD R1, A, B R1 = M[A] + M[B]


ADD R2, C, D R2 = M[C] + M[D]
MUL X, R1, R2 M[X] = R1 * R2
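The zero-address (stack) evaluation traced earlier can be checked with a few lines of Python; the operand values below are arbitrary sample numbers chosen only for illustration.

# Stack-machine evaluation of X = (A+B)*(C+D) from its postfix form AB+CD+*.
values = {"A": 2, "B": 3, "C": 4, "D": 5}          # sample operand values
postfix = ["A", "B", "+", "C", "D", "+", "*"]

stack = []
for token in postfix:
    if token in values:
        stack.append(values[token])                 # PUSH operand
    else:
        right, left = stack.pop(), stack.pop()
        stack.append(left + right if token == "+" else left * right)

X = stack.pop()                                     # POP X -> M[X]
print(X)                                            # (2+3)*(4+5) = 45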

Different Instruction Cycles


Registers Involved In Each Instruction Cycle:

• Memory Address Register (MAR): It is connected to the address lines of the system bus. It
specifies the address in memory for a read or write operation.
• Memory Buffer Register (MBR): It is connected to the data lines of the system bus. It
contains the value to be stored in memory or the last value read from memory.
• Program Counter (PC): Holds the address of the next instruction to be fetched.
• Instruction Register (IR): Holds the last instruction fetched.
The Instruction Cycle –

Each phase of the instruction cycle can be decomposed into a sequence of elementary micro-operations.
In the sections below, there is one sequence each for the Fetch, Indirect, Execute and Interrupt
cycles.

The Indirect Cycle is always followed by the Execute Cycle. The Interrupt Cycle is always followed
by the Fetch Cycle. For both fetch and execute cycles, the next cycle depends on the state of the
system.
00 : Fetch Cycle
01 : Indirect Cycle
10 : Execute Cycle
11 : Interrupt Cycle
Different Instruction Cycles:
1. The Fetch Cycle –
At the beginning of the fetch cycle, the address of the next instruction to be executed is in
the Program Counter(PC).

Step 1: The address in the program counter is moved to the memory address register(MAR),
as this is the only register which is connected to address lines of the system bus.

Step 2: The address in MAR is placed on the address bus, and the control unit issues a READ
command on the control bus; the result appears on the data bus and is then copied into the
memory buffer register (MBR). The program counter is incremented by one, to get ready for the
next instruction. (These two actions can be performed simultaneously to save time.)
Step 3: The content of the MBR is moved to the instruction register (IR).

Thus, a simple fetch cycle consists of three steps and four micro-operations. Symbolically, we can
write this sequence of events as follows:

t1: MAR ← (PC)
t2: MBR ← Memory[MAR], PC ← (PC) + I
t3: IR ← (MBR)

• Here ‘I’ is the instruction length. The notations (t1, t2, t3) represent successive time units.
We assume that a clock is available for timing purposes and that it emits regularly spaced clock
pulses. Each clock pulse defines a time unit; thus, all time units are of equal duration. Each
micro-operation can be performed within a single time unit.
First time unit: Move the contents of the PC to MAR.
Second time unit: Move contents of memory location specified by MAR to MBR. Increment
content of PC by I.
Third time unit: Move contents of MBR to IR.
Note: Second and third micro-operations both take place during the second time unit.
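The same three time units can be mimicked with a tiny register-transfer sketch in Python; the register names follow the text (PC, MAR, MBR, IR), while the memory contents are made-up sample values.

# Fetch-cycle micro-operations, one time unit per group (illustrative sketch).
memory = {100: "ADD R1, X"}        # sample instruction word at address 100
PC, I = 100, 1                     # I = instruction length

MAR = PC                           # t1: MAR <- (PC)
MBR = memory[MAR]                  # t2: MBR <- Memory[MAR]
PC = PC + I                        #     PC  <- (PC) + I  (same time unit)
IR = MBR                           # t3: IR  <- (MBR)

print(PC, IR)                      # 101 ADD R1, X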

• The Indirect Cycle –


Once an instruction is fetched, the next step is to fetch the source operands. Here the source operand is
fetched by indirect addressing (it could be fetched by any addressing mode; in this example it is done by
indirect addressing). Register-based operands need not be fetched. Once the opcode is executed, a similar
process may be needed to store the result in main memory. The following micro-operations take
place:

• Step 1: The address field of the instruction is transferred to the MAR. This is used to fetch
the address of the operand.
Step 2: The address field of the IR is updated from the MBR (so that it now contains a
direct address rather than an indirect address).
Step 3: The IR is now in the same state as if indirect addressing had not occurred.
Note: Now the IR is ready for the execute cycle, but we skip that cycle for a moment to consider the
Interrupt Cycle.

• The Execute Cycle


The other three cycles (Fetch, Indirect and Interrupt) are simple and predictable. Each of them
requires a simple, small and fixed sequence of micro-operations; in each case the same micro-operations
are repeated each time around.
The Execute Cycle is different: for a machine with N different opcodes there are N
different sequences of micro-operations that can occur.
Let us take a hypothetical example:
consider an add instruction:

Here, this instruction adds the content of location X to register R. Corresponding micro-operation
will be:-

We begin with the IR containing the ADD instruction.


Step 1: The address portion of IR is loaded into the MAR.
Step 2: The referenced memory location is read into the MBR.
Step 3: Now, the contents of R and MBR are added by the ALU.
Let us take a more complex example:

Here, the content of location X is incremented by 1. If the result is 0, the next instruction is
skipped. The corresponding sequence of micro-operations is:
• Here, the PC is incremented if (MBR) = 0. This test (is the MBR equal to zero or not) and
action (the PC is incremented by 1) can be implemented as one micro-operation.
Note: This test-and-action micro-operation can be performed during the same time unit
in which the updated value of the MBR is stored back to memory.

• The Interrupt Cycle:


At the completion of the Execute Cycle, a test is made to determine whether any enabled
interrupt has occurred. If an enabled interrupt has occurred, the Interrupt Cycle takes place.
The nature of this cycle varies greatly from one machine to another.
Let us take a sequence of micro-operations:
Step 1: The contents of the PC are transferred to the MBR, so that they can be saved for the return.
Step 2: The MAR is loaded with the address at which the contents of the PC are to be saved.
The PC is loaded with the address of the start of the interrupt-processing routine.
Step 3: The MBR, containing the old value of the PC, is stored in memory.
Note: In step 2, two actions are implemented as one micro-operation. However, since most processors
provide multiple types of interrupts, it may take one or more additional micro-operations to obtain the
save_address and the routine_address before they are transferred to the MAR and PC, respectively.

Hardwired v/s Micro-programmed Control Unit


To execute an instruction, the control unit of the CPU must generate the required control signal in
the proper sequence. There are two approaches used for generating the control signals in proper
sequence as Hardwired Control unit and Micro-programmed control unit.
Hardwired Control Unit –
The control hardware can be viewed as a state machine that changes from one state to another in
every clock cycle, depending on the contents of the instruction register, the condition codes and the
external inputs. The outputs of the state machine are the control signals. The sequence of the
operation carried out by this machine is determined by the wiring of the logic elements and hence
named as “hardwired”.

• Fixed logic circuits that correspond directly to the Boolean expressions are used to generate
the control signals.
• Hardwired control is faster than micro-programmed control.
• A controller that uses this approach can operate at high speed.
• RISC architecture is based on a hardwired control unit.

Micro-programmed Control Unit –

• The control signals associated with operations are stored as control words in special memory
units that are inaccessible to the programmer.
• Control signals are generated by a program that is similar to a machine language program.
• A micro-programmed control unit is slower because of the time it takes to fetch
microinstructions from the control memory.
Some Important Terms –
1. Control Word : A control word is a word whose individual bits represent various control
signals.
2. Micro-routine : A sequence of control words corresponding to the control sequence of a
machine instruction constitutes the micro-routine for that instruction.
3. Micro-instruction : Individual control words in this micro-routine are referred to as
microinstructions.
4. Micro-program : A sequence of micro-instructions is called a micro-program, which is
stored in a ROM or RAM called a Control Memory (CM).
5. Control Store : the micro-routines for all instructions in the instruction set of a computer
are stored in a special memory called the Control Store.

Types of Micro-programmed Control Unit – Based on the type of Control Word stored in the
Control Memory (CM), it is classified into two types :
1. Horizontal Micro-programmed Control Unit:
The control signals are represented in decoded binary format, that is, 1 bit per control signal. Example: if 53
control signals are present in the processor, then 53 bits are required. More than one control signal can
be enabled at a time.

• It supports longer control word.


• It is used in parallel processing applications.
• It allows higher degree of parallelism. If degree is n, n CS are enabled at a time.
• It requires no additional hardware (decoders), which means it is faster than vertical
micro-programmed control.
• It is less flexible than vertical micro-programmed control.
2. Vertical Micro-programmed Control Unit:
The control signals are represented in encoded binary format. For N control signals, log2(N) bits
are required.

• It supports shorter control words.


• It supports easy implementation of new control signals, therefore it is more flexible.
• It allows low degree of parallelism i.e., degree of parallelism is either 0 or 1.
• It requires additional hardware (decoders) to generate the control signals, which implies it is slower
than horizontal micro-programmed control.
• It is more flexible than horizontal micro-programmed control, and both are more flexible than a
hardwired control unit.
Difference between Horizontal and Vertical micro-programmed
Control Unit
Basically, control unit (CU) is the engine that runs the entire functions of a computer with the help
of control signals in the proper sequence. In the micro-programmed control unit approach, the
control signals that are associated with the operations are stored in special memory units. It is
convenient to think of sets of control signals that cause specific micro-operations to occur as being
“microinstructions”. The sequences of microinstructions could be stored in an internal “control”
memory.
Micro-programmed control unit can be classified into two types based on the type of Control Word
stored in the Control Memory, viz., Horizontal micro-programmed control unit and Vertical micro-
programmed control unit.

• In a horizontal micro-programmed control unit, the control signals are represented in
decoded binary format, i.e., 1 bit per control signal. Here ‘n’ control signals require an n-bit
encoding.
• In a vertical micro-programmed control unit, the control signals are represented in
encoded binary format. Here ‘n’ control signals require a log2(n)-bit encoding.

Comparison between Horizontal micro-programmed control unit and Vertical micro-programmed


control unit:

Horizontal µ-programmed CU vs Vertical µ-programmed CU:

• Horizontal: supports a longer control word. Vertical: supports a shorter control word.
• Horizontal: allows a higher degree of parallelism; if the degree is n, then n control signals are
enabled at a time. Vertical: allows a low degree of parallelism; the degree of parallelism is either 0 or 1.
• Horizontal: no additional hardware is required. Vertical: additional hardware in the form of decoders
is required to generate the control signals.
• Horizontal: faster than the vertical micro-programmed control unit. Vertical: slower than the
horizontal micro-programmed control unit.
• Horizontal: less flexible than the vertical micro-programmed control unit. Vertical: more flexible than
the horizontal micro-programmed control unit.
• Horizontal: uses horizontal microinstructions, where every bit in the control field attaches to a
control line. Vertical: uses vertical microinstructions, where a code is used for each action to be
performed and a decoder translates this code into individual control signals.
• Horizontal: makes less use of ROM encoding than the vertical micro-programmed control unit.
Vertical: makes more use of ROM encoding to reduce the length of the control word.
Example: Consider a hypothetical Control Unit which supports 4 k words. The Hardware contains
64 control signals and 16 Flags. What is the size of control word used in bits and control memory in
byte using:
a) Horizontal Programming
b) Vertical programming
Solution:
a) For Horizontal:

64 bits for the 64 control signals, 4 bits to select one of the 16 flags, and 12 bits for the next-address
field (4K words = 2^12 words).

Control Word Size = 4 + 64 + 12 = 80 bits

Control Memory = 4 kW = (4K × 80 bits) / 8 = 40 kByte

b) For Vertical:
6 bits for the 64 signals, i.e., log2(64) = 6

Control Word Size = 4 + 6 + 12 = 22 bits

Control Memory = 4 kW = (4K × 22 bits) / 8 = 11 kByte
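The arithmetic in this example can be reproduced with a few lines of Python; the split of the control word into a flag-select field, a signal field and a next-address field follows the solution above.

import math

control_memory_words = 4 * 1024          # 4K control words
num_signals, num_flags = 64, 16

addr_bits = int(math.log2(control_memory_words))    # 12 bits for the next address
flag_bits = int(math.log2(num_flags))                # 4 bits to select a flag

horizontal = flag_bits + num_signals + addr_bits                  # 4 + 64 + 12 = 80
vertical = flag_bits + int(math.log2(num_signals)) + addr_bits    # 4 + 6 + 12 = 22

# Control memory size in kilobytes = words * bits-per-word / 8 / 1024
print(horizontal, control_memory_words * horizontal // 8 // 1024)  # 80 40
print(vertical, control_memory_words * vertical // 8 // 1024)      # 22 11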


Implementation of Micro Instructions Sequencer
Micro Instructions Sequencer is a combination of all hardware for selecting the next micro-
instruction address. The micro-instruction in control memory contains a set of bits to initiate micro
operations in computer registers and other bits to specify the method by which the address is
obtained.
Implementation of Micro Instructions Sequencer –

Control Address Register(CAR) :


Control address register receives the address from four different paths. For receiving the addresses
from four different paths, Multiplexer is used.
• Multiplexer :
Multiplexer is a combinational circuit which contains many data inputs and single data output
depending on control or select inputs.

• Branching :
Branching is achieved by specifying the branch address in one of the fields of the micro
instruction. Conditional branching is obtained by using part of the micro-instruction to select a
specific status bit in order to determine its condition.

• Mapping Logic :
An external address is transferred into control memory via a mapping logic circuit.
• Incrementer :
Incrementer increments the content of the control address register by one, to select the next
micro-instruction in sequence.

• Subroutine Register (SBR) :


The return address for a subroutine is stored in a special register called Subroutine Register
whose value is then used when the micro-program wishes to return from the subroutine.

• Control Memory :
Control memory is a type of memory which contains addressable storage registers. Data is
temporarily stored in control memory. Control memory can be accessed quicker than main
memory.

RISC and CISC


Reduced Instruction Set Computer (RISC) –
The main idea is to make the hardware simpler by using an instruction set composed of a few
basic operations for loading, evaluating and storing data: a load instruction loads data and a
store instruction stores it.
Complex Instruction Set Computer (CISC) –
The main idea is that a single instruction does all the loading, evaluating and storing; for example, a
multiplication instruction loads the data, evaluates the product and stores the result, hence it is
'complex'.
Both approaches try to increase the CPU performance

• RISC: Reduce the cycles per instruction at the cost of the number of instructions per
program.
• CISC: The CISC approach attempts to minimize the number of instructions per program but
at the cost of increase in number of cycles per instruction.

Earlier, when programming was done using assembly language, a need was felt to make instructions
do more work, because programming in assembly was tedious and error-prone; this is how the CISC
architecture evolved. With the rise of high-level languages, dependency on assembly reduced and the
RISC architecture prevailed.
Characteristics of RISC –
1. Simpler instructions, hence simple instruction decoding.
2. Instructions fit within the size of one word.
3. Instructions take a single clock cycle to execute.
4. A larger number of general-purpose registers.
5. Simple addressing modes.
6. Fewer data types.
7. Pipelining can be achieved.
Characteristics of CISC –
1. Complex instructions, hence complex instruction decoding.
2. Instructions are larger than one word in size.
3. Instructions may take more than a single clock cycle to execute.
4. A smaller number of general-purpose registers, as operations can be performed in memory itself.
5. Complex addressing modes.
6. More data types.
Example – Suppose we have to add two 8-bit numbers:

• CISC approach: There will be a single instruction for this, like ADD, which
will perform the whole task.
• RISC approach: Here the programmer will first write a load instruction to load the data into
registers, then use a suitable operation, and then store the result in the desired location.
So, the add operation is divided into parts, i.e., load, operate, store, due to which RISC programs are
longer and require more memory, but they need fewer transistors because the instructions are less
complex.
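The contrast can be sketched in a few lines of Python by modelling registers and memory as dictionaries; the instruction comments and the operand names are invented for illustration and do not correspond to any real instruction set.

# Adding two numbers from memory, RISC-style vs CISC-style (illustrative sketch).
memory = {"a": 2, "b": 3, "result": 0}
reg = {"R1": 0, "R2": 0}

# RISC style: explicit load, operate, store -- more instructions, each one simple.
reg["R1"] = memory["a"]                        # LOAD  R1, a
reg["R2"] = memory["b"]                        # LOAD  R2, b
reg["R1"] = reg["R1"] + reg["R2"]              # ADD   R1, R2
memory["result"] = reg["R1"]                   # STORE result, R1

# CISC style: one "complex" memory-to-memory instruction does the whole job.
memory["result"] = memory["a"] + memory["b"]   # ADD result, a, b

print(memory["result"])                        # 5 either way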
Difference –

• RISC: Focus on software. CISC: Focus on hardware.
• RISC: Uses only a hardwired control unit. CISC: Uses both hardwired and micro-programmed control units.
• RISC: Transistors are used for more registers. CISC: Transistors are used for storing complex instructions.
• RISC: Fixed-size instructions. CISC: Variable-size instructions.
• RISC: Can perform only register-to-register arithmetic operations. CISC: Can perform REG to REG,
REG to MEM, or MEM to MEM operations.
• RISC: Requires more registers. CISC: Requires fewer registers.
• RISC: Code size is large. CISC: Code size is small.
• RISC: An instruction executes in a single clock cycle. CISC: Instructions take more than one clock cycle.
• RISC: An instruction fits in one word. CISC: Instructions are larger than the size of one word.

CISC Architecture
The CISC approach attempts to minimize the number of instructions per program, sacrificing the
number of cycles per instruction. Computers based on the CISC architecture are designed to decrease
memory cost: large programs need more storage, which increases the memory cost, and large memory is
more expensive. To solve this problem, the number of instructions per program can be reduced by
embedding several operations in a single instruction, thereby making the instructions more complex.
Examples of CISC PROCESSORS
IBM 370/168 – Introduced in 1970, this CISC design is a 32-bit processor with four 64-bit
floating-point registers.
VAX 11/780 – A 32-bit CISC design from Digital Equipment Corporation that supports many
addressing modes and machine instructions.
Intel 80486 – Launched in 1989, it is a CISC processor with instructions of varying lengths from 1 to 11
and about 235 instructions.
CHARACTERISTICS OF CISC ARCHITECTURE
• Instruction-decoding logic is complex.
• One instruction is required to support multiple addressing modes.
• Less chip space is needed for general-purpose registers, since instructions can operate
directly on memory.
• Various CISC designs set up two special registers for the stack pointer, handling
interrupts, etc.
• MUL is referred to as a “complex instruction”: it operates directly on memory and does not require
the programmer to explicitly perform the load and store steps.
RISC Architecture
RISC (Reduced Instruction Set Computer) is used in portable devices due to its power efficiency, for
example the Apple iPod and the Nintendo DS. RISC is a type of microprocessor architecture that uses a
highly optimized set of instructions. RISC does the opposite of CISC, reducing the cycles per instruction
at the cost of the number of instructions per program. Pipelining is one of the unique features of RISC;
it is performed by overlapping the execution of several instructions in a pipelined fashion, and it gives
RISC a high performance advantage over CISC.
RISC ARCHITECTURE CHARACTERISTICS
• Simple instructions are used in RISC architecture.
• RISC supports a few simple data types; complex data types are synthesized from them.
• RISC utilizes simple addressing modes and fixed-length instructions for pipelining.
• RISC permits any register to be used in any context.
• One-cycle execution time.
• The amount of work done per instruction is reduced by separating “LOAD” and
“STORE” into their own instructions.
• RISC contains a large number of registers in order to reduce the number of interactions
with memory.
• In RISC, pipelining is easy, as the execution of all instructions is done in a uniform
interval of time, i.e., one clock.
• In RISC, more RAM is required to store assembly-level instructions.
• Reduced instructions need fewer transistors in RISC.
• RISC uses the Harvard memory model, i.e., it is a Harvard architecture.
• A compiler is used to perform the conversion, i.e., to convert high-level
language statements into code of this form.
The Advantages and Disadvantages of RISC and CISC
The Advantages of RISC architecture

• RISC (Reduced Instruction Set Computing) architecture has a simple set of instructions, so
high-level language compilers can produce more efficient code.
• It allows freedom in using the space on microprocessors because of its simplicity.
• Many RISC processors use registers for passing arguments and holding local variables.
• RISC functions use only a few parameters, and RISC processors use fixed-length instructions,
which are easy to pipeline.
• The speed of operation can be maximized and the execution time can be minimized.
Very few instruction formats, a small number of instructions and a few addressing
modes are needed.
The Disadvantages of RISC architecture

• Mostly, the performance of RISC processors depends on the programmer or compiler, as
the knowledge of the compiler plays a vital role when converting CISC code to RISC code.
• Rearranging CISC code into RISC code, termed code expansion, increases the size of the code.
The quality of this code expansion again depends on the compiler, and also
on the machine’s instruction set.
• The first-level cache of RISC processors is also a disadvantage, in that these
processors have large memory caches on the chip itself. For feeding the instructions, they
require very fast memory systems.
Advantages of CISC architecture

• Microprogramming is as easy as assembly language to implement, and less expensive than
hardwiring a control unit.
• The ease of microcoding new instructions allowed designers to make CISC machines
upwardly compatible.
• As each instruction became more capable, fewer instructions could be used to
implement a given task.
Disadvantages of CISC architecture

• The performance of the machine slows down because the amount of clock time taken by
different instructions is dissimilar.
• Only about 20% of the existing instructions are used in a typical program, even though
there are various specialized instructions which are not used frequently.
• Condition codes are set by CISC instructions as a side effect of each instruction, which takes
time; and since a subsequent instruction may change the condition code bits, the compiler has
to examine the condition code bits before this happens.

Difference Between Hardwired and Micro programmed Control Unit


The Hardwired and Micro programmed control unit generates the control signals to fetch and
execute instructions. The fundamental difference between hardwired and micro programmed
control unit is that hardwired is a circuitry approach whereas, the micro program control unit is
implemented by programming.
The hardwired control unit is designed for the RISC style instruction set. On the other hand, the
micro programmed control unit was designed for the CISC style instruction set.
These control units can be distinguished on several parameters, which are discussed below.
We will also discuss the design of both control units. So let us start with our discussion.

Hardwired Vs Micro-programmed Control Unit


• Basic: The hardwired CU is a circuitry approach; the micro-programmed CU is implemented by
programming.
• Design: The hardwired CU suits RISC-style instructions; the micro-programmed CU suits CISC-style
instructions.
• Modification: In a hardwired CU, modification is difficult because it requires a change in the
hardware; in a micro-programmed CU, modifications are easy, as they only require a change in the code.
• Instructions: The hardwired CU works well for simple instructions; the micro-programmed CU works
well for complex instructions also.
• Costing: Implementing a hardwired structure is costly; implementing microprograms is not costly.
• Control memory: The hardwired CU requires no control memory; the micro-programmed CU requires
control memory.
• Execution speed: The hardwired CU gives faster execution; the micro-programmed CU is
comparatively slow.

What are Control Signals?


Both the hardwired and the micro-programmed control unit are designed to ‘generate’ the control
signals. The control signals operate the functioning of the processor’s hardware. They decide what
operation has to be performed, what the sequence of operations performed by the processor must be,
at what time an operation must be executed, and so on.

What is Hardwired Control Unit?


In simple words, the hardwired control unit generates the control signals to execute the instructions
in a proper sequence and at the correct time. The hardwired control unit is created with the
hardware; it is a circuitry approach. It is designed for the RISC style instruction set.
A hardwired circuit organization is shown in the figure below. Let us discuss all the components
one by one, in order to understand the “generation of control signals” from this circuitry
organization.

The instruction register is a processor register that holds the ‘instruction’ which is currently in
execution. The instruction register provides the OP-code bits corresponding to the operation and the
addressing modes of the operands mentioned in the instruction.

• The instruction decoder receives the op-code bits generated by the instruction register and
interprets the operation and addressing modes of the instruction. Based on the operation
and addressing mode of the instruction in the instruction register, it sets the corresponding
instruction signal INSi to 1.
Each instruction is executed in steps: instruction fetch, decode, operand fetch, ALU operation,
memory store. These steps may vary in different books, but in general five steps are
enough for the execution of an instruction.
• Now, the control unit must be aware of the current step the instruction is in. For this, a Step
Counter is implemented, which has signals T1, …, T5. The step counter sets one of the
signals T1 to T5 to 1 on the basis of the step the instruction is in.
• Here, the question arises: how does the step counter know the current step of the instruction?
For this, a Clock is implemented. This clock is designed such that for each step the clock
completes one clock cycle.
So, if the step counter has set the T3 signal to 1, then after a clock cycle completes, the step
counter will set T4 to 1.
• What if the execution of an instruction is interrupted for some reason? Will the clock
still continue to trigger the step counter? The answer is no. The Counter Enable ‘disables’ the
step counter from incrementing to the next step signal until the execution of the current step is
completed.
• Now, suppose the execution of an instruction depends on some condition, or it is a branch
instruction. This is determined with the help of the Condition signals. The condition signals
indicate conditions such as greater than, less than, equal, greater than or equal, less
than or equal, etc.
• The remaining input is the External inputs, which inform the control signal generator of
interrupts that affect the execution of the instruction.
Overall, the Control Signal Generator generates the control signals based on the inputs obtained
from the instruction register, step counter, condition signals and external inputs.
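Hardwired control can be pictured as fixed combinational logic: the control signals are a Boolean function of the decoded instruction signal INSi, the step signal Tj and the condition inputs. The Python sketch below imitates that idea for a made-up two-instruction machine; the signal names and the lookup table are assumptions, not the wiring of any real processor.

# Hardwired control as fixed logic: control signals = f(instruction, step, conditions).
# The (instruction, step) -> signals table below is invented for illustration.
CONTROL_LOGIC = {
    ("ADD", "T3"): {"ALU_ADD", "REG_WRITE"},
    ("ADD", "T4"): {"MEM_WRITE"},
    ("JMP", "T3"): {"PC_LOAD"},
}

def control_signals(ins, step, zero_flag=False):
    signals = set(CONTROL_LOGIC.get((ins, step), set()))
    # A condition input can gate a signal, e.g. a conditional "jump if zero":
    if ins == "JZ" and step == "T3" and zero_flag:
        signals.add("PC_LOAD")
    return signals

print(control_signals("ADD", "T3"))                  # {'ALU_ADD', 'REG_WRITE'}
print(control_signals("JZ", "T3", zero_flag=True))   # {'PC_LOAD'}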

What is Micro programmed Control Unit?


A micro-programmed control unit also produces the control signals, but by using programs. This
approach was very popular in the past, during the evolution of CISC architecture. The program that
creates the ‘control signals’ is called a microprogram. This microprogram is placed in a fast memory on
the processor chip, called the control memory or control store.
A microprogram is a set of microinstructions; a microinstruction is also termed a control word. Each
microinstruction is an ‘n’-bit word, and each control word/microinstruction has a different bit pattern,
so each produces a different combination of control signals.
Instruction execution is performed in steps, as we have seen above. So, for each step there is a
control word/microinstruction in the microprogram. A sequence of microinstructions required to
execute a particular instruction is called a micro routine.
Now let us discuss the organization of the micro-programmed control unit, in terms of the flow of
instruction execution steps.
1. In the first step (instruction fetch), the microinstruction address generator takes the
instruction from the instruction register (IR).
2. In the second step, the microinstruction address generator decodes the instruction obtained
from the IR and retrieves the starting address of the micro routine required to perform the
corresponding operation mentioned in the instruction. It loads that starting address into the
micro program counter.
3. In the third step, the control word corresponding to the address held in the micro
program counter is read; as execution proceeds, the microprogram address generator
increments the micro program counter to read the successive control words of
the micro routine.
4. The last microinstruction of a micro routine contains a bit called the end bit. When the
end bit is set to 1, it denotes successful completion of that micro routine.
5. After this, the microprogram address generator returns to step 1 to fetch a
new instruction, and the cycle goes on.
The Control Store or control memory is a memory used to store the microprograms. A micro-programmed
control unit is simple to implement and flexible to modify, but it is slower than a hardwired control
unit.
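The five-step flow above can be sketched as a loop over a control memory. In the Python sketch below the control memory contents, the start-address mapping and the end-bit encoding are all invented for illustration; a real control store holds bit patterns, not strings.

# Micro-programmed control flow (illustrative sketch).
# Control memory: address -> (control_word, end_bit). Contents are made up.
control_memory = {
    0: ("PC->MAR, Read", 0),
    1: ("MBR->IR, PC+1", 1),       # end of the FETCH micro routine
    4: ("IRaddr->MAR, Read", 0),
    5: ("AC+MBR->AC", 1),          # end of the ADD micro routine
}
start_address = {"FETCH": 0, "ADD": 4}   # microinstruction address generator mapping

def run_microroutine(name):
    upc = start_address[name]            # load the micro program counter
    while True:
        control_word, end_bit = control_memory[upc]
        print(name, upc, control_word)   # "issue" the control signals
        if end_bit:                      # end bit set: micro routine finished
            break
        upc += 1                         # incrementer: next control word in sequence

run_microroutine("FETCH")
run_microroutine("ADD")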

Key Differences between Hardwired and Micro programmed Control Unit


1. The hardwired control unit is implemented using a hardware circuit while a micro
programmed control unit is implemented by programming.
2. A hardwired control unit is designed for RISC style instruction set. On the other hand, a
micro programmed control unit is for CISC style instruction set.
3. Implementing modification in a micro programmed control unit is easier as it is easy to
change the code. But implementing modification in Hardwired control unit is difficult as
changing the circuitry will add cost.
4. The hardwired control unit executes simple instructions easily but finds difficulty in
executing complex instructions. The micro-programmed control unit finds executing
complex instructions easy.
5. Building up a hardwired control unit requires the hardware circuit which is costly. Building
up micro programmed control unit is cheaper as it only requires coding.
6. The hardwired control unit does not require a control memory, as here the control signals are
generated by hardware. The micro-programmed control unit requires control memory, as
the microprograms that are responsible for generating the control signals are stored in
control memory.
7. Execution in a hardwired control unit is fast, as everything is done in hardware. The
micro-programmed control unit is slower, as it has to access control memory for the execution of
every instruction.

Similarities
Both Hardwired and Micro programmed control unit ‘generates’ the control signals.
So this is all about the differences between hardwired and micro programmed control unit.
Q. Explain Microinstruction sequencing and execution.
Microinstruction Sequencing:
A micro-program control unit can be viewed as consisting of two parts:
1. The control memory that stores the microinstructions.
2. Sequencing circuit that controls the generation of the next address.
A micro-program sequencer attached to a control memory inputs certain bits of the
microinstruction, from which it determines the next address for control memory. A typical
sequencer provides the following address-sequencing capabilities:
1. Increment the present address for control memory.
2. Branches to an address as specified by the address field of the micro instruction.
3. Branches to a given address if a specified status bit is equal to 1.
4. Transfer control to a new address as specified by an external source (Instruction Register).
5. Has a facility for subroutine calls and returns.
Depending on the current microinstruction condition flags, and the contents of the instruction
register, a control memory address must be generated for the next micro instruction.
There are three general techniques based on the format of the address information in the
microinstruction:
1. Two Address Field.
2. Single Address Field.
3. Variable Format
Two Address Field:

The simplest approach is to provide two address fields in each microinstruction; a multiplexer is
provided to select:

• Address from the second address field.


• Starting address based on the OPcode field in the current instruction.
The address selection signals are provided by a branch logic module whose input consists of control
unit flags plus bits from the control partition of the micro instruction.
Single Address Field:
The two-address approach is simple, but it requires more bits in the microinstruction. With a simpler
approach, we can have a single address field in the microinstruction, with the following options for
the next address:

• Address Field.
• Based on OPcode in instruction register.
• Next Sequential Address.

The address selection signals determine which option is selected. This approach reduces the number
of address fields to one. In most cases (sequential execution) the address field will not be used; thus
the microinstruction encoding does not efficiently utilize the entire microinstruction.
Variable Format:
In this approach, there are two entirely different microinstruction formats. One bit designates which
format is being used. In this first format, the remaining bits are used to activate control signals. In
the second format, some bits drive the branch logic module, and the remaining bits provide the
address. With the first format, the next address is either the next sequential address or an address
derived from the instruction register. With the second format, either a conditional or unconditional
branch is specified.
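The single-address-field scheme can be summed up as a small next-address selection function. In the Python sketch below the selector codes and the opcode-to-routine mapping table are assumptions made only for illustration.

# Next-address selection with a single address field (illustrative sketch).
OPCODE_MAP = {"ADD": 40, "JMP": 60}   # made-up mapping ROM: opcode -> routine start

def next_address(current, address_field, opcode, select):
    if select == "NEXT":              # sequential execution: address field unused
        return current + 1
    if select == "BRANCH":            # branch to the microinstruction's address field
        return address_field
    if select == "MAP":               # start address based on the opcode in the IR
        return OPCODE_MAP[opcode]
    raise ValueError(select)

print(next_address(10, 25, "ADD", "NEXT"))    # 11
print(next_address(10, 25, "ADD", "BRANCH"))  # 25
print(next_address(10, 25, "ADD", "MAP"))     # 40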
Micro-Operations

In computer central processing units, micro-operations (also known as micro-ops) are the
functional, or atomic, operations of a processor. These are low-level instructions used in some
designs to implement complex machine instructions. They generally perform operations on data
stored in one or more registers: they transfer data between registers or between registers and the
external buses of the CPU, and they perform arithmetic and logical operations on registers.
In executing a program, the operation of a computer consists of a sequence of instruction cycles, with
one machine instruction per cycle. Each instruction cycle is made up of a number of smaller units –
the Fetch, Indirect, Execute and Interrupt cycles. Each of these cycles involves a series of steps, each
of which involves the processor registers. These steps are referred to as micro-operations; the prefix
‘micro’ refers to the fact that each step is very simple and accomplishes very little.
Summary: Execution of a program consists of sequential execution of instructions. Each
instruction is executed during an instruction cycle made up of shorter sub-cycles(example – fetch,
indirect, execute, interrupt). The performance of each sub-cycle involves one or more shorter
operations, that is, micro-operations.

Pipelining | Set 1 (Execution, Stages and Throughput)
To improve the performance of a CPU we have two options:
1) Improve the hardware by introducing faster circuits.
2) Arrange the hardware such that more than one operation can be performed at the same time.
Since there is a limit on the speed of hardware and the cost of faster circuits is quite high, we have
to adopt the second option.
Pipelining: Pipelining is a process of arrangement of hardware elements of the CPU such that its
overall performance is increased. Simultaneous execution of more than one instruction takes place
in a pipelined processor.
Let us see a real life example that works on the concept of pipelined operation. Consider a water
bottle packaging plant. Let there be 3 stages that a bottle should pass through, Inserting the
bottle(I), Filling water in the bottle(F), and Sealing the bottle(S). Let us consider these stages as
stage 1, stage 2 and stage 3 respectively. Let each stage take 1 minute to complete its operation.
Now, in a non pipelined operation, a bottle is first inserted in the plant, after 1 minute it is moved to
stage 2 where water is filled. Now, in stage 1 nothing is happening. Similarly, when the bottle
moves to stage 3, both stage 1 and stage 2 are idle. But in pipelined operation, when the bottle is in
stage 2, another bottle can be loaded at stage 1. Similarly, when the bottle is in stage 3, there can be
one bottle each in stage 1 and stage 2. So, after each minute, we get a new bottle at the end of stage
3. Hence, the average time taken to manufacture 1 bottle is :
Without pipelining = 9/3 minutes = 3m
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)

With pipelining = 5/3 minutes = 1.67m


I F S | |
| I F S |
| | I F S (5 minutes)

Thus, pipelined operation increases the efficiency of a system.


Design of a basic pipeline

• In a pipelined processor, a pipeline has two ends, the input end and the output end. Between
these ends, there are multiple stages/segments such that output of one stage is connected to
input of next stage and each stage performs a specific operation.
• Interface registers are used to hold the intermediate output between two stages. These
interface registers are also called latch or buffer.
• All the stages in the pipeline along with the interface registers are controlled by a common
clock.
Execution in a pipelined processor
Execution sequence of instructions in a pipelined processor can be visualized using a space-time
diagram. For example, consider a processor having 4 stages and let there be 2 instructions to be
executed. We can visualize the execution sequence through the following space-time diagrams:
Non overlapped execution:
Stage / Cycle 1 2 3 4 5 6 7 8
S1 I1 I2
S2 I1 I2
S3 I1 I2
S4 I1 I2
Total time = 8 Cycle
Overlapped execution:
Stage / Cycle 1 2 3 4 5
S1 I1 I2
S2 I1 I2
S3 I1 I2
S4 I1 I2
Total time = 5 Cycle
Pipeline Stages
RISC processor has 5 stage instruction pipeline to execute all the instructions in the RISC
instruction set. Following are the 5 stages of RISC pipeline with their respective operations:

• Stage 1 (Instruction Fetch)


In this stage the CPU reads instructions from the address in the memory whose value is
present in the program counter.
• Stage 2 (Instruction Decode)
In this stage, instruction is decoded and the register file is accessed to get the values from the
registers used in the instruction.
• Stage 3 (Instruction Execute)
In this stage, ALU operations are performed.
• Stage 4 (Memory Access)
In this stage, memory operands are read and written from/to the memory that is present in
the instruction.
• Stage 5 (Write Back)
In this stage, computed/fetched value is written back to the register present in the
instructions.
Performance of a pipelined processor
Consider a ‘k’ segment pipeline with clock cycle time as ‘Tp’. Let there be ‘n’ tasks to be
completed in the pipelined processor. Now, the first instruction is going to take ‘k’ cycles to come
out of the pipeline but the other ‘n – 1’ instructions will take only ‘1’ cycle each, i.e, a total of ‘n –
1’ cycles. So, time taken to execute ‘n’ instructions in a pipelined processor:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp

In the same case, for a non-pipelined processor, execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp

So, speedup (S) of the pipelined processor over non-pipelined processor, when ‘n’ tasks are
executed on the same processor is:
S = Performance of pipelined processor /
Performance of Non-pipelined processor
As the performance of a processor is inversely proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]

When the number of tasks 'n' is significantly larger than k, that is, n >> k,
S ≈ n * k / n
S ≈ k

where 'k' is the number of stages in the pipeline.


Also, Efficiency = Given speed up / Max speed up = S / Smax
We know that, Smax = k
So, Efficiency = S / k
Throughput = Number of instructions / Total time to complete the instructions
So, Throughput = n / [(k + n – 1) * Tp]
Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1
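
These formulas can be bundled into a small calculator. Below is an illustrative Python sketch (the function and variable names are invented for this example, not taken from the notes or any library) that computes the pipelined and non-pipelined execution times, speed up, efficiency and throughput for an ideal k-stage pipeline with no stalls.

def pipeline_metrics(k, n, tp):
    """Ideal k-stage pipeline executing n tasks with cycle time tp (no stalls)."""
    et_pipeline = (k + n - 1) * tp            # first task takes k cycles, the rest 1 cycle each
    et_non_pipeline = n * k * tp              # every task takes all k stages sequentially
    speedup = et_non_pipeline / et_pipeline   # S = (n * k) / (k + n - 1)
    efficiency = speedup / k                  # S / Smax, where Smax = k
    throughput = n / et_pipeline              # tasks completed per unit time
    return et_pipeline, et_non_pipeline, speedup, efficiency, throughput

# Example: 4-stage pipeline, 100 tasks, cycle time 10 ns
print(pipeline_metrics(4, 100, 10))   # (1030, 4000, ~3.88, ~0.97, ~0.097)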

Computer Organization and Architecture | Pipelining | Set 2 (Dependencies and Data Hazards)
Dependencies in a pipelined processor
There are mainly three types of dependencies possible in a pipelined processor. These are :
1) Structural Dependency
2) Control Dependency
3) Data Dependency
These dependencies may introduce stalls in the pipeline.
Stall : A stall is a cycle in the pipeline without new input.

Structural dependency
This dependency arises due to the resource conflict in the pipeline. A resource conflict is a situation
when more than one instruction tries to access the same resource in the same cycle. A resource can
be a register, memory, or ALU.
Example:

Instruction/Cycle   1         2         3         4         5
I1                  IF(Mem)   ID        EX        Mem
I2                            IF(Mem)   ID        EX
I3                                      IF(Mem)   ID        EX
I4                                                IF(Mem)   ID

In the above scenario, in cycle 4, instructions I1 and I4 try to access the same resource (memory),
which introduces a resource conflict.
To avoid this problem, we have to keep the instruction waiting until the required resource (memory
in our case) becomes available. This wait introduces stalls in the pipeline, as shown below:

Instruction/Cycle   1         2         3         4     5     6     7         8
I1                  IF(Mem)   ID        EX        Mem   WB
I2                            IF(Mem)   ID        EX    Mem   WB
I3                                      IF(Mem)   ID    EX    Mem   WB
I4                                                –     –     –     IF(Mem)   ID

Solution for structural dependency


To minimize structural dependency stalls in the pipeline, we use a hardware mechanism called
Renaming.
Renaming : According to renaming, we divide the memory into two independent modules used to
store the instruction and data separately called Code memory(CM) and Data memory(DM)
respectively. CM will contain all the instructions and DM will contain all the operands that are
required for the instructions.
Instruction/Cycle   1        2        3        4        5        6        7
I1                  IF(CM)   ID       EX       DM       WB
I2                           IF(CM)   ID       EX       DM       WB
I3                                    IF(CM)   ID       EX       DM       WB
I4                                             IF(CM)   ID       EX       DM
I5                                                      IF(CM)   ID       EX
I6                                                               IF(CM)   ID
I7                                                                        IF(CM)

Control Dependency (Branch Hazards)


This type of dependency occurs with transfer-of-control instructions such as BRANCH,
CALL, JMP, etc. On many instruction set architectures, the processor does not yet know the target address
of these instructions when it needs to insert the next instruction into the pipeline. Due to this,
unwanted instructions are fed into the pipeline.
Consider the following sequence of instructions in the program:
100: I1
101: I2 (JMP 250)
102: I3
.
.
250: BI1

Expected output: I1 -> I2 -> BI1


NOTE: Generally, the target address of the JMP instruction is known after ID stage only.
Instruction/Cycle   1    2    3             4     5     6
I1                  IF   ID   EX            MEM   WB
I2                       IF   ID (PC:250)   EX    Mem   WB
I3                            IF            ID    EX    Mem
BI1                                         IF    ID    EX

Output Sequence: I1 -> I2 -> I3 -> BI1

So, the output sequence is not equal to the expected output, that means the pipeline is not
implemented correctly.
To correct the above problem we need to stop the Instruction fetch until we get target address of
branch instruction. This can be implemented by introducing delay slot until we get the target
address.
Instruction/Cycle   1    2    3             4     5     6
I1                  IF   ID   EX            MEM   WB
I2                       IF   ID (PC:250)   EX    Mem   WB
Delay                         –             –     –     –
BI1                                         IF    ID    EX

Output Sequence: I1 -> I2 -> Delay (Stall) -> BI1

As the delay slot performs no operation, this output sequence is equal to the expected output
sequence. But this slot introduces stall in the pipeline.
Solution for control dependency: Branch Prediction is the method through which stalls due to
control dependency can be reduced or eliminated. In branch prediction, a prediction about whether the
branch will be taken is made in the first stage itself, so instructions can be fetched from the predicted
path without waiting. When the prediction is correct, the branch penalty is zero.
Branch penalty : The number of stalls introduced during the branch operations in the pipelined
processor is known as branch penalty.
NOTE: As we can see, the target address becomes available only after the ID stage, so the number of
stalls introduced in the pipeline is 1. Had the branch target address become available only after the
EX (ALU) stage, there would have been 2 stalls. In general, if the target address becomes available after
the kth stage, then there will be (k – 1) stalls in the pipeline.
Total number of stalls introduced in the pipeline due to branch instructions = Branch frequency *
Branch Penalty
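
As a quick numerical illustration of the two statements above, the hypothetical Python snippet below derives the branch penalty from the stage after which the target address becomes known and multiplies it by an assumed branch frequency; the 20% branch frequency in the example call is an assumption chosen only for illustration.

def branch_stall_overhead(target_ready_stage, branch_frequency):
    """Stalls contributed by branch instructions.

    target_ready_stage: stage (1-based) after which the branch target is known.
    branch_frequency:   fraction of executed instructions that are branches.
    """
    branch_penalty = target_ready_stage - 1            # (k - 1) stalls if known after the kth stage
    stalls_per_instruction = branch_frequency * branch_penalty
    return branch_penalty, stalls_per_instruction

# Target known after the ID stage (stage 2), 20% of instructions are branches
print(branch_stall_overhead(2, 0.20))   # -> (1, 0.2)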

Data Dependency (Data Hazard)


Let us consider an ADD instruction S, such that
S : ADD R1, R2, R3
Addresses read by S = I(S) = {R2, R3}
Addresses written by S = O(S) = {R1}
Now, we say that instruction S2 depends on instruction S1 when

[I(S1) ∩ O(S2)] ∪ [O(S1) ∩ I(S2)] ∪ [O(S1) ∩ O(S2)] ≠ ∅

This condition is called the Bernstein condition.

Three cases exist:

• Flow (data) dependence: O(S1) ∩ I(S2) ≠ ∅, S1 → S2, and S1 writes something that S2 later reads
• Anti-dependence: I(S1) ∩ O(S2) ≠ ∅, S1 → S2, and S1 reads something before S2 overwrites it
• Output dependence: O(S1) ∩ O(S2) ≠ ∅, S1 → S2, and both write the same memory location
Example: Let there be two instructions I1 and I2 such that:
I1 : ADD R1, R2, R3
I2 : SUB R4, R1, R2
When the above instructions are executed in a pipelined processor, a data dependency arises:
I2 tries to read R1 before I1 has written it, so I2 incorrectly gets the old value of R1.

Instruction/Cycle   1    2    3    4
I1                  IF   ID   EX   DM
I2                       IF   ID   EX

Here I2 reads R1 in its ID stage (cycle 3), while I1 only writes R1 back after its memory/write-back
stage, so I2 picks up the stale value.
To minimize data dependency stalls in the pipeline, operand forwarding is used.
Operand Forwarding : In operand forwarding, we use the interface registers present between the
stages to hold intermediate output so that dependent instruction can access new value from the
interface register directly.
Considering the same example:
I1 : ADD R1, R2, R3
I2 : SUB R4, R1, R2
Instruction/Cycle   1    2    3    4
I1                  IF   ID   EX   DM
I2                       IF   ID   EX

With forwarding, the result produced by I1 in its EX stage (cycle 3) is passed through the interface
register directly to the EX stage of I2 (cycle 4), so I2 uses the new value without stalling.

Data Hazards
Data hazards occur when instructions that exhibit data dependence modify data in different stages
of a pipeline. Hazards cause delays in the pipeline. There are mainly three types of data hazards:
1) RAW (Read after Write) [Flow/True data dependency]
2) WAR (Write after Read) [Anti-Data dependency]
3) WAW (Write after Write) [Output data dependency]
Let there be two instructions I and J, such that J follow I. Then,

• RAW hazard occurs when instruction J tries to read data before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R4 <- R2 + R3
• WAR hazard occurs when instruction J tries to write data before instruction I reads it.
Eg:
I: R2 <- R1 + R3
J: R3 <- R4 + R5
• WAW hazard occurs when instruction J tries to write output before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R2 <- R4 + R5
WAR and WAW hazards occur during the out-of-order execution of the instructions.
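
The three hazard types can be detected mechanically from the read set I(S) and write set O(S) defined earlier. The following is an illustrative Python helper (the function name and register sets are invented for this sketch) that classifies the hazards between an instruction I and a later instruction J.

def classify_hazards(read_i, write_i, read_j, write_j):
    """Return the data hazards between instruction I and a later instruction J.

    read_x / write_x are sets of register names, i.e. I(S) and O(S).
    """
    hazards = []
    if write_i & read_j:    # J reads something that I writes  -> true/flow dependence
        hazards.append("RAW")
    if read_i & write_j:    # J writes something that I reads  -> anti-dependence
        hazards.append("WAR")
    if write_i & write_j:   # both write the same location     -> output dependence
        hazards.append("WAW")
    return hazards

# I: R2 <- R1 + R3,  J: R4 <- R2 + R3
print(classify_hazards({"R1", "R3"}, {"R2"}, {"R2", "R3"}, {"R4"}))   # ['RAW']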

Computer Organization and Architecture | Pipelining | Set 3 (Types and Stalling)

Types of Pipeline

• Uniform delay pipeline
In this type of pipeline, all the stages take the same time to complete an operation.
In a uniform delay pipeline, Cycle Time (Tp) = Stage Delay
If buffers are included between the stages, then Cycle Time (Tp) = Stage Delay + Buffer
Delay

• Non-Uniform delay pipeline


In this type of pipeline, different stages take different time to complete an operation.
In this type of pipeline, Cycle Time (Tp) = Maximum(Stage Delay)
For example, if there are 4 stages with delays, 1 ns, 2 ns, 3 ns, and 4 ns, then
Tp = Maximum(1 ns, 2 ns, 3 ns, 4 ns) = 4 ns
If buffers are included between the stages,
Tp = Maximum(Stage delay + Buffer delay)
Example : Consider a 4 segment pipeline with stage delays (2 ns, 8 ns, 3 ns, 10 ns). Find the
time taken to execute 100 tasks in the above pipeline.
Solution: As the above pipeline is a non-uniform delay pipeline,
Tp = Max(2, 8, 3, 10) = 10 ns
We know that ETpipeline = (k + n – 1) Tp = (4 + 100 – 1) 10 ns = 1030 ns
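
The same computation can be written out in a few lines of Python. This sketch simply reproduces the worked example above (stage delays of 2, 8, 3 and 10 ns, 100 tasks); the optional buffer delay parameter is an assumption and defaults to zero.

def nonuniform_pipeline_time(stage_delays_ns, n_tasks, buffer_delay_ns=0):
    """Execution time of n_tasks on a non-uniform delay pipeline."""
    k = len(stage_delays_ns)
    tp = max(d + buffer_delay_ns for d in stage_delays_ns)   # slowest stage sets the clock
    return (k + n_tasks - 1) * tp

print(nonuniform_pipeline_time([2, 8, 3, 10], 100))   # 1030 ns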

NOTE: MIPS = Million instructions per second


Performance of pipeline with stalls
Speed Up (S) = Performance(pipeline) / Performance(non-pipeline)
=> S = Average Execution Time(non-pipeline) / Average Execution Time(pipeline)
=> S = [CPI(non-pipeline) * Cycle Time(non-pipeline)] / [CPI(pipeline) * Cycle Time(pipeline)]
The ideal CPI of the pipelined processor is '1', but due to stalls it becomes greater than '1'.
=> S = [CPI(non-pipeline) * Cycle Time(non-pipeline)] /
       [(1 + Number of stalls per instruction) * Cycle Time(pipeline)]

As Cycle Time(non-pipeline) = Cycle Time(pipeline),

Speed Up (S) = CPI(non-pipeline) / (1 + Number of stalls per instruction)
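
A minimal Python sketch of this stalled-pipeline speed up formula, assuming (as the derivation does) that the pipelined and non-pipelined cycle times are equal; the example values of CPI = 4 and 0.25 stalls per instruction are assumptions chosen only for illustration.

def speedup_with_stalls(cpi_non_pipeline, stalls_per_instruction):
    """Speed up of a pipeline whose ideal CPI of 1 is inflated by stalls."""
    cpi_pipeline = 1 + stalls_per_instruction
    return cpi_non_pipeline / cpi_pipeline   # equal cycle times cancel out

print(speedup_with_stalls(4, 0.25))   # 3.2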

Introduction-
• A program consists of several number of instructions.
• These instructions may be executed in the following two ways-
1. Non-Pipelined Execution
2. Pipelined Execution

1. Non-Pipelined Execution-
In non-pipelined architecture,
All the instructions of a program are executed sequentially one after the other.

• A new instruction executes only after the previous instruction has executed completely.
• This style of executing the instructions is highly inefficient.

Example-
Consider a program consisting of three instructions.
In a non-pipelined architecture, these instructions execute one after the other as-

If time taken for executing one instruction = t, then-
Time taken for executing 'n' instructions = n x t

2. Pipelined Execution-
In pipelined architecture,

• Multiple instructions are executed in parallel.


• This style of executing the instructions is highly efficient.
Now, let us discuss instruction pipelining in detail.
Instruction Pipelining-
Instruction pipelining is a technique that implements a form of parallelism called instruction-level
parallelism within a single processor.

• A pipelined processor does not wait until the previous instruction has executed completely.
• Rather, it fetches the next instruction and begins its execution.

Pipelined Architecture-
In pipelined architecture,

• The hardware of the CPU is split up into several functional units.


• Each functional unit performs a dedicated task.
• The number of functional units may vary from processor to processor.
• These functional units are called as stages of the pipeline.
• Control unit manages all the stages using control signals.
• There is a register associated with each stage that holds the data.
• There is a global clock that synchronizes the working of all the stages.
• At the beginning of each clock cycle, each stage takes the input from its register.
• Each stage then processes the data and feed its output to the register of the next stage.

Four-Stage Pipeline-
1. Instruction fetch (IF)
2. Instruction decode (ID)
3. Instruction Execute (IE)
4. Write back (WB)
To implement four stage pipeline,

• The hardware of the CPU is divided into four functional units.


• Each functional unit performs a dedicated task.
Stage-01:
At stage-01,

• First functional unit performs instruction fetch.


• It fetches the instruction to be executed.

Stage-02:
At stage-02,

• Second functional unit performs instruction decode.


• It decodes the instruction to be executed.

Stage-03:
At stage-03,

• Third functional unit performs instruction execution.


• It executes the instruction.

Stage-04:
At stage-04,

• Fourth functional unit performs write back.


• It writes back the result so obtained after executing the instruction.

Execution-
In pipelined architecture,

• Instructions of the program execute in parallel.


• When one instruction goes from nth stage to (n+1)th stage, another instruction goes from (n-
1)th stage to nth stage.

Phase-Time Diagram-

• Phase-time diagram shows the execution of instructions in the pipelined architecture.


• The following diagram shows the execution of three instructions in a four-stage pipeline
architecture.

Stage/Cycle   1    2    3    4    5    6
IF            I1   I2   I3
ID                 I1   I2   I3
IE                      I1   I2   I3
WB                           I1   I2   I3

Time taken to execute three instructions in four-stage pipelined architecture = 6 clock cycles.

NOTE-
In non-pipelined architecture,
Time taken to execute three instructions would be
= 3 x Time taken to execute one instruction
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution of instructions is far more efficient than non-pipelined execution.

Instruction Pipelining-
In instruction pipelining,

• A form of parallelism called as instruction level parallelism is implemented.


• Multiple instructions execute simultaneously.
• The efficiency of pipelined execution is more than that of non-pipelined execution.

Performance of Pipelined Execution-
The performance of pipelined execution is measured using the following three parameters-
• Speed Up
• Efficiency
• Throughput

1. Speed Up-
It gives an idea of "how much faster" the pipelined execution is as compared to non-pipelined
execution.
It is calculated as-
Speed Up = Non-pipelined execution time / Pipelined execution time

2. Efficiency-
The efficiency of pipelined execution is calculated as-
Efficiency = Speed Up / Number of stages in the pipeline

3. Throughput-
Throughput is defined as the number of instructions executed per unit time.
It is calculated as-
Throughput = Number of instructions executed / Total time taken to execute them

Calculation of Important Parameters-


Let us learn how to calculate certain important parameters of pipelined architecture.
Consider-

• A pipelined architecture consisting of k-stage pipeline


• Total number of instructions to be executed = n

Point-01: Calculating Cycle Time-


In pipelined architecture,

• There is a global clock that synchronizes the working of all the stages.
• Frequency of the clock is set such that all the stages are synchronized.
• At the beginning of each clock cycle, each stage reads the data from its register and process
it.
• Cycle time is the value of one clock cycle.
There are two cases possible-

Case-01: All the stages offer same delay-


If all the stages offer same delay, then-
Cycle time = Delay offered by one stage including the delay due to its register
Case-02: All the stages do not offer same delay-
If all the stages do not offer same delay, then-
Cycle time = Maximum delay offered by any stage including the delay due to its register

Point-02: Calculating Frequency Of Clock-


Frequency of the clock (f) = 1 / Cycle time

Point-03: Calculating Non-Pipelined Execution Time-

In non-pipelined architecture,

• The instructions execute one after the other.


• The execution of a new instruction begins only after the previous instruction has executed
completely.
• So, number of clock cycles taken by each instruction = k clock cycles
Thus,
Non-pipelined execution time
= Total number of instructions x Time taken to execute one instruction
= n x k clock cycles

Point-04: Calculating Pipelined Execution Time-


In pipelined architecture,

• Multiple instructions execute in parallel.


• Number of clock cycles taken by the first instruction = k clock cycles
• After first instruction has completely executed, one instruction comes out per clock cycle.
• So, number of clock cycles taken by each remaining instruction = 1 clock cycle
Thus,
Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles

Point-04: Calculating Speed Up-


Speed up
= Non-pipelined execution time / Pipelined execution time
= n x k clock cycles / (k + n – 1) clock cycles
= n x k / (k + n – 1)
= n x k / [n + (k – 1)]
= k / { 1 + (k – 1)/n }

• For very large number of instructions, n→∞. Thus, speed up = k.


• Practically, total number of instructions never tend to infinity.
• Therefore, speed up is always less than number of stages in pipeline.
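
The limiting behaviour is easy to verify numerically. The short Python sketch below evaluates S = (n x k) / (k + n – 1) for an assumed 5-stage pipeline and an increasing number of instructions n, showing the speed up approaching, but never reaching, k.

def speedup(k, n):
    """Ideal speed up of a k-stage pipeline over a non-pipelined design for n instructions."""
    return (n * k) / (k + n - 1)

for n in (1, 10, 100, 1000, 100000):
    print(n, round(speedup(5, n), 3))
# 1 -> 1.0, 10 -> 3.571, 100 -> 4.808, 1000 -> 4.98, 100000 -> 5.0 (tends to k = 5)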

Important Notes-
Note-01:
• The aim of pipelined architecture is to execute one complete instruction in one clock cycle.
• In other words, the aim of pipelining is to maintain CPI ≅ 1.
• Practically, it is not possible to achieve CPI ≅ 1 due to delays that get introduced due to
registers.
• Ideally, a pipelined architecture executes one complete instruction per clock cycle (CPI=1).

Note-02:
• The maximum speed up that can be achieved is always equal to the number of stages.
• This is achieved when efficiency becomes 100%.
• Practically, efficiency is always less than 100%.
• Therefore speed up is always less than number of stages in pipelined architecture.

Note-03:
Under ideal conditions,

• One complete instruction is executed per clock cycle i.e. CPI = 1.


• Speed up = Number of stages in pipelined architecture

Note-04:
• Experiments show that 5 stage pipelined processor gives the best performance.

Note-05:
In case only one instruction has to be executed, then-

• Non-pipelined execution gives better performance than pipelined execution.


• This is because delays are introduced due to registers in pipelined architecture.
• Thus, time taken to execute one instruction in non-pipelined architecture is less.
Note-06:

High efficiency of pipelined processor is achieved when-

• All the stages are of equal duration.


• There are no conditional branch instructions.
• There are no interrupts.
• There are no register and memory conflicts.

PRACTICE PROBLEMS BASED ON PIPELINING IN


COMPUTER ARCHITECTURE-
Problem-01:
Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns. Given latch delay is 10 ns.
Calculate-
1. Pipeline cycle time
2. Non-pipeline execution time
3. Speed up ratio
4. Pipeline time for 1000 tasks
5. Sequential time for 1000 tasks
6. Throughput

Solution-
Given-

• Four stage pipeline is used


• Delay of stages = 60, 50, 90 and 80 ns
• Latch delay or delay due to each register = 10 ns

Part-01: Pipeline Cycle Time-


Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Part-02: Non-Pipeline Execution Time-
Non-pipeline execution time for one instruction
= 60 ns + 50 ns + 90 ns + 80 ns
= 280 ns

Part-03: Speed Up Ratio-


Speed up
= Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8

Part-04: Pipeline Time For 1000 Tasks-

Pipeline time for 1000 tasks


= Time taken for 1st task + Time taken for remaining 999 tasks
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 100 ns + 999 x 100 ns
= 400 ns + 99900 ns
= 100300 ns

Part-05: Sequential Time For 1000 Tasks-


Non-pipeline time for 1000 tasks
= 1000 x Time taken for one task
= 1000 x 280 ns
= 280000 ns

Part-06: Throughput-
Throughput for pipelined execution
= Number of instructions executed per unit time
= 1000 tasks / 100300 ns
≈ 9.97 x 10^-3 tasks per ns (≈ 9.97 MIPS)
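
The whole of Problem-01 can be reproduced with a short, illustrative Python script (the variable names are invented for this sketch):

stage_delays = [60, 50, 90, 80]   # ns
latch_delay = 10                  # ns
n = 1000                          # number of tasks
k = len(stage_delays)

cycle_time = max(stage_delays) + latch_delay    # 100 ns
non_pipeline_one = sum(stage_delays)            # 280 ns
speedup = non_pipeline_one / cycle_time         # 2.8
pipeline_total = (k + n - 1) * cycle_time       # 100300 ns
sequential_total = n * non_pipeline_one         # 280000 ns
throughput = n / pipeline_total                 # ~0.00997 tasks per ns

print(cycle_time, non_pipeline_one, speedup, pipeline_total, sequential_total, throughput)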

Problem-02:
A four stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively. Registers are
used between the stages and have a delay of 5 ns each. Assuming constant clocking rate, the total
time taken to process 1000 data items on the pipeline will be-
1. 120.4 microseconds
2. 160.5 microseconds
3. 165.5 microseconds
4. 590.0 microseconds

Solution-
Given-

• Four stage pipeline is used


• Delay of stages = 150, 120, 160 and 140 ns
• Delay due to each register = 5 ns
• 1000 data items or instructions are processed

Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 150, 120, 160, 140 } + 5 ns
= 160 ns + 5 ns
= 165 ns

Pipeline Time To Process 1000 Data Items-


Pipeline time to process 1000 data items
= Time taken for 1st data item + Time taken for remaining 999 data items
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 165 ns + 999 x 165 ns
= 660 ns + 164835 ns
= 165495 ns
= 165.5 μs
Thus, Option (C) is correct.

Problem-03:
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per
instruction of 4. The same processor is upgraded to a pipelined processor with five stages but due to
the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume there are no stalls in
the pipeline. The speed up achieved in this pipelined processor is-
1. 3.2
2. 3.0
3. 2.2
4. 2.0

Solution-
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 2.5 gigahertz
Cycle time
= 1 / frequency
= 1 / (2.5 gigahertz)

= 1 / (2.5 x 10^9 hertz)


= 0.4 ns

Non-Pipeline Execution Time-


Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 4 clock cycles
= 4 x 0.4 ns
= 1.6 ns

Cycle Time in Pipelined Processor-


Frequency of the clock = 2 gigahertz
Cycle time
= 1 / frequency
= 1 / (2 gigahertz)

= 1 / (2 x 10^9 hertz)
= 0.5 ns

Pipeline Execution Time-


Since there are no stalls in the pipeline, so ideally one instruction is executed per clock cycle. So,
Pipeline execution time
= 1 clock cycle
= 0.5 ns

Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 1.6 ns / 0.5 ns
= 3.2
Thus, Option (A) is correct.
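
For reference, the same arithmetic as a small Python sketch (frequencies in GHz, so cycle times come out in nanoseconds):

f_non_pipeline = 2.5    # GHz
f_pipeline = 2.0        # GHz (reduced by the internal pipeline delay)
cpi_non_pipeline = 4
cpi_pipeline = 1        # no stalls assumed

t_non = cpi_non_pipeline * (1 / f_non_pipeline)   # 1.6 ns per instruction
t_pipe = cpi_pipeline * (1 / f_pipeline)          # 0.5 ns per instruction
print(t_non / t_pipe)                             # 3.2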

Problem-04:
The stage delays in a 4 stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage is
replaced with a functionally equivalent design involving two stages with respective delays 600 and
350 picoseconds.
The throughput increase of the pipeline is _____%.

Solution-
Execution Time in 4 Stage Pipeline-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 800, 500, 400, 300 } + 0
= 800 picoseconds
Thus, Execution time in 4 stage pipeline = 1 clock cycle = 800 picoseconds.

Throughput in 4 Stage Pipeline-


Throughput
= Number of instructions executed per unit time
= 1 instruction / 800 picoseconds

Execution Time in 2 Stage Pipeline-


Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 600, 350 } + 0
= 600 picoseconds
Thus, Execution time in 2 stage pipeline = 1 clock cycle = 600 picoseconds.

Throughput in 2 Stage Pipeline-


Throughput
= Number of instructions executed per unit time
= 1 instruction / 600 picoseconds
Throughput Increase-
Throughput increase
= { (Final throughput – Initial throughput) / Initial throughput } x 100
= { (1 / 600 – 1 / 800) / (1 / 800) } x 100
= { (800 / 600) – 1 } x 100
= (1.3333 – 1) x 100
= 0.3333 x 100
= 33.33 %
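
A quick Python check of this throughput increase (stage delays in picoseconds; no register delay is given, so none is added):

old_cycle = max(800, 500, 400, 300)         # ps, original 4-stage pipeline
new_cycle = max(600, 350, 500, 400, 300)    # ps, first stage replaced by 600 ps and 350 ps stages
increase = (1 / new_cycle - 1 / old_cycle) / (1 / old_cycle) * 100
print(round(increase, 2))                   # 33.33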

Problem-05:
A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous
pipelined processor with five stages requiring 2.5 ns, 1.5 ns, 2 ns, 1.5 ns and 2.5 ns respectively.
The delay of each latch is 0.5 ns.
The speed up of the pipeline processor for a large number of instructions is-
1. 4.5
2. 4.0
3. 3.33
4. 3.0

Solution-
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 100 MHz
Cycle time
= 1 / frequency
= 1 / (100 MHz)

= 1 / (100 x 10^6 hertz)


= 0.01 μs

Non-Pipeline Execution Time-


Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 1 clock cycle
= 0.01 μs
= 10 ns
Cycle Time in Pipelined Processor-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 2.5, 1.5, 2, 1.5, 2.5 } + 0.5 ns
= 2.5 ns + 0.5 ns
= 3 ns

Pipeline Execution Time-


Pipeline execution time
= 1 clock cycle
= 3 ns

Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 10 ns / 3 ns
= 3.33
Thus, Option (C) is correct.

Problem-06:
We have 2 designs D1 and D2 for a synchronous pipeline processor. D1 has 5 stage pipeline with
execution time of 3 ns, 2 ns, 4 ns, 2 ns and 3 ns. While the design D2 has 8 pipeline stages each
with 2 ns execution time. How much time can be saved using design D2 over design D1 for
executing 100 instructions?
1. 214 ns
2. 202 ns
3. 86 ns
4. 200 ns

Solution-
Cycle Time in Design D1-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 3, 2, 4, 2, 3 } + 0
= 4 ns
Execution Time For 100 Instructions in Design D1-
Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 5 clock cycles + 99 x 1 clock cycle
= 5 x cycle time + 99 x cycle time
= 5 x 4 ns + 99 x 4 ns
= 20 ns + 396 ns
= 416 ns

Cycle Time in Design D2-


Cycle time
= Delay due to a stage + Delay due to its register
= 2 ns + 0
= 2 ns

Execution Time For 100 Instructions in Design D2-


Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 8 clock cycles + 99 x 1 clock cycle
= 8 x cycle time + 99 x cycle time
= 8 x 2 ns + 99 x 2 ns
= 16 ns + 198 ns
= 214 ns

Time Saved-
Time saved
= Execution time in design D1 – Execution time in design D2
= 416 ns – 214 ns
= 202 ns
Thus, Option (B) is correct.
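
The comparison of the two designs can be checked with a small Python helper; the function name is invented for this sketch, and register delays are taken as zero as in the problem.

def pipelined_time(stage_delays_ns, n, register_delay_ns=0):
    """Total time for n instructions on a pipeline with the given stage delays."""
    k = len(stage_delays_ns)
    tp = max(stage_delays_ns) + register_delay_ns
    return (k + n - 1) * tp

d1 = pipelined_time([3, 2, 4, 2, 3], 100)   # 416 ns
d2 = pipelined_time([2] * 8, 100)           # 214 ns
print(d1, d2, d1 - d2)                      # 416 214 202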

Problem-07:
Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational
circuit only. The pipeline registers are required between each stage and at the end of the last stage.
Delays for the stages and for the pipeline registers are as given in the figure (stage delays of 5 ns,
6 ns, 11 ns and 8 ns for S1 to S4, and 1 ns for each pipeline register)-
What is the approximate speed up of the pipeline in steady state under ideal conditions when
compared to the corresponding non-pipeline implementation?
1. 4.0
2. 2.5
3. 1.1
4. 3.0

Solution-
Non-Pipeline Execution Time-
Non-pipeline execution time for 1 instruction
= 5 ns + 6 ns + 11 ns + 8 ns
= 30 ns

Cycle Time in Pipelined Processor-


Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 5, 6, 11, 8 } + 1 ns
= 11 ns + 1 ns
= 12 ns

Pipeline Execution Time-


Pipeline execution time
= 1 clock cycle
= 12 ns

Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 30 ns / 12 ns
= 2.5
Thus, Option (B) is correct.

Problem-08:
Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2,
I3 and I4 in stages S1, S2, S3 and S4 is shown below-

      S1   S2   S3   S4
I1    2    1    1    1
I2    1    3    2    2
I3    2    1    1    3
I4    1    2    2    2

What is the number of cycles needed to execute the following loop?


for(i=1 to 2) { I1; I2; I3; I4; }
1. 16
2. 23
3. 28
4. 30

Solution-
The phase-time diagram is-

From here, number of clock cycles required to execute the loop = 23 clock cycles.
Thus, Option (B) is correct.

Problem-09:
Consider a pipelined processor with the following four stages-
IF : Instruction Fetch
ID : Instruction Decode and Operand Fetch
EX : Execute
WB : Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of
clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1
clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is
used in the pipelined processor. What is the number of clock cycles taken to complete the following
sequence of instructions?

ADD R2, R1, R0     (R2 ← R1 + R0)
MUL R4, R3, R2     (R4 ← R3 x R2)
SUB R6, R5, R4     (R6 ← R5 – R4)
1. 7
2. 8
3. 10
4. 14

Solution-
The phase-time diagram is-

From here, the number of clock cycles required to execute the instructions = 8 clock cycles.
Thus, Option (B) is correct.

Problem-10:
Consider the following processors. Assume that the pipeline registers have zero latency.
P1 : 4 stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns
P2 : 4 stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns
P3 : 5 stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns
P4 : 5 stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns
Which processor has the highest peak clock frequency?
1. P1
2. P2
3. P3
4. P4

Solution-
It is given that pipeline registers have zero latency. Thus,
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Maximum delay due to any stage

For Processor P1:


Cycle time
= Max { 1 ns, 2 ns, 2 ns, 1 ns }
= 2 ns
Clock frequency
= 1 / Cycle time
= 1 / 2 ns
= 0.5 gigahertz

For Processor P2:


Cycle time
= Max { 1 ns, 1.5 ns, 1.5 ns, 1.5 ns }
= 1.5 ns
Clock frequency
= 1 / Cycle time
= 1 / 1.5 ns
= 0.67 gigahertz

For Processor P3:


Cycle time
= Max { 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns }
= 1 ns
Clock frequency
= 1 / Cycle time
= 1 / 1 ns
= 1 gigahertz

For Processor P4:


Cycle time
= Max { 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns }
= 1.1 ns
Clock frequency
= 1 / Cycle time
= 1 / 1.1 ns
= 0.91 gigahertz
Clearly, Processor P3 has the highest peak clock frequency.
Thus, Option (C) is correct.

Problem-11:
Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies T1, T2 and
T3 such that T1 = 3T2/4 = 2T3. If the longest pipeline stage is split into two pipeline stages of equal
latency, the new frequency is ____ GHz, ignoring delays in the pipeline registers.

Solution-
Let ‘t’ be the common multiple of each ratio, then-

• T1 = t
• T2 = 4t / 3
• T3 = t / 2

Pipeline Cycle Time-


Pipeline cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { t, 4t/3, t/2 } + 0
= 4t/3

Frequency Of Pipeline-
Frequency
= 1 / Pipeline cycle time
= 1 / (4t / 3)
= 3 / 4t
Given frequency = 3 GHz. So,
3 / 4t = 3 GHz
4t = 1 ns
t = 0.25 ns

Stage Latencies Of Pipeline-


Stage latency of different stages are-

• Latency of stage-01 = 0.25 ns


• Latency of stage-02 = 0.33 ns
• Latency of stage-03 = 0.125 ns

Splitting The Pipeline-


The stage with the longest latency, i.e. stage-02, is split into two stages of equal latency, giving a
four-stage pipeline.
After splitting, the latencies of the different stages are-

• Latency of stage-01 = 0.25 ns
• Latency of stage-02 = 0.167 ns
• Latency of stage-03 = 0.167 ns
• Latency of stage-04 = 0.125 ns

Split Pipeline Cycle Time-

Split pipeline cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 0.25, 0.167, 0.167, 0.125 } + 0
= 0.25 ns

Frequency Of Split Pipeline-

Frequency
= 1 / Split pipeline cycle time
= 1 / 0.25 ns
= 4 GHz
Thus, new frequency = 4 GHz.
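
A short numeric check of Problem-11 in Python (pipeline register delays ignored, as stated in the problem):

t = 0.25                                  # ns, obtained from 3 / (4t) = 3 GHz
stages = [t, 4 * t / 3, t / 2]            # T1, T2, T3
longest = max(stages)
split = [longest / 2, longest / 2]        # longest stage split into two equal stages
new_stages = [s for s in stages if s != longest] + split
new_cycle = max(new_stages)               # 0.25 ns
print(1 / new_cycle)                      # 4.0 GHz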
