Unit-3 Coa
Unit-3 Coa
Computer Instructions
Computer instructions are a set of machine language instructions that a particular processor
understands and executes. A computer performs tasks on the basis of the instruction provided.
o The Operation code (Opcode) field which specifies the operation to be performed.
o The Address field which contains the location of the operand, i.e., register or memory
location.
o The Mode field which specifies how the operand will be located.
In Memory-reference instruction, 12 bits of memory is used to specify an address and one bit to
specify the addressing mode 'I'.
The Register-reference instructions are represented by the Opcode 111 with a 0 in the leftmost bit
(bit 15) of the instruction.
Note: The Operation code (Opcode) of an instruction refers to a group of bits that define arithmetic
and logic operations such as add, subtract, multiply, shift, and compliment.
A Register-reference instruction specifies an operation on or a test of the AC (Accumulator)
register.
Input-Output instruction
Just like the Register-reference instruction, an Input-Output instruction does not need a reference to
memory and is recognized by the operation code 111 with a 1 in the leftmost bit of the instruction.
These instructions are for communication between computer and outside environment. The IR(14 –
12) is 111 (differentiates it from memory reference) and IR(15) is 1 (differentiates it from register
reference instructions). The rest 12 bits specify I/O operation. The remaining 12 bits are used to
specify the type of the input-output operation or test performed.
Note
o The three operation code bits in positions 12 through 14 should be equal to 111. Otherwise,
the instruction is a memory-reference type, and the bit in position 15 is taken as the
addressing mode I.
o When the three operation code bits are equal to 111, control unit inspects the bit in position
15. If the bit is 0, the instruction is a register-reference type. Otherwise, the instruction is an
input-output type having bit 1 at position 15.
Arithmetic, logic and shift instructions provide computational capabilities for processing the type of
data the user may wish to employ.
A huge amount of binary information is stored in the memory unit, but all computations are done in
processor registers. Therefore, one must possess the capability of moving information between
these two units.
Program control instructions such as branch instructions are used change the sequence in which the
program is executed.
Input and Output instructions act as an interface between the computer and the user. Programs and
data must be transferred into memory, and the results of computations must be transferred back to
the user.
Types of instructions
Depending on operation they perform, all instructions are divided in several groups:
• Arithmetic Instructions
• Branch Instructions
• Data Transfer Instructions
• Logic Instructions
• Bit-oriented Instructions
Arithmetic instructions
Arithmetic instructions perform several basic operations such as addition, subtraction, division,
multiplication etc. After execution, the result is stored in the first operand. For example: ADD A,R1
- The result of addition (A+R1) will be stored in the accumulator.
Branch Instructions
There are two kinds of branch instructions: Unconditional jump instructions: upon their execution a
jump to a new location from where the program continues execution is executed. Conditional jump
instructions: a jump to a new program location is executed only if a specified condition is met.
Otherwise, the program normally proceeds with the next instruction.
Data Transfer Instructions
Data transfer instructions move the content of one register to another. The register the content of
which is moved remains unchanged. If they have the suffix “X” (MOVX), the data is exchanged
with external memory.
Logic Instructions
Logic instructions perform logic operations upon corresponding bits of two registers. After
execution, the result is stored in the first operand.
Bit-oriented Instructions
Similar to logic instructions, bit-oriented instructions perform logic operations. The difference is
that these are performed upon single bits.
The 8086 microprocessor supports 8 types of instructions −
• IDIV − Used to divide the signed word by byte or signed double word by word.
• AND − Used for adding each bit in a byte/word with the corresponding bit in another
byte/word.
• OR − Used to multiply each bit in a byte/word with the corresponding bit in another
byte/word.
• XOR − Used to perform Exclusive-OR operation over each bit in a byte/word with the
corresponding bit in another byte/word.
Instructions to perform shift operations
• SHL/SAL − Used to shift bits of a byte/word towards left and put zero(S) in LSBs.
• SHR − Used to shift bits of a byte/word towards the right and put zero(S) in MSBs.
Instructions to perform rotate operations
• ROL − Used to rotate bits of byte/word towards the left, i.e. MSB to LSB and to Carry Flag
[CF].
• ROR − Used to rotate bits of byte/word towards the right, i.e. LSB to MSB and to Carry
Flag [CF].
• RCR − Used to rotate bits of byte/word towards the right, i.e. LSB to CF and CF to MSB.
• RCL − Used to rotate bits of byte/word towards the left, i.e. MSB to CF and CF to LSB.
Program Execution Transfer Instructions (Branch and Loop Instructions
nstructions to transfer the instruction during an execution without any condition −
• CALL − Used to call a procedure and save their return address to the stack.
• JMP − Used to jump to the provided address to proceed to the next instruction.
• STI − Used to set the interrupt enable flag to 1, i.e., enable INTR input.
• CLI − Used to clear the interrupt enable flag to 0, i.e., disable INTR input.
• LOOP − Used to loop a group of instructions until the condition satisfies, i.e., CX = 0
Interrupt Instructions
These instructions are used to call the interrupt during program execution.
• INT − Used to interrupt the program during execution and calling service specified.
Answer:
Computer Instructions
Computer instructions are a set of machine language instructions that a particular processor
understands and executes. A computer performs tasks on the basis of the instruction provided.
In Memory-reference instruction, 12 bits of memory is used to specify an address and one bit to
specify the addressing mode 'I'.
The Register-reference instructions are represented by the Opcode 111 with a 0 in the leftmost bit
(bit 15) of the instruction.
Note: The Operation code (Opcode) of an instruction refers to a group of bits that define arithmetic
and logic operations such as add, subtract, multiply, shift, and compliment.
Input-Output instruction
Just like the Register-reference instruction, an Input-Output instruction does not need a reference to
memory and is recognized by the operation code 111 with a 1 in the leftmost bit of the instruction.
The remaining 12 bits are used to specify the type of the input-output operation or test performed.
Note
o The three operation code bits in positions 12 through 14 should be equal to 111. Otherwise,
the instruction is a memory-reference type, and the bit in position 15 is taken as the
addressing mode I.
o When the three operation code bits are equal to 111, control unit inspects the bit in position
15. If the bit is 0, the instruction is a register-reference type. Otherwise, the instruction is an
input-output type having bit 1 at position 15.
Arithmetic, logic and shift instructions provide computational capabilities for processing the type of
data the user may wish to employ.
A huge amount of binary information is stored in the memory unit, but all computations are done in
processor registers. Therefore, one must possess the capability of moving information between
these two units.
Program control instructions such as branch instructions are used change the sequence in which the
program is executed.
Input and Output instructions act as an interface between the computer and the user. Programs and
data must be transferred into memory, and the results of computations must be transferred back to
the user.
Instruction Cycle
A program residing in the memory unit of a computer consists of a sequence of instructions. These
instructions are executed by the processor by going through a cycle for each instruction.
PUSH A TOP = A
PUSH B TOP = B
ADD TOP = A+B
PUSH C TOP = C
PUSH D TOP = D
ADD TOP = C+D
MUL TOP = (C+D)*(A+B)
POP X M[X] = TOP
• One Address Instructions –
This use a implied ACCUMULATOR register for data manipulation. One operand is in
accumulator and other is in register or memory location. Implied means that the CPU already
know that one operand is in accumulator so there is no need to specify it.
Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location
LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
• Two Address Instructions –
This is common in commercial computers.Here two address can be specified in the
instruction.Unlike earlier in one address instruction the result was stored in accumulator
here result cab be stored at different location rather than just accumulator, but require more
number of bit to represent address.
• Memory address registers(MAR) : It is connected to the address lines of the system bus. It
specifies the address in memory for a read or write operation.
• Memory Buffer Register(MBR) : It is connected to the data lines of the system bus. It
contains the value to be stored in memory or the last value read from the memory.
• Program Counter(PC) : Holds the address of the next instruction to be fetched.
• Instruction Register(IR) : Holds the last instruction fetched.
The Instruction Cycle –
Each phase of Instruction Cycle can be decomposed into a sequence of elementary micro-operations.
In the above examples, there is one sequence each for the Fetch, Indirect, Execute and Interrupt
Cycles.
The Indirect Cycle is always followed by the Execute Cycle. The Interrupt Cycle is always followed
by the Fetch Cycle. For both fetch and execute cycles, the next cycle depends on the state of the
system.
00 : Fetch Cycle
01 : Indirect Cycle
10 : Execute Cycle
11 : Interrupt Cycle
Different Instruction Cycles:
1. The Fetch Cycle –
At the beginning of the fetch cycle, the address of the next instruction to be executed is in
the Program Counter(PC).
Step 1: The address in the program counter is moved to the memory address register(MAR),
as this is the only register which is connected to address lines of the system bus.
TheStep 2:
address in
MAR is placed
on the address
bus, now the
control unit
issues a READ
command on the control bus, and the result appears on the data bus and is then copied
into the memory buffer register (MBR). Program counter is incremented by one, to get
ready for the next instruction. (These two action can be performed simultaneously to
save time)
Step 3: The content of the MBR is moved to the instruction register(IR).
Thus, a simple
Fetch Cycle consist of three steps and four micro-operation. Symbolically, we can write these
sequence of events as follows:-
• Here ‘I’ is the instruction length. The notations (t1, t2, t3) represents successive time units.
We assume that a clock is available for timing purposes and it emits regularly spaced clock
pulses. Each clock pulse defines a time unit. Thus, all time units are of equal duration. Each
micro-operation can be performed within the time of a single time unit.
First time unit: Move the contents of the PC to MAR.
Second time unit: Move contents of memory location specified by MAR to MBR. Increment
content of PC by I.
Third time unit: Move contents of MBR to IR.
Note: Second and third micro-operations both take place during the second time unit.
• Step 1: The address field of the instruction is transferred to the MAR. This is used to fetch
the address of the operand.
Step 2: The address field of the IR is updated from the MBR.(So that it now contains a
direct addressing rather than indirect addressing)
Step 3: The IR is now in the state, as if indirect addressing has not been occurred.
Note: Now IR is ready for the execute cycle, but it skips that cycle for a moment to consider the
Interrupt Cycle .
Here, this instruction adds the content of location X to register R. Corresponding micro-operation
will be:-
Here, the content of location X is incremented by 1. If the result is 0, the next instruction will be
skipped. Corresponding sequence of micro-operation will be :-
• Here, the PC is incremented if (MBR) = 0. This test (is MBR equal to zero or not) and
action (PC is incremented by 1) can be implemented as one micro-operation.
Note : This test and action micro-operation can be performed during the same time unit
during which the updated value MBR is stored back to memory.
• Fixed logic circuits that correspond directly to the Boolean expressions are used to generate
the control signals.
• Hardwired control is faster than micro-programmed control.
• A controller that uses this approach can operate at high speed.
• RISC architecture is based on hardwired control unit
• The control signals associated with operations are stored in special memory units
inaccessible by the programmer as Control Words.
• Control signals are generated by a program are similar to machine language programs.
• Micro-programmed control unit is slower in speed because of the time it takes to fetch
microinstructions from the control memory.
Some Important Terms –
1. Control Word : A control word is a word whose individual bits represent various control
signals.
2. Micro-routine : A sequence of control words corresponding to the control sequence of a
machine instruction constitutes the micro-routine for that instruction.
3. Micro-instruction : Individual control words in this micro-routine are referred to as
microinstructions.
4. Micro-program : A sequence of micro-instructions is called a micro-program, which is
stored in a ROM or RAM called a Control Memory (CM).
5. Control Store : the micro-routines for all instructions in the instruction set of a computer
are stored in a special memory called the Control Store.
Types of Micro-programmed Control Unit – Based on the type of Control Word stored in the
Control Memory (CM), it is classified into two types :
1. Horizontal Micro-programmed control Unit :
The control signals are represented in the decoded binary format that is 1 bit/CS. Example: If 53
Control signals are present in the processor than 53 bits are required. More than 1 control signal can
be enabled at a time.
• In Horizontal micro-programmed control unit, the control signals are represented in the
decoded binary format, i.e., 1 bit/CS. Here ‘n’ control signals require n bit encoding. On the
other hand.
• In Vertical micro-programmed control unit, the control signals are represented in the
encoded binary format. Here ‘n’ control signals require log2n bit encoding.
a)For Vertical
6 bits for 64 signals i.e log264
• Branching :
Branching is achieved by specifying the branch address in one of the fields of the micro
instruction. Conditional branching is obtained by using part of the micro-instruction to select a
specific status bit in order to determine its condition.
• Mapping Logic :
An external address is transferred into control memory via a mapping logic circuit.
• Incrementer :
Incrementer increments the content of the control address register by one, to select the next
micro-instruction in sequence.
• Control Memory :
Control memory is a type of memory which contains addressable storage registers. Data is
temporarily stored in control memory. Control memory can be accessed quicker than main
memory.
• RISC: Reduce the cycles per instruction at the cost of the number of instructions per
program.
• CISC: The CISC approach attempts to minimize the number of instructions per program but
at the cost of increase in number of cycles per instruction.
Earlier when programming was done using assembly language, a need was felt to make instruction
do more task because programming in assembly was tedious and error prone due to which CISC
architecture evolved but with uprise of high level language dependency on assembly reduced RISC
architecture prevailed.
Characteristic of RISC –
1. Simpler instruction, hence simple instruction decoding.
2. Instruction come under size of one word.
3. Instruction take single clock cycle to get executed.
4. More number of general purpose register.
5. Simple Addressing Modes.
6. Less Data types.
7. Pipeling can be achieved.
Characteristic of CISC –
1. Complex instruction, hence complex instruction decoding.
2. Instruction are larger than one word size.
3. Instruction may take more than single clock cycle to get executed.
4. Less number of general purpose register as operation get performed in memory itself.
5. Complex Addressing Modes.
6. More Data types.
Example – Suppose we have to add two 8-bit number:
• CISC approach: There will be a single command or instruction for this like ADD which
will perform the task.
• RISC approach: Here programmer will write first load command to load data in registers
then it will use suitable operator and then it will store result in desired location.
So, add operation is divided into parts i.e. load, operate, store due to which RISC programs are
longer and require more memory to get stored but require less transistors due to less complex
command.
Difference –
RISC CISC
Focus on software Focus on hardware
Uses both hardwired and
Uses only Hardwired control unit micro programmed control
unit
Transistors are used for
Transistors are used for more registers storing complex
Instructions
Fixed sized instructions Variable sized instructions
Can perform REG to REG
Can perform only Register to Register Arthmetic operations or REG to MEM or MEM
to MEM
Requires less number of
Requires more number of registers
registers
Code size is large Code size is small
Instruction take more than
A instruction execute in single clock cycle
one clock cycle
Instruction are larger than
A instruction fit in one word
size of one word
CISC Architecture
The CISC approach attempts to minimize the number of instructions per program, sacrificing the
number of cycles per instruction. Computers based on the CISC architecture are designed to decrease
the memory cost. Because, the large programs need more storage, thus increasing the memory cost
and large memory becomes more expensive. To solve these problems, the number of instructions per
program can be reduced by embedding the number of operations in a single instruction, thereby
making the instructions more complex.
Examples of CISC PROCESSORS
IBM 370/168 – It was introduced in the year 1970. CISC design is a 32 bit processor and four 64-
bit floating point registers.
VAX 11/780 – CISC design is a 32-bit processor and it supports many numbers of addressing
modes and machine instructions which is from Digital Equipment Corporation.
Intel 80486 – It was launched in the year 1989 and it is a CISC processor, which has instructions
varying lengths from 1 to 11 and it will have 235 instructions.
CHARACTERISTICS OF CISC ARCHITECTURE
• Instruction-decoding logic will be Complex.
• One instruction is required to support multiple addressing modes.
• Less chip space is enough for general purpose registers for the instructions that are 0operated
directly on memory.
• Various CISC designs are set up two special registers for the stack pointer, handling
interrupts, etc.
• MUL is referred to as a “complex instruction” and requires the programmer for storing
functions.
RISC Architecture
RISC (Reduced Instruction Set Computer) is used in portable devices due to its power efficiency. For
Example, Apple iPod and Nintendo DS. RISC is a type of microprocessor architecture that uses
highly-optimized set of instructions. RISC does the opposite, reducing the cycles per instruction at
the cost of the number of instructions per program Pipelining is one of the unique feature of RISC. It
is performed by overlapping the execution of several instructions in a pipeline fashion. It has a high
performance advantage over CISC.
RISC ARCHITECTURE CHARACTERISTICS
• Simple Instructions are used in RISC architecture.
• RISC helps and supports few simple data types and synthesize complex data types.
• RISC utilizes simple addressing modes and fixed length instructions for pipelining.
• RISC permits any register to use in any context.
• One Cycle Execution Time
• The amount of work that a computer can perform is reduced by separating “LOAD” and
“STORE” instructions.
• RISC contains Large Number of Registers in order to prevent various number of interactions
with memory.
• In RISC, Pipelining is easy as the execution of all instructions will be done in a uniform
interval of time i.e. one click.
• In RISC, more RAM is required to store assembly level instructions.
• Reduced instructions need a less number of transistors in RISC.
• RISC uses Harvard memory model means it is Harvard Architecture.
• A compiler is used to perform the conversion operation means to convert a high-level
language statement into the code of its form.
The Advantages and
Disadvantages of
RISC and CISC
The Advantages of RISC architecture
• Mostly, the performance of the RISC processors depends on the programmer or compiler as
the knowledge of the compiler plays a vital role while changing the CISC code to a RISC code
• While rearranging the CISC code to a RISC code, termed as a code expansion, will increase
the size. And, the quality of this code expansion will again depend on the compiler, and also
on the machine’s instruction set.
• The first level cache of the RISC processors is also a disadvantage of the RISC, in which these
processors have large memory caches on the chip itself. For feeding the instructions, they
require very fast memory systems.
Advantages of CISC architecture
• Microprogramming is easy assembly language to implement, and less expensive than hard
wiring a control unit.
• The ease of microcoding new instructions allowed designers to make CISC machines
upwardly compatible:
• As each instruction became more accomplished, fewer instructions could be used to
implement a given task.
Disadvantages of CISC architecture
• The performance of the machine slows down due to the amount of clock time taken by
different instructions will be dissimilar
• Only 20% of the existing instructions is used in a typical programming event, even though
there are various specialized instructions in reality which are not even used frequently.
• The conditional codes are set by the CISC instructions as a side effect of each instruction
which takes time for this setting – and, as the subsequent instruction changes the condition
code bits – so, the compiler has to examine the condition code bits before this happens.
The instruction register is a processors register that has the ‘instruction’ which is currently in
execution. The instruction register generates the OP-code bits respective of the operation and the
addressing modes of the operands, mentioned in the instruction.
• Instruction decoder receives the Op-code bits generated by the instruction register and
interprets the operation and addressing modes of the instruction. Now, based on operation
and addressing mode of the instruction in instruction register it set the corresponding
Instruction signal INSi to 1.
Each instruction is executed in step-like, instruction fetch, decode, operand fetch, ALU,
memory store. These steps may vary in different books. But in general, five steps are
enough to for the execution of an instruction.
• Now, the control unit must be aware of the current step, the instruction is in. For this, a Step
Counter is implemented which has signals from T1, …, T5. The step counter sets one of the
signals T1 to T5 to 1 on the basis of the step, the instruction is in.
• Here, the question arises how step counter knows the current step of the instruction? For
this, a Clock is implemented. This clock is designed such that for each step the clock must
complete its one clock cycle.
So, consider if the step counter has set T3 signal 1 then after a clock cycle completes step
counter will set T4 to 1.
• What if the execution of instruction has is interrupted due to some reason? Will the clock
still continue to trigger step counter? The answer is No. The Counter Enable ‘disables’ the
step counter to increment to the next step signal, till the execution of the current step is
completed.
• Now, suppose the execution of an instruction depends on some condition or if it is branch
instruction. This is determined with the help of the Condition signals. The Condition signals
generate the signals for the conditions greater than, less than, equal, greater than equal, less
than equal etc.
• The remaining is External inputs, it acknowledges the control signal generator of
interrupts which affects the execution of the instruction.
On an, all the Control Signal Generator generates the control signals, based on the inputs obtained
by the Instruction register, Step counter, Condition signals and External inputs.
Similarities
Both Hardwired and Micro programmed control unit ‘generates’ the control signals.
So this is all about the differences between hardwired and micro programmed control unit.
Explain Microinstruction sequencing and execution.
Microinstruction Sequencing:
A micro-program control unit can be viewed as consisting of two parts:
1. The control memory that stores the microinstructions.
2. Sequencing circuit that controls the generation of the next address.
A micro-program sequencer attached to a control memory inputs certain bits of the
microinstruction, from which it determines the next address for control memory. A typical
sequencer provides the following address-sequencing capabilities:
1. Increment the present address for control memory.
2. Branches to an address as specified by the address field of the micro instruction.
3. Branches to a given address if a specified status bit is equal to 1.
4. Transfer control to a new address as specified by an external source (Instruction Register).
5. Has a facility for subroutine calls and returns.
Depending on the current microinstruction condition flags, and the contents of the instruction
register, a control memory address must be generated for the next micro instruction.
There are three general techniques based on the format of the address information in the
microinstruction:
1. Two Address Field.
2. Single Address Field.
3. Variable Format
Two Address Field:
The simplest
approach is to
provide two address fields in each microinstruction and multiplexer is provided to select:
• Address Field.
• Based on OPcode in instruction register.
• Next Sequential Address.
In computer central processing units, micro-operations (also known as micro-ops) are the
functional or atomic, operations of a processor. These are low level instructions used in some
designs to implement complex machine instructions. They generally perform operations on data
stored in one or more registers. They transfer data between registers or between external buses of
the CPU, also performs arithmetic and logical operations on registers.
In executing a program, operation of a computer consists of a sequence of instruction cycles, with
one machine instruction per cycle. Each instruction cycle is made up of a number of smaller units –
Fetch, Indirect, Execute and Interrupt cycles. Each of these cycles involves series of steps, each of
which involves the processor registers. These steps are referred as micro-operations. the prefix
micro refers to the fact that each of the step is very simple and accomplishes very little. Figure
below depicts the concept being discussed here.
Summary: Execution of a program consists of sequential execution of instructions. Each
instruction is executed during an instruction cycle made up of shorter sub-cycles(example – fetch,
indirect, execute, interrupt). The performance of each sub-cycle involves one or more shorter
operations, that is, micro-operations.
• In a pipelined processor, a pipeline has two ends, the input end and the output end. Between
these ends, there are multiple stages/segments such that output of one stage is connected to
input of next stage and each stage performs a specific operation.
• Interface registers are used to hold the intermediate output between two stages. These
interface registers are also called latch or buffer.
• All the stages in the pipeline along with the interface registers are controlled by a common
clock.
Execution in a pipelined processor
Execution sequence of instructions in a pipelined processor can be visualized using a space-time
diagram. For example, consider a processor having 4 stages and let there be 2 instructions to be
executed. We can visualize the execution sequence through the following space-time diagrams:
Non overlapped execution:
Stage / Cycle 1 2 3 4 5 6 7 8
S1 I1 I2
S2 I1 I2
S3 I1 I2
S4 I1 I2
Total time = 8 Cycle
Overlapped execution:
Stage / Cycle 1 2 3 4 5
S1 I1 I2
S2 I1 I2
S3 I1 I2
S4 I1 I2
Total time = 5 Cycle
Pipeline Stages
RISC processor has 5 stage instruction pipeline to execute all the instructions in the RISC
instruction set. Following are the 5 stages of RISC pipeline with their respective operations:
In the same case, for a non-pipelined processor, execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over non-pipelined processor, when ‘n’ tasks are
executed on the same processor is:
S = Performance of pipelined processor /
Performance of Non-pipelined processor
As the performance of a processor is inversely proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks ‘n’ are significantly larger than k, that is, n >> k
S = n * k / n
S = k
Structural dependency
This dependency arises due to the resource conflict in the pipeline. A resource conflict is a situation
when more than one instruction tries to access the same resource in the same cycle. A resource can
be a register, memory, or ALU.
Example:
Instruction / Cycle 1 2 3 4 5
I1 IF(Mem) ID EX
I2 IF(Mem) ID EX
I3 IF(Mem) ID EX
I4 ID
In the above scenario, in cycle 4, instructions I1 and I4 are trying to access same resource (Memory)
which introduces a resource conflict.
To avoid this problem, we have to keep the instruction on wait until the required resource (memory
in our case) becomes available. This wait will introduce stalls in the pipeline as shown below:
Cycle 1 2 3 4 5 6 7 8
I1 IF(Mem) ID EX Mem WB
I2 IF(Mem) ID EX Mem WB
I3 IF(Mem) ID EX Mem WB
I4 – – – IF(Mem)
So, the output sequence is not equal to the expected output, that means the pipeline is not
implemented correctly.
To correct the above problem we need to stop the Instruction fetch until we get target address of
branch instruction. This can be implemented by introducing delay slot until we get the target
address.
Instruction/ Cycle 1 2 3 4 5 6
I1 IF ID EX MEM WB
I2 IF ID (PC:250) EX Mem WB
Delay – – – – – –
BI1 IF ID EX
Output Sequence: I1 -> I2 -> Delay (Stall) -> BI1
As the delay slot performs no operation, this output sequence is equal to the expected output
sequence. But this slot introduces stall in the pipeline.
Solution for Control dependency Branch Prediction is the method through which stalls due to
control dependency can be eliminated. In this at 1st stage prediction is done about which branch
will be taken.For branch prediction Branch penalty is zero.
Branch penalty : The number of stalls introduced during the branch operations in the pipelined
processor is known as branch penalty.
NOTE : As we see that the target address is available after the ID stage, so the number of stalls
introduced in the pipeline is 1. Suppose, the branch target address would have been present after the
ALU stage, there would have been 2 stalls. Generally, if the target address is present after the kth
stage, then there will be (k – 1) stalls in the pipeline.
Total number of stalls introduced in the pipeline due to branch instructions = Branch frequency *
Branch Penalty
• Flow (data) dependence: O(S1) ∩ I (S2), S1 → S2 and S1 writes after something read by S2
• Anti-dependence: I(S1) ∩ O(S2), S1 → S2 and S1 reads something before S2 overwrites it
• Output dependence: O(S1) ∩ O(S2), S1 → S2 and both write the same memory location.
Example: Let there be two instructions I1 and I2 such that:
I1 : ADD R1, R2, R3
I2 : SUB R4, R1, R2
When the above instructions are executed in a pipelined processor, then data dependency condition
will occur, which means that I2 tries to read the data before I1 writes it, therefore, I2 incorrectly gets
the old value from I1.
Instruction / Cycle 1 2 3 4
I1 IF ID EX DM
I2 IF EX
To minimize data dependency stalls in the pipeline, operand forwarding is used.
Operand Forwarding : In operand forwarding, we use the interface registers present between the
stages to hold intermediate output so that dependent instruction can access new value from the
interface register directly.
Considering the same example:
I1 : ADD R1, R2, R3
I2 : SUB R4, R1, R2
Instruction / Cycle 1 2 3 4
I1 IF ID EX DM
I2 IF ID EX
Data Hazards
Data hazards occur when instructions that exhibit data dependence, modify data in different stages
of a pipeline. Hazard cause delays in the pipeline. There are mainly three types of data hazards:
1) RAW (Read after Write) [Flow/True data dependency]
2) WAR (Write after Read) [Anti-Data dependency]
3) WAW (Write after Write) [Output data dependency]
Let there be two instructions I and J, such that J follow I. Then,
• RAW hazard occurs when instruction J tries to read data before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R4 <- R2 + R3
• WAR hazard occurs when instruction J tries to write data before instruction I reads it.
Eg:
I: R2 <- R1 + R3
J: R3 <- R4 + R5
• WAW hazard occurs when instruction J tries to write output before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R2 <- R4 + R5
WAR and WAW hazards occur during the out-of-order execution of the instructions.
Introduction-
• A program consists of several number of instructions.
• These instructions may be executed in the following two ways-
1. Non-Pipelined Execution
2. Pipelined Execution
Non-Pipelined Execution-
In non-pipelined architecture,
All the instructions of a program are executed sequentially one after the other.
• A new instruction executes only after the previous instruction has executed completely.
• This style of executing the instructions is highly inefficient.
Example-
Consider a program consisting of three instructions.
In a non-pipelined architecture, these instructions execute one after the other as-
If time taken
for executing
one instruction
= t, then-
Time taken for
executing ‘n’ instructions = n x t
2. Pipelined Execution-
In pipelined architecture,
• A pipelined processor does not wait until the previous instruction has executed completely.
• Rather, it fetches the next instruction and begins its execution.
Pipelined Architecture-
In pipelined architecture,
Four-Stage Pipeline-
Instruction fetch (IF)
1. Instruction decode (ID)
2. Instruction Execute (IE)
3. Write back (WB)
To implement four stage pipeline,
Stage-02:
At stage-02,
Stage-03:
At stage-03,
Stage-04:
At stage-04,
Execution-
In pipelined architecture,
Phase-Time Diagram-
NOTE-
In non-pipelined architecture,
Time taken to execute three instructions would be
= 3 x Time taken to execute one instruction
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution of instructions is far more efficient than non-pipelined execution.
Instruction Pipelining-
n instruction pipelining,
1. Speed Up-
It gives an idea of “how much faster” the pipelined execution is as compared to non-pipelined
execution.
It is calculated as-
2. Efficiency-
The efficiency of pipelined execution is calculated as-
3.
Throughput-
Throughput is defined as number of instructions executed per unit time.
It is calculated as-
• There is a global clock that synchronizes the working of all the stages.
• Frequency of the clock is set such that all the stages are synchronized.
• At the beginning of each clock cycle, each stage reads the data from its register and process
it.
• Cycle time is the value of one clock cycle.
There are two cases possible-
In non-pipelined architecture,
Important Notes-
Note-01:
• The aim of pipelined architecture is to execute one complete instruction in one clock cycle.
• In other words, the aim of pipelining is to maintain CPI ≅ 1.
• Practically, it is not possible to achieve CPI ≅ 1 due to delays that get introduced due to
registers.
• Ideally, a pipelined architecture executes one complete instruction per clock cycle (CPI=1).
Note-02:
• The maximum speed up that can be achieved is always equal to the number of stages.
• This is achieved when efficiency becomes 100%.
• Practically, efficiency is always less than 100%.
• Therefore speed up is always less than number of stages in pipelined architecture.
Note-03:
Under ideal conditions,
Note-04:
• Experiments show that 5 stage pipelined processor gives the best performance.
Note-05:
In case only one instruction has to be executed, then-
Solution-
Given-
Part-06: Throughput-
Throughput for pipelined execution
= Number of instructions executed per unit time
= 1000 tasks / 100300 ns
Problem-02:
A four stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively. Registers are
used between the stages and have a delay of 5 ns each. Assuming constant clocking rate, the total
time taken to process 1000 data items on the pipeline will be-
1. 120.4 microseconds
2. 160.5 microseconds
3. 165.5 microseconds
4. 590.0 microseconds
Solution-
Given-
Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 150, 120, 160, 140 } + 5 ns
= 160 ns + 5 ns
= 165 ns
Problem-03:
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per
instruction of 4. The same processor is upgraded to a pipelined processor with five stages but due to
the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume there are no stalls in
the pipeline. The speed up achieved in this pipelined processor is-
1. 3.2
2. 3.0
3. 2.2
4. 2.0
Solution-
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 2.5 gigahertz
Cycle time
= 1 / frequency
= 1 / (2.5 gigahertz)
= 1 / (2 x 109 hertz)
= 0.5 ns
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 1.6 ns / 0.5 ns
= 3.2
Thus, Option (A) is correct.
Problem-04:
The stage delays in a 4 stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage is
replaced with a functionally equivalent design involving two stages with respective delays 600 and
350 picoseconds.
The throughput increase of the pipeline is _____%.
Solution-
Execution Time in 4 Stage Pipeline-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 800, 500, 400, 300 } + 0
= 800 picoseconds
Thus, Execution time in 4 stage pipeline = 1 clock cycle = 800 picoseconds.
Problem-05:
A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous
pipelined processor with five stages requiring 2.5 ns, 1.5 ns, 2 ns, 1.5 ns and 2.5 ns respectively.
The delay of the latches is 0.5 sec.
The speed up of the pipeline processor for a large number of instructions is-
1. 4.5
2. 4.0
3. 3.33
4. 3.0
Solution-
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 100 MHz
Cycle time
= 1 / frequency
= 1 / (100 MHz)
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 10 ns / 3 ns
= 3.33
Thus, Option (C) is correct.
Problem-06:
We have 2 designs D1 and D2 for a synchronous pipeline processor. D1 has 5 stage pipeline with
execution time of 3 ns, 2 ns, 4 ns, 2 ns and 3 ns. While the design D2 has 8 pipeline stages each
with 2 ns execution time. How much time can be saved using design D2 over design D1 for
executing 100 instructions?
1. 214 ns
2. 202 ns
3. 86 ns
4. 200 ns
Solution-
Cycle Time in Design D1-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 3, 2, 4, 2, 3 } + 0
= 4 ns
Execution Time For 100 Instructions in Design D1-
Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 5 clock cycles + 99 x 1 clock cycle
= 5 x cycle time + 99 x cycle time
= 5 x 4 ns + 99 x 4 ns
= 20 ns + 396 ns
= 416 ns
Time Saved-
Time saved
= Execution time in design D1 – Execution time in design D2
= 416 ns – 214 ns
= 202 ns
Thus, Option (B) is correct.
Problem-07:
Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational
circuit only. The pipeline registers are required between each stage and at the end of the last stage.
Delays for the stages and for the pipeline registers are as given in the figure-
What is the approximate speed up of the pipeline in steady state under ideal conditions when
compared to the corresponding non-pipeline implementation?
1. 4.0
2. 2.5
3. 1.1
4. 3.0
Solution-
Non-Pipeline Execution Time-
Non-pipeline execution time for 1 instruction
= 5 ns + 6 ns + 11 ns + 8 ns
= 30 ns
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 30 ns / 12 ns
= 2.5
Thus, Option (B) is correct.
Problem-08:
Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2,
I3 and I4 in stages S1, S2, S3 and S4 is shown below-
S1 S2 S3 S4
I1 2 1 1 1
I2 1 3 2 2
I3 2 1 1 3
I4 1 2 2 2
Solution-
The phase-time diagram is-
From here, number of clock cycles required to execute the loop = 23 clock cycles.
Thus, Option (B) is correct.
Problem-09:
Consider a pipelined processor with the following four stages-
IF : Instruction Fetch
ID : Instruction Decode and Operand Fetch
EX : Execute
WB : Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of
clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1
clock cycle and the MUL instruction need 3 clock cycles in the EX stage. Operand forwarding is
used in the pipelined processor. What is the number of clock cycles taken to complete the following
sequence of instructions?
Solution-
The phase-time diagram is-
Problem-10:
Consider the following procedures. Assume that the pipeline registers have zero latency.
P1 : 4 stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns
P2 : 4 stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns
P3 : 5 stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns
P4 : 5 stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns
Which procedure has the highest peak clock frequency?
1. P1
2. P2
3. P3
4. P4
Solution-
It is given that pipeline registers have zero latency. Thus,
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Maximum delay due to any stage
Problem-11:
Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies T1, T2 and
T3 such that T1 = 3T2/4 = 2T3. If the longest pipeline stage is split into two pipeline stages of equal
latency, the new frequency is ____ GHz, ignoring delays in the pipeline registers.
Solution-
Let ‘t’ be the common multiple of each ratio, then-
• T1 = t
• T2 = 4t / 3
• T3 = t / 2
Frequency Of Pipeline-
Frequency
= 1 / Pipeline cycle time
= 1 / (4t / 3)
= 3 / 4t
Given frequency = 3 GHz. So,
3 / 4t = 3 GHz
4t = 1 ns
t = 0.25 ns