
Unit 4 PROCESSOR

• Instruction Execution – Building a Datapath – Designing a Control Unit – Hardwired Control, Microprogrammed Control – Pipelining – Data Hazards – Control Hazards.
INSTRUCTION EXECUTION
LOAD INSTRUCTIONS
• Consider the instruction
Load R5, X(R7)
which uses the Index addressing mode to load a word of data from
memory location X + [R7] into register R5.
Execution of this instruction involves the following actions:
1. Fetch the instruction from the memory.
2. Increment the program counter.
3. Decode the instruction to determine the operation to be
performed.
4. Read register R7.
5. Add the immediate value X to the contents of R7.
6. Use the sum X + [R7] as the effective address of the source operand,
and read the contents of that location in the memory.
7. Load the data received from the memory into the destination
register, R5.
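The address arithmetic in steps 4 to 7 can be pictured with a short Python sketch (illustrative only; the register and memory values, and X = 100, are made up for the example):

# Minimal sketch of executing Load R5, X(R7) with X = 100 (all values made up).
registers = {"R5": 0, "R7": 2000}            # register file; step 4 reads R7
memory = {2100: 0xABCD}                      # data memory
X = 100                                      # immediate value from the instruction word
effective_address = X + registers["R7"]      # step 5: X + [R7] = 2100
data = memory[effective_address]             # step 6: read memory at the effective address
registers["R5"] = data                       # step 7: load the received data into R5
print(hex(registers["R5"]))                  # 0xabcd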
ARITHMETIC AND LOGIC INSTRUCTIONS
• Instructions that involve an arithmetic or logic operation can be executed
using similar steps. They differ from the Load instruction in two ways:
• There are either two source registers, or a source register and
an immediate source operand.
• No access to memory operands is required.
A typical instruction of this type is Add R3, R4, R5
It requires the following steps:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read the contents of source registers R4 and R5.
3. Compute the sum [R4] + [R5].
4. Load the result into the destination register, R3.
• If the instruction uses an immediate operand, as in
Add R3, R4, #1000
the immediate value is given in the instruction word. The same
sequence can be used, with steps 2 and 3 modified as:
2. Decode the instruction and read register R4.
3. Compute the sum [R4] + 1000.
STORE INSTRUCTIONS
For example, the instruction
Store R6, X(R8)
stores the contents of register R6 into memory location X + [R8]. It can
be implemented as follows:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read registers R6 and R8.
3. Compute the effective address X + [R8].
4. Store the contents of register R6 into memory location X + [R8].
A five-step sequence of actions to fetch and execute an instruction:
• Step 1: Fetch an instruction and increment the program counter.
• Step 2: Decode the instruction and read registers from the register file.
• Step 3: Perform an ALU operation.
• Step 4: Read or write memory data if the instruction involves a memory operand.
• Step 5: Write the result into the destination register, if needed.
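The five-step sequence can also be viewed as a small interpreter loop. The Python sketch below is only an illustration: the tuple encoding of the instructions and the initial register and memory values are invented for the example, not taken from the text.

# Illustrative five-step interpreter for a tiny instruction set (encoding invented).
registers = {f"R{i}": 0 for i in range(8)}
registers["R4"], registers["R7"] = 40, 2000
memory = {2100: 7}
program = [("add", "R3", "R4", 1000),        # Add R3, R4, #1000
           ("load", "R5", "R7", 100),        # Load R5, 100(R7)
           ("store", "R5", "R7", 104)]       # Store R5, 104(R7)
pc = 0
while pc < len(program):
    op, rdst, rsrc, imm = program[pc]        # step 1: fetch the instruction
    pc += 1                                  #         and increment the program counter
    a = registers[rsrc]                      # step 2: decode and read a source register
    result = a + imm                         # step 3: ALU operation (sum or effective address)
    if op == "load":
        result = memory[result]              # step 4: read memory if needed
    elif op == "store":
        memory[result] = registers[rdst]     # step 4: write memory if needed
    if op != "store":
        registers[rdst] = result             # step 5: write the result into the destination register
print(registers["R3"], registers["R5"], memory[2104])   # 1040 7 7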
BUILDING A DATAPATH

• A datapath is a representation of the flow of information through the CPU, implemented using combinational and sequential circuitry.
• A component required to form the datapath is known as a DATAPATH ELEMENT.
Instruction Memory:
• It only needs to provide read access, because the datapath does not write to it.
• It always outputs the contents of the location specified by the address.
Program Counter (PC):
This is a 32-bit state register containing the address of the current instruction that is being executed.
Adder:
This is a combinational circuit that updates the value of the PC to the address of the next instruction to be executed.
FETCHING PHASE
• The fundamental operation in instruction fetch is to send the address in the PC to the instruction memory, obtain the specified instruction, and then increment the PC.
• The PC is incremented by 4 bytes so that it points to the next sequential instruction, since each instruction is 4 bytes long.
• The program counter is a register containing the address of the instruction in the program being executed.
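A few lines of Python illustrate the fetch phase (a sketch only; the instruction words stored in the dictionary are placeholders):

# Sketch of the fetch phase: read the word addressed by the PC, then add 4 (byte addressing).
instruction_memory = {0: "lw $t1, 0($t0)", 4: "add $t3, $t1, $t2", 8: "sw $t3, 12($t0)"}
pc = 0
for _ in range(3):
    instruction = instruction_memory[pc]     # send the PC to instruction memory, obtain the instruction
    pc = pc + 4                              # the adder produces PC + 4, the address of the next instruction
    print(instruction)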
ARITHMETIC LOGICAL INSTRUCTIONS/ R-FORMAT INSTRUCTION
• To perform an ALU operation, these instructions read two registers and write the result to one register.
• The operation is performed on the contents of the registers; instructions of this class include add, sub, AND, OR and slt (set on less than).
All R-format instructions read two registers, rs and rt, and write to a
register rd.
The register file ports are:
Read register 1 – first source register. 5 bits wide.
Read data 1 – contents of source register 1.
Read register 2 – second source register. 5 bits wide.
Read data 2 – contents of source register 2.
Write register – destination register. 5 bits wide.
Write data – data to be written to a register. 32 bits wide.

The two elements needed to implement R-format ALU operations are the register file and the ALU.
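The register file behaviour described above (two read ports, one write port, 5-bit register numbers, 32-bit data) can be sketched as follows; the class and method names are invented for illustration, not part of any real MIPS toolchain.

# Sketch of a 32 x 32-bit register file with two read ports and one write port.
class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32                 # 32 registers, selected by 5-bit register numbers
    def read(self, read_register_1, read_register_2):
        # Read data 1 and Read data 2: contents of the two source registers.
        return self.regs[read_register_1], self.regs[read_register_2]
    def write(self, write_register, write_data):
        # Write data is stored into the destination register (register 0 stays 0, as in MIPS).
        if write_register != 0:
            self.regs[write_register] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.write(4, 10)
rf.write(5, 32)
rs_value, rt_value = rf.read(4, 5)           # read rs and rt
rf.write(3, rs_value + rt_value)             # ALU adds them; result written to rd (add $3, $4, $5)
print(rf.regs[3])                            # 42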
I-FORMAT INSTRUCTIONS
In our limited MIPS instruction set, these are lw, sw, and beq.
• The op field is used to identify the type of instruction.
• The rs field is the source register.
• The rt field is either the source or destination register, depending on
the instruction.
• The immed field is zero-extended if it is a logical operation.
Otherwise, it is sign extended.
LOAD AND STORE INSTRUCTIONS:
• The load and store instructions compute a memory address by adding the base register to an offset.
• If the instruction is a load, the value read from memory must be written into the register file in the specified register.
• The memory address is computed by adding the contents of the base register and the 16-bit signed offset field (which is part of the instruction).

If the instruction is a store, the value to be stored must also be read from the register.

The processor has a sign-extension unit to sign-extend the 16-bit offset field in the instruction to a 32-bit signed value.
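Sign extension and the effective-address addition can be sketched in Python (values are made up; the helper name is invented for illustration):

# Sketch: sign-extend a 16-bit offset and form a load/store effective address.
def sign_extend_16(value):
    # Interpret the low 16 bits as a signed number and widen it to 32 bits.
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

base_register = 0x10000000                   # contents of the base register (made-up value)
offset_field = 0xFFFC                        # 16-bit field from the instruction, i.e. -4
effective_address = base_register + sign_extend_16(offset_field)
print(hex(effective_address))                # 0xffffffc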
BRANCH INSTRUCTIONS
• Branch Target is the address specified in a branch, which is used to
update the PC if the branch is taken.
• The branch target is computed as the sum of the offset field of the instruction and the address of the instruction following the branch.

The beq instruction has three operands: two registers that are compared for equality, and a 16-bit offset used to compute the branch target address:
beq $t1, $t2, offset
Thus, the branch data path must do two operations:
• compute the branch target address, and
• compare the register contents.

Branch Taken is where the branch condition is satisfied and the program counter
(PC) loads the branch target. All unconditional branches are taken branches.

Branch not Taken is where the branch condition is false and the program counter (PC) loads the address of the instruction that sequentially follows the branch.
Delayed branch is where the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false.
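The two branch-datapath operations can be sketched as follows. This is only an illustration; it assumes the standard MIPS convention that the 16-bit offset counts words, so it is sign-extended and shifted left by 2 before being added to PC + 4.

# Sketch of the branch datapath operations for beq $t1, $t2, offset.
def sign_extend_16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch(pc, offset_field, reg_t1, reg_t2):
    # Branch target: (PC + 4) plus the sign-extended offset shifted left by 2.
    branch_target = (pc + 4) + (sign_extend_16(offset_field) << 2)
    taken = (reg_t1 == reg_t2)               # compare the two registers for equality
    return branch_target if taken else pc + 4

print(branch(pc=40, offset_field=7, reg_t1=5, reg_t2=5))   # 72: branch taken, PC loads the target
print(branch(pc=40, offset_field=7, reg_t1=5, reg_t2=9))   # 44: branch not taken, PC loads PC + 4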
• Creating a single datapath
Designing a Control
• Control signals determine what operation is to be performed.
• They determine the sequence of operations performed by the processor.
• They determine the timing at which each operation must be executed, among other things.
Control Signals
There are two basic approaches:
• Hardwired control
• Microprogrammed control.
• Hardwired control is a method of control unit design.
• The control signals are generated by logic circuits such as gates, flip-flops, decoders, etc.
• The decoder/encoder block is a combinational circuit that generates the required control outputs depending on the state of all its inputs.
• An instruction is executed in a sequence of steps, where each step requires one
clock cycle.
• A step counter may be used to keep track of the progress of execution.
• Several actions are performed in each step, depending on the instruction being
executed.
• In some cases, such as for branch instructions, the actions taken depend on
tests applied to the result of a computation or a comparison operation.
• External signals, such as interrupt requests, may also influence the actions to
be performed.
• Thus, the setting of the control signals depends on:
• Contents of the step counter
• Contents of the instruction register
• The result of a computation or a comparison operation
• External input signals, such as interrupt requests
• Instruction Decoder
• It decodes the instruction loaded in the IR.
• If the IR is an 8-bit register, then the instruction decoder generates 2^8 = 256 output lines, one for each instruction.
• It consists of a separate output line, INS1 through INSm, for each machine instruction.
• According to the code in the IR, one of the output lines INS1 through INSm is set to 1, and all other lines are set to 0.
• Step-Decoder provides a separate signal line for each step in the
control sequence.
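A rough Python picture of this decoder/step-counter structure is shown below. It is purely illustrative: the opcodes, signal names, and the sample combinational logic are invented, not taken from any particular processor.

# Sketch of hardwired control: one-hot instruction and step decoders feed combinational
# logic that asserts the control signals (opcodes and signal names invented).
def instruction_decoder(opcode, num_instructions=256):
    # One output line per instruction: the line selected by the opcode is 1, all others are 0.
    return [1 if i == opcode else 0 for i in range(num_instructions)]

def step_decoder(step, num_steps=5):
    # One output line per step of the control sequence (steps numbered 1..num_steps).
    return [1 if i == step - 1 else 0 for i in range(num_steps)]

def control_signals(ins, step):
    # Example combinational logic: read memory only in step 4 of a hypothetical Load (opcode 0x23).
    return {"PC_increment": step[0],
            "MEM_read": ins[0x23] and step[3],
            "RegFile_write": (ins[0x23] or ins[0x20]) and step[4]}

print(control_signals(instruction_decoder(0x23), step_decoder(4)))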
• Advantage: Can operate at high speed.
• Disadvantages:
• Since the number of instructions and control lines is often in the hundreds, the complexity of the control unit is very high.
• It is costly and difficult to design.
• The control unit is inflexible because it is difficult to change the design.
Microprogrammed Control
• Control signals are generated for each execution step based on the instruction in the IR. In hardwired control, these signals are generated by circuits that interpret the contents of the IR as well as the timing signals derived from a step counter. Instead of employing such circuits, it is possible to use a "software" approach, in which the desired setting of the control signals in each step is determined by a program stored in a special memory.
• The control program is called a microprogram, to distinguish it from the program being executed by the processor. The microprogram is stored on the processor chip in a small and fast memory called the microprogram memory or the control store.
• Microprogramming is a method of control unit design (Figure 7.16).
• Control-signals are generated by a program similar to machine
language programs.
• A Control Word (CW) is a word whose individual bits represent various control signals (like Add, PCin).
• Each of the control-steps in control sequence of an instruction defines
a unique combination of 1s & 0s in CW.
• Individual control-words in microroutine are referred to as
microinstructions (Figure 7.15).
• A sequence of CWs corresponding to control-sequence of a machine
instruction constitutes the microroutine.
• The microroutines for all instructions in the instruction-set of a computer are
stored in a special memory called the Control Store (CS).

The control unit generates the control signals for any instruction by sequentially reading the CWs of the corresponding microroutine from the CS.
• The µPC (microprogram counter) is used to read CWs sequentially from the CS.
• Every time a new instruction is loaded into the IR, the output of the starting address generator is loaded into the µPC.
• The µPC is then automatically incremented by the clock, causing successive microinstructions to be read from the CS.
• Hence, control signals are delivered to the various parts of the processor in the correct sequence.
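The µPC/control-store mechanism can be pictured with a short sketch. The control words and signal names below are invented for illustration; a real control store would hold bit patterns, not Python dictionaries.

# Sketch of microprogrammed control: the uPC steps through the control words (CWs)
# of a microroutine held in the control store (contents invented for illustration).
CONTROL_STORE = {
    "Add":  [{"PCout": 1, "MARin": 1}, {"R4out": 1, "ALU_add": 1}, {"R3in": 1, "End": 1}],
    "Load": [{"PCout": 1, "MARin": 1}, {"R7out": 1, "ALU_add": 1}, {"MEM_read": 1},
             {"R5in": 1, "End": 1}],
}

def run_microroutine(instruction):
    upc = 0                                  # starting-address generator output loaded into the uPC
    microroutine = CONTROL_STORE[instruction]
    while True:
        control_word = microroutine[upc]     # read the next CW from the control store
        print(f"step {upc + 1}: assert {sorted(control_word)}")
        if control_word.get("End"):
            break
        upc += 1                             # the uPC is incremented by the clock

run_microroutine("Load")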
• Advantages
• It simplifies the design of the control unit. Thus it is both cheaper and less error-prone to implement.
• Complex functions such as floating-point arithmetic can be realized efficiently.
• Disadvantages
• The flexibility is achieved at some extra hardware cost due to the control
memory and its access circuitry.
PIPELINING

• Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
• It is used to increase the speed and performance of the processor.
• The architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction operation can be performed.
• The staging of instruction fetching is continuous. The result is an
increase in the number of instructions that can be performed during a
given time period.
Pipelining in Processor
MIPS instructions classically take five steps:
• Fetch instruction from memory. (IF)
• Read registers while decoding the instruction. (ID)
• Execute the operation or calculate an address. (EX)
• Access an operand in data memory. (MEM)
• Write the result into a register. (WB)
Hence, the MIPS pipeline has five stages. The five stages are:
 Instruction fetch cycle (IF)
 Instruction decode/register fetch cycle (ID)
 Execution/effective address cycle (EX)
 Memory access (MEM)
 Write-back cycle (WB)
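The overlap of instructions in the five stages can be visualized with a short sketch that prints the stage occupied by each instruction in every clock cycle, assuming an ideal pipeline with no stalls (illustrative only):

# Sketch: stage occupied by each of four instructions per clock cycle in an ideal 5-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
NUM_INSTRUCTIONS = 4
for cycle in range(NUM_INSTRUCTIONS + len(STAGES) - 1):
    row = []
    for i in range(NUM_INSTRUCTIONS):
        stage_index = cycle - i              # instruction i enters the pipeline in cycle i
        row.append(STAGES[stage_index] if 0 <= stage_index < len(STAGES) else "--")
    print(f"cycle {cycle + 1}: " + "  ".join(row))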
HAZARDS
• A situation in pipelining in which the next instruction cannot execute in the following clock cycle is called a hazard, and there are three different types:
• Structural Hazards
• Data Hazards
• Control Hazards
• STRUCTURAL HAZARDS
• A structural hazard occurs when a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
• In the MIPS pipeline with a single memory, a structural hazard would occur when the first instruction is accessing data memory while the fourth instruction is being fetched from that same memory.
• DATA HAZARDS:
• Data hazards occur when a planned instruction cannot execute in the
proper clock cycle because data that is needed to execute the
instruction is not yet available.
• In a pipeline, a data hazard arises when one step must wait for another to complete. Data hazards mainly occur when one instruction depends on an earlier instruction that has not yet finished. Also called a pipeline data hazard.
• For example, suppose we have an add instruction followed immediately
by a subtract instruction that uses the sum ($s0):

• add $s0, $t0, $t1
• sub $t2, $s0, $t3

• The add instruction doesn't write its result until the fifth stage, meaning that we would have to waste three clock cycles in the pipeline.
• The sub instruction must wait for the value of $s0.
• The primary solution: we don't need to wait for the instruction to complete before trying to resolve the data hazard.
• As soon as the ALU creates the sum for the add, we can supply it as an
input for the subtract.
• Adding extra hardware to retrieve the missing item early from the
internal resources is called forwarding or bypassing.
• A method of resolving a data hazard by retrieving the missing data
element from internal buffers rather than waiting for it to arrive from
programmer visible registers or memory.
Example for Forwarding with Two Instructions
add $s0, $t0, $t1
sub $t2, $s0, $t3

Figure 3.33 The Graphical Representation of Forwarding
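The forwarding decision reduces to a simple comparison: if the destination register recorded in the EX/MEM pipeline register matches a source register of the instruction currently in the EX stage, the ALU input is taken from the pipeline register instead of the register file. The sketch below illustrates only that condition; the field names are invented, not the actual pipeline-register format.

# Sketch of the forwarding condition for add $s0, $t0, $t1 followed by sub $t2, $s0, $t3.
ex_mem = {"reg_write": True, "rd": "$s0", "alu_result": 42}   # the add, one stage ahead
id_ex = {"rs": "$s0", "rt": "$t3"}                            # the sub, now entering EX

def forward_a(id_ex, ex_mem, register_file_value):
    # Bypass: use the not-yet-written ALU result if it targets the register being read.
    if ex_mem["reg_write"] and ex_mem["rd"] == id_ex["rs"]:
        return ex_mem["alu_result"]
    return register_file_value

print(forward_a(id_ex, ex_mem, register_file_value=0))        # 42, taken from the EX/MEM buffer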


Forwarding paths are valid only if the destination stage is later in time than the source
stage.
Forwarding cannot prevent all pipeline stalls, so we must consider a new data hazard called a load-use data hazard. The desired data would be available only after the fourth stage of the first instruction in the dependence, which is too late for the input of the third stage of the sub. Hence, even with forwarding, we would have to stall one stage for a load-use data hazard.

Figure 3.34 A Stall Even with Forwarding
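Detecting a load-use hazard, and hence the need for the one-cycle stall, can be sketched the same way (field names invented for illustration): stall when the instruction in the EX stage is a load whose destination register is a source of the instruction in the ID stage.

# Sketch of load-use hazard detection.
def needs_stall(load_in_ex, instr_in_id):
    return (load_in_ex["mem_read"] and
            load_in_ex["rt"] in (instr_in_id["rs"], instr_in_id["rt"]))

load_in_ex = {"mem_read": True, "rt": "$t1"}   # lw  $t1, 0($t0)
instr_in_id = {"rs": "$t1", "rt": "$t2"}       # add $t3, $t1, $t2
print(needs_stall(load_in_ex, instr_in_id))    # True: insert one bubble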


• A pipeline stall is also called a bubble.
• There are three different types of data hazard:
• RAW:
• A Read After Write hazard occurs when one instruction reads a location after an earlier instruction writes new data to it, but in the pipeline the write occurs after the read, so the read gets the stale value.
• WAR :
• A Write After Read hazard is the reverse of a RAW: in the code a write occurs after a read, but the pipeline causes the write to happen first.
• WAW:
• A Write After Write hazard is a situation in which two writes to the same location occur out of order, leaving the location holding the earlier value. In the simple in-order five-stage MIPS pipeline, where all register writes happen in order in the WB stage and reads happen in ID, WAR and WAW hazards do not occur; only RAW hazards (such as the add/sub pair on $s0 above) arise.
• Reordering Code to Avoid Pipeline Stalls:
• To avoid pipeline stalls, the code can be reordered.
• For example consider the following code segment in C
a = b + e;
c = b + f;
• Here is the generated MIPS code for this segment, assuming all variables are in memory
and are addressable as offsets from $t0:
lw $t1, 0($t0)      # $t1 = b
lw $t2, 4($t0)      # $t2 = e
add $t3, $t1, $t2   # a = b + e
sw $t3, 12($t0)
lw $t4, 8($t0)      # $t4 = f
add $t5, $t1, $t4   # c = b + f
sw $t5, 16($t0)

• Both add instructions have a hazard because of their respective
dependence on the immediately preceding lw instruction. Notice that
bypassing eliminates several other potential hazards, including the
dependence of the first add on the first lw and any hazards for store
instructions. Moving up the third lw instruction to become the third
instruction eliminates both hazards:
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
• CONTROL HAZARDS
• A control hazard arises from the need to make a decision based on the results of one instruction while others are executing. It is also called a branch hazard: the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed.
• Consider a branch instruction: we must begin fetching the instruction following the branch on the very next clock cycle, yet the pipeline cannot possibly know what that instruction should be. One way to avoid fetching the wrong instruction is to stall after fetching a branch, waiting until the pipeline determines the outcome of the branch and knows what instruction address to fetch from. Control hazards occur when a branch instruction is executed in the pipeline, and they occur less frequently than data hazards.
• For Example

40 beq $1, $3, 28
44 and $12, $2, $5
48 or  $13, $6, $2
52 add $14, $2, $2
72 lw  $4, 50($7)
• Figure 3.44 shows a sequence of instructions and indicates when the branch would occur in this pipeline. An instruction must be fetched at every clock cycle to sustain the pipeline, yet in our design the decision about whether to branch doesn't occur until the MEM pipeline stage; the branch decision for beq is made only in the MEM stage.
• Figure 3.44 The impact of the pipeline on the branch instruction.
• Two schemes for resolving control hazards are:
• Branch not taken
• Branch Prediction

• Branch Not Taken
• This scheme assumes that branches are not taken and continues fetching sequentially. If the branch is taken, the instructions that are being fetched and decoded must be discarded; discarding instructions means we must be able to flush instructions in the IF, ID, and EX stages of the pipeline. The alternative of stalling as soon as a branch is fetched, until the pipeline determines the outcome of the branch and knows what instruction address to fetch from, delays execution; if branches are not taken, there is no need to discard instructions and the pipeline executes them continuously.
• Reducing the Delay of Branches
• One way to improve branch performance is to reduce the cost of the taken branch. The next PC for a branch is selected in the MEM stage, but if we move the branch execution earlier in the pipeline, then fewer instructions need be flushed. Moving the branch decision earlier improves performance, and it requires two actions to occur earlier:
• Computing the branch target address
• Evaluating the branch decision.
• Branch prediction

Branch prediction is a method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome. A simple form of branch prediction is to always assume that the branch is not taken. This assumption works reasonably well for the simple five-stage pipeline, but it is less suitable for deeper pipelines because it increases the branch penalty when measured in clock cycles. For such pipelines we add more hardware to predict branch behavior during program execution; the resulting technique is called dynamic branch prediction.

• Dynamic Branch Prediction
• One implementation of that approach is a branch prediction buffer or branch
history table. A branch prediction buffer is a small memory indexed by the lower
portion of the address of the branch instruction. The memory contains a bit that
says whether the branch was recently taken or not.
• Prediction is just a hint that we hope is correct, so fetching begins in the predicted
direction. If the hint turns out to be wrong, the incorrectly predicted instructions are
deleted, the prediction bit is inverted and stored back, and the proper sequence is
fetched and executed.
• Two bit prediction scheme
• A 1-bit predictor mispredicts twice for a branch that is almost always taken, such as a loop branch: once when the branch finally falls through, and again the next time the loop is entered. To overcome this weakness, 2-bit prediction schemes are often used. In a 2-bit scheme, a prediction must be wrong twice before it is changed.
• Figure 3.45 shows the finite-state machine for a 2-bit prediction
scheme. A branch prediction buffer can be implemented as a small,
special buffer accessed with the instruction address during the IF pipe
stage. If the instruction is predicted as taken, fetching begins from the
target as soon as the PC is known; it can be as early as the ID stage.
Otherwise, sequential fetching and executing continue. If the
prediction turns out to be wrong, the prediction bits are changed as
shown in Figure 3.45.
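A 2-bit saturating counter implementing this scheme can be sketched in a few lines of Python (illustrative only; the buffer size and indexing are arbitrary choices):

# Sketch of a 2-bit branch predictor: one saturating counter per buffer entry.
# Counter values 0 and 1 predict not taken; 2 and 3 predict taken, so the prediction
# flips only after two consecutive wrong guesses.
class TwoBitPredictor:
    def __init__(self, entries=16):
        self.counters = [0] * entries        # indexed by the low-order bits of the branch address
    def predict(self, branch_address):
        return self.counters[branch_address % len(self.counters)] >= 2
    def update(self, branch_address, taken):
        i = branch_address % len(self.counters)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)

predictor = TwoBitPredictor()
for outcome in [True, True, False, True, True]:   # a loop branch that is mostly taken
    print("predict taken:", predictor.predict(0x40), " actual:", outcome)
    predictor.update(0x40, outcome)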
