COMPUTER ORGANIZATION AND ARCHITECTURE                 – MODULE 5 BASIC PROCESSING UNIT
MODULE 5
                                    BASIC PROCESSING UNIT
   The Instruction Set Processor (ISP) executes machine instructions and coordinates the activities of the other
   units.
   It is also called the Central Processing Unit (CPU).
   A typical computing task consists of a series of steps specified by a sequence of machine instructions that
   constitute a program.
   An instruction is executed by carrying out a sequence of more rudimentary operations.
   Some fundamental concepts
        Processor fetches one instruction at a time and performs the operation specified.
        Instructions are fetched from successive memory locations until a branch or a jump instruction is
         encountered.
        Processor keeps track of the address of the memory location containing the next instruction to be
         fetched using Program Counter (PC).
        After fetching an instruction, the contents of the PC are updated to point to the next instruction in
         the sequence.
        A branch instruction may load a different value into the PC.
        When an instruction is fetched, it is placed in the instruction register, IR, from where it is
         interpreted, or decoded, by the processor’s control circuitry.
        The IR holds the instruction until its execution is completed
   Steps in executing an instruction occupying 32-bits
       Fetch the contents of the memory location pointed to by the PC. The contents of this location are
          loaded into the instruction register IR (fetch phase).
                                                    IR ← [[PC]]
       Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch
          phase).
                                                  PC ← [PC] + 4
       Carry out the actions specified by the instruction in the IR (execution phase).
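
   The fetch phase described above can be sketched in a few lines of Python. This is only an illustrative
   model: the dictionary-based memory and the placeholder instruction names are assumptions made for the
   example, not part of any real instruction set.

```python
def fetch(memory, pc):
    """One fetch phase: IR <- [[PC]], then PC <- [PC] + 4."""
    ir = memory[pc]      # contents of the location the PC points to
    pc = pc + 4          # byte-addressable memory, 32-bit instructions
    return ir, pc

# Hypothetical program: byte addresses mapped to placeholder instruction names.
memory = {0: "LOAD", 4: "ADD", 8: "STORE"}

pc = 0
trace = []
while pc in memory:
    ir, pc = fetch(memory, pc)
    trace.append(ir)     # the execution phase would act on IR here

print(trace)  # ['LOAD', 'ADD', 'STORE']
```

   Each pass through the loop corresponds to one instruction: the instruction is copied into IR and the PC
   already points to the next instruction before execution begins.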
      
   Processor Organization:
   Fig. 5.1 shows the organization in which the ALU and all the registers are interconnected via single
   common bus.
          This bus is internal to the processor.
          The data lines of external memory bus are connected to internal processor bus via the MDR
           (Memory Data Register)
          The address lines of the external memory bus are connected to the internal processor bus via the
           memory address register, MAR.
Dept of ECE, MITE, Moodabidre                                                                                    1
                          Fig. 5.1 Single-bus organization of the datapath inside a processor
          MDR (Memory Data Register) has two inputs and two outputs.
          Data may be loaded into MDR either from the memory bus or from the internal processor bus.
          The data stored in MDR may be placed on either bus.
          The input of MAR (Memory Address Register) is from the internal bus, and its output is
           connected to the external address bus.
          The control lines of the memory bus are connected to the instruction decoder and control logic
           block.
          This unit is responsible for issuing the signals that control the operation of all the units inside the
           processor and for interacting with the memory bus.
          The number and use of the processor registers R0 through R(n - 1) vary considerably from one
           processor to another.
          Registers may be provided for general-purpose use by the programmer.
          Some may be dedicated as special-purpose registers, such as index registers or stack pointers.
        The registers, Y, Z, and TEMP are used by the processor for temporary storage during execution of
         some instructions.
        These registers are never used for storing data generated by one instruction for later use by
         another instruction.
        The multiplexer MUX selects either the output of register Y or a constant value 4 to be provided
         as input A of the ALU.
        The constant 4 is used to increment the contents of the program counter.
        We will refer to the two possible values of the MUX control input Select as Select4 and SelectY
         for selecting the constant 4 or register Y, respectively.
        As instruction execution progresses, data are transferred from one register to another, often
         passing through the ALU to perform some arithmetic or logic operation.
        The instruction decoder and control logic unit is responsible for implementing the actions
         specified by the instruction loaded in the IR register.
        The decoder generates the control signals needed to select the registers involved and direct the
         transfer of data.
        The registers, the ALU, and the interconnecting bus are collectively referred to as the datapath.
  Different operations done using a processor
        Transfer a word of data from one processor register to another or to the ALU.
        Perform an arithmetic or a logic operation and store the result in a processor register.
        Fetch the contents of a given memory location and load them into a processor register.
        Store a word of data from a processor register into a given memory location
  1. REGISTER TRANSFERS
      Instruction execution involves a sequence of steps in which data are transferred from one register
        to another.
      For each register, two control signals are used to place the contents of that register on the bus or to
        load the data on the bus into the register.
      This is represented symbolically in Figure 5.2.
      The input and output of register Ri are connected to the bus via switches controlled by the signals
        Riin and Riout respectively.
      When Riin is set to 1, the data on the bus are loaded into Ri.
      Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.
      While Riout is equal to 0, the bus can be used for transferring data from other registers.
                                 Fig 5.2 Input and Output gating for the registers
   Suppose that we wish to transfer the contents of register R1 to register R4. (MOV R1, R4)
   This can be accomplished as follows:
       Enable the output of register R1 by setting R1out to 1. This places the contents of R1 on the
          processor bus.
       Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus into
          register R4.
       All operations and data transfers within the processor take place within time periods defined by
          the processor clock.
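
   The gating of registers onto the bus can be modelled as follows. This is a minimal sketch: the function
   name bus_transfer and the use of Python sets for the active Riout and Riin signals are assumptions made
   for illustration, as is the check that only one register drives the bus in a cycle.

```python
def bus_transfer(registers, out_signals, in_signals):
    """Simulate one clock cycle of the single bus.

    out_signals / in_signals are sets of register names whose
    Riout / Riin control signal is 1 during this cycle.
    """
    if len(out_signals) > 1:
        raise ValueError("only one register may drive the bus at a time")
    if not out_signals:
        return dict(registers)        # bus idle: nothing to load
    (source,) = out_signals
    bus = registers[source]           # Riout = 1 places Ri on the bus
    updated = dict(registers)
    for dest in in_signals:           # Riin = 1 loads the bus into Ri
        updated[dest] = bus
    return updated

regs = {"R1": 25, "R4": 0}
regs = bus_transfer(regs, out_signals={"R1"}, in_signals={"R4"})
print(regs)  # {'R1': 25, 'R4': 25}
```

   The call at the end corresponds to the transfer from R1 to R4: R1out and R4in are both 1 during the same
   clock cycle, and all other gating signals are 0.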
   IMPLEMENTATION OF ONE BIT REGISTER
   An implementation of one bit of register Ri is shown in Figure 5.3 as an example.
   A two-input multiplexer is used to select the data applied to the input of an edge-triggered D flip-flop.
   When the control input Riin is equal to 1, the multiplexer selects the data on the bus.
   This data will be loaded into the flip-flop at the rising edge of the clock.
   When Riin is equal to 0, the multiplexer feeds back the value currently stored in the flip-flop
   The Q output of the flip-flop is connected to the bus via a tri-state gate.
   When Riout is equal to 0, the gate's output is in the high-impedance (electrically disconnected) state.
   This corresponds to the open-circuit state of a switch.
   When Riout = 1, the gate drives the bus to 0 or 1, depending on the value of Q.
                                  Fig 5.3 Implementation of one bit register
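
   The behaviour of this one-bit cell can be sketched in Python. The class and method names are hypothetical,
   and None is used here only as a stand-in for the high-impedance state.

```python
class OneBitRegister:
    """One bit of Ri: two-input mux, edge-triggered D flip-flop, tri-state gate."""

    def __init__(self):
        self.q = 0

    def rising_edge(self, bus_bit, r_in):
        # Mux: select the bus when Riin = 1, otherwise feed back the stored Q.
        d = bus_bit if r_in else self.q
        self.q = d                     # D is captured on the rising clock edge

    def output(self, r_out):
        # Tri-state gate: None stands for the high-impedance state.
        return self.q if r_out else None

bit = OneBitRegister()
bit.rising_edge(bus_bit=1, r_in=1)     # load 1 from the bus
bit.rising_edge(bus_bit=0, r_in=0)     # Riin = 0: the stored value is kept
print(bit.output(r_out=1), bit.output(r_out=0))  # 1 None
```

   The second clock edge shows why the feedback path matters: with Riin = 0 the flip-flop reloads its own
   value, so the change on the bus does not disturb the register.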
   Performing an Arithmetic or Logic Operation
          The ALU is a combinational circuit that has no internal storage.
          It performs arithmetic and logic operations on the two operands applied to its A and B inputs.
          The ALU gets its two operands from the MUX (input A) and the bus (input B). The result is temporarily
           stored in register Z.
          The contents of register R1 can be added to those of R2 and the result stored in R3 (ADD R1, R2, R3)
           by the following sequence of steps:
               1. R1out, Yin
               2. R2out, SelectY, Add, Zin
               3. Zout, R3in
          The signals whose names are given in any step are activated for the duration of the clock cycle
           corresponding to that step. (All other signals are inactive)
          In step 1, the output of register R1 and the input of register Y are enabled, causing the contents of
           R1 to be transferred over the bus to Y.
          In step 2, the multiplexer's Select signal is set to SelectY, causing the multiplexer to gate the
           contents of register Y to input A of the ALU.
          At the same time, the contents of register R2 are gated onto the bus and, hence, to input B.
          The Add line is set to 1, causing the output of the ALU to be the sum of the two numbers at inputs
           A and B.
          This sum is loaded into register Z because its input control signal is activated.
          In step 3, the contents of register Z are transferred to the destination register, R3.
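
   The three steps can be traced in Python as shown below. This is a sketch only: the register values are
   arbitrary, and the signal names in the comments (Yin, Zin, Zout, R3in) follow the Riin/Riout naming
   pattern used above rather than a listing given in these notes.

```python
def add_r1_r2_r3(regs):
    """Three-step sequence for ADD R1, R2, R3 on the single-bus datapath."""
    regs = dict(regs)
    # Step 1: R1out, Yin
    regs["Y"] = regs["R1"]
    # Step 2: R2out, SelectY, Add, Zin
    bus = regs["R2"]                 # R2 drives the bus, i.e. ALU input B
    regs["Z"] = regs["Y"] + bus      # MUX passes Y to ALU input A
    # Step 3: Zout, R3in
    regs["R3"] = regs["Z"]
    return regs

result = add_r1_r2_r3({"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0})
print(result["R3"])  # 12
```

   Three clock cycles are needed because the single bus can carry only one value per cycle: one operand must
   be parked in Y before the second can be placed on the bus, and the sum must be parked in Z before it can
   travel back over the bus to R3.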
       2. FETCHING A WORD FROM MEMORY
          To fetch a word of information from memory, the processor has to specify the address of the
           memory location where this information is stored and request a Read operation.
          The information to be fetched may be an instruction in a program or an operand specified by an
           instruction(data)
          The processor transfers the required address to the MAR, whose output is connected to the address
           lines of the memory bus.
          At the same time, the processor uses the control lines of the memory bus to indicate that a Read
           operation is needed.
          When the requested data are received from the memory, they are stored in register MDR.
          From MDR, they can be transferred to other registers in the processor.
                                Fig. 5.4 Connection and control signals for MDR
          The connections for register MDR are illustrated in Figure 5.4.
          It has four control signals:
             MDRin and MDRout control the connection to the internal bus.
             MDRinE and MDRoutE control the connection to the external bus
          During memory Read and Write operations, the timing of internal processor operations must be
           coordinated with the response of the addressed device on the memory bus.
          The processor and the memory generally operate at different speeds.
          To accommodate this, the processor waits until it receives an indication that the requested
           operation has been completed.
          A control signal called Memory-Function-Completed (MFC) is used for this purpose.
          The addressed device sets this signal to 1 to indicate that the contents of the specified location
           have been read and are available on the data lines of the memory bus.
   Example: Consider the instruction Move (R1), R2
   The actions needed to execute this instruction are:
          MAR ← [R1]
          Start a Read operation on the memory bus
          Wait for the MFC response from the memory
          Load MDR from the memory bus
          R2 ← [MDR]
       The memory read operation requires three steps, which can be described by the signals being
       activated as follows:
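
       One grouping of the five actions into three control steps can be sketched as below. The grouping, the
       use of WMFC as the "wait for MFC" signal, and the wait_cycles parameter are assumptions of this sketch;
       the signal names follow the Riin/Riout and MDRinE conventions introduced earlier.

```python
def move_indirect_r1_to_r2(regs, memory, wait_cycles=2):
    """Sketch of Move (R1), R2 as three control steps.

    wait_cycles is a hypothetical number of extra clock cycles the
    processor stalls in step 2 before the memory asserts MFC.
    """
    regs = dict(regs)
    # Step 1: R1out, MARin, Read
    regs["MAR"] = regs["R1"]
    cycles = 1
    # Step 2: MDRinE, WMFC (the step is held until MFC = 1)
    cycles += 1 + wait_cycles
    regs["MDR"] = memory[regs["MAR"]]
    # Step 3: MDRout, R2in
    regs["R2"] = regs["MDR"]
    cycles += 1
    return regs, cycles

regs, cycles = move_indirect_r1_to_r2(
    {"R1": 100, "R2": 0, "MAR": 0, "MDR": 0}, memory={100: 42})
print(regs["R2"], cycles)  # 42 5
```

       The number of control steps is fixed at three, but the number of clock cycles is not: it depends on how
       long the addressed device takes to respond with MFC.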
       STORING A WORD IN MEMORY
              The desired address is loaded into MAR.
              The data to be written are loaded into MDR and a Write command is issued.
       Example: MOVE R2,(R1)
       Executing the instruction Move R2,(R1) requires the following sequence:
       EXECUTION OF A COMPLETE INSTRUCTION
       Consider the instruction Add (R3), R1
       Executing this instruction requires the following actions:
             Fetch the instruction
             Fetch the first operand (the contents of the memory location pointed to by R3)
             Perform the addition
             Load the result into R1
       Figure 7.6 gives the sequence of control steps required to perform these operations for the single-bus
       architecture of Figure 7.1.
             Steps 1 through 3 constitute the instruction fetch phase, which is the same for all instructions.
             The instruction decoding circuit interprets the contents of the IR at the beginning of step 4.
             This enables the control circuitry to activate the control signals for steps 4 through 7, which
              constitute the execution phase.
             The contents of register R3 are transferred to MAR in step 4, and a memory Read operation is
              initiated.
             The contents of R1 are then transferred to register Y in step 5, to prepare for the addition.
             When the Read operation is completed, the memory operand is available in MDR, and the addition
              is performed in step 6.
             (The contents of MDR are gated on to the bus and thus also to the B input of ALU and register
              Y is selected as second input to ALU by choosing SelectY)
             The sum is stored in Z and then transferred to R1 in step 7.
             The End signal causes a new instruction fetch cycle to begin by returning to step 1.
             The updated PC value is also stored in register Y in step 2. This is useful for branch instructions.
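
       The seven steps can be exercised with a small interpreter. This is a sketch under stated assumptions:
       the signal list below is reconstructed from the step descriptions above (fetch in steps 1 to 3,
       execution in steps 4 to 7), the memory is modelled as a dictionary that delivers its data at the
       WMFC step, and the initial register values are arbitrary.

```python
# Assumed control sequence for Add (R3), R1, one set of active signals per step.
ADD_R3_R1 = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},  # 1
    {"Zout", "PCin", "Yin", "WMFC"},                      # 2
    {"MDRout", "IRin"},                                   # 3
    {"R3out", "MARin", "Read"},                           # 4
    {"R1out", "Yin", "WMFC"},                             # 5
    {"MDRout", "SelectY", "Add", "Zin"},                  # 6
    {"Zout", "R1in", "End"},                              # 7
]

def run(steps, regs, memory):
    regs = dict(regs)
    for signals in steps:
        alu = None
        # Exactly one register drives the bus in each step.
        (source,) = [s[:-3] for s in signals if s.endswith("out")]
        bus = regs[source]
        if "Add" in signals:
            a = 4 if "Select4" in signals else regs["Y"]   # MUX output
            alu = a + bus
        loads = {}
        for s in signals:
            if s == "Zin":
                loads["Z"] = alu           # Z is loaded from the ALU output
            elif s.endswith("in"):
                loads[s[:-2]] = bus        # every other register loads the bus
        regs.update(loads)
        if "WMFC" in signals:
            regs["MDR"] = memory[regs["MAR"]]   # data arrive when MFC is received
    return regs

start = {"PC": 0, "IR": None, "MAR": 0, "MDR": 0, "Y": 0, "Z": 0, "R1": 10, "R3": 200}
final = run(ADD_R3_R1, start, memory={0: "Add (R3), R1", 200: 32})
print(final["R1"], final["PC"], final["IR"])  # 42 4 Add (R3), R1
```

       Note how step 5 overlaps useful work with the memory access: R1 is moved to Y while the processor is
       waiting for the operand requested in step 4.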
       EXECUTION OF BRANCH INSTRUCTIONS
       A branch instruction replaces the contents of PC with the branch target address
       This address is usually obtained by adding an offset X given in the branch instruction, to the updated
       value of the PC.
          Figure 7.7 gives a control sequence that implements an unconditional branch instruction.
          Processing starts with the fetch phase.
          This phase ends when the instruction is loaded into the IR in step 3.
          The offset value is extracted from the IR by the instruction decoding circuit, which will also
          perform sign extension if required.
          Since the value of the updated PC is already available in register Y, the offset X is gated onto the
          bus in step 4, and an addition operation is performed.
          The result, which is the branch target address, is loaded into the PC in step 5.
          The offset X is usually the difference between the branch target address and the address
          immediately following the branch instruction.
          For example, if the branch instruction is at location 2000 and if the branch target address is 2050,
          the value of X must be 46.
          Consider now a conditional branch.
          In this case, we need to check the status of the condition codes before loading a new value into the
          PC.
          For example, for a Branch-on-negative (Branch <0) instruction, step 4 in Figure 7.7 is replaced
          with a step that performs the same addition but also activates End if N = 0.
          Thus, if N = 0 the processor returns to step 1 immediately after step 4.
          If N = 1, step 5 is performed to load a new value into the PC, thus performing the branch
          operation.
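
       The offset arithmetic and the test of the N flag can be checked with a short Python sketch. The function
       names are hypothetical; the 4-byte instruction length is the one assumed throughout this module.

```python
INSTRUCTION_BYTES = 4

def branch_offset(branch_address, target_address):
    """Offset X relative to the updated PC (the address after the branch)."""
    return target_address - (branch_address + INSTRUCTION_BYTES)

def next_pc(branch_address, offset, n_flag, conditional=True):
    """PC after Branch<0, or after an unconditional branch if conditional=False."""
    updated_pc = branch_address + INSTRUCTION_BYTES
    if conditional and n_flag == 0:
        return updated_pc             # condition not met: continue in sequence
    return updated_pc + offset        # step 5 loads the target into the PC

x = branch_offset(2000, 2050)
print(x)                              # 46
print(next_pc(2000, x, n_flag=1))     # 2050
print(next_pc(2000, x, n_flag=0))     # 2004
```

       The offset is 46 rather than 50 because the PC has already been incremented to 2004 by the time the
       addition is performed.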
   MULTIPLE BUS ORGANIZATION
   To reduce the number of steps needed, most commercial processors provide multiple internal paths that
   enable several transfers to take place in parallel.
   Figure 7.8 depicts a three-bus structure used to connect the registers and the ALU of a processor.
          All general-purpose registers are combined into a single block called the register file, which is
           implemented in the form of an array of memory cells.
          The register file in Figure 7.8 is said to have three ports.
          There are two outputs, allowing the contents of two different registers to be accessed
           simultaneously and have their contents placed on buses A and B.
          The third port allows the data on bus C to be loaded into a third register during the same clock
           cycle.
          Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where
           an arithmetic or logic operation may be performed.
          The result is transferred to the destination over bus C.
          If needed, the ALU may simply pass one of its two input operands unmodified to bus C.
          We will call the ALU control signals for such an operation R=A or R=B.
          The Incrementer unit is used to increment the PC by 4.
          Using the Incrementer eliminates the need to add 4 to the PC using the main ALU.
          The source for the constant 4 at the ALU input multiplexer is still useful.
          It can be used to increment other addresses, such as the memory addresses in LoadMultiple and
           StoreMultiple instructions
   Control sequence of instructions for ADD R1,R2,R3
          In step 1, the contents of the PC are passed through the ALU, using the R=B control signal, and
           loaded into the MAR to start a memory read operation.
          At the same time the PC is incremented by 4.
          In step 2, the processor waits for MFC and loads the data received into MDR, then transfers them
           to IR in step 3.
          Finally, the execution phase of the instruction requires only one control step to complete, step 4.
   HARDWIRED CONTROL
   To execute instructions, the processor must have some means of generating the control signals needed in
   the proper sequence.
   Two categories:
          Hardwired control
          Microprogrammed control
   A hardwired system can operate at high speed, but it has little flexibility.
   The required control signals are determined by the following information:
          Contents of the control step counter
          Contents of the instruction register
          Contents of the condition code flags
          External input signals, such as MFC and interrupt requests
   The decoder/encoder block in Figure 7.10 is a combinational circuit that generates the required control
   outputs, depending on the state of all its inputs.
   By separating the decoding and encoding functions, we obtain the more detailed block diagram in Figure
   7.11.
          The step decoder provides a separate signal line for each step, or time slot, in the control
           sequence.
          Similarly, the output of the instruction decoder consists of a separate line for each machine
           instruction.
          For any instruction loaded in the IR, one of the output lines INS1 through INSm is set to 1, and all
           other lines are set to 0.
          The input signals to the encoder block are combined to generate the individual control signals Yin,
           PCout, Add, End, and so on.
   Figure 7.11 contains another control signal called RUN.
          When set to 1, RUN causes the counter to be incremented by one at the end of every clock cycle.
          When RUN is equal to 0, the counter stops counting.
          This is needed whenever the WMFC signal is issued, to cause the processor to wait for the reply
           from the memory.
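
   The effect of RUN on the control step counter can be sketched as follows. The model assumes that RUN is 0
   while a step asserting WMFC is still waiting for MFC and 1 otherwise; the function name and the
   stall_cycles parameter are hypothetical.

```python
def step_counter_trace(wmfc_steps, stall_cycles, total_cycles):
    """Return the control step that is active in each clock cycle.

    wmfc_steps:   set of step numbers that assert WMFC.
    stall_cycles: dict mapping each such step to the number of extra
                  cycles that pass before the memory sets MFC to 1.
    """
    step, waited, trace = 1, 0, []
    for _ in range(total_cycles):
        trace.append(step)
        waiting = step in wmfc_steps and waited < stall_cycles[step]
        run = 0 if waiting else 1
        if run:
            step += 1          # RUN = 1: counter advances at the end of the cycle
            waited = 0
        else:
            waited += 1        # RUN = 0: counter holds its value
    return trace

print(step_counter_trace({2}, {2: 2}, total_cycles=6))  # [1, 2, 2, 2, 3, 4]
```

   Step 2 stays active for three clock cycles in this trace, so the control signals of that step remain
   asserted until the memory responds.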
          The control hardware shown in Figure 7.10 or 7.11 can be viewed as a state machine that changes
           from one state to another in every clock cycle.
          The next state depends on the contents of the instruction register, the condition codes, and the
           external inputs.
          The outputs of the state machine are the control signals.
          The sequence of operations carried out by this machine is determined by the wiring of the logic
           elements, hence the name "hardwired".
          A controller that uses this approach can operate at high speed.
          However, it has little flexibility, and the complexity of the instruction set it can implement is
           limited.
   Example to generate Zin control signal
   Consider the execution of the instruction Add (R3), R1 and the execution of an unconditional branch instruction.
   The above diagram shows the control signals to be generated in the sequence. In the form of a logic equation,
   the control signal Zin is given as follows:
                                  Zin = T1 + T6 · ADD + T4 · BR + ...
   This signal goes high during time slot T1 for all instructions, during T6 for an Add instruction, during T4
   for an unconditional branch instruction, and so on.
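
   The description of Zin translates directly into a Boolean function. In this sketch the step decoder output
   is represented by the time slot number t, and the instruction decoder outputs by the flags add and br;
   these parameter names are assumptions, and only the two instructions discussed here are covered.

```python
def z_in(t, add=False, br=False):
    """Zin = T1 + T6.ADD + T4.BR (terms for other instructions omitted)."""
    return bool(t == 1 or (t == 6 and add) or (t == 4 and br))

# Time slots in which Zin is 1 during an Add and during an unconditional branch.
print([t for t in range(1, 8) if z_in(t, add=True)])  # [1, 6]
print([t for t in range(1, 6) if z_in(t, br=True)])   # [1, 4]
```

   Each product term corresponds to one AND gate in the encoder, and the sum corresponds to the OR gate that
   drives the Zin line.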
   Example: Generation of End signal
   The End signal starts a new instruction fetch cycle by resetting the control step counter to its starting
   value
   Consider the following instructions: ADD, Branch unconditionally, Conditional Branch
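
   A possible logic function for End can be derived from the control sequences discussed earlier: the Add
   sequence ends in step 7, the unconditional branch in step 5, and the conditional branch in step 4 when
   N = 0 or in step 5 when N = 1. The sketch below encodes exactly these three cases; the parameter names
   (add, br, brn) are assumptions, and terms for other instructions are omitted.

```python
def end_signal(t, n_flag=0, add=False, br=False, brn=False):
    """End = T7.ADD + T5.BR + (T5.N + T4.N').BRN, for the three instructions above."""
    return bool(
        (t == 7 and add)
        or (t == 5 and br)
        or (brn and ((t == 5 and n_flag == 1) or (t == 4 and n_flag == 0)))
    )

print(end_signal(7, add=True))            # True
print(end_signal(4, n_flag=0, brn=True))  # True
print(end_signal(4, n_flag=1, brn=True))  # False
```

   When End is 1, the control step counter is reset, so the next clock cycle begins time slot T1 of a new
   instruction fetch.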
   A COMPLETE PROCESSOR
   A complete processor can be designed using the structure shown in Figure 7.14.
          This structure has an instruction unit that fetches instructions from an instruction cache or from
           the main memory when the desired instructions are not already in the cache.
          It has separate processing units to deal with integer data and floating-point data.
          A data cache is inserted between these units and the main memory.
          A single cache can be used to store both instructions and data or separate caches can be used for
           instructions and data.
          The processor is connected to the system bus and, hence, to the rest of the computer, by means of
           a bus interface.
          A processor may include several integer or floating-point units to increase the potential for
           concurrent operations.
   MICROPROGRAMMED CONTROL
          Here the control signals are generated by a program similar to a machine language program.
          A control word (CW) is a word whose individual bits represent the various control signals.
          A sequence of CWs corresponding to the control sequence of a machine instruction constitutes the
           microroutine for that instruction.
          The individual control words in this microroutine are referred to as microinstructions.
          Each of the control steps in the control sequence of an instruction defines a unique combination of
           1s and 0s in the CW.
          The CWs corresponding to the 7 steps of Figure 7.6 are shown in Figure 7.15.
          We have assumed that SelectY is represented by Select=0 and Select4 by Select=1.
          The microroutines for all instructions in the instruction set of a computer are stored in a special
           memory called the control store.
          The control unit can generate the control signals for any instruction by sequentially reading the
           CWs of the corresponding microroutine from the control store.
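
   The idea of a control word and a control store can be illustrated with a short Python sketch. The bit
   order chosen for the control word and the two sample microinstructions are assumptions made for the
   example; an actual design fixes its own layout. Select = 1 stands for Select4, as stated above.

```python
# Hypothetical bit order for the control word, leftmost bit first.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

def control_word(active):
    """Encode a set of active signals as a string of 1s and 0s."""
    unknown = set(active) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return "".join("1" if name in active else "0" for name in SIGNALS)

# A two-entry control store holding the first two microinstructions of a fetch.
control_store = [
    control_word({"PCout", "MARin", "Read", "Select", "Add", "Zin"}),
    control_word({"Zout", "PCin", "Yin", "WMFC"}),
]

for address, cw in enumerate(control_store):
    print(address, cw)
# 0 0111000111000000
# 1 1000001000100010
```

   Reading the entries in address order reproduces the control signals of the corresponding steps, which is
   what the control unit does when it steps through a microroutine.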
   Figure 7.16 shows the basic organization of a microprogrammed control unit.
          To read the control words sequentially from the control store, a microprogram counter (µPC) is
           used.
          Every time a new instruction is loaded into the IR, the output of the block labeled "starting address
           generator" is loaded into the µPC.
          The µPC is then automatically incremented by the clock, causing successive microinstructions to
           be read from the control store.
          Hence, the control signals are delivered to various parts of the processor in the correct sequence.
   Microprogram model for BRANCH instruction
   When the control unit has to check the status of condition codes or external inputs, the simple
   organization shown in Figure 7.16 is not sufficient. In this case, conditional branch microinstructions should
   be used. In addition to the branch address, these microinstructions specify which external inputs, flag
   bits, or registers are to be checked before branching.
   The instruction Branch<0 can be implemented by a microroutine such as that shown in Figure 7.17.
          After loading this instruction into IR, a branch microinstruction transfers control to the
           corresponding microroutine, which is assumed to start at location 25 in the control store.
          This address is the output of the starting address generator block in Figure 7.16.
          The microinstruction at location 25 tests the N bit of the condition codes.
          If this bit is equal to 0, a branch takes place to location 0 to fetch a new machine instruction.
          Otherwise, the microinstruction at location 26 is executed to put the branch target address into
           register Z.
          The microinstruction in location 27 loads this address into the PC.
   To support microprogram branching, the organization of the control unit should be modified as shown in
   Figure 7.18.
   The starting address generator block of Figure 7.16 becomes the starting and branch address generator.
   This block loads a new address into the µPC when a microinstruction instructs it to do so.
   To allow implementation of a conditional branch, inputs to this block consist of the external inputs and
   condition codes as well as the contents of the instruction register.
   In this control unit, the µPC is incremented every time a new microinstruction is fetched from the
   microprogram memory, except in the following situations:
          When a new instruction is loaded into the IR, the µPC is loaded with the starting address of the
           microroutine for that instruction.
          When a Branch microinstruction is encountered and the branch condition is satisfied, the µPC is
           loaded with the branch address.
          When an End microinstruction is encountered, the µPC is loaded with the address of the first CW
           in the microroutine for the instruction fetch cycle.
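
   The rule for updating the µPC can be sketched as a single function. The dictionary keys used to describe
   a microinstruction (load_start, branch_to, end) are hypothetical, and the fetch microroutine is assumed
   to start at location 0, as in the Branch<0 example above.

```python
FETCH_START = 0   # assumed address of the first CW of the fetch microroutine

def next_upc(upc, micro, start_address=None, condition_met=False):
    """Return the next value of the microprogram counter.

    micro is a dict describing the current microinstruction:
      'load_start': a new instruction has just been loaded into the IR
      'branch_to':  target address of a Branch microinstruction
      'end':        this is an End microinstruction
    """
    if micro.get("load_start"):
        return start_address           # start of the instruction's microroutine
    if micro.get("end"):
        return FETCH_START             # back to the instruction fetch cycle
    if "branch_to" in micro and condition_met:
        return micro["branch_to"]      # branch condition satisfied
    return upc + 1                     # default: next sequential microinstruction

# Branch<0 microroutine at locations 25 to 27, with N = 1 (branch taken).
n_flag = 1
path = [next_upc(2, {"load_start": True}, start_address=25)]
path.append(next_upc(25, {"branch_to": 0}, condition_met=(n_flag == 0)))
path.append(next_upc(26, {}))
path.append(next_upc(27, {"end": True}))
print(path)  # [25, 26, 27, 0]
```

   With N = 0 the microinstruction at location 25 would instead send the µPC straight back to location 0,
   skipping the two microinstructions that compute and load the branch target.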