Unit III
Processor
• The processing unit
   – Central processing unit (CPU)
   – The term “central” is not as appropriate today – as computers
     often include several processing units
   – Use the term processor
• To achieve high performance, the functional units of a processor
  operate in parallel as much as possible :
   – Pipelined organization: the execution of an instruction is
     started before the execution of the preceding instruction is
     completed
   – Superscalar operation: several instructions are fetched and
     started at the same time
                Fundamental Concepts
• Program :
    – Computing task
    – Series of operations
    – Specified by a sequence of machine-language instructions
• Instruction :
    – The processor fetches one instruction at a time
        • Fetches from successive locations until a branch or jump
    – The instruction address is specified by the program counter (PC)
        • Keeps track of the next instruction
        • After an instruction is fetched, it is updated to point to the next instruction
             – PC = PC + 4 (sequential execution, 4-byte instructions)
             – Branch : PC = target address
    – Instruction register (IR)
        • The fetched instruction is placed here
        • Held until execution is complete
        • Control circuits interpret (decode) it
• Instruction fetch phase
  – Fetching an instruction and loading it into the IR
     1. IR←[[PC]]
     2. PC←[PC] + 4
     3. Carry out the operation specified by the instruction in the IR
        (this step is the execution phase)
• Instruction execution phase
  – Performing the operation specified in the instruction
     1. Read the contents of a given memory location and load them
        into a processor register.
     2. Read data from one or more processor registers.
     3. Perform an arithmetic or logic operation and place the result
        into a processor register.
     4. Store data from a processor register into a given memory
        location.
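The fetch and execution phases above can be sketched as a small loop. This is a toy model, not a MIPS simulator: the tuple instruction format, the operation names (`load`, `add`, `store`, `branch`) and the function `run` are assumptions made for illustration.

```python
# Toy fetch/execute loop (illustrative only; the instruction encoding is hypothetical).
# program maps word-aligned byte addresses to instruction tuples.
def run(program, registers, data_memory, max_steps=100):
    pc = 0
    for _ in range(max_steps):
        if pc not in program:
            break                          # no instruction at this address: stop
        ir = program[pc]                   # 1. IR <- [[PC]]
        pc = pc + 4                        # 2. PC <- [PC] + 4
        op = ir[0]                         # 3. carry out the operation held in the IR
        if op == "load":                   # memory location -> processor register
            _, rd, addr = ir
            registers[rd] = data_memory[addr]
        elif op == "add":                  # read registers, compute, write a register
            _, rd, rs, rt = ir
            registers[rd] = registers[rs] + registers[rt]
        elif op == "store":                # processor register -> memory location
            _, rs, addr = ir
            data_memory[addr] = registers[rs]
        elif op == "branch":               # PC = target address
            _, target = ir
            pc = target
    return pc
```

Each pass through the loop performs steps 1 and 2 (fetch) before step 3 (execute), so a branch simply overwrites the already incremented PC.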
      A Basic MIPS Implementation
• Simple subset, shows most aspects
  – The memory-reference instructions
     • load word (lw) and store word (sw)
  – The arithmetic-logical instructions
     • add, sub, AND, OR, and slt
  – Control transfer
     • branch equal (beq) and jump (j)
   An Overview of the Implementation
• The first two steps are identical for all instruction types
  1. Send the program counter (PC) to the memory that
     contains the code and fetch the instruction from that
     memory
         –   IR←[[PC]]
  2. Read one or two registers
     •   Use fields of the instruction to select the registers
         –   Load word instruction, need to read only one register
         –   Most other instructions require reading two registers
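  3. Perform an ALU operation (all instruction classes except jump)
     •   add – to perform the arithmetic operation
     •   lw – to calculate the memory address
     •   beq – to compare the two registers
  4. The final step differs by instruction class
     •   add – write the ALU result to a register
     •   lw – read data from memory and write it to a register
     •   beq – set the PC to the branch target or to PC + 4, depending on the comparison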
                            CPU Overview
• The overview figure omits two important aspects of instruction execution
  – Multiplexors
  – Control unit
• add : add $s1, $s2, $s3
  – PC → instruction memory address → instruction; PC = PC + 4
  – reg1, reg2 → ALU → reg3
• lw : lw $s1, 20($s2)
  – PC → instruction memory address → instruction; PC = PC + 4
  – reg1, imm → ALU → address → data memory → reg2
• beq : beq $s1, $s2, 2
  – PC → instruction memory address → instruction; PC = PC + 4
  – reg1, reg2 → ALU (subtract)
  – If Zero : PC = incremented PC + 4*2 (offset of 2 words)
                Multiplexers
• Can't just join wires together
  – Use multiplexers
Control
           Building a Datapath
• Datapath
  – Elements that process data and addresses
    in the CPU
     • Registers, ALUs, mux’s, memories, …
           Instruction Fetch
• The PC is a 32-bit register
• An adder increments the PC by 4 for the next instruction
               R-Format Instructions
• The processor’s 32 general-purpose registers are stored in a
  structure called a register file.
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
          Load/Store Instructions
• Read register operands
• Calculate address using 16-bit offset
   – Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
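The address calculation for lw and sw can be sketched as follows. The function names `sign_extend16` and `effective_address` are illustrative assumptions, and 32-bit wraparound is modelled with a mask.

```python
# Sketch of the load/store address calculation (names are hypothetical).
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """ALU add: base register value plus the sign-extended 16-bit offset."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF
```

For `lw $s1, 20($s2)` with `$s2` holding 0x1000, `effective_address(0x1000, 20)` gives 0x1014; an offset field of 0xFFFC is treated as -4.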
              Branch Instructions
• Read register operands
• Compare operands
  – Use ALU, subtract and check Zero output
• Calculate target address
  – Sign-extend displacement
  – Shift left 2 places (word displacement)
  – Add to PC + 4
     • Already calculated by instruction fetch
• Shift left 2: just re-routes wires
• Sign-extend: sign-bit wire replicated
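The branch steps can be sketched in a few lines. This is a minimal model under stated assumptions: `branch_target` and `next_pc` are hypothetical names, and the Zero output is modelled as a 32-bit subtraction result equal to zero.

```python
# Sketch of beq: compare by subtraction, then choose the next PC (names are hypothetical).
def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc, imm16):
    """Sign-extend the displacement, shift left 2 (words to bytes), add to PC + 4."""
    offset = sign_extend16(imm16) << 2
    return (pc + 4 + offset) & 0xFFFFFFFF

def next_pc(pc, rs_val, rt_val, imm16):
    """ALU subtracts the operands; the Zero output selects the branch target."""
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF
```

With `pc = 0x100` and a displacement of 2, a taken branch goes to 0x100 + 4 + 8 = 0x10C, and a branch that is not taken falls through to 0x104.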
       Composing the Elements
• The first-cut datapath executes each instruction in one
  clock cycle
  – Each datapath element can only do one function
    at a time
  – Hence, we need separate instruction and data
    memories
• Use multiplexers where alternate data sources
  are used for different instructions
R-Type/Load/Store Datapath
                        Full Datapath : add,lw,beq
The simple datapath for the core MIPS architecture combines the elements required by different instruction classes.
                A Simple Implementation Scheme
                          Control Unit
• Nine control signals are generated by the main control unit from the opcode of the
  instruction
          • RegDst
          • ALUSrc
          • MemtoReg
          • RegWrite
          • MemRead
          • MemWrite
          • Branch
          • Jump
          • ALUOp (2 bits: ALUOp1, ALUOp0)
• The 4 ALU control bits are generated by the ALU control unit (ALU CU) from the 2-bit ALUOp
  and the function field of the instruction
• Multiple levels of decoding
    • reduce the size of the main control unit
    • can increase the speed of the control unit
Opcode field (6 bits) → Main CU → ALUOp (2 bits) → ALU CU → 4-bit ALU control; the ALU CU also takes the 6-bit function field
        A Simple Implementation Scheme
                   ALU Control
• ALU used for
  – Load/Store: F = add (to compute the memory address), ALUOp = 00
  – Branch: F = subtract (to check whether the register contents are equal), ALUOp = 01
  – R-type: F = AND, OR, add, subtract, or slt (depends on the funct field), ALUOp = 10
             ALU control                 Function
                0000                       AND
                0001                        OR
                0010                        add
                 0110                     subtract
                 0111                 set-on-less-than
                 1100                      NOR
              ALU Control
• With only the 2-bit Operation field, add and subtract share the code 10 (both select the adder)
ALU control       Function
    00             AND
    01              OR
    10              add
    10            subtract
              ALU Control
• A third, leading bit (Bnegate) distinguishes subtract from add
ALU control       Function
   000             AND
   001              OR
   010              add
   110            subtract
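              ALU Control
• NOR by De Morgan's law: a NOR b = (a OR b)' = a' AND b'
  – NOR therefore uses the AND operation with both inputs inverted (control 1100)
• Example with 4-bit values
       A              1101      A'            0010
       B              1001      B'            0110
       A|B            1101      A' & B'       0010
       (A | B)'       0010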
ALU control                     Function
   0000                           AND
   0001                              OR
   0010                              add
   0110                         subtract
   0111                   set-on-less-than
   1100                           NOR
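The De Morgan identity used for NOR can be checked directly on 4-bit values. The helper names `nor_direct` and `nor_demorgan` are assumptions for this sketch.

```python
# 4-bit check that (a OR b)' equals a' AND b' (helper names are hypothetical).
MASK = 0b1111  # keep results to 4 bits

def nor_direct(a, b):
    """Complement of the OR."""
    return ~(a | b) & MASK

def nor_demorgan(a, b):
    """AND of the two complemented inputs, as the ALU does for control 1100."""
    return (~a & MASK) & (~b & MASK)
```

For A = 1101 and B = 1001 both functions return 0010, matching the worked example.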
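            ALU Control
• set-on-less-than (slt) is implemented by subtraction
  – If A − B is negative, then A < B is true: set = 1
  – Otherwise A < B is false: set = 0
  – The set value is the MSB (sign bit) of A − B
• Example (4-bit two's complement): A = 4, B = 5
  – A = 0100, −B = 1011
  – A − B = 1111 (negative), so set = MSB = 1
• Example: A = 5, B = 4
  – A = 0101, −B = 1100
  – A − B = 0001 (positive), so set = MSB = 0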
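The two examples above can be reproduced with a small 4-bit sketch. The function name `slt4` is an assumption, and overflow is ignored, as it is in the examples.

```python
# 4-bit set-on-less-than via subtraction (illustrative; overflow is not handled).
def slt4(a, b):
    neg_b = (~b + 1) & 0xF        # two's complement of b, e.g. 5 -> 1011
    diff = (a + neg_b) & 0xF      # a - b in 4 bits
    return (diff >> 3) & 1        # the MSB (sign bit) is the set value
```

`slt4(4, 5)` computes 0100 + 1011 = 1111 and returns 1; `slt4(5, 4)` computes 0101 + 1100 = 0001 (carry discarded) and returns 0.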
               ALU Control
• The 4-bit ALU control is generated by a small control
  unit from the 6-bit funct field and the 2-bit
  control field ALUOp
                                      ALU Control
•     Assume a 2-bit ALUOp derived from the opcode
       – Only for R-type instructions does the ALU control depend on the funct field
       – Combinational logic derives the ALU control
       – K-map over the 6-bit funct field (64 combinations) with a 4-bit output function
opcode    ALUOp   Operation               Funct     ALU function       ALU control
lw        00      load word               XXXXXX    add                0010
sw        00      store word              XXXXXX    add                0010
beq       01      branch equal            XXXXXX    subtract           0110
R-type    10      add (32)                100000    add                0010
                  subtract (34)           100010    subtract           0110
                  AND (36)                100100    AND                0000
                  OR (37)                 100101    OR                 0001
                  set-on-less-than (42)   101010    set-on-less-than   0111
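The table above can be written as a lookup, which is a behavioral sketch rather than the gate-level circuit. The names `FUNCT_TO_ALU` and `alu_control` are assumptions for illustration.

```python
# Behavioral sketch of the ALU control unit, transcribed from the table (names hypothetical).
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add (32)
    0b100010: 0b0110,  # subtract (34)
    0b100100: 0b0000,  # AND (36)
    0b100101: 0b0001,  # OR (37)
    0b101010: 0b0111,  # set-on-less-than (42)
}

def alu_control(alu_op, funct):
    if alu_op == 0b00:             # lw / sw: add to form the address, funct ignored
        return 0b0010
    if alu_op == 0b01:             # beq: subtract to compare, funct ignored
        return 0b0110
    if alu_op == 0b10:             # R-type: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp 11 is not used")
```

For example, `alu_control(0b10, 0b101010)` returns 0b0111 (slt), while `alu_control(0b00, x)` returns 0b0010 for any funct value.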
                               ALU Control
•   Create a truth table for the interesting combinations of the function code field and the
    ALUOp bits
•   Once the truth table has been constructed, it can be optimized and then turned into
    gates
•   ALUOp = 00 : lw/sw
•   ALUOp = 01 : beq, written X1 in the truth table
•   ALUOp = 10 : R-type, written 1X in the truth table
•   ALUOp = 11 : never generated, so it is a don't-care
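One possible result of that optimization is sketched below. The slides do not give these equations; they are an assumed minimization that relies on the don't-cares (ALUOp 11 unused, funct bits F5 and F4 ignored) and is only valid for the instructions in the table. The function name `alu_control_gates` is hypothetical.

```python
# Gate-level sketch of the ALU control (an assumed minimization, not taken from the slides).
def alu_control_gates(alu_op, funct):
    op1, op0 = (alu_op >> 1) & 1, alu_op & 1
    f0 = funct & 1
    f1 = (funct >> 1) & 1
    f2 = (funct >> 2) & 1
    f3 = (funct >> 3) & 1
    bit3 = 0                          # Ainvert: never set for this subset (no NOR)
    bit2 = op0 | (op1 & f1)           # Bnegate: beq, or R-type subtract / slt
    bit1 = (1 - op1) | (1 - f2)       # adder selected unless R-type AND / OR
    bit0 = op1 & (f3 | f0)            # R-type OR / slt
    return (bit3 << 3) | (bit2 << 2) | (bit1 << 1) | bit0
```

Checking these equations against every row of the ALU control table is a quick way to confirm that the don't-care choices did not change any required output.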
              The Main Control Unit
• Control signals derived from instruction
Format        31:26       25:21   20:16   15:11   10:6    5:0
R-type        0           rs      rt      rd      shamt   funct
Load/Store    35 or 43    rs      rt      address (15:0)
Branch        4           rs      rt      address (15:0)
• opcode (31:26)
• rs (25:21): always read
• rt (20:16): read, except for load
• Register written: rd (15:11) for R-type, rt (20:16) for load
• address (15:0): sign-extended; for a branch, shifted and added to PC+4
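The field layout can be sketched with shifts and masks. The function name `decode` and the dictionary keys are assumptions for illustration; every field is extracted regardless of format, and the control unit decides which ones are meaningful.

```python
# Sketch: extract the fields of a 32-bit MIPS instruction word (names are hypothetical).
def decode(instr):
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31:26
        "rs":     (instr >> 21) & 0x1F,   # bits 25:21
        "rt":     (instr >> 16) & 0x1F,   # bits 20:16
        "rd":     (instr >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  instr & 0x3F,           # bits 5:0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15:0  (load/store/branch)
    }
```

For `add $s1, $s2, $s3` (registers 17, 18, 19) the word has opcode 0, rs = 18, rt = 19, rd = 17 and funct = 32.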
               Datapath With Control
• [Figure: the datapath with all necessary multiplexors and all control lines]
• The effect of each of the seven control signals: when the 1-bit control to a
  two-way multiplexor is asserted, the multiplexor selects the input corresponding to 1.
  Otherwise, if the control is deasserted, the multiplexor selects the 0 input.
• Nine control signals in total, counting Jump and the 2-bit ALUOp control
• [Figures: the datapath in operation for an R-type instruction, a load instruction,
  and a branch-on-equal instruction]
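The main control unit can be sketched as a table indexed by opcode. The slides do not list the signal values; the settings below follow the commonly presented single-cycle MIPS control table and should be read as an assumption, with `None` standing for a don't-care. The names `CONTROL` and `mux2` are hypothetical.

```python
# Sketch of the main control unit as an opcode lookup (values assumed, None = don't care).
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),        # R-type
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),        # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),        # sw
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),        # beq
}

def mux2(select, in0, in1):
    """Two-way multiplexor: an asserted control picks input 1, otherwise input 0."""
    return in1 if select else in0
```

For instance, `mux2(CONTROL[0]["RegDst"], "rt", "rd")` picks `"rd"` as the destination field for an R-type instruction, and the same multiplexor picks `"rt"` for lw.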
           Implementing Jumps
Jump   2           address
       31:26       25:0
• Jump uses a word address
• Update PC with the concatenation of
  – Top 4 bits of the old PC (taken from PC + 4)
  – 26-bit jump address
  – 00
• Need an extra control signal decoded from
  opcode
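The concatenation for the jump target can be sketched with masks and a shift. The function name `jump_target` is an assumption, and the upper bits are taken from PC + 4.

```python
# Sketch of the jump target: top 4 bits of PC + 4, 26-bit address, then 00 (name hypothetical).
def jump_target(pc, addr26):
    upper = (pc + 4) & 0xF0000000            # top 4 bits of the incremented PC
    lower = (addr26 & 0x03FFFFFF) << 2       # word address to byte address (appends 00)
    return upper | lower
```

With the PC at 0x90000000 and a jump address field of 1, the target is 0x90000004: the top 4 bits stay 1001 and the word address 1 becomes byte address 4.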
Datapath With Jumps Added
           Performance Issues
• Longest delay determines clock period
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data
    memory → register file
• Not feasible to vary period for different
  instructions
• Violates design principle
  – Making the common case fast
• We will improve performance by pipelining