Computer Architecture: Nguyễn Trí Thành
11/27/2010
Enhancing Performance with Pipelining
Pipelining
    Start work ASAP! Do not waste time!
    [Figure: laundry timeline from 6 PM to 2 AM for loads A, B, ... in task order.
     Top: not pipelined. Bottom: pipelined.]
    Assume 30 min. for each task (wash, dry, fold, store) and that
    separate tasks use separate hardware and so can be overlapped
    Pipelined vs. Single-Cycle
    Instruction Execution: the Plan
    [Figure: program execution order (in instructions) against time (ns) for
     lw $1, 100($0); lw $2, 200($0); lw $3, 300($0).
     Each instruction passes through instruction fetch, Reg, ALU, data access, Reg.
     Single-cycle: each instruction takes 8 ns and the next one starts 8 ns later.
     Pipelined: each stage takes 2 ns and the next instruction starts 2 ns later.]
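The timing in the figure can be checked with a little arithmetic. The sketch below assumes the figure's numbers (8 ns per single-cycle instruction, 2 ns per stage, 5 stages); the function names are illustrative only.

```python
def single_cycle_time(n_instructions, cycle_ns=8):
    """Total time when each instruction occupies one long clock cycle."""
    return n_instructions * cycle_ns


def pipelined_time(n_instructions, stage_ns=2, n_stages=5):
    """Total time when a new instruction starts every stage_ns.

    The first instruction needs n_stages slots; each later one adds one slot.
    """
    return (n_stages + n_instructions - 1) * stage_ns


if __name__ == "__main__":
    for n in (1, 3, 1000):
        print(n, single_cycle_time(n), pipelined_time(n))
```

For the three lw instructions this gives 24 ns single-cycle against 14 ns pipelined. For a single instruction the pipelined version is slower (10 ns against 8 ns), because every stage is stretched to the length of the slowest one.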
Pipelining: Keep in Mind
         Pipelining does not reduce the latency of a single
          task; it increases the throughput of the entire workload
         Pipeline rate is limited by the longest stage
                potential speedup = number of pipe stages
                unbalanced lengths of pipe stages reduce
                 speedup
         Time to fill the pipeline and time to drain it (when
          there is slack in the pipeline) reduce
          speedup
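These three points can be put into one small formula. The sketch below is a simplified model, not taken from the slides: it assumes all stages advance in lockstep at the pace of the slowest stage, and pipeline_speedup is a hypothetical helper name.

```python
def pipeline_speedup(stage_times, n_tasks):
    """Speedup of a lockstep pipeline over running tasks one after another."""
    sequential = n_tasks * sum(stage_times)
    # The hand-off interval is set by the slowest stage.
    slot = max(stage_times)
    # Filling and draining cost (number of stages - 1) extra slots.
    pipelined = (len(stage_times) + n_tasks - 1) * slot
    return sequential / pipelined


if __name__ == "__main__":
    print(pipeline_speedup([30, 30, 30, 30], 4))        # few tasks: fill/drain dominates
    print(pipeline_speedup([30, 30, 30, 30], 10_000))   # approaches the number of stages
    print(pipeline_speedup([20, 20, 60, 20], 10_000))   # unbalanced: limited by the 60
```

With balanced stages and many tasks the speedup approaches the number of stages (4 here). With the unbalanced stages it cannot exceed 120/60 = 2, however many tasks there are.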
Example Problem
       Problem: for the laundry, fill in the following table when
        1.   the stage lengths are 30, 30, 30, 30 min., resp.
        2.   the stage lengths are 20, 20, 60, 20 min., resp.
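One way to work the two cases is to track when each stage becomes free. The sketch below is only an illustration: it assumes four loads, one load per stage at a time, and that a finished load may wait until the next stage is free. The function names are made up for this example.

```python
def laundry_finish_time(stage_minutes, n_loads):
    """Minutes until the last load leaves the last stage (pipelined)."""
    free_at = [0] * len(stage_minutes)  # free_at[s]: when stage s is next free
    for _ in range(n_loads):
        ready = 0  # when the current load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[s])
            ready = start + minutes
            free_at[s] = ready
    return free_at[-1]


def sequential_time(stage_minutes, n_loads):
    """Minutes when each load is finished completely before the next starts."""
    return n_loads * sum(stage_minutes)


if __name__ == "__main__":
    for stages in ([30, 30, 30, 30], [20, 20, 60, 20]):
        print(stages, sequential_time(stages, 4), laundry_finish_time(stages, 4))
```

Both cases take 480 min. without pipelining. Pipelined, the balanced stages finish four loads in 210 min., while the unbalanced stages need 300 min. because every load queues for the 60 min. stage.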
Pipelining MIPS
    What makes it hard?
            structural hazards: different instructions, at different stages
             in the pipeline, want to use the same hardware resource
            control hazards: the next instruction to put into the pipeline
             depends on the outcome of a previous branch instruction
             already in the pipeline
            data hazards: an instruction in the pipeline requires data to
             be computed by a previous instruction still in the pipeline
    [Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0);
     lw $3, 300($0); lw $4, 400($0), started 2 ns apart. Each instruction passes
     through instruction fetch, Reg, ALU, data access, Reg.
     Hazard if single memory: the data access of the first lw and the
     instruction fetch of the fourth lw fall in the same 2 ns slot.]
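The single-memory conflict in the figure can be found mechanically. The sketch below assumes one instruction is fetched per cycle with no stalls, that every instruction uses memory in IF, and that only lw and sw use it in MEM. The names STAGES and memory_conflicts are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(program):
    """List (cycle, fetched_instr, accessing_instr) clashes on one shared memory.

    program: instruction strings, issued one per cycle; cycles count from 1.
    """
    conflicts = []
    for i, instr in enumerate(program):
        if instr.split()[0] not in ("lw", "sw"):
            continue
        mem_cycle = i + 1 + STAGES.index("MEM")
        j = mem_cycle - 1  # index of the instruction fetched in that same cycle
        if j < len(program):
            conflicts.append((mem_cycle, program[j], instr))
    return conflicts


if __name__ == "__main__":
    loads = ["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)", "lw $4, 400($0)"]
    for cycle, fetched, accessing in memory_conflicts(loads):
        print(f"cycle {cycle}: fetch of '{fetched}' clashes with data access of '{accessing}'")
```

For the four loads it reports a single clash in cycle 4 (6 to 8 ns at 2 ns per cycle). Separate instruction and data memories remove it.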
    [Figure: pipeline stall on a branch. add $4, $5, $6 and beq $1, $2, 40
     start 2 ns apart; lw $3, 300($0) is fetched after one bubble, 4 ns after
     the beq.
     Note that branch outcome is computed in ID stage with added hardware (later…)]
Control Hazards
      Solution 2: Predict branch outcome
              e.g., predict branch-not-taken:
    [Figure: prediction success. add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0)
     start 2 ns apart with no bubble.]
    [Figure: prediction failure: undo (= flush) lw. add $4, $5, $6 and
     beq $1, $2, 40 start 2 ns apart; a row of bubbles replaces the lw;
     or $7, $8, $9 is fetched 4 ns after the beq.]
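The cost of the two approaches can be compared by counting bubbles. The sketch below assumes a one-cycle penalty (a 4 ns gap instead of 2 ns, with the branch resolved in ID); the function and policy names are hypothetical.

```python
def branch_penalty_cycles(branch_outcomes, policy):
    """Bubble cycles caused by a sequence of branches.

    branch_outcomes: iterable of booleans, True if the branch was taken.
    policy: "stall" (always wait one cycle) or "predict_not_taken".
    """
    outcomes = list(branch_outcomes)
    if policy == "stall":
        return len(outcomes)  # every branch costs one bubble
    if policy == "predict_not_taken":
        return sum(1 for taken in outcomes if taken)  # only wrong guesses cost
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    outcomes = [False, True, False, False]
    print(branch_penalty_cycles(outcomes, "stall"))
    print(branch_penalty_cycles(outcomes, "predict_not_taken"))
```

Predicting not-taken is never worse than always stalling in this model, and it is free whenever the branch falls through.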
    Control Hazards
    Solution 3: Delayed branch. Always execute the sequentially next
     statement, so the branch takes effect after a one-instruction delay.
     It is the compiler's job to find a statement that can be put in the slot
     and that is independent of the branch outcome
          MIPS does this, but it is an option in SPIM (Simulator -> Settings)
    [Figure: delayed branch. Program execution order against time (ns);
     lw $3, 300($0) starts 2 ns after the instruction before it.]
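A compiler needs some test for whether an instruction may fill the delay slot. The sketch below shows one simplified rule only: an instruction taken from before the branch can move into the slot if it does not write a register the branch compares. It assumes the candidate's first operand is its destination; both helper names are made up.

```python
import re


def registers(instr):
    """All register names (e.g. '$4') in an instruction string, in order."""
    return re.findall(r"\$\w+", instr)


def can_fill_delay_slot(candidate, branch):
    """True if `candidate`, taken from before `branch`, may move into its delay slot.

    Simplified rule: the candidate must not write a register the branch reads.
    """
    dest = registers(candidate)[0]
    return dest not in registers(branch)


if __name__ == "__main__":
    print(can_fill_delay_slot("add $4, $5, $6", "beq $1, $2, 40"))
    print(can_fill_delay_slot("add $1, $5, $6", "beq $1, $2, 40"))
```

Here add $4, $5, $6 can safely follow beq $1, $2, 40, since it executes whichever way the branch goes and does not change $1 or $2.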
    [Figure: instruction pipeline diagram for add $s0, $t0, $t1 with stages
     IF, ID, EX, MEM, WB at 2 ns each; shade indicates use (left = write,
     right = read).]
    [Figure: add $s0, $t0, $t1 followed one stage later by sub $t2, $s0, $t3.
     Without forwarding (blue line), data has to go back in time;
     with forwarding (red line), data is available in time.]
Data Hazards
   Forwarding may not be enough
        e.g., if an R-type instruction following a load uses the result of the load –
         called load-use data hazard
    [Figure: lw $s0, 20($t1) followed immediately by sub $t2, $s0, $t3.
     Without a stall it is impossible to provide input to the sub instruction
     in time.]
    [Figure: lw $s0, 20($t1), then a bubble, then the sub.
     With a one-stage stall, forwarding can get the data to the sub
     instruction in time.]
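The number of bubbles follows from comparing when a value is ready with when it is needed. The sketch below assumes the five-stage timing of the diagrams, with the producer's IF in cycle 1, and a register file that is written in the first half of a cycle and read in the second half. The function name is illustrative.

```python
def stalls_needed(producer, distance=1, forwarding=True):
    """Bubbles needed between a producer and a dependent instruction.

    producer: "alu" (result ready after EX) or "load" (ready after MEM).
    distance: how many instructions later the consumer is issued (1 = next).
    Cycles count from the producer's IF = 1, so EX = 3, MEM = 4, WB = 5.
    """
    if forwarding:
        ready_after = {"alu": 3, "load": 4}[producer]  # cycle that produces the value
        needed_in = 1 + distance + 2                   # consumer's EX cycle
        return max(0, ready_after + 1 - needed_in)
    # Without forwarding the consumer must read the register file in ID,
    # no earlier than the producer's WB cycle.
    written_in = 5
    needed_in = 1 + distance + 1                       # consumer's ID cycle
    return max(0, written_in - needed_in)


if __name__ == "__main__":
    print(stalls_needed("alu"))                    # add then sub, with forwarding
    print(stalls_needed("load"))                   # lw then sub: load-use hazard
    print(stalls_needed("alu", forwarding=False))  # add then sub, no forwarding
```

Under these assumptions an add followed by a dependent sub needs no bubble with forwarding and two without. A lw followed by a dependent sub needs one bubble even with forwarding.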
 Reordered code (the two sw instructions are interchanged):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
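The benefit of the reordering can be checked by counting load-use stalls. The slide shows only the reordered version, so the sketch assumes the code before reordering had the two sw instructions in the opposite order. It also uses a simplified model in which any instruction reading a register loaded by the lw just before it stalls one cycle. The helper names are hypothetical.

```python
import re


def sources(instr):
    """Registers an instruction reads (simplified MIPS subset)."""
    regs = re.findall(r"\$\w+", instr)
    # sw and beq only read registers; other instructions write their first operand.
    return regs if instr.split()[0] in ("sw", "beq") else regs[1:]


def load_use_stalls(program):
    """Count instructions that read a register loaded by the lw right before them."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        if prev.split()[0] != "lw":
            continue
        loaded = re.findall(r"\$\w+", prev)[0]
        if loaded in sources(curr):
            stalls += 1
    return stalls


if __name__ == "__main__":
    assumed_original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
    reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
    print(load_use_stalls(assumed_original), load_use_stalls(reordered))
```

In the assumed original order the first sw reads $t2 right after it is loaded, giving one stall. After the interchange an independent instruction sits between them and the count drops to zero.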
    Pipelined Datapath
           We now move to actually building a pipelined datapath
           First recall the 5 steps in instruction execution
       1.       Instruction Fetch & PC Increment (IF)
       2.       Instruction Decode and Register Read (ID)
       3.       Execution or calculate address (EX)
       4.       Memory access (MEM)
       5.       Write result into register (WB)
           Review: single-cycle processor
               all 5 steps done in a single clock cycle
               dedicated hardware required for each step
           What happens if we break the execution into multiple cycles, but keep
            the extra hardware?
Review - Single-Cycle Datapath
    [Figure: single-cycle datapath divided into "steps": IF (Instruction Fetch),
     ID (Instruction Decode), EX (Execute / Address Calc.), MEM (Memory Access),
     WB (Write Back). Components: PC, adder for PC + 4, adder for the branch
     target (offset <<2), instruction memory (ADDR, RD), register file (RN1,
     RN2, WN, WD, RD1, RD2), 16-to-32-bit sign extend, ALU with Zero output,
     data memory (ADDR, WD, RD), multiplexers.]
Pipelined Datapath – Key Idea
    What happens if we break the execution into
     multiple cycles, but keep the extra hardware?
            Answer: We may be able to start executing a new
             instruction at each clock cycle - pipelining
    …but we shall need extra registers to hold data
     between cycles – pipeline registers
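The key idea can be made concrete by listing which instruction sits in which stage during each clock cycle. The sketch below assumes one instruction enters IF per cycle and nothing stalls; the four-instruction sequence and the function name are only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_occupancy(program, cycle):
    """Map each stage to the instruction in it during `cycle` (cycles start at 1)."""
    occupancy = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s  # index of the instruction that has reached stage s
        occupancy[stage] = program[i] if 0 <= i < len(program) else None
    return occupancy


if __name__ == "__main__":
    program = ["LW", "SW", "ADD", "SUB"]
    print("cycle  " + "  ".join(f"{s:<4}" for s in STAGES))
    for cycle in range(1, len(program) + len(STAGES)):
        row = stage_occupancy(program, cycle)
        print(f"{cycle:<5}  " + "  ".join(f"{(row[s] or '-'):<4}" for s in STAGES))
```

Each row is what the pipeline registers have to separate: in cycle 4, for example, four different instructions are in flight at once, each needing its own copy of the data it carries between stages.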
Pipelined Datapath
    [Figure: pipelined datapath. The single-cycle datapath (PC, adders,
     instruction memory, register file, sign extend, ALU, data memory,
     multiplexers) is divided by four pipeline registers of, left to right,
     64 bits, 128 bits, 97 bits, and 64 bits.]
    [Figure: pipelined datapath with a 5-bit field carried through the
     pipeline back to the register file's WN input. The pipeline registers are
     now, left to right, 64 bits, 133 bits, 102 bits, and 69 bits.]
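The pipeline register widths can be reproduced by adding up the fields each register has to hold. In the sketch below the register names (IF/ID and so on) and the breakdown into fields are assumptions chosen to match the totals in the diagrams; the slides give only the totals.

```python
# Field widths in bits. The grouping into fields is an assumption that
# reproduces the totals shown on the slides.
PIPELINE_REGISTERS = {
    "IF/ID":  {"pc_plus_4": 32, "instruction": 32},
    "ID/EX":  {"pc_plus_4": 32, "read_data_1": 32, "read_data_2": 32, "sign_ext_imm": 32},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32, "read_data_2": 32},
    "MEM/WB": {"mem_read_data": 32, "alu_result": 32},
}
WRITE_REG_BITS = 5  # register number of the destination register


def width(name, carry_write_reg=False):
    """Total bits in a pipeline register, optionally carrying the write register number."""
    bits = sum(PIPELINE_REGISTERS[name].values())
    # IF/ID already holds the whole instruction, so it needs no extra field.
    if carry_write_reg and name != "IF/ID":
        bits += WRITE_REG_BITS
    return bits


if __name__ == "__main__":
    for name in PIPELINE_REGISTERS:
        print(name, width(name), width(name, carry_write_reg=True))
```

The extra 5 bits matter because, by the time an instruction reaches write back, a later instruction already occupies the decode stage. The destination register number therefore has to travel with the instruction.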
 Single-Clock-Cycle Diagram:
 Clock Cycle 1
        LW
                  ADD
                                                                           ADD
              4
                                                                    <<2
PC
       ADDR         RD                   RN1          RD1
                         32          5
                                                                          ALU            Zero
         Instruction                     RN2
                                     5
           Memory                              Register
                                         WN      File RD2
                                     5
                                                                     M
                                         WD                          U                    ADDR
                                                                     X
                                                                                                 Data
                                                      E                                         Memory   RD         M
                                                                                                                    U
                                                 16   X   32                                                        X
                                                      T                                   WD
                                                      N
                                 5
                                                      D
 11/27/2010                                                                                                    24
 Single-Clock-Cycle Diagram:
 Clock Cycle 2
        SW                                LW
                  ADD
                                                                           ADD
              4
                                                                    <<2
PC
       ADDR         RD                   RN1          RD1
                         32          5
                                                                          ALU            Zero
         Instruction                     RN2
                                     5
           Memory                              Register
                                         WN      File RD2
                                     5
                                                                     M
                                         WD                          U                    ADDR
                                                                     X
                                                                                                 Data
                                                      E                                         Memory   RD         M
                                                                                                                    U
                                                 16   X   32                                                        X
                                                      T                                   WD
                                                      N
                                 5
                                                      D
 11/27/2010                                                                                                    25
 Single-Clock-Cycle Diagram:
 Clock Cycle 3
       ADD                                SW                             LW
                  ADD
                                                                               ADD
              4
                                                                    <<2
PC
       ADDR         RD                   RN1          RD1
                         32          5
                                                                              ALU            Zero
         Instruction                     RN2
                                     5
           Memory                              Register
                                         WN      File RD2
                                     5
                                                                     M
                                         WD                          U                        ADDR
                                                                     X
                                                                                                     Data
                                                      E                                             Memory   RD         M
                                                                                                                        U
                                                 16   X   32                                                            X
                                                      T                                       WD
                                                      N
                                 5
                                                      D
 11/27/2010                                                                                                        26
 Single-Clock-Cycle Diagram:
 Clock Cycle 4
[Figure: pipelined datapath snapshot. Stage occupancy: IF: SUB, ID: ADD, EX: SW, MEM: LW]
 Single-Clock-Cycle Diagram:
 Clock Cycle 5
[Figure: pipelined datapath snapshot. Stage occupancy: ID: SUB, EX: ADD, MEM: SW, WB: LW]
 Single-Clock-Cycle Diagram:
 Clock Cycle 6
[Figure: pipelined datapath snapshot. Stage occupancy: EX: SUB, MEM: ADD, WB: SW]
 Single-Clock-Cycle Diagram:
 Clock Cycle 7
[Figure: pipelined datapath snapshot. Stage occupancy: MEM: SUB, WB: ADD]
 Single-Clock-Cycle Diagram:
 Clock Cycle 8
[Figure: pipelined datapath snapshot. Stage occupancy: WB: SUB]
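The sequence of single-clock-cycle snapshots can be reproduced with a few lines of Python. This is only a sketch: the program order LW, SW, ADD, SUB is read off the snapshots, the stage names IF/ID/EX/MEM/WB are the usual five-stage labels, and the function name `snapshot` is made up for illustration. It assumes one instruction enters the pipeline per clock cycle and nothing stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = ["LW", "SW", "ADD", "SUB"]  # program order assumed from the snapshots


def snapshot(cycle, program=PROGRAM):
    """Return {stage: instruction or None} for a 1-based clock cycle.

    Instruction i (0-based) enters IF in cycle i + 1 and moves one stage
    to the right every cycle, so stage s holds instruction cycle - 1 - s.
    """
    occupancy = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s
        occupancy[stage] = program[i] if 0 <= i < len(program) else None
    return occupancy


for cc in range(3, 9):
    busy = {stage: ins for stage, ins in snapshot(cc).items() if ins}
    print(f"Clock cycle {cc}: {busy}")
```

Each snapshot is one vertical slice through the pipeline: it shows every stage at a single instant, not the life of one instruction.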
  Alternative View –
  Multiple-Clock-Cycle Diagram
Time axis (clock cycles):
                      CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8
lw $t0, 10($t1)       IM     REG    ALU    DM     REG
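A multiple-clock-cycle diagram for several instructions can be generated the same way. The sketch below reuses the resource labels from the lw row (IM, REG, ALU, DM, REG). The three instructions after lw are placeholders without operands, and the helper names `multi_cycle_rows` and `render` are hypothetical. It assumes each instruction starts one clock cycle after the previous one, with no stalls.

```python
STAGE_LABELS = ["IM", "REG", "ALU", "DM", "REG"]  # resource used in each stage


def multi_cycle_rows(instructions, n_cycles=8):
    """One row per instruction; row i is shifted right by i clock cycles."""
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * n_cycles
        for s, label in enumerate(STAGE_LABELS):
            if i + s < n_cycles:
                cells[i + s] = label
        rows.append((instr, cells))
    return rows


def render(rows, n_cycles=8):
    """Format the rows as a fixed-width text diagram."""
    header = " " * 20 + "".join(f"CC {c:<4}" for c in range(1, n_cycles + 1))
    lines = [header.rstrip()]
    for instr, cells in rows:
        body = "".join(f"{cell:<7}" for cell in cells)
        lines.append((f"{instr:<20}" + body).rstrip())
    return "\n".join(lines)


rows = multi_cycle_rows(["lw $t0, 10($t1)", "sw", "add", "sub"])
print(render(rows))
```

Reading a row left to right follows one instruction through time. Reading a column top to bottom gives the single-clock-cycle view of that cycle.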
Notes
   One significant difference between the multicycle and pipelined
    implementations in the execution of an R-type instruction:
     register write-back for the R-type instruction happens in the 5th
       (last, write-back) pipeline stage, vs. the 4th stage in the
       multicycle implementation. Why?
     think of structural hazards when writing to the register file…
   Worth repeating: the essential difference between the pipelined
    and multicycle implementations is the insertion of pipeline
    registers to decouple the 5 stages
   The CPI of an ideal pipeline (no stalls) is 1. Why?
   The RaVi Architecture Visualization Project of Dortmund U. has
    pipeline simulations; see the link on our Additional Resources page
   As we develop control for the pipeline, keep in mind that the text
    does not consider jump; it should not be too hard to implement!
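The two "Why?" questions above can be explored numerically. The sketch below assumes a 5-stage pipeline that accepts one instruction per clock cycle and never stalls. The function names are made up for illustration.

```python
def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles to finish n instructions: fill the pipe once, then one per cycle."""
    return n_stages + n_instructions - 1


def cpi(n_instructions, n_stages=5):
    """Average clock cycles per instruction for an ideal (stall-free) pipeline."""
    return pipelined_cycles(n_instructions, n_stages) / n_instructions


def write_cycle(issue_cycle, write_stage):
    """Clock cycle in which an instruction issued in issue_cycle writes a register."""
    return issue_cycle + write_stage - 1


print(pipelined_cycles(4))   # four instructions occupy the pipeline for 8 cycles
print(cpi(1_000_000))        # approaches 1 as the instruction count grows

# If an R-type instruction wrote back in stage 4 while lw writes in stage 5,
# an lw issued in cycle 1 and an R-type issued in cycle 2 would both need
# the single register-file write port in cycle 5: a structural hazard.
print(write_cycle(1, 5), write_cycle(2, 4))
```

The fill time of n_stages - 1 cycles is paid once, so its share of the total shrinks as the program gets longer. Making every instruction write back in stage 5 removes the write-port clash.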
Recall Single-Cycle Control –
the Datapath
[Figure: single-cycle datapath with control. The Control unit decodes Instruction [31–26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc selects the next PC through a mux; Instruction [5–0] is the funct field.]
Recall Single-Cycle – ALU Control
    Instruction  ALUOp  Instruction       Funct   Desired ALU       ALU control
    opcode              operation         field   action            input
    LW           00     load word         xxxxxx  add               010
    SW           00     store word        xxxxxx  add               010
    Branch eq    01     branch eq         xxxxxx  subtract          110
    R-type       10     add               100000  add               010
    R-type       10     subtract          100010  subtract          110
    R-type       10     AND               100100  and               000
    R-type       10     OR                100101  or                001
    R-type       10     set on less than  101010  set on less than  111
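The ALU control table translates directly into a lookup. The following is a sketch, not the hardware implementation: bit fields are written as strings for readability, and the names `alu_control` and `R_TYPE_FUNCT` are hypothetical.

```python
# funct field -> ALU control input, used only when ALUOp is 10 (R-type)
R_TYPE_FUNCT = {
    "100000": "010",  # add
    "100010": "110",  # subtract
    "100100": "000",  # AND
    "100101": "001",  # OR
    "101010": "111",  # set on less than
}


def alu_control(alu_op, funct="xxxxxx"):
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == "00":        # lw / sw: add to form the memory address
        return "010"
    if alu_op == "01":        # branch equal: subtract and test Zero
        return "110"
    if alu_op == "10":        # R-type: the funct field decides
        return R_TYPE_FUNCT[funct]
    raise ValueError(f"unsupported ALUOp {alu_op!r}")


print(alu_control("00"))            # lw or sw
print(alu_control("10", "101010"))  # set on less than
```

The funct field is a don't-care for loads, stores, and branches, which is why the function ignores it unless ALUOp is 10.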
 Signal name   Effect when deasserted (0)                          Effect when asserted (1)
 RegDst        The register destination number for the Write       The register destination number for the Write
               register comes from the rt field (bits 20-16)       register comes from the rd field (bits 15-11)
 RegWrite      None                                                The register on the Write register input is written
                                                                   with the value on the Write data input
 ALUSrc        The second ALU operand comes from the second        The second ALU operand is the sign-extended,
               register file output (Read data 2)                  lower 16 bits of the instruction
 PCSrc         The PC is replaced by the output of the adder       The PC is replaced by the output of the adder
               that computes the value of PC + 4                   that computes the branch target
 MemRead       None                                                Data memory contents designated by the address
                                                                   input are put on the Read data output
 MemWrite      None                                                Data memory contents designated by the address
                                                                   input are replaced by the value of the Write data input
 MemtoReg      The value fed to the register Write data input      The value fed to the register Write data input
               comes from the ALU                                  comes from the data memory
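The effect of these signals can be sketched in Python. The per-instruction settings below are the usual textbook values for the single-cycle MIPS subset (R-type, lw, sw, beq). They are not listed on this slide, so treat them as an assumption. PCSrc is modeled as Branch AND Zero, and the helper names are made up for illustration.

```python
# Control settings per instruction class; None marks a don't-care.
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp="10"),
    "lw":     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp="00"),
    "sw":     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp="00"),
    "beq":    dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp="01"),
}


def write_register(instruction, reg_dst):
    """RegDst mux: pick rt (bits 20-16) when 0, rd (bits 15-11) when 1."""
    rt = (instruction >> 16) & 0x1F
    rd = (instruction >> 11) & 0x1F
    return rd if reg_dst else rt


def next_pc(pc, branch_target, branch, zero):
    """PCSrc mux: take the branch target only if Branch and Zero are both set."""
    pc_src = bool(branch and zero)
    return branch_target if pc_src else pc + 4


# add $14, $8, $9 -> opcode 0, rs = 8, rt = 9, rd = 14, funct = 0x20
ADD_WORD = (8 << 21) | (9 << 16) | (14 << 11) | 0x20
print(write_register(ADD_WORD, CONTROL["R-type"]["RegDst"]))  # destination is rd
```

For sw and beq, RegWrite is 0, so the value chosen by the RegDst and MemtoReg muxes is never used; that is why those entries are don't-cares.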
Pipelined Datapath with Control I
[Figure: pipelined datapath with the control signals marked, including PCSrc, Branch, RegWrite, MemWrite, and RegDst.]
Pipeline Control Signals
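[Figure: the Control unit decodes the instruction and produces three groups of control signals, EX, M, and WB, which are carried along in the pipeline registers.]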
        Note: The 6-bit funct field of the instruction required in the EX stage
         to generate ALU control can be retrieved as the 6 least significant
         bits of the immediate field which is sign-extended and passed from
         the IF/ID register to the ID/EX register
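The note can be checked with a little bit arithmetic. Sign extension only changes the upper 16 bits, so the low 6 bits of the extended immediate are still instruction bits [5–0], i.e. the funct field. This is a sketch with made-up helper names; the two instruction words are R-type sub encodings built by hand.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a 32-bit value (kept as a Python int)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def funct_from_extended(imm32):
    """Recover the funct field from the sign-extended immediate in ID/EX."""
    return imm32 & 0x3F


def r_type(rs, rt, rd, funct):
    """Encode an R-type instruction with opcode 0 and shamt 0."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


SUB = 0x22  # funct 100010
for word in (r_type(2, 3, 11, SUB),    # sub $11, $2, $3: bit 15 is 0
             r_type(2, 3, 17, SUB)):   # sub $17, $2, $3: bit 15 is 1
    extended = sign_extend16(word & 0xFFFF)
    print(f"{funct_from_extended(extended):06b}")
```

Both cases give 100010, whether the extension filled the upper half with 0s or 1s, so no separate funct path through ID/EX is needed.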
Pipelined Datapath with Control II
[Figure: pipelined datapath with control. The Control unit in the ID stage writes the WB, M, and EX control fields into ID/EX; M and WB are passed on to EX/MEM, and WB to MEM/WB. Control signals shown: PCSrc, RegWrite, Branch, MemWrite, ALUSrc, MemtoReg, MemRead.]
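The way the control fields travel through the pipeline registers can be sketched as follows. The bit order inside each group (EX: RegDst, ALUOp1, ALUOp0, ALUSrc; M: Branch, MemRead, MemWrite; WB: RegWrite, MemtoReg) is the usual textbook ordering and is an assumption here, as are the lw signal values and all function names.

```python
EX_FIELDS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
M_FIELDS = ("Branch", "MemRead", "MemWrite")
WB_FIELDS = ("RegWrite", "MemtoReg")

# Assumed control settings for lw.
LW = dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1,
          Branch=0, MemRead=1, MemWrite=0,
          RegWrite=1, MemtoReg=1)


def pack(signals, fields):
    """Concatenate the named 1-bit signals into a bit string."""
    return "".join(str(signals[name]) for name in fields)


def decode(signals):
    """ID stage: control fields written into the ID/EX register."""
    return {"WB": pack(signals, WB_FIELDS),
            "M": pack(signals, M_FIELDS),
            "EX": pack(signals, EX_FIELDS)}


def to_ex_mem(id_ex):
    """EX stage consumes the EX field; M and WB move on to EX/MEM."""
    return {"WB": id_ex["WB"], "M": id_ex["M"]}


def to_mem_wb(ex_mem):
    """MEM stage consumes the M field; only WB moves on to MEM/WB."""
    return {"WB": ex_mem["WB"]}


id_ex = decode(LW)
ex_mem = to_ex_mem(id_ex)
mem_wb = to_mem_wb(ex_mem)
print(id_ex, ex_mem, mem_wb)
```

Each stage uses its own field and forwards only what later stages still need, so the pipeline registers get narrower toward the write-back end.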
Pipelined Execution and Control
Instruction sequence:
     lw      $10,   20($1)
     add     $14,   $8, $9
[Figure: pipelined datapath with control, Clock 1. The control fields held in the ID/EX, EX/MEM, and MEM/WB pipeline registers are all 0.]
[Figure: pipelined datapath with control. IF: sub $11, $2, $3; ID: lw $10, 20($1); EX: before<1>; MEM: before<2>; WB: before<3>. The Control unit decodes lw and outputs WB = 11 and M = 010; the control fields already in the pipeline registers are still 0.]
Instruction
                                                                                      Instruction
                                                                                                                Sign
                                                                                                               extend
                                                                                                                           20                              ALU
                                                                                                                                                          control
                                                                                                                                                                                                                  MemRead
     lw                                                                          10   [20– 16]                             10
                                                                                                                                                   0             ALUOp
                             Clock cycle 2
                                                                                                                                                   M
                                                                                      Instruction                                                   u
                                                                                 X    [15– 11]                                X                     x
                                                                                                                                                   1
11/27/2010                               Clock 2                                                                                                        RegDst                                                                                  42
Pipelined Execution and Control

Instruction sequence:
    lw   $10, 20($1)
    sub  $11, $2, $3
    and  $12, $4, $5
    or   $13, $6, $7
    add  $14, $8, $9

Clock cycle 3
    IF: and $12, $4, $5    ID: sub $11, $2, $3    EX: lw $10, . . .    MEM: before<1>    WB: before<2>
[Figure: pipelined datapath with control. ID reads $2 and $3 for sub and passes 11 from Instruction [15-11] forward; EX holds $1, the sign-extended offset 20, and write-register number 10 for lw.]

Clock cycle 4
    IF: or $13, $6, $7    ID: and $12, $4, $5    EX: sub $11, . . .    MEM: lw $10, . . .    WB: before<1>
[Figure: pipelined datapath with control. ID reads $4 and $5 for and and passes 12 forward; EX holds $2, $3 and write-register number 11 for sub; MEM holds write-register number 10 for lw.]
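The stage labels on these slides follow a simple rule: the instruction at position i in the sequence enters IF in clock cycle i + 1 and then advances one stage per cycle. The following is a minimal sketch of that rule, assuming an ideal five-stage pipeline with no stalls or hazards; the names PROGRAM, STAGES, and stage_contents are hypothetical and chosen only for illustration. Slots outside the sequence are labelled before<n> and after<n>, as on the slides.

```python
# Illustrative sketch only: an ideal 5-stage pipeline with no stalls or hazards.
# PROGRAM, STAGES and stage_contents are hypothetical names, not from the slides.
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_contents(clock, program=PROGRAM):
    """Return {stage: instruction label} for a 1-based clock cycle."""
    contents = {}
    for depth, stage in enumerate(STAGES):
        # Instruction number `index` (0-based) is in IF during cycle index + 1,
        # so `depth` stages later it sits in STAGES[depth].
        index = clock - 1 - depth
        if index < 0:
            label = f"before<{-index}>"          # slot ahead of the sequence
        elif index >= len(program):
            label = f"after<{index - len(program) + 1}>"  # slot behind it
        else:
            label = program[index]
        contents[stage] = label
    return contents


# Print the stage occupancy for every cycle in which the sequence is in flight.
for clock in range(1, len(PROGRAM) + len(STAGES)):
    row = stage_contents(clock)
    print(f"Clock {clock}: " + "   ".join(f"{s}: {row[s]}" for s in STAGES))
```

For clock cycle 4, stage_contents(4) places or in IF, and in ID, sub in EX, lw in MEM, and before<1> in WB, which agrees with the labels on the clock cycle 4 figure.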
Pipelined Execution and Control

Instruction sequence:
    lw   $10, 20($1)
    sub  $11, $2, $3
    and  $12, $4, $5
    or   $13, $6, $7
    add  $14, $8, $9

Clock cycle 5
    IF: add $14, $8, $9    ID: or $13, $6, $7    EX: and $12, . . .    MEM: sub $11, . . .    WB: lw $10, . . .
[Figure: pipelined datapath with control. ID reads $6 and $7 for or and passes 13 forward; EX holds $4, $5 and write-register number 12 for and; MEM holds write-register number 11 for sub; WB writes register 10 for lw.]

Clock cycle 6
    IF: after<1>    ID: add $14, $8, $9    EX: or $13, . . .    MEM: and $12, . . .    WB: sub $11, . . .
[Figure: pipelined datapath with control. ID reads $8 and $9 for add; EX holds $6 and $7 for or.]
                                                memory                            11       Write                                                       0                                                                Read
                                                                                                           data 2                                                      result                   Address                                      1
                                                                                           register                                                    M                                                                data
                                                                                                                                                        u                                                Data                                M
                                                                                           Write                                                        x                                               memory                               u
                                                                                           data                                                                                                                                              x
                                                                                                                                                       1
Instruction
                                                                                        Instruction
                                                                                  X     [20– 16]                                X
                                                                                                                                                       0            ALUOp
                             Clock cycle 6                                        14
                                                                                        Instruction
                                                                                        [15– 11]                                14           13
                                                                                                                                                       M
                                                                                                                                                       u
                                                                                                                                                       x
                                                                                                                                                       1
                                                                                                                                                                                       12                                          11
Pipelined Execution and Control
[Figure: pipelined datapath with control at clock 7. EX: add $14, . . . (operands $8, $9) MEM: or $13, . . . WB: and $12, . . .]
[Figure: pipelined datapath with control at clock cycle 8. IF: after<3> ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .]
[Figure: pipelined datapath with control at clock cycle 9. IF: after<4> ID: after<3> EX: after<2> MEM: after<1> WB: add $14, . . .]
Instruction sequence: lw $10, 20($1); sub $11, $2, $3; and $12, $4, $7; or $13, $6, $7; add $14, $8, $9
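The stage captions in the clock-cycle snapshots above follow a simple rule: with no stalls, each instruction enters IF one cycle after its predecessor and advances one stage per cycle. The sketch below reproduces those captions under that assumption. The function name `stage_contents` and the `before<n>` label are illustrative and do not come from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Instruction sequence used in the clock-cycle snapshots.
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $7",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

def stage_contents(clock, program=PROGRAM):
    """Return {stage: instruction} for a 1-based clock cycle, assuming no stalls."""
    contents = {}
    for depth, stage in enumerate(STAGES):
        # Instruction number i (0-based) is in IF during cycle i + 1.
        index = clock - 1 - depth
        if index < 0:
            contents[stage] = "before<%d>" % (-index)   # hypothetical label
        elif index < len(program):
            contents[stage] = program[index]
        else:
            contents[stage] = "after<%d>" % (index - len(program) + 1)
    return contents

if __name__ == "__main__":
    for clock in (6, 7, 8, 9):
        row = stage_contents(clock)
        print("Clock %d:" % clock, "  ".join("%s: %s" % (s, row[s]) for s in STAGES))
```

For clock 9 this gives after<4>, after<3>, after<2>, after<1> in IF through MEM and the add in WB, matching the caption of the last snapshot.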
Revisiting Hazards
    So far our datapath and control have ignored hazards
    We shall revisit data hazards and control hazards and enhance our datapath and control
     to handle them in hardware…
      Data Hazards and Forwarding
       Problem with starting an instruction before the previous ones have finished:
             data dependencies that go backward in time, called data hazards
    sub    $2,  $1, $3
    and    $12, $2, $5
    or     $13, $6, $2
    add    $14, $2, $2
    sw     $15, 100($2)
    [Figure: pipeline diagram of the instructions above in program execution order, each passing through IM, Reg, DM, Reg]
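A minimal sketch of how such dependencies can be found in a listing, using the sequence sub, and, or, add, sw that also appears on the Software Solution slide. It assumes a five-stage pipeline in which the register file is written in the first half of a cycle and read in the second half, so only a consumer one or two instructions behind its producer reads a stale value. That assumption, and the helper names `parse` and `data_hazards`, are not from the slides.

```python
import re

def parse(instr):
    """Return (dest, sources) as register numbers; dest is None if nothing is written."""
    parts = instr.split(None, 1)
    regs = [int(r) for r in re.findall(r"\$(\d+)", parts[1])] if len(parts) > 1 else []
    if parts[0] == "sw" or not regs:
        return None, regs          # sw only reads its registers; nop has none
    return regs[0], regs[1:]       # first operand is the destination

def data_hazards(program, window=2):
    """List (producer index, consumer index, register) for dependencies that
    reach a consumer within `window` instructions (assumed hazard distance)."""
    hazards = []
    for i, producer in enumerate(program):
        dest, _ = parse(producer)
        if dest is None or dest == 0:      # $0 is never really written
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            later_dest, sources = parse(program[j])
            if dest in sources:
                hazards.append((i, j, dest))
            if later_dest == dest:         # value overwritten; later readers depend on j
                break
    return hazards

PROGRAM = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

if __name__ == "__main__":
    for i, j, reg in data_hazards(PROGRAM):
        print("%-18s needs $%d from %s" % (PROGRAM[j], reg, PROGRAM[i]))
```

Under these assumptions only the and and the or are flagged: the add and the sw are far enough behind the sub to read the updated $2 from the register file.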
Software Solution
    Have the compiler guarantee that there are never any data hazards:
         by rearranging instructions to insert independent instructions
          between instructions that would otherwise have a data hazard
          between them,
         or, if such rearrangement is not possible, by inserting nops
    Independent instructions inserted        or       nops inserted
    sub      $2,  $1, $3                              sub      $2,  $1, $3
    lw       $10, 40($3)                              nop
    slt      $5,  $6, $7                              nop
    and      $12, $2, $5                              and      $12, $2, $5
    or       $13, $6, $2                              or       $13, $6, $2
    add      $14, $2, $2                              add      $14, $2, $2
    sw       $15, 100($2)                             sw       $15, 100($2)
    Such compiler solutions may not always be possible, and nops
     slow the machine down
    MIPS: nop = “no operation” = 00…0 (32 bits) = sll $0, $0, 0
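The nop-insertion fallback can be sketched as a small pass over the instruction list. This is an illustration only, not a description of any real compiler: it assumes that two instructions must separate a register write from a dependent read (the spacing shown in the nop column above), and the names `operands` and `insert_nops` are hypothetical.

```python
import re

def operands(instr):
    """Return (dest, sources) as register numbers; dest is None for sw and nop."""
    parts = instr.split(None, 1)
    nums = [int(r) for r in re.findall(r"\$(\d+)", parts[1])] if len(parts) > 1 else []
    if parts[0] == "sw" or not nums:
        return None, nums
    return nums[0], nums[1:]

def insert_nops(program, gap=2):
    """Pad with nops so that at least `gap` instructions separate a register
    write from any later read of that register (gap=2 is an assumption)."""
    out = []
    for instr in program:
        _, sources = operands(instr)
        needed = 0
        for back in range(1, gap + 1):          # look at the last `gap` emitted instructions
            if back > len(out):
                break
            dest, _ = operands(out[-back])
            if dest not in (None, 0) and dest in sources:
                # back - 1 instructions already separate the pair
                needed = max(needed, gap - back + 1)
        out.extend(["nop"] * needed)
        out.append(instr)
    return out

PROGRAM = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

if __name__ == "__main__":
    print("\n".join(insert_nops(PROGRAM)))
```

On the five-instruction sequence this places two nops between the sub and the and and leaves the rest unchanged, which is the padded listing above. Each nop still occupies a pipeline slot, which is why this approach slows the machine down.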
Hardware Solution: Forwarding
Pipelined Datapath with Control II (as before)
[Figure: pipelined datapath with control. Pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carry the WB, M and EX control fields; control signals shown: PCSrc, RegWrite, ALUSrc, ALUOp (via ALU control), Branch, MemRead, MemWrite, MemtoReg]
[Figure: pipeline diagram in program execution order (in instructions), starting with sub $2, $1, $3 through IM, Reg, DM, Reg]
Dependencies between pipelines move forward in time
[Figure b, with forwarding: datapath after adding forwarding hardware. The forwarding unit takes Rs and Rt from ID/EX together with EX/MEM.RegisterRd and MEM/WB.RegisterRd, and drives ForwardA and ForwardB, which control the multiplexors at the ALU inputs]
Forwarding Hardware:
Multiplexor Control
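The multiplexor control table for this slide did not survive extraction. As a minimal sketch, the selection can be modelled as below. The encodings 10 (take the value from EX/MEM) and 01 (take the value from MEM/WB) follow the forwarding rules stated on the next slides; treating 00 as "use the register file value held in ID/EX" is an assumption based on the usual textbook encoding. The function and parameter names are hypothetical.

```python
def select_alu_operand(forward: str,
                       id_ex_value: int,
                       ex_mem_alu_result: int,
                       mem_wb_write_data: int) -> int:
    """Model one forwarding multiplexor (ForwardA or ForwardB) at an ALU input."""
    if forward == "10":
        # Operand forwarded from the prior ALU result held in EX/MEM.
        return ex_mem_alu_result
    if forward == "01":
        # Operand forwarded from the value about to be written back (MEM/WB).
        return mem_wb_write_data
    if forward == "00":
        # Assumed default: no forwarding, use the register file value in ID/EX.
        return id_ex_value
    raise ValueError(f"unknown multiplexor control value: {forward!r}")
```

For example, select_alu_operand("10", 7, 42, 99) picks the EX/MEM value, 42, instead of the stale register file value, 7.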
Data Hazard: Detection and
Forwarding
        Forwarding unit determines multiplexor control according to the
         following rules:
1.        EX hazard
        if (      EX/MEM.RegWrite                       // if there is a write…
             and ( EX/MEM.RegisterRd ≠ 0 )                // to a non-$0 register…
             and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) ) // which matches, then…
          ForwardA = 10
2.        MEM hazard
        if (      MEM/WB.RegWrite                       // if there is a write…
            and ( MEM/WB.RegisterRd ≠ 0 )                 // to a non-$0 register…
            and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRs )   // and no match with the earlier pipeline register (EX/MEM)…
            and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) ) // but a match with the later pipeline register (MEM/WB), then…
          ForwardA = 01
           The EX/MEM check is necessary for sequences such as add $1, $1, $2; add $1, $1, $3; add $1, $1, $4
           (e.g., summing an array), where the earlier pipeline register (EX/MEM) holds the more recent data
           and must take priority over MEM/WB
           The same two rules, with ID/EX.RegisterRt in place of ID/EX.RegisterRs, set ForwardB
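The EX hazard and MEM hazard rules above can be sketched in code as follows. This is only an illustration of the two conditions as written on the slides, not a hardware description. The class and function names are hypothetical, and using one function for both ForwardA (source = Rs) and ForwardB (source = Rt) assumes the two multiplexors follow symmetric rules.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Pipeline register fields read by the forwarding unit (illustrative names)."""
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forward_select(src: int, regs: PipelineRegs) -> str:
    """Return the multiplexor control for one ALU operand whose source register is src."""
    # 1. EX hazard: the instruction one ahead writes a non-$0 register that matches.
    if regs.ex_mem_reg_write and regs.ex_mem_rd != 0 and regs.ex_mem_rd == src:
        return "10"
    # 2. MEM hazard: the instruction two ahead writes a matching non-$0 register,
    #    and the earlier pipeline register (EX/MEM) does not match.
    if (regs.mem_wb_reg_write and regs.mem_wb_rd != 0
            and regs.ex_mem_rd != src
            and regs.mem_wb_rd == src):
        return "01"
    # Assumed default: no forwarding.
    return "00"


# Third instruction of add $1,$1,$2; add $1,$1,$3; add $1,$1,$4 is in EX:
# both EX/MEM and MEM/WB hold Rd = $1, and the more recent EX/MEM value must win.
regs = PipelineRegs(ex_mem_reg_write=True, ex_mem_rd=1,
                    mem_wb_reg_write=True, mem_wb_rd=1)
forward_a = forward_select(1, regs)   # Rs = $1
forward_b = forward_select(4, regs)   # Rt = $4
```

Here forward_a is "10" (taken from EX/MEM, not from the older MEM/WB copy) and forward_b is "00" because nothing in flight writes $4.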
Forwarding Hardware with Control
    Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!
    [Figure: pipelined datapath with control and forwarding. The IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers carry the EX, M and WB control fields. IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd pass through ID/EX as Rs, Rt and Rd; the forwarding unit compares Rs and Rt with EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives the multiplexors at the ALU inputs.]
Forwarding: Execution example
    Instruction sequence: sub $2, $1, $3; and $4, $2, $5; or $4, $4, $2; add $9, $4, $2
    [Figure: datapath snapshots for clock cycles 3 and 4. In clock cycle 4 the stages hold, from IF to WB: add $9, $4, $2 | or $4, $4, $2 | and $4, $2, $5 | sub $2, . . . | before<1>]
Forwarding: Execution example (continued)
    [Figure: datapath snapshots for clock cycles 5 and 6. In clock cycle 5 the stages hold, from IF to WB: after<1> | add $9, $4, $2 | or $4, $4, $2 | and $4, . . . | sub $2, . . .]
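The forwarding decisions in this example can be traced with a short script. It is a sketch under stated assumptions: every instruction is R-type and writes its destination register, there are no stalls, and instruction i (counting from 0) is in EX in clock cycle i + 3. The helper names are hypothetical.

```python
def forward_select(src, ex_mem_rd, mem_wb_rd):
    """Mux control for one ALU operand; rd is None when no earlier instruction is in that stage."""
    if ex_mem_rd is not None and ex_mem_rd != 0 and ex_mem_rd == src:
        return "10"  # EX hazard: forward from EX/MEM
    if mem_wb_rd is not None and mem_wb_rd != 0 and mem_wb_rd == src:
        return "01"  # MEM hazard: forward from MEM/WB
    return "00"      # assumed default: no forwarding


# (mnemonic, rd, rs, rt) for the sequence used in the figures
PROGRAM = [
    ("sub", 2, 1, 3),
    ("and", 4, 2, 5),
    ("or", 4, 4, 2),
    ("add", 9, 4, 2),
]


def trace(program):
    """Return (clock, mnemonic, ForwardA, ForwardB) for each instruction's EX stage."""
    rows = []
    for i, (name, _rd, rs, rt) in enumerate(program):
        ex_clock = i + 3  # fetched in clock i + 1, so in EX two cycles later
        ex_mem_rd = program[i - 1][1] if i >= 1 else None  # instruction one ahead
        mem_wb_rd = program[i - 2][1] if i >= 2 else None  # instruction two ahead
        rows.append((ex_clock, name,
                     forward_select(rs, ex_mem_rd, mem_wb_rd),
                     forward_select(rt, ex_mem_rd, mem_wb_rd)))
    return rows


rows = trace(PROGRAM)
for row in rows:
    print(row)
```

Under these assumptions the trace gives ForwardA = 10 for and in clock 4 ($2 from sub), ForwardA = 10 and ForwardB = 01 for or in clock 5 ($4 from and, $2 from sub), and ForwardA = 10 with ForwardB = 00 for add in clock 6, where both earlier stages hold $4 and the EX/MEM copy wins.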
 Data Hazards and Stalls
       Load word can still cause a hazard:
             an instruction tries to read a register following a load instruction that writes
              to the same register
             the loaded value is available only at the end of the MEM stage, but the next
              instruction needs it at the start of its EX stage: the dependency goes backward
              in time, so even with forwarding the hazard is not solved
             therefore, we need a hazard detection unit to stall the pipeline after the
              load instruction
       [Figure: pipeline diagram; the rows that survive extraction are or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7, each passing through IM, Reg, DM, Reg]
Pipelined Datapath with Control II
(as before)
[Figure: pipelined datapath with control, as before. The IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers carry the EX, M and WB control fields; control signals shown: PCSrc, RegWrite, ALUSrc, Branch, MemWrite, MemRead, MemtoReg.]
Mechanics of Stalling
    If the stall check succeeds, the pipeline needs to stall for only 1
     clock cycle after the load, because after that the forwarding unit can
     resolve the dependency
    What the hardware does to stall the pipeline for 1 cycle:
            it does not let the IF/ID register change (write disabled), so
             the instruction in the ID stage repeats, i.e., stalls
            the instruction just behind it, in the IF stage, must be stalled
             as well, so the hardware does not let the PC change (write disabled)
             and the instruction in the IF stage repeats, i.e., stalls
            it sets all the EX, MEM and WB control fields in the ID/EX pipeline
             register to 0, so the slot just behind the load effectively
             becomes a nop; a bubble is said to have been inserted into the
             pipeline
                note that we cannot turn that instruction into a nop by zeroing all the bits
                 of the instruction itself (recall nop = 00…0, 32 bits), because it has
                 already been decoded and its control signals generated
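The stall mechanics above can be sketched as a small function. The stall check itself is not spelled out in this section, so the condition used here is an assumption: the usual load-use test built from the signals that appear in the hazard detection unit figure (ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt). The class, function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StallSignals:
    pc_write: bool      # False: PC keeps its value, the IF stage repeats
    if_id_write: bool   # False: IF/ID keeps its value, the ID stage repeats
    zero_control: bool  # True: EX, MEM and WB control fields into ID/EX are set to 0 (bubble)


def hazard_detect(id_ex_mem_read: bool, id_ex_rt: int,
                  if_id_rs: int, if_id_rt: int) -> StallSignals:
    """Assumed load-use check: the instruction in EX is a load whose destination (Rt)
    is a source register of the instruction currently in ID."""
    load_use = bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))
    return StallSignals(pc_write=not load_use,
                        if_id_write=not load_use,
                        zero_control=load_use)


# Illustrative case: a load into $2 is in EX while and $4, $2, $5 is in ID.
stall = hazard_detect(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
# Same registers, but the instruction in EX is not a load: no stall needed.
no_stall = hazard_detect(id_ex_mem_read=False, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
```

In the first case all three stall actions fire for one cycle; in the following cycle the load has moved on to MEM, the check no longer matches, and the forwarding unit can supply the loaded value.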
  Hazard Detection Unit
  [Figure: pipelined datapath with hazard detection unit and forwarding unit. The hazard detection unit reads ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt, and drives PCWrite, IF/IDWrite and a multiplexor that can replace the control fields written into ID/EX with 0. The forwarding unit, as before, compares Rs and Rt with EX/MEM.RegisterRd and MEM/WB.RegisterRd.]
    Stalling Execution example:

     lw      $2, 20($1)
     and     $4, $2, $5
     or      $4, $4, $2

     [Figure: pipelined datapath with hazard detection unit and forwarding unit, shown at clock cycle 2 and clock cycle 3.
      Labeled signals: PCWrite, IF/IDWrite, ID/EX.MemRead, ID/EX.RegisterRt.]

     Stage contents (IF | ID | EX | MEM | WB):
     Clock cycle 3:   or $4, $4, $2 | and $4, $2, $5 | lw $2, 20($1) | before<1> | before<2>
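
The stall decision made by the hazard detection unit in this example can be sketched in a few lines of Python. This is a minimal sketch rather than material from the slide: it assumes the usual load-use condition (the instruction in EX is a load, so ID/EX.MemRead is 1, and its destination ID/EX.RegisterRt matches a source register of the instruction in ID). The class and function names are hypothetical; the signal names follow the figure labels, and IF/ID.RegisterRs is assumed to be the second register input of the unit.

```python
from dataclasses import dataclass

@dataclass
class IDEX:
    mem_read: bool  # ID/EX.MemRead: the instruction in EX is a load
    rt: int         # ID/EX.RegisterRt: register the load will write

@dataclass
class IFID:
    rs: int         # IF/ID.RegisterRs (assumed name): first source of the instruction in ID
    rt: int         # IF/ID.RegisterRt: second source of the instruction in ID

def hazard_detection(id_ex, if_id):
    """Return (PCWrite, IF/IDWrite, insert_bubble) for the current cycle."""
    stall = id_ex.mem_read and id_ex.rt in (if_id.rs, if_id.rt)
    # On a stall, PC and IF/ID keep their values and the control
    # signals entering ID/EX are zeroed, which creates the bubble.
    return (not stall, not stall, stall)

# Clock cycle 3: lw $2, 20($1) is in EX and "and $4, $2, $5" is in ID.
print(hazard_detection(IDEX(True, 2), IFID(rs=2, rt=5)))   # (False, False, True)
# One cycle later the bubble is in EX, so nothing stalls.
print(hazard_detection(IDEX(False, 0), IFID(rs=2, rt=5)))  # (True, True, False)
```

PCWrite and IF/IDWrite are returned as separate values only to mirror the two wires in the figure; in this sketch they always carry the same value.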
    Stalling Execution example (cont.):

     [Figure: pipelined datapath with hazard detection unit and forwarding unit, shown at clock cycle 4 and clock cycle 5.
      Labeled signals: PCWrite, IF/IDWrite, ID/EX.MemRead, ID/EX.RegisterRt.]

     Stage contents (IF | ID | EX | MEM | WB):
     Clock cycle 4:   or $4, $4, $2  | and $4, $2, $5 | bubble         | lw $2, . . . | before<1>
     Clock cycle 5:   add $9, $4, $2 | or $4, $4, $2  | and $4, $2, $5 | bubble       | lw $2, . . .
    Stalling Execution example (cont.):

     or      $4, $4, $2
     add     $9, $4, $2

     [Figure: pipelined datapath with hazard detection unit and forwarding unit, shown at clock cycle 6 and at the following clock cycle.
      Labeled signals: PCWrite, IF/IDWrite, ID/EX.RegisterRt.]
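
The cycle-by-cycle stage contents of the stalling example can be reproduced with a small simulation. The sketch below is illustrative only and makes several assumptions: a five-stage pipeline (IF, ID, EX, MEM, WB), full forwarding so that the only stall is the load-use case, and filler instructions named before<1> and before<2> that touch no registers. The names Instr, PROGRAM, and simulate are hypothetical.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "text dest srcs is_load")
BUBBLE = Instr("bubble", None, (), False)

def filler(name):
    """An instruction with no register reads or writes (assumption)."""
    return Instr(name, None, (), False)

PROGRAM = [
    filler("before<2>"),
    filler("before<1>"),
    Instr("lw $2, 20($1)", 2, (1,), True),
    Instr("and $4, $2, $5", 4, (2, 5), False),
    Instr("or $4, $4, $2", 4, (4, 2), False),
    Instr("add $9, $4, $2", 9, (4, 2), False),
]

def simulate(program, start_cycle, n_cycles):
    """Return {cycle: [IF, ID, EX, MEM, WB]} with instruction texts."""
    stages = [None] * 5   # pipeline contents in the previous cycle
    pc = 0                # index of the next instruction to fetch
    rows = {}
    for cycle in range(start_cycle, start_cycle + n_cycles):
        if_, id_, ex, mem, _wb = stages
        # Load-use hazard: a load in EX writes a register read in ID.
        stall = (ex is not None and id_ is not None
                 and ex.is_load and ex.dest in id_.srcs)
        if stall:
            # IF and ID hold their instructions; a bubble enters EX.
            stages = [if_, id_, BUBBLE, ex, mem]
        else:
            fetched = program[pc] if pc < len(program) else None
            pc += 1
            stages = [fetched, if_, id_, ex, mem]
        rows[cycle] = [s.text if s else "-" for s in stages]
    return rows

# Start at cycle -1 so that lw is fetched in clock cycle 1.
rows = simulate(PROGRAM, start_cycle=-1, n_cycles=7)
for cycle in (3, 4, 5):
    print(cycle, " | ".join(rows[cycle]))
# 3 or $4, $4, $2 | and $4, $2, $5 | lw $2, 20($1) | before<1> | before<2>
# 4 or $4, $4, $2 | and $4, $2, $5 | bubble | lw $2, 20($1) | before<1>
# 5 add $9, $4, $2 | or $4, $4, $2 | and $4, $2, $5 | bubble | lw $2, 20($1)
```

Holding IF and ID while pushing a bubble into EX is what costs the one extra cycle: the or and and instructions appear in the same stages in clock cycles 3 and 4.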
    Predicting Branch-not-taken:
    Misprediction delay

     [Figure: multiple-clock-cycle pipeline diagram. Vertical axis: program execution order (in instructions); horizontal axis: time (in clock cycles), CC 1 to CC 9.]
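
As a rough illustration of what the misprediction delay costs, the sketch below estimates the average cycles per instruction when branches are predicted not taken. All numbers are made up for illustration (20% branches, half of them taken, base CPI of 1), and the function name is hypothetical. The penalties assume that a taken branch resolved in the MEM stage flushes three instructions, while one resolved in the ID stage flushes one.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, penalty_cycles):
    """Average CPI when only taken branches pay the misprediction penalty."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles

# Hypothetical workload: 20% branches, 50% of them taken.
mem_stage = effective_cpi(1.0, 0.2, 0.5, 3)  # branch decided in MEM
id_stage = effective_cpi(1.0, 0.2, 0.5, 1)   # branch decided in ID
print(f"{mem_stage:.2f}")  # 1.30
print(f"{id_stage:.2f}")   # 1.10
```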
  Optimized Datapath for Branch
   [Figure: simplified pipelined datapath with the branch decision made in the ID stage. Labeled blocks: PC, instruction memory, IF/ID, hazard detection unit, control, registers with "=" comparator, sign extend, shift left 2, ID/EX, ALU, forwarding unit, EX/MEM, data memory, MEM/WB. IF.Flush is a control signal into IF/ID.]
   IF.Flush control zeros out the instruction in the IF/ID pipeline register (which follows the branch)
   Branch decision is moved from the MEM stage to the ID stage (simplified drawing, not showing enhancements to the forwarding and hazard detection units)
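The ID-stage branch decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual control logic: the function name resolve_branch_in_id and the dictionary used to model the IF/ID register are made up for the example. It compares the two register values, forms the target as PC + 4 plus the sign-extended offset shifted left by 2, and zeros the IF/ID instruction when the branch is taken.

```python
NOP = 0  # an all-zero instruction word behaves as a nop in MIPS


def resolve_branch_in_id(branch_pc, rs_value, rt_value, offset, if_id):
    """Hypothetical model of the beq decision made in the ID stage.

    branch_pc          address of the beq instruction
    rs_value, rt_value register values read in ID
    offset             sign-extended 16-bit word offset from the instruction
    if_id              dict modelling the IF/ID register, which holds the
                       instruction fetched right after the branch
    Returns (next_pc, if_id) after the decision.
    """
    taken = rs_value == rt_value              # the "=" comparator on the register outputs
    target = branch_pc + 4 + (offset << 2)    # shift left 2, then add to PC + 4
    if taken:
        # IF.Flush: zero out the instruction that follows the branch
        return target, {**if_id, "instruction": NOP}
    # Not taken: the instruction at branch_pc + 4 is already in IF/ID,
    # so fetching simply continues at branch_pc + 8.
    return branch_pc + 8, if_id


# beq at address 40 with offset 7; 0x00456024 encodes and $12, $2, $5
next_pc, reg = resolve_branch_in_id(40, 9, 9, 7, {"instruction": 0x00456024})
print(next_pc, reg["instruction"])   # 72 0
```

The single flushed instruction is the whole cost of a taken branch in this arrangement, which is why moving the decision from MEM to ID matters.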
Pipelined Branch
   Execution example:
      36   sub   $10, $4, $8
      40   beq   $1, $3, 7
      44   and   $12, $2, $5
      48   or    $13, $2, $6
      …
      72   lw    $4, 50($7)
   [Figure: datapath in clock cycle 3. IF: and $12, $2, $5; ID: beq $1, $3, 7; EX: sub $10, $4, $8; MEM: before<1>; WB: before<2>. The PC holds 44 and PC + 4 is 48; the ID-stage adder forms 44 + 28 = 72 from the offset 7 shifted left by 2, and IF.Flush is shown at IF/ID.]
   [Figure: datapath in the following clock cycle. The instruction at 72, lw $4, 50($7), is being fetched and PC + 4 is 76.]
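The stage contents in the execution example above can be traced with a small Python sketch. It assumes the beq is taken (that is, $1 equals $3), which is what the fetch from address 72 in the second drawing suggests. The helper next_cycle, the BUBBLE label, and the list layout [IF, ID, EX, MEM, WB] are made up for illustration.

```python
BUBBLE = "bubble (nop)"

# Instructions from the example, keyed by address
program = {
    36: "sub $10, $4, $8",
    40: "beq $1, $3, 7",
    44: "and $12, $2, $5",
    48: "or $13, $2, $6",
    72: "lw $4, 50($7)",
}


def next_cycle(stages, program, pc, branch_taken, target):
    """Advance the stage list [IF, ID, EX, MEM, WB] by one clock.

    pc           address fetched next if the branch in ID is not taken
    branch_taken outcome of the beq currently in ID (an assumed input)
    target       branch target address computed in ID
    Returns the new stage list and the PC + 4 value after the fetch.
    """
    if_instr, id_instr, ex_instr, mem_instr, _ = stages
    if branch_taken:
        # IF.Flush turns the instruction behind the branch into a bubble,
        # and the fetch is redirected to the branch target.
        return [program[target], BUBBLE, id_instr, ex_instr, mem_instr], target + 4
    return [program[pc], if_instr, id_instr, ex_instr, mem_instr], pc + 4


clock3 = ["and $12, $2, $5", "beq $1, $3, 7", "sub $10, $4, $8",
          "before<1>", "before<2>"]
clock4, pc_plus_4 = next_cycle(clock3, program, pc=48, branch_taken=True, target=72)
for stage, instr in zip(["IF", "ID", "EX", "MEM", "WB"], clock4):
    print(f"{stage:>3}: {instr}")
print("PC + 4 =", pc_plus_4)   # PC + 4 = 76
```

Under the taken-branch assumption, the and instruction never reaches EX: it is replaced by a bubble, and lw from the target address enters the pipeline one cycle later than it would have without the branch.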
Simple Example: Comparing Performance
   Single-cycle (p. 373): average instruction time 8 ns
   Multicycle (p. 397): average instruction time 8.04 ns
   Pipelined (2 ns clock cycle):
      loads use 1 cc (clock cycle) when there is no load-use dependency
       and 2 cc when there is one; given that 50% of loads are followed
       by a dependent instruction, the average cc per load is 1.5
      stores use 1 cc each
      branches use 1 cc when predicted correctly and 2 cc when not;
       given a 25% misprediction rate, the average cc per branch is 1.25
      jumps use 2 cc each
      ALU instructions use 1 cc each
      with an instruction mix of 23% loads, 13% stores, 19% branches,
       2% jumps, and 43% ALU instructions, the average CPI is
       1.5 × 23% + 1 × 13% + 1.25 × 19% + 2 × 2% + 1 × 43% = 1.18
      therefore, the average instruction time is 1.18 × 2 ns = 2.36 ns
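The CPI arithmetic above can be checked with a short Python sketch. The instruction mix and cycle counts are the ones on the slide; the variable names are made up for illustration, and the speedup figures at the end are simple ratios of the slide's numbers rather than results stated in the source.

```python
# Instruction mix (fraction of executed instructions), from the slide
mix = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

# Average clock cycles per instruction class
load_cc = 0.5 * 1 + 0.5 * 2       # 50% of loads have a load-use dependency
branch_cc = 0.75 * 1 + 0.25 * 2   # 25% of branches are mispredicted
cycles = {"load": load_cc, "store": 1, "branch": branch_cc, "jump": 2, "alu": 1}

CLOCK_NS = 2  # pipelined clock cycle, as implied by "1.18 x 2" on the slide

cpi = sum(mix[k] * cycles[k] for k in mix)
cpi_rounded = round(cpi, 2)                  # the slide rounds CPI to 1.18
instr_time_ns = cpi_rounded * CLOCK_NS

print(f"average CPI      = {cpi:.4f}")       # 1.1825
print(f"instruction time = {instr_time_ns:.2f} ns")          # 2.36 ns
print(f"vs single-cycle  = {8 / instr_time_ns:.2f}x")        # 3.39x
print(f"vs multicycle    = {8.04 / instr_time_ns:.2f}x")     # 3.41x
```

Changing the dependency or misprediction fractions in load_cc and branch_cc shows how sensitive the pipelined CPI is to hazards.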