Co MODULE 3 - Merged
MUL R2,R10,R1
DIV R5,R3,R4
ADD R2,R5,R2
   SUB R5,R2,R6
18. A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand
    Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and
    WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for
    ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for
    DIV instruction respectively. Operand forwarding is used in the pipeline. What is the
    number of clock cycles needed to execute the following sequence of instructions?
    Instruction Meaning of instruction
19. The instruction pipeline of a RISC processor has the following stages: Instruction Fetch
    (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback
    (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction.
    Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock
    cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take
    1 clock cycle each. Assume that there are no data hazards and no control hazards. How
    many clock cycles are required for completion of execution of the sequence of instructions?
20. With a neat diagram, explain the classic five-stage pipeline for a RISC processor.
21. Explain the various types of hazard mitigation techniques.
22. List the characteristics of RISC.
23. List the different types of pipeline hazards.
24. Consider the following sequence of instructions being processed on the pipelined 5-stage
    RISC processor:
        Add      R4, R2, R3
        Store    R5, #100(R4)
        Load     R6, #200(R4)
        Subtract R7, R5, R6
    Identify all the data dependencies in the above instruction sequence. For each
    dependency, indicate the two instructions and the register that causes the dependency.
25. Explain data hazards with an example. Illustrate the forwarding method to minimize data
    hazards.
26. Explain the three classes of instructions in RISC with examples.
27. What is pipelining? Explain the five-stage pipeline for a RISC processor with an example.
28. Explain the pipelined data path and control.
29. Explain the different pipeline hazards with examples.
Problem 1
Ans: 8
                     Problem 2
• The instruction pipeline of a RISC processor has the
  following stages: Instruction Fetch (IF), Instruction
  Decode (ID), Operand Fetch (OF), Perform Operation
  (PO) and Writeback (WB). The IF, ID, OF and WB stages
  take 1 clock cycle each for every instruction. Consider a
  sequence of 100 instructions. In the PO stage, 40
  instructions take 3 clock cycles each, 35 instructions
  take 2 clock cycles each, and the remaining 25
  instructions take 1 clock cycle each. Assume that there
  are no data hazards and no control hazards. The
  number of clock cycles required for completion of
  execution of the sequence of instructions is ______.
• Explanation: Given, total number of instructions (n) =
  100
  Number of stages (k) = 5
  An instruction that needs c cycles in the PO stage holds
  that stage for (c - 1) extra cycles, so it adds (c - 1) stalls.
• Therefore, the number of clock cycles required = Total
  number of cycles required in general case + Extra cycles
  required (here, in PO stage)
  = (n + k - 1) + Extra cycles
  = (100 + 5 - 1) + 40*(3-1) + 35*(2-1) + 25*(1-1)
  = (100 + 4) + 40*2 + 35*1 + 25*0
  = 104 + 115
  = 219 cycles
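The count above can be checked with a short script. This is only a sketch under the same assumptions as the explanation (no hazards, and each instruction holds the PO stage for its full PO latency); the function name `total_cycles` is made up for illustration.

```python
def total_cycles(po_latency_counts, stages=5):
    """Cycles for a linear pipeline in which only the PO stage is multi-cycle.

    po_latency_counts maps a PO latency (in cycles) to the number of
    instructions with that latency. Assumes no data or control hazards.
    """
    n = sum(po_latency_counts.values())
    base = n + stages - 1  # ideal pipeline: n + k - 1
    # Every PO cycle beyond the first stalls the pipeline for one cycle.
    extra = sum(count * (latency - 1)
                for latency, count in po_latency_counts.items())
    return base + extra


print(total_cycles({3: 40, 2: 35, 1: 25}))  # 219
```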
                         Problem 3
• A 5-stage pipelined processor has Instruction Fetch (IF), Instruction
  Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write
  Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock
  cycle each for any instruction. The PO stage takes 1 clock cycle for
  ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6
  clock cycles for DIV instruction respectively. Operand forwarding is
  used in the pipeline. What is the number of clock cycles needed to
  execute the following sequence of instructions?
  Instruction               Meaning of instruction
• I0: MUL R2, R0, R1        R2 = R0 * R1
• I1: DIV R5, R3, R4        R5 = R3 / R4
• I2: ADD R2, R5, R2        R2 = R5 + R2
• I3: SUB R5, R2, R6        R5 = R2 - R6
solution
                         Problem 4
• A 5-stage pipelined processor has the stages: Instruction Fetch (IF),
  Instruction Decode (ID), Operand Fetch (OF), Execute (EX) and Write
  Operand (WO). The IF, ID, OF, and WO stages take 1 clock cycle each
  for any instruction. The EX stage takes 1 clock cycle for ADD and
  SUB instructions, 3 clock cycles for MUL instruction, and 6 clock
  cycles for DIV instruction. Operand forwarding is used in the
  pipeline (for data dependency, OF stage of the dependent
  instruction can be executed only after the previous instruction
  completes EX). What is the number of clock cycles needed to
  execute the following sequence of instructions?
• MUL R2,R10,R1
• DIV R5,R3,R4
• ADD R2,R5,R2
• SUB R5,R2,R6
solution
                 Problem 5
• Consider an instruction pipeline with four
  stages with the stage delays 5 nsec, 6 nsec, 11
  nsec, and 8 nsec respectively. The delay of an
  inter-stage register stage of the pipeline is 1
  nsec. What is the approximate speedup of the
  pipeline in the steady state under ideal
  conditions as compared to the corresponding
  non-pipelined implementation?
                  solution
• Consider an instruction pipeline with four
  stages (S1, S2, S3 and S4) each with
  combinational circuit only. The pipeline
  registers are required between each stage and
  at the end of the last stage. Delays for the
  stages and for the pipeline registers are as
  given in the figure:
•
• Explanation:
• Pipeline register overhead is not counted in the
  non-pipelined execution time.
• So the non-pipelined time per instruction is
• 5 + 6 + 11 + 8 = 30 ns [without pipeline]
• In the pipeline, every stage is clocked at the delay of
  the slowest stage, 11 ns, plus 1 ns of register overhead,
  so the cycle time is 12 ns. In steady state, one result is
  produced every cycle, i.e. every 12 ns.
• Speedup = 30 / 12 = 2.5
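The same calculation as a small sketch, assuming (as in the explanation) that the non-pipelined design pays no register overhead. The name `steady_state_speedup` is illustrative only.

```python
def steady_state_speedup(stage_delays_ns, register_delay_ns):
    """Ideal steady-state speedup of a pipeline over the non-pipelined design."""
    non_pipelined = sum(stage_delays_ns)                    # one instruction, no registers
    cycle_time = max(stage_delays_ns) + register_delay_ns   # slowest stage + register
    return non_pipelined / cycle_time


print(steady_state_speedup([5, 6, 11, 8], 1))  # 2.5
```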
Practice problem
solution
Practice problem
•   Solution-
•
•   Given-
•   Four stage pipeline is used
•   Delay of stages = 60, 50, 90 and 80 ns
•   Latch delay or delay due to each register = 10 ns
•
•   Part-01: Pipeline Cycle Time-
•
•   Cycle time
•   = Maximum delay due to any stage + Delay due to its register
•   = Max { 60, 50, 90, 80 } + 10 ns
•   = 90 ns + 10 ns
•   = 100 ns
•
•   Part-02: Non-Pipeline Execution Time-
•
•   Non-pipeline execution time for one instruction
•   = 60 ns + 50 ns + 90 ns + 80 ns
•   = 280 ns
•   Part-03: Speed Up Ratio-
•
•   Speed up
•   = Non-pipeline execution time / Pipeline execution time
•   = 280 ns / Cycle time
•   = 280 ns / 100 ns
•   = 2.8
•
•   Part-04: Pipeline Time For 1000 Tasks-
•
•   Pipeline time for 1000 tasks
•   = Time taken for 1st task + Time taken for remaining 999 tasks
•   = 1 x 4 clock cycles + 999 x 1 clock cycle
•   = 4 x cycle time + 999 x cycle time
•   = 4 x 100 ns + 999 x 100 ns
•   = 400 ns + 99900 ns
•   = 100300 ns
•   Part-05: Sequential Time For 1000 Tasks-
•
•   Non-pipeline time for 1000 tasks
•   = 1000 x Time taken for one task
•   = 1000 x 280 ns
•   = 280000 ns
•
•   Part-06: Throughput-
•
•   Throughput for pipelined execution
•   = Number of instructions executed per unit time
•   = 1000 tasks / 100300 ns
•   ≈ 0.00997 tasks per ns (about 9.97 x 10^6 tasks per second)
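The parts of this solution can be reproduced with one helper. This is a sketch, not part of the original solution; `pipeline_metrics` and its dictionary keys are hypothetical names, and it assumes every stage is clocked at the slowest stage delay plus the latch delay.

```python
def pipeline_metrics(stage_delays_ns, latch_delay_ns, n_tasks):
    """Cycle time, total times and throughput for a k-stage pipeline."""
    k = len(stage_delays_ns)
    cycle_time = max(stage_delays_ns) + latch_delay_ns
    # First task needs k cycles, each remaining task needs one more cycle.
    pipelined_time = (k + n_tasks - 1) * cycle_time
    sequential_time = n_tasks * sum(stage_delays_ns)
    return {
        "cycle_time_ns": cycle_time,
        "pipelined_time_ns": pipelined_time,
        "sequential_time_ns": sequential_time,
        "throughput_tasks_per_ns": n_tasks / pipelined_time,
    }


m = pipeline_metrics([60, 50, 90, 80], 10, 1000)
print(m["cycle_time_ns"], m["pipelined_time_ns"], m["sequential_time_ns"])
# 100 100300 280000
```

Calling the same function with stage delays of 150, 120, 160 and 140 ns and a 5 ns register delay gives the 165495 ns of the next practice problem.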
Practice problem
                                 solution
•   Solution-
•
•   Given-
•   Four stage pipeline is used
•   Delay of stages = 150, 120, 160 and 140 ns
•   Delay due to each register = 5 ns
•   1000 data items or instructions are processed
•
•   Cycle Time-
•
•   Cycle time
•   = Maximum delay due to any stage + Delay due to its register
•   = Max { 150, 120, 160, 140 } + 5 ns
•   = 160 ns + 5 ns
•   = 165 ns
• Pipeline Time To Process 1000 Data Items-
•
• Pipeline time to process 1000 data items
• = Time taken for 1st data item + Time taken for
  remaining 999 data items
• = 1 x 4 clock cycles + 999 x 1 clock cycle
• = 4 x cycle time + 999 x cycle time
• = 4 x 165 ns + 999 x 165 ns
• = 660 ns + 164835 ns
• = 165495 ns
• = 165.5 μs
Pipelining
Outline
• Definition of pipeline
• Advantages and disadvantages
• Types of pipeline (h/w) and (s/w)
• Latency and throughput
• Hazards
Pipeline
• It is a technique of decomposing a sequential
  process into suboperations, with each
  suboperation completed in a dedicated
  segment that operates concurrently with
  all other segments.
• A pipeline is commonly known as an assembly
  line operation.
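The overlap between segments is easiest to see in a space-time diagram. The sketch below prints one for an ideal, hazard-free pipeline; the stage letters F, D, E, W follow the four-stage example in these notes, and the function name is an assumption made for illustration.

```python
def space_time_diagram(n_instructions, stages=("F", "D", "E", "W")):
    """Return one text row per instruction for an ideal (hazard-free) pipeline."""
    rows = []
    for i in range(n_instructions):
        # Instruction i+1 enters the pipeline i cycles after the first one.
        cells = ["  "] * i + [f"{s}{i + 1}" for s in stages]
        rows.append(f"I{i + 1}  " + " ".join(cells))
    return rows


for row in space_time_diagram(4):
    print(row)
```

Each row is shifted one cycle to the right of the previous one, so n instructions on k stages finish in n + k - 1 cycles.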
             Example
[Figure: (a) Sequential execution of I1, I2, I3, each as a fetch step (F1, F2, F3) followed by an execute step (E1, E2, E3); an instruction fetch unit and an execution unit connected by interstage buffer B1; (c) Pipelined execution.]
Four-stage pipeline: Fetch + Decode + Execution + Write
Instruction
I1   F1  D1  E1  W1
I2       F2  D2  E2  W2
I3           F3  D3  E3  W3
I4               F4  D4  E4  W4
F : Fetch instruction
D : Decode instruction and fetch operands
E : Execute operation
W : Write results
[Figure: (b) Hardware organization: stages F, D, E, W separated by interstage buffers B1, B2, B3.]
       Y = 0.08200 x 10^3
 3) Add mantissas
       Z = 1.0324 x 10^3
 4) Normalize result
       Z = 0.10324 x 10^4
 [Figure: Segment 4 contains the "Adjust exponent" and "Normalize result" blocks, each followed by a register R.]
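The add and normalize steps can be traced in code. This is a rough sketch: the notes only show steps 3 and 4, so the first two segments (compare exponents, align mantissas) and the operands X = 0.9504 x 10^3 and Y = 0.8200 x 10^2 are assumptions, chosen so that they reproduce the Y and Z values shown. `fp_add_pipeline` is a hypothetical name.

```python
from decimal import Decimal


def fp_add_pipeline(x, y):
    """Add two base-10 floating-point numbers given as (mantissa, exponent)."""
    (mx, ex), (my, ey) = x, y
    # Segment 1 (assumed): compare exponents.
    diff = ex - ey
    # Segment 2 (assumed): align the mantissa that has the smaller exponent.
    if diff >= 0:
        my, exp = my.scaleb(-diff), ex
    else:
        mx, exp = mx.scaleb(diff), ey
    # Segment 3: add mantissas.
    mz = mx + my
    # Segment 4: adjust exponent and normalize so that 0.1 <= |mantissa| < 1.
    while abs(mz) >= 1:
        mz, exp = mz.scaleb(-1), exp + 1
    while mz != 0 and abs(mz) < Decimal("0.1"):
        mz, exp = mz.scaleb(1), exp - 1
    return mz, exp


# Assumed operands: X = 0.9504 x 10^3, Y = 0.8200 x 10^2
mantissa, exponent = fp_add_pipeline((Decimal("0.9504"), 3), (Decimal("0.8200"), 2))
print(mantissa, exponent)
```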
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
1 Fetch an instruction from memory
2 Decode the instruction
3 Calculate the effective address of the operand
4 Fetch the operands from memory
5 Execute the operation
6 Store the result in the proper place
Instruction
I1 (Mul) F1 D1 E1 W1
I2 (Add) F2 D2 D2A E2 W2
I3 F3 D3 E3 W3
I4 F4 D4 E4 W4
1. Data Forwarding,
2. Code reordering
3. Stall insertion.
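All three techniques start from knowing which instruction reads a register that an earlier instruction writes (a read-after-write dependency). The sketch below finds those pairs for the MUL/DIV/ADD/SUB sequence used in Problem 3; the tuple encoding of instructions and the function name are assumptions made for illustration.

```python
def raw_dependencies(program):
    """Find read-after-write dependencies.

    program is a list of (dest, src1, src2) register names in program order.
    Returns (producer_index, consumer_index, register) triples.
    """
    last_writer = {}
    deps = []
    for i, (dest, *sources) in enumerate(program):
        for reg in dict.fromkeys(sources):  # drop repeated sources, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        last_writer[dest] = i  # later readers depend on the most recent writer
    return deps


program = [
    ("R2", "R0", "R1"),  # I0: MUL R2, R0, R1
    ("R5", "R3", "R4"),  # I1: DIV R5, R3, R4
    ("R2", "R5", "R2"),  # I2: ADD R2, R5, R2
    ("R5", "R2", "R6"),  # I3: SUB R5, R2, R6
]
print(raw_dependencies(program))
# [(1, 2, 'R5'), (0, 2, 'R2'), (2, 3, 'R2')]
```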
Operand Forwarding
    Instruction
     I1                F1       E1
I3 F3 X
Ik Fk Ek
 D : Dispatch/Decode unit    E : Execute instruction    W : Write results
 Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b.
2- Conditional Branches
• A conditional branch instruction introduces
  the added hazard caused by the dependency
  of the branch condition on the result of a
  previous instruction.
• The decision to branch cannot be made until
  the execution of that instruction has been
  completed.
Delayed Branch
            LOOP          Shift_left         R1
                          Decrement          R2
                          Branch=0           LOOP
            NEXT          Add                R1,R3
            LOOP          Decrement          R2
                          Branch=0           LOOP
                          Shift_left         R1
            NEXT          Add                R1,R3
program counter.
    Types of instructions
-   1- Data Manipulation Instructions
-   2- Load and Store Instructions
-    3- Program Control Instructions
RISC instruction
classification
Data Manipulation Instructions − Manage the data in
processor registers.
Data Transfer Instructions − These are load and store
instructions that use an effective address that is obtained by
adding the contents of two registers or a register and a
displacement constant provided in the instruction.
Program Control Instructions − These instructions use
register values and a constant to evaluate the branch
address, which is transferred to a register or the program
counter (PC).
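The effective-address rule for load and store instructions can be sketched as follows. The register names and values are made up for illustration, and `effective_address` is a hypothetical helper.

```python
def effective_address(registers, base, index=None, displacement=0):
    """Base register plus either a second register or a displacement constant."""
    if index is not None:
        return registers[base] + registers[index]
    return registers[base] + displacement


regs = {"R1": 1000, "R2": 24}  # illustrative register contents
print(effective_address(regs, "R1", index="R2"))        # 1024
print(effective_address(regs, "R1", displacement=100))  # 1100
```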
Datapath and Control Considerations
• RISC
MODULE 3
         INTRODUCTION TO RISC
            INSTRUCTION SET
• Processors are broadly classified into RISC and
  CISC architectures based on how their instruction
  sets are implemented.
• Reduced Instruction Set Architecture (RISC) –
  The main idea is to make the hardware
  simpler by using an instruction set composed of a
  few basic steps for loading, evaluating, and
  storing operations: a load command only
  loads data, and a store command only stores data.
• Complex Instruction Set Architecture (CISC) –
  The main idea is that a single instruction does all the loading,
  evaluating, and storing operations. For example, a multiplication
  command loads the data, evaluates the product, and stores the result,
  hence it is complex.
• Both approaches try to increase CPU performance.
• RISC: Reduce the cycles per instruction at the cost of the number of
  instructions per program.
• RISC approach: Here the programmer first writes load commands
  to load data into registers, then uses a suitable operator, and then
  stores the result in the desired location.
• So, an add operation is divided into parts, i.e. load, operate, store, due
  to which RISC programs are longer and require more memory to be
  stored, but require fewer transistors because each command is less complex.
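The load, operate, store split can be illustrated with a toy interpreter. This is a minimal sketch, not a real instruction set: the mnemonics LOAD, ADD and STORE, the memory labels A, B, C, and the function `run` are all assumptions made for the example.

```python
def run(program, memory):
    """Execute a tiny load-store program; only LOAD and STORE touch memory."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":       # LOAD Rd, addr : register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "ADD":      # ADD Rd, Rs1, Rs2 : registers only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "STORE":    # STORE Rs, addr : memory <- register
            rs, addr = args
            memory[addr] = regs[rs]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


memory = {"A": 7, "B": 5, "C": 0}
program = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "R3", "C"),
]
run(program, memory)
print(memory["C"])  # 12
```

One addition C = A + B takes four simple instructions here, which is why RISC programs tend to be longer.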
comparison
RISC
Risc example
        Load-store architecture
• MIPS is a load-store architecture. What is a
  load-store architecture?
• Only load and store instructions access the
  memory, all other instructions use registers as
  operands. What is the motivation? The primary
  motivation is speedup: registers are faster.
• Reduced Instruction Set Computers (RISC) The
  instruction set has only a small number of
  frequently used instructions. This lowers
  processor cost, without much impact on
  performance. All instructions have the same
  length. Load-store architecture.
• Non-RISC machines are called CISC (Complex
  Instruction Set Computer). Example: Pentium
Example
        LOAD/STORE architecture
• A microcontroller architecture that uses a small,
  highly optimized set of instructions is called a
  Reduced Instruction Set Computer, or simply RISC.
  It is also called a LOAD/STORE architecture.