The Role of Performance
Chapter - 2
Why is examining performance important?
 • Hardware performance is often key to
   the effectiveness of an entire system.
Why is assessing performance challenging?
• The scale and intricacy of modern software
  systems, together with the wide range of
  performance improvement techniques employed
  by hardware designers, have made performance
  assessment much more difficult.
• For different types of applications, different
  performance metrics may be appropriate and
  different aspects of a computer system may be
  the most significant in determining overall
  performance.
           Defining Performance
   • How subtle the question of performance can be:
   • Example:
Airplane     Passenger   Cruising range   Cruising speed   Passenger throughput
             capacity    (miles)          (m.p.h.)         (passengers × m.p.h.)
Boeing 777   375         4630             610              228,750
Boeing 747   470         4150             610              286,700
Concorde     132         4000             1350             178,200
DC-8-50      146         8720             544              79,424
    Defining Performance
• Which of the planes in this table had the best
  performance?
• What is performance?
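• If performance means highest cruising speed: Concorde
• If it means longest range: DC-8-50
• If it means largest capacity: 747
• If we want to transport 450 passengers,
  which plane will be the fastest?
  – The 747, the only plane in the table that can carry all 450 in one trip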
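The point that "best" depends on the metric can be checked directly against the table. The following is a minimal sketch; the names `PLANES`, `best_by`, and `passenger_throughput` are illustrative and not part of the original slides.

```python
# Rows copied from the airplane table:
# (name, passenger capacity, cruising range in miles, cruising speed in m.p.h.)
PLANES = [
    ("Boeing 777", 375, 4630, 610),
    ("Boeing 747", 470, 4150, 610),
    ("Concorde", 132, 4000, 1350),
    ("DC-8-50", 146, 8720, 544),
]


def passenger_throughput(plane):
    """Passengers times cruising speed, the last column of the table."""
    _, capacity, _, speed = plane
    return capacity * speed


def best_by(metric):
    """Return the name of the plane that maximizes the given metric function."""
    return max(PLANES, key=metric)[0]


if __name__ == "__main__":
    print("highest speed:     ", best_by(lambda p: p[3]))
    print("longest range:     ", best_by(lambda p: p[2]))
    print("largest capacity:  ", best_by(lambda p: p[1]))
    print("highest throughput:", best_by(passenger_throughput))
```

Each metric picks a different winner, which is why a performance claim needs to name the metric it uses.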
      Defining Performance
[Figure: running a program on two different workstations]
 Defining Performance
• Response Time or Execution Time :
  the time between the start and
  completion of a task.
• It includes the time for disk
  accesses, memory accesses, I/O
  activities, operating system
  overhead, CPU execution time, and
  so on.
       Defining Performance
[Figure: a shared computer over time; labels: 1 sec/op, 1 sec/op (V), 2 sec/op (C)]
 Defining Performance
• Throughput or bandwidth: the total amount of
  work done in a given time.
 Throughput and Response Time
• Do the following changes to a computer
  system increase throughput, decrease
  response time, or both?
  – Replacing the processor in a computer with a
    faster version – both.
  – Adding additional processors to a system that
    uses processors for separate tasks –
    throughput (response time also improves if
    tasks were waiting in a queue).
    Changing either one often affects the other.
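The two changes above can be illustrated with a toy model. This is only a sketch under strong assumptions that are not in the slides: all tasks arrive at time 0, all need the same amount of work, and they are served first come, first served. The function `simulate` and its parameters are hypothetical.

```python
def simulate(num_tasks, work_per_task, processors=1, speed=1.0):
    """Toy model: identical tasks, all arriving at time 0, served in order.

    Returns (throughput in tasks per second, mean response time in seconds).
    """
    service_time = work_per_task / speed
    # Task i runs in round i // processors; each round lasts one service time.
    completion_times = [
        (i // processors + 1) * service_time for i in range(num_tasks)
    ]
    makespan = max(completion_times)
    throughput = num_tasks / makespan
    mean_response = sum(completion_times) / num_tasks
    return throughput, mean_response


if __name__ == "__main__":
    print("baseline:         ", simulate(4, 1.0))
    print("faster processor: ", simulate(4, 1.0, speed=2.0))
    print("second processor: ", simulate(4, 1.0, processors=2))
```

In this model the faster processor improves both numbers. The second processor improves throughput, and it lowers the mean response time only because tasks were queued behind one another.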
            Relative performance
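\[ \text{Performance}_X = \frac{1}{\text{Execution time}_X} \]

If the performance of X is greater than the performance of Y:

\[ \text{Performance}_X > \text{Performance}_Y \]
\[ \frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y} \]
\[ \text{Execution time}_Y > \text{Execution time}_X \]

That is, X is faster than Y.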
       Relative performance
 • "X is n times faster than Y" means

\[ \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n \]
   Relative performance
• Example: If machine A runs a program in 10
  seconds and machine B runs the same program
  in 15 seconds, how much faster is A than B?
  – A is n times faster than B if

\[ \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n \;\Rightarrow\; n = \frac{15}{10} = 1.5 \]

  – A is 1.5 times faster than B
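The ratio in this example can be written as a one-line function. This is a small sketch; the name `speedup` is illustrative and does not come from the slides.

```python
def speedup(execution_time_x, execution_time_y):
    """Return n such that X is n times faster than Y.

    n = Performance_X / Performance_Y = Execution time_Y / Execution time_X
    """
    if execution_time_x <= 0 or execution_time_y <= 0:
        raise ValueError("execution times must be positive")
    return execution_time_y / execution_time_x


if __name__ == "__main__":
    # Machine A takes 10 seconds, machine B takes 15 seconds.
    print(speedup(10, 15))
```

Note the argument order: the execution time of the slower machine ends up in the numerator.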
    Relative performance
• We could also say that machine B is 1.5 times
  slower than machine A, since

\[ \frac{\text{Performance}_A}{\text{Performance}_B} = n \;\Rightarrow\; \text{Performance}_B = \frac{\text{Performance}_A}{n} \]
  Measuring Performance
• Time is the measure of computer
  performance.
• Program execution time is measured in
  seconds per program.
• Wall-clock time / response time / elapsed
  time / execution time – total time to
  complete a task, including disk
  accesses, memory accesses, I/O activity,
  and OS overhead.
    Measuring Performance
   • CPU execution time or CPU time is the
     time the CPU spends computing for a
     task. It does not include time spent
     waiting for I/O or running other
     programs.
CPU execution time or CPU time ≤ Response time
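The gap between response time and CPU time can be observed from inside a program. The sketch below uses two timers from Python's standard `time` module: `perf_counter` for elapsed time and `process_time` for the CPU time of the current process. The helper `measure` and the task `mostly_waiting` are hypothetical names chosen for illustration, and the sleep merely stands in for an I/O wait.

```python
import time


def measure(task):
    """Run task() once; return (elapsed seconds, CPU seconds) for this process."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def mostly_waiting():
    """Hypothetical task: a little computation, then a wait that uses no CPU."""
    sum(i * i for i in range(100_000))
    time.sleep(0.2)  # stands in for waiting on I/O


if __name__ == "__main__":
    elapsed, cpu = measure(mostly_waiting)
    print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

For a task like this one, the CPU time is a small fraction of the elapsed time because the sleep is not counted as computation.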
  Measuring Performance
CPU time = User CPU time + System CPU time
• User CPU time – the CPU time spent in
  the program
• System CPU time – the CPU time spent
  in the OS performing tasks on behalf of
  the program
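The user/system split is visible through Python's standard `os.times()`, which reports the user and system CPU time accumulated by the current process. The following is a rough sketch; `cpu_time_breakdown` is an illustrative name, and the resolution of these counters depends on the operating system, so short tasks may report zero.

```python
import os


def cpu_time_breakdown(task):
    """Run task() and return (user, system, total) CPU seconds it consumed."""
    before = os.times()
    task()
    after = os.times()
    user = after.user - before.user        # time spent in the program itself
    system = after.system - before.system  # time spent in the OS on its behalf
    return user, system, user + system


if __name__ == "__main__":
    user, system, total = cpu_time_breakdown(lambda: sum(range(5_000_000)))
    print(f"user: {user:.2f} s, system: {system:.2f} s, total: {total:.2f} s")
```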
 Measuring Performance
[Figure: execution time divided into time for I/O and others, user CPU time, and system CPU time; the last two together make up CPU time]
      Measuring Performance
      • Example: the Unix time command might report
            90.7u 12.9s 2:39 65%
      • User CPU time: 90.7 seconds
      • System CPU time: 12.9 seconds
      • Elapsed time: 2 × 60 + 39 = 159 seconds
      • Fraction of elapsed time that was CPU time:

\[ \frac{90.7 + 12.9}{159} \approx 0.65 \]
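The arithmetic in this example can be reproduced by parsing the report string. This is a sketch that assumes the four-field layout shown above; real `time` implementations format their output differently, and `parse_time_report` and `cpu_fraction` are hypothetical helper names.

```python
def parse_time_report(report):
    """Parse a report such as '90.7u 12.9s 2:39 65%'.

    Returns (user seconds, system seconds, elapsed seconds).
    Assumes exactly four fields and an elapsed time written as minutes:seconds.
    """
    user_field, system_field, elapsed_field, _percent = report.split()
    user = float(user_field.rstrip("u"))
    system = float(system_field.rstrip("s"))
    minutes, seconds = elapsed_field.split(":")
    elapsed = int(minutes) * 60 + float(seconds)
    return user, system, elapsed


def cpu_fraction(user, system, elapsed):
    """Fraction of the elapsed time that was spent as CPU time."""
    return (user + system) / elapsed


if __name__ == "__main__":
    user, system, elapsed = parse_time_report("90.7u 12.9s 2:39 65%")
    print(f"{cpu_fraction(user, system, elapsed):.0%}")
```

The remaining share of the elapsed time went to I/O or to other programs.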
    Measuring Performance
    • System Performance – considering
      elapsed time on an unloaded system
    • CPU Performance – considering user
      CPU time.
Measuring Performance
• Clock cycle – Almost all computers
  are constructed using a clock that
  determines when events take place.
  These discrete time intervals are
  called clock cycles (ticks / clock ticks
  / clock periods / clocks / cycles).
• Clock rate – Inverse of clock period.
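Since clock rate and clock period are inverses, converting between them is a single division. A minimal sketch, with illustrative function names:

```python
def clock_period_seconds(clock_rate_hz):
    """Clock period (seconds per cycle) for a given clock rate in hertz."""
    return 1.0 / clock_rate_hz


def clock_rate_hz(clock_period_s):
    """Clock rate (cycles per second) for a given clock period in seconds."""
    return 1.0 / clock_period_s


if __name__ == "__main__":
    # A 400 MHz clock has a period of about 2.5 nanoseconds.
    print(clock_period_seconds(400e6) * 1e9, "ns")
```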
        Relating the Metrics
\[ \text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} \]

\[ \text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}} \]

 A hardware designer can improve performance by
 reducing either the length of the clock cycle or
 the number of clock cycles required for a
 program.
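Both forms of the relation can be written as small functions. This is a sketch with illustrative names; the sample numbers (4000 × 10^6 cycles on a 400 MHz clock) are chosen only for demonstration.

```python
def cpu_time_from_cycle_time(clock_cycles, clock_cycle_time_s):
    """CPU time = CPU clock cycles × clock cycle time."""
    return clock_cycles * clock_cycle_time_s


def cpu_time_from_clock_rate(clock_cycles, clock_rate_hz):
    """CPU time = CPU clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


if __name__ == "__main__":
    cycles = 4000e6
    print(cpu_time_from_clock_rate(cycles, 400e6), "seconds")
    print(cpu_time_from_cycle_time(cycles, 2.5e-9), "seconds")
```

Either lever helps: halving the cycle count or halving the cycle time both halve the CPU time in this model.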
     Improving Performance
Our favorite program runs in 10 seconds on
computer A, which has a 400 MHz clock. We
are trying to help a computer designer build a
machine, B, that will run this program in 6
seconds. The designer has determined that a
substantial increase in the clock rate is possible,
but this increase will affect the rest of the CPU
design, causing machine B to require 1.2 times
as many clock cycles as machine A for this
program. What clock rate should we tell the
designer to target?
    Improving Performance (Cont.)
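\[ \text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A} \]

\[ 10\ \text{seconds} = \frac{\text{CPU clock cycles}_A}{400 \times 10^6\ \text{cycles/sec}} \]

\[ \text{CPU clock cycles}_A = 10\ \text{seconds} \times 400 \times 10^6\ \text{cycles/sec} = 4000 \times 10^6\ \text{cycles} \]

\[ \text{CPU time}_B = \frac{\text{CPU clock cycles}_B}{\text{Clock rate}_B} = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B} \]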
  Improving Performance (Cont.)
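\[ 6\ \text{seconds} = \frac{1.2 \times 4000 \times 10^6\ \text{cycles}}{\text{Clock rate}_B} \]

\[ \text{Clock rate}_B = \frac{1.2 \times 4000 \times 10^6\ \text{cycles}}{6\ \text{seconds}} = 800 \times 10^6\ \text{cycles/sec} = 800\ \text{MHz} \]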
Machine B must therefore have twice the clock
rate of A to run the program in 6 seconds.
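The same derivation can be sketched as a function that takes the quantities given in the problem. The name `required_clock_rate` and its parameter names are illustrative, not from the slides.

```python
def required_clock_rate(time_a_s, clock_rate_a_hz, cycle_ratio, target_time_s):
    """Clock rate machine B needs in order to finish in target_time_s.

    cycle_ratio is how many times more clock cycles B needs than A.
    """
    cycles_a = time_a_s * clock_rate_a_hz   # cycles A spends on the program
    cycles_b = cycle_ratio * cycles_a       # cycles B spends on the program
    return cycles_b / target_time_s         # cycles per second B must sustain


if __name__ == "__main__":
    rate_b = required_clock_rate(10, 400e6, 1.2, 6)
    print(rate_b / 1e6, "MHz")
```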
   Hardware Software Interface
  • Since the machine has to execute
    instructions to run the program, the
    execution time must depend on the
    number of instructions in the program.

\[ \text{CPU clock cycles for a program} = \text{Instructions for a program} \times \text{Average clock cycles per instruction (CPI)} \]
Using the Performance Equation
• Suppose we have two implementations
  of the same instruction set architecture.
  Machine A has a clock cycle time of 1 ns
  and a CPI of 2.0 for some program, and
  machine B has a clock cycle time of 2 ns
  and a CPI of 1.2 for the same program.
  Which machine is faster for this program,
  and by how much?
             Continuation
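 Let the number of instructions in the program be I.

\[ \text{CPU clock cycles}_A = I \times 2.0 \]
\[ \text{CPU clock cycles}_B = I \times 1.2 \]
\[ \text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 1\ \text{ns} = 2I\ \text{ns} \]
\[ \text{CPU time}_B = I \times 1.2 \times 2\ \text{ns} = 2.4I\ \text{ns} \]
\[ \frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{2.4I\ \text{ns}}{2I\ \text{ns}} = 1.2 \]

 A is 1.2 times faster than B.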
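A short sketch of this comparison follows. Because the instruction count I cancels in the ratio, any positive value gives the same answer; the value used below is an arbitrary assumption, and `cpu_time_ns` is an illustrative name.

```python
def cpu_time_ns(instruction_count, cpi, clock_cycle_time_ns):
    """CPU time in nanoseconds = instruction count × CPI × clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_ns


if __name__ == "__main__":
    instructions = 1_000_000  # arbitrary; it cancels out of the ratio
    time_a = cpu_time_ns(instructions, cpi=2.0, clock_cycle_time_ns=1.0)
    time_b = cpu_time_ns(instructions, cpi=1.2, clock_cycle_time_ns=2.0)
    print("A is", time_b / time_a, "times faster than B")
```

Machine B has the lower CPI, but its longer clock cycle more than offsets that advantage.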
              Continuation
    • Basic performance equation

\[ \text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} \]

\[ \text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \]
         Continuation
• Sometimes it is possible to compute the
  CPU clock cycles by looking at the
  different types of instructions and using
  their individual clock cycle counts.
\[ \text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i) \]

• C_i – number of instructions of class i
• CPI_i – CPI for instruction class i
• n – number of instruction classes
    Comparing Code Segments
• Example
    – The hardware designer supplied:
    Instruction class    CPI for this class
    A                    1
    B                    2
    C                    3
    – Two code sequences require the following instruction counts:
    Code sequence    Class A    Class B    Class C
    1                2          1          2
    2                4          1          1
    – Which code sequence executes the most instructions?
    – Which will be faster?
    – What is the CPI for each sequence?
              Solution
• Sequence 1 executes 2 + 1 + 2 = 5
  instructions.
• Sequence 2 executes 4 + 1 + 1 = 6
  instructions.
• So sequence 2 executes the most
  instructions.
              Solution
• CPU clock cycles_1 = (2×1) + (1×2) +
  (2×3) = 2 + 2 + 6 = 10 cycles
• CPU clock cycles_2 = (4×1) + (1×2) +
  (1×3) = 4 + 2 + 3 = 9 cycles
• So code sequence 2 is faster, even though
  it executes one more instruction.
                  Solution
\[ \text{CPI}_1 = \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2 \]

\[ \text{CPI}_2 = \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5 \]
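The whole comparison can be sketched in a few lines using the per-class sum of CPI times instruction count. The data come from the tables in the example; the names `CLASS_CPI`, `SEQUENCES`, and `analyze` are illustrative.

```python
# CPI for each instruction class, as supplied by the hardware designer.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def analyze(counts):
    """Return (instruction count, CPU clock cycles, average CPI)."""
    instruction_count = sum(counts.values())
    # Sum over classes of (CPI of the class × instructions in the class).
    clock_cycles = sum(CLASS_CPI[cls] * n for cls, n in counts.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count


if __name__ == "__main__":
    for name, counts in SEQUENCES.items():
        instructions, cycles, cpi = analyze(counts)
        print(f"sequence {name}: {instructions} instructions, "
              f"{cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions but needs fewer cycles, so instruction count alone does not predict which sequence is faster.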
When comparing two machines, we must look at all three
components (instruction count, CPI, and clock cycle time),
which combine to form execution time.
To Be Continued …