Chapter 4
Assessing and Understanding Performance
                     Outline
 Defining performance (4.2)
 CPU performance and its factors (4.2)
 Evaluating performance (4.3)
 About benchmark (4.3)
                Why Study Performance?
 Conflicting goals
   –User
         Find the most suitable machine to get the job done at the lowest cost
            ⇒ Application-oriented metrics
   –Vendor
         Persuade you to buy their machine regardless of your needs
            ⇒ Hardware-oriented metrics
 Know the vocabulary and understand the issues, so that:
   –As a user/buyer, you can make better purchasing decisions
   –As an engineer, you can make better hardware/software design
     decisions
        Performance for a CPU Designer
 An attempt to quantify how well a particular computer can
  perform a user’s applications
 Problems:
   –Essentially a software+hardware issue
   –Different machines have different strengths and weaknesses
   –There is an enormous amount of hype and outright deception in
     the market: be wary
 Key to understanding underlying organizational
  motivation
   –Why is some hardware better than others for different programs?
   –What factors of system performance are hardware related? (e.g.,
    Do we need a new machine, or a new operating system?)
   –How does the machine's instruction set affect performance?
                             Performance
 Why do we care about performance evaluation?
  –Purchasing perspective
         given a collection of machines, which has the
            –best performance?
            –least cost?
            –best performance/cost?
   –Design perspective
         faced with design options, which has the
            –best performance improvement?
            –least cost?
            –best performance/cost?
 How to measure, report, and summarize performance?
  –Performance metric
  –Benchmark
     Which of these airplanes has the best
                performance?
 What metric defines performance?
  –Capacity, cruising range, or speed?
 Speed
  –Taking one passenger from one point to another in the least
   time
  –Transporting 450 passengers from one point to another
         Two Notions of “Performance”
 Response Time (latency)
  –How long does it take for my job to run?
  –How long does it take to execute a job?
  –How long must I wait for the database query?
  –Time to do the task
 Throughput
   –How many jobs can the machine run at once?
   –What is the average execution rate?
   –How much work is getting done?
   –Total amount of work done in a given time
 If we upgrade a machine with a new processor, what do
  we increase?
 If we add a new machine to the lab, what do we increase?
                   Execution Time
 Elapsed time
  –counts everything (disk and memory accesses, I/O, etc.)
  –a useful number, but often not good for comparison
    purposes
 CPU time
  –doesn't count I/O or time spent running other programs
  –can be broken up into system time and user time
 Our focus: user CPU time
  –time spent executing the lines of code that are "in" our
    program
                          Execution Time
   Execution time on a computer is typically divided into:
    – User time: Time spent executing instructions in the user code
    – System time: Time spent executing instructions in the kernel on behalf of the
      user code (e.g., opening files)
    – Other: Time when the system is idle or executing other programs
   Use the “time” and “top” commands in Unix to see these
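The same three quantities can also be observed from inside a program. The sketch below is a minimal illustration and not part of the original slides: `measure` and `busy_loop` are hypothetical helper names, and the code uses only the standard-library calls `os.times()` and `time.perf_counter()`. The CPU-time resolution of `os.times()` is coarse on some systems, so very short runs may report 0.

```python
import os
import time


def measure(func):
    """Run func() once and return (elapsed, user, system) times in seconds."""
    cpu_before = os.times()            # cumulative CPU times of this process
    wall_before = time.perf_counter()  # high-resolution wall clock
    func()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    elapsed = wall_after - wall_before
    user = cpu_after.user - cpu_before.user        # time in user code
    system = cpu_after.system - cpu_before.system  # time in the kernel
    return elapsed, user, system


def busy_loop(n=200_000):
    """Hypothetical CPU-bound workload used only for illustration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


elapsed, user, system = measure(busy_loop)
print(f"elapsed={elapsed:.4f}s user={user:.4f}s system={system:.4f}s")
```

For a CPU-bound loop like this one, user time should account for most of the elapsed time; a workload dominated by I/O would show elapsed time well above user plus system time.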
       Performance Expressed as Time
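 Time is the measure of computer performance and
  the only reliable one
 Performance
  –Bigger is better
  –Improve performance = decrease execution time
  –$\text{Performance}_X = 1 / \text{Execution time}_X$
 "X is n times faster than Y" means
  $\text{Performance}_X / \text{Performance}_Y = \text{Execution time}_Y / \text{Execution time}_X = n$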
                       Time Measurement
 But what does “time” mean?
  –Absolute time measures
         Difference between start and finish of an operation
         Synonyms: running time, elapsed time, completion time, execution time,
          response time, latency
              –1. Everything: response time => system performance
                  •Includes disk access, memory access, I/O, OS, CPU time
              –2. CPU only: CPU execution time or CPU time => CPU performance
                  •the time CPU spends for this task
                  •User CPU time and system CPU time
  –Relative (normalized) time measures
           Running time normalized to some reference time
              –3. In terms of clock cycles for computer designer
                      Clock Cycles
 Instead of reporting execution time in seconds, we often
  use cycles
 Clock “ticks” indicate when to start activities (one
  abstraction):
 cycle time = time between ticks = seconds per cycle
 clock rate (frequency) = cycles per second (1 Hz = 1
  cycle/sec)
   A 200 MHz clock has a cycle time of $1 / (200 \times 10^6)\ \text{s} = 5\ \text{ns}$
CPU Time and its Factors
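The three factors named on the following slides (instruction count, CPI, and clock cycle time) combine as CPU time = instruction count × CPI × clock cycle time, or equivalently instruction count × CPI / clock rate. The sketch below shows the arithmetic; the function name and the example numbers are hypothetical and not taken from the slides.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time.

    The clock cycle time is 1 / clock_rate_hz, so the product is
    written here as total clock cycles divided by the clock rate.
    """
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


# Hypothetical program: 1e9 instructions, average CPI of 2.0, 500 MHz clock.
t = cpu_time_seconds(instruction_count=1e9, cpi=2.0, clock_rate_hz=500e6)
print(f"CPU time: {t:.2f} s")
```

Halving any one factor while holding the other two fixed halves the CPU time, which is why the three factors have to be considered together.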
                         CPI
 The average number of clock cycles each
  instruction takes to execute
 One way to compare two different
  implementations of the same instruction set
 Overall CPI for a program
  –Number of cycles for each instruction type
  –Frequency of each instruction type in the program
    execution
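Put together, the overall CPI is the frequency-weighted average of the per-type cycle counts. The sketch below assumes a hypothetical four-type instruction mix; the type names, cycle counts, and frequencies are illustrative only.

```python
def overall_cpi(mix):
    """Weighted-average CPI.

    mix maps an instruction type to (cycles_for_that_type, fraction_of_executed_instructions).
    """
    total_fraction = sum(fraction for _, fraction in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(cycles * fraction for cycles, fraction in mix.values())


# Hypothetical instruction mix (not from the slides).
MIX = {
    "alu":    (1, 0.5),
    "load":   (2, 0.2),
    "store":  (2, 0.1),
    "branch": (3, 0.2),
}

print(f"overall CPI = {overall_cpi(MIX):.2f}")
```

Because the frequencies come from the executed instruction stream, the same machine can show a different CPI for every program.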
              Performance Equation
 Performance is determined by execution time
 Do any of the other variables equal performance?
  –# of cycles to execute program?
  –# of instructions in program?
  –# of cycles per second?
  –average # of cycles per instruction?
  –average # of instructions per second?
     MIPS (million instructions per second)
     When is it fair to compare two processors using MIPS?
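A small numeric sketch shows why MIPS alone can mislead. The numbers are assumptions chosen for illustration: three instruction classes with CPIs of 1, 2, and 3, a 100 MHz clock, and two hypothetical code sequences for the same program. MIPS comparisons are most defensible when both measurements run the same instruction stream on the same instruction set, which is not the case here.

```python
# Assumed cycles per instruction for three instruction classes.
CPI = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6  # assumed 100 MHz clock


def exec_time(counts):
    """Execution time in seconds for a dict of per-class instruction counts."""
    cycles = sum(counts[cls] * CPI[cls] for cls in counts)
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """Million instructions per second for the same counts."""
    return sum(counts.values()) / (exec_time(counts) * 1e6)


# Two hypothetical code sequences for the same program.
seq1 = {"A": 5e6, "B": 1e6, "C": 1e6}
seq2 = {"A": 10e6, "B": 1e6, "C": 1e6}

for name, seq in (("seq1", seq1), ("seq2", seq2)):
    print(f"{name}: time={exec_time(seq):.3f} s, MIPS={mips(seq):.1f}")
```

Under these assumptions the second sequence has the higher MIPS rating yet takes longer to run, because it executes more instructions overall.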
      How to determine the three factors
 Instruction count
   –Using software tools by profiling, or
   –Simulator of the architecture, or
   –Hardware counters (accuracy varies)
   –You can measure it without knowing the CPU implementation
 CPI
  –Depends on design details in the computer
  –By detailed simulation or hardware counter
  –CPI should be measured
         You cannot get it from the “manuals”
 Clock cycle
  –From the “manuals”
      How to improve the performance
 Reduce the number of instructions executed
 Increase the number of instructions per cycle
  (reduce CPI)
  –Concurrent execution of instructions
 Increase the clock rate
How Hardware and Software Affect Performance?
[Table not recovered; the only surviving entry is “Indirect code”]
Aspects of CPU Performance
                     Short Summary
 Performance is determined by execution time
 Do any of the other variables equal performance?
   –# of cycles to execute program?
   –# of instructions in program?
   –# of cycles per second?
   –average # of cycles per instruction (CPI)?
   –average # of instructions per second (MIPS)?
 Common pitfall: thinking one of the variables is indicative
  of performance when it really isn’t.
 Remember: time is the only reliable measure of
  performance
             Evaluating Performance
 Which program should be used to evaluate
  performance?
  –Best one: the real workload you run day to day
     Measuring it is not easy for everyone
  –Alternative: benchmark
     Used to predict the performance of the real workload
                                     Benchmarks
   Performance is best determined by running a real application
     – Use programs typical of the expected workload
     – Or, typical of the expected class of applications
         compilers/editors, scientific applications, graphics, etc. (e.g., GCC, TeX, SPICE, Excel)
   Small benchmarks
     – Take small fragments of code from inside application loops
     – Best for isolating the performance of individual features of the machine
     – Nice for architects and designers
     – Easy to standardize
     – Can be abused
     – Examples: Livermore Loops, LINPACK
     – Toy benchmarks: 10 ~ 100 lines
          Sieve of Eratosthenes, Puzzle, Quicksort, N-Queens
   Synthetic benchmarks
     – Try to match the average frequency of operations of a large set of programs
     – Exercise the hardware in a manner that mimics real-world applications, but in a small
       piece of code
     – Examples: Whetstone, Dhrystone
             Perform a varied mix of instructions and use the memory in various ways
             Report how many “Whetstones” or “Dhrystones” per second your computer can do
                     More Benchmarks
 Dhrystone [Weicker84]
 Whetstone [Curnow & Wichmann76]
  –University computer center jobs
  –12 loops
 SPEC Benchmarks
   –SPEC (System Performance Evaluation Cooperative)
       companies have agreed on a set of real programs and inputs
       valuable indicator of performance (and compiler technology)
       can still be abused
   –SPEC 89
   –SPEC 92
   –SPEC 95
   –SPEC2000
   –SPEC2004
         Application-oriented Benchmarks
   CPU performance
    – SPEC, for scientific applications
   Server performance
    – Focus on throughput, response time to individual events
    – SPECweb99
   Graphics performance
    – 3D Mark
   Embedded computing
    – EEMBC
    – Automotive, consumer, networking, office automation, telecommunications
   Other research oriented
    – MediaBench
    – CommBench
SPEC CPU 2000
SPEC CINT2000
SPEC CFP2000
      Arithmetic Mean vs. Geometric Mean
   Problem
    – How do you combine the normalized results, and can you?
   When the arithmetic mean is applied to normalized execution times
    – With one reference machine: A is 5.05 times faster than B
    – With the other reference machine: B is 5.05 times faster than A
    – Normalized execution time is used in the SPEC ratio
    – The result is strongly affected by the choice of reference machine
   The geometric mean produces the same “relative” results whether we
    normalize to A or B
    – Pros: independent of the running times of the individual programs
    – Cons: the geometric mean does not track total execution time and thus can’t be
       used to predict relative execution time for a workload
   So how should performance be summarized?
    – Measure the workload and weight each program by its frequency of execution
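The contradiction between the two arithmetic means can be reproduced with a small sketch. The execution times below are assumptions, chosen only because they yield the 5.05 figure quoted above; the slides do not give the underlying data, and the helper names are hypothetical. `math.prod` requires Python 3.8 or later.

```python
import math

# Hypothetical execution times (seconds) of two programs on machines A and B.
TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0]}


def normalized(machine, reference):
    """Per-program execution time of `machine` divided by that of `reference`."""
    return [t / r for t, r in zip(TIMES[machine], TIMES[reference])]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


b_over_a = normalized("B", "A")  # B's times normalized to A
a_over_b = normalized("A", "B")  # A's times normalized to B

print("arithmetic mean, normalized to A:", arithmetic_mean(b_over_a))
print("arithmetic mean, normalized to B:", arithmetic_mean(a_over_b))
print("geometric mean, normalized to A: ", geometric_mean(b_over_a))
print("geometric mean, normalized to B: ", geometric_mean(a_over_b))
print("total time A:", sum(TIMES["A"]), "total time B:", sum(TIMES["B"]))
```

With these assumed times, each machine looks about 5.05 times slower than the other under the arithmetic mean, depending on the reference. The geometric mean is about 1 in both directions, yet the totals differ by roughly a factor of nine, which is the drawback listed under "Cons".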
              Amdahl's Law
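$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$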
                  Amdahl's Law
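 Speedup due to enhancement E:
  $\text{Speedup}(E) = \frac{\text{Execution time without } E}{\text{Execution time with } E}$
 Suppose that enhancement E accelerates a fraction
  F of the task by a factor S, and the remainder of the
  task is unaffected, then:
  $\text{Execution time with } E = \text{Execution time without } E \times \left[(1 - F) + \frac{F}{S}\right]$
  $\text{Speedup}(E) = \frac{1}{(1 - F) + F/S}$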
                 Amdahl’s Law
 Floating-point instructions are improved to run 2X faster, but
  only 10% of actual instructions are FP
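This case can be worked through with a short sketch of Amdahl's law, speedup = 1 / ((1 − F) + F / S). It assumes that "10% of actual instructions" corresponds to 10% of execution time, which the slide does not state explicitly; the function name is hypothetical.

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup when `fraction_enhanced` of the execution time
    is accelerated by `enhancement_factor` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    remaining = 1.0 - fraction_enhanced
    return 1.0 / (remaining + fraction_enhanced / enhancement_factor)


# Assumption: FP work accounts for 10% of execution time and runs 2x faster.
print(f"overall speedup: {amdahl_speedup(0.10, 2.0):.3f}")

# Upper bound if the FP portion took no time at all.
print(f"limit as S grows: {1.0 / (1.0 - 0.10):.3f}")
```

Under that assumption the overall speedup is only about 1.05, and no FP improvement could push it past about 1.11.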
                           Example #3
   Our favorite program runs in 10 seconds on computer A, which has
    a 400 MHz clock. We are trying to help a computer designer build a
    new machine B that will run this program in 6 seconds. The
    designer can use new (or perhaps more expensive) technology to
    substantially increase the clock rate, but has informed us that this
    increase will affect the rest of the CPU design, causing machine B to
    require 1.2 times as many clock cycles as machine A for the same
    program. What clock rate should we tell the designer to target?
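One way to work this out uses execution time = clock cycles / clock rate. The sketch below takes all of its numbers from the problem statement; only the variable and function names are introduced here.

```python
def required_clock_rate_hz(time_a_s, clock_a_hz, cycle_ratio, target_time_s):
    """Clock rate machine B needs to finish in target_time_s.

    cycle_ratio is the number of clock cycles B needs relative to A.
    """
    cycles_a = time_a_s * clock_a_hz   # cycles = time x clock rate
    cycles_b = cycle_ratio * cycles_a  # B needs 1.2x as many cycles
    return cycles_b / target_time_s    # rate = cycles / time


rate_b = required_clock_rate_hz(
    time_a_s=10.0, clock_a_hz=400e6, cycle_ratio=1.2, target_time_s=6.0
)
print(f"target clock rate for B: {rate_b / 1e6:.0f} MHz")
```

Machine A executes 4 × 10^9 cycles, so B needs 4.8 × 10^9 cycles in 6 seconds, which works out to a target of about 800 MHz, twice A's clock rate.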
[Figure: Speedup as a function of f]
                         Summary
 Performance is specific to a particular program or set of programs
  –Total execution time is a consistent summary of performance
 For a given architecture, performance increases come from:
  –increases in clock rate (without adverse CPI effects)
  –improvements in processor organization that lower CPI
  –compiler enhancements that lower CPI and/or instruction count
  –algorithm/language choices that affect instruction count
 Amdahl’s law