Power
6.884 – Spring 2005    3/7/05   L11 – Power   1
                               Lab 2 Results
                      Pareto-Optimal Points
                          Standard Projects
         	 Two basic design projects
               –	 Processor variants (based on lab1&2 testrigs)
               –	 Non-blocking caches and memory system
               –	 Possible project ideas on web site
         	 Must hand in proposal before quiz on March
            18th, including:
               –	 Team members (2 or 3 per team)
               –	 Description of project, including the architecture
                  exploration you will attempt
                      Non-Standard Projects
         	 Must hand in proposal early by class on March
            14th, describing:
               –	 Team members (2 or 3)
               –	 The chip you want to design
               –	 The existing reference code you will use to build a
                  test rig, and the test strategy you will use
               –	 The architectural exploration you will attempt
                                                     Power Trends
           [Figure: CPU power in watts (log scale, 0.1 to 1000) versus year (1970 to 2020).
            Labeled points: 8080, 8086, 386, Pentium(R) processor, Pentium(R) 4 processor,
            and a speculative "1000W CPU?". Figure by MIT OCW. Adapted from Intel.
            Used with permission.]
           	 CMOS originally used for very low-power circuitry such as
              wristwatches
           	 Now some CPUs have power dissipation >100W
                                  Power Concerns
       	 Power dissipation is limiting factor in many systems
              –	      battery weight and life for portable devices
              –	      packaging and cooling costs for tethered systems
              –	      case temperature for laptop/wearable computers
              –       fan noise not acceptable in some settings
        Internet data center: ~8,000 servers, ~2 MW
              –	 25% of running cost is electricity, both to power the
                 servers and to run air-conditioning to remove heat
       	 Environmental concerns
              –	 ~2005, 1 billion PCs, 100W each =>       100 GW
              –	 100 GW = 40 Hoover Dams
                      On-Chip Power Distribution
         	 Routed power distribution on two stacked layers of metal
            (one for VDD, one for GND)
               –	 OK for low-cost, low-power designs with few layers of metal
         	 Power grid: interconnected vertical and horizontal power bars
               –	 Common on most high-performance designs
               –	 Often well over half of total metal on upper, thicker layers
                  is used for VDD/GND
         	 Dedicated VDD/GND planes
               –	 Very expensive
               –	 Only used on Alpha 21264
               –	 Simplified circuit analysis
               –	 Dropped on subsequent Alphas
           [Figure: three layouts fed from supply pads: routed V/G rails reaching blocks
            A and B, a grid of alternating V/G bars connected by vias, and dedicated
            V/G planes.]
                      Power Dissipation in CMOS
           [Figure: CMOS gate driving load capacitance CL, annotated with capacitor
            charging current, short-circuit current, gate leakage current, subthreshold
            leakage current, and diode leakage current.]
               Primary Components:
                Capacitor charging, energy is $\frac{1}{2} C V^2$ per transition
                       the dominant source of power dissipation today
                Short-circuit current, PMOS & NMOS both on during transition
                       kept to <10% of capacitor charging current by making edges fast
                Subthreshold leakage, transistors don’t turn off completely
                       approaching 10-40% of active power in <180nm technologies
                Diode leakage from parasitic source and drain diodes
                       usually negligible
                Gate leakage from electrons tunneling across gate oxide
                       was negligible, increasing due to very thin gate oxides
                      Energy to Charge Capacitor
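           [Figure: pull-up path charging load capacitance CL from VDD, with supply
            current Isupply and output voltage Vout.]

           $$E_{0 \to 1} = \int_0^T P(t)\,dt = V_{DD} \int_0^T I_{supply}(t)\,dt = V_{DD} \int_0^{V_{DD}} C_L\,dV_{out} = C_L V_{DD}^2$$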
           	 During 0->1 transition, energy $C_L V_{DD}^2$ is drawn from the
              power supply
           	 After transition, $\frac{1}{2} C_L V_{DD}^2$ is stored in the capacitor; the
              other $\frac{1}{2} C_L V_{DD}^2$ was dissipated as heat in the pullup
              resistance
           	 The $\frac{1}{2} C_L V_{DD}^2$ energy stored in the capacitor is dissipated
              in the pulldown resistance on the next 1->0 transition
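
The energy split above can be checked numerically. The following is a minimal sketch, assuming a simple linear pull-up resistance; the values of R, CL, and VDD and the function name `charge_energy` are illustrative assumptions, not figures from the lecture.

```python
# Forward-Euler simulation of a 0->1 transition: a capacitor charged from VDD
# through a pull-up resistance. All component values are illustrative assumptions.

def charge_energy(r_ohm=1e3, c_farad=1e-12, vdd=1.8, steps=200_000, n_tau=20.0):
    """Return (supply energy, energy burned in the resistor, energy stored on C)."""
    dt = n_tau * r_ohm * c_farad / steps   # simulate n_tau RC time constants
    v_out = 0.0
    e_supply = 0.0     # integral of VDD * Isupply dt
    e_resistor = 0.0   # integral of I^2 * R dt (heat in the pull-up)
    for _ in range(steps):
        i = (vdd - v_out) / r_ohm
        e_supply += vdd * i * dt
        e_resistor += i * i * r_ohm * dt
        v_out += i * dt / c_farad
    e_cap = 0.5 * c_farad * v_out ** 2
    return e_supply, e_resistor, e_cap

e_supply, e_resistor, e_cap = charge_energy()
cv2 = 1e-12 * 1.8 ** 2   # CL * VDD^2 for the default values
print(e_supply / cv2, e_resistor / cv2, e_cap / cv2)   # close to 1, 0.5, 0.5
```

Changing `r_ohm` changes how long the transition takes but not the split: half of the supplied energy is still lost in the pull-up.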
                      Power Formula
       $$\text{Power} = \text{activity} \times \text{frequency} \times \left(\tfrac{1}{2} C V_{DD}^2 + V_{DD} I_{SC}\right) + V_{DD} I_{Subthreshold} + V_{DD} I_{Diode} + V_{DD} I_{Gate}$$
       	 Activity is average number of transitions per
          clock cycle (clock has two)
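
As a rough illustration of the power formula, the sketch below evaluates it for one set of made-up numbers. The function name, the parameter values, and the choice to model the short-circuit term as a charge per transition (`q_sc`, so that the units work out to energy) are all assumptions.

```python
# Evaluate the power formula for hypothetical chip-level numbers.

def total_power(activity, freq_hz, c_farad, vdd, q_sc=0.0,
                i_subthreshold=0.0, i_diode=0.0, i_gate=0.0):
    """Dynamic (switching + short-circuit) plus static (leakage) power in watts."""
    # Energy per transition: capacitor charging plus short-circuit loss.
    # q_sc is an assumed short-circuit charge per transition, in coulombs.
    energy_per_transition = 0.5 * c_farad * vdd ** 2 + vdd * q_sc
    dynamic = activity * freq_hz * energy_per_transition
    static = vdd * (i_subthreshold + i_diode + i_gate)
    return dynamic + static

# Hypothetical chip: 50 nF total capacitance, 10% activity, 1 GHz, 1.2 V,
# 0.5 A of subthreshold leakage.
p = total_power(activity=0.1, freq_hz=1e9, c_farad=50e-9, vdd=1.2,
                q_sc=2.5e-9, i_subthreshold=0.5)
print(f"{p:.2f} W")   # about 4.5 W: 3.9 W dynamic plus 0.6 W leakage
```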
                       Switching Power
               Power $\propto$ activity $\times \tfrac{1}{2} C V^2 \times$ frequency
             Reduce activity
             Reduce switched capacitance C
             Reduce supply voltage V
             Reduce frequency
      Reducing Activity with Clock Gating
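   Clock Gating
         –	   don’t clock flip-flop if not needed
         –	   avoids transitioning downstream logic
         –	   enable adds to control logic complexity
         –	   Pentium-4 has hundreds of gated clock domains
   [Figure: the enable passes through a latch that is transparent while the global
    clock is low; the latched enable gates the global clock to produce the gated
    local clock that drives the flip-flop (D, Q). Waveforms shown: Clock, Enable,
    Latched Enable, Gated Clock.]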
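
A small behavioral sketch of the latch-based clock gate may help. It assumes the latched enable is combined with the clock by an AND gate, which is a common arrangement but is not spelled out on the slide; the function names and the input sequence are hypothetical.

```python
# Behavioral model of a latch-based clock gate, one list entry per clock half-cycle.
# Assumption: gated clock = clock AND latched enable.

def simulate_gated_clock(enable_per_phase):
    """Clock starts low and alternates; returns the gated clock value per phase."""
    latched = 0
    gated_trace = []
    for phase, enable in enumerate(enable_per_phase):
        clock = phase % 2            # 0 = low phase, 1 = high phase
        if clock == 0:
            latched = enable         # latch is transparent while the clock is low
        # While the clock is high the latch holds, so enable changes cannot glitch
        # the gated clock.
        gated_trace.append(clock & latched)
    return gated_trace

def count_rising_edges(trace):
    """Count 0->1 transitions, assuming the signal was 0 before the trace."""
    return sum(1 for a, b in zip([0] + trace[:-1], trace) if a == 0 and b == 1)

enable = [1, 1, 0, 1, 0, 0, 1, 0]          # four clock cycles, two phases each
gated = simulate_gated_clock(enable)
print(gated)                               # [0, 1, 0, 0, 0, 0, 0, 1]
print(count_rising_edges(gated))           # 2 clock edges reach the flip-flop instead of 4
```

In the second cycle the enable rises only after the clock is already high, and the latch keeps that late change from producing a partial clock pulse.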
       Reducing Activity with Data Gating
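      Avoid data toggling in unused unit by gating off inputs
         –	 Example: shifter is infrequently used, so gate its input B with the
            Shift/Add Select signal
         –	 Could use transparent latch instead of AND gate to reduce number
            of transitions, but would be bigger and slower
      [Figure: input B feeds both a shifter and an adder, whose outputs go to a 2:1
       mux (1 = shifter, 0 = adder) controlled by Shift/Add Select. Top: ungated.
       Bottom: the shifter input is gated.]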
                Other Ways to Reduce Activity
         Bus Encodings
               –	 choose encodings that minimize transitions on average (e.g., Gray
                  code for address bus)
               –	 compression schemes (move fewer bits)
         Freeze “Don’t Cares”
               –	 If a signal is a don’t care, then freeze its last dynamic value (using a
                  latch) rather than always forcing it to a fixed 1 or 0.
               –	 E.g., 1, X, 1, 0, X, 0 ===> 1, X=1, 1, 0, X=0, 0
         Remove Glitches
               –	 balance logic paths to avoid glitches during settling
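
Two of these techniques are easy to quantify by counting bit transitions. The sketch below compares binary and Gray-coded sequential addresses and compares freezing don't cares against forcing them to 0; the helper names and the 16-address sequence are illustrative assumptions.

```python
# Count wire transitions for bus encodings and for don't-care handling.

def bit_transitions(words):
    """Total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def gray(n):
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)

addresses = list(range(16))                               # sequential address stream
binary_toggles = bit_transitions(addresses)
gray_toggles = bit_transitions([gray(a) for a in addresses])
print(binary_toggles, gray_toggles)                       # 26 15

def fill_dont_cares(values, fixed=None):
    """Replace None (don't care) with the last value, or with `fixed` if given."""
    out, last = [], 0
    for v in values:
        if v is None:
            v = last if fixed is None else fixed
        out.append(v)
        last = v
    return out

signal = [1, None, 1, 0, None, 0]                         # the 1, X, 1, 0, X, 0 example
forced_toggles = bit_transitions(fill_dont_cares(signal, fixed=0))
frozen_toggles = bit_transitions(fill_dont_cares(signal))
print(forced_toggles, frozen_toggles)                     # 3 1
```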
           Reducing Switched Capacitance
            Reduce switched capacitance C
                  – Careful transistor sizing (small transistors off critical path)
                  – Tighter layout (good floorplanning)
                  – Segmented structures (avoid switching long nets)
                      [Figure, top: units A, B, and C on a shared bus driven by A or B
                       when sending values to C. Bottom: a switch is inserted to
                       isolate the bus segment when B is sending to C.]
                      Reducing Frequency
          Doesn’t save energy, just reduces rate at which
           it is consumed (lower power, but must run
           longer)
            – Get some saving in battery life from 
              reduction in rate of discharge
                       Reducing Supply Voltage
       Quadratic savings in energy per transition ($\tfrac{1}{2} C V_{DD}^2$)
        Circuit speed is reduced
        Must lower clock frequency to maintain correctness

        $$T_d = \frac{C V_{DD}}{k\,(V_{DD} - V_{th})^{\alpha}}, \qquad \alpha = 1 \text{ to } 2$$

        Delay rises sharply as supply voltage approaches threshold voltages
    Courtesy of Mark Horowitz and Stanford University. Used with permission.
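
To get a feel for how sharply delay rises, the delay expression can be evaluated directly. This is a sketch under assumed parameters: the threshold voltage of 0.4 V, the exponent of 1.5, and the normalized C and k are not from the lecture.

```python
# Delay model: Td = C * VDD / (k * (VDD - Vth)^alpha), with assumed parameters.

def gate_delay(vdd, vth=0.4, alpha=1.5, c=1.0, k=1.0):
    """Gate delay in arbitrary units; only valid for vdd above vth."""
    if vdd <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c * vdd / (k * (vdd - vth) ** alpha)

reference = gate_delay(1.8)
for vdd in (1.8, 1.2, 0.9, 0.6, 0.5):
    # Delay relative to the 1.8 V operating point.
    print(f"VDD = {vdd:.1f} V  relative delay = {gate_delay(vdd) / reference:.2f}")
```

The last few rows grow much faster than the first ones, which is the reason the clock frequency must drop as the supply approaches the threshold.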
      Voltage Scaling for Reduced Energy
       	 Scaling supply voltage by 0.5x scales energy
          per transition by ~0.25x
       	 Performance is reduced, so a slower clock
          is needed
       	 Can regain performance with parallel
          architecture
       	 Alternatively, can trade surplus performance for
          lower energy by reducing supply voltage until
          “just enough” performance (dynamic voltage scaling)
            Parallel Architectures Reduce 
           Energy at Constant Throughput
            8-bit adder/comparator
                 40MHz at 5V, area = 530k µm²
                 Base power Pref
            Two parallel interleaved adder/compare units
                 20MHz at 2.9V, area = 1,800k µm² (3.4x)
                 Power = 0.36 Pref
            One pipelined adder/compare unit
                 40MHz at 2.9V, area = 690k µm² (1.3x)
                 Power = 0.39 Pref
            Pipelined and parallel
                 20MHz at 2.0V, area = 1,961k µm² (3.7x)
                 Power = 0.2 Pref
                                  Chandrakasan et al., “Low-Power CMOS Digital Design”,
                                                           IEEE JSSC 27(4), April 1992
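
The quoted power ratios do not follow from voltage and frequency alone, because the extra hardware also adds switched capacitance. Assuming power scales as C·V²·f, the sketch below backs out the effective capacitance ratio implied by each row; the function and dictionary names are hypothetical, and the inputs are the numbers quoted above.

```python
# Back out the implied switched-capacitance ratio from P ~ C * V^2 * f.

REF_V, REF_F_MHZ = 5.0, 40.0   # reference design: 40 MHz at 5 V

designs = {
    # name: (supply voltage in V, clock in MHz, power relative to Pref)
    "parallel": (2.9, 20.0, 0.36),
    "pipelined": (2.9, 40.0, 0.39),
    "pipelined+parallel": (2.0, 20.0, 0.20),
}

def implied_cap_ratio(v, f_mhz, p_ratio):
    """Effective switched capacitance relative to the reference design."""
    return p_ratio / ((v / REF_V) ** 2 * (f_mhz / REF_F_MHZ))

ratios = {name: implied_cap_ratio(*row) for name, row in designs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x reference capacitance")
# parallel ~2.14x, pipelined ~1.16x, pipelined+parallel 2.50x
```

Under this model the parallel design switches a little more than twice the capacitance, yet still wins on power because of the quadratic voltage term.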
                   “Just Enough” Performance
             [Figure: frequency versus time from t=0 to t=deadline, comparing "run fast
              then stop" with "run slower and just meet deadline".]
            Save energy by reducing frequency and
            voltage to minimum necessary
                          Voltage Scaling on 
                      Transmeta Crusoe TM5400
              Frequency    Relative       Voltage    Relative    Relative
                (MHz)     Performance       (V)       Energy      Power
                              (%)                       (%)         (%)
                 700         100.0         1.65        100.0       100.0
                 600          85.7         1.60         94.0        80.6
                 500          71.4         1.50         82.6        59.0
                 400          57.1         1.40         72.0        41.4
                 300          42.9         1.25         57.4        24.6
                 200          28.6         1.10         44.4        12.7
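
The relative energy and power columns can be approximated from the frequency and voltage columns alone. The sketch below assumes energy per operation scales as V² and power as V²·f, ignoring leakage; the function name is hypothetical.

```python
# Recompute relative energy and power for the TM5400 operating points,
# assuming energy ~ V^2 and power ~ V^2 * f (leakage ignored).

points = [(700, 1.65), (600, 1.60), (500, 1.50),
          (400, 1.40), (300, 1.25), (200, 1.10)]   # (MHz, volts) from the table

def relative(points):
    """Return (MHz, relative energy %, relative power %) against the first point."""
    f0, v0 = points[0]
    rows = []
    for f, v in points:
        energy = (v / v0) ** 2
        power = energy * f / f0
        rows.append((f, round(100 * energy, 1), round(100 * power, 1)))
    return rows

rows = relative(points)
for row in rows:
    print(row)
```

Under this model every entry matches the table after rounding except the 400 MHz power, which comes out near 41.1% rather than the listed 41.4%.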
                           Leakage Power
         	 Under ideal scaling, want to reduce threshold voltage as
            fast as supply voltage
         	 But subthreshold leakage is an exponential function of
            threshold voltage and temperature
           $$I_{subthreshold} = k\, e^{-\frac{q V_T}{a\, k_B T}}$$

           [Figure: subthreshold current (A/µm, log scale, 1E-12 to 1E-06) versus
            threshold voltage VT (0.0 to 0.8 V), with curves for 0 °C, 55 °C, and
            110 °C. Figure by MIT OCW.]
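
The exponential dependence can be explored with a few lines of code. This is a sketch of the expression above with assumed constants: the prefactor `k` and the factor `a = 1.5` are illustrative choices, not values from the lecture.

```python
import math

Q = 1.602176634e-19    # electron charge in coulombs
K_B = 1.380649e-23     # Boltzmann constant in J/K

def subthreshold_current(vt, temp_c, a=1.5, k=1e-6):
    """I = k * exp(-q*VT / (a*kB*T)); a and k are assumed, k in A/um."""
    t_kelvin = temp_c + 273.15
    return k * math.exp(-Q * vt / (a * K_B * t_kelvin))

# Effect of lowering the threshold by 100 mV at 55 C.
ratio_vt = subthreshold_current(0.3, 55.0) / subthreshold_current(0.4, 55.0)
# Effect of heating from 0 C to 110 C at a fixed 0.4 V threshold.
ratio_temp = subthreshold_current(0.4, 110.0) / subthreshold_current(0.4, 0.0)
print(f"100 mV lower VT: {ratio_vt:.1f}x leakage")
print(f"0 C -> 110 C:    {ratio_temp:.1f}x leakage")
```

With these assumed constants, a 100 mV reduction in threshold voltage raises leakage by roughly an order of magnitude.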
                                        Rise in Leakage Power
          [Figure: power in watts (0 to 250) by technology generation (0.25 µm,
           0.18 µm, 0.13 µm, 0.1 µm, 0.07 µm), showing active power and active
           leakage power, with a secondary percentage axis (0% to 120%).
           Figure by MIT OCW.]
             Design-Time Leakage Reduction
         Use slow, low-leakage transistors off critical path
         	 leakage proportional to device width, so use smallest
            devices off critical path
         	 leakage drops greatly with stacked devices (acts as drain
            voltage divider), so use more highly stacked gates off
            critical path
         	 leakage drops with increasing channel length, so slightly
            increase length off critical path
         	 dual VT: process engineers can provide two thresholds
            (at extra cost); use high VT off critical path (modern cell
            libraries often have multiple VT)
                      Critical Path Leakage
   Critical paths dominate leakage after applying design-
     time leakage reduction techniques
   Example: PowerPC 750
          5% of transistor width is low Vt, but these account for >50%
            of total leakage
   Possible approach: run-time leakage reduction
          – switch off critical path transistors when not needed
                      Run-Time Leakage Reduction
   Body Biasing
         Vt increased by reverse-biased body effect (figure shows Vbody > Vdd)
         Large transition time and wakeup latency due to
         well capacitance and resistance
   Power Gating
         Sleep transistor between supply and virtual supply lines
         Increased delay due to sleep transistor
   Sleep Vector
         Input vector which minimizes leakage
         Increased delay due to mux, and extra active energy due to
           spurious toggles after applying sleep vector
   [Figures: transistor with gate, drain, source, and body terminals, Vbody > Vdd;
    sleep transistor driven by a sleep signal between Vdd and the virtual Vdd that
    feeds the logic cells; logic inputs forced to 0 for the sleep vector.]
             Power Reduction for Cell-Based 
                        Designs
         	 Minimize activity
               –	 Use clock gating to avoid toggling flip-flops
               –	 Partition designs so minimal number of components
                  activated to perform each operation
               –	 Floorplan units to reduce length of most active wires
          Use lowest voltage and slowest frequency
           necessary to reach target performance
               –	 Use pipelined architectures to allow fewer gates to
                  reach target performance (reduces leakage)
               –	 After pipelining, use parallelism to further reduce
                  needed frequency and voltage if possible
         	 Always use energy-delay plots to understand
            power tradeoffs
                         Energy versus Delay
          [Figure: energy versus delay, with design points A, B, C, and D and a curve
           of constant energy-delay product.]
     	 Can try to compress this 2D information into single number
            –	 Energy*Delay product
            –	 Energy*Delay^2: gives more weight to speed, mostly insensitive to supply
               voltage
         Many techniques can exchange energy for delay
         Single number (ED, ED^2) often misleading for real designs
            –	 usually want minimum energy for given delay or minimum delay for given
               power budget
            –	 can’t scale all techniques across range of interest
     	 To fully compare alternatives, should plot E-D curve for each
        solution
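
The claim that Energy*Delay^2 is mostly insensitive to supply voltage can be seen in an idealized model. The sketch below assumes energy scales as V² and delay as V/(V − Vth)^α; with Vth = 0 and α = 2 (both idealizations chosen for illustration) delay scales as 1/V and ED^2 is exactly constant, while ED still varies with voltage.

```python
# Idealized voltage-scaling model for the ED and ED^2 metrics (arbitrary units).

def energy(vdd):
    """Energy per operation, proportional to V^2."""
    return vdd ** 2

def delay(vdd, vth=0.0, alpha=2.0):
    """Delay model V / (V - Vth)^alpha; the defaults are idealized assumptions."""
    return vdd / (vdd - vth) ** alpha

def metrics(vdd, vth=0.0, alpha=2.0):
    """Return (energy*delay, energy*delay^2) at the given supply voltage."""
    e, d = energy(vdd), delay(vdd, vth, alpha)
    return e * d, e * d * d

for vdd in (1.0, 1.5, 2.0):
    ed, ed2 = metrics(vdd)
    print(f"VDD = {vdd:.1f}: ED = {ed:.2f}, ED^2 = {ed2:.2f}")
# ED grows in proportion to VDD, while ED^2 stays at 1.00.
```

With a nonzero threshold voltage, for example `metrics(vdd, vth=0.3)`, ED^2 is no longer exactly constant, which is one reason a single number can mislead.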
                         Energy versus Delay
          [Figure: energy versus delay (1/performance) curves for Architecture A and
           Architecture B, with one region labeled "A better" and another labeled
           "B better".]
     	 Should always compare architectures at the same
        performance level or at the same energy
     	 Can always trade performance for energy using
        voltage/frequency scaling
     	 Other techniques can trade performance for
        energy consumption (e.g., less pipelining, fewer
        parallel execution units, smaller caches, etc)
                      Temperature Hot Spots
         	 Not just total power, but power density is a problem for
            modern high-performance chips
         	 Some parts of the chip get much hotter than others
               –	 Transistors get slower when hotter
               –	 Leakage gets exponentially worse (can get thermal runaway
                  with positive feedback between temperature and leakage
                  power)
               – Chip reliability suffers
          Few good solutions as yet
               –	 Better floorplanning to spread hot units across chip
               –	 Activity migration, to move computation from hot units to
                  cold units
               –	 More expensive packaging (liquid cooling)
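
The positive feedback between temperature and leakage can be illustrated with a toy fixed-point model. Everything here is an assumption made for illustration: the thermal resistance, the exponential leakage model and its constants, and the 200 °C cutoff are not from the lecture.

```python
import math

# Toy thermal model: T = T_ambient + R_th * (P_dynamic + P_leak(T)),
# with leakage growing exponentially in temperature. All constants are assumed.

def settle_temperature(p_dynamic, r_th=0.5, t_ambient=40.0, p_leak_ref=10.0,
                       t_ref=40.0, t_scale=30.0, max_iter=1000, t_limit=200.0):
    """Iterate to a steady-state temperature in C, or return None on runaway."""
    t = t_ambient
    for _ in range(max_iter):
        p_leak = p_leak_ref * math.exp((t - t_ref) / t_scale)
        t_next = t_ambient + r_th * (p_dynamic + p_leak)
        if t_next > t_limit:
            return None              # feedback loop diverged: thermal runaway
        if abs(t_next - t) < 1e-6:
            return t_next            # temperature has settled
        t = t_next
    return None

cool = settle_temperature(p_dynamic=40.0)    # settles in the high 70s C
hot = settle_temperature(p_dynamic=120.0)    # no stable point in this model
print(cool, hot)
```

In this model a moderate dynamic power converges to a stable temperature, while a higher one has no steady state because each temperature rise adds more leakage power than the package can remove.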
                      Itanium Temperature Plot
                                          Image removed due to copyright restrictions.
   Please see:
   Krishnamurthy, R., A. Alvandpour, S. Mathew, M. Anders, V. De, and S. Borkar. "High-Performance, Low-Power, and
   Leakage-Tolerance Challenges for Sub-70nm Microprocessor Circuits." (Session Invited Paper). IEEE European Solid State
   Circuits Conference, Sept. 25, 2002. Paper no. C17.01.