Q Electrical Dynamic Power
          For properly sized and ratioed gates, the contribution to the overall dynamic power due
          to P_shortcircuit is of the order of 10-20%.
     2.   Switching power dissipation: This is the power consumed due to charging and discharging
          of capacitive loads when the circuit has some activity due to changes in its inputs. The
          capacitive load at different circuit gates depends upon the fanout of the gate, output
          capacitance, and wiring capacitances. It may be noted that a node with load capacitance
          might not switch every time the clock switches. To take care of this, a quantity called
          switching activity (α) is often used. It determines how often switching occurs on a node
          with load capacitance. If V_DD is the supply voltage, V_swing is the change in voltage level
          of the switched capacitance, C is the capacitance being switched and f is the frequency
          of operation, the switching power is given by,

                    P_switching = C × V_DD × V_swing × α × f
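
The expression above can be evaluated directly. The following is a minimal Python sketch rather than anything taken from the text: the function name switching_power and all numeric values (10 pF load, 1.0 V supply, activity of 0.25, 100 MHz clock) are assumptions chosen only for illustration.

```python
def switching_power(c_farads, v_dd, v_swing, activity, freq_hz):
    """Return P_switching = C * V_DD * V_swing * alpha * f, in watts."""
    return c_farads * v_dd * v_swing * activity * freq_hz


# Assumed values: 10 pF load, 1.0 V supply, full-rail swing,
# node switches on 25% of the clock cycles, 100 MHz clock.
p_full = switching_power(10e-12, 1.0, 1.0, 0.25, 100e6)

# Same node with both the supply and the swing halved.
p_half = switching_power(10e-12, 0.5, 0.5, 0.25, 100e6)

print(p_full, p_half)
```

With a full-rail swing (V_swing equal to V_DD), the power depends quadratically on the supply voltage, so halving the supply in this sketch reduces the switching power to one quarter.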
   Subthreshold leakage: When the gate voltage is below the threshold voltage but very close to
   it, subthreshold conduction current flows between source and drain. It is caused by ...
202                                                         Low Power Embedded System Design
                         Fig. 11.1 Leakage components in a transistor.
                         [Figure labels: reverse biased junction BTBT, bulk]
11.2.1 Algorithmic Power Minimization
It mainly focuses on reducing the number of operations requiring larger power in a target
implementation. For example, in many processors, the cost of an addition/subtraction operation
may be different from that of a logical operation. Thus, to check "whether x is equal to y", one
may first perform a subtraction operation followed by checking the status register for the
zero-bit. On the other hand, if the logical operation takes lesser power, x may be directly
compared with y using a comparison instruction. The following are some of the important issues
to be judged for selecting a particular algorithm from alternatives:
   1.   Memory reference: This is very important as memory is normally off-chip from the
        processor. A large number of accesses to the memory means a good amount of activity on
        the address/data bus lines. The memory access pattern is also important. If the access
        pattern is sequential, only the least significant bits of the address bus change, whereas
        for random access through the memory, most of the address bits will switch, thus creating
        higher power dissipation.
   2.   Presence of cache memory: The presence and structure of cache memory plays an
        important role. Cache can be fruitfully utilized to reduce both the execution time and
        the power of an implementation if the underlying algorithm has locality in its behaviour.
        The locality may be both temporal and spatial in nature. While temporal locality refers
        to the fact that a memory location accessed at some time is also likely to be accessed
        in the near future, spatial locality means that if a memory location is accessed at some
        time, its neighbouring locations are also likely to be accessed in the near future. Thus,
        caching them inside the CPU cache saves not only the memory access time, but also the
        bus energy consumption.
   3.   Recomputation vs. memory load/store: Normal power minimization techniques at the
        algorithm level attempt to reduce the number of arithmetic operations. However, it may
        so happen that, to reduce the number of operations, some repeatedly performed computation
        is done only once and stored at a memory location. Later, as and when necessary, it is
        reloaded from the memory. This may lead to increased power consumption due to extra
        memory accesses. If the operands are already available in CPU registers or on-chip
        cache, it may be better to recompute the value, instead of loading it from memory, from
        the power consumption point of view.
   4.   Compiler optimization technique: The typical techniques used by an optimizing compiler
        can be used to reduce the power consumption of a piece of code. The strategies involve
        strength reduction, common subexpression elimination, minimizing memory traffic, etc.
        Loop unrolling is also often beneficial as it reduces loop overhead.
   5.   Number representation: This is another area for algorithmic power trade-off. The
        following points may be noted:
          • Fixed vs. floating point representation: Fixed point operations are much simpler
            than floating point ones. Thus, fixed point normally leads to power saving, though
            accuracy may suffer.
          • Sign-magnitude vs. 2's complement: Selection of sign-magnitude representation may
            give significant power saving over 2's complement, if input samples are uncorrelated
            and range is minimized.
          • Precision of operations: This is important, since having lower precision allows one
            to reduce the size of space needed to store the values. A typical example of this
            is to reduce the number of bits in the mantissa portion in several signal processing
            applications, including speech and image, to improve circuit delay and power.
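
The memory-reference point in item 1 can be made concrete by counting how many address-bus bits flip between consecutive accesses. The following is a minimal Python sketch, not taken from the text: the 16-bit bus width, the 1024 accesses, the fixed random seed and the helper name bus_toggles are all assumptions chosen for illustration, and bit flips are used only as a rough proxy for bus switching energy.

```python
import random


def bus_toggles(addresses):
    """Count bit flips on the address bus between consecutive accesses."""
    return sum(bin(a ^ b).count("1") for a, b in zip(addresses, addresses[1:]))


ADDRESS_BITS = 16   # assumed bus width
N = 1024            # assumed number of accesses

# Sequential pattern: addresses 0, 1, 2, ..., N-1.
sequential = list(range(N))

# Random pattern: fixed seed so the run is repeatable.
rng = random.Random(0)
scattered = [rng.randrange(1 << ADDRESS_BITS) for _ in range(N)]

seq_toggles = bus_toggles(sequential)
rand_toggles = bus_toggles(scattered)
print(seq_toggles, rand_toggles)
```

For the sequential pattern the count works out to roughly two bit flips per access, since mostly the least significant bits change, while a uniformly random pattern flips about half of the address bits on average.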
of supply voltage). Since all such systems are operating simultaneously, total power saving is
1/n of the original power. This has been shown in Fig. 11.2(b). A problem with the scheme
is that the hardware is duplicated, along with other necessary multiplexing and demultiplexing
logic.
    Another possible architectural modification often suggested is pipelining. In this scheme,
the functional block of Fig. 11.2(a) is divided into a sequence of sub-blocks, each of
approximately the same delay. Thus, if the number of sub-blocks is n, from the pipelining
principle, the overall system can produce output at a rate of about n × f. Now, if the supply
voltage of individual stages is reduced by a factor of n, power reduces by a factor of 1/n.
However, we need to accommodate extra latches between the stages for proper synchronization
between them. This introduces some overhead in terms of area, performance and power as well.
The scheme has been shown in Fig. 11.2(c).
     [Fig. 11.2: (a) Original block with input and supply voltage. (b) Copy 0, Copy 1, ...,
     Copy (n-1), each at V/n, with a mod-n counter. (c) Sub-block 0, Sub-block 1, ...,
     Sub-block (n-1), each at V/n, separated by latches.]
  1. Static vs. dynamic logic families: CMOS logic can be realized as static or dynamic. In a
     dynamic circuit, the output is always precharged to 1. Thus, power will be consumed whenever
     the output is zero. Hence, the probability of a power consuming transition is 0.25, which is
     higher than for a static gate. However, a dynamic gate has lower input capacitance (almost
     by a factor of 2 to 3) compared to a static gate, as the p-network is absent. Hence, the
     effective capacitance that a dynamic gate sees is much lower. But the power consumed
     in distributing the precharging signal also needs to be considered.
  2. Glitches and hazards: This is another potential source of power consumption, particularly
     in static CMOS circuits. A glitch at the output of a gate can occur due to the differences
     in arrival times of input signals. A typical example of it is the AND-OR-INVERT based
     implementation of the function f = ab + ac. The circuit is shown in Fig. 11.3(a).
                      Fig. 11.3 (a) Circuit with hazard, (b) Circuit without hazard.
  3. Technology mapping: The logic synthesis library often contains different implementations
     of the same logic module. They normally differ in terms of area, delay, power, etc. A logic
     synthesis procedure targeted to power minimization may choose implementations that
     require higher area or delay, but score better in terms of power. For example, consider
     a four-input AND function. Two possible implementations are shown in Fig. 11.4(a)
     and Fig. 11.4(b), respectively. The ON-probabilities of the gates are also shown. Total
     ... + 0.9375 × 0.0625 = 0.3555. Thus, the first implementation consumes more power than
     the second one. In this case, though there is no area penalty, the second implementation
     has one gate delay more than the first one.
                       Fig. 11.4 Two different implementations of 4-input AND
                       [Figure labels: (a) input p = 0.5; gate outputs P = 0.25, P = 0.25,
                       P = 0.0625. (b) input p = 0.5; gate outputs P = 0.25, P = 0.125,
                       P = 0.0625]
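
The comparison between the two implementations of Fig. 11.4 can be checked numerically. The following is a minimal Python sketch under one stated assumption: the switching activity of a gate output with ON-probability p is taken as (1 - p) × p, which matches the surviving term 0.9375 × 0.0625 in the text. The function name switching_activity and the list names are introduced here only for illustration.

```python
def switching_activity(p_on):
    """Activity of a node that is 1 with probability p_on, assuming
    successive values are independent: P(0 -> 1) = (1 - p_on) * p_on."""
    return (1.0 - p_on) * p_on


# Gate output ON-probabilities as labelled in Fig. 11.4.
impl_a = [0.25, 0.25, 0.0625]    # Fig. 11.4(a)
impl_b = [0.25, 0.125, 0.0625]   # Fig. 11.4(b)

total_a = sum(switching_activity(p) for p in impl_a)
total_b = sum(switching_activity(p) for p in impl_b)

print(f"{total_a:.4f} {total_b:.4f}")   # prints 0.4336 0.3555
```

Under this assumption the second total reproduces the 0.3555 quoted in the text, and the first implementation shows the higher total activity, in line with the conclusion that it consumes more power.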
                                  Fig. 11.5 Clock gating of FSM.
                                  [Figure labels: Enable, Clock, State register]
[Figure labels: V_DD, Circuit, Sleep transistor]
going to a low power state takes time. The longer the duration for which we want to
shut down a system, the higher is the time taken during reactivation. Avoiding a power-down
mode will cost unnecessary power, whereas a frequent power-down mode will affect system
performance.
    A naive approach may be to power down a system whenever there is no request. This
will definitely affect performance severely. A more sophisticated method is to use predictive
shutdown. In this approach, the goal is to predict the next arrival of a service request and
wake up the system just before that. Prediction can be made in several different ways as follows.
  1. Fixed times: If the system does not receive any service request during an interval of
     length T_ON, it shuts down for a fixed period of time T_OFF. The choice of T_ON and T_OFF
     may be made experimentally by studying system behaviour.
  2. Analysing system state: In this approach, there is a constant monitoring of the service
     requests. The monitoring is done via a power manager that observes the system
     [Figure labels: Power Manager, power management commands, status, Service Provider,
     Queue, Service Requestor]
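
The fixed-times policy of item 1 above can be explored with a small simulation. The following is a minimal Python sketch under stated assumptions, not a model from the text: requests are served instantly when the system is awake, requests arriving during a shutdown wait until wake-up, T_ON and T_OFF are positive, and the function name fixed_timeout_policy and the sample arrival times are hypothetical.

```python
def fixed_timeout_policy(arrivals, t_on, t_off, horizon):
    """Return (time_asleep, delayed_requests) for the fixed-times policy.

    arrivals: sorted request arrival times. t_on and t_off must be positive.
    """
    asleep = 0.0
    delayed = 0
    idle_start = 0.0   # start of the current idle interval
    i = 0              # index of the next request not yet served
    while idle_start < horizon:
        sleep_at = idle_start + t_on
        if i < len(arrivals) and arrivals[i] <= sleep_at:
            # A request arrived before the timeout: serve it, restart the timer.
            idle_start = max(idle_start, arrivals[i])
            i += 1
            continue
        if sleep_at >= horizon:
            break
        # No request for t_on: shut down for a fixed period t_off.
        wake_at = min(sleep_at + t_off, horizon)
        asleep += wake_at - sleep_at
        # Requests arriving during the shutdown wait until wake-up.
        while i < len(arrivals) and arrivals[i] < wake_at:
            delayed += 1
            i += 1
        idle_start = wake_at
    return asleep, delayed


# Hypothetical trace: five requests over 40 time units, T_ON = 3, T_OFF = 5.
result = fixed_timeout_policy([1, 2, 8, 11, 30], t_on=3, t_off=5, horizon=40)
print(result)   # prints (20.0, 1)
```

Running the same trace with different T_ON and T_OFF values gives a rough feel for the trade-off described in the text: longer shutdowns increase the time spent in the low power state but also the number of requests that have to wait.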
     [Figure labels: Applications, Kernel, Power management, ACPI driver, AML interpreter,
     Device drivers, ACPI, ACPI tables, ACPI registers, ACPI BIOS, Hardware platform]