VLSI - Unit 2 Notes
UNIT-2
                           COMBINATIONAL LOGIC CIRCUITS
Syllabus:
Examples of Combinational Logic Design, Elmore’s constant, Pass transistor Logic,
Transmission gates, static and dynamic CMOS design, Power dissipation – Low power design
principles
Introduction:
    Combinational (or non-regenerative) logic circuits have the property that, at any point in time,
      the output of the circuit is related to its current input signals by some Boolean expression. No intentional
      connection between outputs and inputs is present.
    This is in contrast to another class of circuits, known as sequential or regenerative, for which the output
      is not only a function of the current input data, but also of previous values of the input signals. This is
      accomplished by connecting one or more outputs intentionally back to some inputs. Consequently, the
      circuit “remembers” past events and has a sense of history. A sequential circuit includes a combinational
      logic portion and a module that holds the state. Example circuits are registers, counters, oscillators, and
      memory.
      Any combinational logic can be constructed using NAND and NOR gates. If we change the input values of
       a combinational logic device, there is a short but finite delay before the output changes; the output
       cannot change instantaneously. In a sequential circuit, by contrast, a register changes its value only at a
       clock edge. A combinational logic device changes its output as soon as its input changes, after a small
       input-to-output delay.
                                                                                      EC 6601 - VLSI DESIGN
   Draw the minimum CMOS transistor network that implements the functionality of Boolean equation F=
    (A (B C + D))'
   Draw the minimum CMOS transistor network that implements the functionality of Boolean equation
    F= (A +(B' + CD)')'
   Draw the minimum CMOS transistor network that implements the functionality of Boolean equation
    F= (A' + B'C)
                       F = (A' + B'C) = ((A' + B'C)')'= (A (B'C)')' = (A (B + C'))'
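The De Morgan rewriting above can be checked exhaustively. The following is a minimal sketch in Python; the function names are hypothetical and chosen only for illustration. It compares the original form A' + B'C with the inverting form (A (B + C'))' over all eight input combinations.

```python
from itertools import product

def f_original(a, b, c):
    # F = A' + B'C, the function as originally specified
    return (not a) or ((not b) and c)

def f_inverting(a, b, c):
    # F = (A (B + C'))', the single-stage inverting form
    return not (a and (b or (not c)))

def equivalent(f, g, n_inputs):
    # Compare two Boolean functions on every input combination
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=n_inputs)
    )

print(equivalent(f_original, f_inverting, 3))
```

The inverting form is the one that maps directly onto a complementary gate, with B and C' applied as inputs.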
      Draw the minimum CMOS transistor network that implements the functionality of Boolean equation F=
       (A+D+C)(B+E)
Elmore’s Constant:
    The Elmore delay metric is a widely used model for computing signal delays in both analog and digital circuit
      interconnects.
    Net delay is the difference between the time a signal is first applied to the net and the time it reaches
      other devices connected to that net. It is due to the finite resistance and capacitance of the net. It is also
      known as wire delay or interconnect delay.
    Although its accuracy is limited and it applies only to step-function input signals, the model is
      extremely popular because it yields simple analytical functions that can be easily incorporated
      into design and automation software.
      Elmore Delay Model:
      The Elmore delay model estimates the delay from a source switching to one of the leaf nodes changing
      as the sum over each node i of the capacitance Ci on the node, multiplied by the effective resistance
      𝑅𝑖𝑛 on the shared path from the source to the node and the leaf.
[Figure: RC network with node capacitances C1, C2, C3, ..., CN]
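As an illustration of the sum described above, here is a minimal Python sketch for the special case of an unbranched RC ladder, where the resistance shared between the path to node i and the path to the last node is simply the sum of the series resistances up to node i. The function name and the list-based interface are assumptions made for this sketch, not part of the original notes.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay from the source to the last node of an RC ladder.

    resistances[i] is the series resistance feeding node i and
    capacitances[i] is the capacitance from node i to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance per node capacitance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r          # resistance shared with the leaf path
        delay += upstream_resistance * c  # contribution of this node
    return delay

# Three identical sections with R = 1 and C = 1 (arbitrary units)
print(elmore_delay_ladder([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

For a branched tree, the shared-path resistance would have to be computed per node rather than accumulated in a single running sum.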
Static CMOS
    The most widely used logic style is static complementary CMOS. The static CMOS style is really an
       extension of the static CMOS inverter to multiple inputs.
    The primary advantages of the CMOS structure are robustness (i.e., low sensitivity to noise), good
       performance, and low power consumption with no static power dissipation.
    The complementary CMOS circuit style falls under a broad class of logic circuits called static circuits, in
       which at every point in time (except during the switching transients), each gate output is connected to
       either VDD or VSS via a low-resistance path. Also, the outputs of the gates assume at all times the value
       of the Boolean function implemented by the circuit (ignoring, once again, the transient effects during
       switching periods).
      This is in contrast to the dynamic circuit class, which relies on temporary storage of signal values on the
       capacitance of high-impedance circuit nodes. The latter approach has the advantage that the resulting
       gate is simpler and faster. Its design and operation are however more involved and prone to failure due
       to an increased sensitivity to noise.
Complementary CMOS
Concept
         A static CMOS gate is a combination of two networks, called the pull-up network (PUN) and the
           pull-down network (PDN). The figure shows a generic N input logic gate where all inputs are
           distributed to both the pull-up and pull-down networks.
         The function of the PUN is to provide a connection between the output and VDD anytime the
           output of the logic gate is meant to be 1 (based on the inputs). Similarly, the function of the PDN
           is to connect the output to VSS when the output of the logic gate is meant to be 0.
         The PUN and PDN networks are constructed in a mutually exclusive fashion such that one and
           only one of the networks is conducting in steady state. In this way, once the transients have
           settled, a path always exists between VDD and the output F, realizing a high output (“one”), or,
           alternatively, between VSS and F for a low output (“zero”). This is equivalent to stating that the
           output node is always a low-impedance node in steady state.
In constructing the PDN and PUN networks, the following observations should be kept in mind:
        • A transistor can be thought of as a switch controlled by its gate signal. An NMOS switch is on when
the controlling signal is high and is off when the controlling signal is low. A PMOS transistor acts as an inverse
switch that is on when the controlling signal is low and off when the controlling signal is high.
        • The PDN is constructed using NMOS devices, while PMOS transistors are used in the PUN. The
primary reason for this choice is that NMOS transistors produce “strong zeros,” and PMOS devices generate
“strong ones”.
        • A set of construction rules can be derived to construct logic functions, as shown in the figure.
        NMOS devices connected in series correspond to an AND function. With all the inputs high, the series
combination conducts and the value at one end of the chain is transferred to the other end. Similarly, NMOS
transistors connected in parallel represent an OR function. A conducting path exists between the output and
input terminal if at least one of the inputs is high. Using similar arguments, construction rules for PMOS
networks can be formulated. A series connection of PMOS devices conducts if both inputs are low, representing
a NOR function ($\overline{A} \cdot \overline{B} = \overline{A + B}$), while PMOS transistors in parallel
implement a NAND function ($\overline{A} + \overline{B} = \overline{A \cdot B}$).
        • The complementary gate is naturally inverting, implementing only functions such as NAND, NOR,
and XNOR. The realization of a non-inverting Boolean function (such as AND, OR, or XOR) in a single stage is
not possible, and requires the addition of an extra inverter stage.
        • The number of transistors required to implement an N-input logic gate is 2N.
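The construction rules above can be sketched at switch level. The Python example below is only an illustration, with hypothetical function names, for the function F = (A (B C + D))' used earlier in these notes. The PDN follows the series/parallel rules for NMOS devices, the PUN is the dual network of PMOS switches, and the check confirms that exactly one network conducts for every input combination.

```python
from itertools import product

def pdn_conducts(a, b, c, d):
    # NMOS network: series = AND, parallel = OR.
    # Conducts (pulls F low) when A (B C + D) is true.
    return a and ((b and c) or d)

def pun_conducts(a, b, c, d):
    # PMOS switches close on a low input, and the topology is the dual
    # of the PDN: series and parallel connections are exchanged.
    na, nb, nc, nd = (not a), (not b), (not c), (not d)
    return na or ((nb or nc) and nd)

def networks_are_complementary():
    # Exactly one of the two networks must conduct in steady state
    return all(
        bool(pdn_conducts(*bits)) != bool(pun_conducts(*bits))
        for bits in product([False, True], repeat=4)
    )

print(networks_are_complementary())
```

With four inputs, this gate uses 2N = 8 transistors: four NMOS devices in the PDN and four PMOS devices in the PUN.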
Static CMOS
   1. Designers accustomed to AND and OR functions must learn to think in terms of NAND and NOR to
       take advantage of static CMOS.
   2. In manual circuit design, this is often done through bubble pushing. Compound gates are particularly
       useful to perform complex functions with relatively low logical efforts.
   3. When a particular input is known to be latest, the gate can be optimized to favor that input. Similarly,
       when either the rising or falling edge is known to be more critical, the gate can be optimized to favor
       that edge.
   4. We have focused on building gates with equal rising and falling delays; however, using smaller pMOS
       transistors can reduce power, area, and delay.
   5. In processes with multiple threshold voltages, multiple flavors of gates can be constructed with different
       speed/leakage power trade-offs.
    Bubble Pushing
    CMOS stages are inherently inverting, so AND and OR functions must be built from NAND and NOR
     gates.
    DeMorgan’s law helps with this conversion:
                                 $\overline{A \cdot B} = \overline{A} + \overline{B}$ ; $\overline{A + B} = \overline{A} \cdot \overline{B}$
    These relations are illustrated in the Figure.
    Compound Gates
    Static CMOS also efficiently handles compound gates computing various inverting combinations of
     AND/OR functions in a single stage.
    The function F = AB + CD can be computed with an AND-OR-INVERT-22 (AOI22) gate and an
     inverter, as shown in the Fig.
      In general, the logical effort of compound gates can be different for different inputs. The figure above
       shows how logical efforts can be estimated for the AOI21, AOI22, and a more complex compound AOI gate.
      The transistor widths are chosen to give the same drive as a unit inverter.
   The logical effort of each input is the ratio of the input capacitance of that input to the input capacitance
    of the inverter.
   For the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C) than for the
    two AND terminals (A, B).
   The parasitic delay is crudely estimated from the total diffusion capacitance on the output node by
    summing the sizes of the transistors attached to the output.
 Skewed Gates
 In other cases, one input transition is more important than the other.
 HI-skew gates favor the rising output transition, and LO-skew gates favor the falling output
  transition.
 This favoring is done by decreasing the size of the noncritical transistor.
 The logical efforts for the rising (up) and falling (down) transitions are called gu and gd, respectively,
  and are the ratio of the input capacitance of the skewed gate to the input capacitance of an unskewed
  inverter with equal drive for that transition.
   Figure (a) shows how a HI-skew inverter is constructed by downsizing the nMOS transistor.
   This maintains the same effective resistance for the critical transition while reducing the input
    capacitance relative to the unskewed inverter of Figure (b), thus reducing the logical effort on that
    critical transition to gu=2.5/3 = 5/6.
   Of course, the improvement comes at the expense of the effort on the non-critical transition.
   The logical effort for the falling transition is estimated by comparing the inverter to a smaller unskewed
    inverter with equal pull-down current, shown in Figure (c), giving a logical effort of gd= 2.5/1.5 = 5/3.
   The degree of skewing (e.g., the ratio of effective resistance for the fast transition relative to the slow
    transition) impacts the logical efforts and noise margins; a factor of two is common.
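The two logical efforts quoted for the HI-skew inverter follow directly from the capacitance ratios in the text. A minimal Python sketch is given below; the function name is hypothetical, and the capacitance values 2.5, 3, and 1.5 are the ones stated above.

```python
from fractions import Fraction

def logical_effort(gate_input_cap, reference_inverter_input_cap):
    # Ratio of the gate's input capacitance to that of an unskewed
    # inverter with equal drive for the transition of interest
    return Fraction(gate_input_cap) / Fraction(reference_inverter_input_cap)

hi_skew_input_cap = Fraction(5, 2)  # 2.5 units, from the text

# Rising (critical) transition: compare with the unskewed inverter (3 units)
g_u = logical_effort(hi_skew_input_cap, 3)

# Falling transition: compare with the smaller unskewed inverter (1.5 units)
g_d = logical_effort(hi_skew_input_cap, Fraction(3, 2))

g_avg = (g_u + g_d) / 2
print(g_u, g_d, g_avg)
```

The average of the two efforts is larger than 1, which reflects the cost paid on the non-critical transition.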
   The below figure catalogs HI-skew and LO-skew gates with a skew factor of two. Skewed gates are
    sometimes denoted with an H or an L on their symbol in a schematic.
 Alternating HI-skew and LO-skew gates can be used when only one transition is important. Skewed
  gates work particularly well with dynamic circuits.
 P/N Ratios
 Notice in the above figure, the average logical effort of the LO-skew NOR2 is actually better than that of
  the unskewed gate.
 The pMOS transistors in the unskewed gate are enormous in order to provide equal rise delay. They
  contribute input capacitance for both transitions, while only helping the rising delay.
   By accepting a slower rise delay, the pMOS transistors can be downsized to reduce input capacitance
    and average delay significantly.
   For processes with a mobility ratio of μn/μp = 2, as we have generally been assuming, the best ratios are
    shown below.
   Reducing the pMOS size from 2 to √2 for the inverter gives the theoretically fastest average delay, but this
    delay improvement is only 3%.
   However, this significantly reduces the pMOS transistor area. It also reduces input capacitance, which in
    turn reduces power consumption.
   Excessively slow rising outputs can also cause hot electron degradation, and reducing the pMOS size
    lowers the switching point of the gate, which reduces its noise margin.
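The √2 sizing and the roughly 3% figure can be reproduced with a simple first-order model. The Python sketch below makes several assumptions that are not stated in the notes: an inverter with a unit-width nMOS and a pMOS of width P drives an identical inverter, resistance is inversely proportional to width, the pMOS is μn/μp times more resistive per unit width, and parasitic delay is ignored. The function names are hypothetical.

```python
import math

def average_delay(p_width, mobility_ratio=2.0):
    # Load is the input capacitance of an identical gate (nMOS width 1)
    load_cap = 1.0 + p_width
    fall_resistance = 1.0
    rise_resistance = mobility_ratio / p_width
    return load_cap * (rise_resistance + fall_resistance) / 2.0

def best_p_width(mobility_ratio=2.0):
    # Setting d/dP [(1 + P)(1 + mu/P)] = 0 gives P = sqrt(mu)
    return math.sqrt(mobility_ratio)

def improvement_over_equal_rise_fall(mobility_ratio=2.0):
    equal_delay = average_delay(mobility_ratio, mobility_ratio)
    best_delay = average_delay(best_p_width(mobility_ratio), mobility_ratio)
    return (equal_delay - best_delay) / equal_delay

print(best_p_width(), improvement_over_equal_rise_fall())
```

Under these assumptions the improvement comes out at a few percent, which is consistent with the small gain quoted above; the main benefit of the smaller pMOS is area and input capacitance.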
   Figure (b) shows an example of ratioed logic, which uses a grounded PMOS load and is referred to as a
    pseudo-NMOS gate.
      The clear advantage of pseudo-NMOS is the reduced number of transistors (N+1 versus 2N for
       complementary CMOS). The nominal high output voltage (VOH) for this gate is VDD since the pull-
       down devices are turned off when the output is pulled high (assuming that VOL is below VTn). On the
       other hand, the nominal low output voltage is not 0 V since there is a fight between the devices in the
       PDN and the grounded PMOS load device. This results in reduced noise margins and more importantly
       static power dissipation.
      The sizing of the load device relative to the pull-down devices can be used to trade off parameters such
       as noise margin, propagation delay, and power dissipation. Since the voltage swing on the output and the
       overall functionality of the gate depends upon the ratio between the NMOS and PMOS sizes, the circuit
       is called ratioed. This is in contrast to the ratioless logic styles, such as complementary CMOS, where
       the low and high levels do not depend upon transistor sizes.
      A major disadvantage of the pseudo-NMOS gate is the static power that is dissipated when the output is
       low through the direct current path that exists between VDD and GND.
      The pull-down networks PDN1 and PDN2 use NMOS devices and are mutually exclusive (that is, when
       PDN1 conducts, PDN2 is off, and when PDN1 is off, PDN2 conducts), such that the required logic
       function and its inverse are simultaneously implemented.
      Assume now that, for a given set of inputs, PDN1 conducts while PDN2 does not, and that Out and
       $\overline{Out}$ are initially high and low, respectively. Turning on PDN1 causes Out to be pulled down,
       although there is still a fight between M1 and PDN1. $\overline{Out}$ is in a high-impedance state, as M2
       and PDN2 are both turned off. PDN1 must be strong enough to bring Out below VDD-|VTp|, the point at
       which M2 turns on and starts charging $\overline{Out}$ to VDD, eventually turning off M1. This in turn
       enables Out to discharge all the way to GND.
      The below figure shows an example of an XOR/XNOR gate.
   Notice that it is possible to share transistors among the two pull-down networks, which reduces the
   implementation overhead.
    The resulting circuit exhibits a rail-to-rail swing, and the static power dissipation is eliminated: in steady
      state, none of the stacked pull-down networks and load devices are simultaneously conducting.
    However, the circuit is still ratioed since the sizing of the PMOS devices relative to the pull-down
      devices is critical to functionality, not just performance.
    In addition to the problem of increased design complexity, this circuit style still has a power-dissipation
      problem that is due to cross-over currents. During the transition, there is a period of time
      when the PMOS load and the PDN are turned on simultaneously, producing a short-circuit path.
Pass-Transistor Logic
Pass-Transistor Basics
    A popular and widely used alternative to complementary CMOS is pass-transistor logic, which attempts
      to reduce the number of transistors required to implement logic by allowing the primary inputs to drive
      gate terminals as well as source/drain terminals.
    This is in contrast to the logic families that we have studied so far, which only allow primary inputs to drive
      the gate terminals of MOSFETs.
    The figure shows an implementation of the AND function constructed that way, using only NMOS
      transistors.
      In this gate, if the B input is high, the top transistor is turned on and copies the input A to the output F.
      When B is low, the bottom pass transistor is turned on and passes a 0. The switch driven by $\overline{B}$
       seems to be redundant at first glance. Its presence is essential to ensure that the gate is static, that is,
       that a low-impedance path exists to the supply rails under all circumstances, in particular when B is low.
      The advantage of this approach is that fewer transistors are required to implement a given function. For
       example, the implementation of the AND gate in the above figure requires 4 transistors (including the
       inverter required to invert B), while a complementary CMOS implementation would require 6
       transistors.
      The reduced number of devices has the additional advantage of lower capacitance.
      As we know, an NMOS device is effective at passing a 0 but is poor at pulling a node to VDD. When the
       pass transistor pulls a node high, the output only charges up to VDD -VTn. In fact, the situation is
       worsened by the fact that the devices experience body effect, as there exists a significant source-to-body
       voltage when pulling high.
      Consider the case when the pass transistor is charging up a node with the gate and drain terminals set at
       VDD. Let the source of the NMOS pass transistor be labeled x. Node x will charge up to VDD-VTn(Vx):
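Because the threshold voltage itself depends on the node voltage through the body effect, the final level has to be found by solving Vx = VDD - VTn(Vx). The Python sketch below assumes the standard body-effect expression VTn = VT0 + γ(√(2φF + VSB) - √(2φF)) with the body at ground, so that VSB = Vx. The symbols VT0, γ, and 2φF, the function names, and the numeric values used in the example are illustrative assumptions, not values taken from these notes.

```python
import math

def threshold_with_body_effect(v_sb, vt0, gamma, two_phi_f):
    # Standard body-effect model (assumed here): VT rises with source-to-body voltage
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

def pass_high_level(vdd, vt0, gamma, two_phi_f, tol=1e-9, max_iter=200):
    """Solve Vx = VDD - VTn(Vx) by fixed-point iteration."""
    vx = vdd - vt0  # starting guess that ignores the body effect
    for _ in range(max_iter):
        vx_next = vdd - threshold_with_body_effect(vx, vt0, gamma, two_phi_f)
        if abs(vx_next - vx) < tol:
            return vx_next
        vx = vx_next
    return vx

# Illustrative (assumed) process values: VDD = 2.5 V, VT0 = 0.43 V,
# gamma = 0.4 V^0.5, 2*phi_F = 0.6 V
vx = pass_high_level(2.5, 0.43, 0.4, 0.6)
print(vx)
```

The solved level lies below VDD - VT0, which shows how the body effect makes the degraded high level worse than a fixed threshold drop would suggest.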
      Pass transistors require lower switching energy to charge up a node due to the reduced voltage swing.
       For the pass-transistor circuit in the above figure, assume that the drain voltage is at VDD and the gate
       voltage transitions to VDD. The output node charges from 0 V to VDD-VTn (assuming that node x was
       initially at 0 V), and the energy drawn from the
       power supply for charging the output of a pass transistor is given by:
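The expression can be reconstructed from the description above by integrating the supply power over the transition. The derivation below is a sketch under the stated conditions; the symbol C_L for the load capacitance on the output node is an assumption introduced here.

```latex
E_{0 \rightarrow 1}
  = \int_{0}^{T} P(t)\,dt
  = V_{DD} \int_{0}^{T} i_{supply}(t)\,dt
  = V_{DD} \int_{0}^{V_{DD}-V_{Tn}} C_L \, dV_{out}
  = C_L \, V_{DD} \, (V_{DD} - V_{Tn})
```

Compared with the C_L V_DD^2 drawn for a full-swing transition, the energy is reduced in proportion to the reduced swing.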
       While the circuit exhibits lower switching power, it may consume static power when the output is
        high, because the reduced voltage level may be insufficient to turn off the PMOS transistor of the
        subsequent CMOS inverter.
Differential Pass Transistor Logic
        For high performance design, a differential pass-transistor logic family, called CPL or DPL, is
commonly used. The basic idea (similar to DCVSL) is to accept true and complementary inputs and produce
true and complementary outputs.
A number of CPL gates (AND/NAND, OR/NOR, and XOR/NXOR) are shown in the figure.
       CPL belongs to the class of static gates, because the output-defining nodes are always connected to
        either VDD or GND through a low resistance path. This is advantageous for the noise resilience.
     The design is very modular. In effect, all gates use exactly the same topology. Only the inputs are
        permuted. This makes the design of a library of gates very simple. More complex gates can be built by
        cascading the standard pass-transistor modules.
Robust and Efficient Pass-Transistor Design
        Unfortunately, differential pass-transistor logic, like single-ended pass-transistor logic, suffers from
static power dissipation and reduced noise margins, since the high input to the signal-restoring inverter only
charges up to VDD-VTn. Solutions proposed to deal with this problem are outlined below:
       Solution 1: Level Restoration.
           A common solution to the voltage-drop problem is the use of a level restorer, which is a single
              PMOS device configured in a feedback path. The gate of the PMOS device is connected to the
              output of the inverter, its drain is connected to the input of the inverter, and its source to VDD.
           Assume that node X is at 0 V (Out is at VDD and Mr is turned off) with B = VDD and A = 0. If
              input A makes a 0 to VDD transition, Mn only charges up node X to VDD-VTn.
           This is enough to switch the output of the inverter low, turning on the feedback device Mr and
              pulling node X all the way to VDD.
           This eliminates any static power dissipation in the inverter. Furthermore, no static current path
              can exist through the level restorer and the pass-transistor, since the restorer is only active when
              A is high.
           An advantage of this circuit is that all voltage levels are either at GND or VDD, and no static power
              is consumed.
           While this solution is appealing in terms of eliminating static power dissipation, it adds
              complexity since the circuit is now ratioed.
           The problem arises during the transition of node X from high to low. The pass-transistor network
              attempts to pull down node X while the level restorer pulls node X to VDD. Therefore, the
              pull-down device must be stronger than the pull-up device in order to switch node X and the output.
           Assume the notation R1 to denote the equivalent on-resistance of transistor M1, R2 for M2, and
              so on.
           When Rr is made too small, it is impossible to bring the voltage at node X below the switching
              threshold of the inverter. Hence, the inverter output never switches to VDD, and the level-
              restoring transistor stays on.
           This sizing problem can be reformulated as follows: the resistance of Mn and Mr must be such
              that the voltage at node X drops below the threshold of the inverter, VM = f(R1, R2). This
              condition is sufficient to guarantee a switching of the output voltage Vout to VDD and a turning
              off of the level-restoring transistor.
       Solution 2: Multiple-Threshold Transistors.
           A technology solution to the voltage-drop problem associated with pass-transistor logic is the use
              of multiple-threshold devices.
           Using zero threshold devices for the NMOS pass-transistors eliminates most of the threshold
              drop, and passes a signal close to VDD.
              Notice that even if the device threshold were implanted to be exactly equal to zero, the body
               effect of the device prevents a swing to VDD.
              All devices other than the pass transistors (i.e., the inverters) are implemented using standard
               high-threshold devices.
      The use of zero-threshold transistors can be dangerous due to the subthreshold currents that can flow
       through the pass-transistors, even if VGS is slightly below VT.
      This is demonstrated in the above figure, which points out a potential sneak dc-current path. While these
       leakage paths are not critical when the device is switching constantly, they do pose a significant energy
       overhead when the circuit is in the idle state.
         The control signals to the transmission gate (C and $\overline{C}$) are complementary. The transmission
          gate acts as a bidirectional switch controlled by the gate signal C.
        When C = 1, both MOSFETs are on, allowing the signal to pass through the gate. In short,
                                                        A=B, if C=1
       On the other hand, C = 0 places both transistors in cutoff, creating an open circuit between nodes A and
       B.
 Consider the case of charging node B to VDD for the transmission gate circuit in Figure (a) below.
      Node A is driven to VDD and the transmission gate is enabled (C = 1 and $\overline{C}$ = 0). If only the NMOS
       pass device were present, node B would charge up to VDD-VTn, at which point the NMOS device turns off.
      However, since the PMOS device is present and turned on (VGSp = -VDD), charging continues all the
       way up to VDD.
      Figure (b) shows the opposite case, that is, discharging node B to 0. B is initially at VDD when node A is
       driven low.
      The PMOS transistor by itself can only pull node B down to |VTp|, at which point it turns off. The parallel
       NMOS device, however, stays turned on (since its VGS = VDD) and pulls node B all the way to GND.
      Though the transmission gate requires two transistors and more control signals, it enables rail-to-rail
       swing.
      Transmission gates can be used to build some complex gates very efficiently. Figure (a) shows an
       example of a simple inverting two-input multiplexer. This gate either selects input A or B based on the
       value of the control signal S, which is equivalent to implementing the following Boolean function:
                                                     (a)
      Another example of the effective use of transmission gates is the popular XOR circuit shown in
       Figure (b).
(b)
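The behavior of these two transmission-gate circuits can be sketched at switch level in Python. Because the figures are not reproduced here, the sketch assumes that the multiplexer passes A when S = 1 and B when S = 0 before the output inverter, giving F = (A·S + B·S')', and that the XOR passes A through a transmission gate when B = 0 and passes the inverse of A when B = 1. All function names are hypothetical.

```python
from itertools import product

def transmission_gate(signal, control):
    # Passes the signal when enabled; None models a high-impedance output
    return signal if control else None

def inverting_mux(a, b, s):
    # Two transmission gates with complementary controls share one node,
    # so exactly one of them drives it; an inverter restores the level.
    passed_a = transmission_gate(a, s)
    passed_b = transmission_gate(b, not s)
    node = passed_a if passed_a is not None else passed_b
    return not node

def tg_xor(a, b):
    # B = 0: the transmission gate is on and passes A unchanged.
    # B = 1: the gate is off and the inverting branch supplies A'.
    passed = transmission_gate(a, not b)
    return passed if passed is not None else (not a)

for a, b, s in product([False, True], repeat=3):
    print(a, b, s, inverting_mux(a, b, s), tg_xor(a, b))
```

The None value is only a modeling device for a switch that is off; in the real circuit the shared node is always driven by one of the two paths.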
      An alternate logic style called dynamic logic is presented that obtains a similar result, while avoiding
       static power consumption. With the addition of a clock input, it uses a sequence of precharge and
       conditional evaluation phases.
      During the precharge phase (CLK=0), the output is precharged to VDD regardless of the input values
       since the evaluation device is turned off.
      During evaluation (CLK=1), a conducting path is created between Out and GND if (and only if) A·B+C
       is TRUE. Otherwise, the output remains at the precharged state of VDD.
       The following function is thus realized:
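From the description of the two phases, the function realized can be written as Out = CLK' + (A·B + C)'·CLK. This expression is reconstructed from the text rather than copied from it. The Python sketch below models the same behavior with a stored output level, since the output node keeps its charge when no pull-down path conducts; the function names are hypothetical.

```python
from itertools import product

def dynamic_gate_step(clk, a, b, c, stored_out):
    """Return the logic level on Out after one step of the dynamic gate."""
    if not clk:
        return True          # precharge: Out is pulled to VDD
    if (a and b) or c:
        return False         # evaluate: the PDN conducts and discharges Out
    return stored_out        # evaluate: no path to GND, Out holds its charge

def evaluate_cycle(a, b, c):
    # One full clock cycle: precharge (CLK = 0) followed by evaluation (CLK = 1)
    out = dynamic_gate_step(False, a, b, c, stored_out=False)
    return dynamic_gate_step(True, a, b, c, stored_out=out)

for a, b, c in product([False, True], repeat=3):
    print(a, b, c, evaluate_cycle(a, b, c))
```

The stored level also shows why inputs may make at most one transition during evaluation: once Out has been discharged, it cannot return high until the next precharge phase.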
       A number of important properties can be derived for the dynamic logic gate:
                 The logic function is implemented by the NMOS pull-down network. The construction of
                    the PDN proceeds just as it does for static CMOS.
                 The number of transistors (for complex gates) is substantially lower than in the static
                    case: N + 2 versus 2N.
                 It is non-ratioed. The sizing of the PMOS precharge device is not important for realizing
                    proper functionality of the gate. The size of the precharge device can be made large to
                    improve the low-to-high transition time. There is, however, a trade-off with power
                    dissipation, since a larger precharge device directly increases clock-power dissipation.
                 It only consumes dynamic power. Ideally, no static current path ever exists between VDD
                    and GND. The overall power dissipation, however, can be significantly higher compared
                    to a static logic gate.
                 The logic gates have faster switching speeds. There are two main reasons for this. The
                    first reason is due to the reduced load capacitance attributed to the lower number of
                    transistors per gate and the single-transistor load per fan-in. Second, the dynamic gate
                    does not have short circuit current, and all the current provided by the pull-down devices
                    goes towards discharging the load capacitance.
              Sources 1 and 2 are the reverse-biased diode and the subthreshold leakage of the NMOS pull-down
               device M1, respectively.
         The charge stored on CL will slowly leak away due to these leakage sources, assuming that the
          input is at zero during evaluation.
         Charge leakage causes a degradation in the high level (Figure b). Dynamic circuits therefore
          require a minimal clock rate, which is typically on the order of a few kHz. This makes the usage
          of dynamic techniques unattractive for low performance products such as watches, or processors
          that use conditional clocks (where there are no guarantees on minimum clock rates).
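A rough feel for the minimum clock rate can be obtained from t = CL·ΔV / Ileak, the time a constant leakage current needs to remove an allowed voltage droop from the output node. The Python sketch below is a back-of-the-envelope estimate only. The constant-current model, the 50% duty cycle, the function names, and all numeric values are assumptions chosen for illustration and are not taken from these notes.

```python
def hold_time(load_cap, allowed_droop, leakage_current):
    # Time for a constant leakage current to discharge the node by allowed_droop
    return load_cap * allowed_droop / leakage_current

def min_clock_frequency(load_cap, allowed_droop, leakage_current):
    # Assumes a 50% duty cycle, so the evaluation phase lasts half a period
    return 1.0 / (2.0 * hold_time(load_cap, allowed_droop, leakage_current))

# Assumed example values: CL = 10 fF, 0.5 V allowed droop, 10 pA total leakage
t_hold = hold_time(10e-15, 0.5, 10e-12)
f_min = min_clock_frequency(10e-15, 0.5, 10e-12)
print(t_hold, f_min)
```

With these assumed numbers the estimate lands in the kHz range; real leakage depends strongly on temperature and process, so the result should be read only as an order of magnitude.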
         Note that the PMOS precharge device also contributes some leakage current.
         Leakage is caused by the high impedance state of the output node during the evaluate mode,
          when the pull down path is turned off.
         The leakage problem can be counteracted by reducing the output impedance on the output node
          during evaluation. This is often done by adding a bleeder transistor, as shown in Figure (a).
         The only function of the bleeder—a pseudo-NMOS-like pull-up device—is to compensate for
          the charge lost due to the pull-down leakage paths.
         To avoid the ratio problems associated with this style of circuit and the associated static power
          consumption, the bleeder resistance is made high, or, in other words, the device is kept small.
         This allows the (strong) pull-down devices to lower the Out node substantially below the
          switching threshold of the inverter.
         Often, the bleeder is implemented in a feedback configuration to eliminate the static power
          dissipation (Figure b).
 Charge Sharing
     Another important concern in dynamic logic is the impact of charge sharing.
     Consider the circuit shown below.
              The influence on the output voltage is readily calculated. Under the above assumptions, the
               following initial conditions are valid: Vout(t = 0) = VDD and VX(t = 0) = 0. Two possible
               scenarios must be considered
        Which of the above scenarios is valid is determined by the capacitance ratio. The boundary condition
between the two cases can be determined by setting          equal to VTn in Eq (6.38), yielding
        Overall, it is desirable to keep the value of       below |VTp|. The output of the dynamic gate might be
connected to a static inverter, in which case the low level of Vout would cause static power consumption. One
major concern is circuit malfunction if the output voltage is brought below the switching threshold of the gate it
drives.
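The two charge-sharing scenarios can be sketched numerically. The sketch below assumes the usual charge-conservation model: an internal node capacitance `ca`, initially at 0 V, is connected to an output capacitance `cl` precharged to VDD through an NMOS device with threshold `vtn`. The names `ca` and `cl`, the example values, and the neglect of body effect on `vtn` are assumptions for illustration and are not taken from the figure.

```python
def charge_sharing_delta_v(vdd, vtn, ca, cl):
    """Return the change in Vout after charge sharing (negative means a drop)."""
    # Capacitance ratio at which the voltage drop exactly equals VTn.
    boundary = vtn / (vdd - vtn)
    if ca / cl <= boundary:
        # Small internal capacitance: node X only rises to VDD - VTn,
        # then the NMOS device turns off.
        return -(ca / cl) * (vdd - vtn)
    # Large internal capacitance: Vout and VX settle at the same voltage.
    return -vdd * ca / (ca + cl)


# Hypothetical values: VDD = 2.5 V, VTn = 0.5 V, CL = 50 fF.
small_drop = charge_sharing_delta_v(2.5, 0.5, 5e-15, 50e-15)
large_drop = charge_sharing_delta_v(2.5, 0.5, 15e-15, 50e-15)
print(small_drop, large_drop)
```

With these assumed values the boundary ratio is 0.25, so the 5 fF case falls in the first scenario and the 15 fF case in the second.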
     The most common and effective approach to deal with charge redistribution is to also precharge
        critical internal nodes, as shown in the figure below.
     Since the internal nodes are charged to VDD during precharge, charge sharing does not occur. This
        solution obviously comes at the cost of increased area and capacitance.
       Capacitive Coupling
       The high impedance of the output node makes the circuit very sensitive to crosstalk effects.
       A wire routed over a dynamic node may couple capacitively and destroy the state of the floating node.
       Another equally important form of capacitive coupling is the backgate (or output-to-input) coupling.
       Consider the circuit shown in the figure in which a dynamic two-input NAND gate drives a static
        NAND gate.
      A transition in the input In of the static gate may cause the output of the gate (Out2) to go low.
      This output transition couples capacitively to the other input of the gate, the dynamic node Out1,
       through the gate-source and gate-drain capacitances of transistor M4.
      A simulation of this effect is shown in the below figure, and demonstrates that the output of the dynamic
       gate can drop significantly.
      This further causes the output of the static NAND gate not to drop all the way down to 0V, and a small
       amount of static power is dissipated.
       If the voltage drop is large enough, the circuit can evaluate incorrectly, and the NAND output may not
       go low. When designing and laying out dynamic circuits, special care is needed to minimize capacitive
       coupling.
    Clock-Feedthrough
    A special case of capacitive coupling is clock-feedthrough, an effect caused by the capacitive coupling
     between the clock input of the precharge device and the dynamic output node.
    The coupling capacitance consists of the gate-to-drain capacitance of the precharge device, and includes
     both the overlap and the channel capacitances. This capacitive coupling causes the output of the
     dynamic node to rise above VDD on the low-to-high transition of the clock, assuming that the pull-down
     network is turned off.
    Subsequently, the fast rising and falling edges of the clock couple onto the signal node, as is quite
     apparent in the simulation shown above.
    The danger of clock feedthrough is that it may cause the (normally reverse-biased) junction diodes of the
     precharge transistor to become forward-biased.
    This causes electron injection into the substrate, which can be collected by a nearby high impedance
     node in the 1 state, eventually resulting in faulty operation.
    CMOS latchup might be another result of this injection. For all purposes, high-speed dynamic circuits
     should be carefully simulated to ensure that clock-feedthrough effects stay within bounds.
    Cascading Dynamic Gates
      Consider two cascaded n-type dynamic inverters. During the precharge phase (i.e., CLK = 0), the
      outputs of both inverters are precharged to VDD.
      Assume that the primary input In makes a 0 → 1 transition (Figure b).
     On the rising edge of the clock, output Out1 starts to discharge. The second output should remain in the
      precharged state of VDD as its expected value is 1 (Out1 transitions to 0 during evaluation).
    However, there is a finite propagation delay for the input to discharge Out1 to GND. Therefore, the
      second output also starts to discharge.
    As long as Out1 exceeds the switching threshold of the second gate, which approximately equals VTn, a
      conducting path exists between Out2 and GND, and precious charge is lost at Out2.
    The conducting path is only disabled once Out1 reaches VTn, and turns off the NMOS pull-down
      transistor. This leaves Out2 at an intermediate voltage level.
    The correct level will not be recovered, as dynamic gates rely on capacitive storage in contrast to static
      gates, which have dc restoration. The charge loss leads to reduced noise margins and potential
      malfunctioning.
    The cascading problem arises because the outputs of each gate, and hence the inputs to the next stages,
      are precharged to 1.
    Correct operation is guaranteed as long as the inputs can only make a single 0 → 1 transition
      during the evaluation period.
    Transistors are only turned on when needed, and at most once per cycle. A number of design styles
      complying with this rule have been conceived.
Domino Logic
Concept:
    A Domino logic module consists of an n-type dynamic logic block followed by a static inverter (Figure
      shown below).
      During precharge, the output of the n type dynamic gate is charged up to VDD, and the output of the
       inverter is set to 0.
      During evaluation, the dynamic gate conditionally discharges, and the output of the inverter makes a
       conditional 0 → 1 transition.
      If one assumes that all the inputs of a Domino gate are outputs of other Domino gates, then it is
       ensured that all inputs are set to 0 at the end of the precharge phase, and that the only transitions during
       evaluation are 0 → 1 transitions.
      The formulated rule is hence obeyed.
      The introduction of the static inverter has the additional advantage that the fan-out of the gate is driven
       by a static inverter with a low impedance output, which increases noise immunity.
      The buffer furthermore reduces the capacitance of the dynamic output node by separating internal and
       load capacitances.
      Consider now the operation of a chain of Domino gates.
       During precharge, all inputs are set to 0. During evaluation, the output of the first Domino block either
       stays at 0 or makes a 0 → 1 transition, affecting the second gate.
      This effect might ripple through the whole chain, one after the other, similar to a line of falling
       dominoes—hence the name.
             Very high speeds can be achieved: only a rising edge delay exists, while
              tpHL equals zero. The inverter can be sized to match the fan-out, which is already much smaller
              than in the complementary static CMOS case, as only a single gate capacitance has to be
              accounted for per fan-out gate.
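The rippling behavior of a Domino chain can be illustrated with a purely logical model. The sketch below assumes each stage is a two-input Domino AND gate fed by the previous stage's output and one side input. The function name `domino_chain` and this choice of gate are illustrative assumptions, and timing is not modeled.

```python
def domino_chain(primary, side_inputs):
    """Return the list of output vectors as evaluation ripples down the chain."""
    # Precharge: every Domino output (after its static inverter) is 0.
    outputs = [0] * len(side_inputs)
    trace = [list(outputs)]
    previous = primary
    for i, side in enumerate(side_inputs):
        # Evaluation: the dynamic node discharges only if both inputs are 1,
        # so the inverter output can only stay at 0 or rise to 1.
        outputs[i] = previous & side
        trace.append(list(outputs))
        previous = outputs[i]
    return trace


for step in domino_chain(1, [1, 1, 1]):
    print(step)
```

In this model each output either stays at 0 or makes a single 0 → 1 transition, which is the property that lets Domino gates be cascaded safely.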
np-CMOS
             The Domino logic presented in the previous section has the disadvantage that each dynamic gate
              requires an extra static inverter in the critical path to make the circuit functional. np-CMOS
              provides an alternate approach to cascading dynamic logic by using two flavors (n-tree and
              p-tree) of dynamic logic.
             In a p-tree logic gate, PMOS devices are used to build a pull-up logic network, including a
              PMOS evaluation transistor (Figure shown below).
              The NMOS predischarge transistor drives the output low during precharge. The output
              conditionally makes a 0       1 transition during evaluation depending on its inputs.
             np-CMOS logic exploits the duality between n-tree and p-tree logic gates to eliminate the
              cascading problem.
             If the n-tree gates are controlled by CLK, and p-tree gates are controlled using the inverted
              clock CLK', n-tree gates can directly drive p-tree gates, and vice-versa.
             Similar to Domino, n-tree outputs must go through an inverter when connecting to another n-tree
              gate.
             During the precharge phase (CLK = 0), the output of the n-tree gate, Out1, is charged to VDD,
              while the output of the p-tree gate is predischarged to 0.
Power Dissipation:
Sources of Power Dissipation:
Power dissipation in CMOS circuits comes from two components:
    Dynamic dissipation due to
           o Charging and discharging load capacitances as gates switch
           o “Short-circuit” current while both pMOS and nMOS stacks are partially ON
    Static dissipation due to
           o Subthreshold leakage through OFF transistors
           o Gate leakage through gate dielectric
           o Junction leakage from source/drain diffusions
           o Contention current in ratioed circuits
               Putting this together gives the total power of a circuit
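The components listed above can be combined as follows. This is a sketch in commonly used notation; the symbol names (for example $I_{sub}$, $I_{gate}$, $I_{junct}$ and $I_{contention}$) are assumptions, since the original equation is not reproduced in these notes.

```latex
\begin{align}
P_{dynamic} &= P_{switching} + P_{shortcircuit} \\
P_{static}  &= \left( I_{sub} + I_{gate} + I_{junct} + I_{contention} \right) V_{DD} \\
P_{total}   &= P_{dynamic} + P_{static}
\end{align}
```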
Dynamic Power
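    Dynamic power consists mostly of the switching power, $P_{switching} = \alpha C V_{DD}^2 f$, where α is
    the activity factor, C is the switched capacitance, VDD is the supply voltage, and f is the clock frequency.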
      The supply voltage VDD and frequency f are readily known by the designer. To estimate this power, one
       can consider each node of the circuit. The capacitance of the node is the sum of the gate, diffusion, and
       wire capacitances on the node.
      The effective capacitance of the node is its true capacitance multiplied by the activity factor. The
       switching power depends on the sum of the effective capacitances of all the nodes.
      Activity factors can be heavily dependent on the particular task being executed. For example, a
       processor in a cell phone will use more power while running video games than while displaying a
       calendar.
       Low power design involves considering and reducing each of the terms in switching power.
As VDD is a quadratic term, it is good to select the minimum VDD that can support the required frequency of
operation. Likewise, we choose the lowest frequency of operation that achieves the desired end performance.
The activity factor is mainly reduced by putting unused blocks to sleep. Finally, the circuit may be optimized to
reduce the overall load capacitance of each section.
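As a worked example of the estimate described above, the sketch below sums the effective capacitances of a few nodes and evaluates the switching power in the usual form P = α C VDD² f. The node list and all numeric values are hypothetical.

```python
def switching_power(alpha, cap_farads, vdd, freq_hz):
    """Switching power in watts: alpha * C * VDD^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


# Hypothetical nodes as (activity factor, capacitance in farads):
# a clock net with activity factor 1 and a block of logic with activity 0.1.
nodes = [(1.0, 20e-12), (0.1, 150e-12)]

# Effective capacitance = true capacitance times activity factor, summed over nodes.
effective_cap = sum(alpha * cap for alpha, cap in nodes)

# The activity factor is already folded into effective_cap, so pass alpha = 1.
power = switching_power(1.0, effective_cap, vdd=1.0, freq_hz=1e9)
print(effective_cap, power)
```

Under these assumed numbers the clock net contributes more effective capacitance than the much larger logic block, which is why clock power receives so much attention.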
    Activity Factor
         The activity factor is a powerful and easy-to-use lever for reducing power. If a circuit can be
            turned off entirely, the activity factor and dynamic power go to zero.
         Blocks are typically turned off by stopping the clock; this is called clock gating. When a block is
            on, the activity factor is 1 for clocks and substantially lower for nodes in logic circuits.
         The activity factor of a logic gate can be estimated by calculating the switching probability.
         Glitches can increase the activity factor.
    Clock Gating
          Clock gating ANDs a clock signal with an enable to turn off the clock to idle blocks.
          It is highly effective because the clock has such a high activity factor, and because gating the
            clock to the input registers of a block prevents the registers from switching and thus stops all the
            activity in the downstream combinational logic.
          Clock gating can be employed on any enabled register. The logic to compute the enable signal is
            easy; for example: a floating-point unit can be turned off when no floating-point instructions are
            being issued. Often, however, clock gating signals are some of the most critical paths of the
            chip.
          The clock enable must be stable while the clock is active (i.e., 1 for systems using positive edge-
            triggered flip-flops).
          Figure below shows how an enable latch can be used to ensure the enable does not change
            before the clock falls.
          When a large block of logic is turned off, the clock can be gated early in the clock distribution
           network, turning off not only the registers but also a portion of the global network.
          The clock network has an activity factor of 1 and a high capacitance, so this saves significant
           power.
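The enable-latch arrangement can be sketched as a small behavioral model. It assumes the common scheme in which the latch is transparent while the clock is low and holds its value while the clock is high, so the gated clock is the AND of the clock and the latched enable. The function name and the sample sequence are illustrative assumptions.

```python
def gated_clock(samples):
    """samples: iterable of (clk, enable) pairs; returns the gated clock values."""
    latched_enable = 0
    gated = []
    for clk, enable in samples:
        if clk == 0:
            # Latch is transparent while the clock is low.
            latched_enable = enable
        # While the clock is high the latch holds, so enable changes are ignored.
        gated.append(clk & latched_enable)
    return gated


# The enable drops, then rises again, while the clock is high in this sequence.
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 0), (1, 0)]
result = gated_clock(samples)
print(result)
```

In this model a change on the enable while the clock is high neither truncates the current pulse nor creates a partial one.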
 Switching Probability
      The activity factor of a node is the probability that it switches from 0 to 1.
      This probability depends on the logic function. By analyzing the probability that each node is 1,
        we can estimate the activity factors.
      Define Pi to be the probability that node i is 1.
      $\bar{P}_i = 1 - P_i$ is the probability that node i is 0.
      $\alpha_i$, the activity factor of node i, is the probability that the node is 0 on one cycle and 1 on the
        next. If the probability is uncorrelated from cycle to cycle, it is given as
                                                       $\alpha_i = \bar{P}_i P_i$
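The estimate can be sketched for a few two-input gates. The code below assumes the gate inputs are statistically independent and uncorrelated from cycle to cycle; the helper names are illustrative.

```python
def activity_factor(p_one):
    """Probability that a node is 0 on one cycle and 1 on the next."""
    return (1.0 - p_one) * p_one


def p_and(pa, pb):
    """Probability that a 2-input AND output is 1 (independent inputs)."""
    return pa * pb


def p_or(pa, pb):
    """Probability that a 2-input OR output is 1 (independent inputs)."""
    return 1.0 - (1.0 - pa) * (1.0 - pb)


def p_nand(pa, pb):
    """Probability that a 2-input NAND output is 1 (independent inputs)."""
    return 1.0 - pa * pb


# Random inputs: each input is 1 with probability 0.5.
for name, gate in (("AND", p_and), ("OR", p_or), ("NAND", p_nand)):
    p_out = gate(0.5, 0.5)
    print(name, p_out, activity_factor(p_out))
```

A completely random node (P = 0.5) has the largest activity factor under this model, 0.25. Gate outputs whose probability is skewed toward 0 or 1 switch less often.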
 Glitches
       The switching probabilities computed in the previous section are only valid if the gates have zero
         propagation delay.
       In reality, gates sometimes make spurious transitions called glitches when inputs do not arrive
         simultaneously.
       For example, in Figure shown below, suppose ABCD changes from 1101 to 0111. Node n4 was
         1 and falls to 0.
 However, nodes n5, n6, n7, and Z may glitch before n4 changes, as shown below.
          The glitches cause extra power dissipation. Chains of gates are particularly prone to this
           problem.
          Glitching can raise the activity factor of a gate above 1 and can account for the majority of power
           in certain circuits such as ripple carry adders and array multipliers. Glitching power can be
           accurately assessed through simulations accounting for timing.
 Capacitance
      Switching capacitance comes from the wires and transistors in a circuit.
      Wire capacitance is minimized through good floor planning and placement (the locality aspect of
        structured design).
      Units that exchange a large amount of data should be placed close to each other to reduce wire
        lengths.
      Device-switching capacitance is reduced by choosing fewer stages of logic and smaller
        transistors.
      Minimum-sized gates can be used on non-critical paths.
        Using a larger stage effort increases delay only slightly and greatly reduces transistor sizes.
         Therefore, gates that are large or have a high activity factor and thus dominate the power can be
         downsized with only a small performance impact.
      Similarly, registers should use small clocked transistors because their activity factor is an order
         of magnitude greater than that of transistors in combinational logic.
      The most energy-efficient way to drive long wires is with inverters or buffers rather than with
         more complex gates that have higher logical efforts.
      In general, large energy savings can be made by relaxing a circuit a small amount from the
         minimum delay point. Unfortunately, there are no closed-form methods to determine gate sizes
         that minimize energy under a delay constraint, even for circuits as simple as an inverter chain.
 Gate Sizing Under a Delay Constraint
      In many cases, we are willing to increase delay to save energy.
      First, consider a model to compute the energy of a circuit.
      If a unit inverter has gate capacitance 3C, then a gate with logical effort g, parasitic delay p, and
         drive x has gx times as much gate capacitance and px times as much diffusion capacitance.
      The switching energy of each gate depends on its activity factor, the diffusion capacitance of the
         gate, the wire capacitance 𝐶𝑤𝑖𝑟𝑒 , and the gate capacitance of all the stages it drives.
      The energy of the entire circuit is the sum of the energies of each gate.
          If wire capacitance is expressed in multiples of the capacitance of a unit inverter as c = Cwire/3C
           and we normalize energy for the capacitance and voltage of the process, the energy
           becomes the sum of the effective capacitances of the nodes:
           $E = \sum_i \alpha_i \left( c_i + p_i x_i + \sum_{j \in fanout(i)} g_j x_j \right)$
       Now, we seek to minimize E such that the worst-case arrival time is less than some delay D. The
        problem is still a posynomial and has a unique solution that can be found quickly by a good
        optimizer.
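The energy model described above can be sketched as a short function. It is one reading of the prose: each gate contributes its activity factor times the sum of its output wire capacitance, its own diffusion capacitance (p·x), and the gate capacitance (g·x) of every stage it drives, all in units of a unit inverter's capacitance. The dictionary layout and example sizes are hypothetical, and the external load on the last stage is simply lumped into its wire capacitance.

```python
def normalized_energy(gates):
    """Sum alpha_i * (c_i + p_i*x_i + sum of g_j*x_j over the gates driven by i).

    `gates` maps a gate name to a dict with keys:
      alpha  - activity factor of the gate's output node
      g, p   - logical effort and parasitic delay
      x      - drive (size relative to a unit inverter)
      c      - wire capacitance on the output, in unit-inverter capacitances
      fanout - names of the gates driven by this gate
    """
    total = 0.0
    for gate in gates.values():
        driven = sum(gates[name]["g"] * gates[name]["x"] for name in gate["fanout"])
        total += gate["alpha"] * (gate["c"] + gate["p"] * gate["x"] + driven)
    return total


# Hypothetical two-inverter chain; the final load of 16 is lumped into c.
chain = {
    "inv1": {"alpha": 0.25, "g": 1, "p": 1, "x": 1, "c": 0, "fanout": ["inv2"]},
    "inv2": {"alpha": 0.25, "g": 1, "p": 1, "x": 4, "c": 16, "fanout": []},
}
energy = normalized_energy(chain)
print(energy)
```

An optimizer would vary the `x` values to minimize this quantity subject to the delay constraint; that step is not shown here.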
 Voltage
       Voltage has a quadratic effect on dynamic power.
       Therefore, choosing a lower power supply significantly reduces power consumption.
       As many transistors are operating in a velocity-saturated regime, the lower power supply may not
        reduce performance as much as long-channel models predict.
       The chip may be divided into multiple voltage domains, where each domain is optimized for the
        needs of certain circuits.
       Voltage also can be adjusted based on operating mode; for example, a laptop processor may
        operate at high voltage and high speed when plugged into an AC adapter, but at lower voltage
        and speed when on battery power.
       If the frequency and voltage scale down in proportion, a cubic reduction in power is achieved.
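The cubic relationship follows directly from switching power being proportional to VDD² f. A minimal sketch, with hypothetical scale factors:

```python
def relative_switching_power(voltage_scale, frequency_scale):
    """Ratio of new to old switching power, using P proportional to VDD^2 * f."""
    return voltage_scale ** 2 * frequency_scale


# Scale both voltage and frequency to 80% of their original values.
both_scaled = relative_switching_power(0.8, 0.8)
# Scale only the frequency to 80%, leaving the voltage unchanged.
frequency_only = relative_switching_power(1.0, 0.8)
print(both_scaled, frequency_only)
```

Scaling both quantities by the same factor k multiplies the switching power by k³, whereas scaling the frequency alone gives only a linear reduction.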
 Voltage Domains
            Some of the challenges in using voltage domains include converting voltage levels for
              signals that cross domains, selecting which circuits belong in which domain, and routing
              power supplies to multiple domains.
                 Figure above shows direct connection of inverters in two domains using high and low
                  supplies, VDDH and VDDL, respectively.
                 A gate in the VDDH domain can directly drive a gate in the VDDL domain. However, the
                  gate in the VDDL domain will switch faster than it would if driven by another VDDL
                  gate.
                The timing analyzer must consider this when computing the contamination delay, lest a
                 hold time be violated.
                Unfortunately, the gate in the VDDL domain cannot directly drive a gate in the VDDH
                 domain.
                When n2 is at VDDL, the pMOS transistor in the VDDH domain has Vgs = VDDH –
                 VDDL. If this exceeds Vt, the pMOS will turn ON and burn contention current.
                Even if the difference is less than Vt, the pMOS will suffer substantially increased
                 leakage.
                This problem may be alleviated by using a high-Vt pMOS device in the receiver if the
                 voltage difference between domains is small enough.
                The standard method to handle voltage domain crossings is a level converter, shown in
                 the below figure.
               When A = 0, N1 is OFF and N2 is ON. N2 pulls Y down to 0, which turns on P1, pulling
                X up to VDDH and ensuring that P2 turns OFF.
               When A = 1, N1 is ON and N2 is OFF. N1 pulls X down to 0, which turns on P2, pulling
                Y up to VDDH.
               In either case, the level converter behaves as a buffer and properly drives Y between 0
                and VDDH without risk of transistors remaining partially ON.
               Unfortunately, the level converter costs delay and power at each domain crossing.
               An alternative approach is called clustered voltage scaling (CVS), in which two supply
                voltages can be used in a single block.
                    The figure above shows a block diagram for a basic dynamic voltage scaling (DVS) system.
                     The DVS controller takes information from the system about the workload and/or the die
                     temperature.
                    It determines the supply voltage and clock frequency sufficient to complete the workload
                     on schedule or to maximize performance without overheating.
                    A switching voltage regulator efficiently steps down Vin from a high value to the
                     necessary VDD. The core logic contains a phase-locked loop or other clock synthesizer to
                     generate the specified clock frequency.
                    The DVS controller determines the operating frequency, then chooses the lowest supply
                     voltage suitable for that frequency.
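The controller's selection step can be sketched as a table lookup. The operating-point table, its values, and the function name below are hypothetical; a real controller would use characterized voltage/frequency pairs for the specific chip.

```python
# Hypothetical operating points: (clock frequency in MHz, lowest VDD in volts
# at which the core is assumed to run reliably at that frequency).
OPERATING_POINTS = [(400, 0.7), (800, 0.9), (1200, 1.1)]


def choose_operating_point(required_mhz, points=OPERATING_POINTS):
    """Pick the slowest table frequency that meets the demand, with its VDD."""
    for freq_mhz, vdd in sorted(points):
        if freq_mhz >= required_mhz:
            return freq_mhz, vdd
    # Demand exceeds the table: fall back to the fastest available point.
    return max(points)


print(choose_operating_point(600))
print(choose_operating_point(2000))
```

The lookup chooses the frequency first and then takes the lowest voltage listed for it, which mirrors the order of decisions described above.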
    Frequency
         Dynamic power is directly proportional to frequency, so a chip obviously should not run faster
           than necessary.
         Reducing the frequency also allows downsizing transistors or using a lower supply voltage,
           which has an even greater impact on power. The performance can be recouped through
           parallelism, especially if area is not as important as power.
    Short-Circuit Current
        Short-circuit power dissipation occurs as both pullup and pulldown networks are partially ON
           while the input switches.
    Resonant Circuits
         Resonant circuits seek to reduce switching power consumption by letting energy slosh back and
           forth between storage elements such as capacitors and inductors rather than dumping the energy
           to ground.
         The technique is best suited to applications such as clocks that operate at a constant frequency.
Static Power
                   Static power is consumed even when a chip is not switching.
                   Static power arises from subthreshold, gate, and junction leakage currents and contention
                   current.
    Subthreshold Leakage
               Subthreshold leakage current flows when a transistor is supposed to be OFF.
               For Vds exceeding a few multiples of the thermal voltage, it can be simplified to
               $I_{sub} = I_{off} \, 10^{\frac{V_{gs} + \eta (V_{ds} - V_{DD}) - k_\gamma V_{sb}}{S}}$
                   where Ioff is the subthreshold current at Vgs = 0 and Vds = VDD, S is the subthreshold
                   slope, and η is the drain-induced barrier lowering (DIBL) coefficient.
                  Ioff is a key process parameter defining the leakage of a single OFF transistor. It ranges
                   from about 100 nA/µm for typical low-Vt devices to below 1 nA/µm for high-Vt devices.
              If Vds is small, Isub may decrease by roughly an order of magnitude from Ioff. kγ is the
               body effect coefficient, which describes how the body effect modulates the threshold
               voltage.
             Raising the source voltage or applying a negative body voltage can further decrease
               leakage.
             Silicon on Insulator (SOI) circuits are attractive for low-leakage designs because they have
               a sharper subthreshold current rolloff.
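The dependence of leakage on the terminal voltages can be explored with a short sketch. It assumes the commonly used simplified model Isub = Ioff · 10^((Vgs + η(Vds − VDD) − kγ·Vsb)/S), valid for Vds above a few thermal voltages. The parameter values below (η, kγ, S, and Ioff) are illustrative assumptions, not data for any particular process.

```python
def subthreshold_current(i_off, vgs, vds, vsb, vdd, s, eta, k_gamma):
    """Simplified subthreshold current model (assumed form, see lead-in).

    i_off   - leakage at Vgs = 0, Vds = VDD, Vsb = 0 (A per um of width)
    s       - subthreshold slope in volts per decade
    eta     - DIBL coefficient
    k_gamma - body effect coefficient
    """
    exponent = (vgs + eta * (vds - vdd) - k_gamma * vsb) / s
    return i_off * 10 ** exponent


# Illustrative parameters: 100 nA/um, 100 mV/decade, eta = k_gamma = 0.1.
params = dict(i_off=100e-9, vdd=1.0, s=0.1, eta=0.1, k_gamma=0.1)

nominal = subthreshold_current(vgs=0.0, vds=1.0, vsb=0.0, **params)
low_vds = subthreshold_current(vgs=0.0, vds=0.1, vsb=0.0, **params)
reverse_body = subthreshold_current(vgs=0.0, vds=1.0, vsb=0.5, **params)
print(nominal, low_vds, reverse_body)
```

With these assumed parameters, reducing Vds to 0.1 V lowers the current by a little under one decade, and a reverse body bias lowers it further, in line with the qualitative statements above.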
   Gate Leakage
             Gate leakage occurs when carriers tunnel through a thin gate dielectric when a voltage is
              applied across the gate (e.g., when the gate is ON).
             Gate leakage is an extremely strong function of the dielectric thickness.
             It is normally limited to acceptable levels in the process by selection of the dielectric
              thickness.
             pMOS gate leakage is an order of magnitude smaller in ordinary SiO2 gates and can often
              be ignored, but it can be significant for other gate dielectrics.
   Junction Leakage
             Junction leakage occurs when a source or drain diffusion region is at a different potential
              from the substrate.
   Contention Current
             Static CMOS circuits have no contention current.
             However, certain alternative circuits inherently draw current even while quiescent.
             For example, pseudo-nMOS gates experience contention between the nMOS pulldowns and
              the always-on pMOS pullups when the output is 0.
             Current-mode logic and many analog circuits also draw static current. Such circuits should
              be turned OFF in sleep mode by disabling the pullups or current source.
   Static Power Estimation
             Static current estimation is a matter of estimating the total width of transistors that are
              leaking, multiplying by the leakage current per width, and multiplying by the fraction of
              transistors that are in their leaky state (usually one-half).
             Add the contention current if applicable. The static power is the supply voltage times the
              static current.
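The estimation procedure above translates directly into a few lines of code. The function name and the example numbers are hypothetical.

```python
def static_power(total_width_um, leak_per_um, vdd,
                 leaky_fraction=0.5, contention_current=0.0):
    """Estimate static power in watts.

    total_width_um     - total transistor width that can leak (um)
    leak_per_um        - leakage current per unit width (A/um)
    leaky_fraction     - fraction of transistors in their leaky state
    contention_current - extra static current from ratioed circuits (A)
    """
    static_current = total_width_um * leak_per_um * leaky_fraction + contention_current
    return vdd * static_current


# Hypothetical block: 1e6 um of transistor width leaking 100 nA/um at VDD = 1.0 V.
leakage_only = static_power(1e6, 100e-9, 1.0)
with_contention = static_power(1e6, 100e-9, 1.0, contention_current=0.01)
print(leakage_only, with_contention)
```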
   Power Gating
             The easiest way to reduce static current during sleep mode is to turn off the power supply to
              the sleeping blocks.
             This technique is called power gating and is shown in Figure.
             The logic block receives its power from a virtual VDD rail, VDDV.
             When the block is active, the header switch transistors are ON, connecting VDDV to VDD.
             When the block goes to sleep, the header switch turns OFF, allowing VDDV to float and
              gradually sink toward 0.
             As this occurs, the outputs of the block may take on voltage levels in the forbidden zone.
             The output isolation gates force the outputs to a valid level during sleep so that they do not
              cause problems in downstream logic.