VLSI DESIGN
18EC72
Module 4
Sequential circuit design:
Introduction, Circuit design for Latches and flip-flops
(10.1 and 10.3.1 to 10.3.4 of TEXT 2)
Dynamic logic circuits:
Introduction, Basic principles of pass transistor circuit,
Synchronous dynamic circuit techniques,
Dynamic CMOS circuit Techniques.
(9.1,9.2,9.4 to 9.5 of TEXT 1)
10.1 Introduction
• Sequential circuits are circuits in which the output depends on
previous as well as current inputs.
• Finite state machines and pipelines are two important
examples of sequential circuits.
• Sequential circuits are usually designed with flip-flops or
latches, which are sometimes called memory elements, that
hold data called tokens.
• The purpose of these elements is not really memory; instead,
it is to enforce sequence, to distinguish the current token
from the previous or next token. Therefore, we will call them
sequencing elements [Harris01a].
• Sequencing elements delay tokens that arrive too early,
preventing them from catching up with previous tokens.
• Unfortunately, they inevitably add some delay to paths that are
already critical, decreasing the performance of the system.
This extra delay is called sequencing overhead.
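The cost of sequencing overhead can be made concrete with a small calculation. This sketch assumes a flip-flop-based pipeline and borrows two timing parameters not defined in this section: the clock-to-Q propagation delay t_pcq and the setup time t_setup. Clock skew is ignored and all numbers are hypothetical.

```python
def max_logic_delay(t_cycle_ps, t_pcq_ps, t_setup_ps):
    """Longest combinational path that still fits between two flip-flops."""
    overhead = t_pcq_ps + t_setup_ps  # sequencing overhead paid every cycle
    return t_cycle_ps - overhead

def overhead_fraction(t_cycle_ps, t_pcq_ps, t_setup_ps):
    """Fraction of the clock cycle lost to the sequencing elements."""
    return (t_pcq_ps + t_setup_ps) / t_cycle_ps

# Hypothetical: 500 ps cycle, 50 ps clock-to-Q, 30 ps setup
print(max_logic_delay(500, 50, 30))    # 420
print(overhead_fraction(500, 50, 30))  # 0.16
```

With these numbers, 16% of every cycle is spent in the sequencing elements rather than in useful logic.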
• Static circuits refer to gates that have no clock input, such as
complementary CMOS, pseudo-nMOS, or pass transistor logic.
• Dynamic circuits refer to gates that have a clock input,
especially domino logic.
• To complicate terminology, sequencing elements themselves
can be either static or dynamic.
• A sequencing element with static storage employs some sort
of feedback to retain its output value indefinitely.
• An element with dynamic storage generally maintains its
value as charge on a capacitor that will leak away if not
refreshed for a long period of time.
• The choices of static or dynamic for gates and for sequencing
elements can be independent.
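The difference between static and dynamic storage can be estimated with the charge relation ΔV = I_leak · t / C. Static feedback holds the value indefinitely, while a floating dynamic node drifts until it is misread. The sketch assumes a constant leakage current; the node capacitance, allowed droop, and leakage values are hypothetical.

```python
def retention_time_s(c_node_f, delta_v_allowed, i_leak_a):
    """Time for a constant leakage current to move a floating node by delta_v.

    From Q = C * V = I * t, so t = C * delta_v / I_leak.
    """
    return c_node_f * delta_v_allowed / i_leak_a

# Hypothetical: 2 fF node, may droop 0.3 V before it is misread, 1 nA leakage
t = retention_time_s(2e-15, 0.3, 1e-9)
print(f"{t * 1e6:.2f} us")  # 0.60 us
```

Because retention scales inversely with leakage, any increase in leakage (for example, at high temperature) shortens the time a dynamic node can be left unrefreshed.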
10.3.1 Conventional CMOS Latches
• Figure 10.17(a) shows a very simple
transparent latch built from a single transistor.
• It is compact and fast but suffers four limitations.
1. The output does not swing from rail-to-rail (i.e., from GND to
VDD); it never rises above VDD – Vt.
2. The output is also dynamic; in other words, the output
floats when the latch is opaque. If it floats long enough, it
can be disturbed by leakage.
3. D drives the diffusion input of a pass transistor directly,
leading to potential noise issues and making the delay
harder to model with static timing analyzers.
4. Finally, the state node is exposed, so noise on the output
can corrupt the state.
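Limitation 1 can be illustrated with a first-order model of an nMOS pass transistor whose gate is held at VDD. The transistor passes a strong 0 but stops conducting once its output reaches VDD - Vt. This sketch ignores body effect (Vt is treated as constant), and the voltages are hypothetical.

```python
def nmos_pass_output(v_in, v_dd, v_t):
    """Steady-state output of an nMOS pass transistor with its gate at VDD.

    A 0 passes fully; a 1 passes only up to VDD - Vt, where the
    transistor cuts off. Body effect is ignored (Vt treated as fixed).
    """
    return min(v_in, v_dd - v_t)

VDD, VT = 1.0, 0.3  # hypothetical supply and threshold voltages
print(nmos_pass_output(0.0, VDD, VT))  # 0.0 -> strong 0
print(nmos_pass_output(1.0, VDD, VT))  # 0.7 -> degraded 1
```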
• Figure 10.17(b) uses a CMOS transmission gate in place of
the single nMOS pass transistor to offer rail-to-rail output
swings.
• It requires a complementary clock φ̄ (clock bar), which can be
provided as an additional input or generated locally from the
clock φ through an inverter.
• Figure 10.17(c) adds an output inverter so that the state
node X is isolated from noise on the output.
• Of course, this creates an inverting latch.
• Figure 10.17(d) also behaves as an inverting latch
with a buffered input but unbuffered output.
• The inverter followed by a transmission gate is
essentially equivalent to a tristate inverter but has a
slightly lower logical effort because the output is
driven by both transistors of the transmission gate in
parallel.
• Figure 10.17(c) and (d) are both fast dynamic
latches.
• In modern processes, subthreshold leakage is large enough
that dynamic nodes retain their values for only a short time,
especially at the high temperature and voltage encountered
during burn-in test.
• Therefore, practical latches need to be staticized, adding
feedback to prevent the output from floating, as shown in
Figure 10.17(e).
• When the clock is 1, the input transmission gate is
ON, the feedback tristate is OFF, and the
latch is transparent.
• When the clock is 0, the input transmission gate
turns OFF; however, the feedback tristate turns ON,
holding the value at the state node X.
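The behavior of the staticized latch in Figure 10.17(e) can be summarized in a behavioral model. The sketch tracks only the state node X and ignores output polarity, since some of the variants above invert. The class and method names are hypothetical.

```python
class StaticLatch:
    """Behavioral sketch of a staticized transparent latch (state node X)."""

    def __init__(self, initial=0):
        self.x = initial  # value held on state node X

    def step(self, clk, d):
        if clk:
            # Input transmission gate ON, feedback tristate OFF: transparent
            self.x = d
        # clk == 0: input gate OFF, feedback tristate ON, so X is retained
        return self.x

latch = StaticLatch()
trace = [latch.step(clk, d) for clk, d in [(1, 1), (0, 0), (0, 1), (1, 0)]]
print(trace)  # [1, 1, 1, 0]
```

Unlike a dynamic latch, nothing in this model decays while the clock is 0. The feedback path is what makes holding the value for an unlimited time legitimate.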
• Figure 10.17(f) adds an input inverter so the input is
a transistor gate rather than unbuffered diffusion.
• Unfortunately, both (e) and (f) reintroduce output
noise sensitivity: A large noise spike on the output
can propagate backward through the feedback gates
and corrupt the state node X.
• Figure 10.17(g) is a robust transparent latch that addresses all
of the deficiencies mentioned so far: The latch is static, all
nodes swing rail-to-rail, the state node is isolated from output
noise, and the input drives transistor gates rather than
diffusion.
• Such a latch is widely used in standard cell applications
including the Artisan standard cell library [Artisan02].
• It is recommended for all but the most performance- or area-
critical designs.
• In semicustom datapath applications where input
noise can be better controlled, the inverting latch of
Figure 10.17(h) may be preferable because it is
faster and more compact.
• Intel uses this as a standard datapath latch
[Karnik01].
• Figure 10.17(i) shows the jamb latch, a variation of Figure 10.17(g) that
reduces the clock load and saves two transistors by using a weak feedback
inverter in place of the tristate.
• This requires careful circuit design to ensure that the input stage is strong
enough to overpower the weak feedback inverter in all process corners.
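The ratioing requirement of the jamb latch can be roughly checked by treating the fighting devices as linear resistors. When a 0 is written into a node that the weak feedback inverter is holding at 1, the node settles at a divider voltage, and the write succeeds only if that voltage falls below the switching threshold of the inverter reading the node. This first-order sketch assumes a VDD/2 threshold and uses hypothetical effective resistances. A real design would verify the ratio with circuit simulation at every process corner.

```python
def node_voltage_write0(vdd, r_in, r_fb):
    """State-node voltage while the input path (r_in to GND) fights the
    weak feedback inverter's pMOS (r_fb to VDD): a resistive divider."""
    return vdd * r_in / (r_in + r_fb)

def write_succeeds(vdd, r_in, r_fb, v_switch=None):
    """The write wins if the node is pulled below the switching threshold
    of the inverter that reads it (assumed VDD/2 unless given)."""
    if v_switch is None:
        v_switch = vdd / 2
    return node_voltage_write0(vdd, r_in, r_fb) < v_switch

# Hypothetical effective resistances (ohms) at two corners
corners = {
    "typical": (5e3, 20e3),
    "slow input, fast feedback": (7.5e3, 14e3),
}
for name, (r_in, r_fb) in corners.items():
    print(name, write_succeeds(1.0, r_in, r_fb))
```

Writing a 1 is the mirror case, with the feedback inverter's nMOS fighting a pull-up path. If the feedback is made too strong (for example, r_fb = 4 kΩ against r_in = 5 kΩ), the write fails.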
• Figure 10.17(j) shows another jamb latch commonly used in
register files and Field Programmable Gate Array (FPGA) cells.
Many such latches read out onto a single Dout wire, and only
one latch is enabled at any given time with its RD signal.
• The Itanium 2 processor uses the latch shown in Figure
10.17(k) [Naffziger02].
• The dynamic latch of Figure 10.17(d) can also be drawn as a clocked
tristate, as shown in Figure 10.18(a).
• Such a form is sometimes called clocked CMOS (C2MOS) [Suzuki73].
• The conventional form using the inverter and transmission gate is
slightly faster because the output is driven through the nMOS and pMOS
working in parallel.
• Figure 10.18(b) shows another form of the tristate that swaps the data
and clock terminals.
• It is logically equivalent but electrically inferior because toggling D while
the latch is opaque can cause charge-sharing noise on the output node
[Suzuki73].
10.3.2 Conventional CMOS Flip-Flops
• Figure 10.19(a) shows a dynamic inverting flip-flop built
from a pair of back-to-back dynamic latches [Suzuki73].
• Either the first or the last inverter can be removed to
reduce delay at the expense of greater noise sensitivity on
the unbuffered input or output.
• Figure 10.19(b) adds feedback and another inverter to produce a
noninverting static flip-flop.
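Behaviorally, a flip-flop built from back-to-back latches is edge-triggered because the two latches are transparent on opposite clock levels. This sketch assumes a positive-edge flip-flop (master transparent while the clock is 0, slave transparent while it is 1), ideal clocks with no skew, and noninverting latches. The class names are hypothetical.

```python
class LevelLatch:
    """Latch that is transparent when clk equals `transparent_when`."""

    def __init__(self, transparent_when, q=0):
        self.transparent_when = transparent_when
        self.q = q

    def eval(self, clk, d):
        if clk == self.transparent_when:
            self.q = d  # transparent: follow the input
        return self.q   # opaque: hold the stored value

class MasterSlaveFF:
    """Positive-edge flip-flop from two latches (master on clk=0, slave on clk=1)."""

    def __init__(self):
        self.master = LevelLatch(transparent_when=0)
        self.slave = LevelLatch(transparent_when=1)

    def eval(self, clk, d):
        m = self.master.eval(clk, d)
        return self.slave.eval(clk, m)

ff = MasterSlaveFF()
clk = [0, 1, 1, 0, 0, 1]
d = [1, 0, 1, 1, 0, 1]
print([ff.eval(c, x) for c, x in zip(clk, d)])  # [0, 1, 1, 1, 1, 0]
```

At the last sample, Q becomes 0, which is the value D had just before the rising edge. The 1 presented while the clock is high is ignored because the master is opaque.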
• The PowerPC 603 microprocessor datapath used this flip-flop design
without the input inverter or the Q̄ output [Gerosa94].
• Most standard cell libraries employ this design because it is simple,
robust, compact, and energy-efficient [Stojanovic99].
• However, some of the alternatives described later are faster.
• Figure 10.20(a) redraws Figure 10.19(a) with a built-in clock inverter.
When the clock φ falls, both the clock and its complement are momentarily low
as shown in Figure 10.20(b), turning on the clocked pMOS transistors in
both transmission gates.
• If the skew (i.e., inverter delay) is too large, the data can sneak through
both latches on the falling clock edge, leading to incorrect operation.
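The skew hazard can be framed as a first-order timing check. After the clock falls, its locally inverted complement rises only one inverter delay later, so both clocked pMOS transistors conduct during that window. If new data can travel from D through the first latch and into the second latch's state node within the window, the flip-flop fails. This sketch ignores slew rates, and the delay values and parameter names (such as d_to_slave_ps) are hypothetical.

```python
def both_low_window_ps(phi_fall_ps, phibar_rise_ps):
    """Interval in which the clock and its complement are both low,
    turning on the clocked pMOS in both transmission gates."""
    return max(0.0, phibar_rise_ps - phi_fall_ps)

def data_races_through(phi_fall_ps, phibar_rise_ps, d_to_slave_ps):
    """First-order check: fail if data reaches the second latch's state
    node while both transmission gates are still conducting."""
    return both_low_window_ps(phi_fall_ps, phibar_rise_ps) > d_to_slave_ps

# Hypothetical: 35 ps for data to reach the second latch's state node
print(data_races_through(100.0, 115.0, 35.0))  # False: 15 ps skew is safe
print(data_races_through(100.0, 160.0, 35.0))  # True: 60 ps skew races
```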
• Figure 10.20(c) shows a C2MOS dynamic flip-flop built using C2MOS
latches rather than inverters and transmission gates [Suzuki73].
• Because each stage inverts, data passes through the nMOS stack of one
latch and the pMOS of the other, so skew that turns on both clocked
pMOS transistors is not a hazard.
• However, the flip-flop is still susceptible to failure from very slow edge
rates that turn both transistors partially ON.
• The same skew advantages apply even when an even number of
inverting logic stages are placed between the latches; this technique is
sometimes called NO RAce (NORA) [Goncalves83].
• In practice, most flip-flop designs carefully control the delay of the clock
inverter so the transmission gate design is safe and slightly faster than
C2MOS [Chao89].
• For VLSI class projects where careful clock skew analysis is
too much work and performance is less important, a
reasonable alternative is to use a pair of two-phase
nonoverlapping clocks instead of the clock and its
complement, as shown in Figure 10.21.
• The flip-flop captures its input on the rising edge of φ1 (one of the two clock phases).
• By making the nonoverlap large enough, the circuit will work
despite large skews.
• However, the nonoverlap time is not used by logic, so it
directly increases the setup time and sequencing overhead
of the flip-flop (see Exercise 10.8).
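One period of two-phase nonoverlapping clocks can be generated in discrete time to show the guaranteed gap. After one phase falls, both phases stay low for a nonoverlap interval before the other phase rises. This sketch assumes equal high times for the two phases and uses integer time units. The phase names phi1 and phi2 and the function name are hypothetical.

```python
def two_phase_clocks(period, nonoverlap):
    """One period of phi1/phi2 samples (one per time unit).

    phi1 is high first, then both are low for `nonoverlap` units,
    then phi2 is high, then both are low again until the period ends.
    """
    high = period // 2 - nonoverlap
    if high <= 0:
        raise ValueError("nonoverlap too large for this period")
    phi1 = [1] * high + [0] * (period - high)
    phi2 = ([0] * (high + nonoverlap) + [1] * high
            + [0] * (period - 2 * high - nonoverlap))
    return phi1, phi2

phi1, phi2 = two_phase_clocks(10, 1)
print("".join(map(str, phi1)))  # 1111000000
print("".join(map(str, phi2)))  # 0000011110
```

Increasing `nonoverlap` tolerates more skew, but it takes time away from each high phase that logic cannot use. This is the overhead described above.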
• The layout for the flip-flop is shown on the inside front cover
and is readily adapted to use a single clock.
• Observe how diffusion nodes are shared to reduce parasitic
capacitance.
10.3.3 Pulsed Latches
• A pulsed latch can be built from a conventional CMOS
transparent latch driven by a brief clock pulse.
• Figure 10.22(a) shows a simple pulse generator, sometimes
called a clock chopper or one-shot [Harris01a].
• The pulsed latch is faster than a regular flip-flop because it
involves a single latch rather than two and because it allows
time borrowing.
• It can also consume less energy, although the pulse generator
adds to the energy consumption (and is ideally shared across
multiple pulsed latches for energy and area efficiency). The
drawback is the increased hold time.
• The Partovi pulsed latch in Figure 10.23 eliminates the need
to distribute the pulse by building the pulse generator into
the latch itself [Partovi96, Draper97].
• The weak crosscoupled inverters in the dashed box staticize
the circuit, although the latch is susceptible to back-driven
output noise on Q or Q̄ unless an extra inverter is used to
buffer the output.
• The Partovi pulsed latch was used on the AMD K6 and
Athlon [Golden99], but is slightly slower than a simple latch
[Naffziger02].
• It was originally called an Edge Triggered Latch (ETL), but
strictly speaking is a pulsed latch because it has a brief
window of transparency.
10.3.4 Resettable Latches and Flip-Flops
• Most practical sequencing elements require a reset signal to
enter a known initial state on startup and ensure
deterministic behavior. Figure 10.24 shows latches and flip-
flops with reset inputs.
• There are two types of reset: synchronous and
asynchronous. Asynchronous reset forces Q low
immediately, while synchronous reset waits for the clock.
Synchronous reset signals must be stable for a setup and
hold time around the clock edge while asynchronous reset is
characterized by a propagation delay from reset to output.
• Synchronous reset simply requires ANDing the input D with the
complement of reset, so that a 0 is captured at the next clock edge
while reset is asserted. Asynchronous reset requires gating both the
data and the feedback to force the reset independent of the clock.
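The two reset styles differ only in when reset takes effect, which a behavioral model shows directly. This sketch assumes a positive-edge D flip-flop with an active-high reset that forces Q low. The class name, the `mode` parameter, and the stimulus are hypothetical.

```python
class ResettableFF:
    """Positive-edge D flip-flop with an active-high reset forcing Q low.

    mode="sync":  reset acts only at a rising clock edge (D AND NOT reset).
    mode="async": reset forces Q low immediately, independent of the clock.
    """

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.q = 0
        self.prev_clk = 0

    def step(self, clk, d, reset):
        rising = clk == 1 and self.prev_clk == 0
        self.prev_clk = clk
        if self.mode == "async" and reset:
            self.q = 0                # forced low at once; clock ignored
        elif rising:
            self.q = d & (1 - reset)  # D gated by NOT reset at the edge
        return self.q

# (clk, d, reset): reset is raised while the clock is already high
stimulus = [(0, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0)]
for mode in ("sync", "async"):
    ff = ResettableFF(mode)
    print(mode, [ff.step(*s) for s in stimulus])
# sync  [0, 1, 1, 1, 0, 0]
# async [0, 1, 0, 0, 0, 0]
```

The synchronous version keeps Q = 1 until the next rising edge, which is why its reset must meet setup and hold times. The asynchronous version clears Q as soon as reset rises, so it is characterized by a reset-to-output delay instead.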
• The tristate NAND gate can be constructed from a NAND
gate in series with a clocked transmission gate.
• Settable latches and flip-flops force the output high instead of
low. They are similar to resettable elements of Figure 10.24 but
replace NAND with NOR and reset with set.
• Figure 10.25 shows a flip-flop combining both asynchronous
set and reset.