Clock-Gating and Its Application to Low Power Design of Sequential Circuits

Qing Wu, Massoud Pedram, and Xunwei Wu

Abstract—This paper models the clock behavior in a sequential circuit by a quaternary variable and uses this representation to propose and analyze two clock-gating techniques. It then uses the covering relationship between the triggering transition of the clock and the active cycles of various flip flops to generate a derived clock for each flip flop in the circuit. A technique for clock gating is also presented, which generates a derived clock synchronous with the master clock. Design examples using gated clocks are provided next. Experimental results show that these designs have ideal logic functionality with lower power dissipation compared to traditional designs.

Index Terms—Clock gating, CMOS, logic, low power, sequential circuit, synthesis.

Manuscript received September 7, 1997; revised January 29, 1999. This work was supported in part by DARPA under Contract F33615-95-C-1627 and in part by the NNSF of China under Grant 69773034. This paper was recommended by Associate Editor M. Glessner.
Q. Wu and M. Pedram are with the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089 USA.
X. Wu is with the Institute of Circuits and Systems, Ningbo University, Ningbo, Zhejiang 315211, China.
Publisher Item Identifier S 1057-7122(00)02319-9.

I. INTRODUCTION

The sequential circuits in a system are considered major contributors to the power dissipation since one input of sequential circuits is the clock, which is the only signal that switches all the time. In addition, the clock signal tends to be highly loaded. To distribute the clock and control the clock skew, one needs to construct a clock network (often a clock tree) with clock buffers. All of this adds to the capacitance of the clock net. Recent studies indicate that the clock signals in digital computers consume a large share (15–45%) of the system power [1]. Thus, the circuit power can be greatly reduced by reducing the clock power dissipation.

Most efforts for clock power reduction have focused on issues such as reduced voltage swings, buffer insertion, and clock routing [2]. In many cases, switching of the clock causes a great deal of unnecessary gate activity. For that reason, circuits are being developed with controllable clocks. This means that from the master clock other clocks are derived which, based on certain conditions, can be slowed down or stopped completely with respect to the master clock. This scheme results in power savings due to the following factors.
1) The load on the master clock is reduced and the number of required buffers in the clock tree is decreased. Therefore, the power dissipation of the clock tree can be reduced.
2) The flip flop receiving the derived clock is not triggered in idle cycles and the corresponding dynamic power dissipation is thus saved.
3) The excitation function of the flip flop triggered by the derived clock may be simplified since it has a don't-care condition in the cycle when the flip flop is not triggered by the derived clock.

The clock-gating problem has been studied in [3]–[5]. In [3] the authors presented a technique for saving power in the clock tree by stopping the clock fed into idle modules. However, a number of engineering issues related to the design of the clock tree were not addressed and, hence, the proposed approach has not been adopted in practice. In [4], a precomputation-based technique is used to generate a signal to control the load enable pin of the flip flops in the data path. The control signal is derived by investigating the relationship between the latched input and the primary outputs of the combinational blocks in the data path. The technique is useful only if the outputs of the block can be precomputed (predicted) for certain input assignments. In [5], the authors use a latch to gate the clock in control-dominated circuits. The problem is that the additional latch receives the clock's triggering signal, which results in extra power dissipation in the latch itself. Besides, this scheme results in the derived clock having a considerable skew with respect to the master clock.

This paper investigates various issues in deriving a gated clock from a master clock. In Section II, a quaternary variable is used to model the clock behavior and to discuss its triggering action on flip flops. Based on this analysis, two clock-gating schemes are proposed. In Section III, we use the covering relation between the clock and the transition behaviors of the triggered flip flops to derive conditions for gating the master clock. Two common sequential circuits, i.e., an 8421 BCD code up counter and an excess-three code up counter, are then described to illustrate the procedure for finding a derived clock. In Section IV, a new technique for clock gating is presented which generates a clock synchronous with the master clock. This eliminates the additional skew between the master clock and the derived clock. Thus, the designed sequential circuit is a synchronous one. Finally, we present circuit simulation results to prove the quality of the derived clock and its ability to reduce power dissipation in the circuit.

II. DESCRIPTION FOR CLOCK BEHAVIOR AND CLOCK-GATING

In a synchronous system, a flip flop is triggered by a certain directional transition of a clock signal. For the clock to be another signal rather than the master clock, it must offer the same directional transition to trigger the flip flop and it must be in step with the master clock.

For the clock signal $clk$ in a circuit, if we denote its logic values before and after a transition as $clk(t)$ and $clk^{+}(t)$, respectively, four combinations can be used to express different behaviors of the clock as shown in Table I, where a special quaternary variable $\widetilde{clk}$ denotes the corresponding behavior. The four values are $(0, \alpha, \beta, 1)$, where $\alpha, \beta$ represent two kinds of transition behaviors and 0, 1 represent two kinds of holding behaviors. (Note that although they have the same forms as signal values 0 and 1, their meanings are different.)

TABLE I
QUATERNARY REPRESENTATION FOR BEHAVIORS OF A SIGNAL
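The quaternary description can be made concrete with a small sketch. It is illustrative only: the mapping from the pair (value before, value after) to a behavior is inferred from the description above rather than copied from Table I, the names `Behavior`, `behavior`, `literal`, and `behaviors` are hypothetical, and the transition values are named `RISE` and `FALL` instead of using the paper's symbols. The `literal` helper returns 1 only when an observed behavior equals a chosen value, which is the kind of test the literal operation in (1) performs.

```python
from enum import Enum


class Behavior(Enum):
    """Four behaviors of a signal across one time step (illustrative names)."""
    HOLD_0 = "hold at 0"
    RISE = "rising transition"
    FALL = "falling transition"
    HOLD_1 = "hold at 1"


# Assumed mapping from (value before, value after) to a behavior.
_BEHAVIOR_OF = {
    (0, 0): Behavior.HOLD_0,
    (0, 1): Behavior.RISE,
    (1, 0): Behavior.FALL,
    (1, 1): Behavior.HOLD_1,
}


def behavior(before: int, after: int) -> Behavior:
    """Classify one step of a binary signal as one of the four behaviors."""
    return _BEHAVIOR_OF[(before, after)]


def literal(observed: Behavior, b: Behavior) -> int:
    """Return 1 if the observed behavior equals b, else 0 (a binary variable)."""
    return 1 if observed is b else 0


def behaviors(samples):
    """Behavior sequence of a sampled binary waveform."""
    return [behavior(a, b) for a, b in zip(samples, samples[1:])]


if __name__ == "__main__":
    wave = [0, 1, 1, 0, 0]
    seq = behaviors(wave)
    print([b.name for b in seq])
    # Falling-transition literal along the waveform: 1 only where the signal falls.
    print([literal(b, Behavior.FALL) for b in seq])
```

Because the literal is an ordinary 0/1 value, it can be combined with other signals by AND and OR, which is what the transition functions later in the paper rely on.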
In addition, we can also define a literal operation to identify the behavior of a clock

$$clk^{b} = \begin{cases} 1, & \text{if } \widetilde{clk} = b \\ 0, & \text{if } \widetilde{clk} \neq b \end{cases} \tag{1}$$

where $b \in \{0, \alpha, \beta, 1\}$. Thus, the rising transition $clk^{\alpha}$ and the falling transition $clk^{\beta}$ of a clock are binary variables and can serve as arguments of Boolean operations. For example, from Table I we have
Fig. 2. (a) Next state Karnaugh maps, (b) behavior Karnaugh maps, and (c) simplified next state Karnaugh maps.
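The transition functions of the BCD up counter in Example 1 can be cross-checked by enumerating the ten counting states. The sketch below is an assumption-laden check, not the authors' procedure: it treats one state step as one triggering clock transition, and the complemented variables in the conditions are chosen to be consistent with BCD counting, since the overbars are not recoverable from the printed forms of (11) and (13). The function names are hypothetical.

```python
def bits(state: int):
    """Return (Q0, Q1, Q2, Q3) of a 4-bit state, least significant bit first."""
    return tuple((state >> i) & 1 for i in range(4))


def check_bcd_transitions() -> bool:
    """Compare assumed transition conditions with the actual BCD count sequence."""
    for state in range(10):
        q0, q1, q2, q3 = bits(state)
        n0, n1, n2, n3 = bits((state + 1) % 10)  # next state of the up counter

        q1_rises = (q1, n1) == (0, 1)
        q1_falls = (q1, n1) == (1, 0)
        q3_changes = q3 != n3
        q0_falls = (q0, n0) == (1, 0)

        # Assumed reading of (11): Q1 rises when Q3' * Q1' * Q0 holds.
        if q1_rises != bool((1 - q3) & (1 - q1) & q0):
            return False
        # Assumed reading of (11): Q1 falls when Q3' * Q1 * Q0 holds.
        if q1_falls != bool((1 - q3) & q1 & q0):
            return False
        # Reading of (13): Q3 changes when (Q3 + Q2 * Q1) * Q0 holds.
        if q3_changes != bool((q3 | (q2 & q1)) & q0):
            return False
        # Covering check: Q1, Q2, Q3 change only in cycles where Q0 falls.
        if (q1 != n1 or q2 != n2 or q3 != n3) and not q0_falls:
            return False
    return True


if __name__ == "__main__":
    print(check_bcd_transitions())
```

The last check is the point of interest for clock gating: since the three upper flip flops change only when Q0 falls, a clock derived from Q0 is enough to trigger them.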
$$Q_1^{\alpha} = \bar{Q}_3 \cdot \bar{Q}_1 \cdot Q_0 \cdot clk^{\beta}, \qquad Q_1^{\beta} = \bar{Q}_3 \cdot Q_1 \cdot Q_0 \cdot clk^{\beta} \tag{11}$$

$$Q_0^{\alpha} = \bar{Q}_0 \cdot clk^{\beta}, \qquad Q_0^{\beta} = Q_0 \cdot clk^{\beta}. \tag{12}$$

Therefore, we have

$$Q_3^{\alpha} + Q_3^{\beta} = (Q_3 + Q_2 \cdot Q_1) \cdot Q_0 \cdot clk^{\beta} \tag{13}$$

other conditions are. Therefore, the next state Karnaugh maps for flip flops $Q_1$, $Q_2$, and $Q_3$ in Fig. 2(a) can be simplified to those shown in Fig. 2(c).

From Fig. 2(a) and (c) we can get both the corresponding synchronous and asynchronous designs, as shown in Fig. 3. (We say asynchronous because now not all flip flops are triggered at the same time.) Obviously, the corresponding combinational circuits are simpler. Furthermore, since three flip flops $Q_3$, $Q_2$, $Q_1$ have no
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 47, NO. 3, MARCH 2000
Fig. 3. Circuit realizations of BCD code up counter. (a) Synchronous design. (b) Asynchronous design.
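dynamic power dissipation half of the time when there is no clock triggering, and because the simpler combinational circuits have lower node capacitance, the asynchronous design saves power.

Example 2—Design of an Excess-Three Code Up Counter: The next state and state transition of an excess-three code up counter are shown in Table III. Transition functions for each flip flop can be derived as below

$$Q_3^{\alpha} = Q_2 \cdot Q_1 \cdot Q_0 \cdot clk^{\beta}, \qquad Q_3^{\beta} = Q_3 \cdot Q_2 \cdot \bar{Q}_1 \cdot clk^{\beta} \tag{17}$$

$$Q_2^{\alpha} = \bar{Q}_2 \cdot Q_1 \cdot Q_0 \cdot clk^{\beta}, \qquad Q_2^{\beta} = (Q_3 \cdot Q_2 \cdot \bar{Q}_1 + Q_2 \cdot Q_1 \cdot Q_0) \cdot clk^{\beta} \tag{18}$$

$$Q_1^{\alpha} = (Q_3 \cdot Q_2 \cdot \bar{Q}_0 + \bar{Q}_1 \cdot Q_0) \cdot clk^{\beta}, \qquad Q_1^{\beta} = Q_1 \cdot Q_0 \cdot clk^{\beta} \tag{19}$$

$$Q_0^{\alpha} = \bar{Q}_0 \cdot clk^{\beta}, \qquad Q_0^{\beta} = Q_0 \cdot clk^{\beta}. \tag{20}$$

$$Q_2^{\alpha} + Q_2^{\beta} = (Q_3 \cdot Q_2 \cdot \bar{Q}_1 + Q_1 \cdot Q_0) \cdot clk^{\beta} = Q_1^{\beta} + \bar{Q}_1 \cdot (Q_3 \cdot Q_2) \cdot clk^{\beta} \tag{22}$$

$$Q_1^{\alpha} + Q_1^{\beta} = (Q_3 \cdot Q_2 \cdot \bar{Q}_0 + Q_0) \cdot clk^{\beta} = Q_0^{\beta} + \bar{Q}_0 \cdot (Q_3 \cdot Q_2) \cdot clk^{\beta} \tag{23}$$

$$Q_0^{\alpha} + Q_0^{\beta} = clk^{\beta}. \tag{24}$$

$$Q_1^{\alpha} + Q_1^{\beta} = [Q_0 + (Q_3 \cdot Q_2) \cdot clk]^{\beta}. \tag{26}$$

Obviously, if we take $clk_3^{\beta} = Q_2^{\beta}$, $clk_2^{\beta} = [Q_1 + (Q_3 \cdot Q_2) \cdot clk]^{\beta}$, $clk_1^{\beta} = [Q_0 + (Q_3 \cdot Q_2) \cdot clk]^{\beta}$, and $clk_0^{\beta} = clk^{\beta}$, the covering relation will set the excitation functions of all the four flip flops as $D_i = \bar{Q}_i$ $(i = 0, 1, 2, 3)$. On the other hand, if we use the master clock for triggering all four flip flops, we obtain the following complicated excitation functions:

$$D_3 = Q_2 \cdot Q_1 \cdot Q_0 + Q_3 \cdot \bar{Q}_2$$
$$D_2 = \bar{Q}_2 \cdot Q_1 \cdot Q_0 + \bar{Q}_3 \cdot \bar{Q}_1 + \bar{Q}_3 \cdot \bar{Q}_0$$
$$D_1 = Q_3 \cdot Q_2 + \bar{Q}_1 \cdot Q_0 + Q_1 \cdot \bar{Q}_0$$
$$D_0 = \bar{Q}_0.$$

Since the above $D_3$, $D_2$, and $D_1$ have complicated forms, their corresponding synchronous circuit realization will have a complicated combinational circuit with more node capacitance and, hence, higher power dissipation. On the other hand, the corresponding asynchronous circuit realization with $D_i = \bar{Q}_i$ is much simpler. There is power saving since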
Fig. 4. BCD code up counter by gating clock. (a) Asynchronous design. (b) Synchronous design.
the four flip flops are isolated from the triggering clock in the idle cycles.

IV. SYNCHRONOUS DERIVED CLOCK AND ITS APPLICATION

In Example 1 of Section III, we take $clk_i^{\beta} = Q_0^{\beta}$ $(i = 1, 2, 3)$. From (12) we can also write $clk_i^{\beta}$ as $clk_i^{\beta} = Q_0 \cdot clk^{\beta}$ $(i = 1, 2, 3)$. Comparing this with (4), we have $g_i = 0$, $p_i = Q_0$, and $clk_i = Q_0 \cdot clk$. According to this form of the derived clock we get another asynchronous design, as shown in Fig. 4(a). At first glance, the circuit has one AND gate more than the design in Fig. 3(b). Furthermore, it appears that the derived clock $clk_{1\text{-}3}$ may have an increased phase delay. However, the timing relation shown in Fig. 1 indicates that the transition delay of $clk_{1\text{-}3}$ is independent of the delay of the $Q_0$ output. The delay between $clk$ and $clk_{1\text{-}3}$ is only $2t_g$ ($t_g$ is the average delay of a gate), which is less than the delay of the flip-flop output. Based on the above discussion, we can rewrite $clk_i = Q_0 \cdot clk$ as $clk_{1\text{-}3} = \overline{\bar{Q}_0 + \overline{clk}}$. Furthermore, we take $\overline{clk}$ from the previous stage of the clock tree. Thus, we obtain a new design, as shown in Fig. 4(b). If we consider the delays of the inverter and the NOR gate to be roughly the same, the falling transitions of $clk$ and $clk_{1\text{-}3}$ in the circuit will occur simultaneously. This design is synchronous in the sense that all flip flops are triggered in synchrony with the global clock.

We simulated the new design in Fig. 4(b) by SPICE 3f3 using 2 µm CMOS technology, which confirmed that the new design has an ideal logic operation. We also measured the power dissipation of the two synchronous designs in Figs. 3(a) and 4(b). The power dissipation diagrams are shown in Fig. 5 and show that the new design reduces the power dissipation by 22%.

V. CONCLUSION

The behavioral description of a clock is the basis for analyzing its triggering action on flip flops. Based on it, two types of clock gating were introduced to form a derived clock. We showed that the procedure for designing a derived clock could be systematized so as to isolate the triggered flip flop from the master clock in its idle cycles. The achieved power saving can be significant. However, the additional clock skew may lower the maximum operation frequency. Based on analyzing the timing relation in clock gating, we then presented a new technique for generating the derived clock, which is synchronous with the master clock. Circuit simulation confirmed the quality of the new derived clock and its capability to reduce power dissipation. More work is needed to develop a systematic design procedure and an algorithm for realizing the proposed design principles for clock gating in large sequential circuits. The engineering issues mentioned in [3] must also be resolved
for practical application, opening the path for widespread adoption of the clock-gating technique in low-power design of custom IC’s.

REFERENCES

[1] M. Pedram, “Power minimization in IC design: Principles and applications,” ACM Trans. Design Automation, vol. 1, no. 1, pp. 3–56, Jan. 1996.
[2] G. Friedman, “Clock distribution design in VLSI circuits: An overview,” in Proc. IEEE ISCAS, San Jose, CA, May 1994, pp. 1475–1478.
[3] E. Tellez, A. Farrah, and M. Sarrafzadeh, “Activity-driven clock design for low power circuits,” in Proc. IEEE ICCAD, San Jose, CA, Nov. 1995, pp. 62–65.
[4] M. Alidina and J. Monteiro et al., “Precomputation-based sequential logic optimization for low power,” IEEE Trans. VLSI Syst., vol. 2, pp. 426–436, Dec. 1994.
[5] L. Benini and G. De Micheli, “Symbolic techniques of clock-gating logic for power optimization of control-oriented synchronous networks,” in Proc. European Design Test Conf., Paris, France, 1997, pp. 514–520.

A Complete Operational Amplifier Noise Model: Analysis and Measurement of Correlation Coefficient

Jiansheng Xu, Yisong Dai, and Derek Abbott

Abstract—In contrast to the general operational amplifier (op amp) noise model widely used, we propose a more complete and applicable noise model, which considers the correlation between the equivalent input voltage noise source $e_n$ and current noise source $i_n$. Based on the superposition theorem and equivalent circuit noise theory, our formulae for the equivalent input noise spectrum density of an op amp are applied to both the inverting and noninverting input terminals. By measurement, we demonstrate that the new expressions are significantly more accurate. In addition, details of the measurement method for our noise model parameters are given. A commercial operational amplifier (Burr–Brown OPA37A) is measured by means of a low-frequency noise power spectrum measuring system and the measured results of its noise model parameters, including the spectral correlation coefficient (SCC), are finally given.

Index Terms—Noise models, operational amplifiers, spectral correlation coefficient.

I. INTRODUCTION

Recently, integrated operational amplifiers (op amps) have been used in more and more practical applications. With the continual improvement of their noise characteristics, they have been commonly found in the design of preamplifier circuits. For this reason, the calculation of the circuit noise of an op amp and its low-noise design are paid more attention than ever. At present, the noise models [1]–[3] of the overwhelming majority of op amps are illustrated as in Fig. 1(a) and (b). The commonly accepted two-port noise model is in Fig. 1(a). The op amp is considered noiseless and the equivalent voltage noise source $e_n$ and current noise source $i_n$ are referred back to the input terminals. Fig. 1(b) is commonly adopted when the positive terminal is grounded. To simplify calculation, in some models only $e_n$ is adopted and $i_n$ is neglected [4], [5]. The advantage of these equivalent circuits is simplicity and convenience. However, in the area of small-signal detection, the requirements of noise specifications in the course of calculation and design of a low-noise circuit become higher. The shortcoming of Fig. 1(a) and (b) is obvious: the correlation between voltage noise source $e_n$ and current noise source $i_n$ is not considered, giving rise to inaccuracy.

At present, methods for measuring $e_n$ and $i_n$ [6], [7] use a small value of source resistance to measure an equivalent input voltage noise $e_n$ and use a very large source resistance to measure an equivalent input current noise $i_n$. Because the correlation is not considered in this method, the measuring method is only an approximate solution. In fact, it can be calculated that the neglect of the correlation item can lead to, at most, a 40% measurement error [7]. Thus, it is commonly believed that the method can give only an approximate solution, and cannot give an accurate solution.

To solve this problem, a more complete op amp noise model is presented in this paper, based on Fig. 1(c), which considers the correlation between $e_n$ and $i_n$ for each input terminal, and then the formula of equivalent input noise power spectrum density for the inverting and noninverting input terminals can be derived. With different source resistors, the noise model parameters of an op amp have been measured by means of a low-frequency noise measuring system and the noise model parameters, including the spectral correlation coefficient, are presented.

II. A COMPLETE NOISE MODEL AND ITS EQUIVALENT INPUT NOISE POWER SPECTRUM

In order to improve precision of the noise model, based on Fig. 1(a) and (b), we use one equivalent voltage noise source and one equivalent current noise source at each op amp input terminal in our model. In addition, the correlation between $e_n$ and $i_n$ at each input terminal should be considered for completeness. Let $\gamma = \gamma_1 + j\gamma_2$ be the spectral correlation coefficient (SCC), given by $\gamma = S_{ei}(f)/\sqrt{S_e(f)S_i(f)}$, in which $S_e(f)$, $S_i(f)$ are the power spectral densities of the voltage noise $e_n$ and current noise $i_n$, respectively, and $S_{ei}(f)$ is the cross-spectral density [8] between $e_{n1}$ and $i_{n1}$. Also let $\gamma' = \gamma_1' + j\gamma_2'$ be the SCC between $e_{n2}$ and $i_{n2}$, in which $i_{n1}$ and $i_{n2}$ are current noises at two input terminals of an op amp. Thus, it can be concluded that there is no correlation between them. Fig. 1(c) is a complete op amp noise model including eight parameters, i.e., $e_{n1}$, $i_{n1}$, $\gamma = \gamma_1 + j\gamma_2$, $e_{n2}$, $i_{n2}$, and $\gamma' = \gamma_1' + j\gamma_2'$, each of which varies with frequency. It is obvious that all these parameters cannot be calculated by use of internal noise sources of an op amp, for noise sources in an op amp are so many that it is very difficult to calculate them separately and accurately. However, they can be calculated by measuring equivalent input noise power spectrum with different source resistors. Now the relation between the eight parameters and equivalent input noise power spectrum can be derived as follows.
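Before the derivation, a small numerical sketch may help to show why the correlation term matters. It is illustrative only: `scc` follows the SCC definition given above, while `input_noise_psd` uses the standard expression for the sum of a voltage noise and a current noise flowing through a real source resistance, which is an assumption here and not the paper's eight-parameter formula. The thermal noise of the resistor is left out, and the noise densities are hypothetical values, not OPA37A data.

```python
import math


def scc(s_e: float, s_i: float, s_ei: complex) -> complex:
    """Spectral correlation coefficient: S_ei / sqrt(S_e * S_i)."""
    return s_ei / math.sqrt(s_e * s_i)


def input_noise_psd(s_e: float, s_i: float, s_ei: complex, r_s: float) -> float:
    """PSD of e_n + r_s * i_n for a real source resistance r_s (assumed model).

    Only the real part of the cross-spectral density contributes.
    """
    return s_e + r_s ** 2 * s_i + 2.0 * r_s * complex(s_ei).real


if __name__ == "__main__":
    s_e = (3e-9) ** 2      # hypothetical voltage noise density squared, V^2/Hz
    s_i = (0.4e-12) ** 2   # hypothetical current noise density squared, A^2/Hz
    gamma = 1.0 + 0.0j     # fully correlated sources, the worst case
    s_ei = gamma * math.sqrt(s_e * s_i)

    # Source resistance at which both sources contribute equally.
    r_s = math.sqrt(s_e / s_i)

    with_corr = input_noise_psd(s_e, s_i, s_ei, r_s)
    without_corr = input_noise_psd(s_e, s_i, 0.0, r_s)
    # Ratio of rms noise with and without the correlation term.
    print(math.sqrt(with_corr / without_corr))
```

Under these assumptions the rms noise with full correlation is about 1.4 times the value obtained when the correlation term is dropped, which is of the same order as the error bound of about 40% quoted in the Introduction.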
Manuscript received June 1, 1998; revised May 20, 1999. This work was supported in part by the China Natural Science Foundation under Contract 69672023. This paper was recommended by Associate Editor K. Halonen.
J. Xu and Y. Dai are with the School of Information Science and Engineering, Jilin University of Technology, Changchun, China 130025.
D. Abbott is with the Centre for Biomedical Engineering (CBME), Electrical and Electronic Engineering Department, the University of Adelaide, Adelaide, SA5005, Australia.
Publisher Item Identifier S 1057-7122(00)02323-0.

Let $Z_1 = R_1 + jX_1$, $Z_2 = R_2 + jX_2$, and $Z_f = R_f + jX_f$, and let $e_1^2 = 4kTR_1$, $e_2^2 = 4kTR_2$, and $i_f^2 = 4kT/R_f$, where $k$ is Boltzmann's constant, $T$ is the absolute temperature, $e_1^2$ and $e_2^2$ are the thermal noise spectra of resistances $R_1$ and $R_2$, and $i_f^2$ is the current noise spectrum of resistance $R_f$. For the circuit in Fig. 2(a), the equivalent noise circuit can be drawn as in Fig. 2(b).

According to the superposition theorem, the gain of each noise source can be calculated first and then the total output noise can be obtained by addition of each noise source power. Multiplication by the noise bandwidth finally gives the output noise power