   Abstract—The presented work deals with an ultra-high-speed CMOS 4-2 compressor, which is an essential part of fast digital arithmetic integrated circuits. Current-mode techniques are used to improve the overall performance of the compressor. The proposed fully differential circuit reduces delay to about 38% of that of a conventional high-speed compressor (68 ps versus 180 ps) and also reduces the occupied area. To evaluate the performance of the proposed circuit, a conventional gate-level structure has been chosen for comparison, and all circuits have been simulated in a 65-nm IBM CMOS process with a 1.2-V power supply voltage.

   Index Terms—Digital logic, 4-2 compressor, CMOS, high speed, current-mode.

   Manuscript received November 17, 2012; revised January 28, 2013.
   P. Aliparast is with the Department of Electrical Engineering, Heris Branch, Islamic Azad University, Heris, Iran and the Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran (e-mail: p-aliparast@tabrizu.ac.ir).
   Z. D. Koozehkanani is with the Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran (e-mail: zdaie@tabrizu.ac.ir).
   Farhad Nazari is with the Department of Electrical Engineering, Heris Branch, Islamic Azad University, Heris, Iran (e-mail: farhaad.nazari@herisiau.ac.ir).

                           I. INTRODUCTION
   With the ever-increasing possibilities that VLSI systems provide for realizing high-speed digital building blocks, there is a trend toward using digital units to implement processing algorithms, even for tasks that were originally analog, such as front-end communications. Microprocessors and digital signal processors rely on efficient implementations of fast arithmetic logic units to execute dedicated algorithms such as convolution and filtering [1], [2]. Adders and multipliers are the most frequently and widely used arithmetic cells in these processors. In most of these applications, multipliers dictate the overall performance of the system when speed and power consumption are the limiting factors. At the circuit design level, there is great potential for optimizing these building blocks by voltage scaling or by applying new CMOS logic styles to their constituent combinational circuits [3]. A fast array or tree multiplier is typically composed of three subcircuits:
1) A Booth encoder for the generation of a reduced number of partial products.
2) A carry-save structured accumulator for a further reduction of the partial products' matrix to only the addition of two operands.
3) A fast carry propagation adder (CPA) [4] for the computation of the final binary result from its stored carry representation.
   Among these subcircuits, the second stage, partial product accumulation, often referred to as the carry save adder (CSA) tree [5]-[7], contributes most to the overall delay and takes up a high fraction of the silicon area. Therefore, increasing the speed of the CSA subcircuits is crucial to improving the performance of the multiplier. Early designs of the CSA tree used Dadda's column compression technique [8] with 3-2 counters, or equivalently full adders, to reduce the partial product matrix. To reduce the delay of the partial product accumulation stage, 4-2 compressors are now widely employed in high-speed multipliers. Because of their regular interconnection, 4-2 compressors are ideal for the construction of a regularly structured Wallace tree with low complexity [7]-[9].

                  Fig. 1. Block diagram of a 4-2 compressor.

   Several 4-2 compressor circuits have been proposed for high-speed applications [3]. In this paper, we begin with a brief introduction to conventional compressors, which are composed of two full adders, each optimized at the gate level to achieve high speed. After investigating the performance of this 4-2 compressor architecture and its underlying building modules, a new very-high-speed current-mode fully differential 4-2 compressor is proposed. 4-2 compressors constructed with this current-mode technique exhibit superior speed compared to other configurations.

          II. THE CONVENTIONAL 4-2 COMPRESSOR STRUCTURE
   A 4-2 compressor has five inputs and three outputs, as shown in Fig. 1. The four inputs X0, X1, X2, and X3, and the output Sum have the same weight. Cin is the output carry of the preceding module, and Cout, the carry output of the current stage, is fed to the next compressor. The output Carry is weighted one binary bit order higher. The compressor is governed by the following basic equation:

$$X_0 + X_1 + X_2 + X_3 + C_{in} = Sum + 2\,(Carry + C_{out}) \qquad (1)$$
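
The governing equation of the 4-2 compressor can be checked exhaustively with a small behavioral model. The sketch below is only an illustration, not the circuit evaluated in this paper: it assumes the common arrangement in which the first full adder takes X0, X1, and X2 and produces Cout, and the second takes the intermediate sum, X3, and Cin. The function names (`full_adder`, `compressor_4_2`, `verify_identity`, `add_four_operands`) are hypothetical. The last function chains compressors bit by bit, feeding each Cout into the next Cin, to reduce four operands to a sum word and a carry word, with an ordinary integer addition standing in for the final carry propagation adder.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three one-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x0, x1, x2, x3, cin):
    """Behavioral 4-2 compressor built from two cascaded full adders.

    Assumed wiring: the first adder produces Cout, so Cout does not
    depend on Cin. Returns (sum_bit, carry, cout).
    """
    s1, cout = full_adder(x0, x1, x2)
    sum_bit, carry = full_adder(s1, x3, cin)
    return sum_bit, carry, cout


def verify_identity():
    """Check X0+X1+X2+X3+Cin == Sum + 2*(Carry + Cout) for all 32 inputs."""
    for x0, x1, x2, x3, cin in product((0, 1), repeat=5):
        sum_bit, carry, cout = compressor_4_2(x0, x1, x2, x3, cin)
        if x0 + x1 + x2 + x3 + cin != sum_bit + 2 * (carry + cout):
            return False
    return True


def add_four_operands(a, b, c, d, width=8):
    """Add four unsigned `width`-bit operands using a row of 4-2 compressors."""
    sum_word = 0
    carry_word = 0
    cin = 0
    for i in range(width):
        bits = [(value >> i) & 1 for value in (a, b, c, d)]
        sum_bit, carry, cout = compressor_4_2(*bits, cin)
        sum_word |= sum_bit << i
        carry_word |= carry << (i + 1)  # Carry is weighted one bit higher
        cin = cout                      # Cout feeds the next compressor
    carry_word += cin << width          # Cout of the last stage has weight 2**width
    # Plain integer addition stands in for the carry propagation adder.
    return sum_word + carry_word
```

With this model, `verify_identity()` returns `True`, and `add_four_operands(a, b, c, d)` equals `a + b + c + d` for operands that fit in `width` bits.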
                                                                                         Besides, to accelerate the carry save summation of the
Fig. 2. (a) Conventional 4-2 compressor scheme, (b) gate-level structure of a full adder.
TABLE II: SIMPLIFICATION OF TABLE I
                                                                                                          DAC Current Sum Cout
                         International Journal of Computer Theory and Engineering, Vol. 5, No. 4, August 2013
X3 and Cin to be closed. Also, because Cout = 1, the switch                   it is useful for connecting the outputs of the proposed
Cout will be closed. The current IDAC2 for this node will be:                 compressor to other static CMOS logic circuits without
                                                                              worrying about drawing static current.
     I DAC 2  I DAC1  I  I  2 I  I DAC 2  I DAC1  2 I      (2)
1 1 1 0 1 0 1 1
                 Fig. 8. The output latch circuit scheme.

             Fig. 9. 4-2 Compressor simulation environment.

  Fig. 10(a) shows one of the compressor inputs (X0) as it changes state. Under worst-case conditions, the first valid output is Cout, which appears 38 ps after the input transition, as shown in Fig. 10(b). Carry becomes valid 50 ps after the input change, and finally the Sum output of the succeeding compressor changes state after 68 ps. The simulation results therefore show a delay of no more than 68 ps, a considerable improvement over the conventional architecture. Table IV compares the two structures simulated in the environment described above.

       TABLE IV: COMPARISON OF SIMULATED 4-2 COMPRESSORS
Structure       Delay (ps)   Power (µW)   PDP (fJ)   Number of Transistors
Conventional    180          110          19.8       72
Proposed        68           48           3.26       43

Fig. 10. Simulation results of the proposed 4-2 compressor: (a) $X_0$ and $\overline{X_0}$, (b) $C_{out}$ and $\overline{C_{out}}$, (c) Carry and $\overline{Carry}$, (d) Sum and $\overline{Sum}$.

                           VI. CONCLUSION
   In this work, a new current-mode fully differential 4-2 compressor in 65-nm CMOS is presented and compared with a compressor of conventional structure. The conventional structure, in which the critical-path delay is reduced at the gate level, has higher power consumption and delay. The proposed compressor is the faster of the two and also has a lower power-delay product (PDP). In addition, the proposed circuit requires only 43 transistors, most of them minimum size, so it occupies a smaller area than other high-speed conventional 4-2 compressors. It is therefore well suited as a subcircuit for implementing fast digital arithmetic units.

                           REFERENCES
[1]   K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in Proceedings of 35th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 129-133, 2001.
[2]   P. J. Song and G. De Micheli, “Circuit and architecture trade-offs for high-speed multiplication,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 1184-1198, 1991.
[3]   C. Chang, J. Gu, and M. Zhang, “Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits,” IEEE Transactions on Circuits and Systems Part I, vol. 51, pp. 1985-1997, 2004.
[4]   C. Nagendra, M. J. Irwin, and R. M. Owens, “Area-time-power tradeoffs in parallel adders,” IEEE Transactions on Circuits and Systems Part II, vol. 43, pp. 689-702, 1996.
[5]   S. Hsu, S. Mathew, M. Anders, B. Zeydel, V. Oklobdzija, R. Krishnamurthy, and S. Borkar, “A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 41, pp. 256-264, 2006.
[6]   S. F. Hsiao, M. R. Jiang, and J. S. Yeh, “Design of high-speed low-power 3-2 counter and 4-2 compressor for fast multipliers,” Electronics Letters, vol. 34, no. 4, pp. 341-343, 1998.
[7]   D. Radhakrishnan and A. P. Preethy, “Low-power CMOS pass logic 4-2 compressor for high-speed multiplication,” in Proceedings of 43rd IEEE Midwest Symposium on Circuits and Systems, vol. 3, pp. 1296-1298, 2000.
[8]   Z. Wang, G. A. Jullien, and W. C. Miller, “A new design technique for column compression multipliers,” IEEE Transactions on Computers, vol. 44, pp. 962-970, 1995.
[9]   S. Veeramachaneni, K. Krishna, L. Avinash, S. Puppala, and M. B. Srinivas, “Novel architectures for high-speed and low-power 3-2, 4-2 and 5-2 compressors,” in Proceedings of IEEE 20th International Conference on VLSI Design, 2007.
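
The PDP column of Table IV can be cross-checked against the listed delay and power values. The following is a minimal sketch, assuming PDP is simply the product of the worst-case delay and the reported power; the names `TABLE_IV`, `pdp_fj`, and `delay_ratio` are illustrative and do not come from the paper.

```python
# Delay (ps), power (µW), and PDP (fJ) as listed in Table IV.
TABLE_IV = {
    "Conventional": {"delay_ps": 180, "power_uw": 110, "pdp_fj": 19.8},
    "Proposed": {"delay_ps": 68, "power_uw": 48, "pdp_fj": 3.26},
}


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in femtojoules.

    1 ps * 1 µW = 1e-18 J = 1e-3 fJ.
    """
    return delay_ps * power_uw * 1e-3


def delay_ratio(table):
    """Delay of the proposed circuit relative to the conventional one."""
    return table["Proposed"]["delay_ps"] / table["Conventional"]["delay_ps"]


def table_is_consistent(table, tolerance_fj=0.01):
    """True if every listed PDP matches delay * power within the tolerance."""
    return all(
        abs(pdp_fj(row["delay_ps"], row["power_uw"]) - row["pdp_fj"]) <= tolerance_fj
        for row in table.values()
    )
```

Under this assumption the listed PDP values agree with the delay and power columns to within rounding, and the delay ratio of the proposed to the conventional circuit comes out at roughly 0.38.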
P. Aliparast was born in Tabriz, Iran. He received the B.Sc. degree from Islamic Azad University of Tabriz, Tabriz, Iran, in 2004 and the M.Sc. degree from Urmia University, Urmia, Iran, in 2007, both in electronics engineering. He is currently a Ph.D. candidate in electronics engineering at the University of Tabriz, Tabriz, Iran. His research interests are analog and digital integrated circuit design for fuzzy and neural network applications, analog integrated filter design, and high-speed high-resolution digital-to-analog converters. He is currently with Islamic Azad University of Heris, Heris, Iran, and the Integrated Circuits Research Laboratory at Tabriz University, Tabriz, Iran.

Z. D. Koozehkanani received his Ph.D. degree in electrical engineering from Brunel University, West London, UK, in 1996. He taught as an assistant professor at Urmia University from 1996 to 2004 and has been at Tabriz University since 2004. He is currently an associate professor in the Electronics Department at Tabriz University and Dean of the ECE faculty. His current research interests are analog integrated circuit design, including data converters, RF IC design, and optical filter design.

F. Nazari was born in Heris, Iran. He received the B.Sc. degree from Islamic Azad University of Tabriz, Tabriz, Iran, in 2000 and the M.Sc. degree from Iran University of Science and Technology, Tehran, Iran, in 2008, both in electrical engineering. He is currently with Islamic Azad University of Heris, Heris, Iran.