International Journal of Computer Theory and Engineering, Vol. 5, No. 4, August 2013

An Ultra High Speed Digital 4-2 Compressor in 65-nm CMOS
Peiman Aliparast, Ziaadin D. Koozehkanani, and Farhad Nazari

Abstract—This work presents an ultra-high-speed CMOS 4-2 compressor, an essential building block of fast digital arithmetic integrated circuits. Current-mode techniques are used to improve the overall performance of the compressor. The proposed fully differential circuit reduces the delay to about 38% of that of a conventional high-speed compressor (68 ps versus 180 ps in simulation) and also occupies less area. To evaluate the performance of the proposed circuit, a conventional gate-level structure was chosen as the reference, and all circuits were simulated in a 65-nm IBM CMOS process with a 1.2 V power supply voltage.

Index Terms—Digital logic, 4-2 compressor, CMOS, high speed, current-mode.

Manuscript received November 17, 2012; revised January 28, 2013.
P. Aliparast is with the Department of Electrical Engineering, Heris Branch, Islamic Azad University, Heris, Iran and the Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran (e-mail: p-aliparast@tabrizu.ac.ir).
Z. D. Koozehkanani is with the Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran (e-mail: zdaie@tabrizu.ac.ir).
Farhad Nazari is with the Department of Electrical Engineering, Heris Branch, Islamic Azad University, Heris, Iran (e-mail: farhaad.nazari@herisiau.ac.ir).
DOI: 10.7763/IJCTE.2013.V5.756

I. INTRODUCTION
With the ever-increasing possibilities that VLSI systems provide for realizing high-speed digital building blocks, there is a trend toward using digital units to implement processing algorithms, even for tasks that were originally analog, such as front-end communications. Microprocessors and digital signal processors rely on efficient implementations of fast arithmetic logic units to execute dedicated algorithms such as convolution and filtering [1], [2]. Adders and multipliers are the most frequently and widely used arithmetic cells in these processors. In most of these applications, multipliers dictate the overall performance of the system when speed and power consumption are the limiting factors. At the circuit design level, there is great potential for optimizing these building blocks through voltage scaling or by applying new CMOS logic styles to their combinational circuits [3]. A fast array or tree multiplier is typically composed of three subcircuits:
1) A Booth encoder for the generation of a reduced number of partial products.
2) A carry save structured accumulator for a further reduction of the partial product matrix to the addition of only two operands.
3) A fast carry propagation adder (CPA) [4] for the computation of the final binary result from its stored carry representation.
Among these subcircuits, the second stage, partial product accumulation, often referred to as the carry save adder (CSA) tree [5]-[7], contributes most to the overall delay and occupies a large fraction of the silicon area. Increasing the speed of the CSA subcircuits is therefore crucial to improving the performance of the multiplier. Early CSA tree designs used Dadda's column compression technique [8] with 3-2 counters, or equivalently full adders, to reduce the partial product matrix. To reduce the delay of the partial product accumulation stage, 4-2 compressors are now widely employed in high-speed multipliers. Because of their regular interconnection, 4-2 compressors are ideal for constructing regularly structured Wallace trees with low complexity [7]-[9].

Fig. 1. Block diagram of a 4-2 compressor.

Several 4-2 compressor circuits have been proposed for high-speed applications [3]. In this paper, we begin with a brief introduction of the conventional compressor, which is composed of two full adders, each optimized at the gate level for high speed. After investigating the performance of this 4-2 compressor architecture and its underlying building modules, we propose a new very-high-speed current-mode fully differential 4-2 compressor. 4-2 compressors constructed with this current-mode technique exhibit superior speed compared with other configurations.

II. THE CONVENTIONAL 4-2 COMPRESSOR STRUCTURE
A 4-2 compressor has five inputs and three outputs, as shown in Fig. 1. The four inputs X0, X1, X2, and X3 and the output Sum have the same weight. Cin is the output carry of the preceding module, and Cout, the carry output of the current stage, is fed to the next compressor. The output Carry is weighted one binary bit order higher. The compressor is governed by the following basic equation:

$$X_0 + X_1 + X_2 + X_3 + C_{in} = \mathrm{Sum} + 2\,(\mathrm{Carry} + C_{out}) \tag{1}$$

In addition, to accelerate the carry save summation of the partial products, it is imperative that the output Cout be independent of the input Cin.
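Equation (1) and the requirement that Cout not depend on Cin can be checked exhaustively on a small behavioral model. The following Python sketch is an illustration only, not the gate-level circuit: it assumes the two-full-adder arrangement in which the first adder sums X0, X1, and X2 and produces Cout, and the second adder sums the intermediate sum, X3, and Cin. The function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    total = a + b + c
    return total & 1, total >> 1


def compressor_4_2(x0, x1, x2, x3, cin):
    """Behavioral 4-2 compressor made of two cascaded full adders.

    Assumed wiring: adder 1 takes X0, X1, X2 and its carry is Cout;
    adder 2 takes the intermediate sum, X3 and Cin.
    Returns (Sum, Carry, Cout).
    """
    s1, cout = full_adder(x0, x1, x2)
    sum_bit, carry = full_adder(s1, x3, cin)
    return sum_bit, carry, cout


def check_all_inputs():
    """Verify equation (1) and Cout independence over all 32 input patterns."""
    for x0, x1, x2, x3, cin in product((0, 1), repeat=5):
        sum_bit, carry, cout = compressor_4_2(x0, x1, x2, x3, cin)
        # Equation (1): the weighted outputs equal the number of ones at the inputs.
        assert x0 + x1 + x2 + x3 + cin == sum_bit + 2 * (carry + cout)
        # Cout must not change when only Cin is flipped.
        assert cout == compressor_4_2(x0, x1, x2, x3, 1 - cin)[2]
    return True


if __name__ == "__main__":
    print(check_all_inputs())
```

Because Cout is computed from X0, X1, and X2 alone, a carry never ripples horizontally through more than one compressor in a row of the CSA tree.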
The conventional architecture of a 4-2 compressor consists of two serially connected full adders, as shown in Fig. 2(a). A straightforward implementation of this circuit leads to a long critical path delay. In addition, because the delays from the different inputs to the outputs are uneven, a CSA tree constructed from such cells generates many glitches. Gate-level optimization has therefore been suggested to alleviate these problems [3]. The optimized gate-level circuit for each full adder is illustrated in Fig. 2(b). In this figure, the NAND gates are realized as static CMOS circuits and the XOR gates as transmission gate (TG) circuits. Overall, implementing a 4-2 compressor with this method requires 72 transistors.

Fig. 2. (a) Conventional 4-2 compressor scheme, (b) gate-level structure of a full adder.

III. PROPOSED NEW 4-2 COMPRESSOR STRUCTURE
In this section, we describe a new method for implementing the 4-2 compressor of Fig. 2(a). The method is based on current-mode circuits and adds the currents in analog form.

A. New Full Adder Architecture
If we consider the operation of a full adder, we can replace it with a current-mode digital-to-analog converter (DAC) that produces a current proportional to the number of full adder inputs that are high. Table I shows the truth table of the operation together with the resulting output current. Examining the DAC current column of Table I against the Sum and Cout outputs, we can reduce the table to the simpler form shown in Table II, from which the required output bits of the full adder are easily set.

TABLE I: TRUTH TABLE OF THE FULL ADDER
Cin  X0  X1  DAC Current  Sum  Cout
0    0   0   0            0    0
0    0   1   I            1    0
0    1   0   I            1    0
0    1   1   2I           0    1
1    0   0   I            1    0
1    0   1   2I           0    1
1    1   0   2I           0    1
1    1   1   3I           1    1

TABLE II: SIMPLIFICATION OF TABLE I
DAC Current  Sum  Cout
0            0    0
I            1    0
2I           0    1
3I           1    1

The design procedure is as follows. According to Table II, Cout is set to 1 for currents higher than 1.5I, a threshold that leaves a proper margin between I and 2I, and Sum is set to 1 if the current is an odd multiple of I. To detect the odd case, 2I is subtracted from the DAC current when the DAC current is 2I or more, that is, when Cout is 1, and the result is compared with 0.5I. When the DAC current is less than 2I, it is compared with 0.5I directly. In either case, Sum is set to 1 if the compared current is higher than 0.5I. To compare currents, we use two current sources in series, one acting as a source and the other as a sink. If the source current is larger than the sink current, the voltage of the middle node goes high, and vice versa. Fig. 3 illustrates the proposed structure for a full adder.

Fig. 3. Proposed new full adder structure.

B. New 4-2 Compressor Architecture
Consider the structure of a 4-2 compressor constructed from two full adders (Fig. 2(a)), and let IDAC1 and IDAC2 be the analog currents corresponding to the outputs of the two full adders. To realize this compressor, the full adder structure proposed in Section III.A (Fig. 3) is used. Note, however, that the Sum output of the first full adder is connected directly to the second full adder, so no current-to-voltage conversion is needed. The current corresponding to the Sum output of the first full adder can be created by subtracting 2I from IDAC1 when Cout is 1, which avoids an additional comparator. Fig. 4 illustrates the proposed structure for the 4-2 compressor.

First, the input bits X0, X1, and X2 produce Cout. Then the Carry output is generated from the Cout bit, the first full adder DAC current (IDAC1), and the Cin and X3 bits. Finally, Carry and the second full adder DAC current produce the Sum bit. To understand how this circuit operates, we use one row of the truth table as an example, shown in Table III. For the first full adder, consider the inputs X0 = 1, X1 = 0, X2 = 1. In the circuit of Fig. 4, switches X0 and X2 are closed and switch X1 is open, which results in IDAC1 = 2I. To generate Cout, it is enough to compare this current with 1.5I. Because it is higher than 1.5I, Cout is 1. For the second full adder, Cin = 1 and X3 = 1 cause switches X3 and Cin to be closed, and IDAC1 = 2I. Also, because Cout = 1, the Cout switch is closed. The current IDAC2 at this node is therefore:

$$I_{DAC2} = I_{DAC1} + I + I - 2I \;\Rightarrow\; I_{DAC2} = I_{DAC1} = 2I \tag{2}$$

This current is again higher than 1.5I, which leads to Carry = 1, in agreement with Table III. As a result, the Carry switch is closed, 2I is subtracted from IDAC2, and the result is compared with 0.5I. Because the result, zero, is lower than 0.5I, Sum is 0. The inputs and results for this example are summarized in Table III. The same description applies to other sets of inputs.

Fig. 4. Proposed new 4-2 compressor architecture.

TABLE III: TRUTH TABLE FOR ONE ROW EXAMPLE OF THE PROPOSED 4-2 COMPRESSOR
Cin  X3  X2  X1  X0  Sum  Carry  Cout
1    1   1   0   1   0    1      1

IV. CIRCUIT IMPLEMENTATIONS OF THE PROPOSED 4-2 COMPRESSOR ARCHITECTURE
As shown in Fig. 4, the proposed 4-2 compressor consists of three sections: the Cout generation circuit, the Carry generation circuit, and the Sum generation circuit. Fig. 5 shows the Cout generation circuit. As Fig. 5 shows, each of the switches X0, X1, and X2 is replaced with a differential switch. Differential switches are used instead of single MOS switches because they operate at very high speed and produce a signal and its complement at the same time. In addition, a differential switch can be switched with a differential input of almost $2\,\Delta V$, where $\Delta V$ is the overdrive voltage of the MOS transistor, so it can follow very small changes in the differential input voltage. The current sources in Fig. 4 are implemented with current mirror circuits. As a tradeoff between power consumption and speed, we have chosen I = 2.5 µA. Increasing I increases both the power consumption and the speed of the compressor. The Carry generation and Sum generation sections are implemented in the same way as the Cout generation circuit: each switch is replaced with a differential switch and each current source with current mirror transistors. Figs. 6 and 7 show the Carry generation circuit and the Sum generation circuit, respectively. Fig. 8 shows the output latch that is used as the output load of the proposed 4-2 compressor. The output voltage of this circuit swings rail to rail, while its input does not need a rail-to-rail voltage signal. It is therefore useful for connecting the outputs of the proposed compressor to other static CMOS logic circuits without concern about drawing static current.

Fig. 5. Cout generation circuit of proposed 4-2 compressor.

Fig. 6. Carry generation circuit of proposed 4-2 compressor.

Fig. 7. Sum generation circuit of proposed 4-2 compressor.

V. SIMULATION PERFORMANCE
Fig. 9 shows the simulation environment for the 4-2 compressor. Each input is driven by a minimum-size inverter. As the output load, the proposed 4-2 compressor uses the latch circuit shown in Fig. 8, while the conventional compressor structure uses a minimum-size inverter. This arrangement provides a realistic simulation environment that reflects compressor operation in actual applications. The simulation environment (Fig. 9) consists of two cascaded 4-2 compressors. These compressors run in parallel to simulate an actual compressor stage in the CSA tree. The dashed lines in Fig. 9 indicate the potential critical paths, with the delay time of each. The reported delay is that of the critical path from the input bits to the Sum bit of the neighboring compressor. The delay is measured from the earliest crossing of an input signal with its complement to the latest crossing of an output signal with its complement. The worst-case delay is the largest delay over all input data. For a fair comparison, both the conventional structure using two full adders and the proposed current-steering structure were implemented and simulated in a 65-nm IBM CMOS process with a 1.2 V power supply voltage. Fig. 10 shows simulation results for the proposed compressor in the worst case. Random input data at a rate of 1 GHz was fed to the inputs of the compressor. Note that the simulation frequency is not the maximum operating frequency of the compressors; the simulated compressors are capable of operating correctly at much higher frequencies than the simulation frequency.
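Independently of the transistor-level simulation, the current-mode decision procedure of Section III can be checked with an idealized behavioral model. The Python sketch below is an illustration under stated assumptions, not the authors' circuit: currents are plain numbers in units of I, each comparator is ideal, and the thresholds 1.5I and 0.5I and the 2I subtraction follow the description in the text. All function names are hypothetical.

```python
from itertools import product

I = 1.0  # unit current; only ratios to I matter in this idealized model


def exceeds(current, threshold):
    """Ideal current comparator: 1 if the current is above the threshold."""
    return 1 if current > threshold else 0


def current_mode_full_adder(a, b, c, unit=I):
    """Full adder of Tables I and II: DAC current -> (Sum, Cout)."""
    i_dac = (a + b + c) * unit
    cout = exceeds(i_dac, 1.5 * unit)
    # Subtract 2I when a carry is produced, then test the remainder against 0.5I.
    sum_bit = exceeds(i_dac - 2 * unit * cout, 0.5 * unit)
    return sum_bit, cout


def current_mode_compressor(x0, x1, x2, x3, cin, unit=I):
    """4-2 compressor following Section III.B. Returns (Sum, Carry, Cout)."""
    i_dac1 = (x0 + x1 + x2) * unit
    cout = exceeds(i_dac1, 1.5 * unit)
    # The first adder's sum stays in current form: IDAC1 minus 2I if Cout is set.
    i_dac2 = i_dac1 - 2 * unit * cout + (x3 + cin) * unit
    carry = exceeds(i_dac2, 1.5 * unit)
    sum_bit = exceeds(i_dac2 - 2 * unit * carry, 0.5 * unit)
    return sum_bit, carry, cout


def matches_equation_1():
    """Check equation (1) for all 32 input patterns."""
    return all(
        sum(bits) == s + 2 * (carry + cout)
        for bits in product((0, 1), repeat=5)
        for s, carry, cout in [current_mode_compressor(*bits)]
    )


if __name__ == "__main__":
    # Row of Table III: X0=1, X1=0, X2=1, X3=1, Cin=1
    print(current_mode_compressor(1, 0, 1, 1, 1))
    print(matches_equation_1())
```

The model ignores mismatch, noise, and finite comparator gain, which are the effects the 1.5I and 0.5I margins are meant to absorb in the real circuit.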

Fig. 8. The output latch circuit scheme.

Fig. 9. 4-2 compressor simulation environment.

Fig. 10. Simulation results of proposed 4-2 compressor: (a) X0 and its complement, (b) Cout and its complement, (c) Carry and its complement, (d) Sum and its complement.

Fig. 10(a) shows one of the inputs of the compressor (X0) when it changes state. In the worst case, the first valid output is Cout, 38 ps after the input change, as shown in Fig. 10(b). Carry becomes valid 50 ps after the input change, and finally the Sum output of the succeeding compressor changes state after 68 ps. The simulated worst-case delay is thus 68 ps, a considerable improvement over the conventional architecture. Table IV summarizes the comparison of the two simulated structures in the environment described above. The PDP column is the product of the delay and power columns; for the proposed circuit, 68 ps × 48 µW ≈ 3.26 fJ.

TABLE IV: COMPARISON OF SIMULATED 4-2 COMPRESSORS
Structure     Delay (ps)  Power (µW)  PDP (fJ)  Number of Transistors
Conventional  180         110         19.8      72
Proposed      68          48          3.26      43

VI. CONCLUSION
In this work, a new current-mode fully differential 4-2 compressor in 65-nm CMOS is presented and compared with a conventional compressor structure. The conventional structure, in which the critical path delay is reduced at the gate level, has higher power consumption and delay. The proposed compressor achieves the higher speed of the two while maintaining a lower power-delay product (PDP). In addition, the proposed circuit requires only 43 transistors, most of them minimum size, so it occupies a smaller area than other high-speed conventional 4-2 compressors. It is therefore well suited as a subcircuit for implementing fast digital arithmetic units.

REFERENCES
[1] K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in Proceedings of 35th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 129-133, 2001.
[2] P. J. Song and G. De Micheli, “Circuit and architecture trade-offs for high-speed multiplication,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 1184-1198, 1991.
[3] C. Chang, J. Gu, and M. Zhang, “Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits,” IEEE Transactions on Circuits and Systems I, vol. 51, pp. 1985-1997, 2004.
[4] C. Nagendra, M. J. Irwin, and R. M. Owens, “Area-time-power tradeoffs in parallel adders,” IEEE Transactions on Circuits and Systems II, vol. 43, pp. 689-702, 1996.
[5] S. Hsu, S. Mathew, M. Anders, B. Zeydel, V. Oklobdzija, R. Krishnamurthy, and S. Borkar, “A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 41, pp. 256-264, 2006.
[6] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, “Design of high-speed low-power 3-2 counter and 4-2 compressor for fast multipliers,” Electronics Letters, vol. 34, no. 4, pp. 341-343, 1998.
[7] D. Radhakrishnan and A. P. Preethy, “Low-power CMOS pass logic 4-2 compressor for high-speed multiplication,” in Proceedings of 43rd IEEE Midwest Symposium on Circuits and Systems, vol. 3, pp. 1296-1298, 2000.
[8] Z. Wang, G. A. Jullien, and W. C. Miller, “A new design technique for column compression multipliers,” IEEE Transactions on Computers, vol. 44, pp. 962-970, 1995.
[9] S. Veeramachaneni, K. Krishna, L. Avinash, S. Puppala, and M. B. Srinivas, “Novel architectures for high-speed and low-power 3-2, 4-2 and 5-2 compressors,” in Proceedings of IEEE 20th International Conference on VLSI Design, 2007.


P. Aliparast was born in Tabriz, Iran. He received the B.Sc. degree from Islamic Azad University of Tabriz, Tabriz, Iran, in 2004 and the M.Sc. degree from Urmia University, Urmia, Iran, in 2007, both in electronics engineering. He is currently a Ph.D. candidate in electronics engineering at the University of Tabriz, Tabriz, Iran. His research interests are analog and digital integrated circuit design for fuzzy and neural network applications, analog integrated filter design, and high-speed high-resolution digital-to-analog converters. He is currently with Islamic Azad University of Heris, Heris, Iran, and the Integrated Circuits Research Laboratory at Tabriz University, Tabriz, Iran.

Z. D. Koozehkanany received his Ph.D. degree in electrical engineering from Brunel University, West London, UK, in 1996. He taught as an assistant professor at Urmia University from 1996 to 2004 and has been at Tabriz University since 2004. He is currently an associate professor in the Electronics Department at Tabriz University and Dean of the ECE faculty. His current scientific interests are analog integrated circuit design, including data converters, RF IC design, and optical filter design.

F. Nazari was born in Heris, Iran. He received the B.Sc. degree from Islamic Azad University of Tabriz, Tabriz, Iran, in 2000 and the M.Sc. degree from Iran University of Science and Technology, Tehran, Iran, in 2008, both in electrical engineering. He is currently with Islamic Azad University of Heris, Heris, Iran.
