Design of Higher Order Multiplier with Approximate Compressor
Conference Paper · October 2020
DOI: 10.1109/CONECCT50063.2020.9198611
               Design of Higher Order Multiplier with
                     Approximate Compressor
M. Maria Dominic Savio, Assistant Professor,
Department of Electronics and Communication Engineering,
SRM Institute of Science and Technology - 603203,
Tamilnadu, India.
mariadom@srmist.edu.in

T. Deepa, Associate Professor,
Department of Electronics and Communication Engineering,
SRM Institute of Science and Technology - 603203,
Tamilnadu, India.
deepat@srmist.edu.in
Abstract: In recent years, imprecise multipliers have been widely studied for image processing applications; such multipliers are built from approximate compressors. When the multiplication width is large, higher-order compressors are used to reduce the number of reduction stages. Approximating a higher-order compressor through a truth table or K-map is challenging to the point of being impractical. In this paper, an 8:2 compressor is designed and a novel comparison technique is developed for its approximation. The proposed 8:2 compressor is used in a 16x16 multiplier and compared with existing multipliers. The new compressor is efficient in area, power, and delay. The error distance (ED) and normalized error distance (NED) are also compared with related works. Finally, the proposed multiplier is used for image multiplication and the PSNR is compared.

Keywords: Approximate compressor; Normalized Error Distance (NED); Multiplier; image processing.

978-1-7281-6828-9/20/$31.00 ©2020 IEEE

I. INTRODUCTION

To improve the energy efficiency of digital processing systems (DPS), imprecise computing has evolved. Imprecise computing is generally achieved by approximating some output function by an input, which reduces the number of circuit components. The DPS performs many operations such as convolution, correlation, and filtering of signals. These operations are carried out by multipliers, subtractors, shifters, adders, dividers, and comparators. If the DSP processor is used for image processing, approximate arithmetic is used to reduce computational complexity with a tolerable error that does not affect the performance of the application. With logic-level simplification using the Karnaugh map (K-map), four approximate subtractors are developed in [1] and used to build an approximate divider for background subtraction in image processing with low power and low area. Several approximate comparators are designed in [2]-[4] to achieve low cost in terms of power, area, and speed; they are also used to remove salt-and-pepper noise such that the quality degradation does not affect performance. Compressor designs have emerged as an alternative to the full-adder in the reduction tree stage of Wallace and Dadda multipliers. Normally a compressor is designed with full-adders; in [5], novel 4:2 and 5:3 compressors are designed with an XOR-MUX architecture to ensure low power and area.

Several imprecise compressors are proposed in [6]-[9] and used in various multiplier architectures for image processing applications, where execution is inaccurate but the resulting errors are tolerable. Multiplication is done in three steps: 1) partial product generation, 2) partial product reduction, and 3) addition of the reduced terms by adders. Of these, partial product reduction is the most complicated, consumes the most power, and creates the most delay. The Wallace and Dadda tree architectures are very efficient methods for partial product reduction. In the reduction stage, a compressor is more effective than a full-adder. The 4:2 compressors are suitable for all kinds of multipliers [6], but in practice they are limited to 8x8 multiplication, because for 16x16, 32x32, and 64x64 the number of reduction stages increases again. For this reason, 15:4 compressors are used for 16x16 multiplication in [9].

The approximate compressor plays a key role in low-power circuits. For lower-order compressors, an approximation can be obtained through a K-map or a truth table. With a K-map, an essential prime implicant is simply eliminated to reduce the device hardware. With a truth table, the inputs are correlated with the outputs, and the most highly correlated input is bypassed to the output without any hardware. In both cases, a tradeoff is maintained between the device hardware and the image quality. The quality of the image is measured through the peak signal-to-noise ratio (PSNR). A PSNR of 30 dB is enough for most applications [9]. In [10], which develops a multiplier for image sharpening, the acceptable PSNR is above 20 dB. The target therefore varies with the application, and researchers try to make it as high as possible within the range of 25-35 dB. The demand for higher-order compressors rises as the multiplier width increases, but approximating a higher-order compressor is difficult. In [9], the approximation is done in a 5:3 compressor, which is used as a sub-component to develop the 15:4 compressor. The error distance is calculated only for the 5:3 compressor in [9], while the pass rate is calculated for the 15:4 compressor.

This work addresses the issue of approximation in higher-order compressors with a novel comparison method between inputs and outputs, and an energy-efficient 8:2 compressor is proposed. The rest of the paper is organized as follows. The design of the 8:2 compressor is elaborated in Section II. The design of different approximate compressors using a novel comparison technique is described in Section III. Section IV describes the design of the 8x8 and 16x16 multipliers. The performance analysis, including image multiplication with the proposed multiplier, is given in Section V. Finally, the conclusion is presented.
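The three multiplication steps named in the introduction (partial product generation, partial product reduction, final addition) can be sketched in software as follows. This is a minimal behavioral sketch, not the paper's circuit: it uses only 3:2 counters (full-adders) in a simple Wallace-style reduction, and the function names `partial_products`, `reduce_columns`, `final_addition`, and `multiply` are illustrative assumptions.

```python
def partial_products(a, b, n):
    """Step 1: AND every bit of a with every bit of b, grouped by column weight."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def reduce_columns(cols):
    """Step 2: apply 3:2 counters column by column until every column holds
    at most two bits. Each counter keeps its sum bit in the same column and
    sends its carry bit to the next column."""
    cols = [list(c) for c in cols]
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)                         # sum bit
                nxt[w + 1].append((x & y) | (y & z) | (z & x))   # carry bit
                k += 3
            nxt[w].extend(col[k:])  # leftover bits pass through unchanged
        cols = nxt
    return cols


def final_addition(cols):
    """Step 3: add the two remaining rows with an ordinary adder."""
    row0 = sum(c[0] << w for w, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << w for w, c in enumerate(cols) if len(c) > 1)
    return row0 + row1


def multiply(a, b, n=8):
    """Unsigned n x n multiplication through the three steps above."""
    return final_addition(reduce_columns(partial_products(a, b, n)))


if __name__ == "__main__":
    print(multiply(13, 11))
```

Replacing the 3:2 counters in `reduce_columns` with wider compressors is what shortens the number of reduction passes, which is the motivation for the higher-order compressors discussed in the paper.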
II. DESIGN OF 8:2 COMPRESSOR

Designs of 15:4, 9:4, and 8:2 compressors are proposed in [9], [11], and [12], respectively. All these designs are built from full-adders and lower-order compressors, so the feasibility of approximating the overall design is very low. The proposed 8:2 compressor is designed with a straightforward approach in which the inputs flow in parallel to the outputs through an XOR-MUX architecture, as shown in Fig.1.

[Figure: inputs A0-A7 and Ci0-Ci4 pass through a chain of XOR gates (X) and 2:1 MUXes (M), producing C0-C4, CARRY, and SUM.]
Fig.1. 8:2 compressor designed by XOR-MUX (M: 2:1 MUX, X: XOR)

The compressors used in the tree reduction stage of a multiplier are usually made up of full-adders. The full-adder, also called a 3:2 compressor or counter, is usually used to construct any higher-order compressor. A full-adder built from two XORs and one MUX is proposed in [5] to reduce area and power without any change in the truth table. With the same idea, many compressors are constructed in [5], [7]; the 4:2 compressors are shown in Fig.2.

Fig.2. (a) 4:2 compressor using full-adder (b) 4:2 compressor based on XOR-MUX [11]

With the same approach, an 8:2 compressor is constructed from three 4:2 compressors in [12] and [13]. The work proposed in this paper uses XOR-MUX so as to reduce power and delay, while the area stays the same. The sum equation is the same for all compressors: the XOR of all inputs. Every cout is computed by feeding the XOR of the first two inputs to the MUX select line, with the first and third inputs as the MUX data inputs, so the MUX passes a when a ⊕ b = 0 and c when a ⊕ b = 1. The sum equation is given in eqn (1), and the different possible carry equations are given in eqn (2). The XOR-based carry equation is adopted because the same XOR produces the sum and is cascaded up to the final carry stage. The sum and single-stage carry equations are given by

$\text{Sum} = a_0 \oplus a_1 \oplus a_2 \oplus a_3 \oplus a_4 \oplus a_5 \oplus a_6 \oplus a_7 \oplus c_{i0} \oplus c_{i1} \oplus c_{i2} \oplus c_{i3} \oplus c_{i4}$   (1)

$\text{Carry} = (a \oplus b)\,c + \overline{(a \oplus b)}\,a$   (2.1)

$\phantom{\text{Carry}} = a\,b + b\,c + c\,a$   (2.2)

$\phantom{\text{Carry}} = (a \oplus b)\,c + b\,a$   (2.3)

$\phantom{\text{Carry}} = (a + b)\,c + b\,a$   (2.4)

Looking at the above equations, equation (2.4) is the simplest to implement, but considering both the sum and the different cout stages, eqns (1) and (2.1) are used in this design to construct the 8:2 compressor without any lower-order compressor. If the compressor does not contain lower-order compressors, approximation can be applied in any part of the circuit.

III. DESIGN OF APPROXIMATION TECHNIQUE FOR 8:2 COMPRESSORS

Approximate computing is a major means of reducing power, area, and delay. A novel approximation technique is presented in this paper. The proposed 8:2 compressor has 13 inputs and therefore 2^13 = 8192 input combinations. The circuit has 7 outputs: cout0 to cout4, sum, and carry. In the previous work [9], approximation is done in a lower-order compressor with tolerable error, which is then used to construct higher-order compressors, so the error of the higher-order compressor is not calculated accurately. This work overcomes this problem with an architecture that compares every input with every output, so that the error created by approximating any part of the circuit can be calculated accurately. The flow chart in Fig.3 demonstrates the correlation of every input to every output.
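The input/output comparison described above can be mimicked in software by enumerating all 8192 input combinations and counting, for each of the 91 input/output pairs, how often the two bits are equal. The sketch below is a behavioral model only: the stage wiring (six cascaded XOR-MUX full-adders, each taking the running sum and the next two inputs) is an assumption inferred from Fig.1 and the textual description, and the names `full_adder_xor_mux`, `compressor_8_2`, and `correlation_counts` are illustrative.

```python
from itertools import product

INPUTS = [f"a{k}" for k in range(8)] + [f"cin{k}" for k in range(5)]
OUTPUTS = [f"cout{k}" for k in range(5)] + ["carry", "sum"]


def full_adder_xor_mux(a, b, c):
    """Full-adder from two XORs and one 2:1 MUX (carry in the form of eqn 2.1)."""
    sel = a ^ b               # first XOR, also the MUX select
    s = sel ^ c               # second XOR gives the sum bit
    carry = c if sel else a   # MUX picks c when sel = 1, a when sel = 0
    return s, carry


def compressor_8_2(bits):
    """Assumed wiring: bits = (a0..a7, cin0..cin4); returns the 7 outputs."""
    s = bits[0]
    carries = []
    for k in range(6):
        s, c = full_adder_xor_mux(s, bits[2 * k + 1], bits[2 * k + 2])
        carries.append(c)
    out = {f"cout{k}": carries[k] for k in range(5)}
    out["carry"] = carries[5]
    out["sum"] = s
    return out


def correlation_counts():
    """Count, for every (input, output) pair, the number of input
    combinations in which the input bit equals the output bit."""
    counts = {(i, o): 0 for i in INPUTS for o in OUTPUTS}
    for bits in product((0, 1), repeat=13):
        out = compressor_8_2(bits)
        for name, value in zip(INPUTS, bits):
            for o in OUTPUTS:
                if value == out[o]:
                    counts[(name, o)] += 1
    return counts


if __name__ == "__main__":
    counts = correlation_counts()
    for pair, n in sorted(counts.items(), key=lambda kv: -kv[1])[:13]:
        print(pair, n)
```

Each dictionary entry plays the role of one comparator and counter pair; a hardware version would instead run the comparators in parallel over a 13-bit input sweep.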
[Flow chart:
Initialize: compressor inputs Ki, i = 1, 2, ..., 13; outputs Hx, x = 1, 2, ..., 7; counter enable z = i*x, z = 1 to 91.
Assign: comparator input 1 from compressor input, i = 1, 2, ..., 13; comparator input 2 from compressor output, x = 1, 2, ..., 7; comparators C11, C12, ..., C21, C22, ..., C136, C137; counter enables from comparator outputs, z1 = C11, ..., z91 = C137.
Loop over i and x: if ci = cx (input i equals output x), then Oj = Oj + 1, otherwise Oj = Oj; j++.
End: collect all counter outputs Oj, j = 1, 2, ..., 91.]
Fig.3. Flow chart for approximating 8:2 compressor

The proposed approximation finder circuit consists of 91 counters and comparators. Each comparator is fed with one compressor input and one compressor output. The compressor inputs range from a0 to a7 and ci0 to ci4, 13 in total; the compressor outputs are cout0 to cout4, sum, and carry. Comparator-1 is fed with a0 and cout0, comparator-2 with a0 and cout1, and so on, so that the 91st comparator is fed with ci4 and carry. The comparator outputs are fed to the counters. The 91 counter outputs identify the equality level of every input/output combination over all 8192 input samples.

With these accurate data, different tolerable levels of approximation can be performed without affecting the image quality.

IV. DESIGN OF MULTIPLIER

Compressors are used in several multiplier architectures for exact multiplication. An approximate compressor is used in the discrete cosine transform (DCT) operation in [8] for image processing applications. This paper proposes several multiplier designs:
Design-1 16x16 multiplier with exact 4:2 compressor,
Design-2 16x16 multiplier with exact 8:2 compressor,
Design-3 16x16 multiplier with approximate 8:2 compressor,
Design-4 8x8 multiplier with exact 4:2 compressor [6],
Design-5 8x8 multiplier with 8:2 approximate compressor.

A. Design-1 16x16 multiplier with exact 4:2 compressor

The 16x16 multiplier is designed with the existing exact 4:2 compressors of [6]-[8], and its performance metrics are compared with those of the proposed multiplier. The design of the multiplier with Dadda structure and its reduction stages is shown in Fig.4.

Fig.4. 16x16 multiplier with 4:2 compressor

B. Design-2 16x16 multiplier with exact 8:2 compressor

The proposed exact XOR-MUX 8:2 compressor is used to build the 16x16 multiplier as shown in Fig.5.

Fig.5. 16x16 multiplier with 8:2 compressor

The multiplier with 8:2 compressors has only three reduction stages, fewer than the four reduction stages of the multiplier with 4:2 compressors; it is therefore efficient to use higher-order compressors as the multiplier width increases. In Fig.5, a color code identifies the different components: pink for 8:2 compressors, red for 4:2 compressors, dark and light blue for full-adders, and orange for half-adders.

C. Design-3 16x16 multiplier with approximate 8:2 compressor

With the approximation finder method, many inputs are found to match outputs for 75% of the input combinations, as discussed in Section V. In the proposed approximate 8:2 compressor, cin4 is bypassed to carry, so as to reduce area, power, and delay without affecting the image quality.
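The effect of bypassing cin4 to carry can be illustrated with a small behavioral comparison between an exact and an approximate 8:2 compressor. The sketch rests on two assumptions that the text does not spell out: the exact compressor is modeled as six cascaded XOR-MUX full-adders, and in the approximate version only the carry output changes (carry := cin4) while cout0 to cout4 and sum are kept. The function names are illustrative.

```python
from itertools import product


def exact_8_2(bits):
    """Assumed exact model: bits = (a0..a7, cin0..cin4).
    Returns (couts, carry, sum) with couts = [cout0..cout4]."""
    s, carries = bits[0], []
    for k in range(6):
        sel = s ^ bits[2 * k + 1]
        carries.append(bits[2 * k + 2] if sel else s)  # MUX carry, eqn (2.1)
        s = sel ^ bits[2 * k + 2]                       # running sum
    return carries[:5], carries[5], s


def approx_8_2(bits):
    """Assumed approximate model: cin4 (bits[12]) is wired straight to carry."""
    couts, _, s = exact_8_2(bits)
    return couts, bits[12], s


def carry_mismatches():
    """Number of the 8192 input combinations whose carry output differs."""
    return sum(
        exact_8_2(bits)[1] != approx_8_2(bits)[1]
        for bits in product((0, 1), repeat=13)
    )


if __name__ == "__main__":
    n = carry_mismatches()
    print(n, n / 2 ** 13)
```

Under these assumptions the mismatch count is the complement of the cin4/carry correlation count, so a pair that agrees in 75% of the cycles yields an erroneous carry in the remaining 25%.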
D. Design-4 8x8 multiplier with exact 4:2 compressor [6]

To compare the efficiency of the proposed compressor, the design of reference [6] is implemented with exact 4:2 compressors.

E. Design-5 8x8 multiplier with 8:2 approximate compressor

The 8x8 multiplier in [6] is designed with 4:2 compressors, full-adders, and half-adders. The proposed multiplier is designed with an 8:2 compressor, which can be placed only in the 7th to 9th columns of the first stage; the remaining stages stay the same, as shown in Fig.6. In [6], two imprecise compressors are designed and evaluated on performance metrics such as PSNR, ED, NED, area, power, and delay. In this paper, the 8x8 multiplier is designed using an exact 8:2 compressor, and the above performance metrics are compared with those of the exact 4:2 compressor of [6].

Fig.6. Replacement of 8:2 compressor in [6]

V. RESULTS AND DISCUSSION

A. Power comparison of 8:2 compressor

Compressors are generally made up of full-adders. The full-adder was later modified with two XORs and one MUX in [8] to reduce area and power without any change in the truth table. The 8:2 compressor is designed as a cascade of full-adders in a CMOS 90nm library, and its power is compared with that of the 8:2 compressor with XOR-MUX, as shown in Fig.7. Table I shows the power comparison.

Fig.7. (a) 8:2 compressor using full-adder (b) 8:2 compressor based on XOR-MUX

Table I. Power comparison of 8:2 compressor

S.no    Design                    Power (µW)
1.      Based on full-adder       0.605
2.      Based on XOR-MUX          0.493

B. Approximation

The Verilog simulation result of the counters shows the correlation of inputs versus outputs, as shown in Fig.8.

Fig.8. Approximation for 8:2 compressor

Fig.8 shows that a0 and cout0 are equal in 6144 out of 8192 cycles, that is, 75% of all cycles. TABLE II shows the highly correlated combinations.

TABLE II. Input/Output correlations

Input/Output combination    Correlated cycles out of 8192
a0, cout0                   6144
a1, cout0                   6144
a2, cout0                   6144
a3, cout1                   6144
a4, cout1                   6144
a5, cout2                   6144
a6, cout2                   6144
a7, cout3                   6144
cin0, cout3                 6144
cin1, cout4                 6144
cin2, cout4                 6144
cin3, carry                 6144
cin4, carry                 6144
cin4, sum                   4096

From TABLE II, many approximations can be performed. However, approximation on a least significant bit (LSB) reduces the error distance (ED) and normalized error distance (NED), so cin4 is approximated as carry in our compressor to reduce the hardware. The result of the approximate compressor is compared with the exact compressor to verify the error. The design equation for NED is described in [6]. TABLE III shows the NED values of the proposed and existing designs.

TABLE III. Accurateness of imprecise 8x8 multiplier

Design           NED
Imprecise [6]    0.05061
Proposed         0.05070

From the results, it is clear that the proposed multiplier produces an acceptable error.
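For readers unfamiliar with the metrics, the error distance and a normalized error distance of an approximate multiplier can be computed by exhaustive comparison with the exact product. The paper defers the NED equation to [6], so the normalization used below (mean error distance divided by the maximum exact product) is only one commonly used definition and is an assumption here; it is not expected to reproduce the values in TABLE III. The `truncated_multiply` function is a hypothetical stand-in for an approximate multiplier, not the proposed design.

```python
def error_distance(exact, approx):
    """ED: absolute difference between exact and approximate results."""
    return abs(exact - approx)


def normalized_error_distance(approx_mul, n=8):
    """Assumed definition: mean ED over all n-bit operand pairs,
    divided by the largest exact product (2^n - 1)^2."""
    max_product = (2 ** n - 1) ** 2
    total = 0
    for a in range(2 ** n):
        for b in range(2 ** n):
            total += error_distance(a * b, approx_mul(a, b))
    mean_ed = total / (4 ** n)
    return mean_ed / max_product


def truncated_multiply(a, b, drop=2):
    """Hypothetical approximate multiplier: clears the lowest `drop` bits."""
    return ((a * b) >> drop) << drop


if __name__ == "__main__":
    print(normalized_error_distance(truncated_multiply, n=8))
```

An exact multiplier passed to `normalized_error_distance` yields zero, which is a convenient sanity check before evaluating an approximate design.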
C. Results of 16x16 bit multipliers

The existing and proposed multipliers are designed in Verilog HDL. The power, area, and delay are calculated in the Cadence RTL Compiler with a 90nm technology library.

TABLE IV. Analysis of 16x16 multipliers

Design                                                          Area (µm²)    Power (µW)    Time (ns)
16x16 multiplier with 15:4 exact compressor [9]                 4939          570           4.2
16x16 multiplier with 4:2 compressor [6]-[8]                    4955          585           4.7
16x16 multiplier with 8:2 exact compressor with XOR-MUX         4930          565           4.3
16x16 multiplier with approximate 8:2 compressor with XOR-MUX   4688          534           4.0

From TABLE IV, the power and area of the proposed multiplier are decreased by 8%, while the delay is increased by 9%. The proposed inexact multiplier is better than all the existing models in all performance metrics.
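The relative differences implied by TABLE IV can be recomputed directly from the tabulated values. The short script below is only an arithmetic aid; the dictionary keys are abbreviated labels chosen here, and the baseline against which the paper quotes its percentages is not stated, so every design is printed against the approximate 8:2 multiplier.

```python
# (area in um^2, power in uW, delay in ns), copied from TABLE IV
TABLE_IV = {
    "15:4 exact [9]": (4939, 570, 4.2),
    "4:2 [6]-[8]": (4955, 585, 4.7),
    "8:2 exact XOR-MUX": (4930, 565, 4.3),
    "8:2 approximate XOR-MUX": (4688, 534, 4.0),
}


def percent_change(new, old):
    """Signed change of `new` relative to `old`, in percent."""
    return 100.0 * (new - old) / old


def compare(proposed, baseline):
    """Percent change in (area, power, delay) of proposed versus baseline."""
    return tuple(
        percent_change(p, b)
        for p, b in zip(TABLE_IV[proposed], TABLE_IV[baseline])
    )


if __name__ == "__main__":
    proposed = "8:2 approximate XOR-MUX"
    for name in TABLE_IV:
        if name != proposed:
            area, power, delay = compare(proposed, name)
            print(f"{name}: area {area:+.2f}%, power {power:+.2f}%, delay {delay:+.2f}%")
```

Negative values indicate that the approximate 8:2 design is smaller, lower in power, or faster than the baseline in the corresponding column.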
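D. Image processing application

[Figure: (a) product of two different images: exact result, [9] with PSNR = 28.6 dB, proposed with PSNR = 29.2 dB; (b) squaring of an image covering the full pixel range: [9] with PSNR = 40.6 dB, proposed with PSNR = 43.2 dB; (c) squaring of a standard test image: [9] with PSNR = 39.2 dB, proposed with PSNR = 42.2 dB.]
Fig.9. a) Multiplication of two different images b) Squaring of image with all range of pixels c) Squaring of standard test image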
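PSNR values such as those quoted in Fig.9 are conventionally obtained from the mean squared error between the exact and approximate output images. The sketch below uses the standard definition PSNR = 10 log10(peak^2 / MSE) with an assumed 8-bit peak value of 255; the paper does not state its exact scaling, and the function name `psnr` and the nested-list image format are illustrative choices.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized grayscale images,
    given as lists of rows of integer pixel values."""
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    if len(ref) != len(tst) or not ref:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    exact = [[10, 20], [30, 40]]
    approx = [[11, 19], [31, 39]]  # every pixel off by one
    print(round(psnr(exact, approx), 2))
```

In an evaluation like the one in this section, `reference` would hold the output of the exact multiplier and `test` the output of the approximate multiplier for the same input images.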
The image multiplication outputs of the existing and proposed multipliers are shown in Fig.9. Output (a) is the multiplication of two images, and the PSNR values are verified. Output (b) is the multiplication of an image with itself; the image is chosen so that it contains all pixel values from low to high, to ensure the design suits all images. Images (c) are the standard test image used in most image processing research.

VI. CONCLUSION

A new method for designing approximate 8:2 compressors was proposed in this paper. These approximate compressors are used to design a 16x16 multiplier. The approximate design provides better performance metrics with a tolerable error. The proposed multiplier produces almost the same range of PSNR values as the previous design. The area and power of the new design are improved, but the latency is not, so researchers can choose the multiplier depending on their application. In the future, this kind of approximate arithmetic can be applied to other parts of processors used in image processing so as to reduce area, power, and delay.

REFERENCES

[1] Gorantla, A. and Deepa, P., 2019. Design of Approximate Subtractors and Dividers for Error Tolerant Image Processing Applications. Journal of Electronic Testing, pp.1-7.
[2] Kim, Y., Zhang, Y. and Li, P., 2014. Energy efficient approximate arithmetic for error resilient neuromorphic computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(11), pp.2733-2737.
[3] Zhou, Y., Lin, J., Wang, J. and Wang, Z., 2018, October. Approximate Comparator: Design and Analysis. In 2018 IEEE International Workshop on Signal Processing Systems (SiPS) (pp. 1-5). IEEE.
[4] Monajati, M., Fakhraie, S.M. and Kabir, E., 2015. Approximate arithmetic for low-power image median filtering. Circuits, Systems, and Signal Processing, 34(10), pp.3191-3219.
[5] Chang, C.H., Gu, J. and Zhang, M., 2004. Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits. IEEE Transactions on Circuits and Systems I: Regular Papers, 51(10), pp.1985-1997.
[6] Taheri, M., Arasteh, A., Mohammadyan, S., Panahi, A. and Navi, K., 2020. A novel majority based imprecise 4:2 compressor with respect to the current and future VLSI industry. Microprocessors and Microsystems, 73, p.102962.
[7] Moaiyeri, M.H., Sabetzadeh, F. and Angizi, S., 2018. An efficient majority-based compressor for approximate computing in the nano era. Microsystem Technologies, 24(3), pp.1589-1601.
[8] Gorantla, A., 2017. Design of approximate compressors for multiplication. ACM Journal on Emerging Technologies in Computing Systems (JETC), 13(3), pp.1-17.
[9] Marimuthu, R., Rezinold, Y.E. and Mallick, P.S., 2016. Design and analysis of multiplier using approximate 15-4 compressor. IEEE Access, 5, pp.1027-1036.
[10] Guo, Y., Sun, H., Guo, L. and Kimura, S., 2018, October. Low-cost approximate multiplier design using probability-driven inexact compressors. In 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) (pp. 291-294). IEEE.
[11] Marimuthu, R., Bansal, D., Balamurugan, S. and Mallick, P.S., 2013. Design of 8-4 and 9-4 Compressors for High Speed Multiplication. American Journal of Applied Sciences, 10(8), p.893.
[12] Silveira, B., Paim, G., Abreu, B., Grellert, M., Diniz, C.M., da Costa, E.A.C. and Bampi, S., 2017. Power-efficient sum of absolute differences hardware architecture using adder compressors for integer motion estimation design. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(12), pp.3126-3137.
[13] Schiavon, T., Paim, G., Fonseca, M., Costa, E. and Almeida, S., 2016. Exploiting adder compressors for power-efficient 2-D approximate DCT realization. In 2016 IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS) (pp. 383-386). IEEE.