Chapter 9
Arithmetic
                   Addition
• Full adder (FA) logic circuit:
      adds two bits of the same weight, along
      with a carry-in bit, and produces a sum bit
      and a carry-out bit
• Ripple-carry adder:
      a chain of n FA stages, linked by carry bits,
      can add two n-bit numbers
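As a sanity check on these definitions, here is a minimal Python sketch of an FA stage and a ripple-carry adder operating on LSB-first bit lists (function names are illustrative; this models the logic equations, not gate delays):

```python
def full_adder(x, y, c_in):
    """One FA stage: sum and carry-out of two same-weight bits plus a carry-in."""
    s = x ^ y ^ c_in                             # sum bit
    c_out = (x & y) | (x & c_in) | (y & c_in)    # carry-out bit
    return s, c_out

def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two n-bit numbers given as lists of bits, least significant bit first."""
    s_bits, carry = [], c0
    for x, y in zip(x_bits, y_bits):             # carry ripples through the n FA stages
        s, carry = full_adder(x, y, carry)
        s_bits.append(s)
    return s_bits, carry                         # n sum bits plus the final carry-out

# 0110 + 0011 = 1001 (6 + 3 = 9), bits listed LSB first
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))   # ([1, 0, 0, 1], 0)
```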
     Addition/subtraction circuit
• An n-bit adder with external XOR gates can
     add or subtract two operands
• An FA stage produces its outputs
      after 2 logic gate delays
• Longest delay path through the
     adder/subtractor circuit: 2n gate delays,
     assuming a ripple-carry design
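A small sketch of the add/subtract idea, modeling the external XOR gates as a conditional complement of y and feeding the same control signal in as c0 (the name and the integer-based model are illustrative):

```python
def add_sub(x, y, n, sub):
    """n-bit adder/subtractor: sub = 0 gives x + y, sub = 1 gives x - y (2's complement)."""
    mask = (1 << n) - 1
    y_mod = (y ^ (mask if sub else 0)) & mask    # external XOR gates complement y when subtracting
    total = (x & mask) + y_mod + sub             # the n-bit adder, with carry-in c0 = sub
    return total & mask, (total >> n) & 1        # n-bit result and carry-out

print(add_sub(6, 3, 4, sub=0))   # (9, 0): 6 + 3
print(add_sub(6, 3, 4, sub=1))   # (3, 1): 6 - 3, carry-out 1 signals no borrow
```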
       Carry-lookahead addition
• Delay reduction: produce carry signals
       in parallel using carry-lookahead circuits
• First, form generate and propagate functions
      in each stage i
            ci+1 = xi yi + xi ci + yi ci
            ci+1 = xi yi + (xi + yi)ci
            Gi = xiyi             Pi = xi + yi
            ci+1 = Gi + Pici
• Pi can be treated as the XOR of xi and yi (Why? Hint: when xi = yi = 1, Gi = 1 already produces the carry, so the value of Pi does not matter)
        Carry-lookahead circuits
• A 4-bit adder has four carry signals, one produced by each stage:
  c1 = G0 + P0c0
  c2 = G1 + P1G0 + P1P0c0
  c3 = G2 + P2G1 + P2P1G0 + P2P1P0c0
  c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0
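The same equations in a short Python sketch; the loop form ci+1 = Gi + Pi·ci is logically equivalent to the flattened two-level expressions above, which is what the lookahead circuit actually implements (names are illustrative):

```python
def lookahead_carries(x_bits, y_bits, c0=0):
    """Return [c1, c2, c3, c4] for a 4-bit adder from Gi = xi·yi and Pi = xi + yi."""
    G = [x & y for x, y in zip(x_bits, y_bits)]   # generate functions
    P = [x | y for x, y in zip(x_bits, y_bits)]   # propagate functions
    carries, c = [], c0
    for i in range(4):
        c = G[i] | (P[i] & c)    # ci+1 = Gi + Pi·ci; hardware flattens this recurrence
        carries.append(c)        # into the two-level expressions listed above
    return carries

# x = 1011 (11), y = 0110 (6), bits LSB first; 11 + 6 = 17, so c4 = 1
print(lookahead_carries([1, 1, 0, 1], [0, 1, 1, 0]))   # [0, 1, 1, 1]
```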
           Delay in 4-bit adder
• Ripple-carry design:
           8 gate delays: (2 for each FA) × 4
• Carry-lookahead design:
            1 gate delay for all Pi and Gi
            2 gate delays for all ci
            1 gate delay for all si
            Total: 1 + 2 + 1 = 4 gate delays
     Carry-lookahead for larger n
• Ideally, 2 gate delays for all ci regardless of n
• But max. number of inputs for AND/OR gates
     increases linearly with n
• Fan-in constraints of actual logic gates make
     2-level logic for ci impractical for n > 4
• Therefore, higher-level generate/propagate
     functions, defined for 4-bit and 16-bit adder
     blocks, are used to produce carry bits in parallel
P0^I = P3P2P1P0    G0^I = G3 + P3G2 + P3P2G1 + P3P2P1G0
c16 = G3^I + P3^IG2^I + P3^IP2^IG1^I + P3^IP2^IP1^IG0^I + P3^IP2^IP1^IP0^Ic0
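A sketch of the block-level idea, assuming the per-bit G and P lists for one 4-bit slice as in the previous sketch; `block_gen_prop` and `c16_from_blocks` are hypothetical helper names:

```python
def block_gen_prop(G, P):
    """Block generate/propagate (G^I, P^I) for one 4-bit slice, from its per-bit Gi, Pi."""
    P_I = P[3] & P[2] & P[1] & P[0]
    G_I = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
           | (P[3] & P[2] & P[1] & G[0]))
    return G_I, P_I

def c16_from_blocks(GI, PI, c0=0):
    """Second-level lookahead: c16 from the four block (G^I, P^I) pairs of a 16-bit adder."""
    return (GI[3] | (PI[3] & GI[2]) | (PI[3] & PI[2] & GI[1])
            | (PI[3] & PI[2] & PI[1] & GI[0])
            | (PI[3] & PI[2] & PI[1] & PI[0] & c0))
```

The same second-level circuit also produces c4, c8, and c12, so each 4-bit block can begin its internal lookahead without waiting for a carry to ripple between blocks.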
                Multiplication
• Multiplying two n-bit unsigned numbers
     produces a 2n-bit product
• Multiplication can be done in a 2-dimensional
     combinational array composed of
     n^2 basic cells, each containing an FA block,
     arranged in a trapezoidal shape
• Longest delay path is approx. 6n gate delays,
     along the right edge and
     across the bottom of the array
              Multiplication
• Sequential circuit multiplier
     is composed of three n-bit registers, an
     n-bit adder, and a control sequencer
• A sequence of n addition cycles generates
     a 2n-bit product
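A register-level sketch of the sequential multiplier for unsigned operands; the A, Q, M register names follow the common textbook convention, but the Python model below is only an illustration of the n add/shift cycles:

```python
def sequential_multiply(multiplicand, multiplier, n):
    """Unsigned shift-and-add multiplication: n cycles, 2n-bit product assembled in (A, Q)."""
    A, Q, M = 0, multiplier, multiplicand        # A: accumulator, Q: multiplier register
    mask = (1 << n) - 1
    for _ in range(n):
        if Q & 1:                                # multiplier LSB decides whether to add M
            A += M                               # (A temporarily needs n+1 bits for the carry)
        combined = ((A << n) | Q) >> 1           # shift A and Q right together by one bit
        A, Q = combined >> n, combined & mask
    return (A << n) | Q                          # high half in A, low half in Q

print(sequential_multiply(13, 11, 4))   # 143
```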
     Multiplying signed numbers
• The next six figures, Figures 6.8 through 6.13
     from the textbook, show how to perform
     multiplication of signed numbers
      in 2’s-complement representation
• Dealing with a negative multiplicand
     with basic sign extension is described first
• Then, the Booth algorithm is introduced
     as a way to deal with negative multipliers
• Benefit of Booth: potentially fewer additions
Booth Multiplication (with appropriate shifting)
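A compact sketch of Booth recoding and the resulting multiplication; it scans bit pairs (qi, qi-1) with an implied q-1 = 0 and adds or subtracts appropriately shifted copies of the multiplicand (names are illustrative):

```python
def booth_recode(multiplier, n):
    """Booth digits (-1, 0, +1), LSB first, for an n-bit 2's-complement multiplier."""
    digits, prev = [], 0                        # implied bit q-1 = 0
    for i in range(n):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)               # pair 01 -> +1, 10 -> -1, 00/11 -> 0
        prev = bit
    return digits

def booth_multiply(multiplicand, multiplier, n):
    """Sum appropriately shifted +/- copies of the multiplicand, as Booth recoding dictates."""
    return sum((d * multiplicand) << i for i, d in enumerate(booth_recode(multiplier, n)))

# 1101 is -3 in 4-bit 2's complement: recoding gives -1 + 2 - 4 + 0 = -3, so 7 x (-3) = -21
print(booth_recode(0b1101, 4))        # [-1, 1, -1, 0]
print(booth_multiply(7, 0b1101, 4))   # -21
```

Passing the raw n-bit pattern works because the recoding automatically gives the sign bit its negative weight, which is why Booth handles negative multipliers directly.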
         High-speed multipliers
• Neither the combinational array multiplier
     nor the sequential circuit multiplier
     is fast enough for high-performance
     processors
• Two approaches are used for higher speed:
     1. Reduce the number of summands
     2. Use more parallelism in adding them
          Reducing summands
• Normally, to multiply a number M by
      15 (= 1111 in binary), four shifted
      versions of M are added
• Alternatively, the same result is obtained by
      computing 16M – M, where 16M is
      formed by shifting M to the left 4 times
• This basic idea, derived from the Booth
      algorithm, can be applied to reduce the
      number of summands
         Reducing summands
• Each pair of multiplier bits selects one
     summand from 5 possible versions of the
     multiplicand M: 0, M, -M, 2M, -2M
• Example: 6-bit, 2’s-complement operands
       Multiplier Q (in bit pairs) =  11   10   10
       M version selected          =   0   -M  -2M
     Multiplier bit-pair recoding
• The full table of multiplicand selection
     decisions based on bit-pairing of the
     multiplier is shown in the next figure
• Since only one version of the multiplicand
     is added into the partial product for each
     pair of multiplier bits, only n/2 summands
     are added to do an n x n multiplication
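A sketch of the bit-pair (radix-4) recoding step only; each pair of multiplier bits, together with the bit to its right, selects one of 0, ±M, ±2M (the helper name is illustrative):

```python
def bit_pair_recode(multiplier, n):
    """Radix-4 recoding: n/2 digits from {-2, -1, 0, +1, +2}, least significant pair first."""
    digits, prev = [], 0                         # prev is the bit to the right of the pair
    for i in range(0, n, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)        # two overlapping Booth digits combined
        prev = b1
    return digits

# Q = 111010 (-6 in 6-bit 2's complement); pairs select -2M, -1M, 0 from LSB pair to MSB pair
print(bit_pair_recode(0b111010, 6))   # [-2, -1, 0]
```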
  Parallelism in adding summands
• Three n-bit summands can be reduced to two
     by using n FA blocks, operating
     independently and in parallel
• This technique can be applied in the array
     multiplier, as shown in the next two
     figures
• The technique is called carry-save addition
          Carry-save addition
• 3-2 Reducer
• Group summands in threes and reduce each
     group to two in parallel
• Repeat until only two summands remain
• Add them in a conventional adder to
     generate the final sum
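A sketch of carry-save reduction using Python integers in place of bit vectors; `reduce_3_to_2` applies the FA equations to whole words, and the driver keeps reducing groups of three until two summands remain (names are illustrative):

```python
def reduce_3_to_2(a, b, c):
    """3-2 reducer: n independent FAs turn three summands into a sum word and a carry word."""
    s = a ^ b ^ c                                     # per-bit sum outputs
    carry = ((a & b) | (a & c) | (b & c)) << 1        # per-bit carries, moved to the next weight
    return s, carry

def carry_save_sum(summands):
    """Reduce groups of three summands to two, level by level, until only two remain."""
    while len(summands) > 2:
        cut = len(summands) // 3 * 3                  # largest multiple of 3 we can group
        reduced = []
        for i in range(0, cut, 3):
            reduced.extend(reduce_3_to_2(*summands[i:i + 3]))
        summands = reduced + summands[cut:]           # leftover summands pass straight through
    return sum(summands)                              # final conventional (e.g. CLA) addition

print(carry_save_sum([13, 7, 5, 9, 2, 11]))   # 47
```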
Division
• A restoring division example
• A non-restoring division example
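Since the slides only reference worked figures, here is a sketch of the restoring division algorithm for unsigned operands (the A, Q, M register names follow the common convention; the code is illustrative only):

```python
def restoring_divide(dividend, divisor, n):
    """Unsigned restoring division: n shift/subtract cycles give (quotient, remainder)."""
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        A = (A << 1) | ((Q >> (n - 1)) & 1)    # shift A and Q left one position together
        Q = (Q << 1) & ((1 << n) - 1)
        A -= M                                  # trial subtraction
        if A < 0:
            A += M                              # negative: restore A, quotient bit stays 0
        else:
            Q |= 1                              # non-negative: set quotient bit to 1
    return Q, A                                 # quotient in Q, remainder in A

print(restoring_divide(143, 11, 8))   # (13, 0)
```

The non-restoring variant avoids the explicit restore step by letting the next cycle add or subtract M depending on the sign of A, with one possible correction at the end.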
     Floating-point (FP) numbers
• IEEE standard 754-2008 defines
     representation and operations for
     floating-point numbers
• The 32-bit single-precision format has:
  – A sign bit S (0 for +, 1 for -)
  – An 8-bit signed exponent E (base = 2)
  – A 23-bit mantissa fraction magnitude M
                FP numbers
• The value represented is
           +/- 1.M x 2^E
• E is actually encoded as E’ = E + 127
     which is called an excess-127
     representation
                      FP numbers
• Example of 32-bit number representation:
       0       10000101               0110 …
       S          E’                    M
• Value represented (with E = E’ – 127 = 133 – 127 = 6):
       + 1.0110 x 2^6
• This is called a normalized representation,
       with the binary point to the right of the first significant bit
• 64-bit double-precision is similar with more bits for E’ & M
Try examples at: http://evanw.github.io/float-toy/
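A sketch that unpacks a single-precision pattern and evaluates the value formula above; it assumes a normalized number and ignores zeros, subnormals, infinities, and NaNs (the function name is illustrative):

```python
def decode_float32(bits):
    """Decode a normalized single-precision pattern into (S, E', M) and its value."""
    S = (bits >> 31) & 0x1
    E_prime = (bits >> 23) & 0xFF              # excess-127 exponent field
    M = bits & 0x7FFFFF                        # 23-bit fraction field
    E = E_prime - 127
    value = (-1) ** S * (1 + M / 2**23) * 2.0**E
    return S, E_prime, M, value

# The slide's example: S = 0, E' = 10000101 (133), M = 0110..., value = +1.0110b x 2^6 = 88.0
print(decode_float32(0b0_10000101_01100000000000000000000))   # (0, 133, 3145728, 88.0)
```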
       FP Addition/Subtraction
• Add/Subtract procedure:
 1. Shift mantissa of number with smaller
     exponent to the right by the exponent difference
 2. Set exponent of result to larger exponent
 3. Perform addition/subtraction of mantissas
     and set sign of result
 4. Normalize the result, if necessary
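A simplified sketch of the four steps on (sign, true exponent, scaled significand) triples; it ignores guard bits, rounding, and special values, and the representation choice is only for illustration:

```python
def fp_add(a, b, frac_bits=23):
    """Add two values given as (sign, true exponent E, significand scaled by 2**frac_bits,
    hidden 1 included). Guard bits, rounding, and special values are ignored."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    if ea < eb:                                    # make `a` the operand with the larger exponent
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb                                 # 1. shift smaller-exponent mantissa right
    e = ea                                         # 2. result exponent = larger exponent
    m = (ma if sa == 0 else -ma) + (mb if sb == 0 else -mb)   # 3. signed mantissa add/subtract
    s, m = (0, m) if m >= 0 else (1, -m)
    while m >= (2 << frac_bits):                   # 4. normalize: shift right on overflow ...
        m >>= 1
        e += 1
    while m and m < (1 << frac_bits):              #    ... or left after cancellation
        m <<= 1
        e -= 1
    return s, e, m

# A = +1.011 x 2^6, B = +1.101 x 2^4  ->  +1.110010 x 2^6 (the worked example that follows)
print(fp_add((0, 6, 0b1011 << 20), (0, 4, 0b1101 << 20)))   # (0, 6, 14942208)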
           FP Addition example
• Perform C = A + B for
      A = 0 10000101 0110…
      B = 0 10000011 1010…
  1. Shift mantissa of B two places to right
  2. Set exponent of C to 10000101
  3. Add mantissas
             1.011000…
          + 0.011010…
          ----------------
             1.110010…
  4. C = 0 10000101 110010…
            FP Multiplication
• Multiply procedure:
 1. Add exponents and subtract 127
     (to maintain excess-127 representation)
 2. Multiply mantissas, determine sign of
     result
 3. Normalize result, if necessary
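A sketch of the multiply procedure on packed 32-bit patterns, keeping the excess-127 bookkeeping from step 1; it truncates the mantissa product and handles no special cases (the function name is illustrative):

```python
def fp_multiply(a_bits, b_bits):
    """Multiply two normalized single-precision patterns (truncating, no special cases)."""
    sa, ea, ma = a_bits >> 31, (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | (1 << 23)
    sb, eb, mb = b_bits >> 31, (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | (1 << 23)
    e = ea + eb - 127                      # 1. add exponents, subtract 127 (stay excess-127)
    s = sa ^ sb                            # 2. sign of the result
    m = (ma * mb) >> 23                    # 2. multiply mantissas (hidden 1s included)
    if m >= (1 << 24):                     # 3. normalize: a product in [2, 4) needs one shift
        m >>= 1
        e += 1
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)

# 3.0 (0x40400000) x 2.5 (0x40200000) = 7.5 (0x40F00000)
print(hex(fp_multiply(0x40400000, 0x40200000)))   # 0x40f00000
```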
                FP Division
• Divide procedure:
 1. Subtract exponents and add 127
     (to maintain excess-127 representation)
 2. Divide mantissas, determine sign of result
 3. Normalize result, if necessary
 Implementation of FP operations
• A considerable amount of logic circuitry is
        needed to implement floating-point
        operations in hardware, especially when
        high performance is required
• It is also possible to implement
       floating-point operations in software
• A hardware addition/subtraction unit
       is shown in the next figure
           Sections to Read
       (From Hamacher’s Book)
• Chapter on Arithmetic
  – All sections and sub-sections EXCEPT
     • Summand Addition Tree using 4-2 Reducers
     • Guard Bits and Truncation
     • Decimal to Binary Conversion