Finite Wordlength Effects in DSP
$$H(z) = \frac{1}{1 - \alpha z^{-1}} = \frac{z}{z - \alpha}$$

Here $\alpha$ is the filter coefficient. When this filter is implemented on a DSP processor or in software, $\alpha$ can take only discrete values. Let the quantized value of $\alpha$ be denoted by $\hat{\alpha}$. The transfer function that is actually implemented is then

$$\hat{H}(z) = \frac{z}{z - \hat{\alpha}}$$

This transfer function is slightly different from $H(z)$. Hence the actual frequency response will be different from the desired response.
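The effect of coefficient quantization can be illustrated numerically. The following is a minimal sketch, assuming the coefficient is rounded to a grid of 2^-bits; the helper names `quantize` and `magnitude_response` and the example values (α = 0.95, 4 fractional bits) are illustrative assumptions, not part of the original notes.

```python
import cmath
import math

def quantize(value, bits):
    """Round value to the nearest multiple of 2**-bits (assumed fixed-point grid)."""
    step = 2.0 ** -bits
    return math.floor(value / step + 0.5) * step

def magnitude_response(alpha, omega):
    """Magnitude of H(e^{j omega}) for H(z) = z / (z - alpha)."""
    z = cmath.exp(1j * omega)
    return abs(z / (z - alpha))

alpha = 0.95                      # desired coefficient (example value)
alpha_hat = quantize(alpha, 4)    # coefficient actually stored (4 fractional bits assumed)

gain_desired = magnitude_response(alpha, 0.0)      # gain at omega = 0 with the ideal coefficient
gain_actual = magnitude_response(alpha_hat, 0.0)   # gain at omega = 0 with the stored coefficient

print("stored coefficient:", alpha_hat)
print("desired gain:", gain_desired, "actual gain:", gain_actual)
```

With these example values the stored coefficient is 0.9375, so the gain at ω = 0 falls from 1/(1 − 0.95) = 20 to 1/(1 − 0.9375) = 16.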
The input x(n) is obtained by sampling the analog input signal. Since the quantizer can output only a fixed set of discrete values, an error is introduced. The actual (quantized) input can be denoted by $\hat{x}(n)$:

$$\hat{x}(n) = x(n) + e(n)$$

Here e(n) is the error introduced during the A/D conversion process due to the finite wordlength of the quantizer. Similarly, an error is introduced in the multiplication of α and y(n-1) in equation (1), because the product α y(n-1) has to be quantized to one of the available discrete values. Together, these errors generate the finite wordlength effects.
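The input quantization error can be sketched as follows. This is only an illustration, assuming a rounding quantizer with 3 fractional bits and a sampled sine wave as the input; the names `quantize`, `samples` and `errors` are hypothetical.

```python
import math

def quantize(value, bits):
    """Round value to the nearest multiple of 2**-bits (rounding quantizer assumed)."""
    step = 2.0 ** -bits
    return math.floor(value / step + 0.5) * step

bits = 3
step = 2.0 ** -bits

# x(n): one period of a sampled sine wave (example input)
samples = [0.9 * math.sin(2.0 * math.pi * n / 16) for n in range(16)]

# quantized input and the error e(n) = quantized x(n) - x(n)
quantized = [quantize(x, bits) for x in samples]
errors = [xq - x for xq, x in zip(quantized, samples)]

max_error = max(abs(e) for e in errors)
print("step size:", step, "largest error magnitude:", max_error)
```

For a rounding quantizer the error magnitude never exceeds half the step size, which is what the printed values show.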
         Finite Wordlength Effects in IIR Digital Filters
         When an IIR filter is implemented in a small system, such as an 8-bit microcomputer, errors arise in representing
         the filter coefficients and in performing the arithmetic operations indicated by the difference equation. These
         errors degrade the performance of the filter and in extreme cases lead to instability.
                  Before implementing an IIR filter, it is important to ascertain the extent to which its performance will be
         degraded by finite wordlength effects and to find a remedy if the degradation is not acceptable. The effects of
         these errors can be reduced to acceptable levels by using more bits but this may be at the expense of increased
         cost.
         The main errors in digital IIR filters are:
         i. ADC Quantization Noise:
         This noise is caused by representing the samples of the input data by only a small number of bits.
www.specworld.in                                                 1                                          www.smartzworld.com
                                                                                                          www.jntuworldupdates.org
[Figure: characteristic plotted against v, with axis ticks from -1 to 2]
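Let $x_1 = 0.111_2$, i.e., $\tfrac{7}{8}$, and $x_2 = 0.101_2$, i.e., $\tfrac{5}{8}$.

Then $x_1 + x_2 = 1.100_2$, i.e., $-\tfrac{4}{8}$.

Here overflow has occurred in the addition due to finite precision: the true sum $\tfrac{12}{8}$ cannot be represented, and the digit before the binary point makes the number negative.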
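The overflow in this addition can be reproduced with integer arithmetic. The sketch below assumes a two's complement word with one sign bit and three fractional bits; the function name `wrap_add` is illustrative.

```python
FRAC_BITS = 3                  # bits after the binary point
WORD_BITS = FRAC_BITS + 1      # plus one sign bit (assumed word format)

def wrap_add(a, b):
    """Add two fixed-point integers (value * 2**FRAC_BITS), keeping only WORD_BITS bits."""
    total = (a + b) & ((1 << WORD_BITS) - 1)   # discard any carry out of the word
    if total >= (1 << FRAC_BITS):              # sign bit set: interpret as negative
        total -= (1 << WORD_BITS)
    return total

x1 = 0b0111    # 0.111 = 7/8
x2 = 0b0101    # 0.101 = 5/8
s = wrap_add(x1, x2)

print("sum as integer:", s, "sum as fraction:", s / 2 ** FRAC_BITS)
```

The true sum 12/8 does not fit in the word, so the stored result is the bit pattern 1.100, which reads as -4/8.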
         Signal Scaling:
         Need for Scaling: Limit cycle oscillations due to overflow can be avoided by using a nonlinear (saturating) transfer characteristic, but this introduces distortion in the output. It is therefore better to scale the signal so that overflow or underflow does not occur, which also avoids these limit cycle oscillations.
         Implementation of Signal Scaling
The figure shows the direct form II structure of an IIR filter. Let the input x(n) be scaled by a factor $s_0$ before the summing node to prevent overflow. With the scaling, the transfer function will be

$$H(z) = s_0 \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} = s_0 \frac{B(z)}{A(z)}$$
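[Figure: direct form II structure of a second-order IIR filter. The input x(n) is scaled by s_0 before the first summing node, whose output is w(n); two delays z^-1 feed the coefficient branches (a_1, a_2 and b_1, b_2), and the second summing node produces y(n).]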
Let $H'(z)$ denote the transfer function from the input to the output of the first summing node:

$$H'(z) = \frac{W(z)}{X(z)}$$

From the figure, this transfer function can be written as

$$H'(z) = \frac{s_0}{1 + a_1 z^{-1} + a_2 z^{-2}} = \frac{s_0}{A(z)}$$

Since $H'(z) = \dfrac{W(z)}{X(z)}$,

$$\frac{W(z)}{X(z)} = \frac{s_0}{A(z)} \quad \text{(or)} \quad W(z) = \frac{s_0 X(z)}{A(z)}$$

Let $S(z) = \dfrac{1}{A(z)}$; then the above equation becomes

$$W(z) = s_0 S(z) X(z)$$
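Evaluating the z-transform on the unit circle, we put $z = e^{j\omega}$ in the above equation:

$$W(e^{j\omega}) = s_0 S(e^{j\omega}) X(e^{j\omega})$$

Taking the inverse Fourier transform of the above equation,

$$w(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\omega}) e^{j\omega n}\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} s_0 S(e^{j\omega}) X(e^{j\omega}) e^{j\omega n}\, d\omega$$

Hence

$$w^2(n) = \frac{s_0^2}{4\pi^2}\left|\int_{-\pi}^{\pi} S(e^{j\omega}) X(e^{j\omega}) e^{j\omega n}\, d\omega\right|^2$$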
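The Schwarz inequality states that

$$\left|\int x_1(t)\, x_2(t)\, dt\right|^2 \le \int |x_1(t)|^2\, dt \int |x_2(t)|^2\, dt$$

Applying it to the expression for $w^2(n)$, and noting that $|e^{j\omega n}| = 1$,

$$w^2(n) \le \frac{s_0^2}{4\pi^2} \int_{-\pi}^{\pi} |S(e^{j\omega})|^2\, d\omega \cdot \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega$$

$$\therefore\; w^2(n) \le s_0^2 \left[\frac{1}{2\pi}\int_{-\pi}^{\pi} |S(e^{j\omega})|^2\, d\omega\right] \left[\frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega\right]$$

By Parseval's relation, $\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega = \sum_{n=0}^{\infty} x^2(n)$, so

$$w^2(n) \le s_0^2 \left[\frac{1}{2\pi}\int_{-\pi}^{\pi} |S(e^{j\omega})|^2\, d\omega\right] \sum_{n=0}^{\infty} x^2(n)$$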
On the unit circle $z = e^{j\omega}$, so that

$$d\omega = \frac{dz}{j e^{j\omega}} = \frac{dz}{jz}, \quad \text{since } e^{j\omega} = z$$

Putting these values in the above inequality,

$$w^2(n) \le s_0^2 \left[\frac{1}{2\pi}\oint |S(z)|^2\, \frac{dz}{jz}\right] \sum_{n=0}^{\infty} x^2(n) = s_0^2 \sum_{n=0}^{\infty} x^2(n) \cdot \frac{1}{2\pi j}\oint |S(z)|^2\, \frac{dz}{z}$$

Here $|S(z)|^2 = S(z)\, S(z^{-1})$. Then we have

$$w^2(n) \le s_0^2 \sum_{n=0}^{\infty} x^2(n) \cdot \frac{1}{2\pi j}\oint_C S(z)\, S(z^{-1})\, z^{-1}\, dz$$

Here the integration is carried out over a closed contour $C$. Equivalently,

$$w^2(n) \le \sum_{n=0}^{\infty} x^2(n) \left[ s_0^2\, \frac{1}{2\pi j}\oint_C S(z)\, S(z^{-1})\, z^{-1}\, dz \right]$$
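Here $w^2(n)$ represents the instantaneous energy of the signal after the first summing node, and $x^2(n)$ represents the instantaneous energy of the input signal. Overflow will not occur if

$$w^2(n) \le \sum_{n=0}^{\infty} x^2(n)$$

which requires

$$s_0^2 \cdot \frac{1}{2\pi j}\oint_C S(z)\, S(z^{-1})\, z^{-1}\, dz = 1$$

Substituting $S(z) = 1/A(z)$,

$$s_0^2 \cdot \frac{1}{2\pi j}\oint_C \frac{z^{-1}\, dz}{A(z)\, A(z^{-1})} = 1$$

$$s_0^2 = \frac{1}{\dfrac{1}{2\pi j}\displaystyle\oint_C \dfrac{z^{-1}\, dz}{A(z)\, A(z^{-1})}}$$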
         (or)
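In practice the contour integral can be evaluated numerically. By Parseval's relation it equals the energy of the impulse response s(k) of S(z) = 1/A(z), so that s_0^2 = 1 / Σ s^2(k). The sketch below uses this relation; it assumes a stable second-order A(z) (poles inside the unit circle) and truncates the sum, and the name `scale_factor` and the coefficient values are hypothetical.

```python
import math

def scale_factor(a1, a2, num_terms=2000):
    """Estimate s0 = 1 / sqrt(sum of s(k)^2) for S(z) = 1 / (1 + a1 z^-1 + a2 z^-2).

    The impulse response is generated by the recursion
    s(k) = delta(k) - a1*s(k-1) - a2*s(k-2); the infinite sum is truncated
    to num_terms samples, which assumes the filter is stable.
    """
    s_prev1 = 0.0   # s(k-1)
    s_prev2 = 0.0   # s(k-2)
    energy = 0.0
    for k in range(num_terms):
        impulse = 1.0 if k == 0 else 0.0
        s_k = impulse - a1 * s_prev1 - a2 * s_prev2
        energy += s_k * s_k
        s_prev2, s_prev1 = s_prev1, s_k
    return 1.0 / math.sqrt(energy)

# Example: A(z) = 1 - 0.9 z^-1 + 0.2 z^-2 (poles at 0.5 and 0.4, chosen for illustration)
s0 = scale_factor(-0.9, 0.2)
print("scale factor s0:", s0)
```

As a check, for the first-order case A(z) = 1 - 0.5 z^-1 the impulse response is 0.5^k, the energy is 4/3, and the sketch returns s_0^2 = 0.75.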
[Block diagram: u(n) is multiplied by the coefficient α to give the product v(n), which is passed through the quantizer Q.]
Figure: Quantization of multiplication or product
The above process can be represented by a statistical model for error analysis. The quantized output $\hat{v}(n)$ and the error $e_\alpha(n)$ in the product quantization process are related by

$$\hat{v}(n) = v(n) + e_\alpha(n)$$

The following assumptions are made about the error:
     i) The error sequence $\{e_\alpha(n)\}$ is the sample sequence of a stationary white noise process.
     iii) The sequence $\{e_\alpha(n)\}$ is uncorrelated with the sequence $v(n)$ and the input sequence $x(n)$.
     8.3.1 Computational output round-off noise
     Product Round-off Errors and their Reduction:
     When digital filters are implemented using fixed point arithmetic, the results of multiplication operations are quantized to fit into the finite wordlength. The errors generated in this operation are called product round-off errors.
             The effect of product round-off errors can be analyzed using the statistical model of the quantization process. The noise due to product round-off errors reduces the signal to noise ratio at the output of the filter. Sometimes this ratio falls below acceptable levels. Hence it is necessary to reduce the effects of product round-off errors.
             There are two solutions available to reduce product round-off errors:
          a) Error feedback structures and
          b) State space structures.
             The error feedback structures use the difference between the unquantized and quantized signals to reduce the round-off noise. This difference is fed back to the digital filter structure in such a way that the output noise power due to round-off errors is reduced.
[Block diagram with signals x(n), v(n), e(n) and y(n), a quantizer Q, two delays z^-1, and the coefficients K, α and β.]
Figure: First order error feedback structure to reduce product round-off error
         The error signal is fed back into the structure in such a way that the round-off noise is reduced. Such a structure for a first order digital filter is shown in the figure.
         Incorporating quantization error feedback as shown in the figure helps to reduce the noise power at the output. This statement can be proved mathematically.
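The reduction in noise power can also be observed in a simulation. The sketch below is an illustration under stated assumptions, not the structure of the figure reproduced exactly: it assumes v(n) = x(n) + α y(n-1) + β e(n-1), y(n) = Q[v(n)] and e(n) = y(n) - v(n), with a rounding quantizer, a random input, and the example values α = 0.9 and β = -1. The function name `output_noise_power` is hypothetical.

```python
import math
import random

def output_noise_power(alpha, beta, q, num_samples=20000, seed=0):
    """Mean squared difference between the quantized filter output and the ideal output.

    Assumed model: v(n) = x(n) + alpha*y(n-1) + beta*e(n-1),
                   y(n) = Q[v(n)] (rounding to multiples of q),
                   e(n) = y(n) - v(n).
    beta = 0 gives the plain first-order filter without error feedback.
    """
    rng = random.Random(seed)
    y_q = 0.0        # quantized output y(n-1)
    e_prev = 0.0     # quantization error e(n-1)
    y_ideal = 0.0    # output of the same filter with infinite precision
    err_power = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-0.1, 0.1)
        v = x + alpha * y_q + beta * e_prev
        y_new = math.floor(v / q + 0.5) * q    # rounding quantizer
        e_prev = y_new - v
        y_q = y_new
        y_ideal = alpha * y_ideal + x
        err_power += (y_q - y_ideal) ** 2
    return err_power / num_samples

q = 2.0 ** -8
noise_plain = output_noise_power(0.9, 0.0, q)      # no error feedback
noise_fb = output_noise_power(0.9, -1.0, q)        # error feedback with beta = -1
print("without feedback:", noise_plain, "with feedback:", noise_fb)
```

Under this model the output error obeys d(n) = α d(n-1) + e(n) + β e(n-1), so choosing β close to -α nearly cancels the pole in the noise path. An integer β such as -1 is used here because it needs no further multiplication.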
         Round-off Errors in FFT Algorithms:
         The FFT is used in a large number of applications. Hence it is necessary to analyze the effects of finite wordlengths in FFT algorithms. The most critical error in FFT computation occurs due to arithmetic round-off errors.
                 The DFT is used in a large number of applications such as filtering, correlation and spectrum analysis. In such applications the DFT is computed with the help of FFT algorithms. Therefore it is important to study the quantization errors in FFT algorithms. These quantization effects mainly take place because of round-off errors, which are introduced when multiplications are performed in fixed point arithmetic.
                 FFT algorithms require fewer multiplications than direct computation of the DFT. But this does not mean that quantization errors are also reduced in FFT algorithms.
$$\sigma_x^2 = \frac{1}{3N} \tag{1}$$

For direct computation of the DFT, the variance of the quantization errors in the multiplications is given as

$$\sigma_q^2 = \frac{N}{3}\,\Delta^2 \tag{2}$$

Here $\sigma_q^2$ is the variance of the quantization errors and $\Delta$ is the step size, which is given as

$$\Delta = 2^{-b} \tag{3}$$
Substituting (3) in (2),

$$\sigma_q^2 = \frac{N}{3}\, 2^{-2b} \tag{4}$$

The signal to noise power ratio at the output (i.e., in the DFT coefficients) can be considered as the measure of quantization errors. This ratio is the ratio of the variance of the DFT coefficients $(\sigma_x^2)$ to the variance of the quantization errors $(\sigma_q^2)$, i.e.,

$$\text{Signal to noise ratio in direct computation of DFT} = \left(\frac{\sigma_x^2}{\sigma_q^2}\right)_{\text{Direct DFT}}$$
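Dividing equation (1) by equation (4) gives a ratio of 2^{2b} / N^2 for the direct DFT. The short sketch below evaluates this ratio from the two variances; the function name `direct_dft_snr` and the example values N = 1024 and b = 16 are assumptions made for illustration.

```python
import math

def direct_dft_snr(num_points, bits):
    """Ratio of equation (1) to equation (4) for an N-point direct DFT with b-bit words."""
    signal_var = 1.0 / (3.0 * num_points)                   # equation (1)
    noise_var = (num_points / 3.0) * 2.0 ** (-2 * bits)     # equation (4)
    return signal_var / noise_var

snr = direct_dft_snr(1024, 16)
snr_db = 10.0 * math.log10(snr)
print("SNR:", snr, "SNR in dB:", snr_db)
```

For N = 1024 and b = 16 the ratio is 2^32 / 2^20 = 4096, or about 36 dB. Doubling N lowers the ratio by a factor of four, while each extra bit raises it by a factor of four.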
     Here x(n-k) and y(n-k) are the input and output data samples in the filter's difference equation, and $b_k$ and $a_k$ are the filter coefficients. In practice these variables are often represented as fixed point numbers. Typically, each of the products $b_k x(n-k)$ and $a_k y(n-k)$ requires more bits to represent than either of its operands. For example, the product of a B-bit data value and a B-bit coefficient is 2B bits long.
              Truncation or rounding is used to quantize the products back to the permissible wordlength. Quantizing the products leads to errors, popularly known as round-off errors, in the output data and hence to a reduction in the SNR. These errors can also lead to small-scale oscillations in the output of the digital filter, even when there is no input to the filter.
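The requantization of a product can be sketched with integers. The example assumes B = 8 bit words with one sign bit and seven fractional bits, so a value v is stored as the integer v·2^7; the function name `quantize_product` and the operand values are illustrative.

```python
B = 8            # wordlength in bits (assumed)
FRAC = B - 1     # fractional bits, leaving one sign bit

def quantize_product(a, b, mode="round"):
    """Multiply two fixed-point integers (value * 2**FRAC) and return the
    product requantized to FRAC fractional bits."""
    full = a * b                                   # exact product, scaled by 2**(2*FRAC)
    if mode == "round":
        return (full + (1 << (FRAC - 1))) >> FRAC  # add half an LSB, then drop the low bits
    return full >> FRAC                            # truncation: simply drop the low bits

a = 96     # 96/128 = 0.75
b = 37     # 37/128 = 0.2890625
rounded = quantize_product(a, b, "round")
truncated = quantize_product(a, b, "truncate")

exact = (a * b) / 2 ** (2 * FRAC)
print("exact:", exact)
print("rounded:", rounded / 2 ** FRAC, "truncated:", truncated / 2 ** FRAC)
```

The exact product 0.216796875 needs more fractional bits than the word provides. Rounding stores 28/128 and truncation stores 27/128, so each introduces a different round-off error.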
[Block diagrams: (a) x(n) is multiplied by K to give a 2B-bit product, which the quantizer Q reduces to B bits to give y(n); (b) the quantizer is replaced by an adder Σ that injects the error sample e(n).]
Figure: Representation of the product quantization error: (a) a block diagram representation of the quantization process; (b) a linear model of the quantization process
Figure (a) represents a block diagram of the product quantization process, and figure (b) represents a linear model of the effect of product quantization. The model consists of an ideal multiplier, with infinite precision, in series with an adder fed by a noise sample, e(n), representing the error in the quantized product, where we have assumed, for simplicity, that x(n), y(n) and K are each represented by B bits. Thus

$$y(n) = K x(n) + e(n)$$

The noise power due to each product quantization is given by

$$\sigma_r^2 = \frac{q^2}{12}$$

where r symbolizes the round-off error and q is the quantization step defined by the wordlength to which the product is quantized. The round-off noise is assumed to be a random variable with zero mean and constant variance. Although this assumption may not always be valid, it is useful in assessing the performance of the filter.
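The value q^2/12 can be checked empirically. The sketch below rounds uniformly distributed random values to a grid of step q and estimates the mean and variance of the error; the function name `rounding_error_stats`, the seed and the sample count are arbitrary choices for illustration.

```python
import math
import random

def rounding_error_stats(q, num_samples=100000, seed=1):
    """Estimate the mean and variance of the rounding error for step size q."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        value = rng.uniform(-1.0, 1.0)
        error = math.floor(value / q + 0.5) * q - value   # rounded value minus true value
        total += error
        total_sq += error * error
    mean = total / num_samples
    variance = total_sq / num_samples - mean * mean
    return mean, variance

q = 2.0 ** -7
mean, variance = rounding_error_stats(q)
print("estimated mean:", mean)
print("estimated variance:", variance, "q^2/12:", q * q / 12.0)
```

The estimated mean is close to zero and the estimated variance is close to q^2/12, in line with the zero-mean, constant-variance model.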
         Effects of Round-off Errors on Filter Performance:
         The effects of round-off errors on filter performance depend on the type of filter structure used and the point at which the results are quantized.
                  The figure represents the quantization noise model for the direct form building block. It is assumed in the figure that the input data, x(n), the output data, y(n), and the filter coefficients are represented as B-bit numbers (including the sign bit). The products are quantized back to B bits after multiplication by rounding (or truncation).
[Block diagrams: (a) direct form section in which each 2B-bit product is quantized to B bits, with a noise source at each quantizer; (b) the same section with a single combined noise source e(n).]
Figure: Product quantization noise model for the direct form filter section, panels (a) and (b)
     Since all five noise sources, $e_1$ to $e_5$ in figure (a), feed to the same point (that is, into the middle adder), the total output noise power is the sum of the individual noise powers (figure (b)).
[Block diagrams: (a) canonic section with input scaling 1/s_1, internal signal w(n), feedback coefficients -a_1 and -a_2, feedforward coefficients s_1 b_0, s_1 b_1 and s_1 b_2, and noise sources e_1(n) to e_6(n); (b) the same section with the noise sources combined.]
     Figure: Product quantization noise model for the canonic filter section. The noise sources feeding the same point in (a) have been combined in (b)
For the direct form section, the output noise power due to product round-off is

$$\sigma_{or}^2 = \frac{5q^2}{12}\left[\frac{1}{2\pi j}\oint_C F(z)\, F(z^{-1})\, \frac{dz}{z}\right] s_1^2 = \frac{5q^2}{12}\left[\sum_{k=0}^{\infty} f^2(k)\right] s_1^2 = \frac{5q^2}{12}\, \|F(z)\|_2^2\, s_1^2$$

where

$$F(z) = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}$$

$f(k) = Z^{-1}[F(z)]$ is the inverse z-transform of $F(z)$, which is also the impulse response from each noise source to the filter output, $\|\cdot\|_2^2$ is the $L_2$ norm squared, and $q^2/12$ is the intrinsic product round-off noise power. The total noise power at the filter output is the sum of the product round-off noise and the ADC quantization noise:

$$\sigma_o^2 = \sigma_{oA}^2 + \sigma_{or}^2 = \frac{q^2}{12}\left[\sum_{k=0}^{\infty} h^2(k) + 5 s_1^2 \sum_{k=0}^{\infty} f^2(k)\right] = \frac{q^2}{12}\left[\|H(z)\|_2^2 + 5 s_1^2\, \|F(z)\|_2^2\right]$$
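The total noise power of the direct form section can be evaluated by summing squared impulse-response samples. The sketch below does this for a second-order section; it assumes a stable filter and a truncated sum, and the names `l2_norm_squared` and `direct_form_noise` and all coefficient values are hypothetical.

```python
def l2_norm_squared(b, a, num_terms=5000):
    """Sum of squared impulse-response samples of B(z)/A(z).

    b = [b0, b1, b2], a = [1, a1, a2]. The infinite sum is truncated to
    num_terms samples, which assumes the filter is stable.
    """
    w1 = 0.0    # w(k-1)
    w2 = 0.0    # w(k-2)
    energy = 0.0
    for k in range(num_terms):
        x = 1.0 if k == 0 else 0.0               # unit impulse input
        w0 = x - a[1] * w1 - a[2] * w2           # recursive part 1/A(z)
        y = b[0] * w0 + b[1] * w1 + b[2] * w2    # non-recursive part B(z)
        energy += y * y
        w2, w1 = w1, w0
    return energy

def direct_form_noise(b, a, s1, q):
    """Total output noise power: (q^2/12) * (||H||^2 + 5 * s1^2 * ||F||^2)."""
    h_norm = l2_norm_squared(b, a)                    # H(z) = B(z)/A(z)
    f_norm = l2_norm_squared([1.0, 0.0, 0.0], a)      # F(z) = 1/A(z)
    return (q * q / 12.0) * (h_norm + 5.0 * s1 * s1 * f_norm)

# Example section (coefficients chosen only for illustration)
b = [0.2, 0.4, 0.2]
a = [1.0, -0.9, 0.2]
total_noise = direct_form_noise(b, a, s1=0.5, q=2.0 ** -7)
print("total output noise power:", total_noise)
```

For the simple case H(z) = F(z) = 1/(1 - 0.5 z^-1) with s_1 = 1 and q = 1, both norms equal 4/3 and the function returns 2/3, which matches the formula evaluated by hand.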
     For the canonic section, figure (a), the noise model again includes a scale factor, as this generates a round-off error of its own. The noise sources $e_1(n)$ to $e_3(n)$ all feed to the left adder, whilst the noise sources $e_4(n)$ to $e_6(n)$ feed directly into the filter output. Combining the noise sources that feed to the same point leads to the noise model of figure (b). Assuming uncorrelated noise sources, the total noise contribution is simply the sum of the individual noise contributions:
                                                                                                    [                      ]
                                                 ∞
                                      3q 2                                    3q 2 3q 2
                 σ or =                      ∑         f 2 (k ) +
                           2                                                                   2
                                                                                  =     F ( z) 2 + 1
                                      12     k =0                             12 12
www.specworld.in                                                                                                                 10                                                               www.smartzworld.com
                                                                                                         www.jntuworldupdates.org
         where f(k) is the impulse response from the noise source e1 to the filter output, and F(z) is the corresponding
         transfer function, given by
$$F(z) = s_1\,\frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} = s_1 H(z)$$
$$\sigma_{or}^2 = \frac{3q^2}{12}\left[1 + s_1^2\sum_{k=0}^{\infty} h^2(k)\right] = \frac{3q^2}{12}\left\{1 + s_1^2\,\|H(z)\|_2^2\right\}$$
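The noise sum for the canonic section can be evaluated numerically as a rough check. The sketch below assumes three sources of variance q²/12 reaching the output through F(z) = s1 H(z) and three entering directly at the output, as described above. The function names, the choice q = 2^-bits, and the truncation of the impulse response to a finite number of terms are illustrative assumptions, not part of the original notes.

```python
def impulse_response(b, a, n_terms):
    """h(k) of H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).

    b = [b0, b1, b2], a = [a1, a2].
    """
    h = []
    for k in range(n_terms):
        acc = b[k] if k < len(b) else 0.0
        if k >= 1:
            acc -= a[0] * h[k - 1]
        if k >= 2:
            acc -= a[1] * h[k - 2]
        h.append(acc)
    return h


def canonic_noise_variance(b, a, s1, bits, n_terms=2000):
    """Output noise variance (3 q^2 / 12) * (1 + s1^2 * sum h^2(k)).

    Assumes q = 2**-bits and a stable filter, so the truncated sum converges.
    """
    q = 2.0 ** -bits
    energy = sum(v * v for v in impulse_response(b, a, n_terms))
    return 3 * q * q / 12 * (1 + s1 * s1 * energy)


# Hypothetical first-order example: H(z) = 1 / (1 - 0.5 z^-1), s1 = 1, 8 bits.
print(canonic_noise_variance([1.0, 0.0, 0.0], [-0.5, 0.0], s1=1.0, bits=8))
```

For this example the impulse response is 0.5^k, whose energy is 4/3, so the result should be close to 7q²/12.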
[Figure: characteristic f(v) plotted against v, with axis marks at −1, 0 and 1]
Let y_r(n) be the output of the system when the product term 0.95 y(n − 1) is quantized by rounding, i.e.,
$$y_r(n) = Q_r[0.95\,y_r(n-1)] + x(n)$$
Let
$$x(n) = \begin{cases} 0.75 & \text{for } n = 0 \\ 0 & \text{for } n \ne 0 \end{cases}$$
Let b = 4 bits be used to represent the quantized product, excluding the sign bit.
With n = 0:
$$y_r(n) = Q_r[0.95\,y_r(n-1)] + x(n)$$
$$y_r(0) = Q_r[0.95\,y_r(-1)] + x(0) = 0.75$$
taking the initial state y_r(−1) as zero.
          [0.75]10 = [0.11]2
     ∴ The 4-bit rounded value of [0.11]2 is [0.1100]2, i.e., 0.75 itself.
     With n = 1:
          Q_r[0.95 × 0.75] = Q_r[0.7125] = 0.6875
     ∴ y_r(1) = 0.6875
     This means that the actual value y_r(1) = 0.7125 is changed to 0.6875 by the 4-bit quantization.
     With n = 2:
          0.95 × 0.6875 = 0.653125
          [0.653125]10 = [0.101001110011001100 ...]2
     ∴ Q_r[0.653125]10 = [0.1010]2 (up to 4 bits) = [0.625]10
     ∴ y_r(2) = 0.625
     With n = 3:
          Q_r[0.95 × 0.625] = Q_r[0.59375] = 0.625
     ∴ y_r(3) = 0.625
         Here
$$\delta = \frac{1}{2^b} = \frac{1}{2^4} = 0.0625$$
$$|y(n-1)| \le \frac{0.0625/2}{1 - 0.95} = 0.625$$
         Dead band = [−0.625, +0.625]
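The worked example can be reproduced with a short simulation. This is a minimal sketch that assumes a zero initial state and rounding with ties taken away from zero; the helper names are illustrative. Exact rational arithmetic is used so that the rounding decisions are not disturbed by floating-point error.

```python
from fractions import Fraction
from math import floor


def round_to_step(value, step):
    """Round value to the nearest multiple of step (ties away from zero)."""
    magnitude = floor(abs(value) / step + Fraction(1, 2)) * step
    return magnitude if value >= 0 else -magnitude


def simulate(alpha, step, x, y_init=Fraction(0)):
    """Run y(n) = Q[alpha * y(n-1)] + x(n), rounding only the product term."""
    y_prev, out = y_init, []
    for x_n in x:
        y_prev = round_to_step(alpha * y_prev, step) + x_n
        out.append(y_prev)
    return out


alpha = Fraction(95, 100)
step = Fraction(1, 2 ** 4)                    # delta = 2^-b with b = 4
x = [Fraction(3, 4)] + [Fraction(0)] * 7      # x(0) = 0.75, zero afterwards
y = simulate(alpha, step, x)
dead_band = (step / 2) / (1 - abs(alpha))

print([float(v) for v in y])   # [0.75, 0.6875, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625]
print(float(dead_band))        # 0.625
```

The output stops decaying at 0.625, which coincides with the edge of the dead band computed from δ and α.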
         Signal Scaling to Prevent Limit Cycle Oscillations: Consider the zero-input condition. The following table lists the
         values of y(n) before and after quantization, where each value is rounded to the nearest integer.
                                        n    y(n ) before quantization   y (n ) after quantization
                                        -1   12                          12
                                        0    10.8                        11
                                        1    9.72                        10
                                        2    8.748                       9
                                        3    7.8732                      8
                                        4    7.08588                     7
                                        5    6.377292                    6
                                        6    5.7395628                   6
                                        7    5.1656065                   5
                                        8    4.6490459                   5
                                         Table: Values of y(n ) before and after quantization
From the table, observe that if |y(−1)| ≤ 5, then y(n) = y(−1) for n ≥ 0 with zero input. Hence the dead band is [−5, 5].
         Since the values are rounded to the nearest integer after quantization, the step size is δ = 1. Hence the
         dead band can also be calculated as follows:
$$|y(n-1)| \le \frac{\delta/2}{1 - |\alpha|}$$
         Here α = 0.9, so
$$|y(n-1)| \le \frac{1/2}{1 - 0.9} = 5$$
         Thus the dead band is [−5, 5].
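The claim that every integer state inside the dead band is held by the zero-input recursion can be checked directly. The sketch below lists the integers y for which Q[0.9 y] = y. It assumes ties are rounded away from zero (so that 4.5 rounds to 5); a different tie rule would change the result at the band edge.

```python
from fractions import Fraction
from math import floor


def round_half_away(value):
    """Round to the nearest integer, ties away from zero (step size delta = 1)."""
    magnitude = floor(abs(value) + Fraction(1, 2))
    return magnitude if value >= 0 else -magnitude


alpha = Fraction(9, 10)

# Integer states left unchanged by the zero-input recursion y(n) = Q[alpha * y(n-1)].
stuck = [y for y in range(-12, 13) if round_half_away(alpha * y) == y]
bound = Fraction(1, 2) / (1 - alpha)

print(stuck)   # [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
print(bound)   # 5
```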
         Dynamic Range Scaling to Prevent the Effects of Overflow: Overflow can take place at some internal
         nodes when digital filters are implemented using fixed-point arithmetic. Such nodes can be the inputs or outputs
         of adders or multipliers. This overflow can take place even if the inputs are scaled. Overflow at such
         intermediate points produces totally undesired output or oscillations. It can be avoided by scaling the
         internal signal levels with the help of scaling multipliers, which are inserted at the appropriate
         points in the filter structure. Sometimes these scaling multipliers are absorbed
         into the existing multipliers in the structure to reduce the total number of multipliers and the complexity.
                 The node at which overflow will take place is not known in advance, because overflow
         depends upon the input signal samples. Hence, whenever overflow takes place at some node, the scaling
         should be done dynamically. Dynamic range scaling in the filter structure can therefore avoid the effects of overflow.
         Let u_r(n) be the signal sample at the r-th node in the structure. Then the scaling should ensure that,
Let Q_t(x) be the value after truncation; then the truncation error will be,
$$\varepsilon_t = Q_t(x) - x$$
     Here x is the original value of the number.
     Rounding Error: This error is introduced whenever a number is rounded off to the nearest digital level. The
     number of bits used to represent the rounded number is generally less than the number of bits required for the actual
     number.
              Let Q_r(x) be the value after rounding. Then the rounding error will be,
$$\varepsilon_r = Q_r(x) - x$$
      Here x is the original value of the number.
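The two error definitions can be compared on a single value. This is a small sketch that assumes truncation toward minus infinity (as in two's complement) and rounding with ties taken upward; the function names and the sample value 0.6 are illustrative.

```python
from fractions import Fraction
from math import floor


def truncate(x, bits):
    """Q_t(x): discard everything below 2^-bits (floor, as in two's complement)."""
    step = Fraction(1, 2 ** bits)
    return floor(x / step) * step


def round_nearest(x, bits):
    """Q_r(x): nearest multiple of 2^-bits, ties rounded upward."""
    step = Fraction(1, 2 ** bits)
    return floor(x / step + Fraction(1, 2)) * step


x = Fraction(3, 5)                    # 0.6, not representable in 4 fractional bits
eps_t = truncate(x, 4) - x            # 0.5625 - 0.6
eps_r = round_nearest(x, 4) - x       # 0.625 - 0.6

print(eps_t, eps_r)   # -3/80 1/40
```

Under these assumptions the truncation error lies in (−δ, 0] while the rounding error stays within ±δ/2, with δ = 2^-4 here.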
     Tradeoff between roundoff and overflow noise:
     Scaling operation
     Scaling is a process of readjusting certain internal gain parameters in order to constrain internal signals to a range
     appropriate to the hardware, with the constraint that the transfer function from input to output should not be
     changed.
              The filter in figure (a), with unscaled node x, has the transfer function
$$H(z) = D(z) + F(z)G(z) \tag{1}$$
             To scale the node x, we divide F(z) by some number β and multiply G(z) by the same number, as in
     figure (b). Although the transfer function does not change by this operation, the signal level at node x has been
     changed. The scaling parameter β can be chosen to meet any specific scaling rule, such as
$$l_1\ \text{scaling}: \quad \beta = \sum_{i=0}^{\infty} |f(i)| \tag{2}$$
$$l_2\ \text{scaling}: \quad \beta = \delta\sqrt{\sum_{i=0}^{\infty} f^2(i)} \tag{3}$$
              where f(i) is the unit-sample response from the input to the node x, and the parameter δ can be interpreted
     as the number of standard deviations representable in the register at node x if the input is unit-variance white noise.

     [Figure (a): IN feeds D(z) directly to OUT and, in parallel, F(z) followed by node x and G(z).]
     [Figure (b): the same structure with F(z)/β, node x′ and βG(z).]
     Figure: (a) A filter with unscaled node x and (b) a filter with scaled node x′

              If the input is bounded by |u(n)| ≤ 1, then
$$|x(n)| = \left|\sum_{i=0}^{\infty} f(i)\,u(n-i)\right| \le \sum_{i=0}^{\infty} |f(i)| \tag{4}$$
         Equation (4) represents the true bound on the range of x, and overflow is completely avoided by l1 scaling in (2),
         which is the most stringent scaling policy.
                  In many cases, the input can be assumed to be white noise. The variance at node x can then be
         computed: for unit-variance white noise input,
$$E[x^2(n)] = \sum_{i=0}^{\infty} f^2(i) \tag{5}$$
         Since most input signals can be assumed to be white noise, l2 scaling is commonly used. In addition, (5) can be
         easily computed. Since (5) is the variance (not a strict bound), there is a possibility of overflow, which can be
         reduced by increasing δ in (3). For large values of δ, the internal variables are scaled conservatively so that no
         overflow occurs. However, there is a trade-off between overflow and roundoff noise, since increasing δ deteriorates
         the output SNR (signal-to-noise ratio).
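The two scaling rules can be compared numerically. The sketch below evaluates (2) and (3) for a hypothetical unit-sample response f(i) = 0.5^i, truncated to 60 terms, with δ = 4 chosen arbitrarily; none of these values come from the notes.

```python
from math import sqrt


def l1_scale(f):
    """beta for l1 scaling: sum of |f(i)|, as in equation (2)."""
    return sum(abs(v) for v in f)


def l2_scale(f, delta):
    """beta for l2 scaling: delta * sqrt(sum of f(i)^2), as in equation (3)."""
    return delta * sqrt(sum(v * v for v in f))


# Hypothetical unit-sample response from the input to node x.
f = [0.5 ** i for i in range(60)]

beta_l1 = l1_scale(f)               # close to 1 / (1 - 0.5) = 2
beta_l2 = l2_scale(f, delta=4.0)    # close to 4 * sqrt(1 / (1 - 0.25))

print(beta_l1, beta_l2)
```

With l1 scaling, dividing F(z) by beta_l1 guarantees |x(n)| ≤ 1 for any input bounded by 1. With l2 scaling, the signal at node x has a standard deviation of 1/δ for unit-variance white noise input, so overflow remains possible but becomes less likely as δ grows.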
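         [Figure: first-order IIR filter with input u(n), an adder, a delay D, a multiplier a and state x(n), annotated with
         8-bit and 15-bit wordlengths; beside it, the equivalent model with the roundoff error injected at the adder.]
                                                  Figure: Model of roundoff error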
         Roundoff Noise: If two W-bit fixed-point fractional numbers are multiplied together, the product is (2W−1) bits
         long. This product must eventually be quantized to W bits by rounding or truncation. For example, consider the
         first-order IIR filter shown in the figure. Assume that the input wordlength is W = 8 bits. If the multiplier coefficient
         wordlength is also the same, then to maintain full precision in the output we need to increase the output wordlength
         by 8 bits per iteration. This is clearly infeasible. The alternative is to round off (or truncate) the output to its
         nearest 8-bit representation.
[Figure: uniform density P_e(x) of height 1/Δ over −Δ/2 ≤ x ≤ Δ/2]
                                                          Figure: Error probability distribution
             Such quantization introduces roundoff noise e(n). For mathematical ease, a system with
     roundoff can be modeled as an infinite-precision system with an external error input. For example, in the previous
     case (shown in the figure) we round off the output of the multiply-add operation, and an equivalent model is shown in
     the figure.
             Although rounding is not a linear operation, its effect at the output can be analyzed using linear system
     theory with the following assumptions about e(n):
     1.        e(n) is uniformly distributed white noise.
     2.        e(n) is a wide-sense stationary random process, i.e., the mean and covariance of e(n) are independent of the
               time index n.
     3.        e(n) is uncorrelated with all other signals, such as the input and other noise signals.
                  Let the wordlength of the output be W bits; then the roundoff error e(n) can be given by
$$-\frac{2^{-(W-1)}}{2} \le e(n) \le \frac{2^{-(W-1)}}{2} \tag{6}$$
             Since the error is assumed to be uniformly distributed over the interval given in (6), the corresponding
     probability distribution is shown in the figure, where Δ is the length of the interval (i.e., $2^{-(W-1)}$).
                  Let us compute the mean E[e(n)] and variance E[e²(n)] of this error function.
$$E[e(n)] = \int_{-\Delta/2}^{\Delta/2} x\,P_e(x)\,dx = \frac{1}{\Delta}\left[\frac{x^2}{2}\right]_{-\Delta/2}^{\Delta/2} = 0 \tag{7}$$
[Figure: block diagram with input u(n), gain b, state x(n + 1), delay z^-1, state x(n), gain C^T, output y(n), and roundoff error input e(n)]
$$\sigma_e^2 = \frac{2^{-2W}}{3} \tag{9}$$
                  where $\sigma_e^2$ is the variance of the roundoff error in a finite-precision, W-bit wordlength system. Since the
         variance is proportional to $2^{-2W}$, an increase in wordlength by 1 bit decreases the error variance by a factor of 4.
                  The purpose of analyzing roundoff noise is to determine its effect on the output signal. If the noise
         variance at the output is not negligible in comparison to the output signal level, the wordlength should be increased
         or some low-noise structure should be used. Therefore, we need to compute the SNR at the output, not just the noise
         gain to the output. In the noise analysis, we use a double-length accumulator model, which means rounding is
         performed after two (2W−1)-bit products are added. Also, notice that multipliers are the sources of roundoff
         noise.
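The variance in (9) can be checked empirically. The sketch below rounds a grid of finer-resolution values to W = 8 bits and compares the sample variance of the error with $2^{-2W}/3$. The 15-bit grid standing in for full-precision values and the round-half-up tie rule are assumptions made only for this illustration.

```python
from fractions import Fraction
from math import floor

W = 8
delta = Fraction(1, 2 ** (W - 1))      # interval length, 2^-(W-1)
fine = Fraction(1, 2 ** 15)            # assumed finer grid standing in for full precision

errors = []
for k in range(2 ** 12):               # 16 full quantization intervals
    x = k * fine
    q = floor(x / delta + Fraction(1, 2)) * delta   # round to nearest, ties upward
    errors.append(q - x)

mean = sum(errors) / len(errors)
variance = sum(e * e for e in errors) / len(errors) - mean ** 2
predicted = Fraction(1, 3 * 2 ** (2 * W))            # equation (9)

print(float(mean), float(variance), float(predicted))
```

The sample mean is not exactly zero because ties are always rounded upward, but it is small compared with Δ, and the sample variance agrees closely with the prediction.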
         Effects of Coefficient Quantization in FIR filters: Let us consider the effects of coefficient quantization in FIR
         filters. Consider the transfer function of the FIR filter of length M,
$$H(z) = \sum_{n=0}^{M-1} h(n)\,z^{-n}$$
                  The quantization of h(n) takes place during implementation of the filter. Let the quantized coefficients be
         denoted by $\bar{h}(n)$ and let e(n) be the error in quantization. Then we can write,
$$\bar{h}(n) = h(n) + e(n)$$
         and the new filter transfer function becomes,
$$\bar{H}(z) = \sum_{n=0}^{M-1} \bar{h}(n)\,z^{-n} = \sum_{n=0}^{M-1} [h(n) + e(n)]\,z^{-n} = \sum_{n=0}^{M-1} h(n)\,z^{-n} + \sum_{n=0}^{M-1} e(n)\,z^{-n} = H(z) + E(z)$$
         where
$$E(z) = \sum_{n=0}^{M-1} e(n)\,z^{-n}$$
         [Figure: $\bar{H}(z)$ drawn as H(z) in parallel with E(z)]
$$\bar{H}(\omega) = H(\omega) + E(\omega)$$
              Here E(ω) is the error in the desired frequency response, which is given as,
$$E(\omega) = \sum_{n=0}^{M-1} e(n)\,e^{-j\omega n}$$
             The upper bound is reached if all the errors have the same sign and have the maximum value in the range. If
     we consider e(n) to be statistically independent random variables, then a more realistic bound is given by the standard
     deviation of E(ω).
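The size of E(ω) can be examined for a concrete set of coefficients. The sketch below quantizes a hypothetical 7-tap low-pass filter to 6 fractional bits and evaluates |E(ω)| on a frequency grid. The tap values are made up for illustration, and the worst-case figure M·δ/2 used for comparison is an assumption that follows from each |e(n)| ≤ δ/2 under rounding.

```python
import cmath
from math import pi


def quantize(h, bits):
    """Round each coefficient to the nearest multiple of 2^-bits."""
    step = 2.0 ** -bits
    return [round(v / step) * step for v in h]


def freq_response(c, omega):
    """Evaluate the sum over n of c(n) * exp(-j * omega * n)."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(c))


h = [0.0213, 0.1127, 0.2361, 0.2998, 0.2361, 0.1127, 0.0213]  # hypothetical taps
bits = 6
h_q = quantize(h, bits)
e = [hq - hv for hq, hv in zip(h_q, h)]       # coefficient errors e(n)

omegas = [pi * k / 256 for k in range(257)]   # grid over 0..pi
max_err = max(abs(freq_response(e, w)) for w in omegas)
worst_case = len(h) * 2.0 ** -bits / 2        # assumed bound M * delta / 2

print(max_err, worst_case)
```

The measured maximum stays below the worst-case figure, as expected when the individual errors differ in sign and magnitude.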
                                               8.6 DEADBAND EFFECTS
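     Deadband and Deadband of First Order Filter: The dead band is the range of output amplitudes over which limit
     cycle oscillations take place.
     Dead band of first order filter
     Consider the first order filter,
$$y(n) = \alpha y(n-1) + x(n)$$
     Here αy(n − 1) is the product term. After rounding it to b bits we get,
$$y(n) = Q[\alpha y(n-1)] + x(n)$$
     With zero input, a limit cycle holds the output magnitude constant, so that
$$Q[\alpha y(n-1)] = \pm y(n-1) \tag{1}$$
     The error due to rounding is at most δ/2. Hence,
$$\left|Q[\alpha y(n-1)] - \alpha y(n-1)\right| \le \frac{\delta}{2}$$
     From equation (1), the above equation can be written as,
$$\left|\pm y(n-1) - \alpha y(n-1)\right| \le \frac{\delta}{2}$$
$$\therefore\ |y(n-1)|\,[1 - |\alpha|] \le \frac{\delta}{2}$$
$$\therefore\ |y(n-1)| \le \frac{\delta/2}{1 - |\alpha|}$$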