Introduction to Digital Signal Processing
Introduction
 DSP is a technique of performing the mathematical operations on the
  signals in digital domain
 As real time signals are analog in nature we need to first convert the
  analog signal to digital, then we have to process the signal in digital
  domain and again converting back to analog domain
 Thus ADC is required at the input side whereas a DAC is required at
  the output end
 A typical DSP system is as shown in Fig1.1
                  Fig 1.1: A Typical DSP System
Need for DSP
Analog signal Processing has the following drawbacks:
 They are sensitive to environmental changes
 Aging
 Uncertain performance in production units
 Variation in performance of units
 Cost of the system will be high
 Scalability
Advantages and Disadvantages of DSP
 Greater accuracy
 Cheaper
 Flexibility in configuration, as reconfiguration is possible by
  changing the program, whereas an analog system reconfiguration
  would involve, redesigning the hardware
 Easy data storage on magnetic media without loss of fidelity
 Implementation of better algorithms for processing
 Adaptable to low frequency signal processing, where analog
  processors would require very large passive elements like inductors
  and capacitors
 Feasibility of time sharing of a signal processor among a number of
  signals
  However DSP suffers from few limitations:
 System complexity is increased because of the need to convert real-
  life analog signals to digital and the processed digital signal back to
  analog
 Limited bandwidth due to constraint on sampling rate
 Power consumption
Applications of DSP
 Consumer electronic appliances - TV, music synthesizers, stereos,
  recorders etc
 Image processing techniques like image compression, image
  enhancement, image analysis and recognition
 Process control
 Robotics
 Medical application like CT scans, X-ray scanning, MRI, ECG, EEG
  etc
 Speech processing
 Earth quake monitoring
 Communication application like modems, cellular phones, FAX etc
A Digital Signal Processing System
 A computer or a processor is used for digital signal processing
 Antialiasing filter is a LPF which passes signal with frequency less
  than or equal to half the sampling frequency in order to avoid
  Aliasing effect
 Similarly at the other end, reconstruction filter is used to reconstruct
  the samples from the staircase output of the DAC (Fig1.2)
             Fig 1.2 The Block Diagram of a DSP System
Signals that occur in a typical DSP are as shown in Fig 1.3.
Fig 1.3: (a) Continuous time signal (b) Sampled Signal
(c) Sampled Data Signal (d) Quantized Signal (e) DAC Output
The Sampling Process
 ADC process involves sampling the signal and then quantizing the
  same to a digital value
 In order to avoid Aliasing effect, the signal has to be sampled at a
  rate at least equal to the Nyquist rate
 The condition for Nyquist Criterion is as given below,
                            fs≥2 fmax
  where , fs is the sampling frequency
  fmax is the maximum frequency component in the message signal
 If the sampling of the signal is carried out with a rate less than the
  Nyquist rate, the higher frequency components of the signal cannot
  be reconstructed properly
Example:
  If fs=20kHz, then fmax should be ≤10kHz. If the input signal has a
  frequency components greater than fmax, then these components are
  converted into low frequency ones, resulting in a noise called
  aliasing noise
Problem 1: Determine the Nyquist rate for the analog signal
                   x(t)=3cos100t-6sin400t+4cos60t
Solution:
  The frequencies present in x(t) are
            f1=50Hz, f2=200Hz, f3=30Hz
  The max frequency component fmax=f2=200Hz. Hence as per
  sampling theorem Nyquist rate is fs=2fmax= 400Hz
Problem 2: The signal is sampled at a rate of 300Hz. After sampling
  the signal is reconstructed from the samples. Determine the
  frequency of the reconstructed analog signal. comment on the
  aliasing error
Solution: The signal in problem1 has frequencies 50Hz, 200Hz, 30Hz
  and is sampled at 300Hz
  30Hz and 50Hz are less than fs/2 (i.e, 150Hz) and hence not affected
  by aliasing and hence are
  The 200Hz component of x(t) is above fs/2 and is changed by
  aliasing effect. In the reconstructed analog filter the 200Hz filter will
  be aliased into 200-fs=200-300=-100Hz (minus sign represents a
  phase shift 0f 1800)
  In summary the reconstructed analog signal has the frequency
  components 30Hz, 50Hz and 100Hz
Problem 3: Find the minimum sampling rate for y(t) such that no
  aliasing results. Find suitable antialiasing filter (type of cut-off
  frequency) if the sampling rate must be 6KHz
Solution:
  The maximum frequency component of y(t) is 5kHz, since beyond
  this the amplitude is very low. Hence the minimum sampling rate for
                   y(t)=2fmax =10kHz i.e, fs=10kHz
  Using a sampling rate of 6 kHz results in aliasing error which can be
  avoided using anti-aliasing filter
  For 6kHz sampling rate, the maximum frequency component in y(t)
  should be less than 6kHz/2=3kHz. Hence use a low pass filter with
  cut-off of 3kHz as anti-aliasing filter
Discrete Time Sequences
 Consider an analog signal x(t) given by,
                  x(t)= A cos (2ft)
 If this signal is sampled at a Sampling Interval T, in the above
  equation replacing t by nT we get,
          x(nT) = A cos (2fnT) where n= 0,1, 2,..etc
 For simplicity denote x (nT) as x (n)
          x(n) = A cos (2fnT) where n= 0,1, 2,..etc
 We have fs=1/T also θ = 2fnT
           x(n) = A cos (2fnT)= A cos (2fn/fs) = A cos θn
  The quantity θ is called as digital frequency
                  θ = 2fT = 2f/fs radians
                   Fig 1.5 A Cosine Waveform
 A sequence that repeats itself after every period N is called a
  periodic sequence
 Consider a periodic sequence x (n) with period N
           x(n)=x (n+N), n=……..,-1,0,1,2,……..
 Frequency response gives the frequency domain equivalent of a
  discrete time sequence
 It is denoted as X(e jθ), i.e,
 Frequency response of a discrete sequence involves both magnitude
  response and phase response
Problem 4: A signal x(t) has the frequency components of 50Hz,
  250Hz and 600Hz. It is sampled with 4kHz sampling rate. Determine
  the resulting digital frequencies (in radians)
Solution:
            Digital frequency θn=2fn/fs
                              θ1=2x50/4k=0.025rad
                              θ2=2x250/4k=0.125rad
                              θ3=2x600/4k=0.3rad
Problem 5: A signal is sampled with 8kHz sampling frequency.
  Determine the analog frequencies (in Hz and rad/s) for the digital
  frequencies 0, /4, /3, /2,  and 3/4 radians.
Solution:
Discrete Fourier Transform (DFT) and Fast Fourier Transform
  (FFT)
DFT Pair
 DFT is used to transform a time domain sequence x(n) to a frequency
  domain sequence X(k)
 The equations that relate the time domain sequence x(n) and the
  corresponding frequency domain sequence X(k) are called DFT Pair
  and is given by,
DFT(FFT):
                                                where (k=0,1,.......,N-
  1)
IDFT(IFFT):
The Relationship between DFT and Frequency Response
 We have
  Also
                                        , k = 0, 1, 2,...(N-1)
 From the above expression it is clear that we can use DFT to find the
  Frequency response of a discrete signal
 Spacing between the elements of X(k) is given as
                  f=fs/N=1/NT=1/T0
  Where T0 is the signal record length
 It is clear from the expression of f that, in order to minimize the
  spacing between the samples N has to be a large value
 Although DFT is an efficient technique of obtaining the frequency
  response of a sequence, it requires more number of complex
  operations like additions and multiplications
 Thus many improvements over DFT were proposed
 One such technique is to use the periodicity property of the twiddle
  factor e-j2/N
 Those algorithms were called as Fast Fourier Transform
  Algorithms
 The following table depicts the complexity involved in the
  computation using DFT algorithms
      Table 1.1 Complexity in DFT algorithm
       Operations         Number of Computations
Complex Multiplications             N2
Complex Additions                N (N-1)
Real Multiplications               4N2
Real Additions                  2N (2N-1)
Trigonometric Functions            2N2
  FFT algorithms are classified into two categories viz
  1. Decimation in Time FFT
  2. Decimation in Frequency FFT
 In decimation in time FFT the sequence is divided in time domain
  successively till we reach the sequences of length 2
 Whereas in Decimation in Frequency FFT, the sequence X(k) is
  divided successively
 The complexity of computation will get reduced considerably in case
  of FFT algorithms
Problem 6: Compute the 5-point DFT of the sequence x[n]=[1 2 4] and
  hence plot its magnitude and phase spectra
Solution: x[n]=[1 2 4] is a 3-point sequence and hence 5-point DFT
  can be applied
  The 5-point DFT is computed as
  where k=0, 1, 2, 3, 4 and N=5
                    x[0]=1; x[1]=2; x[2]=4
  X[k]=x(0)e-j20k/N+ x(1)e-j2k/N+ x(2)e-j22k/N
           =1+2xe-j2k/5+4x e-j4k/5
  For k=0, X(0)=7
Similarly,
     X(1)=4.55-1.93
     X(2)=2.71.3399
     X(3)=2.7-1.3399
     X(4)=4.551.93
Magnitude of X(k)=X(k)=7,4.55,2.7,2.7,4.55
Phase of X(k)=X(k)= 0,-1.93,1.3399,-1.3399,1.93
Problem 7: A signal x(t) is filtered and sampled using a sampling rate
  of 8kHz. If 1024 samples of this signal are used to compute the
  discrete Fourier transform X(k), find the frequency spacing between
  adjacent X(k) elements. What is the analog frequency corresponding
  to k=12, 30, 64, 120, 200, 320, 400
Solution: The frequency spacing,
              f=fs/N=8KHz/1024=7.8125Hz
  Analog frequency corresponding to k=12 is
              fxk=7.8125x12=93.75 Hz
  Similarly the analog frequencies corresponding to k=30, 64, 120,
  200, 320 and 400 are 234.375Hz, 500Hz, 937.5Hz, 1562.5Hz,
  2500Hz and 3125Hz respectively
Fast Fourier Transform (FFT)
 The direct computation of DFT and IDFT requires a large number of
  complex multiplies
 A number of algorithms have been developed to efficiently compute
  DFT and IDFT
 These algorithms use power of 2 points and exploit the periodic
  nature of the complex exponential ej2nk/N occurring in the DFT and
  IDFT equations
 Table below compares the complex multiples needed to compute
  DFT directly by using an FFT algorithm called radix-2 algorithm
 The radix-2 algorithm uses N that is an integer power of 2, such as 2,
  4, 8, 16, etc
 It is possible to show that the DFT requires N2 complex multiples
   and radix-2 FFT algorithm requires N/2log 2N
 This produces computational savings for larger value of N
             Table: Computations complexity of FFT algorithm
 N      Direct          FFT computation    Amount of saving   Ratio of computational
point   computation     (radix2)N/2log2N   of complex         complexity direct:FFT
                N2                         multiplications    N2/(N/2)log2N=2N2/log2N
   2             4               1                    3                     4
   4             16              4                   12                     4
   8             64             12                   52                   5.33
  16            256             32                  244                     8
  32           1024             80                  944                   12.8
  64           4096            192                 3904                   21.8
 256          65536           1024                64512                    64
1024      1048576106         5120                  106                  204.8
Linear Time Invariant Systems
 A system which satisfies superposition theorem is called as a linear
  system and a system that has same input output relation at all times
  is called a Time Invariant System
 Systems, which satisfy both the properties, are called LTI systems
 An LTI System
 LTI systems are characterized by its impulse response or unit sample
  response in time domain whereas it is characterized by the system
  function in frequency domain
Convolution
 Convolution is the operation that related the input output of an LTI
  system, to its unit sample response
 The output of the system y(n) for the input x(n) and the impulse
  response of the system being h (n) is given as
  where, x(n) is the input of the system
            h(n) is the impulse response of the system
            y(n) is the output of the system
Problem 8: Find the convolution of the sequence x(n)=[1 2 3 4] and
  h(n)=[3 2 1]
Solution:
                 y(n)= x(n) * h(n)=[3 8 14 20 11 4]
Problem 9: Compute and plot the sequence x(n) * h(n), where x[n]= [3
  4 2 11 0 7 -1 10] and h=[1.2 3.7 4.8]. Determine the length of the
  computed sequence
Solution:
                 y(n)= x(n) * h(n)
  y(n)= 3.6000    15.9000   31.6000   39.8000    50.3000   61.2000
  24.7000 41.9000 32.2000 48.0000
 Length of the computed sequence y(n)=lenx(n)+lenh(n)=8+3-1=10
                                      plot of sequence x(n) * h(n)
Matlab code is:
 x=[3 4 2 11 0 7 -1 10];
 h=[1.2 3.7 4.8];
 y=conv(x,h);
 figure(1);
 stem(y);
 l=length(y);
Problem 10: Find the sequence X(k) and H(k) using FFT. Next
  multiply two sequences to generate the sequence Y(k)=X(k).H(k).
  Use IFFT to compute y(n)
Solution :
                 x=[3 4 2 11 0 7 -1 10];
                 h=[1.2 3.7 4.8];
                 xlen=length(x);
                 hlen=length(h);
                 ylen=xlen+hlen-1;
                 N=ylen;
                 Xk=fft(x,N);
                  Hk=fft(h,N);
                  Yk=Xk.*Hk;
                  y=ifft(Yk)
                  figure(1);
                  stem(y);
 X(k)= 36.0000,-5.8262 - 5.7921i, -7.6803 - 3.4410i,    9.8262 -
  2.9919i, 14.6803 - 0.8123i, -28.0000, 14.6803 + 0.8123i, 9.8262 +
  2.9919i, -7.6803 + 3.4410i, -5.8262 + 5.7921i 
 H(k)= 9.7000,5.6766 - 6.7399i, -1.5399 - 6.3403i,-3.8266 -
  0.6975i,-0.3101 + 2.3903i, 2.3000, -0.3101 - 2.3903i,-3.8266 +
  0.6975i,-1.5399 + 6.3403i, 5.6766 + 6.7399i
 Y(k)= 102x3.4920,-0.7211 + 0.0639i, -0.0999 + 0.5399i,-0.3969 +
  0.0459i, -0.0261 + 0.3534i,-0.6440, -0.0261 - 0.3534i, -0.3969 -
  0.0459i,-0.0999 - 0.5399i, -0.7211 - 0.0639i 
 y={ 3.6000 , 15.9000,     31.6000,     39.8000,   50.3000, 61.2000 ,
  24.7000 , 41.9000, 32.2000, 48.0000
              The result is same as the first problem
Problem 11: For the above problem use 8-point DFT and IDFT.
  Compare these results with that obtained in above example
Solution:
                 x=[3 4 2 11 0 7 -1 10];
                 h=[1.2 3.7 4.8];
                 Xk=fft(x,8);
                 Hk=fft(h,8);
                 Yk=Xk.*Hk;
                 y=ifft(Yk)
 y ={   35.8000 63.9000 31.6000 39.8000 50.3000 61.2000
  24.7000 41.9000}
 The last two samples of y(n)=conv(x,h) i.e, 32.8 and 48 are added
  with first 2 samples 3.6 and 15.9 respectively to result 35.8 and 63.9
  of circular convolution output
Z Transformation
 Z Transformations are used to find the frequency response of the
  system
 The Z Transform for a discrete sequence x(n) is given by,
The System Function
 An LTI system is characterized by its System function or the
  transfer function
 The system function of a system is the ratio of the Z transformation
  of its output to that of its input
 It is denoted as H(z) and is given by
                             H(z) = Y(z)/ X(z)
 The magnitude and phase of the transfer function H(z) gives the
  frequency response of the system
 From the transfer function we can also get the poles and zeros of the
  system by solving its numerator and denominator respectively
Digital Filters
 Filters are used to remove the unwanted components in the sequence
 They are characterized by the impulse response h(n)
 The general difference equation for an Nth order filter is given by,
 A typical digital filter structure is as shown in figure below
                    Structure of a Digital Filter
 Values of the filter coefficients vary with respect to the type of the
  filter
 Design of a digital filter involves determining the filter coefficients
 Based on the length of the impulse response, digital filters are
  classified into two categories viz Finite Impulse Response (FIR)
  Filters and Infinite Impulse Response (IIR) Filters
FIR Filters
 FIR filters have impulse responses of finite lengths
 In FIR filters the present output depends only on the past and present
  values of the input sequence but not on the previous output
  sequences
 Thus they are non recursive hence they are inherently stable
 FIR filters possess linear phase response and hence they are very
  much applicable for the applications requiring linear phase response
 The difference equation of an FIR filter is represented as
 The frequency response of an FIR filter is given as
 Also
 The major drawback of FIR filters is, they require more number of
  filter coefficients to realize a desired response as compared to IIR
  filters
 Thus the computational time required will also be more
Problem 12: Find the magnitude and phase response of an FIR filter
  represented by the difference equation
                  y(n)= 0.5 x(n) + 0.5 x(n-1)
Solution:
                  As, y(n)= 0.5 x(n) + 0.5 x(n-1)
                  h(n)= 0.5 δ(n) + 0.5 δ(n-1) = [0.5 0.5]
                  H(z)= 0.5+0.5z-1
                  H(ejθ)= 0.5+0.5 e -jθ
                         = 0.5+0.5 cosθ -j0.5 sinθ
                         =0.5 (1+ cosθ) -j0.5 sinθ
                  = [0.5*2* cos2 (θ/2)]-j[0.5*2* sin (θ/2)* cos (θ/2)]
                  = cos2 (θ/2) -j[sin (θ/2)* cos (θ/2)]
mag (H(ejθ)) = sqrt (cos4 (θ/2) +sin2 (θ/2) cos2 (θ/2))
                 = sqrt [cos2 (θ/2)( cos2 (θ/2)+ sin2 (θ/2))]
                 = cos (θ/2)
Similarly,
Phase (H(ejθ)) = tan –1[-(sin (θ/2) cos (θ/2))/ cos2 (θ/2)]
                 = tan –1[-tan (θ/2)]
                 = - (θ/2)
The magnitude and phase response curves of the designed FIR filter
are as shown in figure below
FIR Filter Design
 Frequency response of an FIR filter is given by the following
  expression,
 Design procedure of an FIR filter involves the determination of the
  filter coefficients bk
IIR Filters
 Unlike FIR filters, IIR filters have infinite number of impulse response
  samples
 They are recursive filters as the output depends not only on the past
  and present inputs but also on the past outputs
 They generally do not have linear phase characteristics
 Typical system function of such filters is given by,
 Stability of IIR filters depends on the number and the values of the
  filter coefficients
 The major advantage of IIR filters over FIR is that, they require lesser
  coefficients compared to FIR filters for the same desired response,
  thus requiring less computation time
Problem 13: Obtain the transfer function of the IIR filter whose
  difference equation is given by, y(n)= 0.9y(n-1)+0.1x(n)
Solution:
                          y(n)= 0.9y(n-1)+0.1x(n)
  Taking Z transformation both sides
                          Y(z)= 0.9 z-1 Y(z) + 0.1 X(z)
                          Y(z) [ 1- 0.9 z-1] = 0.1 X(z)
  The transfer function of the system is given by the expression,
                          H(z)= Y(z)/X(z)
                                = 0.1/ [ 1- 0.9 z-1]
 Realization of the IIR filter with the above difference equation is as
  shown in figure below
                             IIR Filter Structure
IIR Filter Design
 IIR filters can be designed using two methods viz using analog filter
  design techniques and direct method
 In this approach, a digital filter can be designed based on its
  equivalent analog filter
 An analog filter is designed first for the equivalent analog
  specifications for the given digital specifications
 Then using appropriate frequency transformations, a digital filter
  can be obtained
 The filter specifications consist of passband and stopband ripples in
  dB and Passband and Stopband frequencies in rad/sec
                     Low pass Filter Specifications
 Direct IIR filter design methods are based on least squares fit to a
  desired frequency response
 These methods allow arbitrary frequency response specifications
Decimation and Interpolation
 Decimation and Interpolation are two techniques used to alter the
  sampling rate of a sequence
 Decimation involves decreasing the sampling rate without
  violating the sampling theorem
 Interpolation   increases the sampling rate        of a sequence
  appropriately by considering its neighbouring samples
Decimation
 Decimation is a process of dropping the samples without violating
  sampling theorem
 The factor by which the signal is decimated is called as decimation
  factor and it is denoted by M
 It is given by,
  where
                      Decimation Process
Problem 14: Let x(n)=[3 2 2 4 1 0 –3 –2 –1 0 2 3] be decimated with a
  factor of 2. Let the filtered sequence be w(n)=[2.1 2 3.9 1.5 0.1 –2.9 –2 –
  1.1 0.1 1.9 2.9]. Obtain the decimated sequence y(m)
 Sequence y(m) can be obtained by dropping every alternative sample of
  w (n)
                          y(m) = [2 1.5 -2.9 -1.1 1.9]
Problem 15: Let x(n)=[1 0 2 4.1 5 6 7 3 5.2] and filter sequence h(n)=[0.2
  1.2 -0.3 0.4] be decimated by a factor of 3. Obtain the decimated
  sequence
Solution:
 The output sequence after filter is w(n)=[0.2 1.2 0.1 3.62 5.2 6.77 8.74
  9.2 4.94 8.14 -0.36 2.08]
                  The decimated sequence is y(m)=[0.2 3.62 8.74 8.14]
Interpolation
 Interpolation is a process of increasing the sampling rate by
  inserting new samples in between
 The input output relation for the interpolation, where the sampling
  rate is increased by a factor L, is given as,
  where w(n)= x(m/L), m=0, ±L, ±2L……
                   0 for other values of m
                      Interpolation Process
Problem 16: Let x(n)= [0 3 6 9 12] be interpolated with L=3. If the
  filter coefficients of the filters are bk=[1/3 2/3 1 2/3 1/3], obtain the
  interpolated sequence
Solution:
  After inserting zeros,
                           w(m) = [0 0 0 3 0 0 6 0 0 9 0 0 12]
                           bk=[1/3 2/3
                                    ↑ 1 2/3 1/3]
  We have,
            y(m)= Σ bk w(m-k) = b-2 w(m+2)+ b-1 w(m+1)+ b0 w(m)+ b1
                   w(m-1)+ b2 w(m-2)
Substituting the values of m, we get
y(0)= b-2 w(2)+ b-1 w(1)+ b0 w(0)+ b1 w(-1)+ b2 w(-2)= 0
y(1)= b-2 w(3)+ b-1 w(2)+ b0 w(1)+ b1 w(0)+ b2 w(-1)=1
y(2)= b-2 w(4)+ b-1 w(3)+ b0 w(2)+ b1 w(1)+ b2 w(0)=2
Similarly we get the remaining samples as,
                y(n) = [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
 Computational Accuracy in DSP Implementations
Introduction
 The issues related to computational accuracy of algorithms when
  implemented using programmable digital signal processors are
  discussed
 Various formats of number representation and their effect on the
  dynamic range and precision of signals represented using these
  formats are discussed
 The various sources of errors in the implementation of     DSP
  algorithms and how to control these errors while designing DSP
  systems are discussed
Following topics to be discussed
1. Number formats for signals and coefficients in DSP systems
2. Dynamic range and precision
3. Sources of error in DSP implementations
4. A/D conversion errors
5. DSP computational errors
6. D/A conversion errors
Number formats for signals and coefficients in DSP systems
 In a digital signal processor, as in any other digital system, signals
  are represented as numbers right from the input, through different
  stages of processing, to the input
 The DSP structures, such as filters, also require numbers to specify
  coefficients
 There are various ways of representing these numbers, depending on
  the range and precision of signals and coefficients to be represented,
  hardware complexity, and speed requirements
 The typical formats used for numbers to represent signals and
  coefficients in DSP systems
 In a DSP, signals are represented as discrete sets of numbers from the
  input stage along through intermediate processing stages to the
  output
 Even DSP structures such as filters require numbers to specify
  coefficients for operation
 Two typical formats for these numbers are :
1. Fixed-point format
2. Floating-point format
Fixed point format
 The simplest scheme of number representation is the format in which
  the number is represented as an integer or fraction using a fixed
  number of bits
 An n-bit fixed-point signed integer −2n-1 ≤ x ≤ 2n-1 −1 is represented
  as:
        x = -s. 2n-1 + bn-2 .2n-2 + bn-3 .2n-3 + … + b1 .21 + b0 .20 ---- (1)
  where s represents the sign of the number (s = 0 for positive and s =
  1 for negative)
 Fig 1.12 (a) Fixed point format to represent signed integers
 Fig 1.12 (b) Fixed point format to represent signed fractions
  Example 1: What is the range of numbers that can be
  represented in a fixed point format using 16 bits if the numbers
  are treated as (a) signed integers (b) signed fractions?
  Solution: (a). Using 16 bits, the range of integers that can be
  represented is determined by substituting n=16 in equation 1 and is
  given as
                            -215 to +215 -1
                       i.e, -32,768 to +32,767
(b). The range of fractions, as determined from equation 1 using n=16,
  is given as
                            -1 to +(1-2-15)
                        i.e, -1 to +.999969482
 In DSP implementations, multiplication of integers produces
  numbers that may require more bits to represent, and in the event of
  a fixed number of available bits, it may create wraparound
 The wraparound generates the most negative number after the most
  positive number, and vice versa
 The problem can be tackled by using fractional representation
 When two fractions are multiplied, the result is still a fraction
 The resulting fraction may use the same number of bits as the
  original fractions by discarding the less significant bits
Double Precision Fixed point format
 To increase the range of numbers that can be represented in fixed
  point format, one obvious approach is to increase its size
 If the size is doubled, the range of numbers increases substantially
 Simply doubling the size and still using the fixed point format
  creates a double precision fixed point format
 But such a format requires double the storage for the same data and
  may need double the number of accesses for the same size of data
  bus of the DSP device
Floating point format
 For DSP applications, if an algorithm involves summation of a large
  number of products, it requires a large number of bits to represent the
  signal to allow for adequate signal growth over the summation
 However, since a processor architecture will not allow for an
  unlimited number of bits, some processor choose a floating point
  format for signal processing computations
 A floating point number is made up of a mantissa M x and an
  exponent Ex such that its value x is represented as
                               x= Mx2Ex
 If two floating point numbers x and y are multiplied, the product xy
  is given by
                            xy= MxMy2Ex+ Ey
 Implementation of a floating point multiplier must contain a
  multiplier for the mantissa and an adder for the exponent
 An addition of floating point numbers requires normalization of the
  numbers to be added so that they have the same components
 A commonly used single-precision floating-point representation is
  the IEEE 754-1985 format given as:
              IEEE-754 format for floating point numbers
 F represents the magnitude fraction of the mantissa, and the exponent
  E is an integer
 Further in determining the mantissa, an implied 1 is placed
  immediately before the binary point of the fraction
 The sign bit provides the sign of the fractional part of the number
 That is to say, with n bits for F, the range of fractional numbers that
  can be represented in the mantissa is –(2-2 -n) to +(2-2-n)
 The bias depends upon the bits reserved for the component
 In Fig, the bias is 127, the largest positive number represented by 8
  bits
 The value of E can be from 0 to 255
 In double precision representation, the exponent uses 11 bits, making
  the bias value as 1023
  Example 2: Find the decimal equivalent of the floating point
  binary number 1011000011100. Assume a format similar to
  IEEE-754 in which the MSB is the sign bit followed by 4
  exponent bits followed by 8 bits for the fractional part
  Solution: The number is negative, as the sign bit is 1
                       F= 2-4+2-5+2-6 = 0.109375
                              E=21+22=6
                    Thus the value of the number is
                   x= -1.109375x2(6-7) = -0.5546875
 Floating point format, used to increase the range of numbers that can
  be represented, suffers from the problem of speed reduction for DSP
  computation
 More steps are required to complete a floating point computation
  compared to fixed point computation
  Dynamic range and Precision
 The dynamic range of a signal is the ratio of the maximum value to
  the minimum value that the signal can take in the given number
  representation scheme
 The dynamic range of a signal is proportional to the number of bits
  used to represent it and increases by 6dB for every additional bit
  used for the representation
 The number of bits used to represent a signal also determines the
  resolution or the precision with which the signal can be represented
 However, the time taken for certain operations such as the A/D
  conversion increases with the increase in the number of bits
 Resolution is the minimum value that can be represented using a
  number representation format
 For instance, if N bits are used to represent a number from 0 to 1, the
  smallest value it can take is the resolution and is given as
                     Resolution = 1/2N for large N
 Resolution of a number representation format is normally expressed
  as number of bits used in the representation
 At times, it is also expressed as a percentage
 Precision is an issue related to the speed of DSP implementation
 In general, techniques to improve the precision of an implementation
  reduce its speed
 Larger word size improves the precision but may pose a problem
  with the speed of the processor, especially if its bus width is limited
 For example, if the 32 bit product of a 16x16 multiplication has to be
  preserved without loss of precision, two memory accesses are
  required to store and recall the product using a 16 bit bus
  Example 3: Calculate the dynamic range and precision of each
  of the following number representation formats
1. 24 bit, single precision, fixed point format
2. 48 bit, double precision, fixed point format
3. A floating point format with a 16 bit mantissa and an 8 bit
   exponent
  Solution: a. Since each bit gives a dynamic range of 6dB, the total
  dynamic range is 24x6 =144dB
             Percentage resolution is (1/224)x100 = 6x10-6
  b. Since each bit gives a dynamic range of 6dB, the total dynamic
  range is 48x6 =288dB
            Percentage resolution is (1/248)x100 = 4x10-13
 For floating point representation, the dynamic range is determined by
  the number of bits in the exponent. Since there are 8 exponent bits, the
  dynamic range is (28-1)x6= 1530dB
 The percentage resolution depends on the number of bits in the
  mantissa. Since there are 16 bits in the mantissa, the resolution is
                        (1/216)x100 = 1.5x10-3 %
Sources of error in DSP implementations
 A DSP system consists of an A/D converter, a DSP device, and a D/A
  converter
 The accuracy of a DSP implementation depends upon a number of
  factors contributed by the A/D and D/A conversions and how the
  calculations are performed in the DSP device
 The error in the A/D and D/A in the representation of analog signals
  by a limited number of bits is called the quantization error
 The quantization error decreases with the increase in the number of
  bits used to represent signals in A/D and D/A converters
 The errors in the DSP calculations are due to the limited word length
  used
 These errors depend upon how the algorithm is implemented in a
  given DSP architecture
 These errors can be reduced by using a larger word length for data
  and by using rounding, instead of truncation, in calculations
  A/D Conversion Errors
 Consider an A/D converter, shown in Fig, with b bits used to
  represent an unsigned signal value
 Its digital representation is of the form …xxx….x, where there are b
  bits after the assumed binary point
 In this kind of binary representation, the value of the least significant
  bit is given by
                                  Δ=2-b
 The maximum error due to quantization depends on b
 The quantization error for a given conversion as shown in the model
  of Fig is given by
                                ε = xq – x
  where x is the input and xq is the quantized output
 This error is called the truncation error if the signal value above the
  largest integral multiple of Δ is simply dropped
 It is called the rounding error if the value is rounded to the nearest
  integral multiple of Δ. This way the rounding limits the error to ±Δ/2
 The signal to noise ratio (SNR) is a measure that is used to evaluate
  the performance of the A/D converter
  DSP Computational Errors
 The DSP computations involve using the digitized signal values and
  DSP structures represented by coefficients
 These numbers are typically represented in the signed fractional 2’s
  complement form
 The computation almost always involve multiplications or multiply
  and accumulate (MAC) operations
  D/A Conversion Errors
 A source of error in a D/A converter is due to the fact that, typically,
  a D/A converter uses fewer bits in conversion than the number of bits
  required by the computed result, produced by the DSP device
 This is equivalent to the truncation or the rounding off error in the
  A/D converter and can be handled