x_k[n] = \sum_{l=0}^{N-1} x[n-l] h_k[l].   (1)
The convolution sum above represents the analysis stage of a FIR filter bank. The K bandpass filters can be designed in MATLAB using the fir1 function, and the coefficients of the designed filters are stored in DSP memory. After the subband processors are computed, i.e., y_k[n] = g_k{x_k[n]}, the synthesis stage of a FIR filter bank simply sums the subband output signals y_k[n] to yield the output signal y[n] according to

y[n] = \sum_{k=0}^{K-1} y_k[n].   (2)
The structure of an analysis-synthesis FIR filter bank is illustrated in figure 3. Note that we have no decimation in this filter bank structure.

Figure 3: Analysis-synthesis FIR filter bank structure. Note that the subband-selective analysis bandpass filters h_k[n] are FIR filters.
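As a rough MATLAB illustration of equations (1) and (2), the following sketch builds a small FIR analysis-synthesis bank with fir1. The number of bands, the filter length, the band edges and the test signal are example choices only and are not prescribed by the project text (they do not give perfect reconstruction).

% Minimal sketch of equations (1)-(2); all numeric values are examples.
K  = 8;            % number of subbands
N  = 65;           % FIR filter length in taps (even order for fir1)
fs = 8000;         % sampling frequency in Hz
x  = randn(fs,1);  % one second of white-noise test input

edges = linspace(0, 1, K+1);                 % normalized band edges
h = zeros(K, N);
h(1,:) = fir1(N-1, edges(2));                % first band: low-pass
h(K,:) = fir1(N-1, edges(K), 'high');        % last band: high-pass
for k = 2:K-1
    h(k,:) = fir1(N-1, [edges(k) edges(k+1)]);   % bandpass bands
end

g = ones(K,1);     % trivial subband processors, y_k[n] = g_k*x_k[n]
y = zeros(size(x));
for k = 1:K
    xk = filter(h(k,:), 1, x);   % analysis stage, equation (1)
    y  = y + g(k)*xk;            % weighting and synthesis sum, equation (2)
end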
2.2 IIR filter bank
An IIR filter bank is very similar to a FIR filter bank in that it also uses a bank of bandpass filters in the analysis stage. However, the bandpass filters are now IIR filters described by the rational transfer function
H_k[z] = \frac{B_k[z]}{A_k[z]} = \frac{b_0 + b_1 z^{-1} + \ldots + b_M z^{-M}}{1 + a_1 z^{-1} + \ldots + a_P z^{-P}},   (3)
where b_m are FIR coefficients and a_p are IIR coefficients. One example of an IIR filter bank using eight second-order sections is illustrated in figure 4. Each subband signal x_k[n] is computed using the difference equation

x_k[n] = -\sum_{p=1}^{P} a_p x_k[n-p] + \sum_{m=0}^{M} b_m x[n-m].   (4)
The design of the K IIR bandpass filters can be done in MATLAB, and the a_p and b_m polynomial coefficients of the designed filters are stored in DSP memory. After the subband processors are computed, i.e., y_k[n] = g_k{x_k[n]}, the synthesis stage of an IIR filter bank simply sums the subband output signals y_k[n] to yield the output signal y[n] according to

y[n] = \sum_{k=0}^{K-1} y_k[n].   (5)
The structure of an analysis-synthesis IIR filter bank is illustrated in figure 5. Note that we have no decimation in this filter bank structure.
Figure 4: Example of an IIR filter bank using eight bands, shown as 10 log|H_k[f]|^2 versus normalized frequency f. The IIR bandpass filters are implemented, in this example, using second-order sections of the form H_k[z] = \frac{1 - z^{-2}}{1 - 2r\cos(\theta_k) z^{-1} + r^2 z^{-2}}, where r and \theta_k determine the -3 dB bandwidth and center frequency of each bandpass filter, respectively.
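A hedged MATLAB sketch of such an eight-band second-order-section bank is given below; the values of r and theta_k are illustrative assumptions, not the project's prescribed design.

% Sketch of an eight-band IIR bank of second-order sections (example values).
K = 8; r = 0.95;
theta = pi*((0:K-1) + 0.5)/K;           % assumed center angles spread over (0, pi)
x = randn(8000,1);                      % test input
y = zeros(size(x));
for k = 1:K
    Bk = [1, 0, -1];                    % numerator 1 - z^-2
    Ak = [1, -2*r*cos(theta(k)), r^2];  % denominator 1 - 2r cos(theta_k) z^-1 + r^2 z^-2
    xk = filter(Bk, Ak, x);             % analysis, equation (4)
    y  = y + xk;                        % synthesis with g_k = 1, equation (5)
end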
2.3 WOLA filter bank
The Weighted OverLap-Add (WOLA) filter bank is an efficient filter bank technique frequently used in DSPs, see, e.g., [2, 3]. Four variables, together with an analysis window function w[n], define the WOLA filter bank, namely: L, the length of the analysis window; D, the decimation rate (block rate); K, the number of subbands; and D_F, the synthesis window decimation rate.
The analysis stage, see figure 6, accepts a block of D new input data samples. Each new block is fed into an input FIFO buffer u[n] of length L samples. The data in the input FIFO is element-wise weighted by the analysis window function and stored in a temporary buffer t_1[n] = u[n] \cdot w[n], of length L samples. The temporary buffer is time-folded into another temporary vector t_2[n], of length K samples. The time-folding means that the elements of t_1[n] are modulo-K-added into t_2[n], according to

t_2[n] = \sum_{m=0}^{L/K-1} t_1[n + mK].   (6)

The temporary buffer t_2[n] is circularly shifted by K/2 samples in order to produce a zero-phase signal for the FFT. This means that the upper half of t_2[n] swaps place with the lower half of t_2[n]. The circularly shifted buffer t_2[n] is then fed into a K-point FFT to compute the subband signals x_k[n].
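A small numeric MATLAB illustration of the time-folding in equation (6) and the K/2 circular shift is given below; the sizes are assumed example values, and the steps mirror the reference code in appendix A.

L = 8; K = 4;                      % example analysis window length and subband count
t1 = (1:L)';                       % stand-in for the windowed input FIFO u[n].*w[n]
t2 = sum(reshape(t1, K, L/K), 2);  % modulo-K addition, equation (6): t2 = [1+5; 2+6; 3+7; 4+8]
t2 = [t2(K/2+1:end); t2(1:K/2)];   % circular shift by K/2 samples before the FFT
X  = fft(t2, K);                   % subband signals x_k[n] for this block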
The synthesis stage of the WOLA filter bank implements the actual weighted overlap-add procedure, i.e., the WOLA procedure. The synthesis stage, see figure 7, starts by applying a size-K IFFT to the processed subband signals y_k[n]. The IFFT output is circularly shifted by K/2 samples, to counteract the circular shift used in the analysis stage, and the circularly shifted data is stored in a temporary buffer t_3[n], of size K samples. The buffer t_3[n] is then stacked, by repetition, in the buffer t_4[n] of length L/D_F, where L is the analysis window length and D_F is the synthesis window decimation factor. The buffer t_4[n] is weighted by a synthesis window function z[n] of size L/D_F, defined as z[n] = w[nD_F], i.e., a factor-D_F decimated analysis window function. The weighted data is summed with the data in the output FIFO, t_5[n] of length L/D_F, and the output FIFO data is overwritten with the summation result, i.e., t_5[n] ← t_5[n] + z[n] \cdot t_4[n]. The output FIFO is then shifted left by D samples, i.e., D zeros are filled in from the FIFO's rear, and the out-shifted data is the actual output data block, y[nD].
Figure 5: Analysis-synthesis IIR filter bank structure. Note that the subband-selective analysis bandpass filters H_k[z] are IIR filters.
The WOLA filter bank is popular due to its simplicity and because many DSP technologies have built-in support for circular buffers and FIFOs, which of course helps when implementing the WOLA filter bank. Also, DSPs usually have direct support for FFT/IFFT, which enables even more efficient implementations.
Due to the complexity of this filter bank structure, a reference MATLAB implementation of the WOLA filter bank is provided in appendix A.
2.4 Uniform FFT modulated filter bank
We use a set of K bandpass filters, H_k[z] for k = 0 ... K-1, with impulse response functions h_k[n], each of length N taps. These bandpass filters are created by modulating (frequency shifting) a low-pass prototype filter (which is equal to the first bandpass filter at DC frequency, i.e., h_0[n]), according to

h_k[n] = W_K^{-kn} h_0[n], for n = 0 ... N-1,   (7)

where W_K = e^{-j 2\pi/K}. The Z-transform of each modulated bandpass filter is
H_k[z] = \sum_{n=0}^{N-1} h_k[n] z^{-n} = \sum_{n=0}^{N-1} W_K^{-kn} h_0[n] z^{-n} = \sum_{n=0}^{N-1} h_0[n] (W_K^{k} z)^{-n} = H_0[W_K^{k} z].   (8)
So, it is clear that each bandpass filter H_k[z] = H_0[W_K^k z] is a frequency-shifted (modulated) version of the prototype filter H_0[z]. The term uniform in the section title comes from the fact that the filters are uniformly distributed along the frequency axis by the modulation process. There are also non-uniform modulation techniques, which are not described here.
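A minimal MATLAB sketch of the modulation in equation (7) is given below; the prototype design and the sizes are example assumptions.

K = 8; N = 64; k = 2;              % example number of bands, taps and band index
h0 = fir1(N-1, 1/K);               % example low-pass prototype filter
n  = 0:N-1;
hk = exp(1i*2*pi*k*n/K).*h0;       % h_k[n] = W_K^{-kn} h_0[n], with W_K = e^{-j2*pi/K}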
We shall now assume that the bandpass signals, i.e., the input signal filtered by each modulated bandpass filter, are subject to decimation by a factor D, where D = K/O, and O denotes the over-sampling ratio. A filter followed by a decimator can be implemented using a polyphase decomposition, see, e.g., [1]. The corresponding polyphase implementation is obtained by dividing the prototype filter H_0[z] into D polyphase components, denoted as P_d[z] with impulse response functions p_d[n] for d = 0 ... D-1, according to

H_0[z] = \sum_{d=0}^{D-1} z^{-d} P_d[z^D],   (9)
p_d[n] = h_0[d + nD],   (10)
P_d[z] = \sum_{n=0}^{N/D-1} p_d[n] z^{-n} = \sum_{n=0}^{N/D-1} h_0[d + nD] z^{-n}.   (11)
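The polyphase split in equations (9)-(11) can be illustrated with the following MATLAB fragment; the sizes mirror the example initialization in appendix B and are assumptions only.

K = 32; O = 4; D = K/O; P = 3;          % example sizes (as in appendix B)
h0 = sqrt(2/O)*fir1(D*P - 1, 1/K);      % low-pass prototype filter of length D*P
pd = reshape(h0, D, P);                 % row d+1 holds p_d[n] = h0(d + nD + 1), equation (10)
% e.g., pd(1,:) equals h0(1:D:end), the d = 0 polyphase component.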
Figure 6: Analysis stage of a WOLA filter bank structure (reprinted from [3]).
Hence, inserting (9) and (11) into (8) yields a polyphase representation of each modulated bandpass filter H_k[z] as follows:

H_k[z] = H_0[W_K^k z]
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} P_d[W_K^{kD} z^D]
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} \sum_{n} h_0[d+nD] (W_K^{kD} z^D)^{-n}
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} \sum_{n} h_0[d+nD] W_K^{-nkD} z^{-nD}
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} \sum_{n} h_0[d+nD] W_O^{-nk} z^{-nD},   (12)
where we used that W_K^{nkD} = W_O^{nk}. The factor W_O^{-nk} is used to introduce O groups of subband polyphase components, where each group is indexed by g = 0 ... O-1. Each group g contains the subbands with indices k = g + lO for l = 0 ... D-1. Considering only the indices k belonging to group g, i.e., k = g + lO, yields W_O^{-nk} = W_O^{-n(g+lO)} = (W_O^{-g})^n.
Figure 7: Synthesis stage of a WOLA filter bank structure (reprinted from [3]).
Each subband component of group g is then expressed as

H_{g+lO}[z] = \sum_{d=0}^{D-1} W_K^{-d(g+lO)} z^{-d} \sum_{n} h_0[d+nD] W_O^{-n(g+lO)} z^{-nD}
            = \sum_{d=0}^{D-1} W_D^{-dl} z^{-d} \sum_{n} \underbrace{W_K^{-dg} h_0[d+nD] (W_O^{-g})^n}_{= q_{g,d}[n]} z^{-nD}
            = \sum_{d=0}^{D-1} W_D^{-dl} z^{-d} Q_{g,d}[z^D]
            = \begin{bmatrix} W_D^{-0 \cdot l} & \cdots & W_D^{-(D-1)l} \end{bmatrix}
              \begin{bmatrix} z^{0} Q_{g,0}[z^D] \\ \vdots \\ z^{-(D-1)} Q_{g,D-1}[z^D] \end{bmatrix},   (13)
where we used the fact that W_K^{dlO} = W_D^{dl}. We denote the filters q_{g,d}[n] = W_K^{-dg} (W_O^{-g})^n h_0[d+nD] as the d-th modified polyphase components of group g, and they have the Z-transforms Q_{g,d}[z] = \sum_{n} q_{g,d}[n] z^{-n}.
We now stack all subbands related to the g-th group, i.e., the subbands with indices k = g + lO for l = 0 ... D-1, in a vector \mathbf{H}_g[z] according to

\mathbf{H}_g[z] = \begin{bmatrix} H_g[z] \\ H_{g+O}[z] \\ \vdots \\ H_{g+(D-1)O}[z] \end{bmatrix}
                = \begin{bmatrix} W_D^{-0 \cdot 0} & \cdots & W_D^{-(D-1) \cdot 0} \\ \vdots & \ddots & \vdots \\ W_D^{-0 \cdot (D-1)} & \cdots & W_D^{-(D-1)(D-1)} \end{bmatrix}
                  \begin{bmatrix} z^{0} Q_{g,0}[z^D] \\ \vdots \\ z^{-(D-1)} Q_{g,D-1}[z^D] \end{bmatrix}
                = D W_D^{-1} \begin{bmatrix} z^{0} Q_{g,0}[z^D] \\ \vdots \\ z^{-(D-1)} Q_{g,D-1}[z^D] \end{bmatrix},   (14)
where W_D is identified as the D-th order discrete Fourier transform (DFT) matrix, i.e., W_D^{-1} is the D-th order inverse discrete Fourier transform (IDFT) matrix. Note that the subband indices in each group are reordered and thus need to be rearranged into increasing order after each analysis step.
Note that the size of the DFT/IDFT operations is D, the decimation rate. This filter bank structure requires that a block of D new data samples is input to the filter bank for each update of the filter bank states. Hence, the sampling rate of the subband signals is F_S/D, where F_S is the sampling frequency of the fullband signal x[n]. If D is a power of two, e.g., D = 2^p for an integer p > 0, then the DFT/IDFT operation is efficiently implemented using the fast Fourier transform (FFT) and its inverse, the IFFT. Many DSPs have built-in support for FFT/IFFT, which may be exploited to yield an efficient implementation.
The analysis stage of this filter bank is illustrated in figure 8.
Figure 8: Uniform FFT modulated analysis filter bank.
So far, only the analysis filter bank has been discussed. The synthesis is derived in a similar fashion as the analysis, but its derivation is left as an exercise for the interested student. The synthesis stage of this filter bank is illustrated in figure 9. It may, however, be noted that the major difference between analysis and synthesis lies in the analysis filters q_{g,d}[n], which are replaced by r_{g,d}[n] in the synthesis.
Figure 9: Uniform FFT modulated synthesis filter bank.
These filters r_{g,d}[n] are likewise referred to as the d-th modified polyphase components of group g, but they are based on a different prototype filter.
It is also noted that the subband signals of the FFT modulated filter bank are complex-valued, which may require special attention in some implementations.
Due to the complexity of this filter bank structure, a reference MATLAB implementation of the uniform FFT modulated filter bank is provided in appendix B.
3 Speech Processing Methods
Each filter bank structure, see section 1.1, will be used as a foundation to implement eight different speech processing methods, described next. Hence, in total there are 4 × 8 = 32 unique projects in this course.
We have three groups of methods: noise suppression (sections 3.1, 3.2, 3.3), speech suppression (sections 3.4, 3.5, 3.6), and speech utility functions (sections 3.7 and 3.8). Noise suppression methods enhance speech and suppress noise; they can be used, for instance, to enhance speech in human communication systems such as VoIP or cellular telephony. Speech suppression methods, on the other hand, suppress speech signals in favor of noise, and can be used, for instance, in public address or karaoke systems, or as a support function to generate better noise references for other noise suppression systems. Speech utility functions are used to enhance parts of a signal related to speech.
In the following we shall assume that the received subband signals comprise a mixture of clean speech plus degrading noise, i.e.,

x_k[n] = s_k[n] + v_k[n],   (15)

where s_k[n] denotes clean subband speech and v_k[n] denotes subband noise.
3.1 Adaptive Gain Equalizer Noise Suppressor
The Adaptive Gain Equalizer (AGE) is a simple but effective method to reduce noise in speech signals [4]. The AGE uses the fact that speech can be considered non-stationary compared to the noise. Two energy, or magnitude, measures track the stationary background noise level and the varying speech-plus-noise level. The quotient between these two measures forms a gain function that amplifies non-stationary speech signals, hence the method's name, Adaptive Gain Equalizer. We shall add a minor adjustment to the original method described in [4], where the gain function is now

g_k[n] = \min\left( 1, \left( \frac{A_{x;k}[n]}{L_k \bar{A}_{x;k}[n]} \right)^{p_k} \right),   (16)

where L_k, A_{x;k}[n], \bar{A}_{x;k}[n] and p_k are defined in [4]. Since g_k[n] has only one filter tap, the subband output signal is computed using a multiplication, y_k[n] = g_k[n] x_k[n].
The measures A_{x;k}[n] and \bar{A}_{x;k}[n] can be implemented using recursive averages, see section D.5.
Evaluate the method's noise suppression performance using sound files from [5, 6].
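As a hedged illustration of equation (16), the following MATLAB sketch applies the gain per subband sample, assuming (one possible reading of [4]) that A_{x;k}[n] is a fast recursive magnitude average and \bar{A}_{x;k}[n] a slow, noise-floor-like average; all numeric constants and the stand-in subband signal are assumptions.

xk = randn(8000,1);                 % stand-in for one subband signal x_k[n]
alphaFast = 0.05; alphaSlow = 5e-4; % example learning parameters, see section D.5
Lk = 2; pk = 1;                     % example margin constant and exponent
Afast = 0; Aslow = 1e-6;            % recursive average states
yk = zeros(size(xk));
for n = 1:length(xk)
    Afast = (1-alphaFast)*Afast + alphaFast*abs(xk(n));  % fast average A_{x;k}[n]
    Aslow = (1-alphaSlow)*Aslow + alphaSlow*abs(xk(n));  % slow average, assumed noise floor
    gk    = min(1, (Afast/(Lk*Aslow))^pk);               % gain, equation (16)
    yk(n) = gk*xk(n);                                    % one-tap subband output
end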
3.2 Spectral Subtraction Noise Suppressor
The spectral subtraction method is frequently used as a speech enhancement method in industry and research today [7]. The spectral subtraction method assumes that the spectral properties of the noise do not change much from just before speech is active until the speech is active. Hence, if there is no speech present, the input signal is x[n] = v[n] and a spectrum of the background noise level, denoted as \hat{P}_{v;k}[n], is estimated. The spectrum of the input signal, with or without speech, is

P_{x;k}[n] = P_{s;k}[n] + P_{v;k}[n],   (17)
since we assume that noise and speech are uncorrelated, i.e., that P_{sv;k}[n] = 0. We may rewrite this relationship as

P_{s;k}[n] = P_{x;k}[n] - P_{v;k}[n].   (18)
Hence, the spectral subtraction method continuously subtracts the noise spectrum estimate from the input signal spectrum to acquire an approximate speech spectrum, i.e.,

\hat{P}_{s;k}[n] = \max\left( P_{x;k}[n] - \hat{P}_{v;k}[n], 0 \right),   (19)
where the max-function ensures a non-negative spectrum estimate. The enhanced speech signal is

y_k[n] = \left( \hat{P}_{s;k}[n] \right)^{1/p} e^{j \angle x_k[n]},   (20)

where \angle x_k[n] is the phase of the noisy subband signal and p determines whether the magnitude (p = 1) or power (p = 2) spectrum is estimated.
The spectral subtraction method requires a Voice Activity Detector (VAD) that detects whether speech is active in a frame. The VAD decision is used to trigger the background noise spectrum estimation, which should be conducted during speech-silence periods, i.e., when the VAD output is zero. One example implementation of a VAD is given in appendix C.
The power spectra can be estimated using recursive averages, see section D.5.
Evaluate the method's noise suppression performance using sound files from [5, 6].
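A hedged per-sample MATLAB sketch of equations (17)-(20) for one subband is given below; the stand-in signal, the VAD decisions d and the constants are assumptions, not values from the project text.

xk = randn(8000,1);                  % stand-in subband signal x_k[n]
d  = zeros(8000,1);                  % stand-in VAD decisions (0 = speech pause)
alpha = 0.01; p = 2;                 % example learning parameter and power-domain flag
Pv = 1e-6; Px = 1e-6;                % recursive spectrum estimates, see section D.5
yk = zeros(size(xk));
for n = 1:length(xk)
    Px = (1-alpha)*Px + alpha*abs(xk(n))^p;       % input spectrum estimate
    if d(n) == 0
        Pv = (1-alpha)*Pv + alpha*abs(xk(n))^p;   % noise spectrum, updated in speech pauses only
    end
    Ps = max(Px - Pv, 0);                         % equation (19)
    yk(n) = Ps^(1/p)*exp(1i*angle(xk(n)));        % equation (20)
end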
3.3 Wiener Filter Noise Suppressor
The Wiener filter provides an optimal estimate of a signal embedded in additive noise by minimizing the Mean Square Error (MSE),

E{|e_k[n]|^2} = E{|s_k[n] - g_k[n] x_k[n]|^2},   (21)
where E{·} is the expectation operator. In other words, the Wiener filter gives the optimal estimate of speech given a linear combination (filtering) of speech plus noise. The MSE is expanded as

E{|e_k[n]|^2} = E{|s_k[n]|^2} - 2 g_k[n] E{|s_k[n]|^2} + g_k[n]^2 E{|x_k[n]|^2},   (22)
since E{s_k[n] x_k[n]} = E{|s_k[n]|^2} + E{s_k[n] v_k[n]} = E{|s_k[n]|^2}. The minimum MSE is found where \partial E{|e_k[n]|^2} / \partial g_k[n] = 0, which renders

\partial E{|e_k[n]|^2} / \partial g_k[n] = -2 E{|s_k[n]|^2} + 2 g_k[n] E{|x_k[n]|^2} = 0,   (23)

g_k[n] = E{|s_k[n]|^2} / E{|x_k[n]|^2}.   (24)
Replacing true expectations with their estimates yields the Wiener solution

g_k[n] = \hat{P}_{s;k}[n] / \hat{P}_{x;k}[n],   (25)

where we may use the trick from spectral subtraction to obtain a clean-speech spectrum estimate, i.e., equation (19).
Note that we need a VAD to detect whether speech is active in a frame. The VAD decision is used to trigger the background noise spectrum estimation, which should be conducted during speech-silence periods, i.e., when the VAD output is zero. One example implementation of a VAD is given in appendix C.
The power spectra \hat{P}_{v;k}[n] and \hat{P}_{x;k}[n] can be estimated using recursive averages, see section D.5.
Evaluate the method's noise suppression performance using sound files from [5, 6].
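A hedged sketch of the Wiener gain in equation (25) is given below, reusing the spectral subtraction estimate of equation (19); the numeric values stand in for the recursive estimates of section D.5 at one time instant and are assumptions only.

Px = 2.0; Pv = 0.5; xkn = 0.3;    % assumed example values for one subband sample
Ps  = max(Px - Pv, 0);            % clean-speech spectrum estimate, equation (19)
gk  = Ps/Px;                      % Wiener gain, equation (25)
ykn = gk*xkn;                     % subband output y_k[n] = g_k[n] x_k[n]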
3.4 Adaptive Gain Equalizer Speech Signal Suppressor
Modify the adaptive gain equalizer method in section 3.1 so that, instead of suppressing noise, speech is suppressed.
Evaluate the speech suppression performance using sound files from [5, 6].
3.5 Spectral Subtraction Speech Signal Suppressor
Modify the spectral subtraction method in section 3.2 so that, instead of suppressing noise, speech is suppressed.
Evaluate the speech suppression performance using sound files from [5, 6].
3.6 Wiener Filter Speech Signal Suppressor
Modify the Wiener filter in section 3.3 so that, instead of suppressing noise, speech is suppressed.
Evaluate the speech suppression performance using sound files from [5, 6].
3.7 Automatic Gain Control
Automatic Gain Control (AGC) is sometimes referred to as Dynamic Range Compression (DRC). The objective of an AGC function is to expand low-level signals and compress large signals towards a predetermined signal level. This is useful in practice since we then know that an AGC-processed signal has a well-defined magnitude. Also, in some applications, such as hearing defenders, we use AGC functions to hear weak sounds better and to protect against harmfully loud sounds.
The AGC is a recursive method that controls a signal gain g_k[n] so that the power of the AGC output signal, i.e., E{|y_k[n]|^2} where y_k[n] = g_k[n] x_k[n], meets a pre-defined subband target power P_k. Define an input signal power estimate as p_k[n] ≈ E{|x_k[n]|^2}. The output signal power is E{|y_k[n]|^2} = g_k[n]^2 E{|x_k[n]|^2} ≈ g_k[n]^2 p_k[n]. The gain is increased if the output power is less than the target power, and it is decreased if the output power is above the target power. The gain is usually bounded so that G_{min;k} \le g_k[n] \le G_{max;k}, where G_{min;k} < 1 and G_{max;k} > 1 are system constants. The amount of decrease and increase in gain value is controlled by the attack and release times T_{A;k} and T_{R;k}, respectively (see appendix D.5 for a definition of how these attack and release times relate to recursive averages). A VAD (see appendix C) is usually used so that the AGC operates only on speech signals. The VAD gives a speech activity decision d[n] that can be zero or one: zero if there is no speech and one if there is speech present. The gain for the next input sample is updated as follows:
g_k[n+1] =
  (1 - \alpha_{A;k}) g_k[n] + \alpha_{A;k} G_{min;k},   if g_k[n]^2 p_k[n] > P_k,
  (1 - \alpha_{R;k}) g_k[n] + \alpha_{R;k} G_{max;k},   else if g_k[n]^2 p_k[n] \le P_k and d[n] = 1,
  g_k[n],   else.   (26)
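A hedged one-step MATLAB sketch of the gain update in equation (26) is given below; the constants, the power estimate p_k[n] and the VAD decision are example assumptions.

Gmin = 0.1; Gmax = 10; Pk = 1;          % example gain bounds and target power
alphaA = 0.01; alphaR = 0.001;          % example attack and release learning parameters
gk = 1; pk = 4; d = 1;                  % current gain, input power estimate, VAD decision
if gk^2*pk > Pk
    gk = (1-alphaA)*gk + alphaA*Gmin;   % output power above target: attack towards G_min
elseif d == 1
    gk = (1-alphaR)*gk + alphaR*Gmax;   % below target and speech active: release towards G_max
end                                     % otherwise the gain is left unchanged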
Evaluate the AGC performance using sound files from [5, 6].
3.8 Calibrated Parametric Equalizer
A parametric equalizer is a structure that allows subband gains g_k to be tuned according to a predefined function. The output of a parametric equalizer is y_k[n] = g_k x_k[n]; note that g_k[n] ≡ g_k since we expect the gains to be fixed during normal operation. The parametric equalizer can, for instance, provide a large gain for the low subbands to improve the bass sounds (low-frequency sounds).
We shall, in this project, have a calibrated parametric equalizer structure where the user can modify the gain gauges of the equalizer in an implicit manner. By sending in a known sound file and simultaneously pressing one DSP button, the equalizer estimates the power level of the input sound file. When the user releases the button, the equalizer recomputes the equalizer gain values so that the sound from the sound file is equalized. The calibrated parametric equalizer has two operating modes:
Mode 1: Equalize. This is the default operating mode, where the input signal is simply scaled by the gain values, i.e., y_k[n] = g_k x_k[n].
Mode 2: Calibrate. During the calibration, a sound file is input to the equalizer and the power spectral density of the input signal is estimated, i.e., \hat{P}_{x;k}[n] ≈ E{|x_k[n]|^2}. When the calibration ends, the gain values are updated according to g_k = \sqrt{ P_{T;k} / \hat{P}_{x;k}[n] }, where P_{T;k} are nominal power spectral density values defined based on system settings (i.e., P_{T;k} needs to be calibrated as well). The gain is bounded as G_{min;k} \le g_k \le G_{max;k}, where G_{min;k} < 1 and G_{max;k} > 1. When the calibration ends, and the gains are updated, the processing falls back to the default equalize mode.
Note that the user should be able to switch between these two operating modes, and the active mode should be indicated by an LED. On startup, before the user has performed a calibration, the gains should be set to 1.
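A hedged MATLAB sketch of the Mode 2 gain update is given below; the target and estimated power spectral densities are example values only.

Gmin = 0.1; Gmax = 10;                  % example gain bounds
PT = ones(8,1);                         % nominal target PSD values P_{T;k}
Px = [0.5; 1; 2; 4; 1; 1; 0.25; 8];     % example calibrated input PSD estimates
gk = sqrt(PT./Px);                      % g_k = sqrt(P_{T;k} / P_{x;k}[n])
gk = min(max(gk, Gmin), Gmax);          % bound G_{min;k} <= g_k <= G_{max;k}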
4 Project groups
The projects in the course focus this year on various topics in speech signal processing. Four main project groups are used as umbrella projects. All projects within each main project group use the same underlying platform, but each project implements a unique speech signal processing method. It is recommended that all projects within the same main project group collaborate regarding development of the common platform, and it is required that each group implement and report their unique method individually.
Main  project  group  1
Group   Filter bank   Application
1   FIR (2.1)   Adaptive Gain Equalizer Noise Suppressor (3.1)
2   FIR (2.1)   Spectral Subtraction Noise Suppressor (3.2)
3   FIR (2.1)   Wiener Filter Noise Suppressor (3.3)
4   FIR (2.1)   Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
5   FIR (2.1)   Spectral Subtraction Speech Signal Suppressor (3.5)
6   FIR (2.1)   Wiener Filter Speech Signal Suppressor (3.6)
7   FIR (2.1)   Automatic Gain Control (3.7)
8   FIR (2.1)   Calibrated Parametric Equalizer (3.8)
Main  project  group  2
Group   Filter bank   Application
9   IIR (2.2)   Adaptive Gain Equalizer Noise Suppressor (3.1)
10   IIR (2.2)   Spectral Subtraction Noise Suppressor (3.2)
11   IIR (2.2)   Wiener Filter Noise Suppressor (3.3)
12   IIR (2.2)   Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
13   IIR (2.2)   Spectral Subtraction Speech Signal Suppressor (3.5)
14   IIR (2.2)   Wiener Filter Speech Signal Suppressor (3.6)
15   IIR (2.2)   Automatic Gain Control (3.7)
16   IIR (2.2)   Calibrated Parametric Equalizer (3.8)
Main  project  group  3
Group   Filter bank   Application
17   WOLA (2.3)   Adaptive Gain Equalizer Noise Suppressor (3.1)
18   WOLA (2.3)   Spectral Subtraction Noise Suppressor (3.2)
19   WOLA (2.3)   Wiener Filter Noise Suppressor (3.3)
20   WOLA (2.3)   Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
21   WOLA (2.3)   Spectral Subtraction Speech Signal Suppressor (3.5)
22   WOLA (2.3)   Wiener Filter Speech Signal Suppressor (3.6)
23   WOLA (2.3)   Automatic Gain Control (3.7)
24   WOLA (2.3)   Calibrated Parametric Equalizer (3.8)
Main  project  group  4
Group   Filter bank   Application
25   Modulated (2.4)   Adaptive Gain Equalizer Noise Suppressor (3.1)
26   Modulated (2.4)   Spectral Subtraction Noise Suppressor (3.2)
27   Modulated (2.4)   Wiener Filter Noise Suppressor (3.3)
28   Modulated (2.4)   Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
29   Modulated (2.4)   Spectral Subtraction Speech Signal Suppressor (3.5)
30   Modulated (2.4)   Wiener Filter Speech Signal Suppressor (3.6)
31   Modulated (2.4)   Automatic Gain Control (3.7)
32   Modulated (2.4)   Calibrated Parametric Equalizer (3.8)
5 Project report and presentation
5.1 Method Evaluation
It is expected that each group undertakes a thorough evaluation of their implemented structure, both in MATLAB and by performing measurements on their DSP implementation. Each method section describes how the respective implementation can be evaluated.
It is furthermore expected that the evaluation analyzes relevant system parameters, and that the evaluation tries to find values of each system parameter that provide optimal performance in the respective application. System parameters include filter bank parameters and parameters from each respective speech processing method.
5.2 Project presentation
1. Each group should prepare to orally present, demonstrate and defend their respective implementation.
2. Each group should find a way to demonstrate their solution using a setup that clearly shows the implementation performance.
3. The total time for oral presentation and demonstration, combined, is restricted to 15 min per group.
Observe that each group needs to pass the presentation as well as the report, so be well prepared for the presentation.
5.3 Project report
1. Each group will write a project report of maximum 5 pages.
2. All MATLAB and DSP C or assembly code should be submitted together with the report in a compressed zip file.
3. The report disposition and formatting should follow:
- The report should be formatted using the IEEE manuscript template for conference proceedings, see http://www.ieee.org/conferences_events/conferences/publishing/templates.html.
- The front page of the report should show the title of the group project, the full name and personal-id number of all project members, the course code and the report submission date.
- The report should briefly (max 2 pages) describe the group's implementation, including the filter banks.
- The remaining pages should be used for describing results from the evaluation, discussions regarding the implementation or improvements made to each respective project, conclusions, and references.
- Note that the report should discuss the MATLAB reference implementation as well as the DSP implementation.
- Relevant figures, tables and equations should have proper labels, and they must be referred to from the text, i.e., floating figures that are not referred to from the text are disallowed.
- The amount of source code listed within the report should be kept to a minimum!
4. The report must be submitted electronically by email to the course teacher (bsa@bth.se) before the project presentation.
Observe that reports longer than 5 pages, or reports that do not conform to the manuscript template or the instructions above, will be rejected and the project group will be failed.
If major errors are found in the report, or if the attached MATLAB or DSP source code does not execute, the student group will be given two extra weeks to correct the errors. The two extra weeks start on the day the errors are pointed out by the course teacher. If the student group fails to submit a correction to the report, or the source code, within these two weeks, the group fails.
References
[1] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, 1993.
[2] R. E. Crochiere. A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Trans. Acoust. Speech and Sig. Proc., ASSP-28:99-102, February 1980.
[3] R. Brennan and T. Schneider. An ultra low-power DSP system with a flexible filterbank. IEEE 35th Asilomar Conference on Signals, Systems and Computers, 1:809-813, February 2001.
[4] N. Westerlund, M. Dahl, and I. Claesson. Speech enhancement for personal communication using an adaptive gain equalizer. Elsevier Signal Processing, 85(6):1089-1101, 2005.
[5] Y. Hu and P. Loizou. Subjective evaluation and comparison of speech enhancement algorithms. Elsevier Speech Communication, 49:588-601, 2007.
[6] Y. Hu and P. Loizou. NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms. Internet: http://www.utdallas.edu/~loizou/speech/noizeus/, 1 Nov. 2010.
[7] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech and Sig. Proc., ASSP-27:113-120, April 1979.
Appendices
A WOLA filter bank reference implementation
The following lists a MATLAB reference implementation of the WOLA filter bank. It assumes an input block, denoted as xblock, of D input data samples, and it produces an output block, denoted as yblock, of D output data samples. The initialization should be run once at program startup; then the analysis is executed before the synthesis for each new block of input samples, and we assume that subband processors transform the subband data before synthesis.
A.1 Example initialization
% prototype filter length
L = 128;
% number of subbands
K = 32;
% oversampling ratio
OS = 2;
% prototype filter decimation ratio
DF = 1;
% output prototype filter length
LOUT = L/DF;
% data decimation rate
D = K/OS;
% analysis prototype filter (column vector)
w = 0.85*K*fir1(L-1, 1/K)';
% synthesis prototype filter
if LOUT < L
    wOUT = w(1:DF:L);
else
    wOUT = w;
end;
% input FIFO
xmem = zeros(L, 1);
% output FIFO
ymem = zeros(LOUT, 1);
A.2 Analysis
% update input FIFO
xmem = [xmem(D+1:end); xblock];
% weight input FIFO with analysis prototype window
temp = xmem .* w;
% time-folding
xn = temp(1:K);
for k = K+1:K:L-K+1
    xn = xn + temp(k:k+K-1);
end;
% circular shift
xn = [xn(K/2+1:end); xn(1:K/2)];
% fft
X = fft(xn, K);
% since we know the input data is real valued, its Fourier transform is
% complex conjugate symmetric, and we may omit analysis of the upper
% frequency half, since it can anyway be reconstructed from the
% lower frequency half...
X = X(1:K/2+1);
A.3 Synthesis
% reconstruct the upper frequency half, since we can assume complex
% conjugate symmetry of the data.
Y = [Y; conj(Y(end-1:-1:2))];
% compute ifft, certify that the result is real valued.
temp = real(ifft(Y));
% revert circular shift
temp = [temp(K/2+1:end); temp(1:K/2)];
% repeat the ifft-data, weight with the synthesis prototype window, and
% add with the output FIFO.
ymem = repmat(temp, LOUT/K, 1) .* wOUT + ymem;
% shift output FIFO
yblock = ymem(1:D);
ymem = [ymem(D+1:end); zeros(D,1)];
B Uniform FFT modulated filter bank reference implementation
The following lists a MATLAB reference implementation of the uniform FFT modulated filter bank. It assumes an input block, denoted as xblock, of D input data samples, and it produces an output block, denoted as yblock, of D output data samples. The initialization should be run once at program startup; then the analysis is executed before the synthesis for each new block of input samples, and we assume that subband processors transform the subband data before synthesis.
B.1 Example initialization
% number of subbands
K = 32;
% oversampling ratio
O = 4;
% decimation rate
D = K/O;
% number of taps per polyphase component
P = 3;
% total prototype filter length is L = D*P;
% prototype filter
PROTFLT = sqrt(2/O)*fir1(D*P - 1, 1/K);
% compute analysis and synthesis modified polyphase components for
% each group, g
POLYPHS_A = zeros(D, P, O);
POLYPHS_S = zeros(D, P, O);
IDXK = 0:P-1;
for g = 0:O-1
    for d = 0:D-1
        POLYPHS_A(d+1, :, g+1) = fliplr(D * exp(j*g*2*pi*d/K) ...
            * PROTFLT(IDXK*D + d + 1) .* exp(j*2*pi*g*IDXK/O));
        POLYPHS_S(d+1, :, g+1) = flipdim(conj(POLYPHS_A(d+1, :, g+1)), 2);
    end;
end;
% input and output FIFOs
SMPMEM_A = zeros(D, P);
SMPMEM_S = zeros(D, P, O);
% temporary and intermediate variables
X = zeros(K/2+1, 1);
TempA = zeros(D, 1);
IDXI = 1:(K/2)/O;
IDX_A = zeros((K/2)/O, O);
IDX_S = zeros(D, O);
for g = 1:O
    IDX_A(:, g) = g : O : K/2;
    IDX_S(:, g) = g : O : K;
end;
B.2 Analysis
% update analysis FIFO
SMPMEM_A = [xblock, SMPMEM_A(:, 1:end-1)];
% go through all groups
for g = 1:O
    % filter with the respective modified polyphase component, and compute
    % the IFFT
    TempA = ifft(sum(SMPMEM_A .* POLYPHS_A(:,:,g), 2));
    % since the subband indices for each group are in an irregular order,
    % we use IDX_A to reorder the subband indices into an increasing order.
    X(IDX_A(:,g)) = TempA(IDXI, :);
    if g==1
        X(end) = TempA(IDXI(end)+1, :);
    end;
    % note that X contains only the first frequency half since we assume
    % that xblock is real valued, which implies that X is complex conjugate
    % symmetric.
end;
B.3 Synthesis
% reconstruct the upper frequency half
Temp = [Y; conj(Y(end-1:-1:2))];
% go through each synthesis group, update the synthesis FIFOs and
% compute the FFT; here, IDX_S rearranges the subband indices from
% a linear ordering into an order according to each group g.
for g = 1:O
    SMPMEM_S(:, 2:end, g) = SMPMEM_S(:, 1:end-1, g);
    SMPMEM_S(:, 1, g) = fft(Temp(IDX_S(:, g), :));
end;
% compute the output by filtering the synthesis FIFO with the synthesis
% modified polyphase components. real is used to ensure real output data.
yblock = real(sum(sum(SMPMEM_S .* POLYPHS_S, 3), 2));
C Voice activity detection
The following lists a MATLAB reference implementation of a Voice Activity Detector (VAD). It assumes one input sample, denoted as xsample, and it produces one VAD decision, denoted as VADout. If VADout=0 it indicates that there is no speech, and if VADout=1 it indicates speech. The VAD decision can be used, for instance, to estimate background noise during speech pauses, i.e., when VADout=0. The initialization should be run once at program startup; then the VAD function is executed for each new input sample.
C.1 Example initialization
% Sampling frequency in Hz
fs = 8000;
% Time constants of the VAD
tA = 1/(fs*0.025);   % 0.025 s
tB = 1/(fs*0.1);     % 0.1 s
tBF = 1/(fs*5);      % 5 s
% SNR detection threshold
SNR_THS = 1;
% Number of subbands in VAD
K = 4;
% VAD filter bank parameters
alpha = 0.99;
T = 3;
q = linspace(0,1,K+T).^2;
q = 0.5*(q(1:end-T) + q(2:end-T+1));
R = 1-pi/2*sqrt(q)/K;
theta = pi*q;
% VAD filter bank memory
ZMEM = zeros(K,2);
% Energy measures
E = zeros(K,1);
Eflr = zeros(K,1);
% temporary vector
xk = zeros(K,1);
C.2 VAD function
% Compute VAD analysis filter bank
for k = 1:K
    Ak = [1, -2*R(k)*cos(theta(k)), R(k)^2];
    Bk = [1, -alpha];
    [xk(k), ZMEM(k,:)] = filter(Bk, Ak, xsample, ZMEM(k,:));
end;
% Compute subband energy (n is the current sample index, kept by the caller)
E = (1-tA)*E + tA*xk.^2;
if n < 3/tA
    Eflr = E;
end;
% Estimate subband SNR
SNREstimate = (E - Eflr)./Eflr;
% Update background noise estimate
id = find(SNREstimate <= SNR_THS);
jd = find(SNREstimate > SNR_THS);
if length(id) > 0
    Eflr(id) = min((1-tB)*Eflr(id) + tB*E(id), E(id));
end;
if length(jd) > 0
    Eflr(jd) = min((1-tBF)*Eflr(jd) + tBF*E(jd), E(jd));
end;
% Compute VAD detection output
VADout = double(mean(SNREstimate) > SNR_THS);
D Implementation Issues and Hints
D.1 Use MATLAB test vectors
It is suggested that you first implement all parts of your method in MATLAB: first, to see that you have understood the concept, and second, to have a correct reference implementation. Then, to verify parts of your DSP implementation, generate random test input vectors in MATLAB and compute test output vectors. Then, in your DSP implementation, input the same test input vectors and compute (in the DSP) the corresponding output vectors. If the MATLAB output vector and the DSP output vector are the same, your implementation is correct for those test vectors.
D.1.1 Example of test vectors
Assume I want to verify that my function mymul is correctly implemented. This function implements the following

mymul(a, b) = a \cdot b,   (27)

for two real variables a and b. In MATLAB, my code is

function [out] = mymul(a,b)
out = a*b;

And my corresponding C implementation is

float mymul(float a, float b)
{
    return a*b;
}
To test my C implementation, I generate random test vectors in MATLAB, e.g.,

NUM_TESTS = 3;
TEST_VEC_a = round(10000*randn(NUM_TESTS,1))/10;
TEST_VEC_b = round(10000*randn(size(TEST_VEC_a)))/10;
for k = 1:NUM_TESTS
    fprintf(1, 'mymul(%.1f,%.1f)=%.2f\n', TEST_VEC_a(k), TEST_VEC_b(k), ...
        mymul(TEST_VEC_a(k), TEST_VEC_b(k)));
end;

Example MATLAB console output:

mymul(-354.1,-801.2)=283704.92
mymul(-1372.4,23.0)=-31565.20
mymul(877.1,285.1)=250061.21

When I use the same input data a, b in my C function I see that I get identical output, which leads me to conclude that my C implementation works as expected (for these test vectors).
D.2 Analysis filter bank verification
A simple way to verify and test an analysis filter bank implementation is to input a sinusoidal signal, where the sinusoid's frequency exactly matches the center frequency of one subband, and then analyze the subband signals x_k[n]. We know that the amplitude of adjacent subbands should be very low in the middle of each subband, which means we have a small amount of aliasing. Hence, if the analysis stage is correctly implemented we shall see most of the signal energy only in the subband where the sinusoidal signal is centered.
In this test, the subband processors are g_k = 1 for k = 0 ... K-1. An example is illustrated in figure 10, where a sinusoidal signal is centered in band k = 2. A corresponding figure showing the input subband signals x_k[n] after the analysis filter bank is given in figure 11.
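A hedged MATLAB sketch of generating such a test sinusoid is given below; it assumes K uniform bands over normalized frequency [0, 0.5], so the band-center expression is an assumption about the filter bank layout.

fs = 8000; K = 8; k = 2;            % example sampling rate, number of bands, tested band
f0 = (k + 0.5)/(2*K);               % assumed normalized center frequency of band k
n  = (0:fs-1)';
x  = sin(2*pi*f0*n);                % test input; its energy should appear mainly in x_k[n]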
Figure 10: An eight-band analysis filter bank (shaded), shown as 10 log|H_k[f]|^2 versus normalized frequency f, with a sinusoidal signal centered in the middle of band k = 2 (black). Notice that adjacent bands have a very small magnitude (below -60 dB) where the sinusoidal signal is located, which means we have little aliasing.
D.3 Analysis-synthesis filter bank verification
It is straightforward to verify a DSP implementation of an analysis-synthesis filter bank. What you need is a white noise generator (e.g., the randn command in MATLAB), subband processors that are constant scalars, and a power spectrum estimator (e.g., the psd command in MATLAB). The subband processors are now scalar weights that can take on the values g_k = 0 or g_k = 1, see the filter bank configuration in figure 12.
We note that a linear system is defined by the relationship y[n] = (h * x)[n], which has the corresponding Fourier transform relationship Y[f] = H[f] X[f]. The square modulus of this relationship is |Y[f]|^2 = |H[f]|^2 |X[f]|^2. Taking the expectation yields E{|Y[f]|^2} = |H[f]|^2 E{|X[f]|^2}, which is equal to P_{yy}[f] = |H[f]|^2 P_{xx}[f], where P_{xx}[f] and P_{yy}[f] are the power spectral densities (PSD) of the input signal x[n] and the output signal y[n], respectively. Hence, the transfer magnitude function is computed as |H[f]| = \sqrt{P_{yy}[f] / P_{xx}[f]}. We will use this last relationship to estimate the filter bank transfer function and thereby verify the filter bank.
The protocol we follow to verify a filter bank implementation is:
1. Generate a noise sequence x[n] by, for instance, using the randn command in MATLAB. Store this sequence as a wav-file.
Figure 11: Input subband signals from an eight-band analysis filter bank with a sinusoidal signal centered in the middle of band k = 2. Notice that most of the signal magnitude is found in subband 2. A small initial transient can be seen in all bands around sample 70; this is the step caused by the sinusoidal signal being switched on, which is visible across subbands.
2. Input the generated noise sequence x[n] to a DSP project that simply copies the ADC input to the DAC output, i.e., without any filter bank processing. Estimate the PSD of the DAC output and store the PSD as \hat{P}_{xx}[f]. This step is illustrated in figure 13.
3. According to figure 14, repeat the following for each subband k = 0 ... K-1:
(a) Set the subband processor weights to zero except at index k, where the weight is one, i.e.,
g_m = 1 if m = k, and g_m = 0 if m \ne k, for m = 0 ... K-1.   (28)
(b) Input the generated noise sequence x[n] to the ADC input.
(c) Perform the analysis-synthesis filter bank operation.
(d) Output the synthesis filter bank output signal y[n] to the DAC output channel.
(e) Estimate the PSD of the DAC output and store the PSD as \hat{P}_{yy|k}[f].
(f) Estimate the transfer magnitude function as |\hat{H}_k[f]| = \sqrt{ \hat{P}_{yy|k}[f] / \hat{P}_{xx}[f] }.
(g) Compare the estimated transfer magnitude function |\hat{H}_k[f]| to the ideal transfer magnitude function |H_k[f]|.
(h) Repeat steps 3.a to 3.g for each subband k = 0 ... K-1.
One example of how the results from the verification procedure may look for the third band, i.e., h_2[n], of an eight-band filter bank is illustrated in figure 15. We see that the estimated transfer magnitude function |\hat{H}_2[f]| is very similar to the true transfer magnitude function |H_2[f]|, which leads us to conclude that the filter bank implementation of subband k = 2 is correct.
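A hedged MATLAB sketch of step 3(f) is given below; pwelch is used here as one possible PSD estimator, and a bandpass FIR filter stands in for the measured DAC output of the tested band.

fs = 8000;
x  = randn(10*fs, 1);                        % generated white-noise input
y  = filter(fir1(64, [0.2 0.3]), 1, x);      % stand-in for the band-k DAC output
[Pxx, f] = pwelch(x, 1024, [], 1024, fs);    % input PSD estimate
[Pyy, ~] = pwelch(y, 1024, [], 1024, fs);    % output PSD estimate
Hk = sqrt(Pyy./Pxx);                         % |H_k[f]| estimate, to compare with the ideal response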
Figure 12: Analysis-synthesis filter bank configuration used to verify the filter bank implementation. The subband signals x_k[n] are scaled by subband scaling constants g_k to yield the subband output signals y_k[n] = g_k x_k[n].
Figure 13: Pass a generated noise file through the DSP system without doing any processing of the signal, and estimate the PSD of the output signal.
D.4 Estimating DSP processing load
It is important to ensure that the implemented method does not require more than the available DSP cycles for its computation. Otherwise, we risk that the DSP-processed sound contains undesirable artifacts due to, e.g., clipping. The DSP processing load can be estimated by using a General Purpose IO (GPIO) pin. Connect any available GPIO pin to, e.g., an oscilloscope. In your DSP software, define the same GPIO as an output pin. Now, when you start processing a new sample, or block of samples, in your DSP code, set the GPIO high, and when your software has finished processing, set the GPIO low. This means that the GPIO signal will toggle between low and high; it is low when you are not processing samples and high when you are processing samples. By measuring the duty cycle on the GPIO pin we can estimate the processing load. If, for instance, the GPIO signal's duty cycle is 50%, it means that the time the GPIO is low equals the time the GPIO is high; hence, we utilize 50% of the available resources. Note that you should have less than 100% resource utilization, and you should reserve a small processing margin (say 5%) for other overhead tasks performed by the DSP to avoid sound artifacts.
D.5 Recursive Averages
Recursive averages are highly useful in practical implementations due to their low computational cost. In effect, they implement an averaging memory whose memory length can, at least approximately, be chosen arbitrarily.
Let us consider a one-to-one mapping (i.e., a scalar function of a scalar variable) f_k of a random variable, e.g., x_k[n], and then consider its expectation, i.e., E{f_k(x_k[n])}. Auto-regressive averages are used to continuously approximate this expectation, and the approximation is updated for every new input sample. Denote the average that approximates E{f_k(x_k[n])} as a_k[n]; it is defined as

a_k[n] = (1 - \alpha_k) a_k[n-1] + \alpha_k f_k(x_k[n]),   (29)
Figure 14: Pass a generated noise file through the DSP system and apply an analysis-synthesis filter bank where the subband processors are zero except at one subband index, where it is one, and then estimate the PSD of the output signal.
Figure 15: Verification of band k = 2 of an eight-band filter bank using the proposed verification method, where the estimated transfer magnitude function |\hat{H}_2[f]| (black) and the true transfer magnitude function |H_2[f]| (gray) are overlaid.
where \alpha_k is referred to as the learning parameter. The learning parameter is defined as \alpha_k = 1/(F_S T_{\alpha;k}), where F_S is the sampling frequency in Hz and T_{\alpha;k} is the average's memory length in seconds. Note that if the data x_k[n] is decimated by a factor D, for instance by a filter bank, then the corresponding definition of the learning parameter is \alpha_k = D/(F_S T_{\alpha;k}). We see that a_k[n] is a low-pass filtering of f_k(x_k[n]). Hence, if \alpha_k \ll 1, we will have a_k[n] ≈ E{f_k(x_k[n])}, asymptotically.
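A hedged MATLAB sketch of the recursive average in equation (29) is given below; the memory length, the sampling rate and the stand-in data are example values.

Fs = 8000; Tmem = 0.1;             % example sampling frequency and memory length in seconds
alpha = 1/(Fs*Tmem);               % learning parameter
a = 0;                             % recursive average state a_k[n]
x = randn(Fs,1).^2;                % stand-in for f_k(x_k[n]), here f(x) = x^2
for n = 1:length(x)
    a = (1-alpha)*a + alpha*x(n);  % equation (29); a approaches E{f_k(x_k[n])}
end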
D.5.1 Example 1: Power and magnitude spectrum estimation
Let the factor p decide whether we estimate the power or the magnitude spectrum of a signal x_k[n]. If p = 2, a power spectrum is estimated, and if p = 1, a magnitude spectrum is estimated. The spectrum of x_k[n] is E{|x_k[n]|^p}. The spectrum is estimated using a recursive average \hat{P}_{x;k}[n], according to

\hat{P}_{x;k}[n] = (1 - \alpha_k) \hat{P}_{x;k}[n-1] + \alpha_k |x_k[n]|^p.   (30)
D.5.2 Example 2: Background noise spectrum estimation
Let the factor p decide whether we estimate the power or the magnitude spectrum of the background noise of a signal x_k[n]. If p = 2, a power spectrum is estimated, and if p = 1, a magnitude spectrum is estimated. Let a VAD (see appendix C) signal whether there is speech in a signal frame or not. The VAD detector value is d[n] = 0 if there is no speech in the frame, and d[n] = 1 if there is speech in the frame. Since we want to estimate the background noise, we should only estimate when d[n] = 0, because x_k[n] = v_k[n] if there is no speech present. The spectrum of v_k[n] is E{|v_k[n]|^p}. The spectrum is estimated, during speech pauses, using a recursive average \hat{P}_{v;k}[n], according to

\hat{P}_{v;k}[n] = (1 - \alpha_k) \hat{P}_{v;k}[n-1] + \alpha_k |x_k[n]|^p   if d[n] = 0,
\hat{P}_{v;k}[n] = \hat{P}_{v;k}[n-1]   else.   (31)