aud.filterbank

Olivier Lartillot edited this page Aug 7, 2018 · 4 revisions

This operator is a specialisation of the general signal processing operator sig.filterbank focused on auditory models. The filterbank decomposition models an actual process of human perception, corresponding to the distribution of frequencies into critical bands in the cochlea.

Flowchart interconnections

Same as in sig.filterbank.

Filterbank selection

Two basic types of filterbanks are proposed:

'Gammatone'

aud.filterbank(…,'Gammatone') carries out a Gammatone filterbank decomposition (Patterson et al., 1992), which is known to simulate the response of the basilar membrane well. It is based on an Equivalent Rectangular Bandwidth (ERB) filterbank, meaning that the width of each band is determined by a particular psychoacoustical law. For Gammatone filterbanks, sig.filterbank calls the Auditory Toolbox routines MakeERBFilters and ERBFilterBank. This is the default choice when calling aud.filterbank.

https://miningsuite.googlecode.com/svn/wiki/SigFilterbank_gammatone.png   Ten ERB filters between 100 and 8000 Hz (Slaney, 1998)

  • aud.filterbank(…,'Lowest',f) sets the lowest frequency f, in Hz. Default value: 50 Hz.
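
To give a feel for how band width grows with frequency in an ERB filterbank, the sketch below evaluates the Glasberg and Moore (1990) ERB approximation in plain MATLAB. It assumes that this approximation is the psychoacoustical law referred to above; the bandwidths actually used are computed inside MakeERBFilters and may differ in detail. The names erb, fc and bw are illustrative and not part of the toolbox.

```matlab
% Glasberg & Moore (1990) approximation of the Equivalent Rectangular
% Bandwidth, in Hz, for a centre frequency f given in Hz.
% Assumption: used here only to illustrate the ERB law.
erb = @(f) 24.7 * (4.37 * f / 1000 + 1);

fc = [100 500 1000 4000 8000];   % example centre frequencies (Hz)
bw = erb(fc);                    % corresponding bandwidths (Hz)

% Print one line per centre frequency.
for k = 1:numel(fc)
    fprintf('%5d Hz -> ERB of %.1f Hz\n', fc(k), bw(k));
end
```

The bandwidth increases roughly linearly with centre frequency, so high-frequency channels are much wider than low-frequency ones.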

'2Channels'

aud.filterbank(…,'2Channels') performs a computational simplification of the filterbank using just two channels, one for low frequencies (below 1000 Hz) and one for high frequencies (above 1000 Hz) (Tolonen and Karjalainen, 2000). The envelope of the high-frequency channel is extracted using half-wave rectification followed by the same low-pass filter as used for the low-frequency channel. This filterbank is mainly used for multi-pitch extraction (cf. aud.pitch).
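
The envelope extraction applied to the high-frequency channel can be sketched in plain MATLAB as follows. This is only an illustration of the principle (half-wave rectification followed by low-pass filtering): the one-pole low-pass filter, the test signal and all variable names are assumptions made for this example, not the filters used by Tolonen and Karjalainen or by the toolbox.

```matlab
fs = 8000;                           % sampling rate in Hz (arbitrary choice)
t  = (0:fs-1)' / fs;                 % one second of time stamps

% Test signal: a 2000 Hz carrier whose amplitude is modulated at 5 Hz.
x = (1 + 0.5 * sin(2*pi*5*t)) .* sin(2*pi*2000*t);

% Step 1: half-wave rectification (negative samples are set to zero).
r = max(x, 0);

% Step 2: low-pass filtering. A one-pole smoother with unit gain at 0 Hz
% stands in for the real low-pass filter (assumption).
fcut  = 1000;                        % cut-off frequency in Hz
alpha = exp(-2*pi*fcut/fs);          % pole of the one-pole filter
env   = filter(1 - alpha, [1, -alpha], r);
```

The result env has the same length as x and is non-negative everywhere, since both the rectified signal and the impulse response of the smoother are non-negative.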

Further options

  • aud.filterbank(…,'NbChannels',N) specifies the number of channels in the bank. By default: N = 10. This option has no effect with '2Channels'.

  • aud.filterbank(…,'Channel',c), or equivalently aud.filterbank(…,'Channels',c), outputs only the channels whose ranks are listed in the array c (default: c = (1:N)).

Preselected filterbanks

sig.filterbank(…,p) selects one of the following predefined filterbanks, all implemented using elliptic filters, of order 4 by default:

  • p = 'Mel': Mel scale (cf. aud.spectrum(…,'Mel')).
  • p = 'Bark': Bark scale (cf. aud.spectrum(…,'Bark')).
  • p = 'Scheirer': filterbank proposed in (Scheirer, 1998), corresponding to sig.filterbank(…,'CutOff',[-Inf 200 400 800 1600 3200 Inf])
  • p = 'Klapuri': filterbank proposed in (Klapuri, 1999), corresponding to sig.filterbank(…,'CutOff',44*[2.^ ([ 0:2, ( 9+(0:17) )/3 ]) ])
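
The two 'CutOff' expressions above can be evaluated on their own in plain MATLAB to see which band edges they define. The snippet below is a minimal sketch that does not call the toolbox; the variable names are illustrative.

```matlab
% Band edges of the 'Scheirer' filterbank, in Hz (copied from above).
scheirer = [-Inf 200 400 800 1600 3200 Inf];

% Band edges of the 'Klapuri' filterbank, in Hz (same expression as above):
% three octave-spaced edges starting at 44 Hz, followed by edges spaced a
% third of an octave apart.
klapuri = 44 * [2.^([0:2, (9 + (0:17)) / 3])];

% K edges delimit K-1 adjacent bands.
fprintf('Scheirer: %d bands\n', numel(scheirer) - 1);
fprintf('Klapuri:  %d bands, from %.0f Hz to %.0f Hz\n', ...
        numel(klapuri) - 1, klapuri(1), klapuri(end));
```

The 'Scheirer' vector has 7 edges and therefore defines 6 bands, the first and last being open-ended; the 'Klapuri' vector has 21 edges and therefore defines 20 bands.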

Example

aud.filterbank('ragtime.wav')

https://miningsuite.googlecode.com/svn/wiki/SigFilterbank_ex1.png

If the number of channels exceeds 20, the audio waveform decomposition is displayed as a single bitmap image in which each row of pixels represents one channel, in successive order:

aud.filterbank('ragtime.wav','NbChannels',40)

https://miningsuite.googlecode.com/svn/wiki/SigFilterbank_ex2.png
