Voice Activity Detection Using Higher Order Statistics

  • Conference paper
Computational Intelligence and Bioinspired Systems (IWANN 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3512)


A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach first filters the input channel to avoid high-energy noise components and then determines the speech/non-speech bispectra by means of third-order auto-cumulants. The algorithm differs from many others in the way the decision rule is formulated (detection tests) and in the domain in which it operates. Clear improvements in speech/non-speech discrimination accuracy demonstrate the effectiveness of the proposed VAD. It is shown that applying statistical detection tests leads to a better separation of the speech and noise distributions, allowing more effective discrimination and a tradeoff between complexity and performance. The algorithm also incorporates a preceding noise reduction block, which improves the accuracy of speech and non-speech detection.
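To make the quantities named in the abstract concrete, the following is a minimal sketch of a sample third-order auto-cumulant, its 2-D Fourier transform (a bispectrum estimate), and a simple energy statistic. It is not the authors' detection test: the function names, the biased estimator, the `max_lag` parameter, and the sum-of-squares statistic are all assumptions chosen for illustration. It relies only on the standard fact that third-order cumulants of a Gaussian process are zero, so a skewed non-Gaussian signal yields a much larger statistic than Gaussian noise.

```python
import numpy as np


def third_order_cumulant(x, max_lag):
    """Biased sample estimate of C3(i, j) = E[x(t) x(t+i) x(t+j)]
    for lags -max_lag <= i, j <= max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # cumulants are defined for the zero-mean signal
    n = x.size
    lags = range(-max_lag, max_lag + 1)
    c3 = np.zeros((2 * max_lag + 1, 2 * max_lag + 1))
    for a, i in enumerate(lags):
        for b, j in enumerate(lags):
            # Keep only the t for which t, t+i and t+j are valid indices.
            lo = max(0, -i, -j)
            hi = min(n, n - i, n - j)
            c3[a, b] = np.sum(x[lo:hi] * x[lo + i:hi + i] * x[lo + j:hi + j]) / n
    return c3


def bispectrum(c3):
    """2-D DFT of the cumulant lag matrix; ifftshift moves lag (0, 0) to index [0, 0]."""
    return np.fft.fft2(np.fft.ifftshift(c3))


def cumulant_energy(x, max_lag=4):
    """Illustrative statistic (an assumption, not the paper's test):
    the sum of squared third-order cumulant estimates."""
    return float(np.sum(third_order_cumulant(x, max_lag) ** 2))


rng = np.random.default_rng(0)
gaussian_noise = rng.standard_normal(4000)
# Skewed, non-Gaussian stand-in signal (not real speech).
skewed_signal = rng.exponential(1.0, 4000) - 1.0

stat_noise = cumulant_energy(gaussian_noise)
stat_skewed = cumulant_energy(skewed_signal)
print(stat_noise, stat_skewed)
```

In a frame-based detector, a statistic of this kind would be computed per frame and compared against a threshold; how that threshold is chosen, and the exact form of the test, are left open here.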



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Górriz, J.M., Ramírez, J., Segura, J.C., Hornillo, S. (2005). Voice Activity Detection Using Higher Order Statistics. In: Cabestany, J., Prieto, A., Sandoval, F. (eds) Computational Intelligence and Bioinspired Systems. IWANN 2005. Lecture Notes in Computer Science, vol 3512. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26208-4

  • Online ISBN: 978-3-540-32106-4

  • eBook Packages: Computer Science, Computer Science (R0)