
Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings

  • Conference paper
Multimedia Content Representation, Classification and Security (MRCS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4105)


Abstract

In this paper, we present a stream-based method for classifying and segmenting speech events in meeting recordings. Four speech events are considered: normal speech, laughter, cough, and pause between talks. Hidden Markov models (HMMs) are used to model these speech events, and the model topology is optimized using the Bayesian Information Criterion (BIC). Experimental results show that the system achieves satisfactory classification and segmentation performance. Based on the detected speech events, the meeting recording is structured using an XML-based description language and visualized in a browser.
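As a rough illustration of BIC-based topology selection, the sketch below scores candidate HMM topologies with the standard Schwarz criterion, BIC = -2 ln L + k ln N, where lower is better. The abstract does not give the authors' exact formulation, so the parameter count (an ergodic HMM with diagonal-covariance Gaussian mixture emissions), the function names, and the log-likelihood values are all assumptions made for this example.

```python
import math


def hmm_param_count(n_states, n_mix, dim):
    """Free parameters of an ergodic HMM with diagonal-covariance GMM emissions.

    This counting scheme is an assumption for illustration; the paper's
    actual topology family is not described in the abstract.
    """
    transitions = n_states * (n_states - 1)    # each row sums to 1
    initial = n_states - 1                     # initial distribution sums to 1
    per_state = n_mix * 2 * dim + (n_mix - 1)  # means + variances + mixture weights
    return transitions + initial + n_states * per_state


def bic(log_likelihood, n_params, n_frames):
    """Schwarz criterion in the 'lower is better' form."""
    return -2.0 * log_likelihood + n_params * math.log(n_frames)


def select_topology(candidates, n_frames, dim):
    """Return the (n_states, n_mix) pair with the smallest BIC.

    candidates: iterable of (n_states, n_mix, log_likelihood) tuples, where
    log_likelihood is the training-data log-likelihood of the fitted model.
    """
    best = min(
        candidates,
        key=lambda c: bic(c[2], hmm_param_count(c[0], c[1], dim), n_frames),
    )
    return best[0], best[1]


if __name__ == "__main__":
    # Hypothetical log-likelihoods for three candidate topologies of one
    # event model, with 13-dimensional features and 10,000 training frames.
    candidates = [(3, 1, -52000.0), (3, 2, -51000.0), (5, 4, -50900.0)]
    print(select_topology(candidates, n_frames=10000, dim=13))
```

With these made-up numbers, the largest model has the highest likelihood but is rejected because the penalty term k ln N outweighs the gain; the criterion trades fit against model size in this way for each event class.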




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ogata, J., Asano, F. (2006). Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11848035_104


  • DOI: https://doi.org/10.1007/11848035_104

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39392-4

  • Online ISBN: 978-3-540-39393-1

  • eBook Packages: Computer Science, Computer Science (R0)
