Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings

Ogata, Jun; Asano, Futoshi

doi:10.1007/11848035_104

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4105)

Included in the following conference series:

International Workshop on Multimedia Content Representation, Classification and Security

Abstract

In this paper, we present a stream-based method for classifying and segmenting speech events in meeting recordings. Four speech events are considered: normal speech, laughter, coughs, and pauses between talks. Hidden Markov Models (HMMs) are used to model these speech events, and the model topology is optimized using the Bayesian Information Criterion (BIC). Experimental results show that the system achieves satisfactory classification and segmentation results. Based on the detected speech events, the meeting recording is structured with an XML-based description language and visualized in a browser.
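As a rough illustration of BIC-based topology selection, the Python sketch below scores candidate HMM topologies with the standard Schwarz form, BIC = log L - (λ/2)·k·log N, and keeps the highest-scoring one. The abstract does not specify the paper's parameter count, penalty weight, or candidate set, so several things here are assumptions: left-to-right HMMs with diagonal-covariance Gaussian mixture emissions, one free transition probability per state, a tunable penalty weight `penalty` (λ), and the hypothetical helpers `Topology`, `n_free_params`, `bic`, and `select_topology`. The log-likelihoods are invented toy values, not results from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Topology:
    """Hypothetical candidate HMM topology for one speech-event class."""
    n_states: int          # emitting states in a left-to-right HMM
    n_mix: int             # Gaussian mixture components per state
    log_likelihood: float  # training-data log-likelihood (assumed precomputed)


def n_free_params(t: Topology, dim: int) -> int:
    """Count free parameters, assuming diagonal-covariance GMM emissions."""
    # Per state: n_mix - 1 free mixture weights, plus one mean vector and
    # one diagonal variance vector (each of length `dim`) per component.
    per_state = (t.n_mix - 1) + 2 * t.n_mix * dim
    # Assumed left-to-right topology (self-loop + forward step): one free
    # transition probability per state.
    return t.n_states * (per_state + 1)


def bic(t: Topology, dim: int, n_frames: int, penalty: float = 1.0) -> float:
    """BIC = log L - penalty * (k / 2) * log N; higher is better."""
    k = n_free_params(t, dim)
    return t.log_likelihood - penalty * 0.5 * k * math.log(n_frames)


def select_topology(cands, dim, n_frames, penalty=1.0):
    """Return the candidate topology with the highest BIC score."""
    return max(cands, key=lambda t: bic(t, dim, n_frames, penalty))


# Toy usage: invented numbers, 2-dimensional features, 1000 training frames.
candidates = [
    Topology(n_states=3, n_mix=1, log_likelihood=-5000.0),
    Topology(n_states=3, n_mix=4, log_likelihood=-4800.0),
    Topology(n_states=5, n_mix=8, log_likelihood=-4700.0),
]
best = select_topology(candidates, dim=2, n_frames=1000)
print(f"{best.n_states} states, {best.n_mix} mixtures")  # 3 states, 4 mixtures
```

With these toy numbers the 3-state, 4-mixture model wins: its likelihood gain over the single-Gaussian model outweighs its larger penalty, while the 5-state, 8-mixture model gains too little likelihood for its 200 parameters. In a setup like the one described, each event class (speech, laughter, cough, pause) would presumably get its own selection run on its own training frames.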

Author information

Authors and Affiliations

National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1, Umezono, Tsukuba, Ibaraki, 305-8568, Japan
Jun Ogata & Futoshi Asano

Editor information

Editors and Affiliations

Multimedia Signal Processing and Pattern Recognition Lab., Dept. of Electronics and Communications Eng., Istanbul Technical University, 34469, Istanbul, Turkey
Bilge Gunsel
Department of Computer Science and Engineering, Michigan State University
Anil K. Jain
College of Engineering, Koç University, 34450, Sarıyer, İstanbul, Turkey
A. Murat Tekalp
Department of Electrical and Electronics Engineering, Boğaziçi University, Istanbul, Turkey
Bülent Sankur

About this paper

Cite this paper

Ogata, J., Asano, F. (2006). Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11848035_104

DOI: https://doi.org/10.1007/11848035_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39392-4
Online ISBN: 978-3-540-39393-1
eBook Packages: Computer Science, Computer Science (R0)
