Abstract
In this paper, we present a stream-based method for classifying and segmenting speech events in meeting recordings. Four speech events are considered: normal speech, laughter, cough, and pauses between talks. Hidden Markov models (HMMs) are used to model these speech events, and the model topology is optimized using the Bayesian Information Criterion (BIC). Experimental results show that the system achieves satisfactory performance. Based on the detected speech events, the meeting recording is structured using an XML-based description language and visualized in a browser.
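As a rough illustration of BIC-based topology selection, the sketch below scores candidate HMM sizes with Schwarz's criterion and keeps the best one. It is not the authors' implementation: the abstract does not state the exact form of the criterion, the HMM structure, or the features used. The function names, the left-to-right single-Gaussian topology, the 13-dimensional feature vector, and the log-likelihood values are all assumptions made for illustration.

```python
import math


def bic_score(log_likelihood, num_params, num_frames):
    """Schwarz's criterion in its 'higher is better' form:
    log L - 0.5 * k * log(N)."""
    return log_likelihood - 0.5 * num_params * math.log(num_frames)


def hmm_param_count(num_states, feature_dim):
    """Free parameters of a hypothetical left-to-right HMM with one
    diagonal-covariance Gaussian per state (an assumed topology)."""
    emission = num_states * 2 * feature_dim  # mean and variance per dimension
    transition = num_states                  # one free self-loop probability per state
    return emission + transition


def select_num_states(log_likelihoods, feature_dim, num_frames):
    """Return the state count whose trained model maximizes BIC.
    log_likelihoods maps state count -> training log-likelihood."""
    return max(
        log_likelihoods,
        key=lambda n: bic_score(
            log_likelihoods[n], hmm_param_count(n, feature_dim), num_frames
        ),
    )


# Made-up log-likelihoods for one event class (not results from the paper).
candidates = {2: -5200.0, 3: -5000.0, 4: -4990.0}
best = select_num_states(candidates, feature_dim=13, num_frames=1000)
print(best)  # 3 for these made-up values
```

With these values the four-state model fits slightly better than the three-state model, but the gain does not cover the penalty for its extra parameters, so the three-state model is kept.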
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ogata, J., Asano, F. (2006). Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11848035_104
DOI: https://doi.org/10.1007/11848035_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39392-4
Online ISBN: 978-3-540-39393-1
eBook Packages: Computer Science, Computer Science (R0)