Abstract
Segmenting news video into stories is among the key issues in achieving efficient handling of news-based digital libraries. In this paper we present a novel unsupervised algorithm that combines audio and video information to automatically partition news videos into stories. The proposed algorithm is based on the detection of anchor shots within the video. In particular, a set of audio/video templates of anchorperson shots is first extracted in an unsupervised way; shots are then classified by comparing them to the templates using both video and audio similarity. Finally, a story is obtained by linking each anchor shot with all successive shots until another anchor shot, or the end of the news video, occurs. Audio similarity is evaluated by means of a new index and helps achieve better anchor shot detection performance than a purely video-based approach. The method has been tested on a large database and compared with other state-of-the-art algorithms, demonstrating its effectiveness in comparison with them.
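The final linking step described above is simple enough to sketch directly. The Python below is a minimal illustration, not the authors' implementation. It assumes each shot already has a video similarity and an audio similarity to its nearest anchor template, both in [0, 1]. The function names, the weighted-average fusion rule, and the `weight` and `threshold` values are all hypothetical stand-ins. They are not the paper's audio similarity index or classification rule.

```python
from typing import List, Sequence, Tuple


def classify_anchor_shots(video_sim: Sequence[float], audio_sim: Sequence[float],
                          weight: float = 0.5, threshold: float = 0.7) -> List[bool]:
    """HYPOTHETICAL fusion rule: label a shot as an anchor shot when a weighted
    average of its video and audio similarity to the nearest anchor template
    reaches a threshold. Weight and threshold are illustrative assumptions."""
    if len(video_sim) != len(audio_sim):
        raise ValueError("video_sim and audio_sim must have the same length")
    return [weight * v + (1.0 - weight) * a >= threshold
            for v, a in zip(video_sim, audio_sim)]


def link_stories(is_anchor: Sequence[bool]) -> List[Tuple[int, int]]:
    """Link each anchor shot with all successive shots until the next anchor
    shot or the end of the video. Returns inclusive (first, last) shot index
    pairs. Shots that precede the first anchor shot are left unassigned."""
    starts = [i for i, anchor in enumerate(is_anchor) if anchor]
    stories = []
    for k, start in enumerate(starts):
        # A story ends just before the next anchor shot, or at the last shot.
        end = starts[k + 1] - 1 if k + 1 < len(starts) else len(is_anchor) - 1
        stories.append((start, end))
    return stories


if __name__ == "__main__":
    labels = classify_anchor_shots([0.9, 0.2, 0.1, 0.85, 0.3],
                                   [0.8, 0.4, 0.2, 0.9, 0.1])
    print(link_stories(labels))  # prints [(0, 2), (3, 4)]
```

With these made-up scores, shots 0 and 3 are labelled as anchor shots, so the video splits into two stories: shots 0 to 2 and shots 3 to 4.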
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Santo, M., Percannella, G., Sansone, C., Vento, M. (2006). Unsupervised News Video Segmentation by Combined Audio-Video Analysis. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11848035_37
DOI: https://doi.org/10.1007/11848035_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39392-4
Online ISBN: 978-3-540-39393-1
eBook Packages: Computer Science, Computer Science (R0)