Unsupervised News Video Segmentation by Combined Audio-Video Analysis

  • Conference paper
Multimedia Content Representation, Classification and Security (MRCS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4105)


Segmenting news video into stories is one of the key issues in the efficient handling of news-based digital libraries. In this paper we present a novel unsupervised algorithm that combines audio and video information to automatically partition news videos into stories. The proposed algorithm is based on the detection of anchor shots within the video. In particular, a set of audio/video templates of anchorperson shots is first extracted in an unsupervised way; shots are then classified by comparing them to the templates using both video and audio similarity. Finally, a story is obtained by linking each anchor shot with all successive shots until another anchor shot, or the end of the news video, occurs. Audio similarity is evaluated by means of a new index and helps to achieve better anchor shot detection performance than a purely video-based approach. The method has been tested on a large database and compared with other state-of-the-art algorithms, demonstrating its effectiveness with respect to them.
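
The final linking step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `link_stories` and the boolean input `is_anchor` (one flag per shot, in temporal order) are hypothetical, and the shot classification itself is assumed to have been done already. The abstract does not say how shots that precede the first anchor shot are handled; here they are simply collected in a leading segment, which is an assumption.

```python
from typing import List, Sequence


def link_stories(is_anchor: Sequence[bool]) -> List[List[int]]:
    """Group shot indices into stories, each one opened by an anchor shot."""
    stories: List[List[int]] = []
    for idx, anchor in enumerate(is_anchor):
        if anchor or not stories:
            # An anchor shot opens a new story. Assumption: shots before the
            # first anchor shot are gathered in a leading segment of their own.
            stories.append([idx])
        else:
            # A non-anchor shot is linked to the most recently opened story.
            stories[-1].append(idx)
    return stories


# Hypothetical per-shot labels: True marks a shot classified as an anchor shot.
labels = [True, False, False, True, False, True]
print(link_stories(labels))  # [[0, 1, 2], [3, 4], [5]]
```

Each inner list holds the indices of the shots that make up one story, so story boundaries fall exactly at the detected anchor shots.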



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

De Santo, M., Percannella, G., Sansone, C., Vento, M. (2006). Unsupervised News Video Segmentation by Combined Audio-Video Analysis. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39392-4

  • Online ISBN: 978-3-540-39393-1

  • eBook Packages: Computer Science, Computer Science (R0)
