research-article

VideoSnapping: interactive synchronization of multiple videos

Published: 27 July 2014

Abstract

Aligning video is a fundamental task in computer graphics and vision, required for a wide range of applications. We present an interactive method for computing optimal nonlinear temporal video alignments of an arbitrary number of videos. We first derive a robust approximation of alignment quality between pairs of clips, computed as a weighted histogram of feature matches. We then find optimal temporal mappings (constituting frame correspondences) using a graph-based approach that allows for very efficient evaluation with artist constraints. This enables an enhancement to the "snapping" interface in video editing tools, where videos in a timeline can now snap to one another based on their content when dragged by an artist, rather than simply on their start and end times. The pairwise snapping is then generalized to multiple clips, achieving a globally optimal temporal synchronization that automatically arranges a series of clips filmed at different times into a single consistent time frame. When followed by a simple spatial registration, we achieve high-quality spatiotemporal video alignments at a fraction of the computational complexity of previous methods. Assisted temporal alignment is a degree of freedom that has been largely unexplored, but is an important task in video editing. Our approach is simple to implement, highly efficient, and very robust to differences in video content, allowing for interactive exploration of the temporal alignment space for multiple real-world HD videos.
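The graph-based mapping step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the cost construction or the graph connectivity, so the `cost` matrix, the function name `align_frames`, and the three allowed moves are all assumptions made for illustration. The sketch treats each frame pair (i, j) as a graph node and runs Dijkstra's algorithm to find a monotonic, minimum-cost path from the first frame pair to the last.

```python
import heapq


def align_frames(cost):
    """Return a monotonic frame mapping [(i, j), ...] with minimal summed cost.

    cost[i][j] is a hypothetical dissimilarity between frame i of clip A and
    frame j of clip B (lower means a better match). How such costs would be
    derived from feature matches is not covered here.
    """
    n, m = len(cost), len(cost[0])
    start, goal = (0, 0), (n - 1, m - 1)
    dist = {start: cost[0][0]}
    prev = {}
    heap = [(cost[0][0], start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            break
        if d > dist[(i, j)]:
            continue  # stale queue entry, a cheaper route was already found
        # Assumed moves: advance clip A, clip B, or both by one frame.
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            ni, nj = i + di, j + dj
            if ni < n and nj < m:
                nd = d + cost[ni][nj]
                if nd < dist.get((ni, nj), float("inf")):
                    dist[(ni, nj)] = nd
                    prev[(ni, nj)] = (i, j)
                    heapq.heappush(heap, (nd, (ni, nj)))
    # Walk the predecessor links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy example: frames match best along the diagonal.
toy_cost = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
mapping = align_frames(toy_cost)
```

In this toy setting, one way a hard user constraint might be emulated is by setting the cost of disallowed frame pairs to `float("inf")`, which keeps the search from ever routing through them.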

Published In

ACM Transactions on Graphics, Volume 33, Issue 4
July 2014
1366 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2601097
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 July 2014
Published in TOG Volume 33, Issue 4

Author Tags

  1. alignment
  2. synchronization
  3. video editing

Qualifiers

  • Research-article

Cited By

  • (2024) Efficient GPU Cloth Simulation with Non-distance Barriers and Subspace Reuse. ACM Transactions on Graphics 43:6 (1-16). DOI: 10.1145/3687760. Online publication date: 19-Dec-2024
  • (2024) Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360° Image Outpainting. IEEE Transactions on Visualization and Computer Graphics 30:5 (2734-2744). DOI: 10.1109/TVCG.2024.3372085. Online publication date: 5-Mar-2024
  • (2023) Eventfulness for Interactive Video Alignment. ACM Transactions on Graphics 42:4 (1-10). DOI: 10.1145/3592118. Online publication date: 26-Jul-2023
  • (2022) Training a Deep Remastering Model. ACM SIGGRAPH 2022 Talks (1-2). DOI: 10.1145/3532836.3536228. Online publication date: 27-Jul-2022
  • (2022) Searching for Fast Demosaicking Algorithms. ACM Transactions on Graphics 41:5 (1-18). DOI: 10.1145/3508461. Online publication date: 13-May-2022
  • (2022) Self-Supervised Human Pose based Multi-Camera Video Synchronization. Proceedings of the 30th ACM International Conference on Multimedia (1739-1748). DOI: 10.1145/3503161.3547766. Online publication date: 10-Oct-2022
  • (2022) Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild. IEEE Transactions on Pattern Analysis and Machine Intelligence 44:2 (1066-1080). DOI: 10.1109/TPAMI.2020.3012429. Online publication date: 1-Feb-2022
  • (2022) (ChinaVis 2019) Uncertainty visualization in stratigraphic correlation based on multi-source data fusion. Journal of Visualization 22:5 (1021-1038). DOI: 10.1007/s12650-019-00579-0. Online publication date: 11-Mar-2022
  • (2022) iMoCap: Motion Capture from Internet Videos. International Journal of Computer Vision 130:5 (1165-1180). DOI: 10.1007/s11263-022-01596-7. Online publication date: 12-Mar-2022
  • (2021) Aesthetic-guided outward image cropping. ACM Transactions on Graphics 40:6 (1-13). DOI: 10.1145/3478513.3480566. Online publication date: 10-Dec-2021
