This is an implementation of our work "OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions". In this work, we present a data construction pipeline for creating training data pairs, together with a diffusion Transformer for subject-driven video customization under different control conditions. If you find our repo useful, please give it a star ⭐ and consider citing our paper. Thank you :)
Figure 2: The overall framework of our OmniVCus.
- 2025.09.19 : Our paper has been accepted by NeurIPS 2025. 🎉 🎊
- 2025.06.30 : Our paper is now available on arXiv. 🚀
- 2025.06.28 : Our project page is now live. Feel free to check out the video generation results there.
Qualitative Comparison
Prompt: "The woman in IMG1 is talking to a man on a street."
Top-left: Input Image. Top-right: SkyReels-A2. Bottom-left: OmniGen + Wan2.1-I2V. Bottom-right: Ours.
@article{cai2025omnivcus,
title={OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions},
author={Cai, Yuanhao and Zhang, He and Chen, Xi and Xing, Jinbo and Hu, Yiwei and Zhou, Yuqian and Zhang, Kai and Zhang, Zhifei and Kim, Soo Ye and Wang, Tianyu and others},
journal={arXiv preprint arXiv:2506.23361},
year={2025}
}