UniVST: A Unified Framework for Training-free Localized Video Style Transfer [Official PyTorch Implementation]
1 Ministry of Education of China, Xiamen University, China. 2 Kunlun Skywork AI. † Corresponding Author.
• 2025.10: 🔥 UniVST now supports five backbones, including advanced rectified-flow models.
• 2025.09: 🔥 The code has been reorganized and several bugs have been fixed.
• 2025.05: 🔥 The project page of UniVST is now available.
• 2025.01: 🔥 The official code of UniVST has been released.
• 2024.10: 🔥 The paper of UniVST has been submitted to arXiv.
We propose UniVST, a unified framework for training-free localized video style transfer based on diffusion models. UniVST first applies DDIM inversion to the original video and style image to obtain their initial noise and integrates Point-Matching Mask Propagation to generate masks for the object regions. It then performs AdaIN-Guided Localized Video Stylization with a three-branch architecture for information interaction. Moreover, Sliding-Window Consistent Smoothing is incorporated into the denoising process, enhancing the temporal consistency in the latent space. The overall framework is illustrated as follows:
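For intuition beyond the framework figure, the AdaIN-guided stylization and the localized masking can be sketched in a few lines of PyTorch. This is a minimal conceptual sketch, not the repository's actual implementation: the function names adain and localized_blend are illustrative only, and feature maps are assumed to have shape (B, C, H, W).

import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Align the per-channel mean/std of the content features with those of the style features.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return (content_feat - c_mean) / c_std * s_std + s_mean

def localized_blend(stylized: torch.Tensor, original: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Keep the stylized features inside the propagated object mask and the original
    # content outside it, which is the idea behind localized style transfer.
    return mask * stylized + (1.0 - mask) * original

In the actual pipeline, these operations act on diffusion latents inside the three-branch denoising loop rather than on raw pixels.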
git clone https://github.com/QuanjianSong/UniVST.git
# Installation with requirements.txt
conda create -n UniVST python=3.10
conda activate UniVST
pip install -r requirements.txt
# Or installation with environment.yaml
conda env create -f environment.yaml
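After installation, a quick sanity check can confirm that PyTorch sees your GPU (this assumes requirements.txt installs PyTorch and that a CUDA-capable GPU is available):

# quick environment sanity check (assumes PyTorch is installed by the steps above)
import torch
print(torch.__version__, torch.cuda.is_available())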
We provide five different backbone options: SD-v1.5, SD-v2.1, AnimateDiff-v2, SD-v3.0, and SD-v3.5. You can freely choose the backbone for your video stylization tasks.
SD-v1.5/SD-v2.1
You can get the stylized results with a single command:
sh scripts/start_sd.sh
Alternatively, you can follow the steps below for customization.
• 1. Perform content inversion of the original video.
CUDA_VISIBLE_DEVICES=1 python src/sd/run_content_inversion_sd.py \
--content_path examples/contents/mallard-fly \
--output_path results/contents-inv \
--is_opt
Then, you will find the content inversion result in the results/contents-inv/sd/mallard-fly directory.
• 2. Perform style inversion of the style image.
CUDA_VISIBLE_DEVICES=1 python src/sd/run_style_inversion_sd.py \
--style_path examples/styles/00033.png \
--output_path results/styles-inv
Then, you will find the style inversion result in the results/styles-inv/sd/00033 directory.
• 3. Perform mask propagation for the object region.
CUDA_VISIBLE_DEVICES=1 python src/mask_propagation.py \
--feature_path results/contents-inv/sd/mallard-fly/features/inversion_feature_map_2_block_301_step.pt \
--backbone 'sd' \
--mask_path 'examples/masks/mallard-fly.png' \
--output_path 'results/masks'
Then, you will find the mask propagation result in the results/masks/sd/mallard-fly directory.
• 4. Perform localized video style transfer. [Optional: you can also omit --mask_path to perform global style transfer over the whole frame.]
CUDA_VISIBLE_DEVICES=1 python src/sd/run_video_style_transfer_sd.py \
--content_inv_path results/contents-inv/sd/mallard-fly/inversion \
--style_inv_path results/styles-inv/sd/00033/inversion \
--mask_path results/masks/sd/mallard-fly \
--output_path results/stylizations
Then, you will find the stylization result in the results/stylizations/sd/mallard-fly_00033 directory.
AnimateDiff-v2
First, you need to download the motion module into the ckpts directory.
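The download source is not pinned here. As one hypothetical way to fetch it, assuming the AnimateDiff-v2 motion module is the mm_sd_v15_v2.ckpt checkpoint hosted in the Hugging Face repository guoyww/animatediff (please verify against the checkpoint you actually intend to use):

# hypothetical sketch: repo_id and filename are assumptions, not pinned by this repository
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="guoyww/animatediff", filename="mm_sd_v15_v2.ckpt", local_dir="ckpts")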
Then, you can get the stylized results with a single command:
sh scripts/start_animatediff.sh
Alternatively, you can follow the steps below for customization.
• 1. Perform content inversion of the original video.
CUDA_VISIBLE_DEVICES=1 python src/animatediff/run_content_inversion_animatediff.py \
--content_path examples/contents/mallard-fly \
--output_path results/contents-inv \
--is_opt
Then, you will find the content inversion result in the results/contents-inv/animatediff/mallard-fly directory.
• 2. Perform style inversion of the style image.
CUDA_VISIBLE_DEVICES=1 python src/animatediff/run_style_inversion_animatediff.py \
--style_path examples/styles/00033.png \
--output_path results/styles-inv
Then, you will find the style inversion result in the results/styles-inv/animatediff/00033 directory.
• 3. Perform mask propagation for the object region.
CUDA_VISIBLE_DEVICES=1 python src/mask_propagation.py \
--feature_path results/contents-inv/animatediff/mallard-fly/features/inversion_feature_map_2_block_301_step.pt \
--backbone 'animatediff' \
--mask_path 'examples/masks/mallard-fly.png' \
--output_path 'results/masks'
Then, you will find the mask propagation result in the results/masks/animatediff/mallard-fly directory.
• 4. Perform localized video style transfer. [Optional: you can also omit --mask_path to perform global style transfer over the whole frame.]
CUDA_VISIBLE_DEVICES=1 python src/animatediff/run_video_style_transfer_animatediff.py \
--content_inv_path results/contents-inv/animatediff/mallard-fly/inversion \
--style_inv_path results/styles-inv/animatediff/00033/inversion \
--mask_path results/masks/animatediff/mallard-fly \
--output_path results/stylizations
Then, you will find the stylization result in the results/stylizations/animatediff/mallard-fly_00033 directory.
SD-v3.0/SD-v3.5
You can get the stylized results with a single command:
sh scripts/start_sd3.sh
Alternatively, you can follow the steps below for customization.
• 1. Perform content inversion of the original video.
CUDA_VISIBLE_DEVICES=1 python src/sd3/run_content_inversion_sd3.py \
--content_path examples/content/mallard-fly \
--output_path results/content-inv \
--is_rf_solver
Then, you will find the content inversion result in the results/content-inv/sd3/mallard-fly directory.
• 2. Perform style inversion of the style image.
CUDA_VISIBLE_DEVICES=1 python src/sd3/run_style_inversion_sd3.py \
--style_path examples/style/00033.png \
--output_path results/style-inv \
--is_rf_solver # use rf_solver
Then, you will find the style inversion result in the results/style-inv/sd3/00033 directory.
• 3. Perform mask propagation for the object region.
CUDA_VISIBLE_DEVICES=1 python src/mask_propagation.py \
--feature_path results/content-inv/sd3/mallard-fly/features/inversion_feature_map_2_block_301_step.pt \
--backbone 'sd3' \
--mask_path 'examples/mask/mallard-fly.png' \
--output_path 'results/masks'
Then, you will find the mask propagation result in the results/masks/sd3/mallard-fly directory.
• 4. Perform localized video style transfer. [Optional: you can also omit --mask_path to perform global style transfer over the whole frame.]
CUDA_VISIBLE_DEVICES=1 python src/sd3/run_video_style_transfer_sd3.py \
--content_inv_path results/content-inv/sd3/mallard-fly/inversion \
--style_inv_path results/style-inv/sd3/00033/inversion \
--mask_path results/masks/sd3/mallard-fly \
--output_path results/stylization
Then, you will find the stylization result in the results/stylization/sd3/mallard-fly_00033 directory.
🤗 If you find this code helpful for your research, please cite:
@article{song2024univst,
title={UniVST: A Unified Framework for Training-free Localized Video Style Transfer},
author={Song, Quanjian and Lin, Mingbao and Zhan, Wengyi and Yan, Shuicheng and Cao, Liujuan and Ji, Rongrong},
journal={arXiv preprint arXiv:2410.20084},
year={2024}
}