TL;DR — High-quality video generation in just five denoising steps.
See the results in the Combined Techniques section.
This project focuses on the distillation and post-training of the Wan2.1 model, aiming to improve its efficiency and performance. All data-processing, training, and inference code, along with the model weights, is open-sourced.
- References
- Data Preprocessing
- Distillation Techniques
- Reinforcement Learning
- Inference
- Combined Techniques
- Model Weights
- Environment Setup
This work is based on:
This work was done at Zulution AI.
- Assuming your video data has been uniformly resized, run `scripts/data_preprocess/preprocess.sh` to preprocess the features required for model input.
- An example directory structure can be found under `./data`.
The following distillation techniques are utilized in this project:
- Description: Eliminates the need for unconditional generation by fusing guidance information with temporal information using sinusoidal positional encoding and a Multi-Layer Perceptron (MLP); see the sketch after the comparative results below.
- Performance: Achieves 2× acceleration with minimal performance degradation.
- Training: Run the script `scripts/train/sh/distill_cfg_i2v.sh`.
- Comparative Results:
| CFG-distilled | Original |
| --- | --- |
| cfg_distill_26_SuperVi_step40_shift3_guide5.mp4 | org_26_SuperVi_step40_shift3_guide5.mp4 |
| cfg_distill_38_The.wom_step40_shift5_guide5.mp4 | org_The.wom_step40_shift5_guide5.mp4 |
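
As a rough illustration of the idea (a minimal sketch, not the repo's actual module; class names, layer sizes, and the fusion point are assumptions), the guidance scale can be embedded like a timestep and added to the temporal embedding, so a single conditional forward pass replaces the usual cond/uncond pair:

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(x: torch.Tensor, dim: int) -> torch.Tensor:
    # Standard sinusoidal encoding of one scalar per sample -> (B, dim).
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = x.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class GuidanceFusion(nn.Module):
    """Embed the CFG scale with sinusoidal encoding + an MLP and fuse it into
    the temporal (timestep) embedding. Dimensions here are illustrative."""

    def __init__(self, enc_dim: int = 256, emb_dim: int = 1024):
        super().__init__()
        self.enc_dim = enc_dim
        self.mlp = nn.Sequential(
            nn.Linear(enc_dim, emb_dim), nn.SiLU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, t_emb: torch.Tensor, guidance: torch.Tensor) -> torch.Tensor:
        # t_emb: (B, emb_dim) timestep embedding; guidance: (B,) CFG scales.
        return t_emb + self.mlp(sinusoidal_embedding(guidance, self.enc_dim))
```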
- Training: Run the script `scripts/train/sh/distill_step.sh`.
- Mode: Defaults to text-to-video (t2v). For image-to-video (i2v) training, add the `--i2v` flag.
- Types:
  - Consistency Distillation: Run with the `--distill_mode consistency` flag.
  - Half Step Distillation:
    - Performance: Achieves 2× acceleration with minimal performance loss.
    - Description: Consolidates the original two prediction steps into a single step; see the sketch after the example results below.
    - Training: Run with the `--distill_mode half` flag.
- Example Results: halfed, 20 steps
halfed_38_The.wom_step20_shift10_guide5.mp4
halfed_26_SuperVi_step20_shift10_guide5.mp4
Original CFG-distilled (30 steps) vs. further step-distilled (15 steps):

| CFG-distilled, 30 steps | Step-distilled, 15 steps |
| --- | --- |
| cfg_._step30_shift3_guide5.mp4 | halfed_._step15_shift9_guide8.mp4 |
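
A minimal sketch of the half-step idea, assuming a flow-matching (velocity-prediction) model and Euler steps; function names and signatures are illustrative, not the repo's API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_two_steps(teacher, x, t, t_mid, t_next, cond):
    # Two Euler steps of the frozen teacher: t -> t_mid -> t_next.
    x_mid = x + (t_mid - t) * teacher(x, t, cond)
    return x_mid + (t_next - t_mid) * teacher(x_mid, t_mid, cond)

def half_step_loss(student, teacher, x, t, t_mid, t_next, cond):
    """Train the student so that ONE Euler step from t to t_next lands where
    the teacher lands after TWO steps, halving the number of network calls."""
    target = teacher_two_steps(teacher, x, t, t_mid, t_next, cond)
    x_next = x + (t_next - t) * student(x, t, cond)
    return F.mse_loss(x_next, target)
```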
- Description: Distribution Matching Distillation (DMD) is applied to further optimize the model; a sketch of the core generator loss follows.
- Implementation: For detailed implementation, see DMD2_wanx.
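
Conceptually, DMD trains the few-step generator using the disagreement between a frozen "real" denoiser (the teacher) and an online "fake" denoiser trained on generator samples. A simplified sketch (the interfaces and the noising rule are assumptions; see DMD2_wanx for the actual implementation):

```python
import torch
import torch.nn.functional as F

def dmd_generator_loss(x_gen, real_model, fake_model, t, noise):
    """x_gen: generator output. real_model: frozen teacher; fake_model: online
    denoiser trained (in a separate step) on generator samples. The gradient
    step moves generator samples toward the real data distribution."""
    x_t = (1 - t) * x_gen + t * noise  # illustrative flow-matching forward noising
    with torch.no_grad():
        pred_real = real_model(x_t, t)  # teacher's denoised estimate
        pred_fake = fake_model(x_t, t)  # fake-distribution denoised estimate
        grad = pred_fake - pred_real    # descent direction: -grad points to "real"
    # Surrogate loss whose gradient w.r.t. x_gen equals `grad` (standard DMD trick).
    return 0.5 * F.mse_loss(x_gen, (x_gen - grad).detach())
```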
- Method: Implemented following the concept of DRaFT; see the sketch after the example results below.
- Reward: HPSReward V2.1, with the implementation taken from EasyAnimate.
- Best Practices: In our experiments, it worked best to train with LoRA and to apply the reward to the first frame only.
- Training: Run the script `RL/sh/debug.sh`.
- Inference: Set `lora_alpha` to a smaller value than during training for more natural-looking videos.
- Example Results:
| Without RL | With RL |
| --- | --- |
| org_In.the._step7_shift13_guide8.mp4 | RL_In.the._step7_shift13_guide8.mp4 |
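
A minimal sketch of the DRaFT-style reward step under our best-practice setup (only LoRA parameters trainable, reward applied to the first frame). The sampler interface, decode function, and truncated-backprop depth are assumptions, not the repo's API:

```python
import torch

def draft_reward_step(model, reward_fn, decode_first_frame, x, timesteps, cond, k_grad=1):
    """DRaFT-style update: run the sampler, let gradients flow only through the
    last `k_grad` denoising steps, score the first decoded frame with the
    reward model (HPS v2.1 in our setup), and backprop into the LoRA params."""
    n = len(timesteps) - 1
    for i in range(n):
        grad_ctx = torch.enable_grad() if i >= n - k_grad else torch.no_grad()
        with grad_ctx:
            v = model(x, timesteps[i], cond)               # LoRA-augmented transformer
            x = x + (timesteps[i + 1] - timesteps[i]) * v  # Euler step
    frame0 = decode_first_frame(x)  # reward is applied to the FIRST frame only
    loss = -reward_fn(frame0)       # maximize the reward
    loss.backward()                 # gradients reach the LoRA parameters only
    return loss.detach()
```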
- **Text-to-Video (t2v)**
  - Run `scripts/inference/inference.sh` for text-to-video generation tasks.
  - Sample prompts are provided in `test_prompts.txt` and `moviibench_2.0_prompts.txt`.
- **Image-to-Video (i2v)**
  - Run `scripts/inference/i2v.sh` for image-to-video generation tasks.
  - Sample images and corresponding prompts are available in the `examples/i2v` directory.
- **Configuration**
  - Update the `transformer_dir` variable in the scripts to point to your model checkpoint directory.
  - Adjust LoRA-related settings in `generate.py` if using LoRA models; a sketch of the `lora_alpha` scaling follows below.
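
As a rough illustration of the `lora_alpha` advice from the RL section (variable names and values are placeholders; the real settings live in `generate.py`), merging LoRA weights with a reduced alpha looks like:

```python
import torch

def merge_lora(base_weight, lora_A, lora_B, lora_alpha, rank):
    # Merge the LoRA delta into a base weight: W' = W + (alpha / r) * B @ A.
    # Using a smaller alpha at inference than at training scales the learned
    # delta down, which in our experience yields more natural-looking videos.
    return base_weight + (lora_alpha / rank) * (lora_B @ lora_A)

# Example (hypothetical values): trained with lora_alpha=64 at rank 32;
# try a smaller value at inference:
# W_inference = merge_lora(W, A, B, lora_alpha=32, rank=32)
```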
By combining distillation (based on DMD2) and RL, we can achieve high-quality video generation in just 5 steps (a sketch of the few-step sampler follows the example videos):
A.coffe_step5_shift7_guide8.mp4
A.littl_step5_shift15_guide8.mp4
After.j_step5_shift15_guide8.mp4
The.aft_step5_shift7_guide8.mp4
In.the._step5_shift7_guide8.mp4
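
For reference, the `stepN_shiftS_guideG` suffixes in the result names encode the sampling step count, the timestep-shift factor, and the guidance scale. Below is a minimal sketch of such a few-step sampler, assuming the standard flow-matching shift schedule and a CFG-distilled model that takes the guidance scale as a direct input; the interfaces are illustrative, not the repo's API:

```python
import torch

def shifted_timesteps(num_steps: int, shift: float) -> torch.Tensor:
    # Uniform t from 1 (pure noise) to 0 (clean), warped by the shift factor
    # so more of the trajectory is spent at high noise levels.
    t = torch.linspace(1.0, 0.0, num_steps + 1)
    return shift * t / (1.0 + (shift - 1.0) * t)

def sample(model, x, cond, num_steps=5, shift=7.0, guidance=8.0):
    # Few-step Euler sampling; the CFG-distilled model consumes the guidance
    # scale directly, so each step costs a single forward pass.
    ts = shifted_timesteps(num_steps, shift)
    for i in range(num_steps):
        v = model(x, ts[i], cond, guidance=guidance)
        x = x + (ts[i + 1] - ts[i]) * v
    return x
```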
All pretrained model weights are available for download:
- Baidu Pan: https://pan.baidu.com/s/1wUCrRY9Fu8GdDMTZXdc7tw?pwd=m9kn
- Access Code: `m9kn`
- Dependencies: All required packages are listed in the `environment.yml` file.
- FastVideo: This project requires FastVideo. Please install our forked version:

```bash
git clone https://github.com/azuresky03/FastVideo.git
cd FastVideo
pip install -e .
```