
Distillation and Post-Training for Wan2.1

TL;DR — High-quality video generation in just five denoising steps.
See the results in the Combined Techniques section.

This project focuses on distillation and post-training for the Wan2.1 model, aiming to improve its efficiency and performance. All data-processing, training, and inference code, along with the model weights, is open-sourced.


References

This work builds on the Wan2.1 model and was carried out at Zulution AI.

Data Preprocessing

  • Assuming your video data has been uniformly resized, run scripts/data_preprocess/preprocess.sh to preprocess the features required for model input.
  • An example directory structure can be found under ./data.

Distillation Techniques

The following distillation techniques are utilized in this project:

CFG Distillation

  • Description: Eliminates the separate unconditional forward pass required by classifier-free guidance, by fusing the guidance scale with the timestep information via sinusoidal positional encoding and a multi-layer perceptron (MLP).
  • Performance: Achieves 2X acceleration with minimal performance degradation.
  • Training: Run the script scripts/train/sh/distill_cfg_i2v.sh.
  • Comparative Results:
cfg_distill_26_SuperVi_step40_shift3_guide5.mp4
org_26_SuperVi_step40_shift3_guide5.mp4
cfg_distill_38_The.wom_step40_shift5_guide5.mp4
org_The.wom_step40_shift5_guide5.mp4
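
The fusion described above can be sketched as follows. This is a minimal illustration, not the project's actual module: the names `sinusoidal_embedding` and `GuidanceEmbedder` and the 256-dim width are assumptions, and the real model fuses the result into its own timestep-conditioning path.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(scale: torch.Tensor, dim: int) -> torch.Tensor:
    # Standard sinusoidal encoding of a scalar (here: the CFG scale) per batch item.
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half
    )
    args = scale.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class GuidanceEmbedder(nn.Module):
    """Fuses a guidance-scale embedding into the timestep embedding via an MLP."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t_emb: torch.Tensor, guidance_scale: torch.Tensor) -> torch.Tensor:
        g_emb = sinusoidal_embedding(guidance_scale, t_emb.shape[-1])
        return t_emb + self.mlp(g_emb)
```

The distilled student conditioned this way is trained to match the teacher's guided output (conditional plus scaled conditional/unconditional difference) in a single forward pass, which is where the 2X speedup comes from.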

Step Distillation

  • Training: Run the script scripts/train/sh/distill_step.sh.
  • Mode: Defaults to text-to-video (t2v). For image-to-video (i2v) training, add the --i2v flag.
  • Types:
    • Consistency Distillation:
      • Run with the --distill_mode consistency flag.
    • Half Step Distillation:
      • Performance: Achieves 2X acceleration with minimal performance loss.
      • Description: Consolidates the original two prediction steps into a single step.
      • Training: Run with the --distill_mode half flag.
      • Example Results: half-step distilled, 20 steps
halfed_38_The.wom_step20_shift10_guide5.mp4
halfed_26_SuperVi_step20_shift10_guide5.mp4

Original CFG-distilled model at 30 steps vs. further step-distilled model at 15 steps:

cfg_._step30_shift3_guide5.mp4
halfed_._step15_shift9_guide8.mp4
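
The idea of consolidating two prediction steps into one can be sketched with a toy Euler ODE solver. This is a conceptual sketch under assumed names (`euler_step`, `half_step_loss`), not the project's training loop, and it assumes a velocity-prediction parameterization:

```python
import torch

def euler_step(model, x, t, t_next):
    # One Euler ODE step: x_next = x + (t_next - t) * v(x, t).
    return x + (t_next - t) * model(x, t)

def half_step_loss(student, teacher, x, t, t_mid, t_next):
    # Teacher trajectory: two consecutive steps, t -> t_mid -> t_next.
    with torch.no_grad():
        x_mid = euler_step(teacher, x, t, t_mid)
        x_teacher = euler_step(teacher, x_mid, t_mid, t_next)
    # Student: a single step t -> t_next that should land on the same point.
    x_student = euler_step(student, x, t, t_next)
    return torch.mean((x_student - x_teacher) ** 2)
```

Because the student reproduces two teacher steps at once, the sampling schedule can be cut in half, e.g. from 40 steps to 20.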

DMD2

  • Description: Distribution Matching Distillation is applied to further optimize the model.
  • Implementation: For detailed implementation, see DMD2_wanx.
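
At a high level, DMD-style training uses two score estimates: one from the frozen teacher ("real") and one from an auxiliary model trained on the generator's own samples ("fake"). The sketch below shows only this conceptual update direction; the function name is hypothetical, sign conventions depend on the parameterization, and the actual implementation lives in DMD2_wanx.

```python
import torch

def dmd_generator_grad(real_score, fake_score, x_t, t):
    # The difference between the "fake" score (trained on generator samples)
    # and the "real" score (from the teacher) approximates the gradient of
    # the KL divergence between the two distributions at x_t.
    with torch.no_grad():
        s_real = real_score(x_t, t)
        s_fake = fake_score(x_t, t)
    return s_fake - s_real
```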

Reinforcement Learning (RL)

  • Method: Implemented following the DRaFT approach of backpropagating a differentiable reward through the sampling process.
  • Reward: HPSReward V2.1, with the implementation adapted from EasyAnimate.
  • Best Practices: In our experiments, it worked best to train with LoRA and to apply the reward only to the first frame.
  • Training: Run the script RL/sh/debug.sh.
  • Inference: Set lora_alpha to a smaller value than used during training to produce more natural-looking videos.
  • Example Results:
org_In.the._step7_shift13_guide8.mp4
RL_In.the._step7_shift13_guide8.mp4
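
The first-frame reward objective can be sketched as below. This is a hedged illustration: `final_denoise_step` and `reward_model` stand in for the actual DRaFT-style differentiable sampler tail and the HPS v2.1 reward, and the layout (B, C, F, H, W) is an assumption.

```python
import torch

def draft_first_frame_loss(final_denoise_step, latents, reward_model):
    # DRaFT-style objective: run the last (differentiable) denoising step,
    # then score only the first frame of the generated video, per the
    # best-practice note above. Maximizing the reward = minimizing its negative.
    video = final_denoise_step(latents)   # (B, C, F, H, W), gradients flow through
    first_frame = video[:, :, 0]          # apply the reward to frame 0 only
    return -reward_model(first_frame).mean()
```

Since only the LoRA parameters receive this gradient, the base model stays intact and the LoRA strength can be dialed down at inference time.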

Inference

  • Text-to-Video (t2v)

    • Run scripts/inference/inference.sh for text-to-video generation tasks.
    • Sample prompts are provided in test_prompts.txt and moviibench_2.0_prompts.txt.
  • Image-to-Video (i2v)

    • Run scripts/inference/i2v.sh for image-to-video generation tasks.
    • Sample images and corresponding prompts are available in the examples/i2v directory.
  • Configuration

    • Update the transformer_dir variable in the scripts to point to your model checkpoint directory.
    • Adjust LoRA-related settings in generate.py if using LoRA models.

Combined Techniques

By combining distillation (based on DMD2) and RL, we can achieve high-quality video generation in just 5 steps:

A.coffe_step5_shift7_guide8.mp4
A.littl_step5_shift15_guide8.mp4
After.j_step5_shift15_guide8.mp4
The.aft_step5_shift7_guide8.mp4
In.the._step5_shift7_guide8.mp4

Model Weights

All pretrained model weights are available for download.

Environment Setup

  • Dependencies: All required packages are listed in the environment.yml file.
  • FastVideo: This project requires FastVideo. Please install our forked version from:
    git clone https://github.com/azuresky03/FastVideo.git
    cd FastVideo
    pip install -e .
