CVPR 2025
Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
This repository represents the official implementation of the paper titled "Diffusion Self-Distillation for Zero-Shot Customized Image Generation".
This repository is still under construction; many updates will be applied in the near future.
It currently supports the subject-preserving generation model; the relighting model is still in alpha testing.
- Objects / Merchandise / Logos / Try-ons
- Illustrations / Comics / Manga / Anime
- Generic Character Designs
- Photorealistic face identity: We did not train the model specifically for face identity, as many other dedicated models excel in this area.
- Relighting/Structure-preserved generation: The model is under further alpha testing and will be released in the future.
Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preservation generation tasks, without requiring test-time optimization.
- ComfyUI-DSD: An unofficial ComfyUI custom node package that integrates Diffusion Self-Distillation for subject-driven generation in the ComfyUI environment.
Clone the repository (requires git):
git clone https://github.com/primecai/diffusion-self-distillation.git
cd diffusion-self-distillation

For the environment, run:
pip install -r requirements.txt
You may need to set up a Google Gemini API key to use the prompt enhancement feature, which is optional but highly recommended.
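A quick way to check whether the key is visible to the script before launching a long generation run. The environment variable name `GEMINI_API_KEY` is an assumption here; check `generate.py` for the exact key it reads.

```python
import os

def gemini_available() -> bool:
    # Hypothetical variable name -- check generate.py for the exact key it reads.
    return bool(os.environ.get("GEMINI_API_KEY"))

if not gemini_available():
    print("Gemini key not set; run with --disable_gemini_prompt or export the key first.")
```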
Download our pretrained models from Hugging Face or Google Drive and unzip. You should have the following files:
transformer/
    config.json
    diffusion_pytorch_model.safetensors
pytorch_lora_weights.safetensors
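A small sanity check for the expected checkpoint layout; the `checkpoints` directory name is a placeholder, so point it at wherever you unzipped the download.

```python
from pathlib import Path

# "checkpoints" is a placeholder -- point this at wherever you unzipped the download.
ckpt = Path("checkpoints")
expected = [
    ckpt / "transformer" / "config.json",
    ckpt / "transformer" / "diffusion_pytorch_model.safetensors",
    ckpt / "pytorch_lora_weights.safetensors",
]
missing = [p for p in expected if not p.exists()]
if missing:
    print("Missing files:", *missing, sep="\n  ")
```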
To generate subject-preserving images, simply run:
CUDA_VISIBLE_DEVICES=0 python generate.py \
--model_path /PATH/TO/transformer \ # Path to the 'transformer' folder
--lora_path /PATH/TO/pytorch_lora_weights.safetensors \ # Path to the 'pytorch_lora_weights.safetensors' file
--image_path /PATH/TO/conditioning_image.png \ # Path to the conditioning image
--text "this character sitting on a chair" \ # Text prompt
--output_path output.png \ # Path to save the output image
--guidance 3.5 \ # Guidance scale
--i_guidance 1.0 \ # True image guidance scale, set to >1.0 if you want to enhance the image conditioning
--t_guidance 1.0 \ # True text guidance scale, set to >1.0 if you want to enhance the text conditioning
--model_offload \ # Enable basic model offloading to CPU to reduce GPU memory usage (recommended, requires ~23.7GB VRAM)
# --sequential_offload \ # Enable more aggressive sequential offloading (saves more memory but much slower, requires < 1GB VRAM)
# --disable_gemini_prompt \ # Disable Gemini prompt enhancement; not recommended unless you have a very detailed prompt

For GPUs with less than 24 GB of memory, consider using the --model_offload or --sequential_offload option.
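To build intuition for the `--i_guidance` and `--t_guidance` flags, here is one common way separate image and text scales can act on top of the jointly conditioned prediction in multi-condition classifier-free guidance. This is an illustrative sketch with made-up names (`true_guidance`, `pred_joint`, etc.); the exact update rule inside `generate.py` may differ. Scales of 1.0 leave the joint prediction unchanged, which matches the flags' defaults.

```python
import numpy as np

def true_guidance(pred_joint, pred_text_only, pred_image_only,
                  i_guidance=1.0, t_guidance=1.0):
    """Push the jointly conditioned prediction further along the image and
    text directions; 1.0 for both scales is a no-op."""
    pred = pred_joint.copy()
    pred += (i_guidance - 1.0) * (pred_joint - pred_text_only)   # strengthen image conditioning
    pred += (t_guidance - 1.0) * (pred_joint - pred_image_only)  # strengthen text conditioning
    return pred

joint = np.array([1.0, 2.0])
text_only = np.array([0.5, 1.0])    # prediction without the image condition
image_only = np.array([0.0, 0.0])   # prediction without the text condition

neutral = true_guidance(joint, text_only, image_only, 1.0, 1.0)     # equals `joint`
boosted = true_guidance(joint, text_only, image_only, i_guidance=1.5)
```

Setting `i_guidance` above 1.0 amplifies the difference between the image-conditioned and image-free predictions, which is why the flags above suggest values greater than 1.0 to enhance either conditioning signal.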
TBD
- Release the training code.
- Release relighting model.
- Model quantization to support <24 GB GPU memory.
- Release subject-preserving generation model.
Please cite our paper:
@inproceedings{cai2024dsd,
author={Cai, Shengqu and Chan, Eric Ryan and Zhang, Yunzhi and Guibas, Leonidas and Wu, Jiajun and Wetzstein, Gordon},
title={Diffusion Self-Distillation for Zero-Shot Customized Image Generation},
booktitle={CVPR},
year={2025}
}