CVPR 2025
Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
This repository represents the official implementation of the paper titled "Diffusion Self-Distillation for Zero-Shot Customized Image Generation".
This repository is still under construction; many updates will be applied in the near future.
It currently supports the subject-preserving generation model; the relighting model is still in alpha testing.
- Objects / Merchandise / Logos / Try-ons
- Illustrations / Comics / Manga / Anime
- Generic Character Designs
- Photorealistic face identity: We did not train the model specifically for face identity, as many other dedicated models excel in this area.
- Relighting/Structure-preserved generation: The model is under further alpha testing and will be released in the future.
Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preservation generation tasks, without requiring test-time optimization.
- ComfyUI-DSD: An unofficial ComfyUI custom node package that integrates Diffusion Self-Distillation for subject-driven generation in the ComfyUI environment.
Clone the repository (requires git):
git clone https://github.com/primecai/diffusion-self-distillation.git
cd diffusion-self-distillation

For the environment, run:
pip install -r requirements.txt
You may need to set up a Google Gemini API key to use the prompt enhancement feature, which is optional but highly recommended.
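A quick way to check whether the key is visible to the script before launching a long generation run. The environment variable name `GEMINI_API_KEY` is an assumption here; check `generate.py` for the exact key it reads.

```python
import os

def gemini_available() -> bool:
    # Hypothetical variable name -- check generate.py for the exact key it reads.
    return bool(os.environ.get("GEMINI_API_KEY"))

if not gemini_available():
    print("Gemini key not set; run with --disable_gemini_prompt or export the key first.")
```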
Download our pretrained models from Hugging Face or Google Drive and unzip. You should have the following files:
transformer/
    config.json
    diffusion_pytorch_model.safetensors
pytorch_lora_weights.safetensors
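A small sanity check for the expected checkpoint layout; the `checkpoints` directory name is a placeholder, so point it at wherever you unzipped the download.

```python
from pathlib import Path

# "checkpoints" is a placeholder -- point this at wherever you unzipped the download.
ckpt = Path("checkpoints")
expected = [
    ckpt / "transformer" / "config.json",
    ckpt / "transformer" / "diffusion_pytorch_model.safetensors",
    ckpt / "pytorch_lora_weights.safetensors",
]
missing = [p for p in expected if not p.exists()]
if missing:
    print("Missing files:", *missing, sep="\n  ")
```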
To generate subject-preserving images, simply run:
CUDA_VISIBLE_DEVICES=0 python generate.py \
--model_path /PATH/TO/transformer \ # Path to the 'transformer' folder
--lora_path /PATH/TO/pytorch_lora_weights.safetensors \ # Path to the 'pytorch_lora_weights.safetensors' file
--image_path /PATH/TO/conditioning_image.png \ # Path to the conditioning image
--text "this character sitting on a chair" \ # Text prompt
--output_path output.png \ # Path to save the output image
--guidance 3.5 \ # Guidance scale
--i_guidance 1.0 \ # True image guidance scale, set to >1.0 if you want to enhance the image conditioning
--t_guidance 1.0 \ # True text guidance scale, set to >1.0 if you want to enhance the text conditioning
--model_offload \ # Enable basic model offloading to CPU to reduce GPU memory usage (recommended, requires ~23.7GB VRAM)
# --sequential_offload \ # Enable more aggressive sequential offloading (saves more memory but much slower, requires < 1GB VRAM)
# --disable_gemini_prompt \ # Disable Gemini prompt enhancement; not recommended unless you have a very detailed prompt

For GPUs with less than 24 GB of memory, consider using the --model_offload or --sequential_offload option.
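To build intuition for the `--i_guidance` and `--t_guidance` flags, here is one common way separate image and text scales can act on top of the jointly conditioned prediction in multi-condition classifier-free guidance. This is an illustrative sketch with made-up names (`true_guidance`, `pred_joint`, etc.); the exact update rule inside `generate.py` may differ. Scales of 1.0 leave the joint prediction unchanged, which matches the flags' defaults.

```python
import numpy as np

def true_guidance(pred_joint, pred_text_only, pred_image_only,
                  i_guidance=1.0, t_guidance=1.0):
    """Push the jointly conditioned prediction further along the image and
    text directions; 1.0 for both scales is a no-op."""
    pred = pred_joint.copy()
    pred += (i_guidance - 1.0) * (pred_joint - pred_text_only)   # strengthen image conditioning
    pred += (t_guidance - 1.0) * (pred_joint - pred_image_only)  # strengthen text conditioning
    return pred

joint = np.array([1.0, 2.0])
text_only = np.array([0.5, 1.0])    # prediction without the image condition
image_only = np.array([0.0, 0.0])   # prediction without the text condition

neutral = true_guidance(joint, text_only, image_only, 1.0, 1.0)     # equals `joint`
boosted = true_guidance(joint, text_only, image_only, i_guidance=1.5)
```

Setting `i_guidance` above 1.0 amplifies the difference between the image-conditioned and image-free predictions, which is why the flags above suggest values greater than 1.0 to enhance either conditioning signal.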
TBD
- Release the training code.
- Release relighting model.
- Model quantization to support <24 GB GPU memory.
- Release subject-preserving generation model.
Please cite our paper:
@inproceedings{cai2024dsd,
author={Cai, Shengqu and Chan, Eric Ryan and Zhang, Yunzhi and Guibas, Leonidas and Wu, Jiajun and Wetzstein, Gordon},
title={Diffusion Self-Distillation for Zero-Shot Customized Image Generation},
booktitle={CVPR},
year={2025}
}