This repo includes a reference implementation of autospeculative decoding (ASD) from the paper "Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation".

- spec_ddpm_inf.py is the simplest entry point for understanding the method. It keeps the same flat iterative structure as vanilla DDPM sampling and maintains a speculation until it is rejected, effectively implementing ASD with infinite speculation length.
- spec_ddpm_k.py uses a nested loop structure and speculates k steps ahead in parallel. It can be run on one GPU to get the algorithmic speedup reported in the paper, or on multiple GPUs to get the wall-clock speedup reported in the paper.
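
To make "speculate k steps ahead, then verify in parallel" concrete, here is a toy sketch. It is not the code in spec_ddpm_k.py and makes several assumptions: a 1-D chain, a hypothetical `drift` function standing in for the denoiser call, drafts built from the most recently verified drift, and a reflection coupling of two equal-variance Gaussians as the accept/reject rule. All names (`drift`, `sample_speculative`, `h`, `sigma`) are illustrative.

```python
import math
import random


def drift(x):
    """Toy stand-in for one denoiser (network) call; not the repo's model."""
    return -x


def sample_speculative(x0, num_steps, spec_len, h=0.1, sigma=0.1, seed=0):
    """Run x <- x + h * drift(x) + sigma * z for num_steps steps,
    drafting up to spec_len steps per round of drift evaluations.

    Returns (final_state, rounds), where rounds counts batches of drift
    calls that could run in parallel.
    """
    rng = random.Random(seed)
    x, d_stale = x0, 0.0  # d_stale: drift reused for drafting (assumed initial guess 0)
    step, rounds = 0, 0
    while step < num_steps:
        k = min(spec_len, num_steps - step)
        # 1) Draft k steps ahead with the stale drift (no network calls).
        zs = [rng.gauss(0.0, 1.0) for _ in range(k)]
        states = [x]
        for z in zs:
            states.append(states[-1] + h * d_stale + sigma * z)
        # 2) One batch of k independent drift evaluations (parallelizable).
        drifts = [drift(s) for s in states[:-1]]
        rounds += 1
        # 3) Verify the drafts in order and stop at the first rejection.
        for j in range(k):
            prev, d_true, z = states[j], drifts[j], zs[j]
            # (proposal mean - target mean) / sigma
            delta = h * (d_stale - d_true) / sigma
            # log of target density over proposal density at the draft
            log_ratio = -z * delta - 0.5 * delta * delta
            step += 1
            if math.log(1.0 - rng.random()) <= min(0.0, log_ratio):
                x = states[j + 1]  # draft accepted as-is
            else:
                # Reflected correction: still an exact sample from the target.
                x = prev + h * d_true - sigma * z
                break
        d_stale = d_true  # freshest drift along the accepted path
    return x, rounds
```

Each round advances at least one step and at most `spec_len` steps, so `rounds` lies between `ceil(num_steps / spec_len)` and `num_steps`. The ratio `num_steps / rounds` is the toy analogue of an algorithmic speedup. Because a rejected draft is replaced by a corrected sample from the target Gaussian, the output follows the same law as the plain sequential loop in this toy setting.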
conda create -n asd python=3.11.10
conda activate asd
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install diffusers==0.31.0 transformers==4.45.2 accelerate==1.0.1 tabulate open_clip_torch
mkdir images
# ASD with infinite speculation length
python spec_ddpm_inf.py
# with a specific speculation length
python spec_ddpm_k.py --spec_len 3
# with multiple GPUs; this distributes the parallel network calls across the GPUs
python spec_ddpm_k.py --spec_len 3 --num_gpu 3

To evaluate the CLIP score, you will need to download the COCO 2017 validation images and captions.
mkdir data
cd data
# download images; we don't need the images themselves, but they make loading the dataset easy
wget http://images.cocodataset.org/zips/val2017.zip
unzip val2017.zip
# download captions
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip annotations_trainval2017.zip

@inproceedings{
hu2025diffusion,
title={Diffusion Models are Secretly Exchangeable: Parallelizing {DDPM}s via Auto Speculation},
author={Hengyuan Hu and Aniket Das and Dorsa Sadigh and Nima Anari},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=n08niE37ku}
}