Moving beyond the constraints of autoregressive modeling, UNI-D² brings the benefits of iterative refinement—data efficiency, bidirectional context, and parallel decoding—to the text domain. This library serves as the missing foundation for this rapidly growing space, featuring an easily extendible architecture that supports multiple modular training methods. Our unified pipeline facilitates rapid experimentation, enables comparable evaluation, and standardizes benchmarks across the field.
- Hydra + Lightning entry point (`python -m discrete_diffusion`) for experimenting with the methods from the MDLM, UDLM, BD3LM, FlexMDM, GIDD, SEDD, PartitionMDLM, and CANDI papers.
- Sampling helpers that cover absorbing, BD3LM, GIDD, partition, uniform, autoregressive, FlexMDM, and CANDI samplers, plus a reusable `scripts/generate_samples.sh` wrapper.
- Scripts that reproduce training recipes for datasets such as LM1B, OWT, and Text8.
Install the library in editable mode:

```bash
pip install -e .
```

If you want to isolate dependencies in a Conda env, create and activate it before running the editable install:

```bash
conda create -n uni-d2 python=3.11
conda activate uni-d2
pip install -e .
```

If your system supports Flash Attention (CUDA 11.4+), install it after the editable install to boost throughput:

```bash
pip install flash-attn --no-build-isolation
```

For development, install the extras that power testing and docs:

```bash
pip install -e ".[dev]"
pip install flash-attn --no-build-isolation  # optional
```

The `pyproject.toml`/`requirements.txt` pair declares the dependencies that power training, evaluation, and sampling.
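A quick, purely illustrative way to confirm the editable install worked is to check that the `discrete_diffusion` package resolves from Python; this check is not part of the documented workflow:

```bash
# Sanity check (illustrative): the import should succeed and print the package location.
PYTHONPATH=src python -c "import discrete_diffusion; print(discrete_diffusion.__file__)"
```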
The library caches datasets and artifacts in `~/.cache/discrete_diffusion` by default. You can customize this location in two ways:

- Environment Variable (Recommended):

  ```bash
  export DISCRETE_DIFFUSION_SCRATCH_DIR="/path/to/your/data"
  ```

- Command Line Override:

  ```bash
  python -m discrete_diffusion ... scratch_dir=/path/to/your/data
  ```
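If you go with the environment-variable route and want it to persist across shells, one option is to export it from your shell profile; the path below is purely illustrative:

```bash
# Persist the scratch/cache location across sessions (path is illustrative).
echo 'export DISCRETE_DIFFUSION_SCRATCH_DIR="/mnt/data/discrete_diffusion"' >> ~/.bashrc
source ~/.bashrc
```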
Run the Hydra-powered CLI exported at `src/discrete_diffusion/__main__.py` with dataset/model/algorithm overrides. A minimal example:

```bash
PYTHONPATH=src python -u -m discrete_diffusion \
  data=openwebtext-split \
  model=small \
  algo=mdlm \
  loader.batch_size=32 \
  trainer.devices=8 \
  hydra.run.dir=./outputs/owt/mdlm
```

The `examples/` directory contains dataset-specific recipes (e.g., `bd3lm/owt.sh`, `udlm/text8.sh`). Override any Hydra config key by appending `key=value` pairs on the command line.
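For instance, a Text8 run with UDLM follows the same pattern. The config group values below (`data=text8`, `algo=udlm`) and the batch-size/device settings are assumptions based on the recipe names under `examples/`; consult `configs/` or `examples/udlm/text8.sh` for the authoritative keys:

```bash
# Illustrative UDLM-on-Text8 run; the override names are assumed, check configs/ for the real ones.
PYTHONPATH=src python -u -m discrete_diffusion \
  data=text8 \
  model=small \
  algo=udlm \
  loader.batch_size=64 \
  trainer.devices=1 \
  hydra.run.dir=./outputs/text8/udlm
```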
Once you have a checkpoint, use the evaluation script:
```bash
PYTHONPATH=src python -m discrete_diffusion.evaluations.generate_samples \
  checkpoint_path=outputs/owt/bd3lm_block16_debug/checkpoints/last.ckpt \
  num_samples=16 \
  num_steps=2000
```

Repository layout:

- `configs/`: Hydra configuration tree for datasets, models, and learners.
- `examples/`: Scripts and notebooks that reproduce experiments and visualizations.
- `src/discrete_diffusion`: Entry points, Hydra CLI, and the discrete diffusion training API.
- `outputs/`: Default Hydra root for logged checkpoints and metrics.
- `docs/`: Supporting documentation for research artifacts.
- `pyproject.toml`/`requirements.txt`: Dependency and tooling metadata.
- MDLM – Sahoo, Subham, et al., Simple and effective masked diffusion language models, NeurIPS 2024.
- UDLM – Schiff, Yair, et al., Simple guidance mechanisms for discrete diffusion models, arXiv 2024.
- FlexMDM – Kim, Jaeyeon, et al., Any-Order Flexible Length Masked Diffusion, arXiv 2025.
- Block Diffusion – Arriola, Marianne, et al., Block diffusion: Interpolating between autoregressive and diffusion language models, arXiv 2025.
- GIDD – von Rütte, Dimitri, et al., Generalized interpolating discrete diffusion, arXiv 2025.
- SEDD – Lou, Aaron, Chenlin Meng, and Stefano Ermon, Discrete diffusion modeling by estimating the ratios of the data distribution, arXiv 2023.
- PartitionMDLM – Deschenaux, Justin, Lan Tran, and Caglar Gulcehre, Partition Generative Modeling: Masked Modeling Without Masks, arXiv 2025.
- CANDI – Pynadath, Patrick, Jiaxin Shi, and Ruqi Zhang, CANDI: Hybrid Discrete-Continuous Diffusion Models, arXiv 2025.
Our goal is to maintain this repository as the unified starting point for future research in discrete diffusion for text, keeping it ever-growing and relevant as the field progresses. As such, it will effectively always be a work in progress. We welcome any contributions to help it evolve—ranging from full paper implementations with benchmarking to recommendations for features to be added.
We are grateful to the authors of the respective papers for open-sourcing their codebases, which served as a foundation for this library.
If you use UNI-D² in your research, please cite it using the following BibTeX entry:
```bibtex
@software{nadimpalli2025unid2,
  author  = {Nadimpalli, Kalyan Varma and Pauline, Vincent and Kapl, Ferdinand and Karimi-Mamaghan, Amir Mohammad and Tong, Alexander and Dittadi, Andrea and Bauer, Stefan},
  title   = {{UNI-D²: A Unified Codebase for Discrete Diffusion Language Models}},
  version = {0.1.0},
  url     = {https://github.com/nkalyanv99/UNI-D2},
  year    = {2025}
}
```

MIT License
Copyright (c) 2025 Kalyan Varma Nadimpalli
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.