CleanDiffusion

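CleanDiffusion is a research codebase for image generation, built for teaching and learning. It organizes diffusion / flow / consistency algorithms the CleanRL way: each end-to-end algorithm is, as far as possible, a single .py file.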

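This project is not an enterprise-grade framework and does not aim for elaborate abstractions. A reader should be able to open one algorithm file and see, from top to bottom:

Configuration
Data loading
Model
objective
sampler
CFG
Training loop
Sampling entry point
checkpoint
TensorBoard / metrics.jsonl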

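Code organization must also serve reading continuity. A single file is not a pile of code; it exists to keep the main line of the algorithm clearly visible:

The top of the file states the algorithm's goal, the math conventions, and the run commands.
Configuration and a small number of utility functions come first.
Data and model code stay concise and do not compete with the algorithm for attention.
objective / sampler / CFG are the most prominent parts of the file.
The training loop is written as the real end-to-end flow, so a reader can follow the data flow to the end.
Engineering helpers such as checkpointing, logging, and image saving sit where they do not interrupt understanding of the algorithm.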
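
As an illustration of how small such an engineering helper can be, here is a minimal sketch of a metrics.jsonl writer. The function names `append_metrics` and `read_metrics` and the record fields are assumptions made for this example, not the repository's actual API.

```python
import json
import time
from pathlib import Path


def append_metrics(path, step, **scalars):
    """Append one JSON object per line, so earlier records survive a crash."""
    record = {"step": step, "time": time.time(), **scalars}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_metrics(path):
    """Load every record back, e.g. for plotting or to recover the last step."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping the helper this short lets it sit near the bottom of an algorithm file without pulling attention away from the objective and sampler.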

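Core principles

Single-file end-to-end implementations come first.
Algorithm code and the end-to-end flow must stand out.
The reading order of the code must be coherent and serve teaching.
Teaching readability takes priority over code reuse.
A small amount of duplication is allowed; deep abstraction is avoided.
No complex registry.
No deep inheritance.
Same model / same dataset / same logging.
In comparison experiments, change only the objective and the sampler.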

Planned support

DDPM
DDIM
CFG
Flow Matching
Rectified Flow
Teacher-free Consistency Training
Consistency Flow Matching
DMD-lite
DMD2-style experiments

Math conventions

Used consistently across the whole project:

x0 = data/image
x1 = Gaussian noise
t=0 -> noise
t=1 -> data
x_t = (1 - t) * x1 + t * x0
v_target = x0 - x1

Mixing in the opposite convention (x0=noise, x1=image) is forbidden in every algorithm file.
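
The convention can be checked numerically. The following is a minimal, framework-free sketch; the helper names `interpolate` and `velocity_target` are assumptions for illustration, and plain Python lists stand in for image tensors.

```python
import random


def interpolate(x0, x1, t):
    """x_t = (1 - t) * x1 + t * x0, with x0 = data and x1 = noise."""
    return [(1.0 - t) * n + t * d for d, n in zip(x0, x1)]


def velocity_target(x0, x1):
    """v_target = x0 - x1, the constant velocity pointing from noise to data."""
    return [d - n for d, n in zip(x0, x1)]


rng = random.Random(0)
x0 = [0.5, -1.0]                          # a toy 2-D "image"
x1 = [rng.gauss(0.0, 1.0) for _ in x0]    # Gaussian noise of the same shape
t = 0.3
x_t = interpolate(x0, x1, t)
v = velocity_target(x0, x1)

# Following the velocity for the remaining time (1 - t) lands on the data.
reached = [p + (1.0 - t) * vi for p, vi in zip(x_t, v)]
```

At t=0 the interpolant equals the noise x1 and at t=1 it equals the data x0, which is exactly the direction the convention fixes.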

Current stage

Stage 0: implement clean_diffusion/ddpm.py.

ddpm.py must include:

DDPM noise prediction objective
DDPM / DDIM sampler
CFG
Training
Sampling
Saving
resume
TensorBoard
metrics.jsonl
sample images
checkpoint metadata
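
Of the items above, CFG is the easiest to show in a few lines. The sketch below assumes the common formulation in which a guidance scale of 1 recovers the conditional prediction; the function names and the null-label mechanism are assumptions for illustration and may differ from what ddpm.py does.

```python
import random


def cfg_combine(pred_uncond, pred_cond, scale):
    """Guided prediction: uncond + scale * (cond - uncond).

    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates past the conditional prediction.
    """
    return [u + scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


def maybe_drop_label(label, null_label, drop_prob, rng):
    """Training-side half of CFG: randomly swap the label for a null token,
    so a single network learns both the conditional and unconditional branch."""
    return null_label if rng.random() < drop_prob else label


rng = random.Random(0)
guided = cfg_combine([0.0, 1.0], [1.0, 3.0], scale=2.0)
label = maybe_drop_label(label=7, null_label=10, drop_prob=0.1, rng=rng)
```

At sampling time the same network is evaluated twice per step, once with the real label and once with the null label, and the two outputs are combined with `cfg_combine`.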

Do not move on to FM / CFM / DMD until Stage 0 is complete.

Current progress

clean_diffusion/ddpm.py: DDPM + DDIM + CFG single-file baseline is complete and passes the train / resume / sample smoke test.
clean_diffusion/fm.py: Flow Matching + Euler + CFG single-file baseline is complete and passes the train / resume / sample smoke test.
clean_diffusion/rectified_flow.py: Rectified Flow + RF Euler + CFG single-file baseline is complete and passes the train / resume / sample smoke test.
clean_diffusion/consistency.py: teacher-free consistency + one-step CFG single-file baseline is complete and passes the train / resume / sample smoke test.
clean_diffusion/cfm.py: Consistency Flow Matching + CFM Euler + CFG single-file baseline is complete and passes the train / resume / sample smoke test.
clean_diffusion/dmd_lite.py: DMD-lite one-step student distillation single-file baseline is complete and passes the train / resume / sample smoke test.
clean_diffusion/dmd2.py: DMD2-style teaching experiment single-file baseline is complete and passes the train / resume / sample smoke test.

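Systematic learning entry points

To study diffusion systematically, start with:

docs/00_diffusion_learning_map.md: the learning order and conceptual relationships among DDPM / FM / RF / Consistency / CFM / DMD.
docs/01_experiment_matrix.md: minimal smoke tests, sampler comparison, CFG comparison, and a template for recording failure modes.
docs/02_ddpm_parameterizations.md: the formula relationships among the four DDPM parameterizations, epsilon / x0 / v / score.
docs/03_sampler_comparison.md: a teaching-oriented comparison of DDPM / DDIM / Euler / Heun / DPM-Solver-lite.
docs/04_noise_and_time_schedules.md: how the DDPM noise schedule differs from the flow time schedule.
docs/05_model_conditioning_notes.md: reading notes on UNet, time embedding, text conditioning, CFG, and attention.
docs/06_latent_diffusion_teaching_plan.md: the implementation plan for a teaching version of latent diffusion.
docs/07_evaluation_notes.md: FID, CLIP score, precision/recall, and a manual inspection checklist.
docs/08_failure_case_log.md: a template for logging failure cases and a checklist for common diagnoses.
clean_diffusion/toy_fm_2d.py: a 2D Flow Matching toy that uses visualization to explain x_t, velocity, and the Euler sampler.
clean_diffusion/toy_ddpm_parameterizations.py: an executable check of the DDPM parameterization conversion formulas.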
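
For a sense of what an Euler sampler looks like under the project's time direction (t=0 is noise, t=1 is data), here is a minimal sketch. It is not taken from toy_fm_2d.py: `euler_sample` and `point_mass_velocity` are hypothetical names, and the closed-form velocity field stands in for a trained network.

```python
def euler_sample(velocity_fn, x_noise, steps):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)
    with fixed-size Euler steps."""
    x = list(x_noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a trained model: if all data sits at a single point
# `target`, the exact field is v(x, t) = (target - x) / (1 - t).
target = [2.0, -1.0]


def point_mass_velocity(x, t):
    return [(d - xi) / (1.0 - t) for d, xi in zip(target, x)]


sample = euler_sample(point_mass_velocity, x_noise=[0.3, 0.7], steps=32)
```

Because the paths in this toy case are straight lines, Euler integration reaches the target up to floating-point error for any step count; with a learned velocity field the number of steps trades speed against accuracy.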

2D toy run example:

python clean_diffusion/toy_fm_2d.py train \
  --run-name toy_fm_2d_smoke \
  --max-steps 200 \
  --batch-size 512

python clean_diffusion/toy_fm_2d.py sample \
  --ckpt outputs/toy_fm_2d_smoke/last.pt \
  --steps 32
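
The commands above imply a train / sample subcommand layout. One way to build such an entry point is sketched below with the standard library's argparse; the flag names mirror the commands shown, while the default values and the `build_parser` name are assumptions, not the script's actual code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Single-file train/sample entry point (sketch)."
    )
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="run the training loop")
    train.add_argument("--run-name", required=True)
    train.add_argument("--max-steps", type=int, default=200)    # assumed default
    train.add_argument("--batch-size", type=int, default=512)   # assumed default

    sample = sub.add_parser("sample", help="sample from a checkpoint")
    sample.add_argument("--ckpt", required=True)
    sample.add_argument("--steps", type=int, default=32)        # assumed default
    return parser


# Parse the same arguments as the train command shown above.
args = build_parser().parse_args(
    ["train", "--run-name", "toy_fm_2d_smoke", "--max-steps", "200", "--batch-size", "512"]
)
```

argparse converts dashes to underscores, so the values are available as `args.run_name`, `args.max_steps`, and `args.batch_size`.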

DDPM parameterization check:

python clean_diffusion/toy_ddpm_parameterizations.py
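
A check of this kind can be sketched in a few lines. The version below is not the script's actual contents: it assumes a variance-preserving forward process x_t = alpha * x0 + sigma * eps with alpha^2 + sigma^2 = 1, the common v-prediction definition v = alpha * eps - sigma * x0, and scalar inputs. Here eps plays the role of the noise that the project's flow convention calls x1.

```python
import math


def check_parameterizations(x0=0.8, eps=-0.4, alpha_bar=0.36):
    """Recover x0 and eps from each parameterization and return the results."""
    alpha = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)

    x_t = alpha * x0 + sigma * eps     # forward noising
    v = alpha * eps - sigma * x0       # v-prediction target
    score = -eps / sigma               # score of q(x_t | x0)

    return {
        "x0_from_eps": (x_t - sigma * eps) / alpha,
        "x0_from_v": alpha * x_t - sigma * v,
        "eps_from_v": sigma * x_t + alpha * v,
        "eps_from_score": -sigma * score,
    }


recovered = check_parameterizations()
```

Each entry should match the x0 or eps passed in, up to floating-point error, which is the kind of identity an executable formula check can assert.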
