This project uses uv for fast and reliable dependency management.
- Install `uv`: If you don't have `uv` installed, follow the official instructions. For most systems:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- Create a Virtual Environment: From the project's root directory, create and activate a virtual environment:

  ```bash
  # Create the virtual environment in a .venv folder
  uv venv --seed
  # Activate it (on Linux/macOS)
  source .venv/bin/activate
  # On Windows (PowerShell):
  # .venv\Scripts\Activate.ps1
  # To deactivate later (Windows):
  # .venv\Scripts\deactivate
  ```
- Dependency solve that thang!:

  ```bash
  uv sync --extra cuda
  ```
- run it!: thanks to valued contributor kalomaze for moduleifying the source

  ```bash
  # validate package structure
  python -m src.causal_sweep_test
  ```
- why low resolution checkerboards?
technically speaking, this project is testing 2 hard things:

1: generating data without supervisory signals like a perceptual loss, a GAN loss, or conditioning models
2: doing so at multiple different image scales concurrently across pretraining, which breaks many feature extraction and visual learning backbones
the checkerboard patterns are the simplest synthetic data i could imagine in 1 minute that covers most of the values the R,G,B colorspace can take while still offering both local features (edges, contiguous shapes) and global features (color pairs, parallel lines??) that don't need a supervisory model to measure and interpret.
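for concreteness, a minimal sketch of that kind of generator is below; it is not necessarily the repo's actual data pipeline, and the function name, cell sizes, and resolutions are illustrative assumptions:

```python
# minimal sketch of a two-color checkerboard generator in the spirit described
# above; the repo's real pipeline may differ (names and sizes are illustrative).
import numpy as np

def make_checkerboard(height: int = 32, width: int = 32, cell: int = 4, rng=None) -> np.ndarray:
    """Return an [H, W, 3] float32 image tiled with a random pair of RGB colors."""
    if rng is None:
        rng = np.random.default_rng()
    color_a, color_b = rng.uniform(0.0, 1.0, size=(2, 3))  # the global "color pair"
    ys, xs = np.mgrid[0:height, 0:width]
    mask = ((ys // cell) + (xs // cell)) % 2 == 0           # local edges / contiguous cells
    return np.where(mask[..., None], color_a, color_b).astype(np.float32)

# e.g. sample the same pattern family at several cell sizes for multi-scale pretraining
batch = [make_checkerboard(cell=c) for c in (2, 4, 8)]
```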
you can, and should, calculate the Fréchet distance between the color histograms found in your output samples and the color histograms in your generated training samples! generalizing from the Fréchet distance and other parameter-free metrics to measures of spatial correlation and the parallelism/contiguity of color regions is left as an exercise to whoever forks the project next.
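one way to read that suggestion, sketched below: fit a Gaussian to the RGB pixel values of each image set and compute the closed-form Fréchet (2-Wasserstein) distance between the two Gaussians. the helper name `color_frechet_distance` is illustrative, not something shipped by this repo:

```python
# minimal sketch: Fréchet distance between Gaussians fit to the RGB pixel
# statistics of two image sets (generated samples vs. training samples).
import numpy as np
from scipy.linalg import sqrtm

def color_frechet_distance(samples_a: np.ndarray, samples_b: np.ndarray) -> float:
    """samples_*: arrays of shape [N, H, W, 3] (any numeric dtype)."""
    pixels_a = samples_a.reshape(-1, 3).astype(np.float64)
    pixels_b = samples_b.reshape(-1, 3).astype(np.float64)

    mu_a, mu_b = pixels_a.mean(axis=0), pixels_b.mean(axis=0)
    cov_a = np.cov(pixels_a, rowvar=False)
    cov_b = np.cov(pixels_b, rowvar=False)

    # d^2 = ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 * (cov_a @ cov_b)^(1/2))
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can leave a tiny imaginary part
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# usage: color_frechet_distance(generated_batch, training_batch)
```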
- wait so does all of the stuff in this repo work?
hell yeah it does!
even the weird torch kernel compilation interceptor server-client pair. i don't recommend using it (it's easy to comment out), but i don't recommend not using it either.
- actually maybe the vllm at home stuff doesn't work
- why am i reading notes about 'vllm at home'?
to find out more about 'vllm at home', please refer the poster behind this thread (@sameqcu, or sqcu.dev for contact) to a technical recruiter, or better yet to something like a residency at one of the labs offering one right now.
supporting materials on the kv-caching and paged attention remapping are attached in `notes/` for convenience. try running them and the project's source code through your favorite language models for review!
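if 'paged attention remapping' is new to you, the core idea in those notes is an indirection table from logical token positions to physical kv-cache blocks. the sketch below is a generic illustration of that idea only; it is not this repo's implementation, and every name and size in it is made up for the example:

```python
# generic illustration of the block-table indirection behind paged kv-caching;
# NOT this repo's implementation. all names and sizes below are made up.
import torch

BLOCK_SIZE = 16            # tokens per physical cache block (illustrative)
NUM_BLOCKS = 64            # physical blocks in the shared cache pool (illustrative)
NUM_HEADS, HEAD_DIM = 4, 32

k_cache = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, NUM_HEADS, HEAD_DIM)  # one shared pool
free_blocks = list(range(NUM_BLOCKS))
block_table: list[int] = []  # per-sequence map: logical block index -> physical block index

def append_token_keys(position: int, k: torch.Tensor) -> None:
    """Write one token's keys into the paged cache, remapping a free block on demand."""
    logical_block, offset = divmod(position, BLOCK_SIZE)
    if logical_block == len(block_table):      # the sequence grew past its last block
        block_table.append(free_blocks.pop())  # grab any free physical block
    k_cache[block_table[logical_block], offset] = k

def gather_keys(length: int) -> torch.Tensor:
    """Reassemble logically contiguous keys for attention over the first `length` tokens."""
    n_blocks = (length + BLOCK_SIZE - 1) // BLOCK_SIZE
    physical = torch.tensor(block_table[:n_blocks])
    return k_cache[physical].reshape(-1, NUM_HEADS, HEAD_DIM)[:length]
```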
- is a diffusion model and language model a diffusion language model?
this is almost the case! the main missing functionality to get 'diffusion language model' behavior out of this code is... well. hm. you would input some 'image latents' as 1-dimensional spans of the length of your token continuation, and you would use the same noise schedule rules and training objective as the image patch latents, really. the result would be a 'span' of embeddings or hidden states which should have sensible logit values when queried through a text LM head.
surprisingly, this methodology would actually use the same kind of learned token -> embedding layer used by a logit head! the overall training flow would be:

```
train: tokenize text -token_embed_layer-> text embeddings
       -forwards_noise_schedule-> noisy_text_embeddings
       -model.forwards(z_t)-> nte_vpred
       -diffusion_loss-> backprop -> optim_step -> repeat

eval:  tokenize text -token_embed_layer-> text embeddings
       -forwards_noise_schedule-> noisy_text_embeddings
       -model.forwards(z_t)-> nte_vpred
       -reverse_noise_schedule-> denoised_text_embeddings
       -text_lm_head-> logits of shape [Length, Vocabulary]
       -sample_logits-> diffusion denoised text
```

further details are attached in `notes/`. try running them and the project's source code through your favorite language models for review!
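as a concrete (and hedged) rendering of that train flow, here is a minimal sketch; it assumes a cosine noise schedule and a v-prediction objective, and the names `model`, `token_embed`, and the `model(z_t, t)` call signature are placeholders rather than this repo's actual interfaces:

```python
# hedged sketch of the text-diffusion train step described above.
# assumptions: cosine noise schedule, v-prediction objective, and placeholder
# module names (token_embed, model) with an illustrative model(z_t, t) signature.
import torch
import torch.nn.functional as F

def train_step(model, token_embed, token_ids, optimizer):
    x0 = token_embed(token_ids)                          # [B, L, D] clean text embeddings
    t = torch.rand(x0.shape[0], 1, 1, device=x0.device)  # per-example noise level in [0, 1]
    alpha = torch.cos(t * torch.pi / 2)                  # forwards_noise_schedule (cosine)
    sigma = torch.sin(t * torch.pi / 2)
    noise = torch.randn_like(x0)
    z_t = alpha * x0 + sigma * noise                     # noisy_text_embeddings
    v_target = alpha * noise - sigma * x0                # v-prediction target
    v_pred = model(z_t, t)                               # nte_vpred
    loss = F.mse_loss(v_pred, v_target)                  # diffusion_loss
    loss.backward()                                      # backprop
    optimizer.step()                                     # optim_step
    optimizer.zero_grad()
    return loss.item()
```

at eval time you would run the matching reverse schedule to get denoised embeddings, then read them out through the text LM head and sample logits exactly as in the flow above.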