This project uses uv for fast and reliable dependency management.
- Install `uv`: If you don't have `uv` installed, follow the official instructions. For most systems:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- Create a Virtual Environment: From the project's root directory, create and activate a virtual environment:

  ```bash
  # Create the virtual environment in a .venv folder
  uv venv --seed
  # Activate it (on Linux/macOS)
  source .venv/bin/activate
  # On Windows (PowerShell):
  # .venv\Scripts\Activate.ps1
  # To deactivate later (Windows):
  # .venv\Scripts\deactivate
  ```
- Dependency solve that thang!:

  ```bash
  uv sync --extra cuda
  ```
- run it!: thanks to valued contributor kalomaze for moduleifying the source

  ```bash
  # validate package structure
  python -m src.causal_sweep_test
  ```
- why low resolution checkerboards?
technically speaking, this project is testing 2 hard things:

1: generating data without supervisory signals like a perceptual loss, a GAN loss, or conditioning models
2: doing so at multiple different image scales concurrently across pretraining, which breaks many feature extraction and visual learning backbones
the checkerboard patterns are the simplest synthetic data i could imagine in 1 minute that covers most of the values the R,G,B colorspace can take while still offering both local features (edges, contiguous shapes) and global features (color pairs, parallel lines??) that don't need a supervisory model to measure and interpret.
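for concreteness, a minimal sketch of that kind of generator is below; it is not necessarily the repo's actual data pipeline, and the function name, cell sizes, and resolutions are illustrative assumptions:

```python
# minimal sketch of a two-color checkerboard generator in the spirit described
# above; the repo's real pipeline may differ (names and sizes are illustrative).
import numpy as np

def make_checkerboard(height: int = 32, width: int = 32, cell: int = 4, rng=None) -> np.ndarray:
    """Return an [H, W, 3] float32 image tiled with a random pair of RGB colors."""
    if rng is None:
        rng = np.random.default_rng()
    color_a, color_b = rng.uniform(0.0, 1.0, size=(2, 3))  # the global "color pair"
    ys, xs = np.mgrid[0:height, 0:width]
    mask = ((ys // cell) + (xs // cell)) % 2 == 0           # local edges / contiguous cells
    return np.where(mask[..., None], color_a, color_b).astype(np.float32)

# e.g. sample the same pattern family at several cell sizes for multi-scale pretraining
batch = [make_checkerboard(cell=c) for c in (2, 4, 8)]
```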
you can, and should, calculate the Fréchet distance between the color histograms found in your output samples and the color histograms in your generated training samples! generalizing from the Fréchet distance and other parameter-free metrics to measures of spatial correlation and the parallelism/contiguity of color regions is left as an exercise to whoever forks the project next.
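one way to read that suggestion, sketched below: fit a Gaussian to the RGB pixel values of each image set and compute the closed-form Fréchet (2-Wasserstein) distance between the two Gaussians. the helper name `color_frechet_distance` is illustrative, not something shipped by this repo:

```python
# minimal sketch: Fréchet distance between Gaussians fit to the RGB pixel
# statistics of two image sets (generated samples vs. training samples).
import numpy as np
from scipy.linalg import sqrtm

def color_frechet_distance(samples_a: np.ndarray, samples_b: np.ndarray) -> float:
    """samples_*: arrays of shape [N, H, W, 3] (any numeric dtype)."""
    pixels_a = samples_a.reshape(-1, 3).astype(np.float64)
    pixels_b = samples_b.reshape(-1, 3).astype(np.float64)

    mu_a, mu_b = pixels_a.mean(axis=0), pixels_b.mean(axis=0)
    cov_a = np.cov(pixels_a, rowvar=False)
    cov_b = np.cov(pixels_b, rowvar=False)

    # d^2 = ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 * (cov_a @ cov_b)^(1/2))
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can leave a tiny imaginary part
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# usage: color_frechet_distance(generated_batch, training_batch)
```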
- wait so does all of the stuff in this repo work?
hell yeah it does!
even the weird torch kernel compilation interceptor server-client pair. i don't recommend using it (it's easy to comment out), but i don't recommend not using it either.
- actually maybe the vllm at home stuff doesn't work
- why am i reading notes about 'vllm at home'?
to find out more about 'vllm at home', please refer the poster behind this thread (@sameqcu, or sqcu.dev for contact) to a technical recruiter, or better yet to something like a residency at one of the labs offering one right now.
supporting materials on the kv-caching and paged attention remapping are attached in `notes/` for convenience. try running them and the project's source code through your favorite language models for review!
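if 'paged attention remapping' is new to you, the core idea in those notes is an indirection table from logical token positions to physical kv-cache blocks. the sketch below is a generic illustration of that idea only; it is not this repo's implementation, and every name and size in it is made up for the example:

```python
# generic illustration of the block-table indirection behind paged kv-caching;
# NOT this repo's implementation. all names and sizes below are made up.
import torch

BLOCK_SIZE = 16            # tokens per physical cache block (illustrative)
NUM_BLOCKS = 64            # physical blocks in the shared cache pool (illustrative)
NUM_HEADS, HEAD_DIM = 4, 32

k_cache = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, NUM_HEADS, HEAD_DIM)  # one shared pool
free_blocks = list(range(NUM_BLOCKS))
block_table: list[int] = []  # per-sequence map: logical block index -> physical block index

def append_token_keys(position: int, k: torch.Tensor) -> None:
    """Write one token's keys into the paged cache, remapping a free block on demand."""
    logical_block, offset = divmod(position, BLOCK_SIZE)
    if logical_block == len(block_table):      # the sequence grew past its last block
        block_table.append(free_blocks.pop())  # grab any free physical block
    k_cache[block_table[logical_block], offset] = k

def gather_keys(length: int) -> torch.Tensor:
    """Reassemble logically contiguous keys for attention over the first `length` tokens."""
    n_blocks = (length + BLOCK_SIZE - 1) // BLOCK_SIZE
    physical = torch.tensor(block_table[:n_blocks])
    return k_cache[physical].reshape(-1, NUM_HEADS, HEAD_DIM)[:length]
```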
- is a diffusion model and language model a diffusion language model?
this is almost the case! the main missing functionality to get 'diffusion language model' behavior out of this code is... well. hm. you would input some 'image latents' as 1-dimensional spans of the length of your token continuation, and you would use the same noise schedule rules and training objective as the image patch latents, really. the result would be a 'span' of embeddings or hidden states which should have sensible logit values when queried through a text LM head.
surprisingly, this methodology would actually use the same kind of learned token -> embedding layer used by a logit head! the overall training flow would be:

```
train: tokenize text -token_embed_layer-> text embeddings
       -forwards_noise_schedule-> noisy_text_embeddings
       -model.forwards(z_t)-> nte_vpred
       -diffusion_loss-> backprop -> optim_step -> repeat

eval:  tokenize text -token_embed_layer-> text embeddings
       -forwards_noise_schedule-> noisy_text_embeddings
       -model.forwards(z_t)-> nte_vpred
       -reverse_noise_schedule-> denoised_text_embeddings
       -text_lm_head-> logits of shape [Length, Vocabulary]
       -sample_logits-> diffusion denoised text
```

further details are attached in `notes/`. try running them and the project's source code through your favorite language models for review!
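as a concrete (and hedged) rendering of that train flow, here is a minimal sketch; it assumes a cosine noise schedule and a v-prediction objective, and the names `model`, `token_embed`, and the `model(z_t, t)` call signature are placeholders rather than this repo's actual interfaces:

```python
# hedged sketch of the text-diffusion train step described above.
# assumptions: cosine noise schedule, v-prediction objective, and placeholder
# module names (token_embed, model) with an illustrative model(z_t, t) signature.
import torch
import torch.nn.functional as F

def train_step(model, token_embed, token_ids, optimizer):
    x0 = token_embed(token_ids)                          # [B, L, D] clean text embeddings
    t = torch.rand(x0.shape[0], 1, 1, device=x0.device)  # per-example noise level in [0, 1]
    alpha = torch.cos(t * torch.pi / 2)                  # forwards_noise_schedule (cosine)
    sigma = torch.sin(t * torch.pi / 2)
    noise = torch.randn_like(x0)
    z_t = alpha * x0 + sigma * noise                     # noisy_text_embeddings
    v_target = alpha * noise - sigma * x0                # v-prediction target
    v_pred = model(z_t, t)                               # nte_vpred
    loss = F.mse_loss(v_pred, v_target)                  # diffusion_loss
    loss.backward()                                      # backprop
    optimizer.step()                                     # optim_step
    optimizer.zero_grad()
    return loss.item()
```

at eval time you would run the matching reverse schedule to get denoised embeddings, then read them out through the text LM head and sample logits exactly as in the flow above.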