Stars
MoE training for Me and You and maybe other people
Latent Collaboration in Multi-Agent Systems
A calm, CLI-native way to semantically grep everything: code, images, PDFs, and more.
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
MAGI-1: Autoregressive Video Generation at Scale
A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.
Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
🫀 Lovely Grad - tiny tensors need some love
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Jupyter notebooks and other materials developed for the Columbia course APMA 4300
A Python Library for Learning Non-Euclidean Representations
Parallel computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python using PyBind11, without a single line of CMake
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Train an LLM to generate cracked Manim animations for mathematical concepts.
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
Kernels Written for 100 days of CUDA Challenge
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
Master programming by recreating your favorite technologies from scratch.
Fully open reproduction of DeepSeek-R1
Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO