Stars
100M tokens. Infinite compute. Lowest val loss wins.
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
Long Context Pre-Training with Lighthouse Attention
The agent that grows with you
CUDA kernels for leveraging LLM sparsity to improve throughput and decrease the memory requirements during inference and training.
Muon is an optimizer for hidden layers in neural networks
SmoothE: Differentiable E-Graph Extraction (ASPLOS'25 Best Paper)
TokenSpeed is a speed-of-light LLM inference engine.
cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.
Node0: A collaborative event powered by Protocol Learning, our decentralized approach to AI development
Benchmark and deploy optimized LLM models on GPU servers with vLLM or SGLang. Choose from a list of optimized recipes for popular models or create your own with custom configurations. Run benchmarks…
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
A beautiful, simple, clean, and responsive Jekyll theme for academics
just-every / code
Forked from openai/codex
Every Code - push frontier AI to its limits. A fork of the Codex CLI with validation, automation, browser integration, multi-agents, theming, and much more. Orchestrate agents from OpenAI, Claude, G…
FlashKDA: high-performance Kimi Delta Attention kernels
how few training tokens can you use to reach a target validation loss?
Accelerating MoE with IO and Tile-aware Optimizations
A dedicated effort to make an optimized, bleeding-edge vLLM image using Docker to support DGX comprehensively
FlashInfer: Kernel Library for LLM Serving
Experiment on replacing the Scaled Dot-Product Attention in Transformers with a distance-based metric: the Radial Basis Function (RBF) kernel.
Trains small LMs. Designed for training on SimpleStories
Dataset Generation Code for SimpleStories
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.