Stars
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing,…
TokenSpeed is a speed-of-light LLM inference engine.
A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch
🚀 Sliding Window Attention Training for Efficient Large Language Models
Code snippets and reproductions from JustAByte
Accelerating MoE with IO and Tile-aware Optimizations
Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”
torchcomms: a modern PyTorch communications API
CyPari is a Python 3 extension module for Windows, macOS and Linux. The user interface, and most of the underlying code, is the same for CyPari as for Sage's cypari2 module, but CyPari is completely…
Data and tools for generating and inspecting OLMo pre-training data.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
An open-source, efficient deep learning framework/compiler, written in Python.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Entropy Based Sampling and Parallel CoT Decoding
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
💫 Beautiful spinners for terminal, IPython and Jupyter
A Data Streaming Library for Efficient Neural Network Training
nsync is a C library that exports various synchronization primitives, such as mutexes
A PyTorch native platform for training generative AI models
Tracking and visualisation tool for numerics debugging
FlagGems is an operator library for large language models implemented in the Triton Language.
Efficient Triton Kernels for LLM Training
Tile primitives for speedy kernels