Stars
Our first fully AI-generated deep learning system
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Fast and memory-efficient exact attention
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
Perplexity open source garden for inference technology
Building the Virtuous Cycle for AI-driven LLM Systems
Ship correct and fast LLM kernels to PyTorch
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
An extremely fast Python package and project manager, written in Rust.
VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings
RFC document, tooling and other content related to the array API standard
AGENTS.md — a simple, open format for guiding coding agents
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
🎡 Build Python wheels for all the platforms with minimal configuration.
A next generation Python CMake adaptor and Python API for plugins
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Minimum example for deploying Apache TVM's Relax IR using C++ API
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
Distributed Compiler based on Triton for Parallel Systems
A Datacenter Scale Distributed Inference Serving Framework
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
verl: Volcano Engine Reinforcement Learning for LLMs
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation