High-performance ML
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
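Treating differentiation as a program transformation, as JAX does, can be illustrated without JAX itself. The sketch below is a minimal, hypothetical forward-mode autodiff via dual numbers; the `Dual` class and `grad` helper are illustrative stand-ins, not JAX's implementation.

```python
# Illustrative sketch (NOT JAX's implementation): differentiation as a
# composable transformation of an ordinary Python function, using
# forward-mode dual numbers.

class Dual:
    """Carries a value and its derivative through arithmetic."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)
    __rmul__ = __mul__

def grad(f):
    """Transform f: R -> R into a function computing df/dx."""
    def df(x):
        return f(Dual(x, 1.0)).tan
    return df

f = lambda x: 3 * x * x + 2 * x   # f'(x) = 6x + 2
print(grad(f)(4.0))               # 26.0
```

Because `grad` returns an ordinary function, transformations like this compose, which is the property the JAX tagline refers to.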
Flax is a neural network library for JAX that is designed for flexibility.
Fast and memory-efficient exact attention
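The core trick behind memory-efficient exact attention can be sketched in NumPy: process keys/values in blocks while maintaining a running row max and softmax denominator, so the full attention matrix is never materialized. This is a simplified single-head sketch of the online-softmax idea, not the library's fused GPU kernel.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference: materializes the full attention matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    """Sketch of the memory-saving idea: consume K/V in blocks, keeping
    only a running max and running softmax denominator per query row."""
    d = Q.shape[-1]
    out = np.zeros_like(Q)
    m = np.full(Q.shape[0], -np.inf)   # running row max
    l = np.zeros(Q.shape[0])           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kb, Vb = K[j:j + block], V[j:j + block]
        S = Q @ Kb.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)      # rescale previous partial results
        P = np.exp(S - m_new[:, None])
        out = out * scale[:, None] + P @ Vb
        l = l * scale + P.sum(axis=-1)
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The tiled version is exact (not an approximation): the rescaling by `exp(m - m_new)` keeps the partial numerator and denominator consistent as the running max is updated.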
Accessible large language models via k-bit quantization for PyTorch.
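To make the "k-bit quantization" idea concrete, here is a minimal NumPy sketch of symmetric absmax quantization to signed 8-bit integers. It illustrates the general technique only; the function names are hypothetical and this is not bitsandbytes' blockwise CUDA implementation.

```python
import numpy as np

def quantize_absmax(x, bits=8):
    """Sketch of symmetric absmax quantization: scale floats so the
    largest magnitude maps to the edge of the k-bit signed range."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.99], dtype=np.float32)
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)
# reconstruction error is bounded by half a quantization step
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Storing `q` (1 byte per weight) plus one `scale` per tensor is what shrinks memory roughly 4x versus float32; real implementations quantize in small blocks to tighten the error bound.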
Code repository for the paper "Matryoshka Representation Learning"
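The Matryoshka idea is that an encoder trained with MRL produces embeddings whose prefixes (e.g. the first 64 of 256 dimensions) are themselves usable embeddings. The sketch below shows only the inference-side truncation with stand-in random embeddings, not the paper's trained model or loss.

```python
import numpy as np

def truncate(emb, dim):
    """Keep the first `dim` coordinates and re-normalize (assumed
    inference-time use of a Matryoshka-style embedding)."""
    e = emb[..., :dim]
    return e / np.linalg.norm(e, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 256))               # stand-in document embeddings
docs /= np.linalg.norm(docs, axis=-1, keepdims=True)
query = docs[42] + 0.01 * rng.normal(size=256)   # near-duplicate of doc 42

for dim in (256, 64, 16):                        # nested "matryoshka" sizes
    scores = truncate(docs, dim) @ truncate(query, dim)
    print(dim, scores.argmax())                  # low-dim search still finds doc 42
```

The practical payoff is adaptive retrieval: shortlist candidates with a cheap low-dimensional prefix, then re-rank the shortlist with the full embedding.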
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Efficient Triton Kernels for LLM Training
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs
Large Language Model Text Generation Inference
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
SGLang is a fast serving framework for large language models and vision language models.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
On-the-fly conversions between Jax and NumPy tensors