Stars
The AI developer platform. Use Weights & Biases to train and fine-tune models and manage them from experimentation to production.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 17+ clouds, or on-prem).
Accessible large language models via k-bit quantization for PyTorch.
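The core idea behind k-bit quantization libraries like this one can be sketched in a few lines. This is an illustrative absmax int8 round-trip in plain numpy, not the library's actual kernels; all function names here are my own.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Absmax 8-bit quantization: scale to [-127, 127], round, keep the scale."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Reconstruction error is bounded by half a quantization step.
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

The real library adds outlier handling, block-wise scales, and fused GPU kernels on top of this basic scheme.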
Utilities intended for use with Llama models.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
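The streaming trick in that paper is a KV-cache eviction policy: keep the first few "attention sink" tokens plus a sliding window of recent tokens. A minimal sketch of the kept indices, with hypothetical defaults (the paper's settings differ):

```python
def streaming_keep_indices(seq_len: int, n_sink: int = 4, window: int = 8):
    """Indices of KV-cache entries to keep: the first n_sink 'attention sink'
    tokens plus a sliding window of the most recent tokens."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

# With 20 cached tokens: keep sinks 0-3 and the last 8 positions.
assert streaming_keep_indices(20) == [0, 1, 2, 3] + list(range(12, 20))
```

The cache thus stays at a fixed size regardless of stream length, which is what makes generation over millions of tokens feasible.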
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Efficient Triton Kernels for LLM Training
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
A PyTorch native platform for training generative AI models
🚀 Efficient implementations of state-of-the-art linear attention models
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
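AWQ's key observation is that per-channel scaling chosen from activation statistics is mathematically transparent but protects salient weight channels from quantization error. A toy sketch of the equivalence (the real method searches the scaling exponent to minimize error; the helper name is mine):

```python
import numpy as np

def awq_style_scales(acts: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Per-input-channel scales from activation magnitudes (illustrative;
    AWQ proper searches alpha to minimize quantization error)."""
    return np.abs(acts).mean(axis=0) ** alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
W = rng.standard_normal((8, 4))
s = awq_style_scales(X)
# Scaling is exact in full precision: (X / s) @ (diag(s) W) == X @ W.
# Quantizing s[:, None] * W instead of W preserves the salient channels better.
assert np.allclose((X / s) @ (s[:, None] * W), X @ W)
```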
Open source process design kit for usage with SkyWater Technology Foundry's 130nm node.
Sparsity-aware deep learning inference runtime for CPUs
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
PyTorch native quantization and sparsity for training and inference
Minimalistic large language model 3D-parallelism training
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
dstack is an open-source control plane for running development, training, and inference jobs on GPUs across hyperscalers, neoclouds, or on-prem.
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
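The 1-bit idea can be sketched simply: binarize weights to their sign and rescale by the mean absolute value, keeping activations in full precision. This is an illustrative forward pass in numpy, not the paper's exact training recipe (which quantizes through a straight-through estimator):

```python
import numpy as np

def binarize(w: np.ndarray):
    """1-bit weights in the spirit of BitNet: sign(W) times mean |W|."""
    return np.sign(w), np.abs(w).mean()

def bit_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    wb, alpha = binarize(w)
    return alpha * (x @ wb)   # full-precision activations, 1-bit weights

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((8, 4))
y = bit_linear(x, w)
assert y.shape == (2, 4)
```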
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Machine learning on FPGAs using HLS
Implementing DeepSeek R1's GRPO algorithm from scratch
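GRPO's central simplification is replacing a learned value baseline with a group-relative one: sample several completions per prompt and normalize each reward against the group's mean and standard deviation. A minimal sketch of that advantage computation (the clipping/KL parts of the objective are omitted):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantage: normalize each completion's reward against
    the mean/std of its group (no learned value function needed)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

group = np.array([1.0, 0.0, 0.5, 0.5])   # rewards for 4 completions of one prompt
adv = grpo_advantages(group)
assert abs(adv.mean()) < 1e-6            # advantages center at zero
assert adv[0] > 0 > adv[1]               # above-average sample gets positive advantage
```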
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
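SmoothQuant migrates activation outliers into the weights with a per-channel factor s_j = max|X_j|^α / max|W_j|^(1−α); the rescaling is exact in full precision but leaves both tensors easier to quantize. A toy demonstration (the synthetic outlier channel and helper name are mine):

```python
import numpy as np

def smoothquant_scales(X: np.ndarray, W: np.ndarray, alpha: float = 0.5):
    """Per-channel smoothing factor from the paper's formula:
    s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    return np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
X[:, 0] *= 50                             # one outlier activation channel
W = rng.standard_normal((8, 4))
s = smoothquant_scales(X, W)
X_s, W_s = X / s, s[:, None] * W
assert np.allclose(X_s @ W_s, X @ W)      # exact equivalence in full precision
assert np.abs(X_s).max() < np.abs(X).max()  # outlier magnitude migrated into W
```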
Modular hardware build system