Stars
[Experimental] Miles-diffusion is a post-training framework for large-scale diffusion model training and production workloads, forked from and co-evolving with miles.
Browse, search, analyze and respond to your AI chat history. Local and private by design.
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
Utility scripts for PyTorch (e.g. making Perfetto show some disappearing kernels, a memory profiler that understands more low-level allocations such as NCCL, ...)
FlashInfer: Kernel Library for LLM Serving
Debug print operator for cudagraph debugging
Periodically (e.g. every 1 ms or less) dump which thread is holding the Python GIL
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Distributed Compiler based on Triton for Parallel Systems
ByteCheckpoint: A Unified Checkpointing Library for LFMs
SGLang is a high-performance serving framework for large language models and multimodal models.
DeepEP: an efficient expert-parallel communication library
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
Development repository for the Triton language and compiler
A curated list for Efficient Large Language Models
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Repository for benchmarking graph neural networks (JMLR 2023)
Benchmark datasets, data loaders, and evaluators for graph machine learning
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Running large language models on a single GPU for throughput-oriented scenarios.
SparseTIR: Sparse Tensor Compiler for Deep Learning