Stars
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
ByteCheckpoint: A Unified Checkpointing Library for LFMs
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Serverless LLM Serving for Everyone.
Optimized primitives for collective multi-GPU communication
An elegant PyTorch deep reinforcement learning library.
Simple, safe way to store and distribute tensors
verl: Volcano Engine Reinforcement Learning for LLMs
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
A flexible and efficient training framework for large-scale alignment tasks
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.
A rule-based tunnel for Android.
A V2Ray client for Android, support Xray core and v2fly core
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
A Data Streaming Library for Efficient Neural Network Training
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A high-throughput and memory-efficient inference and serving engine for LLMs
Zero Bubble Pipeline Parallelism
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A PyTorch native platform for training generative AI models
ByteDance PyTorch Distributed for hyperscale training of LLMs and RL workloads
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.
Training and serving large-scale neural networks with auto parallelization.