Stars
Run Slurm on Kubernetes. A Slinky project.
Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing configurations.
Declarative RKE2 Kubernetes cluster bootstrap and lifecycle management with AMD GPU and ROCm support
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
ScalarLM - a unified training and inference stack
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
Machine Learning Engineering Open Book
The future home for CnC Tests and Framework Libraries
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
A PyTorch native platform for training generative AI models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A high-throughput and memory-efficient inference and serving engine for LLMs
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
A BUDE virtual-screening benchmark, in many programming models
A benchmark suite to evaluate CPU and GPU communication efficiency of MPI using different communication patterns