Stars
Docker configuration for running vLLM on dual DGX Sparks
Agents and an RL environment for optimizing GPU kernels on AMD ROCm using LLM agents. Benchmarks LLM serving workloads end-to-end, profiles bottleneck kernels, optimizes them via Claude Code or Code…
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
A high-throughput and memory-efficient inference and serving engine for LLMs
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, and gpt-oss locally.
Standards for all 50 states, organizations, schools, & districts. Sponsored by Common Curriculum
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Build compute kernels and load them from the Hub.
A high-performance acceleration library dedicated to large-scale model training on AMD GPUs
A Datacenter Scale Distributed Inference Serving Framework
Development repository for the Triton language and compiler
Applied AI experiments and examples for PyTorch
A PyTorch native platform for training generative AI models
Visualize ONNX models with model-explorer
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Automatically split your PyTorch models across multiple GPUs for training & inference
A tool for parsing, editing, optimizing, and profiling ONNX models.
A Multi-Paradigm React State Management Library
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
🐥 A code review bot powered by ChatGPT