Stars
Machine Learning Engineering Open Book
FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)
Tensors and Dynamic neural networks in Python with strong GPU acceleration
wentao.site / Hugo Template / A template repository for a Hugo-based blog
A PyTorch native platform for training generative AI models
A framework for efficient model inference with omni-modality models
verl: Volcano Engine Reinforcement Learning for LLMs
TPU inference for vLLM, with unified JAX and PyTorch support.
SkyRL: A Modular Full-stack RL Library for LLMs
Post-training with Tinker
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
A high-performance, lightweight router for large-scale vLLM deployments
Open-source implementation of AlphaEvolve
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
A Datacenter Scale Distributed Inference Serving Framework
ArcticInference: vLLM plugin for high-throughput, low-latency inference
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
[ACL 2025 Long Main] Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
NumPy aware dynamic Python compiler using LLVM
[NeurIPS 2025] A simple extension for vLLM that helps you speed up reasoning models without training.
A collection of GPT system prompts and various prompt injection/leaking knowledge.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.