
Machine Learning Engineering Open Book

Python 17,622 1,118 Updated Mar 16, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)

Python 64 5 Updated Apr 5, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 98,830 27,413 Updated Apr 6, 2026

NVIDIA Inference Xfer Library (NIXL)

C++ 965 281 Updated Apr 6, 2026

wentao.site / Hugo Template / A template repository for Hugo based blog

55 3 Updated Mar 21, 2026

A PyTorch native platform for training generative AI models

Python 5,210 772 Updated Apr 6, 2026

A framework for efficient model inference with omni-modality models

Python 4,132 700 Updated Apr 6, 2026

Easy, Fast, and Scalable Multimodal AI

Python 121 8 Updated Apr 3, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python 20,460 3,577 Updated Apr 3, 2026

TPU inference for vLLM, with unified JAX and PyTorch support.

Python 284 147 Updated Apr 6, 2026

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,735 293 Updated Apr 4, 2026

Post-training with Tinker

Python 3,031 367 Updated Apr 6, 2026

[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning

Python 67 6 Updated Oct 31, 2025

A high-performance and light-weight router for vLLM large scale deployment

Rust 177 62 Updated Mar 31, 2026

Common recipes to run vLLM

Jupyter Notebook 571 196 Updated Apr 3, 2026

Open-source implementation of AlphaEvolve

Python 5,863 934 Updated Mar 18, 2026

Nano vLLM

Python 12,705 1,879 Updated Nov 3, 2025

Achieve state-of-the-art inference performance with modern accelerators on Kubernetes

Shell 2,916 388 Updated Apr 5, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,491 994 Updated Apr 6, 2026

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 420 56 Updated Mar 28, 2026

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 2,016 82 Updated Jun 5, 2025

Democratizing Reinforcement Learning for LLMs

Python 5,382 539 Updated Apr 6, 2026

[ACL 2025 Long Main] Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions

Python 40 8 Updated Apr 21, 2025

NumPy aware dynamic Python compiler using LLVM

Python 10,952 1,243 Updated Apr 3, 2026

[NeurIPS 2025] A simple extension to vLLM that speeds up reasoning models without training.

Python 227 31 Updated May 31, 2025

A collection of GPT system prompts and various prompt injection/leaking knowledge.

HTML 10,492 1,461 Updated Mar 19, 2026

FAIR Sequence Modeling Toolkit 2

Python 1,129 138 Updated Apr 6, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,042 652 Updated Apr 6, 2026