Stars
An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
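Linear attention variants like the ones above share one core trick: replace the softmax kernel exp(q·k) with a factorizable feature map φ, so attention can be reassociated from O(N²·d) to O(N·d²). A minimal non-causal NumPy sketch (the feature map φ and all names here are illustrative, not taken from the repo):

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention sketch: exp(q·k) is replaced by phi(q)·phi(k),
    then the sum is reassociated so keys/values are summarized once."""
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V             # (d, d_v): one pass over keys/values
    Z = Kf.sum(axis=0)        # (d,): normalizer accumulated the same way
    return (Qf @ KV) / (Qf @ Z)[:, None]
```

Because φ is non-negative, each output row is a convex combination of value rows, mirroring what softmax attention produces, but the sequence dimension is only ever traversed once.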
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
OpenClaw-RL: Train any agent simply by talking
AI agents that automatically run research on single-GPU nanochat training
A lightweight inference engine supporting speculative speculative decoding (SSD).
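Speculative decoding, in any variant, follows the same verify-and-accept loop: a cheap draft model proposes k tokens, the target model scores them, and each draft token t is accepted with probability min(1, p_target(t)/p_draft(t)), with a residual resample on rejection. A toy single-round sketch (all function names and the toy distributions are illustrative assumptions, not this engine's API):

```python
import random

def sample(dist):
    """Sample a token from a {token: prob} dict."""
    r, acc = random.random(), 0.0
    for tok, p in dist.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # guard against floating-point shortfall

def speculative_step(target_p, draft_p, ctx, k):
    """One speculative round: draft proposes k tokens, target verifies."""
    proposed, c = [], list(ctx)
    for _ in range(k):
        t = sample(draft_p(c))
        proposed.append(t)
        c.append(t)
    accepted, c = [], list(ctx)
    for t in proposed:
        q = draft_p(c)[t]
        p = target_p(c).get(t, 0.0)
        if random.random() < min(1.0, p / q):
            accepted.append(t)
            c.append(t)
        else:
            # Rejected: resample from the renormalized residual max(0, p - q).
            tp, dp = target_p(c), draft_p(c)
            resid = {u: max(0.0, tp.get(u, 0.0) - dp.get(u, 0.0)) for u in tp}
            Z = sum(resid.values())
            if Z > 0:
                accepted.append(sample({u: v / Z for u, v in resid.items()}))
            break
    else:
        # Every draft token accepted: take one bonus token from the target.
        accepted.append(sample(target_p(c)))
    return accepted
```

The payoff is that one target-model pass can emit up to k+1 tokens while provably preserving the target distribution.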
A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.
A simple, fast and robust program-aware agentic inference system.
FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels
Building the Virtuous Cycle for AI-driven LLM Systems
A rejection-sampling-based distribution alignment method for RL training with extreme actor-policy mismatch
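Classical rejection sampling is the mechanism underneath such alignment: samples drawn from a mismatched actor are accepted with probability p_target(x) / (M · p_actor(x)), where M bounds the density ratio, and the accepted samples are exactly distributed as the target policy. A generic sketch (the function names and toy policies are illustrative, not the repo's method):

```python
import random

def rejection_align(actor_sample, actor_pdf, target_pdf, M, n):
    """Draw n samples from the target policy using only actor samples.
    M must satisfy target_pdf(x) <= M * actor_pdf(x) for all x; accepted
    samples are then exactly target-distributed."""
    out = []
    while len(out) < n:
        x = actor_sample()
        if random.random() < target_pdf(x) / (M * actor_pdf(x)):
            out.append(x)
    return out
```

The catch, and why "extreme mismatch" is hard, is that the acceptance rate is 1/M: the more the actor and target policies diverge, the larger M must be and the more actor samples are discarded.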
FlashTile is a CUDA Tile IR compiler compatible with NVIDIA's tileiras, targeting NVIDIA GPUs from SM70 through SM121.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
DFlash: Block Diffusion for Flash Speculative Decoding
A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
Accelerating MoE with IO- and tile-aware optimizations
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.