Stars
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains papers, codes, datasets, evaluations, and analyses.
Code and example data for the paper: Rule Based Rewards for Language Model Safety
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Search-R1: An efficient, scalable RL training framework for LLMs that interleave reasoning with search-engine calls, built on veRL
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
A Flexible Framework for Experimenting with Heterogeneous LLM Inference/Fine-tuning Optimizations
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
A highly optimized LLM inference acceleration engine for Llama and its variants.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
High-speed Large Language Model Serving for Local Deployment
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Instructions on how to use the Realtime API on Microcontrollers and Embedded Platforms
Aidan Bench attempts to measure <big_model_smell> in LLMs.