Stars
[NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
🐝 When Agents Meet RL and Prompt Optimization for the First Time
[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
This is the official Python version of Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
[ICML 2025] Official repo for Stability-guided Adaptive Diffusion Acceleration. 🚀🌙 Accelerating off-the-shelf diffusion models with a unified stability criterion.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Solve Visual Understanding with Reinforced VLMs
This is the official Python version of Angles Don’t Lie: Unlocking Training-Efficient RL Through the Model’s Own Signals.
This is the official PyTorch implementation for the ICML 2025 paper CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision Language Models.
[NeurIPS 2025] MMaDA - Open-Source Multimodal Large Diffusion Language Models
Democratizing Reinforcement Learning for LLMs
An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.
Witness the aha moment of a VLM for less than $3.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Fully open reproduction of DeepSeek-R1
Advanced implementation of DeepSeek-R1 featuring Group Relative Policy Optimization (GRPO) for mathematical reasoning AI. Integrates safe distillation, modular reward systems, and efficient LoRA fi…
Official code implementation for the ICLR 2025 accepted paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives"
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation.
Repository for latent Bayesian Kernel Inference
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.