Stars
Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"
[arXiv 2025] TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
A list of Offline to Online RL papers (continually updated)
[ACL 2025] Adaptive Retrieval without Self-Knowledge? Bringing Uncertainty Back Home
Scalable RL solution for advanced reasoning of language models
A collection of high-quality research papers and open-source projects about LLM agents
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Shanghai Jiao Tong University LaTeX Thesis Template
Agentic-RAG explores advanced Retrieval-Augmented Generation systems enhanced with AI LLM agents.
A repo listing papers related to LLM-based agents
💡 Awesome RAG: A resource of Retrieval-Augmented Generation (RAG) for LLMs, focusing on the development of technology.
📐 Jekyll theme for building a personal site, blog, project documentation, or portfolio.
Awesome LLM Self-Consistency: a curated list of self-consistency resources for Large Language Models
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline RL
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
Train transformer language models with reinforcement learning.