Starred repositories
"🐈 nanobot: The Ultra-Lightweight OpenClaw"
🦞+🔬: NanoResearch: The Autonomous AI Research Assistant
We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference…
OpenClaw-RL: Train any agent simply by talking
Can AI agents predict whether they will succeed at a task?
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
Train transformer language models with reinforcement learning.
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Reinforcement Learning via Self-Distillation (SDPO)
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.
Training Recipes for Agentic Reinforcement Learning in LLMs: A Survey
📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice
Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"
Curated systems, benchmarks, papers, and more on memory for LLMs/MLLMs: long-term context, retrieval, and reasoning.
MiroThinker is a deep research agent optimized for complex research and prediction tasks. Our latest models, MiroThinker-1.7 and MiroThinker-H1, achieve 74.0 and 88.2 on BrowseComp, respectively.
Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.
We introduce BabyVision, a benchmark revealing the infancy of AI vision.
Salesforce Enterprise Deep Research
AgentEvolver: Towards Efficient Self-Evolving Agent System
This is an AI implementation (not official) of the DreamGym framework from the paper "Scaling Agent Learning via Experience Synthesis" (arXiv:2511.03773).
[WWW 2026] 🛠️ DeepAgent: A General Reasoning Agent with Scalable Toolsets
Learning on the Job: An Experience-Driven, Self-Evolving Agent for Long-Horizon Tasks
A live-streamed development of RL tuning for LLM agents
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance