Stars
Official implementation of ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design.
Claw-Eval is an evaluation harness for LLMs as agents. All tasks are verified by humans.
A benchmark for LLMs on complex tasks in the terminal
A diffusion-based framework for document OCR that replaces autoregressive decoding with block-level parallel diffusion decoding.
🦞 Just talk to your agent — it learns and EVOLVES 🧬.
GRPO training code that scales to 32×H100s for long-horizon terminal/coding tasks. The base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
Reinforcement Learning via Self-Distillation (SDPO)
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
OpenClaw-RL: Train any agent simply by talking
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
slime is an LLM post-training framework for RL Scaling.
DART-GUI: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Pioneering Automated GUI Interaction with Native Agents
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Wan: Open and Advanced Large-Scale Video Generative Models
A beautiful, simple, clean, and responsive Jekyll theme for academics
[ICML 2026 Spotlight] Latent Collaboration in Multi-Agent Systems
[ICLR 2026 Oral] Generative Universal Verifier as Multimodal Meta-Reasoner
Cambrian-S: Towards Spatial Supersensing in Video
Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework