Stars
Official implementation for "Recursive Multi-Agent Systems"
Anthropic's original performance take-home, now open for you to try!
Building the Virtuous Cycle for AI-driven LLM Systems
🛰️ A CLI tool for tracking token usage from OpenCode, Claude Code, 🦞OpenClaw (Clawdbot/Moltbot), Pi, Codex, Gemini, Cursor, AmpCode, Factory Droid, Kimi, and more! • 🏅Global Leaderboard + 2D/3D Con…
Reinforcement learning environments for compiler and program optimization tasks
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
A browser-based desktop where an AI agent operates every app through natural language.
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
SWE-chat: Coding Agent Interactions From Real Users in the Wild
Entire CLI hooks into your Git workflow to capture AI agent sessions as you work. Sessions are indexed alongside commits, creating a searchable record of how code was written in your repo.
This is the official codebase for the paper: Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment
Self-evolving agent: grows a skill tree from a 3.3K-line seed, achieving full system control with 6x lower token consumption
AutoGaze automatically removes redundant patches in a video, reducing the number of tokens in ViT/MLLM by 4x-100x.
AgentOCR is a token-efficient framework that compresses multi-turn agent history by rendering it into images and adopting RL-driven self-compression
Stash — persistent memory layer for AI agents. Episodes, facts, and working context stored in Postgres. MCP server included. Self-hosted, single binary, no cloud required.
OpenClaw plugin — turn Claude Code CLI into a programmable, headless coding engine with plenty of tools, agent teams, and multi-model proxy
Any model. Every tool. Zero limits.
COS-PLAY: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Game Play
🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models
Hugging Face's take home challenge for post-training internships, now open for you to try!
This is the official implementation for AgentSPEX: An Agent SPecification and EXecution Language
Harbor is a framework for running agent evaluations and creating and using RL environments.
Human-like memory for AI agents — semantic, episodic & procedural. Experience-driven procedures that learn from failures. Free API, Python & JS SDKs, LangChain, CrewAI & OpenClaw integrations.