Stars
PyTorch code and models for VJEPA2 self-supervised learning from video.
PyTorch code and models for V-JEPA self-supervised learning from video.
Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
Harbor is a framework for running agent evaluations and creating and using RL environments.
SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale
Lossless Claw: LCM (Lossless Context Management) plugin for OpenClaw
PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
Self-hosted, open-source agent skill registry for enterprises. Publish & version skill packages, govern with RBAC and audit logs, deploy on-premise with Docker or Kubernetes.
A lightweight, unofficial implementation of Meta-Harness (arXiv:2603.28052). Official repo: https://github.com/stanford-iris-lab/meta-harness-tbench2-artifact
Meta-Harness: 76.4% on Terminal-Bench 2.0 (Claude Opus 4.6)
[ACL 2026] RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
The agent that grows with you
OpenClaw-RL: Train any agent simply by talking
The agent-native LLM router for OpenClaw. 41+ models, <1ms routing, USDC payments on Base & Solana via x402.
"RAG-Anything: All-in-One RAG Framework"
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
[CVPR 2026] Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
Interpretable Causal Diffusion Language Models
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A compilation of the best multi-agent papers
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Supercharge Your LLM Application Evaluations 🚀
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
[ACL2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts