Stars
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents, ACL'24 Best Resource Paper.
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Harbor is a framework for running agent evaluations and creating and using RL environments.
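A collection of skills for Koreans: SRT, KTX, KakaoTalk, Hancom (Hangul and Computer), weather, fine dust, laws and regulations, stock information, the Veritable Records of the Joseon Dynasty, KBO, K League, LCK, patent search, Toss Securities, spell checking, used car prices, Coupang, Naver Blog, Daiso, Olive Young, parcel tracking, and more.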
SkillsBench evaluates how well skills work and how effective agents are at using them.
Real-time global intelligence dashboard. AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface
Vero: An Open RL Recipe for General Visual Reasoning
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
Browser automation CLI for AI agents
Mount Hugging Face Buckets and repos as local filesystems. No download, no copy, no waiting.
AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI
[NeurIPS 2025] The official implementation of "KL Penalty Control via Perturbation for Direct Preference Optimization"
Zero Bubble Pipeline Parallelism
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
Chrome and Firefox extension for chatting with webpages using local LLMs
Convert Word documents to beautiful Markdown. Via command line or in your browser.
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
An open-source RAG-based tool for chatting with your documents.
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse environments
bespokelabsai / verifiers
Forked from PrimeIntellect-ai/verifiers
Verifiers for LLM Reinforcement Learning
Automatic, unsupervised collection of web agent training data via exploration.