Stars
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
Official implementation for paper "How Far Are We from Genuinely Useful Deep Research Agents?"
Processed / Cleaned Data for Paper Copilot
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Python version of the Playwright testing and automation library.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
Tongyi Deep Research, the Leading Open-source Deep Research Agent
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Deep Research Agent CognitiveKernel-Pro from Tencent AI Lab. Paper: https://arxiv.org/pdf/2508.00414
A simple yet powerful agent framework that delivers with open-source models
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
NLP2CT / RepreGuard
Forked from Chen-X666/RepreGuard
[TACL 2025] RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
[EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Public quant internship repository, maintained by NUFT but available for everyone.
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
[Preprint 2025] Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
A Survey of Reinforcement Learning for Large Reasoning Models