CMU
- Pennsylvania, USA
Stars
Open-source, community-driven agent harness
A.S.E (AICGSecEval) is a repository-level AI-generated code security evaluation benchmark developed by Tencent Wukong Code Security Team.
TeleMem is a high-performance drop-in replacement for Mem0, featuring semantic deduplication, long-term dialogue memory, and multimodal video reasoning.
Inference and training library for high-quality TTS models.
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
A curated collection of prompts for using Claude more effectively.
(ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.01935
Understanding R1-Zero-Like Training: A Critical Perspective
Official Repo for Open-Reasoner-Zero
This is the official implementation of Multi-Agent PPO (MAPPO).
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
[ICASSP 2026] Agent4Debate is a dynamic multi-agent framework that leverages LLMs to achieve human-level performance in competitive debate by dynamically coordinating specialized agents to mitigate…
Scalable toolkit for efficient model reinforcement
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Recipes to scale inference-time compute of open models
A Python-based processor, built on "long-form-factuality", for easily fact-checking anything.
A bibliography and survey of the papers surrounding o1
Open source audio annotation tool for humans
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
A framework for few-shot evaluation of language models.
Awesome-LLM-Prompt-Optimization: a curated list of advanced prompt optimization and tuning methods in Large Language Models
A collection of recent papers on building autonomous agents. Two topics are included: RL-based and LLM-based agents.
Reference implementation for DPO (Direct Preference Optimization)
A library for advanced large language model reasoning