Starred repositories
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas
Evaluate and improve models and agents using environments
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Python Framework to analyse Git repositories
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
Official code for "How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs"
Official JAX implementation of End-to-End Test-Time Training for Long Context
Programming language for literate programming law specification
Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
CLIPPER: Compression enables long-context synthetic data generation [COLM '25]
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Paper list for Efficient Reasoning.
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Awesome Reasoning LLM Tutorial/Survey/Guide
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Minimal reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality"
Data and tools for generating and inspecting OLMo pre-training data.
Library supporting NLP and CV research on scientific papers
Python tool for converting files and office documents to Markdown.