Stars
Agentic RL on Any Harness at Scale
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
Open-source benchmark for browser AI agents on daily tasks.
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Consistent Autoregressive Video Generation with Long Context
The API to search, scrape, and interact with the web at scale. 🔥
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks [ICLR 2026]
MCP server for OpenAI's Deep Research APIs, Gemini Deep Research Agent, Allen AI's DR-Tulu, and Hugging Face's Open Deep Research
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent (ACL 2026 Main)
The official code of "VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation" [EMNLP25]
The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25]
A version of verl to support diverse tool use [TMLR 2026]
An efficiency tool that provides various functions by turning the Caps Lock key into a modifier key.
GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well as OpenAI's earlier models on 20+ curated benchmarks under al…
https://liuzeming01.github.io/XDailyDialog/
Code and datasets for EMNLP 2022 paper: Beyond prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations
Inception V3 & Bilinear CNN TensorFlow code for the CUB-200-2011 Birds dataset.
TensorFlow CNN & Bilinear CNN code for the Oxford Flowers 17 dataset.
We want to create a repo to illustrate the usage of transformers in Chinese.