- Beijing
Stars
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are commi…
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom
Code for paper OpenWebRL: Online Multi-Turn Reinforcement Learning for Visual Web Agents
InfoSFT is a modified supervised fine-tuning algorithm that generalizes better and forgets less.
CUA-Gym-Hub: mock web apps as reproducible RL training environments for computer-use agents
Scalable pipeline for synthesizing verifiable RLVR training data for computer-use agents
Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding agents.
OpenSeeker: A search agent with open-source data and models
[NeurIPS 2025 Spotlight] OpenCUA: Open Foundations for Computer-Use Agents
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
An Illusion of Progress? Assessing the Current State of Web Agents
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
This is the official code base of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Implementation of paper: Scaling the Scaling Logic
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
[ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series.
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
Official PyTorch implementation for "Large Language Diffusion Models"
Toolkit for linearizing PDFs for LLM datasets/training
Multimodal OCR: Parse Anything from Documents