Stars
Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
GUI Grounding for Professional High-Resolution Computer Use
Python code for several metrics: PSNR, SSIM, UCIQE and UIQM
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
[TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners"
An awesome repository that maps the current landscape of GUI/OS Agent research
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
A script for batch evaluation of the PSNR and SSIM metrics of reconstructed images. It is suitable for image compression, image restoration, super-resolution reconstruction, image denoising …
[SIGIR 2024] TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision
[AAAI 2023 Oral] Official code for "PiCor: Multi-Task Deep Reinforcement Learning with Policy Correction".
R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
[ICML 2025] Improving Planning of Agents for Long-Horizon Tasks