Stars
Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow.
A Survey of Reinforcement Learning for Large Reasoning Models
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
A high-throughput and memory-efficient inference and serving engine for LLMs
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
verl: Volcano Engine Reinforcement Learning for LLMs
[ICML 2025] Improving Planning of Agents for Long-Horizon Tasks
[TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
Official repo for the paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
The model, data and code for the visual GUI Agent SeeClick
Mobile-Agent: The Powerful GUI Agent Family
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
An awesome repository that maps the current landscape of GUI/OS Agent research
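Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm and application engineers
An introductory LLM tutorial for developers: a Chinese edition of Andrew Ng's large-model course series
LLM knowledge explained so that anyone can understand it; essential reading before spring/fall recruitment LLM interviews, so you can talk through the topics with interviewers confidently
This project shares the technical principles of large models along with hands-on experience (LLM engineering and deploying LLM applications in practice)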
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
GUI Grounding for Professional High-Resolution Computer Use