Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Automate browser based workflows with AI
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
Your AI Operator for Web, Android, Automation & Testing.
verl: Volcano Engine Reinforcement Learning for LLMs
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
An open protocol enabling communication and interoperability between opaque agentic applications.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A Survey of Reinforcement Learning for Large Reasoning Models
⏰ Collaboratively track worldwide conference deadlines (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
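"Open-Source LLM User Guide": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment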
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A collection of AI Agents papers (Updated biweekly)
A curated list of awesome LLM agents frameworks.
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Agent S: an open agentic framework that uses computers like a human
Mobile-Agent: The Powerful GUI Agent Family
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
GUI Grounding for Professional High-Resolution Computer Use
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Building a comprehensive and handy list of papers for GUI agents
R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?