Tsinghua University - Beijing
http://yujia-qin.github.io/
Stars
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
A series of technical reports on Slow Thinking with LLMs
verl: Volcano Engine Reinforcement Learning for LLMs
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Pioneering Automated GUI Interaction with Native Agents
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
🚀 Efficient implementations for emerging model architectures
Utilities intended for use with Llama models.
Agentic components of the Llama Stack APIs
🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.
A new tool learning benchmark aiming at a well-balanced trade-off between stability and realism, based on ToolBench.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K question-answer pairs collected by human annotators for ~35K…
Repo for the paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
A keyboard shortcut browser extension for keyboard-based navigation and tab operations with an advanced omnibar
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".