Stars
AI agents running research on single-GPU nanochat training automatically
[Arxiv] Official repo for "Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization"
This code implements the algorithm of FIPO, a value-free RL recipe for eliciting deeper reasoning from a clean base model.
Repo for vLLM Hook, a vLLM plug-in for programming internal states of models deployed on vLLM
🚀 Efficient implementations for emerging model architectures
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Muon is an optimizer for hidden layers in neural networks
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
A comprehensive benchmark framework for evaluating the physical safety of Large Language Models (LLMs).
[CVPR 2025] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app…
[ICLR 2026] Variation in Verification: Understanding Verification Dynamics in Large Language Models
Awesome GUI Agent Paper List
Mobile-Agent: The Powerful GUI Agent Family
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for complex reasoning tasks. Building on this resource, we propose Sou…
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Pioneering Automated GUI Interaction with Native Agents