View njucckevin's full-sized avatar

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,123 31,046 Updated Nov 5, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 15,982 1,259 Updated Oct 27, 2025

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 2,292 321 Updated Oct 30, 2025

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,840 146 Updated Oct 4, 2025

Latest Advances on System-2 Reasoning

Python 1,264 73 Updated Jun 8, 2025

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

Python 1,209 190 Updated Oct 3, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

875 25 Updated Aug 26, 2025

ScaleCUA: open-source computer use agents that can operate on cross-platform environments (Windows, macOS, Ubuntu, Android).

Python 774 43 Updated Oct 3, 2025

Building a comprehensive and handy list of papers for GUI agents

Python 544 30 Updated Oct 27, 2025

AndroidWorld is an environment and benchmark for autonomous agents

Python 493 105 Updated Oct 27, 2025

This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.

453 16 Updated Jun 4, 2025

The model, data and code for the visual GUI Agent SeeClick

HTML 433 24 Updated Jul 13, 2025

Paper list for Personal LLM Agents

417 21 Updated May 8, 2024

OS-ATLAS: A Foundation Action Model For Generalist GUI Agents

Python 397 20 Updated Apr 20, 2025

[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Python 366 26 Updated Mar 7, 2025

[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Python 350 40 Updated Oct 29, 2025

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

Python 284 12 Updated Jul 18, 2025

GUI Grounding for Professional High-Resolution Computer Use

Python 277 30 Updated Oct 27, 2025

Neural Code Intelligence Survey 2024; Reading lists and resources

275 15 Updated Jul 24, 2025

Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)

Python 253 19 Updated Jul 16, 2024

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Python 240 28 Updated Aug 8, 2025

[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents

Python 219 24 Updated Jun 16, 2025

An RLHF Infrastructure for Vision-Language Models

Python 185 8 Updated Nov 15, 2024

[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Jupyter Notebook 167 12 Updated Oct 8, 2025

Official Code for "Coser: Coordinating LLM-Based Persona Simulation of Established Roles"

Python 138 8 Updated Jun 28, 2025

GUICourse: From General Vision Language Models to Versatile GUI Agents

Python 133 7 Updated Jul 17, 2024

Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"

Python 117 10 Updated Aug 28, 2025

[ACL 2025] A Neural-Symbolic Self-Training Framework

C 116 4 Updated Jun 1, 2025

[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling

Python 106 8 Updated May 18, 2025