38 stars written in Python

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,116 31,045 Updated Nov 5, 2025

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 2,291 321 Updated Oct 30, 2025

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,840 146 Updated Oct 4, 2025

Latest Advances on System-2 Reasoning

Python 1,264 73 Updated Jun 8, 2025

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

Python 1,209 190 Updated Oct 3, 2025

ScaleCUA provides open-source computer use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).

Python 774 42 Updated Oct 3, 2025

Building a comprehensive and handy list of papers for GUI agents

Python 544 30 Updated Oct 27, 2025

AndroidWorld is an environment and benchmark for autonomous agents

Python 493 105 Updated Oct 27, 2025

OS-ATLAS: A Foundation Action Model For Generalist GUI Agents

Python 397 20 Updated Apr 20, 2025

[ICML 2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Python 366 26 Updated Mar 7, 2025

[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Python 350 40 Updated Oct 29, 2025

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

Python 284 12 Updated Jul 18, 2025

GUI Grounding for Professional High-Resolution Computer Use

Python 277 30 Updated Oct 27, 2025

Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)

Python 253 19 Updated Jul 16, 2024

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Python 240 28 Updated Aug 8, 2025

[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents

Python 219 24 Updated Jun 16, 2025

An RLHF Infrastructure for Vision-Language Models

Python 185 8 Updated Nov 15, 2024

Official Code for "Coser: Coordinating LLM-Based Persona Simulation of Established Roles"

Python 138 8 Updated Jun 28, 2025

GUICourse: From General Vision Language Models to Versatile GUI Agents

Python 133 7 Updated Jul 17, 2024

Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"

Python 117 10 Updated Aug 28, 2025

[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling

Python 106 8 Updated May 18, 2025

A Self-Training Framework for Vision-Language Reasoning

Python 85 1 Updated Jan 23, 2025
Python 84 3 Updated Jun 7, 2024
Python 71 5 Updated Dec 6, 2024

[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework

Python 71 8 Updated Jun 1, 2025

[ACL 2024] The project of Symbol-LLM

Python 59 4 Updated Jul 10, 2024

An Arena-style Automated Evaluation Benchmark for Detailed Captioning

Python 56 3 Updated Jun 1, 2025

Retrieved Sequence Augmentation for Protein Representation Learning

Python 53 3 Updated Nov 1, 2023

[ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Python 41 4 Updated Dec 19, 2024

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

Python 34 5 Updated Jul 12, 2024