OS-Sentinel

Python 31 1 Updated Nov 4, 2025

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Jupyter Notebook 61 5 Updated Nov 2, 2025

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Jupyter Notebook 5 Updated Oct 28, 2025

Official Repo for "Why Settle for One? Text-to-ImageSet Generation and Evaluation"

Python 19 1 Updated Oct 1, 2025

ScaleCUA provides open-source computer use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).

Python 774 42 Updated Oct 3, 2025
62 Updated Sep 6, 2025

Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"

Python 27 1 Updated Jul 7, 2025

Code for Research Project TLDR

Python 24 Updated Jul 28, 2025

[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Python 350 40 Updated Oct 29, 2025

Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"

Python 117 10 Updated Aug 28, 2025

[EMNLP 2025 Main] Code, results, and files for the paper "Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?"

10 Updated May 22, 2025

[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework

Python 71 8 Updated Jun 1, 2025
21 Updated May 3, 2025

GUI Grounding for Professional High-Resolution Computer Use

Python 277 30 Updated Oct 27, 2025

[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Python 366 26 Updated Mar 7, 2025

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Python 240 28 Updated Aug 8, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

875 25 Updated Aug 26, 2025

Official Code for "Coser: Coordinating LLM-Based Persona Simulation of Established Roles"

Python 138 8 Updated Jun 28, 2025

[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling

Python 106 8 Updated May 18, 2025

An Arena-style Automated Evaluation Benchmark for Detailed Captioning

Python 56 3 Updated Jun 1, 2025

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,839 146 Updated Oct 4, 2025

Latest Advances on System-2 Reasoning

Python 1,264 73 Updated Jun 8, 2025

[ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Python 41 4 Updated Dec 19, 2024

The model, data and code for the visual GUI Agent SeeClick

HTML 433 24 Updated Jul 13, 2025

[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Jupyter Notebook 167 12 Updated Oct 8, 2025
17 Updated Nov 3, 2025

OS-ATLAS: A Foundation Action Model For Generalist GUI Agents

Python 397 20 Updated Apr 20, 2025

Building a comprehensive and handy list of papers for GUI agents

Python 544 30 Updated Oct 27, 2025