View pooruss's full-sized avatar

Jupyter Notebook 217 3 Updated Dec 19, 2025

All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.

Python 4,044 335 Updated Mar 28, 2026

A benchmark for LLMs on complicated tasks in the terminal

Python 1,871 498 Updated Jan 22, 2026

[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge

Python 105 7 Updated Feb 28, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,566 65 Updated Jun 14, 2025

Benchmark environment for evaluating vision-language models (VLMs) on popular video games!

Python 342 37 Updated May 30, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,794 171 Updated Apr 3, 2026

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,628 130 Updated Nov 21, 2025

The Abstraction and Reasoning Corpus

JavaScript 4,739 707 Updated Apr 4, 2025

AndroidWorld is an environment and benchmark for autonomous agents

Python 703 144 Updated Mar 25, 2026

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

Python 1,418 232 Updated Nov 26, 2025

VisualWebArena is a benchmark for multimodal agents.

Python 454 74 Updated Nov 9, 2024

[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents

Python 52 2 Updated Feb 27, 2025

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 29,232 2,861 Updated Mar 27, 2026

Pioneering Automated GUI Interaction with Native Agents

Python 10,027 731 Updated Jan 27, 2026

GUICourse: From General Vision Language Models to Versatile GUI Agents

Python 140 7 Updated Mar 1, 2026

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

Python 305 16 Updated Mar 11, 2026

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

Python 849 93 Updated Feb 11, 2026

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 2,112 125 Updated Jun 1, 2023

[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jupyter Notebook 151 16 Updated Aug 26, 2024

Final project of COMP 7409 Machine Learning in Trading and Finance – Group 7.

Python 5 2 Updated Nov 13, 2023

UniMem: Towards a Unified View of Long-Context Large Language Models (COLM 2024)

Python 9 1 Updated Aug 14, 2024

Repository of GUI Action Narrator

JavaScript 13 Updated Apr 8, 2025

(ICLR 2025) The Official Code Repository for GUI-World.

Python 69 3 Updated Dec 18, 2024

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 58,037 4,797 Updated Apr 3, 2026

Graduation Project HKUCS

Python 2 Updated Jul 17, 2024

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

Jupyter Notebook 61 3 Updated Aug 19, 2024

[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app…

Python 150 9 Updated Jan 3, 2026
Python 4,625 457 Updated Sep 14, 2025

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

TypeScript 72,151 8,903 Updated Mar 26, 2026