- Tokyo, Japan
- https://ktrk115.github.io/
- @ktrk115
Stars
[TMLR] LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
AndroidWorld is an environment and benchmark for autonomous agents
Some powerful JSX scripts for extending Adobe Illustrator
Enable AI to control your desktop, mobile and HMI devices
Mobile-Agent: The Powerful GUI Agent Family
AI Agent for testing Android, iOS, and Web apps. Get Started in 5 Minutes. Arbigent's intuitive UI and powerful code interface make it accessible to everyone, while its scenario breakdown feature e…
Run Surfer-H agents powered by Holo1 using the Surfer-H-CLI. Includes example tasks, scripts, and configurations.
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
Unified MCP server for Adobe Creative Suite applications - Control Photoshop, Premiere Pro, Illustrator, and InDesign via AI
MCP server to run scripts on Adobe Illustrator
deck is a tool for creating decks using Markdown and Google Slides.
Manage multiple AI terminal agents like Claude Code, Aider, Codex, OpenCode, and Amp.
Session replay, cobrowsing and product analytics you can self-host. Ideal for reproducing issues and iterating on your product.
🚀 The fast, Pythonic way to build MCP servers and clients
NaturalCC: An Open-Source Toolkit for Code Intelligence
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
Collection of Aesthetics Assessment Papers for Graphic Designs.
Rembg is a tool to remove image backgrounds
This repo contains the code for 1D tokenizer and generator
[CVPR 2023 highlight] Towards Flexible Multi-modal Document Models
Matplotlib styles for scientific plotting
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…
Owl Eyes: Spotting UI Display Issues via Visual Understanding
🎨 Type-safe and powerful Python library to generate SVG files