Remote desktop control for AI agents using vision-based primitives.
An MCP server that lets AI agents control your desktop through 12 simple actions. Perfect for AI automation, testing, and remote support.
- 12 semantic actions - click_at, hover_at, drag_and_drop, scroll_at, type_text_at
- Vision-first design - the AI sees the screen and decides on actions
- Cross-platform - Windows, macOS, Linux
- MCP native - works with Claude Desktop and any MCP client
- Ultra-fast screenshots - 16-47 ms using the mss library (see the sketch after this list)
- Safety mechanisms - auto-releases stuck keys, emergency cleanup
- Zero-config - install, run, connect
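As a rough illustration of that screenshot path: a minimal sketch with mss, producing the base64 PNG format the `screenshot()` tool returns. This is illustrative, not the project's exact code.

```python
import base64

import mss
import mss.tools

# Grab the primary monitor and encode it as a base64 PNG.
# mss keeps the grab itself in the tens of milliseconds.
with mss.mss() as sct:
    monitor = sct.monitors[1]  # index 0 is the combined virtual screen
    shot = sct.grab(monitor)   # raw BGRA pixel data
    png_bytes = mss.tools.to_png(shot.rgb, shot.size)
    png_b64 = base64.b64encode(png_bytes).decode("ascii")
```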
RemoteUse controls your desktop directly. Keys pressed will execute on your system. The code includes:
- Automatic key release - all held modifiers are released when the controller is destroyed
- Emergency cleanup - run `python emergency_release.py` if keys get stuck
- Try-finally blocks - key combinations always release modifiers, even on error (sketched below)
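To show the try-finally pattern concretely, here is a minimal sketch assuming pynput (which the controller is built on); `safe_combination` is an illustrative name, not the project's API.

```python
from pynput.keyboard import Controller, Key

keyboard = Controller()

def safe_combination(modifier: Key, key: str) -> None:
    """Press modifier+key, guaranteeing the modifier is released."""
    keyboard.press(modifier)
    try:
        keyboard.press(key)
        keyboard.release(key)
    finally:
        keyboard.release(modifier)  # runs even if the press above raised

safe_combination(Key.ctrl, "c")  # ctrl+c without risking a stuck ctrl
```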
If keys get stuck, run the emergency release script immediately:

```
python emergency_release.py
```

To install dependencies and start the server:

```
pip install -r requirements.txt
python -m remoteuse.server.mcp_server
```
Add to your config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
  "mcpServers": {
    "remoteuse": {
      "command": "python",
      "args": ["-m", "remoteuse.server.mcp_server"]
    }
  }
}
```
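Restart Claude Desktop after saving the config; MCP servers are loaded at startup, so the remoteuse tools appear once it relaunches.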
Me: "Screenshot my desktop"
Claude: *calls screenshot() tool*
Claude: "I can see Chrome and VS Code open."
Me: "Open Notepad and type 'Hello AI'"
Claude: *calls click_at(), type_text(), etc.*
Claude: "Done!"
- `screenshot(monitor)` - capture the screen as a base64 PNG
- `click_at(x, y, button, count)` - click (single/double/triple, left/right/middle)
- `hover_at(x, y, duration_ms)` - hover to reveal menus
- `drag_and_drop(from_x, from_y, to_x, to_y, button)` - drag operation
- `scroll_at(x, y, direction, amount)` - scroll up/down/left/right
- `mouse_move(x, y)` - move the cursor
- `type_text(text, delay_ms)` - type a string
- `type_text_at(x, y, text, clear, submit)` - best for forms!
- `key_press(key)` - single key ("enter", "escape", "f5", "a")
- `key_combination(keys)` - combos ("ctrl+c", "cmd+space")
- `key_hold(key)` / `key_release(key)` - hold and release modifiers
- `wait(duration_ms)` - pause execution
- Customer Support - AI troubleshoots user issues
- Data Entry - Automate form filling
- QA Testing - Test desktop apps
- Personal Automation - "Organize my desktop files"
- Remote Work - Control office computer from home
```
AI Agent (Claude/GPT/Qwen)
        ↓ MCP Protocol
RemoteUse MCP Server
        ↓
Desktop Controller (mss + pynput)
```
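To make the middle layer concrete: a minimal sketch of how a primitive could be exposed as an MCP tool, assuming the official `mcp` Python SDK (FastMCP). The real server lives in `remoteuse.server.mcp_server` and may be structured differently.

```python
from mcp.server.fastmcp import FastMCP

from remoteuse.core.actions import DesktopController

mcp = FastMCP("remoteuse")
controller = DesktopController()

@mcp.tool()
def click_at(x: int, y: int, button: str = "left", count: int = 1) -> str:
    """Click at screen coordinates (x, y)."""
    controller.click_at(x, y, button=button, count=count)
    return f"Clicked {button} x{count} at ({x}, {y})"

if __name__ == "__main__":
    mcp.run()  # stdio transport, which Claude Desktop expects
```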
Vision models are getting seriously good (Gemini 2.5, Qwen3-VL, Claude 3.5). They can:
- See screens accurately
- Read on-screen text (OCR built in)
- Understand UI layouts
- Navigate complex apps

But they need primitives - basic mouse and keyboard actions to act on what they see.
RemoteUse provides those primitives in the cleanest possible API.
```python
from remoteuse.core.actions import DesktopController

controller = DesktopController()

# Take stock of the screen, open Notepad, and type into it
controller.screenshot()
controller.click_at(20, 1060)
controller.wait(500)
controller.type_text("notepad")
controller.key_press("enter")
controller.wait(2000)
controller.type_text("Hello from AI!")
```

Fill a form field by field:

```python
controller.type_text_at(400, 300, "john@example.com", submit=False)
controller.type_text_at(400, 400, "John Smith", submit=False)
controller.type_text_at(400, 500, "My message", submit=True)
```

Multi-select with a held modifier:

```python
controller.key_hold("ctrl")
controller.click_at(100, 200)
controller.click_at(100, 300)
controller.click_at(100, 400)
controller.key_release("ctrl")
```

Drag and drop:

```python
controller.drag_and_drop(100, 200, 800, 600)
```

Roadmap:

- Core 12 primitives
- MCP server implementation
- Cross-platform support (Windows/Mac/Linux)
- Safety mechanisms (auto-release keys, cleanup)
- Timing configurations (fast/default/slow)
- Solve CSVToForm workflow (see the sketch below)
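For the CSVToForm item, a hypothetical sketch of the workflow using the primitives above; `sample.csv`, the column names, and the field coordinates are made up for illustration.

```python
import csv

from remoteuse.core.actions import DesktopController

controller = DesktopController()

# Type each CSV row into a form, submitting on the last field.
with open("sample.csv", newline="") as f:
    for row in csv.DictReader(f):
        controller.type_text_at(400, 300, row["email"], submit=False)
        controller.type_text_at(400, 400, row["name"], submit=False)
        controller.type_text_at(400, 500, row["message"], submit=True)
        controller.wait(1000)  # give the app time to process the submission
```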
RemoteUse controls your desktop directly. The code includes:
- Auto-release of held modifiers on cleanup
- Try-finally blocks in key combinations
- An emergency release script (sketched below):

```
python emergency_release.py
```
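A minimal sketch of what such a script can do, assuming pynput; releasing a key that is not pressed is harmless, so it simply releases every common modifier.

```python
from pynput.keyboard import Controller, Key

keyboard = Controller()

# Release every common modifier in case one was left held down.
for key in (Key.ctrl, Key.alt, Key.shift, Key.cmd):
    keyboard.release(key)
print("All modifiers released.")
```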
Only use with trusted AI agents!
MIT