CDP-native browser automation runtime for agents, with editable skills and reliable Gemini image workflows.
Real Chrome Control • Skill-Driven Automation • Upload-Verified Image Generation
Features • Quick Start • Workflows • Safety Model • Agent Guide
> [!TIP]
> I'm a human -> Read this README for install, setup, and safe workflows.
> I'm an agent -> Read SKILL.md for operation rules and execution patterns. (Recommended)
browser-agent is a minimal runtime that lets agents control your real Chrome session directly over CDP, while keeping task logic editable in-repo.
- For operators: one command surface for browser actions, diagnostics, and updates.
- For agents: stable helper APIs (`new_tab`, `js`, `click_at_xy`, `upload_file`, raw `cdp`).
- For reliability: interaction skills and domain skills to encode repeatable mechanics.
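To make "directly over CDP" concrete, here is a minimal sketch of the JSON commands the Chrome DevTools Protocol expects on its WebSocket. The CDP method names and parameters are the protocol's own; the functions `cdp_message` and `click_messages` are hypothetical names for illustration, and how browser-agent's helpers map onto these commands internally is an assumption, not something this README states.

```python
import itertools
import json

# Every CDP command carries a unique integer id so replies can be matched to requests.
_ids = itertools.count(1)


def cdp_message(method, params=None):
    """Serialize one CDP command as the JSON text sent over the DevTools WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def click_messages(x, y):
    """Build a left click: a mousePressed event followed by mouseReleased at the same point."""
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_message("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
        cdp_message("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
    ]


# Navigate the current page, then read the document title back as a plain value.
navigate = cdp_message("Page.navigate", {"url": "https://example.com"})
evaluate = cdp_message(
    "Runtime.evaluate",
    {"expression": "document.title", "returnByValue": True},
)
```

A helper such as `click_at_xy` plausibly wraps a pair of messages like the ones `click_messages` builds, which is why coordinate clicks still work when DOM selectors are unreliable. See SKILL.md for the actual helper signatures.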
Tell your coding agent:
Install Browser Agent from https://github.com/PaulClawX/browser-agent and set it up to control my Chrome
`git clone https://github.com/PaulClawX/browser-agent && cd browser-agent && uv tool install -e . && browser-agent --doctor`

- Open `chrome://inspect/#remote-debugging`
- Enable "Allow remote debugging for this browser instance"
- Accept the Chrome allow popup when prompted
- Re-run: `browser-agent -c 'print(page_info())'`

See install.md for full setup.
| Tier | Workflow | Expected Behavior |
|---|---|---|
| Stable | General browser automation | Deterministic tab + DOM + input operations through CDP helpers |
| Stable | Upload-driven tasks | Upload confirmation before submit; fail-fast if upload isn't verifiable |
| Stable | Gemini image generation/editing | Prompt + reference flow with strict upload-first gating and export |
| Stable | Diagnostics and lifecycle | --doctor, daemon auto-start, update checks |
| Best-effort | Complex anti-bot sites | Fallback to coordinate actions, retries, and skill-specific patterns |
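The upload-first gating and the coordinate fallback described in the table can be sketched as two small control-flow patterns. This is an illustration under stated assumptions, not the project's implementation: `submit_after_verified_upload`, `with_fallback`, and `UploadNotVerified` are hypothetical names, and the callables passed in stand for whatever browser-agent helpers or skill code perform the real upload, check, and click.

```python
import time


class UploadNotVerified(RuntimeError):
    """Raised when the page never confirms that the file was attached."""


def submit_after_verified_upload(upload, is_attached, submit, *, attempts=3, delay_s=0.5):
    """Upload, poll for confirmation, and submit only once the upload is verified."""
    upload()
    for _ in range(attempts):
        if is_attached():
            return submit()
        time.sleep(delay_s)
    # Fail fast: never submit a prompt whose reference file may be missing.
    raise UploadNotVerified(f"upload not confirmed after {attempts} checks")


def with_fallback(primary, fallback, *, retries=2):
    """Try the primary action a few times, then the fallback, re-raising the last error."""
    last_error = None
    for action in (primary, fallback):
        for _ in range(max(1, retries)):
            try:
                return action()
            except Exception as exc:  # broad on purpose: any failure triggers the next try
                last_error = exc
    raise last_error
```

In practice `primary` might be a DOM-based click and `fallback` a coordinate click, while `is_attached` might inspect the page for an attachment thumbnail; those concrete choices belong in the relevant interaction or domain skill.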
- `src/browser_harness/` - core runtime modules
- `SKILL.md` - operator rules for day-to-day use
- `install.md` - first-time install and connection
- `docs/interaction-skills/` - reusable browser mechanics playbooks
- `src/agent-workspace/agent_helpers.py` - task-specific helper extensions
- `docs/domain-skills/` - site-specific playbooks
- Panwang Pan (paulpanwang@gmail.com)
- Jingjing Zhao (jingjingbudlet@gmail.com)
Feel free to open an issue if you have any questions or suggestions. If this project helps you, please give it a ⭐ Star!
This project builds on and is inspired by the following open-source work:
- browser-use/browser-harness - the primary code and architecture source.
- OpenClaudex/openreview-agent - OpenReview dry-run workflow inspiration.