PaulClawX/browser-agent

Browser Agent

CDP-native browser automation runtime for agents, with editable skills and reliable Gemini image workflows.

Real Chrome Control • Skill-Driven Automation • Upload-Verified Image Generation

Features • Quick Start • Workflows • Safety Model • Agent Guide

Quick Navigation

Tip

I'm a human -> Read this README for install, setup, and safe workflows.

I'm an agent -> Read SKILL.md for operation rules and execution patterns. (Recommended)

browser-agent is a minimal runtime that lets agents control your real Chrome session directly over CDP, while keeping task logic editable in-repo.

  • For operators: one command surface for browser actions, diagnostics, and updates.
  • For agents: stable helper APIs (new_tab, js, click_at_xy, upload_file, raw cdp).
  • For reliability: interaction skills and domain skills to encode repeatable mechanics.
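To make the helper names above more concrete, here is a minimal sketch of the kind of CDP commands such helpers could send. The CDP method names (`Target.createTarget`, `Runtime.evaluate`, `Input.dispatchMouseEvent`, `DOM.setFileInputFiles`) come from the Chrome DevTools Protocol. The Python function names and signatures are hypothetical and are not taken from browser-agent's source; the project's real helpers may work differently.

```python
import itertools
import json

# Each CDP command carries a unique integer id so responses can be matched.
_ids = itertools.count(1)


def cdp_message(method, params=None):
    """Build one CDP command as a JSON-serializable dict."""
    return {"id": next(_ids), "method": method, "params": params or {}}


def new_tab_message(url="about:blank"):
    # Hypothetical counterpart of new_tab: open a new target (tab).
    return cdp_message("Target.createTarget", {"url": url})


def js_message(expression):
    # Hypothetical counterpart of js: evaluate an expression in the page.
    return cdp_message(
        "Runtime.evaluate",
        {"expression": expression, "returnByValue": True},
    )


def click_at_xy_messages(x, y):
    # Hypothetical counterpart of click_at_xy: a click is a press
    # followed by a release at the same viewport coordinates.
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_message("Input.dispatchMouseEvent", {"type": "mousePressed", **common}),
        cdp_message("Input.dispatchMouseEvent", {"type": "mouseReleased", **common}),
    ]


def upload_file_message(node_id, paths):
    # Hypothetical counterpart of upload_file: attach local files to an
    # <input type="file"> element identified by its DOM node id.
    return cdp_message(
        "DOM.setFileInputFiles",
        {"files": list(paths), "nodeId": node_id},
    )


if __name__ == "__main__":
    for msg in [new_tab_message("https://example.com"), *click_at_xy_messages(100, 200)]:
        print(json.dumps(msg))
```

Building messages as plain dicts keeps the sketch independent of any WebSocket library; a real runtime would serialize each dict and send it over the browser's debugging socket.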

Quick Start

For Agent (Recommended)

Tell your coding agent:

Install Browser Agent from https://github.com/PaulClawX/browser-agent and set it up to control my Chrome

For Human

git clone https://github.com/PaulClawX/browser-agent && cd browser-agent && uv tool install -e . && browser-agent --doctor

Browser Connection

Attach to your normal Chrome profile

  1. Open chrome://inspect/#remote-debugging
  2. Enable Allow remote debugging for this browser instance
  3. Accept the Chrome allow popup when prompted

  4. Re-run:
browser-agent -c 'print(page_info())'

See install.md for full setup.
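As background for the connection steps, Chrome typically records its active debugging endpoint in a `DevToolsActivePort` file inside the user data directory: the first line is the port, the second is the browser target path. The sketch below only parses that two-line format into a WebSocket URL. Whether browser-agent reads this file, and where the file lives on your machine, are assumptions not confirmed by this README; `parse_devtools_active_port` is a hypothetical helper name.

```python
def parse_devtools_active_port(text, host="127.0.0.1"):
    """Turn DevToolsActivePort file contents into a browser WebSocket URL.

    Expected format (assumed): line 1 is the port, line 2 is a path such
    as /devtools/browser/<id>.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a port line and a browser target path line")
    port = int(lines[0])  # raises ValueError if the port is not numeric
    path = lines[1]
    if not path.startswith("/"):
        path = "/" + path
    return f"ws://{host}:{port}{path}"


if __name__ == "__main__":
    sample = "9222\n/devtools/browser/abc-123\n"
    print(parse_devtools_active_port(sample))
```

If parsing fails or the file is missing, remote debugging is probably not enabled for the running browser instance, and the setup steps above are the place to start.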

Workflows

| Tier | Workflow | Expected Behavior |
| --- | --- | --- |
| Stable | General browser automation | Deterministic tab + DOM + input operations through CDP helpers |
| Stable | Upload-driven tasks | Upload confirmation before submit; fail-fast if upload isn't verifiable |
| Stable | Gemini image generation/editing | Prompt + reference flow with strict upload-first gating and export |
| Stable | Diagnostics and lifecycle | --doctor, daemon auto-start, update checks |
| Best-effort | Complex anti-bot sites | Fallback to coordinate actions, retries, and skill-specific patterns |
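The upload-first gating described for upload-driven tasks can be sketched as a small control-flow wrapper: perform the upload, poll for confirmation, and submit only once the upload is verified. This is an illustrative sketch, not the project's implementation; `submit_after_verified_upload`, `UploadNotVerified`, and the three callables are hypothetical names standing in for whatever the real skills use.

```python
import time


class UploadNotVerified(RuntimeError):
    """Raised when an upload cannot be confirmed, so submit is never called."""


def submit_after_verified_upload(upload, is_uploaded, submit,
                                 attempts=3, delay_s=0.5, sleep=time.sleep):
    """Run upload(), then call submit() only after is_uploaded() returns True.

    upload, is_uploaded, and submit are caller-supplied callables
    (hypothetical here). The check is retried up to `attempts` times,
    then the function fails fast instead of submitting blindly.
    """
    upload()
    for attempt in range(attempts):
        if is_uploaded():
            return submit()
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before checking again
    raise UploadNotVerified(f"upload not confirmed after {attempts} checks")


if __name__ == "__main__":
    state = {"checks": 0}

    def fake_is_uploaded():
        # Simulate an upload that becomes visible on the second check.
        state["checks"] += 1
        return state["checks"] >= 2

    result = submit_after_verified_upload(
        upload=lambda: None,
        is_uploaded=fake_is_uploaded,
        submit=lambda: "submitted",
        sleep=lambda _s: None,  # skip real waiting in the demo
    )
    print(result)
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays, and raising a dedicated exception makes the fail-fast path explicit to the caller.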

Project Layout

  • src/browser_harness/ - core runtime modules
  • SKILL.md - operator rules for day-to-day use
  • install.md - first-time install and connection
  • docs/interaction-skills/ - reusable browser mechanics playbooks
  • src/agent-workspace/agent_helpers.py - task-specific helper extensions
  • docs/domain-skills/ - site-specific playbooks

Core Contributors and Maintainers

  • Panwang Pan - paulpanwang@gmail.com
  • Jingjing Zhao - jingjingbudlet@gmail.com

📧 Contact

Feel free to open an issue if you have any questions or suggestions. If this project helps you, please give it a ⭐ Star!

Acknowledgements

This project builds on and is inspired by the following open-source work:

About

[browser-agent] Never send a human to do a machine's job.