2 unstable releases

Uses new Rust 2024

0.4.2	Apr 10, 2026
0.3.0	Mar 29, 2026

#859 in Command line utilities

MIT license

1.5MB
10K SLoC

chrome-agent

chrome-agent — Browser automation for AI agents

Browser automation that speaks LLM.

English | 简体中文

Disclaimer: This is an independent, community-driven project. It is not affiliated with, endorsed by, or sponsored by Google or the Chrome team.

You're not the user. Your LLM is.

You don't need to read this README. Your agent does. Install it, run chrome-agent --help, and let the LLM figure it out. The CLI embeds its own usage guide, every error comes with a hint for the next action, and --json mode outputs structured data an agent can parse without you writing a single adapter. This page is here because GitHub expects one.

How is this different from agent-browser?

agent-browser (Vercel) is a feature-complete browser automation platform: dashboard, cloud providers, annotated screenshots, iOS support, AI chat, auth vault, 40K lines of Rust. It's excellent.

chrome-agent is the opposite bet. Instead of adding features, it removes tokens.

	chrome-agent	agent-browser
Page snapshot	~50 tokens (a11y noise stripped, 66% reduction)	~200 tokens (full a11y tree)
Element IDs	`backendNodeId` — stable across inspects	Sequential `@e1, @e2` — reassigned every snapshot
Action + observe	`click n12 --inspect` (1 call)	`click @e1` then `snapshot` (2 calls)
Stealth	7 native CDP patches (incl. `Runtime.enable` skip)	Delegated to cloud providers
Content extraction	`read` (articles), `extract` (auto-detect lists/tables)	None built-in
Binary	3 MB, zero runtime	3 MB + Next.js dashboard + cloud SDKs
Codebase	7K lines	40K lines

agent-browser gives you a platform with monitoring, cloud browsers, and visual debugging. chrome-agent gives your LLM the smallest possible representation of a webpage and gets out of the way. If your agent needs a dashboard, use agent-browser. If your agent needs to spend tokens on reasoning instead of page parsing, use this.

Philosophy

Every token your agent spends understanding a page is a token it doesn't spend reasoning about the task. chrome-agent is built around one idea: minimize the tokens between "what does this page look like?" and "what should I do next?"

This means:

Accessibility tree over DOM. Playwright returns ~2,000 tokens of raw HTML. chrome-agent returns ~50 tokens of a11y tree with stable element IDs. No CSS selectors to write, no DOM to parse.
One binary, zero runtime. 3 MB Rust binary. No Node.js, no npm, no Playwright runtime. npx chrome-agent just works.
Action + observation in one call. --inspect on any action command returns the page state after the action. One round-trip instead of two.
Errors are instructions. Every error includes a hint field telling the agent what to do next. {"ok":false, "error":"...", "hint":"run inspect"}.
Stealth by default intent. 7 CDP patches including the detection vector nobody talks about (Runtime.enable). Connect to real Chrome for the hardest protections.
Content extraction without selectors. read for articles, extract for repeating data, network for API payloads. The agent never writes CSS selectors.

This is not a general-purpose browser testing framework. It's a tool that makes an LLM effective at browsing the web.

chrome-agent goto news.ycombinator.com --inspect

# ~50 tokens instead of ~2,000:
uid=n1 RootWebArea "Hacker News"
  uid=n50 heading "Hacker News" level=1
  uid=n82 link "Show HN: A New Browser Tool"
  uid=n97 link "Rust 2025 Edition Announced"
  ...

# Click + see the new page in one call:
chrome-agent click n82 --inspect

UIDs are based on Chrome's backendNodeId. They don't change between inspects. Click n82 now or five minutes from now.

chrome-agent (3 MB Rust binary)
    | CDP over WebSocket
    v
Chrome (headless, no Node.js, no runtime)

Why this exists

If you've hit this...	chrome-agent does this instead
Playwright snapshots burn 2K tokens	a11y tree: ~50 tokens. 40x less context spent on page state.
CSS selectors break after every deploy	UIDs from Chrome's `backendNodeId`. Stable as long as the DOM node exists.
Click then inspect = 2 round-trips	`--inspect` on any command. One call, action + observation.
200MB of Node + npm + Playwright	3 MB binary. `npx chrome-agent` works out of the box.
Cloudflare blocks your headless Chrome	7 CDP patches. `Runtime.enable` never called (the detection vector nobody talks about).
Writing per-site scraping selectors	`read` for articles, `extract` for lists/tables/cards, `network` for API payloads. No selectors.
Errors are stack traces	`{"ok":false, "error":"...", "hint":"run inspect"}` -- parseable, actionable.
Each command launches a fresh browser	Sessions persist. Chrome stays alive between calls. ~10ms startup.
Agent can't access your logged-in accounts	`--copy-cookies` grabs cookies from your real Chrome. Works with X.com, Gmail, dashboards.
Infinite scroll shows 10 items	`inspect --scroll --limit 50` scrolls and collects. Tested on X.com: 50 tweets from a live timeline.
Two agents sharing one browser = chaos	`--browser agent1`, `--browser agent2`. Separate Chrome instances.

Install

# For AI agents -- installs a SKILL.md your agent reads automatically
npx skills add sderosiaux/chrome-agent

# Or just the binary
npm install -g chrome-agent    # prebuilt
npx chrome-agent --help        # no install
cargo install chrome-agent     # from source

Quick start

# Navigate and see the page
chrome-agent goto https://example.com --inspect

# Click by uid
chrome-agent click n12 --inspect

# Fill a form
chrome-agent fill --uid n20 "user@test.com"

# CSS selectors work too
chrome-agent click --selector "button.submit"
chrome-agent fill --selector "input[name=email]" "hello@test.com"

# Article content (Readability -- like Firefox Reader Mode)
chrome-agent read

# Visible text, scoped and capped
chrome-agent text --selector "main" --truncate 500

# Run JS
chrome-agent eval "document.title"

# Screenshot (returns a file path, not binary)
chrome-agent screenshot

Commands

Command	What it does
`goto <url> [--inspect] [--max-depth N]`	Navigate. Auto-prefixes `https://` if missing.
`back`	History back.
`forward`	History forward.
`close [--purge]`	Stop browser. `--purge` deletes cookies/profile.

Inspection

Command	What it does
`inspect [--verbose] [--max-depth N] [--uid nN] [--filter "role,role"] [--scroll] [--limit N] [--urls]`	a11y tree with UIDs. `--scroll --limit` for infinite scroll. `--urls` resolves href on links.
`diff`	What changed since last inspect.
`screenshot [--filename name]`	Screenshot to file.
`tabs`	List open tabs.

Interaction

Command	What it does
`click <uid> [--inspect]`	Click by uid. Falls back to JS `.click()` when no box model.
`click --selector "css" [--inspect]`	Click by CSS selector.
`click --xy 100,200`	Click by coordinates.
`dblclick <uid> [--inspect]`	Double-click by uid, `--selector`, or `--xy`.
`fill --uid <uid> <value> [--inspect]`	Fill input by uid.
`fill --selector "css" <value>`	Fill by selector.
`fill-form <uid=val>...`	Batch fill.
`select --uid <uid> <value>`	Select dropdown option by value or visible text.
`select --selector "css" <value>`	Select by CSS selector.
`check <uid>`	Ensure checkbox/radio is checked. Idempotent.
`uncheck <uid>`	Ensure checkbox/radio is unchecked. Idempotent.
`upload --uid <uid> <file>...`	Upload file(s) to a file input.
`upload --selector "css" <file>...`	Upload by CSS selector.
`drag <from-uid> <to-uid>`	Drag element to another element.
`type <text> [--selector "css"]`	Type into focused element.
`press <key>`	Enter, Tab, Escape, etc.
`scroll <down\|up\|uid>`	Scroll page or element into view.
`hover <uid>`	Hover.
`wait <text\|url\|selector> <pattern>`	Wait for a condition.

Content extraction

Command	What it does
`read [--html] [--truncate N]`	Article extraction via Mozilla Readability.
`text [uid] [--selector "css"] [--truncate N]`	Visible text from page or element.
`eval <expression> [--selector "css"]`	JS in page context. `el` = matched element.
`extract [--selector "css"] [--limit N] [--scroll] [--a11y]`	Auto-detect repeating data. `--a11y` for React SPAs (X.com).

Monitoring

Command	What it does
`network [--filter "pattern"] [--body] [--live N] [--abort "pattern"]`	Network requests and API responses. `--abort` blocks matching requests.
`console [--level error] [--clear]`	console.log/warn/error + JS exceptions.

Advanced

Command	What it does
`frame <selector\|main>`	Switch execution context to an iframe (or back to main).
`batch`	Execute multiple commands from a JSON array on stdin.
`pipe`	Persistent JSON stdin/stdout connection.

Global flags

--browser <name>         Named browser profile (default: "default")
--page <name>            Named tab (default: "default")
--connect [url]          Attach to a running Chrome
--headed                 Show browser window (default: headless)
--stealth                Anti-detection patches (Cloudflare, Turnstile)
--copy-cookies           Use cookies from your real Chrome profile
--timeout <seconds>      Command timeout (default: 30)
--max-depth <N>          Limit inspect depth
--ignore-https-errors    Accept self-signed certs
--json                   Structured JSON output

The loop: inspect, act, inspect

chrome-agent goto https://app.com/login --inspect
# uid=n52 textbox "Email" focusable
# uid=n58 textbox "Password" focusable
# uid=n63 button "Sign In" focusable

chrome-agent fill --uid n52 "user@test.com"
chrome-agent fill --uid n58 "password123"
chrome-agent click n63 --inspect
# uid=n101 heading "Dashboard" level=1

UIDs stay the same between inspects as long as the DOM node exists.

Content extraction

From least to most tokens:

# Articles (Readability, like Firefox Reader Mode)
chrome-agent read

# Repeating data -- products, search results, feeds. No selectors.
chrome-agent extract
# Uses MDR/DEPTA heuristics. Finds the pattern automatically.

# React SPAs (X.com, etc.) -- uses a11y tree instead of DOM
chrome-agent extract --a11y --scroll --limit 20

# Scoped visible text
chrome-agent text --selector "[role=main]" --truncate 1000

# API responses -- skip the DOM
chrome-agent network --filter "api" --body

Forms: dropdowns, checkboxes, file uploads

# Select dropdown by value or visible text
chrome-agent select --uid n15 "California"

# Idempotent checkbox control
chrome-agent check n20     # no-op if already checked
chrome-agent uncheck n20   # no-op if already unchecked

# File upload
chrome-agent upload --uid n30 /path/to/document.pdf

# Double-click (text selection, special controls)
chrome-agent dblclick n42

Iframes

# Switch into an iframe
chrome-agent frame "#payment-iframe"
chrome-agent inspect    # see iframe content
chrome-agent fill --selector "input[name=card]" "4242424242424242"

# Switch back to main page
chrome-agent frame main

Batch mode

Execute a sequence of commands from stdin without per-command process startup:

echo '[
  {"cmd":"goto","url":"https://example.com"},
  {"cmd":"inspect","filter":"button"},
  {"cmd":"click","uid":"n42"}
]' | chrome-agent batch

Each command produces one JSON line. About 10x faster than spawning a process per command.

Stealth

--stealth patches 7 automation fingerprints via CDP:

navigator.webdriver set to undefined
chrome.runtime mocked
Permissions API fixed
WebGL renderer masked
User-Agent cleaned
Input coordinate leak patched
Runtime.enable never called

These are CDP-level patches (Page.addScriptToEvaluateOnNewDocument), not Chrome flags.

For sites with heavier protection (DataDome, Kasada) that fingerprint the Chromium binary itself, connect to your real Chrome:

google-chrome --remote-debugging-port=9222 &
chrome-agent --connect http://127.0.0.1:9222 goto https://www.leboncoin.fr --inspect

Protection	Solution
None	`chrome-agent goto ...`
Cloudflare/Turnstile	`chrome-agent --stealth goto ...`
Logged-in sites	`chrome-agent --stealth --copy-cookies goto ...`
DataDome/Kasada	`chrome-agent --connect` to real Chrome

Logged-in sites

--copy-cookies copies the cookie database from your Chrome profile. Both Chrome instances use the same macOS Keychain, so encrypted cookies just work.

chrome-agent --stealth --copy-cookies goto x.com/home --inspect
# Your timeline. Your DMs. No login flow.

chrome-agent --copy-cookies goto mail.google.com --inspect
chrome-agent --copy-cookies goto github.com/notifications --inspect

Your real Chrome is not affected.

Network capture and blocking

# Resources already loaded (stealth-safe, uses Performance API)
chrome-agent network --filter "api"

# Live traffic with response bodies
chrome-agent network --live 5 --body --filter "graphql"

# Block tracking/ads (uses Fetch domain interception)
chrome-agent network --abort "*tracking*" --live 30

# Console output
chrome-agent console --level error    # errors + exceptions only

Console capture uses an injected interceptor, not Runtime.enable.

Pipe mode

For agents that send many commands in sequence, pipe mode keeps a single connection open:

echo '{"cmd":"goto","url":"https://example.com","inspect":true}
{"cmd":"click","uid":"n12","inspect":true}
{"cmd":"read"}' | chrome-agent pipe

One JSON line per response. About 10x faster than spawning a process per command.

JSON mode

chrome-agent --json goto https://example.com --inspect
# {"ok":true,"url":"...","title":"...","snapshot":"uid=n1 heading..."}

chrome-agent --json eval "1+1"
# {"ok":true,"result":2}

# Errors exit 1 but JSON is still on stdout (parseable):
chrome-agent --json click n99
# {"ok":false,"error":"Element uid=n99 not found.","hint":"Run 'chrome-agent inspect'"}

Inspect with link URLs

When deciding which link to click, the agent often needs the URL, not just the text:

chrome-agent inspect --urls --filter link
# uid=n82 link "Pricing" url="https://example.com/pricing"
# uid=n97 link "Docs" url="https://docs.example.com"

Multi-tab and parallel agents

# Multiple tabs in one browser
chrome-agent --page main goto https://app.com
chrome-agent --page docs goto https://docs.app.com
chrome-agent --page main eval "document.title"   # "App"

# Multiple agents, each with their own Chrome
chrome-agent --browser agent1 goto https://example.com
chrome-agent --browser agent2 goto https://other.com

Using with AI agents

# Install the skill (Claude Code, Cursor, Copilot, etc.)
npx skills add sderosiaux/chrome-agent

# Or tell your agent to run:
chrome-agent --help
# The help output includes a full LLM usage guide.

Claude Code permissions:

{
  "permissions": {
    "allow": ["Bash(chrome-agent *)"]
  }
}

Comparison

	chrome-agent	agent-browser (Vercel)	Playwright MCP
Language	Rust	Rust	TypeScript
Binary	3 MB, zero runtime	3 MB CLI + dashboard + cloud providers	Node + Playwright
Startup	~10ms (session reuse)	daemon (fast after first)	cold start
Token efficiency	~50 tokens/page (a11y noise filtering)	~200 tokens/page (a11y tree)	~2,000 tokens (HTML)
UID stability	`backendNodeId` (stable across inspects)	sequential `@e1, @e2` (reassigned per snapshot)	N/A (selectors)
Action + observe	`--inspect` flag (1 call)	separate snapshot call	separate call
Stealth	7 native CDP patches	delegated to cloud providers	none
Reader mode	`read` (Readability.js)	none	none
Data extraction	`extract` (auto-detect repeating data)	none	none
Link URL resolution	`inspect --urls`	`snapshot -u`	N/A
Dropdowns	`select`	`select`	via selectors
Checkboxes	`check`/`uncheck` (idempotent)	`check`/`uncheck`	via selectors
File upload	`upload`	`upload`	via selectors
Drag and drop	`drag`	`drag`	via selectors
Annotated screenshots	not yet	`screenshot --annotate`	not yet
Live dashboard	no (lean)	yes (Next.js)	no
Cloud providers	no (`--connect` to anything)	5 built-in	no
iOS/Safari	no	yes (WebDriver)	no
Network blocking	`network --abort`	`network route --abort`	no
Iframe switching	`frame`	`frame`	via selectors
Batch execution	`batch` (JSON stdin)	`batch` (JSON or quoted)	N/A
AI chat built-in	no (the agent IS the LLM)	yes (AI Gateway)	N/A
Codebase	~7.3K lines	~40K lines	Playwright
Design goal	minimal tokens, maximal autonomy	feature-complete platform	browser testing

License

MIT

Dependencies

~17–31MB
~495K SLoC

app chrome-agent

2 unstable releases

chrome-agent

How is this different from agent-browser?

Philosophy

Why this exists

Install

Quick start

Commands

Navigation

Inspection

Interaction

Content extraction

Monitoring

Advanced

Global flags

The loop: inspect, act, inspect

Content extraction

Forms: dropdowns, checkboxes, file uploads

Iframes

Batch mode

Stealth

Logged-in sites

Network capture and blocking

Pipe mode

JSON mode

Inspect with link URLs

Multi-tab and parallel agents

Using with AI agents

Comparison

License

Dependencies