2 unstable releases
Uses new Rust 2024
| 0.4.2 | Apr 10, 2026 |
|---|---|
| 0.3.0 | Mar 29, 2026 |
#859 in Command line utilities
1.5MB
10K
SLoC
chrome-agent
Browser automation that speaks LLM.
Disclaimer: This is an independent, community-driven project. It is not affiliated with, endorsed by, or sponsored by Google or the Chrome team.
You're not the user. Your LLM is.
You don't need to read this README. Your agent does. Install it, run
chrome-agent --help, and let the LLM figure it out. The CLI embeds its own usage guide, every error comes with a hint for the next action, and--jsonmode outputs structured data an agent can parse without you writing a single adapter. This page is here because GitHub expects one.
How is this different from agent-browser?
agent-browser (Vercel) is a feature-complete browser automation platform: dashboard, cloud providers, annotated screenshots, iOS support, AI chat, auth vault, 40K lines of Rust. It's excellent.
chrome-agent is the opposite bet. Instead of adding features, it removes tokens.
| chrome-agent | agent-browser | |
|---|---|---|
| Page snapshot | ~50 tokens (a11y noise stripped, 66% reduction) | ~200 tokens (full a11y tree) |
| Element IDs | backendNodeId — stable across inspects |
Sequential @e1, @e2 — reassigned every snapshot |
| Action + observe | click n12 --inspect (1 call) |
click @e1 then snapshot (2 calls) |
| Stealth | 7 native CDP patches (incl. Runtime.enable skip) |
Delegated to cloud providers |
| Content extraction | read (articles), extract (auto-detect lists/tables) |
None built-in |
| Binary | 3 MB, zero runtime | 3 MB + Next.js dashboard + cloud SDKs |
| Codebase | 7K lines | 40K lines |
agent-browser gives you a platform with monitoring, cloud browsers, and visual debugging. chrome-agent gives your LLM the smallest possible representation of a webpage and gets out of the way. If your agent needs a dashboard, use agent-browser. If your agent needs to spend tokens on reasoning instead of page parsing, use this.
Philosophy
Every token your agent spends understanding a page is a token it doesn't spend reasoning about the task. chrome-agent is built around one idea: minimize the tokens between "what does this page look like?" and "what should I do next?"
This means:
- Accessibility tree over DOM. Playwright returns ~2,000 tokens of raw HTML. chrome-agent returns ~50 tokens of a11y tree with stable element IDs. No CSS selectors to write, no DOM to parse.
- One binary, zero runtime. 3 MB Rust binary. No Node.js, no npm, no Playwright runtime.
npx chrome-agentjust works. - Action + observation in one call.
--inspecton any action command returns the page state after the action. One round-trip instead of two. - Errors are instructions. Every error includes a
hintfield telling the agent what to do next.{"ok":false, "error":"...", "hint":"run inspect"}. - Stealth by default intent. 7 CDP patches including the detection vector nobody talks about (
Runtime.enable). Connect to real Chrome for the hardest protections. - Content extraction without selectors.
readfor articles,extractfor repeating data,networkfor API payloads. The agent never writes CSS selectors.
This is not a general-purpose browser testing framework. It's a tool that makes an LLM effective at browsing the web.
chrome-agent goto news.ycombinator.com --inspect
# ~50 tokens instead of ~2,000:
uid=n1 RootWebArea "Hacker News"
uid=n50 heading "Hacker News" level=1
uid=n82 link "Show HN: A New Browser Tool"
uid=n97 link "Rust 2025 Edition Announced"
...
# Click + see the new page in one call:
chrome-agent click n82 --inspect
UIDs are based on Chrome's backendNodeId. They don't change between inspects. Click n82 now or five minutes from now.
chrome-agent (3 MB Rust binary)
| CDP over WebSocket
v
Chrome (headless, no Node.js, no runtime)
Why this exists
| If you've hit this... | chrome-agent does this instead |
|---|---|
| Playwright snapshots burn 2K tokens | a11y tree: ~50 tokens. 40x less context spent on page state. |
| CSS selectors break after every deploy | UIDs from Chrome's backendNodeId. Stable as long as the DOM node exists. |
| Click then inspect = 2 round-trips | --inspect on any command. One call, action + observation. |
| 200MB of Node + npm + Playwright | 3 MB binary. npx chrome-agent works out of the box. |
| Cloudflare blocks your headless Chrome | 7 CDP patches. Runtime.enable never called (the detection vector nobody talks about). |
| Writing per-site scraping selectors | read for articles, extract for lists/tables/cards, network for API payloads. No selectors. |
| Errors are stack traces | {"ok":false, "error":"...", "hint":"run inspect"} -- parseable, actionable. |
| Each command launches a fresh browser | Sessions persist. Chrome stays alive between calls. ~10ms startup. |
| Agent can't access your logged-in accounts | --copy-cookies grabs cookies from your real Chrome. Works with X.com, Gmail, dashboards. |
| Infinite scroll shows 10 items | inspect --scroll --limit 50 scrolls and collects. Tested on X.com: 50 tweets from a live timeline. |
| Two agents sharing one browser = chaos | --browser agent1, --browser agent2. Separate Chrome instances. |
Install
# For AI agents -- installs a SKILL.md your agent reads automatically
npx skills add sderosiaux/chrome-agent
# Or just the binary
npm install -g chrome-agent # prebuilt
npx chrome-agent --help # no install
cargo install chrome-agent # from source
Quick start
# Navigate and see the page
chrome-agent goto https://example.com --inspect
# Click by uid
chrome-agent click n12 --inspect
# Fill a form
chrome-agent fill --uid n20 "user@test.com"
# CSS selectors work too
chrome-agent click --selector "button.submit"
chrome-agent fill --selector "input[name=email]" "hello@test.com"
# Article content (Readability -- like Firefox Reader Mode)
chrome-agent read
# Visible text, scoped and capped
chrome-agent text --selector "main" --truncate 500
# Run JS
chrome-agent eval "document.title"
# Screenshot (returns a file path, not binary)
chrome-agent screenshot
Commands
Navigation
| Command | What it does |
|---|---|
goto <url> [--inspect] [--max-depth N] |
Navigate. Auto-prefixes https:// if missing. |
back |
History back. |
forward |
History forward. |
close [--purge] |
Stop browser. --purge deletes cookies/profile. |
Inspection
| Command | What it does |
|---|---|
inspect [--verbose] [--max-depth N] [--uid nN] [--filter "role,role"] [--scroll] [--limit N] [--urls] |
a11y tree with UIDs. --scroll --limit for infinite scroll. --urls resolves href on links. |
diff |
What changed since last inspect. |
screenshot [--filename name] |
Screenshot to file. |
tabs |
List open tabs. |
Interaction
| Command | What it does |
|---|---|
click <uid> [--inspect] |
Click by uid. Falls back to JS .click() when no box model. |
click --selector "css" [--inspect] |
Click by CSS selector. |
click --xy 100,200 |
Click by coordinates. |
dblclick <uid> [--inspect] |
Double-click by uid, --selector, or --xy. |
fill --uid <uid> <value> [--inspect] |
Fill input by uid. |
fill --selector "css" <value> |
Fill by selector. |
fill-form <uid=val>... |
Batch fill. |
select --uid <uid> <value> |
Select dropdown option by value or visible text. |
select --selector "css" <value> |
Select by CSS selector. |
check <uid> |
Ensure checkbox/radio is checked. Idempotent. |
uncheck <uid> |
Ensure checkbox/radio is unchecked. Idempotent. |
upload --uid <uid> <file>... |
Upload file(s) to a file input. |
upload --selector "css" <file>... |
Upload by CSS selector. |
drag <from-uid> <to-uid> |
Drag element to another element. |
type <text> [--selector "css"] |
Type into focused element. |
press <key> |
Enter, Tab, Escape, etc. |
scroll <down|up|uid> |
Scroll page or element into view. |
hover <uid> |
Hover. |
wait <text|url|selector> <pattern> |
Wait for a condition. |
Content extraction
| Command | What it does |
|---|---|
read [--html] [--truncate N] |
Article extraction via Mozilla Readability. |
text [uid] [--selector "css"] [--truncate N] |
Visible text from page or element. |
eval <expression> [--selector "css"] |
JS in page context. el = matched element. |
extract [--selector "css"] [--limit N] [--scroll] [--a11y] |
Auto-detect repeating data. --a11y for React SPAs (X.com). |
Monitoring
| Command | What it does |
|---|---|
network [--filter "pattern"] [--body] [--live N] [--abort "pattern"] |
Network requests and API responses. --abort blocks matching requests. |
console [--level error] [--clear] |
console.log/warn/error + JS exceptions. |
Advanced
| Command | What it does |
|---|---|
frame <selector|main> |
Switch execution context to an iframe (or back to main). |
batch |
Execute multiple commands from a JSON array on stdin. |
pipe |
Persistent JSON stdin/stdout connection. |
Global flags
--browser <name> Named browser profile (default: "default")
--page <name> Named tab (default: "default")
--connect [url] Attach to a running Chrome
--headed Show browser window (default: headless)
--stealth Anti-detection patches (Cloudflare, Turnstile)
--copy-cookies Use cookies from your real Chrome profile
--timeout <seconds> Command timeout (default: 30)
--max-depth <N> Limit inspect depth
--ignore-https-errors Accept self-signed certs
--json Structured JSON output
The loop: inspect, act, inspect
chrome-agent goto https://app.com/login --inspect
# uid=n52 textbox "Email" focusable
# uid=n58 textbox "Password" focusable
# uid=n63 button "Sign In" focusable
chrome-agent fill --uid n52 "user@test.com"
chrome-agent fill --uid n58 "password123"
chrome-agent click n63 --inspect
# uid=n101 heading "Dashboard" level=1
UIDs stay the same between inspects as long as the DOM node exists.
Content extraction
From least to most tokens:
# Articles (Readability, like Firefox Reader Mode)
chrome-agent read
# Repeating data -- products, search results, feeds. No selectors.
chrome-agent extract
# Uses MDR/DEPTA heuristics. Finds the pattern automatically.
# React SPAs (X.com, etc.) -- uses a11y tree instead of DOM
chrome-agent extract --a11y --scroll --limit 20
# Scoped visible text
chrome-agent text --selector "[role=main]" --truncate 1000
# API responses -- skip the DOM
chrome-agent network --filter "api" --body
Forms: dropdowns, checkboxes, file uploads
# Select dropdown by value or visible text
chrome-agent select --uid n15 "California"
# Idempotent checkbox control
chrome-agent check n20 # no-op if already checked
chrome-agent uncheck n20 # no-op if already unchecked
# File upload
chrome-agent upload --uid n30 /path/to/document.pdf
# Double-click (text selection, special controls)
chrome-agent dblclick n42
Iframes
# Switch into an iframe
chrome-agent frame "#payment-iframe"
chrome-agent inspect # see iframe content
chrome-agent fill --selector "input[name=card]" "4242424242424242"
# Switch back to main page
chrome-agent frame main
Batch mode
Execute a sequence of commands from stdin without per-command process startup:
echo '[
{"cmd":"goto","url":"https://example.com"},
{"cmd":"inspect","filter":"button"},
{"cmd":"click","uid":"n42"}
]' | chrome-agent batch
Each command produces one JSON line. About 10x faster than spawning a process per command.
Stealth
--stealth patches 7 automation fingerprints via CDP:
navigator.webdriverset toundefinedchrome.runtimemocked- Permissions API fixed
- WebGL renderer masked
- User-Agent cleaned
- Input coordinate leak patched
Runtime.enablenever called
These are CDP-level patches (Page.addScriptToEvaluateOnNewDocument), not Chrome flags.
For sites with heavier protection (DataDome, Kasada) that fingerprint the Chromium binary itself, connect to your real Chrome:
google-chrome --remote-debugging-port=9222 &
chrome-agent --connect http://127.0.0.1:9222 goto https://www.leboncoin.fr --inspect
| Protection | Solution |
|---|---|
| None | chrome-agent goto ... |
| Cloudflare/Turnstile | chrome-agent --stealth goto ... |
| Logged-in sites | chrome-agent --stealth --copy-cookies goto ... |
| DataDome/Kasada | chrome-agent --connect to real Chrome |
Logged-in sites
--copy-cookies copies the cookie database from your Chrome profile. Both Chrome instances use the same macOS Keychain, so encrypted cookies just work.
chrome-agent --stealth --copy-cookies goto x.com/home --inspect
# Your timeline. Your DMs. No login flow.
chrome-agent --copy-cookies goto mail.google.com --inspect
chrome-agent --copy-cookies goto github.com/notifications --inspect
Your real Chrome is not affected.
Network capture and blocking
# Resources already loaded (stealth-safe, uses Performance API)
chrome-agent network --filter "api"
# Live traffic with response bodies
chrome-agent network --live 5 --body --filter "graphql"
# Block tracking/ads (uses Fetch domain interception)
chrome-agent network --abort "*tracking*" --live 30
# Console output
chrome-agent console --level error # errors + exceptions only
Console capture uses an injected interceptor, not Runtime.enable.
Pipe mode
For agents that send many commands in sequence, pipe mode keeps a single connection open:
echo '{"cmd":"goto","url":"https://example.com","inspect":true}
{"cmd":"click","uid":"n12","inspect":true}
{"cmd":"read"}' | chrome-agent pipe
One JSON line per response. About 10x faster than spawning a process per command.
JSON mode
chrome-agent --json goto https://example.com --inspect
# {"ok":true,"url":"...","title":"...","snapshot":"uid=n1 heading..."}
chrome-agent --json eval "1+1"
# {"ok":true,"result":2}
# Errors exit 1 but JSON is still on stdout (parseable):
chrome-agent --json click n99
# {"ok":false,"error":"Element uid=n99 not found.","hint":"Run 'chrome-agent inspect'"}
Inspect with link URLs
When deciding which link to click, the agent often needs the URL, not just the text:
chrome-agent inspect --urls --filter link
# uid=n82 link "Pricing" url="https://example.com/pricing"
# uid=n97 link "Docs" url="https://docs.example.com"
Multi-tab and parallel agents
# Multiple tabs in one browser
chrome-agent --page main goto https://app.com
chrome-agent --page docs goto https://docs.app.com
chrome-agent --page main eval "document.title" # "App"
# Multiple agents, each with their own Chrome
chrome-agent --browser agent1 goto https://example.com
chrome-agent --browser agent2 goto https://other.com
Using with AI agents
# Install the skill (Claude Code, Cursor, Copilot, etc.)
npx skills add sderosiaux/chrome-agent
# Or tell your agent to run:
chrome-agent --help
# The help output includes a full LLM usage guide.
Claude Code permissions:
{
"permissions": {
"allow": ["Bash(chrome-agent *)"]
}
}
Comparison
| chrome-agent | agent-browser (Vercel) | Playwright MCP | |
|---|---|---|---|
| Language | Rust | Rust | TypeScript |
| Binary | 3 MB, zero runtime | 3 MB CLI + dashboard + cloud providers | Node + Playwright |
| Startup | ~10ms (session reuse) | daemon (fast after first) | cold start |
| Token efficiency | ~50 tokens/page (a11y noise filtering) | ~200 tokens/page (a11y tree) | ~2,000 tokens (HTML) |
| UID stability | backendNodeId (stable across inspects) |
sequential @e1, @e2 (reassigned per snapshot) |
N/A (selectors) |
| Action + observe | --inspect flag (1 call) |
separate snapshot call | separate call |
| Stealth | 7 native CDP patches | delegated to cloud providers | none |
| Reader mode | read (Readability.js) |
none | none |
| Data extraction | extract (auto-detect repeating data) |
none | none |
| Link URL resolution | inspect --urls |
snapshot -u |
N/A |
| Dropdowns | select |
select |
via selectors |
| Checkboxes | check/uncheck (idempotent) |
check/uncheck |
via selectors |
| File upload | upload |
upload |
via selectors |
| Drag and drop | drag |
drag |
via selectors |
| Annotated screenshots | not yet | screenshot --annotate |
not yet |
| Live dashboard | no (lean) | yes (Next.js) | no |
| Cloud providers | no (--connect to anything) |
5 built-in | no |
| iOS/Safari | no | yes (WebDriver) | no |
| Network blocking | network --abort |
network route --abort |
no |
| Iframe switching | frame |
frame |
via selectors |
| Batch execution | batch (JSON stdin) |
batch (JSON or quoted) |
N/A |
| AI chat built-in | no (the agent IS the LLM) | yes (AI Gateway) | N/A |
| Codebase | ~7.3K lines | ~40K lines | Playwright |
| Design goal | minimal tokens, maximal autonomy | feature-complete platform | browser testing |
License
MIT
Dependencies
~17–31MB
~495K SLoC