O O
\ /
O —— Cr —— O
/ \
O O
Python automation SDK for the Carbonyl headless browser
pip install carbonyl-agent
carbonyl-agent install
carbonyl-agent is the Python automation SDK for Carbonyl — a Chromium-based headless browser that renders into terminal text. The SDK spawns Carbonyl via PTY, parses the screen via pyte, and exposes a high-level API for navigation, clicking, text extraction, and session persistence. It is designed for agent-driven web interaction: scripted scraping, automated form submission, and LLM-driven browsing loops that need a real browser but not a real display.
Unlike Playwright or Selenium, carbonyl-agent returns terminal text, not a DOM. This makes it fast (no screenshot decode), cheap (no GPU, no window server), and well-suited for the context windows of LLM-driven agents.
A real browser, cheap and scriptable. Most automation stacks require either a full display server (Selenium + Xvfb) or a heavyweight DevTools protocol (Playwright CDP). carbonyl-agent gives you Chromium rendering through a PTY — pip install, call open(), read page_text(). Named sessions persist cookies across runs; daemon mode keeps a browser warm across short-lived scripts.
Rendered text is the native LLM format. An LLM consuming page_text() gets the page as a human would read it in a terminal (headings, lists, table rows) without DOM noise or screenshot OCR. Built-in bot-detection evasion (Firefox UA, AutomationControlled suppression, HTTP/2 off) lowers the chance that agents are blocked on Akamai/Cloudflare-protected sites.
Low footprint, no window server. Runs in a safe-mode console, over SSH, or inside a container without X11/Wayland. Binary discovery is prioritized: env var → local install → PATH → Docker opt-in. Sessions and daemon sockets live under ~/.local/share/carbonyl/ with 0600/0700 permissions.
- CarbonylBrowser: spawn Carbonyl via PTY, open(), drain(), page_text(), click(), send_key(), find_text(), click_text(), mouse_path()
- SessionManager: named persistent profiles, create/fork/snapshot/restore, live-session detection
- Daemon mode: long-running Carbonyl exposed over a Unix socket; clients reconnect without losing state
- ScreenInspector: coordinate-grid rendering, region annotation, crosshairs for debugging click targets
- Bot-detection evasion: curated _HEADLESS_FLAGS set at spawn (UA spoof, webdriver suppression, HTTP/1.1 fallback)
- Verified install: carbonyl-agent install downloads the runtime, verifies SHA256, optional --checksum pinning
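To make the checksum step concrete, here is a minimal sketch of verifying a downloaded file against a pinned SHA256 value. It is not the installer's actual code; the helper names sha256_of and verify_download are hypothetical and only illustrate the idea behind --checksum pinning.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file does not match the pinned checksum.

    Hypothetical helper: the real installer's behaviour on mismatch
    is not described in this README.
    """
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

Reading in chunks keeps memory flat even for a large runtime archive.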
Prerequisites: Python 3.11+. Linux (x86_64, aarch64) or macOS.
pip install carbonyl-agent
# Download the Carbonyl runtime binary (verified via SHA256)
carbonyl-agent install
# Or pin to a known checksum for reproducible installs
carbonyl-agent install --checksum <sha256-hex>

from carbonyl_agent import CarbonylBrowser
with CarbonylBrowser() as b:
    b.open("https://example.com")
    b.drain(8.0)
    print(b.page_text())
# close() runs automatically on exit, even on exception

CarbonylBrowser and DaemonClient both implement the context-manager protocol (#24), which is preferred for any script where an unhandled exception should still tear the browser down cleanly.
All primary names importable directly from the package root:
from carbonyl_agent import (
    CarbonylBrowser, SessionManager, ScreenInspector,
    DaemonClient, start_daemon, stop_daemon, daemon_status,
)

Full API reference is auto-generated from docstrings (#17):
pip install -e ".[docs]"
./scripts/build-docs.sh # writes to docs/api/
./scripts/build-docs.sh --serve # local preview at http://localhost:8080

CI uploads the docs as an api-docs-<sha> artifact on every build.
Named sessions persist cookies, localStorage, and IndexedDB across browser restarts:
from carbonyl_agent import CarbonylBrowser
b = CarbonylBrowser(session="myapp")
b.open("https://example.com")
b.drain(5.0)
b.close()
# Session data in ~/.local/share/carbonyl/sessions/myapp/

Fork a logged-in session for parallel scraping, or snapshot to pin a known-good state. The full lifecycle (seed once, fork to N workers, snapshot the seed before each campaign, restore on drift) works without re-authentication:
from concurrent.futures import ThreadPoolExecutor
from carbonyl_agent import CarbonylBrowser, SessionManager

sm = SessionManager()

# 1. Seed: spawn an interactive session, log in, accept cookies, then close.
#    Everything that hits disk during this run becomes the base profile.
sm.create("base")
with CarbonylBrowser(session="base") as b:
    b.open("https://example.com/login")
    b.drain(8.0)
    # ... interactive login, manual or scripted ...

# 2. Snapshot the seed BEFORE forking, so you can roll back if a worker
#    pollutes the base by accident.
sm.snapshot("base", "post-login")

# 3. Fork to N workers. Each fork is a deep copy: independent cookies,
#    independent localStorage, but starts logged in.
for i in range(4):
    sm.fork("base", f"worker-{i}")

# 4. Run workers in parallel. Each spawn uses its own profile dir, so
#    the four browsers don't fight over Chromium's profile lock.
def scrape(name):
    with CarbonylBrowser(session=name) as b:
        b.open("https://example.com/dashboard")
        b.wait_for_render_settle(timeout=10.0)
        return b.page_text()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(scrape, [f"worker-{i}" for i in range(4)]))

# 5. Restore the base from snapshot: wipes any drift accumulated during
#    the workflow above (e.g. cookies the login flow refreshed).
sm.restore("base", "post-login")

# 6. Cleanup: workers are throwaway after a campaign. Snapshot + base survive.
for i in range(4):
    sm.destroy(f"worker-{i}")

The full API is create, fork, snapshot, restore, list, destroy, exists, is_live, clean_stale_lock. All operations are atomic against the session JSON metadata file, so a crashed fork won't leave a half-copied profile registered as live.
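This README does not show how SessionManager achieves atomic metadata updates. One common way to get that property is to write to a temporary file and rename it over the target, sketched below under that assumption. The function name write_metadata_atomically is hypothetical and not part of the SDK.

```python
import json
import os
import tempfile
from pathlib import Path


def write_metadata_atomically(path: Path, metadata: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it into place.

    Hypothetical helper illustrating the pattern, not the SDK's own code.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(metadata, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic on POSIX when both paths share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file; the previous metadata stays untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this pattern a reader sees either the old file or the new one, never a half-written mix, which is the behaviour the paragraph above describes for a crashed fork.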
persona= is a higher-level alternative to session= keyed on a stable persona identity. Profiles live under CARBONYL_AGENT_PROFILES_DIR (default ~/.config/carbonyl-agent/profiles/), separate from the runtime session store, and ship with public purge_profile / export_profile / import_profile operations:
from carbonyl_agent import CarbonylBrowser
b = CarbonylBrowser(persona="my_throwaway")
b.open("https://example.com")
b.drain(5.0)
b.close() # cookies, localStorage persist
# Backup / CI seeding
b.export_profile("/backups/my_throwaway.tar.gz")
b.import_profile("/backups/my_throwaway.tar.gz")
# Rotate the persona — wipe its state but keep the name registered
b.purge_profile()

A file lock prevents accidental dual-open of the same persona; a second open raises RuntimeError naming the holding PID. Profiles are portable across input_backend="pty" and input_backend="uinput"; recording happens at the metadata level only.
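The dual-open guard can be pictured as an exclusive lock file that records the owner's PID. The sketch below is one possible approach, not the SDK's implementation: the .lock file name and both function names are hypothetical.

```python
import os
from pathlib import Path


def acquire_profile_lock(profile_dir: Path) -> Path:
    """Create an exclusive lock file holding our PID, or raise RuntimeError."""
    lock_path = profile_dir / ".lock"  # hypothetical file name
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        holder = lock_path.read_text().strip() or "unknown"
        raise RuntimeError(f"persona already open (held by PID {holder})") from None
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return lock_path


def release_profile_lock(lock_path: Path) -> None:
    """Remove the lock file so another process can open the persona."""
    lock_path.unlink(missing_ok=True)
```

Because the PID is in the error message, the second caller can decide whether to wait, stop the holder, or pick another persona.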
persona= and session= are mutually exclusive on the constructor; pick one per browser instance.
A long-running Carbonyl process exposed over a Unix domain socket (not TCP/HTTP — there is no listen port or base URL). Clients reconnect without losing in-memory state — ideal for agent loops that want to amortize browser startup cost across many short scripts.
Transport contract (issue #47):
| Concern | Default | Override |
|---|---|---|
| Socket path | ~/.local/share/carbonyl/sessions/<session>.sock | session_dir= kwarg or CARBONYL_SESSION_DIR env var |
| Permissions | socket 0o600, parent dir 0o700 | (not configurable) |
| Public path API | from carbonyl_agent import sock_path, DEFAULT_SOCKET_DIR | - |
| TCP-style readiness | is_daemon_live(session_name): checks the socket accepts connections | - |
| Semantic readiness | client.ping(): round-trips the hello handshake; returns bool, never raises | - |
Containers: the daemon and clients must share a filesystem path for the socket. Either run both inside the same container, or bind-mount the session dir from the host into the container so that a host-side client can reach the in-container daemon with DaemonClient("myapp", session_dir=Path("/host/path")).
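The connection-level readiness check in the transport contract amounts to asking whether anything is listening on the socket file. Here is a minimal standalone sketch of such a probe using only the standard library. The function name socket_accepts_connections is hypothetical, and the SDK's is_daemon_live may do more than this.

```python
import socket
from pathlib import Path


def socket_accepts_connections(sock_file: Path, timeout: float = 0.5) -> bool:
    """Return True if something is listening on the Unix socket at sock_file."""
    if not sock_file.exists():
        return False
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(str(sock_file))
        return True
    except OSError:
        # Covers a refused connection (stale socket file) and timeouts.
        return False
    finally:
        probe.close()
```

A probe like this is also a quick way to confirm that a bind-mounted session dir is visible from the host side before constructing a client.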
from carbonyl_agent import DaemonClient, start_daemon, stop_daemon
# Start (forks a background process)
start_daemon("myapp", "https://example.com")
# Connect from any number of short-lived scripts. The context manager
# disconnects the local socket on exit but leaves the daemon running.
with DaemonClient("myapp") as client:
    client.drain(5.0)
    text = client.page_text()

# ... later, from another script ...
with DaemonClient("myapp") as client:
    client.navigate("https://example.com/login")
    client.wait_for_render_settle()  # #50: same probe as CarbonylBrowser

# Shut down the daemon + browser
stop_daemon("myapp")

Multiple short-lived clients can share one long-running daemon; that is the whole point. The browser keeps its in-memory cookies / localStorage across clients, so a login script and a scraping script can run as two separate Python processes against the same authenticated session.
For clients that need to survive a daemon restart in the background (supervisor restart, host suspend/resume), opt into transparent reconnect:
with DaemonClient("myapp", auto_reconnect=True,
                  max_reconnect_attempts=5,
                  reconnect_backoff=0.5) as client:
    # If the daemon dies and a supervisor brings it back, the next
    # _rpc call will reconnect with exponential backoff (0.5s, 1s,
    # 2s, 4s, 5s) before giving up. Daemon-side errors (semantic)
    # still surface immediately; only transient transport failures
    # trigger retry.
    text = client.page_text()

Default is auto_reconnect=False, preserving the existing fail-fast behaviour. Opt in only when you've decided your client should outlive its daemon process.
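The delay sequence quoted in the reconnect comment (0.5s, 1s, 2s, 4s, 5s) is consistent with doubling from reconnect_backoff and clamping at 5 seconds. The sketch below reproduces that schedule. The 5-second cap is inferred from the quoted sequence rather than documented, and reconnect_delays is a hypothetical name.

```python
def reconnect_delays(attempts: int, base: float, cap: float = 5.0) -> list[float]:
    """Exponential backoff delays, doubling from base and clamped at cap.

    The cap value is an assumption inferred from the documented sequence.
    """
    return [min(base * (2 ** i), cap) for i in range(attempts)]


print(reconnect_delays(5, 0.5))  # [0.5, 1.0, 2.0, 4.0, 5.0]
```

Under these assumptions, five attempts wait at most 12.5 seconds in total before the client gives up.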
carbonyl-agent daemon start myapp https://example.com
carbonyl-agent daemon status
carbonyl-agent daemon attach myapp # interactive REPL
carbonyl-agent daemon stop myapp

Socket: ~/.local/share/carbonyl/daemons/<name>.sock (mode 0600, parent dir 0700).
Find text, debug click targets, and visualize coordinates:
from carbonyl_agent import CarbonylBrowser
b = CarbonylBrowser()
b.open("https://example.com")
b.drain(8.0)
# Find text and click the first match's center
b.click_text("Sign In")
# Or inspect the screen first
si = b.inspector()
si.print_grid(marks=[(46, 45)]) # overlay a coordinate marker
matches = b.find_text("Continue") # [{col, row, end_col}, ...]
print(si.annotate(marks=[(m["col"], m["row"]) for m in matches]))

ScreenInspector also exposes region(top, left, bottom, right) for extracting a rectangular slice of the rendered grid, which is useful when the page has multiple lookalike controls and you need to scope find_text to a known panel.
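To show what scoping a search to a panel means, here is a standalone sketch that works on a plain list of screen lines rather than on the SDK objects. The coordinate conventions (0-based, inclusive bounds, exclusive end_col) are assumptions, and both function names are hypothetical; only the match shape {col, row, end_col} comes from the example above.

```python
def region(lines: list[str], top: int, left: int, bottom: int, right: int) -> list[str]:
    """Return rows top..bottom (inclusive), each cut to columns left..right (inclusive)."""
    return [line[left:right + 1] for line in lines[top:bottom + 1]]


def find_in_region(lines, needle, top, left, bottom, right):
    """Search only inside the rectangle; report matches in full-screen coordinates.

    Reports the first match per row, which is enough for this illustration.
    """
    matches = []
    for offset, text in enumerate(region(lines, top, left, bottom, right)):
        col = text.find(needle)
        if col != -1:
            matches.append({"col": left + col, "row": top + offset,
                            "end_col": left + col + len(needle)})
    return matches


screen = ["Login    Sign In", "                ", "Footer   Sign In"]
print(find_in_region(screen, "Sign In", 2, 0, 2, 15))
```

Restricting the rectangle to the last row skips the lookalike "Sign In" in the header and returns only the footer match.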
All carbonyl-agent exceptions inherit from CarbonylError so you can
catch the whole family with one block, or match on a specific subtype
when you want different recovery per failure mode:
import logging

from carbonyl_agent import (
    CarbonylBrowser, DaemonClient, is_daemon_live,
    CarbonylError, BackendMismatchError, BrowserCrashed,
    DaemonConnectionError, RenderTimeoutError,
)

log = logging.getLogger(__name__)

# Binary not found at install / first spawn
try:
    b = CarbonylBrowser()
    b.open("https://example.com")
except FileNotFoundError as exc:
    # Run `carbonyl-agent install` or set CARBONYL_BIN
    print(f"runtime missing: {exc}")

# Backend contract enforcement (#40): fail fast when a uinput-only
# script connects to a pty-only daemon
if not is_daemon_live("myapp"):
    raise DaemonConnectionError("start the daemon first: carbonyl-agent daemon start myapp")
try:
    client = DaemonClient("myapp", require_backend="uinput")
    client.connect()
except BackendMismatchError as exc:
    print(f"daemon has wrong input backend: {exc}")

# Render-readiness: opt into exception-style control flow (#23)
with CarbonylBrowser() as b:
    b.open("https://slow-site.example.com")
    try:
        b.wait_for_render_settle(timeout=10.0, raise_on_timeout=True)
    except RenderTimeoutError as exc:
        print(f"giving up: {exc}")

# Catch-all for any SDK error (do_work() stands in for your own code)
try:
    do_work()
except CarbonylError as exc:
    log.error("SDK failure: %s", exc)

Backwards compatibility: DaemonConnectionError, BackendMismatchError, and UinputUnavailableError still inherit from RuntimeError via multiple inheritance, so existing except RuntimeError blocks keep working. RenderTimeoutError similarly subclasses TimeoutError.
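The backwards-compatibility scheme relies on ordinary Python multiple inheritance. The following standalone sketch mirrors the class names from this README to show why both old and new handlers match; it is an illustration of the pattern, not a copy of the SDK's source.

```python
class CarbonylError(Exception):
    """Base class for the SDK's exception family."""


class DaemonConnectionError(CarbonylError, RuntimeError):
    """Also a RuntimeError, so older `except RuntimeError` handlers still match."""


class RenderTimeoutError(CarbonylError, TimeoutError):
    """Also a TimeoutError, for callers that already catch the builtin."""


def classify(exc: Exception) -> list[str]:
    """List which handler types would catch the given exception."""
    candidates = (CarbonylError, RuntimeError, TimeoutError)
    return [t.__name__ for t in candidates if isinstance(exc, t)]


print(classify(DaemonConnectionError("no daemon")))  # ['CarbonylError', 'RuntimeError']
print(classify(RenderTimeoutError("too slow")))      # ['CarbonylError', 'TimeoutError']
```

New code can catch CarbonylError for the whole family, while code written before the hierarchy existed keeps working unchanged.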
For the persona profile lock (raised when two processes try to open the
same persona): RuntimeError is raised with the holding PID in the
message so the second caller can decide whether to wait, kill, or pick
a different persona.
CarbonylBrowser applies a curated _HEADLESS_FLAGS set at spawn time to minimize detection by commercial bot-detection engines (Akamai, Cloudflare, PerimeterX):
- Spoofed Firefox User-Agent (removes the (Carbonyl) marker and Chrome identifier)
- --disable-blink-features=AutomationControlled (suppresses navigator.webdriver=true)
- --disable-http2 (HTTP/2 SETTINGS frame is a server-side fingerprint)
- Standard --no-first-run, --disable-sync, --use-mock-keychain flags
If you hit bot-detection walls, do not remove these flags — they are the baseline. For additional entropy, call CarbonylBrowser.mouse_path([...]) to simulate organic mouse movement before interaction.
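One way to produce a less mechanical pointer trajectory is to sample points along a slightly bent curve instead of a straight line. The sketch below generates such points. The argument format of mouse_path is not shown in this README, so passing a list of (col, row) tuples is an assumption, and curved_path is a hypothetical helper.

```python
import random


def curved_path(start, end, steps=12, bend=4.0, seed=None):
    """Points along a quadratic Bezier curve from start to end, in terminal cells."""
    rng = random.Random(seed)
    (x0, y0), (x1, y1) = start, end
    # Control point: the midpoint pushed off the straight line by a random offset.
    cx = (x0 + x1) / 2 + rng.uniform(-bend, bend)
    cy = (y0 + y1) / 2 + rng.uniform(-bend, bend)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((round(x), round(y)))
    return points


path = curved_path((5, 5), (46, 45), steps=10, seed=1)
# Assumed usage, if mouse_path accepts a list of (col, row) tuples:
# b.mouse_path(path)
```

The path always starts and ends exactly on the requested cells; only the intermediate points vary with the seed.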
Synthetic browser events arrive at JavaScript with event.isTrusted = false. Modern React forms and bot-detection libraries refuse to update controlled-input state when this flag is false, so scripted login on X, LinkedIn, and similar sites silently fails — typed text is rendered into the input but never submitted.
CarbonylBrowser accepts an input_backend="uinput" constructor argument. When set, every send() / send_key() / click() / mouse_move() routes through /dev/uinput. The kernel routes the events through Xorg into Chromium with isTrusted = true, indistinguishable from a physical keyboard and mouse.
from carbonyl_agent import CarbonylBrowser, ANTI_FEDCM_FLAGS
with CarbonylBrowser(
    cols=500, rows=150,
    viewport=(1280, 800),
    input_backend="uinput",
    extra_flags=ANTI_FEDCM_FLAGS,
) as b:
    b.open("https://x.com/i/flow/login")
    b.drain(15)
    b.click(320, 88)     # focus the input
    b.send("jmagly")     # typed via uinput → isTrusted=true
    b.send_key("enter")  # advances the form
    ...

Requirements:
- Linux host with /dev/uinput writable (sudo modprobe uinput if missing; user in input group or use the 99-uinput.rules udev rule from scripts/setup-uinput-host.sh)
- An X server running so Carbonyl's --ozone-platform=x11 build has a display to attach to
- The python-uinput package: pip install python-uinput
Recommended deployment: run inside the carbonyl-agent-qa-runner container, which packages Xorg, the X-Carbonyl runtime, and uinput passthrough so you don't have to assemble the environment yourself:
docker pull git.integrolabs.net/roctinam/carbonyl-agent/qa-runner:latest
cd docker/qa-runner && ./run.sh pytest tests/

See roctinam/carbonyl/docs/runtime-modes.md for the full deployment-shape reference (terminal-only / x11+uinput / x11+uinput+X-mirror) and ADR-002 rev 2 for the architecture rationale.
Flag groups are published as module constants so agents can pick and choose:
from carbonyl_agent import (
    CarbonylBrowser,
    DEFAULT_HEADLESS_FLAGS,  # baseline (applied automatically)
    BASE_CHROMIUM_FLAGS,     # first-run / keychain suppression only
    ANTI_BOT_FLAGS,          # UA spoof, no-webdriver, HTTP/1.1
    ANTI_FEDCM_FLAGS,        # disable Google One Tap (X, LinkedIn, publishers)
    ANTI_ONETAP_FLAGS,       # alias for ANTI_FEDCM_FLAGS
)

# Default: BASE_CHROMIUM_FLAGS + ANTI_BOT_FLAGS
b = CarbonylBrowser()

# Add Google One Tap suppression (required for scripted X/Twitter login)
b = CarbonylBrowser(extra_flags=ANTI_FEDCM_FLAGS)

# Compose multiple groups:
b = CarbonylBrowser(extra_flags=ANTI_FEDCM_FLAGS + ["--disable-extensions"])

# Completely replace the defaults (rarely needed):
b = CarbonylBrowser(base_flags=[*BASE_CHROMIUM_FLAGS, "--my-flag"])

When to reach for ANTI_FEDCM_FLAGS: any site that aggressively overlays Google Sign-In on top of its own login form. Without this, the overlay's autofocused input steals your keystrokes and the underlying form is unreachable.
- CARBONYL_BIN env var (explicit path)
- ~/.local/share/carbonyl/bin/<triple>/carbonyl (installed by carbonyl-agent install)
- carbonyl on $PATH
- Docker fallback (requires CARBONYL_ALLOW_DOCKER=1)
CI runs the full E2E suite (tests/e2e/) against multiple Carbonyl runtime tags so SDK-vs-runtime drift is caught before it reaches users.
| Tag | Status | Notes |
|---|---|---|
| runtime-dd69bef0ea4b2512 | Supported (current pin in .carbonyl-runtime-version) | Default for carbonyl-agent install |
| runtime-3f5e5a96aa10c4ac | Backwards-compat tested | Prior runtime; CI verifies SDK still works against it |
| Older runtime-* tags | Best-effort | Not in CI; expected to work but not guaranteed |
Pin a specific runtime in your project by writing the hash into .carbonyl-runtime-version (one runtime-hash=<hash> line). The carbonyl-agent install command reads it. Override on the command line with --tag runtime-<hash> for a one-off install.
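If you want to read the pin from your own tooling (for example to log which runtime a CI job used), a small parser is enough. The sketch below assumes the file holds a single runtime-hash=<hash> line as described above, and that the tag is the hash prefixed with runtime-, which matches the tag names in the table. The function name read_runtime_pin is hypothetical.

```python
from pathlib import Path


def read_runtime_pin(path: Path) -> str:
    """Return the runtime tag pinned in a .carbonyl-runtime-version file."""
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("runtime-hash="):
            value = line.split("=", 1)[1].strip()
            if value:
                # Assumption: tag name is "runtime-" followed by the hash.
                return f"runtime-{value}"
    raise ValueError(f"no runtime-hash=<hash> line in {path}")
```

A missing or empty pin raises ValueError rather than silently falling back to a default.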
When no local binary is installed, the SDK can fall back to docker run fathyb/carbonyl — but this is opt-in for supply-chain safety:
export CARBONYL_ALLOW_DOCKER=1
python -c "from carbonyl_agent import CarbonylBrowser; CarbonylBrowser().open('https://example.com')"

Without CARBONYL_ALLOW_DOCKER=1, attempts to use Docker fallback raise RuntimeError with a clear message. The fallback pulls by pinned SHA256 digest, not a mutable :latest tag.
Common exceptions:
| Exception | Raised when |
|---|---|
| ValueError | invalid session name (path traversal, too long, empty) |
| FileExistsError | session already exists on create() |
| KeyError | session not found on get() / destroy() / restore() |
| RuntimeError | destructive op on a live session; Docker fallback blocked |
| pexpect.EOF / pexpect.TIMEOUT | browser subprocess died or read timed out |
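The ValueError row covers three separate checks on a session name. A minimal sketch of that kind of validation follows. The allowed character set and the 64-character limit are assumptions made for illustration, since the real limits are not documented here, and validate_session_name is a hypothetical name.

```python
import re

MAX_NAME_LENGTH = 64  # hypothetical limit; the real value is not documented here
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")  # assumed safe character set


def validate_session_name(name: str) -> str:
    """Return name unchanged if it is safe to use as a directory name."""
    if not name:
        raise ValueError("session name is empty")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"session name longer than {MAX_NAME_LENGTH} characters")
    # Reject path separators and parent-directory references.
    if ".." in name or not _NAME_RE.fullmatch(name):
        raise ValueError(f"session name contains unsafe characters: {name!r}")
    return name
```

Validating before any filesystem call keeps a name such as ../other from escaping the sessions directory.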
Retry pattern for flaky network:
import pexpect
from carbonyl_agent import CarbonylBrowser

url = "https://example.com"  # page to load
b = CarbonylBrowser()
for attempt in range(3):
    try:
        b.open(url)
        b.drain(10)
        break
    except (pexpect.TIMEOUT, pexpect.EOF):
        # Discard the dead browser and start a fresh one before retrying
        b.close()
        b = CarbonylBrowser()

- CHANGELOG: release history
- CONTRIBUTING: dev setup, test suite, PR guidelines
- pyproject.toml: dependencies, CLI entry points
- carbonyl: the Chromium fork that produces the runtime binary
- carbonyl-fleet: server for managing N concurrent Carbonyl instances over PTY + Unix socket
PRs and issues welcome at git.integrolabs.net/roctinam/carbonyl-agent or github.com/jmagly/carbonyl-agent.
- Run the test suite: pytest
- Type-check: mypy --strict src/
- Lint: ruff check .
- Issues: git.integrolabs.net/roctinam/carbonyl-agent/issues
- Discussions: github.com/jmagly/carbonyl-agent/discussions
MIT License — see LICENSE.
Built on top of Carbonyl by Fathy Boundjadj. The roctinam/carbonyl fork is actively maintained through the M147 Chromium line. PTY handling via pexpect; terminal parsing via pyte.