   O    O
    \  /
   O —— Cr —— O
    /  \
   O    O

carbonyl-agent

Python automation SDK for the Carbonyl headless browser

pip install carbonyl-agent
carbonyl-agent install

License: MIT Python Carbonyl M147

Get Started · Session API · Daemon Mode · Bot Detection · Examples


What carbonyl-agent Is

carbonyl-agent is the Python automation SDK for Carbonyl — a Chromium-based headless browser that renders into terminal text. The SDK spawns Carbonyl via PTY, parses the screen via pyte, and exposes a high-level API for navigation, clicking, text extraction, and session persistence. It is designed for agent-driven web interaction: scripted scraping, automated form submission, and LLM-driven browsing loops that need a real browser but not a real display.

Unlike Playwright or Selenium, carbonyl-agent returns terminal text, not a DOM. This makes it fast (no screenshot decode), cheap (no GPU, no window server), and well-suited for the context windows of LLM-driven agents.


Why This Matters

For Developers

A real browser, cheap and scriptable. Most automation stacks require either a full display server (Selenium + Xvfb) or a heavyweight DevTools protocol (Playwright CDP). carbonyl-agent gives you Chromium rendering through a PTY — pip install, call open(), read page_text(). Named sessions persist cookies across runs; daemon mode keeps a browser warm across short-lived scripts.

For Agents

Rendered text is the native LLM format. An LLM consuming page_text() gets the page as a human would read it in a terminal — headings, lists, table rows — without DOM noise or screenshot OCR. Built-in bot-detection evasion (Firefox UA, AutomationControlled suppression, HTTP/2 off) means agents aren't blocked by default on Akamai/Cloudflare-protected sites.

For Operators

Low footprint, no window server. Runs in a safe-mode console, over SSH, or inside a container without X11/Wayland. Binary discovery is prioritized: env var → local install → PATH → Docker opt-in. Sessions and daemon sockets live under ~/.local/share/carbonyl/ with 0600/0700 permissions.


Core Capabilities

  1. CarbonylBrowser — spawn Carbonyl via PTY, open(), drain(), page_text(), click(), send_key(), find_text(), click_text(), mouse_path()
  2. SessionManager — named persistent profiles, create / fork / snapshot / restore, live-session detection
  3. Daemon mode — long-running Carbonyl exposed over a Unix socket; clients reconnect without losing state
  4. ScreenInspector — coordinate-grid rendering, region annotation, crosshairs for debugging click targets
  5. Bot-detection evasion — curated _HEADLESS_FLAGS set at spawn (UA spoof, webdriver suppression, HTTP/1.1 fallback)
  6. Verified install: carbonyl-agent install downloads the runtime and verifies its SHA256, with optional --checksum pinning

Quick Start

Prerequisites: Python 3.11+. Linux (x86_64, aarch64) or macOS.

Install

pip install carbonyl-agent

# Download the Carbonyl runtime binary (verified via SHA256)
carbonyl-agent install

# Or pin to a known checksum for reproducible installs
carbonyl-agent install --checksum <sha256-hex>
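
The checksum step can be pictured with a few lines of standard-library Python. This is a minimal sketch of SHA256 verification in general, not the installer's actual code; the function name verify_sha256 and the chunk size are assumptions made for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest matches expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large runtime archive never has to fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # compare_digest does a constant-time comparison of the two hex strings
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

Pinning a checksum in CI amounts to storing the expected hex string next to the build config and refusing to proceed when a check like this returns False.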

Your first script

from carbonyl_agent import CarbonylBrowser

with CarbonylBrowser() as b:
    b.open("https://example.com")
    b.drain(8.0)
    print(b.page_text())
# close() runs automatically on exit, even on exception

CarbonylBrowser and DaemonClient both implement the context-manager protocol (#24) — preferred for any script where an unhandled exception should still tear the browser down cleanly.

Public API

All primary names importable directly from the package root:

from carbonyl_agent import (
    CarbonylBrowser, SessionManager, ScreenInspector,
    DaemonClient, start_daemon, stop_daemon, daemon_status,
)

Full API reference is auto-generated from docstrings (#17):

pip install -e ".[docs]"
./scripts/build-docs.sh           # writes to docs/api/
./scripts/build-docs.sh --serve   # local preview at http://localhost:8080

CI uploads the docs as an api-docs-<sha> artifact on every build.


Session Persistence

Named sessions persist cookies, localStorage, and IndexedDB across browser restarts:

from carbonyl_agent import CarbonylBrowser

b = CarbonylBrowser(session="myapp")
b.open("https://example.com")
b.drain(5.0)
b.close()
# Session data in ~/.local/share/carbonyl/sessions/myapp/

Fork and snapshot

Fork a logged-in session for parallel scraping, or snapshot to pin a known-good state. The full lifecycle — seed once, fork to N workers, snapshot the seed before each campaign, restore on drift — works without re-authentication:

from concurrent.futures import ThreadPoolExecutor
from carbonyl_agent import CarbonylBrowser, SessionManager

sm = SessionManager()

# 1. Seed: spawn an interactive session, log in, accept cookies, then close.
#    Everything that hits disk during this run becomes the base profile.
sm.create("base")
with CarbonylBrowser(session="base") as b:
    b.open("https://example.com/login")
    b.drain(8.0)
    # ... interactive login, manual or scripted ...

# 2. Snapshot the seed BEFORE forking, so you can roll back if a worker
#    pollutes the base by accident.
sm.snapshot("base", "post-login")

# 3. Fork to N workers. Each fork is a deep copy — independent cookies,
#    independent localStorage, but starts logged in.
for i in range(4):
    sm.fork("base", f"worker-{i}")

# 4. Run workers in parallel. Each spawn uses its own profile dir, so
#    the four browsers don't fight over Chromium's profile lock.
def scrape(name):
    with CarbonylBrowser(session=name) as b:
        b.open("https://example.com/dashboard")
        b.wait_for_render_settle(timeout=10.0)
        return b.page_text()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(scrape, [f"worker-{i}" for i in range(4)]))

# 5. Restore the base from the snapshot. This wipes any drift accumulated
#    during the workflow above (e.g. cookies the login flow refreshed).
sm.restore("base", "post-login")

# 6. Cleanup: workers are throwaway after a campaign. Snapshot + base survive.
for i in range(4):
    sm.destroy(f"worker-{i}")

The full API is create, fork, snapshot, restore, list, destroy, exists, is_live, clean_stale_lock. All operations are atomic against the session JSON metadata file — a crashed fork won't leave a half-copied profile registered as live.
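
One common way to get that crash-safety property is to copy into a staging directory and rename it into place only once the copy is complete. The sketch below shows the general technique with the standard library. It is not SessionManager's implementation; fork_profile and the flat directory layout are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def fork_profile(root: Path, src: str, dst: str) -> Path:
    """Deep-copy root/src to root/dst, publishing dst only when complete."""
    source, target = root / src, root / dst
    if not source.is_dir():
        raise KeyError(src)
    if target.exists():
        raise FileExistsError(dst)
    # Stage next to the target so the final rename stays on one filesystem
    staging = Path(tempfile.mkdtemp(prefix=f".{dst}.", dir=root))
    try:
        shutil.copytree(source, staging, dirs_exist_ok=True)
        os.rename(staging, target)  # atomic on POSIX for same-filesystem paths
    except BaseException:
        # A failed or interrupted copy leaves nothing behind under the final name
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return target
```

If the process dies mid-copy, only the hidden staging directory is left over, so a later listing never sees a half-copied profile under the worker's name.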

Persona profiles

persona= is a higher-level alternative to session= keyed on a stable persona identity. Profiles live under CARBONYL_AGENT_PROFILES_DIR (default ~/.config/carbonyl-agent/profiles/), separate from the runtime session store, and ship with public purge_profile / export_profile / import_profile operations:

from carbonyl_agent import CarbonylBrowser

b = CarbonylBrowser(persona="my_throwaway")
b.open("https://example.com")
b.drain(5.0)
b.close()                                     # cookies, localStorage persist

# Backup / CI seeding
b.export_profile("/backups/my_throwaway.tar.gz")
b.import_profile("/backups/my_throwaway.tar.gz")

# Rotate the persona — wipe its state but keep the name registered
b.purge_profile()

A file lock prevents accidental dual-open of the same persona; a second open raises RuntimeError naming the holding PID. Profiles are portable across input_backend="pty" and input_backend="uinput" — recording happens at the metadata level only.
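
The dual-open guard can be illustrated with an exclusive-create lock file that records the holder's PID. This is a sketch of the general pattern, assuming a plain lock file on disk; the helper names are hypothetical and the SDK's real locking mechanism may differ.

```python
import os
from pathlib import Path


def acquire_persona_lock(lock_path: Path) -> None:
    """Create lock_path exclusively, or raise RuntimeError naming the holder."""
    try:
        # O_EXCL makes creation fail if the lock file already exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        holder = lock_path.read_text().strip() or "unknown"
        raise RuntimeError(f"persona is already open (held by PID {holder})") from None
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))


def release_persona_lock(lock_path: Path) -> None:
    """Remove the lock file; a missing file is not an error."""
    lock_path.unlink(missing_ok=True)
```

This sketch does not detect stale locks left by a crashed process; a production version would also check whether the recorded PID is still alive.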

persona= and session= are mutually exclusive on the constructor; pick one per browser instance.


Daemon Mode

A long-running Carbonyl process exposed over a Unix domain socket (not TCP/HTTP — there is no listen port or base URL). Clients reconnect without losing in-memory state — ideal for agent loops that want to amortize browser startup cost across many short scripts.

Transport contract (issue #47):

| Concern | Default | Override |
|---|---|---|
| Socket path | ~/.local/share/carbonyl/sessions/<session>.sock | session_dir= kwarg or CARBONYL_SESSION_DIR env var |
| Permissions | socket 0o600, parent dir 0o700 | (not configurable) |
| Public path API | from carbonyl_agent import sock_path, DEFAULT_SOCKET_DIR | |
| TCP-style readiness | is_daemon_live(session_name), which checks that the socket accepts connections | |
| Semantic readiness | client.ping(), which round-trips the hello handshake; returns bool, never raises | |
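
To make the "socket accepts connections" level of readiness concrete, here is a minimal standard-library probe. It is a sketch of the idea only, assuming a stream socket; socket_accepts_connections is a hypothetical name, not the body of is_daemon_live.

```python
import socket
from pathlib import Path


def socket_accepts_connections(path: Path, timeout: float = 0.5) -> bool:
    """Return True if something is listening on the Unix socket at path."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect(str(path))
        except OSError:
            # Covers a missing file, a stale socket file, and refused connections
            return False
    return True
```

A probe like this only proves that a listener exists. It says nothing about whether the browser behind it is responsive, which is why the handshake-level check is listed separately.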

Containers: the daemon and its clients must share a filesystem path for the socket. Either run both inside the same container, or bind-mount the session dir from the host into the container so that a host-side client can reach the in-container daemon with DaemonClient("myapp", session_dir=Path("/host/path")).

from carbonyl_agent import DaemonClient, start_daemon, stop_daemon

# Start (forks a background process)
start_daemon("myapp", "https://example.com")

# Connect from any number of short-lived scripts. The context manager
# disconnects the local socket on exit but leaves the daemon running.
with DaemonClient("myapp") as client:
    client.drain(5.0)
    text = client.page_text()

# ... later, from another script ...
with DaemonClient("myapp") as client:
    client.navigate("https://example.com/login")
    client.wait_for_render_settle()        # #50: same probe as CarbonylBrowser

# Shut down the daemon + browser
stop_daemon("myapp")

Multiple short-lived clients can share one long-running daemon — that's the whole point. The browser keeps its in-memory cookies / localStorage across clients, so a login script and a scraping script can run as two separate Python processes against the same authenticated session.

Auto-reconnect for long-running clients (#23)

For clients that need to survive a daemon restart in the background (supervisor restart, host suspend/resume), opt into transparent reconnect:

with DaemonClient("myapp", auto_reconnect=True,
                  max_reconnect_attempts=5,
                  reconnect_backoff=0.5) as client:
    # If the daemon dies and a supervisor brings it back, the next
    # _rpc call will reconnect with exponential backoff (0.5s, 1s,
    # 2s, 4s, 5s) before giving up. Daemon-side errors (semantic)
    # still surface immediately — only transient transport failures
    # trigger retry.
    text = client.page_text()

Default is auto_reconnect=False, preserving the existing fail-fast behaviour. Opt in only when you've decided your client should outlive its daemon process.
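
The schedule quoted in the comment above (0.5s, 1s, 2s, 4s, 5s) is consistent with doubling from reconnect_backoff and clamping at 5 seconds. The sketch below reproduces it under that assumption; the cap value is inferred from the sequence, and backoff_delays is a hypothetical helper, not part of the SDK.

```python
def backoff_delays(base: float, attempts: int, cap: float = 5.0) -> list[float]:
    """Delay before each reconnect attempt: base * 2**n, clamped to cap."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]


print(backoff_delays(0.5, 5))  # [0.5, 1.0, 2.0, 4.0, 5.0]
```

With these values a client spends at most 12.5 seconds waiting before it gives up, which is a useful number to compare against a supervisor's restart delay.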

Daemon CLI

carbonyl-agent daemon start myapp https://example.com
carbonyl-agent daemon status
carbonyl-agent daemon attach myapp      # interactive REPL
carbonyl-agent daemon stop myapp

Socket: ~/.local/share/carbonyl/daemons/<name>.sock (mode 0600, parent dir 0700).


Screen Inspection

Find text, debug click targets, and visualize coordinates:

from carbonyl_agent import CarbonylBrowser

b = CarbonylBrowser()
b.open("https://example.com")
b.drain(8.0)

# Find text and click the first match's center
b.click_text("Sign In")

# Or inspect the screen first
si = b.inspector()
si.print_grid(marks=[(46, 45)])         # overlay a coordinate marker
matches = b.find_text("Continue")       # [{col, row, end_col}, ...]
print(si.annotate(marks=[(m["col"], m["row"]) for m in matches]))

ScreenInspector also exposes region(top, left, bottom, right) for extracting a rectangular slice of the rendered grid — useful when the page has multiple lookalike controls and you need to scope find_text to a known panel.
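
The match records and the region slice are easy to reason about on a plain list of screen lines. The sketch below is a stand-in written for illustration, not the SDK's code. It assumes 0-based coordinates, an exclusive end_col, and exclusive bottom/right bounds, none of which the text above specifies.

```python
def find_text(lines: list[str], needle: str) -> list[dict[str, int]]:
    """Return every occurrence of needle as {col, row, end_col}."""
    if not needle:
        return []
    matches = []
    for row, line in enumerate(lines):
        col = line.find(needle)
        while col != -1:
            matches.append({"col": col, "row": row, "end_col": col + len(needle)})
            col = line.find(needle, col + 1)
    return matches


def center_of(match: dict[str, int]) -> tuple[int, int]:
    """Middle cell of a match, the natural click target."""
    return ((match["col"] + match["end_col"] - 1) // 2, match["row"])


def region(lines: list[str], top: int, left: int, bottom: int, right: int) -> list[str]:
    """Rectangular slice of the grid; bottom and right are exclusive here."""
    return [line[left:right] for line in lines[top:bottom]]
```

Scoping a search then becomes find_text(region(lines, ...), "Continue"), with the caveat that the returned coordinates are relative to the slice and need the top/left offsets added back before clicking.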


Error Handling

All carbonyl-agent exceptions inherit from CarbonylError so you can catch the whole family with one block, or match on a specific subtype when you want different recovery per failure mode:

from carbonyl_agent import (
    CarbonylBrowser, DaemonClient, is_daemon_live,
    CarbonylError, BackendMismatchError, BrowserCrashed,
    DaemonConnectionError, RenderTimeoutError,
)

# Binary not found at install / first spawn
try:
    b = CarbonylBrowser()
    b.open("https://example.com")
except FileNotFoundError as exc:
    # Run `carbonyl-agent install` or set CARBONYL_BIN
    print(f"runtime missing: {exc}")

# Backend contract enforcement (#40) — fail fast when a uinput-only
# script connects to a pty-only daemon
if not is_daemon_live("myapp"):
    raise DaemonConnectionError("start the daemon first: carbonyl-agent daemon start myapp")
try:
    client = DaemonClient("myapp", require_backend="uinput")
    client.connect()
except BackendMismatchError as exc:
    print(f"daemon has wrong input backend: {exc}")

# Render-readiness — opt into exception-style control flow (#23)
with CarbonylBrowser() as b:
    b.open("https://slow-site.example.com")
    try:
        b.wait_for_render_settle(timeout=10.0, raise_on_timeout=True)
    except RenderTimeoutError as exc:
        print(f"giving up: {exc}")

# Catch-all for any SDK error. do_work() stands in for your own code.
import logging
log = logging.getLogger(__name__)

try:
    do_work()
except CarbonylError as exc:
    log.error("SDK failure: %s", exc)

Backwards compatibility: DaemonConnectionError, BackendMismatchError, and UinputUnavailableError still inherit from RuntimeError via multiple inheritance, so existing except RuntimeError blocks keep working. RenderTimeoutError similarly subclasses TimeoutError.
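
The multiple-inheritance arrangement is what lets old and new handlers coexist. The classes below are an illustrative re-declaration that reuses the names from this section; they are not the SDK's source, and handle is a hypothetical helper that shows which handler fires.

```python
class CarbonylError(Exception):
    """Root of the family (re-declared here for illustration only)."""


class DaemonConnectionError(CarbonylError, RuntimeError):
    """Matches `except CarbonylError` and legacy `except RuntimeError`."""


class RenderTimeoutError(CarbonylError, TimeoutError):
    """Matches `except CarbonylError` and `except TimeoutError`."""


def handle(exc_type: type[Exception]) -> str:
    """Raise exc_type and report which except clause caught it."""
    try:
        raise exc_type("boom")
    except RuntimeError:
        return "legacy RuntimeError handler"
    except CarbonylError:
        return "CarbonylError handler"


print(handle(DaemonConnectionError))  # legacy RuntimeError handler
print(handle(RenderTimeoutError))     # CarbonylError handler
```

Clause order matters: Python takes the first matching except, so code that lists RuntimeError before CarbonylError keeps its old behaviour for the daemon errors.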

The persona profile lock, which triggers when two processes try to open the same persona, raises RuntimeError with the holding PID in the message, so the second caller can decide whether to wait, kill the holder, or pick a different persona.


Bot Detection Flags

CarbonylBrowser applies a curated _HEADLESS_FLAGS set at spawn time to minimize detection by commercial bot-detection engines (Akamai, Cloudflare, PerimeterX):

  • Spoofed Firefox User-Agent (removes the (Carbonyl) marker and Chrome identifier)
  • --disable-blink-features=AutomationControlled (suppresses navigator.webdriver=true)
  • --disable-http2 (HTTP/2 SETTINGS frame is a server-side fingerprint)
  • Standard --no-first-run, --disable-sync, --use-mock-keychain flags

If you hit bot-detection walls, do not remove these flags — they are the baseline. For additional entropy, call CarbonylBrowser.mouse_path([...]) to simulate organic mouse movement before interaction.

Trusted input backend (uinput)

Synthetic browser events arrive at JavaScript with event.isTrusted = false. Modern React forms and bot-detection libraries refuse to update controlled-input state when this flag is false, so scripted login on X, LinkedIn, and similar sites silently fails — typed text is rendered into the input but never submitted.

CarbonylBrowser accepts an input_backend="uinput" constructor argument. When set, every send() / send_key() / click() / mouse_move() routes through /dev/uinput. The kernel routes the events through Xorg into Chromium with isTrusted = true, indistinguishable from a physical keyboard and mouse.

from carbonyl_agent import CarbonylBrowser, ANTI_FEDCM_FLAGS

with CarbonylBrowser(
    cols=500, rows=150,
    viewport=(1280, 800),
    input_backend="uinput",
    extra_flags=ANTI_FEDCM_FLAGS,
) as b:
    b.open("https://x.com/i/flow/login")
    b.drain(15)
    b.click(320, 88)         # focus the input
    b.send("jmagly")         # typed via uinput → isTrusted=true
    b.send_key("enter")      # advances the form
    ...

Requirements:

  • Linux host with /dev/uinput writable (sudo modprobe uinput if missing; user in input group or use the 99-uinput.rules udev rule from scripts/setup-uinput-host.sh)
  • An X server running so Carbonyl's --ozone-platform=x11 build has a display to attach to
  • The python-uinput package: pip install python-uinput

Recommended deployment: run inside the carbonyl-agent-qa-runner container, which packages Xorg, the X-Carbonyl runtime, and uinput passthrough so you don't have to assemble the environment yourself:

docker pull git.integrolabs.net/roctinam/carbonyl-agent/qa-runner:latest
cd docker/qa-runner && ./run.sh pytest tests/

See roctinam/carbonyl/docs/runtime-modes.md for the full deployment-shape reference (terminal-only / x11+uinput / x11+uinput+X-mirror) and ADR-002 rev 2 for the architecture rationale.

Composing flags for specific scenarios

Flag groups are published as module constants so agents can pick and choose:

from carbonyl_agent import (
    CarbonylBrowser,
    DEFAULT_HEADLESS_FLAGS,   # baseline (applied automatically)
    BASE_CHROMIUM_FLAGS,      # first-run / keychain suppression only
    ANTI_BOT_FLAGS,           # UA spoof, no-webdriver, HTTP/1.1
    ANTI_FEDCM_FLAGS,         # disable Google One Tap (X, LinkedIn, publishers)
    ANTI_ONETAP_FLAGS,        # alias for ANTI_FEDCM_FLAGS
)

# Default: BASE_CHROMIUM_FLAGS + ANTI_BOT_FLAGS
b = CarbonylBrowser()

# Add Google One Tap suppression — required for scripted X/Twitter login
b = CarbonylBrowser(extra_flags=ANTI_FEDCM_FLAGS)

# Compose multiple groups:
b = CarbonylBrowser(extra_flags=ANTI_FEDCM_FLAGS + ["--disable-extensions"])

# Completely replace the defaults (rarely needed):
b = CarbonylBrowser(base_flags=[*BASE_CHROMIUM_FLAGS, "--my-flag"])

When to reach for ANTI_FEDCM_FLAGS: any site that aggressively overlays Google Sign-In on top of its own login form. Without this, the overlay's autofocused input steals your keystrokes and the underlying form is unreachable.
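
Because the flag groups are plain lists, combining several of them can produce duplicates. A small order-preserving merge avoids that. compose_flags is a hypothetical helper, and the flag values in the usage line are stand-ins rather than the contents of the real constants.

```python
def compose_flags(*groups: list[str]) -> list[str]:
    """Concatenate flag groups, keeping the first occurrence of each flag."""
    seen: set[str] = set()
    merged: list[str] = []
    for group in groups:
        for flag in group:
            if flag not in seen:
                seen.add(flag)
                merged.append(flag)
    return merged


# Stand-in values for illustration only
print(compose_flags(["--no-first-run", "--disable-sync"], ["--disable-sync", "--disable-extensions"]))
```

The result can be passed as extra_flags in the same way as the concatenated lists shown above.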


Binary Search Order

  1. CARBONYL_BIN env var (explicit path)
  2. ~/.local/share/carbonyl/bin/<triple>/carbonyl (installed by carbonyl-agent install)
  3. carbonyl on $PATH
  4. Docker fallback (requires CARBONYL_ALLOW_DOCKER=1)
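
The order above can be sketched as a single function that returns the first hit. This is an illustration of the documented priority, not the SDK's resolver: resolve_carbonyl, its parameters, and the "docker" sentinel are hypothetical, and raising RuntimeError at the end follows the Docker fallback description rather than confirmed behaviour.

```python
import shutil
from pathlib import Path


def resolve_carbonyl(env: dict[str, str], home: Path, triple: str) -> str:
    """Walk the four documented locations and return the first match."""
    explicit = env.get("CARBONYL_BIN")
    if explicit:
        return explicit  # 1. explicit path wins
    installed = home / ".local/share/carbonyl/bin" / triple / "carbonyl"
    if installed.is_file():
        return str(installed)  # 2. location used by carbonyl-agent install
    on_path = shutil.which("carbonyl", path=env.get("PATH", ""))
    if on_path:
        return on_path  # 3. $PATH lookup
    if env.get("CARBONYL_ALLOW_DOCKER") == "1":
        return "docker"  # 4. opt-in fallback (sentinel used by this sketch only)
    raise RuntimeError("no Carbonyl binary found; set CARBONYL_ALLOW_DOCKER=1 to allow Docker")
```

Passing the environment and home directory in as arguments keeps the function easy to test without touching the real machine state.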

Runtime compatibility matrix (#21)

CI runs the full E2E suite (tests/e2e/) against multiple Carbonyl runtime tags so SDK-vs-runtime drift is caught before it reaches users.

| Tag | Status | Notes |
|---|---|---|
| runtime-dd69bef0ea4b2512 | Supported (current pin in .carbonyl-runtime-version) | Default for carbonyl-agent install |
| runtime-3f5e5a96aa10c4ac | Backwards-compat tested | Prior runtime; CI verifies SDK still works against it |
| Older runtime-* tags | Best-effort | Not in CI; expected to work but not guaranteed |

Pin a specific runtime in your project by writing the hash into .carbonyl-runtime-version (one runtime-hash=<hash> line). The carbonyl-agent install command reads it. Override on the command line with --tag runtime-<hash> for a one-off install.
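
Reading the pin file back is a one-line parse. The sketch below assumes the hash is lowercase hex, as in the tags listed above, and that the tag is the hash prefixed with runtime-; read_runtime_pin is a hypothetical helper, not the installer's parser.

```python
import re
from pathlib import Path

_PIN_LINE = re.compile(r"^runtime-hash=([0-9a-f]+)$")


def read_runtime_pin(path: Path) -> str:
    """Return the tag (runtime-<hash>) named by a .carbonyl-runtime-version file."""
    for raw in path.read_text().splitlines():
        match = _PIN_LINE.match(raw.strip())
        if match:
            return f"runtime-{match.group(1)}"
    raise ValueError(f"no runtime-hash=<hash> line in {path}")
```

A check like this is handy in CI to assert that the pinned runtime matches the tag the test matrix expects.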

Docker fallback (opt-in)

When no local binary is installed, the SDK can fall back to docker run fathyb/carbonyl — but this is opt-in for supply-chain safety:

export CARBONYL_ALLOW_DOCKER=1
python -c "from carbonyl_agent import CarbonylBrowser; CarbonylBrowser().open('https://example.com')"

Without CARBONYL_ALLOW_DOCKER=1, attempts to use Docker fallback raise RuntimeError with a clear message. The fallback pulls by pinned SHA256 digest, not a mutable :latest tag.


Error Handling

Common exceptions:

| Exception | Raised when |
|---|---|
| ValueError | invalid session name (path traversal, too long, empty) |
| FileExistsError | session already exists on create() |
| KeyError | session not found on get() / destroy() / restore() |
| RuntimeError | destructive op on a live session; Docker fallback blocked |
| pexpect.EOF / pexpect.TIMEOUT | browser subprocess died or read timed out |
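
The ValueError row covers three cases: empty names, overly long names, and path traversal. The sketch below shows one way such a check could look. The 64-character limit and the allowed character set are assumptions, since the exact rules are not stated here, and validate_session_name is a hypothetical helper.

```python
import re

_MAX_LEN = 64  # assumed limit; the table only says "too long"
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_session_name(name: str) -> str:
    """Return name unchanged, or raise ValueError for unsafe values."""
    if not name:
        raise ValueError("session name is empty")
    if len(name) > _MAX_LEN:
        raise ValueError(f"session name exceeds {_MAX_LEN} characters")
    # Rejecting separators and ".." keeps the name inside the sessions directory
    if ".." in name or not _NAME.fullmatch(name):
        raise ValueError(f"session name {name!r} contains unsafe characters")
    return name
```

Validating before any filesystem call means a hostile name never reaches a path join.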

Retry pattern for flaky network:

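import pexpect
from carbonyl_agent import CarbonylBrowser

url = "https://example.com"   # placeholder target

b = CarbonylBrowser()
for attempt in range(3):
    try:
        b.open(url)
        b.drain(10)
        break
    except (pexpect.TIMEOUT, pexpect.EOF):
        # The subprocess died or stalled: discard it and start fresh
        b.close()
        b = CarbonylBrowser()
else:
    # All three attempts failed: release the spare browser and report it
    b.close()
    raise RuntimeError(f"could not load {url} after 3 attempts")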

Documentation

Related projects

  • carbonyl — the Chromium fork that produces the runtime binary
  • carbonyl-fleet — server for managing N concurrent Carbonyl instances over PTY + Unix socket

Contributing

PRs and issues welcome at git.integrolabs.net/roctinam/carbonyl-agent or github.com/jmagly/carbonyl-agent.

  • Run the test suite: pytest
  • Type-check: mypy --strict src/
  • Lint: ruff check .

Community & Support


License

MIT License — see LICENSE.


Sponsors

The Temporal Layer for Web3

Enterprise-grade timing infrastructure for blockchain applications.

No-Code Smart Contracts for Everyone

Making blockchain-based agreements accessible to all.

AI-Powered Automation Solutions

Custom AI and blockchain solutions for the digital age.

Interested in sponsoring? Open a discussion on GitHub.


Acknowledgments

Built on top of Carbonyl by Fathy Boundjadj. The roctinam/carbonyl fork is actively maintained through the M147 Chromium line. PTY handling via pexpect; terminal parsing via pyte.


About

Python SDK for the Carbonyl terminal browser. Named sessions, daemon mode, bot-detection evasion (Firefox UA, organic mouse paths, HTTP/2 off), screen inspection. Selenium/Playwright alternative purpose-built for LLM agents and lightweight scraping.
