A Rust MCP server that gives an AI coding agent a closed build → see → interact → debug loop over external native GUI applications.
glass lets an agent launch a GUI app, capture what is on screen, inject mouse and keyboard input, read the app's logs, and detect visual changes — so a coding agent can build and debug UI applications independently instead of asking the user "does this look right?".
glass drives apps as an external black box, so it works with any native GUI app
regardless of toolkit or language. It currently has two Linux backends — X11 and
Wayland (wlroots) — and a Windows backend (Windows.Graphics.Capture,
SendInput, UI Automation), behind a platform-agnostic core; a macOS backend is
planned. See packaging/README-windows.md
for the Windows build and setup.
Where glass stands feature-by-feature across backends and OSes. ✓ supported · – not supported · 🚧 planned.
| Capability | X11 | Wayland | Windows | macOS |
|---|---|---|---|---|
| Screen capture — full + region crop | ✓ | ✓ | ✓ | 🚧 |
| Click / move | ✓ | ✓ | ✓ | 🚧 |
| Type text · key chord | ✓ | ✓ | ✓ | 🚧 |
| Scroll · drag | ✓ | ✓ | ✓ | 🚧 |
| Modifier-held click / drag / scroll | ✓ | ✓ | ✓ | 🚧 |
| Window discovery | ✓ | ✓ | ✓ | 🚧 |
| Multi-window (glass_list_windows / glass_select_window) | ✓ | ✓ | ✓ | 🚧 |
| Window move / resize / focus | ✓ | ✓ | ✓ | 🚧 |
| Log capture (stdout / stderr) | ✓ | ✓ | ✓ | 🚧 |
| Clipboard get / set | ✓ | ✓ | ✓ | 🚧 |
| | Linux | Windows | macOS |
|---|---|---|---|
| Provider | AT-SPI | UI Automation | AX 🚧 |
| Serves backends | X11 + Wayland | Windows | — |
| Tree snapshot · click-by-element | ✓ | ✓ | 🚧 |
| Set value (glass_set_value) | ✓ | ✓ | 🚧 |
| Value population (text / numeric) | ✓ | ✓ | 🚧 |
| Set-of-Mark overlay (glass_a11y_marks) | ✓ | ✓ | 🚧 |
Accessibility is per-OS (AT-SPI serves both Linux backends). It returns an error — never a fake tree — for apps with no accessible UI (bare canvas / game UIs), so the agent falls back to pixels.
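The error-instead-of-fake-tree behaviour makes the agent-side fallback a simple match. The following is a minimal sketch of that decision, not glass's own code: `Element`, `Targeting`, and `choose_targeting` are hypothetical names, and the real snapshot is compact text rather than a typed list.

```rust
/// Hypothetical, simplified element record (the real snapshot is compact text).
#[derive(Debug, Clone, PartialEq)]
struct Element {
    id: u32,
    role: String,
    name: String,
}

/// How the agent addresses the UI for its next action.
#[derive(Debug, PartialEq)]
enum Targeting {
    /// Click by element id (semantic addressing).
    Element(u32),
    /// No accessible UI, or no matching element: use screenshots and coordinates.
    Pixels,
}

/// Prefer an element whose name matches; fall back to pixels when the
/// snapshot call returned an error or the element is not in the tree.
fn choose_targeting(snapshot: Result<Vec<Element>, String>, wanted_name: &str) -> Targeting {
    match snapshot {
        Ok(elements) => elements
            .iter()
            .find(|e| e.name == wanted_name)
            .map(|e| Targeting::Element(e.id))
            .unwrap_or(Targeting::Pixels),
        Err(_) => Targeting::Pixels,
    }
}
```

Because the error is explicit, the agent never has to guess whether an empty tree means "no elements" or "no accessibility support".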
| | Linux | Windows | macOS |
|---|---|---|---|
| Engine | bubblewrap | Sandboxie Classic | — 🚧 |
| off / default / strict | ✓ | ✓ | accepts, not enforced 🚧 |
| Fail-closed when engine absent | ✓ | ✓ | n/a |
| Build step contained | ✓ | ✓ | 🚧 |
| | X11 | Wayland | Windows | macOS |
|---|---|---|---|---|
| Display isolation (app off your desktop) | ✓ private Xvfb | ✓ headless sway | – interactive desktop¹ | 🚧 |
| Headless (no host desktop needed) | ✓ | ✓ | – needs a session² | 🚧 |
| Clipboard isolation | ✓ | ✓ | ✓ private (contained)³ | 🚧 |
¹ A Windows VirtualDisplay / headless provider is a planned follow-on; stronger isolation today is the VM tier (the Windows Sandbox .wsb template under packaging/windows-sandbox/, or a managed VM running glass-mcp serve --http).

² Windows needs an interactive, logged-in session to render and capture.

³ When contained (sandbox=default/strict), the boxed app gets a private clipboard isolated from yours: an injected hook backs its clipboard with glass's own store. sandbox=off uses the real OS clipboard. The private clipboard carries text, HTML, RTF, and images for apps using either the Win32 or the OLE clipboard (so rich apps like Word, Excel, and Chrome work too; x64). Real-file copy via CF_HDROP works; virtual-file drag-out (shell extensions, zip attachments) is deferred.

Transport: MCP over stdio (default, all platforms) or network HTTP (glass-mcp serve --http, all platforms) — the network transport is behind the default-on network cargo feature
(a --no-default-features build is stdio-only).
- Rust, via rustup. glass pins a nightly toolchain in rust-toolchain.toml (needed for the portable-SIMD hot paths); rustup installs it automatically on the first build, so there's no toolchain to choose.
- A display dependency, for the backend you'll run:
  - Linux / X11 (default): the headless X server, sudo apt-get install -y xvfb (Debian/Ubuntu; Fedora xorg-x11-server-Xvfb, Arch xorg-server-xvfb). glass spawns its own private display, so this binary is the only thing to install.
  - Linux / Wayland: a discoverable sway ≥ 1.12 plus Mesa software GL; see Running on Wayland.
  - Windows: nothing extra; glass uses built-in Windows APIs.
- A containment runtime: launched apps are sandboxed by default, and the default level is fail-closed. With no sandbox available, glass_start errors rather than running the app unconfined. So either install the runtime, or set GLASS_SANDBOX=off on the server to launch apps unconfined:
  - Linux: bubblewrap, sudo apt-get install -y bubblewrap (Fedora/Arch: bubblewrap), with unprivileged user namespaces enabled. Ubuntu 23.10+ restricts them via AppArmor; allow with sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0 (persist via /etc/sysctl.d/).
  - Windows: Sandboxie Classic, installed with its service running.

  See Containment / sandboxing for the levels; glass-mcp doctor checks availability and prints the exact remedy for your system.

git clone https://github.com/fixed-width/glass
cd glass
cargo build --release -p glass-mcp # → target/release/glass-mcp

(Tagged releases also attach prebuilt binaries to the GitHub Releases page, with
per-platform setup notes under packaging/.)

./target/release/glass-mcp doctor # checks the environment, with a remedy for any gap

By default glass-mcp speaks MCP over stdio, so you register the binary with
your MCP client. (To attach from another machine, see
Over the network.)
Claude Code:
claude mcp add glass --scope user -- /absolute/path/to/target/release/glass-mcp

Claude Desktop / project .mcp.json:
{
"mcpServers": {
"glass": {
"command": "/absolute/path/to/target/release/glass-mcp"
}
}
}

No env is needed: on Linux, the default X11 backend spawns its own private headless
display (see Running on X11), and the agent picks
the backend per call via glass_start's backend argument (see
Backends). Add an env block only to change the defaults —
"env": { "GLASS_DISPLAY": ":42" } to attach to a display you manage, or
"env": { "GLASS_BACKEND": "wayland" } to make Wayland the default backend.
The agent then gets tools like glass_start, glass_screenshot, glass_click,
glass_drag, glass_scroll, glass_type, glass_key, glass_wait_stable,
glass_baseline_save, glass_diff, glass_logs, glass_list_windows,
glass_select_window, glass_a11y_snapshot, glass_click_element, glass_set_value,
glass_a11y_marks, glass_wait_for_element, glass_wait_for_region,
glass_wait_for_log, glass_do, glass_clipboard_get, glass_clipboard_set, and
glass_doctor.
stdio requires glass-mcp to run on the same machine as the agent. When the agent and the target app are on different machines, run glass-mcp as a network server on the app's machine (rmcp Streamable HTTP) and point your client at the URL:
mkdir -p ~/.glass
glass-mcp gen-token --out ~/.glass/token # cross-platform CSPRNG token
glass-mcp serve --http --addr 0.0.0.0:7300 --token-file ~/.glass/token

The client supplies the token as an Authorization: Bearer <token> header. Binding a
non-loopback address without a token is refused (fail-closed); a loopback bind needs
no token and pairs with an SSH tunnel for confidentiality
(ssh -L 7300:127.0.0.1:7300 user@appbox, then point the client at
http://127.0.0.1:7300/). The network transport is behind the default-on network
cargo feature (a --no-default-features build is stdio-only).
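The bind rule above (non-loopback requires a token, loopback does not) can be expressed as a small check. This is an illustrative sketch under stated assumptions, not glass's implementation: `check_bind` is a hypothetical name, and treating an empty or whitespace-only token as absent is an assumption made here.

```rust
use std::net::SocketAddr;

/// Allow a bind only if the address is loopback or a token is supplied.
/// Assumption: an empty or whitespace-only token counts as "no token".
fn check_bind(addr: SocketAddr, token: Option<&str>) -> Result<(), String> {
    let has_token = token.map_or(false, |t| !t.trim().is_empty());
    if addr.ip().is_loopback() || has_token {
        Ok(())
    } else {
        // Fail closed: never expose an unauthenticated server off-host.
        Err(format!(
            "refusing to bind {} without a token (fail-closed)",
            addr
        ))
    }
}
```

With this shape, `0.0.0.0:7300` without a token is rejected, while `127.0.0.1:7300` is accepted and can be reached through the SSH tunnel described above.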
glass-mcp doctor checks that the environment glass needs is in place (Xvfb for X11,
a discoverable sway ≥ 1.12 and Mesa software GL for Wayland) and prints how to fix
anything missing:
glass-mcp doctor # per-check ✓/⚠/✗ with remedies; exits non-zero if the
# default backend can't run (CI-friendly)
glass-mcp doctor --deep # additionally spawn + tear down the display to prove it starts
glass-mcp doctor --json # machine-readable output

The agent can run the same checks itself via the glass_doctor tool (e.g. to
self-diagnose a failed glass_start).
To see how glass is configured (as opposed to whether it can run), use env:
glass-mcp env # all GLASS_* vars: purpose, default, current value
glass-mcp env --json # machine-readable

It lists every GLASS_* variable (see External tool paths and the
backend/containment sections) with its default and current value; the network token
(GLASS_TOKEN) is shown only as set/(unset), never printed.
Run glass-mcp --help for the full command list, glass-mcp <command> --help for a
command's flags, and glass-mcp --version for the version. (With no command, glass-mcp
serves MCP over stdio — the default.)
A few capabilities worth knowing:
- Region capture. glass_screenshot and glass_wait_stable accept an optional window-relative region so the agent can grab just the area it cares about. Vision-model image cost scales with pixel area, so a tight region is a large, recurring token saving versus the whole window.
- Region-scoped settling. glass_wait_stable also takes a stability_region: it waits for that sub-rectangle to stop changing, ignoring unrelated motion elsewhere (a clock, a spinner) that would otherwise keep the window from ever settling.
- Wait-for-condition tools. Three text-only blocking waits collapse screenshot poll-loops into a single call: glass_wait_for_element blocks until a UI element reaches a precise state (e.g. a button becomes enabled) and returns the element's #id for immediate use with glass_click_element; glass_wait_for_region blocks until a watched region changes or converges to a saved baseline; glass_wait_for_log blocks until a matching log line appears. All return {matched, …} and time out softly with {matched:false}.
- Modifier-held clicks/drags/scrolls. glass_click, glass_drag, and glass_scroll accept an optional modifiers array (e.g. ["ctrl"], ["ctrl","shift"]) that holds Ctrl/Shift/Alt/Super during the action, enabling shift/ctrl-click multi-select, modified drags, and Ctrl+scroll.
- Batched input (glass_do). Run an ordered sequence of input actions (click/type/key/move/drag/scroll/settle) in one call with an optional text-first then observe (settle/diff/screenshot), collapsing per-action round-trips and failing fast at the offending action. Use for KNOWN sequences (login, form-fill, menu→item); if you need to see a result to choose the next action, don't batch that part.
- Clipboard get/set. glass_clipboard_get reads the clipboard as text ("" when empty); glass_clipboard_set writes text so the app can paste it. Both are isolated to the app's display on the private Xvfb/sway backends, and on Windows a sandboxed app gets a private clipboard too (an injected hook backs the boxed app's clipboard with glass's own store), so they never touch your real clipboard unless you set GLASS_DISPLAY=:0 or run the Windows backend with sandbox=off. glass_clipboard_get is also the cheap text-extraction path: issue ctrl+a then ctrl+c via glass_do, then read here. It is faster and token-free compared to OCR for any app with selectable text.
- Real window managers. On X11, window discovery uses _NET_WM_PID, a title/class hint, and _NET_CLIENT_LIST, so glass finds an app's window whether it runs bare on Xvfb or reparented under a desktop WM's decorations. On Wayland, glass enumerates the app's windows over the IPC of the headless sway compositor it spawns for the session.
- Multiple windows. glass_list_windows enumerates the app's top-level windows (id, title, class, geometry, which is active); glass_select_window makes one active, and subsequent capture/click/type/window ops target it with window-relative coordinates. All three backends enumerate every top-level window the app owns (X11 via EWMH, Wayland via sway IPC, Windows via the launched Job's windows).
- Accessibility tree (semantic addressing). Where the app exposes an accessibility tree (most GTK/Qt/toolkit apps, not bare canvas/Unity/game UIs), glass_a11y_snapshot returns its elements as compact text (role, name, and window-relative bounds, each with an #id) and glass_click_element clicks one by #id. That's deterministic, low-token element addressing that complements the pixel loop; it errors (never a fake tree) for apps with no accessible UI, so the agent falls back to screenshots. Available on Linux (AT-SPI via at-spi2-core, serving both X11 and Wayland) and Windows (UI Automation); ./scripts/test-a11y.sh exercises the Linux reader end-to-end. glass_a11y_marks returns the same elements as a numbered Set-of-Mark overlay drawn on the screenshot (plus a text legend) for agents that ground visually; click a mark with glass_click_element by its #id.

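To make the "soft timeout" contract of the wait-for-condition tools concrete, here is a generic polling loop with the same result shape. It is a sketch only: `wait_for` and `WaitOutcome` are hypothetical names, the `polls` field is added for illustration, and how glass actually implements its waits is not described in this README.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Outcome of a blocking wait; `matched` mirrors the `{matched, …}` result.
#[derive(Debug, PartialEq)]
struct WaitOutcome {
    matched: bool,
    /// Number of times the condition was evaluated (illustrative extra field).
    polls: u32,
}

/// Poll `condition` until it holds or `timeout` elapses.
/// A timeout is an ordinary result (`matched: false`), not an error.
fn wait_for(
    mut condition: impl FnMut() -> bool,
    timeout: Duration,
    interval: Duration,
) -> WaitOutcome {
    let deadline = Instant::now() + timeout;
    let mut polls = 0;
    loop {
        polls += 1;
        if condition() {
            return WaitOutcome { matched: true, polls };
        }
        if Instant::now() >= deadline {
            return WaitOutcome { matched: false, polls };
        }
        sleep(interval);
    }
}
```

Returning `matched: false` rather than an error lets the caller branch on the outcome without treating a slow UI as a failure.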
On Linux, launched apps run inside a bubblewrap sandbox by default (filesystem +
process containment, network on). Three levels are available via glass_start's sandbox
arg or the GLASS_SANDBOX environment variable:
- default: bubblewrap containment, network on (the default).
- strict: same as default plus --unshare-net (no outbound network from the app or build).
- off: no containment; app runs unconfined.

default and strict are fail-closed: if bwrap is not installed or unprivileged user namespaces
are disabled, glass_start returns an error rather than silently falling back to unconfined.
Install bubblewrap with sudo apt-get install -y bubblewrap on Debian/Ubuntu.
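The three levels and the fail-closed rule can be summarised as a small decision function. This is a sketch for illustration, assuming a boolean "engine available" probe; `SandboxLevel`, `LaunchPlan`, `parse_level`, and `plan_launch` are hypothetical names rather than glass's actual types.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SandboxLevel {
    Off,
    Default,
    Strict,
}

#[derive(Debug, PartialEq)]
enum LaunchPlan {
    /// No containment (only reachable via the explicit `off` level).
    Unconfined,
    /// Contained; `network` is false for `strict`.
    Contained { network: bool },
}

/// Parse the value of the sandbox argument or GLASS_SANDBOX.
fn parse_level(s: &str) -> Option<SandboxLevel> {
    match s {
        "off" => Some(SandboxLevel::Off),
        "default" => Some(SandboxLevel::Default),
        "strict" => Some(SandboxLevel::Strict),
        _ => None,
    }
}

/// Fail closed: a contained level with no engine is an error,
/// never a silent fallback to unconfined.
fn plan_launch(level: SandboxLevel, engine_available: bool) -> Result<LaunchPlan, String> {
    match (level, engine_available) {
        (SandboxLevel::Off, _) => Ok(LaunchPlan::Unconfined),
        (_, false) => Err("sandbox engine unavailable; refusing to run unconfined".to_string()),
        (SandboxLevel::Default, true) => Ok(LaunchPlan::Contained { network: true }),
        (SandboxLevel::Strict, true) => Ok(LaunchPlan::Contained { network: false }),
    }
}
```

The key property is that `Unconfined` appears in exactly one arm, so it can only result from an explicit `off`.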
On Windows, default/strict give real in-OS containment via
Sandboxie Classic (filesystem/registry virtualization; the boxed app still renders, is
WGC-captured, and is SendInput-driven on the interactive desktop). default = contained,
network on; strict = contained, no network egress; off = launched unconfined. The engine
is Sandboxie Classic (cleanly GPLv3 — Plus needs a commercial "Business Certificate"); you
install it yourself (sandboxie-plus.com/downloads), and
glass only invokes Start.exe/SbieIni.exe as subprocesses (no linking) — the same model as
Linux bubblewrap. It is configurable, not hardcoded: GLASS_WIN_SANDBOX_PROVIDER=auto|sandboxie|none
(default auto) and GLASS_SANDBOXIE_DIR (default %ProgramFiles%\Sandboxie, auto-detected).
Like Linux, default/strict are fail-closed: if no in-OS provider is available (Sandboxie
absent / its service not running, or provider=none), glass_start errors rather than running
unconfined — off is the explicit escape hatch. The build step also runs contained. Native
AppContainer / Low-integrity were evaluated on-box and rejected (the integrity-drop makes
ordinary Win32 apps fail to render; they need per-app tuning, whereas Sandboxie virtualizes
transparently). For stronger isolation still, use the VM tier: the
checked-in Windows Sandbox template under packaging/windows-sandbox/, or a managed VM running
glass-mcp serve --http. glass_doctor reports this posture (its Windows sandbox section).

glass-mcp doctor # checks sandbox availability alongside display/compositor deps

Pass --audit-log <path> (or set GLASS_AUDIT_LOG=<path>) to append a JSONL record of
every actuation glass performs — launch/stop, type, key, click, drag, scroll, set_value,
clipboard writes, element clicks, window focus/resize/move, and each glass_do
sub-action. Reads (screenshots, diffs, accessibility snapshots, log/clipboard reads) are
not logged. The hook lives in the core actuation path, so no actuation can bypass it. One
JSON object per line: seq, ts, action, target, args, result, and for
content-bearing actions a content descriptor.
Typed/clipboard/launch content is redacted by default to a length + SHA-256 + short
prefix, so the log is not a secret sink. GLASS_AUDIT_CONTENT=full stores verbatim text,
none stores no content, and GLASS_AUDIT_PREFIX_LEN=<n> sizes the prefix (0 disables
it). glass-mcp doctor reports whether auditing is on, the path, and the content mode.
Two things are recorded in plaintext regardless of GLASS_AUDIT_CONTENT: the short
content prefix (default 8 chars — set GLASS_AUDIT_PREFIX_LEN=0 to drop it), and
target metadata (the active window's title and an element's role/name) which is
attribution, not actuation content. A window title or field label can itself be sensitive,
so treat the log as confidential. Launch records intentionally omit env and cwd.
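The three content modes can be pictured as follows. This is a hedged sketch, not the actual log schema: the type and field names are hypothetical, the length is assumed to be counted in characters, and the SHA-256 function is passed in as a parameter because Rust's standard library has no SHA-256 (real code would use a hashing crate).

```rust
/// The content modes selected by GLASS_AUDIT_CONTENT (default: redacted).
enum ContentMode {
    Redacted,
    Full,
    Omit, // corresponds to the value `none`
}

/// What is stored for a content-bearing action. Field names are illustrative.
#[derive(Debug, PartialEq)]
enum ContentRecord {
    Redacted { len: usize, sha256: String, prefix: String },
    Full(String),
    Omitted,
}

/// Build the content descriptor. `sha256_hex` is supplied by the caller;
/// `prefix_len` plays the role of GLASS_AUDIT_PREFIX_LEN (0 drops the prefix).
fn describe_content(
    text: &str,
    mode: ContentMode,
    prefix_len: usize,
    sha256_hex: impl Fn(&[u8]) -> String,
) -> ContentRecord {
    match mode {
        ContentMode::Full => ContentRecord::Full(text.to_string()),
        ContentMode::Omit => ContentRecord::Omitted,
        ContentMode::Redacted => ContentRecord::Redacted {
            len: text.chars().count(), // assumption: length in characters
            sha256: sha256_hex(text.as_bytes()),
            prefix: text.chars().take(prefix_len).collect(),
        },
    }
}
```

Note that even in the redacted mode the prefix is stored in plaintext, which is why a prefix length of 0 is the safer setting when typed content may be a secret.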
glass shells out to a few third-party programs. Each resolves from a GLASS_*
environment variable when set, otherwise a sensible default (a bare name found on
PATH). Point a variable at a full path to use a binary in a non-standard location.
| Tool | Env var | Default | Used by |
|---|---|---|---|
| bubblewrap | GLASS_BWRAP | bwrap (on PATH) | Linux app + build containment |
| Xvfb | GLASS_XVFB | Xvfb (on PATH) | X11 private headless display |
| sway | GLASS_SWAY | auto-discovered¹ | Wayland headless compositor |
| build shell | GLASS_SH | sh (on PATH) | running spec.build |
| Sandboxie dir | GLASS_SANDBOXIE_DIR | %ProgramFiles%\Sandboxie | Windows containment |

¹ With GLASS_SWAY unset, sway is discovered automatically: a recent-enough one on PATH, then
~/.local/share/glass/sway/bin/sway, then next to the glass-mcp binary. GLASS_SWAY
forces a specific binary and skips that search (and fails closed if the path is wrong).
glass_doctor reports the resolved paths.
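The sway lookup order in the footnote can be sketched as a pure function. Several details are assumptions made for illustration: `find_sway` is a hypothetical name, the PATH search and version check are taken as already done (`on_path`), the file-existence probe is injected so the logic is testable, and the exact layout "next to the binary" is assumed to be `sway/bin/sway`.

```rust
use std::path::{Path, PathBuf};

/// Resolve the sway binary in the documented order.
/// `forced`  : value of GLASS_SWAY, if set.
/// `on_path` : a recent-enough sway already located on PATH, if any.
/// `exe_dir` : directory containing the glass-mcp binary.
fn find_sway(
    forced: Option<&Path>,
    on_path: Option<&Path>,
    home: &Path,
    exe_dir: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Result<PathBuf, String> {
    if let Some(p) = forced {
        // GLASS_SWAY skips the search and fails closed on a wrong path.
        return if exists(p) {
            Ok(p.to_path_buf())
        } else {
            Err(format!("GLASS_SWAY points at a missing binary: {}", p.display()))
        };
    }
    if let Some(p) = on_path {
        return Ok(p.to_path_buf());
    }
    let candidates = vec![
        home.join(".local/share/glass/sway/bin/sway"),
        exe_dir.join("sway/bin/sway"), // assumed layout beside the binary
    ];
    candidates
        .into_iter()
        .find(|p| exists(p))
        .ok_or_else(|| "no usable sway found".to_string())
}
```

A forced path that does not exist is an error even when another sway is available, which matches the fail-closed behaviour described for GLASS_SWAY.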
The backend is chosen per glass_start — the tool takes an optional
backend ("x11" or "wayland" on Linux, "windows" on a Windows host), so the
agent can pick per launch with no server restart. When omitted it falls back to the
GLASS_BACKEND environment variable, then to the host default (windows on a
Windows host, otherwise x11). The backend is constructed lazily, when glass_start is called (so the
server boots even with no display/compositor), and the MCP tools behave identically
across backends; only the setup differs:
- X11 (Linux default): spawns its own private headless Xvfb (nothing to set up), or attaches to a display you name with GLASS_DISPLAY. See Running on X11.
- Wayland (wlroots): spawns a private headless sway compositor per session, so there's no ambient display to set up. See Running on Wayland.
- Windows (default on a Windows host): drives the app on the interactive desktop (WGC capture, SendInput, UI Automation). See packaging/README-windows.md.

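The selection order (per-call argument, then GLASS_BACKEND, then the host default) is a short fallback chain. The sketch below illustrates it; `resolve_backend` is a hypothetical name, and rejecting a backend that does not match the host is an assumption about behaviour this README does not spell out.

```rust
/// Pick the backend for one launch.
/// `arg` : the optional `backend` argument to glass_start.
/// `env` : the value of GLASS_BACKEND, if set.
fn resolve_backend(
    arg: Option<&str>,
    env: Option<&str>,
    host_is_windows: bool,
) -> Result<&'static str, String> {
    let host_default = if host_is_windows { "windows" } else { "x11" };
    // Argument wins over the environment, which wins over the host default.
    let requested = arg.or(env).unwrap_or(host_default);
    match (requested, host_is_windows) {
        ("windows", true) => Ok("windows"),
        ("x11", false) => Ok("x11"),
        ("wayland", false) => Ok("wayland"),
        // Assumption: a backend that cannot run on this host is an error.
        (other, _) => Err(format!("backend {:?} is not available on this host", other)),
    }
}
```

For example, on a Linux host with GLASS_BACKEND=wayland and no argument, the call resolves to "wayland"; passing "x11" as the argument overrides it for that launch only.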
The X11 backend chooses its display from GLASS_DISPLAY — it never reads
ambient $DISPLAY, so the environment you launch from can't accidentally aim
glass at your live desktop:
- GLASS_DISPLAY unset (default): glass spawns its own private headless Xvfb on a free display, logs the chosen number to stderr (glass: spawned a private headless X11 display :N), and tears it down on exit. Zero setup, fully isolated. Requires Xvfb installed (sudo apt-get install -y xvfb); override the size with GLASS_XVFB_SCREEN (default 1280x800x24).
- GLASS_DISPLAY=:42 (or bare 42): attach to a display you manage, e.g. a persistent sandbox you want to keep watching over VNC (see below).
- GLASS_DISPLAY=:0: deliberately drive your real desktop. The agent moves your actual cursor and pops real windows; useful for driving live apps, but it competes with you for input. This only happens when you ask for it explicitly.

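The three cases above reduce to parsing one optional value. The following is a minimal sketch of that parsing, assuming only the `:N` and bare `N` forms are accepted; `DisplayChoice` and `parse_glass_display` are hypothetical names, and how glass treats other forms (such as an empty value or a screen suffix) is not covered here.

```rust
#[derive(Debug, PartialEq)]
enum DisplayChoice {
    /// GLASS_DISPLAY unset: spawn a private headless Xvfb.
    SpawnPrivate,
    /// Attach to an existing display number (0 is the real desktop).
    Attach(u32),
}

/// Interpret GLASS_DISPLAY. Ambient $DISPLAY is deliberately not consulted.
fn parse_glass_display(value: Option<&str>) -> Result<DisplayChoice, String> {
    let raw = match value {
        None => return Ok(DisplayChoice::SpawnPrivate),
        Some(v) => v.trim(),
    };
    // Accept ":42" as well as a bare "42".
    let digits = raw.strip_prefix(':').unwrap_or(raw);
    digits
        .parse::<u32>()
        .map(DisplayChoice::Attach)
        .map_err(|_| format!("invalid GLASS_DISPLAY value: {:?}", raw))
}
```

Because the function takes the variable's value as its only input, there is no code path by which the caller's own `$DISPLAY` could select the live desktop.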
To watch the default headless display live, point a VNC viewer at the logged
number: x11vnc -display :N + any VNC viewer (or Xephyr for a window).
If you'd rather run your own display — to keep a VNC view pinned across server
restarts, say — start one and set GLASS_DISPLAY to it. A helper manages a
sandbox Xvfb (defaults to :42; override the number with GLASS_DISPLAY, the
size with GLASS_XVFB_SCREEN):
./scripts/sandbox-xvfb.sh start # also: status | stop | restart

Then register glass with "env": { "GLASS_DISPLAY": ":42" }. Watch it with
x11vnc -display :42 + any VNC viewer, or run a windowed Xephyr :42.
Run the Xvfb at login via a systemd user service:
# ~/.config/systemd/user/glass-xvfb.service
[Unit]
Description=glass sandbox Xvfb display :42
[Service]
ExecStart=/usr/bin/Xvfb :42 -screen 0 1280x800x24
Restart=on-failure
[Install]
WantedBy=default.target

systemctl --user daemon-reload
systemctl --user enable --now glass-xvfb.service
loginctl enable-linger "$USER" # optional: keep it up without an active login

(Adjust the Xvfb path to the output of command -v Xvfb.) Or, for desktop-only autostart,
drop an equivalent Exec=Xvfb :42 -screen 0 1280x800x24 into a
~/.config/autostart/glass-xvfb.desktop entry.
Requires Xvfb installed (sudo apt-get install -y xvfb on Debian/Ubuntu).
Select the Wayland backend per launch with glass_start's backend: "wayland", or make it the
default for every launch with GLASS_BACKEND=wayland (e.g.
"env": { "GLASS_BACKEND": "wayland" } in the MCP config). Unlike X11, this
backend doesn't attach to an ambient display — for each session it spawns a
private headless sway instance (sway is the
third-party wlroots-based Wayland compositor) and runs the target app inside it. The app's windows float at their natural size;
glass_list_windows/glass_select_window enumerate and switch between them over
sway IPC. Capture goes through wlr-screencopy of the active window's output
region, and input through the wlr-virtual-pointer and zwp_virtual_keyboard
protocols.
glass needs a sway ≥ 1.12 / wlroots ≥ 0.20 it can discover (no env var): on
PATH (once your distro ships one that new), or installed to
~/.local/share/glass/sway/ by the sway-build tool, or in a
sway/ dir beside the glass-mcp binary. It also needs the host's Mesa software GL so GPU-less hosts can
render:
sudo apt-get install -y libegl1 libgl1-mesa-dri # Debian/Ubuntu

Because sway is headless and per-session, there's nothing to set up or keep
running — no persistent display, no $DISPLAY/$WAYLAND_DISPLAY. sway also
launches an Xwayland server, so X11-only apps run under this backend too.
Because the target app runs inside the headless sway that glass spawns (not the
host's compositor), this backend works on any Linux host — including GNOME and
KDE desktops, where the host desktop is simply irrelevant. Driving the user's
existing live desktop session — the Wayland analog of X11 GLASS_DISPLAY=:0
— is a separate, deliberate non-goal: it requires the XDG-portal path with an
interactive consent dialog, unsuited to unattended use.
Per-frame hot-path micro-benchmarks (criterion) live in crates/*/benches/:
cargo bench -p glass-core -p glass-x11 # run all (diff, webp encode/decode, xdata_to_rgba)
cargo bench -p glass-core -p glass-x11 -- --save-baseline main # save a baseline, then compare after a change:
cargo bench -p glass-core -p glass-x11 -- --baseline main

(Only glass-core and glass-x11 carry benchmarks; their libs set bench = false
so cargo bench runs the criterion targets rather than the unit-test harness,
which would reject criterion's --save-baseline/--baseline flags.)
Profile a hot path as a flamegraph (needs cargo install flamegraph and
kernel.perf_event_paranoid <= 1):
./scripts/bench.sh diff "identical/1920x1080" # writes flamegraph.svg

The Linux feature set is implemented and tested across both Linux backends (X11 and Wayland/wlroots), and the Windows backend (WGC capture, SendInput, UI Automation) is built and CI-tested; macOS is the one OS backend not yet built.
glass is open core, licensed Apache-2.0 — see LICENSE-APACHE.