feat: add page.cua (vision) and page.domCua (DOM-id) toolsets #114
QuickJS Error.stack contains only frame lines (no "Name: message" header, unlike V8), so formatError's stack-first formatting dropped the thrown message entirely from stderr. Compose the header in formatError (extracted to format-error.ts), after #toError has applied its prefix, and skip it when the stack already carries one. Also add Buffer.isBuffer to the QuickJS Buffer polyfill. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
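A minimal sketch of the header composition described in this commit. The function name `composeErrorText` and the `ErrorLike` shape are hypothetical, chosen for illustration; the actual `format-error.ts` may be structured differently.

```typescript
// Hypothetical sketch; not the real format-error.ts.
interface ErrorLike {
  name?: string;
  message?: string;
  stack?: string;
}

function composeErrorText(err: ErrorLike): string {
  const name = err.name || "Error";
  const header = err.message ? `${name}: ${err.message}` : name;
  const stack = err.stack ?? "";
  if (stack === "") return header;
  // V8-style stacks already begin with "Name: message"; keep them unchanged.
  if (stack.startsWith(header)) return stack;
  // QuickJS-style stacks hold only frame lines, so prepend the header.
  return `${header}\n${stack}`;
}
```

Checking for an existing header before prepending keeps V8 output unchanged while restoring the message for frame-only stacks.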
- start the public id counter at a random high base whenever no inherited counter exists, so cross-origin navigations (empty sessionStorage) never reuse pre-navigation node ids
- key the sticky id map by a per-document token minted in the walker, so a navigated child frame gets fresh ids instead of recycling old ones
- track successfully-pressed modifier keys and release them in a finally covering the down loop, so an invalid key never leaves modifiers held on the persistent page
- pin clip screenshot semantics as viewport-relative by scrolling first

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
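The id scheme in the first two points could be sketched roughly as follows. The class name `StickyIds`, the base range, and the use of string element keys are assumptions for illustration; the real walker runs in-page and tracks DOM nodes rather than strings.

```typescript
// Hypothetical sketch of the id scheme; names and the base range are assumptions.
class StickyIds {
  private next: number;
  private readonly byKey = new Map<string, number>();

  constructor(inheritedCounter?: number, random: () => number = Math.random) {
    // With no inherited counter, start at a random high base so ids handed out
    // before a navigation are unlikely to be issued again afterwards.
    this.next = inheritedCounter ?? 1e6 + Math.floor(random() * 1e9);
  }

  idFor(documentToken: string, elementKey: string): number {
    // The per-document token is part of the key, so the same element key in a
    // freshly navigated document receives a new id.
    const key = `${documentToken}:${elementKey}`;
    let id = this.byKey.get(key);
    if (id === undefined) {
      id = this.next++;
      this.byKey.set(key, id);
    }
    return id;
  }
}
```

Repeated snapshots of the same document return the same id for the same element, while a new document token never maps onto an old id.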
Agents regex node ids out of the snapshot text, so they arrive as strings; coerce digit-only strings instead of erroring. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
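One way the coercion might look, as a sketch: `coerceNodeId` is a hypothetical name, and the exact error wording in the PR is not shown here.

```typescript
// Hypothetical helper; accepts a number or a digit-only string.
function coerceNodeId(value: number | string): number {
  if (typeof value === "number") {
    if (Number.isSafeInteger(value) && value >= 0) return value;
    throw new TypeError(`Invalid node id: ${value}`);
  }
  // Only plain digit strings are coerced; "4a", " 42" and "" are rejected.
  if (/^\d+$/.test(value)) {
    const id = Number(value);
    if (Number.isSafeInteger(id)) return id;
  }
  throw new TypeError(`Invalid node id: ${JSON.stringify(value)}`);
}
```

Restricting coercion to digit-only strings avoids surprises from `Number("")` (which is 0) or from strings with stray whitespace.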
Playwright ignores scale:'css' on viewport:null pages (headed and connected Chrome), returning device-pixel images that break the 1:1 cua coordinate contract on Retina displays. Detect the mismatch by parsing the JPEG dimensions and rescale in-page via OffscreenCanvas. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
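Detecting the mismatch requires reading the image size from the JPEG bytes. A minimal sketch of that step, assuming a hypothetical helper named `jpegSize`; the PR's own parser is not shown and may differ. The size lives in the first start-of-frame (SOF) segment.

```typescript
// Hypothetical sketch: read width/height from the first SOF segment of a JPEG.
function jpegSize(bytes: Uint8Array): { width: number; height: number } | null {
  // Every JPEG starts with the SOI marker FF D8.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return null;
  let i = 2;
  while (i + 3 < bytes.length) {
    if (bytes[i] !== 0xff) return null; // not at a marker boundary
    const marker = bytes[i + 1];
    if (marker === 0xff) { i += 1; continue; } // fill byte before a marker
    if (marker === 0x01 || (marker >= 0xd0 && marker <= 0xd7)) { i += 2; continue; } // no payload
    const length = (bytes[i + 2] << 8) | bytes[i + 3]; // big-endian, includes its own 2 bytes
    // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
    const isSof = marker >= 0xc0 && marker <= 0xcf &&
      marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc;
    if (isSof) {
      if (i + 8 >= bytes.length) return null;
      // Layout after the marker: length(2) precision(1) height(2) width(2).
      const height = (bytes[i + 5] << 8) | bytes[i + 6];
      const width = (bytes[i + 7] << 8) | bytes[i + 8];
      return { width, height };
    }
    if (length < 2) return null;
    i += 2 + length;
  }
  return null;
}
```

If the parsed width differs from the viewport's CSS width, the screenshot was taken in device pixels and needs rescaling before its coordinates can be used 1:1.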
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d0d1bc30b
function frameKey(frame: Frame): string {
  const name = frame.name();
  if (name) return name;
Avoid frame-name collisions when resolving DOM IDs
When a page contains multiple iframes with the same non-empty name, this key collapses all of them to the same value, so #resolveNodeCenter() later picks the first matching frame and can click/type against the wrong iframe (or report the node stale) for IDs from the later frames. Include the index path or another per-frame discriminator even for named frames so snapshot IDs remain tied to the frame they came from.
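A sketch of what the suggested discriminator could look like, assuming the key may combine an index path with the frame name. `FrameLike` is a hypothetical stand-in that mirrors the `name()`, `parentFrame()` and `childFrames()` methods of Playwright's `Frame`; the key format is illustrative, not the PR's eventual fix.

```typescript
// Hypothetical stand-in for the subset of Playwright's Frame used here.
interface FrameLike {
  name(): string;
  parentFrame(): FrameLike | null;
  childFrames(): FrameLike[];
}

function frameKey(frame: FrameLike): string {
  // Walk up to the main frame, recording each frame's index among its siblings.
  const path: number[] = [];
  let current = frame;
  let parent = current.parentFrame();
  while (parent !== null) {
    path.unshift(parent.childFrames().indexOf(current));
    current = parent;
    parent = current.parentFrame();
  }
  const indexPath = path.length > 0 ? path.join(".") : "main";
  // Keep the name for readability, but never rely on it alone for uniqueness.
  const name = frame.name();
  return name ? `${indexPath}:${name}` : indexPath;
}
```

Two sibling iframes that share a name then get distinct keys such as `0:ad` and `1:ad`. An index path can shift if frames are inserted or removed between snapshot and action, so this trades name collisions for sensitivity to frame order.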
Ports the interaction tiers from OpenAI Codex's Chrome plugin (tab.cua / tab.dom_cua) into dev-browser as two new namespaces on every sandbox page, so agents can act by screenshot coordinates or by snapshot node ids when locators aren't enough.

page.cua: pixel/vision tier
- click / doubleClick / drag / move / scroll / keypress / type / screenshot, all taking a camelCase options object.
- screenshot() saves a JPEG whose pixels map 1:1 onto cua coordinates at any DPR and returns {path, width, height}. Playwright silently ignores scale:'css' on viewport:null pages (headed and connected Chrome, 2x on Retina), so the helper detects the mismatch by parsing the JPEG dimensions and downscales in-page via OffscreenCanvas.
- click waits ~1s for a click-triggered main-frame navigation (then up to 10s for the load); waitForNavigation: false opts out.
- Key aliases are mapped (ctrl→ControlOrMeta, ctrl+y→ControlOrMeta+Shift+z, …); modifiers are tracked and released in reverse in finally, so a failed chord can't leave keys stuck on the persistent page.
- scroll uses mouse.move + mouse.wheel; buttons are left|middle|right, with a clear error otherwise.

page.domCua: DOM-id tier
- getVisibleDom() runs a self-contained in-page walker (serialized via String(fn)) per frame: interactable and visible-in-viewport predicates, shadow DOM, pseudo-HTML output lines (<button node_id=42>Submit</button>), and 200-line / 20k-char / 50-per-frame budgets with explicit truncation markers.
- click / doubleClick / scroll / type / keypress act by nodeId (number or numeric string).
- Ids are sticky across snapshots, survive across CLI invocations on named pages, and are minted from a random high base keyed by a per-document token. A stale id always fails fast with a "DOM node N is stale or missing" error that asks the caller to re-run getVisibleDom(), instead of silently clicking whatever now owns that number after a navigation.

Prerequisite fix
QuickJS Error.stack has no Name: message header, and formatError preferred the stack, so thrown script error messages were dropped entirely (throw new Error("boom") printed only frame lines). formatError now composes the header (extracted to format-error.ts for testability); reproduced with a failing test first.

Tests & docs
- Tests (cua.test.ts, dom-cua.test.ts, format-error.test.ts): exact-coordinate assertions, clip semantics verified while scrolled, iframe act-by-id, id-reuse-after-navigation regressions (same-origin, cross-origin, child-frame), isolated-realm walker serialization check, cross-invocation snapshot→act, and the Retina downscale path.
- Full suite green; tsc, prettier, both bundles, and cargo build clean.
- cli/llm-guide.txt gains Vision and DOM-id workflow sections, the tier-preference ladder, and method-table rows; README + CHANGELOG updated.

Verified live
Tested end-to-end against a real running Chrome over --connect: domCua snapshot of google.com, click search box by node id, cua.type + Enter, results screenshot. The Retina coordinate-contract bug was found by exactly this test and fixed in the last commit.

Remaining manual checks before release: headed-mode Retina pass and a true cross-origin (OOPIF) iframe click.
🤖 Generated with Claude Code