ax-grep reproduces a browser-like accessibility tree before opening a browser.
It parses HTML into a DOM-like tree, computes role, name, state, focusability,
and interactivity, then reshapes that structure into agent-ready output:
agent, pageCheck, verification, and handoff sections.
The public library entry point, extract(), delegates to the static HTML
extractor, extractStaticSemanticTree(). That function parses HTML with
htmlparser2, resolves extraction options, and first indexes id attributes,
aria-* ID references, label for associations, and collapsed-control
relationships (indexDocument()).
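To make the indexing step concrete, here is a minimal TypeScript sketch of a single pre-pass that fills lookup tables before any SemanticNode is built. It is not ax-grep's indexDocument(): the El and DocIndex types, buildIndex(), and the set of aria-* attributes indexed are assumptions, and a plain object tree stands in for htmlparser2 output.

```typescript
// Hypothetical element shape for illustration; real htmlparser2 nodes differ.
interface El {
  tag: string;
  attrs: Record<string, string>;
  children: El[];
}

interface DocIndex {
  byId: Map<string, El>;
  labelsFor: Map<string, El[]>;    // control id -> <label for="..."> elements
  referencedBy: Map<string, El[]>; // id -> elements referencing it via aria-* ID lists
  collapsedIds: Set<string>;       // ids controlled by an aria-expanded="false" element
}

// Assumed subset of attributes that hold ID references.
const ID_REF_ATTRS = ["aria-labelledby", "aria-describedby", "aria-controls", "aria-owns"];

// aria-* ID references are whitespace-separated lists.
const idList = (value: string | undefined): string[] =>
  (value ?? "").split(/\s+/).filter((id) => id !== "");

function pushTo<K, V>(map: Map<K, V[]>, key: K, value: V): void {
  const list = map.get(key);
  if (list) list.push(value);
  else map.set(key, [value]);
}

function buildIndex(root: El): DocIndex {
  const index: DocIndex = {
    byId: new Map(),
    labelsFor: new Map(),
    referencedBy: new Map(),
    collapsedIds: new Set(),
  };
  // Iterative depth-first traversal over every element.
  const stack: El[] = [root];
  while (stack.length > 0) {
    const el = stack.pop()!;
    const { attrs } = el;
    if (attrs.id) index.byId.set(attrs.id, el);
    if (el.tag === "label" && attrs.for) pushTo(index.labelsFor, attrs.for, el);
    for (const attr of ID_REF_ATTRS) {
      for (const id of idList(attrs[attr])) pushTo(index.referencedBy, id, el);
    }
    // A collapsed control marks the regions it controls as collapsed.
    if (attrs["aria-expanded"] === "false") {
      for (const id of idList(attrs["aria-controls"])) index.collapsedIds.add(id);
    }
    stack.push(...el.children);
  }
  return index;
}
```

Doing this pass first turns later lookups (the label for an input, whether a menu is collapsed) into map reads instead of repeated tree searches.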
Then walkElement() visits each element and builds a SemanticNode that
approximates the structure a browser would expose through its accessibility
tree. getRole() maps explicit role values and HTML tags to accessibility roles,
while computeName() derives accessible names from aria-labelledby, aria-label,
<label>, alt, and text content. getState() records states such as checked,
selected, expanded, disabled, and aria-current. isFocusable() and
isInteractive() mark the controls an agent can click or type into.
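The role and name precedence above can be sketched as two small functions. This is a simplified illustration, not the real getRole() or computeName(): roleOf(), accessibleName(), the El shape, and the tiny implicit-role table are hypothetical, and the full accessible-name algorithm has additional rules (recursion limits, hidden content, labels that wrap their control).

```typescript
// Hypothetical element shape; "#text" nodes carry text content.
interface El {
  tag: string;
  attrs: Record<string, string>;
  children: El[];
  text?: string;
}

// A small subset of implicit HTML roles. Real mappings are conditional
// (e.g. <a> is a link only with href; <input> depends on its type).
const IMPLICIT_ROLES: Record<string, string> = {
  a: "link", button: "button", h1: "heading", h2: "heading", img: "img",
  li: "listitem", main: "main", nav: "navigation", select: "combobox",
  textarea: "textbox", ul: "list",
};

function roleOf(el: El): string | undefined {
  // An explicit role wins; the first token of a space-separated list is used.
  const explicit = el.attrs.role?.trim().split(/\s+/)[0];
  if (explicit) return explicit;
  if (el.tag === "input") {
    const type = (el.attrs.type ?? "text").toLowerCase();
    if (type === "checkbox" || type === "radio") return type;
    if (type === "submit" || type === "button") return "button";
    return "textbox";
  }
  return IMPLICIT_ROLES[el.tag];
}

const clean = (s: string): string => s.replace(/\s+/g, " ").trim();

function textContent(el: El): string {
  if (el.tag === "#text") return el.text ?? "";
  return el.children.map(textContent).join(" ");
}

// Name precedence: aria-labelledby, aria-label, <label for>, alt, then text content.
function accessibleName(el: El, byId: Map<string, El>, labelsFor: Map<string, El[]>): string {
  const labelledBy = el.attrs["aria-labelledby"];
  if (labelledBy) {
    const refs = labelledBy.split(/\s+/).map((id) => byId.get(id));
    const name = clean(refs.map((r) => (r ? textContent(r) : "")).join(" "));
    if (name) return name;
  }
  const ariaLabel = clean(el.attrs["aria-label"] ?? "");
  if (ariaLabel) return ariaLabel;
  const id = el.attrs.id;
  const labels = id ? (labelsFor.get(id) ?? []) : [];
  const labelText = clean(labels.map(textContent).join(" "));
  if (labelText) return labelText;
  if (el.tag === "img") return clean(el.attrs.alt ?? "");
  return clean(textContent(el));
}
```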
Static HTML often contains layout wrappers, closed menus, repeated cards, ads,
and footer/sidebar boilerplate, and walkElement() does not pass all of it
through unchanged. shouldSkipChildrenForCollapsedElement() reduces collapsed
subtrees, isLikelyClosedOverlay() removes overlays that are probably closed,
and shouldPrune() prunes or flattens generic wrappers.
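The three reductions fit into one recursive pass that returns zero, one, or several nodes for each input node. The rules here are assumed stand-ins for isLikelyClosedOverlay(), shouldSkipChildrenForCollapsedElement(), and shouldPrune(), and SNode and simplify() are hypothetical names.

```typescript
// Hypothetical node shape for illustration.
interface SNode {
  role: string;       // "generic" for unnamed <div>/<span> wrappers
  name: string;
  expanded?: boolean; // from aria-expanded
  hidden?: boolean;   // e.g. hidden attribute or aria-hidden="true"
  children: SNode[];
}

// Returns what should replace `node` in its parent's child list:
// [] drops it, a single copy keeps it, and a child list lifts those children up.
function simplify(node: SNode): SNode[] {
  // Assumed overlay heuristic: a hidden dialog or menu is treated as closed.
  if (node.hidden && (node.role === "dialog" || node.role === "menu")) return [];
  // Assumed collapse rule: keep the collapsed control itself but not its subtree.
  const children = node.expanded === false ? [] : node.children.flatMap(simplify);
  // Assumed pruning rule: unnamed generic wrappers are flattened into their parent.
  if (node.role === "generic" && node.name === "") return children;
  return [{ ...node, children }];
}
```

Returning an array lets flattening and dropping share one code path: a wrapper simply hands its simplified children to its parent.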
The goal is not to carry over every byte of the original HTML; it is to preserve the structure an agent is likely to read, cite, click, or continue from.
For WebViews and already-open pages, extractSemanticTree() walks the live
document instead of static parser output. The browser DOM version of
walkElement() fills the same role, name, state, selector, xpath, and child
fields, and can also attach bounds, shadow DOM, and iframe information.
Live page changes are handled by observeSemanticTree(), which uses
MutationObserver to emit updated semantic snapshots. This makes the same
extraction model useful for mobile WebViews, browser extensions, and in-page
agents that need to turn the current page into an agent-readable structure
immediately.
The CLI does more than print a semantic tree. jsonEnvelope() extracts links,
outline entries, actions, content, and search results. summarizePageCheck()
turns those into content evidence, forms, action targets, hydration/API hints,
and barriers. summarizeAgent() then decides what the agent should do next:
whether the fetched HTML is enough, whether a search result should be opened,
whether browser-captured HTML is required, and whether there is enough evidence
for an answer. The result is exposed through fields such as agent.executor,
agent.handoff, agent.readTargets, pageCheck, and verification. In --agent-brief
mode, compactAgentBrief() and compactAgentBriefHandoff() compress that result
for subagent loops.
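The decision step can be pictured as an ordered set of checks over page evidence. The sketch below is hypothetical throughout: PageCheckLite, NextStep, decideNextStep(), the check order, and the 500-character threshold are illustrative and are not the actual summarizeAgent() logic or output fields.

```typescript
// Hypothetical, heavily reduced view of page evidence; field names are assumptions.
interface PageCheckLite {
  barrier: boolean;        // a challenge or bot-wall marker was detected
  contentChars: number;    // readable main-content characters in the static HTML
  hydrationHints: boolean; // e.g. an empty app root alongside API/JSON hints
  searchResults: number;   // result links on a search page (0 if not a search page)
}

type NextStep = "handoff-browser" | "open-result" | "capture-rendered-html" | "answer";

// Illustrative ordering: barriers first, then search pages, then content sufficiency.
function decideNextStep(page: PageCheckLite, minContentChars = 500): NextStep {
  if (page.barrier) return "handoff-browser";                // never treat a challenge page as readable
  if (page.searchResults > 0) return "open-result";          // a search page: open a result next
  if (page.contentChars >= minContentChars) return "answer"; // fetched HTML is enough
  // Thin static HTML that looks client-rendered needs browser-captured HTML;
  // otherwise there is too little evidence, so hand off to a browser.
  return page.hydrationHints ? "capture-rendered-html" : "handoff-browser";
}
```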
Some pages cannot be handled from static HTML alone. detectBarrierDiagnostics()
detects challenge markers from hCaptcha, reCAPTCHA, Cloudflare, Akamai,
DataDome, PerimeterX, and Kasada. When those signals appear, the CLI does not
pretend the page is readable. It returns a handoff telling the agent that
browser use or additional capture is required.
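Barrier detection amounts to scanning fetched HTML for vendor-specific markers. The patterns in this sketch are illustrative guesses, not the list detectBarrierDiagnostics() uses, and detectBarriers() is a hypothetical name. Substring patterns such as /akamai/i will also match pages that merely load Akamai-hosted assets, so a real detector needs tighter signals.

```typescript
// Illustrative marker patterns only; these are assumptions, not ax-grep's list.
const BARRIER_PATTERNS: Array<[vendor: string, pattern: RegExp]> = [
  ["hcaptcha", /hcaptcha\.com|class="[^"]*\bh-captcha\b/i],
  ["recaptcha", /google\.com\/recaptcha|\bg-recaptcha\b/i],
  ["cloudflare", /challenges\.cloudflare\.com|cf[-_]chl/i],
  ["akamai", /akamai/i], // deliberately naive; prone to false positives
  ["datadome", /captcha-delivery\.com|datadome/i],
  ["perimeterx", /px-captcha|perimeterx/i],
  ["kasada", /KPSDK/],
];

// Returns the vendors whose markers appear in the HTML. The regexes have no
// `g` flag, so test() is stateless and safe to reuse across calls.
function detectBarriers(html: string): string[] {
  return BARRIER_PATTERNS.filter(([, re]) => re.test(html)).map(([vendor]) => vendor);
}
```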
Search mode follows the same principle. With --search --engine auto,
resolveAutoSearch() tries DuckDuckGo, Bing, StartPage, and Google, skips
blocked or empty result pages, and keeps the best usable result set.
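The auto-engine fallback can be sketched as a loop that skips unusable engines and keeps the best result set. pickAutoSearch(), EngineOutcome, and the "most results wins" scoring are assumptions; the run callback is synchronous here for brevity, whereas a real implementation would fetch and parse each engine's page asynchronously.

```typescript
interface SearchResult { title: string; url: string; }
interface EngineOutcome { engine: string; blocked: boolean; results: SearchResult[]; }

// Hypothetical: `run` stands in for fetching and parsing one engine's result page.
function pickAutoSearch(
  engines: string[],
  run: (engine: string) => EngineOutcome,
): EngineOutcome | undefined {
  let best: EngineOutcome | undefined;
  for (const engine of engines) {
    const outcome = run(engine);
    // Skip blocked or empty result pages entirely.
    if (outcome.blocked || outcome.results.length === 0) continue;
    // Assumed scoring: the usable set with the most results wins.
    if (!best || outcome.results.length > best.results.length) best = outcome;
  }
  return best; // undefined when no engine produced a usable page
}
```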
The reproduction target is measured, not guessed. scripts/compare.ts runs
ax-grep and agent-browser snapshot against the same URL, then scores
named-role overlap and agent readiness. After each comparison,
closeAgentBrowserSession() closes the browser session, and
withAgentBrowserLock() keeps parallel browser comparisons from overloading the
host.
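One plausible way to score named-role overlap is to compare sets of (role, name) pairs. The metric here, the share of the browser snapshot's named pairs that ax-grep also produced, is an assumption for illustration and may differ from what scripts/compare.ts actually computes; namedRoleOverlap() is a hypothetical name.

```typescript
interface NamedNode { role: string; name: string; }

// Key a node by role plus a whitespace-collapsed, lower-cased name.
const pairKey = (n: NamedNode): string =>
  `${n.role}|${n.name.replace(/\s+/g, " ").trim().toLowerCase()}`;

// Unnamed nodes are ignored on both sides.
const namedPairs = (nodes: NamedNode[]): Set<string> =>
  new Set(nodes.filter((n) => n.name.trim() !== "").map(pairKey));

// Share of the browser's distinct named pairs that also appear in ax-grep output.
function namedRoleOverlap(axGrep: NamedNode[], browser: NamedNode[]): number {
  const ours = namedPairs(axGrep);
  const theirs = Array.from(namedPairs(browser));
  if (theirs.length === 0) return 1; // nothing named to match against
  const hits = theirs.filter((k) => ours.has(k)).length;
  return hits / theirs.length;
}
```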
Release smoke floors live in scripts/check-agent-browser-smoke.ts. Simple
pages require full overlap, while more complex targets use per-site overlap,
recall, and readiness thresholds between 0.75 and 0.90, sometimes higher. In
other words, ax-grep does not blindly copy every browser accessibility node; it
keeps comparing against browser snapshots while tuning the structure agents
actually need.
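A floor check of this kind reduces to comparing each metric against a per-site minimum. The site names, threshold values, and smokeFailures() in this sketch are hypothetical; the real floors are defined in scripts/check-agent-browser-smoke.ts.

```typescript
interface Scores { overlap: number; recall: number; readiness: number; }

// Hypothetical per-site floors for illustration only.
const FLOORS: Record<string, Scores> = {
  "simple-page": { overlap: 1.0, recall: 1.0, readiness: 0.9 },
  "complex-site": { overlap: 0.75, recall: 0.8, readiness: 0.9 },
};

// Returns one message per metric below its floor; an empty array means pass.
function smokeFailures(site: string, score: Scores): string[] {
  const floor = FLOORS[site];
  if (!floor) return [`no floor configured for ${site}`];
  return (Object.keys(floor) as Array<keyof Scores>)
    .filter((metric) => score[metric] < floor[metric])
    .map((metric) => `${metric} ${score[metric].toFixed(2)} < ${floor[metric].toFixed(2)}`);
}
```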