| Situation | Use |
|---|---|
| HTML string from fetch(), SSR, or a Worker | extract(html) from ax-grep |
| Small Worker/static-only bundle | extract(html) from ax-grep/static |
| Code already running inside the page | extract() from ax-grep/browser |
| Puppeteer, Playwright, WebView, or external page controller | createExtractorScript() from ax-grep |

```typescript
import { extract } from "ax-grep";

const response = await fetch("https://example.com");
const html = await response.text();
const tree = extract(html);
```

The root extract(html) function is the same static extractor exposed at
ax-grep/static.

```typescript
import { extract } from "ax-grep/static";

const tree = extract(html, {
  includeAttributes: false,
});
```

Static extraction can infer roles, names, labels, ARIA state, links, forms, headings, tables, and lists from SSR markup. It cannot see computed CSS, layout bounds, client-rendered DOM, shadow DOM, iframe contents, or post-load mutations.

Use createExtractorScript() when you control a page from Puppeteer,
Playwright, WebView, or an agent browser.

```typescript
import { createExtractorScript } from "ax-grep";

const tree = await page.evaluate(createExtractorScript());
```

Playwright example:

```typescript
import { chromium } from "playwright";
import { createExtractorScript, formatSemanticTreeText } from "ax-grep";

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto("https://example.com");

const tree = await page.evaluate(createExtractorScript({
  includeBounds: false,
  includeAttributes: false,
}));

console.log(formatSemanticTreeText(tree));
await browser.close();
```

WebView-style injection works the same way:

```typescript
import { createExtractorScript } from "ax-grep";

const script = createExtractorScript({
  mode: "interactive",
  format: "json",
});

// Android: webView.evaluateJavascript(script, callback)
// iOS: webView.evaluateJavaScript(script, completionHandler)
```

Use ax-grep/browser when your code is already executing in the page, such as a
browser extension content script.

```typescript
import { extract, formatSemanticTreeText } from "ax-grep/browser";

const tree = extract({
  mode: "interactive",
  includeBounds: false,
});

console.log(formatSemanticTreeText(tree));
```

extract() returns a SemanticNode tree:

```typescript
type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  focusable: boolean;
  selector?: string;
  xpath?: string;
  text?: string;
  value?: string;
  state?: Record<string, unknown>;
  attributes?: Record<string, string>;
  children: SemanticNode[];
};
```

Use formatSemanticTreeText(tree) for a compact prompt-friendly text view, or
flattenSemanticTree(tree) and summarizeSemanticTree(tree) for analysis and
benchmarks.
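
To make the node shape concrete, here is a minimal sketch of a hand-written tree walk that collects the interactive nodes. The helper `collectInteractive` and the sample tree are hypothetical and written only for illustration; this is not how the library's flattenSemanticTree() is implemented, and real trees come from extract().

```typescript
// Local copy of the SemanticNode shape shown above, so the sketch is self-contained.
type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  focusable: boolean;
  selector?: string;
  xpath?: string;
  text?: string;
  value?: string;
  state?: Record<string, unknown>;
  attributes?: Record<string, string>;
  children: SemanticNode[];
};

// Hypothetical helper (not part of ax-grep): depth-first, pre-order walk
// that returns every node whose `interactive` flag is set.
function collectInteractive(root: SemanticNode): SemanticNode[] {
  const found: SemanticNode[] = [];
  const stack: SemanticNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.interactive) found.push(node);
    // Push children in reverse so they are popped in document order.
    for (const child of [...node.children].reverse()) {
      stack.push(child);
    }
  }
  return found;
}

// Hand-built sample tree, used only to exercise the helper.
const sample: SemanticNode = {
  id: "n0", tag: "main", role: "main", name: "", interactive: false, focusable: false,
  children: [
    { id: "n1", tag: "a", role: "link", name: "Docs", interactive: true, focusable: true, children: [] },
    {
      id: "n2", tag: "p", role: null, name: "", interactive: false, focusable: false,
      children: [
        { id: "n3", tag: "button", role: "button", name: "Save", interactive: true, focusable: true, children: [] },
      ],
    },
  ],
};

const actionable = collectInteractive(sample).map((n) => `${n.role}:${n.name}`);
console.log(actionable); // [ 'link:Docs', 'button:Save' ]
```

An explicit stack avoids deep recursion on very large pages; a recursive version would behave the same on typical trees.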

```typescript
const tree = extract(html, {
  mode: "compact",
  includeAttributes: false,
  includeHidden: false,
  includeSelectOptions: true,
  maxTextLength: 240,
});
```

| Option | Default | Notes |
|---|---|---|
| mode | "compact" | Use "interactive" to keep mostly actionable nodes. |
| includeAttributes | true | Turn off for smaller prompt payloads. |
| includeHidden | false | Keep hidden/collapsed content only when needed. |
| includeSelectOptions | true | Useful for agent planning, verbose for huge selects. |
| includeTextNodes | browser: true, static: false | Static extraction relies more on semantic names by default. |
| maxTextLength | 240 | Clips long direct text/name fragments. |
| excludeLikelyAds | false | Optional heuristic pruning for benchmark or prompt use. |
| summarizeLargeSubtrees | static: true | Keeps SSR payloads bounded. |
| summarizeLikelyLinkFarms | static: true | Helps forum/sidebar/navigation-heavy pages. |

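
The defaults above depend partly on the environment (includeTextNodes differs between browser and static). The following sketch shows one way a caller might model that when building option objects. The names `ExtractOptions`, `Environment`, `defaultsFor`, and `resolveOptions` are hypothetical and not part of ax-grep; only the option names and default values are taken from the table. The two summarize options are left out because the table gives no browser default for them.

```typescript
// Modes named in this document; the library may accept others.
type ExtractMode = "compact" | "interactive";
type Environment = "browser" | "static";

// Hypothetical type covering the options with a documented default in both environments.
interface ExtractOptions {
  mode: ExtractMode;
  includeAttributes: boolean;
  includeHidden: boolean;
  includeSelectOptions: boolean;
  includeTextNodes: boolean;
  maxTextLength: number;
  excludeLikelyAds: boolean;
}

// Defaults copied from the options table.
function defaultsFor(env: Environment): ExtractOptions {
  return {
    mode: "compact",
    includeAttributes: true,
    includeHidden: false,
    includeSelectOptions: true,
    includeTextNodes: env === "browser", // browser: true, static: false
    maxTextLength: 240,
    excludeLikelyAds: false,
  };
}

// Caller overrides win over the environment defaults.
function resolveOptions(
  env: Environment,
  overrides: Partial<ExtractOptions> = {},
): ExtractOptions {
  return { ...defaultsFor(env), ...overrides };
}

const promptOptions = resolveOptions("static", {
  mode: "interactive",
  includeAttributes: false,
});
console.log(promptOptions);
```

In practice you can pass only the overrides to extract() and let the library apply its own defaults; the sketch is mainly useful for logging or testing the effective configuration.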

```typescript
import { observeSemanticTree } from "ax-grep/browser";

const observer = observeSemanticTree((change) => {
  console.log(change.mutationCount, change.tree);
}, { debounceMs: 50 });

observer.disconnect();
```

For injected-script use, createObserverScript() installs an observer on
window.__AX_LITE_OBSERVER__ and dispatches __AX_LITE_OBSERVER__:change
events.
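
A page-side listener for those events might look like the sketch below. Only the event name comes from the text above. The helper `onObserverChange` is hypothetical, and the sketch assumes the events are dispatched on window and makes no claim about the event payload. A plain EventTarget stands in for window so the example runs outside a page.

```typescript
// Event name taken from the text above.
const CHANGE_EVENT = "__AX_LITE_OBSERVER__:change";

// Hypothetical helper: subscribes to change events on any EventTarget and
// returns a function that removes the listener again.
function onObserverChange(
  target: EventTarget,
  handler: (event: Event) => void,
): () => void {
  target.addEventListener(CHANGE_EVENT, handler);
  return () => target.removeEventListener(CHANGE_EVENT, handler);
}

// Stand-in target; in an injected script this would presumably be window
// (an assumption, not confirmed by the text above).
const target = new EventTarget();
let changes = 0;
const unsubscribe = onObserverChange(target, () => {
  changes += 1;
});

target.dispatchEvent(new Event(CHANGE_EVENT)); // counted
unsubscribe();
target.dispatchEvent(new Event(CHANGE_EVENT)); // ignored after unsubscribe
console.log(changes); // 1
```

Returning an unsubscribe function keeps cleanup next to the subscription, which helps when the controller reloads or navigates the page.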

```typescript
import { extract } from "ax-grep/static";
import { formatSemanticTreeText } from "ax-grep";

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url).searchParams.get("url");
    if (!url) return new Response("Missing url", { status: 400 });

    const response = await fetch(url);
    const html = await response.text();
    const tree = extract(html);

    return new Response(formatSemanticTreeText(tree), {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  },
};
```
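
The Worker above passes the url query parameter straight to fetch(). If the endpoint is reachable by untrusted callers, you may want to validate that value first. The sketch below is one possible check and is not part of ax-grep: `parseTargetUrl` is a hypothetical helper and `worker.example` is a placeholder host. It accepts only absolute http or https URLs and uses nothing beyond the standard URL API.

```typescript
// Hypothetical helper (not part of ax-grep): returns the target URL only if
// the "url" query parameter is present, parseable, and uses http or https.
function parseTargetUrl(requestUrl: string): URL | null {
  const raw = new URL(requestUrl).searchParams.get("url");
  if (!raw) return null;

  let target: URL;
  try {
    target = new URL(raw);
  } catch {
    return null; // not an absolute URL
  }

  const allowed = target.protocol === "http:" || target.protocol === "https:";
  return allowed ? target : null;
}

// "worker.example" is a placeholder host for the Worker's own address.
const accepted = parseTargetUrl("https://worker.example/?url=https://example.com");
const rejectedScheme = parseTargetUrl("https://worker.example/?url=file:///etc/passwd");
const missing = parseTargetUrl("https://worker.example/");

console.log(accepted?.href);  // https://example.com/
console.log(rejectedScheme);  // null
console.log(missing);         // null
```

Inside the fetch handler you could return a 400 response whenever the helper yields null, then fetch the returned URL's href. This check does not block requests to private or internal hosts; that would need an additional allowlist.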