
Library API and Browser Injection

Entry Points

| Situation | Use |
| --- | --- |
| HTML string from fetch(), SSR, or a Worker | extract(html) from ax-grep |
| Small Worker/static-only bundle | extract(html) from ax-grep/static |
| Code already running inside the page | extract() from ax-grep/browser |
| Puppeteer, Playwright, WebView, or external page controller | createExtractorScript() from ax-grep |

Static HTML

ax-grep is ESM-only and requires Node 18 or newer.

import { extract } from "ax-grep";

const response = await fetch("https://example.com");
const html = await response.text();
const tree = extract(html);
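fetch() resolves for HTTP error statuses too, so a 404 page or a JSON error body would be extracted as if it were the target page. A minimal sketch of a guard to run before response.text(), assuming you only want text/html (or XHTML) responses; assertHtmlResponse is a hypothetical helper, not part of ax-grep.

```typescript
// Hypothetical helper (not part of ax-grep): reject responses that should not
// be handed to extract(html).
function assertHtmlResponse(response: Response): void {
  // fetch() only rejects on network failure; 4xx/5xx responses still resolve.
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${response.url || "request"}`);
  }
  // Accept text/html and application/xhtml+xml; anything else is probably not a page.
  const contentType = response.headers.get("content-type") ?? "";
  if (!/^(text\/html|application\/xhtml\+xml)\b/i.test(contentType)) {
    throw new Error(`Unexpected content-type: ${contentType || "(none)"}`);
  }
}

// Usage:
// const response = await fetch("https://example.com");
// assertHtmlResponse(response);
// const tree = extract(await response.text());
```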

The root extract(html) function is the same static extractor exposed at ax-grep/static.

import { extract } from "ax-grep/static";

const tree = extract(html, {
  includeAttributes: false,
});

CommonJS services cannot use top-level await, so import it dynamically inside an async function:

async function main(html) {
  const { extract, formatSemanticTreeText } = await import("ax-grep");
  const tree = extract(html, { includeAttributes: false });
  console.log(formatSemanticTreeText(tree));
}

Static extraction can infer roles, names, labels, ARIA state, links, forms, headings, tables, and lists from SSR markup. It cannot see computed CSS, layout bounds, client-rendered DOM, shadow DOM, iframe contents, or post-load mutations.
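Because static extraction cannot see client-rendered DOM, a page that builds its UI in JavaScript can yield a near-empty tree. A minimal sketch of one way to detect that and fall back to browser injection, assuming the SemanticNode fields documented under Output Shape; countNodes, looksClientRendered, and both thresholds are hypothetical, not ax-grep APIs.

```typescript
// Minimal subset of the documented SemanticNode shape.
type SemanticNode = {
  role: string | null;
  name: string;
  interactive: boolean;
  children: SemanticNode[];
};

// Count all nodes and interactive nodes, iteratively to avoid deep recursion.
function countNodes(root: SemanticNode): { total: number; interactive: number } {
  let total = 0;
  let interactive = 0;
  const stack: SemanticNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    total += 1;
    if (node.interactive) interactive += 1;
    for (const child of node.children) stack.push(child);
  }
  return { total, interactive };
}

// Hypothetical heuristic: a tree with almost nothing in it, or nothing
// actionable, probably came from a client-rendered shell, so retry with a
// real browser. The thresholds are assumptions; tune them for your pages.
function looksClientRendered(root: SemanticNode, minTotal = 10, minInteractive = 1): boolean {
  const { total, interactive } = countNodes(root);
  return total < minTotal || interactive < minInteractive;
}
```

When looksClientRendered returns true, the page can be loaded in Playwright or Puppeteer and extracted with createExtractorScript() instead.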

Browser Injection

Use createExtractorScript() when you control a page from Puppeteer, Playwright, WebView, or an agent browser.

import { createExtractorScript } from "ax-grep";

const tree = await page.evaluate(createExtractorScript());

Playwright example:

import { chromium } from "playwright";
import { createExtractorScript, formatSemanticTreeText } from "ax-grep";

const browser = await chromium.launch();
const page = await browser.newPage();

await page.goto("https://example.com");

const tree = await page.evaluate(createExtractorScript({
  includeBounds: false,
  includeAttributes: false,
}));

console.log(formatSemanticTreeText(tree));

await browser.close();

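The script is a plain string, so it works anywhere JavaScript can be evaluated in the page, including WebViews. With format: "text", the evaluation returns formatted text instead of a tree: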

import { createExtractorScript } from "ax-grep";

const script = createExtractorScript({
  mode: "interactive",
  format: "text",
});

const text = await page.evaluate(script);

Android and iOS WebViews return the script result through their normal JavaScript evaluation callbacks:

// Android: webView.evaluateJavascript(script, callback) passes the result to callback as a JSON-encoded string.
// iOS (WKWebView): webView.evaluateJavaScript(script) { result, error in ... } passes the text or object value as result.
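On Android the callback value is JSON text: a text-format result arrives as a quoted, escaped string, an object result as serialized JSON, and an undefined or null result as "null". If that raw string is forwarded to a JavaScript or TypeScript layer (for example in a hybrid app), a minimal decoding sketch looks like this; both helpers are hypothetical and assume the script was created either with format: "text" or with the default tree output.

```typescript
// Hypothetical helper: decode the raw string an Android WebView passes to its
// evaluateJavascript callback, once it has been forwarded into TS code.
function decodeAndroidResult(raw: string | null): unknown {
  // Treat a missing payload like a null result.
  if (raw === null || raw === "") return null;
  // Strings arrive JSON-quoted, objects as JSON, undefined/null as "null".
  return JSON.parse(raw);
}

// Hypothetical helper for scripts created with format: "text".
function decodeAndroidText(raw: string | null): string {
  const value = decodeAndroidResult(raw);
  if (typeof value !== "string") {
    throw new Error(`Expected a string result, got ${value === null ? "null" : typeof value}`);
  }
  return value;
}
```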

Direct In-Page Usage

Use ax-grep/browser when your code is already executing in the page, such as a browser extension content script. Extension content scripts usually need a bundler, and the result reflects the content script world that executed it.

import { extract, formatSemanticTreeText } from "ax-grep/browser";

const tree = extract({
  mode: "interactive",
  includeBounds: false,
});

console.log(formatSemanticTreeText(tree));

Output Shape

extract() returns a SemanticNode tree:

type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  focusable: boolean;
  selector?: string;
  xpath?: string;
  text?: string;
  value?: string;
  state?: Record<string, unknown>;
  attributes?: Record<string, string>;
  children: SemanticNode[];
};
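Because every node carries role, name, and interactive, an agent can locate action targets with a plain tree walk. A minimal sketch assuming only the fields shown above; collectInteractive and findInteractive are hypothetical helpers, independent of ax-grep's own flattenSemanticTree.

```typescript
// Subset of the documented SemanticNode shape used here.
type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  focusable: boolean;
  selector?: string;
  children: SemanticNode[];
};

// Depth-first walk in document order that keeps only actionable nodes.
function collectInteractive(root: SemanticNode): SemanticNode[] {
  const out: SemanticNode[] = [];
  const visit = (node: SemanticNode): void => {
    if (node.interactive) out.push(node);
    for (const child of node.children) visit(child);
  };
  visit(root);
  return out;
}

// Hypothetical lookup by role and accessible name, e.g. ("button", "Sign in").
function findInteractive(root: SemanticNode, role: string, name: string): SemanticNode | undefined {
  return collectInteractive(root).find((n) => n.role === role && n.name.trim() === name);
}
```

When the matched node has a selector, it can be handed to the browser driver, for example page.locator(node.selector).click() in Playwright.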

Use formatSemanticTreeText(tree) for a compact prompt-friendly text view, or flattenSemanticTree(tree) and summarizeSemanticTree(tree) for analysis and benchmarks.
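If you need a different layout or a hard line budget for a prompt, a custom formatter is a short recursive walk. A minimal sketch assuming the role, tag, name, and children fields above; formatOutline and its maxLines cap are hypothetical and do not reproduce formatSemanticTreeText's actual output format.

```typescript
// Subset of the documented SemanticNode shape used here.
type SemanticNode = {
  role: string | null;
  tag: string;
  name: string;
  children: SemanticNode[];
};

// Hypothetical outline formatter: one line per node, two spaces of indent per
// depth level, falling back to the tag when a node has no role, and stopping
// once maxLines lines have been produced.
function formatOutline(root: SemanticNode, maxLines = 200): string {
  const lines: string[] = [];
  const visit = (node: SemanticNode, depth: number): void => {
    if (lines.length >= maxLines) return;
    const label = node.role ?? node.tag;
    const name = node.name ? ` "${node.name}"` : "";
    lines.push(`${"  ".repeat(depth)}${label}${name}`);
    for (const child of node.children) visit(child, depth + 1);
  };
  visit(root, 0);
  return lines.join("\n");
}
```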

Options

const tree = extract(html, {
  mode: "compact",
  includeAttributes: false,
  includeHidden: false,
  includeSelectOptions: true,
  maxTextLength: 240,
});

| Option | Default | Notes |
| --- | --- | --- |
| mode | "compact" | Use "interactive" to keep mostly actionable nodes. |
| includeAttributes | true | Turn off for smaller prompt payloads. |
| includeHidden | false | Keep hidden/collapsed content only when needed. |
| includeSelectOptions | true | Useful for agent planning, but verbose for huge selects. |
| includeTextNodes | browser: true, static: false | Static extraction relies more on semantic names by default. |
| maxTextLength | 240 | Clips long direct text/name fragments. |
| excludeLikelyAds | false | Optional heuristic pruning for benchmark or prompt use. |
| summarizeLargeSubtrees | static: true | Keeps SSR payloads bounded. |
| summarizeLikelyLinkFarms | static: true | Helps on forum, sidebar, and navigation-heavy pages. |
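Since includeTextNodes defaults differently in the browser and static extractors, pinning it explicitly keeps the two outputs closer. A minimal sketch of a reusable prompt-oriented preset; ExtractOptionsSketch is a local, hypothetical mirror of the option names in the table (ax-grep ships its own types, which may have more fields or stricter unions), and the chosen values are illustrative, not recommendations from the library.

```typescript
// Hypothetical local mirror of the documented option names.
type ExtractOptionsSketch = {
  mode?: "compact" | "interactive";
  includeAttributes?: boolean;
  includeHidden?: boolean;
  includeSelectOptions?: boolean;
  includeTextNodes?: boolean;
  maxTextLength?: number;
  excludeLikelyAds?: boolean;
  summarizeLargeSubtrees?: boolean;
  summarizeLikelyLinkFarms?: boolean;
};

// Hypothetical preset for small agent prompts. includeTextNodes is pinned
// because its default differs between browser (true) and static (false).
const agentPromptOptions: ExtractOptionsSketch = {
  mode: "interactive",
  includeAttributes: false,
  includeHidden: false,
  includeSelectOptions: false,
  includeTextNodes: false,
  maxTextLength: 120,
  excludeLikelyAds: true,
};

// Usage (assumes both entry points accept the same option object):
// extract(html, agentPromptOptions);
// await page.evaluate(createExtractorScript(agentPromptOptions));
```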

Mutation Stream

import { observeSemanticTree } from "ax-grep/browser";

const observer = observeSemanticTree((change) => {
  console.log(change.mutationCount, change.tree);
}, { debounceMs: 50 });

// Later, when updates are no longer needed:
observer.disconnect();
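Each change carries a full tree, so a consumer that only cares about what became actionable can diff consecutive snapshots. A minimal sketch under two assumptions: only interactive nodes matter, and role plus name is a good enough identity (this document does not say whether node ids stay stable between snapshots, so the key function is a parameter). diffInteractive is hypothetical, not an ax-grep export.

```typescript
// Subset of the documented SemanticNode shape used here.
type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  children: SemanticNode[];
};

// Collect one key per interactive node; nodes with equal keys collapse.
function interactiveKeys(root: SemanticNode, keyOf: (n: SemanticNode) => string): Set<string> {
  const keys = new Set<string>();
  const stack: SemanticNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.interactive) keys.add(keyOf(node));
    for (const child of node.children) stack.push(child);
  }
  return keys;
}

// Hypothetical diff: which actionable controls appeared or disappeared.
function diffInteractive(
  prev: SemanticNode,
  next: SemanticNode,
  keyOf: (n: SemanticNode) => string = (n) => `${n.role ?? n.tag}:${n.name}`,
): { added: string[]; removed: string[] } {
  const before = interactiveKeys(prev, keyOf);
  const after = interactiveKeys(next, keyOf);
  return {
    added: Array.from(after).filter((k) => !before.has(k)),
    removed: Array.from(before).filter((k) => !after.has(k)),
  };
}

// Usage with observeSemanticTree (keeps the previous snapshot between callbacks):
// let last: SemanticNode | undefined;
// observeSemanticTree((change) => {
//   if (last) console.log(diffInteractive(last, change.tree));
//   last = change.tree;
// }, { debounceMs: 50 });
```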

For injected-script use, createObserverScript() installs an observer on window.__AX_LITE_OBSERVER__ and dispatches __AX_LITE_OBSERVER__:change events.

import { createObserverScript } from "ax-grep";

await page.evaluate(createObserverScript({ format: "text" }));
await page.evaluate(() => window.__AX_LITE_OBSERVER__?.disconnect());

Worker Example

import { extract } from "ax-grep/static";
import { formatSemanticTreeText } from "ax-grep";

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url).searchParams.get("url");
    if (!url) return new Response("Missing url", { status: 400 });

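    const response = await fetch(url);
    if (!response.ok) {
      return new Response(`Upstream returned ${response.status}`, { status: 502 });
    }
    const html = await response.text();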
    const tree = extract(html);

    return new Response(formatSemanticTreeText(tree), {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  },
};
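The Worker fetches whatever the url parameter names, and a value without a scheme (such as example.com) makes fetch() throw. A minimal sketch of validating the parameter first, assuming only http and https targets should be allowed; parseTargetUrl is hypothetical, and it does not block private hosts or IP ranges, which a real deployment may also need to reject.

```typescript
// Hypothetical guard for a user-supplied ?url= parameter before fetching it.
// Returns a normalized URL string, or null if the input should be rejected.
function parseTargetUrl(raw: string | null): string | null {
  if (!raw) return null;
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // Not an absolute URL.
  }
  // Allow plain web fetches only (rejects file:, javascript:, ftp:, ...).
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return url.toString();
}
```

In the Worker, the url lookup could become parseTargetUrl(new URL(request.url).searchParams.get("url")), keeping the existing 400 response for a null result.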