Library API and Browser Injection

Entry Points

| Situation | Use |
| --- | --- |
| HTML string from fetch(), SSR, or a Worker | extract(html) from ax-grep |
| Small Worker/static-only bundle | extract(html) from ax-grep/static |
| Code already running inside the page | extract() from ax-grep/browser |
| Puppeteer, Playwright, WebView, or external page controller | createExtractorScript() from ax-grep |

Static HTML

import { extract } from "ax-grep";

const response = await fetch("https://example.com");
const html = await response.text();
const tree = extract(html);

The root extract(html) function is the same static extractor exposed at ax-grep/static.

import { extract } from "ax-grep/static";

const tree = extract(html, {
  includeAttributes: false,
});

Static extraction can infer roles, names, labels, ARIA state, links, forms, headings, tables, and lists from SSR markup. It cannot see computed CSS, layout bounds, client-rendered DOM, shadow DOM, iframe contents, or post-load mutations.

Browser Injection

Use createExtractorScript() when you control a page from Puppeteer, Playwright, WebView, or an agent browser.

import { createExtractorScript } from "ax-grep";

const tree = await page.evaluate(createExtractorScript());

Playwright example:

import { chromium } from "playwright";
import { createExtractorScript, formatSemanticTreeText } from "ax-grep";

const browser = await chromium.launch();
const page = await browser.newPage();

await page.goto("https://example.com");

const tree = await page.evaluate(createExtractorScript({
  includeBounds: false,
  includeAttributes: false,
}));

console.log(formatSemanticTreeText(tree));

await browser.close();

WebView-style injection works the same way:

import { createExtractorScript } from "ax-grep";

const script = createExtractorScript({
  mode: "interactive",
  format: "json",
});

// Android: webView.evaluateJavascript(script, callback)
// iOS: webView.evaluateJavaScript(script, completionHandler)
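
On the host side, a WebView bridge often hands the evaluation result back as a string. Android's evaluateJavascript callback, for example, receives the result serialized as JSON. The sketch below assumes the host logic is written in TypeScript and that the injected script may itself return a JSON string when format is "json"; both are assumptions, and decodeWebViewResult is a hypothetical helper, not part of ax-grep.

```typescript
// Hypothetical helper (not part of ax-grep): decodes a result string handed
// back by a WebView bridge that serializes evaluation results as JSON.
function decodeWebViewResult(raw: string): unknown {
  let value: unknown = JSON.parse(raw);
  // Assumption: if the script returned a JSON string, the bridge encoded it
  // a second time, so one more parse recovers the tree object.
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      // The result was an ordinary string, so keep it unchanged.
    }
  }
  return value;
}
```

If the bridge already delivers a parsed object, this step is unnecessary.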

Direct In-Page Usage

Use ax-grep/browser when your code is already executing in the page, such as a browser extension content script.

import { extract, formatSemanticTreeText } from "ax-grep/browser";

const tree = extract({
  mode: "interactive",
  includeBounds: false,
});

console.log(formatSemanticTreeText(tree));

Output Shape

extract() returns a SemanticNode tree:

type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  focusable: boolean;
  selector?: string;
  xpath?: string;
  text?: string;
  value?: string;
  state?: Record<string, unknown>;
  attributes?: Record<string, string>;
  children: SemanticNode[];
};

Use formatSemanticTreeText(tree) for a compact prompt-friendly text view, or flattenSemanticTree(tree) and summarizeSemanticTree(tree) for analysis and benchmarks.
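
Because the tree is plain data, it can also be walked by hand. The following is a minimal sketch that collects interactive nodes in document order. It repeats the SemanticNode shape from above so the snippet stands alone, and collectInteractive is a hypothetical helper for illustration, not the library's flattenSemanticTree.

```typescript
// Mirrors the SemanticNode shape documented above.
type SemanticNode = {
  id: string;
  tag: string;
  role: string | null;
  name: string;
  interactive: boolean;
  focusable: boolean;
  selector?: string;
  xpath?: string;
  text?: string;
  value?: string;
  state?: Record<string, unknown>;
  attributes?: Record<string, string>;
  children: SemanticNode[];
};

// Hypothetical helper (not part of ax-grep): depth-first, pre-order walk
// that gathers every node flagged as interactive.
function collectInteractive(
  node: SemanticNode,
  found: SemanticNode[] = [],
): SemanticNode[] {
  if (node.interactive) found.push(node);
  for (const child of node.children) collectInteractive(child, found);
  return found;
}
```

A caller might map the result to selector or name fields to build a short action list for an agent.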

Options

const tree = extract(html, {
  mode: "compact",
  includeAttributes: false,
  includeHidden: false,
  includeSelectOptions: true,
  maxTextLength: 240,
});

| Option | Default | Notes |
| --- | --- | --- |
| mode | "compact" | Use "interactive" to keep mostly actionable nodes. |
| includeAttributes | true | Turn off for smaller prompt payloads. |
| includeHidden | false | Keep hidden/collapsed content only when needed. |
| includeSelectOptions | true | Useful for agent planning, verbose for huge selects. |
| includeTextNodes | browser: true, static: false | Static extraction relies more on semantic names by default. |
| maxTextLength | 240 | Clips long direct text/name fragments. |
| excludeLikelyAds | false | Optional heuristic pruning for benchmark or prompt use. |
| summarizeLargeSubtrees | static: true | Keeps SSR payloads bounded. |
| summarizeLikelyLinkFarms | static: true | Helps forum/sidebar/navigation-heavy pages. |
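
One way to keep option choices consistent across call sites is a small preset object. The sketch below transcribes the option names from the table into a local ExtractOptions type. That type, the PROMPT_PRESET values, and the withPreset helper are assumptions for illustration; the library's own exported types may differ.

```typescript
// Assumed option shape, transcribed from the table above.
type ExtractOptions = {
  mode?: "compact" | "interactive"; // only the two values named in this document
  includeAttributes?: boolean;
  includeHidden?: boolean;
  includeSelectOptions?: boolean;
  includeTextNodes?: boolean;
  maxTextLength?: number;
  excludeLikelyAds?: boolean;
  summarizeLargeSubtrees?: boolean;
  summarizeLikelyLinkFarms?: boolean;
};

// Hypothetical preset aimed at small prompt payloads.
const PROMPT_PRESET: ExtractOptions = {
  mode: "interactive",
  includeAttributes: false,
  includeSelectOptions: false,
  maxTextLength: 120,
};

// Later keys win, so per-call overrides replace preset values.
function withPreset(overrides: ExtractOptions = {}): ExtractOptions {
  return { ...PROMPT_PRESET, ...overrides };
}
```

The merged object would then be passed as the options argument, for example extract(html, withPreset({ maxTextLength: 240 })).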

Mutation Stream

import { observeSemanticTree } from "ax-grep/browser";

const observer = observeSemanticTree((change) => {
  console.log(change.mutationCount, change.tree);
}, { debounceMs: 50 });

// Later, once updates are no longer needed, stop observing.
observer.disconnect();

For injected-script use, createObserverScript() installs an observer on window.__AX_LITE_OBSERVER__ and dispatches __AX_LITE_OBSERVER__:change events.
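
A listener for those events could be wired up roughly as follows. The event name comes from the sentence above; which object the event is dispatched on and what payload it carries are not stated here, so the sketch takes the target as a parameter and passes the raw Event through. onObserverChange is a hypothetical helper.

```typescript
// Event name as given in the text above.
const CHANGE_EVENT = "__AX_LITE_OBSERVER__:change";

// Hypothetical helper (not part of ax-grep): subscribes to change events on
// the given target and returns a function that removes the listener again.
// Assumption: the caller knows which target the events are dispatched on.
function onObserverChange(
  target: EventTarget,
  handler: (event: Event) => void,
): () => void {
  target.addEventListener(CHANGE_EVENT, handler);
  return () => target.removeEventListener(CHANGE_EVENT, handler);
}
```

In a page this might be called as onObserverChange(window, handler), assuming the events are dispatched on window.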

Worker Example

import { extract } from "ax-grep/static";
import { formatSemanticTreeText } from "ax-grep";

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url).searchParams.get("url");
    if (!url) return new Response("Missing url", { status: 400 });

    const response = await fetch(url);
    const html = await response.text();
    const tree = extract(html);

    return new Response(formatSemanticTreeText(tree), {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  },
};
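
The Worker above fetches whatever the url parameter contains. A deployment exposed to untrusted callers would probably want to validate that value first. The following is a minimal sketch that accepts only absolute http(s) URLs; parseTargetUrl is a hypothetical helper, and real deployments may need stricter rules such as host allowlists.

```typescript
// Hypothetical helper (not part of ax-grep): returns a parsed URL only when
// the input is an absolute http: or https: URL, and null otherwise.
function parseTargetUrl(raw: string | null): URL | null {
  if (!raw) return null;
  try {
    const parsed = new URL(raw);
    const allowed = parsed.protocol === "http:" || parsed.protocol === "https:";
    return allowed ? parsed : null;
  } catch {
    // new URL() throws on malformed or relative input.
    return null;
  }
}
```

In the handler, the result could replace the plain null check: respond with status 400 when parseTargetUrl returns null, otherwise fetch the parsed URL.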