The project goal is not just raw accessibility-tree parity. ax-grep should
let subagents search, inspect pages, decide the next step, and recover from thin
or blocked pages with less guesswork than an agent-browser snapshot alone.

| Requirement | Current evidence | Completion note |
|---|---|---|
| CLI agent usefulness stays above the readiness floor for every gate-included target. | `averageCliAgentScore` and `minCliAgentScore` must stay at or above 0.8; `averageAgentExecutorScore` and `minAgentExecutorScore` must stay at or above 0.995; raw accessibility overlap metrics remain diagnostic only. | Covered by `compare:gate` and comparison-gate tests. |
| Search agents can open the best or alternate result without rebuilding commands. | `agent.resultChoices`, `openResult`, `commandArgs`; covered by `tests/cli.test.ts` and `averageSearchResultActionScore`. | Covered by focused tests and comparison gate metrics. |
| Page-check agents can read structured evidence instead of raw tree text. | `pageCheck.contentEvidence`, `citations`, `agent.readTargets`, `bestReadTarget`; covered by read-target, citation, answer-plan, evidence metadata, readability reason, and consistency gates. | Covered by tests and static comparison scoring. |
| Source-link follow-up keeps a stable pointer back to the source array. | `sourceLinkRef` on actions, compact actions, page steps, and text output; covered by CLI and public type tests. | Covered for JSON and text output. |
| Brief handoff remains executable for subagent loops. | `agent.executor`, `agent.handoff`, `commandArgs`, `readValue`, `resultChoices`, `sourceChoices`; covered by brief executor/handoff tests and gates. | Covered for common search, page, source, form, action-target, and diagnostic cases. |
| Thin, blocked, or browser-needed pages expose why browser capture is needed. | `needsBrowserHtml`, `browserHtml`, `signals`, `qualityGates`, `barriers`, and browser retry actions; covered by browser-need, browser-html, signal, and quality-gate scores. | Covered by non-browser fixtures and comparison scoring. |
| Hidden page signals that are absent from accessibility trees stay discoverable. | Hydration, API, config, policy, schema, resource, media, citation, code-block, and action-target summaries are scored through hidden-signal, hidden-command, response-metadata, count, consistency, and read-target gates. | Covered by static extraction tests; real browser parity still depends on comparison runs. |
| A minimal real page can use static agent handoff without browser capture. | `pnpm readiness:real-page-smoke` checks https://example.com with `--agent-brief`, `canUseFetchedHtml=true`, `needsBrowserHtml=false`, and named semantic roles. | Covered as a smoke gate; broader real-page and agent-browser comparison remains. |
| A minimal agent-browser comparison set has stable named-role overlap. | `pnpm readiness:agent-browser-smoke` checks https://example.com for exact overlap, https://books.toscrape.com/ for catalog-page floors, https://news.ycombinator.com for link-heavy listing floors, and https://www.gov.uk/foreign-travel-advice for government index/search-page floors. | Covered as a four-target smoke gate; broader agent-browser comparison remains. |
| Text-heavy documents separate structural readiness from raw StaticText volume. | `pnpm readiness:agent-browser-text-heavy-smoke` checks Korean Wikipedia with structural content, action, navigation, and text-recall fields. | Covered as a separate smoke gate; not part of the main overlap gate. |
| Operational safety prevents host overload during validation. | `AGENTS.md`, `vitest.config.ts`, `docs/benchmarks.md`, `docs/comparison-baseline.md`, the agent-browser comparison lock, and `finally`-based session close helpers in browser comparison scripts. | Commands must still be run one at a time, with `pnpm check:processes` before and after risky browser-backed runs. |

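The readiness floors in the first table row can be read as a simple threshold check. The sketch below is illustrative only: the four metric names and the 0.8 / 0.995 floors come from the table, while the `GateMetrics` shape, the `FLOORS` constant, and the `checkReadinessFloors` helper are hypothetical and do not describe how `compare:gate` is actually implemented.

```typescript
// Metric names and floors are quoted from the requirements table.
// The report shape and helper below are assumptions for illustration.
interface GateMetrics {
  averageCliAgentScore: number;
  minCliAgentScore: number;
  averageAgentExecutorScore: number;
  minAgentExecutorScore: number;
}

const FLOORS: Record<keyof GateMetrics, number> = {
  averageCliAgentScore: 0.8,
  minCliAgentScore: 0.8,
  averageAgentExecutorScore: 0.995,
  minAgentExecutorScore: 0.995,
};

// Returns one message per metric below its floor; an empty array means pass.
function checkReadinessFloors(metrics: GateMetrics): string[] {
  const failures: string[] = [];
  for (const name of Object.keys(FLOORS) as (keyof GateMetrics)[]) {
    const value = metrics[name];
    const floor = FLOORS[name];
    // `!(value >= floor)` also rejects NaN, which `value < floor` would let through.
    if (!(value >= floor)) {
      failures.push(`${name}=${value} is below floor ${floor}`);
    }
  }
  return failures;
}

// Hypothetical sample values, not taken from a real comparison report.
const sampleFailures = checkReadinessFloors({
  averageCliAgentScore: 0.91,
  minCliAgentScore: 0.79,
  averageAgentExecutorScore: 1,
  minAgentExecutorScore: 0.995,
});
console.log(sampleFailures); // one entry, for minCliAgentScore
```

Raw accessibility overlap metrics are deliberately left out of the check, since the table treats them as diagnostic only.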
Do not call this objective complete from unit tests alone. A completion audit must inspect:

- `pnpm exec tsc --noEmit`
- `pnpm readiness:audit`
- `pnpm readiness:real-page-smoke`
- `pnpm readiness:agent-browser-smoke`
- focused non-browser Vitest coverage for changed contracts
- `pnpm compare:gate <latest comparison report>` for saved comparison output
- process cleanup before and after browser-backed comparison commands
- current docs proving README details remain split into `docs/`

Browser-backed comparison suites must run sequentially. If the host is already under browser load, postpone them rather than starting another comparison.
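One way to keep the audit strictly sequential is to flatten it into an ordered plan before running anything. The following is a minimal sketch, not the project's own tooling: the command strings are taken from the audit list and the `pnpm check:processes` rule, but the `AuditStep` type, the `buildAuditPlan` helper, and the choice of which commands count as browser-backed are assumptions.

```typescript
// Command strings are quoted from the audit list above.
// Which steps are browser-backed is an assumption made for this sketch.
interface AuditStep {
  command: string;
  browserBacked: boolean;
}

const AUDIT_STEPS: AuditStep[] = [
  { command: "pnpm exec tsc --noEmit", browserBacked: false },
  { command: "pnpm readiness:audit", browserBacked: false },
  { command: "pnpm readiness:real-page-smoke", browserBacked: false },
  { command: "pnpm readiness:agent-browser-smoke", browserBacked: true },
];

const PROCESS_CHECK = "pnpm check:processes";

// Flattens the steps into one ordered list and wraps every
// browser-backed command with a process check before and after it.
function buildAuditPlan(steps: AuditStep[]): string[] {
  const plan: string[] = [];
  for (const step of steps) {
    if (step.browserBacked) plan.push(PROCESS_CHECK);
    plan.push(step.command);
    if (step.browserBacked) plan.push(PROCESS_CHECK);
  }
  return plan;
}

// Print the plan; each line is meant to be run one at a time, in order.
const auditPlan = buildAuditPlan(AUDIT_STEPS);
console.log(auditPlan.join("\n"));
```

Because the plan is a plain array, a runner can execute entries one by one and stop at the first failure, which rules out starting two browser-backed commands at once. If the first process check shows the host is already under browser load, the remaining entries can simply be postponed.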