1 stable release
Uses new Rust 2024
| Version | Released |
|---|---|
| 1.0.0 | May 14, 2026 |
Index is a terminal-native semantic browser with an adaptive transformer.
It does not try to clone a graphical browser. It treats the web as input and recomposes it into a calm, keyboard-first, scriptable terminal interface.
Product thesis
Modern pages are designed for pixels, animation, tracking, overlays, mouse input, and JavaScript-heavy state.
Index is designed for:
- reading
- navigation
- forms
- extraction
- search
- bookmarking
- scripting
- command-first workflows
- terminal-native interaction
The core product is the transformer:
URL / HTML / Markdown / RSS / app snapshot
-> parse
-> classify
-> extract semantic intent
-> remove noise
-> emit Index Document Model
-> render in terminal
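The stages above can be sketched as a typestate pipeline in which each stage consumes the previous one, so a later stage cannot be reached without going through the earlier ones. This is a minimal illustration only: every type and method name here is hypothetical, and the toy parser and classifier stand in for the real HTML parsing and semantic classification.

```rust
// Hypothetical sketch: none of these names come from the Index crates.

/// Stage 1: raw input text, not yet parsed.
pub struct Raw(pub String);

/// Stage 2: input split into candidate blocks.
pub struct Parsed(pub Vec<String>);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockKind {
    Content,
    Noise,
}

/// Stage 3: every block carries a classification.
pub struct Classified(pub Vec<(BlockKind, String)>);

/// Stand-in for the Index Document Model.
#[derive(Debug, PartialEq, Eq)]
pub struct IndexDocument {
    pub blocks: Vec<String>,
}

impl Raw {
    /// Toy parser: blank lines separate blocks (real input would be HTML).
    pub fn parse(self) -> Parsed {
        let blocks = self
            .0
            .split("\n\n")
            .map(str::trim)
            .filter(|block| !block.is_empty())
            .map(String::from)
            .collect();
        Parsed(blocks)
    }
}

impl Parsed {
    /// Toy classifier: blocks that mention cookies are treated as noise.
    pub fn classify(self) -> Classified {
        let labelled = self
            .0
            .into_iter()
            .map(|block| {
                let kind = if block.to_lowercase().contains("cookie") {
                    BlockKind::Noise
                } else {
                    BlockKind::Content
                };
                (kind, block)
            })
            .collect();
        Classified(labelled)
    }
}

impl Classified {
    /// Drops noise blocks and emits the document model.
    pub fn emit(self) -> IndexDocument {
        let blocks = self
            .0
            .into_iter()
            .filter(|(kind, _)| *kind == BlockKind::Content)
            .map(|(_, block)| block)
            .collect();
        IndexDocument { blocks }
    }
}
```

Because `emit` exists only on `Classified`, a call such as `Raw(text).emit()` does not compile; the ordering of stages is enforced by the type checker rather than by runtime checks.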
Current repository state
This workspace currently includes:
- Rust workspace
- core document model
- static HTML parser backed by scraper
- readability extractor with main-content detection
- typestate transformer pipeline
- orthogonal transform instruction set
- terminal renderer backed by ratatui with semantic colors, reader profiles, symbols, syntax hiding, padded structural blocks, a high-cyan prompt, a mildly highlighted current line, structural sidebar modes, hidden response logs, and URL-history suggestions
- bounded layout rhythm extraction from semantic block boundaries and simple CSS spacing hints
- semantic page-region summaries that prioritize explicit main content and allow secondary navigation, related, comment, aside, and footer regions to be expanded from the sidebar
- navigation state with history, bookmarks, session restore primitives, origin state, and cache keys
- live HTTP fetching, redirect validation, and filesystem response cache primitives
- static form extraction and semantic command submission actions
- site adapter registry for supported task-oriented views
- expanded knowledge adapter fixtures for code forges, docs, reference pages, forums, research abstracts, and archive items
- robustness fixture matrix for malformed, sparse, code-heavy, table/list-heavy, navigation-heavy, accessibility-rich, and international page shapes
- headless snapshot fallback abstraction with deterministic policies and accessibility-tree-first extraction when semantic roles are strong enough
- origin-scoped authentication and cookie primitives with redaction
- deterministic extraction and pipe confirmation policy
- optional AI-assisted transformation boundary with offline fallback
- hostile-input security policy with size limits and redirect validation
- actionable failure diagnostics for failed or low-confidence transforms with likely causes and exact local commands
- document quality scoring for adapter, strong generic, partial generic, fallback, and failed transform paths
- local reader repair commands for cycling plausible main regions, hiding or showing noisy regions, and promoting a section into temporary focus
- transformed document and renderer layout caches for repeated workflows
- canonical runtime artifacts (index-artifact-v1) keyed by canonical URL and context with stale-while-revalidate behavior for repeated navigation
- local knowledge shelf records with Markdown/JSON exports, citations, tags, and notes
- offline shelf search across saved metadata and local Markdown exports
- adapter contribution harness reports for fixture-backed adapter reviews
- offline knowledge workflows for deterministic saves, citations, selected section export, batch extraction, and bookmark notes/tags
- local capture artifacts for unsupported page shapes with credential redaction
- TUI capture preview/save commands for local artifacts from the current page
- runtime-compatible index.pack/v1 loading with deterministic precedence, local policy controls, and rollback snapshots
- standalone index-compat-lab tooling for compatibility ingest, synthesis, linting, and override merge workflows
- app icon, banners, and official terminal font guidance
- installable package artifacts with man page and shell completions
- production-readiness policies for compatibility, MSRV, adapters, diagnostics, benchmarks, and issue intake
- CLI prototype with TUI, plain output, machine-readable extraction, and offline AI modes
- unit and fixture tests
- roadmap
- changelog
- RFCs
- ADRs
- AGENTS.md for AI-agent-driven development
The implementation remains transformer-first: pages are fetched or read, parsed into a semantic document model, and only then rendered in the terminal. Stateful browsing is represented through semantic session types while live URL loading and :open navigation are composed through fetcher and renderer action boundaries.
Site adapters now run inside the transformer for recognized canonical URLs and emit task-oriented IndexDocument views before falling back to the generic reader.
Workspace layout
.
├── AGENTS.md
├── CHANGELOG.md
├── ROADMAP.md
├── Makefile
├── Cargo.toml
├── assets
│ ├── black-banner.png
│ ├── black-icon.png
│ ├── white-banner.png
│ └── white-icon.png
├── crates
│ ├── index-cli
│ ├── index-ai
│ ├── index-capture
│ ├── index-compat-lab
│ ├── index-core
│ ├── index-dom
│ ├── index-extract
│ ├── index-headless
│ ├── index-http
│ ├── index-readability
│ ├── index-renderer
│ ├── index-security
│ └── index-transformer
├── docs
│ ├── ARCHITECTURE.md
│ ├── COMPATIBILITY.md
│ ├── DIAGNOSTICS.md
│ ├── MSRV.md
│ ├── PERFORMANCE.md
│ ├── SECURITY.md
│ ├── SPEC.md
│ ├── adr
│ ├── issue-templates
│ └── rfc
└── examples
    └── sample.html
Local commands
make fmt
make clippy
make test
make coverage
make coverage-catalog
make dogfood-corpus
make forum-corpus
make top100-corpus
make security-review
make compatibility
make compatibility-slo
make compatibility-slo-v2
make compatibility-backlog
make readability-lift-v2
make actionability-lift-v2
make failure-quality-v3
make index-idx-adoption-v1
make family-pack-expansion-v2
make compatibility-pack-runtime-v1
make compat-lab-bootstrap-v1
make compat-rule-synthesis-v1
make compat-pack-trust-v1
make compat-pack-hotswap-v1
make compat-pack-ci-v1
make compat-no-binary-release-v1
make live-variance-v1
make app-shell-recovery-v2
make auth-assist-v1
make challenge-failure-ux-v1
make layout-fidelity-v3
make international-text-v2
make structured-data-recovery-v1
make compat-data-plane-v2
make compatibility-recovery-gate
make performance-great
make security-best
make ux-great
make readiness-great
make security-closure-v1
make performance-capacity-v1
make ux-interaction-v1
make operability-evidence-v1
make contract-freeze-v1
make release-1-0-gate-v1
make robustness-gate
make beta-readiness
make stable-readiness
make audit
make verify
make package
make package-dry-run
make package-manifest
make package-smoke
make release-candidate-dry-run
make bench
make alpha-smoke
make run
make coverage enforces a minimum 93% line coverage using cargo-llvm-cov.
make coverage-catalog validates the fixture paths listed by the coverage program.
make dogfood-corpus validates committed and live dogfooding corpus manifests
without fetching live URLs.
make forum-corpus validates forum target-domain tiers and fixture mappings.
make top100-corpus validates top-100 target-domain rows, tiers, known-limit
classes, and fixture mappings.
make security-review validates the hostile-input abuse-case catalog and
release security checklist.
make compatibility validates terminal compatibility and accessibility
release notes.
make compatibility-slo scores top-100 and forum corpus compatibility against
release floors for readability, actionability, and failure quality.
make compatibility-slo-v2 enforces global + per-family SLO thresholds and
optional baseline delta reporting for release candidates.
make compatibility-backlog emits a deterministic top-N compatibility queue
with recommended roadmap milestone linkage.
make readability-lift-v2 validates dense-root selection, boilerplate
suppression, spacing, and code-preservation fixtures for generic extraction.
make actionability-lift-v2 validates link ranking/deduplication, forum
next-step extraction, and form-default submission modeling.
make failure-quality-v3 validates blocked-flow taxonomy coverage, deterministic
failure diagnostics, and unsupported-page no-silent-success guardrails.
make index-idx-adoption-v1 validates index idx lint, toolkit templates, and
publisher guidance assets.
make security-closure-v1 enforces threat-model/abuse-case/risk-register
closure and pack-trust fail-closed checks.
make performance-capacity-v1 composes strict benchmark budgets and runtime
stage-policy coverage checks.
make ux-interaction-v1 enforces quickstart/help/start-page interaction
contracts and progress semantics.
make operability-evidence-v1 validates deterministic release-evidence bundle
generation and composed readiness evidence.
make contract-freeze-v1 enforces 1.x external contract policies for CLI,
index.idx/v1, and index.pack/v1.
make release-1-0-gate-v1 composes M96-M100 gates with full verification and
release-candidate dry-run checks for 1.0.0 decisions.
make family-pack-expansion-v2 validates family-pack confidence/fallback
behavior and fixtures for app-shell, commerce cards, and mixed-media pages.
make compatibility-pack-runtime-v1 validates index.pack/v1 runtime
schema/precedence and fail-closed behavior.
make compat-lab-bootstrap-v1 validates deterministic index-compat-lab
ingest and scaffold workflows.
make compat-rule-synthesis-v1 validates deterministic rule synthesis,
safety linting, and override-merge behavior.
make compat-pack-trust-v1 validates compatibility-pack signing and
verification workflows.
make compat-pack-hotswap-v1 validates rollback snapshots and runtime reload
attribution behavior.
make compat-pack-ci-v1 validates composed compatibility-pack canary gates.
make compat-no-binary-release-v1 validates the compatibility data-only
release runbook.
make live-variance-v1 validates deterministic live-variance aggregation from
opt-in run ledgers.
make app-shell-recovery-v2 validates app-shell recovery profile attribution,
fallback order, and stage budgets.
make auth-assist-v1 validates session-aware auth diagnostics and local
cookie import/export helpers.
make challenge-failure-ux-v1 validates deterministic blocked-flow challenge
classification and reporting.
make layout-fidelity-v3 validates spacing rhythm and pre/code fidelity
regression tests.
make international-text-v2 validates multilingual rendering/search
regression checks.
make structured-data-recovery-v1 validates structured metadata extraction
guardrails.
make compat-data-plane-v2 validates compatibility data-plane synthesis
quality and strict linting.
make compatibility-recovery-gate validates composed SLO-v2 + live-variance
recovery evidence for release decisions.
make robustness-gate validates robustness policy/report assets and composes
local corpus, security, compatibility, package-manifest, and alpha-smoke
checks into one deterministic command.
make beta-readiness validates beta support scope and composes the local
coverage, dogfooding, security, compatibility, and package-manifest gates.
make stable-readiness validates stable support policy and fixture stewardship
gates; it does not by itself mean a stable release has been earned.
make audit runs cargo audit for dependency advisory checks.
make package builds a local tarball with the binary, man page, completions, README, and license.
make package-manifest validates package source paths.
make package-smoke verifies the packaged binary outside the source tree.
make release-candidate-dry-run builds and smoke-tests a package, then writes
SHA-256 checksums under dist/SHA256SUMS.
make bench runs a local release-binary smoke benchmark without hosted CI.
index --benchmark <url-or-local-html-file> reports transform timing and cache
reuse for one input.
make alpha-smoke runs the alpha hardening smoke gate for local file,
extraction, capture, adapter, shelf, and benchmark paths. Live URL and bounded
TUI startup checks are opt-in through environment variables documented in
docs/ALPHA.md.
make performance-great enforces strict benchmark ceilings for first transform,
cached transform, and release-binary average latency.
make security-best composes abuse-case review, advisory checks, deny checks,
and targeted credential-redaction tests.
make ux-great validates quickstart/help usability paths and key command
coverage for first-session usage.
make readiness-great composes strict performance, security, UX, compatibility,
robustness, beta, and stable gates.
CLI prototype
cargo run -p index-cli
cargo run -p index-cli -- quickstart
cargo run -p index-cli -- --profile docs
cargo run -p index-cli -- https://example.org
cargo run -p index-cli -- example.org
cargo run -p index-cli -- examples/sample.html
index quickstart
With no arguments, the CLI opens a built-in start page with the core commands.
It also reads an http or https URL, a URL without a scheme, a local HTML
file, or stdin and opens the terminal UI. URL inputs without an explicit scheme
default to https://.
cat examples/sample.html | cargo run -p index-cli -- -
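The input rules described above could be modeled roughly as follows. This is a sketch under stated assumptions, not the CLI's actual argument handling: the `Input` enum and `classify_input` function are hypothetical, subcommands and flags are ignored, and the rule that an existing local path wins over a scheme-less URL is a guess about precedence.

```rust
use std::path::PathBuf;

/// Hypothetical classification of the single positional argument.
#[derive(Debug, PartialEq, Eq)]
pub enum Input {
    /// No argument: show the built-in start page.
    StartPage,
    /// "-": read HTML from stdin.
    Stdin,
    /// A fully qualified URL to fetch.
    Url(String),
    /// A local HTML file.
    File(PathBuf),
}

/// `file_exists` is injected so the sketch can be exercised without touching disk.
pub fn classify_input(arg: Option<&str>, file_exists: impl Fn(&str) -> bool) -> Input {
    match arg {
        None => Input::StartPage,
        Some("-") => Input::Stdin,
        Some(a) if a.starts_with("http://") || a.starts_with("https://") => {
            Input::Url(a.to_string())
        }
        // Assumption: a path that exists locally is treated as a file.
        Some(a) if file_exists(a) => Input::File(PathBuf::from(a)),
        // Scheme-less input defaults to https://, as described above.
        Some(a) => Input::Url(format!("https://{a}")),
    }
}
```

A caller would typically pass `|p| std::path::Path::new(p).exists()` as the second argument.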
index-core now provides reusable navigation state for history, bookmarks, session restore, redirect tracking, per-origin data, redacted response-log entries, and persisted sidebar mode preference. index-http provides a blocking HTTP fetcher, form submission transport for GET and application/x-www-form-urlencoded POST responses, deterministic cache paths, a filesystem cache for text responses, and SecureFetcher policy enforcement before content reaches the transformer.
index-headless defines the snapshot fallback boundary for JavaScript-heavy pages: timeout policy, script/network permissions, sandbox requirements, DOM snapshots, accessibility snapshots, and deterministic failure values. It does not embed a browser engine.
When an accessibility snapshot carries enough semantic roles, the transformer
uses it before DOM text, maps roles into Index nodes, merges rendered DOM links,
and falls back to DOM extraction when the accessibility tree is sparse.
Authentication state is modeled in index-core: cookies are isolated by origin, secure cookies require HTTPS, login form actions are checked by origin policy, logout clears session cookies, and diagnostics can redact known credentials. Real transport remains a later integration point.
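A minimal sketch of the cookie rules in this paragraph, assuming a simple in-memory jar. The `Origin`, `Cookie`, and `CookieJar` types are hypothetical and do not reflect the real index-core API; login form origin checks are left out.

```rust
use std::collections::HashMap;

/// Hypothetical origin key: scheme, host, and port together.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Origin {
    pub scheme: String,
    pub host: String,
    pub port: u16,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cookie {
    pub name: String,
    pub value: String,
    pub secure: bool,
    pub session: bool,
}

#[derive(Default)]
pub struct CookieJar {
    by_origin: HashMap<Origin, Vec<Cookie>>,
}

impl CookieJar {
    /// Stores a cookie for one origin; secure cookies are rejected unless the origin is HTTPS.
    pub fn store(&mut self, origin: &Origin, cookie: Cookie) -> bool {
        if cookie.secure && origin.scheme != "https" {
            return false;
        }
        let slot = self.by_origin.entry(origin.clone()).or_default();
        slot.retain(|existing| existing.name != cookie.name);
        slot.push(cookie);
        true
    }

    /// Returns only the cookies stored for exactly this origin.
    pub fn cookies_for(&self, origin: &Origin) -> Vec<&Cookie> {
        self.by_origin
            .get(origin)
            .map(|cookies| cookies.iter().collect())
            .unwrap_or_default()
    }

    /// Logout clears session cookies for the origin and keeps persistent ones.
    pub fn logout(&mut self, origin: &Origin) {
        if let Some(cookies) = self.by_origin.get_mut(origin) {
            cookies.retain(|cookie| !cookie.session);
        }
    }

    /// Replaces every known cookie value in diagnostic text with a placeholder.
    pub fn redact(&self, text: &str) -> String {
        let mut out = text.to_string();
        for cookie in self.by_origin.values().flatten() {
            // An empty pattern would match everywhere, so skip empty values.
            if !cookie.value.is_empty() {
                out = out.replace(&cookie.value, "[redacted]");
            }
        }
        out
    }
}
```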
Failure diagnostics are also modeled in index-core: failed or low-confidence
fetch, headless, and generic transform paths can produce deterministic
diagnostic documents with source, confidence, fallback information, suggested
next actions, and redacted local text suitable for fixture review.
Document quality is recorded in document metadata and shown in the TUI status
line. Quality categories are adapter, strong-generic, partial-generic,
fallback, and failed; JSON extraction includes the category, score, and
deterministic reasons so shell workflows can distinguish understood pages from
fallback documents.
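The five category labels can be represented as a small enum, which is one way a consumer of the JSON output might parse them. This is an illustrative sketch: the `Quality` type is hypothetical, and the grouping in `is_understood` (adapter and generic paths count as understood) is an assumption rather than documented behavior.

```rust
/// Hypothetical mirror of the quality categories listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quality {
    Adapter,
    StrongGeneric,
    PartialGeneric,
    Fallback,
    Failed,
}

impl Quality {
    /// The label as it appears in the category list above.
    pub fn label(self) -> &'static str {
        match self {
            Quality::Adapter => "adapter",
            Quality::StrongGeneric => "strong-generic",
            Quality::PartialGeneric => "partial-generic",
            Quality::Fallback => "fallback",
            Quality::Failed => "failed",
        }
    }

    /// Parses a label back into a category; unknown labels yield None.
    pub fn parse(label: &str) -> Option<Quality> {
        [
            Quality::Adapter,
            Quality::StrongGeneric,
            Quality::PartialGeneric,
            Quality::Fallback,
            Quality::Failed,
        ]
        .into_iter()
        .find(|quality| quality.label() == label)
    }

    /// Assumption: everything except fallback and failed counts as "understood".
    pub fn is_understood(self) -> bool {
        matches!(
            self,
            Quality::Adapter | Quality::StrongGeneric | Quality::PartialGeneric
        )
    }
}
```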
Security hardening is modeled in index-security: content-size limits,
decompression expansion checks, redirect-loop detection, and URL scheme policy
tests are reusable by fetchers and entry points. index-http exposes
SecureFetcher for applying those checks before content reaches the transformer.
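Three of these checks (URL scheme policy, content-size limits, and redirect-loop detection) can be sketched as pure functions that run before any content is handed on. The names `SecurityPolicy` and `PolicyError` are hypothetical and the limits are placeholders; decompression expansion checks are omitted, and the real index-security API may look quite different.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    SchemeDenied(String),
    BodyTooLarge { limit: usize, actual: usize },
    RedirectLoop(String),
    TooManyRedirects,
}

/// Hypothetical policy values; a host would choose its own limits.
pub struct SecurityPolicy {
    pub max_body_bytes: usize,
    pub max_redirects: usize,
}

impl SecurityPolicy {
    /// Only http and https URLs are allowed through.
    pub fn check_scheme(&self, url: &str) -> Result<(), PolicyError> {
        match url.split_once("://") {
            Some(("http" | "https", _)) => Ok(()),
            Some((scheme, _)) => Err(PolicyError::SchemeDenied(scheme.to_string())),
            None => Err(PolicyError::SchemeDenied(String::new())),
        }
    }

    /// Rejects bodies larger than the configured limit.
    pub fn check_size(&self, actual: usize) -> Result<(), PolicyError> {
        if actual > self.max_body_bytes {
            return Err(PolicyError::BodyTooLarge {
                limit: self.max_body_bytes,
                actual,
            });
        }
        Ok(())
    }

    /// `chain` is the initial URL followed by each redirect target, in order.
    pub fn check_redirects(&self, chain: &[&str]) -> Result<(), PolicyError> {
        if chain.len() > self.max_redirects + 1 {
            return Err(PolicyError::TooManyRedirects);
        }
        let mut seen = HashSet::new();
        for url in chain {
            self.check_scheme(url)?;
            // Revisiting a URL already in the chain means the redirects loop.
            if !seen.insert(*url) {
                return Err(PolicyError::RedirectLoop((*url).to_string()));
            }
        }
        Ok(())
    }
}
```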
The static reader currently extracts main content, headings, links, code blocks, structured tables, image alt text, canonical URLs, descriptions, and OpenGraph title/description metadata from fetched or local HTML input.
It also preserves lists and simple forms as semantic actions with fields, buttons, methods, and resolved actions. In the TUI, e opens form editing, tab changes fields, and enter submits through the host fetch boundary.
When live image bytes are reachable, image nodes are rendered as bounded black-and-white dither previews with an explicit source link.
For scriptable output:
cargo run -p index-cli -- --plain examples/sample.html
curl -sS https://example.org | cargo run -p index-cli -- --plain -
cat examples/sample.html | cargo run -p index-cli -- --plain -
cargo run -p index-cli -- --plain https://example.org
cargo run -p index-cli -- --plain example.org
cargo run -p index-cli -- --extract markdown examples/sample.html
cargo run -p index-cli -- --extract links examples/sample.html
cargo run -p index-cli -- --extract json examples/sample.html
index-extract emits deterministic Markdown, stable numeric link lists, and JSON shaped from the Index Document Model, including table headers and row labels derived from structured table rows. It also classifies :pipe commands without executing them: safe commands require :pipe --confirm <cmd>, while shell syntax and unapproved programs are denied.
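The :pipe policy described above amounts to a pure classification step that never spawns a process. A minimal sketch, assuming a fixed allowlist: the `PipeDecision` type, the `classify_pipe` function, the approved program list, and the set of characters treated as shell syntax are all hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PipeDecision {
    /// Safe command, but the user has not passed --confirm yet.
    NeedsConfirmation,
    /// Safe command with explicit confirmation.
    Confirmed,
    /// Never allowed; the string names the reason.
    Denied(&'static str),
}

// Hypothetical allowlist; the real policy may approve different programs.
const APPROVED: &[&str] = &["grep", "wc", "sort", "head", "less"];

// Characters treated as shell syntax in this sketch.
const SHELL_SYNTAX: &[char] = &['|', ';', '&', '>', '<', '`', '$', '(', ')'];

/// Classifies a `:pipe` command line without executing anything.
pub fn classify_pipe(input: &str) -> PipeDecision {
    let rest = match input.trim().strip_prefix(":pipe") {
        Some(rest) => rest.trim(),
        None => return PipeDecision::Denied("not a :pipe command"),
    };
    let (confirmed, cmd) = match rest.strip_prefix("--confirm") {
        Some(cmd) => (true, cmd.trim()),
        None => (false, rest),
    };
    if cmd.contains(SHELL_SYNTAX) {
        return PipeDecision::Denied("shell syntax");
    }
    match cmd.split_whitespace().next() {
        Some(program) if APPROVED.iter().any(|approved| *approved == program) => {
            if confirmed {
                PipeDecision::Confirmed
            } else {
                PipeDecision::NeedsConfirmation
            }
        }
        Some(_) => PipeDecision::Denied("unapproved program"),
        None => PipeDecision::Denied("empty command"),
    }
}
```

Keeping classification separate from execution means the decision can be unit-tested and shown to the user before anything runs.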
For offline knowledge workflows:
cargo run -p index-cli -- --save markdown examples/sample.html notes.md
cargo run -p index-cli -- --save json examples/sample.html notes.json
cargo run -p index-cli -- --citations examples/sample.html
cargo run -p index-cli -- --section "Overview" examples/sample.html
cargo run -p index-cli -- --batch-extract markdown examples/sample.html artifact.txt
--save writes deterministic Markdown or JSON to a local file. --citations
emits stable TSV references for external HTTP(S) links. --section exports the
first matching heading or section as Markdown. --batch-extract works only on
local files and local capture artifacts; it does not fetch URLs.
For deterministic local AI-style transforms:
cargo run -p index-cli -- --ai-offline explain examples/sample.html
cargo run -p index-cli -- --ai-offline summarize examples/sample.html
cargo run -p index-cli -- --ai-offline extract examples/sample.html
index-ai defines the provider trait, versioned prompt templates, privacy modes, mock provider, and offline fallback. It performs no network IO; external providers must be integrated explicitly by a host and receive content only after a user invokes an AI action.
For local performance checks:
cargo run -p index-cli -- --benchmark examples/sample.html
curl -sS https://example.org | cargo run -p index-cli -- --benchmark -
The benchmark report is local, machine-readable, and includes input bytes, document counts, transform timing, and transformed-cache reuse.
For local capture artifacts:
cargo run -p index-cli -- capture --redact https://example.org/page examples/sample.html
cargo run -p index-cli -- capture --redact example.org/page examples/sample.html
cargo run -p index-cli -- capture --preview --redact https://example.org/page examples/sample.html
cat artifact.txt | cargo run -p index-cli -- capture --validate -
cat examples/sample.html | cargo run -p index-cli -- capture --redact https://example.org/page -
index-capture validates the source URL, redacts credential-shaped URLs,
cookies, form values, and diagnostics, then emits a deterministic local artifact
for review. Preview output includes a redaction summary and fixture submission
checklist; validation confirms local bundles remain parseable and redacted. It
does not fetch or upload anything.
Open and submit workflows report real runtime stages with target context:
queued, fetching, snapshotting, parsing, transforming, scoring,
storing, done, and failed. See docs/ASYNC_STAGES.md.
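The stage names listed above map naturally onto an enum with two terminal states. The sketch below is illustrative only: the `Stage` type and the `progress_line` format are hypothetical, and docs/ASYNC_STAGES.md remains the reference for the real semantics.

```rust
/// Hypothetical mirror of the runtime stages named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Queued,
    Fetching,
    Snapshotting,
    Parsing,
    Transforming,
    Scoring,
    Storing,
    Done,
    Failed,
}

impl Stage {
    pub fn label(self) -> &'static str {
        match self {
            Stage::Queued => "queued",
            Stage::Fetching => "fetching",
            Stage::Snapshotting => "snapshotting",
            Stage::Parsing => "parsing",
            Stage::Transforming => "transforming",
            Stage::Scoring => "scoring",
            Stage::Storing => "storing",
            Stage::Done => "done",
            Stage::Failed => "failed",
        }
    }

    /// Done and failed end a workflow; every other stage is still in progress.
    pub fn is_terminal(self) -> bool {
        matches!(self, Stage::Done | Stage::Failed)
    }
}

/// Assumed display format: the stage label followed by the target context.
pub fn progress_line(stage: Stage, target: &str) -> String {
    format!("[{}] {}", stage.label(), target)
}
```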
Sites can optionally publish index.idx/v1 as a same-origin manifest for safe
presentation hints (docs/INDEX_IDX_PROTOCOL.md). Manifest hints are bounded,
validated, and fail closed.
For local manifest validation:
index idx lint docs/index-idx/examples/article.index.idx.json https://example.org/docs/page
index idx lint docs/index-idx/examples/search.index.idx.json https://example.org/search?q=index
index idx lint docs/index-idx/examples/forum.index.idx.json https://example.org/forum/thread/42
For runtime compatibility-pack operations:
index compatibility-pack lint docs/compat-packs/examples/social-community.pack.json https://news.ycombinator.com/item?id=1
index compatibility-pack inspect https://news.ycombinator.com/item?id=1
index compatibility-pack install docs/compat-packs/examples/social-community.pack.json --user
index compatibility-pack list
For compatibility data-plane authoring:
index-compat-lab ingest --top100 docs/top100-corpus/matrix.tsv --forum docs/forum-corpus/matrix.tsv
index-compat-lab synthesize --top100 docs/top100-corpus/matrix.tsv --forum docs/forum-corpus/matrix.tsv --family social-community
index-compat-lab scaffold --top100 docs/top100-corpus/matrix.tsv --forum docs/forum-corpus/matrix.tsv --family social-community
index-compat-lab lint docs/compat-packs/examples/social-community.pack.json
For compatibility recovery diagnostics:
index compatibility-live-variance --targets docs/compat-live/targets.tsv --runs docs/compat-live/runs.tsv --window 5
index compatibility-recovery-plan chatgpt.com
index compatibility-recovery-gate --top100 docs/top100-corpus/matrix.tsv --forum docs/forum-corpus/matrix.tsv --live-targets docs/compat-live/targets.tsv --live-runs docs/compat-live/runs.tsv
index auth-assist diagnose-submit https://news.ycombinator.com/login 403 "csrf token expired"
index challenge-diagnose https://example.org blocked-flow.html
For installed binary verification and runtime locations:
index --version
index --paths
index doctor
index artifact inspect https://example.org/docs
Runtime locations follow XDG conventions: $XDG_CONFIG_HOME/index,
$XDG_CACHE_HOME/index, and $XDG_STATE_HOME/index, with $HOME/.config,
$HOME/.cache, and $HOME/.local/state fallbacks.
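The lookup order can be sketched as a function over an injected environment reader, which keeps it testable. The `RuntimePaths` struct and both function names are hypothetical; treating an empty XDG variable as unset follows the XDG Base Directory Specification and is an assumption about Index's behavior.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub struct RuntimePaths {
    pub config: PathBuf,
    pub cache: PathBuf,
    pub state: PathBuf,
}

/// Resolves one directory: the XDG variable if set, else a path under $HOME.
fn xdg_dir(
    env: &impl Fn(&str) -> Option<String>,
    var: &str,
    home_fallback: &str,
) -> Option<PathBuf> {
    match env(var).filter(|value| !value.is_empty()) {
        Some(dir) => Some(PathBuf::from(dir).join("index")),
        None => env("HOME").map(|home| PathBuf::from(home).join(home_fallback).join("index")),
    }
}

/// Returns None when neither the XDG variable nor HOME is available.
pub fn runtime_paths(env: impl Fn(&str) -> Option<String>) -> Option<RuntimePaths> {
    Some(RuntimePaths {
        config: xdg_dir(&env, "XDG_CONFIG_HOME", ".config")?,
        cache: xdg_dir(&env, "XDG_CACHE_HOME", ".cache")?,
        state: xdg_dir(&env, "XDG_STATE_HOME", ".local/state")?,
    })
}
```

In a real program the reader would be `|key| std::env::var(key).ok()`.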
index doctor emits a local telemetry-free support report with redacted runtime
paths, directory health checks, package/version guidance, and no network probe.
index artifact inspect reports local artifact presence and freshness by
context (live-get, live-submit, offline) from the cache artifact store.
Reader profiles change terminal presentation without changing extracted semantic content:
index --profile reader
index --profile docs https://example.org/manual
index --profile links example.org
Inside the TUI, use :profile reader|docs|links|research|compact|verbose.
Index starts in automatic profile mode and suggests docs, links,
research, or reader from the current page intent. Use :profile auto to
return to automatic selection after a manual override.
Theme tokens cover semantic roles, markdown emphasis, diagnostics, links, and
regions. True-color terminals get the richest palette; ANSI and monochrome
terminals fall back to deterministic named colors and modifiers.
Packaging assets live in:
- docs/man/index.1
- completions/index.bash
- completions/index.zsh
- completions/index.fish
- docs/packaging/CRATES_IO.md
- docs/packaging/DISTROS.md
- docs/packaging/CLEAN_INSTALL.md
- docs/BRANDING.md
- assets/white-icon.png
- assets/black-icon.png
- assets/white-banner.png
- assets/black-banner.png
Branding:
- App icon: assets/white-icon.png
- README banner: assets/white-banner.png
- Official interface font: JetBrainsMono Nerd Font Mono
Production-readiness policies live in:
- docs/COMPATIBILITY.md
- docs/COMPATIBILITY_VALIDATION.md
- docs/ACCESSIBILITY.md
- docs/MSRV.md
- docs/ADAPTER_STABILITY.md
- docs/ADAPTER_PRIORITY.md
- docs/ADAPTER_HARNESS.md
- docs/ADAPTER_DISCIPLINE.md
- docs/DIAGNOSTICS.md
- docs/DOCTOR.md
- docs/FAILURE_HANDOFF.md
- docs/INTERNATIONAL_TEXT.md
- docs/ALPHA.md
- docs/BETA.md
- docs/BETA_READINESS_REPORT.md
- docs/STABLE.md
- docs/STABLE_READINESS_REPORT.md
- docs/KNOWN_LIMITS.md
- docs/DOGFOODING.md
- docs/dogfooding/CORPUS.md
- docs/RELEASE.md
- docs/RELEASE_NOTES_TEMPLATE.md
- docs/NETWORK.md
- docs/SECURITY_REVIEW.md
- docs/ABUSE_CASES.md
- docs/PERFORMANCE.md
- docs/QUALITY.md
- docs/SNAPSHOT_POLICY.md
- docs/ARTIFACT_RUNTIME.md
- docs/CAPTURE.md
- docs/OFFLINE.md
- docs/issue-templates/
Coverage program docs live in:
- docs/COVERAGE_PROGRAM.md
- docs/COVERAGE_CATALOG.md
- docs/FIXTURE_INTAKE.md
- docs/FIXTURE_MATRIX.md
- docs/SITE_FAMILY_PACKS.md
- docs/CAPTURE.md
- docs/forum-corpus/
- docs/top100-corpus/
Local knowledge shelf commands:
index shelf save examples/sample.html
index shelf list
index shelf show <id>
index shelf search borrowing
index shelf search --format markdown borrowing
index shelf search --format json borrowing
index shelf tag <id> docs
index shelf note <id> "read before release"
Shelf metadata is stored under $XDG_STATE_HOME/index/shelf with
$HOME/.local/state/index/shelf as the fallback. Markdown and JSON exports live
under the shelf exports/ directory.
Search is local-only and ranks title, tag, note, citation, source URL, Markdown
heading, and Markdown body matches deterministically.
Adapter fixture review:
index adapter check crates/index-transformer/tests/fixtures/adapters/gitlab-project.html
The report is deterministic text with the detected adapter, support tier, quality, node/link/form/table/region counts, fallback reason, fixture checklist reference, and Markdown extraction snapshot.
TUI keys:
- j/k: scroll
- gg/G: top/bottom
- /: search
- f: link hints
- l: toggle the right sidebar
- e: edit the next form field near the current line
- tab/shift-tab: move between fields while editing a form
- enter: submit the current form while editing a form
- esc: cancel form editing
- t: toggle compact/detail table mode
- [/]: shift table columns while the sidebar is closed
- [/]: switch sidebar modes while the sidebar is open
- 1-6: choose links, outline, forms, regions, search, or logs sidebar modes
- j/k: select sidebar items while the sidebar is open
- enter: open links, jump to outline/forms/search items, or expand/collapse selected regions
- e: edit the selected form while the forms sidebar is open
- space: expand/collapse the selected region in the regions sidebar
- b: go back to the previous page
- :back: go back to the previous page
- :open <id>: fetch and render a stable numeric link target
- :open <url>: fetch and render an explicit URL; tab completes from current-session URL history while typing
- :logs: show the hidden local response-log sidebar with redacted server response previews
- :submit <form> field=value: resolve a form submission action
- :extract markdown|links|json: request a document extraction action
- :pipe <cmd>: request a confirmed pipe action
- :ai explain|summarize|extract: request an explicit AI action
- :profile reader|docs|links|research|compact|verbose|auto: switch visual profile
- :capture preview: review local capture redactions for the current page
- :capture save <path>: save a local capture artifact for the current page
- :main next / :main previous: jump between plausible main regions
- :hide region <id> / :show region <id>: collapse or restore a region
- :promote section <id>: focus one region as the temporary main view
- :quit: quit
Architectural rule
All external formats must be converted into the Index Document Model before rendering.
No renderer should parse HTML directly. No adapter should write terminal escape sequences directly. No transformer should know about terminal layout constraints.
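One way to picture this rule is as two traits that meet only at the document model: transformers produce it, renderers consume it, and neither sees the other's input or output format. The sketch below uses hypothetical trait and type names and a toy two-variant node type; it is not the real IndexDocument definition.

```rust
/// Toy stand-in for the Index Document Model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    Heading { level: u8, text: String },
    Paragraph(String),
}

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct IndexDocument {
    pub nodes: Vec<Node>,
}

/// Transformers see external formats and know nothing about terminals.
pub trait Transformer {
    fn transform(&self, input: &str) -> IndexDocument;
}

/// Renderers see only the document model and never parse source formats.
pub trait Renderer {
    fn render(&self, doc: &IndexDocument) -> String;
}

/// Toy transformer: lines starting with "# " become level-1 headings.
pub struct LineTransformer;

impl Transformer for LineTransformer {
    fn transform(&self, input: &str) -> IndexDocument {
        let nodes = input
            .lines()
            .filter(|line| !line.trim().is_empty())
            .map(|line| match line.strip_prefix("# ") {
                Some(text) => Node::Heading { level: 1, text: text.to_string() },
                None => Node::Paragraph(line.to_string()),
            })
            .collect();
        IndexDocument { nodes }
    }
}

/// Toy renderer: headings are upper-cased, paragraphs pass through unchanged.
pub struct PlainRenderer;

impl Renderer for PlainRenderer {
    fn render(&self, doc: &IndexDocument) -> String {
        doc.nodes
            .iter()
            .map(|node| match node {
                Node::Heading { text, .. } => text.to_uppercase(),
                Node::Paragraph(text) => text.clone(),
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

Swapping `PlainRenderer` for a terminal renderer, or `LineTransformer` for an HTML-backed one, requires no change on the other side of the boundary.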
Community
- Contributing Guide
- Code of Conduct
- Security Policy
- Compatibility Matrix
- Coverage Program
- Issue Templates
License
Unlicense.