llm-pack is a personal-local, high-performance document ingestion tool that converts supported office files into structured .llm/ packages for downstream LLM workflows.
It is intentionally narrow:
- supported inputs: `.docx`, `.pptx`, `.xlsx`, `.pdf`
- supported outputs: `content.md`, `document.json`, `report.json`, optional `tables/`, optional `assets/`
- supported surfaces: CLI and desktop wrapper
- permanent non-goals: OCR, LibreOffice, format conversion, cloud services, drag-and-drop upload
| Surface | Status | Notes |
|---|---|---|
| CLI | Verified | Main supported interface for local batch conversion |
| Desktop | Verified | Tauri wrapper over the same local runner; system dialogs only |
| Contracts | Verified | document.json and report.json both ship with schema 0.3.0 |
| CI | Verified | Rust workspace checks plus dedicated desktop build / shell compile gate |
Version notes live in CHANGELOG.md. GitHub Releases should copy from the matching changelog entry.
llm-pack is for local ingestion, not office interoperability. It does not try to preserve visual fidelity, and it does not attempt to “fix” image-only documents.
- No OCR, now or later
- No `.doc`, `.ppt`, `.xls`
- No drag-and-drop upload in the desktop UI
- No hosted API, sync, or multi-user workflow
If a PDF has no embedded text, the conversion can still complete with warnings, but no OCR fallback is attempted.
Install the CLI:

```bash
cargo install --path apps/cli --locked
```

Run without installing:

```bash
cargo run -p llm-pack-cli -- tests/fixtures/docx/01-minimal.docx -o /tmp/llm-pack-out
```

Useful commands:

```bash
llm-pack tests/fixtures/pptx/01-minimal.pptx -o /tmp/llm-pack-out
llm-pack tests/fixtures/xlsx/01-single-sheet.xlsx -o /tmp/llm-pack-out
llm-pack tests/fixtures/pdf/01-minimal.pdf -o /tmp/llm-pack-out
llm-pack cache stats
llm-pack cache prune --max-gb 2
llm-pack watch
```

Supported flags:
- `-o, --output <DIR>`: output directory for generated `.llm` packages
- `--recursive`: scan directory inputs recursively for supported files
- `--no-cache`: bypass the content-hash cache
- `--jobs <N>`: override configured parallelism
- `--config <FILE>`: load a specific `llm-pack.toml`
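A content-hash cache keys results by the bytes of the input file rather than by its path or timestamp, so renaming or touching a file does not force a re-conversion, while `--no-cache` skips the lookup entirely. The sketch below only illustrates that idea: `ContentCache`, `content_key`, and `get_or_convert` are hypothetical names, and 64-bit FNV-1a is a stand-in because the hash llm-pack actually uses is not documented here.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a over the raw file bytes (stand-in hash, not llm-pack's).
/// Identical bytes always yield the same key, whatever the file is called.
fn content_key(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical in-memory cache mapping content keys to converted output.
struct ContentCache {
    entries: HashMap<u64, String>,
}

impl ContentCache {
    fn new() -> Self {
        ContentCache { entries: HashMap::new() }
    }

    /// Reuses output for identical bytes. With `no_cache` set (the
    /// `--no-cache` case), `convert` always runs and the cache is untouched.
    fn get_or_convert<F>(&mut self, bytes: &[u8], no_cache: bool, convert: F) -> String
    where
        F: FnOnce(&[u8]) -> String,
    {
        if no_cache {
            return convert(bytes);
        }
        let key = content_key(bytes);
        self.entries
            .entry(key)
            .or_insert_with(|| convert(bytes))
            .clone()
    }
}
```

A real cache would persist entries on disk (which is what `llm-pack cache stats` and `cache prune` suggest) and would more likely use a cryptographic hash to make collisions negligible.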
Unsupported extensions return `UNSUPPORTED_FORMAT` with exit code 2.
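The input rules can be sketched as an extension filter shared by two paths: a file named explicitly is rejected with `UNSUPPORTED_FORMAT` and exit code 2, while a directory scan keeps only supported files and descends into subdirectories when `--recursive` is given. This is a minimal sketch, assuming that directory scans skip unsupported files silently and that extension matching is case-insensitive; `is_supported`, `check_input`, and `collect_inputs` are hypothetical names, not llm-pack's code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Input extensions this README lists as supported.
const SUPPORTED: [&str; 4] = ["docx", "pptx", "xlsx", "pdf"];

/// Exit code documented for unsupported extensions.
const EXIT_UNSUPPORTED_FORMAT: i32 = 2;

/// Case-insensitive extension check (the case handling is an assumption).
fn is_supported(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| SUPPORTED.iter().any(|s| s.eq_ignore_ascii_case(ext)))
        .unwrap_or(false)
}

/// Hypothetical check for a file named explicitly on the command line:
/// unsupported extensions map to the documented error code and exit status.
fn check_input(path: &Path) -> Result<(), (&'static str, i32)> {
    if is_supported(path) {
        Ok(())
    } else {
        Err(("UNSUPPORTED_FORMAT", EXIT_UNSUPPORTED_FORMAT))
    }
}

/// Hypothetical directory scan: keeps supported files, skips the rest,
/// and descends into subdirectories only when `recursive` is set.
fn collect_inputs(dir: &Path, recursive: bool, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if recursive {
                collect_inputs(&path, recursive, out)?;
            }
        } else if is_supported(&path) {
            out.push(path);
        }
    }
    Ok(())
}
```

A caller would pass the `i32` from the `Err` arm to `std::process::exit`. Note that `fs::read_dir` yields entries in no guaranteed order, so a real runner would likely sort the collected paths for reproducible output.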
To build the desktop wrapper, from the repo root:

```bash
cd apps/desktop
npm ci
npm run build
cargo check --manifest-path src-tauri/Cargo.toml
```

To run locally:

```bash
cd apps/desktop
npm run tauri dev
```

Desktop scope:
- choose files via system dialog
- choose an input folder via system dialog
- choose an output directory via system dialog
- view per-file progress and reveal generated packages in Finder
No drag-and-drop upload is supported.
Each conversion writes one package:
```text
<stem>.llm/
├── content.md
├── document.json
├── report.json
├── tables/   # optional
└── assets/   # optional
```
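The layout above implies two simple checks: the package directory is named after the input file's stem, and the three core artifacts must always be present while `tables/` and `assets/` may be absent. A minimal sketch, assuming the package is created directly inside the `-o` output directory; `package_dir` and `missing_artifacts` are hypothetical helpers, not part of llm-pack.

```rust
use std::path::{Path, PathBuf};

/// Artifacts every package must contain; `tables/` and `assets/` are optional.
const REQUIRED: [&str; 3] = ["content.md", "document.json", "report.json"];

/// Builds `<output>/<stem>.llm` for an input file,
/// e.g. `deck.pptx` -> `<output>/deck.llm` (placement is an assumption).
fn package_dir(output: &Path, input: &Path) -> Option<PathBuf> {
    let stem = input.file_stem()?.to_str()?;
    Some(output.join(format!("{stem}.llm")))
}

/// Returns the names of required artifacts missing from a package root.
fn missing_artifacts(pkg: &Path) -> Vec<&'static str> {
    REQUIRED
        .iter()
        .copied()
        .filter(|name| !pkg.join(name).is_file())
        .collect()
}
```

An empty result from `missing_artifacts` means the package satisfies the required part of the layout; it says nothing about whether the JSON files match schema 0.3.0, which is what `./scripts/validate-schemas.sh` is for.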
Contract rules:
- package-internal artifact paths are relative
- `report.json.output.package_dir` is a filesystem path to the generated package root
- `document.json` and `report.json` both use schema version `0.3.0`
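Because artifact paths inside a package are relative while `report.json.output.package_dir` points at the package root, a consumer can locate any artifact by joining the two. The sketch below assumes the relative paths are rooted at the package directory and, as a defensive design choice of this example only, rejects absolute paths and `..` components; `resolve_artifact` and the `assets/img-1.png` file name are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins a package-internal relative path onto the package root
/// (e.g. the value of `report.json.output.package_dir`). Returns `None`
/// for empty paths, absolute paths, or paths that climb out via `..`.
fn resolve_artifact(package_dir: &Path, relative: &str) -> Option<PathBuf> {
    let rel = Path::new(relative);
    // Only plain names and `.` keep the result inside the package.
    let stays_inside = rel
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
    if rel.as_os_str().is_empty() || !stays_inside {
        return None;
    }
    Some(package_dir.join(rel))
}
```

Keeping internal paths relative is what makes a package relocatable: moving or copying `<stem>.llm/` elsewhere only changes `package_dir`, not the paths stored inside it.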
Schemas live under `schemas/`. Architecture and ADR references are listed below.
Run this matrix before claiming a change is done:

```bash
./scripts/validate-schemas.sh
./scripts/check-fixtures.sh
cargo fmt --all -- --check
cargo clippy --workspace -- -D warnings
cargo test --workspace
npm --prefix apps/desktop run build
cargo check --manifest-path apps/desktop/src-tauri/Cargo.toml
```

Warnings are first-class output, not hidden logs:
- non-fatal quality issues keep `status=completed`
- successful conversions keep `code=SUCCESS`
- warnings are preserved in both `document.json` and `report.json`
- example: spreadsheet formulas are exported as displayed values and surfaced as non-fatal warnings

`QUALITY_DEGRADED` remains reserved for a stricter policy but is not currently emitted.
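The warning policy amounts to "warnings never change the verdict": a conversion with non-fatal issues still finishes as `completed` with `SUCCESS`, and the same warning list is carried into both JSON artifacts. A minimal sketch of that rule; `Status`, `Warning`, `Outcome`, `finish_conversion`, `warnings_for_artifacts`, and the `FORMULA_AS_VALUE` warning code are hypothetical and do not reproduce the schema 0.3.0 field names.

```rust
/// Hypothetical conversion status; only `Completed` is used in this sketch.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Status {
    Completed,
    Failed,
}

/// Hypothetical non-fatal warning record.
#[derive(Debug, Clone, PartialEq)]
struct Warning {
    code: String,
    message: String,
}

#[derive(Debug)]
struct Outcome {
    status: Status,
    code: &'static str,
    warnings: Vec<Warning>,
}

/// Non-fatal issues never downgrade a successful conversion: status stays
/// `Completed`, code stays "SUCCESS", and every warning is kept.
/// "QUALITY_DEGRADED" is deliberately never produced, matching current policy.
fn finish_conversion(warnings: Vec<Warning>) -> Outcome {
    Outcome { status: Status::Completed, code: "SUCCESS", warnings }
}

/// Returns the warning lists for document.json and report.json (in that
/// order); both get the full list, so neither artifact hides anything.
fn warnings_for_artifacts(outcome: &Outcome) -> (Vec<Warning>, Vec<Warning>) {
    (outcome.warnings.clone(), outcome.warnings.clone())
}
```

A stricter policy could later map certain warning codes to `QUALITY_DEGRADED` inside `finish_conversion` without changing how warnings are recorded.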
Architecture and ADR references:

- docs/adr/001-product-boundary.md
- docs/adr/002-llm-package-layout.md
- docs/adr/003-cli-error-codes.md
- docs/adr/007-pdf-parser-lopdf.md
- docs/adr/008-document-schema-0-3-0.md
- docs/adr/009-v1-personal-ux-scope.md
- docs/adr/010-v1-config-and-cli.md
- docs/superpowers/specs/2026-05-29-v03-architecture.md
Known limitations:

- PDF extraction is text-first and does not reconstruct advanced layout semantics
- scanned or image-only PDFs will not be OCR-processed
- desktop is a local wrapper, not a separate conversion engine
- older historical docs under `docs/superpowers/specs/` may describe earlier phases; use the ADRs and this README as the current source of truth