Bayern99/llm-pack

llm-pack

llm-pack is a personal-local, high-performance document ingestion tool that converts supported office files into structured .llm/ packages for downstream LLM workflows.

It is intentionally narrow:

  • supported inputs: .docx, .pptx, .xlsx, .pdf
  • supported outputs: content.md, document.json, report.json, optional tables/, optional assets/
  • supported surfaces: CLI and desktop wrapper
  • permanent non-goals: OCR, LibreOffice, format conversion, cloud services, drag-and-drop upload

Status

| Surface   | Status   | Notes |
| --------- | -------- | ----- |
| CLI       | Verified | Main supported interface for local batch conversion |
| Desktop   | Verified | Tauri wrapper over the same local runner; system dialogs only |
| Contracts | Verified | document.json and report.json both ship with schema 0.3.0 |
| CI        | Verified | Rust workspace checks plus dedicated desktop build / shell compile gate |

Version notes live in CHANGELOG.md. GitHub Release notes should be copied from the matching changelog entry.

Product Boundary

llm-pack is for local ingestion, not office interoperability. It does not try to preserve visual fidelity, and it does not attempt to “fix” image-only documents.

  • No OCR, now or later
  • No .doc, .ppt, .xls
  • No drag-and-drop upload in the desktop UI
  • No hosted API, sync, or multi-user workflow

If a PDF has no embedded text, the conversion can still complete with warnings, but no OCR fallback is attempted.

Quick Start

CLI

Install:

cargo install --path apps/cli --locked

Run without installing:

cargo run -p llm-pack-cli -- tests/fixtures/docx/01-minimal.docx -o /tmp/llm-pack-out

Useful commands:

llm-pack tests/fixtures/pptx/01-minimal.pptx -o /tmp/llm-pack-out
llm-pack tests/fixtures/xlsx/01-single-sheet.xlsx -o /tmp/llm-pack-out
llm-pack tests/fixtures/pdf/01-minimal.pdf -o /tmp/llm-pack-out
llm-pack cache stats
llm-pack cache prune --max-gb 2
llm-pack watch

Supported flags:

  • -o, --output <DIR>: output directory for generated .llm packages
  • --recursive: scan directory inputs recursively for supported files
  • --no-cache: bypass the content-hash cache
  • --jobs <N>: override configured parallelism
  • --config <FILE>: load a specific llm-pack.toml

Unsupported extensions return UNSUPPORTED_FORMAT with exit code 2.
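
Because an unsupported extension ends the run with exit code 2, a calling script may want to filter its inputs first. The sketch below is illustrative only: `split_by_support` is a hypothetical helper, not part of llm-pack, and it assumes extension matching is case-insensitive, which the CLI may or may not do.

```python
from pathlib import Path

# Extensions taken from the supported-inputs list above.
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".pdf"}


def split_by_support(paths):
    """Partition paths into (supported, unsupported) by file extension."""
    supported, unsupported = [], []
    for raw in paths:
        path = Path(raw)
        # Assumption: matching is case-insensitive; the CLI may differ.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

Only the supported list would then be passed to `llm-pack`; legacy formats such as .doc land in the unsupported list instead of triggering UNSUPPORTED_FORMAT.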

Desktop

From the repo root:

cd apps/desktop
npm ci
npm run build
cargo check --manifest-path src-tauri/Cargo.toml

To run locally:

cd apps/desktop
npm run tauri dev

Desktop scope:

  • choose files via system dialog
  • choose an input folder via system dialog
  • choose an output directory via system dialog
  • view per-file progress and reveal generated packages in Finder

No drag-and-drop upload is supported.

Package Contract

Each conversion writes one package:

<stem>.llm/
├── content.md
├── document.json
├── report.json
├── tables/      # optional
└── assets/      # optional

Contract rules:

  • package-internal artifact paths are relative
  • report.json.output.package_dir is a filesystem path to the generated package root
  • document.json and report.json both use schema version 0.3.0

Schemas live under schemas/. Architecture and ADR references are listed below.
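
A downstream consumer can check a package against the layout above before reading it. This is a minimal sketch, not the project's own validator: `inspect_package` and `resolve_artifact` are hypothetical names, and the only facts taken from the contract are the three required files, the two optional directories, and the rule that package-internal paths are relative.

```python
from pathlib import Path

REQUIRED_FILES = ("content.md", "document.json", "report.json")
OPTIONAL_DIRS = ("tables", "assets")


def inspect_package(package_dir):
    """Report which contract artifacts exist in a .llm package directory."""
    root = Path(package_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    present_optional = [name for name in OPTIONAL_DIRS if (root / name).is_dir()]
    return {"missing": missing, "optional": present_optional}


def resolve_artifact(package_dir, relative_path):
    """Resolve a package-internal relative path against the package root."""
    rel = Path(relative_path)
    if rel.is_absolute():
        # The contract says package-internal artifact paths are relative.
        raise ValueError(f"artifact path must be relative: {relative_path}")
    return Path(package_dir) / rel
```

An empty `missing` list means the three required files are present; it says nothing about whether their contents match the schemas under schemas/.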

Validation

Run this matrix before claiming a change is done:

./scripts/validate-schemas.sh
./scripts/check-fixtures.sh
cargo fmt --all -- --check
cargo clippy --workspace -- -D warnings
cargo test --workspace
npm --prefix apps/desktop run build
cargo check --manifest-path apps/desktop/src-tauri/Cargo.toml
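
For convenience, the matrix can be driven from one script. The sketch below is an assumption about how one might wrap it, not a script that ships with the repository: `run_matrix` is a hypothetical helper, and stopping at the first failing step is a design choice made here rather than a documented policy.

```python
import subprocess

# Commands copied from the validation matrix, in the listed order.
VALIDATION_STEPS = [
    ["./scripts/validate-schemas.sh"],
    ["./scripts/check-fixtures.sh"],
    ["cargo", "fmt", "--all", "--", "--check"],
    ["cargo", "clippy", "--workspace", "--", "-D", "warnings"],
    ["cargo", "test", "--workspace"],
    ["npm", "--prefix", "apps/desktop", "run", "build"],
    ["cargo", "check", "--manifest-path", "apps/desktop/src-tauri/Cargo.toml"],
]


def run_matrix(steps=VALIDATION_STEPS, run=subprocess.call):
    """Run each step in order; return the first failing command, or None."""
    for command in steps:
        # subprocess.call returns the process exit code; non-zero means failure.
        if run(command) != 0:
            return command
    return None
```

Run it from the repo root so the relative script and manifest paths resolve. The `run` parameter exists only so the loop can be exercised with a stand-in instead of the real tools.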

Warning Philosophy

Warnings are first-class output, not hidden logs.

  • non-fatal quality issues remain status=completed
  • successful conversions keep code=SUCCESS
  • warnings are preserved in both document.json and report.json
  • example: spreadsheet formulas are exported as displayed values and surfaced as non-fatal warnings

QUALITY_DEGRADED remains reserved for a stricter policy, but is not currently emitted.
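
Since a conversion can succeed and still carry warnings, a consumer should not treat SUCCESS as "nothing to review". The sketch below shows one way to separate the two. The key names `status`, `code`, and `warnings` are assumptions inferred from the wording above, not confirmed against the 0.3.0 schema, and `summarize_report` is a hypothetical helper.

```python
def summarize_report(report):
    """Classify a parsed report.json dict using the warning rules above.

    Assumed keys (not confirmed by the schema): "status", "code", "warnings".
    """
    warnings = report.get("warnings", [])
    succeeded = (
        report.get("status") == "completed" and report.get("code") == "SUCCESS"
    )
    return {
        "succeeded": succeeded,
        # A conversion can succeed and still carry warnings worth reviewing.
        "needs_review": succeeded and len(warnings) > 0,
        "warning_count": len(warnings),
    }
```

Check the real field names in schemas/ before relying on this; if they differ, only the three lookups need to change.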

Architecture Pointers

Known Limitations

  • PDF extraction is text-first and does not reconstruct advanced layout semantics
  • scanned or image-only PDFs will not be OCR-processed
  • desktop is a local wrapper, not a separate conversion engine
  • historical docs under docs/superpowers/specs/ may describe earlier phases; treat the ADRs and this README as the current source of truth
