A CLI tool that turns an inbox of PDFs into an organised, agent-searchable document library. PDFs are OCR'd, classified by Claude, given a deterministic filename, and filed into a flat library where each document sits in its own folder alongside sidecar files (OCR transcript and metadata).
- Go 1.21+
- `ocrmypdf` and `tesseract` (for OCR)
- An Anthropic API key (for document classification)
```sh
make deploy
```

This builds the binary and installs it to `/usr/local/bin/paperclaw` (requires sudo). To build without installing:

```sh
make build  # output: bin/paperclaw
```

Paths can be set with a flag or an environment variable:

| Setting | Flag | Environment variable | Default |
|---|---|---|---|
| Inbox | `--inbox` | `PAPERCLAW_INBOX` | `~/paperclaw/inbox` |
| Library | `--library` | `PAPERCLAW_LIBRARY` | `~/paperclaw/library` |

`ANTHROPIC_API_KEY` must be set in the environment for the `process` command. CLI flags take precedence over environment variables.
```sh
paperclaw process [--inbox PATH] [--library PATH]
```

Walks the inbox, OCRs each PDF, classifies it with Claude, and files it into the library. Duplicates (detected by SHA-256 hash) are skipped. Failed documents land in `library/_quarantine/`, each with a `processing_error.json` explaining the failure.

Example output:

```
3 documents processed, 1 skipped (duplicate), 0 quarantined
```
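Duplicate detection by SHA-256 can be sketched as follows. This assumes the hash is taken over the original PDF bytes (before OCR) and that the set of known IDs is loaded from the existing library; both are assumptions, and `documentID`, `hashFile` and `isDuplicate` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
)

// documentID returns the hex-encoded SHA-256 of raw bytes.
func documentID(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// hashFile streams a PDF through SHA-256 without loading it fully into memory.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// isDuplicate reports whether id was already seen and records it if not.
func isDuplicate(seen map[string]bool, id string) bool {
	if seen[id] {
		return true
	}
	seen[id] = true
	return false
}
```

Because the ID depends only on the file's bytes, re-dropping the same PDF into the inbox yields the same ID and is skipped, while a re-scan of the same paper document produces different bytes and is treated as new.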
```sh
paperclaw list [--type TYPE] [--since DATE] [--vendor VENDOR] [--overdue]
```

Filters are combinable:

| Flag | Description |
|---|---|
| `--type` | One of `invoice`, `utility_bill`, `bank_statement`, `insurance_letter`, `contract`, `government_letter`, `other` |
| `--since` | Documents dated on or after `YYYY-MM-DD` |
| `--vendor` | Substring match on vendor name (case-insensitive) |
| `--overdue` | Only documents whose `due_date` is in the past |
```sh
paperclaw list --type utility_bill --overdue
paperclaw list --vendor stadtwerke --since 2026-01-01
```

```sh
paperclaw show <id-prefix>
```

Prints the full metadata and OCR transcript for one document. The id-prefix is a short prefix of the document's SHA-256 ID; 8 or more characters are typically enough to be unambiguous.
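Prefix lookup can be sketched as a scan that accepts exactly one match and reports both "no match" and "ambiguous" as errors. `resolvePrefix` is a hypothetical name, and lowercasing the input assumes IDs are stored as lowercase hex.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvePrefix finds the single document ID that starts with prefix.
// It errors when nothing matches or when the prefix is ambiguous.
func resolvePrefix(ids []string, prefix string) (string, error) {
	prefix = strings.ToLower(prefix) // assumes IDs are lowercase hex
	var matches []string
	for _, id := range ids {
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no document matches prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q is ambiguous (%d matches)", prefix, len(matches))
	}
}
```

Rejecting ambiguous prefixes, rather than picking the first match, keeps `show` from silently returning the wrong document as the library grows.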
```sh
paperclaw search <query>
```

Full-text search across all OCR transcripts. Returns the matching documents' entries and IDs; use `paperclaw show` to fetch the full content.

```sh
paperclaw search IBAN
paperclaw search "Rechnungsnummer 2024"
```

All commands print JSON when stdout is not a TTY, so they compose cleanly with `jq` and agent tooling.
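A minimal sketch of the TTY switch using only the standard library: stdout counts as a terminal when it is a character device. This is an approximation (for instance, `/dev/null` is also a character device); `golang.org/x/term`'s `term.IsTerminal(int(os.Stdout.Fd()))` is the stricter check. `stdoutIsTerminal` and `render` are hypothetical names, and the plain-text branch stands in for whatever human-readable format PaperClaw actually prints.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// stdoutIsTerminal reports whether stdout is a character device (e.g. a TTY).
// Pipes and regular files are not, so this returns false under `| jq`.
func stdoutIsTerminal() bool {
	info, err := os.Stdout.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// render formats v as indented JSON when asJSON is set, otherwise with fmt.
func render(v any, asJSON bool) (string, error) {
	if !asJSON {
		return fmt.Sprint(v), nil
	}
	b, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

A command would then print `render(results, !stdoutIsTerminal())`, so the same invocation is readable in a shell and machine-parseable in a pipe.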
For example, to print the summary of every invoice:

```sh
paperclaw list --type invoice | jq '.[].summary'
```

A processed library looks like this:

```
~/paperclaw/library/
  process.log
  2026-04-01_stadtwerke_strom-rechnung/
    document.pdf
    transcript.md
    metadata.json
  _quarantine/
    bad-scan.pdf/
      document.pdf
      processing_error.json
```
Each `metadata.json` contains the document type, date, vendor and summary, plus optional fields (amount, currency, due date, tags, language).
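A Go struct matching the fields listed above might look like the sketch below. Only the `due_date` key is confirmed elsewhere in this README; every other JSON key name, the date format, and the numeric type for `amount` are assumptions. `Metadata`, `validTypes` and `parseMetadata` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata sketches metadata.json. Key names other than due_date are guesses.
type Metadata struct {
	Type     string   `json:"type"`
	Date     string   `json:"date"` // assumed YYYY-MM-DD
	Vendor   string   `json:"vendor"`
	Summary  string   `json:"summary"`
	Amount   *float64 `json:"amount,omitempty"` // nil when absent
	Currency string   `json:"currency,omitempty"`
	DueDate  string   `json:"due_date,omitempty"`
	Tags     []string `json:"tags,omitempty"`
	Language string   `json:"language,omitempty"`
}

// validTypes lists the document types accepted by `paperclaw list --type`.
var validTypes = map[string]bool{
	"invoice": true, "utility_bill": true, "bank_statement": true,
	"insurance_letter": true, "contract": true, "government_letter": true,
	"other": true,
}

// parseMetadata decodes metadata.json and rejects unknown document types.
func parseMetadata(data []byte) (Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(data, &m); err != nil {
		return Metadata{}, err
	}
	if !validTypes[m.Type] {
		return Metadata{}, fmt.Errorf("unknown document type %q", m.Type)
	}
	return m, nil
}
```

Using a pointer for `Amount` distinguishes "no amount" from an amount of zero; a stricter schema might store money as a string or in integer minor units to avoid floating-point rounding.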
Development commands:

```sh
make check  # format + lint + test
make test   # tests only
make lint   # lint only
```

Pre-commit hooks (via lefthook) run format, lint and tests automatically. Run `make setup` once to install the tooling.
`skills/paperclaw/SKILL.md` is a Claude Code project skill that lets an agent drive PaperClaw on your behalf. It is installed as a slash command via `.claude/commands/paperclaw.md`.

To use it, open this project in Claude Code and type `/paperclaw` followed by a natural-language question:

```
/paperclaw Which utility bills are overdue?
/paperclaw Find the invoice for the gadget I bought in March.
/paperclaw Show me the latest electricity bill from Stadtwerke.
/paperclaw Search for my IBAN across all documents.
```

The agent maps your question to the right `paperclaw` subcommand, runs it, and returns a plain-language answer. It uses `list` for filtering, `search` for keyword lookup, and `show` when you need the full document content.