Local CLI that turns a folder of raw PDFs (utility bills, invoices, contracts, insurance letters, bank statements) into an organized document library.
Four commands:
- `paperclaw classify [inbox-path]`: scans a folder for PDFs, extracts their text, asks Claude to classify them, files each PDF under `library/YYYY/category/` with a sensible filename, and writes a `.md` transcript alongside it.
- `paperclaw ask "<question>"`: loads relevant `.md` transcripts (via the built-in search) and asks Claude to answer in plain language (e.g. "Which bills are overdue?").
- `paperclaw mcp`: runs as an MCP stdio server, exposing tools (`search_transcripts`, `get_transcript`, `bills_due_this_week`, `extract_amounts`, plus the discovery tools) so an agent (e.g. Claude Code) can drive the library directly.
- `paperclaw telegram`: runs as a Telegram bot. It forwards every text message it receives to the `ask` agent and replies with the answer.
See DESIGN.md for architecture, transcript format, categories, and limitations.
- Node.js 20+
- An Anthropic API key (`ANTHROPIC_API_KEY`)
npm install
cp .env.example .env
# edit .env and set ANTHROPIC_API_KEY=sk-ant-...
npm run build

Drop one or more PDFs into `./inbox/`, then either:
# Option A — via the local CLI bin (after building)
npm link
paperclaw classify
# Option B — without linking
node dist/main.js classify
# Option C — point at a different inbox folder
paperclaw classify /path/to/some/folder

Output:
- The PDF is copied to `library/{YYYY}/{category}/{YYYY-MM}-{slug}.pdf`.
- A matching `.md` transcript (YAML front-matter + extracted text) lives next to it.
- The original is moved to `inbox/done/` (never deleted).
- Every event is appended to `library/processing.log` (JSON Lines) and printed to stdout.
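
As an illustration of the naming pattern in the list above, here is a minimal sketch of how a destination path could be derived from a document date, a category, and a title. Only the pattern `library/{YYYY}/{category}/{YYYY-MM}-{slug}.pdf` comes from this README; the function names `slugify` and `buildLibraryPath`, the slug rules, and the `utilities` category are assumptions made for the example, not the project's actual code.

```typescript
// Hypothetical helpers: only the path pattern is taken from the README.

/** Lower-case, strip diacritics, and collapse other characters into hyphens. */
function slugify(title: string): string {
  return title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining marks left behind by NFKD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

/** Build library/{YYYY}/{category}/{YYYY-MM}-{slug}.pdf from an ISO date string. */
function buildLibraryPath(isoDate: string, category: string, title: string): string {
  const match = /^(\d{4})-(\d{2})/.exec(isoDate);
  if (!match) {
    throw new Error(`Expected an ISO date such as 2024-03-15, got "${isoDate}"`);
  }
  const [, year, month] = match;
  // Assumed fallback for titles that contain no usable characters.
  const slug = slugify(title) || "document";
  return `library/${year}/${category}/${year}-${month}-${slug}.pdf`;
}

console.log(buildLibraryPath("2024-03-15", "utilities", "Stadtwerke Strom Abrechnung"));
// prints "library/2024/utilities/2024-03-stadtwerke-strom-abrechnung.pdf"
```

The real category names and slug rules are documented in DESIGN.md and may differ from this sketch.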
Low-confidence or unclassifiable documents land in
library/{current-year}/unsorted/ with a date-prefixed filename for manual review.
Scanned/image-only PDFs are skipped with a warning (OCR is not yet supported).
Once you have classified some documents, you can query the library two ways:
# Human-friendly Q&A — streams Claude's answer to stdout
paperclaw ask "Which bills are overdue?"
paperclaw ask "Show me documents from Stadtwerke"
paperclaw ask "Find the invoice for the gadget from three months ago"

# Agent-facing: speak MCP over stdio
paperclaw mcp

The repo ships a project-scoped `.mcp.json` so Claude Code picks up
the server automatically when launched from the repo root (run npm run build
first). Inside Claude Code the same questions can be answered by the model
calling search_transcripts and get_transcript directly.
Tools exposed by the MCP server:
| Tool | Input | Purpose |
|---|---|---|
| `list_categories` | (none) | List the categories present in the library with their counts. Use before `search_transcripts` to know which filters make sense. |
| `list_providers` | (none) | List unique providers with counts and most recent document date. |
| `library_stats` | (none) | Library summary: total docs, breakdown by category and year, next 5 upcoming due dates. |
| `search_transcripts` | `category`, `provider`, `dateFrom`, `dateTo`, `dueBefore`, `dueAfter`, `text`, `limit` | Filter the library by YAML front-matter and/or text. Returns hits with metadata and a snippet. |
| `bills_due_this_week` | `days` (optional, default 7) | Return documents whose `due_date` falls within the next N days from today (inclusive). |
| `extract_amounts` | `path` (must be inside `LIBRARY_PATH`) | Scan a transcript for monetary amounts (€, EUR, $, USD) and return each with surrounding context. |
| `get_transcript` | `path` (must be inside `LIBRARY_PATH`) | Return the full Markdown transcript. |
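
To make the `bills_due_this_week` window concrete, the sketch below filters documents by `due_date`. It assumes that `due_date` is stored as an ISO `YYYY-MM-DD` string and that "inclusive" means both today and the day N days from now count as hits; the `DocMeta` type and the `dueWithin` and `parseIsoDay` names are hypothetical and not taken from the project's source.

```typescript
// Hypothetical shape of the front-matter fields this sketch needs.
interface DocMeta {
  path: string;
  due_date?: string; // assumed ISO format: YYYY-MM-DD
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Parse YYYY-MM-DD as UTC midnight so local time zones cannot shift the day. */
function parseIsoDay(value: string): number {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!m) return NaN;
  return Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
}

/** Keep documents with today <= due_date <= today + days (both ends inclusive). */
function dueWithin(docs: DocMeta[], today: string, days = 7): DocMeta[] {
  const start = parseIsoDay(today);
  const end = start + days * MS_PER_DAY;
  return docs.filter((doc) => {
    if (!doc.due_date) return false;
    const due = parseIsoDay(doc.due_date);
    // Comparisons against NaN are false, so malformed dates are skipped.
    return due >= start && due <= end;
  });
}

const sample: DocMeta[] = [
  { path: "a.md", due_date: "2024-03-09" }, // already overdue
  { path: "b.md", due_date: "2024-03-10" }, // due today
  { path: "c.md", due_date: "2024-03-17" }, // last day of the window
  { path: "d.md", due_date: "2024-03-18" }, // one day too late
  { path: "e.md" },                         // no due date at all
];

console.log(dueWithin(sample, "2024-03-10").map((d) => d.path));
// prints [ 'b.md', 'c.md' ]
```

Passing `today` as a parameter rather than reading the clock inside the function keeps the filter easy to unit-test.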
What could come next as MCP tools (not implemented): classify_pdf,
mark_paid, summarize_year, find_duplicates.
paperclaw telegram exposes the ask agent over a Telegram chat. Create a
bot via @BotFather, drop the token into .env as
TELEGRAM_BOT_TOKEN, then:
npm run build
node dist/main.js telegram

Any text message you send to the bot is forwarded to `AgentService.ask()` and
the answer is sent back. Long answers are split across multiple messages to
stay within Telegram's 4096-character message limit. Stop with Ctrl+C.
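
The splitting of long answers can be sketched as follows. This is one possible approach under stated assumptions, not necessarily what the bot does: it uses the Telegram Bot API cap of 4096 characters per text message, prefers to break at a newline, and falls back to a hard cut. The `splitMessage` name and the newline preference are assumptions for illustration.

```typescript
// Telegram Bot API limit for a single text message.
const TELEGRAM_MAX_LENGTH = 4096;

/**
 * Split text into chunks of at most maxLength UTF-16 code units.
 * Caveat: a hard cut can land inside a surrogate pair (e.g. an emoji).
 */
function splitMessage(text: string, maxLength = TELEGRAM_MAX_LENGTH): string[] {
  if (maxLength <= 0) {
    throw new RangeError("maxLength must be positive");
  }
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLength) {
    // Prefer the last newline inside the window; otherwise cut at the limit.
    const breakAt = rest.lastIndexOf("\n", maxLength);
    const cut = breakAt > 0 ? breakAt : maxLength;
    chunks.push(rest.slice(0, cut));
    // When we broke on a newline, drop it so the next chunk starts cleanly.
    rest = rest.slice(cut).replace(/^\n/, "");
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}

console.log(splitMessage("ab\ncd\nef", 5));
// prints [ 'ab\ncd', 'ef' ]
```

Each returned chunk would then be sent as its own message, in order.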
npm test # run unit tests (no API key required)
npm run typecheck # tsc --noEmit
npm run lint:check # eslint
npm run format # prettier --write
npm run start:dev # nest start --watch

| Variable | Default | Purpose |
|---|---|---|
| `ANTHROPIC_API_KEY` | (required) | API key for Claude |
| `INBOX_PATH` | `./inbox` | Folder scanned by `classify` |
| `LIBRARY_PATH` | `./library` | Destination tree for organized documents |
| `TELEGRAM_BOT_TOKEN` | (optional) | Required only for `paperclaw telegram`. Get one from @BotFather. |
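
The defaults in the table can be read as the following resolution logic. This is a hedged sketch: the `PaperclawConfig` type and `loadConfig` function are hypothetical names, and the project may validate or load its environment differently (for example, it may only require the API key for commands that call Claude).

```typescript
// Hypothetical config shape mirroring the variables in the table above.
interface PaperclawConfig {
  anthropicApiKey: string;
  inboxPath: string;
  libraryPath: string;
  telegramBotToken: string | undefined;
}

/** Resolve settings from an environment-like object, applying the documented defaults. */
function loadConfig(env: Record<string, string | undefined>): PaperclawConfig {
  const anthropicApiKey = env.ANTHROPIC_API_KEY?.trim();
  if (!anthropicApiKey) {
    throw new Error("ANTHROPIC_API_KEY is required");
  }
  return {
    anthropicApiKey,
    // Empty or missing values fall back to the defaults from the table.
    inboxPath: env.INBOX_PATH?.trim() || "./inbox",
    libraryPath: env.LIBRARY_PATH?.trim() || "./library",
    // Optional: only needed for the telegram command.
    telegramBotToken: env.TELEGRAM_BOT_TOKEN?.trim() || undefined,
  };
}

const config = loadConfig({ ANTHROPIC_API_KEY: "example-key", LIBRARY_PATH: "/data/docs" });
console.log(config.inboxPath, config.libraryPath);
// prints "./inbox /data/docs"
```

In a Node.js entry point, the same function could be called with `process.env` once the `.env` file has been loaded.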