dominik1001/bettervibe-project

 
 

PaperClaw

Local CLI that turns a folder of raw PDFs (utility bills, invoices, contracts, insurance letters, bank statements) into an organized document library.

Four commands:

  • paperclaw classify [inbox-path] — scans a folder for PDFs, extracts their text, asks Claude to classify them, copies each PDF into library/YYYY/category/ with a sensible filename, writes a .md transcript alongside it, and moves the original to inbox/done/.
  • paperclaw ask "<question>" — loads relevant .md transcripts (via the built-in search) and asks Claude to answer in plain language (e.g. "Which bills are overdue?").
  • paperclaw mcp — runs as an MCP stdio server, exposing tools (search_transcripts, get_transcript, bills_due_this_week, extract_amounts, plus the discovery tools) so an agent (e.g. Claude Code) can drive the library directly.
  • paperclaw telegram — runs as a Telegram bot. Forwards every text message it receives to the ask agent and replies with the answer.

See DESIGN.md for architecture, transcript format, categories, and limitations.

Requirements

  • Node.js 20+
  • An Anthropic API key (ANTHROPIC_API_KEY)

Setup

npm install
cp .env.example .env
# edit .env and set ANTHROPIC_API_KEY=sk-ant-...

Build

npm run build

Run

Drop one or more PDFs into ./inbox/, then either:

# Option A — via the local CLI bin (after building)
npm link
paperclaw classify

# Option B — without linking
node dist/main.js classify

# Option C — point at a different inbox folder
paperclaw classify /path/to/some/folder

Output:

  • The PDF is copied to library/{YYYY}/{category}/{YYYY-MM}-{slug}.pdf.
  • A matching .md transcript (YAML front-matter + extracted text) lives next to it.
  • The original is moved to inbox/done/ (never deleted).
  • Every event is appended to library/processing.log (JSON Lines) and printed to stdout.
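
JSON Lines means one JSON object per line, which keeps the log easy to append to and easy to parse later. The sketch below shows that round trip. The `LogEvent` fields are assumptions for illustration; the actual fields written to processing.log are not documented here.

```typescript
// Hypothetical event shape; the real processing.log fields may differ.
interface LogEvent {
  timestamp: string; // ISO 8601 timestamp
  level: "info" | "warn" | "error";
  message: string;
  file?: string;
}

// Serialize one event as a single JSON Lines record (one object per line).
function toJsonLine(event: LogEvent): string {
  return JSON.stringify(event) + "\n";
}

// Parse JSON Lines content back into events, ignoring blank lines.
function parseJsonLines(content: string): LogEvent[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEvent);
}
```

Because every record ends with a newline, appending a new event never requires rewriting earlier ones.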

Low-confidence or unclassifiable documents land in library/{current-year}/unsorted/ with a date-prefixed filename for manual review. Scanned/image-only PDFs are skipped with a warning (OCR is not yet supported).
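
To make the naming scheme concrete, here is a minimal sketch of how a destination path of the form library/{YYYY}/{category}/{YYYY-MM}-{slug}.pdf could be assembled. The `Classification` shape, the `slugify` helper, and deriving the slug from the provider name are all assumptions for illustration, not PaperClaw's actual implementation.

```typescript
// Hypothetical classification result; the real field names may differ.
interface Classification {
  category: string; // e.g. "invoices"
  provider: string; // e.g. "Stadtwerke München"
  documentDate: string; // ISO date, e.g. "2024-03-15"
}

// Lower-case, strip diacritics, and collapse non-alphanumerics into "-".
function slugify(input: string): string {
  return input
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build {root}/{YYYY}/{category}/{YYYY-MM}-{slug}.pdf.
// Assumption: the slug is derived from the provider name.
function libraryPath(libraryRoot: string, c: Classification): string {
  const year = c.documentDate.slice(0, 4);
  const yearMonth = c.documentDate.slice(0, 7);
  const category = slugify(c.category);
  return `${libraryRoot}/${year}/${category}/${yearMonth}-${slugify(c.provider)}.pdf`;
}
```

For example, a document from "Stadtwerke München" dated 2024-03-15 in the category "invoices" would map to library/2024/invoices/2024-03-stadtwerke-munchen.pdf under these assumptions.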

Ask & MCP

Once you have classified some documents, you can query the library two ways:

# Human-friendly Q&A — streams Claude's answer to stdout
paperclaw ask "Which bills are overdue?"
paperclaw ask "Show me documents from Stadtwerke"
paperclaw ask "Find the invoice for the gadget from three months ago"
# Agent-facing: speak MCP over stdio
paperclaw mcp

The repo ships a project-scoped .mcp.json so Claude Code picks up the server automatically when launched from the repo root (run npm run build first). Inside Claude Code the same questions can be answered by the model calling search_transcripts and get_transcript directly.

Tools exposed by the MCP server:

| Tool | Input | Purpose |
| --- | --- | --- |
| list_categories | (none) | List the categories present in the library with their counts. Use before search_transcripts to know which filters make sense. |
| list_providers | (none) | List unique providers with counts and most recent document date. |
| library_stats | (none) | Library summary: total docs, breakdown by category and year, next 5 upcoming due dates. |
| search_transcripts | category, provider, dateFrom, dateTo, dueBefore, dueAfter, text, limit | Filter the library by YAML front-matter and/or text. Returns hits with metadata and a snippet. |
| bills_due_this_week | days (optional, default 7) | Return documents whose due_date falls within the next N days from today (inclusive). |
| extract_amounts | path (must be inside LIBRARY_PATH) | Scan a transcript for monetary amounts (€, EUR, $, USD) and return each with surrounding context. |
| get_transcript | path (must be inside LIBRARY_PATH) | Return the full markdown transcript. |
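
The logic behind bills_due_this_week and extract_amounts can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's actual code: the function names, the `MoneyHit` shape, the regular expression, and the reading of "inclusive" as covering both today and today + N are all hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True if dueDate (YYYY-MM-DD) lies in [today, today + days].
// Assumption: both ends of the window are inclusive.
function isDueWithin(dueDate: string, today: string, days = 7): boolean {
  const due = Date.parse(`${dueDate}T00:00:00Z`);
  const start = Date.parse(`${today}T00:00:00Z`);
  if (Number.isNaN(due) || Number.isNaN(start)) return false;
  const diffDays = Math.round((due - start) / MS_PER_DAY);
  return diffDays >= 0 && diffDays <= days;
}

// Hypothetical result shape for one detected amount.
interface MoneyHit {
  amount: string; // matched text, e.g. "42,50 EUR"
  context: string; // surrounding characters from the transcript
}

// Find amounts written with a leading or trailing €, EUR, $, or USD
// and return each with `radius` characters of context on both sides.
function extractAmounts(text: string, radius = 30): MoneyHit[] {
  const pattern =
    /(?:€|EUR|\$|USD)\s?\d(?:[\d.,]*\d)?|\d(?:[\d.,]*\d)?\s?(?:€|EUR|\$|USD)/g;
  const hits: MoneyHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const start = Math.max(0, match.index - radius);
    const end = match.index + match[0].length + radius;
    hits.push({ amount: match[0], context: text.slice(start, end) });
  }
  return hits;
}
```

A regex like this is deliberately loose: it does not validate thousands separators or distinguish German from English number formats, so results would still need review.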

Possible future MCP tools (not yet implemented): classify_pdf, mark_paid, summarize_year, find_duplicates.

Telegram bot

paperclaw telegram exposes the ask agent over a Telegram chat. Create a bot via @BotFather, drop the token into .env as TELEGRAM_BOT_TOKEN, then:

npm run build
node dist/main.js telegram

Any text message you send to the bot is forwarded to AgentService.ask() and the answer is sent back. Long answers are split across multiple messages to stay within Telegram's 4096-character message limit. Stop with Ctrl+C.
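
Splitting a long answer could look roughly like the sketch below, which prefers to break at a newline and falls back to a hard cut. The function name and the break strategy are assumptions, not the bot's actual code. Note that JavaScript's string length counts UTF-16 code units, so this is only an approximation of Telegram's character count.

```typescript
// Telegram Bot API limit for the text of a single message.
const TELEGRAM_MAX_LENGTH = 4096;

// Split text into chunks of at most maxLength, breaking at the last
// newline inside the window when there is one.
function splitMessage(text: string, maxLength = TELEGRAM_MAX_LENGTH): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLength) {
    const newline = rest.lastIndexOf("\n", maxLength);
    // Fall back to a hard cut when no usable newline exists.
    const cut = newline > 0 ? newline : maxLength;
    chunks.push(rest.slice(0, cut));
    // Drop the newline we broke on so the next chunk starts cleanly.
    rest = rest.slice(cut).replace(/^\n/, "");
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk would then be sent as its own message, in order. A hard cut can land in the middle of a word or an emoji, so a production version might want a smarter fallback.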

Development

npm test            # run unit tests (no API key required)
npm run typecheck   # tsc --noEmit
npm run lint:check  # eslint
npm run format      # prettier --write
npm run start:dev   # nest start --watch

Configuration

| Variable | Default | Purpose |
| --- | --- | --- |
| ANTHROPIC_API_KEY | (required) | API key for Claude |
| INBOX_PATH | ./inbox | Folder scanned by classify |
| LIBRARY_PATH | ./library | Destination tree for organized documents |
| TELEGRAM_BOT_TOKEN | (optional) | Required only for paperclaw telegram. Get one from @BotFather. |
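
As an illustration of how these variables and defaults fit together, here is a minimal sketch of a config loader. The `PaperclawConfig` interface and `loadConfig` function are hypothetical names; the project's real configuration code may be structured differently.

```typescript
// Hypothetical resolved configuration; names are illustrative only.
interface PaperclawConfig {
  anthropicApiKey: string;
  inboxPath: string;
  libraryPath: string;
  telegramBotToken: string | undefined;
}

// Resolve settings from an environment-like object, applying the
// documented defaults and failing fast when the API key is missing.
function loadConfig(env: Record<string, string | undefined>): PaperclawConfig {
  const anthropicApiKey = env["ANTHROPIC_API_KEY"];
  if (!anthropicApiKey) {
    throw new Error("ANTHROPIC_API_KEY is required");
  }
  return {
    anthropicApiKey,
    // Empty strings are treated the same as unset variables.
    inboxPath: env["INBOX_PATH"] || "./inbox",
    libraryPath: env["LIBRARY_PATH"] || "./library",
    telegramBotToken: env["TELEGRAM_BOT_TOKEN"] || undefined,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test without touching real environment variables.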
