Personal document library processor. Drop PDFs into inbox/ — PaperClaw classifies them, generates a clean filename, moves each file to library/{category}/, and writes a structured .md transcript alongside it. The transcript contains a summary, key metadata (date, sender, amounts), and the original filename, making the entire library searchable by an agent or CLI.
inbox/stadtwerke-rechnung.pdf
│
▼ extract → classify → transcript
│
library/utilities/2024-03-15_stadtwerke-strom.pdf
library/utilities/2024-03-15_stadtwerke-strom.md
Categories: banking, contracts, insurance, invoices, tax, utilities, unclassified
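The naming scheme in the example above (`library/{category}/{date}_{slug}`) can be sketched as follows. This is a minimal illustration, not PaperClaw's actual code: `slugify`, `target_paths`, and the fallback to `unclassified` for unknown categories are hypothetical names and assumptions.

```python
import re
from datetime import date
from pathlib import Path

# Category list taken from the README.
CATEGORIES = {
    "banking", "contracts", "insurance", "invoices",
    "tax", "utilities", "unclassified",
}


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-ASCII-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def target_paths(library: Path, category: str, doc_date: date, title: str) -> tuple[Path, Path]:
    """Return the (pdf, transcript) destinations for one classified document."""
    # Assumption: anything outside the known categories lands in unclassified/.
    if category not in CATEGORIES:
        category = "unclassified"
    stem = f"{doc_date.isoformat()}_{slugify(title)}"
    folder = library / category
    return folder / f"{stem}.pdf", folder / f"{stem}.md"


pdf_path, md_path = target_paths(
    Path("library"), "utilities", date(2024, 3, 15), "Stadtwerke Strom"
)
print(pdf_path.as_posix())  # library/utilities/2024-03-15_stadtwerke-strom.pdf
print(md_path.as_posix())   # library/utilities/2024-03-15_stadtwerke-strom.md
```

Keeping the PDF and its transcript on the same stem means either file can be located from the other by swapping the extension.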
# Install dependencies
uv sync
# Copy and fill in your API keys
cp .env.example .env
.env variables:
CLASSIFIER_BACKEND=ollama # ollama | openai
CLASSIFIER_MODEL=llama3.2
TRANSCRIPT_BACKEND=openai # ollama | openai
TRANSCRIPT_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
Recommended split: Ollama (local/fast) for classification, OpenAI for transcripts.
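As an illustration of how these variables fit together, here is a small sketch that validates one backend/model pair at a time. It is not PaperClaw's own loader: `BackendConfig` and `load_backend` are hypothetical names, and the rule that `OPENAI_API_KEY` must be set whenever a backend is `openai` is an assumption.

```python
from collections.abc import Mapping
from dataclasses import dataclass

# Backend values listed in the .env comments above.
VALID_BACKENDS = ("ollama", "openai")


@dataclass(frozen=True)
class BackendConfig:
    backend: str
    model: str


def load_backend(prefix: str, env: Mapping[str, str]) -> BackendConfig:
    """Read {prefix}_BACKEND and {prefix}_MODEL from env and validate them."""
    backend = env.get(f"{prefix}_BACKEND", "").strip().lower()
    if backend not in VALID_BACKENDS:
        raise ValueError(f"{prefix}_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}")
    model = env.get(f"{prefix}_MODEL", "").strip()
    if not model:
        raise ValueError(f"{prefix}_MODEL is not set")
    # Assumption: the OpenAI backend cannot work without an API key.
    if backend == "openai" and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required when a backend is set to openai")
    return BackendConfig(backend, model)


# Sample values mirroring the README; pass os.environ instead in real use.
sample_env = {
    "CLASSIFIER_BACKEND": "ollama",
    "CLASSIFIER_MODEL": "llama3.2",
    "TRANSCRIPT_BACKEND": "openai",
    "TRANSCRIPT_MODEL": "gpt-4o-mini",
    "OPENAI_API_KEY": "sk-...",
}
print(load_backend("CLASSIFIER", sample_env))
print(load_backend("TRANSCRIPT", sample_env))
```

Failing early on a missing key or unknown backend keeps a misconfigured run from moving files before the first LLM call is attempted.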
# Process all PDFs in inbox/
uv run paperclaw process
# Dry-run — shows what would be processed, no LLM calls
uv run paperclaw process --dry-run
# Search the library
uv run paperclaw search "stadtwerke strom"
# Filter by category
uv run paperclaw search "2024" --category utilities
# List all documents (optionally filtered)
uv run paperclaw list
uv run paperclaw list --category invoices
PaperClaw exposes the same capabilities as an MCP server so any MCP-compatible agent can query the library without shell access.
Start manually (for testing):
uv run paperclaw-mcp
# Waits for JSON-RPC on stdin; press Ctrl-C to stop
Claude Code: .claude/settings.json is already configured in this repo. Restart Claude Code and run /mcp to confirm the paperclaw server is listed.
Claude Desktop: add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"paperclaw": {
"command": "uv",
"args": ["--directory", "/absolute/path/to/paperclaw", "run", "paperclaw-mcp"]
}
}
}
Restart Claude Desktop; the paperclaw tools then appear in the tool palette.
Available MCP tools:
| Tool | Description |
|---|---|
| search(query, category?, limit?) | Full-text search; returns ranked results with snippets. limit=0 means unlimited. |
| list_docs(category?) | List all documents, sorted by date descending. |
| process(dry_run?) | Ingest new PDFs from inbox/. Set dry_run=true to preview without LLM calls. |
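For orientation, an MCP client invokes one of these tools by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message for `search`. The helper name `tool_call` is hypothetical, and the argument types (strings for `query` and `category`, an integer for `limit`) are assumed from the table rather than confirmed by the server's schema. A real client also has to complete the MCP `initialize` handshake before calling tools.

```python
import json
from typing import Any


def tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names come from the tool table; the values are illustrative.
request = tool_call(
    1,
    "search",
    {"query": "stadtwerke strom", "category": "utilities", "limit": 5},
)
print(request)
```

In practice an MCP client library or an agent host such as Claude Desktop generates these messages, so hand-building them is mainly useful for debugging the stdio server.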
Path overrides (env vars, optional):
PAPERCLAW_LIBRARY_DIR=/path/to/library
PAPERCLAW_INBOX_DIR=/path/to/inbox
uv run pytest # run tests
uv run mypy src # type check
uv run ruff check . # lint
Pre-commit hooks run ruff → mypy → pytest automatically on every commit.