
dominik1001/flxschr-paperclaw-


PaperClaw

Personal document library processor. Drop PDFs into inbox/ — PaperClaw classifies them, generates a clean filename, moves each file to library/{category}/, and writes a structured .md transcript alongside it. The transcript contains a summary, key metadata (date, sender, amounts), and the original filename, making the entire library searchable by an agent or CLI.
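The README lists what a transcript contains but not its exact layout. One plausible shape is shown below: simple `key: value` front matter for the metadata, followed by a summary section. The `Transcript` fields and the `render_transcript` helper are hypothetical, not PaperClaw's actual format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Transcript:
    """Hypothetical container for the fields a transcript records."""

    original_filename: str
    category: str
    doc_date: Optional[date] = None
    sender: Optional[str] = None
    amounts: List[str] = field(default_factory=list)
    summary: str = ""


def render_transcript(t: Transcript) -> str:
    """Render `key: value` front matter followed by a summary section (assumed layout)."""
    meta = {
        "original_filename": t.original_filename,
        "category": t.category,
        "date": t.doc_date.isoformat() if t.doc_date else "unknown",
        "sender": t.sender or "unknown",
        "amounts": ", ".join(t.amounts) or "none",
    }
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in meta.items()]
    lines += ["---", "", "## Summary", "", t.summary.strip(), ""]
    return "\n".join(lines)
```

Because the metadata sits in plain text lines, a simple text search over the `.md` files can match dates, senders, and amounts directly.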

inbox/stadtwerke-rechnung.pdf
  │
  ▼  extract → classify → transcript
  │
library/utilities/2024-03-15_stadtwerke-strom.pdf
library/utilities/2024-03-15_stadtwerke-strom.md
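In the example, the clean filename follows a `YYYY-MM-DD_slug` pattern, and the PDF and its transcript share the same stem. Below is a minimal sketch of how the target paths could be derived. It assumes the classifier returns a document date, a short title, and a category. `slugify` and `target_paths` are hypothetical helpers, not PaperClaw's code.

```python
import re
import unicodedata
from datetime import date
from pathlib import Path
from typing import Tuple


def slugify(text: str) -> str:
    """Hypothetical helper: lowercase ASCII slug, e.g. 'Stadtwerke Strom' -> 'stadtwerke-strom'."""
    # NFKD splits characters like 'ä' into 'a' + a combining mark; the ASCII encode drops the mark.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def target_paths(library: Path, category: str, doc_date: date, title: str) -> Tuple[Path, Path]:
    """Hypothetical helper: return (pdf_path, transcript_path) under library/{category}/."""
    stem = f"{doc_date.isoformat()}_{slugify(title)}"
    folder = library / category
    return folder / f"{stem}.pdf", folder / f"{stem}.md"


pdf, md = target_paths(Path("library"), "utilities", date(2024, 3, 15), "Stadtwerke Strom")
print(pdf.as_posix())  # library/utilities/2024-03-15_stadtwerke-strom.pdf
print(md.as_posix())   # library/utilities/2024-03-15_stadtwerke-strom.md
```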

Categories: banking, contracts, insurance, invoices, tax, utilities, unclassified
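A model may answer with a label that is not exactly on this list, for example with different casing, a singular form, or trailing punctuation. One plausible safeguard is to normalize the answer against the fixed category set and fall back to `unclassified`. The sketch below does this; `normalize_category` is a hypothetical name, and the singular-form tolerance is an assumption.

```python
CATEGORIES = frozenset(
    {"banking", "contracts", "insurance", "invoices", "tax", "utilities", "unclassified"}
)


def normalize_category(raw: str) -> str:
    """Hypothetical helper: map a raw LLM answer onto a known category, else 'unclassified'."""
    # Trim whitespace, then surrounding quotes/periods, then lowercase.
    label = raw.strip().strip(".\"'").lower()
    if label in CATEGORIES:
        return label
    # Tolerate singular answers such as "invoice" or "contract".
    if f"{label}s" in CATEGORIES:
        return f"{label}s"
    return "unclassified"
```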


Setup

# Install dependencies
uv sync

# Copy and fill in your API keys
cp .env.example .env

.env variables:

CLASSIFIER_BACKEND=ollama        # ollama | openai
CLASSIFIER_MODEL=llama3.2
TRANSCRIPT_BACKEND=openai        # ollama | openai
TRANSCRIPT_MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...

Recommended split: Ollama (local/fast) for classification, OpenAI for transcripts.
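Below is a minimal sketch of how these variables might be read and validated at startup. It makes three assumptions:

* the `.env` values are already loaded into the process environment;
* `ollama` and `openai` are the only backends;
* `OPENAI_API_KEY` is only required when at least one stage uses `openai`.

`Settings` and `load_settings` are hypothetical names. The defaults simply mirror the example `.env`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

BACKENDS = ("ollama", "openai")


@dataclass(frozen=True)
class Settings:
    classifier_backend: str
    classifier_model: str
    transcript_backend: str
    transcript_model: str
    openai_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader: build Settings from the environment and fail fast on bad values."""
    # Defaults copied from the example .env (an assumption, not documented behavior).
    settings = Settings(
        classifier_backend=env.get("CLASSIFIER_BACKEND", "ollama"),
        classifier_model=env.get("CLASSIFIER_MODEL", "llama3.2"),
        transcript_backend=env.get("TRANSCRIPT_BACKEND", "openai"),
        transcript_model=env.get("TRANSCRIPT_MODEL", "gpt-4o-mini"),
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
    backends = (settings.classifier_backend, settings.transcript_backend)
    for backend in backends:
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    if "openai" in backends and not settings.openai_api_key:
        raise ValueError("OPENAI_API_KEY is required when a backend is set to openai")
    return settings
```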


Usage

CLI

# Process all PDFs in inbox/
uv run paperclaw process

# Dry-run — shows what would be processed, no LLM calls
uv run paperclaw process --dry-run

# Search the library
uv run paperclaw search "stadtwerke strom"

# Filter by category
uv run paperclaw search "2024" --category utilities

# List all documents (optionally filtered)
uv run paperclaw list
uv run paperclaw list --category invoices
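Every library filename begins with an ISO date, so a date-descending listing (as the `list_docs` MCP tool describes) can plausibly be produced by sorting filenames in reverse, without opening any transcripts. The sketch below works under that assumption. `newest_first` and `list_documents` are hypothetical helpers, not the CLI's actual implementation.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def newest_first(pdfs: Iterable[Path], category: Optional[str] = None) -> List[Path]:
    """Hypothetical helper: keep PDFs in the requested category folder, newest first."""
    selected = [p for p in pdfs if category is None or p.parent.name == category]
    # Names start with YYYY-MM-DD, so reverse lexicographic order is newest-first order.
    return sorted(selected, key=lambda p: p.name, reverse=True)


def list_documents(library: Path, category: Optional[str] = None) -> List[Path]:
    """Hypothetical helper: scan the library/{category}/*.pdf layout and sort the results."""
    return newest_first(library.glob("*/*.pdf"), category)
```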

MCP server (Claude Desktop / Claude Code)

PaperClaw exposes the same capabilities as an MCP server so any MCP-compatible agent can query the library without shell access.

Start manually (for testing):

uv run paperclaw-mcp
# Waits for JSON-RPC on stdin — Ctrl-C to stop
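Over stdio, MCP messages are JSON-RPC 2.0 objects, one per line. A client must send an `initialize` request and a `notifications/initialized` notification before calling any tool. The sketch below builds such a session for the `search` tool listed under the available MCP tools. The protocol version string, client name, and `limit` value are assumptions, and the helper name is hypothetical. Feeding these lines to `uv run paperclaw-mcp` on stdin should exercise the same path an MCP client uses, though the exact responses depend on the server.

```python
import json


def mcp_session(query: str) -> str:
    """Hypothetical helper: newline-delimited JSON-RPC messages for one search call."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # assumed; the server may negotiate another
                "capabilities": {},
                "clientInfo": {"name": "manual-test", "version": "0.0.1"},
            },
        },
        # Notifications have no "id" and receive no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {
            "jsonrpc": "2.0",
            "id": 2,
            "method": "tools/call",
            "params": {"name": "search", "arguments": {"query": query, "limit": 5}},
        },
    ]
    return "".join(json.dumps(message) + "\n" for message in messages)


print(mcp_session("stadtwerke strom"), end="")
```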

Claude Code: .claude/settings.json is already configured in this repo. Restart Claude Code and run /mcp to confirm that the paperclaw server is listed.

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "paperclaw": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/paperclaw", "run", "paperclaw-mcp"]
    }
  }
}
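If `claude_desktop_config.json` already lists other MCP servers, pasting the snippet over the whole file would drop them. The sketch below merges only the `paperclaw` entry instead. The function names are hypothetical, and `/absolute/path/to/paperclaw` still has to be replaced with the real checkout path.

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_paperclaw(config: Dict[str, Any], repo_dir: str) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of config with mcpServers.paperclaw set, other servers kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["paperclaw"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "paperclaw-mcp"],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, repo_dir: str) -> None:
    """Hypothetical helper: read, merge, and rewrite the config file (created if missing or empty)."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    merged = merge_paperclaw(config, repo_dir)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


# Example (macOS path from above):
# update_config_file(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "/absolute/path/to/paperclaw",
# )
```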

Restart Claude Desktop — the paperclaw tools appear in the tool palette.

Available MCP tools:

| Tool | Description |
|------|-------------|
| search(query, category?, limit?) | Full-text search; returns ranked results with snippets. limit=0 means unlimited. |
| list_docs(category?) | List all documents, sorted by date descending. |
| process(dry_run?) | Ingest new PDFs from inbox/. Set dry_run=true to preview without LLM calls. |
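The README doesn't describe how `search` ranks results, so the sketch below substitutes plain term counting over transcript text. It also shows the documented `category` filter and the `limit=0` convention for "no limit". `Hit`, `search_transcripts`, the snippet `width`, and the default `limit=10` are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Mapping, Optional


@dataclass
class Hit:
    path: str
    score: int
    snippet: str


def search_transcripts(
    docs: Mapping[str, str],  # "library/{category}/name.md" -> transcript text
    query: str,
    category: Optional[str] = None,
    limit: int = 10,
    width: int = 40,
) -> List[Hit]:
    """Hypothetical stand-in: rank by total case-insensitive occurrences of any query term."""
    terms = query.lower().split()
    hits: List[Hit] = []
    for path, text in docs.items():
        # The parent folder name is the category.
        if category is not None and PurePosixPath(path).parent.name != category:
            continue
        lower = text.lower()
        score = sum(lower.count(term) for term in terms)
        if score == 0:
            continue
        # Snippet around the earliest match, with whitespace collapsed.
        first = min(pos for pos in (lower.find(term) for term in terms) if pos >= 0)
        window = text[max(0, first - width) : first + width]
        hits.append(Hit(path, score, " ".join(window.split())))
    # Newest filename first, then a stable sort by score so ties stay newest-first.
    hits.sort(key=lambda h: PurePosixPath(h.path).name, reverse=True)
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits if limit == 0 else hits[:limit]
```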

Path overrides (env vars, optional):

PAPERCLAW_LIBRARY_DIR=/path/to/library
PAPERCLAW_INBOX_DIR=/path/to/inbox
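Below is a minimal sketch of how these overrides could be resolved. It assumes that, when the variables are unset, the defaults are `library/` and `inbox/` under the current working directory. The README doesn't say what the defaults are relative to, so that part is an assumption. `resolve_dirs` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_dirs(
    env: Mapping[str, str] = os.environ, base: Optional[Path] = None
) -> Tuple[Path, Path]:
    """Hypothetical helper: return (library_dir, inbox_dir), preferring PAPERCLAW_* overrides."""
    root = base if base is not None else Path.cwd()  # assumed default location
    library = Path(env.get("PAPERCLAW_LIBRARY_DIR") or root / "library")
    inbox = Path(env.get("PAPERCLAW_INBOX_DIR") or root / "inbox")
    # expanduser() lets overrides like ~/Documents/library work as well.
    return library.expanduser().resolve(), inbox.expanduser().resolve()
```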

Development

uv run pytest        # run tests
uv run mypy src      # type check
uv run ruff check .  # lint

Pre-commit hooks run ruff → mypy → pytest automatically on every commit.
