Kreuzakt is a project that takes the best parts of Paperless, drastically improves the OCR using vision-language models (VLMs), and throws out 99% of the complexity. Take every boring document in your life and make them all instantly easy to find, and (optionally) let AIs search them to answer questions for you.
- Kreuzakt uses a single Docker container with an SQLite database, so there aren't a ton of moving parts
- Rather than use Tesseract, Kreuzakt uses LLMs to do OCR (by default via OpenRouter but Ollama/Local LLMs work as well) via Kreuzberg. This drastically improves OCR accuracy, and by extension, search accuracy.
- Kreuzakt provides a remote MCP server - connect Claude Desktop, Cursor, or any other MCP client to Kreuzakt and ask questions about your documents
- Kreuzakt uses an LLM to also derive a title / description / original date for every document, out of the box. Zero manual curation / toil work.
- Metadata can always be regenerated from the source documents, so the only thing you need to migrate is the originals
- Kreuzakt always preserves your original documents and never edits them directly
- Ingestion based on file watches works the same: drop documents into the `ingest` folder and they will automatically be processed

```yaml
services:
  kreuzakt:
    image: ghcr.io/anaisbetts/kreuzakt:latest
    ports:
      - "3000:3000"
    environment:
      OPENROUTER_KEY: ${OPENROUTER_KEY}
      TZ: Europe/Berlin # Set your local timezone
    volumes:
      - ./docs:/data # container path must be absolute
    restart: unless-stopped
```

Drop this in a `docker-compose.yml`, set `OPENROUTER_KEY` in your environment or a `.env` file, and run `docker compose up -d`. The web UI is at http://localhost:3000.
The `./docs` folder will be initialized with directories including `./docs/ingest`, `./docs/originals`, and `./docs/thumbnails`.

- Run `docker-compose up -d`
- Drop all of your documents into the ingest folder. They will eventually all move to the originals folder. You can see the progress at `/settings`; if you have a lot of documents it might take a bit.
- If you've got an existing Paperless install, you can run the import
- You can also simply drag-drop a bunch of files onto the main page
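
If your documents are scattered across nested folders, a small script can gather them into the ingest folder for you. This is a minimal sketch, not part of Kreuzakt: the function name `queue_for_ingest` and the default file patterns are assumptions made for illustration, and the set of file types Kreuzakt accepts is not stated here. It copies rather than moves, so your source folder stays untouched.

```python
import shutil
from pathlib import Path


def queue_for_ingest(source_dir, ingest_dir, patterns=("*.pdf", "*.png", "*.jpg")):
    """Copy matching files from source_dir (recursively) into ingest_dir.

    The patterns are illustrative assumptions, not a list of supported formats.
    Returns the list of paths that were created in ingest_dir.
    """
    source, ingest = Path(source_dir), Path(ingest_dir)
    ingest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for path in sorted(source.rglob(pattern)):
            target = ingest / path.name
            if target.exists():
                continue  # skip name collisions instead of overwriting
            shutil.copy2(path, target)  # copy2 keeps the file's timestamps
            copied.append(target)
    return copied
```

For example, `queue_for_ingest("/path/to/scans", "./docs/ingest")` would queue every matching file under the (hypothetical) `/path/to/scans` folder, assuming the compose file above mounts `./docs` as the data folder.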
I'm too lazy to do the math on exactly how much it costs per page, but for perspective, importing 440 documents from Paperless (a few of which were up to 80 pages long) cost me ~$5.
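
For a rough feel of what your own import might cost, you can scale from that single data point. This is back-of-the-envelope arithmetic only: it assumes cost grows linearly with document count, which ignores page counts, model choice, and pricing changes. The function names are made up for this example.

```python
def average_cost_per_document(total_cost_usd, document_count):
    """Average cost of one document, given one observed import."""
    return total_cost_usd / document_count


def estimate_import_cost(document_count, reference_cost_usd=5.0, reference_count=440):
    """Scale the ~$5 / 440 documents observation to another library size.

    Assumes a similar mix of document lengths, which may not hold for you.
    """
    return document_count * average_cost_per_document(reference_cost_usd, reference_count)
```

By that estimate the average works out to a bit over one cent per document, so treat the result as an order of magnitude rather than a quote.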
Everything lives under /data by default — the SQLite database, originals, thumbnails, and the ingest folder. If you want to split things up, override with individual env vars and mount each path separately:

| Variable | Default | Description |
|---|---|---|
| `INGEST_DIR` | `/data/ingest` | Watched folder for new documents |
| `IMPORT_DIR` | `/data/import` | Staging folder for orchestrated imports (e.g. Paperless); not watched |
| `ORIGINALS_DIR` | `/data/originals` | Stored original files |
| `THUMBNAILS_DIR` | `/data/thumbnails` | Generated thumbnails |
| `DB_PATH` | `/data/docs-ai.db` | SQLite database |
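
To make the override behavior concrete, here is a small sketch of how defaults and per-path overrides combine. It is an illustration of the table above, not Kreuzakt's actual code; `DEFAULT_PATHS` and `resolve_paths` are hypothetical names.

```python
import os

# Defaults copied from the table above.
DEFAULT_PATHS = {
    "INGEST_DIR": "/data/ingest",
    "IMPORT_DIR": "/data/import",
    "ORIGINALS_DIR": "/data/originals",
    "THUMBNAILS_DIR": "/data/thumbnails",
    "DB_PATH": "/data/docs-ai.db",
}


def resolve_paths(env=None):
    """Return the effective path for each variable.

    A variable that is set (and non-empty) in env wins; otherwise the
    default under /data is used. env defaults to the process environment.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULT_PATHS.items()}
```

For example, `resolve_paths({"ORIGINALS_DIR": "/mnt/nas/originals"})` moves only the originals and leaves everything else under `/data`. Remember that each overridden path also needs its own volume mount.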

| Variable | Default | Description |
|---|---|---|
| `OPENROUTER_KEY` | (none) | API key for OpenRouter (recommended) |
| `OPENAI_API_KEY` | (none) | Alternative: direct OpenAI key |
| `OPENAI_BASE_URL` | `https://openrouter.ai/api/v1` | Base URL for any OpenAI-compatible API (e.g. Ollama at `http://host.docker.internal:11434/v1`) |
| `OCR_VLM_MODEL` | `openai/gpt-5.4-mini` | Model used for OCR |
| `METADATA_LLM_MODEL` | `openai/gpt-5.4` | Model used for title/description extraction |
| `PORT` | `3000` | Port inside the container |
| `TZ` | `UTC` | Timezone for date display (e.g. `Europe/Berlin`, `America/New_York`). Use any tz database name. |
| `INGEST_WATCH_POLL` | `false` | Poll `INGEST_DIR` instead of using inotify. Enable when the ingest folder is on NFS, SMB, or a FUSE mount, because inotify does not see changes made on the remote side. |
| `INGEST_WATCH_POLL_INTERVAL_MS` | `2000` | Poll interval in ms when `INGEST_WATCH_POLL` is enabled. |
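
If you're wondering what polling buys you on a network mount: instead of waiting for the kernel to report a change, the watcher lists the folder on a timer and compares it with the previous listing. The sketch below shows the general technique only; it is not how Kreuzakt implements `INGEST_WATCH_POLL`, and all function names are made up.

```python
import os
import time


def list_files(directory):
    """Return the set of regular file names currently in directory."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def poll_once(directory, seen):
    """Compare the folder with the previous listing.

    Returns (sorted names that are new since `seen`, the current listing).
    """
    current = list_files(directory)
    return sorted(current - seen), current


def watch(directory, on_new, interval_ms=2000):
    """Call on_new(name) for every file that appears; runs until interrupted."""
    seen = list_files(directory)
    while True:
        time.sleep(interval_ms / 1000)
        added, seen = poll_once(directory, seen)
        for name in added:
            on_new(name)
```

The trade-off is latency and a bit of I/O: a new file is noticed at most one interval late, and every poll lists the whole folder, which is why polling is off by default.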
Kreuzakt exposes a remote MCP endpoint at /mcp (Streamable HTTP). Replace the hostname in the snippets below with wherever you serve the app — for example https://docs.your-tailnet.ts.net/mcp when using Tailscale Serve. Most clients will not talk to plain http, so terminating TLS (Serve, a reverse proxy, etc.) is the usual approach.
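
If you want to check that the endpoint is reachable before wiring up a client, you can send the first message of the MCP handshake by hand. The sketch below builds that request with the standard library. It follows the MCP specification for Streamable HTTP (a JSON-RPC `initialize` call sent via POST, with an `Accept` header that allows both JSON and an event stream); the protocol version string, the client name, and the assumption that no authentication is required are mine, not something this README confirms.

```python
import json
import urllib.request


def build_initialize_request(endpoint_url):
    """Build (but do not send) an MCP `initialize` request for endpoint_url."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; the server may negotiate another
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
        },
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. A response with status 200 suggests the endpoint is up; the body may be plain JSON or a server-sent event stream, so don't assume either when reading it.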

Claude Desktop: `npx mcp-remote@latest …`

`mcp-remote` bridges the HTTP MCP endpoint for clients that expect a local process.

```json
{
  "mcpServers": {
    "docs": {
      "command": "npx",
      "args": ["mcp-remote@latest", "https://docs.your-tailnet.ts.net/mcp"]
    }
  }
}
```

Cursor: `type: "http"` in MCP config

Add to `.cursor/mcp.json` or your project’s MCP settings.

```json
{
  "mcpServers": {
    "docs": {
      "type": "http",
      "url": "https://docs.your-tailnet.ts.net/mcp"
    }
  }
}
```

Once connected, you can ask things like:

- "Find invoices from Deutsche Telekom."
- "What was my health insurance number again?"
- "How much did I pay in taxes last year?"
Prerequisites: Bun (the project runs Next.js and scripts through Bun; see package.json) and a Rust toolchain for the Kreuzberg extraction CLI.

- Install dependencies: `bun install`
- Build the local extraction CLI: `cargo build -p kreuzakt-kreuzberg`
- Copy `.env.local.example` to `.env.local` and set at least one way to reach an OpenAI-compatible API. The usual choice is `OPENROUTER_KEY`. For a local LLM, set `OPENAI_DEV_URL`, `OPENAI_DEV_KEY`, and optionally `OCR_VLM_DEV_MODEL` / `METADATA_LLM_DEV_MODEL`. See `.env.local.example` for all variables the app and tooling recognize.
- Start the dev server: `bun dev`. The app listens on port 3000 by default (`PORT`). Runtime data defaults to `./data` (SQLite, ingest, originals, thumbnails) unless you override `DATA_DIR` or individual path variables.

Other useful commands:

- `bun test`: unit tests
- `cargo test`: Rust extraction CLI tests
- `bun run test:integration`: integration tests (loads `.env.local` via `--env-file`; requires Paperless-related vars when those tests run)
- `bun storybook`: UI development on port 6006

POST /api/documents/export-text exports every document whose SQLite content column is non-empty as a ZIP of .txt files. Each file is named {id}-{sanitized-title}.txt and begins with YAML frontmatter (original_filename, document_url, original_url) followed by the extracted document body. The response is an application/zip download named kreuzakt-text-export-YYYYMMDD-HHmmss.zip. Returns 400 if there is no exportable content.
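
If you want to post-process an export, the ZIP is easy to read with the standard library. The sketch below is an illustration under two assumptions that are not confirmed here: that the frontmatter is delimited by `---` lines, and that it consists of simple one-line `key: value` pairs. For anything fancier, a real YAML parser is the safer choice. The function names are made up.

```python
import io
import zipfile


def parse_export_entry(text):
    """Split one exported .txt into (frontmatter dict, body text).

    Assumes `---` delimiters and one-line `key: value` pairs.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter found
    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat everything as body


def read_export(zip_bytes):
    """Yield (file name, frontmatter, body) for each .txt file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".txt"):
                text = archive.read(name).decode("utf-8")
                yield (name, *parse_export_entry(text))
```

With an `export.zip` on disk, `read_export(open("export.zip", "rb").read())` walks every document in it.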

```sh
curl -X POST http://localhost:3000/api/documents/export-text -o export.zip
```

As for the name: Kreuzakt uses the library Kreuzberg, and it is a tool to help you with your "Akte" (files/documents). It's a portmanteau, just like "Berghain" is a portmanteau of "Kreuzberg" and "Friedrichshain", the two districts in Berlin that it sits between. (today you learn!)