Self-cartography engine.
Turn your scattered personal data into a queryable Obsidian wiki. You and your agents read from the same surface.
Warning
Heavy prototype development. This is a single-author scratch space, not a stable tool. Architecture, CLI surface, file layout, and config schema can break at any time without notice or migration path. No semver, no deprecation cycle, no compatibility promises. If you install it, expect to read commits before pulling.
| 11 substrate collectors | 8 psychometric instruments | MIT open source |
Yesterday's Claude Code session is already a daily-log entry. This week's screenshots already have a summary article. Open a fresh agent session and the wiki index loads before you type. You read what the agent wrote. The agent reads what you wrote. Both refine the same surface.
- What is this · Why it exists · What you get
- What it looks like in Obsidian · Two ingest paths converge at compile · What a compiled article looks like · The defining choice
- Engine vs. vault split · Install · Update · Running scripts manually
- Documentation map · Security · CLI Reference · For contributors
An opinionated knowledge-compilation engine for personal substrates. Drop raw materials in — daily session logs, web clippings, screenshots, emails, calendars, browser history, HTML files — and a Claude Agent SDK loop compiles them into atomic Markdown articles with wikilinks. Renders as a navigable wiki inside Obsidian. The same wiki gets injected into every AI-agent session as context, so the loop closes: you read what you wrote, the agent reads what it wrote, both refine the same surface.
The vault holds data. The .wiki/ directory holds the engine. They never mix on disk.
A map, not an agent. The wiki is a map of the operator — a compiled, always-current, descriptive portrait of what you know, did, and committed to, read daily by you and your agents. It describes; it doesn't act. That's why it's a self-cartography engine, not a "second brain": "brain" implies the part that thinks and does, which is deliberately out of scope. Anything that acts on the map — automating a task, syncing a commitment to an external tool — is an operator-gated consumer downstream of the compiler, never the compiler itself.
What this isn't. Not a vector database, not a RAG service, not a Notion / Logseq / Mem replacement, not a docs-site generator, not a team wiki — and not a task manager, executive/agent layer, or "second brain." It is a compile loop for one operator's substrates, output as plain Markdown that any tool can read.
A solo knowledge worker's thinking lives in too many partial substrates: daily notes capture what happened, AI-agent memories capture working thought, clippings capture curiosity, screenshots and calendars capture everything else. Each is queryable in isolation, but not as a whole — and most are illegible even to the person who produced them.
Existing tools either solve a slice or solve a different problem:
- Karpathy's LLM Wiki — the raw/ → knowledge/ shape and the compile-don't-retrieve choice; a sketch, not a working pipeline.
- claude-memory-compiler (Cole Medin) — the session-capture pattern; only handles AI-agent memories, doesn't ingest other substrates.
- Obsidian — manual curation, no compilation.
- Notion — team docs, not personal cartography.
- RAG systems — retrieval, not compilation; weak cross-doc reasoning.
llm-wiki is the compilation layer between raw substrates and active consumption — by humans reading and by agents prompting. It is not an archive to be left behind; it is a working surface that gets refined by being used.
The map — substrates in, a compiled wiki out:
- Two-path ingest + per-day rollup — automatic session capture (hooks → daily/<date>/sessions.md) and substrate-source writers (Registry-discovered Collectors + clipper + manual drop → raw/, plus per-day mirror summaries into daily/<date>/{health,meetings,voice,email}.md — email carries top senders + recent subjects, not just a count) converge at one compiler. A daily-digest pass distills the per-source captures into a single daily/<date>.md (~500 words). All eleven collectors ride the formal Collector Protocol (SPEC + @register + run()): email, jamie, gmeet, voice, health, calendar, pictures, browser, tabs, screenshots, youtube.
- Compile once, query fast — knowledge is distilled into Markdown wikilinks at compile time. No embedding step, no retrieval per query.
- Multi-agent hooks — session-start / session-end / pre-compact wired into Claude Code, Codex, Gemini, and Cursor. Every session ends as a structured daily-log entry.
- Curiosity loop — a small local Ollama model spots gaps after each compile and queues deep-scan requests for the next cycle.
- Self-healing wiki — lint.py runs 8 structural checks plus an LLM contradiction scan, so the wiki stays consistent as it grows.
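The Collector Protocol named above (SPEC + @register + run()) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the engine's actual code: the `CollectorSpec` fields, the `REGISTRY` dict, the `run_all` helper, and the `HealthCollector` body are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class CollectorSpec:
    """Static description of a collector (hypothetical fields)."""
    name: str        # registry key, e.g. "health"
    raw_subdir: str  # folder under raw/ that the collector writes to


class Collector(Protocol):
    """Shape every collector is assumed to satisfy."""
    SPEC: CollectorSpec

    def run(self, vault: Path) -> list[Path]:
        """Collect new material and return the files written."""
        ...


# Hypothetical registry: collector name -> collector class.
REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that makes a collector discoverable by name."""
    REGISTRY[cls.SPEC.name] = cls
    return cls


@register
class HealthCollector:
    SPEC = CollectorSpec(name="health", raw_subdir="notes/health")

    def run(self, vault: Path) -> list[Path]:
        out_dir = vault / "raw" / self.SPEC.raw_subdir
        out_dir.mkdir(parents=True, exist_ok=True)
        out = out_dir / "example.md"  # placeholder file name
        out.write_text("# placeholder health rollup\n", encoding="utf-8")
        return [out]


def run_all(vault: Path) -> dict[str, list[Path]]:
    """Run every registered collector without importing any by name."""
    return {name: cls().run(vault) for name, cls in REGISTRY.items()}
```

The point of the decorator is that adding a twelfth collector would only mean adding one decorated class; the caller iterates the registry and never needs a hard-coded list.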
Consumers of the map — operator-gated layers that read the compiled wiki; where they act, only with explicit approval, downstream of the compiler:
- Surfaced commitments — the compiler reads back the commitments your substrate reveals (jamie + gmeet transcripts; Task / Owner / Deadline / Context quartet) and surfaces them on the relevant knowledge/people/<slug>.md + knowledge/projects/<slug>.md entity pages as ## Action Items (Obsidian-Tasks-plugin syntax) and ## Open Threads. Resolved items demote to Timeline on the next compile; a dashboard pane + cross-entity Inbox MOC make them legible. This surfaces commitments as part of the portrait — it does not manage them: reminders, completion-as-workflow, and syncing to an external task tool are out of scope (a downstream consumer's job).
- Optimization suggestions — the compiler proposes YAML automations (e.g. mail-filter rules) with per-action approval before execution.
- Operator self-reports — air-gapped analytical surface at <vault>/reports/. Validated clinical screens (PHQ-9, GAD-7, WHO-5, PSS-10, ISI, OLBI) scored by an informant agent reading the operator's own substrate. Deterministic Likert + cutoffs; the LLM only fills in raw answers. Two-pass analyst (per-study + cross-study). Run with wiki study run <id> + wiki analyze. See docs/cli.md.
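To illustrate the "deterministic Likert + cutoffs" split described for the self-reports, here is a minimal sketch for one instrument, the PHQ-9. The severity bands are the standard published ones (nine items scored 0 to 3, total 0 to 27); the function and class names are hypothetical, and how the engine actually stores answers is not shown in this README.

```python
from dataclasses import dataclass

# Standard PHQ-9 severity bands as (inclusive upper bound, label).
PHQ9_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


@dataclass(frozen=True)
class Score:
    total: int
    band: str


def score_phq9(answers: list[int]) -> Score:
    """Sum nine 0-3 Likert answers and map the total to a severity band.

    The answers are assumed to come from the informant agent; everything
    in this function is plain arithmetic with no model involved.
    """
    if len(answers) != 9:
        raise ValueError("PHQ-9 expects exactly nine answers")
    if any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("each answer must be an integer from 0 to 3")
    total = sum(answers)
    band = next(label for upper, label in PHQ9_BANDS if total <= upper)
    return Score(total, band)
```

Keeping the cutoff table in code rather than in a prompt means the same raw answers always yield the same score, which is what makes runs comparable over time.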
Foundations:
- Engine / vault split — engine code, prompts, hooks, runtime state, and venv all live under <vault>/.wiki/. The vault root stays clean.
- One install, one CLI, one venv — wiki setup + wiki update + wiki status cover the full lifecycle.
High-level dashboard. For the full cognitive-architecture diagram see docs/architecture.png. Both rendered from .excalidraw files in docs/.
The sidebar is the data layout — raw/, daily/, knowledge/ — exactly the folders the compiler creates and reads. The open file is a compiled article: frontmatter, body, wikilinks, and sources, none of it written by hand.
PATH A — Automatic capture PATH B — Curated sources
───────────────────────────── ──────────────────────────────────
session-start / session-end / Collectors (Registry — all eleven):
pre-compact hooks attach to every · email (multi-backend mailboxes)
Claude Code / Codex / Gemini / · jamie (Jamie AI meetings)
Cursor session. flush.py extracts · gmeet (Google Meet / Gemini transcripts)
the conversation transcript and · calendar (Google Calendar v3, per-date rollups)
appends a structured entry to · voice (iOS Shortcuts / OpenWhispr dictation)
daily/<date>/sessions.md, then a · health (Oura biometric daily rollup)
daily-digest agent distils all · pictures (phone-photo inbox + gemma4 vision)
per-source captures into · scan-browser (Firefox + Chrome)
daily/<date>.md (~500 words). · scan-screenshots (Vision LLM)
· scan-tabs (Firefox STG)
· scan-youtube (yt-dlp + gemma4 visual)
Other writers:
│ · clippings-sweep (Obsidian Web Clipper)
· ingest-html (file or URL)
· process-inbox (LLM-classified drop)
▼ │
daily/<date>/{sessions,health, ▼
meetings,voice,email}.md + raw/{articles,papers,notes,transcripts,
daily/<date>.md (compile digest) voice,audio,requests,suggestions}/
│ │
└──────────────┐ ┌──────────────────────┘
▼ ▼
┌────────────────────┐
│ compile.py │ Claude Agent SDK loop.
│ │ Reads index.md + AGENTS.md (vault),
│ │ walks raw/ + daily/, distils into
│ │ atomic articles + cross-links.
└────────────────────┘
│
▼
┌────────────────────┐
│ knowledge/ │ LLM owns; you and your agents read.
│ concepts/ │
│ connections/ │ ← cross-source links
│ projects/ │
│ people/ │
│ qa/ │
│ facts/ │ ← human-owned hard facts (override sources)
│ MOCs/ │ ← curated topic hubs (Maps of Content)
│ index.md │ ← master catalogue
└────────────────────┘
After every compile, two side loops run on the new article:
- Curiosity loop — a small local Ollama model spots gaps and writes JSON deep-scan requests to raw/requests/. The next compile cycle picks one up and fills the gap.
- Optimization suggestions — the compiler emits YAML proposals to raw/suggestions/ for repeatable manual actions (e.g. mail filter rules). suggestions/cli.py applies them only after explicit per-action approval.
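The curiosity loop's hand-off through raw/requests/ amounts to a small file-based queue. The sketch below shows one way that could work; the JSON fields (`topic`, `reason`, `created`), the file-naming scheme, and both function names are assumptions for illustration, and the Ollama call that would decide what to ask is left out.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def queue_request(vault: Path, topic: str, reason: str) -> Path:
    """Write one deep-scan request as a JSON file under raw/requests/."""
    req_dir = vault / "raw" / "requests"
    req_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    # A timestamp prefix makes plain filename sorting chronological.
    path = req_dir / f"{now:%Y%m%dT%H%M%S%f}-{slug}.json"
    payload = {"topic": topic, "reason": reason, "created": now.isoformat()}
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path


def next_request(vault: Path) -> Optional[tuple[Path, dict]]:
    """Return the oldest pending request, or None if the queue is empty."""
    req_dir = vault / "raw" / "requests"
    if not req_dir.is_dir():
        return None
    pending = sorted(req_dir.glob("*.json"))
    if not pending:
        return None
    oldest = pending[0]
    return oldest, json.loads(oldest.read_text(encoding="utf-8"))
```

How a request is marked as handled after the compile picks it up is deliberately not modelled here, since the README does not say.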
lint.py watches the wiki itself: 8 structural checks (broken links, orphan pages, orphan sources, stale articles, missing backlinks, article type, sparse articles, fact violations) plus one LLM-driven contradiction scan.
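Two of the structural checks listed above, broken links and orphan pages, can be sketched in a few lines. This is an illustrative reduction, not lint.py itself: it assumes wikilink targets are paths relative to knowledge/ without the .md suffix, treats daily/ and raw/ links as out of scope, and exempts index from the orphan check. `lint_links` and `link_targets` are hypothetical names.

```python
import re
from pathlib import Path

# Capture the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Links into these folders point at sources, not at wiki pages (assumption).
SOURCE_PREFIXES = ("daily/", "raw/")


def link_targets(text: str) -> set[str]:
    """Return the wiki-internal link targets found in one article."""
    targets = {t.strip().removesuffix(".md") for t in WIKILINK.findall(text)}
    return {t for t in targets if not t.startswith(SOURCE_PREFIXES)}


def lint_links(knowledge: Path) -> tuple[dict[str, set[str]], set[str]]:
    """Return (broken links per page, orphan pages) for a knowledge/ tree."""
    pages = {
        p.relative_to(knowledge).with_suffix("").as_posix(): p
        for p in knowledge.rglob("*.md")
    }
    broken: dict[str, set[str]] = {}
    linked: set[str] = set()
    for name, path in pages.items():
        targets = link_targets(path.read_text(encoding="utf-8"))
        missing = {t for t in targets if t not in pages}
        if missing:
            broken[name] = missing
        linked |= targets - {name}  # a self-link does not rescue an orphan
    orphans = set(pages) - linked - {"index"}
    return broken, orphans
```

Because both checks are pure set arithmetic over the file tree, they can run on every compile without an LLM call; only the contradiction scan needs a model.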
When raw sources contradict reality — a Slack thread that calls a project by its old name, an old memo that claims a never-won award — wiki correct lets you write a hard fact to knowledge/facts/<slug>.md. Hard facts inject into compile + query prompts at the highest authority, so future compilations honour them automatically. wiki correct apply <slug> then spawns an agent that walks the existing wiki, strikes contaminated claims, fixes wikilinks, and — for disambiguation facts — renames files. Raw sources stay immutable; only knowledge/ (and minimal correction notes in daily/) are touched.
wiki reconcile is the autonomous, signal-driven version of that loop: it reads the lint fact-violations (concepts that contradict a hard fact) and reconciles them under a strict envelope — writes scope-locked to knowledge/concepts/, structural gates (skip a fact touching more than N concepts → manual review, cap facts per run), a per-fact cooldown, and tiered autonomy (only unambiguous fact-violations are auto-fixed; concept↔concept contradictions stay propose-only in the lint surface). It is dry-run by default and double-gated OFF (features.concept_reconciliation + a piggybacks.concept_reconcile block) so it only runs autonomously once you opt in.
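The envelope described above (size gate, per-run cap, per-fact cooldown) can be read as a small planning function that sorts fact violations into buckets before anything is written. The sketch below is one possible reading, not the engine's code: the `Violation` and `Envelope` types, the default numbers, and the bucket names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Violation:
    fact: str                # slug of the hard fact being contradicted
    concepts: frozenset      # concept pages that contradict it


@dataclass(frozen=True)
class Envelope:
    max_concepts_per_fact: int = 5            # the "N" gate; placeholder value
    max_facts_per_run: int = 3                # placeholder value
    cooldown: timedelta = timedelta(days=1)   # placeholder value


def plan(violations, last_run, now, env=Envelope()):
    """Sort violations into auto / manual / cooling / deferred buckets.

    last_run maps a fact slug to the datetime it was last reconciled.
    Nothing is modified here; this only decides what a run may touch.
    """
    result = {"auto": [], "manual": [], "cooling": [], "deferred": []}
    for v in violations:
        if len(v.concepts) > env.max_concepts_per_fact:
            result["manual"].append(v.fact)    # too broad: needs a human
            continue
        last = last_run.get(v.fact)
        if last is not None and now - last < env.cooldown:
            result["cooling"].append(v.fact)   # touched too recently
            continue
        if len(result["auto"]) >= env.max_facts_per_run:
            result["deferred"].append(v.fact)  # over the per-run cap
            continue
        result["auto"].append(v.fact)
    return result
```

Separating the plan from the write step also gives a dry run for free: printing the returned buckets is the dry run.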
wiki health-trends is the deterministic synthesis consumer for the health corpus. A single day's biometrics aren't knowledge, but trends across years are — so this pass ($0, no LLM) aggregates every numeric metric in raw/notes/health/** into a coverage-aware ## Trends block (range, all-time vs recent average, trend arrow) inside knowledge/concepts/health.md. Default OFF; safe to enable.
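The per-metric numbers in such a Trends block (range, all-time vs recent average, trend arrow, coverage) need nothing beyond arithmetic. A minimal sketch, assuming one reading per ISO-dated day; the function name, the 30-reading "recent" window, and the 2% tolerance for a flat arrow are illustrative choices, not values taken from the engine.

```python
from datetime import date
from statistics import mean


def trend_summary(series, recent_n=30, tolerance=0.02):
    """Summarise one metric given (iso_date, value) pairs in any order."""
    if not series:
        raise ValueError("no readings for this metric")
    ordered = sorted(series)  # ISO dates sort chronologically as strings
    values = [value for _, value in ordered]
    first = date.fromisoformat(ordered[0][0])
    last = date.fromisoformat(ordered[-1][0])
    span_days = (last - first).days + 1
    all_time = mean(values)
    recent = mean(values[-recent_n:])
    # Relative change of the recent window against the all-time average;
    # treated as flat when the all-time average is zero.
    change = (recent - all_time) / abs(all_time) if all_time else 0.0
    if abs(change) <= tolerance:
        arrow = "→"
    else:
        arrow = "↑" if change > 0 else "↓"
    return {
        "min": min(values),
        "max": max(values),
        "all_time_avg": all_time,
        "recent_avg": recent,
        "arrow": arrow,
        # Share of days in the observed span that actually have a reading.
        "coverage": len(values) / span_days,
    }
```

Reporting coverage next to the averages is what keeps a sparse metric from looking as trustworthy as a densely recorded one.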
wiki dedup cleans up the silent duplicates that speech-to-text leaves behind. Transcribers garble names consistently — josefine-bartsch vs josephine-bartc, a real company vs a phantom the model invented — splitting one entity across two pages. Detection is $0 and deterministic (fuzzy + a German-aware phonetic key + shared sources); every merge is operator-confirmed, folds the duplicate's timeline/action-items/aliases into the survivor, rewrites all wikilinks, backs up + deletes the duplicate, and records a canonical-name hard fact so it can't silently come back.
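The detection half of that flow can be approximated with the standard library alone. The sketch below combines two of the three signals mentioned (fuzzy name similarity and shared sources) and omits the German-aware phonetic key entirely. Requiring both signals at once, the 0.8 threshold, and the `candidate_duplicates` name are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_duplicates(pages, threshold=0.8):
    """Propose likely duplicate entity pages for operator review.

    pages maps an entity slug to the set of source paths it cites.
    A pair is proposed only when the slugs look alike AND the two pages
    share at least one source. Nothing is merged here.
    """
    proposals = []
    for a, b in combinations(sorted(pages), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold and pages[a] & pages[b]:
            proposals.append((a, b, round(ratio, 2)))
    return proposals


if __name__ == "__main__":
    example = {
        "josefine-bartsch": {"raw/transcripts/call-1.md"},
        "josephine-bartc": {"raw/transcripts/call-1.md"},
        "acme-gmbh": {"raw/transcripts/call-1.md"},
    }
    for a, b, ratio in candidate_duplicates(example):
        print(f"possible duplicate: {a} <-> {b} (similarity {ratio})")
```

With the two slugs from the paragraph above, the pair clears the 0.8 threshold and is proposed, while the unrelated third page is not, even though it shares the same source. The source paths in the example are made up.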
wiki dream web-research enriches public people (founders, execs, speakers) the way you'd reflexively google a new contact — a targeted Exa search at dream time writes a clearly-labelled ## Public Profile block, kept structurally separate from compiled content and never fed back into raw/. Opt-in per vault and per entity; default OFF (it makes an external paid API call).
wiki usage shows where the model budget actually goes. Every LLM call — Claude (subscription) and Ollama (local) — is metered in tokens per provider/model, not dollars (a single USD figure would conflate non-commensurable billing). Usage lands in state/usage.json; wiki usage --days N prints it per day with totals.
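A per-day, per-provider/model token table of the kind wiki usage --days N prints could be built as follows. The layout of state/usage.json is not documented in this README, so the record shape used here (a JSON list of objects with `date`, `provider`, `model`, `tokens`) is purely an assumption, as are the function names.

```python
import json
from collections import defaultdict
from datetime import date, timedelta
from pathlib import Path


def usage_by_day(usage_file: Path, days: int, today: date) -> dict:
    """Aggregate tokens per day and per provider/model for the last N days.

    Assumed record shape (hypothetical):
    {"date": "2026-05-01", "provider": "...", "model": "...", "tokens": 123}
    """
    records = json.loads(usage_file.read_text(encoding="utf-8"))
    cutoff = today - timedelta(days=days - 1)
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        day = date.fromisoformat(rec["date"])
        if cutoff <= day <= today:
            key = f'{rec["provider"]}/{rec["model"]}'
            table[rec["date"]][key] += rec["tokens"]
    return {day: dict(models) for day, models in sorted(table.items())}


def totals(table: dict) -> dict:
    """Sum the per-day table into one token count per provider/model."""
    out = defaultdict(int)
    for models in table.values():
        for key, tokens in models.items():
            out[key] += tokens
    return dict(out)
```

Keying on provider/model and never summing across keys mirrors the stated reason for avoiding a single figure: subscription tokens and local tokens are not commensurable.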
A real knowledge/concepts/agent-config-staleness.md from a working vault — illustrative excerpt:
---
title: "Agent Config Staleness Pitfall"
aliases: [stale-claude-md, stale-agent-config, wrong-base-pr-incident]
tags: [agents, configuration, claude-md, incident]
sources:
- "daily/2026-04-16.md"
- "daily/2026-04-24.md"
created: 2026-04-16
updated: 2026-05-02
---
# Agent Config Staleness Pitfall
Agent-facing configuration files (CLAUDE.md, AGENTS.md) are *executed* by
agents, not just *read*. When they contain stale facts about the repo
(default branch, layout, conventions), agents act on those stale facts and
produce broken artifacts. PR #121 in `Yesterday-AI/agentic-foundation` was
a concrete instance: a stale "default branch = `feat/initial-structure`"
line in `CLAUDE.md` caused an agent to base a PR on the wrong branch …
## Key Points
- **CLAUDE.md is not documentation — it's instructions.** Agents follow it
like a runbook. Outdated facts produce outdated actions.
- **Counter-pattern to [[concepts/documentation-redundancy-for-agents]]:**
redundancy protects against agents *missing* a rule; freshness protects
against agents *following* an outdated rule. Both matter.
## Related
- [[concepts/agentic-foundation-skill-system]] — same repo, flat skills layout
- [[concepts/research-before-suggesting]] — verify state, don't trust assumed state
- [[concepts/a2a-one-click-provisioning]] — second instance of the same bug class
## Sources
- [[daily/2026-04-16.md]] — Session `fd9195f9` (18:08): PR #121 rejected
- [[daily/2026-04-24.md]] — A2A skill hallucinated obsolete UI path
Three things to notice:
- Frontmatter is structured (aliases, tags, sources, dates) so Dataview queries hit it cleanly.
- [[wikilinks]] point both into the wiki (concepts/...) and back to durable sources (daily/..., raw/notes/..., raw/articles/..., raw/transcripts/...) — the audit trail is part of the article, not a metadata field.
- The article is atomic. It argues one idea, cites two different days' sessions, and links to four sibling concepts. The compiler chose this granularity from the raw substrates; nothing is hand-curated.
Knowledge is distilled into Markdown wikilinks at compile time — no embedding step, no retrieval at every query.
| Approach | Cost per query | Latency | Cross-doc reasoning |
|---|---|---|---|
| RAG | re-embed + retrieve every time | seconds | weak — chunks are isolated |
| llm-wiki | one-time compile per source | ms — it's already markdown | strong — LLM saw all sources during compile |
Speed compounds. Every query is a Markdown read, so the wiki ends up read more often than it's written — by you, and by every agent you give vault access to. That inversion (a read-to-write ratio greater than 1) is the point.
Inspired by Andrej Karpathy's LLM Wiki (the raw/ + knowledge/ shape, the compile-don't-retrieve choice) and Cole Medin's claude-memory-compiler (the session-capture pattern). The architecture wrapped around them — collectors, two-path ingest, curiosity loop, suggestions, lint, hooks across multiple agents, the engine/vault split — is the work of this project. Full design rationale in docs/concept.md; hard-won engine learnings (Ollama gotchas, rate-limit debugging, anti-patterns) in .ytstack/KNOWLEDGE.md.
<vault>/
├── AGENTS.md ← article-schema spec (seeded from templates/, then yours)
├── dashboard.md ← Dataview home page (seeded from templates/, then yours)
├── raw/ ← immutable curated sources (LLM reads, never writes)
├── daily/ ← per-day rollup: <date>/{sessions,health,meetings,voice,email}.md per-source captures + <date>.md compile-stage digest
├── knowledge/ ← LLM-compiled wiki (LLM owns, you and agents read)
├── reports/ ← operator self-reports (air-gapped from compile): studies/<id>/runs/<ts>/instruments/*.md + _summary.md + _analysis.md, plus analyses/ for Pass-2
├── inbox/ ← transient — process-inbox.py classifies + moves to raw/
├── Clippings/ ← optional — Obsidian Web Clipper drop point
├── .obsidian/ ← Obsidian config (community-plugins, core-plugins seeded)
├── .claude/skills/ ← symlinks to .wiki/skills/<name> (auto-discovered)
└── .wiki/ ← engine — hidden from Obsidian, never modified by hand
The vault holds the data. .wiki/ holds the engine. The two never mix on disk. The engine's internal layout is documented in docs/engine-layout.md.
curl -fsSL https://raw.githubusercontent.com/lx-0/llm-wiki/main/install.sh | bash
# or with explicit target:
curl -fsSL https://raw.githubusercontent.com/lx-0/llm-wiki/main/install.sh | bash -s -- ~/path/to/vault
The installer clones into <target>/.wiki/, seeds config.yaml from config.example.yaml, runs uv sync so the venv lives at <target>/.wiki/.venv/, and seeds the vault root from .wiki/templates/ — but only when each target file is absent, never overwriting existing work.
| Created path | Source | Purpose |
|---|---|---|
| <vault>/AGENTS.md | templates/AGENTS.example.md | Article-schema spec read by every compile prompt. Edit the Vault Owner + Language sections. |
| <vault>/dashboard.md | templates/dashboard.md | Obsidian Dataview home (recently-compiled / wiki stats / recent daily logs). |
| <vault>/.obsidian/community-plugins.json | templates/.obsidian/ | Lists dataview + obsidian-excalidraw-plugin for first-launch approval. |
| <vault>/.obsidian/core-plugins.json | templates/.obsidian/ | Sensible defaults (daily-notes, properties, graph on; sync/publish off). |
| <vault>/.claude/skills/use-llm-wiki | symlink to .wiki/skills/use-llm-wiki | The one bundled engine-side skill — lets agents query, contribute, or diagnose this wiki via the wiki CLI from any project. Global-eligible: wiki skills install --global links it into ~/.claude/skills/ so agents anywhere can reach it. The other operator skills (engine-pr, excalidraw-diagram, ingest-audio, vault-health-check, vault-triage) ship via the Claude Code plugin marketplace yesterday-public-plugins, not as repo-bundled skills. |
Prerequisites: bash ≥ 4, git, jq, uv (Python package manager). Optional but recommended: a local Ollama — the curiosity loop, screenshot vision, inbox classification, HTML visual analysis, and per-article review all run on local models. The Claude paths (compile, query, lint contradiction check, flush) work without it.
After install:
cd ~/path/to/vault
./.wiki/wiki setup # 6-question wizard: Ollama URL, compile model,
# compile-after hour, procmail execution, local-LLM
# bundle, global skill install
./.wiki/wiki status # config + hook install table + Ollama probe
./.wiki/wiki # interactive home screen — context-sensitive
# "what's pending" + browse categories
./.wiki/wiki update # git pull --ff-only + sync skill symlinks into .claude/skills/
./.wiki/wiki update --no-skills # pull only (skip skill sync)
./.wiki/wiki skills status # per-skill linked / collision / missing table + global state
./.wiki/wiki skills install # ad-hoc resync without pulling
./.wiki/wiki skills install --global # also link use-llm-wiki into ~/.claude/skills/ + register vault
./.wiki/wiki seed # additive: add missing vault templates (dashboard, plugin configs)
./.wiki/wiki seed --force # overwrite existing templates with engine versions
wiki update pulls into .wiki/ (preserves config.yaml + .venv/), then runs wiki skills sync so newly-shipped engine skills land in <vault>/.claude/skills/ automatically. Foreign entries (your own skills, other tools' symlinks) are never touched. When skills.global_install is on, the sync also refreshes the global ~/.claude/skills/ symlink — the opt-in survives updates with no re-flagging.
If the .wiki/ checkout has uncommitted changes to tracked engine files (direct edits inside <vault>/.wiki/ — which a --ff-only pull would otherwise refuse), wiki update lists them and offers to git stash them so the pull can proceed. After pulling it re-applies the stash automatically, but only if it merges cleanly onto the new engine; if it would conflict, the stash is left unpopped (recover with git -C <.wiki> stash pop). Run it from a terminal — non-interactively (e.g. dashboard button) it refuses rather than touching your changes. (config.yaml, state/, logs/ are gitignored and never trigger this.)
wiki seed re-applies engine templates to the vault root after an update — adds missing files (dashboard.md, _dashboard-stats.md, .obsidian/plugins/<name>/data.json) and merges community-plugins.json (additive — never drops your own plugins). Default mode never overwrites your existing files; use --force to replace customisations of dashboard.md / AGENTS.md with the engine version.
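The two behaviours described for wiki seed, copy-if-absent and an additive merge of community-plugins.json, are easy to get subtly wrong, so here is a minimal sketch of both. It assumes community-plugins.json is a flat JSON array of plugin IDs (Obsidian's usual format); `seed_file` and `merge_plugins` are hypothetical names, not the engine's functions.

```python
import json
import shutil
from pathlib import Path


def seed_file(template: Path, target: Path, force: bool = False) -> bool:
    """Copy a template into the vault; return True if the file was written.

    An existing target is left alone unless force is set, which mirrors
    the default vs --force behaviour described above.
    """
    if target.exists() and not force:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return True


def merge_plugins(template: Path, target: Path) -> list[str]:
    """Add the engine's plugin IDs to the vault's list without dropping any."""
    wanted = json.loads(template.read_text(encoding="utf-8"))
    current = (
        json.loads(target.read_text(encoding="utf-8")) if target.exists() else []
    )
    # Keep the operator's entries and order; append only what is missing.
    merged = current + [pid for pid in wanted if pid not in current]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Running `merge_plugins` twice yields the same file as running it once, which is the property that makes a seed step safe to repeat after every update.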
The venv lives inside .wiki/. Two equivalent invocations:
# Option A — cd into .wiki first (matches script docstrings)
cd ~/path/to/vault/.wiki
uv run python scripts/compile.py
# Option B — pin --project from any CWD
uv run --project ~/path/to/vault/.wiki python ~/path/to/vault/.wiki/scripts/compile.py
Hooks always use Option B (the --project flag is hardcoded into the agent config).
| Doc | What's inside |
|---|---|
| docs/PRINCIPLES.md | The ten core rules — constitution of the wiki, with verbatim Karpathy / OpenClaw / own quotes + attribution |
| docs/concept.md | Three-layer architecture, compile-vs-RAG, cognitive-function mapping, curiosity loop |
| docs/PROCESS.md (German) | Live documentation of every data flow inside the engine — 13 numbered processes (German prose, English diagrams) |
| docs/cli.md | Full CLI reference — every wiki <subcommand>, every config key, every hook target |
| docs/FEATURES.md | Implementation map of every engine feature — status, code location, trigger, known gaps. Maintained alongside code. |
| docs/engine-layout.md | File-by-file tree of .wiki/ — the engine internals |
| docs/naming.md | Naming conventions for raw sources and knowledge articles |
| docs/architecture.png | Full Excalidraw render of the cognitive architecture |
| AGENTS.md | Conventions for AI agents working on this codebase (separate from the vault's own AGENTS.md) |
| .ytstack/PROJECT.md | Project framing, success criteria, current status |
| .ytstack/DECISIONS.md | Locked architectural choices |
| .ytstack/KNOWLEDGE.md | Hard-won engine learnings (Ollama, rate limits, anti-patterns) |
Documentation language. Most docs are English. docs/PROCESS.md is in German — pull requests that touch a flow keep the existing prose language; bilingual is fine, and Mermaid labels / table headers in English keep diagrams accessible.
Secrets-leak prevention runs on two layers:
- CI (.github/workflows/secrets-scan.yml) — gitleaks scans every push + PR + nightly cron. Blocks merge on leaks.
- Pre-commit (.pre-commit-config.yaml) — local hooks block leaks before they reach git:
  pip install pre-commit && pre-commit install
Gitleaks rules + allowlist live in .gitleaks.toml. Manual scan: gitleaks detect --no-banner -v.
If you find a leak in history, rotate the secret immediately, then file an issue (don't post the leaked value).
Bare ./.wiki/wiki opens an interactive home screen (rendered by
scripts/menu.py via prompt_toolkit — arrow keys, redraw, raw mode) with: a
status one-liner (384 articles · last compile 4h ago · ollama ✓),
context-sensitive suggestions (3 files in inbox/ → process-inbox, 12 sources changed → compile), 4 quick-action letter shortcuts, a 6-bucket
category browse, and a /<substring> fuzzy filter across 49 commands. Plus a
health banner at the top when something's broken (missing hooks,
unreachable Ollama, recent compile errors) with inline fix hints. The menu
shells back to wiki <subcommand> for every dispatch — bash stays the single
source of truth for what each subcommand does. Non-TTY callers (CI, hooks,
pipes) get the help dump and exit. Two agent-facing JSON surfaces:
wiki menu --json (what's pending) and wiki doctor --json (config +
connectivity + pipeline audit, with --quick for sub-50ms hook usage). Full
reference — every subcommand, every config key, every hook target — lives in
docs/cli.md.
Engine internals (file-by-file tree, the bash/python/jq split rationale) live in docs/engine-layout.md. Development conventions — how to add an agent target, a tunable, or a prompt; style + side-effect rules — live in AGENTS.md.