Your codebase, compiled into a living wiki.
Maintained by LLMs. Searchable by LLMs. Viewable in Obsidian.
Installation · Quick Start · How It Works · Commands · Integrations
Every coding session starts from zero. Your AI agent re-reads files, re-discovers architecture, re-learns patterns. Past decisions and bug fixes are forgotten. There is no persistent understanding of a codebase across sessions.
CodeWiki compiles your codebase into a structured wiki of markdown articles — module overviews, architecture docs, cross-cutting concepts, decisions, and learnings. Your AI agent maintains it automatically. Knowledge compounds across every session.
Your Codebase ──► LLM reads code ──► Wiki (.md files) ──► LLM queries wiki
│
▼
Obsidian / QMD
CodeWiki is a thin CLI (cw) paired with a Claude Code skill. The CLI handles git ops, metadata, and scaffolding. Claude Code does all the reading, understanding, and writing. No API keys needed — your agent is the intelligence.
git clone https://github.com/mraza007/codewiki.git
cd codewiki
cargo install --path .cw --help# 1. Initialize a wiki for your project
cd your-project
cw init
# 2. Set up your AI agent
cw setup claude-code # or: cw setup codex
# 3. Start a Claude Code session — the skill triggers automatically
# Claude reads your codebase and compiles the wiki
# 4. Open in Obsidian
# Point Obsidian at ~/.codewiki/ as a vaultThat's it. From now on, every session starts with understanding and ends with updated knowledge.
Each project gets a wiki at ~/.codewiki/<project>/:
~/.codewiki/
├── my-project/
│ ├── _index.md # Master index of all articles
│ ├── _architecture.md # System overview
│ ├── _patterns.md # Recurring patterns
│ ├── _meta.yaml # Last compiled commit, timestamps
│ │
│ ├── modules/ # One article per logical module
│ │ ├── auth.md
│ │ ├── database.md
│ │ └── api.md
│ │
│ ├── concepts/ # Cross-cutting concerns
│ │ ├── data-flow.md
│ │ └── error-handling.md
│ │
│ ├── decisions/ # Why things are the way they are
│ │ └── why-postgres.md
│ │
│ ├── learnings/ # Bugs fixed, patterns discovered
│ │ └── auth-token-bug.md
│ │
│ └── queries/ # Past Q&A, filed back in
│ └── how-caching-works.md
│
├── another-project/
│ └── ...
Session start:
cw statuschecks what changed since the last compile- If first time: full compile — agent reads entire codebase, writes all articles
- If stale: incremental update — only re-analyze changed files
- If fresh: skip, start working immediately
During work: The agent queries the wiki instead of re-reading raw source files. Cross-cutting questions like "how does authentication work?" are answered by a single article that connects 8 files across 4 directories.
Session end:
- Bugs fixed become
learnings/<slug>.md - Decisions made become
decisions/<slug>.md - Changed code triggers article updates
cw index+cw meta updatekeeps everything current
Every article uses YAML frontmatter for metadata:
---
title: Authentication Module
type: module
source_files:
- src/auth/middleware.py
- src/auth/tokens.py
tags: [auth, middleware, jwt]
---
## Overview
JWT-based authentication using middleware chain...
## Key Components
- `AuthMiddleware` — validates tokens on every request
- `TokenService` — issues and refreshes JWTs
## Data Flow
Request → AuthMiddleware → TokenService → Handler
## Connections
- [[database]] — stores refresh tokens
- [[api]] — all routes pass through auth middleware
## Known Issues
Token refresh has a race condition under high concurrency.The source_files field is what makes cw status work — when those files change, the article is marked stale.
cw init # Scaffold wiki for current repo
cw status # Show changed files and stale articles
cw path # Print wiki path for current repo
cw projects # List all wikis
cw index # Rebuild _index.md from article frontmatter
cw meta update # Record current commit as "compiled"
cw setup claude-code # Install skill into Claude Code
cw setup codex # Install instructions into Codex
cw setup qmd # Register wiki as QMD search collection
cw uninstall claude-code # Remove from Claude Code
cw uninstall codex # Remove from Codexcw setup claude-code installs a skill at ~/.claude/skills/codewiki/. The skill tells Claude Code when and how to compile, query, and update the wiki. It triggers automatically at session start.
cw setup codex appends wiki instructions to ~/.codex/AGENTS.md. Codex will check the wiki at session start and update it at session end.
QMD is a local search engine for markdown files with hybrid BM25 + vector + LLM reranker search.
cw setup qmd # register collection
qmd query "how does auth work" -c codewiki # search the wikiWith QMD as an MCP server, your agent can search the wiki programmatically during a session.
EchoVault provides persistent memory for coding agents. CodeWiki and EchoVault complement each other:
- CodeWiki = what the code is (compiled understanding)
- EchoVault = what happened while working on it (decisions, bugs, patterns)
Both write into the same ~/.codewiki/<project>/ directory, so memories and code knowledge live side by side.
Open ~/.codewiki/ as an Obsidian vault. All wiki articles use [[backlinks]] natively, so you get a connected knowledge graph of your codebases out of the box. No plugins required.
┌──────────────────────────────────────────────┐
│ Your AI Agent │
│ (Claude Code / Codex) │
│ │
│ Reads code ─► Writes articles ─► Queries │
└──────────┬───────────────┬───────────────────┘
│ │
┌─────▼─────┐ ┌─────▼──────┐
│ cw CLI │ │ QMD │
│ │ │ (search) │
│ git ops │ │ │
│ metadata │ │ BM25 + │
│ scaffold │ │ vector + │
│ indexing │ │ reranker │
└─────┬─────┘ └─────┬──────┘
│ │
┌─────▼───────────────▼──────┐
│ ~/.codewiki/<project>/ │
│ │
│ Markdown files │
│ Viewable in Obsidian │
│ Searchable by QMD │
│ Version-controllable │
└────────────────────────────┘
Design principles:
- The CLI is not smart. It handles git diffs, file scaffolding, and metadata. Your AI agent does all the understanding and writing.
- Plain markdown. No databases, no proprietary formats. Just
.mdfiles with YAML frontmatter. - Central location. All wikis live at
~/.codewiki/. One Obsidian vault for all projects. Survives repo deletion. - Session-boundary updates. Wiki updates happen at session start and end — not continuously. Cost-contained, natural rhythm.
Traditional RAG chunks code, embeds it, and retrieves fragments. You get decontextualized snippets.
CodeWiki is different: an LLM reads and understands the code, then writes structured knowledge articles. When you query, you get pre-digested understanding, not raw chunks. The wiki is the semantic layer that RAG skips.
This project was inspired by Andrej Karpathy's tweet about using LLMs to build personal knowledge bases — raw data compiled into wikis, queried and enhanced incrementally.
CodeWiki applies that same pattern to codebases.
Contributions welcome! Some areas that need work:
- More agent integrations — Cursor, Windsurf, OpenCode
cw lint— LLM health checks over the wiki (find contradictions, gaps)cw export— Generate slides, diagrams from wiki content- Multi-repo concepts — Cross-project articles for shared patterns
- Team sharing — Git-backed wikis that teammates can pull
I built CodeWiki to fix this. It compiles your codebase into a structured wiki that your AI agent maintains. Module overviews, architecture docs, how things connect. The agent writes it once, keeps it updated as code changes, and uses it as context in every future
session.
No more cold starts. The understanding carries over.
Idea came from a Karpathy tweet about using LLMs to compile raw data into structured wikis. Codebases are raw data too.
Links in comments.
First comment:
GitHub: github.com/mraza007/codewiki
Blog post: muhammadraza.me/2026/building-codewiki-compiling-codebases-into-living-wikis/