Reframes #33 / #34 against invariant #1 (the op log is the source of truth; .md is a projection).
WorkspaceIndex::build(workspace_root) (crates/outl-md/src/index.rs:73) does:
- walkdir over pages/ + journals/
- read_to_string on every .md
- parse (comrak) on each one
- reads the sidecar to map IDs
- populates PageEntry + BlockIndex
Called on boot by:
- outl-tui on a background thread (spawn_index_rebuild, crates/outl-tui/src/actions/lifecycle/index_build.rs:37)
- outl-cli/cmd/doctor.rs:234
- outl-cli/cmd/search.rs:67
- outl-cli/cmd/backlinks.rs:86
- outl-cli/mcp/mod.rs:128
Cost: ~250 ms (debug build) on 1500 pages (crates/outl-md/tests/index_perf.rs:18).
The problem
Everything the index needs already lives in the tree CRDT after the op-log replay:
| Data | Already in the tree |
|---|---|
| Page list | Tree::iter_nodes filtering root children with slug:: |
| slug / title / icon / pinned / type | Tree::property(page_id, key) |
| is_journal | kind:: property on the page node (PageKind::Journal) |
| Block text (search + ref extraction) | Workspace::block_text(id) |
| ((blk-XXXXXX)) handle ↔ NodeId | derive_ref_handle(NodeId) (deterministic, no sidecar needed) |
| Reverse refs and backlinks | outl_md::inline::tokenize over block_text |
None of this needs comrak, walkdir, or sidecar reads.
The index is deriving from the projection (.md) instead of from the source of truth (op log). That's the actual bug.
Proposal
impl WorkspaceIndex {
    // before
    pub fn build(workspace_root: &Path) -> Self { /* walkdir + parse + sidecar */ }

    // after
    pub fn derive(workspace: &Workspace) -> Self { /* tree walk + block_text + tokenize */ }
}
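To make the shape of derive concrete, here is a minimal sketch over stand-in types. Page, Block, ToyIndex, and block_refs are all hypothetical and are not the real outl types; block_refs is a simplified substitute for outl_md::inline::tokenize that only recognizes ((handle)) references. The point it illustrates is that page slugs and reverse refs can be computed from in-memory data alone.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a block node in the replayed tree.
pub struct Block {
    pub id: u64,
    pub text: String,
}

/// Hypothetical stand-in for a page node and its blocks.
pub struct Page {
    pub slug: String,
    pub blocks: Vec<Block>,
}

/// Toy index: page slugs plus reverse block references.
#[derive(Debug, Default, PartialEq)]
pub struct ToyIndex {
    pub slugs: Vec<String>,
    /// Target handle -> ids of the blocks that reference it.
    pub backlinks: BTreeMap<String, Vec<u64>>,
}

/// Collect every `((handle))` reference in `text`.
/// Simplified: an unterminated `((` ends the scan.
pub fn block_refs(text: &str) -> Vec<&str> {
    let mut refs = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("((") {
        let after = &rest[start + 2..];
        match after.find("))") {
            Some(end) => {
                refs.push(&after[..end]);
                rest = &after[end + 2..];
            }
            None => break,
        }
    }
    refs
}

/// Build the index from in-memory pages only: no filesystem walk, no Markdown parse.
pub fn derive(pages: &[Page]) -> ToyIndex {
    let mut index = ToyIndex::default();
    for page in pages {
        index.slugs.push(page.slug.clone());
        for block in &page.blocks {
            for target in block_refs(&block.text) {
                index
                    .backlinks
                    .entry(target.to_string())
                    .or_default()
                    .push(block.id);
            }
        }
    }
    index
}
```

The real derive would walk the tree via Tree::iter_nodes and read text via Workspace::block_text; the sketch only mirrors the data flow.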
Boot becomes:
- Replay op log → tree (already runs).
- WorkspaceIndex::derive(&workspace) → zero I/O, zero global parse.
- Lazy parse of .md still happens only in load_current (one page at a time, when the user opens it).
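The first boot step, replaying the op log into a tree, can be pictured with a toy log. The Op variants and the flat BTreeMap "tree" below are assumptions made for illustration; the real op log is a CRDT with its own format and conflict handling, which this sketch ignores.

```rust
use std::collections::BTreeMap;

/// Hypothetical op shapes; the real op-log format is not shown in this issue.
#[derive(Debug, Clone)]
pub enum Op {
    CreateBlock { id: u64, text: String },
    SetText { id: u64, text: String },
    DeleteBlock { id: u64 },
}

/// Replay ops in order into a flat id -> text map (a stand-in for the tree).
/// Ops that target a missing block are ignored.
pub fn replay(ops: &[Op]) -> BTreeMap<u64, String> {
    let mut tree = BTreeMap::new();
    for op in ops {
        match op {
            Op::CreateBlock { id, text } => {
                tree.insert(*id, text.clone());
            }
            Op::SetText { id, text } => {
                if let Some(slot) = tree.get_mut(id) {
                    *slot = text.clone();
                }
            }
            Op::DeleteBlock { id } => {
                tree.remove(id);
            }
        }
    }
    tree
}
```

Once this state exists in memory, everything the index needs is a pure function of it, which is why the second step requires no file reads.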
Edge cases
- .md edited externally (vim, iCloud peer drop): orphan scanner detects → reconcile_md → ops → tree → the derive-based patch_page (which picks up the refreshed tree, not the filesystem).
- Fresh workspace post Logseq import: outl_actions::ingest_md_file already creates ops; the index derives normally after that.
- text_fold (lowercased cache for search_block_text): populated from the tree's block_text.
- parse_warnings: stays per-page, only when a .md is opened. Doesn't enter the index.
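As a rough illustration of the text_fold and patch cases, the sketch below keeps a lowercased copy of each block's text and re-folds a single block after a reconcile. FoldedText and its methods are hypothetical names, and the u64 ids stand in for NodeId; the real cache layout may differ.

```rust
use std::collections::HashMap;

/// Lowercased copy of each block's text, keyed by a hypothetical numeric block id.
#[derive(Debug, Default)]
pub struct FoldedText {
    folded: HashMap<u64, String>,
}

impl FoldedText {
    /// Populate the cache from (id, block_text) pairs taken from the tree.
    pub fn from_blocks<'a>(blocks: impl IntoIterator<Item = (u64, &'a str)>) -> Self {
        let folded = blocks
            .into_iter()
            .map(|(id, text)| (id, text.to_lowercase()))
            .collect();
        Self { folded }
    }

    /// Re-fold one block after its text changed in the tree
    /// (for example after an external edit was reconciled into ops).
    pub fn patch(&mut self, id: u64, text: &str) {
        self.folded.insert(id, text.to_lowercase());
    }

    /// Case-insensitive substring search; returns matching ids in ascending order.
    pub fn search(&self, query: &str) -> Vec<u64> {
        let needle = query.to_lowercase();
        let mut hits: Vec<u64> = self
            .folded
            .iter()
            .filter(|(_, text)| text.contains(&needle))
            .map(|(id, _)| *id)
            .collect();
        hits.sort_unstable();
        hits
    }
}
```

Because patch reads from the tree's text rather than from disk, an external edit only reaches the cache after it has become ops.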
Migration
- Replace WorkspaceIndex::build(path) with WorkspaceIndex::derive(&workspace) at every caller.
- CLI / MCP load the workspace first (they already do) and derive the index instead of walking + parsing.
- Tests in crates/outl-md/tests/workspace_index.rs migrate to building an in-memory workspace fixture + derive, instead of writing .md files in a tempdir.
Non-goals
- No persisted cache on disk (.outl/cache/).
- No sidecar change.
- No op-log format change.
- No LRU AST cache.
Related
load_current; the real bottleneck was elsewhere).