dev: reh3376_dev01 -> main #408
Promote the Unreleased CHANGELOG block to v0.9.0 (2026-05-06) ahead of
release.yml / goreleaser tag push.
New ### Breaking subsection captures two operator-visible cutovers since
v0.8.5:
(1) Phase 13.5 LLM runtime port 8101 -> 8102 + .env migration required;
(2) Phase 13.6 MLX_* -> LLM_* env-var rename (legacy aliases retained for
>= 1 release cycle).
New ### Added entries: Phase 10.5 closeout (UBENCH framework promotion,
commit 0389b49) and Claude Code GitHub App workflows (PRs #378, #379).
All previously-Unreleased entries (Phase 14.2.3, 14.2.x, 14.1.x, 14, 13.6,
13.5, 13.2, 13.1) carried forward unchanged into the v0.9.0 block. Fresh
empty Unreleased section seeded above.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps packaging/homebrew-mdemg pointer a235977 -> 6077097, which
incorporates:
- f9358cd Brew formula update for mdemg version v0.8.5 (goreleaser, prior)
- b4a0d2c Brew formula update for mdemg version v0.9.0 (goreleaser, this
  release)
- 6077097 docs: v0.9.0 -- CHANGELOG, README What's New, beta-testing
  version pin
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`config.FromEnv()` defaulted MdemgVersion/MdemgCommit to literal "0.6.0"/
"unknown" when MDEMG_VERSION/MDEMG_COMMIT envs were unset. Both /healthz
and /readyz serialize cfg.MdemgVersion, so they reported "0.6.0" forever
regardless of the actual binary's ldflags-injected cli.Version.
Fix: defaults to "" in config; cli/config_loader.go injects cli.Version /
cli.Commit (the build-time vars set by goreleaser ldflags) when the env
override is unset. Operators can still pin via MDEMG_VERSION env.
Live-verified: dev build (no ldflags) now reports {"version":"dev"} on
/healthz instead of the lying "0.6.0". Production builds via goreleaser
will report the real semver tag.
TestHandleHealthz unaffected (sets cfg.MdemgVersion directly).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
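The version fallback described in the commit above can be sketched as follows. This is a minimal illustration, not the code in cli/config_loader.go: the helper name `effectiveVersion` and the package-level `Version`/`Commit` variables in package `main` are assumptions standing in for `cli.Version` / `cli.Commit`.

```go
package main

import "fmt"

// Build-time variables. A release build would overwrite them with
// -ldflags "-X main.Version=<tag> -X main.Commit=<sha>".
// The names here are hypothetical stand-ins for cli.Version / cli.Commit.
var (
	Version = "dev"
	Commit  = "unknown"
)

// effectiveVersion returns the operator's env override when it is set,
// and the build-time value otherwise.
func effectiveVersion(envOverride, buildValue string) string {
	if envOverride != "" {
		return envOverride
	}
	return buildValue
}

func main() {
	// No MDEMG_VERSION override: the build-time value is reported.
	fmt.Println(effectiveVersion("", Version))
	// An explicit pin still wins over the build-time value.
	fmt.Println(effectiveVersion("0.9.0", Version))
}
```

Keeping the config default empty is what makes the distinction between "unset" and "pinned" visible to the loader.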
…a-server
Phase 13.5 cutover (2026-05-03) replaced mlx_lm.server (port 8101) with
llama.cpp llama-server (port 8102) as the production LLM runtime, but the
embedded launchd plist template + service install code paths were never
updated. Any operator running 'mdemg service install' from a fresh checkout
got the decommissioned mlx_lm.server agent; mdemg's startup preflight then
failed because LLM_ENDPOINT=http://127.0.0.1:8102/v1 wasn't reachable.
Changes:
- New packaging/launchd/com.mdemg.llama-server.plist with the Phase 13.5
  production flags (--ctx-size 32768 --parallel 4 --cont-batching --metrics
  --jinja). Byte-identical mirror at internal/cli/launchd_templates/ for the
  embed.FS (CI sync-check enforced).
- Removed packaging/launchd/com.mdemg.mlx-server.plist + embed.FS mirror.
  mlx_lm.server is decommissioned and known-broken on M5 + macOS 26.3.x;
  keeping the template would just risk re-deploying it.
- internal/cli/service_darwin.go: launchdServices entry replaced with
  com.mdemg.llama-server. resolveMLXLMBin renamed to resolveLlamaServerBin
  with primary env MDEMG_LLAMA_SERVER_BIN, deprecation alias for
  MDEMG_MLX_LM_BIN (slog.Warn at boot, retained ≥1 release cycle per the
  Phase 13.6 deprecation pattern), PATH lookup of `llama-server`.
  resolveMDEMGModelPath default updated to the canonical Phase 13.5 GGUF
  filepath (.local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.Q5_K_M.gguf) since
  llama-server takes a `.gguf` filepath, not an HF-format directory like
  mlx_lm.server. Install error message updated for the new env var name +
  remediation steps (`brew install llama.cpp`).
- migrateLegacyMLXServerPlist() added: if a pre-cutover com.mdemg.mlx-server
  plist is bootstrapped on the operator's machine, Install() boots it out
  and renames the file to .disabled-phase13_5 (matches the manual operator
  convention from Phase 13.5 rollout). Best-effort: failures don't block the
  install.
- internal/cli/service_darwin_test.go fully rewritten:
  * TestLaunchdServicesIncludesLlamaServer asserts the new entry exists and
    is Optional=false (production matches Hotfix 11.6.3.1; the old test
    asserted Optional=true, a latent lie since 2026-05-02 that Linux CI
    never caught because of //go:build darwin)
  * TestLlamaServerPlistEmbedded replaces TestMLXServerPlistEmbedded;
    additionally asserts mlx-server.plist is NOT in embed.FS
  * Two resolver tests for the primary env var
  * New TestResolveLlamaServerBinFallsBackToMLXAlias proves the Phase 13.6
    deprecation alias path works
  * resolveMDEMGModelPath tests updated for the new GGUF default
- internal/cli/watchdog.go: help text references com.mdemg.llama-server
  (instead of com.mdemg.mlx-server) and llama-server (instead of
  mlx_lm.server). Notes that mdemg_mlx_health_state metric name is retained
  for dashboard compatibility.
Tested:
- Tier 1 unit: 7/7 new tests pass; full ./internal/cli/... suite green (61s
  wall-clock).
- Tier 2 integration: golangci-lint run ./internal/cli/: 0 issues. CI plist
  sync-check (diff -q packaging/launchd/*.plist
  internal/cli/launchd_templates/): 6/6 byte-identical.
- Tier 3 live e2e: deferred. Running mdemg service install on the operator's
  currently-serving machine would briefly bootout the running llama-server
  LaunchAgent (PID 20527 actively serving production inference). The
  hand-installed llama-server plist on the operator's machine is
  byte-equivalent (modulo template substitutions) to what this commit will
  install via `mdemg service install` on a fresh operator setup, so the
  operator can verify on next planned redeploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
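The resolution order described for resolveLlamaServerBin (primary env var, then the deprecated alias with a warning, then a PATH lookup) might look roughly like the sketch below. The function name `resolveBin`, its injected `getenv`/`lookPath` parameters, and the error value are assumptions made so the precedence is testable; the real function's signature is not shown in the commit.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
	"os/exec"
)

// errNoLlamaServer is a hypothetical sentinel for "nothing resolved".
var errNoLlamaServer = errors.New("llama-server binary not found")

// resolveBin checks the primary env var, then the deprecated alias, then PATH.
// getenv and lookPath are injected so tests can control both sources.
func resolveBin(getenv func(string) string, lookPath func(string) (string, error)) (string, error) {
	if p := getenv("MDEMG_LLAMA_SERVER_BIN"); p != "" {
		return p, nil
	}
	if p := getenv("MDEMG_MLX_LM_BIN"); p != "" {
		// Deprecated alias: still honored, but flagged at boot.
		slog.Warn("MDEMG_MLX_LM_BIN is deprecated; set MDEMG_LLAMA_SERVER_BIN instead")
		return p, nil
	}
	if p, err := lookPath("llama-server"); err == nil {
		return p, nil
	}
	return "", errNoLlamaServer
}

func main() {
	path, err := resolveBin(os.Getenv, exec.LookPath)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(path)
}
```

Injecting the lookups keeps the alias-fallback test independent of the host's environment and PATH.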
Epic 0 of Sprint MODEL-DIST-001: Local LoRA Distribution via Ollama Library.
Sprint plan in 12-section v1.0 format. Supersedes parts of the speculative
spec at docs/research/mdemg_sprint_ideas/MDEMG_FT_LORA_PACKAGING_SPEC.md
(HF Hub vs Ollama Library; adapter-only vs both-fused-and-adapter; Apple
Silicon scope vs cross-platform).
Configurability Contract: every operator-visible value is dynamic per the
framework's no-hardcoding rule. 12 env vars + flag overrides + sensible
defaults. ModelFetcher interface decouples CLI from Ollama-specific
knowledge; v1 ships OllamaFetcher only, future backends (HF / S3 / GitHub
Release / file) plug in via factory dispatch on MDEMG_MODEL_BACKEND without
touching the CLI surface.
Forensic from Epic 0:
- adapters/tier1/adapters.safetensors verified present (514 MB MLX, Phase 5
  SFT Iter 2400 best output)
- mdemg-llm-v1.Q5_K_M.gguf SHA256 captured (9.8 GB; 144ad7231...)
- f16 GGUF intermediate NOT on disk; Epic 1 will regenerate via
  convert_hf_to_gguf.py from the MLX merged model (~5 min)
- qwen3:14b model-layer digest captured from Ollama registry; manifest
  digest to be computed at Epic 3 for Modelfile FROM @sha256: pinning
quant_manifest.json skeleton with Q5_K_M SHA pre-populated; Q4_K_M / Q8_0 /
adapter SHAs filled in during Epics 1+2.
Estimated effort 5–7 dev-days. OpenAI spend $0. Risk medium (Ollama publish
one-way; MLX→PEFT→GGUF LoRA conversion is the riskiest engineering item with
documented contingency to defer to MODEL-DIST-002 if blocked).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pipeline (CLAUDE.md Phase 13.5 documented path):
1. mlx_lm.fuse --dequantize: mlx-community/Qwen3-14B-4bit + adapters/tier1/
-> 29.6 GB bf16 HF safetensors at .local-models/qwen3-14b-mdemg-v1-bf16/
2. convert_hf_to_gguf.py --outtype f16 -> 30 GB f16 GGUF (required
neural/.venv interpreter with torch + transformers + gguf installed;
/opt/homebrew/bin/convert_hf_to_gguf.py uses system python which lacks
these — installed gguf/sentencepiece/protobuf into neural/.venv)
3. llama-quantize Q4_K_M -> 9.0 GB (4.87 BPW; 40s wall on M5)
4. llama-quantize Q8_0 -> 16 GB (8.50 BPW; 11s wall on M5)
5. Live smoke per new quant via llama-server on port 18102 — both serve
/v1/models cleanly with embedded chat_template
SHAs captured in quant_manifest.json:
Q4_K_M: 401161710c22f0ae...411d42ea
Q5_K_M: 144ad723101d688f...d5f5d54 (matches Epic 0 baseline)
Q8_0: fc14dcb40af1bb58...8db6089
f16: 436cd6f41a684805...3217bd (intermediate, retained for Epic 2)
Resource matrix updated with empirical sizes (Q4_K_M is 9.0 GB vs estimated
6.5 GB; min RAM revised 8 -> 12 GB to cover ~3 GB working memory above
weights). 14B params x 4.87 BPW ≈ 8.5 GB matches the formula.
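The size formula referenced above (parameter count times bits per weight, divided by eight bits per byte) is easy to check with a few lines of Go. The function name is hypothetical, and the 14e9 parameter count is the rounded figure from the text, so the result is only an estimate of the weight payload and ignores metadata and any tensors kept at higher precision.

```go
package main

import "fmt"

// estimateGGUFBytes approximates the weight payload of a quantized model:
// params * bitsPerWeight / 8 bits-per-byte.
func estimateGGUFBytes(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}

func main() {
	quants := []struct {
		name string
		bpw  float64
	}{
		{"Q4_K_M", 4.87},
		{"Q8_0", 8.50},
	}
	for _, q := range quants {
		gb := estimateGGUFBytes(14e9, q.bpw) / 1e9
		fmt.Printf("%s: ~%.1f GB\n", q.name, gb)
	}
}
```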
GGUF binary artifacts stay local — .local-models/ gitignored per
.gitignore:70. Sprint deliverable in git is just the manifest update.
Production llama-server (PID 20527 on port 8102) undisturbed throughout
Epic 1; live smokes used port 18102.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adapter (LoRA-only Modelfile via ADAPTER directive) deferred per the sprint
plan's documented contingency clause. Fused-only path (Epics 1, 3, 4, 5)
continues; that's the primary operator value.
Forensic findings (epic_2_forensic.md):
- MLX adapter is well-formed: 560 tensors, 40 layers x 7 target_modules,
  rank 32, alpha 64, scale 20.0.
- convert_lora_to_gguf.py is NOT in brew install llama.cpp; would need
  manual fetch from llama.cpp source.
- MLX -> PEFT requires tensor transposition: MLX lora_a is (in, rank); PEFT
  expects (rank, in). Same for lora_b.
- Estimated 80-95 min to complete vs ~30 min budget remaining for Epic 2.
- Hit the contingency criterion: "MLX -> PEFT conversion blocked by tooling
  gaps."
Decision: defer adapter scope to MODEL-DIST-002 (new follow-up sprint, to be
planned separately). Fused-only ships this sprint.
Knock-on changes (in-flight to subsequent epics):
- Epic 3: drop Modelfile.adapter; publish only 3 fused quants.
- Epic 4 CLI: --adapter flag accepted at parse-time but errors with "lands
  in MODEL-DIST-002"; machinery preserved for forward-compat.
- Epic 6 e2e: drop adapter-pull step.
- Epic 7 feature doc: adapter section notes "coming in MODEL-DIST-002".
Artifacts preserved on disk for MODEL-DIST-002 pickup:
- adapters/tier1/adapters.safetensors (MLX, 514 MB)
- .local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.f16.gguf (30 GB, retained as
  base for llama-server --lora verification later)
quant_manifest.json adapter block updated with status=deferred + reason.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
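The transposition step named in the forensic findings, turning an (in, rank) matrix into a (rank, in) one, is sketched below on a plain row-major slice. This only illustrates the shape change; it is not a conversion tool, and it does not read safetensors files or handle tensor naming, both of which a real MLX -> PEFT converter would need.

```go
package main

import "fmt"

// transpose returns the transpose of a row-major matrix whose rows all
// have the same length: an (r x c) input becomes a (c x r) output.
func transpose(m [][]float32) [][]float32 {
	if len(m) == 0 {
		return nil
	}
	rows, cols := len(m), len(m[0])
	out := make([][]float32, cols)
	for c := range out {
		out[c] = make([]float32, rows)
		for r := 0; r < rows; r++ {
			out[c][r] = m[r][c]
		}
	}
	return out
}

func main() {
	// Toy stand-in for lora_a with in=3, rank=2 (real shapes are far larger).
	loraA := [][]float32{{1, 2}, {3, 4}, {5, 6}}
	t := transpose(loraA)
	fmt.Println(len(t), len(t[0])) // shape after transposition: rank x in
	fmt.Println(t)
}
```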
…sh pending)
Authored 3 Ollama Modelfiles in packaging/ollama/:
Modelfile.Q4_K_M: 9.0 GB, 12 GB min RAM, 16 GB recommended
Modelfile.Q5_K_M: 11 GB, 14 GB min RAM, 24 GB recommended (production
canonical)
Modelfile.Q8_0: 16 GB, 20 GB min RAM, 32 GB recommended
Common shape: FROM ./../../.local-models/mdemg-llm-v1-gguf/...gguf relative
path (operator-machine local); num_ctx 32768, num_predict 4096, stop tokens
<|im_end|>/<|im_start|>; Apache-2.0 LICENSE; SYSTEM positioning block.
No TEMPLATE directive: chat template baked into GGUF metadata (Qwen3
chat_template.jinja preserved through mlx_lm.fuse --dequantize → convert_hf
→ llama-quantize pipeline).
packaging/ollama/README.md documents the publish workflow including the
fork-customization path (operators publishing under a different namespace
follow MDEMG_MODEL_NAMESPACE per the Configurability Contract).
Local ollama create completed for all 3:
reh3376/mdemg-llm-v1:Q4_K_M ID 5c3a7252c295
reh3376/mdemg-llm-v1:Q5_K_M ID 08c13b480864
reh3376/mdemg-llm-v1:Q8_0 ID 6b1006facd36
Layers de-duplicated: config + params + system layers (3 layers) are
identical across all 3 quants; only the model blob (GGUF) differs.
** ollama push deferred **: one-way action gated on operator confirmation
per Sprint Plan §10 Risk #8. Operator must claim reh3376 namespace on
ollama.com and generate API token before push proceeds. Local-create proves
the Modelfiles are well-formed; push is a separate decision. Once pushed,
manifest digests captured into quant_manifest.json (ollama_manifest_digest
field per quant) for mdemg model verify.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…interface
Sprint MODEL-DIST-001 Epic 4 — the bulk of the operator-facing surface.
New CLI subcommand group:
mdemg model pull # fetch + symlink + SHA verify
mdemg model list # show pulled models
mdemg model verify # re-check SHAs vs quant manifest
mdemg model remove # destructive (requires --yes)
mdemg model where # print resolved path for shell scripting
Pluggable backend (internal/cli/model_fetcher.go):
type Fetcher interface { Name, Fetch, Verify, Remove }
NewFetcher dispatches on cfg.ModelBackend (env: MDEMG_MODEL_BACKEND)
v1 ships OllamaFetcher only; future backends (hf, s3, github-release,
file) plug in via factory branch — CLI surface unchanged.
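A rough sketch of the factory dispatch described above, in Go. The commit names only the four methods (Name, Fetch, Verify, Remove); the method signatures, the `ollamaFetcher` stub, and taking the backend as a plain string instead of a config struct are all assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Fetcher mirrors the four method names from the commit message.
// The parameter and return types are assumed, not taken from the source.
type Fetcher interface {
	Name() string
	Fetch(ctx context.Context, quant string) (path string, err error)
	Verify(ctx context.Context, quant string) error
	Remove(ctx context.Context, quant string) error
}

// ollamaFetcher is a stub standing in for the real OllamaFetcher.
type ollamaFetcher struct{}

func (ollamaFetcher) Name() string { return "ollama" }
func (ollamaFetcher) Fetch(ctx context.Context, quant string) (string, error) {
	return "", fmt.Errorf("fetch %s: not implemented in this sketch", quant)
}
func (ollamaFetcher) Verify(context.Context, string) error { return nil }
func (ollamaFetcher) Remove(context.Context, string) error { return nil }

// NewFetcher dispatches on the backend name: case-insensitive, with an
// empty value falling back to the default backend.
func NewFetcher(backend string) (Fetcher, error) {
	switch strings.ToLower(strings.TrimSpace(backend)) {
	case "", "ollama":
		return ollamaFetcher{}, nil
	default:
		return nil, fmt.Errorf("unsupported model backend %q", backend)
	}
}

func main() {
	for _, b := range []string{"", "Ollama", "s3"} {
		f, err := NewFetcher(b)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", b, f.Name())
	}
}
```

A new backend would then be one more `case` branch plus a type that satisfies the interface, which is what keeps the CLI surface unchanged.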
OllamaFetcher (internal/cli/model_fetcher_ollama.go):
Encapsulates ALL Ollama-specific concepts: `ollama pull` invocation,
manifest path under <OLLAMA_MODELS>/manifests/<OLLAMA_HOST>/<ns>/<n>/<tag>,
mediaType=application/vnd.ollama.image.model layer filtering,
blob path under <OLLAMA_MODELS>/blobs/sha256-<digest>, symlink under
<MDEMG_MODEL_DIR>, idempotent.
Configurability Contract (no hardcoding; memory: feedback_no_hardcoded_values.md):
12 env vars + flag overrides, each with v1-production-tuned defaults so
`mdemg model pull` with no flags Just Works. See sprint plan §3.
Live-verified all 3 resolution paths:
`--quant Q5_K_M` → namespace=reh3376
`--namespace acme --name custom-model` → namespace=acme name=custom
`MDEMG_MODEL_NAMESPACE=acme env` → env overrides applied
Added to internal/config/config.go: ModelBackend, ModelNamespace,
ModelName, ModelQuants, ModelRamTiers, ModelQuant, AdapterBase,
ModelDir, OllamaModelsRoot, OllamaRegistryHost, ModelManifestPath.
Embedded quant manifest (internal/cli/quant_manifest.json via embed.FS):
Runtime source-of-truth for SHA verification. Operator override via
MDEMG_MODEL_MANIFEST_PATH for air-gapped deployments. Mirrors
docs/development/model-dist-001/quant_manifest.json.
RAM-tier auto-pick:
Default JSON `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}` maps
host RAM (sysctl on darwin, /proc/meminfo on linux) to quant. Operator
override via MDEMG_MODEL_RAM_TIERS.
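One way to interpret the tier JSON shown above is sketched here: keys of the form `<N` are treated as exclusive upper bounds in GB, checked in ascending order, with `default` as the fallback. The type and function names are hypothetical, and the boundary behavior (exactly 16 GB falls into the next tier) is an assumption derived from the `<` notation, not a confirmed detail of PickQuantForRAM.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

type ramTier struct {
	belowGB float64 // exclusive upper bound
	quant   string
}

type ramTiers struct {
	tiers    []ramTier // sorted ascending by belowGB
	fallback string
}

// parseRAMTiers parses JSON such as {"<16":"Q4_K_M","default":"Q8_0"}.
func parseRAMTiers(raw string) (ramTiers, error) {
	var m map[string]string
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		return ramTiers{}, fmt.Errorf("parse RAM tiers: %w", err)
	}
	var out ramTiers
	for key, quant := range m {
		if key == "default" {
			out.fallback = quant
			continue
		}
		if !strings.HasPrefix(key, "<") {
			return ramTiers{}, fmt.Errorf("unsupported tier key %q", key)
		}
		limit, err := strconv.ParseFloat(strings.TrimPrefix(key, "<"), 64)
		if err != nil {
			return ramTiers{}, fmt.Errorf("tier key %q: %w", key, err)
		}
		out.tiers = append(out.tiers, ramTier{belowGB: limit, quant: quant})
	}
	if out.fallback == "" {
		return ramTiers{}, fmt.Errorf("RAM tiers need a \"default\" entry")
	}
	sort.Slice(out.tiers, func(i, j int) bool {
		return out.tiers[i].belowGB < out.tiers[j].belowGB
	})
	return out, nil
}

// pick returns the quant of the first tier whose bound exceeds ramGB.
func (t ramTiers) pick(ramGB float64) string {
	for _, tier := range t.tiers {
		if ramGB < tier.belowGB {
			return tier.quant
		}
	}
	return t.fallback
}

func main() {
	tiers, err := parseRAMTiers(`{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, gb := range []float64{8, 16, 24, 64} {
		fmt.Printf("%v GB -> %s\n", gb, tiers.pick(gb))
	}
}
```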
Adapter path (--adapter flag) returns ErrAdapterDeferred per Epic 2's
contingency exit — adapter publication lands in MODEL-DIST-002. Flag
machinery preserved for forward compatibility.
Tests (22, all green) in internal/cli/model_test.go:
- Backend factory dispatch (5 cases incl. case-insensitive, default, error)
- Quant allowlist parsing (5 cases incl. whitespace + empty entries)
- RAM-tier JSON parsing (default + operator override + malformed)
- PickQuantForRAM (7 boundary cases)
- ResolveQuant across paths (auto, explicit, rejection, operator-custom)
- QuantManifest load (embedded + file override + missing-file error)
- Ollama tag composition (fused + adapter forms)
- Manifest path composition under custom OLLAMA_MODELS/OLLAMA_HOST
- Blob path digest prefix handling
- Adapter deferred error
- Manifest JSON parser (mediaType filtering + malformed + no-model-layer)
Grep audit (verification checklist):
grep on internal/cli/model*.go for hardcoded values found only in help
text Long/example strings documenting defaults to operators — not in
logic. Behavior values all flow through cfg.Model* fields.
Build + lint clean. Full cli test suite (61s wall) green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+ writer
Sprint MODEL-DIST-001 Epic 5: observability for `mdemg model` operations.
Grafana panels deferred to Sprint B (Grafana audit).
New migration: internal/tsdb/migrations/021_model_install_events.sql
Hypertable on recorded_at, 7-day chunks, 3 indexes (quant-time,
failed-events partial, backend-event-time). Columns: event_id CUIDv2 PK +
recorded_at, event_type (pull/verify/remove), backend_name, namespace,
model_name, quant, adapter bool, success bool, latency_ms, sha256,
size_bytes, err_message (1 KB cap).
New writer: internal/tsdb/model_install_writer.go
Synchronous single-row INSERT (not buffered + CopyFrom: the CLI is one-shot,
and writes are infrequent vs the V0017/V0018/V0019/V0020 retrieval-path
writers that fire per-request). Nil-pool no-op for degraded mode.
errMessageMaxLen=1024 truncation at write time. New modelInstallPool
interface (Exec-shaped) avoids touching the existing CopyFrom-shaped
poolIface used by buffered writers.
Wiring: internal/cli/model.go gets recordModelEvent(parent, cfg, row) helper:
- Returns immediately if !cfg.TSDBEnabled || cfg.TSDBHost==""
- 2s timeout on connect (TSDB unreachable doesn't block CLI exit)
- Logs warning + degrades gracefully on any TSDB error
Called from runModelPull (success + failure paths), runModelVerify (single
sweep row), runModelRemove (success + failure paths).
Schema version bump: internal/config/config.go:
TSDB_REQUIRED_SCHEMA_VERSION default 20→21. CI validator at
.github/workflows/ci.yml:60-65 counts SQL files in internal/tsdb/migrations/
and asserts equality; now 21 files = 21 in config = passes.
Build + lint clean. Existing tsdb / cli test suites green; no new tests
added for the writer itself (single INSERT mirrors V0017/V0018/V0019
patterns already covered; integration is operational verification at Epic 6
once tsdb is up in the dev stack).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
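The write-time cap on err_message could be implemented along these lines. The constant name matches the commit message; the helper name and the choice to cap by bytes while backing off to a UTF-8 rune boundary are assumptions, since the commit only states that a 1024 limit is applied.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const errMessageMaxLen = 1024

// truncateErrMessage caps s at max bytes. If the cut would land inside a
// multi-byte UTF-8 sequence, it backs off to the start of that rune so the
// stored value stays valid text.
func truncateErrMessage(s string, max int) string {
	if len(s) <= max {
		return s
	}
	cut := max
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	long := strings.Repeat("x", 2000)
	fmt.Println(len(truncateErrMessage(long, errMessageMaxLen)))
	fmt.Println(truncateErrMessage("pull failed: connection refused", errMessageMaxLen))
}
```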
Sprint MODEL-DIST-001 Epic 7 — operator-facing feature documentation
following the standard Why / Choices / How / How-to-use shape (memory:
feedback_per_feature_docs_required.md).
Contents:
- Why: gap between brew install and a working local LLM after Phase 13.5
- Choices: backend matrix (Ollama vs HF vs GitHub vs S3 vs file://),
artifact form (fused vs adapter), Apple Silicon scope, "Ollama runtime
rejected (broken on M5+macOS 26.3.x), Ollama distribution only"
- How it works: ASCII flow diagram covering CLI dispatch -> Fetcher
interface -> OllamaFetcher (preflight, ollama pull, manifest discovery,
blob resolve, symlink, SHA verify) -> V0021 observability row
- How to use:
* Quick start (3 commands: brew install ollama, mdemg model pull,
curl /v1/models)
* Explicit quant selection
* Managing pulled models (list / verify / where / remove)
* Forks + enterprise (MDEMG_MODEL_NAMESPACE override)
* Air-gapped (MDEMG_MODEL_MANIFEST_PATH override)
* Resource matrix per quant (disk, min RAM, recommended RAM, BPW)
* Full Configurability Contract table (11 env vars + flags + defaults)
* V0021 observability schema
- Troubleshooting: ollama missing, SHA mismatch, quant allowlist
rejection, RAM auto-detection failure, out-of-disk, symlink permission
- Forward-looking: MODEL-DIST-002 adapter, Sprint B Grafana panels,
future backends, cross-platform
- References: all source-of-truth files cross-linked
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint MODEL-DIST-001 Epic 8: final epic, never cut (memory:
feedback_sequential_epics.md). This commit lands the main-repo doc updates.
The packaging/homebrew-mdemg/ submodule docs (README, CHANGELOG, formula
caveats text) update at v0.10.0 release-tag time per the v0.9.0 release flow
precedent; that's when goreleaser auto-regenerates mdemg.rb from
.goreleaser.yaml's caveats template, and the tap-side README/CHANGELOG get
edited in lockstep.
Changes:
- CHANGELOG.md: comprehensive Unreleased entry documenting Epics 0-5 + 7
  landed in this sprint. Epic 3 ollama push and Epic 6 Tier 3 e2e marked as
  gated on operator confirmation. Adapter path explicitly deferred to
  MODEL-DIST-002 with epic_2_forensic.md cross-reference. Captures the
  Configurability Contract enumeration, the 3 quant SHAs, the Fetcher
  interface design, the V0021 hypertable, and the explicit out-of-scope list.
- CLAUDE.md: new "Model Distribution (Sprint MODEL-DIST-001)" subsection in
  Architecture Notes, slotted ABOVE the existing Compose embed entry for
  visibility. Captures the pluggable-backend design, the
  Ollama-as-distribution-only constraint, the on-disk symlink + manifest
  discovery flow, the 11-knob Configurability Contract surface, the
  no-hardcoding enforcement, the TSDB V0021 hookup, and the Apple Silicon v1
  scope.
- README.md: new "Step 2b (optional): Pull the local LLM" section between
  Step 2 (Initialize/Start) and Open the Dashboard. 3-command quick start
  (brew install ollama -> mdemg model pull -> set MDEMG_MODEL_PATH).
  Cross-references the feature doc for the full Configurability Contract.
- .goreleaser.yaml: caveats template updated to include `mdemg model pull`
  instructions. Goreleaser regenerates the homebrew formula's caveats block
  from this on the next v* tag push, so v0.10.0 will ship the new text to
  brew users automatically.
Deferred to v0.10.0 release-tag time (handled per v0.9.0 precedent):
- packaging/homebrew-mdemg/README.md update
- packaging/homebrew-mdemg/CHANGELOG.md update
- packaging/homebrew-mdemg/mdemg.rb regeneration (automatic via goreleaser
  from the .goreleaser.yaml change in this commit)
- Submodule pointer bump in main repo
Deferred to Epic 6 close (after operator does ollama push):
- post.md sprint-close document
- Capture of remote Ollama manifest digests into quant_manifest.json
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All 3 fused quants now live on Ollama Library:
https://ollama.com/reh3376/mdemg-llm-v1:Q4_K_M
https://ollama.com/reh3376/mdemg-llm-v1:Q5_K_M
https://ollama.com/reh3376/mdemg-llm-v1:Q8_0
End-to-end integrity verified: remote model-layer digests captured via
GET https://registry.ollama.ai/v2/reh3376/mdemg-llm-v1/manifests/<quant>
match the local Epic 1 SHAs exactly:
Q4_K_M 401161710c22f0ae...411d42ea (matches Epic 1)
Q5_K_M 144ad723101d688f...d5f5d54 (matches Epic 1)
Q8_0 fc14dcb40af1bb58...8db6089 (matches Epic 1)
Captured into quant_manifest.json (both docs canonical + internal/cli
embed.FS mirror, byte-synced):
- ollama_manifest_digest per quant (computed from the manifest body):
  Q4_K_M sha256:a210cccb12311773fd70bfa81f221ca0f7940a315bef87b84608caf894533b1b
  Q5_K_M sha256:ae6e54fe1ee0b487ae41260687ed14c46c30d1ffb0fece936282418b5bcb78e1
  Q8_0 sha256:93df4d64bfa751506f7afba8bf08b891ea828575b838adec17b9399ad85be718
- Corrected size_bytes (Epic 1 used approximate values; replaced with
  registry-reported exact bytes for each tag):
  Q4_K_M 9.0 GB -> 8.4 GB (9001753408 B; was 9658404096)
  Q5_K_M 11 GB -> 9.8 GB (10514569568 B; was 11811160064)
  Q8_0 16 GB -> 14.6 GB (15698534208 B; was 17179869184)
- Status flipped from "local-create done; push pending" to "published".
Embedded runtime manifest (internal/cli/quant_manifest.json) re-built into
the binary via embed.FS. TestLoadQuantManifest_EmbeddedFallback green with
new values.
Epic 3 of Sprint MODEL-DIST-001 now COMPLETE. Epic 6 (Tier 3 live e2e:
`mdemg model pull` against the published tags + llama-server load on port
18102 + sanity inference) is now unblocked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
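Computing a digest "from the manifest body", as the commit above describes, amounts to a SHA-256 over the raw bytes, formatted with a `sha256:` prefix. A minimal sketch follows; the function name is hypothetical, and fetching the body from the registry is left out because the exact request headers used are not stated in the commit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// manifestDigest hashes the raw manifest bytes exactly as received and
// returns the digest in "sha256:<hex>" form. Re-serializing the JSON
// before hashing would change the result, so the body must stay untouched.
func manifestDigest(body []byte) string {
	sum := sha256.Sum256(body)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder body; a real check would hash the registry response bytes.
	body := []byte(`{"schemaVersion":2}`)
	fmt.Println(manifestDigest(body))
}
```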
Sprint MODEL-DIST-001 close-out per memory rule
(feedback_sprint_plan_format.md §11 — sprint plans live in
docs/development/<sprint-line>/ with the standard post.md companion).
Sections (CLAUDE.md sprint-plan section guidance):
- Outcome: 3 quants live on Ollama Library, mdemg model pull is the
canonical install path
- Process: how the plan held under reality (operator-surfaced no-
hardcoding rule revised the plan in-place to add the Configurability
Contract before code was written)
- Findings: 5 smooth parts + 5 friction items, both honest:
* convert_hf_to_gguf.py python deps gap (silent ModuleNotFoundError)
* mlx_lm.fuse adapter-path requirement
* convert_lora_to_gguf.py missing from brew install llama.cpp
(proximate Epic 2 deferral trigger)
* mdemg tsdb migrate CWD-aware .env loader quirk
* Epic 1 size estimates off vs registry-reported exact bytes
- Current state: per-layer state matrix
- Testing & benchmarking: all 3 tiers documented (Tier 3 e2e captured
V0021 rows for both pull + verify event_types — live-verified)
- Risks & opportunities (forward): MODEL-DIST-002 adapter scope, Sprint
B Grafana, cross-platform, HFFetcher slot, CWD-aware .env loader QoL
- Sprint commits: 9 commits on dev01, mapped to their epics
Closes Sprint MODEL-DIST-001 functionally. Operational sprint close
(v0.10.0 release tag + tap-repo doc updates) is a separate motion.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…shed main)
PR #385 squash-merged the original Epic 3 quant_manifest values (estimated
sizes from llama-quantize wall output, null ollama_manifest_digest because
the push hadn't happened yet) into main as commit f1d029a. Meanwhile on
dev01, commit 87293f8 (Epic 3 closeout) corrected those values to the
registry-canonical state after the ollama push completed:
- size_bytes: replaced Epic 1 approximations with registry-reported exact
  bytes (Q4_K_M 9001753408 / Q5_K_M 10514569568 / Q8_0 15698534208)
- size_human: 9.0/11/16 GB -> 8.4/9.8/14.6 GB (more accurate)
- ollama_manifest_digest: null -> sha256:a210cccb...|ae6e54fe...|93df4d64...
- status: "local-create done; push pending" -> "published (...)"
Conflict resolution: keep dev01 (HEAD) values for both files; those are the
registry-canonical post-push state. JSON validity verified for both files;
TestLoadQuantManifest_{EmbeddedFallback,OperatorOverride,OverrideMissingFile}
all green against the resolved embedded manifest.
The non-conflicting fast-forwarded changes from main (claude workflow edits
+ dependabot go.mod/go.sum bumps) are folded in by this merge unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
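The size_human values quoted above line up with the byte counts when the unit is read as binary gigabytes (2^30 bytes), which the short check below reproduces. The helper name is hypothetical, and the binary-unit reading is an inference from the arithmetic, not something the commit states.

```go
package main

import "fmt"

// humanGiB formats a byte count as gigabytes of 2^30 bytes, one decimal.
func humanGiB(b int64) string {
	return fmt.Sprintf("%.1f", float64(b)/(1<<30))
}

func main() {
	sizes := []struct {
		quant string
		bytes int64
	}{
		{"Q4_K_M", 9001753408},
		{"Q5_K_M", 10514569568},
		{"Q8_0", 15698534208},
	}
	for _, s := range sizes {
		fmt.Printf("%s: %s GB\n", s.quant, humanGiB(s.bytes))
	}
}
```

The superseded values (9658404096, 11811160064, 17179869184) come out as 9.0, 11.0 and 16.0 under the same conversion, consistent with the earlier size_human entries.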
Promote the Sprint MODEL-DIST-001 entry from Unreleased to v0.10.0
(2026-05-11) ahead of release.yml / goreleaser tag push. Fresh empty
Unreleased section seeded above.
v0.10.0 ships:
- mdemg model pull|list|verify|remove|where: one-command path from brew
  install mdemg to a working local LLM
- Pluggable ModelFetcher interface (Ollama in v1, slots for HF/S3/GHR/file)
- 3 fused GGUF quants live on Ollama Library at reh3376/mdemg-llm-v1
  (:Q4_K_M 8.4 GB / :Q5_K_M 9.8 GB / :Q8_0 14.6 GB)
- 11-knob Configurability Contract (every operator-visible value dynamic)
- TSDB V0021 model_install_events hypertable + writer
- docs/features/local-model-distribution.md
Adapter (LoRA-only) path deferred to MODEL-DIST-002 per the sprint plan's
documented contingency (epic_2_forensic.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ce Model Distribution section
Stage 4 + Stage 5 of v0.10.0 release.
Submodule pointer bump:
packaging/homebrew-mdemg 6077097 -> c3aa68b
incorporates:
- 42d7390 — goreleaser auto-bumped mdemg.rb to version "0.10.0" + new
caveats text on v0.10.0 tag push
- c3aa68b — manual docs round-trip: CHANGELOG v0.10.0 entry,
README Optional Pull-the-local-LLM section in Quick Start (full
Ollama Library doc with quant matrix, list/verify/where/remove
subcommands, fork variants via MDEMG_MODEL_NAMESPACE, architecture
note "Ollama is distribution-only"), Upgrading to v0.10.0 +
What's New in v0.10.0 blocks, default-LLM rotation history extended,
mdemg_beta_testing.md version pin v0.9.0 -> v0.10.0
docs/user/cli-reference.md (per Stage 5 user request to align refs
with current codebase):
- New ## Model Distribution top-level section before ## Synergy
Optimization (model command group is GroupID="config" in root.go
but a top-level cli-ref section is cleaner for discoverability).
Documents all 5 subcommands (pull, list, verify, remove, where) with
flag tables, usage examples, the full Configurability Contract (11
knobs), the architecture note (Ollama is distribution-only).
- Updated Environment Variable Reference with new "Model Distribution
(Sprint MODEL-DIST-001, v0.10.0)" subsection — 11 env vars +
defaults table.
- Updated Command Tree Summary with the new model subcommand group
slotted between Configuration and Advanced.
docs/user/api-reference.md unchanged: Sprint MODEL-DIST-001 added zero
HTTP endpoints (CLI-only sprint; observability via TSDB V0021 row
writer is server-side internal). Audit also surfaced ~25 routes of
pre-existing drift between code and docs (mostly path-parameter
notation: `/v1/backup/` in code vs `/v1/backup/{id}` in docs — same
routes — plus 3 undocumented /api/graph/* endpoints and 2
undocumented /v1/admin/features/{restart,stop} actions). That drift
is out-of-scope for v0.10.0 and belongs in its own follow-up sprint.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
One-shot or interactive REPL chat against the configured LLM endpoint
(default: llama-server at port 8102 per Phase 13.5). Closes the gap
operators noted between `ollama run` and the mdemg framework.
Two modes:
- One-shot: `mdemg model run -p "hello"` or positional arg after `--`
- Interactive REPL: no prompt; reads stdin line-by-line, accumulates
  conversation history across turns
Pure stdlib HTTP (no llmclient retries/breakers/recording). CLI invocations
are intentionally NOT recorded to llm_interactions: this is an ad-hoc
exploration tool, not a production code path, and skipping the recording
keeps the training-data corpus clean.
Every operator-visible value is dynamic per the no-hardcoding rule:
--endpoint override cfg.EffectiveLLMEndpoint
--model override cfg.LLMModel (final fallback: mdemg-llm-v1)
--prompt/-p one-shot prompt (omit for REPL)
--system/-s system message
--temperature (default 0.7)
--max-tokens (default 1024)
--timeout (default 60s)
Live-verified end-to-end on the operator's running llama-server on port 8102
with mdemg-llm-v1: one-shot worked; system+prompt with --model override
worked.
13 unit tests in model_run_test.go covering: message composition (system
first, no-system skip, history preservation), config resolution (flag > cfg
> final fallback), OpenAI-compat HTTP shape, error paths (HTTP error, inline
error object, no choices, timeout), trailing-slash endpoint normalization,
body-bounding helper. All green.
Renamed local body-bounding helper to `truncateRunBody` to avoid name
collision with a same-named helper in internal/cli/data.go.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
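The message composition and endpoint normalization covered by those tests might be sketched as below. The type and function names are hypothetical; the request fields follow the widely used OpenAI-compatible chat-completions shape, and appending `/chat/completions` to an endpoint that already ends in `/v1` is an assumption about how the command builds its URL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	Temperature float64   `json:"temperature"`
	MaxTokens   int       `json:"max_tokens"`
}

// buildMessages puts the system message first (skipped when empty), then
// prior turns in order, then the new user prompt.
func buildMessages(system string, history []message, prompt string) []message {
	msgs := make([]message, 0, len(history)+2)
	if system != "" {
		msgs = append(msgs, message{Role: "system", Content: system})
	}
	msgs = append(msgs, history...)
	return append(msgs, message{Role: "user", Content: prompt})
}

// chatURL tolerates a trailing slash on the configured endpoint.
func chatURL(endpoint string) string {
	return strings.TrimRight(endpoint, "/") + "/chat/completions"
}

func main() {
	req := chatRequest{
		Model:       "mdemg-llm-v1",
		Messages:    buildMessages("Be brief.", nil, "hello"),
		Temperature: 0.7,
		MaxTokens:   1024,
	}
	body, err := json.Marshal(req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(chatURL("http://127.0.0.1:8102/v1/"))
	fmt.Println(string(body))
}
```

In REPL mode the caller would append each user prompt and assistant reply to `history` before the next turn, which is how the conversation accumulates.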
Audit of internal/api/server.go (167 routes) vs docs/user/api-reference.md
surfaced 19 genuinely missing endpoints. v0.10.0 commit noted this as
out-of-scope; this commit resolves the gap.
Audit method: extract mux.HandleFunc registrations from server.go, extract
documented "VERB /path" headings from api-reference.md, normalize both to
strip path parameters and trailing prefix slashes, diff. Of the initial
24-entry code-only set, 5 are false positives (combined headers like
"POST /v1/admin/features/start|stop|restart" cover the individual verbs;
"GET|POST /v1/jiminy/protocol/metrics" covers both methods on one route).
Added sections:
Jiminy / J17 (10 endpoints, all under "## Jiminy Inner-Voice"):
GET|POST /v1/jiminy/protocol/metrics # snapshot + reset
GET /v1/jiminy/protocol/status # per-session J17 state
POST /v1/jiminy/checkpoint # tier-transition checkpoint
POST /v1/jiminy/resume-protocol # restore from checkpoint
POST /v1/jiminy/extension # operator-driven tier hold
POST /v1/jiminy/strict # toggle strict mode per session
POST /v1/jiminy/reformulate # advisory -> imperative rewrite
POST /v1/jiminy/classify # pre-Write/Edit pass/deny gate
GET /v1/jiminy/latest # most recent guidance (warm store)
POST /v1/jiminy/warm # eager cache warmup
Memory / Graph (3 endpoints, under "## Memory Operations"):
GET /v1/memory/graph/topology # node/edge counts per layer
GET /v1/memory/graph/neighborhood # local 1-3 hop walk
GET /v1/memory/spaces # root listing of all spaces
Observability (2 endpoints, under "## Metrics & Monitoring"):
GET /v1/metrics/trends # TSDB time-series query
GET /v1/prometheus # Prometheus scrape endpoint
Dashboard / Viz (4 endpoints, new "## Dashboard / Visualization (internal)"
section before MCP Server Tools — operator-internal endpoints backing the
browser dashboard at /ui/):
GET /api/graph/data # force-directed graph data
GET /api/graph/fields # schema field catalog
GET /api/graph/health # explorer health
GET /viz/topology # standalone HTML topology view
Each entry has handler-signature-derived request/response shape, query
parameter table, sample curl/JSON examples following the existing
api-reference convention. TOC updated with new "Dashboard / Visualization
(internal)" entry and renumbered tail.
Out of scope (deliberate, deferred):
- 28 "docs-only" entries from the audit are confirmed false positives
from prefix-matching path normalization (code registers /v1/memory/nodes/
with trailing slash and routes the suffix; docs spell out the full
/v1/memory/nodes/{node_id}/archive form correctly)
- /v1/symbols root path is partially covered by /v1/symbols/relationships
+ /v1/symbols/{id}/relationships in docs; root listing endpoint
documentation can land later if/when its handler grows specific shape
- /v1/conversation/observations covered indirectly by the flag-for-org
endpoint documentation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 Epic 0. Builds the per-panel audit harness:
walks every panel in deploy/docker/grafana/dashboards/*.json, extracts
rawSql/sql targets, substitutes Grafana macros ($__timeFilter,
$__timeFrom/To, $__interval, $__unixEpoch*) + template variables
($space_id, $instance + multi-value variants like ${space_id:raw}),
executes via docker exec mdemg-timescaledb-1 psql, classifies each
panel target as PASS / EMPTY / FAIL / SKIP.
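The walk-and-classify flow described above can be sketched roughly as follows. This is an illustrative sketch, not the harness itself: `iter_panels`, `sql_targets`, `classify`, and the sample dashboard dict are hypothetical names, and the real script executes SQL through psql rather than taking an error string and a row count as inputs.

```python
from typing import Iterator, Optional


def iter_panels(panels: list) -> Iterator[dict]:
    """Yield every panel, descending into row panels that nest children."""
    for panel in panels:
        yield panel
        # Row panels can carry their children under their own "panels" key.
        yield from iter_panels(panel.get("panels", []))


def sql_targets(panel: dict) -> list:
    """Return the SQL text of each target that carries rawSql or sql."""
    found = []
    for target in panel.get("targets", []):
        sql = target.get("rawSql") or target.get("sql")
        if sql:
            found.append(sql)
    return found


def classify(sql: Optional[str], error: Optional[str], row_count: int) -> str:
    """Map one target execution onto the four audit verdicts."""
    if sql is None:
        return "SKIP"   # non-SQL panel type
    if error is not None:
        return "FAIL"   # SQL error
    if row_count == 0:
        return "EMPTY"  # executes, 0 rows
    return "PASS"       # executes, returns rows


# Hypothetical dashboard: one flat panel, one row panel with two children.
dashboard = {
    "panels": [
        {"title": "flat", "targets": [{"rawSql": "SELECT 1"}]},
        {"title": "row", "type": "row", "panels": [
            {"title": "nested", "targets": [{"sql": "SELECT 2"}]},
            {"title": "text-only", "targets": []},
        ]},
    ]
}

titles = [p["title"] for p in iter_panels(dashboard["panels"])]
all_sql = [s for p in iter_panels(dashboard["panels"]) for s in sql_targets(p)]
```

Keeping the verdict function free of I/O is what makes the Tier 1 style of test cheap: the panel walk and the classification can be checked without a database.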
Tier 1 unit tests (17 tests, all green):
- Template-variable substitution: time_filter / from-to / unix epoch /
interval / interval_ms / space_id (3 syntaxes) / instance (3
syntaxes) / multi-macro composite query
- Table extraction (FROM/JOIN with alias, case-insensitive, no-table)
- Panel walking (flat, nested rows, targets-with-sql vs no-sql)
Smoke test against mdemg-overview.json IMMEDIATELY validated the
operator's "diminished observability" report — 5 of 13 panels FAIL,
1 EMPTY, 7 PASS on the front-page dashboard:
FAIL Request Rate
FAIL Error Rate
FAIL Circuit Breakers
FAIL Requests by Status
FAIL Rate Limit Rejections
EMPTY Request Latency Distribution (t0; t1/t2 PASS)
The original 11-panel sample missed these because it sampled different
panels. Lesson: trust the rigorous audit, not the sample. Sprint
proceeds to Epic 1 (full audit across all 146 panels) immediately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 Epics 1 + 2. Per-panel rigorous audit of all
165 target executions across 146 panels in 8 dashboards.
Headline:
PASS 125 (76%) — executes, returns rows in 24h window
EMPTY 19 (12%) — executes, 0 rows
FAIL 3 (2%) — SQL error
SKIP 18 (11%) — non-SQL panel types
Harness fix mid-Epic-1: $__interval substitution was wrapping the
value in quotes, but Grafana convention has panel SQL provide its own
outer quotes, producing doubled quotes and 18 false-positive FAILs.
Fixed: substitute bare value. Verified by re-run: 20→3 FAILs.
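A minimal sketch of the quoting problem, assuming a panel query that already wraps the macro in its own quotes. The function name, the `samples` table, and the `ts` column are hypothetical; only the `$__interval` macro name comes from the commit.

```python
import re


def substitute_interval(sql: str, interval: str, quote: bool) -> str:
    """Replace $__interval with the interval value, optionally quoted."""
    value = f"'{interval}'" if quote else interval
    # The trailing \b keeps $__interval_ms from being rewritten by this rule.
    return re.sub(r"\$__interval\b", lambda _m: value, sql)


# Hypothetical panel SQL that supplies its own outer quotes.
panel_sql = "SELECT time_bucket('$__interval', ts) AS bucket FROM samples"

quoted = substitute_interval(panel_sql, "1m", quote=True)   # doubled quotes
bare = substitute_interval(panel_sql, "1m", quote=False)    # valid literal
```

With `quote=True` the rendered SQL contains `''1m''`, which is the doubled-quote shape behind the false-positive FAILs; the bare substitution leaves a single well-formed literal.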
Real failures (Epic 2 findings):
(a) 3 SQL bugs on mdemg-llm-routing.json: all three panels hardcoded
`mdemg-dev` (unquoted) in WHERE clauses instead of '$space_id'
template variable. PG parses `mdemg-dev` as subtraction.
(b) 5 schema-drift EMPTYs — panel filter expects metric_type or labels
shape that doesn't match server emission:
- mdemg_j17_events_total: panel 'counter', server 'gauge'
- mdemg_rsic_action_total: panel status='success', server status='completed'
- 2 more suspected pending full-SQL inspection.
(c) 2 missing-server-side metrics — mdemg_rate_limit_rejected_total
and mdemg_http_request_duration_seconds_p50 not emitted. Will be
documented; server emission is follow-up.
(d) ~11 sparse-data EMPTYs — panel SQL correct, no rows in 24h window.
Widening time-range in Epic 4.
Projected post-Epic-3/4: 133 PASS, ≤11 EMPTY, 0 FAIL.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 Epic 3. Minimum-change JSON edits to fix
category (a) SQL bugs and category (b) schema-drift EMPTYs identified
in Epic 1/2.
mdemg-llm-routing.json (3 panels, all category-a SQL bugs):
- LLM call distribution by model_name (24h)
- LLM latency p50 / p95 / p99 by task × model
- LLM error rate % by task_name (selected range)
Bug: WHERE clause was `($space_id = '' OR space_id = '$space_id')`.
The first $space_id was unquoted, so after substitution PG parsed
`mdemg-dev = ''` as an expression over column identifiers (`mdemg`
minus `dev`) rather than a string literal, and those columns don't
exist. Also breached the no-hardcoding rule (memory:
feedback_no_hardcoded_values.md).
Fix: wrap the first variable reference in quotes → `('$space_id' =
'' OR space_id = '$space_id')`, a proper string-literal comparison
that also serves as the All-spaces guard the panel author intended.
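To make the before/after concrete, here is a small sketch that renders both WHERE clauses with plain string substitution. `render` is a hypothetical stand-in for Grafana's variable interpolation, which is more elaborate than a single `str.replace`.

```python
def render(where_template: str, space_id: str) -> str:
    """Naive template-variable interpolation, for illustration only."""
    return where_template.replace("$space_id", space_id)


buggy = "($space_id = '' OR space_id = '$space_id')"
fixed = "('$space_id' = '' OR space_id = '$space_id')"

buggy_sql = render(buggy, "mdemg-dev")   # bare mdemg-dev: not a string literal
fixed_sql = render(fixed, "mdemg-dev")   # both sides are string literals
all_spaces = render(fixed, "")           # '' = '' is always true
```

The last line shows why the quoted form doubles as the All-spaces guard: with an empty variable the first comparison is trivially true, so the space filter drops out.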
Verdict: 3 FAIL -> 3 PASS. mdemg-llm-routing is now 4/4 PASS.
mdemg-j17.json :: Total Events (1 panel, category-b drift):
Panel filtered `metric_type = 'counter'` (Prometheus naming
convention because metric is `mdemg_j17_events_total`). Server
actually emits `metric_type = 'gauge'`. 6,393 rows in 7d; 0 panel
matches. Fix: align panel filter to `'gauge'`.
Verdict: EMPTY -> PASS.
mdemg-rsic.json :: Action Success Rate t0 (1 panel target, category-b
drift):
Panel filtered `labels->>'status' = 'success'`. Server actually
emits `'completed'` (181 rows in 24h; 0 panel matches). Fix: align
panel filter to `'completed'`. The t1 'failed' target retained
unchanged — its EMPTY result is now accurate observation (server
emits no `'failed'` actions; 0 = legitimate zero).
Verdict: 1/2 EMPTY -> PASS, 1/2 EMPTY accurate-zero.
Audit verdict counts:
Before: 125 PASS, 19 EMPTY, 3 FAIL, 18 SKIP
After: 130 PASS, 17 EMPTY, 0 FAIL, 18 SKIP
Remaining 17 EMPTYs (Epic 4 disposition):
- 5 category-c emission regression — 4 rsic metrics stopped at
2026-05-07/08 (server-side investigation queued as follow-up)
- 2 category-c never-emitted — Rate Limit Rejections, p50 latency
- 8 category-d sparse-data on ft-training — widen time-range
- 1 mdemg-jiminy :: Effectiveness Trends — CTE pending inspection
- 1 mdemg-rsic :: Action Success Rate t1 (accurate-zero)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 closeout (Epics 4 + 5 + 6 + 7 combined as a
single doc-only commit; Epic 5 deferred and Epic 6 deferred-to-operator
as documented in post.md).
New: docs/features/observability-dashboards.md (286 lines), the full
operator-facing inventory of the 8 dashboards with:
- Per-dashboard purpose + panel count + primary tables
- Audit verdict table (130/17/0/18 post-Epic-3)
- Epic 3 fix log: 3 SQL bugs + 2 schema-drift filters
- Known gaps in 3 buckets: (c) emission regression (4 May-7-8 metrics,
current codebase has zero refs; server removed emission), (c)
never-emitted (mdemg_rate_limit_rejected_total +
mdemg_http_request_duration_seconds_p50), (d) sparse/zero data on this
dev TSDB (ft-training tables)
- Refresh expectations per table
- Operator playbook for re-running scripts/grafana_panel_audit.py
- Forward-looking: CI integration, coverage expansion, server-side
emission restore
New: docs/development/grafana-audit-001/post.md: sprint close per
memory rule, covers process / smooth-parts / friction / sprint-plan vs
reality / current state / risks-opportunities / commits.
Epic deferrals (documented in post.md):
- Epic 5 (coverage expansion for 11 unused TSDB tables): deferred
because most target tables are zero on this dev TSDB. Adding panels
would create more EMPTYs, defeating the goal.
- Epic 6 (Tier 3 browser e2e): deferred to operator; not blocking.
CHANGELOG Unreleased entry covers the sprint at high level +
cross-references the feature doc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint MODEL-DIST-002 picks up the adapter-only path deferred from
MODEL-DIST-001 Epic 2. Resolves the tooling gap documented in
epic_2_forensic.md.
Workspace prep:
- Vendored convert_lora_to_gguf.py from llama.cpp source (master,
pinned 2026-05-21) into scripts/vendor/llama_cpp/ with MIT LICENSE
attribution and a README documenting refresh policy. brew install
llama.cpp ships convert_hf_to_gguf.py but NOT convert_lora_to_gguf.py;
vendoring is the cleanest path (vs requiring operators to clone
llama.cpp source).
- pip install peft==0.19.1 + accelerate==1.13.0 + psutil==7.2.2 into
neural/.venv (the same venv that has torch + transformers + gguf from
MODEL-DIST-001 Epic 1). PEFT is needed for PEFT-format schema
validation + as a dependency of convert_lora_to_gguf.py.
- Inspected convert_lora_to_gguf.py: expects directory with
adapter_config.json + adapter_model.safetensors in PEFT layout.
Confirms the MLX → PEFT direction is `lora_A: (rank, input)` and
`lora_B: (output, rank)` (script line 41-42 docstring).
Sprint plan in 12-section v1.0 format. 7 epics, 1-2 dev-day estimate.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e verify
Sprint MODEL-DIST-002 Epics 1, 2, 3 (combined commit).
Epic 1 — MLX → PEFT converter (scripts/mlx_adapter_to_peft.py + 14 unit tests):
Reads adapters/tier1/adapters.safetensors (514 MB MLX format, 560 tensors,
Phase 5 SFT Iter 2400 best). Per the analysis in MODEL-DIST-001
epic_2_forensic.md:
Key rename: model.layers.<N>.<module>.lora_a
-> base_model.model.model.layers.<N>.<module>.lora_A.weight
Tensor transpose: lora_a (input,rank) -> (rank,input)
lora_b (rank,output) -> (output,rank)
Emits PEFT-format adapter_config.json + adapter_model.safetensors.
Single-adapter PEFT layout (.lora_A.weight, not .lora_A.default.weight)
required by convert_lora_to_gguf.py.
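The key rename and transpose rules above can be sketched as below. This is a dependency-free illustration under stated assumptions, not the converter: `mlx_key_to_peft` and `transpose` are hypothetical names, tensors are modelled as nested lists instead of safetensors arrays, and `self_attn.q_proj` is only an example module path.

```python
import re

# Matches model.layers.<N>.<module>.lora_a / lora_b as described in the commit.
_MLX_KEY = re.compile(r"^model\.layers\.(\d+)\.(.+)\.lora_(a|b)$")


def mlx_key_to_peft(key: str) -> str:
    """Rename an MLX LoRA tensor key to the single-adapter PEFT layout."""
    match = _MLX_KEY.match(key)
    if match is None:
        raise ValueError(f"not an MLX LoRA key: {key}")
    layer, module, which = match.groups()
    return (
        f"base_model.model.model.layers.{layer}.{module}"
        f".lora_{which.upper()}.weight"
    )


def transpose(matrix: list) -> list:
    """Swap the two axes of a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


# lora_a stored as (input=3, rank=2) becomes lora_A with shape (rank=2, input=3).
lora_a = [[1, 2], [3, 4], [5, 6]]
lora_A = transpose(lora_a)
peft_key = mlx_key_to_peft("model.layers.3.self_attn.q_proj.lora_a")
```

The same `transpose` applies to lora_b, turning (rank, output) into (output, rank).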
Epic 2 — PEFT → GGUF LoRA (scripts/vendor/llama_cpp/convert_lora_to_gguf.py):
Pinned to llama.cpp release b9000 (self-contained version; upstream master
refactored to a conversion/ Python package with 30+ model files, excessive
vendoring scope). README documents refresh policy.
Output: .local-models/mdemg-llm-v1-adapter.gguf
SHA256: 0cfaf4bae3215a4aea664a8d28ae9a41d73ee740cbcce5c2eef950232cfe1de5
Size: 257 MB (vs ~9 GB fused Q5_K_M; ~35x smaller download)
Tensor count: 560 (matches expected 40 layers x 7 target_modules x 2)
Epic 3 — Live verification (docs/development/model-dist-002/verification.md):
Side-port llama-server on 127.0.0.1:18103 with f16 base + adapter; sanity
prompt vs production 8102 fused model returns semantically-aligned outputs
on the same prompt — both describe MDEMG as a knowledge-graph memory
system. Confirms the MLX-PEFT-GGUF chain is structurally correct.
Iteration during Epic 2 (worth noting):
- Initial vendored convert_lora_to_gguf.py from upstream master failed
with ImportError (refactored to use conversion/ package). Pinned to
b9000 release which is self-contained.
- Initial PEFT keys used .default.weight suffix (multi-adapter layout);
convert_lora_to_gguf.py rejected with "Not a lora_A or lora_B tensor."
Switched to single-adapter layout (.weight) which the script accepts.
Test results: 14/14 Tier 1 tests green; PEFT output loads via
peft.PeftConfig.from_pretrained; GGUF emission completes with all 560
tensors; runtime adapter application produces coherent outputs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ENTGRAPH-001 Epic 3)
ApplyCoactivation Cypher RETURN clause extended from "count(*) AS
updated" to 17 per-pair columns: src/dst node IDs, prev/new/delta
weight, evidence_count_after, eta_effective (cfg.LearningEta ×
etaMult), surprise_factor, activation_product, path_sim, role_a/b,
obs_type_a/b, session_id, direction (forward/reverse/bidirectional),
created_new_edge.
created_new_edge derived from (r.evidence_count = 1): the ON CREATE
branch sets evidence_count to 1; ON MATCH increments. Reliable proxy
for "new connection formed" vs "existing connection strengthened" at
analysis time.
Plan-deviation disclosure (per feedback_plan_options_pattern.md): the
plan called for 2 rows per pair in asymmetric mode (forward + reverse).
The Cypher mirrors rr.weight = r.weight at all times, so forward and
reverse edges carry identical weights. Emitting 2 rows would
double-count without adding signal. Final choice: 1 row per logical
pair regardless of mode, with the direction column carrying the
forward/reverse/bidirectional distinction. Revisit if EVENTGRAPH-003
introduces a Hebbian path where forward/reverse weights diverge.
New helper internal/learning/reinforcement_parser.go translates a
neo4j.Record (or any (key) → (any, bool) getter) into a
tsdb.ReinforcementEventRow. Lives in its own file so service.go doesn't
grow. Defensive against missing keys (zero values), nil values
(zero/empty), wrong types (fallback to zero); no panics.
Tier 1 unit tests (6 green) cover:
- Symmetric bidirectional + ON CREATE branch
- Asymmetric forward + ON MATCH branch (evidence > 1)
- Missing optional fields → zero values (nullable* writer helpers
serialize as DB NULL)
- Neo4j int64 → Go int coercion
- nil values → zero/empty
- Wrong-typed values → graceful fallback
Reinforcement rows are captured locally in ApplyCoactivation but not
yet forwarded to TSDB; Epic 4 wires the writer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
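The defensive-parsing behaviour of the Epic 3 helper (missing key, nil value, wrong type all collapse to a zero value) can be illustrated with a getter-based sketch. The real helper is Go; everything here is a hypothetical Python analogue, and `delta_weight` is an assumed column key (the commit only names the column group "prev/new/delta weight").

```python
from typing import Any, Callable, Tuple

# A getter returns (value, found), mirroring the (key) -> (any, bool) shape.
Getter = Callable[[str], Tuple[Any, bool]]


def as_int(get: Getter, key: str) -> int:
    value, ok = get(key)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not ok or isinstance(value, bool) or not isinstance(value, int):
        return 0
    return value


def as_float(get: Getter, key: str) -> float:
    value, ok = get(key)
    if not ok or isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    return float(value)


def as_str(get: Getter, key: str) -> str:
    value, ok = get(key)
    return value if ok and isinstance(value, str) else ""


def parse_row(get: Getter) -> dict:
    """Build a row from a record without ever raising on bad input."""
    return {
        "direction": as_str(get, "direction"),
        "delta_weight": as_float(get, "delta_weight"),  # assumed key name
        "evidence_count_after": as_int(get, "evidence_count_after"),
    }


def dict_getter(record: dict) -> Getter:
    """Adapt a plain dict to the getter shape."""
    return lambda key: (record.get(key), key in record)


good = parse_row(dict_getter(
    {"direction": "forward", "delta_weight": 0.25, "evidence_count_after": 3}))
missing = parse_row(dict_getter({}))
malformed = parse_row(dict_getter(
    {"direction": None, "delta_weight": "x", "evidence_count_after": "7"}))
```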
…pic 4)
learning.Service grows a reinforcementWriter field +
SetReinforcementWriter setter (mirrors the SetStabilityReinforcer
back-compat pattern). After ExecuteWrite returns from
ApplyCoactivation, each captured per-pair row gets the spaceID stamped
on it and is enqueued via writer.Record. The writer is non-blocking;
the Hebbian hot path never waits on TSDB.
Configurability Contract: 7 new env vars (no-hardcoding rule):
- EVENTGRAPH_ENABLED (bool, default true)
- EVENTGRAPH_WRITER_FLUSH_INTERVAL_SEC (int, default 30, floor 5)
- EVENTGRAPH_WRITER_BUFFER_SIZE (int, default 1000, 0 = unlimited)
- EVENTGRAPH_MAX_PAIRS_PER_EVENT_BATCH (int, default 200)
- EVENTGRAPH_MAX_EVENTS_PER_QUERY (int, default 500, Epic 5 ceiling)
- EVENTGRAPH_FEDERATION_DEFAULT_HOPS (int, default 2)
- EVENTGRAPH_FEDERATION_DEFAULT_LOOKBACK_HOURS (int, default 24)
api/server.go wires the writer's lifecycle:
- Constructed after TSDB client is ready, gated by
cfg.EventGraphEnabled so EVENTGRAPH_ENABLED=false cleanly skips
construction; learner's reinforcementWriter stays nil and the Hebbian
path short-circuits.
- Closed alongside the other TSDB writers in graceful-shutdown; the
buffer drains before the process exits.
Tier 2 integration tests (against real TSDB, build tag integration):
- TestEventGraph_Writer_RoundTrip: 3 rows recorded → flush-window
elapses → SELECT count(*) returns 3.
- TestEventGraph_Writer_DrainOnClose: 5 rows recorded with 1-hour
flush interval → Close() drains → SELECT returns 5 (verifies the
server shutdown invariant).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
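A rough sketch of the two ideas in the Epic 4 commit: env-driven knobs with a floor, and a non-blocking buffered writer that drains on close. All names are hypothetical; clamping to the floor, falling back to the default on unparsable input, and evicting the oldest row when the buffer is full are assumptions about behaviour rather than a description of the Go implementation.

```python
from collections import deque
from typing import Callable, Mapping, Optional


def env_int(env: Mapping[str, str], name: str, default: int,
            floor: Optional[int] = None) -> int:
    """Read an int knob; fall back to default and clamp to floor (assumed)."""
    try:
        value = int(env[name]) if name in env else default
    except ValueError:
        value = default
    if floor is not None and value < floor:
        value = floor
    return value


class BufferedWriter:
    """In-memory buffer that hands rows to a sink on flush or close."""

    def __init__(self, sink: Callable[[list], None], buffer_size: int) -> None:
        self._sink = sink
        self._cap = buffer_size  # 0 means unlimited
        self._buf: deque = deque()
        self.enqueued = 0
        self.dropped = 0

    def record(self, row: dict) -> None:
        # Never blocks: when full, the oldest row is evicted (assumed FIFO).
        if self._cap > 0 and len(self._buf) >= self._cap:
            self._buf.popleft()
            self.dropped += 1
        self._buf.append(row)

    def flush(self) -> None:
        rows = list(self._buf)
        self._buf.clear()
        if rows:
            self._sink(rows)
            self.enqueued += len(rows)

    def close(self) -> None:
        # Drain whatever is still buffered before shutdown.
        self.flush()


flush_sec = env_int({"EVENTGRAPH_WRITER_FLUSH_INTERVAL_SEC": "1"},
                    "EVENTGRAPH_WRITER_FLUSH_INTERVAL_SEC", 30, floor=5)
written: list = []
writer = BufferedWriter(written.extend, buffer_size=2)
for i in range(3):
    writer.record({"id": i})
writer.close()
```

In this toy run the floor lifts the 1-second setting to 5, and the two-slot buffer drops the oldest of three rows before `close()` drains the remaining two.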
…001 Epic 5)
internal/eventgraph/query.go — Pattern Y1 federation helper.
EventsInGraphNeighborhood orchestrates a three-step pipeline:
1. Cypher graph walk from a seed node — variable-length path over
CO_ACTIVATED_WITH | GENERALIZES at depth 0..Hops. Returns the
N-hop neighborhood (DISTINCT node_ids, includes the seed).
2. TSDB query against reinforcement_events for events where src OR
dst is in the neighborhood, within the lookback window, ordered
newest-first, capped at the configured limit.
3. Go-side join — annotates events with SrcInNeighborhood /
DstInNeighborhood so the consumer can distinguish "both endpoints
in the subgraph" from "one endpoint outside the seed's N-hop
reach but the event still touches our subgraph."
Empty neighborhood (no seed match, hops=0) short-circuits before the
TSDB call. Sub-1-second Since values clamp to 1s. Hops < 0 is rejected
upfront. The handler enforces an additional ceiling of 2 ×
EVENTGRAPH_FEDERATION_DEFAULT_HOPS for runaway-walk protection.
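The federation steps can be mimicked in memory as a minimal sketch. Assumptions are loud here: the graph is a plain adjacency dict treated as undirected, the TSDB filter and the Go-side annotation are folded into one function, and all names (`neighborhood`, `annotate`, the node labels, the three events) are hypothetical.

```python
from collections import deque


def neighborhood(graph: dict, seed: str, hops: int) -> set:
    """Breadth-first walk at depth 0..hops; includes the seed itself."""
    if hops < 0:
        raise ValueError("hops must be >= 0")
    if seed not in graph:
        return set()  # no seed match: empty neighborhood
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def annotate(events: list, hood: set) -> list:
    """Keep events touching the neighborhood and flag each endpoint."""
    if not hood:
        return []  # short-circuit before any event lookup
    kept = []
    for event in events:
        src_in = event["src"] in hood
        dst_in = event["dst"] in hood
        if src_in or dst_in:
            kept.append({**event,
                         "src_in_neighborhood": src_in,
                         "dst_in_neighborhood": dst_in})
    return kept


# seed--mid--leaf plus an unconnected node, echoing the Tier 2 test shape.
graph = {"seed": {"mid"}, "mid": {"seed", "leaf"}, "leaf": {"mid"}, "off": set()}
events = [
    {"src": "seed", "dst": "mid"},
    {"src": "mid", "dst": "leaf"},
    {"src": "leaf", "dst": "off"},
]

at_zero = annotate(events, neighborhood(graph, "seed", 0))
at_one = annotate(events, neighborhood(graph, "seed", 1))
```

At hops=0 only the seed↔mid event survives, with one endpoint flagged outside; at hops=1 the mid↔leaf event joins it.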
internal/api/eventgraph_handler.go — POST /v1/eventgraph/reinforcement-
neighborhood. Same auth convention as /v1/admin/breakers. 503 when
EVENTGRAPH_ENABLED=false or when eventgraphService is nil (TSDB-down at
boot). 400 on missing space_id / seed_node_id / negative hops / hops >
ceiling. Defaults applied from config when fields omitted from request.
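The status-code rules can be summarized as a small decision function. This is a sketch under assumptions: the check order (availability before field validation) is inferred, `hops` is an assumed JSON field name, and `check_request` is hypothetical; only `space_id`, `seed_node_id`, and the 2 × default-hops ceiling come from the commit.

```python
def check_request(req: dict, enabled: bool, service_ready: bool,
                  default_hops: int = 2) -> int:
    """Return the HTTP status the handler would answer with."""
    # Feature flag off or service not constructed: unavailable.
    if not enabled or not service_ready:
        return 503
    # Required identifiers must be present and non-empty.
    if not req.get("space_id") or not req.get("seed_node_id"):
        return 400
    # Omitted hops falls back to the configured default.
    hops = req.get("hops", default_hops)
    if hops < 0 or hops > 2 * default_hops:
        return 400
    return 200


base = {"space_id": "mdemg-dev", "seed_node_id": "n1"}
statuses = {
    "disabled": check_request(base, enabled=False, service_ready=True),
    "nil_service": check_request(base, enabled=True, service_ready=False),
    "defaults": check_request(base, enabled=True, service_ready=True),
    "at_ceiling": check_request({**base, "hops": 4}, True, True),
    "over_ceiling": check_request({**base, "hops": 5}, True, True),
    "negative": check_request({**base, "hops": -1}, True, True),
    "no_seed": check_request({"space_id": "mdemg-dev"}, True, True),
}
```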
Plan-decision disclosure (per feedback_plan_options_pattern.md): plan
proposed Option A (single endpoint with event_type query param) vs
Option B (endpoint per event class). Final choice: A. v1 has one event
class (reinforcement); the endpoint URL is explicit about that.
EVENTGRAPH-002 can either add a query param or split the URL when a
second event class arrives — no breaking change either way.
Tests:
- Tier 1 (internal/eventgraph/query_test.go, 7 green): request
validation rejects empty space_id, empty seed, negative hops; interval
formatting roundtrips; join annotation handles both-inside,
one-outside, and empty-neighborhood cases.
- Tier 1 (internal/api/eventgraph_handler_test.go, 4 green + 2 skipped):
method-not-allowed, feature-disabled 503, nil-service 503, invalid-
JSON short-circuit. Two validation paths skipped — they require a
non-nil eventgraphService which can't be constructed without a real
driver; Tier 2 exercises them.
- Tier 2 (tests/integration/eventgraph_federation_test.go, 1 green):
builds seed--mid--leaf graph + off-node, emits 3 reinforcement
events touching all four nodes, calls federation at hops=0 and
hops=1, asserts neighborhood + in-neighborhood flags. The hops=0
test confirms that mid↔leaf (touching neither seed nor any 0-hop
neighbor) is correctly excluded.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ement events (EVENTGRAPH-001 Epic 6)
Three new Prometheus counters mirror the V0022 writer's internal
atomic counters:
- mdemg_eventgraph_writer_rows_enqueued_total: rows successfully
CopyFrom'd
- mdemg_eventgraph_writer_rows_dropped_total: rows FIFO-evicted
(buffer full)
- mdemg_eventgraph_writer_flush_failure_total: flush errors
Wiring: the writer accepts a narrow PrometheusCounter interface
(Add(int64)) so internal/tsdb does not import internal/metrics (which
would cycle). api/server.go calls SetPrometheusCounters after the
writer is constructed, passing the three counters from the global
StandardMetrics struct. Nil-safe.
Dashboard: mdemg-graph-topology.json gains a new collapsed row
"Reinforcement Events (EVENTGRAPH-001)" with a single time-series panel
"Reinforcement Event Rate (events/min)" showing all three rates
(enqueued / dropped / flush failures) over the last 24h. Dropped is
colored orange, flush failures red, enqueued the default palette. Tied
to the prometheus datasource.
The existing GRAFANA-AUDIT-001 harness (scripts/grafana_panel_audit.py)
only evaluates SQL-target panels. The new panel uses Prometheus
queries, so it lands on the SKIP pile, same as the other 8 Cypher /
Prometheus panels on this dashboard. Audit JSON refreshed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Epic 6's targeted audit run (scripts/grafana_panel_audit.py --dashboard mdemg-graph-topology.json) overwrote the full multi-dashboard audit results from GRAFANA-AUDIT-001 with the single-dashboard subset (9 SKIPs only). Restoring the full snapshot from commit 0a1e8e1 — that audit covered all 8 dashboards and is the canonical baseline the GRAFANA-AUDIT-001 post.md references. EVENTGRAPH-001 did not need to regenerate it; the new panel uses Prometheus queries, which the audit harness SKIPs regardless of dashboard. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…fix-commit)
ScoreAndRankRRF's ConsensusResult → RetrieveResult conversion was
silently dropping the Activation field. The legacy ScoreAndRank path at
scoring.go:883 sets Activation: a (where a := act[c.NodeID] is the
spreading-activation map value). The RRF path constructed
models.RetrieveResult{...} with no Activation key, leaving the field
zero-valued.
Net effect: since Phase 13.1 default-on (2026-05-03),
learning.Service.ApplyCoactivation has filtered out every L0 candidate
on the retrieve hot path. The filter is r.Activation >=
LearningMinActivation (default 0.20). With Activation=0, no pair makes
it to the Hebbian Cypher; the function returns nil without writing.
Hebbian learning has been a silent no-op on the production retrieve
goroutine for ~24 days. CO_ACTIVATED_WITH edges still exist in the
graph only because sidecar paths (CoactivateSession,
ApplySymbolCoactivation, consolidation walks) and pre-Phase-13.1
retrieves wrote them.
Discovered during EVENTGRAPH-001 Epic 7 live e2e. Three retrieves
produced 0 rows in reinforcement_events. Investigation traced the gap
to the missing Activation field.
Fix: one-line addition in scoring_rrf.go — Activation: act[c.NodeID].
Brings the RRF path to parity with the legacy scorer.
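The failure mode is easy to reproduce in miniature: a result type whose activation defaults to zero, and a downstream filter with a 0.20 floor. The sketch below is a hypothetical Python analogue of the Go structs; the activation values are made up for illustration.

```python
from dataclasses import dataclass

LEARNING_MIN_ACTIVATION = 0.20  # default quoted in the commit


@dataclass
class RetrieveResult:
    node_id: str
    score: float
    activation: float = 0.0  # zero value when the constructor omits it


def eligible(results: list, floor: float = LEARNING_MIN_ACTIVATION) -> list:
    """Results that survive the activation filter ahead of the Hebbian write."""
    return [r.node_id for r in results if r.activation >= floor]


# Hypothetical spreading-activation map.
act = {"a": 0.72, "b": 0.35, "c": 0.10}

# Before the fix: activation is never copied across, so it stays 0.0.
before = [RetrieveResult(node, 0.5) for node in act]
# After the fix: the map value is propagated into the result.
after = [RetrieveResult(node, 0.5, activation=act[node]) for node in act]

survivors_before = eligible(before)
survivors_after = eligible(after)
```

Nothing errors in the "before" case, which is why the gap stayed invisible: the filter simply returns an empty list and the write is skipped.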
Post-fix verification: rebuilt, restarted server, re-issued 3 retrieves
→ 10 reinforcement events landed in TSDB. Federation API at hops=1
correctly returned all 10 with src_in_neighborhood=true,
dst_in_neighborhood=true. Documented in
docs/development/eventgraph-001/verification.md.
Per CLAUDE.md "Testing — Live System Testing Is Required":
"surprise bugs caught during live smoke get their own follow-up
fix-commit — do not silently roll them into the sprint commit." This
is the precedent-aligned separate commit.
Forward-only: existing graph state is preserved; new retrieves now
correctly emit Hebbian updates. EVENTGRAPH-002 may revisit whether to
backfill the missing 24-day window.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Real /v1/memory/retrieve × 3 against mdemg-dev → 10 reinforcement events landed in TSDB within the flush window. Federation API at hops=1 from a seed node returned 5-node neighborhood + 10 in-neighborhood events. Documents the surprise-bug discovery + fix that preceded this transcript (see fix-commit for scoring_rrf.go::ScoreAndRankRRF Activation propagation). Acceptance criteria from sprint plan §"Acceptance Criteria" all PASS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ose (Epic 8)
Final epic: Documentation Update (never cut, per
feedback_per_feature_docs_required.md and the standardized v1.0 sprint
plan format).
New: docs/features/event-graph-federation.md (~240 lines, Why / Choices
/ How it works / How to use / Forward-looking). Documents:
- Pattern Y1 vs Y2 trade-off (why federation-in-Go now, link-node
reification deferred until a query forces it)
- Why V0019 buffered-CopyFrom over V0021 sync-INSERT (per-retrieve
volume)
- Why ApplyCoactivation first (other 3 Hebbian entry points deferred to
EVENTGRAPH-003)
- Why forward-only (no source to backfill from)
- Federation pipeline (Cypher walk → TSDB query → Go-side join with
src/dst_in_neighborhood annotation)
- TSDB schema, API request/response shape, 7 env vars + defaults
- Observability (3 Prometheus counters + Grafana panel)
- Forward-looking sprints
New: docs/development/eventgraph-001/post.md: epic-by-epic outcomes,
acceptance criteria check-off, surprise log (RRF Activation drop +
audit-JSON overwrite + orphan-process port collision), plan deviations
disclosed (1-row-per-pair regardless of asymmetric mode;
single-endpoint over endpoint-per-class), forward-looking.
CHANGELOG.md Unreleased gains the EVENTGRAPH-001 entry: 11 bullet
points covering V0022 migration, buffered writer, Cypher RETURN-shape
change, Configurability Contract, federation helper + API, Prometheus +
Grafana, Tier 2 + Tier 3 verification, the surprise-bug RRF Activation
fix-commit, and the audit-JSON restore.
CLAUDE.md Architecture Notes gains a new "Event Graph Federation" entry
above the Model Distribution section. Documents the pattern, surface,
deferrals, and the load-bearing fix-commit f307f55 that surfaced 24
days of silent Hebbian no-op on the retrieve hot path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Prometheus datasource
The Epic 6 panel used datasource {type: prometheus, uid: prometheus} but
this Grafana instance has no Prometheus datasource configured — mdemg
exposes counters as JSON via /v1/metrics/snapshot, not a /metrics scrape
endpoint. Configured datasources: mdemg-nodegraph, neo4j, timescaledb
only. The panel rendered "No data" in the live Grafana.
Rewritten panel queries the reinforcement_events hypertable directly via
the timescaledb postgres datasource. Two targets:
1. count(*) over 1-minute time_buckets → overall events/min
2. count(*) FILTER (WHERE created_new_edge) vs WHERE NOT created_new_edge
→ split between new connections formed and existing connections
strengthened (the operational dimension the analytic queries
actually need)
Both targets templated on $space_id (existing dashboard variable). The
Prometheus counters (mdemg_eventgraph_writer_rows_{enqueued,dropped,
flush_failure}_total) remain wired and incrementing — they surface via
/v1/metrics/snapshot for ops scripts. The Grafana panel now actually
displays data instead of relying on a scrape path that doesn't exist
in this deployment.
Discovered during post-merge live verification (2026-05-29). Verified
fix: reloaded dashboard via Grafana API → /api/ds/query against same
SQL returns 1-minute buckets matching TSDB direct count. Audit harness
now reports 2 PASS for the new panel (previously SKIP — no SQL target).
verification.md updated with the post-merge transcript.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts: # deploy/docker/grafana/dashboards/mdemg-graph-topology.json # docs/development/eventgraph-001/verification.md
P0 fix. The Jiminy guidance->feedback->outcome loop has been dormant ~9 weeks: consulting/service.go gates constraint/suggestion extraction on hardcoded legacy-scale score thresholds (r.Score < 0.55 et al.). Phase 13.1 RRF (default-on May 3) dropped the score scale so strong matches top out ~0.53 -> 0/10 results clear the gates -> empty guidance -> dead loop. Third instance of the RRF-score-contract bug class (after the EVENTGRAPH-001 Activation drop). 12-section format; 6 epics; config-driven percentile-gate fix + sigmoid recalibration; live-verify the revived loop end-to-end. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Full-repo sweep of post-RRF score/activation/confidence consumers +
live score-distribution sampling.
Findings:
- HIGH (4): consulting constraint gates (1005/1081/1087) + confidence
sigmoid midpoint 1.5 (35-36), the loop-killer cluster.
- MED (5): consulting conflict gates (931/944/957/981) + minConfidence
pre-filter (619, already config-driven).
- LOW (3): retrieval/jiminy.go Activation display gates (45/155/192):
explanation text only, no guidance gating.
- NONE (2): jiminy trial score (0-10 scale), trust-score clamp.
Live distribution: RRF strong-match top scores cluster 0.49-0.58; the
0.55 gate sits mid-band, rejecting the most-relevant constraint half
the time. NormalizedConfidence is positional rank (spreads 100->0 even
on uniform-score sets) -> rules out plan Option A (percentile) as sole
gate.
Remediation: config-driven RRF-calibrated absolute thresholds (Option
B), constraint floor default 0.45, sigmoid midpoint ->0.45. Disclosed
deviation per feedback_plan_options_pattern.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…SCALE-001 Epic 2)
Revives the dormant Jiminy guidance loop. Replaces 7 hardcoded legacy-
scale score gates in consulting/service.go + the score->confidence
sigmoid (both copies) with config-driven, RRF-calibrated values.
Gates (all default 0.45, RRF strong-match band is 0.49-0.58):
- constraint extraction (was <0.55) -> CONSULTING_CONSTRAINT_SCORE_FLOOR
- keyword/name authority inner gate (0.55/0.6) -> CONSULTING_AUTHORITY_SCORE_FLOOR
- conflict/contradiction detection (0.6-0.7) -> CONSULTING_CONFLICT_SCORE_FLOOR
Key Epic-2 finding: keywordClassifyConstraint has an INNER authority
gate that binds tighter than the outer constraint gate. If authority
floor > constraint floor, the binding gate re-rejects the strong-match
band and the loop stays dormant -> all three default to 0.45. The RRF
band is too compressed to subdivide into tiers; knobs stay separate so
operators can raise any one independently.
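The interaction between the outer constraint gate and the inner authority gate can be sketched as two sequential comparisons, which makes the effective floor the larger of the two. The function below is a hypothetical simplification; the real classifier has more inputs than a single score.

```python
def surfaces_as_constraint(score: float,
                           constraint_floor: float = 0.45,
                           authority_floor: float = 0.45) -> bool:
    """A candidate must clear the outer gate and then the inner gate."""
    if score < constraint_floor:
        return False  # outer constraint-extraction gate
    return score >= authority_floor  # inner authority gate binds last


strong_match = 0.50  # inside the 0.49-0.58 strong-match band

with_defaults = surfaces_as_constraint(strong_match)
legacy_outer = surfaces_as_constraint(strong_match, constraint_floor=0.55)
tight_inner = surfaces_as_constraint(strong_match, authority_floor=0.55)
```

Lowering only the outer floor is not enough: with the inner floor left at 0.55 the same 0.50 match is still rejected, which is why all three defaults move together.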
Sigmoid (score->confidence), both consulting/service.go and
jiminy/retrieval_source.go (they MUST stay in sync per their own
comments): midpoint 1.5 -> 0.45, steepness 1.5 -> 8.0, config-driven via
RETRIEVAL_CONFIDENCE_SIGMOID_{MIDPOINT,STEEPNESS}. Legacy crushed a
strong 0.5 match to 0.18 confidence; recalibrated maps it to 0.60
(0.1->0.06, 0.58->0.74). normalizeRetrievalConfidence is now a Service
method reading cfg with zero-value fallback; mapRetrievalToGuidance
takes the sigmoid params from its caller's cfg.
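The quoted before/after confidences are consistent with a standard logistic curve, 1 / (1 + exp(-steepness × (score − midpoint))). The sketch below assumes that form (the commit does not spell out the formula) and uses a hypothetical function name; the zero-value guard mirrors the "zero-value fallback" wording.

```python
import math

DEFAULT_MIDPOINT = 0.45
DEFAULT_STEEPNESS = 8.0


def normalize_confidence(score: float,
                         midpoint: float = DEFAULT_MIDPOINT,
                         steepness: float = DEFAULT_STEEPNESS) -> float:
    """Map a retrieval score onto (0, 1) with a logistic curve (assumed form)."""
    # Unset (zero) config values fall back to the calibrated defaults.
    if midpoint == 0:
        midpoint = DEFAULT_MIDPOINT
    if steepness == 0:
        steepness = DEFAULT_STEEPNESS
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))


legacy = normalize_confidence(0.5, midpoint=1.5, steepness=1.5)
recalibrated = normalize_confidence(0.5)
weak = normalize_confidence(0.1)
top_of_band = normalize_confidence(0.58)
```

Rounded to two places, these come out at 0.18, 0.60, 0.06, and 0.74 respectively, matching the figures in the commit.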
5 new config knobs, all with RRF-calibrated defaults + zero-value
guards (no-hardcoding rule; the bug WAS a hardcoded value).
Tier 1 tests: updated 2 legacy-scale boundary tests to the new
thresholds + added RRFStrongMatchBand regression (0.50 must surface),
ConstraintFloor_ConfigDriven (override honored), and
NormalizeRetrievalConfidence_RRFCalibration (band mapping). Full
consulting + jiminy + config suites green; lint clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
retrieval/jiminy.go Activation display gates (45/155/192 + LearningEdge siblings) traced live: they're in the explainability renderer, not the guidance-surfacing path; always-additive at RRF scale (live activation ~0.723 >> thresholds), no misbehavior. Intentionally left unchanged with rationale — config-ifying display verbosity is out of proportion to zero functional impact. Every High/Med remediated (Epic 2), every Low decided. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pic 4)
Tier 3 live e2e (verification.md): the score-gate fix revives the
dormant guidance loop on the live stack:
- /v1/jiminy/guide guidance items 0 -> 10, source_counts.constraints
0 -> 2, patterns 0 -> 3 (acceptance #1 MET).
- Full loop warm->latest->feedback->outcome: TSDB constraint_outcomes
sink REVIVED, with fresh rows dated 2026-06-03 (table was dead since
May 1). Constraint-effectiveness Grafana sink is live again.
Three adjacent issues surfaced during live smoke, documented as
distinct follow-ups (NOT score-scale, not bolted on):
- A: Neo4j GUIDANCE_OUTCOME edges still dormant. Guidance SourceNodes
point at emergent_concept nodes; PersistGuidanceOutcome only writes
edges for constraint/correction/pattern/learning or
role_type=constraint targets. Node-type-targeting bug, independent of
RRF. Candidate sprint JIMINY-OUTCOME-001.
- B: LLM guidance synthesis timeout (now that synthesis runs).
- C: /v1/jiminy/latest unescaped control chars break jq/json parsers.
The hook uses jq, so may compound dormancy. Low-effort follow-up.
Tier 2 (rrf_scale_guidance_test.go, integration tag, 2 green):
- SuggestSurfacesGuidance: constraint-matching context surfaces 7
suggestions (was 0 before fix) against live mdemg-dev.
- SuggestRejectsNoise: gibberish does not flood constraints (no
over-correction).
Cold-start note: first guide call post-restart returned constraints:0
(LLM classifier cold-model timeout -> keyword fallback); after one
warm-up call, constraints surface. Model-warmth artifact, not a fix
defect.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t.md (Epic 5) Final epic. CHANGELOG Unreleased gains the RRF-SCALE-001 Fixed entry. CLAUDE.md gains a 'score-scale contract' architecture note — the structural defense against a 4th instance: downstream consumers MUST NOT hardcode absolute thresholds against RetrieveResult.Score (the scorer scale is not a stable contract); gate via config or a scale-invariant signal, and re-audit on any scorer change. Notes that NormalizedConfidence is positional (not a safe sole gate) and records the three open follow-ups. post.md: epic-by-epic, acceptance check-off (honest: #2 partial — TSDB sink revived, Neo4j edge is distinct Follow-up A), scope note separating the score-scale fix (done) from the 3 adjacent surfaced issues (documented follow-ups), discipline notes (cold-start mask, inner authority gate). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ment (CI fix) CI failure on PR 404: TestRRFScale_SuggestSurfacesGuidance failed in 0.02s. Root cause: the test assumed the populated local mdemg-dev space (111 constraint nodes), but CI boots a FRESH EMPTY Neo4j with stub embeddings (and RETRIEVAL_COLUMN_VOTING_ENABLED=false / legacy scorer). With no data, /v1/memory/suggest returns 0 candidates, so the 'total == 0' assertion fired. Other integration tests self-seed data or skip when prerequisites are absent; mine relied on ambient data — wrong for a reproducible CI run. Fix: skip when debug.retrieved_count == 0 (no retrievable data → the score-gate fix isn't exercisable; there's nothing for the gate to admit or reject). The test stays meaningful against a populated stack (local: 9 suggestions from 15 retrieved → PASS) and skips cleanly in CI's empty-DB environment. Verified both paths live: populated → PASS, empty space → retrieved_count 0 → SKIP. The gate fix itself is validated by Tier 1 unit tests + the live Tier 3 e2e (docs/development/rrf-scale-001/verification.md); this integration test is a bonus live-stack assertion, not the primary proof. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… sink Follow-up A from RRF-SCALE-001: the Neo4j GUIDANCE_OUTCOME edge sink has been dormant since Apr 12. Root cause: matchConstraintCode links guidance items to constraint codes by keyword overlap (>=3 shared words), but retrieval surfaces emergent_concept abstractions whose content does not share 3+ literal words with raw constraint text -> no constraint_code -> PersistGuidanceOutcome falls back to the concept SourceNode -> the role_type=constraint filter rejects it -> no edge. Live-proven: all 17 recent outcome rows had constraint_code=(none). Fix (Option 1): switch the matcher to embedding cosine similarity (content already normalized to natural language ~0.70 cosine; Service has an embedder; cosineSimilarity + embed->cosine pattern already exist in-package via OutcomeClassifier). Existing PersistGuidanceOutcome + findConstraintNodeID then create edges on the correct constraint nodes. Keyword matching stays as fallback -- never regresses. 4 epics; ~1-1.5 dev-days; config-driven threshold; acceptance bar = a fresh Neo4j GUIDANCE_OUTCOME edge on a real role_type=constraint node dated today, reflected in GetConstraintEffectiveness. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
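To make the root cause concrete, here is a minimal sketch of a keyword-overlap matcher with the 3-shared-word rule. It is not `matchConstraintCode` itself: the tokenization (lowercasing, whitespace split, punctuation trim) and the names `wordSet`, `sharedWords`, and `keywordMatch` are assumptions, and the abstracted sentence is an invented example of what an emergent concept might read like.

```go
package main

import (
	"fmt"
	"strings"
)

// minSharedWords is the overlap rule quoted in the commit message.
const minSharedWords = 3

// wordSet lowercases s and returns its distinct words with surrounding
// punctuation trimmed. The tokenization is an assumption for this sketch.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(strings.ToLower(s)) {
		w = strings.Trim(w, ".,:;!?\"'()")
		if w != "" {
			set[w] = struct{}{}
		}
	}
	return set
}

// sharedWords counts distinct words that appear in both strings.
func sharedWords(a, b string) int {
	as, bs := wordSet(a), wordSet(b)
	n := 0
	for w := range as {
		if _, ok := bs[w]; ok {
			n++
		}
	}
	return n
}

// keywordMatch applies the overlap rule to one item/constraint pair.
func keywordMatch(item, constraint string) bool {
	return sharedWords(item, constraint) >= minSharedWords
}

func main() {
	constraint := "CONSTRAINT: NEVER commit directly to main"
	literal := "never commit directly to main without review"
	// Hypothetical abstraction: same intent, almost no shared vocabulary.
	abstracted := "protected branches require changes to land through pull requests"

	fmt.Println(keywordMatch(literal, constraint))    // matches: five shared words
	fmt.Println(keywordMatch(abstracted, constraint)) // no match: only "to" is shared
}
```

The second case is the dormant-sink scenario: the content is about the same rule, but literal overlap never reaches the threshold, so no constraint code is attached.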
…UTCOME-001 Epic 1) Revives the Neo4j GUIDANCE_OUTCOME edge sink (dormant since Apr 12). Root cause (RRF-SCALE-001 Follow-up A): matchConstraintCode links guidance items to constraint codes by keyword overlap (>=3 shared words), but retrieval surfaces emergent_concept abstractions whose content rarely shares 3+ literal words with raw constraint text -> no code -> PersistGuidanceOutcome falls back to the concept SourceNode -> the role_type=constraint filter rejects it -> no edge. Fix: new matchConstraintCodeByEmbedding queries the constraint vector index (db.index.vector.queryNodes, role_type=constraint, sim >= threshold) and returns the closest constraint's code. Guide() tries this first, falling back to the keyword matcher when the embedder is unavailable, content is empty, or nothing clears the threshold — never regresses. The existing PersistGuidanceOutcome + findConstraintNodeID then create the edge on the correct constraint node. Implementation refinement vs plan: uses Neo4j's vector index server-side (mirrors the proven Evaluator.findMatchingConstraints pattern) rather than loading all constraint embeddings into Go and computing cosine in a loop — cleaner, no constraintCodeEntry.Embedding needed. Same Option-1 outcome. Config: JIMINY_CONSTRAINT_CODE_SIM_THRESHOLD (default 0.55, zero-value fallback) — provisional; tuned against the live similarity distribution in Epic 2. Tier 1 (4 tests): nil-driver/empty-embedding guards, threshold default resolution, keyword-fallback non-regression. Full jiminy + config suites green; lint clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
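The embedding-first, keyword-fallback control flow can be sketched as below. One substitution should be stated plainly: the commit queries Neo4j's vector index server-side, whereas this sketch computes cosine similarity in memory so that it runs without a database. All names here (`resolveThreshold`, `matchByEmbedding`, `matchCode`, the `constraint` struct) are hypothetical, and treating any non-positive configured value as "unset" is an assumption about what the zero-value fallback means.

```go
package main

import (
	"fmt"
	"math"
)

// defaultSimThreshold is the provisional default named in the commit message.
const defaultSimThreshold = 0.55

// resolveThreshold falls back to the default when the value is unset.
// Treating any non-positive value as unset is an assumption.
func resolveThreshold(configured float64) float64 {
	if configured <= 0 {
		return defaultSimThreshold
	}
	return configured
}

// cosine returns the cosine similarity of a and b, or 0 for empty,
// mismatched, or zero-norm input.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// constraint is a hypothetical in-memory stand-in for a constraint node.
type constraint struct {
	Code      string
	Embedding []float64
}

// matchByEmbedding returns the code of the closest constraint, if its
// similarity clears the threshold.
func matchByEmbedding(item []float64, cs []constraint, threshold float64) (string, bool) {
	best, bestSim := "", -1.0
	for _, c := range cs {
		if sim := cosine(item, c.Embedding); sim > bestSim {
			best, bestSim = c.Code, sim
		}
	}
	if best == "" || bestSim < threshold {
		return "", false
	}
	return best, true
}

// matchCode tries the embedding matcher first and falls back to the
// keyword matcher when the embedding is empty or nothing clears the bar.
func matchCode(item []float64, cs []constraint, configured float64, keyword func() string) string {
	if len(item) > 0 {
		if code, ok := matchByEmbedding(item, cs, resolveThreshold(configured)); ok {
			return code
		}
	}
	return keyword()
}

func main() {
	cs := []constraint{
		{Code: "no-direct-main-commits", Embedding: []float64{1, 0, 0}},
		{Code: "other-constraint", Embedding: []float64{0, 1, 0}},
	}
	keyword := func() string { return "" } // keyword matcher found nothing

	fmt.Printf("%q\n", matchCode([]float64{0.9, 0.1, 0}, cs, 0, keyword)) // close to the first constraint
	fmt.Printf("%q\n", matchCode([]float64{0, 0, 1}, cs, 0, keyword))     // orthogonal: falls back
}
```

Because the keyword path is always the final step, an unavailable embedder or a too-strict threshold degrades to the previous behaviour instead of losing matches.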
…on (Epic 2)
Tier 3 live e2e (verification.md) — acceptance bar MET:
- /v1/jiminy/guide now yields guidance items carrying constraint_codes
(10 items, 6 coded; was 0). Matched code 'no-direct-main-commits' is
semantically exact for the 'commit to main' context.
- Full warm->latest->feedback loop: Neo4j GUIDANCE_OUTCOME 893 -> 899
(+6), latest today. All 6 new edges land on REAL role_type=constraint
nodes ('CONSTRAINT: NEVER commit directly to main') — not
emergent_concept. The sink dormant since Apr 12 is revived on the
correct nodes.
- /v1/constraints/effectiveness reflects it: 'NEVER commit directly to
main | surfaced: 30 followed: 28 rate: 0.93'.
- Both sinks now revived: TSDB (RRF-SCALE-001) + Neo4j (here). The
constraint-effectiveness loop is fully restored.
Threshold 0.55 validated live: correct matches, no false positives.
Tier 2 (jiminy_outcome_test.go, integration tag, skip-on-empty): PASSES
on a populated stack with an idle LLM (7/10 items coded). The guide path
is LLM-latency-dependent (per-node classifier ~31s/call, serialized; a
call fired while the LLM is busy fast-fails empty), so the test
warm-retries and SKIPS (never false-fails) when the LLM path can't
produce items. Bonus check; Tier 3 is the definitive proof. The LLM
serialization/synthesis-timeout is RRF-SCALE-001 Follow-up B, tracked
separately.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Final epic. CHANGELOG Unreleased gains the JIMINY-OUTCOME-001 Fixed entry. CLAUDE.md gains a guidance-outcome constraint-code-matching note (embedding-first via vector index, keyword fallback; both outcome sinks now live). post.md: epic-by-epic, acceptance check-off, the loop-revival completion (TSDB from RRF-SCALE-001 + Neo4j here), discipline notes (LLM serialization is the test-flakiness source), forward-looking (Follow-up B now the most operationally-visible remaining issue). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sprint JIMINY-OUTCOME-001: Summary
Completes the guidance-loop revival (Follow-up A from RRF-SCALE-001): the Neo4j
Root cause
Fix
New
Implementation refinement vs plan: server-side vector index instead of in-Go cosine; cleaner, same Option-1 outcome.
Config:
Live verification (Tier 3, the acceptance bar)
Tests
Still-open follow-ups (unchanged)
Details: |
Summary
Development branch changes from reh3376_dev01.
Commits
mdemg model CLI + pluggable Fetcher interface
Auto-generated PR from reh3376_dev01 push