dev: reh3376_dev01 -> main#412
Promote the Unreleased CHANGELOG block to v0.9.0 (2026-05-06) ahead of
release.yml / goreleaser tag push.
New ### Breaking subsection captures two operator-visible cutovers since
v0.8.5:
  (1) Phase 13.5 LLM runtime port 8101 -> 8102 + .env migration required
  (2) Phase 13.6 MLX_* -> LLM_* env-var rename (legacy aliases retained for
      >= 1 release cycle)
New ### Added entries:
  - Phase 10.5 closeout (UBENCH framework promotion, commit 0389b49)
  - Claude Code GitHub App workflows (PRs #378, #379)
All previously-Unreleased entries (Phase 14.2.3, 14.2.x, 14.1.x, 14, 13.6,
13.5, 13.2, 13.1) carried forward unchanged into the v0.9.0 block. Fresh
empty Unreleased section seeded above.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps packaging/homebrew-mdemg pointer a235977 -> 6077097, which incorporates:
- f9358cd Brew formula update for mdemg version v0.8.5 (goreleaser, prior)
- b4a0d2c Brew formula update for mdemg version v0.9.0 (goreleaser, this release)
- 6077097 docs: v0.9.0 -- CHANGELOG, README What's New, beta-testing version pin
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`config.FromEnv()` defaulted MdemgVersion/MdemgCommit to literal "0.6.0"/
"unknown" when MDEMG_VERSION/MDEMG_COMMIT envs were unset. Both /healthz
and /readyz serialize cfg.MdemgVersion, so they reported "0.6.0" forever
regardless of the actual binary's ldflags-injected cli.Version.
Fix: defaults to "" in config; cli/config_loader.go injects cli.Version /
cli.Commit (the build-time vars set by goreleaser ldflags) when the env
override is unset. Operators can still pin via MDEMG_VERSION env.
Live-verified: dev build (no ldflags) now reports {"version":"dev"} on
/healthz instead of the lying "0.6.0". Production builds via goreleaser
will report the real semver tag.
TestHandleHealthz unaffected (sets cfg.MdemgVersion directly).
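The fallback order described in this fix (env override first, build-time value second) can be sketched roughly as follows. This is a minimal illustration, not the code in cli/config_loader.go: the function name `versionFor` and the injectable `getenv` parameter are assumptions made so the sketch is testable; only `MDEMG_VERSION` and the "dev" default for non-ldflags builds come from the commit message.

```go
package main

// Version is the build-time value. A release build would overwrite it with
// something like: go build -ldflags "-X main.Version=v0.9.0".
// Without ldflags it stays "dev", matching the live-verified dev build.
var Version = "dev"

// versionFor returns the operator's MDEMG_VERSION pin when it is set and
// falls back to the build-time Version otherwise. getenv is injected
// (hypothetical design choice) so the lookup can be exercised without
// touching the real process environment.
func versionFor(getenv func(string) string) string {
	if v := getenv("MDEMG_VERSION"); v != "" {
		return v
	}
	return Version
}
```

In real use the caller would pass `os.Getenv`; the config default stays empty so that an unset variable never masks the binary's own version.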
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-server
Phase 13.5 cutover (2026-05-03) replaced mlx_lm.server (port 8101) with
llama.cpp llama-server (port 8102) as the production LLM runtime, but the
embedded launchd plist template + service install code paths were never
updated. Any operator running 'mdemg service install' from a fresh checkout
got the decommissioned mlx_lm.server agent; mdemg's startup preflight then
failed because LLM_ENDPOINT=http://127.0.0.1:8102/v1 wasn't reachable.
Changes:
- New packaging/launchd/com.mdemg.llama-server.plist with the Phase 13.5
  production flags (--ctx-size 32768 --parallel 4 --cont-batching --metrics
  --jinja). Byte-identical mirror at internal/cli/launchd_templates/ for the
  embed.FS (CI sync-check enforced).
- Removed packaging/launchd/com.mdemg.mlx-server.plist + embed.FS mirror.
  mlx_lm.server is decommissioned and known-broken on M5 + macOS 26.3.x;
  keeping the template would just risk re-deploying it.
- internal/cli/service_darwin.go:
  * launchdServices entry replaced with com.mdemg.llama-server.
  * resolveMLXLMBin renamed to resolveLlamaServerBin with primary env
    MDEMG_LLAMA_SERVER_BIN, deprecation alias for MDEMG_MLX_LM_BIN
    (slog.Warn at boot, retained ≥1 release cycle per the Phase 13.6
    deprecation pattern), PATH lookup of `llama-server`.
  * resolveMDEMGModelPath default updated to the canonical Phase 13.5 GGUF
    filepath (.local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.Q5_K_M.gguf)
    since llama-server takes a `.gguf` filepath, not an HF-format directory
    like mlx_lm.server.
  * Install error message updated for the new env var name + remediation
    steps (`brew install llama.cpp`).
- migrateLegacyMLXServerPlist() added: if a pre-cutover
  com.mdemg.mlx-server plist is bootstrapped on the operator's machine,
  Install() boots it out and renames the file to .disabled-phase13_5
  (matches the manual operator convention from Phase 13.5 rollout).
  Best-effort: failures don't block the install.
- internal/cli/service_darwin_test.go fully rewritten:
  * TestLaunchdServicesIncludesLlamaServer asserts the new entry exists and
    is Optional=false (production matches Hotfix 11.6.3.1; the old test
    asserted Optional=true, a latent lie since 2026-05-02 that Linux CI
    never caught because of //go:build darwin)
  * TestLlamaServerPlistEmbedded replaces TestMLXServerPlistEmbedded;
    additionally asserts mlx-server.plist is NOT in embed.FS
  * Two resolver tests for the primary env var
  * New TestResolveLlamaServerBinFallsBackToMLXAlias proves the Phase 13.6
    deprecation alias path works
  * resolveMDEMGModelPath tests updated for the new GGUF default
- internal/cli/watchdog.go: help text references com.mdemg.llama-server
  (instead of com.mdemg.mlx-server) and llama-server (instead of
  mlx_lm.server). Notes that mdemg_mlx_health_state metric name is retained
  for dashboard compatibility.
Tested:
- Tier 1 unit: 7/7 new tests pass; full ./internal/cli/... suite green (61s
  wall-clock).
- Tier 2 integration: golangci-lint run ./internal/cli/: 0 issues. CI plist
  sync-check (diff -q packaging/launchd/*.plist
  internal/cli/launchd_templates/): 6/6 byte-identical.
- Tier 3 live e2e: deferred. Running mdemg service install on the
  operator's currently-serving machine would briefly bootout the running
  llama-server LaunchAgent (PID 20527 actively serving production
  inference). The hand-installed llama-server plist on the operator's
  machine is byte-equivalent (modulo template substitutions) to what this
  commit will install via `mdemg service install` on a fresh operator
  setup, so the operator can verify on next planned redeploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
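The resolver order named in this commit (primary env var, then the deprecated alias with a warning, then a PATH lookup) might look roughly like the sketch below. The signature is an assumption: the real resolveLlamaServerBin in service_darwin.go may take no arguments and read the environment directly. `getenv` and `lookPath` are injected here only so the three branches can be tested in isolation, and `errLlamaServerNotFound` is a hypothetical sentinel.

```go
package main

import (
	"errors"
	"log/slog"
)

// errLlamaServerNotFound is a hypothetical sentinel for the "nothing
// resolved" case; the real install error message may differ.
var errLlamaServerNotFound = errors.New(
	"llama-server not found: set MDEMG_LLAMA_SERVER_BIN or run `brew install llama.cpp`")

// resolveLlamaServerBin tries, in order: the primary env var, the
// deprecated MDEMG_MLX_LM_BIN alias (logging a warning), and finally a
// PATH lookup of "llama-server".
func resolveLlamaServerBin(
	getenv func(string) string,
	lookPath func(string) (string, error),
) (string, error) {
	if p := getenv("MDEMG_LLAMA_SERVER_BIN"); p != "" {
		return p, nil
	}
	if p := getenv("MDEMG_MLX_LM_BIN"); p != "" {
		slog.Warn("deprecated env var in use; switch to MDEMG_LLAMA_SERVER_BIN",
			"alias", "MDEMG_MLX_LM_BIN")
		return p, nil
	}
	if p, err := lookPath("llama-server"); err == nil {
		return p, nil
	}
	return "", errLlamaServerNotFound
}
```

A caller would normally pass `os.Getenv` and `exec.LookPath`. Keeping the alias branch separate from the primary branch is what lets the deprecation warning fire only for operators who still rely on the old name.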
Epic 0 of Sprint MODEL-DIST-001: Local LoRA Distribution via Ollama Library.
Sprint plan in 12-section v1.0 format. Supersedes parts of the speculative
spec at docs/research/mdemg_sprint_ideas/MDEMG_FT_LORA_PACKAGING_SPEC.md
(HF Hub vs Ollama Library; adapter-only vs both-fused-and-adapter; Apple
Silicon scope vs cross-platform).
Configurability Contract: every operator-visible value is dynamic per the
framework's no-hardcoding rule. 12 env vars + flag overrides + sensible
defaults. ModelFetcher interface decouples CLI from Ollama-specific
knowledge; v1 ships OllamaFetcher only, future backends (HF / S3 / GitHub
Release / file) plug in via factory dispatch on MDEMG_MODEL_BACKEND without
touching the CLI surface.
Forensic from Epic 0:
- adapters/tier1/adapters.safetensors verified present (514 MB MLX, Phase 5
  SFT Iter 2400 best output)
- mdemg-llm-v1.Q5_K_M.gguf SHA256 captured (9.8 GB; 144ad7231...)
- f16 GGUF intermediate NOT on disk; Epic 1 will regenerate via
  convert_hf_to_gguf.py from the MLX merged model (~5 min)
- qwen3:14b model-layer digest captured from Ollama registry; manifest
  digest to be computed at Epic 3 for Modelfile FROM @sha256: pinning
quant_manifest.json skeleton with Q5_K_M SHA pre-populated; Q4_K_M / Q8_0 /
adapter SHAs filled in during Epics 1+2.
Estimated effort 5–7 dev-days. OpenAI spend $0. Risk medium (Ollama publish
one-way; MLX→PEFT→GGUF LoRA conversion is the riskiest engineering item with
documented contingency to defer to MODEL-DIST-002 if blocked).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pipeline (CLAUDE.md Phase 13.5 documented path):
1. mlx_lm.fuse --dequantize: mlx-community/Qwen3-14B-4bit + adapters/tier1/
-> 29.6 GB bf16 HF safetensors at .local-models/qwen3-14b-mdemg-v1-bf16/
2. convert_hf_to_gguf.py --outtype f16 -> 30 GB f16 GGUF (required
neural/.venv interpreter with torch + transformers + gguf installed;
/opt/homebrew/bin/convert_hf_to_gguf.py uses system python which lacks
these — installed gguf/sentencepiece/protobuf into neural/.venv)
3. llama-quantize Q4_K_M -> 9.0 GB (4.87 BPW; 40s wall on M5)
4. llama-quantize Q8_0 -> 16 GB (8.50 BPW; 11s wall on M5)
5. Live smoke per new quant via llama-server on port 18102 — both serve
/v1/models cleanly with embedded chat_template
SHAs captured in quant_manifest.json:
Q4_K_M: 401161710c22f0ae...411d42ea
Q5_K_M: 144ad723101d688f...d5f5d54 (matches Epic 0 baseline)
Q8_0: fc14dcb40af1bb58...8db6089
f16: 436cd6f41a684805...3217bd (intermediate, retained for Epic 2)
Resource matrix updated with empirical sizes (Q4_K_M is 9.0 GB vs estimated
6.5 GB; min RAM revised 8 -> 12 GB to cover ~3 GB working memory above
weights). 14B params x 4.87 BPW ≈ 8.5 GB matches the formula.
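The size formula referenced above is parameter count times bits per weight, divided by eight to get bytes. A minimal sketch of that arithmetic follows; `estimateBytes` is a hypothetical helper name, and the nominal 14e9 parameter count is taken at face value from the text (the model's true parameter count, and therefore the real file size, may be somewhat higher).

```go
package main

// estimateBytes returns the approximate on-disk size of a quantized model:
// params * bitsPerWeight / 8. It ignores GGUF metadata and any tensors
// stored at a different precision, so it is a lower-bound style estimate.
func estimateBytes(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}
```

For the Q4_K_M figures quoted in this commit, `estimateBytes(14e9, 4.87)` is about 8.52e9 bytes, which is the "≈ 8.5 GB" in the text.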
GGUF binary artifacts stay local — .local-models/ gitignored per
.gitignore:70. Sprint deliverable in git is just the manifest update.
Production llama-server (PID 20527 on port 8102) undisturbed throughout
Epic 1; live smokes used port 18102.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adapter (LoRA-only Modelfile via ADAPTER directive) deferred per the sprint
plan's documented contingency clause. Fused-only path (Epics 1, 3, 4, 5)
continues; that's the primary operator value.
Forensic findings (epic_2_forensic.md):
- MLX adapter is well-formed: 560 tensors, 40 layers x 7 target_modules,
  rank 32, alpha 64, scale 20.0.
- convert_lora_to_gguf.py is NOT in brew install llama.cpp; would need
  manual fetch from llama.cpp source.
- MLX -> PEFT requires tensor transposition: MLX lora_a is (in, rank); PEFT
  expects (rank, in). Same for lora_b.
- Estimated 80-95 min to complete vs ~30 min budget remaining for Epic 2.
- Hit the contingency criterion: "MLX -> PEFT conversion blocked by tooling
  gaps."
Decision: defer adapter scope to MODEL-DIST-002 (new follow-up sprint, to be
planned separately). Fused-only ships this sprint.
Knock-on changes (in-flight to subsequent epics):
- Epic 3: drop Modelfile.adapter; publish only 3 fused quants.
- Epic 4 CLI: --adapter flag accepted at parse-time but errors with "lands
  in MODEL-DIST-002"; machinery preserved for forward-compat.
- Epic 6 e2e: drop adapter-pull step.
- Epic 7 feature doc: adapter section notes "coming in MODEL-DIST-002".
Artifacts preserved on disk for MODEL-DIST-002 pickup:
- adapters/tier1/adapters.safetensors (MLX, 514 MB)
- .local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.f16.gguf (30 GB, retained as
  base for llama-server --lora verification later)
quant_manifest.json adapter block updated with status=deferred + reason.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sh pending)
Authored 3 Ollama Modelfiles in packaging/ollama/:
  Modelfile.Q4_K_M: 9.0 GB, 12 GB min RAM, 16 GB recommended
  Modelfile.Q5_K_M: 11 GB, 14 GB min RAM, 24 GB recommended (production canonical)
  Modelfile.Q8_0:   16 GB, 20 GB min RAM, 32 GB recommended
Common shape: FROM ./../../.local-models/mdemg-llm-v1-gguf/...gguf relative
path (operator-machine local); num_ctx 32768, num_predict 4096, stop tokens
<|im_end|>/<|im_start|>; Apache-2.0 LICENSE; SYSTEM positioning block. No
TEMPLATE directive: chat template baked into GGUF metadata (Qwen3
chat_template.jinja preserved through mlx_lm.fuse --dequantize → convert_hf
→ llama-quantize pipeline).
packaging/ollama/README.md documents the publish workflow including the
fork-customization path (operators publishing under a different namespace
follow MDEMG_MODEL_NAMESPACE per the Configurability Contract).
Local ollama create completed for all 3:
  reh3376/mdemg-llm-v1:Q4_K_M ID 5c3a7252c295
  reh3376/mdemg-llm-v1:Q5_K_M ID 08c13b480864
  reh3376/mdemg-llm-v1:Q8_0   ID 6b1006facd36
Layers de-duplicated: config + params + system layers (3 layers) are
identical across all 3 quants; only the model blob (GGUF) differs.
** ollama push deferred ** : one-way action gated on operator confirmation
per Sprint Plan §10 Risk #8. Operator must claim reh3376 namespace on
ollama.com and generate API token before push proceeds. Local-create proves
the Modelfiles are well-formed; push is a separate decision. Once pushed,
manifest digests captured into quant_manifest.json (ollama_manifest_digest
field per quant) for mdemg model verify.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…interface
Sprint MODEL-DIST-001 Epic 4 — the bulk of the operator-facing surface.
New CLI subcommand group:
mdemg model pull # fetch + symlink + SHA verify
mdemg model list # show pulled models
mdemg model verify # re-check SHAs vs quant manifest
mdemg model remove # destructive (requires --yes)
mdemg model where # print resolved path for shell scripting
Pluggable backend (internal/cli/model_fetcher.go):
type Fetcher interface { Name, Fetch, Verify, Remove }
NewFetcher dispatches on cfg.ModelBackend (env: MDEMG_MODEL_BACKEND)
v1 ships OllamaFetcher only; future backends (hf, s3, github-release,
file) plug in via factory branch — CLI surface unchanged.
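The interface and factory described above could be sketched along these lines. Only the four method names, the case-insensitive dispatch, and the Ollama default are grounded in this commit; the method signatures, the `ollamaFetcher` stub, and the error wording are assumptions for illustration and will differ from internal/cli/model_fetcher.go.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Fetcher is a guess at the shape behind "Name, Fetch, Verify, Remove".
// The argument and return types here are hypothetical.
type Fetcher interface {
	Name() string
	Fetch(ctx context.Context, ref string) (localPath string, err error)
	Verify(ctx context.Context, ref string) error
	Remove(ctx context.Context, ref string) error
}

// ollamaFetcher is a stub standing in for the real OllamaFetcher.
type ollamaFetcher struct{}

func (ollamaFetcher) Name() string { return "ollama" }
func (ollamaFetcher) Fetch(ctx context.Context, ref string) (string, error) {
	return "", fmt.Errorf("fetch %q: not implemented in this sketch", ref)
}
func (ollamaFetcher) Verify(ctx context.Context, ref string) error { return nil }
func (ollamaFetcher) Remove(ctx context.Context, ref string) error { return nil }

// NewFetcher dispatches on the backend name (MDEMG_MODEL_BACKEND).
// Matching is case-insensitive and an empty value selects Ollama.
func NewFetcher(backend string) (Fetcher, error) {
	switch strings.ToLower(strings.TrimSpace(backend)) {
	case "", "ollama":
		return ollamaFetcher{}, nil
	default:
		return nil, fmt.Errorf("unsupported model backend %q", backend)
	}
}
```

Adding a backend under this design means one new type plus one new `case`, which is why the CLI surface can stay unchanged.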
OllamaFetcher (internal/cli/model_fetcher_ollama.go):
Encapsulates ALL Ollama-specific concepts: `ollama pull` invocation,
manifest path under <OLLAMA_MODELS>/manifests/<OLLAMA_HOST>/<ns>/<n>/<tag>,
mediaType=application/vnd.ollama.image.model layer filtering,
blob path under <OLLAMA_MODELS>/blobs/sha256-<digest>, symlink under
<MDEMG_MODEL_DIR>, idempotent.
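The path layout and layer filtering listed above can be illustrated with three small helpers. This is a sketch, not the code in model_fetcher_ollama.go: the function names are hypothetical, and the manifest struct models only the two fields the text relies on (`mediaType` and `digest` inside `layers`).

```go
package main

import (
	"encoding/json"
	"errors"
	"path/filepath"
	"strings"
)

const modelLayerMediaType = "application/vnd.ollama.image.model"

// manifestPath composes <root>/manifests/<host>/<ns>/<name>/<tag>.
func manifestPath(root, host, namespace, name, tag string) string {
	return filepath.Join(root, "manifests", host, namespace, name, tag)
}

// blobPath maps a "sha256:<hex>" digest to <root>/blobs/sha256-<hex>.
// A digest that already uses the dash form is accepted as well.
func blobPath(root, digest string) string {
	hex := strings.TrimPrefix(digest, "sha256:")
	hex = strings.TrimPrefix(hex, "sha256-")
	return filepath.Join(root, "blobs", "sha256-"+hex)
}

// modelLayerDigest returns the digest of the first layer whose mediaType
// marks it as the model weights, skipping params/system/template layers.
func modelLayerDigest(rawManifest []byte) (string, error) {
	var m struct {
		Layers []struct {
			MediaType string `json:"mediaType"`
			Digest    string `json:"digest"`
		} `json:"layers"`
	}
	if err := json.Unmarshal(rawManifest, &m); err != nil {
		return "", err
	}
	for _, l := range m.Layers {
		if l.MediaType == modelLayerMediaType {
			return l.Digest, nil
		}
	}
	return "", errors.New("manifest has no model layer")
}
```

Chained together, `blobPath(root, digest)` on the result of `modelLayerDigest` gives the file that the symlink under the model directory would point at.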
Configurability Contract (no hardcoding; memory: feedback_no_hardcoded_values.md):
12 env vars + flag overrides, each with v1-production-tuned defaults so
`mdemg model pull` with no flags Just Works. See sprint plan §3.
Live-verified all 3 resolution paths:
`--quant Q5_K_M` → namespace=reh3376
`--namespace acme --name custom-model` → namespace=acme name=custom-model
`MDEMG_MODEL_NAMESPACE=acme env` → env overrides applied
Added to internal/config/config.go: ModelBackend, ModelNamespace,
ModelName, ModelQuants, ModelRamTiers, ModelQuant, AdapterBase,
ModelDir, OllamaModelsRoot, OllamaRegistryHost, ModelManifestPath.
Embedded quant manifest (internal/cli/quant_manifest.json via embed.FS):
Runtime source-of-truth for SHA verification. Operator override via
MDEMG_MODEL_MANIFEST_PATH for air-gapped deployments. Mirrors
docs/development/model-dist-001/quant_manifest.json.
RAM-tier auto-pick:
Default JSON `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}` maps
host RAM (sysctl on darwin, /proc/meminfo on linux) to quant. Operator
override via MDEMG_MODEL_RAM_TIERS.
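One plausible reading of the tier JSON is "pick the quant of the smallest `<N` bound that host RAM falls under, else `default`". The sketch below implements that reading; it is an assumption, not the shipped PickQuantForRAM. In particular, treating N as gigabytes, using a strict less-than at each boundary, and rejecting keys that are neither `<N` nor `default` are all choices made here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pickQuantForRAM parses a tier map such as
// {"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"} and returns the quant
// for the given RAM size (assumed to be in GB).
func pickQuantForRAM(tiersJSON string, ramGB float64) (string, error) {
	var tiers map[string]string
	if err := json.Unmarshal([]byte(tiersJSON), &tiers); err != nil {
		return "", fmt.Errorf("parse RAM tiers: %w", err)
	}
	type tier struct {
		limit float64
		quant string
	}
	var bounded []tier
	for key, quant := range tiers {
		if key == "default" {
			continue
		}
		if !strings.HasPrefix(key, "<") {
			return "", fmt.Errorf("bad RAM tier key %q", key)
		}
		limit, err := strconv.ParseFloat(strings.TrimPrefix(key, "<"), 64)
		if err != nil {
			return "", fmt.Errorf("bad RAM tier key %q: %w", key, err)
		}
		bounded = append(bounded, tier{limit, quant})
	}
	// Map iteration order is random, so sort bounds ascending before matching.
	sort.Slice(bounded, func(i, j int) bool { return bounded[i].limit < bounded[j].limit })
	for _, t := range bounded {
		if ramGB < t.limit {
			return t.quant, nil
		}
	}
	if q, ok := tiers["default"]; ok {
		return q, nil
	}
	return "", fmt.Errorf("no tier matches %.0f GB and no default is set", ramGB)
}
```

With the default JSON, an 8 GB host resolves to Q4_K_M, a 16 GB host to Q5_K_M (16 is not below 16), and anything at 24 GB or above to Q8_0.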
Adapter path (--adapter flag) returns ErrAdapterDeferred per Epic 2's
contingency exit — adapter publication lands in MODEL-DIST-002. Flag
machinery preserved for forward compatibility.
Tests (22, all green) in internal/cli/model_test.go:
- Backend factory dispatch (5 cases incl. case-insensitive, default, error)
- Quant allowlist parsing (5 cases incl. whitespace + empty entries)
- RAM-tier JSON parsing (default + operator override + malformed)
- PickQuantForRAM (7 boundary cases)
- ResolveQuant across paths (auto, explicit, rejection, operator-custom)
- QuantManifest load (embedded + file override + missing-file error)
- Ollama tag composition (fused + adapter forms)
- Manifest path composition under custom OLLAMA_MODELS/OLLAMA_HOST
- Blob path digest prefix handling
- Adapter deferred error
- Manifest JSON parser (mediaType filtering + malformed + no-model-layer)
Grep audit (verification checklist):
grep on internal/cli/model*.go for hardcoded values found only in help
text Long/example strings documenting defaults to operators — not in
logic. Behavior values all flow through cfg.Model* fields.
Build + lint clean. Full cli test suite (61s wall) green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+ writer
Sprint MODEL-DIST-001 Epic 5: observability for `mdemg model` operations.
Grafana panels deferred to Sprint B (Grafana audit).
New migration: internal/tsdb/migrations/021_model_install_events.sql
  Hypertable on recorded_at, 7-day chunks, 3 indexes (quant-time,
  failed-events partial, backend-event-time). Columns: event_id CUIDv2 PK +
  recorded_at, event_type (pull/verify/remove), backend_name, namespace,
  model_name, quant, adapter bool, success bool, latency_ms, sha256,
  size_bytes, err_message (1 KB cap).
New writer: internal/tsdb/model_install_writer.go
  Synchronous single-row INSERT (not buffered + CopyFrom: CLI is one-shot,
  writes are infrequent vs the V0017/V0018/V0019/V0020 retrieval-path
  writers that fire per-request). Nil-pool no-op for degraded mode.
  errMessageMaxLen=1024 truncation at write time. New modelInstallPool
  interface (Exec-shaped) avoids touching the existing CopyFrom-shaped
  poolIface used by buffered writers.
Wiring: internal/cli/model.go gets recordModelEvent(parent, cfg, row) helper:
  - Returns immediately if !cfg.TSDBEnabled || cfg.TSDBHost==""
  - 2s timeout on connect (TSDB unreachable doesn't block CLI exit)
  - Logs warning + degrades gracefully on any TSDB error
  Called from runModelPull (success + failure paths), runModelVerify (single
  sweep row), runModelRemove (success + failure paths).
Schema version bump: internal/config/config.go:
  TSDB_REQUIRED_SCHEMA_VERSION default 20→21. CI validator at
  .github/workflows/ci.yml:60-65 counts SQL files in
  internal/tsdb/migrations/ and asserts equality; now 21 files = 21 in
  config = passes.
Build + lint clean. Existing tsdb / cli test suites green; no new tests
added for the writer itself (single INSERT mirrors V0017/V0018/V0019
patterns already covered; integration is operational verification at Epic 6
once tsdb is up in the dev stack).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
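Two of the guards described in this commit, the early return when TSDB is not configured and the 1024-cap on err_message, are simple enough to sketch. The helper names below are hypothetical, and whether the real cap counts bytes or runes is not stated in the commit; this sketch caps at 1024 bytes and backs off to a rune boundary so the stored text stays valid UTF-8.

```go
package main

import "unicode/utf8"

// errMessageMaxLen mirrors the cap named in the commit message.
const errMessageMaxLen = 1024

// shouldRecord reports whether an event row should be written at all.
// It mirrors the "!cfg.TSDBEnabled || cfg.TSDBHost==\"\"" early return.
func shouldRecord(tsdbEnabled bool, tsdbHost string) bool {
	return tsdbEnabled && tsdbHost != ""
}

// truncateErrMessage caps msg at errMessageMaxLen bytes (assumption: bytes,
// not runes) without splitting a multi-byte UTF-8 character.
func truncateErrMessage(msg string) string {
	if len(msg) <= errMessageMaxLen {
		return msg
	}
	cut := errMessageMaxLen
	for cut > 0 && !utf8.RuneStart(msg[cut]) {
		cut--
	}
	return msg[:cut]
}
```

Truncating at write time rather than in the schema keeps the INSERT from failing on an oversized message, which fits the "degrade gracefully" goal of the writer.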
Sprint MODEL-DIST-001 Epic 7 — operator-facing feature documentation
following the standard Why / Choices / How / How-to-use shape (memory:
feedback_per_feature_docs_required.md).
Contents:
- Why: gap between brew install and a working local LLM after Phase 13.5
- Choices: backend matrix (Ollama vs HF vs GitHub vs S3 vs file://),
artifact form (fused vs adapter), Apple Silicon scope, "Ollama runtime
rejected (broken on M5+macOS 26.3.x), Ollama distribution only"
- How it works: ASCII flow diagram covering CLI dispatch -> Fetcher
interface -> OllamaFetcher (preflight, ollama pull, manifest discovery,
blob resolve, symlink, SHA verify) -> V0021 observability row
- How to use:
* Quick start (3 commands: brew install ollama, mdemg model pull,
curl /v1/models)
* Explicit quant selection
* Managing pulled models (list / verify / where / remove)
* Forks + enterprise (MDEMG_MODEL_NAMESPACE override)
* Air-gapped (MDEMG_MODEL_MANIFEST_PATH override)
* Resource matrix per quant (disk, min RAM, recommended RAM, BPW)
* Full Configurability Contract table (11 env vars + flags + defaults)
* V0021 observability schema
- Troubleshooting: ollama missing, SHA mismatch, quant allowlist
rejection, RAM auto-detection failure, out-of-disk, symlink permission
- Forward-looking: MODEL-DIST-002 adapter, Sprint B Grafana panels,
future backends, cross-platform
- References: all source-of-truth files cross-linked
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint MODEL-DIST-001 Epic 8: final epic, never cut (memory:
feedback_sequential_epics.md).
This commit lands the main-repo doc updates. The packaging/homebrew-mdemg/
submodule docs (README, CHANGELOG, formula caveats text) update at v0.10.0
release-tag time per the v0.9.0 release flow precedent; that's when
goreleaser auto-regenerates mdemg.rb from .goreleaser.yaml's caveats
template, and the tap-side README/CHANGELOG get edited in lockstep.
Changes:
- CHANGELOG.md: comprehensive Unreleased entry documenting Epics 0-5 + 7
  landed in this sprint. Epic 3 ollama push and Epic 6 Tier 3 e2e marked as
  gated on operator confirmation. Adapter path explicitly deferred to
  MODEL-DIST-002 with epic_2_forensic.md cross-reference. Captures the
  Configurability Contract enumeration, the 3 quant SHAs, the Fetcher
  interface design, the V0021 hypertable, and the explicit out-of-scope
  list.
- CLAUDE.md: new "Model Distribution (Sprint MODEL-DIST-001)" subsection in
  Architecture Notes, slotted ABOVE the existing Compose embed entry for
  visibility. Captures the pluggable-backend design, the
  Ollama-as-distribution-only constraint, the on-disk symlink + manifest
  discovery flow, the 11-knob Configurability Contract surface, the
  no-hardcoding enforcement, the TSDB V0021 hookup, and the Apple Silicon v1
  scope.
- README.md: new "Step 2b (optional): Pull the local LLM" section between
  Step 2 (Initialize/Start) and Open the Dashboard. 3-command quick start
  (brew install ollama -> mdemg model pull -> set MDEMG_MODEL_PATH).
  Cross-references the feature doc for the full Configurability Contract.
- .goreleaser.yaml: caveats template updated to include `mdemg model pull`
  instructions. Goreleaser regenerates the homebrew formula's caveats block
  from this on the next v* tag push, so v0.10.0 will ship the new text to
  brew users automatically.
Deferred to v0.10.0 release-tag time (handled per v0.9.0 precedent):
- packaging/homebrew-mdemg/README.md update
- packaging/homebrew-mdemg/CHANGELOG.md update
- packaging/homebrew-mdemg/mdemg.rb regeneration (automatic via goreleaser
  from the .goreleaser.yaml change in this commit)
- Submodule pointer bump in main repo
Deferred to Epic 6 close (after operator does ollama push):
- post.md sprint-close document
- Capture of remote Ollama manifest digests into quant_manifest.json
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All 3 fused quants now live on Ollama Library:
  https://ollama.com/reh3376/mdemg-llm-v1:Q4_K_M
  https://ollama.com/reh3376/mdemg-llm-v1:Q5_K_M
  https://ollama.com/reh3376/mdemg-llm-v1:Q8_0
End-to-end integrity verified: remote model-layer digests captured via
GET https://registry.ollama.ai/v2/reh3376/mdemg-llm-v1/manifests/<quant>
match the local Epic 1 SHAs exactly:
  Q4_K_M 401161710c22f0ae...411d42ea (matches Epic 1)
  Q5_K_M 144ad723101d688f...d5f5d54 (matches Epic 1)
  Q8_0   fc14dcb40af1bb58...8db6089 (matches Epic 1)
Captured into quant_manifest.json (both docs canonical + internal/cli
embed.FS mirror, byte-synced):
- ollama_manifest_digest per quant (computed from the manifest body):
    Q4_K_M sha256:a210cccb12311773fd70bfa81f221ca0f7940a315bef87b84608caf894533b1b
    Q5_K_M sha256:ae6e54fe1ee0b487ae41260687ed14c46c30d1ffb0fece936282418b5bcb78e1
    Q8_0   sha256:93df4d64bfa751506f7afba8bf08b891ea828575b838adec17b9399ad85be718
- Corrected size_bytes (Epic 1 used approximate values; replaced with
  registry-reported exact bytes for each tag):
    Q4_K_M 9.0 GB -> 8.4 GB  (9001753408 B;  was 9658404096)
    Q5_K_M 11 GB  -> 9.8 GB  (10514569568 B; was 11811160064)
    Q8_0   16 GB  -> 14.6 GB (15698534208 B; was 17179869184)
- Status flipped from "local-create done; push pending" to "published".
Embedded runtime manifest (internal/cli/quant_manifest.json) re-built into
the binary via embed.FS. TestLoadQuantManifest_EmbeddedFallback green with
new values.
Epic 3 of Sprint MODEL-DIST-001 now COMPLETE. Epic 6 (Tier 3 live e2e:
`mdemg model pull` against the published tags + llama-server load on port
18102 + sanity inference) is now unblocked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
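The corrected human-readable sizes above line up with the exact byte counts when the bytes are divided by 2^30, that is, when "GB" is read as binary gigabytes (GiB). The sketch below shows that conversion next to the decimal one; both helper names are hypothetical and nothing here is taken from the repository.

```go
package main

import "fmt"

// humanGiB formats a byte count in binary gigabytes (bytes / 2^30).
func humanGiB(n int64) string {
	return fmt.Sprintf("%.1f", float64(n)/(1<<30))
}

// humanGB formats a byte count in decimal gigabytes (bytes / 10^9).
func humanGB(n int64) string {
	return fmt.Sprintf("%.1f", float64(n)/1e9)
}
```

For the Q4_K_M tag, `humanGiB(9001753408)` gives "8.4" while `humanGB(9001753408)` gives "9.0", so the same file can legitimately be described with either figure depending on the unit convention in use.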
Sprint MODEL-DIST-001 close-out per memory rule
(feedback_sprint_plan_format.md §11 — sprint plans live in
docs/development/<sprint-line>/ with the standard post.md companion).
Sections (CLAUDE.md sprint-plan section guidance):
- Outcome: 3 quants live on Ollama Library, mdemg model pull is the
canonical install path
- Process: how the plan held under reality (operator-surfaced no-
hardcoding rule revised the plan in-place to add the Configurability
Contract before code was written)
- Findings: 5 smooth parts + 5 friction items, both honest:
* convert_hf_to_gguf.py python deps gap (silent ModuleNotFoundError)
* mlx_lm.fuse adapter-path requirement
* convert_lora_to_gguf.py missing from brew install llama.cpp
(proximate Epic 2 deferral trigger)
* mdemg tsdb migrate CWD-aware .env loader quirk
* Epic 1 size estimates off vs registry-reported exact bytes
- Current state: per-layer state matrix
- Testing & benchmarking: all 3 tiers documented (Tier 3 e2e captured
V0021 rows for both pull + verify event_types — live-verified)
- Risks & opportunities (forward): MODEL-DIST-002 adapter scope, Sprint
B Grafana, cross-platform, HFFetcher slot, CWD-aware .env loader QoL
- Sprint commits: 9 commits on dev01, mapped to their epics
Closes Sprint MODEL-DIST-001 functionally. Operational sprint close
(v0.10.0 release tag + tap-repo doc updates) is a separate motion.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…shed main)
PR #385 squash-merged the original Epic 3 quant_manifest values (estimated
sizes from llama-quantize wall output, null ollama_manifest_digest because
the push hadn't happened yet) into main as commit f1d029a. Meanwhile on
dev01, commit 87293f8 (Epic 3 closeout) corrected those values to the
registry-canonical state after the ollama push completed:
- size_bytes: replaced Epic 1 approximations with registry-reported exact
  bytes (Q4_K_M 9001753408 / Q5_K_M 10514569568 / Q8_0 15698534208)
- size_human: 9.0/11/16 GB -> 8.4/9.8/14.6 GB (more accurate)
- ollama_manifest_digest: null -> sha256:a210cccb...|ae6e54fe...|93df4d64...
- status: "local-create done; push pending" -> "published (...)"
Conflict resolution: keep dev01 (HEAD) values for both files; those are the
registry-canonical post-push state. JSON validity verified for both files;
TestLoadQuantManifest_{EmbeddedFallback,OperatorOverride,OverrideMissingFile}
all green against the resolved embedded manifest.
The non-conflicting fast-forwarded changes from main (claude workflow edits
+ dependabot go.mod/go.sum bumps) are folded in by this merge unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Promote the Sprint MODEL-DIST-001 entry from Unreleased to v0.10.0
(2026-05-11) ahead of release.yml / goreleaser tag push. Fresh empty
Unreleased section seeded above.
v0.10.0 ships:
- mdemg model pull|list|verify|remove|where: one-command path from brew
  install mdemg to a working local LLM
- Pluggable ModelFetcher interface (Ollama in v1, slots for HF/S3/GHR/file)
- 3 fused GGUF quants live on Ollama Library at reh3376/mdemg-llm-v1
  (:Q4_K_M 8.4 GB / :Q5_K_M 9.8 GB / :Q8_0 14.6 GB)
- 11-knob Configurability Contract (every operator-visible value dynamic)
- TSDB V0021 model_install_events hypertable + writer
- docs/features/local-model-distribution.md
Adapter (LoRA-only) path deferred to MODEL-DIST-002 per the sprint plan's
documented contingency (epic_2_forensic.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ce Model Distribution section
Stage 4 + Stage 5 of v0.10.0 release.
Submodule pointer bump:
packaging/homebrew-mdemg 6077097 -> c3aa68b
incorporates:
- 42d7390 — goreleaser auto-bumped mdemg.rb to version "0.10.0" + new
caveats text on v0.10.0 tag push
- c3aa68b — manual docs round-trip: CHANGELOG v0.10.0 entry,
README Optional Pull-the-local-LLM section in Quick Start (full
Ollama Library doc with quant matrix, list/verify/where/remove
subcommands, fork variants via MDEMG_MODEL_NAMESPACE, architecture
note "Ollama is distribution-only"), Upgrading to v0.10.0 +
What's New in v0.10.0 blocks, default-LLM rotation history extended,
mdemg_beta_testing.md version pin v0.9.0 -> v0.10.0
docs/user/cli-reference.md (per Stage 5 user request to align refs
with current codebase):
- New ## Model Distribution top-level section before ## Synergy
Optimization (model command group is GroupID="config" in root.go
but a top-level cli-ref section is cleaner for discoverability).
Documents all 5 subcommands (pull, list, verify, remove, where) with
flag tables, usage examples, the full Configurability Contract (11
knobs), the architecture note (Ollama is distribution-only).
- Updated Environment Variable Reference with new "Model Distribution
(Sprint MODEL-DIST-001, v0.10.0)" subsection — 11 env vars +
defaults table.
- Updated Command Tree Summary with the new model subcommand group
slotted between Configuration and Advanced.
docs/user/api-reference.md unchanged: Sprint MODEL-DIST-001 added zero
HTTP endpoints (CLI-only sprint; observability via TSDB V0021 row
writer is server-side internal). Audit also surfaced ~25 routes of
pre-existing drift between code and docs (mostly path-parameter
notation: `/v1/backup/` in code vs `/v1/backup/{id}` in docs — same
routes — plus 3 undocumented /api/graph/* endpoints and 2
undocumented /v1/admin/features/{restart,stop} actions). That drift
is out-of-scope for v0.10.0 and belongs in its own follow-up sprint.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
One-shot or interactive REPL chat against the configured LLM endpoint
(default: llama-server at port 8102 per Phase 13.5). Closes the gap
operators noted between `ollama run` and the mdemg framework.
Two modes:
- One-shot: `mdemg model run -p "hello"` or positional arg after `--`
- Interactive REPL: no prompt; reads stdin line-by-line, accumulates
  conversation history across turns
Pure stdlib HTTP (no llmclient retries/breakers/recording). CLI invocations
are intentionally NOT recorded to llm_interactions: this is an ad-hoc
exploration tool, not a production code path; keeping the training-data
corpus clean.
Every operator-visible value is dynamic per the no-hardcoding rule:
  --endpoint      override cfg.EffectiveLLMEndpoint
  --model         override cfg.LLMModel (final fallback: mdemg-llm-v1)
  --prompt/-p     one-shot prompt (omit for REPL)
  --system/-s     system message
  --temperature   (default 0.7)
  --max-tokens    (default 1024)
  --timeout       (default 60s)
Live-verified end-to-end on the operator's running llama-server on port 8102
with mdemg-llm-v1: one-shot worked; system+prompt with --model override
worked.
13 unit tests in model_run_test.go covering: message composition (system
first, no-system skip, history preservation), config resolution (flag > cfg
> final fallback), OpenAI-compat HTTP shape, error paths (HTTP error, inline
error object, no choices, timeout), trailing-slash endpoint normalization,
body-bounding helper. All green.
Renamed local body-bounding helper to `truncateRunBody` to avoid name
collision with a same-named helper in internal/cli/data.go.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
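The message-composition and endpoint-normalization behaviour listed in the test summary above might be sketched as follows. The type and function names are hypothetical, and appending `/chat/completions` assumes the endpoint already ends in `/v1` as in the LLM_ENDPOINT example elsewhere in this PR; the real model_run.go may build its URL differently.

```go
package main

import "strings"

// chatMessage is the OpenAI-compatible message shape (role + content).
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// composeMessages puts the system message first (skipped when empty),
// then prior turns in order, then the new user prompt.
func composeMessages(system string, history []chatMessage, prompt string) []chatMessage {
	msgs := make([]chatMessage, 0, len(history)+2)
	if system != "" {
		msgs = append(msgs, chatMessage{Role: "system", Content: system})
	}
	msgs = append(msgs, history...)
	return append(msgs, chatMessage{Role: "user", Content: prompt})
}

// chatCompletionsURL strips trailing slashes from the endpoint before
// appending the route, so "…/v1" and "…/v1/" yield the same URL.
func chatCompletionsURL(endpoint string) string {
	return strings.TrimRight(endpoint, "/") + "/chat/completions"
}
```

In REPL mode the caller would append each user prompt and assistant reply to `history` after every turn, which is how context accumulates across turns.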
Audit of internal/api/server.go (167 routes) vs docs/user/api-reference.md
surfaced 19 genuinely missing endpoints. v0.10.0 commit noted this as
out-of-scope; this commit resolves the gap.
Audit method: extract mux.HandleFunc registrations from server.go, extract
documented "VERB /path" headings from api-reference.md, normalize both to
strip path parameters and trailing prefix slashes, diff. Of the initial
24-entry code-only set, 5 are false positives (combined headers like
"POST /v1/admin/features/start|stop|restart" cover the individual verbs;
"GET|POST /v1/jiminy/protocol/metrics" covers both methods on one route).
Added sections:
Jiminy / J17 (10 endpoints, all under "## Jiminy Inner-Voice"):
GET|POST /v1/jiminy/protocol/metrics # snapshot + reset
GET /v1/jiminy/protocol/status # per-session J17 state
POST /v1/jiminy/checkpoint # tier-transition checkpoint
POST /v1/jiminy/resume-protocol # restore from checkpoint
POST /v1/jiminy/extension # operator-driven tier hold
POST /v1/jiminy/strict # toggle strict mode per session
POST /v1/jiminy/reformulate # advisory -> imperative rewrite
POST /v1/jiminy/classify # pre-Write/Edit pass/deny gate
GET /v1/jiminy/latest # most recent guidance (warm store)
POST /v1/jiminy/warm # eager cache warmup
Memory / Graph (3 endpoints, under "## Memory Operations"):
GET /v1/memory/graph/topology # node/edge counts per layer
GET /v1/memory/graph/neighborhood # local 1-3 hop walk
GET /v1/memory/spaces # root listing of all spaces
Observability (2 endpoints, under "## Metrics & Monitoring"):
GET /v1/metrics/trends # TSDB time-series query
GET /v1/prometheus # Prometheus scrape endpoint
Dashboard / Viz (4 endpoints, new "## Dashboard / Visualization (internal)"
section before MCP Server Tools — operator-internal endpoints backing the
browser dashboard at /ui/):
GET /api/graph/data # force-directed graph data
GET /api/graph/fields # schema field catalog
GET /api/graph/health # explorer health
GET /viz/topology # standalone HTML topology view
Each entry has handler-signature-derived request/response shape, query
parameter table, sample curl/JSON examples following the existing
api-reference convention. TOC updated with new "Dashboard / Visualization
(internal)" entry and renumbered tail.
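The audit method used for this commit (normalize code routes and documented routes the same way, then diff) can be sketched in a few lines. This is an illustration rather than the script that was actually run: the helper names are hypothetical, and the normalization is deliberately crude, so a route such as `/v1/symbols/{id}/relationships` collapses onto `/v1/symbols/relationships`. That kind of collapse is one way prefix-style normalization can produce false positives that need manual review.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// pathParam matches a single {param} path segment placeholder.
var pathParam = regexp.MustCompile(`\{[^/}]+\}`)

// normalizeRoute strips path parameters, collapses doubled slashes left
// behind by the stripping, and drops any trailing slash.
func normalizeRoute(route string) string {
	r := pathParam.ReplaceAllString(strings.TrimSpace(route), "")
	for strings.Contains(r, "//") {
		r = strings.ReplaceAll(r, "//", "/")
	}
	if len(r) > 1 {
		r = strings.TrimRight(r, "/")
	}
	return r
}

// missingFromDocs returns the normalized code routes that have no
// normalized counterpart in the documented set, sorted and de-duplicated.
func missingFromDocs(codeRoutes, docRoutes []string) []string {
	seen := make(map[string]bool, len(docRoutes))
	for _, d := range docRoutes {
		seen[normalizeRoute(d)] = true
	}
	var missing []string
	for _, c := range codeRoutes {
		if n := normalizeRoute(c); !seen[n] {
			missing = append(missing, n)
			seen[n] = true
		}
	}
	sort.Strings(missing)
	return missing
}
```

Running the diff in the other direction (docs as the first argument) would give the "docs-only" set discussed under the out-of-scope items.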
Out of scope (deliberate, deferred):
- 28 "docs-only" entries from the audit are confirmed false positives
from prefix-matching path normalization (code registers /v1/memory/nodes/
with trailing slash and routes the suffix; docs spell out the full
/v1/memory/nodes/{node_id}/archive form correctly)
- /v1/symbols root path is partially covered by /v1/symbols/relationships
+ /v1/symbols/{id}/relationships in docs; root listing endpoint
documentation can land later if/when its handler grows specific shape
- /v1/conversation/observations covered indirectly by the flag-for-org
endpoint documentation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 Epic 0. Builds the per-panel audit harness:
walks every panel in deploy/docker/grafana/dashboards/*.json, extracts
rawSql/sql targets, substitutes Grafana macros ($__timeFilter,
$__timeFrom/To, $__interval, $__unixEpoch*) + template variables
($space_id, $instance + multi-value variants like ${space_id:raw}),
executes via docker exec mdemg-timescaledb-1 psql, classifies each
panel target as PASS / EMPTY / FAIL / SKIP.
Tier 1 unit tests (17 tests, all green):
- Template-variable substitution: time_filter / from-to / unix epoch /
interval / interval_ms / space_id (3 syntaxes) / instance (3
syntaxes) / multi-macro composite query
- Table extraction (FROM/JOIN with alias, case-insensitive, no-table)
- Panel walking (flat, nested rows, targets-with-sql vs no-sql)
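The panel-walking cases listed above (flat panels, nested rows, targets with and without SQL) can be sketched roughly as follows. This is an illustration only, not the code in the audit harness: the function name `iter_sql_targets` and the sample dashboard are hypothetical, and it assumes the common Grafana layout where row panels carry their children under a `panels` key and the query text lives in `rawSql` or `sql`.

```python
from __future__ import annotations

from typing import Iterator


def iter_sql_targets(panels: list[dict]) -> Iterator[tuple[str, str, str | None]]:
    """Yield (panel title, target refId, SQL or None), recursing into row panels."""
    for panel in panels:
        # Row panels nest their children under "panels"; walk into them.
        if panel.get("type") == "row":
            yield from iter_sql_targets(panel.get("panels", []))
            continue
        for target in panel.get("targets", []):
            # A target without rawSql/sql is reported with None (a SKIP candidate).
            sql = target.get("rawSql") or target.get("sql")
            yield panel.get("title", ""), target.get("refId", ""), sql


# Hypothetical dashboard fragment: one flat panel, one row with two children.
dashboard = {"panels": [
    {"type": "timeseries", "title": "Request Rate",
     "targets": [{"refId": "A", "rawSql": "SELECT 1"}]},
    {"type": "row", "title": "Details", "panels": [
        {"type": "stat", "title": "Nested",
         "targets": [{"refId": "A", "sql": "SELECT 2"}]},
        {"type": "text", "title": "Notes", "targets": [{"refId": "A"}]},
    ]},
]}
found = list(iter_sql_targets(dashboard["panels"]))
```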
Smoke test against mdemg-overview.json IMMEDIATELY validated the
operator's "diminished observability" report — 5 of 13 panels FAIL,
1 EMPTY, 7 PASS on the front-page dashboard:
FAIL Request Rate
FAIL Error Rate
FAIL Circuit Breakers
FAIL Requests by Status
FAIL Rate Limit Rejections
EMPTY Request Latency Distribution (t0; t1/t2 PASS)
The original 11-panel sample missed these because it sampled different
panels. Lesson: trust the rigorous audit, not the sample. Sprint
proceeds to Epic 1 (full audit across all 146 panels) immediately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 Epics 1 + 2. Per-panel rigorous audit of all
165 target executions across 146 panels in 8 dashboards.
Headline:
PASS 125 (76%) — executes, returns rows in 24h window
EMPTY 19 (12%) — executes, 0 rows
FAIL 3 (2%) — SQL error
SKIP 18 (11%) — non-SQL panel types
Harness fix mid-Epic-1: $__interval substitution was wrapping the
value in quotes, but Grafana convention has panel SQL provide its own
outer quotes — producing doubled quotes and 18 false-positive FAILs.
Fixed: substitute bare value. Verified by re-run: 20→3 FAILs.
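A minimal sketch of the bare-value substitution described above, assuming the panel SQL already quotes `$__interval` itself. The function name `substitute_macros`, the `time` column, and the sample query are hypothetical; only two of the macros are handled here.

```python
import re


def substitute_macros(sql: str, time_from: str, time_to: str, interval: str = "1h") -> str:
    """Expand $__timeFilter(col) and $__interval in a panel query."""
    # $__timeFilter(col) becomes: col BETWEEN '<from>' AND '<to>'
    sql = re.sub(
        r"\$__timeFilter\(([^)]+)\)",
        lambda m: f"{m.group(1).strip()} BETWEEN '{time_from}' AND '{time_to}'",
        sql,
    )
    # Bare value, no added quotes: the panel SQL supplies its own.
    # The \b keeps $__interval_ms from being matched by this rule.
    return re.sub(r"\$__interval\b", lambda m: interval, sql)


panel_sql = (
    "SELECT time_bucket('$__interval', time) AS t "
    "FROM metric_samples WHERE $__timeFilter(time)"
)
result = substitute_macros(panel_sql, "2026-05-01T00:00:00Z", "2026-05-02T00:00:00Z")
```

Wrapping `interval` in quotes inside the helper would turn `'$__interval'` into `''1h''`, which is the doubled-quote failure the harness fix removed.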
Real failures (Epic 2 findings):
(a) 3 SQL bugs on mdemg-llm-routing.json — all three panels hardcoded
`mdemg-dev` (unquoted) in WHERE clauses instead of '$space_id'
template variable. PG parses `mdemg-dev` as subtraction.
(b) 5 schema-drift EMPTYs — panel filter expects metric_type or labels
shape that doesn't match server emission:
- mdemg_j17_events_total: panel 'counter', server 'gauge'
- mdemg_rsic_action_total: panel status='success', server status='completed'
- 2 more suspected pending full-SQL inspection.
(c) 2 missing-server-side metrics — mdemg_rate_limit_rejected_total
and mdemg_http_request_duration_seconds_p50 not emitted. Will be
documented; server emission is follow-up.
(d) ~11 sparse-data EMPTYs — panel SQL correct, no rows in 24h window.
Widening time-range in Epic 4.
Projected post-Epic-3/4: 133 PASS, ≤11 EMPTY, 0 FAIL.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 Epic 3. Minimum-change JSON edits to fix
category (a) SQL bugs and category (b) schema-drift EMPTYs identified
in Epic 1/2.
mdemg-llm-routing.json (3 panels, all category-a SQL bugs):
- LLM call distribution by model_name (24h)
- LLM latency p50 / p95 / p99 by task × model
- LLM error rate % by task_name (selected range)
Bug: WHERE clause was `($space_id = '' OR space_id = '$space_id')`.
The first $space_id was unquoted, so after substitution PG parsed
`mdemg-dev = ''` as the expression `mdemg - dev` over columns that
don't exist, not as a string literal. Also breached the
no-hardcoding rule (memory: feedback_no_hardcoded_values.md).
Fix: wrap the first variable reference in quotes → `('$space_id' =
'' OR space_id = '$space_id')`, a proper string-literal comparison
that also serves as the All-spaces guard the panel author intended.
Verdict: 3 FAIL -> 3 PASS. mdemg-llm-routing is now 4/4 PASS.
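To make the quoting fix concrete, here is a small sketch of what plain text substitution does to the buggy and fixed guards. The helper `substitute_space_id` is hypothetical, and treating "All spaces" as an empty string is an assumption about how the variable is configured.

```python
def substitute_space_id(sql: str, space_id: str) -> str:
    """Plain text replacement, the way a template variable is interpolated."""
    return sql.replace("$space_id", space_id)


buggy = "WHERE ($space_id = '' OR space_id = '$space_id')"
fixed = "WHERE ('$space_id' = '' OR space_id = '$space_id')"

# Unquoted reference: the value lands in the SQL as a bare expression.
buggy_sql = substitute_space_id(buggy, "mdemg-dev")
# Quoted reference: the value lands inside a string literal.
fixed_sql = substitute_space_id(fixed, "mdemg-dev")
# Assumed "All spaces" case: the first comparison becomes '' = '' (always true).
all_spaces_sql = substitute_space_id(fixed, "")
```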
mdemg-j17.json :: Total Events (1 panel, category-b drift):
Panel filtered `metric_type = 'counter'` (Prometheus naming
convention because metric is `mdemg_j17_events_total`). Server
actually emits `metric_type = 'gauge'`. 6,393 rows in 7d; 0 panel
matches. Fix: align panel filter to `'gauge'`.
Verdict: EMPTY -> PASS.
mdemg-rsic.json :: Action Success Rate t0 (1 panel target, category-b
drift):
Panel filtered `labels->>'status' = 'success'`. Server actually
emits `'completed'` (181 rows in 24h; 0 panel matches). Fix: align
panel filter to `'completed'`. The t1 'failed' target retained
unchanged — its EMPTY result is now accurate observation (server
emits no `'failed'` actions; 0 = legitimate zero).
Verdict: 1/2 EMPTY -> PASS, 1/2 EMPTY accurate-zero.
Audit verdict counts:
Before: 125 PASS, 19 EMPTY, 3 FAIL, 18 SKIP
After: 130 PASS, 17 EMPTY, 0 FAIL, 18 SKIP
Remaining 17 EMPTYs (Epic 4 disposition):
- 5 category-c emission regression — 4 rsic metrics stopped at
2026-05-07/08 (server-side investigation queued as follow-up)
- 2 category-c never-emitted — Rate Limit Rejections, p50 latency
- 8 category-d sparse-data on ft-training — widen time-range
- 1 mdemg-jiminy :: Effectiveness Trends — CTE pending inspection
- 1 mdemg-rsic :: Action Success Rate t1 (accurate-zero)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint GRAFANA-AUDIT-001 closeout (Epics 4 + 5 + 6 + 7 combined as a single
doc-only commit; Epic 5 deferred and Epic 6 deferred-to-operator as
documented in post.md).
New: docs/features/observability-dashboards.md (286 lines), a full
operator-facing inventory of the 8 dashboards with:
- Per-dashboard purpose + panel count + primary tables
- Audit verdict table (130/17/0/18 post-Epic-3)
- Epic 3 fix log: 3 SQL bugs + 2 schema-drift filters
- Known gaps in 3 buckets: (c) emission regression (4 May-7-8 metrics,
current codebase has zero refs, i.e. server removed emission), (c)
never-emitted (mdemg_rate_limit_rejected_total +
mdemg_http_request_duration_seconds_p50), (d) sparse/zero data on this
dev TSDB (ft-training tables)
- Refresh expectations per table
- Operator playbook for re-running scripts/grafana_panel_audit.py
- Forward-looking: CI integration, coverage expansion, server-side
emission restore
New: docs/development/grafana-audit-001/post.md, the sprint close per
memory rule; covers process / smooth-parts / friction / sprint-plan vs
reality / current state / risks-opportunities / commits.
Epic deferrals (documented in post.md):
- Epic 5 (coverage expansion for 11 unused TSDB tables): deferred because
most target tables are zero on this dev TSDB. Adding panels would create
more EMPTYs, defeating the goal.
- Epic 6 (Tier 3 browser e2e): deferred to operator; not blocking.
CHANGELOG Unreleased entry covers the sprint at high level +
cross-references the feature doc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint MODEL-DIST-002 picks up the adapter-only path deferred from
MODEL-DIST-001 Epic 2. Resolves the tooling gap documented in
epic_2_forensic.md.
Workspace prep:
- Vendored convert_lora_to_gguf.py from llama.cpp source (master, pinned
2026-05-21) into scripts/vendor/llama_cpp/ with MIT LICENSE attribution
and a README documenting refresh policy. brew install llama.cpp ships
convert_hf_to_gguf.py but NOT convert_lora_to_gguf.py; vendoring is the
cleanest path (vs requiring operators to clone llama.cpp source).
- pip install peft==0.19.1 + accelerate==1.13.0 + psutil==7.2.2 into
neural/.venv (the same venv that has torch + transformers + gguf from
MODEL-DIST-001 Epic 1). PEFT is needed for PEFT-format schema validation
+ as a dependency of convert_lora_to_gguf.py.
- Inspected convert_lora_to_gguf.py: it expects a directory with
adapter_config.json + adapter_model.safetensors in PEFT layout. Confirms
the MLX → PEFT direction is `lora_A: (rank, input)` and `lora_B: (output,
rank)` (script line 41-42 docstring).
Sprint plan in 12-section v1.0 format. 7 epics, 1-2 dev-day estimate.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e verify
Sprint MODEL-DIST-002 Epics 1, 2, 3 (combined commit).
Epic 1 — MLX → PEFT converter (scripts/mlx_adapter_to_peft.py + 14 unit tests):
Reads adapters/tier1/adapters.safetensors (514 MB MLX format, 560 tensors,
Phase 5 SFT Iter 2400 best). Per the analysis in MODEL-DIST-001
epic_2_forensic.md:
Key rename: model.layers.<N>.<module>.lora_a
-> base_model.model.model.layers.<N>.<module>.lora_A.weight
Tensor transpose: lora_a (input,rank) -> (rank,input)
lora_b (rank,output) -> (output,rank)
Emits PEFT-format adapter_config.json + adapter_model.safetensors.
Single-adapter PEFT layout (.lora_A.weight, not .lora_A.default.weight)
required by convert_lora_to_gguf.py.
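The key rename and transpose described above can be sketched in plain Python. This is not the code in scripts/mlx_adapter_to_peft.py: the helper names, the example module path `self_attn.q_proj`, and the `lora_b` to `lora_B.weight` mapping (inferred by symmetry with the documented `lora_a` rename) are assumptions, and nested lists stand in for real tensors.

```python
import re

# Matches MLX keys such as model.layers.<N>.<module>.lora_a / .lora_b
_MLX_KEY = re.compile(r"^(model\.layers\.\d+\..+)\.lora_(a|b)$")


def rename_key(mlx_key: str) -> str:
    """Map an MLX LoRA key to the single-adapter PEFT key (.lora_A.weight form)."""
    match = _MLX_KEY.match(mlx_key)
    if match is None:
        raise ValueError(f"not an MLX LoRA key: {mlx_key}")
    stem, which = match.groups()
    return f"base_model.model.{stem}.lora_{which.upper()}.weight"


def transpose(matrix: list[list[float]]) -> list[list[float]]:
    """Swap the two axes: (input, rank) -> (rank, input), (rank, output) -> (output, rank)."""
    return [list(column) for column in zip(*matrix)]


# Toy lora_a with input=3, rank=2; the PEFT side expects (rank, input) = (2, 3).
lora_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
peft_key = rename_key("model.layers.3.self_attn.q_proj.lora_a")
peft_a = transpose(lora_a)
```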
Epic 2 — PEFT → GGUF LoRA (scripts/vendor/llama_cpp/convert_lora_to_gguf.py):
Pinned to llama.cpp release b9000 (self-contained version; upstream master
refactored to a conversion/ Python package with 30+ model files, excessive
vendoring scope). README documents refresh policy.
Output: .local-models/mdemg-llm-v1-adapter.gguf
SHA256: 0cfaf4bae3215a4aea664a8d28ae9a41d73ee740cbcce5c2eef950232cfe1de5
Size: 257 MB (vs ~9 GB fused Q5_K_M; ~35x smaller download)
Tensor count: 560 (matches expected 40 layers x 7 target_modules x 2)
Epic 3 — Live verification (docs/development/model-dist-002/verification.md):
Side-port llama-server on 127.0.0.1:18103 with f16 base + adapter; sanity
prompt vs production 8102 fused model returns semantically-aligned outputs
on the same prompt — both describe MDEMG as a knowledge-graph memory
system. Confirms the MLX-PEFT-GGUF chain is structurally correct.
Iteration during Epic 2 (worth noting):
- Initial vendored convert_lora_to_gguf.py from upstream master failed
with ImportError (refactored to use conversion/ package). Pinned to
b9000 release which is self-contained.
- Initial PEFT keys used .default.weight suffix (multi-adapter layout);
convert_lora_to_gguf.py rejected with "Not a lora_A or lora_B tensor."
Switched to single-adapter layout (.weight) which the script accepts.
Test results: 14/14 Tier 1 tests green; PEFT output loads via
peft.PeftConfig.from_pretrained; GGUF emission completes with all 560
tensors; runtime adapter application produces coherent outputs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…S backfill
Builds the first consumer for EVENTGRAPH-001's reinforcement-neighborhood federation API (which has no consumer): a 'mdemg eventgraph' CLI command. Validates the Pattern Y1 bet + becomes the live-testing harness for EVENTGRAPH-002/003 (user directive: build the consumer first).
Per the UxTS directive: maps the work to the frameworks. UATS applies to the federation HTTP API -> add eventgraph_reinforcement_neighborhood.uats.json (backfilling the -001 gap; the endpoint shipped with no UATS), which replaces an ad-hoc Go integration test as the Tier 2 contract test. UVTS/UBENCH N/A. UOTS panel-spec gap noted as a follow-up (out of scope). CLI rendering -> Tier 1 Go units.
4 epics; CLI (--seed/--query/--hops/--since/--limit/--json) renders summary + events table or JSON; server-driven defaults (no re-hardcoding); read-only. ~1-1.5 dev-days.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (Epic 1) First consumer of the EVENTGRAPH-001 federation API — POSTs to /v1/eventgraph/reinforcement-neighborhood, renders a summary + events table (or --json). Supports --seed, --query (resolves seed via /v1/memory/retrieve top-1), --hops, --since, --limit. Unset flags are omitted from the request so the server applies its config defaults (no re-hardcoding of hops/since/limit in the CLI). Registered under the "advanced" command group. Tier 1 (httptest, -race clean): request-mapping omit-when-unset + conversion, --query seed resolution, no-results + invalid --since + surfaced-503 errors, render (empty + table), helpers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
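The omit-when-unset request mapping can be illustrated with a short sketch. The CLI itself is Go; this Python version only shows the idea. The function name `build_request` and the JSON field names `hops`, `since`, and `limit` are assumptions (`space_id` and `seed_node_id` come from the contract cases in this PR).

```python
from __future__ import annotations


def build_request(space_id: str, seed_node_id: str, hops: int | None = None,
                  since: str | None = None, limit: int | None = None) -> dict:
    """Build the POST body, leaving out any flag the user did not set."""
    body: dict = {"space_id": space_id, "seed_node_id": seed_node_id}
    for key, value in (("hops", hops), ("since", since), ("limit", limit)):
        # An unset flag is omitted entirely so the server applies its own default.
        if value is not None:
            body[key] = value
    return body


defaults_only = build_request("mdemg-dev", "node-1")
with_hops = build_request("mdemg-dev", "node-1", hops=2)
```

Omitting the key, rather than sending a client-side default, keeps the default values defined in one place on the server.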
…y neighborhood
Caught in EVENTGRAPH-CLI-001 live contract testing (standard code tests
missed it; the live UATS happy-path against the running server did not):
walkNeighborhood returns a nil slice when the seed has no neighborhood
(e.g. an unknown seed), which JSON-marshals to `null`, while Events is
defensively initialized to []. Both are array fields and must serialize
consistently — null breaks any consumer asserting an array type (incl. the
new UATS contract's `type_is array` on $.neighbor_node_ids).
EventsInGraphNeighborhood now coalesces the nil slice to []string{}.
Tier 1 TestFederationResult_EmptyArraysNotNull pins the JSON contract.
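From a consumer's point of view, the difference between the two serializations looks like this. The sketch is a hypothetical client-side check, not the UATS runner: `is_array` is an invented helper, and the two payloads are hand-written stand-ins for the response before and after the fix.

```python
import json


def is_array(payload: str, field: str) -> bool:
    """Rough equivalent of a `type_is array` assertion on one response field."""
    return isinstance(json.loads(payload)[field], list)


# Before the fix: the nil slice serialized as null while events was [].
before = '{"events": [], "neighbor_node_ids": null}'
# After the fix: both array fields serialize as [].
after = '{"events": [], "neighbor_node_ids": []}'

before_ok = is_array(before, "neighbor_node_ids")
after_ok = is_array(after, "neighbor_node_ids")
```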
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backfills the UATS gap EVENTGRAPH-001 left (no contract test for /v1/eventgraph/reinforcement-neighborhood). 6 cases, validated 6/6 live against the running server:
- happy 200: asserts the response contract shape (events/neighbor_node_ids arrays, graph_hops/tsdb_rows_scanned numbers, truncated boolean). Robust to data; works even with an unknown seed (empty neighborhood is valid 200)
- missing_space_id / missing_seed_node_id → 400 (empty-string override, since the runner deep-merges variant body over base, so key omission can't unset)
- negative_hops → 400, hops_over_ceiling (999 > 2×default) → 400
- method_not_allowed (GET) → 405
sha256 integrity hash added + verified.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
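Why the missing-field cases need an empty-string override can be shown with a generic deep merge. This is a sketch of the merge behavior the message describes, not the UATS runner's code; `deep_merge` and the sample bodies are hypothetical.

```python
import copy


def deep_merge(base: dict, variant: dict) -> dict:
    """Return base with variant merged over it, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in variant.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {"body": {"space_id": "mdemg-dev", "seed_node_id": "node-1"}}

# Leaving space_id out of the variant does not remove it: the base value survives.
omitted = deep_merge(base, {"body": {"seed_node_id": "node-1"}})
# Overriding with "" is the only way to present a "missing" value to the server.
overridden = deep_merge(base, {"body": {"space_id": ""}})
```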
… + close (Epic 3)
Tier 3 live e2e verified the real binary against the real stack: --query surfaced 20 reinforcement events in a 5-node neighborhood (demonstrating the Hebbian-write → federation-read loop closing in one command); --seed/--json/--limit/unknown-seed/no-arg paths all verified live. Feature doc gains the CLI consumer section; CHANGELOG Added + Fixed entries; CLAUDE.md architecture note; verification.md + post.md (UxTS mapping: UATS done, UOTS follow-up carried over).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…TSDB CI Test failed: the UATS contract step boots a minimal server without TSDB, so the eventgraph service is nil and every POST returns 503 "service not initialized" instead of the expected 200/400 (only GET→405 passed, since the method check precedes the service check). Same class as PR #404. The federation endpoint genuinely requires TSDB (it queries reinforcement_events; the service is nil without TSDB at boot), and CI's UATS step already excludes `tsdb`-tagged specs (ci.yml --exclude-tag ...,tsdb). Added "tsdb" to api.tags (matching metrics_snapshot/readyz_tsdb); re-hashed. Verified locally: the spec now reports Status: skip under the exact CI exclude filter, and still 6/6 live against the full stack via explicit --spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Federate the guidance-outcome event stream (Pattern Y1, second event class): walk a constraint's Neo4j neighborhood, surface time-windowed constraint_outcomes (followed/ignored/contradicted) for the constraint + its graph-related constraints. Data-decided architecture: reuse the existing constraint_outcomes table (no new hypertable/writer/enqueue site — RRF-SCALE-001 already populates it, 1176 live rows); join graph↔events on constraint_code (TSDB constraint_id UUID ≠ Neo4j node_id CUID — code is the only viable key). One additive migration (V0023: constraint_code index, schema 22→23). 8 epics, 3 testing tiers, live Tier 3. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mes (Epic 1) Adds idx_constraint_outcomes_code (space_id, constraint_code, time DESC) — the guidance-outcome federation joins graph↔events on constraint_code (TSDB constraint_id is a UUID that doesn't match the Neo4j node_id CUID; code is the only viable key), and migration 011 indexed only space/constraint_id/outcome. Partial index (constraint_code NOT NULL AND <> '') skips uncoded outcomes. Bumps TSDB_REQUIRED_SCHEMA_VERSION default 22→23 (config.go) to match the migration count — CI schema-version validator gates on this. Additive, no data change, idempotent. Live-verified: migration applies (schema 22→23), idx present, re-apply is a no-op, config/tsdb tests green, CI schema check 23=23. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d (Epic 2) Second Pattern Y1 federation: walk a constraint's Neo4j neighborhood, collect each neighbor's constraint_code, and join constraint_outcomes on those codes (backed by the V0023 index). walkNeighborhoodWithCodes returns the neighborhood node IDs + a code→node map; queryGuidanceOutcomes pulls coded outcomes in the window; Go-side join resolves each outcome's code → its neighborhood constraint node. Non-nil slices from the start (EVENTGRAPH-CLI-001 lesson). Reuses the existing constraint_outcomes sink — no new table/writer. Tier 1 (-race): validation guards, empty-arrays-not-null, sortedKeys determinism, join resolution. Tier 2 integration (live Neo4j+TSDB): full round-trip — hops=1 (seed+related codes, off-neighborhood excluded), hops=0 (seed code only), unknown-seed (empty non-nil). PASS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
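The code-to-node join can be sketched as a dictionary lookup. The real join is Go-side; this Python version only illustrates the shape. `join_outcomes`, the node ID `node-seed`, and the code `other-code` are hypothetical, while `constraint_node_id` and `in_neighborhood` are the field names reported in the live verification.

```python
from __future__ import annotations


def join_outcomes(code_to_node: dict[str, str], outcomes: list[dict]) -> list[dict]:
    """Attach the neighborhood node to each outcome whose code is in the map."""
    joined = []
    for outcome in outcomes:
        node_id = code_to_node.get(outcome.get("constraint_code", ""))
        if node_id is None:
            continue  # uncoded, or the code is outside the walked neighborhood
        joined.append({**outcome, "constraint_node_id": node_id, "in_neighborhood": True})
    return joined  # always a list, never None, even when nothing matches


code_to_node = {"no-direct-main-commits": "node-seed"}
outcomes = [
    {"constraint_code": "no-direct-main-commits", "outcome": "followed"},
    {"constraint_code": "other-code", "outcome": "ignored"},
    {"constraint_code": "", "outcome": "followed"},
]
joined = join_outcomes(code_to_node, outcomes)
```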
…ic 3) POST /v1/eventgraph/guidance-outcome-neighborhood — walk a constraint's neighborhood, surface constraint_outcomes whose code is in the neighborhood. Same gating/auth/default convention as the reinforcement endpoint. Single-source refactor (per the dynamic-variables directive): extracted the shared gate (method/enabled/service → eventgraphGate) and default-resolution (hops/since/limit + ceiling → resolveFederationDefaults) into helpers used by BOTH handlers, so the federation rules live in exactly one place. The reinforcement handler now calls them too — verified no regression (reinforcement UATS still 6/6 live, unit tests green). Live-verified: seeding from the real 'no-direct-main-commits' constraint node surfaced real 'followed' outcomes with constraint_node_id resolved to the seed and in_neighborhood=true. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…CLI (Epic 4)
Sibling subcommand consuming POST /v1/eventgraph/guidance-outcome-neighborhood. Walks a constraint's neighborhood and renders guidance outcomes (followed/ignored split + table: code · outcome · sim · g_type · guidance_id · recorded) or --json. Seed via --seed/--query (--constraint-code seeding deferred: it needs server-side code→node resolution; --query covers discovery). Unset hops/since/limit omitted so the server applies config defaults (single source of truth).
Tier 1 (-race): request-mapping omit-when-unset + conversion, --query seed resolution, surfaced-503 error, render (empty + followed/ignored table), truncStr. Help renders.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion (Epic 5)
6 cases, validated 6/6 live: happy-200 response shape (outcomes/neighbor_node_ids/neighbor_constraint_codes arrays, graph_hops/tsdb_rows_scanned numbers, truncated boolean), missing space_id/seed → 400 (empty-string override under deep-merge), negative_hops → 400, hops_over_ceiling → 400, GET → 405. Tagged 'tsdb' so CI skips it without TSDB (the EVENTGRAPH-CLI-001 lesson). sha256 hashed + verified.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Real binary against the real stack. Key assertion: CLI --json output matches direct constraint_outcomes SQL exactly (11 outcomes = 11, all followed) for the no-direct-main-commits constraint. --seed/--query/--limit/--json/unknown-seed/no-arg all verified live. The --query "0 outcomes" result was traced to SQL ground truth: the 5 neighborhood codes genuinely have no feedback, so it's correct (federation distinguishes "code in neighborhood" from "code has outcomes"), not a join bug. Reinforcement endpoint un-regressed by the shared-helper refactor (UATS 6/6).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ic 7) Feature doc gains a Guidance-Outcome Federation section (why reuse constraint_outcomes, why join on constraint_code, CLI usage) + forward-look update. CHANGELOG Added (endpoint + CLI) + Changed (TSDB schema 22→23). CLAUDE.md architecture note extended. post.md closes the sprint with UxTS mapping + follow-ups (--constraint-code seeding, EVENTGRAPH-003, UOTS). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sprint EVENTGRAPH-002: Guidance-Outcome Federation (second event class)
Federates the guidance-outcome event stream (Pattern Y1): walk a constraint's Neo4j neighborhood, surface the time-windowed …
Commits
Two data-decided architecture calls (disclosed, not asked)
Single-source refactor (dynamic-variables directive)
Extracted the shared federation gate ( …
Live testing earned its keep
UxTS
UATS ✅ ( …
Follow-ups
…nchd PATH
The native server (launchd) inherits PATH=/usr/bin:/bin:/usr/sbin:/sbin, which excludes the Docker Desktop symlink (/usr/local/bin/docker). So every server-runtime `docker` shellout failed with "executable file not found in $PATH":
(1) Neo4j container CPU/mem stats (server.go) logged an ERROR every 60s and left the neo4j_container_* gauges empty, so the neo4j_high_cpu/_memory alert rules had no data;
(2) the TSDB backup scheduler's `docker compose pg_dump` (backup.go) failed with only a slog.Warn.
The DATA PLANE was never affected: Neo4j (Bolt) + TSDB (pgx) connect over mapped TCP ports, not the docker CLI.
Fix (durable, configurable, single-source): new internal/dockerbin resolver: MDEMG_DOCKER_BIN env override → exec.LookPath → well-known install locations (/usr/local/bin, /opt/homebrew/bin, /usr/bin) → graceful unavailable. Wired into server.go (stats) + backup.go (both compose calls). The perpetual 60s ERROR is downgraded to a one-shot WARN when docker is genuinely absent (it's optional telemetry). Added a sane PATH to the launchd server plist template as defense-in-depth.
Live-verified: after restart, mdemg_neo4j_container_cpu_percent=0.59 / mem_percent=29.13 now land in metric_samples (were absent); no more docker-stats ERROR; `docker stats` + backup resolve docker under a simulated minimal PATH.
Note: `mdemg data export-auto` (training-export job) was NOT a victim: it exports via network SQL, not docker (corrected from an earlier assumption).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
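The resolution order described above (env override, then PATH lookup, then well-known locations, then unavailable) can be sketched as follows. The real resolver is the Go package internal/dockerbin; this Python version is only an illustration. `resolve_docker` and its injectable `which` / `is_executable` parameters are hypothetical, and whether the real resolver validates the override path is not stated in the message.

```python
import os
import shutil

# Well-known install locations named in the commit message.
WELL_KNOWN_DIRS = ("/usr/local/bin", "/opt/homebrew/bin", "/usr/bin")


def resolve_docker(env=None, which=shutil.which, is_executable=None):
    """Return a path to the docker binary, or None when it is unavailable."""
    env = os.environ if env is None else env
    if is_executable is None:
        is_executable = lambda p: os.path.isfile(p) and os.access(p, os.X_OK)
    # 1. Explicit operator override wins.
    override = env.get("MDEMG_DOCKER_BIN", "")
    if override:
        return override
    # 2. Normal PATH lookup (the step that fails under launchd's minimal PATH).
    found = which("docker")
    if found:
        return found
    # 3. Probe the well-known install locations directly.
    for directory in WELL_KNOWN_DIRS:
        candidate = f"{directory}/docker"
        if is_executable(candidate):
            return candidate
    # 4. Graceful "unavailable": callers skip the optional docker work.
    return None
```

Returning `None` instead of raising lets the caller log once and carry on, which matches the one-shot WARN behavior described for the optional stats telemetry.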
Triggered by a live-discovered silent failure: the TSDB backup scheduler was failing every 24h run (docker-under-launchd-PATH) with only a buried slog.Warn. Docker cause fixed (4cc7608); this sprint fixes the class — scheduled jobs that fail with no record + no alert. V0024 scheduled_job_events hypertable + writer, jobhealth.Report (record + alert on failure), wire the 3 jobs (backup, maintenance, export-auto), 2 evaluator rules (backup staleness + recent failure) so the server catches "job failed OR never ran". Config-driven, 3 testing tiers, live Tier-3 induced-failure. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Hypertable (job_name, success, latency_ms, error_message, metadata jsonb, recorded_at) + RecordJobEvent synchronous single-row writer (mirrors V0021 model_install pattern). Indexes: per-job freshness, partial failed, per-space. One row per scheduled-job run so the alert evaluator can detect "job failed" AND "job never ran" (staleness). Schema 23->24; TSDB_REQUIRED_SCHEMA_VERSION bumped to match migration count (CI check 24=24). Tier 1 (-race): field mapping, optional-nulls, error truncation, nil-pool no-op, insert-error propagation. Tier 2 (live TSDB): round-trip + the staleness (recent successes) + failure (recent failures) query shapes the rules will use. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New internal/jobhealth.Report, the single policy point: record a scheduled_job_events row and fire a high-severity "scheduled-job" alert on failure (both pool + dispatcher nil-safe). Wired into all three jobs:
- TSDB backup scheduler (internal/tsdb/backup.go): decoupled JobResultFunc hook (mutex-guarded, -race clean) so internal/tsdb stays free of internal/alert; server.go::SetTSDBClient sets it with the pool + s.alertDispatcher. A failed or never-run backup now records + alerts instead of a silent slog.Warn.
- export-auto + maintenance (CLI): deferred reportScheduledJob on the named return error. It opens a short-lived pool + a file-backed dispatcher (same ~/.mdemg/alerts/current.json the hooks surface) so a separate-process CLI job still alerts the operator.
Tier 1 (-race): jobhealth fires alert only on failure (real file-backend dispatcher), nil-safe. Live smoke: export-auto recorded success=t latency=3050ms.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ic 3)
Two server-native evaluator rules over V0024 scheduled_job_events:
- scheduled_job_recent_failure (always on): any job failure in the last JOB_FAILURE_LOOKBACK_MIN (default 60) → high alert.
- backup_no_recent_success (gated on TSDB_BACKUP_ENABLED): zero successful tsdb-backups within the staleness window → high alert. THIS is the "job never ran" guarantee: it fires from the server observing ABSENT success, so a backup that silently died or never started is caught, not just one that errored. Window derives from the real backup interval × 2 (JOB_BACKUP_STALENESS_HOURS override; no hardcoded literal).
JOB_HEALTH_ALERT_ENABLED master gate (default true). Appended after DefaultRules() in serve.go.
Tier 1: failure rule always present (gt 0), staleness gated on backups-enabled (lt 0.5), windows reflect config, non-positive fallback. Build/lint clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
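The staleness-window derivation can be sketched as a small function. This is an illustration under assumptions, not the rule's Go code: the function name, the parameter names, and what the "non-positive fallback" falls back to (here, a caller-supplied interval) are all hypothetical.

```python
def staleness_window_hours(backup_interval_hours: float,
                           override_hours: float = 0.0,
                           fallback_interval_hours: float = 24.0) -> float:
    """Window within which at least one successful backup is expected."""
    # An explicit positive override (cf. JOB_BACKUP_STALENESS_HOURS) wins.
    if override_hours > 0:
        return override_hours
    # Assumed fallback: a non-positive interval is replaced before doubling.
    if backup_interval_hours <= 0:
        backup_interval_hours = fallback_interval_hours
    # Two intervals of slack: one missed run alone does not trip the alert.
    return backup_interval_hours * 2


daily = staleness_window_hours(24)
pinned = staleness_window_hours(24, override_hours=6)
guarded = staleness_window_hours(0)
```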
…he other Caught in live Tier-3 testing: both evaluator rules used Service="scheduled-jobs", and the dispatcher cooldown key is (Service, Severity) — so the failure alert's cooldown SUPPRESSED the staleness alert (only the failure fired). One alarm masking another is the exact silent-failure class this sprint kills. Fixed: scheduled-job-failure / scheduled-job-staleness distinct services. Re-verified live — both fire independently and land as distinct alert-file entries. Tier-1 assertion pins that the two services differ. Includes Tier-3 verification.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
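The masking effect follows directly from keying the cooldown on (Service, Severity). A toy dispatcher makes it visible; the class below is a hypothetical stand-in for the real dispatcher, with timestamps passed in explicitly so the behavior is deterministic.

```python
class Dispatcher:
    """Toy alert dispatcher with a per-(service, severity) cooldown."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown_seconds = cooldown_seconds
        self._last_fired: dict = {}

    def dispatch(self, service: str, severity: str, now: float) -> bool:
        """Return True if the alert fires, False if the cooldown suppresses it."""
        key = (service, severity)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_fired[key] = now
        return True


# Shared service name: the second rule is swallowed by the first rule's cooldown.
shared = Dispatcher(cooldown_seconds=300)
shared_results = [
    shared.dispatch("scheduled-jobs", "high", now=0),
    shared.dispatch("scheduled-jobs", "high", now=1),
]

# Distinct service names: both rules fire independently.
distinct = Dispatcher(cooldown_seconds=300)
distinct_results = [
    distinct.dispatch("scheduled-job-failure", "high", now=0),
    distinct.dispatch("scheduled-job-staleness", "high", now=1),
]
```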
Feature doc docs/features/scheduled-job-health.md (why / two mechanisms / operator view / config). CHANGELOG Added (NOSILENT-001) + Changed (schema 23→24) + Fixed (docker-under-launchd-PATH). CLAUDE.md Service Alert System extended with the scheduled-job-health note + the distinct-Service-per-rule cooldown caveat. post.md closes the sprint (UxTS mapping + follow-ups). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sprint NOSILENT-001: Fail-loud scheduled jobs (+ docker-PATH root-cause fix)
Trigger: a live-discovered silent failure, with …
Root-cause fix first
Then the class: no silent failures
Two mechanisms so absence-of-success is caught too:
Live testing earned its keep (twice)
Config-driven (no hardcoded literals): …
These (docker fix + NOSILENT-001) are infra, unrelated to EVENTGRAPH-002 but riding this branch's PR.
CI "Verify embedded launchd templates match source" diffs packaging/launchd/* against internal/cli/launchd_templates/* (the embed.FS copy mdemg service install uses). The PATH addition landed only in the source copy; sync the embedded copy so they match byte-for-byte. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ction 7) Records the jiminy-governance Claude Code skill on the active forward roadmap (SPRINT_ROADMAP_POST_FT_LORA.md, cross-cutting governance) + brings the source spec into the repo (docs/development/jiminy-governance-skill/SKILL.md, out of ~/Downloads). The skill makes Jiminy the deterministic source of context + governance over J17, enforced by the PreToolUse hook — a routing/handshake shim, not a rulebook. Build-out scope notes the wire-up placeholders that must be resolved against the real instance (Jiminy MCP/endpoint, PreToolUse hook, J17 ack/RetireCode/GUIDANCE_OUTCOME calls). Aligns with the now-live guidance loop (RRF-SCALE-001 / JIMINY-OUTCOME-001 / GUIDANCE-SYNTH-001). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Development branch changes from reh3376_dev01.
Commits
mdemg model CLI + pluggable Fetcher interface
Auto-generated PR from reh3376_dev01 push