dev: reh3376_dev01 -> main #387
Merged
Promote the Unreleased CHANGELOG block to v0.9.0 (2026-05-06) ahead of release.yml / goreleaser tag push.
New ### Breaking subsection captures two operator-visible cutovers since v0.8.5:
(1) Phase 13.5 LLM runtime port 8101 -> 8102 + .env migration required;
(2) Phase 13.6 MLX_* -> LLM_* env-var rename (legacy aliases retained for >= 1 release cycle).
New ### Added entries: Phase 10.5 closeout (UBENCH framework promotion, commit 0389b49) and Claude Code GitHub App workflows (PRs #378, #379).
All previously-Unreleased entries (Phase 14.2.3, 14.2.x, 14.1.x, 14, 13.6, 13.5, 13.2, 13.1) carried forward unchanged into the v0.9.0 block. Fresh empty Unreleased section seeded above.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps packaging/homebrew-mdemg pointer a235977 -> 6077097, which incorporates:
- f9358cd Brew formula update for mdemg version v0.8.5 (goreleaser, prior)
- b4a0d2c Brew formula update for mdemg version v0.9.0 (goreleaser, this release)
- 6077097 docs: v0.9.0 -- CHANGELOG, README What's New, beta-testing version pin
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`config.FromEnv()` defaulted MdemgVersion/MdemgCommit to literal "0.6.0"/
"unknown" when MDEMG_VERSION/MDEMG_COMMIT envs were unset. Both /healthz
and /readyz serialize cfg.MdemgVersion, so they reported "0.6.0" forever
regardless of the actual binary's ldflags-injected cli.Version.
Fix: defaults to "" in config; cli/config_loader.go injects cli.Version /
cli.Commit (the build-time vars set by goreleaser ldflags) when the env
override is unset. Operators can still pin via MDEMG_VERSION env.
Live-verified: dev build (no ldflags) now reports {"version":"dev"} on
/healthz instead of the lying "0.6.0". Production builds via goreleaser
will report the real semver tag.
TestHandleHealthz unaffected (sets cfg.MdemgVersion directly).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
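The fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in cli/config_loader.go: the `Config` struct, `fromEnv`, and `applyBuildInfo` are hypothetical names, and the package-level `Version` / `Commit` variables stand in for cli.Version / cli.Commit as set by ldflags.

```go
package main

import (
	"fmt"
	"os"
)

// Build-time variables. A release build would overwrite these via
// -ldflags "-X ..."; the standalone layout here is an assumption.
var (
	Version = "dev"
	Commit  = "unknown"
)

// Config holds only the two fields relevant to this sketch.
type Config struct {
	MdemgVersion string
	MdemgCommit  string
}

// fromEnv reads the operator overrides and leaves the fields empty when
// they are unset, instead of defaulting to a stale literal.
func fromEnv(getenv func(string) string) Config {
	return Config{
		MdemgVersion: getenv("MDEMG_VERSION"),
		MdemgCommit:  getenv("MDEMG_COMMIT"),
	}
}

// applyBuildInfo fills any empty field from the build-time variables, so
// an explicit env pin still wins.
func applyBuildInfo(cfg Config) Config {
	if cfg.MdemgVersion == "" {
		cfg.MdemgVersion = Version
	}
	if cfg.MdemgCommit == "" {
		cfg.MdemgCommit = Commit
	}
	return cfg
}

func main() {
	cfg := applyBuildInfo(fromEnv(os.Getenv))
	fmt.Printf("{\"version\":%q}\n", cfg.MdemgVersion)
}
```

Passing the env lookup as a function keeps the precedence rule (env pin first, build info second) testable without mutating the process environment.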
…a-server
Phase 13.5 cutover (2026-05-03) replaced mlx_lm.server (port 8101) with llama.cpp llama-server (port 8102) as the production LLM runtime, but the embedded launchd plist template + service install code paths were never updated. Any operator running 'mdemg service install' from a fresh checkout got the decommissioned mlx_lm.server agent -- mdemg's startup preflight then failed because LLM_ENDPOINT=http://127.0.0.1:8102/v1 wasn't reachable.
Changes:
- New packaging/launchd/com.mdemg.llama-server.plist with the Phase 13.5 production flags (--ctx-size 32768 --parallel 4 --cont-batching --metrics --jinja). Byte-identical mirror at internal/cli/launchd_templates/ for the embed.FS (CI sync-check enforced).
- Removed packaging/launchd/com.mdemg.mlx-server.plist + embed.FS mirror. mlx_lm.server is decommissioned and known-broken on M5 + macOS 26.3.x; keeping the template would just risk re-deploying it.
- internal/cli/service_darwin.go: launchdServices entry replaced with com.mdemg.llama-server. resolveMLXLMBin renamed to resolveLlamaServerBin with primary env MDEMG_LLAMA_SERVER_BIN, deprecation alias for MDEMG_MLX_LM_BIN (slog.Warn at boot, retained ≥1 release cycle per the Phase 13.6 deprecation pattern), PATH lookup of `llama-server`. resolveMDEMGModelPath default updated to the canonical Phase 13.5 GGUF filepath (.local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.Q5_K_M.gguf) since llama-server takes a `.gguf` filepath, not an HF-format directory like mlx_lm.server. Install error message updated for the new env var name + remediation steps (`brew install llama.cpp`).
- migrateLegacyMLXServerPlist() added: if a pre-cutover com.mdemg.mlx-server plist is bootstrapped on the operator's machine, Install() boots it out and renames the file to .disabled-phase13_5 (matches the manual operator convention from Phase 13.5 rollout). Best-effort: failures don't block the install.
- internal/cli/service_darwin_test.go fully rewritten:
* TestLaunchdServicesIncludesLlamaServer asserts the new entry exists and is Optional=false (production matches Hotfix 11.6.3.1; the old test asserted Optional=true, a latent lie since 2026-05-02 that Linux CI never caught because of //go:build darwin)
* TestLlamaServerPlistEmbedded replaces TestMLXServerPlistEmbedded; additionally asserts mlx-server.plist is NOT in embed.FS
* Two resolver tests for the primary env var
* New TestResolveLlamaServerBinFallsBackToMLXAlias proves the Phase 13.6 deprecation alias path works
* resolveMDEMGModelPath tests updated for the new GGUF default
- internal/cli/watchdog.go: help text references com.mdemg.llama-server (instead of com.mdemg.mlx-server) and llama-server (instead of mlx_lm.server). Notes that mdemg_mlx_health_state metric name is retained for dashboard compatibility.
Tested:
- Tier 1 unit: 7/7 new tests pass; full ./internal/cli/... suite green (61s wall-clock).
- Tier 2 integration: golangci-lint run ./internal/cli/ -- 0 issues. CI plist sync-check (diff -q packaging/launchd/*.plist internal/cli/launchd_templates/) -- 6/6 byte-identical.
- Tier 3 live e2e: deferred. Running mdemg service install on the operator's currently-serving machine would briefly bootout the running llama-server LaunchAgent (PID 20527 actively serving production inference). The hand-installed llama-server plist on the operator's machine is byte-equivalent (modulo template substitutions) to what this commit will install via `mdemg service install` on a fresh operator setup, so the operator can verify on next planned redeploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Epic 0 of Sprint MODEL-DIST-001 -- Local LoRA Distribution via Ollama Library.
Sprint plan in 12-section v1.0 format. Supersedes parts of the speculative spec at docs/research/mdemg_sprint_ideas/MDEMG_FT_LORA_PACKAGING_SPEC.md (HF Hub vs Ollama Library; adapter-only vs both-fused-and-adapter; Apple Silicon scope vs cross-platform).
Configurability Contract -- every operator-visible value is dynamic per the framework's no-hardcoding rule. 12 env vars + flag overrides + sensible defaults. ModelFetcher interface decouples CLI from Ollama-specific knowledge; v1 ships OllamaFetcher only, future backends (HF / S3 / GitHub Release / file) plug in via factory dispatch on MDEMG_MODEL_BACKEND without touching the CLI surface.
Forensic from Epic 0:
- adapters/tier1/adapters.safetensors verified present (514 MB MLX, Phase 5 SFT Iter 2400 best output)
- mdemg-llm-v1.Q5_K_M.gguf SHA256 captured (9.8 GB; 144ad7231...)
- f16 GGUF intermediate NOT on disk; Epic 1 will regenerate via convert_hf_to_gguf.py from the MLX merged model (~5 min)
- qwen3:14b model-layer digest captured from Ollama registry; manifest digest to be computed at Epic 3 for Modelfile FROM @sha256: pinning
quant_manifest.json skeleton with Q5_K_M SHA pre-populated; Q4_K_M / Q8_0 / adapter SHAs filled in during Epics 1+2.
Estimated effort 5–7 dev-days. OpenAI spend $0. Risk medium (Ollama publish one-way; MLX→PEFT→GGUF LoRA conversion is the riskiest engineering item with documented contingency to defer to MODEL-DIST-002 if blocked).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pipeline (CLAUDE.md Phase 13.5 documented path):
1. mlx_lm.fuse --dequantize: mlx-community/Qwen3-14B-4bit + adapters/tier1/
-> 29.6 GB bf16 HF safetensors at .local-models/qwen3-14b-mdemg-v1-bf16/
2. convert_hf_to_gguf.py --outtype f16 -> 30 GB f16 GGUF (required
neural/.venv interpreter with torch + transformers + gguf installed;
/opt/homebrew/bin/convert_hf_to_gguf.py uses system python which lacks
these — installed gguf/sentencepiece/protobuf into neural/.venv)
3. llama-quantize Q4_K_M -> 9.0 GB (4.87 BPW; 40s wall on M5)
4. llama-quantize Q8_0 -> 16 GB (8.50 BPW; 11s wall on M5)
5. Live smoke per new quant via llama-server on port 18102 — both serve
/v1/models cleanly with embedded chat_template
SHAs captured in quant_manifest.json:
Q4_K_M: 401161710c22f0ae...411d42ea
Q5_K_M: 144ad723101d688f...d5f5d54 (matches Epic 0 baseline)
Q8_0: fc14dcb40af1bb58...8db6089
f16: 436cd6f41a684805...3217bd (intermediate, retained for Epic 2)
Resource matrix updated with empirical sizes (Q4_K_M is 9.0 GB vs estimated
6.5 GB; min RAM revised 8 -> 12 GB to cover ~3 GB working memory above
weights). 14B params x 4.87 BPW ≈ 8.5 GB matches the formula.
GGUF binary artifacts stay local — .local-models/ gitignored per
.gitignore:70. Sprint deliverable in git is just the manifest update.
Production llama-server (PID 20527 on port 8102) undisturbed throughout
Epic 1; live smokes used port 18102.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
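The size check quoted above (parameters x bits-per-weight) is simple enough to reproduce. The sketch below assumes the nominal 14e9 parameter count from the commit message and decimal gigabytes; the true parameter count and any non-weight metadata in the file are not accounted for, so treat the output as a rough estimate only.

```go
package main

import "fmt"

// estimateBytes returns the approximate weight size in bytes:
// params * bitsPerWeight / 8.
func estimateBytes(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}

func main() {
	quants := []struct {
		name string
		bpw  float64
	}{{"Q4_K_M", 4.87}, {"Q8_0", 8.50}}

	for _, q := range quants {
		// Nominal 14e9 parameters, reported in decimal GB (1e9 bytes).
		fmt.Printf("%s: %.1f GB\n", q.name, estimateBytes(14e9, q.bpw)/1e9)
	}
}
```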
Adapter (LoRA-only Modelfile via ADAPTER directive) deferred per the sprint plan's documented contingency clause. Fused-only path (Epics 1, 3, 4, 5) continues -- that's the primary operator value.
Forensic findings (epic_2_forensic.md):
- MLX adapter is well-formed: 560 tensors, 40 layers x 7 target_modules, rank 32, alpha 64, scale 20.0.
- convert_lora_to_gguf.py is NOT in brew install llama.cpp; would need manual fetch from llama.cpp source.
- MLX -> PEFT requires tensor transposition: MLX lora_a is (in, rank); PEFT expects (rank, in). Same for lora_b.
- Estimated 80-95 min to complete vs ~30 min budget remaining for Epic 2.
- Hit the contingency criterion: "MLX -> PEFT conversion blocked by tooling gaps."
Decision: defer adapter scope to MODEL-DIST-002 (new follow-up sprint, to be planned separately). Fused-only ships this sprint.
Knock-on changes (in-flight to subsequent epics):
- Epic 3: drop Modelfile.adapter; publish only 3 fused quants.
- Epic 4 CLI: --adapter flag accepted at parse-time but errors with "lands in MODEL-DIST-002"; machinery preserved for forward-compat.
- Epic 6 e2e: drop adapter-pull step.
- Epic 7 feature doc: adapter section notes "coming in MODEL-DIST-002".
Artifacts preserved on disk for MODEL-DIST-002 pickup:
- adapters/tier1/adapters.safetensors (MLX, 514 MB)
- .local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.f16.gguf (30 GB, retained as base for llama-server --lora verification later)
quant_manifest.json adapter block updated with status=deferred + reason.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sh pending)
Authored 3 Ollama Modelfiles in packaging/ollama/:
Modelfile.Q4_K_M -- 9.0 GB, 12 GB min RAM, 16 GB recommended
Modelfile.Q5_K_M -- 11 GB, 14 GB min RAM, 24 GB recommended (production canonical)
Modelfile.Q8_0 -- 16 GB, 20 GB min RAM, 32 GB recommended
Common shape: FROM ./../../.local-models/mdemg-llm-v1-gguf/...gguf relative path (operator-machine local); num_ctx 32768, num_predict 4096, stop tokens <|im_end|>/<|im_start|>; Apache-2.0 LICENSE; SYSTEM positioning block. No TEMPLATE directive -- chat template baked into GGUF metadata (Qwen3 chat_template.jinja preserved through mlx_lm.fuse --dequantize → convert_hf → llama-quantize pipeline).
packaging/ollama/README.md documents the publish workflow including the fork-customization path (operators publishing under a different namespace follow MDEMG_MODEL_NAMESPACE per the Configurability Contract).
Local ollama create completed for all 3:
reh3376/mdemg-llm-v1:Q4_K_M ID 5c3a7252c295
reh3376/mdemg-llm-v1:Q5_K_M ID 08c13b480864
reh3376/mdemg-llm-v1:Q8_0 ID 6b1006facd36
Layers de-duplicated: config + params + system layers (3 layers) are identical across all 3 quants; only the model blob (GGUF) differs.
** ollama push deferred ** -- one-way action gated on operator confirmation per Sprint Plan §10 Risk #8. Operator must claim reh3376 namespace on ollama.com and generate API token before push proceeds. Local-create proves the Modelfiles are well-formed; push is a separate decision.
Once pushed, manifest digests captured into quant_manifest.json (ollama_manifest_digest field per quant) for mdemg model verify.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…interface
Sprint MODEL-DIST-001 Epic 4 — the bulk of the operator-facing surface.
New CLI subcommand group:
mdemg model pull # fetch + symlink + SHA verify
mdemg model list # show pulled models
mdemg model verify # re-check SHAs vs quant manifest
mdemg model remove # destructive (requires --yes)
mdemg model where # print resolved path for shell scripting
Pluggable backend (internal/cli/model_fetcher.go):
type Fetcher interface { Name, Fetch, Verify, Remove }
NewFetcher dispatches on cfg.ModelBackend (env: MDEMG_MODEL_BACKEND)
v1 ships OllamaFetcher only; future backends (hf, s3, github-release,
file) plug in via factory branch — CLI surface unchanged.
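A rough sketch of the factory dispatch described here is shown below. The commit message names only the four methods; the method signatures, the `ollamaFetcher` stub, and the error text are assumptions made for illustration and may differ from internal/cli/model_fetcher.go.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Fetcher mirrors the four method names from the commit message.
// The parameter and return types are hypothetical.
type Fetcher interface {
	Name() string
	Fetch(ctx context.Context, quant string) (path string, err error)
	Verify(ctx context.Context, quant string) error
	Remove(ctx context.Context, quant string) error
}

var errSketch = fmt.Errorf("not implemented in this sketch")

// ollamaFetcher is a stub; the real OllamaFetcher does the actual work.
type ollamaFetcher struct{}

func (ollamaFetcher) Name() string                                    { return "ollama" }
func (ollamaFetcher) Fetch(context.Context, string) (string, error)   { return "", errSketch }
func (ollamaFetcher) Verify(context.Context, string) error            { return errSketch }
func (ollamaFetcher) Remove(context.Context, string) error            { return errSketch }

// NewFetcher dispatches on the backend name (MDEMG_MODEL_BACKEND).
// An empty value selects the default; matching is case-insensitive.
func NewFetcher(backend string) (Fetcher, error) {
	switch strings.ToLower(backend) {
	case "", "ollama":
		return ollamaFetcher{}, nil
	default:
		// Future backends (hf, s3, ...) would add a case here.
		return nil, fmt.Errorf("unsupported model backend %q", backend)
	}
}

func main() {
	f, err := NewFetcher("Ollama")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backend:", f.Name())
}
```

Because callers only hold the interface, adding a backend touches the switch and a new type, not the command handlers.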
OllamaFetcher (internal/cli/model_fetcher_ollama.go):
Encapsulates ALL Ollama-specific concepts: `ollama pull` invocation,
manifest path under <OLLAMA_MODELS>/manifests/<OLLAMA_HOST>/<ns>/<n>/<tag>,
mediaType=application/vnd.ollama.image.model layer filtering,
blob path under <OLLAMA_MODELS>/blobs/sha256-<digest>, symlink under
<MDEMG_MODEL_DIR>, idempotent.
Configurability Contract (no hardcoding; memory: feedback_no_hardcoded_values.md):
12 env vars + flag overrides, each with v1-production-tuned defaults so
`mdemg model pull` with no flags Just Works. See sprint plan §3.
Live-verified all 3 resolution paths:
`--quant Q5_K_M` → namespace=reh3376
`--namespace acme --name custom-model` → namespace=acme name=custom-model
`MDEMG_MODEL_NAMESPACE=acme env` → env overrides applied
Added to internal/config/config.go: ModelBackend, ModelNamespace,
ModelName, ModelQuants, ModelRamTiers, ModelQuant, AdapterBase,
ModelDir, OllamaModelsRoot, OllamaRegistryHost, ModelManifestPath.
Embedded quant manifest (internal/cli/quant_manifest.json via embed.FS):
Runtime source-of-truth for SHA verification. Operator override via
MDEMG_MODEL_MANIFEST_PATH for air-gapped deployments. Mirrors
docs/development/model-dist-001/quant_manifest.json.
RAM-tier auto-pick:
Default JSON `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}` maps
host RAM (sysctl on darwin, /proc/meminfo on linux) to quant. Operator
override via MDEMG_MODEL_RAM_TIERS.
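One way to interpret the tier JSON is sketched below: keys of the form `<N` are read as "host RAM strictly below N GB", checked in ascending order, with `default` as the fallback. The function name `pickQuant` and the strict less-than boundary behavior are assumptions; the actual PickQuantForRAM may treat boundaries differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

type tier struct {
	belowGB float64
	quant   string
}

// pickQuant maps host RAM (in GB) to a quant using a tiers JSON document
// such as {"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}.
func pickQuant(tiersJSON string, ramGB float64) (string, error) {
	var raw map[string]string
	if err := json.Unmarshal([]byte(tiersJSON), &raw); err != nil {
		return "", fmt.Errorf("parse RAM tiers: %w", err)
	}
	def, ok := raw["default"]
	if !ok {
		return "", fmt.Errorf("RAM tiers missing \"default\" entry")
	}
	var tiers []tier
	for k, v := range raw {
		if k == "default" {
			continue
		}
		if !strings.HasPrefix(k, "<") {
			return "", fmt.Errorf("unsupported tier key %q", k)
		}
		n, err := strconv.ParseFloat(strings.TrimPrefix(k, "<"), 64)
		if err != nil {
			return "", fmt.Errorf("tier key %q: %w", k, err)
		}
		tiers = append(tiers, tier{belowGB: n, quant: v})
	}
	// Map iteration order is random, so sort thresholds ascending.
	sort.Slice(tiers, func(i, j int) bool { return tiers[i].belowGB < tiers[j].belowGB })
	for _, t := range tiers {
		if ramGB < t.belowGB {
			return t.quant, nil
		}
	}
	return def, nil
}

func main() {
	const defaults = `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}`
	q, err := pickQuant(defaults, 18)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q)
}
```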
Adapter path (--adapter flag) returns ErrAdapterDeferred per Epic 2's
contingency exit — adapter publication lands in MODEL-DIST-002. Flag
machinery preserved for forward compatibility.
Tests (22, all green) in internal/cli/model_test.go:
- Backend factory dispatch (5 cases incl. case-insensitive, default, error)
- Quant allowlist parsing (5 cases incl. whitespace + empty entries)
- RAM-tier JSON parsing (default + operator override + malformed)
- PickQuantForRAM (7 boundary cases)
- ResolveQuant across paths (auto, explicit, rejection, operator-custom)
- QuantManifest load (embedded + file override + missing-file error)
- Ollama tag composition (fused + adapter forms)
- Manifest path composition under custom OLLAMA_MODELS/OLLAMA_HOST
- Blob path digest prefix handling
- Adapter deferred error
- Manifest JSON parser (mediaType filtering + malformed + no-model-layer)
Grep audit (verification checklist):
grep on internal/cli/model*.go for hardcoded values found only in help
text Long/example strings documenting defaults to operators — not in
logic. Behavior values all flow through cfg.Model* fields.
Build + lint clean. Full cli test suite (61s wall) green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
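The manifest handling that OllamaFetcher encapsulates (filter layers by the model mediaType, then map the digest to a blob file name) can be sketched as below. The struct shape covers only the two fields used here, the sample manifest body and digests are placeholders, and the root directory in `main` is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

const modelMediaType = "application/vnd.ollama.image.model"

// manifest declares only the fields this sketch needs.
type manifest struct {
	Layers []struct {
		MediaType string `json:"mediaType"`
		Digest    string `json:"digest"`
	} `json:"layers"`
}

// modelLayerDigest returns the digest of the first layer whose mediaType
// marks it as the model blob.
func modelLayerDigest(body []byte) (string, error) {
	var m manifest
	if err := json.Unmarshal(body, &m); err != nil {
		return "", fmt.Errorf("parse manifest: %w", err)
	}
	for _, l := range m.Layers {
		if l.MediaType == modelMediaType {
			return l.Digest, nil
		}
	}
	return "", errors.New("manifest has no model layer")
}

// blobFileName converts "sha256:<hex>" (or bare "<hex>") to "sha256-<hex>".
func blobFileName(digest string) string {
	return "sha256-" + strings.TrimPrefix(digest, "sha256:")
}

func main() {
	// Placeholder manifest body; real digests are 64 hex characters.
	body := []byte(`{"layers":[
		{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:aaa"},
		{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:bbb"}]}`)
	d, err := modelLayerDigest(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(filepath.Join("/hypothetical/ollama/models", "blobs", blobFileName(d)))
}
```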
…+ writer
Sprint MODEL-DIST-001 Epic 5 -- observability for `mdemg model` operations. Grafana panels deferred to Sprint B (Grafana audit).
New migration: internal/tsdb/migrations/021_model_install_events.sql
Hypertable on recorded_at, 7-day chunks, 3 indexes (quant-time, failed-events partial, backend-event-time). Columns: event_id CUIDv2 PK + recorded_at, event_type (pull/verify/remove), backend_name, namespace, model_name, quant, adapter bool, success bool, latency_ms, sha256, size_bytes, err_message (1 KB cap).
New writer: internal/tsdb/model_install_writer.go
Synchronous single-row INSERT (not buffered + CopyFrom -- CLI is one-shot, writes are infrequent vs the V0017/V0018/V0019/V0020 retrieval-path writers that fire per-request). Nil-pool no-op for degraded mode. errMessageMaxLen=1024 truncation at write time. New modelInstallPool interface (Exec-shaped) avoids touching the existing CopyFrom-shaped poolIface used by buffered writers.
Wiring: internal/cli/model.go gets recordModelEvent(parent, cfg, row) helper:
- Returns immediately if !cfg.TSDBEnabled || cfg.TSDBHost==""
- 2s timeout on connect (TSDB unreachable doesn't block CLI exit)
- Logs warning + degrades gracefully on any TSDB error
Called from runModelPull (success + failure paths), runModelVerify (single sweep row), runModelRemove (success + failure paths).
Schema version bump: internal/config/config.go: TSDB_REQUIRED_SCHEMA_VERSION default 20→21. CI validator at .github/workflows/ci.yml:60-65 counts SQL files in internal/tsdb/migrations/ and asserts equality; now 21 files = 21 in config = passes.
Build + lint clean. Existing tsdb / cli test suites green; no new tests added for the writer itself (single INSERT mirrors V0017/V0018/V0019 patterns already covered; integration is operational verification at Epic 6 once tsdb is up in the dev stack).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
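The two writer behaviors called out here, the nil-pool no-op and the 1 KB err_message cap, can be illustrated with a small sketch. The `execPool` interface is a simplified, hypothetical stand-in for the Exec-shaped modelInstallPool (a real pgx pool returns a command tag as well), the row type carries only three of the columns, and the rune-boundary handling in `truncateErr` is an assumption about how truncation might be done safely.

```go
package main

import (
	"context"
	"fmt"
	"unicode/utf8"
)

const errMessageMaxLen = 1024

// execPool is a simplified, hypothetical Exec-shaped pool interface.
type execPool interface {
	Exec(ctx context.Context, sql string, args ...any) error
}

// eventRow carries a subset of the model_install_events columns.
type eventRow struct {
	EventType  string
	Success    bool
	ErrMessage string
}

// truncateErr caps s at errMessageMaxLen bytes without splitting a
// multi-byte UTF-8 character.
func truncateErr(s string) string {
	if len(s) <= errMessageMaxLen {
		return s
	}
	cut := errMessageMaxLen
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

// writeEvent performs one synchronous INSERT. A nil pool means TSDB is
// unavailable, so the write is silently skipped (degraded mode).
func writeEvent(ctx context.Context, pool execPool, row eventRow) error {
	if pool == nil {
		return nil
	}
	const q = `INSERT INTO model_install_events (event_type, success, err_message) VALUES ($1, $2, $3)`
	return pool.Exec(ctx, q, row.EventType, row.Success, truncateErr(row.ErrMessage))
}

// recordingPool is a test double that records what would be written.
type recordingPool struct {
	calls    int
	lastArgs []any
}

func (p *recordingPool) Exec(_ context.Context, _ string, args ...any) error {
	p.calls++
	p.lastArgs = args
	return nil
}

func main() {
	pool := &recordingPool{}
	err := writeEvent(context.Background(), pool, eventRow{EventType: "pull", ErrMessage: "boom"})
	fmt.Println(pool.calls, err)
}
```

Note that the nil check only catches an untyped nil interface; a typed nil pointer stored in the interface would not be skipped, so callers in this sketch must pass a literal nil when no pool exists.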
Sprint MODEL-DIST-001 Epic 7 — operator-facing feature documentation
following the standard Why / Choices / How / How-to-use shape (memory:
feedback_per_feature_docs_required.md).
Contents:
- Why: gap between brew install and a working local LLM after Phase 13.5
- Choices: backend matrix (Ollama vs HF vs GitHub vs S3 vs file://),
artifact form (fused vs adapter), Apple Silicon scope, "Ollama runtime
rejected (broken on M5+macOS 26.3.x), Ollama distribution only"
- How it works: ASCII flow diagram covering CLI dispatch -> Fetcher
interface -> OllamaFetcher (preflight, ollama pull, manifest discovery,
blob resolve, symlink, SHA verify) -> V0021 observability row
- How to use:
* Quick start (3 commands: brew install ollama, mdemg model pull,
curl /v1/models)
* Explicit quant selection
* Managing pulled models (list / verify / where / remove)
* Forks + enterprise (MDEMG_MODEL_NAMESPACE override)
* Air-gapped (MDEMG_MODEL_MANIFEST_PATH override)
* Resource matrix per quant (disk, min RAM, recommended RAM, BPW)
* Full Configurability Contract table (11 env vars + flags + defaults)
* V0021 observability schema
- Troubleshooting: ollama missing, SHA mismatch, quant allowlist
rejection, RAM auto-detection failure, out-of-disk, symlink permission
- Forward-looking: MODEL-DIST-002 adapter, Sprint B Grafana panels,
future backends, cross-platform
- References: all source-of-truth files cross-linked
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sprint MODEL-DIST-001 Epic 8 -- final epic, never cut (memory: feedback_sequential_epics.md).
This commit lands the main-repo doc updates. The packaging/homebrew-mdemg/ submodule docs (README, CHANGELOG, formula caveats text) update at v0.10.0 release-tag time per the v0.9.0 release flow precedent -- that's when goreleaser auto-regenerates mdemg.rb from .goreleaser.yaml's caveats template, and the tap-side README/CHANGELOG get edited in lockstep.
Changes:
- CHANGELOG.md: comprehensive Unreleased entry documenting Epics 0-5 + 7 landed in this sprint. Epic 3 ollama push and Epic 6 Tier 3 e2e marked as gated on operator confirmation. Adapter path explicitly deferred to MODEL-DIST-002 with epic_2_forensic.md cross-reference. Captures the Configurability Contract enumeration, the 3 quant SHAs, the Fetcher interface design, the V0021 hypertable, and the explicit out-of-scope list.
- CLAUDE.md: new "Model Distribution (Sprint MODEL-DIST-001)" subsection in Architecture Notes, slotted ABOVE the existing Compose embed entry for visibility. Captures the pluggable-backend design, the Ollama-as-distribution-only constraint, the on-disk symlink + manifest discovery flow, the 11-knob Configurability Contract surface, the no-hardcoding enforcement, the TSDB V0021 hookup, and the Apple Silicon v1 scope.
- README.md: new "Step 2b (optional): Pull the local LLM" section between Step 2 (Initialize/Start) and Open the Dashboard. 3-command quick start (brew install ollama -> mdemg model pull -> set MDEMG_MODEL_PATH). Cross-references the feature doc for the full Configurability Contract.
- .goreleaser.yaml: caveats template updated to include `mdemg model pull` instructions. Goreleaser regenerates the homebrew formula's caveats block from this on the next v* tag push, so v0.10.0 will ship the new text to brew users automatically.
Deferred to v0.10.0 release-tag time (handled per v0.9.0 precedent):
- packaging/homebrew-mdemg/README.md update
- packaging/homebrew-mdemg/CHANGELOG.md update
- packaging/homebrew-mdemg/mdemg.rb regeneration (automatic via goreleaser from the .goreleaser.yaml change in this commit)
- Submodule pointer bump in main repo
Deferred to Epic 6 close (after operator does ollama push):
- post.md sprint-close document
- Capture of remote Ollama manifest digests into quant_manifest.json
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All 3 fused quants now live on Ollama Library:
https://ollama.com/reh3376/mdemg-llm-v1:Q4_K_M
https://ollama.com/reh3376/mdemg-llm-v1:Q5_K_M
https://ollama.com/reh3376/mdemg-llm-v1:Q8_0
End-to-end integrity verified: remote model-layer digests captured via GET https://registry.ollama.ai/v2/reh3376/mdemg-llm-v1/manifests/<quant> match the local Epic 1 SHAs exactly:
Q4_K_M 401161710c22f0ae...411d42ea (matches Epic 1)
Q5_K_M 144ad723101d688f...d5f5d54 (matches Epic 1)
Q8_0 fc14dcb40af1bb58...8db6089 (matches Epic 1)
Captured into quant_manifest.json (both docs canonical + internal/cli embed.FS mirror, byte-synced):
- ollama_manifest_digest per quant (computed from the manifest body):
Q4_K_M sha256:a210cccb12311773fd70bfa81f221ca0f7940a315bef87b84608caf894533b1b
Q5_K_M sha256:ae6e54fe1ee0b487ae41260687ed14c46c30d1ffb0fece936282418b5bcb78e1
Q8_0 sha256:93df4d64bfa751506f7afba8bf08b891ea828575b838adec17b9399ad85be718
- Corrected size_bytes (Epic 1 used approximate values; replaced with registry-reported exact bytes for each tag):
Q4_K_M 9.0 GB -> 8.4 GB (9001753408 B; was 9658404096)
Q5_K_M 11 GB -> 9.8 GB (10514569568 B; was 11811160064)
Q8_0 16 GB -> 14.6 GB (15698534208 B; was 17179869184)
- Status flipped from "local-create done; push pending" to "published".
Embedded runtime manifest (internal/cli/quant_manifest.json) re-built into the binary via embed.FS. TestLoadQuantManifest_EmbeddedFallback green with new values.
Epic 3 of Sprint MODEL-DIST-001 now COMPLETE. Epic 6 (Tier 3 live e2e -- `mdemg model pull` against the published tags + llama-server load on port 18102 + sanity inference) is now unblocked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
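As a sanity check on the corrected sizes, the registry byte counts can be converted to human-readable figures. One reading, not stated in the commit message, is that the "GB" labels here are binary units (GiB, 2^30 bytes): dividing the exact byte counts by 2^30 reproduces 8.4 / 9.8 / 14.6, while dividing by 10^9 gives larger numbers. The helper names below are hypothetical.

```go
package main

import "fmt"

// gib formats a byte count in binary gigabytes (2^30 bytes).
func gib(b int64) string { return fmt.Sprintf("%.1f", float64(b)/(1<<30)) }

// gb formats a byte count in decimal gigabytes (10^9 bytes).
func gb(b int64) string { return fmt.Sprintf("%.1f", float64(b)/1e9) }

func main() {
	// Registry-reported sizes quoted in the commit message.
	sizes := []struct {
		quant string
		bytes int64
	}{
		{"Q4_K_M", 9001753408},
		{"Q5_K_M", 10514569568},
		{"Q8_0", 15698534208},
	}
	for _, s := range sizes {
		fmt.Printf("%s: %s GiB (%s GB decimal)\n", s.quant, gib(s.bytes), gb(s.bytes))
	}
}
```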
Sprint MODEL-DIST-001 close-out per memory rule
(feedback_sprint_plan_format.md §11 — sprint plans live in
docs/development/<sprint-line>/ with the standard post.md companion).
Sections (CLAUDE.md sprint-plan section guidance):
- Outcome: 3 quants live on Ollama Library, mdemg model pull is the
canonical install path
- Process: how the plan held under reality (operator-surfaced no-
hardcoding rule revised the plan in-place to add the Configurability
Contract before code was written)
- Findings: 5 smooth parts + 5 friction items, both honest:
* convert_hf_to_gguf.py python deps gap (silent ModuleNotFoundError)
* mlx_lm.fuse adapter-path requirement
* convert_lora_to_gguf.py missing from brew install llama.cpp
(proximate Epic 2 deferral trigger)
* mdemg tsdb migrate CWD-aware .env loader quirk
* Epic 1 size estimates off vs registry-reported exact bytes
- Current state: per-layer state matrix
- Testing & benchmarking: all 3 tiers documented (Tier 3 e2e captured
V0021 rows for both pull + verify event_types — live-verified)
- Risks & opportunities (forward): MODEL-DIST-002 adapter scope, Sprint
B Grafana, cross-platform, HFFetcher slot, CWD-aware .env loader QoL
- Sprint commits: 9 commits on dev01, mapped to their epics
Closes Sprint MODEL-DIST-001 functionally. Operational sprint close
(v0.10.0 release tag + tap-repo doc updates) is a separate motion.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…shed main)
PR #385 squash-merged the original Epic 3 quant_manifest values (estimated sizes from llama-quantize wall output, null ollama_manifest_digest because the push hadn't happened yet) into main as commit f1d029a. Meanwhile on dev01, commit 87293f8 (Epic 3 closeout) corrected those values to the registry-canonical state after the ollama push completed:
- size_bytes: replaced Epic 1 approximations with registry-reported exact bytes (Q4_K_M 9001753408 / Q5_K_M 10514569568 / Q8_0 15698534208)
- size_human: 9.0/11/16 GB -> 8.4/9.8/14.6 GB (more accurate)
- ollama_manifest_digest: null -> sha256:a210cccb...|ae6e54fe...|93df4d64...
- status: "local-create done; push pending" -> "published (...)"
Conflict resolution: keep dev01 (HEAD) values for both files -- those are the registry-canonical post-push state. JSON validity verified for both files; TestLoadQuantManifest_{EmbeddedFallback,OperatorOverride,OverrideMissingFile} all green against the resolved embedded manifest.
The non-conflicting fast-forwarded changes from main (claude workflow edits + dependabot go.mod/go.sum bumps) are folded in by this merge unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Promote the Sprint MODEL-DIST-001 entry from Unreleased to v0.10.0 (2026-05-11) ahead of release.yml / goreleaser tag push. Fresh empty Unreleased section seeded above.
v0.10.0 ships:
- mdemg model pull|list|verify|remove|where -- one-command path from brew install mdemg to a working local LLM
- Pluggable ModelFetcher interface (Ollama in v1, slots for HF/S3/GHR/file)
- 3 fused GGUF quants live on Ollama Library at reh3376/mdemg-llm-v1 (:Q4_K_M 8.4 GB / :Q5_K_M 9.8 GB / :Q8_0 14.6 GB)
- 11-knob Configurability Contract (every operator-visible value dynamic)
- TSDB V0021 model_install_events hypertable + writer
- docs/features/local-model-distribution.md
Adapter (LoRA-only) path deferred to MODEL-DIST-002 per the sprint plan's documented contingency (epic_2_forensic.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Development branch changes from reh3376_dev01.
Commits
`mdemg model` CLI + pluggable Fetcher interface
Auto-generated PR from reh3376_dev01 push