dev: reh3376_dev01 -> main by github-actions[bot] · Pull Request #387 · reh3376/mdemg

github-actions · 2026-05-11T16:34:17Z

Summary

Development branch changes from reh3376_dev01.

Commits

docs(release): promote Unreleased -> v0.10.0
merge: resolve quant_manifest.json conflicts (Epic 3 closeout vs squashed main)
docs(model-dist-001): sprint close — post.md
feat(model-dist-001): Epic 3 closeout — Ollama Library push complete
docs(model-dist-001): Epic 8 — Documentation Update (main repo)
docs(model-dist-001): Epic 7 — local-model-distribution feature doc
feat(model-dist-001): Epic 5 — V0021 model_install_events hypertable + writer
feat(model-dist-001): Epic 4 — mdemg model CLI + pluggable Fetcher interface
feat(model-dist-001): Epic 3 — 3 Modelfiles + local ollama create (push pending)
docs(model-dist-001): Epic 2 — defer adapter to MODEL-DIST-002
feat(model-dist-001): Epic 1 — built Q4_K_M + Q8_0 fused GGUFs
docs(sprint): MODEL-DIST-001 sprint plan + quant manifest skeleton
fix(service): replace decommissioned mlx-server LaunchAgent with llama-server
fix(api): /healthz returns build-time version, not stale literal "0.6.0"
chore(submodule): bump homebrew-mdemg to v0.9.0 formula + docs
Merge remote-tracking branch 'origin/main' into reh3376_dev01
docs(release): promote Unreleased -> v0.9.0

Auto-generated PR from reh3376_dev01 push

Promote the Unreleased CHANGELOG block to v0.9.0 (2026-05-06) ahead of the release.yml / goreleaser tag push.

New ### Breaking subsection captures two operator-visible cutovers since v0.8.5:
1. Phase 13.5 LLM runtime port 8101 -> 8102; .env migration required.
2. Phase 13.6 MLX_* -> LLM_* env-var rename (legacy aliases retained for >= 1 release cycle).

New ### Added entries: Phase 10.5 closeout (UBENCH framework promotion, commit 0389b49) and Claude Code GitHub App workflows (PRs #378, #379). All previously-Unreleased entries (Phase 14.2.3, 14.2.x, 14.1.x, 14, 13.6, 13.5, 13.2, 13.1) are carried forward unchanged into the v0.9.0 block. A fresh, empty Unreleased section is seeded above.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bumps packaging/homebrew-mdemg pointer a235977 -> 6077097, which incorporates:
- f9358cd Brew formula update for mdemg version v0.8.5 (goreleaser, prior)
- b4a0d2c Brew formula update for mdemg version v0.9.0 (goreleaser, this release)
- 6077097 docs: v0.9.0 -- CHANGELOG, README What's New, beta-testing version pin

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`config.FromEnv()` defaulted MdemgVersion/MdemgCommit to the literals "0.6.0"/"unknown" when the MDEMG_VERSION/MDEMG_COMMIT envs were unset. Both /healthz and /readyz serialize cfg.MdemgVersion, so they reported "0.6.0" forever, regardless of the actual binary's ldflags-injected cli.Version.

Fix: default both to "" in config; cli/config_loader.go injects cli.Version / cli.Commit (the build-time vars set by goreleaser ldflags) when the env override is unset. Operators can still pin the version via the MDEMG_VERSION env.

Live-verified: a dev build (no ldflags) now reports {"version":"dev"} on /healthz instead of the lying "0.6.0". Production builds via goreleaser will report the real semver tag. TestHandleHealthz is unaffected (it sets cfg.MdemgVersion directly).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…a-server

Phase 13.5 cutover (2026-05-03) replaced mlx_lm.server (port 8101) with llama.cpp llama-server (port 8102) as the production LLM runtime, but the embedded launchd plist template and the service install code paths were never updated. Any operator running `mdemg service install` from a fresh checkout got the decommissioned mlx_lm.server agent. mdemg's startup preflight then failed because LLM_ENDPOINT=http://127.0.0.1:8102/v1 wasn't reachable.

Changes:
- New packaging/launchd/com.mdemg.llama-server.plist with the Phase 13.5 production flags (--ctx-size 32768 --parallel 4 --cont-batching --metrics --jinja). A byte-identical mirror lives at internal/cli/launchd_templates/ for the embed.FS (enforced by the CI sync-check).
- Removed packaging/launchd/com.mdemg.mlx-server.plist and its embed.FS mirror. mlx_lm.server is decommissioned and known-broken on M5 + macOS 26.3.x; keeping the template would only risk re-deploying it.
- internal/cli/service_darwin.go: the launchdServices entry is replaced with com.mdemg.llama-server. resolveMLXLMBin is renamed to resolveLlamaServerBin. It uses primary env MDEMG_LLAMA_SERVER_BIN, a deprecation alias for MDEMG_MLX_LM_BIN (slog.Warn at boot, retained ≥1 release cycle per the Phase 13.6 deprecation pattern), and a PATH lookup of `llama-server`. The resolveMDEMGModelPath default is updated to the canonical Phase 13.5 GGUF filepath (.local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.Q5_K_M.gguf), because llama-server takes a `.gguf` filepath rather than an HF-format directory like mlx_lm.server. The install error message is updated for the new env var name, with remediation steps (`brew install llama.cpp`).
- migrateLegacyMLXServerPlist() added: if a pre-cutover com.mdemg.mlx-server plist is bootstrapped on the operator's machine, Install() boots it out and renames the file to .disabled-phase13_5. This matches the manual operator convention from the Phase 13.5 rollout. Best-effort: failures don't block the install.
- internal/cli/service_darwin_test.go fully rewritten:
  * TestLaunchdServicesIncludesLlamaServer asserts the new entry exists and is Optional=false. Production matches Hotfix 11.6.3.1. The old test asserted Optional=true, a latent lie since 2026-05-02 that Linux CI never caught because of //go:build darwin.
  * TestLlamaServerPlistEmbedded replaces TestMLXServerPlistEmbedded and additionally asserts mlx-server.plist is NOT in embed.FS.
  * Two resolver tests for the primary env var.
  * New TestResolveLlamaServerBinFallsBackToMLXAlias proves the Phase 13.6 deprecation alias path works.
  * resolveMDEMGModelPath tests updated for the new GGUF default.
- internal/cli/watchdog.go: help text references com.mdemg.llama-server (instead of com.mdemg.mlx-server) and llama-server (instead of mlx_lm.server). It notes that the mdemg_mlx_health_state metric name is retained for dashboard compatibility.

Tested:
- Tier 1 unit: 7/7 new tests pass; full ./internal/cli/... suite green (61s wall-clock).
- Tier 2 integration: golangci-lint run ./internal/cli/ reports 0 issues. CI plist sync-check (diff -q packaging/launchd/*.plist internal/cli/launchd_templates/): 6/6 byte-identical.
- Tier 3 live e2e: deferred. Running mdemg service install on the operator's currently-serving machine would briefly boot out the running llama-server LaunchAgent (PID 20527, actively serving production inference). The hand-installed llama-server plist on the operator's machine is byte-equivalent (modulo template substitutions) to what this commit installs via `mdemg service install` on a fresh operator setup. The operator can therefore verify on the next planned redeploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


Epic 0 of Sprint MODEL-DIST-001: Local LoRA Distribution via Ollama Library. Sprint plan in the 12-section v1.0 format. Supersedes parts of the speculative spec at docs/research/mdemg_sprint_ideas/MDEMG_FT_LORA_PACKAGING_SPEC.md (HF Hub vs Ollama Library; adapter-only vs both fused and adapter; Apple Silicon scope vs cross-platform).

Configurability Contract: every operator-visible value is dynamic, per the framework's no-hardcoding rule. 12 env vars + flag overrides + sensible defaults. The ModelFetcher interface decouples the CLI from Ollama-specific knowledge. v1 ships OllamaFetcher only; future backends (HF / S3 / GitHub Release / file) plug in via factory dispatch on MDEMG_MODEL_BACKEND without touching the CLI surface.

Forensics from Epic 0:
- adapters/tier1/adapters.safetensors verified present (514 MB MLX, Phase 5 SFT Iter 2400 best output).
- mdemg-llm-v1.Q5_K_M.gguf SHA256 captured (9.8 GB; 144ad7231...).
- f16 GGUF intermediate NOT on disk; Epic 1 will regenerate it via convert_hf_to_gguf.py from the MLX merged model (~5 min).
- qwen3:14b model-layer digest captured from the Ollama registry; the manifest digest is to be computed at Epic 3 for Modelfile FROM @sha256: pinning.

quant_manifest.json skeleton with the Q5_K_M SHA pre-populated; Q4_K_M / Q8_0 / adapter SHAs are filled in during Epics 1+2.

Estimated effort 5–7 dev-days. OpenAI spend $0. Risk medium: the Ollama publish is one-way. MLX→PEFT→GGUF LoRA conversion is the riskiest engineering item, with a documented contingency to defer to MODEL-DIST-002 if blocked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Pipeline (CLAUDE.md Phase 13.5 documented path):
1. mlx_lm.fuse --dequantize: mlx-community/Qwen3-14B-4bit + adapters/tier1/ -> 29.6 GB bf16 HF safetensors at .local-models/qwen3-14b-mdemg-v1-bf16/
2. convert_hf_to_gguf.py --outtype f16 -> 30 GB f16 GGUF. This required the neural/.venv interpreter with torch + transformers + gguf installed. /opt/homebrew/bin/convert_hf_to_gguf.py uses the system python, which lacks these, so gguf/sentencepiece/protobuf were installed into neural/.venv.
3. llama-quantize Q4_K_M -> 9.0 GB (4.87 BPW; 40s wall on M5)
4. llama-quantize Q8_0 -> 16 GB (8.50 BPW; 11s wall on M5)
5. Live smoke per new quant via llama-server on port 18102. Both serve /v1/models cleanly with the embedded chat_template. SHAs captured in quant_manifest.json:
   Q4_K_M: 401161710c22f0ae...411d42ea
   Q5_K_M: 144ad723101d688f...d5f5d54 (matches Epic 0 baseline)
   Q8_0: fc14dcb40af1bb58...8db6089
   f16: 436cd6f41a684805...3217bd (intermediate, retained for Epic 2)

Resource matrix updated with empirical sizes. Q4_K_M is 9.0 GB vs an estimated 6.5 GB, so min RAM is revised 8 -> 12 GB to cover ~3 GB of working memory above the weights. 14B params x 4.87 BPW ≈ 8.5 GB matches the formula.

GGUF binary artifacts stay local: .local-models/ is gitignored per .gitignore:70. The sprint deliverable in git is just the manifest update. Production llama-server (PID 20527 on port 8102) was undisturbed throughout Epic 1; live smokes used port 18102.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adapter (LoRA-only Modelfile via the ADAPTER directive) is deferred per the sprint plan's documented contingency clause. The fused-only path (Epics 1, 3, 4, 5) continues; that's the primary operator value.

Forensic findings (epic_2_forensic.md):
- The MLX adapter is well-formed: 560 tensors, 40 layers x 7 target_modules, rank 32, alpha 64, scale 20.0.
- convert_lora_to_gguf.py is NOT in brew install llama.cpp; it would need a manual fetch from the llama.cpp source.
- MLX -> PEFT requires tensor transposition: MLX lora_a is (in, rank); PEFT expects (rank, in). Same for lora_b.
- Estimated 80-95 min to complete vs ~30 min budget remaining for Epic 2.
- Hit the contingency criterion: "MLX -> PEFT conversion blocked by tooling gaps."

Decision: defer adapter scope to MODEL-DIST-002 (new follow-up sprint, to be planned separately). Fused-only ships this sprint.

Knock-on changes (in flight to subsequent epics):
- Epic 3: drop Modelfile.adapter; publish only 3 fused quants.
- Epic 4 CLI: --adapter flag accepted at parse time but errors with "lands in MODEL-DIST-002"; machinery preserved for forward compatibility.
- Epic 6 e2e: drop the adapter-pull step.
- Epic 7 feature doc: adapter section notes "coming in MODEL-DIST-002".

Artifacts preserved on disk for MODEL-DIST-002 pickup:
- adapters/tier1/adapters.safetensors (MLX, 514 MB)
- .local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.f16.gguf (30 GB, retained as the base for later llama-server --lora verification)

quant_manifest.json adapter block updated with status=deferred + reason.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…sh pending)

Authored 3 Ollama Modelfiles in packaging/ollama/:
- Modelfile.Q4_K_M: 9.0 GB, 12 GB min RAM, 16 GB recommended
- Modelfile.Q5_K_M: 11 GB, 14 GB min RAM, 24 GB recommended (production canonical)
- Modelfile.Q8_0: 16 GB, 20 GB min RAM, 32 GB recommended

Common shape:
- FROM ./../../.local-models/mdemg-llm-v1-gguf/...gguf relative path (operator-machine local)
- num_ctx 32768, num_predict 4096, stop tokens <|im_end|>/<|im_start|>
- Apache-2.0 LICENSE
- SYSTEM positioning block

There is no TEMPLATE directive: the chat template is baked into the GGUF metadata. Qwen3's chat_template.jinja is preserved through the mlx_lm.fuse --dequantize → convert_hf → llama-quantize pipeline.

packaging/ollama/README.md documents the publish workflow, including the fork-customization path. Operators publishing under a different namespace follow MDEMG_MODEL_NAMESPACE per the Configurability Contract.

Local ollama create completed for all 3:
- reh3376/mdemg-llm-v1:Q4_K_M ID 5c3a7252c295
- reh3376/mdemg-llm-v1:Q5_K_M ID 08c13b480864
- reh3376/mdemg-llm-v1:Q8_0 ID 6b1006facd36

Layers are de-duplicated: the config + params + system layers (3 layers) are identical across all 3 quants; only the model blob (GGUF) differs.

**ollama push deferred**: it is a one-way action gated on operator confirmation per Sprint Plan §10 Risk #8. The operator must claim the reh3376 namespace on ollama.com and generate an API token before the push proceeds. Local create proves the Modelfiles are well-formed; the push is a separate decision. Once pushed, manifest digests are captured into quant_manifest.json (ollama_manifest_digest field per quant) for mdemg model verify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…interface

Sprint MODEL-DIST-001 Epic 4: the bulk of the operator-facing surface.

New CLI subcommand group:
- mdemg model pull # fetch + symlink + SHA verify
- mdemg model list # show pulled models
- mdemg model verify # re-check SHAs vs quant manifest
- mdemg model remove # destructive (requires --yes)
- mdemg model where # print resolved path for shell scripting

Pluggable backend (internal/cli/model_fetcher.go):
- type Fetcher interface { Name, Fetch, Verify, Remove }
- NewFetcher dispatches on cfg.ModelBackend (env: MDEMG_MODEL_BACKEND)
- v1 ships OllamaFetcher only; future backends (hf, s3, github-release, file) plug in via a factory branch, with the CLI surface unchanged.

OllamaFetcher (internal/cli/model_fetcher_ollama.go) encapsulates ALL Ollama-specific concepts:
- `ollama pull` invocation
- manifest path under <OLLAMA_MODELS>/manifests/<OLLAMA_HOST>/<ns>/<n>/<tag>
- mediaType=application/vnd.ollama.image.model layer filtering
- blob path under <OLLAMA_MODELS>/blobs/sha256-<digest>
- symlink under <MDEMG_MODEL_DIR>, idempotent

Configurability Contract (no hardcoding; memory: feedback_no_hardcoded_values.md): 12 env vars + flag overrides, each with a v1-production-tuned default so `mdemg model pull` with no flags Just Works. See sprint plan §3. Live-verified all 3 resolution paths:
- `--quant Q5_K_M` → namespace=reh3376
- `--namespace acme --name custom-model` → namespace=acme name=custom
- `MDEMG_MODEL_NAMESPACE=acme` env → env overrides applied

Added to internal/config/config.go: ModelBackend, ModelNamespace, ModelName, ModelQuants, ModelRamTiers, ModelQuant, AdapterBase, ModelDir, OllamaModelsRoot, OllamaRegistryHost, ModelManifestPath.

Embedded quant manifest (internal/cli/quant_manifest.json via embed.FS): the runtime source of truth for SHA verification. Operator override via MDEMG_MODEL_MANIFEST_PATH for air-gapped deployments. Mirrors docs/development/model-dist-001/quant_manifest.json.
RAM-tier auto-pick: the default JSON `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}` maps host RAM (sysctl on darwin, /proc/meminfo on linux) to a quant. Operator override via MDEMG_MODEL_RAM_TIERS.

Adapter path (--adapter flag) returns ErrAdapterDeferred per Epic 2's contingency exit; adapter publication lands in MODEL-DIST-002. Flag machinery preserved for forward compatibility.

Tests (22, all green) in internal/cli/model_test.go:
- Backend factory dispatch (5 cases incl. case-insensitive, default, error)
- Quant allowlist parsing (5 cases incl. whitespace + empty entries)
- RAM-tier JSON parsing (default + operator override + malformed)
- PickQuantForRAM (7 boundary cases)
- ResolveQuant across paths (auto, explicit, rejection, operator-custom)
- QuantManifest load (embedded + file override + missing-file error)
- Ollama tag composition (fused + adapter forms)
- Manifest path composition under custom OLLAMA_MODELS/OLLAMA_HOST
- Blob path digest prefix handling
- Adapter deferred error
- Manifest JSON parser (mediaType filtering + malformed + no-model-layer)

Grep audit (verification checklist): grep on internal/cli/model*.go found hardcoded values only in help text (Long/example strings documenting defaults to operators), not in logic. Behavior values all flow through cfg.Model* fields.

Build + lint clean. Full cli test suite (61s wall) green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…+ writer

Sprint MODEL-DIST-001 Epic 5: observability for `mdemg model` operations. Grafana panels deferred to Sprint B (Grafana audit).

New migration: internal/tsdb/migrations/021_model_install_events.sql
- Hypertable on recorded_at, 7-day chunks, 3 indexes (quant-time, failed-events partial, backend-event-time).
- Columns: event_id CUIDv2 PK + recorded_at, event_type (pull/verify/remove), backend_name, namespace, model_name, quant, adapter bool, success bool, latency_ms, sha256, size_bytes, err_message (1 KB cap).

New writer: internal/tsdb/model_install_writer.go
- Synchronous single-row INSERT, not buffered + CopyFrom. The CLI is one-shot and writes are infrequent, unlike the V0017/V0018/V0019/V0020 retrieval-path writers that fire per request.
- Nil-pool no-op for degraded mode.
- errMessageMaxLen=1024 truncation at write time.
- A new modelInstallPool interface (Exec-shaped) avoids touching the existing CopyFrom-shaped poolIface used by the buffered writers.

Wiring: internal/cli/model.go gets a recordModelEvent(parent, cfg, row) helper:
- Returns immediately if !cfg.TSDBEnabled || cfg.TSDBHost==""
- 2s timeout on connect (an unreachable TSDB doesn't block CLI exit)
- Logs a warning and degrades gracefully on any TSDB error
It is called from runModelPull (success + failure paths), runModelVerify (single sweep row), and runModelRemove (success + failure paths).

Schema version bump: internal/config/config.go TSDB_REQUIRED_SCHEMA_VERSION default 20→21. The CI validator at .github/workflows/ci.yml:60-65 counts SQL files in internal/tsdb/migrations/ and asserts equality; now 21 files = 21 in config, so it passes.

Build + lint clean. Existing tsdb / cli test suites green. No new tests were added for the writer itself: the single INSERT mirrors the V0017/V0018/V0019 patterns already covered, and integration is operational verification at Epic 6 once tsdb is up in the dev stack.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sprint MODEL-DIST-001 Epic 7: operator-facing feature documentation following the standard Why / Choices / How / How-to-use shape (memory: feedback_per_feature_docs_required.md). Contents:
- Why: the gap between brew install and a working local LLM after Phase 13.5
- Choices: backend matrix (Ollama vs HF vs GitHub vs S3 vs file://), artifact form (fused vs adapter), Apple Silicon scope, "Ollama runtime rejected (broken on M5+macOS 26.3.x), Ollama distribution only"
- How it works: ASCII flow diagram covering CLI dispatch -> Fetcher interface -> OllamaFetcher (preflight, ollama pull, manifest discovery, blob resolve, symlink, SHA verify) -> V0021 observability row
- How to use:
  * Quick start (3 commands: brew install ollama, mdemg model pull, curl /v1/models)
  * Explicit quant selection
  * Managing pulled models (list / verify / where / remove)
  * Forks + enterprise (MDEMG_MODEL_NAMESPACE override)
  * Air-gapped (MDEMG_MODEL_MANIFEST_PATH override)
  * Resource matrix per quant (disk, min RAM, recommended RAM, BPW)
  * Full Configurability Contract table (11 env vars + flags + defaults)
  * V0021 observability schema
- Troubleshooting: ollama missing, SHA mismatch, quant allowlist rejection, RAM auto-detection failure, out-of-disk, symlink permission
- Forward-looking: MODEL-DIST-002 adapter, Sprint B Grafana panels, future backends, cross-platform
- References: all source-of-truth files cross-linked

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sprint MODEL-DIST-001 Epic 8: the final epic, never cut (memory: feedback_sequential_epics.md). This commit lands the main-repo doc updates. The packaging/homebrew-mdemg/ submodule docs (README, CHANGELOG, formula caveats text) are updated at v0.10.0 release-tag time, per the v0.9.0 release-flow precedent. That's when goreleaser auto-regenerates mdemg.rb from .goreleaser.yaml's caveats template, and the tap-side README/CHANGELOG get edited in lockstep.

Changes:
- CHANGELOG.md: comprehensive Unreleased entry documenting Epics 0-5 + 7 landed in this sprint. The Epic 3 ollama push and the Epic 6 Tier 3 e2e are marked as gated on operator confirmation. The adapter path is explicitly deferred to MODEL-DIST-002, with an epic_2_forensic.md cross-reference. The entry captures the Configurability Contract enumeration, the 3 quant SHAs, the Fetcher interface design, the V0021 hypertable, and the explicit out-of-scope list.
- CLAUDE.md: new "Model Distribution (Sprint MODEL-DIST-001)" subsection in Architecture Notes, slotted ABOVE the existing Compose embed entry for visibility. It captures the pluggable-backend design, the Ollama-as-distribution-only constraint, the on-disk symlink + manifest discovery flow, the 11-knob Configurability Contract surface, the no-hardcoding enforcement, the TSDB V0021 hookup, and the Apple Silicon v1 scope.
- README.md: new "Step 2b (optional): Pull the local LLM" section between Step 2 (Initialize/Start) and Open the Dashboard. It has a 3-command quick start (brew install ollama -> mdemg model pull -> set MDEMG_MODEL_PATH) and cross-references the feature doc for the full Configurability Contract.
- .goreleaser.yaml: caveats template updated to include `mdemg model pull` instructions. Goreleaser regenerates the homebrew formula's caveats block from this on the next v* tag push, so v0.10.0 will ship the new text to brew users automatically.
Deferred to v0.10.0 release-tag time (handled per the v0.9.0 precedent):
- packaging/homebrew-mdemg/README.md update
- packaging/homebrew-mdemg/CHANGELOG.md update
- packaging/homebrew-mdemg/mdemg.rb regeneration (automatic via goreleaser from the .goreleaser.yaml change in this commit)
- Submodule pointer bump in main repo

Deferred to Epic 6 close (after the operator does the ollama push):
- post.md sprint-close document
- Capture of remote Ollama manifest digests into quant_manifest.json

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

All 3 fused quants are now live on Ollama Library:
- https://ollama.com/reh3376/mdemg-llm-v1:Q4_K_M
- https://ollama.com/reh3376/mdemg-llm-v1:Q5_K_M
- https://ollama.com/reh3376/mdemg-llm-v1:Q8_0

End-to-end integrity verified: remote model-layer digests captured via GET https://registry.ollama.ai/v2/reh3376/mdemg-llm-v1/manifests/<quant> match the local Epic 1 SHAs exactly:
- Q4_K_M 401161710c22f0ae...411d42ea (matches Epic 1)
- Q5_K_M 144ad723101d688f...d5f5d54 (matches Epic 1)
- Q8_0 fc14dcb40af1bb58...8db6089 (matches Epic 1)

Captured into quant_manifest.json (both the docs canonical copy and the internal/cli embed.FS mirror, byte-synced):
- ollama_manifest_digest per quant (computed from the manifest body):
  * Q4_K_M sha256:a210cccb12311773fd70bfa81f221ca0f7940a315bef87b84608caf894533b1b
  * Q5_K_M sha256:ae6e54fe1ee0b487ae41260687ed14c46c30d1ffb0fece936282418b5bcb78e1
  * Q8_0 sha256:93df4d64bfa751506f7afba8bf08b891ea828575b838adec17b9399ad85be718
- Corrected size_bytes (Epic 1 used approximate values; replaced with the registry-reported exact bytes for each tag):
  * Q4_K_M 9.0 GB -> 8.4 GB (9001753408 B; was 9658404096)
  * Q5_K_M 11 GB -> 9.8 GB (10514569568 B; was 11811160064)
  * Q8_0 16 GB -> 14.6 GB (15698534208 B; was 17179869184)
- Status flipped from "local-create done; push pending" to "published".

The embedded runtime manifest (internal/cli/quant_manifest.json) is rebuilt into the binary via embed.FS. TestLoadQuantManifest_EmbeddedFallback is green with the new values.

Epic 3 of Sprint MODEL-DIST-001 is now COMPLETE. Epic 6 is now unblocked: Tier 3 live e2e, i.e. `mdemg model pull` against the published tags, llama-server load on port 18102, and sanity inference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sprint MODEL-DIST-001 close-out per memory rule (feedback_sprint_plan_format.md §11: sprint plans live in docs/development/<sprint-line>/ with the standard post.md companion). Sections (per the CLAUDE.md sprint-plan section guidance):
- Outcome: 3 quants live on Ollama Library; mdemg model pull is the canonical install path
- Process: how the plan held up under reality. The operator-surfaced no-hardcoding rule revised the plan in place to add the Configurability Contract before code was written.
- Findings: 5 smooth parts + 5 friction items, both honest:
  * convert_hf_to_gguf.py python deps gap (silent ModuleNotFoundError)
  * mlx_lm.fuse adapter-path requirement
  * convert_lora_to_gguf.py missing from brew install llama.cpp (proximate Epic 2 deferral trigger)
  * mdemg tsdb migrate CWD-aware .env loader quirk
  * Epic 1 size estimates off vs registry-reported exact bytes
- Current state: per-layer state matrix
- Testing & benchmarking: all 3 tiers documented (Tier 3 e2e captured V0021 rows for both pull + verify event_types, live-verified)
- Risks & opportunities (forward): MODEL-DIST-002 adapter scope, Sprint B Grafana, cross-platform, HFFetcher slot, CWD-aware .env loader QoL
- Sprint commits: 9 commits on dev01, mapped to their epics

Closes Sprint MODEL-DIST-001 functionally. The operational sprint close (v0.10.0 release tag + tap-repo doc updates) is a separate motion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…shed main)

PR #385 squash-merged the original Epic 3 quant_manifest values into main as commit f1d029a. Those values had estimated sizes from llama-quantize wall output and a null ollama_manifest_digest, because the push hadn't happened yet. Meanwhile on dev01, commit 87293f8 (Epic 3 closeout) corrected those values to the registry-canonical state after the ollama push completed:
- size_bytes: replaced Epic 1 approximations with registry-reported exact bytes (Q4_K_M 9001753408 / Q5_K_M 10514569568 / Q8_0 15698534208)
- size_human: 9.0/11/16 GB -> 8.4/9.8/14.6 GB (more accurate)
- ollama_manifest_digest: null -> sha256:a210cccb...|ae6e54fe...|93df4d64...
- status: "local-create done; push pending" -> "published (...)"

Conflict resolution: keep the dev01 (HEAD) values for both files, since those are the registry-canonical post-push state. JSON validity verified for both files. TestLoadQuantManifest_{EmbeddedFallback,OperatorOverride,OverrideMissingFile} are all green against the resolved embedded manifest. The non-conflicting fast-forwarded changes from main (claude workflow edits + dependabot go.mod/go.sum bumps) are folded in by this merge unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Promote the Sprint MODEL-DIST-001 entry from Unreleased to v0.10.0 (2026-05-11) ahead of the release.yml / goreleaser tag push. A fresh, empty Unreleased section is seeded above.

v0.10.0 ships:
- mdemg model pull|list|verify|remove|where: a one-command path from brew install mdemg to a working local LLM
- Pluggable ModelFetcher interface (Ollama in v1, slots for HF/S3/GHR/file)
- 3 fused GGUF quants live on Ollama Library at reh3376/mdemg-llm-v1 (:Q4_K_M 8.4 GB / :Q5_K_M 9.8 GB / :Q8_0 14.6 GB)
- 11-knob Configurability Contract (every operator-visible value dynamic)
- TSDB V0021 model_install_events hypertable + writer
- docs/features/local-model-distribution.md

The adapter (LoRA-only) path is deferred to MODEL-DIST-002 per the sprint plan's documented contingency (epic_2_forensic.md).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rhenley1958 and others added 17 commits May 6, 2026 12:04

Merge remote-tracking branch 'origin/main' into reh3376_dev01

e738208

github-actions Bot requested a review from reh3376 as a code owner May 11, 2026 16:34

reh3376 merged commit a83f87f into main May 11, 2026
6 checks passed

reh3376 merged 17 commits into main from reh3376_dev01


2 participants
