dev: reh3376_dev01 -> main #387

Merged
reh3376 merged 17 commits into main from reh3376_dev01
May 11, 2026

Conversation

@github-actions

Contributor

Summary

Development branch changes from reh3376_dev01.

Commits

  • docs(release): promote Unreleased -> v0.10.0
  • merge: resolve quant_manifest.json conflicts (Epic 3 closeout vs squashed main)
  • docs(model-dist-001): sprint close — post.md
  • feat(model-dist-001): Epic 3 closeout — Ollama Library push complete
  • docs(model-dist-001): Epic 8 — Documentation Update (main repo)
  • docs(model-dist-001): Epic 7 — local-model-distribution feature doc
  • feat(model-dist-001): Epic 5 — V0021 model_install_events hypertable + writer
  • feat(model-dist-001): Epic 4 — mdemg model CLI + pluggable Fetcher interface
  • feat(model-dist-001): Epic 3 — 3 Modelfiles + local ollama create (push pending)
  • docs(model-dist-001): Epic 2 — defer adapter to MODEL-DIST-002
  • feat(model-dist-001): Epic 1 — built Q4_K_M + Q8_0 fused GGUFs
  • docs(sprint): MODEL-DIST-001 sprint plan + quant manifest skeleton
  • fix(service): replace decommissioned mlx-server LaunchAgent with llama-server
  • fix(api): /healthz returns build-time version, not stale literal "0.6.0"
  • chore(submodule): bump homebrew-mdemg to v0.9.0 formula + docs
  • Merge remote-tracking branch 'origin/main' into reh3376_dev01
  • docs(release): promote Unreleased -> v0.9.0

Auto-generated PR from reh3376_dev01 push

rhenley1958 and others added 17 commits May 6, 2026 12:04
Promote the Unreleased CHANGELOG block to v0.9.0 (2026-05-06) ahead of
release.yml / goreleaser tag push.

New ### Breaking subsection captures two operator-visible cutovers since
v0.8.5: (1) Phase 13.5 LLM runtime port 8101 -> 8102 + .env migration
required; (2) Phase 13.6 MLX_* -> LLM_* env-var rename (legacy aliases
retained for >= 1 release cycle).

New ### Added entries: Phase 10.5 closeout (UBENCH framework promotion,
commit 0389b49) and Claude Code GitHub App workflows (PRs #378, #379).

All previously-Unreleased entries (Phase 14.2.3, 14.2.x, 14.1.x, 14, 13.6,
13.5, 13.2, 13.1) carried forward unchanged into the v0.9.0 block. Fresh
empty Unreleased section seeded above.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bumps packaging/homebrew-mdemg pointer a235977 -> 6077097, which incorporates:
- f9358cd  Brew formula update for mdemg version v0.8.5 (goreleaser, prior)
- b4a0d2c  Brew formula update for mdemg version v0.9.0 (goreleaser, this release)
- 6077097  docs: v0.9.0 -- CHANGELOG, README What's New, beta-testing version pin

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`config.FromEnv()` defaulted MdemgVersion/MdemgCommit to literal "0.6.0"/
"unknown" when MDEMG_VERSION/MDEMG_COMMIT envs were unset. Both /healthz
and /readyz serialize cfg.MdemgVersion, so they reported "0.6.0" forever
regardless of the actual binary's ldflags-injected cli.Version.

Fix: defaults to "" in config; cli/config_loader.go injects cli.Version /
cli.Commit (the build-time vars set by goreleaser ldflags) when the env
override is unset. Operators can still pin via MDEMG_VERSION env.
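
The fallback order described in this fix can be sketched roughly as follows. This is an illustration only, not the code in cli/config_loader.go: `resolveVersion` and the package-level `Version` variable are hypothetical names, and the `dev` default is an assumption taken from the dev-build output quoted in this commit message.

```go
package main

import (
	"fmt"
	"os"
)

// Version stands in for a build-time variable that a release build would
// overwrite with -ldflags "-X main.Version=<tag>". The "dev" default is an
// assumption for this sketch.
var Version = "dev"

// resolveVersion prefers an explicit operator override and otherwise falls
// back to the build-time value, so no hardcoded literal is ever reported.
func resolveVersion(envOverride, buildTime string) string {
	if envOverride != "" {
		return envOverride
	}
	return buildTime
}

func main() {
	v := resolveVersion(os.Getenv("MDEMG_VERSION"), Version)
	fmt.Printf("{\"version\":%q}\n", v)
}
```

Keeping the config default empty is what lets the loader tell "operator did not pin a version" apart from "operator pinned a version".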

Live-verified: a dev build (no ldflags) now reports {"version":"dev"} on
/healthz instead of the stale "0.6.0". Production builds via goreleaser
will report the real semver tag.

TestHandleHealthz unaffected (sets cfg.MdemgVersion directly).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…a-server

Phase 13.5 cutover (2026-05-03) replaced mlx_lm.server (port 8101) with
llama.cpp llama-server (port 8102) as the production LLM runtime, but the
embedded launchd plist template + service install code paths were never
updated. Any operator running 'mdemg service install' from a fresh checkout
got the decommissioned mlx_lm.server agent — mdemg's startup preflight then
failed because LLM_ENDPOINT=http://127.0.0.1:8102/v1 wasn't reachable.

Changes:
- New packaging/launchd/com.mdemg.llama-server.plist with the Phase 13.5
  production flags (--ctx-size 32768 --parallel 4 --cont-batching --metrics
  --jinja). Byte-identical mirror at internal/cli/launchd_templates/ for
  the embed.FS (CI sync-check enforced).
- Removed packaging/launchd/com.mdemg.mlx-server.plist + embed.FS mirror.
  mlx_lm.server is decommissioned and known-broken on M5 + macOS 26.3.x;
  keeping the template would just risk re-deploying it.
- internal/cli/service_darwin.go: launchdServices entry replaced with
  com.mdemg.llama-server. resolveMLXLMBin renamed to resolveLlamaServerBin
  with primary env MDEMG_LLAMA_SERVER_BIN, deprecation alias for
  MDEMG_MLX_LM_BIN (slog.Warn at boot, retained ≥1 release cycle per the
  Phase 13.6 deprecation pattern), PATH lookup of `llama-server`.
  resolveMDEMGModelPath default updated to the canonical Phase 13.5 GGUF
  filepath (.local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.Q5_K_M.gguf) since
  llama-server takes a `.gguf` filepath, not an HF-format directory like
  mlx_lm.server. Install error message updated for the new env var name +
  remediation steps (`brew install llama.cpp`).
- migrateLegacyMLXServerPlist() added: if a pre-cutover com.mdemg.mlx-server
  plist is bootstrapped on the operator's machine, Install() boots it out
  and renames the file to .disabled-phase13_5 (matches the manual operator
  convention from Phase 13.5 rollout). Best-effort: failures don't block
  the install.
- internal/cli/service_darwin_test.go fully rewritten:
    * TestLaunchdServicesIncludesLlamaServer asserts the new entry exists
      and is Optional=false (production matches Hotfix 11.6.3.1; the old
      test asserted Optional=true, a latent lie since 2026-05-02 that
      Linux CI never caught because of //go:build darwin)
    * TestLlamaServerPlistEmbedded replaces TestMLXServerPlistEmbedded;
      additionally asserts mlx-server.plist is NOT in embed.FS
    * Two resolver tests for the primary env var
    * New TestResolveLlamaServerBinFallsBackToMLXAlias proves the
      Phase 13.6 deprecation alias path works
    * resolveMDEMGModelPath tests updated for the new GGUF default
- internal/cli/watchdog.go: help text references com.mdemg.llama-server
  (instead of com.mdemg.mlx-server) and llama-server (instead of
  mlx_lm.server). Notes that mdemg_mlx_health_state metric name is
  retained for dashboard compatibility.
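
The lookup order described for resolveLlamaServerBin (primary env var, then the deprecated alias with a warning, then PATH) might look something like the sketch below. The function name `resolveBin`, the injected `getenv`/`lookPath` parameters, and the error text are assumptions made for illustration; only the env var names and the `llama-server` binary name come from this commit.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
	"os/exec"
)

// errNotFound is a hypothetical sentinel for "no binary could be resolved".
var errNotFound = errors.New("llama-server not found")

// resolveBin checks the primary env var, then the deprecated alias (logging a
// warning), then falls back to a PATH lookup. getenv and lookPath are injected
// so the precedence can be exercised without touching the real environment.
func resolveBin(getenv func(string) string, lookPath func(string) (string, error)) (string, error) {
	if p := getenv("MDEMG_LLAMA_SERVER_BIN"); p != "" {
		return p, nil
	}
	if p := getenv("MDEMG_MLX_LM_BIN"); p != "" {
		slog.Warn("MDEMG_MLX_LM_BIN is deprecated; use MDEMG_LLAMA_SERVER_BIN")
		return p, nil
	}
	p, err := lookPath("llama-server")
	if err != nil {
		return "", fmt.Errorf("%w: set MDEMG_LLAMA_SERVER_BIN or run `brew install llama.cpp`", errNotFound)
	}
	return p, nil
}

func main() {
	p, err := resolveBin(os.Getenv, exec.LookPath)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("llama-server binary:", p)
}
```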

Tested:
- Tier 1 unit: 7/7 new tests pass; full ./internal/cli/... suite green
  (61s wall-clock).
- Tier 2 integration: golangci-lint run ./internal/cli/ — 0 issues.
  CI plist sync-check (diff -q packaging/launchd/*.plist
  internal/cli/launchd_templates/) — 6/6 byte-identical.
- Tier 3 live e2e: deferred. Running mdemg service install on the
  operator's currently-serving machine would briefly bootout the running
  llama-server LaunchAgent (PID 20527 actively serving production
  inference). The hand-installed llama-server plist on the operator's
  machine is byte-equivalent (modulo template substitutions) to what
  this commit will install via `mdemg service install` on a fresh
  operator setup, so the operator can verify on next planned redeploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Epic 0 of Sprint MODEL-DIST-001: Local LoRA Distribution via Ollama Library.

Sprint plan in 12-section v1.0 format. Supersedes parts of the speculative spec
at docs/research/mdemg_sprint_ideas/MDEMG_FT_LORA_PACKAGING_SPEC.md (HF Hub vs
Ollama Library; adapter-only vs both-fused-and-adapter; Apple Silicon scope vs
cross-platform).

Configurability Contract — every operator-visible value is dynamic per the
framework's no-hardcoding rule. 12 env vars + flag overrides + sensible
defaults. ModelFetcher interface decouples CLI from Ollama-specific knowledge;
v1 ships OllamaFetcher only, future backends (HF / S3 / GitHub Release / file)
plug in via factory dispatch on MDEMG_MODEL_BACKEND without touching the CLI
surface.

Forensic from Epic 0:
- adapters/tier1/adapters.safetensors verified present (514 MB MLX, Phase 5
  SFT Iter 2400 best output)
- mdemg-llm-v1.Q5_K_M.gguf SHA256 captured (9.8 GB; 144ad7231...)
- f16 GGUF intermediate NOT on disk; Epic 1 will regenerate via
  convert_hf_to_gguf.py from the MLX merged model (~5 min)
- qwen3:14b model-layer digest captured from Ollama registry; manifest digest
  to be computed at Epic 3 for Modelfile FROM @sha256: pinning

quant_manifest.json skeleton with Q5_K_M SHA pre-populated; Q4_K_M / Q8_0 /
adapter SHAs filled in during Epics 1+2.

Estimated effort 5–7 dev-days. OpenAI spend $0. Risk medium (Ollama publish
one-way; MLX→PEFT→GGUF LoRA conversion is the riskiest engineering item with
documented contingency to defer to MODEL-DIST-002 if blocked).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Pipeline (CLAUDE.md Phase 13.5 documented path):
  1. mlx_lm.fuse --dequantize: mlx-community/Qwen3-14B-4bit + adapters/tier1/
     -> 29.6 GB bf16 HF safetensors at .local-models/qwen3-14b-mdemg-v1-bf16/
  2. convert_hf_to_gguf.py --outtype f16 -> 30 GB f16 GGUF (required
     neural/.venv interpreter with torch + transformers + gguf installed;
     /opt/homebrew/bin/convert_hf_to_gguf.py uses system python which lacks
     these — installed gguf/sentencepiece/protobuf into neural/.venv)
  3. llama-quantize Q4_K_M -> 9.0 GB (4.87 BPW; 40s wall on M5)
  4. llama-quantize Q8_0 -> 16 GB (8.50 BPW; 11s wall on M5)
  5. Live smoke per new quant via llama-server on port 18102 — both serve
     /v1/models cleanly with embedded chat_template

SHAs captured in quant_manifest.json:
  Q4_K_M: 401161710c22f0ae...411d42ea
  Q5_K_M: 144ad723101d688f...d5f5d54 (matches Epic 0 baseline)
  Q8_0:   fc14dcb40af1bb58...8db6089
  f16:    436cd6f41a684805...3217bd (intermediate, retained for Epic 2)

Resource matrix updated with empirical sizes (Q4_K_M is 9.0 GB vs estimated
6.5 GB; min RAM revised 8 -> 12 GB to cover ~3 GB working memory above
weights). 14B params x 4.87 BPW ≈ 8.5 GB matches the formula.
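
The size formula referenced above is parameters x bits-per-weight / 8. A small sketch of that arithmetic, using the rounded 14B parameter count and the BPW figures from this commit (`estimateGB` is a hypothetical helper name, and the result is in decimal GB):

```go
package main

import "fmt"

// estimateGB returns params * bitsPerWeight / 8 bytes, expressed in
// decimal gigabytes (1 GB = 1e9 bytes).
func estimateGB(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8 / 1e9
}

func main() {
	const params = 14e9 // rounded parameter count, as used in the commit message
	fmt.Printf("Q4_K_M estimate: %.1f GB\n", estimateGB(params, 4.87))
	fmt.Printf("Q8_0 estimate:   %.1f GB\n", estimateGB(params, 8.50))
}
```

Because the parameter count is rounded down to 14B, the estimate lands somewhat below the measured file size.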

GGUF binary artifacts stay local — .local-models/ gitignored per
.gitignore:70. Sprint deliverable in git is just the manifest update.

Production llama-server (PID 20527 on port 8102) undisturbed throughout
Epic 1; live smokes used port 18102.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adapter (LoRA-only Modelfile via ADAPTER directive) deferred per the sprint
plan's documented contingency clause. Fused-only path (Epics 1, 3, 4, 5)
continues — that's the primary operator value.

Forensic findings (epic_2_forensic.md):
- MLX adapter is well-formed: 560 tensors, 40 layers x 7 target_modules,
  rank 32, alpha 64, scale 20.0.
- convert_lora_to_gguf.py is NOT in brew install llama.cpp; would need
  manual fetch from llama.cpp source.
- MLX -> PEFT requires tensor transposition: MLX lora_a is (in, rank);
  PEFT expects (rank, in). Same for lora_b.
- Estimated 80-95 min to complete vs ~30 min budget remaining for Epic 2.
- Hit the contingency criterion: "MLX -> PEFT conversion blocked by
  tooling gaps."

Decision: defer adapter scope to MODEL-DIST-002 (new follow-up sprint, to
be planned separately). Fused-only ships this sprint.

Knock-on changes (in-flight to subsequent epics):
- Epic 3: drop Modelfile.adapter; publish only 3 fused quants.
- Epic 4 CLI: --adapter flag accepted at parse-time but errors with
  "lands in MODEL-DIST-002"; machinery preserved for forward-compat.
- Epic 6 e2e: drop adapter-pull step.
- Epic 7 feature doc: adapter section notes "coming in MODEL-DIST-002".

Artifacts preserved on disk for MODEL-DIST-002 pickup:
- adapters/tier1/adapters.safetensors (MLX, 514 MB)
- .local-models/mdemg-llm-v1-gguf/mdemg-llm-v1.f16.gguf (30 GB,
  retained as base for llama-server --lora verification later)

quant_manifest.json adapter block updated with status=deferred + reason.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…sh pending)

Authored 3 Ollama Modelfiles in packaging/ollama/:
  Modelfile.Q4_K_M — 9.0 GB, 12 GB min RAM, 16 GB recommended
  Modelfile.Q5_K_M — 11 GB, 14 GB min RAM, 24 GB recommended (production canonical)
  Modelfile.Q8_0   — 16 GB, 20 GB min RAM, 32 GB recommended

Common shape: FROM ./../../.local-models/mdemg-llm-v1-gguf/...gguf relative
path (operator-machine local); num_ctx 32768, num_predict 4096, stop tokens
<|im_end|>/<|im_start|>; Apache-2.0 LICENSE; SYSTEM positioning block.
No TEMPLATE directive — chat template baked into GGUF metadata (Qwen3
chat_template.jinja preserved through mlx_lm.fuse --dequantize → convert_hf
→ llama-quantize pipeline).

packaging/ollama/README.md documents the publish workflow including the
fork-customization path (operators publishing under a different namespace
follow MDEMG_MODEL_NAMESPACE per the Configurability Contract).

Local ollama create completed for all 3:
  reh3376/mdemg-llm-v1:Q4_K_M  ID 5c3a7252c295
  reh3376/mdemg-llm-v1:Q5_K_M  ID 08c13b480864
  reh3376/mdemg-llm-v1:Q8_0    ID 6b1006facd36

Layers de-duplicated: config + params + system layers (3 layers) are
identical across all 3 quants; only the model blob (GGUF) differs.

** ollama push deferred ** — one-way action gated on operator confirmation
per Sprint Plan §10 Risk #8. Operator must claim reh3376 namespace on
ollama.com and generate API token before push proceeds. Local-create proves
the Modelfiles are well-formed; push is a separate decision.

Once pushed, manifest digests will be captured into quant_manifest.json
(ollama_manifest_digest field per quant) for mdemg model verify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…interface

Sprint MODEL-DIST-001 Epic 4 — the bulk of the operator-facing surface.

New CLI subcommand group:
  mdemg model pull       # fetch + symlink + SHA verify
  mdemg model list       # show pulled models
  mdemg model verify     # re-check SHAs vs quant manifest
  mdemg model remove     # destructive (requires --yes)
  mdemg model where      # print resolved path for shell scripting
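
The SHA check behind `pull` and `verify` amounts to streaming a file through SHA-256 and comparing the hex digest with the manifest entry. A minimal sketch, assuming hypothetical helper names (`hashReader`, `hashBytes`, `matches`) and assuming the manifest may store digests with or without a `sha256:` prefix:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// hashReader streams r through SHA-256 so multi-gigabyte files never need to
// be held in memory at once.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(b []byte) string {
	sum, _ := hashReader(bytes.NewReader(b)) // bytes.Reader does not fail on read
	return sum
}

// matches compares a computed hex digest with an expected one, ignoring case
// and an optional "sha256:" prefix on the expected value.
func matches(got, want string) bool {
	want = strings.TrimPrefix(strings.ToLower(want), "sha256:")
	return strings.ToLower(got) == want
}

func main() {
	got := hashBytes([]byte("abc"))
	fmt.Println(got, matches(got, "sha256:"+got))
}
```

For a real file, `hashReader` would be given the handle returned by `os.Open`.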

Pluggable backend (internal/cli/model_fetcher.go):
  type Fetcher interface { Name, Fetch, Verify, Remove }
  NewFetcher dispatches on cfg.ModelBackend (env: MDEMG_MODEL_BACKEND)
  v1 ships OllamaFetcher only; future backends (hf, s3, github-release,
  file) plug in via factory branch — CLI surface unchanged.
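
One way the interface and factory described above could be shaped is sketched below. The four method names and the `MDEMG_MODEL_BACKEND` dispatch come from this commit; the method signatures, the stub `ollamaFetcher` type, and the treatment of an empty backend value as the default are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// errNotImplemented marks the stubbed methods in this sketch.
var errNotImplemented = errors.New("not implemented in this sketch")

// Fetcher uses the four method names from the commit message; the
// signatures are assumed.
type Fetcher interface {
	Name() string
	Fetch(ctx context.Context, quant string) (path string, err error)
	Verify(ctx context.Context, quant string) error
	Remove(ctx context.Context, quant string) error
}

// ollamaFetcher is a stub standing in for the real Ollama backend.
type ollamaFetcher struct{}

func (ollamaFetcher) Name() string { return "ollama" }
func (ollamaFetcher) Fetch(context.Context, string) (string, error) {
	return "", errNotImplemented
}
func (ollamaFetcher) Verify(context.Context, string) error { return errNotImplemented }
func (ollamaFetcher) Remove(context.Context, string) error { return errNotImplemented }

// NewFetcher dispatches on the backend name, case-insensitively. A new
// backend would add one case here without changing any caller.
func NewFetcher(backend string) (Fetcher, error) {
	switch strings.ToLower(strings.TrimSpace(backend)) {
	case "", "ollama":
		return ollamaFetcher{}, nil
	default:
		return nil, fmt.Errorf("unsupported model backend %q", backend)
	}
}

func main() {
	f, err := NewFetcher("Ollama")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backend:", f.Name())
}
```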

OllamaFetcher (internal/cli/model_fetcher_ollama.go):
  Encapsulates ALL Ollama-specific concepts: `ollama pull` invocation,
  manifest path under <OLLAMA_MODELS>/manifests/<OLLAMA_HOST>/<ns>/<n>/<tag>,
  mediaType=application/vnd.ollama.image.model layer filtering,
  blob path under <OLLAMA_MODELS>/blobs/sha256-<digest>, symlink under
  <MDEMG_MODEL_DIR>, idempotent.
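
The two on-disk layouts named above can be composed with plain path joins. In this sketch `manifestPath` and `blobPath` are hypothetical helpers; accepting the digest both with and without a `sha256:` prefix is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// manifestPath composes <root>/manifests/<host>/<namespace>/<name>/<tag>.
func manifestPath(modelsRoot, host, namespace, name, tag string) string {
	return filepath.Join(modelsRoot, "manifests", host, namespace, name, tag)
}

// blobPath composes <root>/blobs/sha256-<digest>, accepting the digest with
// or without a leading "sha256:".
func blobPath(modelsRoot, digest string) string {
	hexDigest := strings.TrimPrefix(digest, "sha256:")
	return filepath.Join(modelsRoot, "blobs", "sha256-"+hexDigest)
}

func main() {
	root := "/tmp/ollama-models" // placeholder root for the example
	fmt.Println(manifestPath(root, "registry.ollama.ai", "reh3376", "mdemg-llm-v1", "Q5_K_M"))
	fmt.Println(blobPath(root, "sha256:0123abcd"))
}
```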

Configurability Contract (no hardcoding; memory: feedback_no_hardcoded_values.md):
  12 env vars + flag overrides, each with v1-production-tuned defaults so
  `mdemg model pull` with no flags Just Works. See sprint plan §3.
  Live-verified all 3 resolution paths:
    `--quant Q5_K_M`                          → namespace=reh3376
    `--namespace acme --name custom-model`    → namespace=acme name=custom-model
    `MDEMG_MODEL_NAMESPACE=acme` (env)        → env overrides applied
  Added to internal/config/config.go: ModelBackend, ModelNamespace,
  ModelName, ModelQuants, ModelRamTiers, ModelQuant, AdapterBase,
  ModelDir, OllamaModelsRoot, OllamaRegistryHost, ModelManifestPath.

Embedded quant manifest (internal/cli/quant_manifest.json via embed.FS):
  Runtime source-of-truth for SHA verification. Operator override via
  MDEMG_MODEL_MANIFEST_PATH for air-gapped deployments. Mirrors
  docs/development/model-dist-001/quant_manifest.json.

RAM-tier auto-pick:
  Default JSON `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}` maps
  host RAM (sysctl on darwin, /proc/meminfo on linux) to quant. Operator
  override via MDEMG_MODEL_RAM_TIERS.
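
A possible reading of the tier map is sketched below: each `<N` key is treated as a strict upper bound on host RAM, bounds are checked in ascending order, and `default` applies when none match. The unit (GB), the strict less-than comparison, and the name `pickQuant` are assumptions; the commit does not spell out the exact semantics.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pickQuant parses a tier map such as {"<16":"Q4_K_M","default":"Q8_0"} and
// returns the quant of the smallest bound that ramGB falls strictly below.
func pickQuant(tiersJSON string, ramGB int) (string, error) {
	var tiers map[string]string
	if err := json.Unmarshal([]byte(tiersJSON), &tiers); err != nil {
		return "", fmt.Errorf("parse RAM tiers: %w", err)
	}
	def, ok := tiers["default"]
	if !ok {
		return "", errors.New("RAM tiers: missing \"default\" entry")
	}
	type tier struct {
		limit int
		quant string
	}
	var bounds []tier
	for key, quant := range tiers {
		if key == "default" {
			continue
		}
		if !strings.HasPrefix(key, "<") {
			return "", fmt.Errorf("RAM tiers: unsupported key %q", key)
		}
		n, err := strconv.Atoi(strings.TrimPrefix(key, "<"))
		if err != nil {
			return "", fmt.Errorf("RAM tiers: bad bound %q", key)
		}
		bounds = append(bounds, tier{limit: n, quant: quant})
	}
	// Map iteration order is random, so sort before scanning.
	sort.Slice(bounds, func(i, j int) bool { return bounds[i].limit < bounds[j].limit })
	for _, b := range bounds {
		if ramGB < b.limit {
			return b.quant, nil
		}
	}
	return def, nil
}

func main() {
	const tiers = `{"<16":"Q4_K_M","<24":"Q5_K_M","default":"Q8_0"}`
	for _, ram := range []int{8, 16, 32} {
		q, err := pickQuant(tiers, ram)
		fmt.Println(ram, q, err)
	}
}
```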

Adapter path (--adapter flag) returns ErrAdapterDeferred per Epic 2's
contingency exit — adapter publication lands in MODEL-DIST-002. Flag
machinery preserved for forward compatibility.

Tests (22, all green) in internal/cli/model_test.go:
- Backend factory dispatch (5 cases incl. case-insensitive, default, error)
- Quant allowlist parsing (5 cases incl. whitespace + empty entries)
- RAM-tier JSON parsing (default + operator override + malformed)
- PickQuantForRAM (7 boundary cases)
- ResolveQuant across paths (auto, explicit, rejection, operator-custom)
- QuantManifest load (embedded + file override + missing-file error)
- Ollama tag composition (fused + adapter forms)
- Manifest path composition under custom OLLAMA_MODELS/OLLAMA_HOST
- Blob path digest prefix handling
- Adapter deferred error
- Manifest JSON parser (mediaType filtering + malformed + no-model-layer)
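
The mediaType filtering covered by the manifest parser tests can be illustrated as follows. The `layers` / `mediaType` / `digest` field names follow the usual registry manifest layout and are an assumption about what the real parser reads; `modelLayerDigest` and `errNoModelLayer` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelMediaType is the layer type quoted in the commit message.
const modelMediaType = "application/vnd.ollama.image.model"

// errNoModelLayer is a hypothetical sentinel for manifests without a model layer.
var errNoModelLayer = errors.New("manifest has no model layer")

// manifest holds only the fields this sketch needs.
type manifest struct {
	Layers []struct {
		MediaType string `json:"mediaType"`
		Digest    string `json:"digest"`
	} `json:"layers"`
}

// modelLayerDigest returns the digest of the first layer whose mediaType is
// the model type, skipping config, params, and other layers.
func modelLayerDigest(body []byte) (string, error) {
	var m manifest
	if err := json.Unmarshal(body, &m); err != nil {
		return "", fmt.Errorf("parse manifest: %w", err)
	}
	for _, layer := range m.Layers {
		if layer.MediaType == modelMediaType {
			return layer.Digest, nil
		}
	}
	return "", errNoModelLayer
}

func main() {
	body := []byte(`{"layers":[
		{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:aaa"},
		{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:bbb"}]}`)
	fmt.Println(modelLayerDigest(body))
}
```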

Grep audit (verification checklist):
  grep on internal/cli/model*.go for hardcoded values found only in help
  text Long/example strings documenting defaults to operators — not in
  logic. Behavior values all flow through cfg.Model* fields.

Build + lint clean. Full cli test suite (61s wall) green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…+ writer

Sprint MODEL-DIST-001 Epic 5 — observability for `mdemg model` operations.
Grafana panels deferred to Sprint B (Grafana audit).

New migration:
  internal/tsdb/migrations/021_model_install_events.sql
  Hypertable on recorded_at, 7-day chunks, 3 indexes (quant-time,
  failed-events partial, backend-event-time). Columns: event_id CUIDv2
  PK + recorded_at, event_type (pull/verify/remove), backend_name,
  namespace, model_name, quant, adapter bool, success bool, latency_ms,
  sha256, size_bytes, err_message (1 KB cap).

New writer:
  internal/tsdb/model_install_writer.go
  Synchronous single-row INSERT (not buffered + CopyFrom — CLI is
  one-shot, writes are infrequent vs the V0017/V0018/V0019/V0020 retrieval-
  path writers that fire per-request). Nil-pool no-op for degraded mode.
  errMessageMaxLen=1024 truncation at write time. New modelInstallPool
  interface (Exec-shaped) avoids touching the existing CopyFrom-shaped
  poolIface used by buffered writers.

Wiring:
  internal/cli/model.go gets recordModelEvent(parent, cfg, row) helper:
  - Returns immediately if !cfg.TSDBEnabled || cfg.TSDBHost==""
  - 2s timeout on connect (TSDB unreachable doesn't block CLI exit)
  - Logs warning + degrades gracefully on any TSDB error
  Called from runModelPull (success + failure paths), runModelVerify
  (single sweep row), runModelRemove (success + failure paths).
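
The gating, timeout, and truncation behavior described for the writer and its CLI helper might be sketched like this. The `config` struct, the `write` callback, and the boolean return are simplifications invented for the example; only the 2s timeout, the enabled/host gate, and the 1024-byte cap come from this commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"time"
)

// errMessageMaxLen mirrors the 1 KB cap from the commit message.
const errMessageMaxLen = 1024

// truncateErr caps a message at errMessageMaxLen bytes. A byte cut can split
// a multi-byte UTF-8 character; this sketch does not handle that case.
func truncateErr(msg string) string {
	if len(msg) <= errMessageMaxLen {
		return msg
	}
	return msg[:errMessageMaxLen]
}

// config carries the two gating fields named in the commit message.
type config struct {
	TSDBEnabled bool
	TSDBHost    string
}

// recordModelEvent skips the write when TSDB is off or has no host, bounds
// the attempt to 2 seconds, and logs instead of failing. It reports whether a
// row was written.
func recordModelEvent(parent context.Context, cfg config, write func(context.Context) error) bool {
	if !cfg.TSDBEnabled || cfg.TSDBHost == "" {
		return false
	}
	ctx, cancel := context.WithTimeout(parent, 2*time.Second)
	defer cancel()
	if err := write(ctx); err != nil {
		slog.Warn("model install event not recorded", "err", truncateErr(err.Error()))
		return false
	}
	return true
}

func main() {
	cfg := config{TSDBEnabled: true, TSDBHost: "127.0.0.1"}
	ok := recordModelEvent(context.Background(), cfg, func(ctx context.Context) error {
		return errors.New("tsdb unreachable")
	})
	fmt.Println("recorded:", ok)
}
```

Returning early on a disabled or unreachable TSDB keeps the one-shot CLI from blocking on observability.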

Schema version bump:
  internal/config/config.go: TSDB_REQUIRED_SCHEMA_VERSION default 20→21.
  CI validator at .github/workflows/ci.yml:60-65 counts SQL files in
  internal/tsdb/migrations/ and asserts equality; now 21 files = 21
  in config = passes.
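
The invariant enforced by the CI validator (number of migration files equals the required schema version) can be expressed in a few lines. This is a Go illustration of the idea, not the workflow step itself; `countMigrations` and `checkSchemaVersion` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// countMigrations counts the *.sql files directly inside dir. A missing
// directory yields zero matches rather than an error.
func countMigrations(dir string) (int, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.sql"))
	if err != nil {
		return 0, err
	}
	return len(files), nil
}

// checkSchemaVersion fails when the configured version and the number of
// migration files disagree.
func checkSchemaVersion(required, found int) error {
	if required != found {
		return fmt.Errorf("required schema version %d does not match %d migration files", required, found)
	}
	return nil
}

func main() {
	n, err := countMigrations("internal/tsdb/migrations")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := checkSchemaVersion(21, n); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```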

Build + lint clean. Existing tsdb / cli test suites green; no new tests
added for the writer itself (single INSERT mirrors V0017/V0018/V0019
patterns already covered; integration is operational verification at
Epic 6 once tsdb is up in the dev stack).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sprint MODEL-DIST-001 Epic 7: operator-facing feature documentation
following the standard Why / Choices / How / How-to-use shape (memory:
feedback_per_feature_docs_required.md).

Contents:
- Why: gap between brew install and a working local LLM after Phase 13.5
- Choices: backend matrix (Ollama vs HF vs GitHub vs S3 vs file://),
  artifact form (fused vs adapter), Apple Silicon scope, "Ollama runtime
  rejected (broken on M5+macOS 26.3.x), Ollama distribution only"
- How it works: ASCII flow diagram covering CLI dispatch -> Fetcher
  interface -> OllamaFetcher (preflight, ollama pull, manifest discovery,
  blob resolve, symlink, SHA verify) -> V0021 observability row
- How to use:
    * Quick start (3 commands: brew install ollama, mdemg model pull,
      curl /v1/models)
    * Explicit quant selection
    * Managing pulled models (list / verify / where / remove)
    * Forks + enterprise (MDEMG_MODEL_NAMESPACE override)
    * Air-gapped (MDEMG_MODEL_MANIFEST_PATH override)
    * Resource matrix per quant (disk, min RAM, recommended RAM, BPW)
    * Full Configurability Contract table (11 env vars + flags + defaults)
    * V0021 observability schema
- Troubleshooting: ollama missing, SHA mismatch, quant allowlist
  rejection, RAM auto-detection failure, out-of-disk, symlink permission
- Forward-looking: MODEL-DIST-002 adapter, Sprint B Grafana panels,
  future backends, cross-platform
- References: all source-of-truth files cross-linked

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sprint MODEL-DIST-001 Epic 8: final epic, never cut (memory:
feedback_sequential_epics.md).

This commit lands the main-repo doc updates. The packaging/homebrew-mdemg/
submodule docs (README, CHANGELOG, formula caveats text) update at
v0.10.0 release-tag time per the v0.9.0 release flow precedent — that's
when goreleaser auto-regenerates mdemg.rb from .goreleaser.yaml's caveats
template, and the tap-side README/CHANGELOG get edited in lockstep.

Changes:
- CHANGELOG.md: comprehensive Unreleased entry documenting Epics 0-5 + 7
  landed in this sprint. Epic 3 ollama push and Epic 6 Tier 3 e2e marked
  as gated on operator confirmation. Adapter path explicitly deferred to
  MODEL-DIST-002 with epic_2_forensic.md cross-reference. Captures the
  Configurability Contract enumeration, the 3 quant SHAs, the Fetcher
  interface design, the V0021 hypertable, and the explicit out-of-scope
  list.
- CLAUDE.md: new "Model Distribution (Sprint MODEL-DIST-001)" subsection
  in Architecture Notes, slotted ABOVE the existing Compose embed entry
  for visibility. Captures the pluggable-backend design, the Ollama-as-
  distribution-only constraint, the on-disk symlink + manifest discovery
  flow, the 11-knob Configurability Contract surface, the no-hardcoding
  enforcement, the TSDB V0021 hookup, and the Apple Silicon v1 scope.
- README.md: new "Step 2b (optional): Pull the local LLM" section
  between Step 2 (Initialize/Start) and Open the Dashboard. 3-command
  quick start (brew install ollama -> mdemg model pull -> set
  MDEMG_MODEL_PATH). Cross-references the feature doc for the full
  Configurability Contract.
- .goreleaser.yaml: caveats template updated to include `mdemg model pull`
  instructions. Goreleaser regenerates the homebrew formula's caveats
  block from this on the next v* tag push, so v0.10.0 will ship the new
  text to brew users automatically.

Deferred to v0.10.0 release-tag time (handled per v0.9.0 precedent):
- packaging/homebrew-mdemg/README.md update
- packaging/homebrew-mdemg/CHANGELOG.md update
- packaging/homebrew-mdemg/mdemg.rb regeneration (automatic via
  goreleaser from the .goreleaser.yaml change in this commit)
- Submodule pointer bump in main repo

Deferred to Epic 6 close (after operator does ollama push):
- post.md sprint-close document
- Capture of remote Ollama manifest digests into quant_manifest.json

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

All 3 fused quants now live on Ollama Library:
  https://ollama.com/reh3376/mdemg-llm-v1:Q4_K_M
  https://ollama.com/reh3376/mdemg-llm-v1:Q5_K_M
  https://ollama.com/reh3376/mdemg-llm-v1:Q8_0

End-to-end integrity verified: remote model-layer digests captured via
GET https://registry.ollama.ai/v2/reh3376/mdemg-llm-v1/manifests/<quant>
match the local Epic 1 SHAs exactly:
  Q4_K_M  401161710c22f0ae...411d42ea  (matches Epic 1)
  Q5_K_M  144ad723101d688f...d5f5d54  (matches Epic 1)
  Q8_0    fc14dcb40af1bb58...8db6089  (matches Epic 1)

Captured into quant_manifest.json (both docs canonical + internal/cli
embed.FS mirror, byte-synced):
- ollama_manifest_digest per quant (computed from the manifest body):
    Q4_K_M  sha256:a210cccb12311773fd70bfa81f221ca0f7940a315bef87b84608caf894533b1b
    Q5_K_M  sha256:ae6e54fe1ee0b487ae41260687ed14c46c30d1ffb0fece936282418b5bcb78e1
    Q8_0    sha256:93df4d64bfa751506f7afba8bf08b891ea828575b838adec17b9399ad85be718
- Corrected size_bytes (Epic 1 used approximate values; replaced with
  registry-reported exact bytes for each tag):
    Q4_K_M   9.0 GB ->  8.4 GB (9001753408 B; was 9658404096)
    Q5_K_M  11 GB   ->  9.8 GB (10514569568 B; was 11811160064)
    Q8_0    16 GB   -> 14.6 GB (15698534208 B; was 17179869184)
- Status flipped from "local-create done; push pending" to "published".
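
The human-readable sizes in the list above are consistent with binary units (1 GiB = 2^30 bytes) applied to the byte counts, even though they are labeled GB. A short sketch of that conversion, with `humanGiB` as a hypothetical helper name:

```go
package main

import "fmt"

// humanGiB formats a byte count in binary gigabytes (1 GiB = 2^30 bytes),
// rounded to one decimal place.
func humanGiB(b int64) string {
	return fmt.Sprintf("%.1f GiB", float64(b)/(1<<30))
}

func main() {
	sizes := []struct {
		quant string
		bytes int64
	}{
		{"Q4_K_M", 9001753408},
		{"Q5_K_M", 10514569568},
		{"Q8_0", 15698534208},
	}
	for _, s := range sizes {
		fmt.Printf("%-7s %s\n", s.quant, humanGiB(s.bytes))
	}
}
```

The same conversion applied to the earlier approximate values (for example 17179869184 bytes) gives the old 9.0 / 11 / 16 figures.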

Embedded runtime manifest (internal/cli/quant_manifest.json) re-built into
the binary via embed.FS. TestLoadQuantManifest_EmbeddedFallback green
with new values.

Epic 3 of Sprint MODEL-DIST-001 now COMPLETE. Epic 6 (Tier 3 live e2e —
`mdemg model pull` against the published tags + llama-server load on
port 18102 + sanity inference) is now unblocked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sprint MODEL-DIST-001 close-out per memory rule
(feedback_sprint_plan_format.md §11 — sprint plans live in
docs/development/<sprint-line>/ with the standard post.md companion).

Sections (CLAUDE.md sprint-plan section guidance):
- Outcome: 3 quants live on Ollama Library, mdemg model pull is the
  canonical install path
- Process: how the plan held under reality (operator-surfaced no-
  hardcoding rule revised the plan in-place to add the Configurability
  Contract before code was written)
- Findings: 5 smooth parts + 5 friction items, both honest:
    * convert_hf_to_gguf.py python deps gap (silent ModuleNotFoundError)
    * mlx_lm.fuse adapter-path requirement
    * convert_lora_to_gguf.py missing from brew install llama.cpp
      (proximate Epic 2 deferral trigger)
    * mdemg tsdb migrate CWD-aware .env loader quirk
    * Epic 1 size estimates off vs registry-reported exact bytes
- Current state: per-layer state matrix
- Testing & benchmarking: all 3 tiers documented (Tier 3 e2e captured
  V0021 rows for both pull + verify event_types — live-verified)
- Risks & opportunities (forward): MODEL-DIST-002 adapter scope, Sprint
  B Grafana, cross-platform, HFFetcher slot, CWD-aware .env loader QoL
- Sprint commits: 9 commits on dev01, mapped to their epics

Closes Sprint MODEL-DIST-001 functionally. Operational sprint close
(v0.10.0 release tag + tap-repo doc updates) is a separate motion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…shed main)

PR #385 squash-merged the original Epic 3 quant_manifest values (estimated
sizes from llama-quantize wall output, null ollama_manifest_digest because
the push hadn't happened yet) into main as commit f1d029a. Meanwhile on
dev01, commit 87293f8 (Epic 3 closeout) corrected those values to the
registry-canonical state after the ollama push completed:

- size_bytes: replaced Epic 1 approximations with registry-reported exact
  bytes (Q4_K_M 9001753408 / Q5_K_M 10514569568 / Q8_0 15698534208)
- size_human: 9.0/11/16 GB -> 8.4/9.8/14.6 GB (more accurate)
- ollama_manifest_digest: null -> sha256:a210cccb...|ae6e54fe...|93df4d64...
- status: "local-create done; push pending" -> "published (...)"

Conflict resolution: keep dev01 (HEAD) values for both files — those are
the registry-canonical post-push state. JSON validity verified for both
files; TestLoadQuantManifest_{EmbeddedFallback,OperatorOverride,OverrideMissingFile}
all green against the resolved embedded manifest.

The non-conflicting fast-forwarded changes from main (claude workflow
edits + dependabot go.mod/go.sum bumps) are folded in by this merge
unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Promote the Sprint MODEL-DIST-001 entry from Unreleased to v0.10.0
(2026-05-11) ahead of release.yml / goreleaser tag push. Fresh empty
Unreleased section seeded above.

v0.10.0 ships:
- mdemg model pull|list|verify|remove|where — one-command path from
  brew install mdemg to a working local LLM
- Pluggable ModelFetcher interface (Ollama in v1, slots for HF/S3/GHR/file)
- 3 fused GGUF quants live on Ollama Library at reh3376/mdemg-llm-v1
  (:Q4_K_M 8.4 GB / :Q5_K_M 9.8 GB / :Q8_0 14.6 GB)
- 11-knob Configurability Contract (every operator-visible value dynamic)
- TSDB V0021 model_install_events hypertable + writer
- docs/features/local-model-distribution.md

Adapter (LoRA-only) path deferred to MODEL-DIST-002 per the sprint plan's
documented contingency (epic_2_forensic.md).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@github-actions github-actions Bot requested a review from reh3376 as a code owner May 11, 2026 16:34
@reh3376 reh3376 merged commit a83f87f into main May 11, 2026
6 checks passed
