Implement Cluster Detection and Abstraction Promotion CLI #5
Caution: Review failed. The pull request is closed.
📝 Walkthrough
The PR switches task references and permissions from task 006 to 005, adds a .gitignore entry, and introduces a new Go consolidation CLI that detects clusters in Neo4j and promotes them to abstraction nodes (with embedding averaging and dry-run/live modes), plus a comprehensive test suite.
Sequence Diagram(s)
sequenceDiagram
participant CLI as CLI / Config
participant Driver as Neo4j Driver
participant Logic as Clustering Logic
participant Output as Results / Stats
CLI->>Driver: Initialize & verify connection
CLI->>Logic: Provide config (spaceID, thresholds, dry-run)
Logic->>Driver: Query cluster candidates (Cypher)
Driver-->>Logic: Candidate nodes + embeddings
Logic->>Logic: Build non-overlapping clusters (greedy)
Logic->>Logic: Average embeddings per cluster
alt Dry-run
Logic->>Output: Tally potential promotions & samples
else Live-run
Logic->>Driver: CREATE abstraction node (MemoryNode)
Driver-->>Logic: New abstraction ID
Logic->>Driver: CREATE ABSTRACTS_TO edges to members
Driver-->>Logic: Edge confirmations
Logic->>Output: Record abstraction results
end
Output-->>CLI: Print stats and samples
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
… struct, flag parsing, and Neo4j driver setup
…with Cypher query
Implements cluster detection logic for detecting MemoryNodes with sufficient high-weight neighbors at the same layer:
- Add clusterCandidate struct to hold query results (NodeID, Layer, Embedding, NeighborIDs)
- Add queryClusterCandidates function with Cypher query that:
  - Matches CO_ACTIVATED_WITH edges with weight >= threshold
  - Filters to same-layer nodes (a.layer = b.layer)
  - Groups by node and collects neighbors
  - Filters to nodes with >= minClusterSize-1 neighbors
  - Orders results by neighbor count descending
- Add asStringSlice helper for Neo4j array conversion
Follows patterns from cmd/decay/main.go for Neo4j session handling and type conversion helpers.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
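The candidate query and the list-conversion helper described in this commit might look roughly like the sketch below. The property names (space_id, node_id, weight, embedding) and the parameter names are assumptions for illustration, not taken from the repository; only the relationship type, the same-layer filter, the neighbor-count filter, and the ordering come from the commit message.

```go
package main

import "fmt"

// candidateQuery is a sketch of the Cypher described above.
// Assumption: nodes carry space_id, node_id, layer, and embedding properties,
// and edges carry a weight property; the real schema may differ.
const candidateQuery = `
MATCH (a:MemoryNode {space_id: $spaceId})-[r:CO_ACTIVATED_WITH]-(b:MemoryNode)
WHERE r.weight >= $weightThreshold AND a.layer = b.layer
WITH a, collect(DISTINCT b.node_id) AS neighbors
WHERE size(neighbors) >= $minClusterSize - 1
RETURN a.node_id AS node_id, a.layer AS layer, a.embedding AS embedding, neighbors
ORDER BY size(neighbors) DESC`

// asStringSlice converts a list value decoded from a Neo4j record into []string.
// Assumption: lists arrive as []any; elements that are not strings are skipped.
func asStringSlice(v any) []string {
	switch list := v.(type) {
	case []string:
		return list
	case []any:
		out := make([]string, 0, len(list))
		for _, item := range list {
			if s, ok := item.(string); ok {
				out = append(out, s)
			}
		}
		return out
	default:
		return nil
	}
}

func main() {
	// A mixed list keeps only its string elements.
	fmt.Println(asStringSlice([]any{"a", 42, "b"})) // prints [a b]
}
```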
…candidates into non-overlapping clusters
Implements greedy first-come cluster assignment algorithm:
- Added clusterMember and cluster structs for cluster representation
- buildClusters processes candidates in order (highest neighbor count first)
- Each node can only belong to one cluster (first-come assignment)
- Clusters smaller than minSize are discarded
- Embeddings are preserved for averaging in later subtasks
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
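A minimal sketch of the greedy first-come assignment described in this commit. The clusterCandidate fields follow the earlier commit message; the fields of clusterMember and cluster, and the choice to leave nodes of a discarded cluster unassigned, are assumptions.

```go
package main

import "fmt"

type clusterCandidate struct {
	NodeID      string
	Layer       int
	Embedding   []float64
	NeighborIDs []string
}

// clusterMember and cluster field names are hypothetical.
type clusterMember struct {
	NodeID    string
	Embedding []float64
}

type cluster struct {
	Layer   int
	Members []clusterMember
}

// buildClusters walks candidates in the order given (expected: highest
// neighbor count first) and lets each node join at most one cluster.
func buildClusters(candidates []clusterCandidate, minSize int) []cluster {
	// Index embeddings so neighbors that are also candidates keep theirs.
	embeddings := make(map[string][]float64, len(candidates))
	for _, c := range candidates {
		embeddings[c.NodeID] = c.Embedding
	}
	assigned := make(map[string]bool)
	var clusters []cluster
	for _, c := range candidates {
		if assigned[c.NodeID] {
			continue // already claimed by an earlier cluster
		}
		members := []clusterMember{{NodeID: c.NodeID, Embedding: c.Embedding}}
		seen := map[string]bool{c.NodeID: true}
		for _, id := range c.NeighborIDs {
			if assigned[id] || seen[id] {
				continue
			}
			seen[id] = true
			members = append(members, clusterMember{NodeID: id, Embedding: embeddings[id]})
		}
		if len(members) < minSize {
			continue // too small: discard and leave its nodes unassigned
		}
		for _, m := range members {
			assigned[m.NodeID] = true
		}
		clusters = append(clusters, cluster{Layer: c.Layer, Members: members})
	}
	return clusters
}

func main() {
	clusters := buildClusters([]clusterCandidate{
		{NodeID: "a", NeighborIDs: []string{"b", "c", "d"}},
		{NodeID: "b", NeighborIDs: []string{"a", "c"}},
		{NodeID: "e", NeighborIDs: []string{"f"}},
	}, 3)
	fmt.Println(len(clusters), len(clusters[0].Members)) // prints 1 4
}
```

Here node b is skipped because a already claimed it, and the pair e/f falls below the minimum size of 3.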
…mpute centroid embedding
Add averageEmbeddings function that:
- Computes element-wise average of multiple embedding vectors
- Returns nil for empty input
- Handles mismatched dimensions gracefully by skipping invalid embeddings
- Uses validCount to ensure proper averaging even with some skipped embeddings
- Determines dimension from first valid (non-nil, non-empty) embedding
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
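The behavior listed in this commit can be sketched as follows. The function name and the validCount variable come from the commit message; the exact signature is an assumption.

```go
package main

import "fmt"

// averageEmbeddings returns the element-wise mean of the given vectors.
// The dimension is taken from the first non-empty vector; vectors of any
// other length are skipped. Returns nil when no usable vector exists.
func averageEmbeddings(embeddings [][]float64) []float64 {
	dim := 0
	for _, e := range embeddings {
		if len(e) > 0 {
			dim = len(e)
			break
		}
	}
	if dim == 0 {
		return nil
	}
	sum := make([]float64, dim)
	validCount := 0
	for _, e := range embeddings {
		if len(e) != dim {
			continue // nil, empty, or mismatched dimension
		}
		for i, v := range e {
			sum[i] += v
		}
		validCount++
	}
	// validCount is at least 1 here because the first valid vector matches dim.
	for i := range sum {
		sum[i] /= float64(validCount)
	}
	return sum
}

func main() {
	// The one-element vector is skipped as a dimension mismatch.
	fmt.Println(averageEmbeddings([][]float64{{1, 2}, {3, 4}, {9}})) // prints [2 3]
}
```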
…Cypher query
- Add createAbstraction function that creates MemoryNode at layer+1
- Creates ABSTRACTS_TO edges from cluster members to abstraction node
- Add abstractionResult struct to hold creation result (NodeID, MemberCount)
- Add generateAbstractionName helper to create readable names from member IDs
- Follows learning/service.go patterns for Neo4j session handling
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…hestrating detection, clustering, and promotion
Implements the main orchestration function for the consolidation job:
- Step 1: Query cluster candidates via queryClusterCandidates()
- Step 2: Build clusters via buildClusters() with greedy first-come assignment
- Step 3: Process clusters and create abstractions or report for dry-run
Features:
- Respects --max-promotions cap to limit number of abstraction nodes
- Supports --dry-run mode that reports what would happen without database writes
- Tracks statistics: clusters found, nodes promoted, edges created, skipped counts
- Handles edge cases: no candidates, no qualifying clusters, missing embeddings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
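Step 3 of the orchestration, with the promotion cap and dry-run behavior, might be structured like the sketch below. Everything here (processClusters, the stats and cluster types, and the injected create callback standing in for the database write) is hypothetical and only illustrates the control flow described above.

```go
package main

import "fmt"

// cluster is a simplified stand-in: member IDs plus a precomputed centroid.
type cluster struct {
	MemberIDs []string
	Centroid  []float64 // nil when no member carried a usable embedding
}

type stats struct {
	ClustersFound      int
	NodesPromoted      int // in dry-run: nodes that would be promoted
	EdgesCreated       int // in dry-run: edges that would be created
	SkippedNoEmbedding int
}

// processClusters promotes clusters until maxPromotions is reached.
// In dry-run mode the create callback is never invoked.
func processClusters(clusters []cluster, maxPromotions int, dryRun bool, create func(cluster) error) (stats, error) {
	st := stats{ClustersFound: len(clusters)}
	for _, c := range clusters {
		if st.NodesPromoted >= maxPromotions {
			break // cap reached
		}
		if len(c.Centroid) == 0 {
			st.SkippedNoEmbedding++
			continue
		}
		if !dryRun {
			if err := create(c); err != nil {
				return st, fmt.Errorf("create abstraction: %w", err)
			}
		}
		st.NodesPromoted++
		st.EdgesCreated += len(c.MemberIDs) // one ABSTRACTS_TO edge per member
	}
	return st, nil
}

func main() {
	clusters := []cluster{
		{MemberIDs: []string{"a", "b", "c"}, Centroid: []float64{0.1}},
		{MemberIDs: []string{"d", "e"}},
		{MemberIDs: []string{"f", "g", "h", "i"}, Centroid: []float64{0.2}},
	}
	writes := 0
	st, err := processClusters(clusters, 2, true, func(cluster) error { writes++; return nil })
	fmt.Println(st.NodesPromoted, st.EdgesCreated, st.SkippedNoEmbedding, writes, err)
}
```

Passing the write as a callback keeps the loop testable without a live Neo4j instance; the real job may instead call createAbstraction directly.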
…d output
- Enhanced printStats function with detailed statistics output
- Added clusterSample struct to track sample clusters for display
- Added sample cluster collection in runConsolidationJob (first 5 clusters)
- Output shows: clusters found, nodes promoted/to promote, edges created/to create
- Shows skipped cluster counts (no embedding, too small)
- Displays sample clusters with member counts, layer transitions, and member IDs
- Added truncateID helper for readable UUID display
- Follows decay job pattern for consistent user experience
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…erageEmb
Add comprehensive unit tests for cmd/consolidate following cmd/decay/main_test.go patterns:
- TestAverageEmbeddings: verifies centroid calculation of multiple embedding vectors
- TestAverageEmbeddings_Empty: returns nil for empty input
- TestAverageEmbeddings_MismatchedDims: handles mismatched dimensions gracefully
- TestBuildClusters: groups nodes correctly by adjacency
- TestBuildClusters_MinSize: excludes clusters smaller than min size
- TestBuildClusters_LayerPreserved: cluster layer matches candidate layer
- TestBuildClusters_EmbeddingsPreserved: embeddings copied to cluster members
- TestConfigValidation_*: min cluster size, weight threshold, max promotions
- TestAsConversionHelpers: safe conversion of Neo4j types
- TestAsFloat64Slice/TestAsStringSlice: slice conversion helpers
- TestTruncateID: ID truncation helper
- TestGenerateAbstractionName: abstraction name generation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Verified complete CLI execution against live Neo4j:
- Banner prints "MDEMG Consolidation Job" ✅
- CLI connects to Neo4j successfully ✅
- Dry-run mode displays correct configuration ✅
- Custom flags work (min-cluster-size, weight-threshold, max-promotions) ✅
- Required --space-id flag enforcement ✅
- Input validation works (min-cluster-size >= 2, weight-threshold 0-1) ✅
- All 14 unit tests pass ✅
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
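The flag handling and validation rules verified above could be expressed with the standard flag package roughly as follows. The flag names and the bounds for min-cluster-size and weight-threshold come from the commit messages; the default values, the max-promotions rule, and the names config and parseConfig are assumptions.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

type config struct {
	SpaceID         string
	MinClusterSize  int
	WeightThreshold float64
	MaxPromotions   int
	DryRun          bool
}

// validate enforces the rules listed in the verification notes.
func (c config) validate() error {
	if c.SpaceID == "" {
		return errors.New("--space-id is required")
	}
	if c.MinClusterSize < 2 {
		return fmt.Errorf("--min-cluster-size must be >= 2, got %d", c.MinClusterSize)
	}
	if c.WeightThreshold < 0 || c.WeightThreshold > 1 {
		return fmt.Errorf("--weight-threshold must be within [0, 1], got %g", c.WeightThreshold)
	}
	if c.MaxPromotions < 1 { // assumed rule; the source does not state the bound
		return fmt.Errorf("--max-promotions must be >= 1, got %d", c.MaxPromotions)
	}
	return nil
}

// parseConfig parses CLI arguments and validates the result.
// Defaults below are illustrative guesses, not the tool's real defaults.
func parseConfig(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("consolidate", flag.ContinueOnError)
	fs.StringVar(&cfg.SpaceID, "space-id", "", "memory space to consolidate (required)")
	fs.IntVar(&cfg.MinClusterSize, "min-cluster-size", 3, "minimum nodes per cluster")
	fs.Float64Var(&cfg.WeightThreshold, "weight-threshold", 0.5, "minimum CO_ACTIVATED_WITH weight")
	fs.IntVar(&cfg.MaxPromotions, "max-promotions", 10, "cap on abstraction nodes created")
	fs.BoolVar(&cfg.DryRun, "dry-run", false, "report without writing to Neo4j")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	return cfg, cfg.validate()
}

func main() {
	_, err := parseConfig([]string{"--space-id", "demo", "--min-cluster-size", "1"})
	fmt.Println(err) // prints --min-cluster-size must be >= 2, got 1
}
```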
…quested)
Fixes:
- Remove compiled binary mdemg_build/service/consolidate from repo (7.4MB)
- Add /mdemg_build/service/consolidate to .gitignore to prevent future accidents
Verified:
- Binary removed from git tracking
- .gitignore updated with new entry
QA Fix Session: 1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
c99a8b2 to 02a99c4
G5: Replace nil guard in /readyz check #5 with a live Neo4j read query (RETURN 1). Detects CMS degradation when Neo4j is under stress or connections are stale. Adds Ping() method to conversation.Service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
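The idea of replacing a nil guard with a live read can be sketched as below. To stay self-contained, the Neo4j driver is hidden behind a hypothetical readRunner interface and exercised with a fake; the Service fields, the 2-second timeout, and the readyz helper are all assumptions, and only the Ping() method name and the RETURN 1 query come from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// readRunner is a hypothetical seam over the Neo4j driver: it runs a
// read-only query and reports whether it succeeded.
type readRunner interface {
	RunRead(ctx context.Context, cypher string) error
}

// Service stands in for conversation.Service; its fields are assumed.
type Service struct {
	db readRunner
}

// Ping issues a trivial read so stale or overloaded connections surface
// as errors, which a nil check alone would miss.
func (s *Service) Ping(ctx context.Context) error {
	if s == nil || s.db == nil {
		return errors.New("conversation service not configured")
	}
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second) // assumed timeout
	defer cancel()
	if err := s.db.RunRead(ctx, "RETURN 1"); err != nil {
		return fmt.Errorf("neo4j ping: %w", err)
	}
	return nil
}

// readyz reduces the check to a status string for a readiness endpoint.
func readyz(svc *Service) string {
	if err := svc.Ping(context.Background()); err != nil {
		return "unavailable: " + err.Error()
	}
	return "ok"
}

// fakeDB lets the sketch run without a database.
type fakeDB struct{ err error }

func (f fakeDB) RunRead(context.Context, string) error { return f.err }

var errStale = errors.New("connection stale")

func main() {
	fmt.Println(readyz(&Service{db: fakeDB{}}))              // prints ok
	fmt.Println(readyz(&Service{db: fakeDB{err: errStale}})) // prints unavailable: neo4j ping: connection stale
}
```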
…lag-off) + Phase 13 Epic 6 V0017 audit-writer fix + Phase 11+ feature-doc backfill (narrow close)
Narrow close per operator approval after Epic 0+1+2 produced design questions
that warrant dedicated follow-up sprints. Note 05 deferred to Phase 14.2;
Note 06 default flip deferred to Phase 14.1.
What landed
-----------
* Phase 13 Epic 6 V0017 audit-writer fix (in-flight discovery)
  - tsdb/retrieval_audit_writer.go (new, ~165 LOC; buffered + 30s flush via CopyFrom)
  - retrievalAuditAdapter in api/server.go (cycle-safe translation)
  - V0017 was empty since Phase 13 because SetRetrievalAuditWriter had no
    callers; now writes per retrieve when RETRIEVAL_AUDIT_ENABLED=true.
  - Live verification: 279 audit rows accumulated in 4h since fix landed.
* Note 06 sparse activation gate (flag-off)
  - retrieval/gate.go (~190 LOC) + 9 Tier 1 unit tests, all green
  - Wired post-aggregation, pre-rerank in service.go
  - 4 config knobs (SPARSE_*); default off, percentile 0.95, min 3, max 20
  - Per-request override via ?sparse=true|false and ?sparse_percentile=N
  - debug.sparse_gate_* + debug.below_threshold_* (when JiminyEnabled)
  - 3 Prometheus histograms
* TSDB V0019 sparse_gate_metrics
  - migrations/019_sparse_gate_metrics.sql (hypertable, 7-day chunks)
  - tsdb/sparse_gate_writer.go (~165 LOC)
  - sparseGateRecorderAdapter in api/server.go (always wired so per-request
    overrides record even when default off)
  - TSDB_REQUIRED_SCHEMA_VERSION 18 -> 19
* Epic 0 forensic doc — phase_14_score_distribution_analysis.md
  - Defaults derived from llm_interactions.retrieval_scores (99k+50k score
    points across consulting.classify + retrieval.rerank_cross)
  - Heavy-tail confirmed (p98/p50 ~ 4-5x); within-call clamp dominates
    percentile choice in dominant K=20-50 regime
  - Note 05 catalog redesign needed for whk-wms (0 distinct symbols, 0
    distinct roles), flagged for Phase 14.2
* A/B verdicts captured
  - 16q quick at MIN=3 / p95,p98,p99: all FAIL (q69 boundary)
  - 16q quick at MIN=10 / p95: PASS (mean +0.019, 0 regressions, 3 improvements)
  - 120q full at MIN=10 / p95: FAIL per-question (mean parity 0.413=0.413,
    7 boundary regressions across 4 categories, 3 of 7 in
    architecture_structure)
  - Per sprint plan §10 risk #1: ship flag-off; Phase 14.1 will retune.
* Phase 11+ feature-doc backfill (operator request 2026-05-04)
  - new: docs/features/{mlx-watchdog,uvts-validation,column-voting-retrieval,
    local-llm-runtime,sparse-retrieval}.md
  - extended: docs/features/service-resilience.md (Phase 11.6.x additions)
  - Standing rule saved as memory feedback_per_feature_docs_required.md
* Follow-up sprint stubs scoped
  - sprint_plan_phase_14_1_adaptive_per_category_gate.md (~3 days, ~$15)
  - sprint_plan_phase_14_2_note_05_sparse_fingerprints.md (~7 days, ~$25)
Decision-fork outcomes
----------------------
| Fork | Provisional | Outcome |
|---|---|---|
| #2 percentile default | 0.98 | 0.95 (Epic 0 data) |
| #5 catalog bit policy | static 64/64/64/64 | adaptive (deferred Phase 14.2) |
| #8 gate ordering | pre-rerank | pre-rerank (confirmed) |
| #9 default flip | per-Note conditional | flag-off (Phase 14.1 will flip) |
OpenAI spend (actual): ~$13. Well under sprint $25-50 budget.
Tests + lint
------------
* go test -race ./internal/{retrieval,config,metrics,tsdb}: all green
* golangci-lint run on affected packages: 0 issues
* Live smoke: /healthz green, retrieve returns 20 (gate off), 279 V0017
audit rows in 4h (Phase 13 Epic 6 fix verified in production)
Memory observations
-------------------
* rw0mzergwcqct8abpw0dli9x — Phase 14 Epic 8 doc-backfill scope
* sc4iwy3of9ndn5kowja1i14i — Epic 0 forensic + audit-writer gap
* omr2rs5jppqrvee2k0l1xtd1 — Epic 1 gate code complete
* re4k7rpd3hjt5a52l8qwx8fp — Epic 2 verdict + Phase 14.1 scope
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
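The sparse activation gate in the commit above is described only by its knobs (percentile 0.95, min 3, max 20). One plausible reading, sketched here purely as an assumption, is a percentile cutoff on the candidate scores followed by a clamp on how many results survive. The function name sparseGate and the nearest-rank percentile rule are illustrative choices, not the contents of retrieval/gate.go.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sparseGate keeps the scores at or above the given percentile of the
// input, then clamps the number kept to [minKeep, maxKeep].
// Assumption: nearest-rank percentile; the real gate may differ.
func sparseGate(scores []float64, percentile float64, minKeep, maxKeep int) []float64 {
	n := len(scores)
	if n == 0 {
		return nil
	}
	// Sort a copy in descending order so the best scores come first.
	sorted := append([]float64(nil), scores...)
	sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))

	// Nearest-rank: 1-based ascending rank r sits at index n-r when descending.
	rank := int(math.Ceil(percentile * float64(n)))
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	threshold := sorted[n-rank]

	keep := 0
	for keep < n && sorted[keep] >= threshold {
		keep++
	}
	// Clamp: never fewer than minKeep, never more than maxKeep or n.
	if keep < minKeep {
		keep = minKeep
	}
	if keep > maxKeep {
		keep = maxKeep
	}
	if keep > n {
		keep = n
	}
	return sorted[:keep]
}

func main() {
	scores := []float64{0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.05}
	// With only 10 candidates, p95 alone would keep one score; the minimum of 3 takes over.
	fmt.Println(sparseGate(scores, 0.95, 3, 20)) // prints [0.9 0.8 0.7]
}
```

With small candidate sets the clamp, rather than the percentile, decides the outcome, which is consistent with the commit's note that the within-call clamp dominates the percentile choice in the K=20-50 regime.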
Create a CLI command (cmd/consolidate) that detects stable clusters of MemoryNodes in Neo4j and promotes them to higher-layer abstraction nodes. This implements the "emergence" principle in the MDEMG memory system, where frequently co-activated patterns crystallize into higher-level concepts. The tool will identify groups of 3+ nodes with strong CO_ACTIVATED_WITH edges, create abstraction nodes at layer+1, and link them via ABSTRACTS_TO relationships.
Summary by CodeRabbit
New Features
Tests
Chores