Implement Cluster Detection and Abstraction Promotion CLI by reh3376 · Pull Request #5 · reh3376/mdemg

reh3376 · 2026-01-16T18:35:48Z

Create a CLI command (cmd/consolidate) that detects stable clusters of MemoryNodes in Neo4j and promotes them to higher-layer abstraction nodes. This implements the "emergence" principle in the MDEMG memory system where frequently co-activated patterns crystallize into higher-level concepts. The tool will identify groups of 3+ nodes with strong CO_ACTIVATED_WITH edges, create abstraction nodes at layer+1, and link them via ABSTRACTS_TO relationships.

Summary by CodeRabbit

New Features
- Added clustering detection and abstraction-promotion workflow (Neo4j-backed), with dry-run and live execution modes.
Tests
- Added comprehensive tests for embedding aggregation, clustering logic, configuration validation, and utility helpers.
Chores
- Updated build configuration, task permissions, and repository ignore rules.

coderabbitai · 2026-01-16T18:35:54Z

Caution

Review failed

The pull request is closed.

📝 Walkthrough

The PR switches task references and permissions from task 006 to 005, adds a .gitignore entry, and introduces a new Go consolidation CLI that detects clusters in Neo4j and promotes them to abstraction nodes (with embedding averaging and dry-run/live modes), plus a comprehensive test suite.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metadata & Settings `.auto-claude-status`, `.claude_settings.json` | Update task/spec references and session/phase metadata; replace permission entries for task `006-...metrics...` with `005-implement-cluster-detection-and-abstraction-promot` equivalents. |
| VCS Ignore `.gitignore` | Add ignore entry for `/mdemg_build/service/consolidate`. |
| Consolidation CLI `mdemg_build/service/cmd/consolidate/main.go` | New CLI: config parsing, Neo4j driver init, queryClusterCandidates (Cypher), greedy non-overlapping cluster building, embedding averaging, createAbstraction (MemoryNode + ABSTRACTS_TO edges), dry-run vs live execution, and helper converters/utilities. |
| Consolidation Tests `mdemg_build/service/cmd/consolidate/main_test.go` | New tests covering averageEmbeddings, buildClusters, config validation, conversion helpers, truncateID, and generateAbstractionName, using table-driven tests and edge cases. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI / Config
    participant Driver as Neo4j Driver
    participant Logic as Clustering Logic
    participant Output as Results / Stats

    CLI->>Driver: Initialize & verify connection
    CLI->>Logic: Provide config (spaceID, thresholds, dry-run)
    Logic->>Driver: Query cluster candidates (Cypher)
    Driver-->>Logic: Candidate nodes + embeddings
    Logic->>Logic: Build non-overlapping clusters (greedy)
    Logic->>Logic: Average embeddings per cluster
    alt Dry-run
        Logic->>Output: Tally potential promotions & samples
    else Live-run
        Logic->>Driver: CREATE abstraction node (MemoryNode)
        Driver-->>Logic: New abstraction ID
        Logic->>Driver: CREATE ABSTRACTS_TO edges to members
        Driver-->>Logic: Edge confirmations
        Logic->>Output: Record abstraction results
    end
    Output-->>CLI: Print stats and samples

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

"I nibbled through nodes and found a kindred plot,
Neo4j hummed and clusters knit a tiny knot,
Abstractions hop forth, neat and spry,
Tests checked my trail beneath the sky—
A rabbit's cheer for code that ties the lot!" 🐇✨

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Free

📥 Commits

Reviewing files that changed from the base of the PR and between c99a8b2 and 02a99c4.

📒 Files selected for processing (5)

.auto-claude-status
.claude_settings.json
.gitignore
mdemg_build/service/cmd/consolidate/main.go
mdemg_build/service/cmd/consolidate/main_test.go


… struct, flag parsing, and Neo4j driver setup

…with Cypher query

Implements cluster detection logic for detecting MemoryNodes with sufficient high-weight neighbors at the same layer:
- Add clusterCandidate struct to hold query results (NodeID, Layer, Embedding, NeighborIDs)
- Add queryClusterCandidates function with Cypher query that:
  - Matches CO_ACTIVATED_WITH edges with weight >= threshold
  - Filters to same-layer nodes (a.layer = b.layer)
  - Groups by node and collects neighbors
  - Filters to nodes with >= minClusterSize-1 neighbors
  - Orders results by neighbor count descending
- Add asStringSlice helper for Neo4j array conversion

Follows patterns from cmd/decay/main.go for Neo4j session handling and type conversion helpers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
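A candidate query and list-conversion helper in the spirit of this commit could look roughly like the sketch below. It is not the PR's actual code: the property names (`space_id`, `node_id`, `weight`, `embedding`), the parameter names, and the exact behaviour of `asStringSlice` are assumptions made for illustration.

```go
package main

import "fmt"

// candidateQuery is an illustrative Cypher query. Property and parameter
// names (space_id, node_id, weight, embedding, $spaceId, ...) are assumptions.
const candidateQuery = `
MATCH (a:MemoryNode {space_id: $spaceId})-[r:CO_ACTIVATED_WITH]-(b:MemoryNode {space_id: $spaceId})
WHERE r.weight >= $weightThreshold AND a.layer = b.layer
WITH a, collect(DISTINCT b.node_id) AS neighborIds
WHERE size(neighborIds) >= $minClusterSize - 1
RETURN a.node_id AS nodeId, a.layer AS layer, a.embedding AS embedding, neighborIds
ORDER BY size(neighborIds) DESC`

// asStringSlice converts a decoded Neo4j list ([]any) into []string,
// dropping any element that is not a string. Non-list input yields nil.
func asStringSlice(v any) []string {
	items, ok := v.([]any)
	if !ok {
		return nil
	}
	out := make([]string, 0, len(items))
	for _, it := range items {
		if s, ok := it.(string); ok {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	fmt.Println(len(candidateQuery) > 0)
	fmt.Println(asStringSlice([]any{"n1", "n2", 3})) // prints "[n1 n2]"
}
```

The `minClusterSize - 1` filter reflects that the seed node itself counts toward the cluster size, so a cluster of three needs a seed with at least two qualifying neighbors.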

…candidates into non-overlapping clusters

Implements greedy first-come cluster assignment algorithm:
- Added clusterMember and cluster structs for cluster representation
- buildClusters processes candidates in order (highest neighbor count first)
- Each node can only belong to one cluster (first-come assignment)
- Clusters smaller than minSize are discarded
- Embeddings are preserved for averaging in later subtasks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
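The greedy first-come assignment described in this commit can be sketched as follows. The struct fields follow the names in the commit messages, but the function body is an assumed reconstruction, not the PR's implementation. In particular, leaving the members of a discarded (too small) cluster unassigned is a design choice made here.

```go
package main

import "fmt"

type clusterCandidate struct {
	NodeID      string
	Layer       int
	Embedding   []float64
	NeighborIDs []string
}

type clusterMember struct {
	NodeID    string
	Embedding []float64
}

type cluster struct {
	Layer   int
	Members []clusterMember
}

// buildClusters walks candidates in the given order (expected: highest
// neighbor count first) and assigns each node to at most one cluster.
func buildClusters(candidates []clusterCandidate, minSize int) []cluster {
	// Index embeddings so neighbors that are also candidates keep theirs.
	embeddings := make(map[string][]float64, len(candidates))
	for _, c := range candidates {
		embeddings[c.NodeID] = c.Embedding
	}
	assigned := make(map[string]bool)
	var clusters []cluster
	for _, c := range candidates {
		if assigned[c.NodeID] {
			continue // already claimed by an earlier cluster
		}
		members := []clusterMember{{NodeID: c.NodeID, Embedding: c.Embedding}}
		inCluster := map[string]bool{c.NodeID: true}
		for _, id := range c.NeighborIDs {
			if assigned[id] || inCluster[id] {
				continue
			}
			inCluster[id] = true
			members = append(members, clusterMember{NodeID: id, Embedding: embeddings[id]})
		}
		if len(members) < minSize {
			continue // too small: discard without claiming its nodes
		}
		for _, m := range members {
			assigned[m.NodeID] = true
		}
		clusters = append(clusters, cluster{Layer: c.Layer, Members: members})
	}
	return clusters
}

func main() {
	cands := []clusterCandidate{
		{NodeID: "a", Layer: 0, NeighborIDs: []string{"b", "c"}},
		{NodeID: "b", Layer: 0, NeighborIDs: []string{"a", "c"}},
		{NodeID: "d", Layer: 0, NeighborIDs: []string{"a"}},
	}
	fmt.Println(len(buildClusters(cands, 3))) // prints "1"
}
```

Because assignment is first-come, the result depends on candidate order, which is why the query sorts by neighbor count descending.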

…mpute centroid embedding

Add averageEmbeddings function that:
- Computes element-wise average of multiple embedding vectors
- Returns nil for empty input
- Handles mismatched dimensions gracefully by skipping invalid embeddings
- Uses validCount to ensure proper averaging even with some skipped embeddings
- Determines dimension from first valid (non-nil, non-empty) embedding

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
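A minimal sketch of the centroid computation, following the behaviour listed in this commit (nil for empty input, dimension taken from the first valid vector, mismatched vectors skipped, division by `validCount`). The signature is an assumption. The PR's own function may differ in detail.

```go
package main

import "fmt"

// averageEmbeddings returns the element-wise mean of all vectors that share
// the dimension of the first non-empty vector. Other vectors are skipped.
func averageEmbeddings(embeddings [][]float64) []float64 {
	dim := 0
	for _, e := range embeddings {
		if len(e) > 0 {
			dim = len(e)
			break
		}
	}
	if dim == 0 {
		return nil // no usable embedding at all
	}
	sum := make([]float64, dim)
	validCount := 0
	for _, e := range embeddings {
		if len(e) != dim {
			continue // nil, empty, or mismatched dimension
		}
		for i, v := range e {
			sum[i] += v
		}
		validCount++
	}
	// validCount >= 1 here, because the vector that set dim is counted.
	for i := range sum {
		sum[i] /= float64(validCount)
	}
	return sum
}

func main() {
	fmt.Println(averageEmbeddings([][]float64{{1, 2}, {9}, nil, {3, 4}})) // prints "[2 3]"
}
```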

…Cypher query

- Add createAbstraction function that creates MemoryNode at layer+1
- Creates ABSTRACTS_TO edges from cluster members to abstraction node
- Add abstractionResult struct to hold creation result (NodeID, MemberCount)
- Add generateAbstractionName helper to create readable names from member IDs
- Follows learning/service.go patterns for Neo4j session handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
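The write side might be expressed as a single parameterised Cypher statement, as in the sketch below. This is a guess at the shape of the query, not the PR's code: the property names, parameter names, and the `abstractionParams` helper are hypothetical, and the driver call that would execute the statement is omitted.

```go
package main

import "fmt"

// createAbstractionQuery is illustrative: it creates the abstraction node and
// links every member to it. Property names are assumptions.
const createAbstractionQuery = `
CREATE (abs:MemoryNode {
  node_id: $nodeId, space_id: $spaceId, name: $name,
  layer: $layer, embedding: $embedding
})
WITH abs
UNWIND $memberIds AS memberId
MATCH (m:MemoryNode {space_id: $spaceId, node_id: memberId})
CREATE (m)-[:ABSTRACTS_TO]->(abs)
RETURN abs.node_id AS nodeId, count(m) AS memberCount`

// abstractionParams (hypothetical helper) builds the parameter map for the
// query above. The abstraction sits one layer above its members.
func abstractionParams(spaceID, nodeID, name string, memberLayer int,
	embedding []float64, memberIDs []string) map[string]any {
	return map[string]any{
		"spaceId":   spaceID,
		"nodeId":    nodeID,
		"name":      name,
		"layer":     memberLayer + 1,
		"embedding": embedding,
		"memberIds": memberIDs,
	}
}

func main() {
	p := abstractionParams("demo", "abs-1", "cluster of 3", 0,
		[]float64{0.5, 0.5}, []string{"a", "b", "c"})
	fmt.Println(p["layer"], len(createAbstractionQuery) > 0)
}
```

Creating the node and its edges in one statement keeps the promotion atomic within a single transaction, so a failure cannot leave an abstraction node without its members.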

…hestrating detection, clustering, and promotion

Implements the main orchestration function for the consolidation job:
- Step 1: Query cluster candidates via queryClusterCandidates()
- Step 2: Build clusters via buildClusters() with greedy first-come assignment
- Step 3: Process clusters and create abstractions or report for dry-run

Features:
- Respects --max-promotions cap to limit number of abstraction nodes
- Supports --dry-run mode that reports what would happen without database writes
- Tracks statistics: clusters found, nodes promoted, edges created, skipped counts
- Handles edge cases: no candidates, no qualifying clusters, missing embeddings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
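Step 3 of the orchestration (promotion cap, dry-run, statistics) can be illustrated with the sketch below. The `stats` field names, the `processClusters` function, and the injected `create` callback are hypothetical stand-ins for the database-backed code in the PR. The sketch assumes the cap is a positive number.

```go
package main

import "fmt"

type cluster struct {
	Layer     int
	MemberIDs []string
	Embedding []float64 // centroid; empty if no member had an embedding
}

type stats struct {
	ClustersFound      int
	NodesPromoted      int // in dry-run: nodes that would be promoted
	EdgesCreated       int // in dry-run: edges that would be created
	SkippedNoEmbedding int
}

// processClusters promotes clusters until maxPromotions is reached. In
// dry-run mode it only tallies what would happen and never calls create.
func processClusters(clusters []cluster, maxPromotions int, dryRun bool,
	create func(cluster) error) (stats, error) {
	st := stats{ClustersFound: len(clusters)}
	for _, c := range clusters {
		if st.NodesPromoted >= maxPromotions {
			break // cap reached
		}
		if len(c.Embedding) == 0 {
			st.SkippedNoEmbedding++
			continue
		}
		if !dryRun {
			if err := create(c); err != nil {
				return st, err
			}
		}
		st.NodesPromoted++
		st.EdgesCreated += len(c.MemberIDs) // one ABSTRACTS_TO edge per member
	}
	return st, nil
}

func main() {
	clusters := []cluster{
		{MemberIDs: []string{"a", "b", "c"}, Embedding: []float64{0.1}},
		{MemberIDs: []string{"d", "e", "f"}},
	}
	st, err := processClusters(clusters, 10, true, nil)
	fmt.Printf("%+v %v\n", st, err)
}
```

Passing the write step as a callback keeps the loop testable without a live Neo4j instance.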

…d output

- Enhanced printStats function with detailed statistics output
- Added clusterSample struct to track sample clusters for display
- Added sample cluster collection in runConsolidationJob (first 5 clusters)
- Output shows: clusters found, nodes promoted/to promote, edges created/to create
- Shows skipped cluster counts (no embedding, too small)
- Displays sample clusters with member counts, layer transitions, and member IDs
- Added truncateID helper for readable UUID display
- Follows decay job pattern for consistent user experience

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
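A helper like truncateID is typically a few lines. The version below is a guess: the real signature, the cut-off length, and the suffix are not shown on this page, so the `max` parameter and the `...` suffix are assumptions.

```go
package main

import "fmt"

// truncateID shortens an ID for display, keeping the first max bytes and
// appending "..." when something was cut. IDs are assumed to be ASCII
// (for example UUIDs), so byte slicing is safe.
func truncateID(id string, max int) string {
	if max <= 0 || len(id) <= max {
		return id
	}
	return id[:max] + "..."
}

func main() {
	fmt.Println(truncateID("123e4567-e89b-12d3-a456-426614174000", 8)) // prints "123e4567..."
}
```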

…erageEmb

Add comprehensive unit tests for cmd/consolidate following cmd/decay/main_test.go patterns:
- TestAverageEmbeddings: verifies centroid calculation of multiple embedding vectors
- TestAverageEmbeddings_Empty: returns nil for empty input
- TestAverageEmbeddings_MismatchedDims: handles mismatched dimensions gracefully
- TestBuildClusters: groups nodes correctly by adjacency
- TestBuildClusters_MinSize: excludes clusters smaller than min size
- TestBuildClusters_LayerPreserved: cluster layer matches candidate layer
- TestBuildClusters_EmbeddingsPreserved: embeddings copied to cluster members
- TestConfigValidation_*: min cluster size, weight threshold, max promotions
- TestAsConversionHelpers: safe conversion of Neo4j types
- TestAsFloat64Slice/TestAsStringSlice: slice conversion helpers
- TestTruncateID: ID truncation helper
- TestGenerateAbstractionName: abstraction name generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Verified complete CLI execution against live Neo4j:
- Banner prints "MDEMG Consolidation Job" ✅
- CLI connects to Neo4j successfully ✅
- Dry-run mode displays correct configuration ✅
- Custom flags work (min-cluster-size, weight-threshold, max-promotions) ✅
- Required --space-id flag enforcement ✅
- Input validation works (min-cluster-size >= 2, weight-threshold 0-1) ✅
- All 14 unit tests pass ✅

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
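The flag handling and validation verified above could be sketched with the standard `flag` package as shown below. The flag names and the two documented rules (min-cluster-size >= 2, weight-threshold between 0 and 1) come from this page. The default values, the `config` field names, and the rule that `--max-promotions` must be at least 1 are assumptions.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

type config struct {
	SpaceID         string
	MinClusterSize  int
	WeightThreshold float64
	MaxPromotions   int
	DryRun          bool
}

// validate applies the input rules. The max-promotions rule is an assumption.
func (c config) validate() error {
	switch {
	case c.SpaceID == "":
		return errors.New("--space-id is required")
	case c.MinClusterSize < 2:
		return errors.New("--min-cluster-size must be >= 2")
	case c.WeightThreshold < 0 || c.WeightThreshold > 1:
		return errors.New("--weight-threshold must be between 0 and 1")
	case c.MaxPromotions < 1:
		return errors.New("--max-promotions must be >= 1")
	}
	return nil
}

// parseConfig parses args into a config. Default values are illustrative.
func parseConfig(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("consolidate", flag.ContinueOnError)
	fs.StringVar(&cfg.SpaceID, "space-id", "", "memory space to consolidate (required)")
	fs.IntVar(&cfg.MinClusterSize, "min-cluster-size", 3, "minimum nodes per cluster")
	fs.Float64Var(&cfg.WeightThreshold, "weight-threshold", 0.5, "minimum CO_ACTIVATED_WITH weight")
	fs.IntVar(&cfg.MaxPromotions, "max-promotions", 10, "cap on abstraction nodes per run")
	fs.BoolVar(&cfg.DryRun, "dry-run", false, "report without writing to Neo4j")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	return cfg, cfg.validate()
}

func main() {
	cfg, err := parseConfig([]string{"--space-id", "demo", "--dry-run"})
	fmt.Printf("%+v %v\n", cfg, err)
}
```

Using a dedicated `flag.FlagSet` rather than the package-level flags makes the parser callable from tests with arbitrary argument slices.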

…quested)

Fixes:
- Remove compiled binary mdemg_build/service/consolidate from repo (7.4MB)
- Add /mdemg_build/service/consolidate to .gitignore to prevent future accidents

Verified:
- Binary removed from git tracking
- .gitignore updated with new entry

QA Fix Session: 1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

G5: Replace nil guard in /readyz check #5 with a live Neo4j read query (RETURN 1). Detects CMS degradation when Neo4j is under stress or connections are stale. Adds Ping() method to conversation.Service. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lag-off) + Phase 13 Epic 6 V0017 audit-writer fix + Phase 11+ feature-doc backfill (narrow close)

Narrow close per operator approval after Epic 0+1+2 produced design questions that warrant dedicated follow-up sprints. Note 05 deferred to Phase 14.2; Note 06 default flip deferred to Phase 14.1.

What landed
-----------

* Phase 13 Epic 6 V0017 audit-writer fix (in-flight discovery)
  - tsdb/retrieval_audit_writer.go (new, ~165 LOC; buffered + 30s flush via CopyFrom)
  - retrievalAuditAdapter in api/server.go (cycle-safe translation)
  - V0017 was empty since Phase 13 because SetRetrievalAuditWriter had no callers; now writes per retrieve when RETRIEVAL_AUDIT_ENABLED=true.
  - Live verification: 279 audit rows accumulated in 4h since fix landed.
* Note 06 sparse activation gate (flag-off)
  - retrieval/gate.go (~190 LOC) + 9 Tier 1 unit tests, all green
  - Wired post-aggregation, pre-rerank in service.go
  - 4 config knobs (SPARSE_*); default off, percentile 0.95, min 3, max 20
  - Per-request override via ?sparse=true|false and ?sparse_percentile=N
  - debug.sparse_gate_* + debug.below_threshold_* (when JiminyEnabled)
  - 3 Prometheus histograms
* TSDB V0019 sparse_gate_metrics
  - migrations/019_sparse_gate_metrics.sql (hypertable, 7-day chunks)
  - tsdb/sparse_gate_writer.go (~165 LOC)
  - sparseGateRecorderAdapter in api/server.go (always wired so per-request overrides record even when default off)
  - TSDB_REQUIRED_SCHEMA_VERSION 18 -> 19
* Epic 0 forensic doc: phase_14_score_distribution_analysis.md
  - Defaults derived from llm_interactions.retrieval_scores (99k+50k score points across consulting.classify + retrieval.rerank_cross)
  - Heavy-tail confirmed (p98/p50 ~ 4-5x); within-call clamp dominates percentile choice in dominant K=20-50 regime
  - Note 05 catalog redesign needed for whk-wms (0 distinct symbols, 0 distinct roles); flagged for Phase 14.2
* A/B verdicts captured
  - 16q quick at MIN=3 / p95,p98,p99: all FAIL (q69 boundary)
  - 16q quick at MIN=10 / p95: PASS (mean +0.019, 0 regressions, 3 improvements)
  - 120q full at MIN=10 / p95: FAIL per-question (mean parity 0.413=0.413, 7 boundary regressions across 4 categories, 3 of 7 in architecture_structure)
  - Per sprint plan §10 risk #1: ship flag-off; Phase 14.1 will retune.
* Phase 11+ feature-doc backfill (operator request 2026-05-04)
  - new: docs/features/{mlx-watchdog,uvts-validation,column-voting-retrieval,local-llm-runtime,sparse-retrieval}.md
  - extended: docs/features/service-resilience.md (Phase 11.6.x additions)
  - Standing rule saved as memory feedback_per_feature_docs_required.md
* Follow-up sprint stubs scoped
  - sprint_plan_phase_14_1_adaptive_per_category_gate.md (~3 days, ~$15)
  - sprint_plan_phase_14_2_note_05_sparse_fingerprints.md (~7 days, ~$25)

Decision-fork outcomes
----------------------

| Fork | Provisional | Outcome |
|---|---|---|
| #2 percentile default | 0.98 | 0.95 (Epic 0 data) |
| #5 catalog bit policy | static 64/64/64/64 | adaptive (deferred Phase 14.2) |
| #8 gate ordering | pre-rerank | pre-rerank (confirmed) |
| #9 default flip | per-Note conditional | flag-off (Phase 14.1 will flip) |

OpenAI spend (actual): ~$13. Well under sprint $25-50 budget.

Tests + lint
------------

* go test -race ./internal/{retrieval,config,metrics,tsdb}: all green
* golangci-lint run on affected packages: 0 issues
* Live smoke: /healthz green, retrieve returns 20 (gate off), 279 V0017 audit rows in 4h (Phase 13 Epic 6 fix verified in production)

Memory observations
-------------------

* rw0mzergwcqct8abpw0dli9x: Phase 14 Epic 8 doc-backfill scope
* sc4iwy3of9ndn5kowja1i14i: Epic 0 forensic + audit-writer gap
* omr2rs5jppqrvee2k0l1xtd1: Epic 1 gate code complete
* re4k7rpd3hjt5a52l8qwx8fp: Epic 2 verdict + Phase 14.1 scope

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rhenley1958 and others added 10 commits January 16, 2026 13:38

auto-claude: subtask-1-1 - Create cmd/consolidate/main.go with config…

e4e45af

… struct, flag parsing, and Neo4j driver setup

reh3376 force-pushed the auto-claude/005-implement-cluster-detection-and-abstraction-promot branch from c99a8b2 to 02a99c4 Compare January 16, 2026 18:38

reh3376 merged commit 8bc3442 into main Jan 16, 2026
1 check was pending

reh3376 deleted the auto-claude/005-implement-cluster-detection-and-abstraction-promot branch January 28, 2026 01:03

reh3376 mentioned this pull request Apr 7, 2026

dev: reh3376_dev01 -> main #295

Merged

reh3376 mentioned this pull request May 4, 2026

dev: reh3376_dev01 -> main #366

Merged

reh3376 mentioned this pull request Jun 11, 2026

dev: reh3376_dev01 -> main #427

Merged

reh3376 merged 10 commits into main from auto-claude/005-implement-cluster-detection-and-abstraction-promot
