
Implement Cluster Detection and Abstraction Promotion CLI#5

Merged
reh3376 merged 10 commits into main from auto-claude/005-implement-cluster-detection-and-abstraction-promot on Jan 16, 2026


reh3376 (Owner) commented Jan 16, 2026

Create a CLI command (cmd/consolidate) that detects stable clusters of MemoryNodes in Neo4j and promotes them to higher-layer abstraction nodes. This implements the "emergence" principle in the MDEMG memory system where frequently co-activated patterns crystallize into higher-level concepts. The tool will identify groups of 3+ nodes with strong CO_ACTIVATED_WITH edges, create abstraction nodes at layer+1, and link them via ABSTRACTS_TO relationships.

Summary by CodeRabbit

  • New Features
    • Added clustering detection and abstraction-promotion workflow (Neo4j-backed), with dry-run and live execution modes.
  • Tests
    • Added comprehensive tests for embedding aggregation, clustering logic, configuration validation, and utility helpers.
  • Chores
    • Updated build configuration, task permissions, and repository ignore rules.


coderabbitai (Bot) commented Jan 16, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough


The PR switches task references and permissions from task 006 to 005, adds a .gitignore entry, and introduces a new Go consolidation CLI that detects clusters in Neo4j and promotes them to abstraction nodes (with embedding averaging and dry-run/live modes), plus a comprehensive test suite.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metadata & Settings: `.auto-claude-status`, `.claude_settings.json` | Update task/spec references and session/phase metadata; replace permission entries for task 006-...metrics... with 005-implement-cluster-detection-and-abstraction-promot equivalents. |
| VCS Ignore: `.gitignore` | Add ignore entry for `/mdemg_build/service/consolidate`. |
| Consolidation CLI: `mdemg_build/service/cmd/consolidate/main.go` | New CLI: config parsing, Neo4j driver init, queryClusterCandidates (Cypher), greedy non-overlapping cluster building, embedding averaging, createAbstraction (MemoryNode + ABSTRACTS_TO edges), dry-run vs live execution, and helper converters/utilities. |
| Consolidation Tests: `mdemg_build/service/cmd/consolidate/main_test.go` | New tests covering averageEmbeddings, buildClusters, config validation, conversion helpers, truncateID, and generateAbstractionName with table-driven and edge-case cases. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI / Config
    participant Driver as Neo4j Driver
    participant Logic as Clustering Logic
    participant Output as Results / Stats

    CLI->>Driver: Initialize & verify connection
    CLI->>Logic: Provide config (spaceID, thresholds, dry-run)
    Logic->>Driver: Query cluster candidates (Cypher)
    Driver-->>Logic: Candidate nodes + embeddings
    Logic->>Logic: Build non-overlapping clusters (greedy)
    Logic->>Logic: Average embeddings per cluster
    alt Dry-run
        Logic->>Output: Tally potential promotions & samples
    else Live-run
        Logic->>Driver: CREATE abstraction node (MemoryNode)
        Driver-->>Logic: New abstraction ID
        Logic->>Driver: CREATE ABSTRACTS_TO edges to members
        Driver-->>Logic: Edge confirmations
        Logic->>Output: Record abstraction results
    end
    Output-->>CLI: Print stats and samples

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

"I nibbled through nodes and found a kindred plot,
Neo4j hummed and clusters knit a tiny knot,
Abstractions hop forth, neat and spry,
Tests checked my trail beneath the sky—
A rabbit's cheer for code that ties the lot!" 🐇✨



📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Free

📥 Commits

Reviewing files that changed from the base of the PR and between c99a8b2 and 02a99c4.

📒 Files selected for processing (5)
  • .auto-claude-status
  • .claude_settings.json
  • .gitignore
  • mdemg_build/service/cmd/consolidate/main.go
  • mdemg_build/service/cmd/consolidate/main_test.go


rhenley1958 and others added 10 commits January 16, 2026 13:38
… struct, flag parsing, and Neo4j driver setup
…with Cypher query

Implements cluster detection by finding MemoryNodes that have enough
high-weight neighbors at the same layer:

- Add clusterCandidate struct to hold query results (NodeID, Layer, Embedding, NeighborIDs)
- Add queryClusterCandidates function with Cypher query that:
  - Matches CO_ACTIVATED_WITH edges with weight >= threshold
  - Filters to same-layer nodes (a.layer = b.layer)
  - Groups by node and collects neighbors
  - Filters to nodes with >= minClusterSize-1 neighbors
  - Orders results by neighbor count descending
- Add asStringSlice helper for Neo4j array conversion

Follows patterns from cmd/decay/main.go for Neo4j session handling and
type conversion helpers.
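
As a rough illustration of the query shape and the conversion helper described in this commit, here is a minimal sketch. It is not the code from the PR: the property names (`space_id`, `node_id`, `weight`, `embedding`) and the parameter names are assumptions, and only the relationship type, the same-layer filter, the neighbor-count filter, and the ordering come from the commit message.

```go
package main

// clusterCandidateQuery is an illustrative Cypher query. Property and
// parameter names are assumptions, not taken from the PR.
const clusterCandidateQuery = `
MATCH (a:MemoryNode {space_id: $spaceID})-[r:CO_ACTIVATED_WITH]-(b:MemoryNode {space_id: $spaceID})
WHERE r.weight >= $weightThreshold
  AND a.layer = b.layer
WITH a, collect(DISTINCT b.node_id) AS neighborIDs
WHERE size(neighborIDs) >= $minClusterSize - 1
RETURN a.node_id AS nodeID, a.layer AS layer, a.embedding AS embedding, neighborIDs
ORDER BY size(neighborIDs) DESC
`

// asStringSlice converts a list value as a driver typically returns it
// ([]any) into []string. Non-string elements are skipped, and any other
// input type yields nil.
func asStringSlice(v any) []string {
	switch t := v.(type) {
	case []string:
		return t
	case []any:
		out := make([]string, 0, len(t))
		for _, e := range t {
			if s, ok := e.(string); ok {
				out = append(out, s)
			}
		}
		return out
	default:
		return nil
	}
}
```

Skipping bad elements instead of returning an error is one possible design; it keeps a single malformed neighbor ID from aborting the whole job.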

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…candidates into non-overlapping clusters

Implements greedy first-come cluster assignment algorithm:
- Added clusterMember and cluster structs for cluster representation
- buildClusters processes candidates in order (highest neighbor count first)
- Each node can only belong to one cluster (first-come assignment)
- Clusters smaller than minSize are discarded
- Embeddings are preserved for averaging in later subtasks
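
The greedy first-come assignment listed above could look roughly like the following sketch. The struct fields follow the names given in the commit messages; everything else, including how neighbor embeddings are looked up, is an assumption for illustration.

```go
package main

type clusterCandidate struct {
	NodeID      string
	Layer       int
	Embedding   []float64
	NeighborIDs []string
}

type clusterMember struct {
	NodeID    string
	Embedding []float64
}

type cluster struct {
	Layer   int
	Members []clusterMember
}

// buildClusters walks candidates in the order given (assumed to be sorted by
// neighbor count, highest first). Each candidate seeds a cluster from itself
// plus its still-unassigned neighbors; a node joins at most one cluster.
func buildClusters(candidates []clusterCandidate, minSize int) []cluster {
	// Assumption: a neighbor's embedding is only known if that neighbor
	// also appears as a candidate row.
	embeddings := make(map[string][]float64, len(candidates))
	for _, c := range candidates {
		embeddings[c.NodeID] = c.Embedding
	}

	assigned := make(map[string]bool)
	var clusters []cluster
	for _, c := range candidates {
		if assigned[c.NodeID] {
			continue
		}
		seen := map[string]bool{c.NodeID: true}
		members := []clusterMember{{NodeID: c.NodeID, Embedding: c.Embedding}}
		for _, id := range c.NeighborIDs {
			if assigned[id] || seen[id] {
				continue
			}
			seen[id] = true
			members = append(members, clusterMember{NodeID: id, Embedding: embeddings[id]})
		}
		if len(members) < minSize {
			continue // too small: discard and leave these nodes unassigned
		}
		for _, m := range members {
			assigned[m.NodeID] = true
		}
		clusters = append(clusters, cluster{Layer: c.Layer, Members: members})
	}
	return clusters
}
```

Because assignment is first-come, the result depends on candidate order. That is why the query sorts by neighbor count before this step runs.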

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…mpute centroid embedding

Add averageEmbeddings function that:
- Computes element-wise average of multiple embedding vectors
- Returns nil for empty input
- Handles mismatched dimensions gracefully by skipping invalid embeddings
- Uses validCount to ensure proper averaging even with some skipped embeddings
- Determines dimension from first valid (non-nil, non-empty) embedding
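
The behavior described in these bullets can be sketched as follows. This is a reconstruction from the commit message, not the PR's source, so details such as the exact signature are assumptions.

```go
package main

// averageEmbeddings returns the element-wise mean of the input vectors.
// The dimension is taken from the first non-empty vector; vectors with a
// different length (including nil or empty ones) are skipped.
func averageEmbeddings(embeddings [][]float64) []float64 {
	dim := 0
	for _, e := range embeddings {
		if len(e) > 0 {
			dim = len(e)
			break
		}
	}
	if dim == 0 {
		return nil // no usable embedding at all
	}

	sum := make([]float64, dim)
	validCount := 0
	for _, e := range embeddings {
		if len(e) != dim {
			continue // mismatched dimension: skip rather than fail
		}
		for i, v := range e {
			sum[i] += v
		}
		validCount++
	}
	// validCount is at least 1 here, because the vector that set dim matches.
	for i := range sum {
		sum[i] /= float64(validCount)
	}
	return sum
}
```

Dividing by `validCount` rather than `len(embeddings)` keeps the centroid from shrinking toward zero when some members are skipped.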

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…Cypher query

- Add createAbstraction function that creates MemoryNode at layer+1
- Creates ABSTRACTS_TO edges from cluster members to abstraction node
- Add abstractionResult struct to hold creation result (NodeID, MemberCount)
- Add generateAbstractionName helper to create readable names from member IDs
- Follows learning/service.go patterns for Neo4j session handling
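
For orientation, a write query with this shape might look like the sketch below. The Cypher text, the property names, and the `abstractionParams` helper are all hypothetical; only the node label, the layer+1 rule, and the direction of the ABSTRACTS_TO edges (member to abstraction) come from the commit message.

```go
package main

// createAbstractionQuery is an illustrative Cypher statement, not the PR's.
// It creates one abstraction node and links every member to it.
const createAbstractionQuery = `
CREATE (a:MemoryNode {
  node_id: randomUUID(),
  space_id: $spaceID,
  layer: $layer,
  name: $name,
  embedding: $embedding
})
WITH a
UNWIND $memberIDs AS memberID
MATCH (m:MemoryNode {space_id: $spaceID, node_id: memberID})
MERGE (m)-[:ABSTRACTS_TO]->(a)
RETURN a.node_id AS nodeID, count(m) AS memberCount
`

// abstractionParams builds the parameter map for createAbstractionQuery.
// The abstraction lives one layer above its members.
func abstractionParams(spaceID, name string, memberLayer int, memberIDs []string, centroid []float64) map[string]any {
	return map[string]any{
		"spaceID":   spaceID,
		"layer":     memberLayer + 1,
		"name":      name,
		"embedding": centroid,
		"memberIDs": memberIDs,
	}
}
```

Passing the member IDs as a single list parameter and expanding it with `UNWIND` keeps node creation and edge creation in one statement, so they can succeed or fail together inside one transaction.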

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…hestrating detection, clustering, and promotion

Implements the main orchestration function for the consolidation job:
- Step 1: Query cluster candidates via queryClusterCandidates()
- Step 2: Build clusters via buildClusters() with greedy first-come assignment
- Step 3: Process clusters and create abstractions or report for dry-run

Features:
- Respects --max-promotions cap to limit number of abstraction nodes
- Supports --dry-run mode that reports what would happen without database writes
- Tracks statistics: clusters found, nodes promoted, edges created, skipped counts
- Handles edge cases: no candidates, no qualifying clusters, missing embeddings
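
The interaction between the promotion cap and dry-run mode can be illustrated with a small loop. The names `jobStats` and `runPromotions` are hypothetical, and the sketch treats the cap as a hard limit on successful promotions, which is an assumption about how --max-promotions behaves.

```go
package main

// jobStats is a simplified, hypothetical subset of the statistics the job tracks.
type jobStats struct {
	ClustersFound      int
	Promoted           int // in dry-run mode: clusters that would be promoted
	SkippedNoEmbedding int
}

// runPromotions processes one centroid per cluster. promote stands in for the
// database write and is never called when dryRun is true.
func runPromotions(centroids [][]float64, maxPromotions int, dryRun bool, promote func([]float64) error) (jobStats, error) {
	stats := jobStats{ClustersFound: len(centroids)}
	for _, c := range centroids {
		if stats.Promoted >= maxPromotions {
			break // cap reached
		}
		if len(c) == 0 {
			stats.SkippedNoEmbedding++
			continue
		}
		if !dryRun {
			if err := promote(c); err != nil {
				return stats, err
			}
		}
		stats.Promoted++
	}
	return stats, nil
}
```

Running the same loop in both modes and gating only the write means the dry-run counts match what a live run would do on the same data.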

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…d output

- Enhanced printStats function with detailed statistics output
- Added clusterSample struct to track sample clusters for display
- Added sample cluster collection in runConsolidationJob (first 5 clusters)
- Output shows: clusters found, nodes promoted/to promote, edges created/to create
- Shows skipped cluster counts (no embedding, too small)
- Displays sample clusters with member counts, layer transitions, and member IDs
- Added truncateID helper for readable UUID display
- Follows decay job pattern for consistent user experience
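
A helper like truncateID is small enough to sketch in full. The signature and the "..." suffix here are assumptions; the PR may use a fixed length or a different marker.

```go
package main

// truncateID shortens an ID for display, keeping the first max bytes and
// appending "..." when something was cut. Byte slicing is safe for UUIDs,
// which are ASCII.
func truncateID(id string, max int) string {
	if max < 0 {
		max = 0
	}
	if len(id) <= max {
		return id
	}
	return id[:max] + "..."
}
```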

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…erageEmb

Add comprehensive unit tests for cmd/consolidate following cmd/decay/main_test.go patterns:

- TestAverageEmbeddings: verifies centroid calculation of multiple embedding vectors
- TestAverageEmbeddings_Empty: returns nil for empty input
- TestAverageEmbeddings_MismatchedDims: handles mismatched dimensions gracefully
- TestBuildClusters: groups nodes correctly by adjacency
- TestBuildClusters_MinSize: excludes clusters smaller than min size
- TestBuildClusters_LayerPreserved: cluster layer matches candidate layer
- TestBuildClusters_EmbeddingsPreserved: embeddings copied to cluster members
- TestConfigValidation_*: min cluster size, weight threshold, max promotions
- TestAsConversionHelpers: safe conversion of Neo4j types
- TestAsFloat64Slice/TestAsStringSlice: slice conversion helpers
- TestTruncateID: ID truncation helper
- TestGenerateAbstractionName: abstraction name generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Verified complete CLI execution against live Neo4j:
- Banner prints "MDEMG Consolidation Job" ✅
- CLI connects to Neo4j successfully ✅
- Dry-run mode displays correct configuration ✅
- Custom flags work (min-cluster-size, weight-threshold, max-promotions) ✅
- Required --space-id flag enforcement ✅
- Input validation works (min-cluster-size >= 2, weight-threshold 0-1) ✅
- All 14 unit tests pass ✅
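
The flag handling and validation rules verified above could be expressed with the standard `flag` package roughly as follows. The flag names and the validation bounds come from this PR; the default values, the `config` field names, and the error messages are assumptions.

```go
package main

import (
	"errors"
	"flag"
)

type config struct {
	SpaceID         string
	MinClusterSize  int
	WeightThreshold float64
	MaxPromotions   int
	DryRun          bool
}

// parseConfig parses CLI arguments (without the program name) and validates them.
// Defaults are illustrative only.
func parseConfig(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("consolidate", flag.ContinueOnError)
	fs.StringVar(&cfg.SpaceID, "space-id", "", "memory space to consolidate (required)")
	fs.IntVar(&cfg.MinClusterSize, "min-cluster-size", 3, "minimum nodes per cluster")
	fs.Float64Var(&cfg.WeightThreshold, "weight-threshold", 0.5, "minimum CO_ACTIVATED_WITH weight")
	fs.IntVar(&cfg.MaxPromotions, "max-promotions", 10, "cap on abstraction nodes created")
	fs.BoolVar(&cfg.DryRun, "dry-run", false, "report without writing to the database")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	switch {
	case cfg.SpaceID == "":
		return cfg, errors.New("--space-id is required")
	case cfg.MinClusterSize < 2:
		return cfg, errors.New("--min-cluster-size must be >= 2")
	case cfg.WeightThreshold < 0 || cfg.WeightThreshold > 1:
		return cfg, errors.New("--weight-threshold must be between 0 and 1")
	}
	return cfg, nil
}
```

Taking `args` as a parameter instead of reading `os.Args` directly makes the validation easy to cover with table-driven tests.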

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…quested)

Fixes:
- Remove compiled binary mdemg_build/service/consolidate from repo (7.4MB)
- Add /mdemg_build/service/consolidate to .gitignore to prevent future accidents

Verified:
- Binary removed from git tracking
- .gitignore updated with new entry

QA Fix Session: 1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
reh3376 force-pushed the auto-claude/005-implement-cluster-detection-and-abstraction-promot branch from c99a8b2 to 02a99c4 on January 16, 2026 18:38
reh3376 merged commit 8bc3442 into main on Jan 16, 2026
1 check was pending
reh3376 deleted the auto-claude/005-implement-cluster-detection-and-abstraction-promot branch on January 28, 2026 01:03
reh3376 pushed a commit that referenced this pull request Apr 7, 2026
G5: Replace nil guard in /readyz check #5 with a live Neo4j read query
(RETURN 1). Detects CMS degradation when Neo4j is under stress or
connections are stale. Adds Ping() method to conversation.Service.
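
The difference between a nil guard and a live check can be sketched without the Neo4j driver by hiding the query behind an interface. Everything below is hypothetical apart from the idea itself: in the real service, `Ping` on conversation.Service would presumably run `RETURN 1`, and the handler would likely derive its context from the request.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"time"
)

// pinger is a hypothetical stand-in for a service exposing a live Ping.
type pinger interface {
	Ping(ctx context.Context) error
}

// staticPinger is a test double that always returns the same result.
type staticPinger struct{ err error }

func (s staticPinger) Ping(context.Context) error { return s.err }

var errStale = errors.New("stale connection")

const pingTimeout = 2 * time.Second // assumed value

// readyStatus returns 200 only if the dependency answers a live ping in time.
// A nil guard alone would report ready even when the connection is stale.
func readyStatus(p pinger) int {
	if p == nil {
		return http.StatusServiceUnavailable
	}
	ctx, cancel := context.WithTimeout(context.Background(), pingTimeout)
	defer cancel()
	if err := p.Ping(ctx); err != nil {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// readyzHandler exposes readyStatus as an HTTP handler.
func readyzHandler(p pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(readyStatus(p))
	}
}
```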

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reh3376 pushed a commit that referenced this pull request May 4, 2026
…lag-off) + Phase 13 Epic 6 V0017 audit-writer fix + Phase 11+ feature-doc backfill (narrow close)

Narrow close per operator approval after Epic 0+1+2 produced design questions
that warrant dedicated follow-up sprints. Note 05 deferred to Phase 14.2;
Note 06 default flip deferred to Phase 14.1.

What landed
-----------

* Phase 13 Epic 6 V0017 audit-writer fix (in-flight discovery)
  - tsdb/retrieval_audit_writer.go (new, ~165 LOC; buffered + 30s flush via CopyFrom)
  - retrievalAuditAdapter in api/server.go (cycle-safe translation)
  - V0017 was empty since Phase 13 because SetRetrievalAuditWriter had no
    callers; now writes per retrieve when RETRIEVAL_AUDIT_ENABLED=true.
  - Live verification: 279 audit rows accumulated in 4h since fix landed.

* Note 06 sparse activation gate (flag-off)
  - retrieval/gate.go (~190 LOC) + 9 Tier 1 unit tests, all green
  - Wired post-aggregation, pre-rerank in service.go
  - 4 config knobs (SPARSE_*); default off, percentile 0.95, min 3, max 20
  - Per-request override via ?sparse=true|false and ?sparse_percentile=N
  - debug.sparse_gate_* + debug.below_threshold_* (when JiminyEnabled)
  - 3 Prometheus histograms

* TSDB V0019 sparse_gate_metrics
  - migrations/019_sparse_gate_metrics.sql (hypertable, 7-day chunks)
  - tsdb/sparse_gate_writer.go (~165 LOC)
  - sparseGateRecorderAdapter in api/server.go (always wired so per-request
    overrides record even when default off)
  - TSDB_REQUIRED_SCHEMA_VERSION 18 -> 19

* Epic 0 forensic doc — phase_14_score_distribution_analysis.md
  - Defaults derived from llm_interactions.retrieval_scores (99k+50k score
    points across consulting.classify + retrieval.rerank_cross)
  - Heavy-tail confirmed (p98/p50 ~ 4-5x); within-call clamp dominates
    percentile choice in dominant K=20-50 regime
  - Note 05 catalog redesign needed for whk-wms (0 distinct symbols, 0
    distinct roles) — flagged for Phase 14.2

* A/B verdicts captured
  - 16q quick at MIN=3 / p95,p98,p99: all FAIL (q69 boundary)
  - 16q quick at MIN=10 / p95: PASS (mean +0.019, 0 regressions, 3 improvements)
  - 120q full at MIN=10 / p95: FAIL per-question (mean parity 0.413=0.413,
    7 boundary regressions across 4 categories, 3 of 7 in
    architecture_structure)
  - Per sprint plan §10 risk #1: ship flag-off; Phase 14.1 will retune.

* Phase 11+ feature-doc backfill (operator request 2026-05-04)
  - new: docs/features/{mlx-watchdog,uvts-validation,column-voting-retrieval,
         local-llm-runtime,sparse-retrieval}.md
  - extended: docs/features/service-resilience.md (Phase 11.6.x additions)
  - Standing rule saved as memory feedback_per_feature_docs_required.md

* Follow-up sprint stubs scoped
  - sprint_plan_phase_14_1_adaptive_per_category_gate.md (~3 days, ~$15)
  - sprint_plan_phase_14_2_note_05_sparse_fingerprints.md (~7 days, ~$25)
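
To make the Note 06 gate parameters above more concrete (percentile 0.95, min 3, max 20), here is one plausible reading of how such a gate could work. It is a sketch under assumptions and not the contents of retrieval/gate.go: it assumes the gate keeps candidates scoring at or above a percentile threshold within the call, then clamps the kept count to the min and max bounds.

```go
package main

import "sort"

// sparseGate returns the highest scores that pass a percentile threshold,
// clamped to between minKeep and maxKeep results (and never more than the
// number of inputs). The exact semantics are assumed, not taken from the source.
func sparseGate(scores []float64, percentile float64, minKeep, maxKeep int) []float64 {
	n := len(scores)
	if n == 0 {
		return nil
	}
	desc := make([]float64, n)
	copy(desc, scores)
	sort.Sort(sort.Reverse(sort.Float64Slice(desc)))

	// Index of the percentile in ascending order, floored and bounds-checked.
	idx := int(percentile * float64(n-1))
	if idx < 0 {
		idx = 0
	}
	if idx > n-1 {
		idx = n - 1
	}
	threshold := desc[n-1-idx] // same element, addressed in descending order

	keep := 0
	for keep < n && desc[keep] >= threshold {
		keep++
	}
	if keep < minKeep {
		keep = minKeep
	}
	if keep > maxKeep {
		keep = maxKeep
	}
	if keep > n {
		keep = n
	}
	return desc[:keep]
}
```

With ten scores and percentile 0.95, only the top two pass the threshold, so the minimum of 3 decides the result. This is the kind of within-call clamp effect the forensic doc describes for small K.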

Decision-fork outcomes
----------------------

| Fork | Provisional | Outcome |
|---|---|---|
| #2 percentile default | 0.98 | 0.95 (Epic 0 data) |
| #5 catalog bit policy | static 64/64/64/64 | adaptive (deferred Phase 14.2) |
| #8 gate ordering | pre-rerank | pre-rerank (confirmed) |
| #9 default flip | per-Note conditional | flag-off (Phase 14.1 will flip) |

OpenAI spend (actual): ~$13. Well under sprint $25-50 budget.

Tests + lint
------------

* go test -race ./internal/{retrieval,config,metrics,tsdb}: all green
* golangci-lint run on affected packages: 0 issues
* Live smoke: /healthz green, retrieve returns 20 (gate off), 279 V0017
  audit rows in 4h (Phase 13 Epic 6 fix verified in production)

Memory observations
-------------------

* rw0mzergwcqct8abpw0dli9x — Phase 14 Epic 8 doc-backfill scope
* sc4iwy3of9ndn5kowja1i14i — Epic 0 forensic + audit-writer gap
* omr2rs5jppqrvee2k0l1xtd1 — Epic 1 gate code complete
* re4k7rpd3hjt5a52l8qwx8fp — Epic 2 verdict + Phase 14.1 scope

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


2 participants