
Conversation

@altafan (Collaborator) commented Nov 25, 2025

This adds a context parameter and an error return value to all methods of the interfaces that compose ports.LiveStore.

In the app service, instead of relying entirely on the cache, cached data is now fetched once, stored in local variables, and reused throughout the rest of the function bodies.
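
For illustration, a minimal sketch of that pattern (the surrounding function and names are hypothetical):

round, err := s.cache.CurrentRound().Get(ctx)
if err != nil {
	return err
}
// reuse the local `round` for the rest of the function body instead of
// calling s.cache.CurrentRound().Get(ctx) again on every access
log.Debugf("processing round %s", round.Id)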

Please @louisinger @sekulicd review.

Summary by CodeRabbit

  • Bug Fixes

    • Operations now honor cancellation/timeouts and fail fast on downstream store errors to avoid silent continuation.
  • Improvements

    • Broad context propagation across services and stores; APIs now return explicit errors for better observability and cancellation semantics.
    • Stronger error handling and safer concurrency in in-memory components.
  • New Features

    • Retry/backoff and stricter validation for Redis-backed stores.
    • New boarding-inputs stores (in-memory and Redis) and context-aware signing/confirmation flows.
  • Tests

    • Tests updated to use per-test contexts and handle error-aware APIs.


coderabbitai bot (Contributor) commented Nov 25, 2025

Walkthrough

Propagates context.Context across live-store interfaces and implementations, changes many methods to return errors or (value, error), adds retry + WATCH/TxPipelined loops with retryDelay for Redis stores, threads ctx and explicit roundId through service orchestration, and updates tests to the ctx-aware API.
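
As a rough sketch, the interface change looks like this (the CurrentRoundStore signatures follow the summary below; the "before" shapes are inferred):

type CurrentRoundStore interface {
	// before: Get() *domain.Round
	Get(ctx context.Context) (*domain.Round, error)
	// before: Upsert(fn func(m *domain.Round) *domain.Round)
	Upsert(ctx context.Context, fn func(m *domain.Round) *domain.Round) error
}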

Changes

Cohort / File(s) Change Summary
Core Ports — Live-Store Interfaces
internal/core/ports/live_store.go
Add context.Context as first parameter for many methods and change return shapes to error or (value, error) across IntentStore, ForfeitTxsStore, OffChainTxStore, ConfirmationSessionsStore, TreeSigningSessionsStore, BoardingInputsStore, CurrentRoundStore.
Application Service Layer
internal/core/application/service.go
Thread ctx and explicit roundId through orchestration and event propagation; replace context.Background() uses; update many public method signatures to accept ctx and call ctx-aware store/cache methods.
Admin & Ban Call Sites
internal/core/application/admin.go, internal/core/application/ban.go
Update call sites to pass ctx into store methods (e.g., ViewAll(ctx, ...), DeleteAll(ctx), Delete(ctx, ...), GetUnsignedInputs(ctx)); add error handling and early returns on failures.
In-Memory Stores
internal/infrastructure/live-store/inmemory/*
.../intents.go, .../forfeits.go, .../offchain_txs.go, .../confirmation_session.go, .../tree_signing_session.go, .../round.go
Convert public methods to accept context.Context and return error/(value,error); adjust locking and semantics; confirmation sessions become ctx-aware with channel signalling; remove/change some boarding-inputs/round-fail behavior; Get returns (*domain.Round, error).
In-Memory BoardingInputs Store
internal/infrastructure/live-store/inmemory/boarding_inputs.go
Add thread-safe in-memory BoardingInputsStore with constructor NewBoardingInputsStore() and methods: Set, Get, AddSignatures, GetSignatures, DeleteSignatures.
Redis Stores — Core Implementations
internal/infrastructure/live-store/redis/*
.../intents.go, .../forfeits.go, .../offchain_txs.go, .../confirmation_sessions.go, .../tree_signing_session.go, .../round.go, .../boarding_inputs.go
Add numOfRetries/retryDelay; introduce retry loops and WATCH+TxPipelined transactional writes with backoff; convert APIs to be ctx-aware and return errors; improve validation and error messages; add Redis-backed boarding inputs DTOs/serialization.
Redis Store Factory
internal/infrastructure/live-store/redis/store.go
Update constructors to accept and forward numOfRetries to NewForfeitTxsStore, NewOffChainTxStore, NewTreeSigningSessionsStore.
Tests
internal/infrastructure/live-store/live_store_test.go
Update tests to supply ctx and handle new (value, error) returns across intents, forfeits, offchain txs, confirmation/tree-signing sessions, boarding inputs and current round interactions; add concurrency scaffolding and ctx-aware assertions.
Domain OffchainTx
internal/core/domain/offchain_tx.go
Adjust OffchainTx lifecycle: clear Stage.Failed and FailReason on finalization; stop updating EndingTimestamp on failure.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Service
    participant LiveStore
    participant Redis

    Note over Client,Service: Old flow (no ctx)
    Client->>Service: startRound()
    Service->>LiveStore: Push(intent)
    LiveStore-->>Service: success

    Note over Client,Redis: New flow (ctx-aware + retries)
    Client->>Service: startRound(ctx)
    Service->>LiveStore: Push(ctx, intent)
    alt In-memory store
        LiveStore-->>Service: (nil) or error
    else Redis-backed store
        LiveStore->>Redis: WATCH keys / TxPipelined (with retries & retryDelay)
        alt Success
            Redis-->>LiveStore: OK
            LiveStore-->>Service: nil
        else Retries exhausted / error
            Redis-->>LiveStore: error
            LiveStore-->>Service: error
        end
    end
    Service->>LiveStore: ViewAll(ctx, ids)
    LiveStore-->>Service: ([]TimedIntent, nil) or (nil, error)
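
The Redis branch of the diagram corresponds to a retry pattern roughly like the following (a minimal sketch assuming go-redis v9; numOfRetries and retryDelay come from the summary, the rest is illustrative):

func (s *store) setWithRetry(ctx context.Context, key string, val []byte) error {
	var err error
	for i := 0; i < s.numOfRetries; i++ {
		err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
			// writes are queued in MULTI/EXEC; EXEC fails if a watched key changed
			_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
				pipe.Set(ctx, key, val, 0)
				return nil
			})
			return err
		}, key)
		if err == nil {
			return nil
		}
		time.Sleep(s.retryDelay)
	}
	return fmt.Errorf("failed to write %s after %d retries: %w", key, s.numOfRetries, err)
}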

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • internal/core/application/service.go — correctness of ctx propagation across goroutines, cancellation handling, and roundId usage.
    • Redis WATCH/TxPipelined retry loops — correctness of retries, idempotency, and backoff behavior.
    • tree_signing_session & confirmation_sessions (redis + inmemory) — channel lifecycle, polling/ticker semantics, session existence checks, and error propagation.
    • currentRoundStore JSON unmarshalling and event reconstruction — error reporting and recovery.
    • Tests — ensure they reflect new ctx/error contract and that teardown/parallel tests remain stable.

Possibly related PRs

  • fix Redis LiveStore #827 — changes to Redis tree_signing_session and watcher logic; strong overlap with tree-signing session retries and ctx handling.
  • Fix redis implementation of current round #813 — context-threading changes for CurrentRound/LiveStore APIs and service interactions; closely related to Upsert/Get and ctx-aware store calls.
  • Fix stack exceed panic #649 — service round lifecycle/context propagation (startRound/startConfirmation/startFinalization) changes; related to service.go edits.

Suggested reviewers

  • sekulicd

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title check — ❓ Inconclusive: The title "Update live store" is vague and doesn't convey the specific nature of the changes (context propagation and error handling across all store interface methods). Consider a more descriptive title, such as "Add context and error handling to live store interfaces".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66a1a3a and 01b0e49.

📒 Files selected for processing (3)
  • internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
  • internal/infrastructure/live-store/live_store_test.go (8 hunks)
  • internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/redis/forfeits.go
  • internal/infrastructure/live-store/inmemory/forfeits.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.

Applied to files:

  • internal/infrastructure/live-store/live_store_test.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.

Applied to files:

  • internal/infrastructure/live-store/live_store_test.go
🧬 Code graph analysis (1)
internal/infrastructure/live-store/inmemory/forfeits.go (3)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/tx_builder.go (1)
  • ValidForfeitTx (49-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration tests
  • GitHub Check: unit tests
  • GitHub Check: Build and Scan
🔇 Additional comments (4)
internal/infrastructure/live-store/live_store_test.go (1)

9-9: LGTM: Comprehensive test updates for context-aware API.

The test file properly exercises the new context-aware store methods with:

  • Context propagation via ctx := t.Context() in each test
  • Proper error handling for all new error returns
  • Concurrent operation tests with correct WaitGroup synchronization
  • Mid-session deletion scenarios for ForfeitTxs, ConfirmationSessions, and TreeSigningSessions
  • Expanded fixtures to support additional test scenarios

Also applies to: 20-20, 45-64, 112-757

internal/infrastructure/live-store/inmemory/forfeits.go (2)

153-164: LGTM: Read lock properly added to AllSigned.

The method now correctly acquires RLock before iterating over m.forfeitTxs, preventing data races with concurrent Sign or Reset operations.


184-190: LGTM: GetConnectorsIndexes now returns a cloned map.

Using maps.Clone prevents external callers from mutating the internal m.connectorsIndex after the lock is released.

internal/infrastructure/live-store/redis/forfeits.go (1)

40-116: LGTM: Context-aware retry logic properly implemented.

Init, Sign, and Reset methods correctly:

  • Accept context for cancellation/timeout support
  • Use Watch/TxPipelined for atomic operations
  • Include retry delays (time.Sleep(s.retryDelay)) between attempts
  • Return descriptive errors after exhausting retries

Also applies to: 118-182, 184-206



coderabbitai bot (Contributor) left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)

88-93: Potential deadlock: sending on unbuffered channel while holding lock.

AddNonces sends on nonceCollectedCh (line 91) while holding the write lock. Since the channel is unbuffered, this blocks until a receiver is ready. If the receiver needs to call any method on this store (e.g., Get), it will deadlock waiting for the lock.

Consider using a buffered channel or sending outside the lock:

 func (s *treeSigningSessionsStore) AddNonces(
 	_ context.Context, roundId string, pubkey string, nonces tree.TreeNonces,
 ) error {
 	s.lock.Lock()
-	defer s.lock.Unlock()
 	// ... validation ...
 	s.sessions[roundId].Nonces[pubkey] = nonces

+	shouldNotify := len(s.sessions[roundId].Nonces) == s.sessions[roundId].NbCosigners-1
+	ch := s.nonceCollectedCh[roundId]
+	s.lock.Unlock()
+
-	if len(s.sessions[roundId].Nonces) == s.sessions[roundId].NbCosigners-1 {
-		s.nonceCollectedCh[roundId] <- struct{}{}
+	if shouldNotify {
+		ch <- struct{}{}
 	}
 	return nil
 }

The same pattern applies to AddSignatures at lines 113-117.

internal/infrastructure/live-store/redis/round.go (1)

36-66: Potential stale-read issue in Upsert.

The Get call at Line 39 is performed outside the Watch transaction, and the update function fn(round) is computed at Line 46 before entering the retry loop. If another process modifies the round between Get and Watch, the update will be based on stale data. The Watch only protects the write, not the read-modify-write cycle.

Consider moving the Get and fn invocation inside the Watch callback to ensure atomicity:

 func (s *currentRoundStore) Upsert(
 	ctx context.Context, fn func(m *domain.Round) *domain.Round,
 ) error {
-	round, err := s.Get(ctx)
-	if err != nil {
-		return err
-	}
-	if round == nil {
-		round = &domain.Round{}
-	}
-	updated := fn(round)
-	val, err := json.Marshal(updated)
-	if err != nil {
-		return err
-	}
-
 	for range s.numOfRetries {
 		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
+			data, err := tx.Get(ctx, currentRoundKey).Bytes()
+			var round *domain.Round
+			if err != nil && !errors.Is(err, redis.Nil) {
+				return err
+			}
+			if errors.Is(err, redis.Nil) || len(data) == 0 {
+				round = &domain.Round{}
+			} else {
+				// unmarshal round from data...
+			}
+			updated := fn(round)
+			val, err := json.Marshal(updated)
+			if err != nil {
+				return err
+			}
 			_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
 				pipe.Set(ctx, currentRoundKey, val, 0)
 				return nil
 			})
-
 			return err
-		}); err == nil {
+		}, currentRoundKey); err == nil {
 			return nil
 		}
 		time.Sleep(s.retryDelay)
 	}
internal/infrastructure/live-store/redis/intents.go (1)

167-172: Unchecked Redis errors may cause silent failures.

The SRem and SAdd calls here don't check for errors. If Redis fails mid-operation, intents could be deleted from the KV store but not removed from the ID set, or vtxos could fail to be added to the removal list.

Consider checking these errors or wrapping the operations in a transaction for atomicity.

-		s.rdb.SRem(ctx, intentStoreIdsKey, intent.Id)
+		if err := s.rdb.SRem(ctx, intentStoreIdsKey, intent.Id).Err(); err != nil {
+			return nil, fmt.Errorf("failed to remove intent id %s from set: %v", intent.Id, err)
+		}
 	}
 
 	if len(vtxosToRemove) > 0 {
-		s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)
+		members := make([]interface{}, len(vtxosToRemove))
+		for i, v := range vtxosToRemove {
+			members[i] = v
+		}
+		if err := s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, members...).Err(); err != nil {
+			return nil, fmt.Errorf("failed to add vtxos to remove list: %v", err)
+		}
 	}
🧹 Nitpick comments (12)
internal/infrastructure/live-store/inmemory/offchain_txs.go (1)

31-36: Silently ignoring PSBT parse errors may cause inconsistent input tracking.

If psbt.NewFromRawBytes fails for a malformed checkpoint transaction, the inputs from that transaction won't be added to the inputs map. This could lead to inconsistent state where some inputs are tracked and others are not.

Consider propagating the error or at least logging the failure:

 for _, tx := range offchainTx.CheckpointTxs {
-	ptx, _ := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	ptx, err := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	if err != nil {
+		return fmt.Errorf("failed to parse checkpoint tx: %w", err)
+	}
 	for _, in := range ptx.UnsignedTx.TxIn {
 		m.inputs[in.PreviousOutPoint.String()] = struct{}{}
 	}
 }

The same pattern exists in Remove at line 50.

internal/infrastructure/live-store/redis/offchain_txs.go (2)

49-66: Retry loop ignores context cancellation.

The retry loop sleeps unconditionally without checking if the context has been cancelled. This means the operation will continue retrying even after the caller has given up.

Consider checking context before retrying:

 for range s.numOfRetries {
 	if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
 		// ... pipeline logic ...
 	}); err == nil {
 		return nil
 	}
+	select {
+	case <-ctx.Done():
+		return fmt.Errorf("context cancelled while adding offchain tx %s: %w", offchainTx.ArkTxid, ctx.Err())
+	default:
+	}
 	time.Sleep(s.retryDelay)
 }

The same pattern applies to Remove at lines 94-108.


36-47: PSBT parse error silently ignored in Add.

Same issue as the in-memory implementation: if psbt.NewFromRawBytes fails, the inputs from that checkpoint transaction won't be tracked. This could lead to the Includes check returning incorrect results.

 for _, tx := range offchainTx.CheckpointTxs {
-	ptx, _ := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	ptx, err := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	if err != nil {
+		return fmt.Errorf("failed to parse checkpoint tx: %w", err)
+	}
 	for _, in := range ptx.UnsignedTx.TxIn {
 		inputs = append(inputs, in.PreviousOutPoint.String())
 	}
 }
internal/infrastructure/live-store/redis/boarding_inputs.go (2)

97-120: Variable shadowing: key reused in loop.

The loop variable key on line 106 shadows the outer key variable from line 100. Consider renaming to field for clarity.

-	for key, value := range values {
+	for field, value := range values {
 		rawSig := &sigsDTO{}
 		sig, err := rawSig.deserialize([]byte(value))
 		if err != nil {
-			return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", key, err)
+			return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", field, err)
 		}
-		inIndex, err := strconv.Atoi(key)
+		inIndex, err := strconv.Atoi(field)

122-125: Consider adding retry logic for consistency.

DeleteSignatures doesn't use retry logic unlike Set and AddSignatures. While Del is simpler, adding retry logic would make the error handling consistent across write operations.
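
A minimal sketch of what that could look like, following the retry pattern used elsewhere in this store (boardingInputsSigsKey is a hypothetical key helper):

func (b *boardingInputsStore) DeleteSignatures(ctx context.Context, batchId string) error {
	var err error
	for i := 0; i < b.numOfRetries; i++ {
		err = b.rdb.Watch(ctx, func(tx *redis.Tx) error {
			_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
				pipe.Del(ctx, boardingInputsSigsKey(batchId))
				return nil
			})
			return err
		})
		if err == nil {
			return nil
		}
		time.Sleep(b.retryDelay)
	}
	return err
}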

internal/infrastructure/live-store/redis/round.go (1)

163-164: Unknown event types are silently skipped.

When an unknown eventType is encountered, the code silently continues without logging or returning an error. While this may be intentional for forward compatibility, it could mask data corruption or version mismatches. Consider adding a warning log to aid debugging.

 		default:
+			log.Warnf("unknown event type %d in round storage, skipping", eventType)
 			continue
 		}
internal/infrastructure/live-store/redis/tree_signing_session.go (1)

278-279: Silently ignoring Redis errors in watch goroutine.

Errors from HGetAll are discarded. While transient errors should allow the loop to continue, persistent failures will cause silent infinite polling. Consider logging the error.

-			noncesMap, _ := s.rdb.HGetAll(ctx, noncesKey).Result()
+			noncesMap, err := s.rdb.HGetAll(ctx, noncesKey).Result()
+			if err != nil {
+				log.Warnf("watchNoncesCollected: failed to get nonces: %v", err)
+				continue
+			}

The same pattern applies at Line 329 for sigsMap.

internal/core/application/service.go (2)

2275-2289: Inconsistent error handling on cache cleanup failures.

The cleanup operations in startRound log warnings/errors but continue execution. While this is reasonable for cleanup operations, the error messages reference existingRound.Id which could be empty or nil if this is the first round.

 	if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
-		log.WithError(err).Warnf(
-			"failed to delete forfeit txs from cache for round %s", existingRound.Id,
-		)
+		if existingRound != nil {
+			log.WithError(err).Warnf(
+				"failed to delete forfeit txs from cache for round %s", existingRound.Id,
+			)
+		} else {
+			log.WithError(err).Warn("failed to reset forfeit txs from cache")
+		}
 	}

Similar patterns apply to the subsequent cleanup calls.


3183-3188: Address the TODO comment.

There's a TODO about testing broadcast before updating storage. This ordering could lead to inconsistent state if the broadcast succeeds but subsequent storage updates fail. Consider implementing the suggested pattern or creating a tracking issue.

Do you want me to open a new issue to track this improvement?

internal/infrastructure/live-store/redis/intents.go (2)

271-292: Inconsistent error handling in Delete method.

The method returns errors from Get but ignores errors from SRem (lines 282, 289) and only logs errors from Delete (line 286). This inconsistency could leave orphaned entries in the vtxos set if Redis operations fail.

If the intent is idempotent deletion, consider at least logging the SRem failures.

 		for _, vtxo := range intent.Inputs {
-			s.rdb.SRem(ctx, intentStoreVtxosKey, vtxo.Outpoint.String())
+			if err := s.rdb.SRem(ctx, intentStoreVtxosKey, vtxo.Outpoint.String()).Err(); err != nil {
+				log.Warnf("delete: failed to remove vtxo %s from set: %v", vtxo.Outpoint.String(), err)
+			}
 		}
 
 		if err := s.intents.Delete(ctx, id); err != nil {
 			log.Warnf("delete:failed to delete intent %s: %v", id, err)
 		}
 
-		s.rdb.SRem(ctx, intentStoreIdsKey, id)
+		if err := s.rdb.SRem(ctx, intentStoreIdsKey, id).Err(); err != nil {
+			log.Warnf("delete: failed to remove intent id %s from set: %v", id, err)
+		}

347-357: Consider returning error from IncludesAny.

The method logs errors but cannot propagate them to callers due to the interface signature. Callers cannot distinguish between "not found" and "Redis error". If this distinction matters, consider updating the interface to return an error.
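
A sketch of the suggested signature change (the parameter list is assumed, not copied from the code):

// current: errors can only be logged internally
IncludesAny(ctx context.Context, outpoints []domain.Outpoint) (bool, string)

// suggested: let callers distinguish "not found" from a Redis failure
IncludesAny(ctx context.Context, outpoints []domain.Outpoint) (bool, string, error)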

internal/infrastructure/live-store/live_store_test.go (1)

433-550: Minor style inconsistency in context usage.

The test uses ctx := t.Context() for early operations (lines 434-448) but then uses t.Context() directly for later operations (lines 515+). This works correctly but is slightly inconsistent. Consider using the ctx variable throughout for consistency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 099487a and ce1e8bd.

📒 Files selected for processing (20)
  • internal/core/application/admin.go (2 hunks)
  • internal/core/application/ban.go (1 hunks)
  • internal/core/application/service.go (59 hunks)
  • internal/core/ports/live_store.go (2 hunks)
  • internal/infrastructure/live-store/inmemory/boarding_inputs.go (1 hunks)
  • internal/infrastructure/live-store/inmemory/confirmation_session.go (4 hunks)
  • internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
  • internal/infrastructure/live-store/inmemory/intents.go (9 hunks)
  • internal/infrastructure/live-store/inmemory/offchain_txs.go (3 hunks)
  • internal/infrastructure/live-store/inmemory/round.go (1 hunks)
  • internal/infrastructure/live-store/inmemory/tree_signing_session.go (3 hunks)
  • internal/infrastructure/live-store/live_store_test.go (8 hunks)
  • internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
  • internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
  • internal/infrastructure/live-store/redis/intents.go (10 hunks)
  • internal/infrastructure/live-store/redis/offchain_txs.go (2 hunks)
  • internal/infrastructure/live-store/redis/round.go (6 hunks)
  • internal/infrastructure/live-store/redis/store.go (1 hunks)
  • internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/redis/confirmation_sessions.go
  • internal/infrastructure/live-store/inmemory/confirmation_session.go
  • internal/infrastructure/live-store/inmemory/forfeits.go
  • internal/core/application/service.go
  • internal/infrastructure/live-store/live_store_test.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.

Applied to files:

  • internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.

Applied to files:

  • internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/core/application/service.go
🧬 Code graph analysis (14)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (3)
internal/core/ports/tx_builder.go (1)
  • SignedBoardingInput (54-57)
internal/infrastructure/live-store/redis/boarding_inputs.go (1)
  • NewBoardingInputsStore (23-29)
internal/core/ports/live_store.go (1)
  • BoardingInputsStore (82-90)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/infrastructure/live-store/redis/boarding_inputs.go (3)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (1)
  • NewBoardingInputsStore (17-22)
internal/core/ports/live_store.go (1)
  • BoardingInputsStore (82-90)
internal/core/ports/tx_builder.go (1)
  • SignedBoardingInput (54-57)
internal/infrastructure/live-store/redis/offchain_txs.go (2)
internal/infrastructure/live-store/inmemory/offchain_txs.go (1)
  • NewOffChainTxStore (19-24)
internal/core/ports/live_store.go (1)
  • OffChainTxStore (49-54)
internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/infrastructure/live-store/redis/store.go (4)
internal/infrastructure/live-store/redis/forfeits.go (1)
  • NewForfeitTxsStore (29-38)
internal/infrastructure/live-store/redis/offchain_txs.go (1)
  • NewOffChainTxStore (28-34)
internal/infrastructure/live-store/redis/round.go (1)
  • NewCurrentRoundStore (28-34)
internal/infrastructure/live-store/redis/tree_signing_session.go (1)
  • NewTreeSigningSessionsStore (38-50)
internal/infrastructure/live-store/redis/tree_signing_session.go (3)
internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)
  • NewTreeSigningSessionsStore (19-26)
internal/core/ports/live_store.go (2)
  • TreeSigningSessionsStore (70-80)
  • MusigSigningSession (104-110)
pkg/ark-lib/tree/musig2.go (2)
  • TreeNonces (32-32)
  • TreePartialSigs (98-98)
internal/infrastructure/live-store/inmemory/forfeits.go (5)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/infrastructure/db/postgres/sqlc/queries/models.go (2)
  • Intent (33-38)
  • Tx (178-185)
internal/infrastructure/db/sqlite/sqlc/queries/models.go (2)
  • Intent (31-36)
  • Tx (165-172)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/tx_builder.go (1)
  • ValidForfeitTx (49-52)
internal/infrastructure/live-store/inmemory/tree_signing_session.go (2)
internal/core/ports/live_store.go (1)
  • MusigSigningSession (104-110)
pkg/ark-lib/tree/musig2.go (2)
  • TreeNonces (32-32)
  • TreePartialSigs (98-98)
internal/infrastructure/live-store/redis/round.go (3)
internal/infrastructure/live-store/inmemory/round.go (1)
  • NewCurrentRoundStore (16-18)
internal/core/ports/live_store.go (1)
  • CurrentRoundStore (56-59)
internal/core/domain/round_event.go (3)
  • RoundStarted (17-20)
  • RoundFinalizationStarted (22-30)
  • IntentsRegistered (45-48)
internal/infrastructure/live-store/inmemory/intents.go (2)
internal/core/ports/tx_builder.go (1)
  • BoardingInput (44-47)
internal/core/ports/live_store.go (1)
  • TimedIntent (92-97)
internal/infrastructure/live-store/redis/intents.go (3)
internal/infrastructure/live-store/inmemory/intents.go (1)
  • NewIntentStore (22-33)
internal/core/ports/live_store.go (2)
  • IntentStore (22-36)
  • TimedIntent (92-97)
internal/infrastructure/live-store/redis/kv.go (1)
  • NewRedisKVStore (18-20)
internal/core/application/service.go (3)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/domain/round.go (2)
  • Round (41-59)
  • ForfeitTx (36-39)
internal/core/domain/round_event.go (2)
  • RoundFailed (39-43)
  • RoundEvent (9-12)
internal/infrastructure/live-store/live_store_test.go (6)
internal/infrastructure/db/postgres/sqlc/queries/models.go (1)
  • Intent (33-38)
internal/infrastructure/db/sqlite/sqlc/queries/models.go (1)
  • Intent (31-36)
pkg/errors/errors.go (1)
  • Error (39-46)
internal/core/application/types.go (2)
  • Outpoint (177-177)
  • VOut (131-131)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/infrastructure/wallet/wallet_client.go (1)
  • New (31-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration tests
  • GitHub Check: Build and Scan
  • GitHub Check: unit tests
🔇 Additional comments (47)
internal/infrastructure/live-store/inmemory/offchain_txs.go (1)

59-68: LGTM!

The Get method correctly returns (nil, nil) for not-found cases, aligning with the interface contract. Thread-safety is properly maintained with the read lock.

internal/infrastructure/live-store/inmemory/forfeits.go (1)

29-87: LGTM!

The Init method properly validates inputs, handles edge cases (no connectors, vtxos > connectors), and maintains thread-safety with the mutex. The context-aware signature aligns with the interface contract.

internal/core/application/ban.go (1)

150-161: Consider returning early if GetUnsignedInputs fails.

When GetUnsignedInputs returns an error, the code logs a warning but continues with a potentially nil/empty unsignedVtxoKeys. While this graceful degradation prevents crashes, it means the ban logic silently fails to identify users who should be banned.

If banning is a critical security measure, consider returning early on error:

 unsignedVtxoKeys, err := s.cache.ForfeitTxs().GetUnsignedInputs(ctx)
 if err != nil {
 	log.WithError(err).Warn("failed to get unsigned inputs from cache")
+	return
 }

Alternatively, if the current behavior is intentional (graceful degradation), add a comment explaining the rationale.

internal/core/application/admin.go (1)

314-375: LGTM!

The context propagation to ViewAll, DeleteAll, and Delete calls correctly aligns with the updated live-store interface signatures. Error handling remains intact.

internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)

54-69: LGTM!

The Delete method correctly closes channels before removing entries from maps, preventing goroutine leaks. The cleanup order is appropriate.

internal/infrastructure/live-store/redis/offchain_txs.go (1)

114-137: LGTM!

The Get and Includes methods properly handle redis.Nil for not-found cases and return appropriate error messages for other failures. The return semantics align with the interface contract.

internal/infrastructure/live-store/inmemory/round.go (2)

20-27: LGTM!

Context propagation correctly added to Upsert. The unused context parameter is appropriately ignored for the in-memory implementation.


28-32: LGTM!

Get now returns an error as the second value, consistent with the interface changes.

internal/infrastructure/live-store/redis/store.go (1)

21-22: LGTM!

Constructor calls correctly updated to pass numOfRetries to NewForfeitTxsStore, NewOffChainTxStore, and NewTreeSigningSessionsStore, enabling consistent retry behavior across all Redis-backed stores.

Also applies to: 25-25

internal/infrastructure/live-store/inmemory/confirmation_session.go (5)

28-41: LGTM!

Init correctly updated with context parameter and error return value.


43-67: LGTM!

Confirm properly accepts context. The existing logic for confirmation tracking and session completion signaling remains intact.


69-78: LGTM!

Get now returns the tuple (*ports.ConfirmationSessions, error) consistent with the interface changes.


80-89: LGTM!

Reset correctly updated with context and error return.


91-95: LGTM!

Initialized now accepts context parameter.

internal/infrastructure/live-store/redis/boarding_inputs.go (5)

17-29: LGTM!

Constructor follows the established pattern for Redis-backed stores with retry configuration.


31-48: LGTM!

Set correctly implements retry logic with Watch/TxPipelined pattern.


50-56: Consider handling redis.Nil for uninitialized state.

When the key doesn't exist, this returns -1 with the redis.Nil error. Callers need to distinguish between "not set" and actual Redis errors. Consider returning 0, nil for missing keys if that represents a valid uninitialized state, or document the expected error handling.


145-164: LGTM!

newSigsDTO correctly converts domain types to DTO with hex encoding for cryptographic fields.


170-212: LGTM!

Deserialization properly reconstructs PSBT structures with appropriate type conversions for SigHashType and TapscriptLeafVersion.

internal/infrastructure/live-store/redis/confirmation_sessions.go (6)

46-47: LGTM!

Added retryDelay field for configurable retry timing, consistent with other Redis stores.

Also applies to: 59-59


65-92: LGTM!

Init correctly implements retry logic with proper transaction handling. The atomic delete-then-set pattern ensures clean initialization.


131-155: LGTM!

Get properly retrieves all confirmation session state and constructs the response struct.


157-191: LGTM!

Reset correctly handles cleanup with retry logic and properly restarts the watch goroutine with a new context. Based on learnings, using context.Background() for the watch goroutine is the correct pattern to decouple from request lifetime.


193-196: LGTM!

Initialized now accepts context for consistency with the interface.


204-246: LGTM!

watchSessionCompletion implementation is well-designed with proper panic recovery and context cancellation handling.

internal/infrastructure/live-store/redis/tree_signing_session.go (1)

82-88: LGTM!

Using context.Background() for the watch goroutines is correct. These goroutines need to outlive the request context that created the session, so decoupling them from the request lifetime is the right design. Based on learnings from PR 686.

internal/infrastructure/live-store/inmemory/intents.go (1)

35-46: LGTM!

The context-aware signatures with error returns align well with the Redis-backed store interface. Ignoring the context parameter is appropriate for in-memory implementations.

internal/infrastructure/live-store/redis/forfeits.go (1)

195-224: LGTM!

The Pop method correctly reads all data before calling Reset, and properly propagates the error if reset fails. The caller can retry if needed.

internal/core/application/service.go (1)

2598-2614: LGTM!

Good error recovery pattern. When the round cannot be retrieved from cache, the code creates a RoundFailed event with proper metadata, saves it, and starts a new round. This ensures the system can recover from cache failures gracefully.

internal/infrastructure/live-store/redis/intents.go (7)

22-36: LGTM on struct and constructor changes.

The addition of retryDelay as a configurable field with a sensible default of 10ms is appropriate for retry backoff logic.


38-56: LGTM on Len method.

Proper context propagation and error handling. Returning -1 on error is a reasonable sentinel value.


58-123: LGTM on Push method with retry logic.

The use of s.rdb.SIsMember (not tx.SIsMember) inside the WATCH callback is intentional and correct—reads happen against the client while the WATCH mechanism detects conflicts, and writes are pipelined atomically.


187-193: LGTM on GetSelectedIntents.

Clean implementation with proper error propagation.


228-269: LGTM on Update method.

Proper validation and error handling. The input/output sum validation ensures data integrity.


294-321: LGTM on DeleteAll with retry logic.

The retry mechanism for the transactional set deletions is well-implemented. The individual KV deletions being outside the transaction is acceptable since they're keyed by ID and orphaned entries would be harmless.


323-345: LGTM on DeleteVtxos.

Well-structured retry logic with proper transactional cleanup.

internal/core/ports/live_store.go (4)

22-36: LGTM on IntentStore interface updates.

The context and error additions are consistent and well-designed. The IncludesAny method returning (bool, string) without an error is a design choice that prioritizes simplicity for callers, though it means Redis errors are logged but not propagated.


38-54: LGTM on ForfeitTxsStore and OffChainTxStore interfaces.

All methods properly accept context and return errors where appropriate.


56-90: LGTM on remaining store interfaces.

The Initialized method returning just bool without an error is acceptable for a simple state check. Context propagation is consistent across all methods.


92-116: LGTM on data structures.

TimedIntent, MusigSigningSession, and ConfirmationSessions structs are unchanged and appropriate for the updated interfaces.

internal/infrastructure/live-store/live_store_test.go (7)

76-98: LGTM on test setup.

Good pattern of testing both in-memory and Redis implementations with the same test suite, ensuring consistent behavior across implementations.


100-205: LGTM on IntentStore tests.

Comprehensive coverage of all IntentStore methods with proper context propagation and error assertions.


207-252: LGTM on ForfeitTxsStore tests.

Good coverage of the forfeit transaction lifecycle with proper context and error handling.


254-294: LGTM on OffChainTxStore tests.

Good coverage including edge cases like getting non-existent entries and removing non-existent entries.


296-308: LGTM on CurrentRoundStore tests.

Properly tests the basic round upsert and retrieval flow.


310-360: LGTM on ConfirmationSessionsStore tests.

Good async testing pattern with proper timeout handling and context propagation through the confirmation flow.


362-431: LGTM on TreeSigningSessionsStore tests.

Comprehensive testing of the signing session lifecycle. The use of context.Background() in the inner goroutine (line 388) is acceptable—it ensures the async operations complete independently of the test context's lifecycle.

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
internal/core/application/service.go (2)

3344-3362: Error fetching selected intents still prevents RoundFailed propagation

On domain.RoundFailed you still abort event propagation when reading selected intents fails:

case domain.RoundFailed:
    intents, err := s.cache.Intents().GetSelectedIntents(ctx)
    if err != nil {
        log.WithError(err).Warn("failed to get selected intents")
        return
    }
    ...
    lastEvent = RoundFailed{ev, topics}

If the cache is temporarily unavailable, clients never see the RoundFailed event, which is more harmful than omitting topic details.

You can keep logging the error but fall back to an empty topic list:

- intents, err := s.cache.Intents().GetSelectedIntents(ctx)
- if err != nil {
-     log.WithError(err).Warn("failed to get selected intents")
-     return
- }
-
- topics := make([]string, 0, len(intents))
+ intents, err := s.cache.Intents().GetSelectedIntents(ctx)
+ if err != nil {
+     log.WithError(err).Warn("failed to get selected intents, propagating RoundFailed without topics")
+ }
+
+ topics := make([]string, 0, len(intents))
  for _, intent := range intents {
     ...

This preserves the critical failure signal while degrading gracefully on cache issues.


3330-3334: Connector index cache error still aborts RoundFinalizationStarted event propagation

On RoundFinalizationStarted you still return early if GetConnectorsIndexes fails:

connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
if err != nil {
    log.WithError(err).Warn("failed to get connectors index")
    return
}

This prevents connector tree events (and thus parts of round finalization info) from reaching clients, even though the main round events are available.

Consider logging and proceeding with an empty index to keep event propagation best‑effort:

- connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
- if err != nil {
-     log.WithError(err).Warn("failed to get connectors index")
-     return
- }
+ connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
+ if err != nil {
+     log.WithError(err).Warn("failed to get connectors index, propagating without connector topics")
+     connectorsIndex = make(map[string]domain.Outpoint)
+ }

This matches the earlier review suggestion and avoids silently dropping RoundFinalizationStarted propagation when Redis is flaky.

🧹 Nitpick comments (2)
internal/core/application/service.go (2)

2271-2292: Guard logging on existingRound to avoid relying on non‑nil contract

Here you read existingRound.Id unconditionally for logging:

existingRound, err := s.cache.CurrentRound().Get(ctx)
...
if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
    log.WithError(err).Warnf(
        "failed to delete forfeit txs from cache for round %s", existingRound.Id,
    )
}
...
if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
    log.WithError(err).Warnf(
        "failed to delete spent vtxos from cache after round %s", existingRound.Id,
    )
}
...
if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
    log.WithError(err).Errorf(
        "failed to reset confirmation session from cache for round %s", existingRound.Id,
    )
}

This assumes CurrentRound().Get will never return (*domain.Round)(nil) when err == nil. If that contract is ever broken (e.g., a “not found” mapped to (nil, nil)), these logging calls will panic on the nil dereference even though you’re just trying to reset caches.

Safer pattern:

 existingRound, err := s.cache.CurrentRound().Get(ctx)
 ...
- if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
-     log.WithError(err).Warnf(
-         "failed to delete forfeit txs from cache for round %s", existingRound.Id,
-     )
- }
+ roundID := "<none>"
+ if existingRound != nil {
+     roundID = existingRound.Id
+ }
+
+ if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
+     log.WithError(err).Warnf(
+         "failed to delete forfeit txs from cache for round %s", roundID,
+     )
+ }
  ...
- if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
+ if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
      log.WithError(err).Warnf(
-         "failed to delete spent vtxos from cache after round %s", existingRound.Id,
+         "failed to delete spent vtxos from cache after round %s", roundID,
      )
  }
  if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
      log.WithError(err).Errorf(
-         "failed to reset confirmation session from cache for round %s", existingRound.Id,
+         "failed to reset confirmation session from cache for round %s", roundID,
      )
  }

This keeps behavior the same when a round exists, but protects against future changes in the cache implementation.


3454-3479: Consider using service context instead of context.Background() in forfeits/boarding watcher

checkForfeitsAndBoardingSigsSent currently hard‑codes context.Background():

func (s *service) checkForfeitsAndBoardingSigsSent(commitmentTxid string) {
    ctx := context.Background()
    ...
    sigs, err := s.cache.BoardingInputs().GetSignatures(ctx, commitmentTxid)
    ...
    numOfBoardingInputs, err := s.cache.BoardingInputs().Get(ctx)
    ...
    allForfeitTxsSigned, err := s.cache.ForfeitTxs().AllSigned(ctx)

Since this function is tied to a specific round and is invoked from SubmitForfeitTxs / SignCommitmentTx, it arguably should respect the service lifecycle. Using s.ctx (or receiving a ctx parameter) would allow Stop() to cancel any long‑running Redis calls rather than leaving them uncancellable.

Not a correctness bug, but you may want to align with the rest of the round pipeline which already gates on s.ctx.Done().

Based on learnings, this is about service‑lifetime cancellation, not request‑lifetime, so it shouldn’t conflict with the intentional context.Background() usage in reactToFraud.
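
Concretely, the suggestion amounts to something like (assuming the service holds a lifecycle context in s.ctx, as the round pipeline already does):

 func (s *service) checkForfeitsAndBoardingSigsSent(commitmentTxid string) {
-	ctx := context.Background()
+	// tie cache calls to the service lifecycle so Stop() can cancel them
+	ctx := s.ctx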

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce1e8bd and 26531b8.

📒 Files selected for processing (1)
  • internal/core/application/service.go (59 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/core/application/service.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Scan
  • GitHub Check: integration tests
  • GitHub Check: unit tests

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/core/application/service.go (1)

2552-2587: Inconsistent error handling pattern may cause unexpected behavior.

Several error paths in startConfirmation call round.Fail() but don't set roundAborted = true, while the defer uses roundAborted to decide whether to proceed to finalization or restart:

Lines 2553-2556, 2560, 2566, 2585 call round.Fail() without setting roundAborted = true. This means:

  • The defer will save the failed round events (correct)
  • Then check round.IsFailed() and call startRound() (lines 2399-2402)
  • But it will also potentially call startFinalization() if the failed check doesn't catch it

Compare this to other error paths (lines 2409-2412, 2426-2429, 2434-2437) which set roundAborted = true and return, causing the defer to call startRound() immediately without checking round.IsFailed().

The inconsistency creates two different error-handling flows. Consider standardizing: either always set roundAborted = true when calling round.Fail(), or remove the roundAborted variable and rely solely on round.IsFailed().

internal/infrastructure/live-store/redis/tree_signing_session.go (1)

52-91: Watch is called without specifying keys to watch.

The New method (and similarly Delete, AddNonces, AddSignatures) uses Watch without passing any keys. This pattern appears throughout the Redis stores in this PR. Without watched keys, the WATCH/MULTI/EXEC pattern doesn't provide the intended optimistic locking — it will always succeed on the first attempt (assuming no errors), making the retry loop effectively unused for concurrency control.

-		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
+		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
 			_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
 				pipe.HSet(ctx, metaKey, meta)
 				return nil
 			})
 			return err
-		}); err == nil {
+		}, metaKey); err == nil {
internal/infrastructure/live-store/redis/forfeits.go (1)

48-88: Potential out-of-bounds when building connIndex.

In Init, this loop can index past the end of vtxosToSign:

if len(vtxosToSign) > len(connectorsOutpoints) {
    return fmt.Errorf("more vtxos to sign than outpoints, %d > %d", ...)
}
for i, connectorOutpoint := range connectorsOutpoints {
    connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
}

If len(connectorsOutpoints) > len(vtxosToSign), the check passes but the loop still runs over all connectors and will panic once i >= len(vtxosToSign).

Limit the loop to the number of VTXOs (or loop over vtxosToSign and index connectorsOutpoints) to avoid panics:

-       for i, connectorOutpoint := range connectorsOutpoints {
-           connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
-       }
+       for i, vtxo := range vtxosToSign {
+           connectorOutpoint := connectorsOutpoints[i]
+           connIndex[connectorOutpoint.String()] = vtxo.Outpoint
+       }

This keeps the existing invariant check and ensures a safe one-to-one mapping.

internal/infrastructure/live-store/redis/intents.go (1)

125-185: Fix nil dereference and SAdd semantics in Pop (lines 138-145, 171).

Two critical issues confirmed in the Pop function:

  1. Nil dereference at line 143: s.intents.Get() returns (nil, nil) when the entry is not found (via redis.Nil). The current error check only guards against non-nil errors, leaving a gap where intent can be nil when accessed at line 143:

    intent, err := s.intents.Get(ctx, id)
    if err != nil {
        return nil, fmt.Errorf("failed to get intent %s: %v", id, err)
    }
    if len(intent.Receivers) > 0 {  // panic if intent == nil

    Add a nil check: if intent != nil && len(intent.Receivers) > 0 {

  2. SAdd passes slice as single member at line 171: Calling s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove) without ... expansion passes the entire []string as one interface{} argument instead of expanding each element. Convert to []interface{} and expand:

    if len(vtxosToRemove) > 0 {
        members := make([]interface{}, len(vtxosToRemove))
        for i, v := range vtxosToRemove {
            members[i] = v
        }
        s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, members...)
    }
♻️ Duplicate comments (5)
internal/core/application/service.go (2)

3334-3338: [Duplicate] Error in connector index fetch silently aborts event propagation.

If GetConnectorsIndexes fails, the function returns early without propagating any events. This leaves clients without notification of the round finalization.

As noted in previous review, consider logging the error and continuing with an empty connector topics map, or propagating a partial event:

 	connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
 	if err != nil {
 		log.WithError(err).Warn("failed to get connectors index")
-		return
+		connectorsIndex = make(map[string]domain.Outpoint)
 	}

3348-3352: [Duplicate] Error fetching selected intents prevents RoundFailed propagation.

If GetSelectedIntents fails, the RoundFailed event is not propagated to clients. This is critical because clients need to know the round failed to handle their pending intents appropriately.

As noted in previous review, consider propagating the failure event with empty topics rather than silently returning:

 case domain.RoundFailed:
 	intents, err := s.cache.Intents().GetSelectedIntents(ctx)
 	if err != nil {
 		log.WithError(err).Warn("failed to get selected intents")
-		return
+		intents = []ports.TimedIntent{}
 	}
internal/infrastructure/live-store/inmemory/boarding_inputs.go (1)

38-54: batchId parameter is still ignored across all signature methods.

The batchId parameter is ignored in AddSignatures, GetSignatures, and DeleteSignatures. This was previously flagged — signatures from different batches will be mixed together, which may cause incorrect behavior if multiple batches are processed.

Also applies to: 64-68
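
A minimal sketch of batch-scoped storage for the in-memory store (method signatures are illustrative):

type boardingInputsStore struct {
	lock sync.RWMutex
	sigs map[string][]ports.SignedBoardingInput // keyed by batchId
}

func (s *boardingInputsStore) AddSignatures(
	_ context.Context, batchId string, sigs []ports.SignedBoardingInput,
) error {
	s.lock.Lock()
	defer s.lock.Unlock()
	s.sigs[batchId] = append(s.sigs[batchId], sigs...)
	return nil
}

func (s *boardingInputsStore) DeleteSignatures(_ context.Context, batchId string) error {
	s.lock.Lock()
	defer s.lock.Unlock()
	delete(s.sigs, batchId)
	return nil
}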

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

94-130: Race condition: numConfirmed is still read outside the transaction.

The numConfirmed value is read at line 111, but used inside the Watch callback at line 119. If another process confirms an intent between the read and the transaction, the count will be incorrect. The Watch call at line 116 should also watch confirmationNumConfirmedKey, and the read should be inside the callback.
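
One way to close the race is to watch the counter key and move the read inside the callback (a sketch assuming go-redis v9 and a string-encoded counter; key names are taken from this review):

err := s.rdb.Watch(ctx, func(tx *redis.Tx) error {
	numConfirmed, err := tx.Get(ctx, confirmationNumConfirmedKey).Int()
	if err != nil && !errors.Is(err, redis.Nil) {
		return err
	}
	_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
		pipe.Set(ctx, confirmationNumConfirmedKey, numConfirmed+1, 0)
		return nil
	})
	return err
}, confirmationNumConfirmedKey) // EXEC aborts if a concurrent confirm raced us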

internal/infrastructure/live-store/redis/forfeits.go (1)

176-193: Reset retry delay issue from earlier review is resolved.

Reset now uses the same for range s.numOfRetries + Watch + TxPipelined pattern as other methods and includes time.Sleep(s.retryDelay) between attempts, preventing tight-loop retries.

🧹 Nitpick comments (10)
internal/infrastructure/live-store/redis/round.go (2)

52-64: Minor inefficiency: unnecessary sleep after final retry.

The retry loop sleeps after every attempt, including the final failed one. The last sleep is wasteful since the function returns immediately afterward.

Consider tracking the iteration to skip the sleep after the last attempt:

-	for range s.numOfRetries {
+	for i := range s.numOfRetries {
 		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
 			_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
 				pipe.Set(ctx, currentRoundKey, val, 0)
 				return nil
 			})
 
 			return err
 		}); err == nil {
 			return nil
 		}
-		time.Sleep(s.retryDelay)
+		if i < s.numOfRetries-1 {
+			time.Sleep(s.retryDelay)
+		}
 	}

163-165: Consider whether silently skipping unknown event types is intended.

The default case at line 164 uses continue to silently skip unknown event types. This provides forward compatibility (older code can ignore new event types), but it could also hide bugs if an expected event type is missing due to a typo or version mismatch.

If this silent skip is intentional for backward compatibility, consider adding a log statement or comment documenting the design decision. If not, consider returning an error for truly unknown types.
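
If the skip is kept, a minimal variant that documents the decision and leaves a trace (a sketch only, assuming the logrus-style log package used elsewhere; eventType is a hypothetical name for the decoded type tag):

default:
	// Forward compatibility: newer writers may emit event types this
	// binary doesn't know about. Skip them, but log so that typos or
	// version skew don't fail silently.
	log.Warnf("skipping unknown round event type %q", eventType)
	continue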

internal/core/application/service.go (1)

3213-3218: Verify VtxoTree is not nil before accessing.

After round.EndFinalization() completes, the code accesses round.VtxoTree.Leaves() and len(round.VtxoTree) without checking if VtxoTree is nil. While EndFinalization likely populates this field, if the round had no vtxo tree (e.g., only boarding inputs), this could panic.

Consider adding a nil check for defensive programming:

+	totalOutputVtxos := 0
+	numOfTreeNodes := 0
+	if round.VtxoTree != nil {
+		totalOutputVtxos = len(round.VtxoTree.Leaves())
+		numOfTreeNodes = len(round.VtxoTree)
+	}
-	totalOutputVtxos := len(round.VtxoTree.Leaves())
-	numOfTreeNodes := len(round.VtxoTree)
internal/infrastructure/live-store/redis/boarding_inputs.go (2)

50-56: Consider handling redis.Nil case explicitly in Get.

When the key doesn't exist, Get returns -1 with a redis.Nil error. Callers may need to distinguish between "not initialized" (key doesn't exist) and actual errors. The in-memory version returns 0 by default when not set.

 func (b *boardingInputsStore) Get(ctx context.Context) (int, error) {
 	num, err := b.rdb.Get(ctx, boardingInputsKey).Int()
 	if err != nil {
+		if errors.Is(err, redis.Nil) {
+			return 0, nil
+		}
 		return -1, err
 	}
 	return num, nil
 }

You'll need to add "errors" to the imports.


122-125: Consider adding retry logic for consistency.

DeleteSignatures doesn't use retry logic, unlike other write operations (Set, AddSignatures). While delete operations are generally idempotent, this creates inconsistency. If transient Redis errors are a concern for other operations, they should be for this one too.
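
One possible shape, reusing the store's existing retry fields (a sketch; the key construction mirrors GetSignatures):

func (b *boardingInputsStore) DeleteSignatures(ctx context.Context, batchId string) error {
	key := fmt.Sprintf("%s:%s", boardingInputSigsKey, batchId)

	var err error
	for i := 0; i < b.numOfRetries; i++ {
		if err = b.rdb.Del(ctx, key).Err(); err == nil {
			return nil
		}
		time.Sleep(b.retryDelay)
	}
	return fmt.Errorf(
		"failed to delete boarding input signatures after max number of retries: %v", err,
	)
}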

internal/infrastructure/live-store/inmemory/forfeits.go (1)

126-132: Error from Reset is silently ignored.

The // nolint comment indicates that the error return from Reset(ctx) is intentionally ignored. While Reset currently always returns nil in the in-memory implementation, this pattern could mask issues if the implementation changes or if this pattern is copied to other stores.

Consider logging or handling the error:

 	defer func() {
 		m.lock.Unlock()
-		// nolint
-		m.Reset(ctx)
+		if err := m.Reset(ctx); err != nil {
+			// Log but don't fail - best effort cleanup
+			log.Warnf("failed to reset forfeit store after pop: %v", err)
+		}
 	}()
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

132-156: Get reads multiple keys non-atomically.

The Get method reads confirmationIntentsKey, confirmationNumIntentsKey, and confirmationNumConfirmedKey in separate Redis calls. If another process modifies the state between reads, the returned ConfirmationSessions could be inconsistent (e.g., NumConfirmedIntents doesn't match the count of confirmed entries in IntentsHashes).

Consider using a Lua script or transaction for atomic reads if consistency is important.
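
For example, a MULTI/EXEC snapshot via TxPipelined would make the three reads execute atomically on the server (a sketch, assuming go-redis v9):

var (
	intents      *redis.MapStringStringCmd
	numIntents   *redis.StringCmd
	numConfirmed *redis.StringCmd
)
// TxPipelined wraps the queued commands in MULTI/EXEC, so no other
// client's write can interleave between the three reads.
if _, err := s.rdb.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
	intents = pipe.HGetAll(ctx, confirmationIntentsKey)
	numIntents = pipe.Get(ctx, confirmationNumIntentsKey)
	numConfirmed = pipe.Get(ctx, confirmationNumConfirmedKey)
	return nil
}); err != nil && !errors.Is(err, redis.Nil) {
	return nil, err
}
// intents.Val(), numIntents.Int(), numConfirmed.Int() now form a
// consistent snapshot (redis.Nil from the Gets means "not set yet").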

internal/infrastructure/live-store/redis/tree_signing_session.go (1)

93-153: Get performs multiple non-atomic Redis reads.

The method reads metaKey, noncesKey, and sigsKey in separate HGetAll calls. If another process modifies the session between reads, the returned MusigSigningSession could be inconsistent.

For read operations, consider using a Lua script or accepting eventual consistency if that's acceptable for this use case.

internal/infrastructure/live-store/redis/forfeits.go (1)

196-225: Pop semantics and error handling look good.

Pop:

  • Validates that each stored forfeit has JSON and that connectors aren’t reused.
  • Collects the signed txs and then calls Reset(ctx) to clear state, propagating any reset error.

The only (optional) improvement would be to wrap the initial HGetAll error with context (e.g., "failed to get forfeit txs: %v"), but the behavior is functionally correct.

internal/infrastructure/live-store/redis/intents.go (1)

58-123: Retry logic should differentiate transient TxFailedErr from deterministic validation errors.

The suggestion to separate retry-worthy transient failures from application logic errors is sound. In go-redis/v9, the sentinel TxFailedErr is returned when EXEC returns no results because WATCHed keys were modified, and it's recommended to treat it as a special, expected/retryable condition rather than an application logic error.

Currently, the code retries on any error without differentiation, which means deterministic validation errors (like "duplicated intent %s" or "duplicated input...") waste retry iterations. Instead, use errors.Is(err, redis.TxFailedErr) to detect the transaction failure and retry, while returning other errors normally.
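
A sketch of the suggested gating (redis.TxFailedErr is the go-redis v9 sentinel for an aborted WATCH/EXEC):

var err error
for i := 0; i < s.numOfRetries; i++ {
	err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
		// ...validation + TxPipelined writes as today...
		return nil
	}, keys...)
	if err == nil {
		return nil
	}
	if !errors.Is(err, redis.TxFailedErr) {
		// Deterministic failure (duplicated intent/input, marshalling, ...):
		// retrying cannot change the outcome, so surface it immediately.
		return err
	}
	time.Sleep(s.retryDelay)
}
return err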

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26531b8 and aa04094.

📒 Files selected for processing (10)
  • internal/core/application/service.go (59 hunks)
  • internal/infrastructure/live-store/inmemory/boarding_inputs.go (1 hunks)
  • internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
  • internal/infrastructure/live-store/inmemory/round.go (1 hunks)
  • internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
  • internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
  • internal/infrastructure/live-store/redis/intents.go (10 hunks)
  • internal/infrastructure/live-store/redis/round.go (6 hunks)
  • internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.

Applied to files:

  • internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.

Applied to files:

  • internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/redis/confirmation_sessions.go
  • internal/infrastructure/live-store/redis/forfeits.go
  • internal/infrastructure/live-store/inmemory/forfeits.go
  • internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/core/application/service.go
🧬 Code graph analysis (9)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (3)
internal/core/ports/tx_builder.go (1)
  • SignedBoardingInput (54-57)
internal/infrastructure/live-store/redis/boarding_inputs.go (1)
  • NewBoardingInputsStore (23-29)
internal/core/ports/live_store.go (1)
  • BoardingInputsStore (82-90)
internal/infrastructure/live-store/redis/round.go (3)
internal/infrastructure/live-store/inmemory/round.go (1)
  • NewCurrentRoundStore (16-18)
internal/core/ports/live_store.go (1)
  • CurrentRoundStore (56-59)
internal/core/domain/round_event.go (3)
  • RoundStarted (17-20)
  • RoundFinalizationStarted (22-30)
  • IntentsRegistered (45-48)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/infrastructure/live-store/redis/boarding_inputs.go (3)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (1)
  • NewBoardingInputsStore (18-23)
internal/core/ports/live_store.go (1)
  • BoardingInputsStore (82-90)
internal/core/ports/tx_builder.go (1)
  • SignedBoardingInput (54-57)
internal/infrastructure/live-store/redis/forfeits.go (5)
internal/core/ports/tx_builder.go (2)
  • TxBuilder (59-88)
  • ValidForfeitTx (49-52)
internal/infrastructure/live-store/inmemory/forfeits.go (1)
  • NewForfeitTxsStore (22-27)
internal/core/ports/live_store.go (1)
  • ForfeitTxsStore (38-47)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/core/domain/vtxo.go (2)
  • Vtxo (38-53)
  • Outpoint (15-18)
internal/infrastructure/live-store/inmemory/forfeits.go (5)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/infrastructure/db/postgres/sqlc/queries/models.go (2)
  • Intent (33-38)
  • Tx (178-185)
internal/infrastructure/db/sqlite/sqlc/queries/models.go (2)
  • Intent (31-36)
  • Tx (165-172)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/tx_builder.go (1)
  • ValidForfeitTx (49-52)
internal/infrastructure/live-store/redis/tree_signing_session.go (2)
internal/core/ports/live_store.go (2)
  • TreeSigningSessionsStore (70-80)
  • MusigSigningSession (104-110)
pkg/ark-lib/tree/musig2.go (2)
  • TreeNonces (32-32)
  • TreePartialSigs (98-98)
internal/core/application/service.go (4)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/domain/round.go (1)
  • Round (41-59)
internal/core/domain/round_event.go (2)
  • RoundFailed (39-43)
  • RoundEvent (9-12)
pkg/ark-lib/tree/tx_tree.go (1)
  • TxTree (19-22)
internal/infrastructure/live-store/redis/intents.go (3)
internal/core/ports/live_store.go (2)
  • IntentStore (22-36)
  • TimedIntent (92-97)
internal/infrastructure/live-store/redis/kv.go (1)
  • NewRedisKVStore (18-20)
internal/core/ports/tx_builder.go (1)
  • BoardingInput (44-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit tests
  • GitHub Check: integration tests
  • GitHub Check: Build and Scan
🔇 Additional comments (28)
internal/infrastructure/live-store/redis/round.go (2)

28-34: LGTM! Clean initialization with sensible defaults.

The constructor properly initializes all fields including the new retryDelay set to 10ms, which is appropriate for Redis retry operations.


84-84: Verify that error messages don't expose sensitive data.

Error messages throughout this function include raw JSON data from Redis (e.g., string(data), string(raw)). If the round or event data contains sensitive information, these detailed error messages could leak it into logs or error tracking systems.

Review the data model to confirm whether rounds and events may contain sensitive information. If so, consider sanitizing or truncating the raw data in error messages, or use a debug flag to control verbosity.

Also applies to: 91-95, 124-126, 132-135, 141-143, 149-151, 157-160
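
One low-effort option is a truncation helper (hypothetical name, shown only to illustrate the suggestion):

// truncateForLog caps raw payloads embedded in error messages so that
// full round/event JSON never ends up verbatim in logs or trackers.
func truncateForLog(raw []byte, max int) string {
	if len(raw) <= max {
		return string(raw)
	}
	return string(raw[:max]) + "...(truncated)"
}

// e.g. fmt.Errorf("failed to decode round event: %v, data: %s",
//     err, truncateForLog(data, 256))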

internal/infrastructure/live-store/inmemory/round.go (2)

20-27: LGTM: Proper locking and context-aware signature.

The method correctly uses write locks and follows the new context-aware interface. Passing a potentially nil s.round to the callback function is appropriate for initialization scenarios.


28-32: Verify nil handling at Get() call sites.

The files in internal/infrastructure/live-store/ cannot be located in the current repository state, preventing automated verification of whether callers properly check for nil round values returned by the Get() method. Please manually verify that all call sites of currentRoundStore.Get() check for nil returns before dereferencing the round value to avoid potential panics.
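
The expected call-site pattern would look like this (a sketch; the CurrentRound() accessor name is assumed):

round, err := s.cache.CurrentRound().Get(ctx)
if err != nil {
	return fmt.Errorf("failed to get current round: %v", err)
}
if round == nil {
	// no round in progress yet; bail out before any dereference
	return fmt.Errorf("no current round")
}
// safe to use round.Id, round.CommitmentTxid, ... from here on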

internal/core/application/service.go (1)

2270-2312: Cache cleanup operations appropriately handle errors.

The cache reset operations at round start (ForfeitTxs().Reset, Intents().DeleteVtxos, ConfirmationSessions().Reset, etc.) log errors as warnings rather than failing the round start. This is appropriate since:

  1. These are cleanup operations for the previous round
  2. Failing here would prevent new rounds from starting
  3. Worst case is stale data in cache, which the new round operations will overwrite

The pattern of logging errors but continuing is correct for this cleanup phase.

internal/infrastructure/live-store/inmemory/boarding_inputs.go (2)

56-62: Good fix: Using maps.Clone to prevent external mutation.

The use of maps.Clone(b.sigs) correctly addresses the previous concern about returning internal map references that could be mutated by callers after the lock is released.


25-36: LGTM: Thread-safe Set/Get implementation.

The Set and Get methods correctly use write and read locks respectively for thread-safe access to numOfInputs.

internal/infrastructure/live-store/redis/boarding_inputs.go (1)

145-164: LGTM: Clean DTO conversion.

The newSigsDTO function correctly converts the domain type to the DTO representation with proper hex encoding of binary fields.

internal/infrastructure/live-store/inmemory/forfeits.go (2)

152-163: Good fix: Added proper locking to AllSigned.

The race condition flagged in the previous review has been addressed by adding RLock/RUnlock around the iteration over forfeitTxs.


29-87: LGTM: Robust initialization with proper validation.

The Init method properly validates the connector tree against vtxos count and correctly builds the connectors index with appropriate locking.

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

205-247: LGTM: Robust polling implementation with proper safeguards.

The watchSessionCompletion goroutine correctly uses sync.Once to ensure the channel is signaled only once, includes panic recovery, and properly handles context cancellation. The polling approach is well-documented in the file header comments.

internal/infrastructure/live-store/redis/tree_signing_session.go (2)

127-127: Good fix: Typo corrected.

The previous typo "cosgienr" has been corrected to "cosigner" in the error messages at lines 127 and 142.

Also applies to: 142-142


257-306: LGTM: Well-structured polling with proper safeguards.

The watchNoncesCollected goroutine (and similarly watchSigsCollected) correctly:

  • Uses a ticker for controlled polling
  • Handles context cancellation
  • Includes panic recovery for channel operations
  • Properly signals only once when the condition is met
internal/infrastructure/live-store/redis/forfeits.go (7)

15-20: Struct and constructor changes look consistent with retry pattern.

Adding builder, numOfRetries, and a small retryDelay plus wiring them in NewForfeitTxsStore is straightforward and matches the retry usage in the methods below. No issues here.

Also applies to: 29-37


40-47: Init transaction and retry logic are otherwise sound.

Beyond the indexing issue, the marshalling, basic shape validation (non-empty leaves, vtxo vs connector count), and the Watch + TxPipelined + bounded retry with a small sleep look reasonable for this store.

Also applies to: 57-61, 89-112


114-174: Sign path correctly validates inputs and uses transactional updates.

The Sign method now:

  • Short-circuits on empty txs and on missing builder.
  • Rehydrates VTXOs, connectors, and connector index with clear error messages on corruption.
  • Delegates verification to TxBuilder.VerifyForfeitTxs and writes results via Watch + TxPipelined with bounded retries.

This looks correct and consistent with the Init/Reset patterns.


227-247: AllSigned correctly distinguishes “not all signed” from storage errors.

The method:

  • Returns (false, err) on Redis/unmarshal failures.
  • Returns (false, nil) when any entry is missing JSON or has an empty Tx.
  • Returns (true, nil) only when all entries successfully unmarshal and have non-empty Tx.

This matches the expected semantics.


249-275: GetUnsignedInputs is robust against malformed and partially populated entries.

The logic to:

  • Treat malformed keys or JSON as an error for the caller.
  • Treat missing/empty JSON or empty Tx as “unsigned” and include the outpoint.

gives reasonable resilience to partial writes or decode issues.


277-283: Len implementation is straightforward.

Using HLen on the txs hash and returning (-1, err) on failure is clear enough, assuming callers always check the error before trusting the count.


285-299: GetConnectorsIndexes API and error messages look fine.

The method unmarshals the stored connector index map and returns descriptive errors when retrieval or JSON decode fails. The previous typo (“fet”) in this error message is also fixed in this version.

internal/infrastructure/live-store/redis/intents.go (8)

22-27: Struct and constructor changes align with the retry/backoff pattern.

Adding retryDelay and wiring it in NewIntentStore matches the other Redis-backed stores and keeps retry behavior centralized on the store.

Also applies to: 29-35


38-56: Len correctly filters out empty/invalid intents.

Counting only non-nil intents with at least one receiver and returning (-1, err) on Redis/KV failures is clear and matches the intended API.


187-193: GetSelectedIntents now propagates list errors with a clear message.

Wrapping ListRange errors as "failed to get selected intents: %v" while otherwise returning the slice directly is appropriate.


195-226: ViewAll behavior and error handling are consistent.

The method:

  • Uses GetMulti with nil-checks when specific IDs are provided.
  • Falls back to scanning all IDs via Redis, then GetMulti over that set.
  • Wraps both Redis and KV errors with context.

This matches the LiveStore interface expectations.


228-269: Update enforces input/output balance and handles missing intents correctly.

Update:

  • Fails fast if the intent doesn’t exist.
  • Enforces sum(inputs + boardingInputs) == sum(outputs) before mutating.
  • Optionally updates cosigner pubkeys, then persists via Set with a contextual error message.

This is a solid invariant check and update flow.


271-291: Delete gracefully handles missing intents and cleans input vtxos.

Delete:

  • Fails with context if the underlying KV Get errors.
  • Skips IDs whose intents are already nil.
  • Removes each input outpoint from intentStoreVtxosKey and logs (but does not surface) errors from intents.Delete, which is acceptable for a best-effort cleanup.

Implementation is reasonable as-is.


294-321: DeleteAll correctly combines best-effort KV deletes with transactional key cleanup.

The pattern of:

  • Iterating all IDs and tolerating individual KV delete failures (with warnings).
  • Using Watch + TxPipelined + bounded retries to atomically clear all Redis keys for this store.

looks consistent with other stores and gives a clean reset behavior.


347-356: IncludesAny behavior (log on error, return false) is acceptable for a best-effort probe.

The method checks membership per outpoint, short-circuiting on the first positive hit and logging (but not surfacing) Redis errors. That’s reasonable for a “does this conflict with anything?” helper.

Comment on lines +2514 to +2521
session, err := s.cache.ConfirmationSessions().Get(ctx)
if err != nil {
	log.WithError(err).Error("failed to get confirmation session from cache")
	round.Fail(errors.INTERNAL_ERROR.New(
		"failed to get confirmation session from cache: %s", err,
	))
	return
}

⚠️ Potential issue | 🟡 Minor

Past review concern appears resolved.

The previous review flagged that if s.cache.ConfirmationSessions().Get(ctx) returns an error, session could be nil and dereferencing session.IntentsHashes would panic.

The updated code now handles this:

  • On error, it calls round.Fail() with an error message
  • Returns early, preventing the nil dereference
  • Sets the error in the round which will be persisted

However, note that roundAborted is not set to true, which means the defer will attempt to save events and potentially call startFinalization instead of startRound. This is inconsistent with other error paths in this function.

Apply this diff to ensure consistent error handling:

 case <-time.After(roundTiming.confirmationDuration()):
 	session, err := s.cache.ConfirmationSessions().Get(ctx)
 	if err != nil {
+		roundAborted = true
 		log.WithError(err).Error("failed to get confirmation session from cache")
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to get confirmation session from cache: %s", err,
 		))
 		return
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 	session, err := s.cache.ConfirmationSessions().Get(ctx)
 	if err != nil {
+		roundAborted = true
 		log.WithError(err).Error("failed to get confirmation session from cache")
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to get confirmation session from cache: %s", err,
 		))
 		return
 	}
🤖 Prompt for AI Agents
internal/core/application/service.go around lines 2514 to 2521: when
s.cache.ConfirmationSessions().Get(ctx) returns an error the code logs, calls
round.Fail and returns, but does not set roundAborted=true which is inconsistent
with other error paths and causes the defer to attempt normal finalization; set
roundAborted = true before returning on this error so the deferred cleanup knows
the round was aborted and will follow the same abort path as other errors.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/infrastructure/live-store/redis/forfeits.go (1)

67-92: Fix out-of-bounds panic in connector index construction.

The out-of-bounds issue is real and critical. However, the retry loop concern is unfounded: Go 1.22+ supports for range int syntax, and the codebase targets Go 1.25.3, so those loops compile correctly.

Connector index loop bug (lines 88–90 in Init, similar pattern in Sign and Reset):

You only guard len(vtxosToSign) <= len(connectorsOutpoints), but then iterate over all connectorsOutpoints:

for i, connectorOutpoint := range connectorsOutpoints {
    connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
}

If there are more connector outpoints than vtxos, vtxosToSign[i] will panic when i >= len(vtxosToSign).

Loop over vtxosToSign instead:

-for i, connectorOutpoint := range connectorsOutpoints {
-    connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
-}
+for i, vtxo := range vtxosToSign {
+    connectorOutpoint := connectorsOutpoints[i]
+    connIndex[connectorOutpoint.String()] = vtxo.Outpoint
+}

This respects the guard and prevents out-of-bounds access.

♻️ Duplicate comments (2)
internal/infrastructure/live-store/redis/intents.go (1)

136-185: Handle nil intents in Pop and fix remaining SAdd slice bug for vtxosToRemove.

Two issues in Pop/DeleteVtxos:

  1. Possible nil dereference in Pop:
intent, err := s.intents.Get(ctx, id)
if err != nil {
    return nil, fmt.Errorf("failed to get intent %s: %v", id, err)
}
if len(intent.Receivers) > 0 {  // intent may be nil here
    intentsByTime = append(intentsByTime, *intent)
}

If the KV store returns (nil, nil) for a missing intent, this will panic. Add a nil check:

 	intent, err := s.intents.Get(ctx, id)
 	if err != nil {
 		return nil, fmt.Errorf("failed to get intent %s: %v", id, err)
 	}
-
-	if len(intent.Receivers) > 0 {
+	if intent != nil && len(intent.Receivers) > 0 {
 		intentsByTime = append(intentsByTime, *intent)
 	}
  1. SAdd still called with []string instead of variadic members (previously flagged):
if len(vtxosToRemove) > 0 {
    s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)
}

Passing a []string without conversion/... sends the slice as a single member; Redis will not see individual vtxo strings, and DeleteVtxos will later remove the wrong thing. A prior review already called out this exact issue for Pop and DeleteVtxos; you’ve fixed DeleteVtxos (now converts to []interface{} and uses members...), but Pop still has the old pattern.

Mirror the DeleteVtxos fix here:

-	if len(vtxosToRemove) > 0 {
-		s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)
-	}
+	if len(vtxosToRemove) > 0 {
+		members := make([]interface{}, len(vtxosToRemove))
+		for i, v := range vtxosToRemove {
+			members[i] = v
+		}
+		if err := s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, members...).Err(); err != nil {
+			return nil, fmt.Errorf("failed to enqueue vtxos to remove: %v", err)
+		}
+	}

With this, the vtxosToRemove set and DeleteVtxos’s SRem logic will be consistent and actually clear VTXOs from intentStoreVtxosKey.

Also applies to: 326-352

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

113-134: Move numConfirmed read inside the WATCH transaction to avoid lost updates.

numConfirmed is read once before the retry loop:

numConfirmed, err := s.rdb.Get(ctx, confirmationNumConfirmedKey).Int()

and then reused inside the Watch callback without re-reading under the WATCHed transaction. If another process confirms a different intent between this read and a successful transaction, its increment can be lost (the later transaction will overwrite confirmationNumConfirmedKey with the stale numConfirmed+1).

This is the same race that was already pointed out in a previous review. You can fix it by reading numConfirmed inside the WATCH callback:

 func (s *confirmationSessionsStore) Confirm(ctx context.Context, intentId string) error {
 	// ... existing confirmed check ...

-	numConfirmed, err := s.rdb.Get(ctx, confirmationNumConfirmedKey).Int()
-	if err != nil && !errors.Is(err, redis.Nil) {
-		return fmt.Errorf("failed to get number of confirmed intents: %v", err)
-	}
-
 	keys := []string{confirmationIntentsKey, confirmationNumConfirmedKey}
-	for range s.numOfRetries {
-		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
-			_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
-				pipe.HSet(ctx, confirmationIntentsKey, hashKey, 1)
-				pipe.Set(ctx, confirmationNumConfirmedKey, numConfirmed+1, 0)
-				return nil
-			})
-			return err
-		}, keys...); err == nil {
-			return nil
-		}
-		time.Sleep(s.retryDelay)
-	}
+	for i := 0; i < s.numOfRetries; i++ {
+		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
+			numConfirmed, err := tx.Get(ctx, confirmationNumConfirmedKey).Int()
+			if err != nil && !errors.Is(err, redis.Nil) {
+				return fmt.Errorf("failed to get number of confirmed intents: %v", err)
+			}
+			_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
+				pipe.HSet(ctx, confirmationIntentsKey, hashKey, 1)
+				pipe.Set(ctx, confirmationNumConfirmedKey, numConfirmed+1, 0)
+				return nil
+			})
+			return err
+		}, keys...); err == nil {
+			return nil
+		}
+		if i < s.numOfRetries-1 {
+			time.Sleep(s.retryDelay)
+		}
+	}

This way each confirm observes the latest count under WATCH and increments it without losing concurrent confirmations.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa04094 and 5c23a6a.

📒 Files selected for processing (6)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
  • internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
  • internal/infrastructure/live-store/redis/intents.go (10 hunks)
  • internal/infrastructure/live-store/redis/offchain_txs.go (2 hunks)
  • internal/infrastructure/live-store/redis/round.go (5 hunks)
  • internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/redis/confirmation_sessions.go
  • internal/infrastructure/live-store/redis/forfeits.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.

Applied to files:

  • internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.

Applied to files:

  • internal/infrastructure/live-store/redis/round.go
🧬 Code graph analysis (6)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/infrastructure/live-store/redis/round.go (3)
internal/infrastructure/live-store/inmemory/round.go (1)
  • NewCurrentRoundStore (16-18)
internal/core/ports/live_store.go (1)
  • CurrentRoundStore (56-59)
internal/core/domain/round_event.go (3)
  • RoundStarted (17-20)
  • RoundFinalizationStarted (22-30)
  • IntentsRegistered (45-48)
internal/infrastructure/live-store/redis/tree_signing_session.go (3)
internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)
  • NewTreeSigningSessionsStore (19-26)
internal/core/ports/live_store.go (2)
  • TreeSigningSessionsStore (70-80)
  • MusigSigningSession (104-110)
pkg/ark-lib/tree/musig2.go (2)
  • TreeNonces (32-32)
  • TreePartialSigs (98-98)
internal/infrastructure/live-store/redis/offchain_txs.go (2)
internal/infrastructure/live-store/inmemory/offchain_txs.go (1)
  • NewOffChainTxStore (19-24)
internal/core/ports/live_store.go (1)
  • OffChainTxStore (49-54)
internal/infrastructure/live-store/redis/intents.go (3)
internal/infrastructure/live-store/inmemory/intents.go (1)
  • NewIntentStore (22-33)
internal/core/ports/live_store.go (2)
  • IntentStore (22-36)
  • TimedIntent (92-97)
internal/infrastructure/live-store/redis/kv.go (1)
  • NewRedisKVStore (18-20)
internal/infrastructure/live-store/redis/forfeits.go (5)
internal/core/ports/tx_builder.go (2)
  • TxBuilder (59-88)
  • ValidForfeitTx (49-52)
internal/infrastructure/live-store/inmemory/forfeits.go (1)
  • NewForfeitTxsStore (22-27)
internal/core/ports/live_store.go (1)
  • ForfeitTxsStore (38-47)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/core/domain/vtxo.go (2)
  • Vtxo (38-53)
  • Outpoint (15-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Scan
  • GitHub Check: integration tests
  • GitHub Check: unit tests
🔇 Additional comments (4)
internal/infrastructure/live-store/redis/round.go (1)

36-66: The review comment's critical claim is incorrect.

The code uses for range s.numOfRetries where numOfRetries is an int. The review states this "won't compile," but this is false. Go 1.22 introduced for range N syntax for iterating over integers—the codebase uses Go 1.25.3, which fully supports this. The code will compile without errors.

The syntax is valid and idiomatic modern Go. No changes are required to fix a compilation error.

The secondary point about CAS semantics (moving the read under WATCH) remains a valid design suggestion if strict compare-and-set semantics are needed, but that is a separate design consideration, not a compiler issue.

Likely an incorrect or invalid review comment.

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

71-96: The review comment is incorrect; the code will compile with Go 1.22+.

Go 1.22 introduced the ability to range over integers directly. The syntax for range s.numOfRetries (where numOfRetries is an int) is valid and iterates from 0 to numOfRetries-1. Since the codebase targets Go 1.25+, this feature is fully supported and the code will compile without errors.

Likely an incorrect or invalid review comment.

internal/infrastructure/live-store/redis/tree_signing_session.go (1)

65-80: The review comment is incorrect; the code compiles with Go 1.22+.

The review assumes Go < 1.22, but this project targets Go 1.25.3 (specified in go.mod and CI workflows). Go 1.22 and later support for range over integers, making for range s.numOfRetries valid syntax that will compile without errors. The code at lines 66, 164, 206, and 230 is syntactically correct as written.

Likely an incorrect or invalid review comment.

internal/infrastructure/live-store/redis/intents.go (1)

62-123: The review comment is incorrect.

Go 1.22 introduced support for ranging over integers, enabling syntax like for range 5 or for range numOfRetries. The project targets Go 1.25.3, which is well after this feature was introduced, making the code valid.

The code at lines 63, 308, and 332 correctly uses for range s.numOfRetries and compiles without error. The suggested fix to convert these to three-clause loops (for i := 0; i < s.numOfRetries; i++) is unnecessary.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/infrastructure/live-store/inmemory/forfeits.go (1)

73-82: Prevent out-of-range panic when more connector outpoints than vtxosToSign

The loop builds connectorsIndex over all connectorsOutpoints but indexes into vtxosToSign[i]. The only guard is len(vtxosToSign) > len(connectorsOutpoints), so when len(connectorsOutpoints) > len(vtxosToSign) this will panic with index out of range.

You already guarantee len(vtxosToSign) <= len(connectorsOutpoints), so just iterate up to len(vtxosToSign):

-	for i, connectorOutpoint := range connectorsOutpoints {
-		connectorsIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
-	}
+	for i, connectorOutpoint := range connectorsOutpoints[:len(vtxosToSign)] {
+		connectorsIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
+	}
♻️ Duplicate comments (1)
internal/infrastructure/live-store/inmemory/forfeits.go (1)

117-133: Fix deadlock between Pop and Reset and avoid calling a locking method while holding the lock

Pop acquires m.lock and defers a call to m.Reset(ctx), but Reset also acquires m.lock. Since sync.Mutex/RWMutex are not reentrant, this will deadlock. This is the same issue previously flagged in an earlier review.

Refactor the reset logic into a private helper that assumes the lock is held, use it from both Reset and Pop, and avoid calling Reset from inside the locked region:

-func (m *forfeitTxsStore) Reset(_ context.Context) error {
-	m.lock.Lock()
-	defer m.lock.Unlock()
-
-	m.forfeitTxs = make(map[domain.Outpoint]ports.ValidForfeitTx)
-	m.connectors = nil
-	m.connectorsIndex = nil
-	m.vtxos = nil
-	return nil
-}
-func (m *forfeitTxsStore) Pop(ctx context.Context) ([]string, error) {
-	m.lock.Lock()
-	defer func() {
-		m.lock.Unlock()
-		// nolint
-		m.Reset(ctx)
-	}()
+func (m *forfeitTxsStore) resetLocked() {
+	m.forfeitTxs = make(map[domain.Outpoint]ports.ValidForfeitTx)
+	m.connectors = nil
+	m.connectorsIndex = nil
+	m.vtxos = nil
+}
+
+func (m *forfeitTxsStore) Reset(_ context.Context) error {
+	m.lock.Lock()
+	defer m.lock.Unlock()
+
+	m.resetLocked()
+	return nil
+}
+
+func (m *forfeitTxsStore) Pop(_ context.Context) ([]string, error) {
+	m.lock.Lock()
+	defer m.lock.Unlock()
@@
-	return txs, nil
+	m.resetLocked()
+	return txs, nil
 }

This keeps the reset atomic with the Pop operation, removes the need for // nolint, and avoids the deadlock.

🧹 Nitpick comments (5)
internal/infrastructure/live-store/redis/boarding_inputs.go (4)

50-56: Consider handling redis.Nil explicitly for unset keys.

When the key doesn't exist, redis.Nil is returned as an error. Callers may need to distinguish between "key not set" and "actual error". Consider returning a sentinel value or wrapping the error for clarity.

 func (b *boardingInputsStore) Get(ctx context.Context) (int, error) {
 	num, err := b.rdb.Get(ctx, boardingInputsKey).Int()
 	if err != nil {
+		if err == redis.Nil {
+			return 0, nil // Key not set, return default
+		}
 		return -1, err
 	}
 	return num, nil
 }

79-95: Wrap final error with context for consistency.

Unlike the Set method which wraps the error with a descriptive message, AddSignatures returns the raw error after exhausting retries. This makes debugging harder.

 	}
-	return err
+	return fmt.Errorf(
+		"failed to add boarding input signatures after max number of retries: %v", err,
+	)
 }

100-118: Variable shadowing: key is reused in the loop.

The outer key (Redis key) is shadowed by the loop variable key (hash field name). While the code works correctly, this is confusing and a code smell.

 func (b *boardingInputsStore) GetSignatures(
 	ctx context.Context, batchId string,
 ) (map[uint32]ports.SignedBoardingInput, error) {
 	key := fmt.Sprintf("%s:%s", boardingInputSigsKey, batchId)
 	values, err := b.rdb.HGetAll(ctx, key).Result()
 	if err != nil {
 		return nil, err
 	}
 	m := make(map[uint32]ports.SignedBoardingInput)
-	for key, value := range values {
+	for field, value := range values {
 		rawSig := &sigsDTO{}
 		sig, err := rawSig.deserialize([]byte(value))
 		if err != nil {
-			return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", key, err)
+			return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", field, err)
 		}
-		inIndex, err := strconv.Atoi(key)
+		inIndex, err := strconv.Atoi(field)
 		if err != nil {
 			return nil, err
 		}

 		m[uint32(inIndex)] = *sig
 	}
 	return m, nil
 }

170-173: Value receiver on deserialize is unconventional.

The method has a value receiver but immediately unmarshals into &s. While this works, it's confusing. Consider using a pointer receiver or a standalone function for clarity.

-func (s sigsDTO) deserialize(buf []byte) (*ports.SignedBoardingInput, error) {
-	if err := json.Unmarshal(buf, &s); err != nil {
+func (s *sigsDTO) deserialize(buf []byte) (*ports.SignedBoardingInput, error) {
+	if err := json.Unmarshal(buf, s); err != nil {
 		return nil, err
 	}
internal/infrastructure/live-store/inmemory/intents.go (1)

102-136: LGTM: Context and error return added.

The method signature correctly implements the updated interface. The sorting and selection logic remains sound.

For very large intent pools, consider checking ctx.Err() before/after the sort operation to allow early cancellation. However, since the operation holds a write lock, completing atomically may be the correct design choice.
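
If cancellation were added, the check itself is only a few lines (a sketch, placed just before the sort while the write lock is held):

// Pop holds the write lock; if the caller has already gone away, skip
// the (potentially large) sort/selection work and surface the reason.
if err := ctx.Err(); err != nil {
	return nil, err
}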

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c23a6a and 72607e7.

📒 Files selected for processing (3)
  • internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
  • internal/infrastructure/live-store/inmemory/intents.go (9 hunks)
  • internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/inmemory/forfeits.go
🧬 Code graph analysis (3)
internal/infrastructure/live-store/inmemory/forfeits.go (3)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/tx_builder.go (1)
  • ValidForfeitTx (49-52)
internal/infrastructure/live-store/redis/boarding_inputs.go (3)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (1)
  • NewBoardingInputsStore (18-23)
internal/core/ports/live_store.go (1)
  • BoardingInputsStore (82-90)
internal/core/ports/tx_builder.go (1)
  • SignedBoardingInput (54-57)
internal/infrastructure/live-store/inmemory/intents.go (2)
internal/core/ports/tx_builder.go (1)
  • BoardingInput (44-47)
internal/core/ports/live_store.go (1)
  • TimedIntent (92-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit tests
  • GitHub Check: Build and Scan
  • GitHub Check: integration tests
🔇 Additional comments (18)
internal/infrastructure/live-store/inmemory/forfeits.go (4)

153-163: Read locking in AllSigned correctly avoids races

Adding RLock/RUnlock around iteration of m.forfeitTxs brings AllSigned in line with the other readers and eliminates the prior data-race risk. The new (bool, error) signature also matches the updated interface.


166-177: GetUnsignedInputs ctx+error surface and locking look consistent

Signature change to GetUnsignedInputs(_ context.Context) ([]domain.Outpoint, error) and the use of RLock around the map iteration are consistent with the rest of the store API. Returning nil error keeps the in-memory implementation simple.


178-181: Len follows the same context/error and locking pattern

Len(_ context.Context) (int, error) guarded by RLock matches the updated interface and is safe for concurrent access.


184-189: Cloning connectorsIndex avoids external mutation of internal state

Switching GetConnectorsIndexes to return maps.Clone(m.connectorsIndex) under RLock prevents callers from mutating the internal map after the lock is released, addressing the earlier race/corruption concern.

internal/infrastructure/live-store/redis/boarding_inputs.go (4)

17-29: LGTM!

The struct and constructor are well-defined with appropriate fields for retry-enabled Redis operations.


31-48: LGTM!

The Set method correctly implements optimistic locking with Watch on the appropriate key and proper retry logic.


83-85: Verify HSetNX semantics are intentional.

HSetNX only sets the field if it doesn't already exist. If a signature for an input index already exists, it will silently be preserved rather than updated. If overwriting is expected behavior, use HSet instead.


122-125: LGTM!

The DeleteSignatures method is straightforward. Redis DEL is atomic and doesn't require the retry/watch pattern used for updates.

internal/infrastructure/live-store/inmemory/intents.go (10)

4-6: LGTM: Appropriate imports added.

The context and slices imports are correctly added to support the updated method signatures and the use of slices.Clone for safe cloning.


36-47: LGTM: Context and error return added.

The method signature correctly implements the updated interface. Returning nil error is appropriate for this in-memory implementation.


49-100: LGTM: Context parameter added with preserved validation logic.

The method correctly propagates context through the interface and maintains all existing duplicate detection logic for both regular inputs and boarding inputs.


138-142: LGTM: Defensive clone added for safety.

The use of slices.Clone on line 141 is an excellent improvement that prevents external code from mutating the internal selected intents slice.


144-183: LGTM: Context added with preserved validation logic.

The balance validation correctly accounts for all input sources (intent inputs + boarding inputs) and ensures they match receiver outputs. The logic is sound.


185-200: LGTM: Context and error return added.

The deletion logic correctly removes intents and cleans up associated vtxos. The idempotent behavior (skipping non-existent IDs) is appropriate.


202-210: LGTM: Context and error return added.

The complete state reset logic is correct and thread-safe under the write lock.


212-221: LGTM: Context and error return added.

The vtxo cleanup logic correctly removes tracked vtxos and resets the removal queue.


223-241: LGTM: Context and error return added.

The view logic correctly handles both filtered (by ID) and unfiltered queries, returning dereferenced copies of the timed intents.


243-254: LGTM: Context parameter added.

The outpoint existence check correctly returns both a boolean flag and the matching outpoint string when found.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/core/application/service.go (1)

2798-2813: Ignoring error from Get could cause nil pointer dereference.

At lines 2798-2807, the error from s.cache.TreeSigingSessions().Get(ctx, roundId) is ignored. If the Get operation fails, signingSession could be nil, and accessing signingSession.Nonces at line 2801 or iterating over signingSession.Nonces at line 2808 would panic.

Apply this diff to handle the error:

 case <-time.After(thirdOfRemainingDuration):
-	signingSession, _ := s.cache.TreeSigingSessions().Get(ctx, roundId)
-	round.Fail(errors.SIGNING_SESSION_TIMED_OUT.New(
+	signingSession, err := s.cache.TreeSigingSessions().Get(ctx, roundId)
+	if err != nil || signingSession == nil {
+		round.Fail(errors.INTERNAL_ERROR.New("failed to get signing session from cache"))
+		return
+	}
+	round.Fail(errors.SIGNING_SESSION_TIMED_OUT.New(
 		"musig2 signing session timed out (nonce collection), collected %d/%d nonces",
 		len(signingSession.Nonces), len(uniqueSignerPubkeys),
 	))
♻️ Duplicate comments (3)
internal/core/application/service.go (3)

2876-2911: Error ignored from Get could cause nil pointer dereference.

At line 2877, the error from s.cache.TreeSigingSessions().Get(ctx, roundId) is ignored. If signingSession is nil, iterating over signingSession.Signatures at line 2880 would panic. This was flagged in a previous review.

Apply this diff:

 case <-s.cache.TreeSigingSessions().SignaturesCollected(roundId):
-	signingSession, _ := s.cache.TreeSigingSessions().Get(ctx, roundId)
+	signingSession, err := s.cache.TreeSigingSessions().Get(ctx, roundId)
+	if err != nil || signingSession == nil {
+		round.Fail(errors.INTERNAL_ERROR.New("failed to get signing session from cache"))
+		return
+	}
 	cosignersToBan := make(map[string]domain.Crime)

3337-3355: Error handling still prevents event propagation.

As flagged in previous reviews:

  1. Lines 3337-3341: If GetConnectorsIndexes fails, returning early prevents event propagation
  2. Lines 3351-3355: If GetSelectedIntents fails, RoundFailed event is not propagated

These were previously suggested to continue with empty values rather than returning early.


2514-2521: roundAborted should be set to true on confirmation session retrieval failure.

When s.cache.ConfirmationSessions().Get(ctx) fails, the code calls round.Fail() and returns, but roundAborted remains false. This causes the defer to attempt saving events and check round.IsFailed() instead of immediately restarting the round via the roundAborted path.

This was flagged in a previous review but the fix appears incomplete.

Apply this diff:

 case <-time.After(roundTiming.confirmationDuration()):
 	session, err := s.cache.ConfirmationSessions().Get(ctx)
 	if err != nil {
+		roundAborted = true
 		log.WithError(err).Error("failed to get confirmation session from cache")
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to get confirmation session from cache: %s", err,
 		))
 		return
 	}
🧹 Nitpick comments (2)
internal/infrastructure/live-store/redis/boarding_inputs.go (2)

23-29: Consider making retry delay configurable.

The retry delay is hardcoded to 10ms. While this appears consistent across the codebase, making it a constructor parameter would improve flexibility for different deployment scenarios or testing.

-func NewBoardingInputsStore(rdb *redis.Client, numOfRetries int) ports.BoardingInputsStore {
+func NewBoardingInputsStore(rdb *redis.Client, numOfRetries int, retryDelay time.Duration) ports.BoardingInputsStore {
 	return &boardingInputsStore{
 		rdb:          rdb,
 		numOfRetries: numOfRetries,
-		retryDelay:   10 * time.Millisecond,
+		retryDelay:   retryDelay,
 	}
 }

50-56: Consider returning zero value instead of -1 on error.

Returning -1 as a sentinel value is unconventional in Go. The idiomatic approach is to return the zero value (0) and rely on the error to indicate failure.

 func (b *boardingInputsStore) Get(ctx context.Context) (int, error) {
 	num, err := b.rdb.Get(ctx, boardingInputsKey).Int()
 	if err != nil {
-		return -1, err
+		return 0, err
 	}
 	return num, nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72607e7 and 38ca0fa.

📒 Files selected for processing (3)
  • internal/core/application/ban.go (1 hunks)
  • internal/core/application/service.go (58 hunks)
  • internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/core/application/service.go
🧬 Code graph analysis (2)
internal/infrastructure/live-store/redis/boarding_inputs.go (3)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (1)
  • NewBoardingInputsStore (18-23)
internal/core/ports/live_store.go (1)
  • BoardingInputsStore (82-90)
internal/core/ports/tx_builder.go (1)
  • SignedBoardingInput (54-57)
internal/core/application/service.go (5)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/core/domain/round.go (2)
  • Round (41-59)
  • ForfeitTx (36-39)
internal/core/application/service_event.go (1)
  • RoundFailed (61-64)
internal/core/domain/round_event.go (2)
  • RoundFailed (39-43)
  • RoundEvent (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit tests
  • GitHub Check: integration tests
🔇 Additional comments (9)
internal/core/application/ban.go (1)

150-162: LGTM! Error handling properly added for GetUnsignedInputs.

The context-aware call with proper error handling follows the established pattern. Logging the warning and returning early on failure is appropriate for this background operation.

internal/core/application/service.go (8)

516-532: LGTM! Context and error handling properly propagated for cache operations.

The error handling pattern is consistent—logging the error for observability and returning a generic internal error to users. The control flow correctly separates the error case from the "already spent" case.


1107-1119: LGTM! Proper fallback pattern from cache to database.

The error handling gracefully falls back to the database when the cache lookup fails or returns nil. This ensures resilience while maintaining the expected behavior.


1547-1557: Consistent error handling for cache operations.

The pattern matches other usages in the codebase—log the error and return a generic internal error to the user.


1910-1930: LGTM! Context propagation with proper error handling.

The early return pattern ensures round.CommitmentTxid is only accessed when the round was successfully retrieved from the cache.


1932-1969: LGTM! Consistent cache access pattern.

The error handling mirrors the pattern established in SubmitForfeitTxs.


2960-2977: Good defensive error handling for cache failure.

The pattern of creating a RoundFailed event directly when the cache fails to retrieve the round is a robust fallback that ensures the round lifecycle completes even under adverse conditions.


3460-3492: LGTM! Comprehensive error handling for forfeit and boarding signature checks.

The early returns on errors are appropriate here—signaling completion when the state is unknown would be incorrect.


2193-2226: LGTM! Context propagation for intent cache operations.

The existing error handling is preserved while adding context support.

 domain.RoundTopic, func(events []domain.Event) {
 	round := domain.NewRoundFromEvents(events)
-	go svc.propagateEvents(round)
+	go svc.propagateEvents(context.Background(), round)
Collaborator

since in this PR, besides other things, we are handling context: shall we use one context at the service level for cancellation, so that everything is shut down gracefully when the service's Stop(), which also uses context.Background, is invoked?
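
A minimal sketch of that idea (names hypothetical; shown only to illustrate the wiring):

type service struct {
	ctx    context.Context
	cancel context.CancelFunc
	// ...existing fields...
}

func NewService( /* ... */ ) *service {
	ctx, cancel := context.WithCancel(context.Background())
	return &service{ctx: ctx, cancel: cancel}
}

func (s *service) Stop() {
	// every goroutine started with s.ctx (e.g. propagateEvents, watchers)
	// observes the cancellation and unwinds gracefully
	s.cancel()
	// ...existing cleanup...
}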

Collaborator Author

we'll do in another pr

 // check if already spent by another offchain tx
-if s.cache.OffchainTxs().Includes(vtxo.Outpoint) {
+isSpent, err := s.cache.OffchainTxs().Includes(ctx, vtxo.Outpoint)
+if err != nil {
Collaborator

general comment for log handling introduced in this PR

 if err != nil {
     log.WithError(err).Errorf("failed to check spent status of inputs against tx in cache")
     return nil, errors.INTERNAL_ERROR.New("something went wrong").
         WithMetadata(map[string]any{"vtxos": vtxo.Outpoint.String()})
 }

 Issues:
 1. Double handling: logging is a form of handling, so log+return handles the error twice.
 2. Lost context: the original error is discarded; "something went wrong" is not debuggable.
 3. An error interceptor should inspect the error and, if it's internal, hide it from users.

I would recommend returning the error and handling logging + sanitization at the gRPC interceptor level:
 - Log full error with context (once, at API boundary)
 - Return sanitized message to client based on error type

 This gives you: single log entry, full error chain for debugging, safe client messages.
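
For illustration, a minimal sketch of that interceptor-level approach, assuming grpc-go and logrus; isInternal stands in for whatever classifier the project's errors package would expose:

package main

import (
	"context"

	log "github.com/sirupsen/logrus"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isInternal is a hypothetical classifier for the project's typed errors.
func isInternal(err error) bool { return true }

// errorSanitizingInterceptor logs the full error chain once at the API
// boundary, then returns a sanitized message for internal errors.
func errorSanitizingInterceptor(
	ctx context.Context, req any,
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (any, error) {
	resp, err := handler(ctx, req)
	if err != nil {
		log.WithError(err).Errorf("rpc %s failed", info.FullMethod)
		if isInternal(err) {
			return nil, status.Error(codes.Internal, "something went wrong")
		}
	}
	return resp, err
}

Registered once via grpc.NewServer(grpc.UnaryInterceptor(errorSanitizingInterceptor)), every handler gets a single log entry and clients never see internal details.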

Collaborator Author

totally agree, we'll do in another pr or this will get too messy

 	return session, nil
 }
-func (s *treeSigningSessionsStore) Delete(roundId string) {
+func (s *treeSigningSessionsStore) Delete(_ context.Context, roundId string) error {
Collaborator

Goroutine channel pattern needs refactoring

The changes to panic/timeout handling added in the previous PR were not perfect, imo.

The current pattern uses recover() to catch panics from writing to closed channels:

func() {
    defer func() {
        if r := recover(); r != nil {
            log.Warnf("watchNoncesCollected:recovered from panic: %v", r)
        }
    }()
    select {
    case ch <- struct{}{}:
    case <-ctx.Done():
    }
}()

Problems:
1. recover() masks a real race condition instead of fixing it
2. Race exists because Delete() closes channels while watchers might be sending
3. Complex nested selects add unnecessary cognitive load

Recommended fix - Watcher Ownership Pattern:

1. Watchers own the channels - only they send and close via defer close(ch)
2. Delete() never closes channels - just cancels context
3. Channels passed as arguments to goroutines (not read from struct fields)

func (s *treeSigningSessionsStore) watchNoncesCollected(
    ctx context.Context, roundId string, ch chan struct{},
) {
    defer close(ch)  // watcher owns the channel

    ticker := time.NewTicker(s.pollInterval)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return  // defer close(ch) runs
        case <-ticker.C:
            if s.checkNoncesReady(ctx, roundId) {
                select {
                case ch <- struct{}{}:
                case <-ctx.Done():
                }
                return  // defer close(ch) runs
            }
        }
    }
}

func (s *treeSigningSessionsStore) Delete(ctx context.Context, roundId string) error {
    // ... redis cleanup ...

    s.lock.Lock()
    defer s.lock.Unlock()

    if s.currentRoundID == roundId && s.cancelRound != nil {
        s.cancelRound()  // watchers exit via ctx.Done(), close their channels
        s.cancelRound = nil
        s.currentRoundID = ""
        s.nonceCh = nil  // just nil, don't close
        s.sigsCh = nil
    }
    return nil
}

This eliminates the race entirely - no recover() needed.
This is applicable to watchSigsCollected, watchNoncesCollected, watchSessionCompletion

Collaborator Author

done for the confirmation session; it's trickier for the tree signing session because we use a map of channels. It's ok anyway because the lock guarantees atomicity. We'll optimize in a later pr so we can get rid of the recover

 	Signatures: make(map[string]tree.TreePartialSigs),
 }
-s.sessions[roundId] = sess
+s.sessions[roundId] = session
Collaborator

Single-round model

Support for multiple rounds was also added to tree_signing_session.go in the previous PR:

nonceChs map[string]chan struct{}
sigsChs  map[string]chan struct{}

We currently agreed there is only one round at a time and no parallel round processing, and one server rather than a cluster, so we should simplify and stay consistent with the other redis live store impls that support only one round.

Current struct uses maps for channels:

type treeSigningSessionsStore struct {
    nonceChs map[string]chan struct{}
    sigsChs  map[string]chan struct{}
    ctxs     map[string]context.CancelFunc
}

Since there's only ever one active round at a time (confirmed by startRound() always calling Delete(existingRound.Id) first), these maps add unnecessary complexity.

Simplify to:
type treeSigningSessionsStore struct {
    rdb *redis.Client
    mu  sync.RWMutex

    // single active round (not maps)
    currentRoundID string
    nonceCh        chan struct{}
    sigsCh         chan struct{}
    cancelRound    context.CancelFunc

    pollInterval time.Duration
    numOfRetries int
    retryDelay   time.Duration
}

Benefits:
- Simpler code, fewer allocations
- No map cleanup needed
- Matches actual usage pattern
- Easier to reason about lifecycle

Collaborator Author

this is an in-memory cache, we'll add support once we're ready to support multiple batches

Collaborator

yeah but if we keep this we introduce inconsistency since in other stores we only support one round

-if !exists {
-	return errors.TX_NOT_FOUND.New("offchain tx %s not found", txid).
-		WithMetadata(errors.TxNotFoundMetadata{Txid: txid})
+offchainTx, err := s.cache.OffchainTxs().Get(ctx, txid)
Collaborator

The fallback masks potential bugs. We expect the cache to always have the data, so a miss indicates something is wrong. Silently falling back to the DB means:

  1. We won't know cache sync is broken
  2. Performance degrades (DB hit instead of cache)
  3. Could indicate data integrity issues

Collaborator Author

It's actually ok. We need to store the offchain tx in the cache because clients may call FinalizeTx before the db is updated; otherwise we could rely fully on the db. If we find the tx in neither the cache nor the db, we return an error. If data is not found in the cache, maybe we deleted it manually on purpose, so I don't see drawbacks.


 ctx := context.Background()
-existingRound := s.cache.CurrentRound().Get(ctx)
+existingRound, err := s.cache.CurrentRound().Get(ctx)
Collaborator

Cleanup not resilient to cache failures

In startRound(), if CurrentRound().Get() fails, the function returns early and no cleanup happens:

existingRound, err := s.cache.CurrentRound().Get(ctx)
if err != nil {
    log.WithError(err).Error("failed to get current round from cache")
    return  // NO CLEANUP!
}

If Redis is temporarily unavailable after a round failure:
- Old ForfeitTxs, ConfirmationSessions, TreeSigingSessions remain
- New round is never created (server stuck)

Fix: Move unconditional cleanup before the Get(), and don't return early.

Collaborator Author

if this happens, it means we can't access the cache and can't do anything, not even clean it up, so we just return and stop any operation here. In another pr we'll improve the handling of this scenario

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (5)
internal/core/application/service.go (5)

2514-2521: Missing roundAborted = true on confirmation session cache failure.

Per past review, when s.cache.ConfirmationSessions().Get(ctx) fails, the code logs the error and calls round.Fail() but does not set roundAborted = true. This is inconsistent with other error paths in this function and may cause the defer block to behave unexpectedly.

The past review already flagged this issue. Apply the fix:

 case <-time.After(roundTiming.confirmationDuration()):
 	session, err := s.cache.ConfirmationSessions().Get(ctx)
 	if err != nil {
+		roundAborted = true
 		log.WithError(err).Error("failed to get confirmation session from cache")
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to get confirmation session from cache: %s", err,
 		))
 		return
 	}

3344-3348: Error in connector index fetch silently aborts event propagation.

This was flagged in a past review. When GetConnectorsIndexes fails, the function returns early without propagating the RoundFinalizationStarted event. Clients won't be notified of the round finalization.


3358-3363: Error fetching selected intents prevents RoundFailed propagation.

This was flagged in a past review. When GetSelectedIntents fails, the RoundFailed event is not propagated to clients, leaving them unaware that the round failed.


3391-3395: Error in Init prevents BatchStarted event propagation.

This was flagged in a past review. When s.cache.ConfirmationSessions().Init(ctx, hashedIntentIds) fails, the function returns early and the BatchStarted event is never sent to clients.


2278-2292: Potential nil pointer dereference when logging cache cleanup errors.

If existingRound is nil (valid when no round exists yet), accessing existingRound.Id in the log messages at lines 2280, 2285, and 2290 will panic. The nil check at line 2293 comes after these log statements.

Apply this diff to guard against nil:

 	// Reset the cache for the new batch
 	if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
-		log.WithError(err).Warnf(
-			"failed to delete forfeit txs from cache for round %s", existingRound.Id,
-		)
+		roundId := "<none>"
+		if existingRound != nil {
+			roundId = existingRound.Id
+		}
+		log.WithError(err).Warnf("failed to delete forfeit txs from cache for round %s", roundId)
 	}
 	if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
-		log.WithError(err).Warnf(
-			"failed to delete spent vtxos from cache after round %s", existingRound.Id,
-		)
+		roundId := "<none>"
+		if existingRound != nil {
+			roundId = existingRound.Id
+		}
+		log.WithError(err).Warnf("failed to delete spent vtxos from cache after round %s", roundId)
 	}
 	if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
-		log.WithError(err).Errorf(
-			"failed to reset confirmation session from cache for round %s", existingRound.Id,
-		)
+		roundId := "<none>"
+		if existingRound != nil {
+			roundId = existingRound.Id
+		}
+		log.WithError(err).Errorf("failed to reset confirmation session from cache for round %s", roundId)
 	}

Alternatively, move these cleanup operations inside the if existingRound != nil block or extract the round ID once with a nil check before all log statements.

🧹 Nitpick comments (1)
internal/infrastructure/live-store/inmemory/confirmation_session.go (1)

60-64: Consider potential goroutine leak if no receiver.

The goroutine sending to sessionCompleteCh will block indefinitely if no receiver calls SessionCompleted(). This is likely acceptable given the expected usage pattern, but consider using a buffered channel of size 1 if the signal only needs to be sent once.

-	sessionCompleteCh: make(chan struct{}),
+	sessionCompleteCh: make(chan struct{}, 1),

And in Init:

-	c.sessionCompleteCh = make(chan struct{})
+	c.sessionCompleteCh = make(chan struct{}, 1)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38ca0fa and 341849f.

📒 Files selected for processing (7)
  • internal/core/application/service.go (56 hunks)
  • internal/infrastructure/live-store/inmemory/confirmation_session.go (4 hunks)
  • internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
  • internal/infrastructure/live-store/live_store_test.go (8 hunks)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (5 hunks)
  • internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
  • internal/infrastructure/live-store/redis/tree_signing_session.go (6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/redis/confirmation_sessions.go
  • internal/infrastructure/live-store/inmemory/forfeits.go
  • internal/infrastructure/live-store/inmemory/confirmation_session.go
  • internal/infrastructure/live-store/redis/forfeits.go
  • internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/core/application/service.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.

Applied to files:

  • internal/infrastructure/live-store/live_store_test.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.

Applied to files:

  • internal/infrastructure/live-store/live_store_test.go
🧬 Code graph analysis (6)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/infrastructure/live-store/inmemory/forfeits.go (3)
pkg/ark-lib/tree/tx_tree.go (1)
  • FlatTxTree (36-36)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/tx_builder.go (1)
  • ValidForfeitTx (49-52)
internal/infrastructure/live-store/redis/tree_signing_session.go (3)
internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)
  • NewTreeSigningSessionsStore (19-26)
internal/core/ports/live_store.go (2)
  • TreeSigningSessionsStore (70-80)
  • MusigSigningSession (104-110)
pkg/ark-lib/tree/musig2.go (2)
  • TreeNonces (32-32)
  • TreePartialSigs (98-98)
internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/core/application/service.go (5)
internal/core/ports/live_store.go (2)
  • ConfirmationSessions (112-116)
  • TimedIntent (92-97)
internal/core/domain/round.go (1)
  • Round (41-59)
internal/core/application/service_event.go (1)
  • RoundFailed (61-64)
internal/core/application/round_report.go (1)
  • BuildCommitmentTxStage (16-16)
internal/core/domain/conviction.go (2)
  • Crime (36-40)
  • CrimeTypeMusig2InvalidSignature (16-16)
internal/infrastructure/live-store/live_store_test.go (4)
internal/core/application/types.go (2)
  • Outpoint (177-177)
  • VOut (131-131)
internal/core/domain/vtxo.go (1)
  • Outpoint (15-18)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
pkg/ark-lib/tree/musig2.go (2)
  • TreeNonces (32-32)
  • TreePartialSigs (98-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Scan
  • GitHub Check: unit tests
  • GitHub Check: integration tests
🔇 Additional comments (41)
internal/infrastructure/live-store/redis/tree_signing_session.go (8)

38-50: LGTM!

Constructor properly initializes all fields including the new numOfRetries and retryDelay for retry logic.


52-91: LGTM!

The New method properly implements retry logic with WATCH/TxPipelined for transactional safety. The use of context.Background() for the watch goroutines is appropriate to decouple their lifetime from the request context.


93-153: LGTM!

The Get method correctly handles the redis.Nil case by returning nil, nil for non-existent sessions, with proper error propagation for other failures.


155-195: LGTM!

The Delete method properly cleans up Redis state and local resources (cancel functions, channels) with correct retry logic.


197-223: LGTM!

The AddNonces method properly validates session existence before writing and uses consistent retry logic.


225-251: LGTM!

The AddSignatures method follows the same consistent pattern as AddNonces with proper validation and retry logic.


265-314: LGTM!

The watchNoncesCollected goroutine properly uses a ticker for polling and includes appropriate panic recovery and context cancellation handling.


367-378: LGTM!

The checkSessionExists helper provides clear validation with descriptive error messages.

internal/infrastructure/live-store/inmemory/confirmation_session.go (3)

21-26: LGTM!

Constructor properly initializes the mutex and completion channel.


28-42: LGTM!

The Init method properly creates a fresh sessionCompleteCh for each new session.


92-96: LGTM!

The Initialized method properly uses read lock for thread-safe access.

internal/infrastructure/live-store/redis/confirmation_sessions.go (4)

49-63: LGTM!

Constructor properly initializes the store and starts the background watch goroutine with a cancellable context.


65-96: LGTM!

The Init method now correctly passes all relevant keys to Watch for proper optimistic locking, addressing the previous review feedback.


135-159: LGTM!

The Get method properly reads all confirmation state with appropriate error handling.


212-256: LGTM!

The watchSessionCompletion goroutine properly handles polling, context cancellation, and includes panic recovery. The use of sync.Once ensures the completion signal is only sent once.
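
For reference, a minimal sketch of that sync.Once pattern; the type and field names are illustrative, not the store's actual ones:

package inmemory

import "sync"

type sessionWatcher struct {
	once       sync.Once
	completeCh chan struct{}
}

// signalComplete may be reached from several polling iterations, but
// sync.Once guarantees the channel is closed (signalled) at most once.
func (w *sessionWatcher) signalComplete() {
	w.once.Do(func() { close(w.completeCh) })
}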

internal/infrastructure/live-store/inmemory/forfeits.go (5)

30-88: LGTM!

The Init method properly initializes forfeit state with correct locking and validation.


90-116: LGTM!

The Sign method properly validates state and uses the builder for transaction verification.


127-133: Deadlock risk resolved but Reset error ignored.

The defer ordering now correctly unlocks before calling Reset, avoiding the deadlock. However, the // nolint comment suggests the Reset error return is intentionally ignored. Since Reset for in-memory store always returns nil, this is acceptable, but documenting why the error is ignored would improve maintainability.


153-168: LGTM!

The AllSigned method now properly acquires read lock before iterating over forfeitTxs, addressing the previous data race concern.


188-194: LGTM!

The GetConnectorsIndexes method now returns a cloned map via maps.Clone, preventing external mutation of internal state.
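
A minimal illustration of that defensive-copy pattern (stdlib maps, Go 1.21+; the value type here is a placeholder):

package inmemory

import (
	"context"
	"maps"
	"sync"
)

type forfeitTxsStore struct {
	lock              sync.RWMutex
	connectorsIndexes map[string]int // placeholder value type
}

// GetConnectorsIndexes returns a clone so callers cannot mutate internal state.
func (s *forfeitTxsStore) GetConnectorsIndexes(_ context.Context) (map[string]int, error) {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return maps.Clone(s.connectorsIndexes), nil
}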

internal/infrastructure/live-store/redis/forfeits.go (7)

29-38: LGTM!

Constructor properly initializes all fields including retry configuration.


40-116: LGTM!

The Init method properly marshals data, validates constraints, and uses consistent retry logic with WATCH/TxPipelined.


118-182: LGTM!

The Sign method properly validates state, verifies transactions through the builder, and stores results with consistent retry logic.


184-206: LGTM!

The Reset method now includes time.Sleep(s.retryDelay) between retry attempts, addressing the previous review feedback.
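
For readers unfamiliar with the pattern, a minimal sketch of the WATCH/TxPipelined retry loop these stores use (go-redis v9; key names and retry fields are placeholders):

package redisstore

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type store struct {
	rdb          *redis.Client
	numOfRetries int
	retryDelay   time.Duration
}

// reset deletes the store's keys inside a WATCH-guarded transaction,
// backing off and retrying when a concurrent write invalidates the watch.
func (s *store) reset(ctx context.Context, keys ...string) error {
	for i := 0; i < s.numOfRetries; i++ {
		err := s.rdb.Watch(ctx, func(tx *redis.Tx) error {
			_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
				pipe.Del(ctx, keys...)
				return nil
			})
			return err
		}, keys...)
		if err == nil {
			return nil
		}
		if !errors.Is(err, redis.TxFailedErr) {
			return err // non-conflict error: fail fast
		}
		time.Sleep(s.retryDelay) // optimistic-lock conflict: back off and retry
	}
	return fmt.Errorf("reset: retries exhausted")
}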


208-237: LGTM!

The Pop method properly validates forfeit data and handles the Reset error return, unlike the in-memory version.


239-264: LGTM!

The AllSigned method properly checks all forfeits and includes error handling for malformed data.


266-316: LGTM!

The remaining methods (GetUnsignedInputs, Len, GetConnectorsIndexes) properly handle Redis reads with appropriate error handling. The typo "fet" has been corrected to "get".

internal/core/application/service.go (7)

516-532: Context-aware cache check pattern looks correct.

The error handling for s.cache.OffchainTxs().Includes(ctx, vtxo.Outpoint) properly logs the error and returns an internal error to the client without exposing internal details. The separation of error check from the spent check is clean.


1107-1119: Fallback to DB when cache miss is noted but acceptable.

Per past review comment from sekulicd, falling back silently to DB when cache lookup fails can mask cache sync issues. However, the error is now logged at Error level (line 1109), which provides visibility. The fallback ensures the operation can complete even during transient cache issues.


2374-2380: Early return on cache error before defer setup.

When s.cache.CurrentRound().Get(ctx) fails, the function returns early before the defer is set up. This is acceptable since s.startRound() is called to restart the round lifecycle, and startRound handles its own cleanup. However, the events from this partial round attempt won't be saved.


2407-2418: Proper abort handling for cache errors.

The pattern correctly sets roundAborted = true before returning when s.cache.Intents().Len(ctx) fails. This ensures the defer block at line 2387 will properly restart the round instead of proceeding to finalization.


2605-2621: Good error recovery pattern with explicit event creation.

When s.cache.CurrentRound().Get(ctx) fails in startFinalization, the code creates a synthetic RoundFailed event and saves it to ensure clients are notified. This is a robust pattern that maintains event consistency even during cache failures.


2967-2984: Consistent error recovery pattern in finalizeRound.

The error handling when s.cache.CurrentRound().Get(ctx) fails mirrors the pattern in startFinalization - creating a synthetic RoundFailed event and saving it before restarting the round. This consistency is good for maintainability.


3468-3498: Context and error handling in checkForfeitsAndBoardingSigsSent.

The function correctly uses context.Background() since it runs as a goroutine that may outlive its caller. The error handling logs issues and returns early, which is appropriate for a status-checking helper function.

internal/infrastructure/live-store/live_store_test.go (7)

111-215: IntentStore test properly updated for context-aware API.

The test correctly uses t.Context() for per-test context and updates all store method calls to pass the context. Error handling assertions are properly placed after each operation that can fail.


234-243: Concurrent forfeit signing test is well-structured.

The test properly uses sync.WaitGroup to coordinate concurrent Sign operations and waits for all goroutines to complete before asserting the final state. This tests the thread-safety of the forfeit signing mechanism.


308-348: OffChainTxStore test has good coverage.

The test covers all key operations (Add, Get, Includes, Remove) with proper error handling assertions. It correctly tests both existing and non-existent entries for Get and Remove operations.


374-389: Concurrent confirmation test with proper timeout handling.

The test uses a sync.WaitGroup for concurrent confirmations and a select with timeout to wait for session completion. The 5-second timeout is reasonable for test stability.
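
A condensed sketch of that coordination pattern (store, intentIds, and the testify assert import are assumed fixtures):

func TestConcurrentConfirm(t *testing.T) {
	ctx := t.Context() // per-test context, cancelled when the test ends

	var wg sync.WaitGroup
	for _, id := range intentIds {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			// assert (not require) is safe to call from non-test goroutines
			assert.NoError(t, store.Confirm(ctx, id))
		}(id)
	}
	wg.Wait()

	// wait for the completion signal, bounded for test stability
	select {
	case <-store.SessionCompleted():
	case <-time.After(5 * time.Second):
		t.Fatal("timed out waiting for session completion")
	}
}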


717-750: Thorough concurrent signature submission test.

The test verifies both successful concurrent signature submission and that subsequent overwrite attempts don't modify existing signatures. The sigsMatch helper provides detailed error messages for debugging test failures.


45-54: Pre-computed intent ID hashes for confirmation tests.

The fixtures pre-compute SHA256 hashes of intent IDs, which aligns with how the confirmation session identifies intents by their hash. This approach keeps the test setup clean and deterministic.


533-541: No issue found. The project's Go version supports this feature.

The project requires Go 1.25.3 (from go.mod), which fully supports sync.WaitGroup.Go(), added in Go 1.25. The code is compatible with the project's minimum Go version.
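
For reference, a minimal illustration of the helper (ids and process are placeholders):

var wg sync.WaitGroup
for _, id := range ids {
	wg.Go(func() { // wg.Go wraps Add(1), the goroutine launch, and Done
		process(id) // per-iteration loop variables make this capture safe (Go 1.22+)
	})
}
wg.Wait()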

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (7)
internal/core/application/service.go (7)

516-521: Error handling pattern: context discarded in user-facing message.

The code logs the actual error but returns a generic "something went wrong" message. This pattern appears throughout the file and was previously flagged by reviewers as problematic because:

  1. Logs and returns handle the error twice
  2. Original error context is lost for the client
  3. Debugging becomes harder

However, per past review discussions, this was deferred to a future PR to avoid making this one too messy.


1107-1119: Silent fallback masks cache sync issues.

When cache.OffchainTxs().Get() returns nil (not an error), the code silently falls back to the database. As noted in a previous review by sekulicd, this fallback masks potential bugs:

  1. Cache is expected to always have the data
  2. A miss indicates cache sync is broken
  3. Silent DB fallback means degraded performance goes unnoticed

Consider logging a warning when falling back to the DB to make cache misses visible for debugging.
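
Something like the following would keep the fallback while making it observable (only the warning lines are new; surrounding code abbreviated):

 offchainTx, err := s.cache.OffchainTxs().Get(ctx, txid)
 if offchainTx == nil {
+	// make the cache miss visible instead of silently degrading to the DB
+	log.Warnf("offchain tx %s not found in cache, falling back to db", txid)
 	// ... existing DB lookup ...
 }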


3345-3349: Error in connector index fetch silently aborts event propagation.

When GetConnectorsIndexes fails, the function returns early without propagating the round finalization event. This leaves clients without notification of the round completion, which is critical for them to process results.

Consider logging the error and continuing with an empty connector map, or propagate a partial event that indicates connector data is unavailable.


3359-3364: RoundFailed event not propagated when intent fetch fails.

If GetSelectedIntents fails, the RoundFailed event is never sent to clients. This is critical because clients need to know the round failed to properly handle their pending intents and retry.

Consider propagating the failure event with empty topics rather than silently returning, so clients are notified even when intent retrieval fails.


3392-3395: BatchStarted event suppressed on Init failure.

When ConfirmationSessions().Init() fails, the function returns early and the BatchStarted event is never propagated. This prevents clients from knowing that a batch was attempted, which could lead to stale state on the client side.

Consider propagating the event even if session initialization fails, or ensure callers can handle the missing event gracefully.


2515-2522: Missing roundAborted = true on cache failure path.

When s.cache.ConfirmationSessions().Get(ctx) fails, the code calls round.Fail() and returns but does not set roundAborted = true. This is inconsistent with other error paths in this function (e.g., lines 2410-2413, 2427-2429).

The defer block at lines 2388-2406 checks roundAborted to decide whether to call startRound() or startFinalization(). Without setting roundAborted = true, the defer will attempt finalization instead of properly aborting and restarting the round.

Apply this diff to ensure consistent abort behavior:

 case <-time.After(roundTiming.confirmationDuration()):
 	session, err := s.cache.ConfirmationSessions().Get(ctx)
 	if err != nil {
+		roundAborted = true
 		log.WithError(err).Error("failed to get confirmation session from cache")
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to get confirmation session from cache: %s", err,
 		))
 		return
 	}

2553-2558: Missing roundAborted = true on boarding inputs cache failure.

Similar to the confirmation session error handling, when s.cache.BoardingInputs().Set(ctx, numOfBoardingInputs) fails, the code calls round.Fail() but does not set roundAborted = true. This inconsistency causes the defer block to attempt saving events and calling startFinalization() instead of properly aborting the round.

Apply this diff:

 	if err := s.cache.BoardingInputs().Set(ctx, numOfBoardingInputs); err != nil {
+		roundAborted = true
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to update boarding inputs in cache: %s", err,
 		))
 		return
 	}
🧹 Nitpick comments (1)
internal/core/application/service.go (1)

2293-2301: Redundant nil check inside non-nil block.

Line 2293 checks if existingRound != nil again, but this entire code block (lines 2277-2312) is already inside the if existingRound != nil check from line 2276. The inner check at line 2293 is redundant.

Apply this diff to remove the redundant check:

 	}
 	if existingRound != nil {
 		// Reset the cache for the new batch
 		if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
 			log.WithError(err).Warnf(
 				"failed to delete forfeit txs from cache for round %s", existingRound.Id,
 			)
 		}
 		if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
 			log.WithError(err).Warnf(
 				"failed to delete spent vtxos from cache after round %s", existingRound.Id,
 			)
 		}
 		if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
 			log.WithError(err).Errorf(
 				"failed to reset confirmation session from cache for round %s", existingRound.Id,
 			)
 		}
-		if existingRound != nil {
-			if existingRound.Id != "" {
-				if err := s.cache.TreeSigingSessions().Delete(ctx, existingRound.Id); err != nil {
-					log.WithError(err).Errorf(
-						"failed to delete tree signing sessions for round from cache %s",
-						existingRound.Id,
-					)
-				}
-			}
-			if existingRound.CommitmentTxid != "" {
-				if err := s.cache.BoardingInputs().DeleteSignatures(
-					ctx, existingRound.CommitmentTxid,
-				); err != nil {
-					log.WithError(err).Errorf(
-						"failed to delete boarding input signatures from cache for round %s",
-						existingRound.Id,
-					)
-				}
-			}
+		if existingRound.Id != "" {
+			if err := s.cache.TreeSigingSessions().Delete(ctx, existingRound.Id); err != nil {
+				log.WithError(err).Errorf(
+					"failed to delete tree signing sessions for round from cache %s",
+					existingRound.Id,
+				)
+			}
+		}
+		if existingRound.CommitmentTxid != "" {
+			if err := s.cache.BoardingInputs().DeleteSignatures(
+				ctx, existingRound.CommitmentTxid,
+			); err != nil {
+				log.WithError(err).Errorf(
+					"failed to delete boarding input signatures from cache for round %s",
+					existingRound.Id,
+				)
+			}
 		}
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10dde8e and 66a1a3a.

📒 Files selected for processing (4)
  • internal/core/application/service.go (56 hunks)
  • internal/core/domain/offchain_tx.go (1 hunks)
  • internal/infrastructure/live-store/inmemory/confirmation_session.go (4 hunks)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/inmemory/confirmation_session.go
  • internal/infrastructure/live-store/redis/confirmation_sessions.go
  • internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.

Applied to files:

  • internal/core/application/service.go
🧬 Code graph analysis (4)
internal/core/domain/offchain_tx.go (1)
internal/core/domain/round.go (1)
  • Stage (30-34)
internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/core/application/service.go (8)
api-spec/protobuf/gen/ark/v1/types.pb.go (6)
  • Outpoint (24-30)
  • Outpoint (43-43)
  • Outpoint (58-60)
  • Intent (441-447)
  • Intent (460-460)
  • Intent (475-477)
internal/infrastructure/wallet/wallet_client.go (1)
  • New (31-56)
internal/core/ports/live_store.go (1)
  • ConfirmationSessions (112-116)
internal/core/domain/round.go (2)
  • Round (41-59)
  • ForfeitTx (36-39)
internal/core/domain/round_event.go (2)
  • RoundFailed (39-43)
  • RoundEvent (9-12)
internal/core/application/round_report.go (4)
  • BuildCommitmentTxStage (16-16)
  • CreateTreeNoncesOp (24-24)
  • TreeSigningStage (17-17)
  • ForfeitTxsCollectionStage (18-18)
internal/core/domain/conviction.go (2)
  • Crime (36-40)
  • CrimeTypeMusig2InvalidSignature (16-16)
pkg/ark-lib/tree/tx_tree.go (1)
  • TxTree (19-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Scan
  • GitHub Check: integration tests
  • GitHub Check: unit tests
🔇 Additional comments (8)
internal/core/domain/offchain_tx.go (1)

244-245: LGTM: Clean state reset on finalization.

Explicitly clearing the failure flag and reason when an offchain transaction is finalized ensures a clean, consistent state. This defensive approach prevents stale failure information from persisting after successful finalization.

internal/core/application/service.go (1)

3469-3499: LGTM: Context usage aligns with design intent.

This function correctly uses context.Background() on line 3469. Based on learnings, background goroutines that manage asynchronous state checks should be decoupled from request lifetimes to prevent premature cancellation. The error handling appropriately logs and returns early when cache operations fail.

internal/infrastructure/live-store/inmemory/confirmation_session.go (2)

84-87: LGTM: Nil-safe channel management.

The nil-check before closing and the creation of a new channel ensure safe lifecycle management. This addresses the previous review concern about potential panics from closing a nil channel.


28-99: LGTM: Context-aware API aligns with live-store interface.

All public methods now accept context.Context and return appropriate error types. While the context is currently unused in the in-memory implementation (as indicated by _ prefix), this aligns the API surface with the Redis implementation and prepares for future context-aware operations like cancellation or timeout handling.

internal/infrastructure/live-store/redis/confirmation_sessions.go (4)

71-95: LGTM: Watch keys and retry delay properly implemented.

The Init method now correctly:

  1. Specifies keys to watch (line 90: keys...), enabling proper optimistic locking
  2. Includes retry delay between attempts (line 93)
  3. Clears existing data before setting new values

These changes address previous review concerns about missing watch keys and tight retry loops.


114-127: LGTM: Transactional read within Watch callback.

Line 116 now correctly uses tx.Get instead of s.rdb.Get, ensuring the read of confirmationNumConfirmedKey is part of the watched transaction. This prevents race conditions where another process modifies the counter between the read and the write.

The keys are properly passed to Watch at line 127, and retry delay is included at line 130.


170-184: LGTM: Retry delay now included in Reset.

The retry loop now includes time.Sleep(s.retryDelay) at line 183, addressing the previous review concern about tight-loop retries during contention. This matches the retry pattern in Init and Confirm.


65-205: LGTM: Context-aware API with robust retry patterns.

The Redis implementation now provides a context-aware API surface that matches the in-memory version. Key improvements:

  1. All public methods accept context.Context for cancellation/timeout support
  2. Consistent retry patterns with proper delays across all methods
  3. Explicit error returns with informative messages
  4. Optimistic locking with WATCH on appropriate keys

The implementation addresses all concerns from previous reviews regarding race conditions, missing watch keys, and tight retry loops.

@altafan altafan merged commit 6340ddc into master Nov 25, 2025
5 checks passed