
Conversation


@louisinger louisinger commented Nov 22, 2025

@altafan @sekulicd please review

Summary by CodeRabbit

  • Refactor
    • Improved multi-round signing coordination with per-round contexts, isolated per-round notification channels, and safer watcher lifecycle handling for more reliable, race-resistant signing.
    • Hardened confirmation session processing with cancellable background watchers, synchronized lifecycle control, guarded completion signaling, and more predictable reset and cleanup behavior.



coderabbitai bot commented Nov 22, 2025

Walkthrough

Refactors tree signing session coordination from global state to per-round maps with per-round cancellable contexts and RWMutex protection; adds context-aware polling watchers and lifecycle controls to the confirmation sessions store, introducing cancellable context management, safer channel signaling, and guarded resets.

Changes

Per-Round Tree Signing: internal/infrastructure/live-store/redis/tree_signing_session.go
Replace single global nonce/sig channels with maps (nonceChs, sigsChs) keyed by roundId; add ctxs map and lock sync.RWMutex. Create per-round cancellable contexts and channels on init, start per-round watchers with context. Delete cancels context, closes channels, and removes map entries. Watchers accept context.Context, poll Redis with cancellation support, emit on per-round channels with panic protection, and return after signaling.

Confirmation Sessions Lifecycle: internal/infrastructure/live-store/redis/confirmation_sessions.go
Add lock sync.RWMutex, ctx context.Context, cancel context.CancelFunc, and pollInterval time.Duration to the store. NewConfirmationSessionsStore creates a cancellable context and starts watchSessionCompletion(ctx). Reset cancels/recreates context, restarts watcher, and resets channel under lock. SessionCompleted reads channel under RLock. watchSessionCompletion now accepts ctx, polls Redis using select with ctx.Done(), uses a once guard and safe send/panic handling when signaling completion.
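To make the tree-signing summary easier to follow, here is a minimal sketch of the per-round layout it describes. Only the identifiers named above (nonceChs, sigsChs, ctxs, lock, pollInterval, New, Delete, the watcher functions) come from the PR; the package name, remaining field types, and exact signatures are assumptions.

```go
package redislivestore // assumed package name

import (
	"context"
	"sync"
	"time"

	"github.com/redis/go-redis/v9"
)

// Illustrative per-round state, not the PR's actual file.
type treeSigningSessionsStore struct {
	rdb          redis.UniversalClient
	lock         sync.RWMutex
	nonceChs     map[string]chan struct{}      // roundId -> "all nonces collected"
	sigsChs      map[string]chan struct{}      // roundId -> "all signatures collected"
	ctxs         map[string]context.CancelFunc // roundId -> cancels that round's watchers
	pollInterval time.Duration
}

// Delete tears down a single round: cancel its watchers, close its channels
// (unblocking any waiters), and drop the per-round map entries.
func (s *treeSigningSessionsStore) Delete(roundId string) {
	s.lock.Lock()
	defer s.lock.Unlock()
	if cancel, ok := s.ctxs[roundId]; ok {
		cancel()
	}
	if ch, ok := s.nonceChs[roundId]; ok {
		close(ch)
	}
	if ch, ok := s.sigsChs[roundId]; ok {
		close(ch)
	}
	delete(s.ctxs, roundId)
	delete(s.nonceChs, roundId)
	delete(s.sigsChs, roundId)
}
```

New is the mirror image: create the context and both channels under the same lock, then start watchNoncesCollected and watchSigsCollected with the per-round context.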

Sequence Diagram

sequenceDiagram
    participant Client
    participant TreeMgr as TreeSigningManager
    participant Round as Per-Round State
    participant Redis
    participant Watcher
    participant ConfMgr as ConfirmationStore
    participant ConfWatcher

    Client->>TreeMgr: StartRound(roundId)
    activate TreeMgr
    TreeMgr->>Round: create ctxs[roundId], nonceChs[roundId], sigsChs[roundId]
    TreeMgr->>Watcher: start watchNoncesCollected(ctxs[roundId])
    TreeMgr->>Watcher: start watchSigsCollected(ctxs[roundId])
    deactivate TreeMgr

    par Per-round watchers
        loop poll until threshold or ctx cancel
            Watcher->>Redis: GET nonce/sig state
            Redis-->>Watcher: current counts
            alt threshold met
                Watcher->>Round: send on nonceChs/sigsChs (safe send)
                Watcher-->Watcher: return
            else ctx cancelled
                Watcher-->Watcher: exit
            end
        end
    end

    Client->>ConfMgr: NewConfirmationStore()
    activate ConfMgr
    ConfMgr->>ConfWatcher: start watchSessionCompletion(ctx)
    deactivate ConfMgr

    loop confirmation polling
        ConfWatcher->>Redis: poll session state
        Redis-->>ConfWatcher: state
        alt completed
            ConfWatcher->>ConfMgr: safe send on sessionCompleteCh (once)
        else ctx cancelled
            ConfWatcher-->ConfWatcher: exit
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Verify RWMutex placement and correctness around map/channel access in both files.
  • Ensure contexts are created/cancelled correctly and watcher goroutines exit (no leaks).
  • Confirm safe send patterns and panic protection when channels may be closed concurrently.
  • Review Reset behavior in confirmation store — correct channel restart and once-guard semantics.
  • Check clean removal of per-round resources in Delete and absence of races/use-after-close.

Suggested reviewers

  • altafan

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'fix Redis LiveStore' is overly vague and generic. It describes a general category of changes (fixing Redis LiveStore) but does not clearly indicate the main focus or nature of the fixes, such as concurrency controls, per-round coordination, or context-based lifecycle management. Resolution: provide a more specific title that highlights the primary change, such as 'Add per-round coordination and context-based lifecycle to Redis tree signing and confirmation sessions' or similar.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
internal/infrastructure/live-store/redis/tree_signing_session.go (4)

26-33: Per‑round maps + RWMutex look correct, but rely on strict New/Delete discipline

The move to lock + {nonce,sigs}Chs/ctxs maps is sound and removes the previous global‑state coupling; map access is properly guarded with RWMutex.

However, this design assumes:

  • Every New(roundId, …) is eventually paired with Delete(roundId); otherwise the maps (and cancel funcs/channels) grow without bound.
  • New is not called twice for the same roundId without an intervening Delete; currently a second New overwrites ctxs[roundId] and channels without cancelling/closing the previous ones, which is a leak and can leave stale watchers around.

Consider at least:

  • Documenting these invariants at the interface level, and/or
  • Adding a defensive check in New that logs (or even rejects) if a session already exists for roundId before overwriting.

Also applies to: 38-41
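
A sketch of such a defensive check at the top of New, assuming New is not called concurrently for the same roundId and that log is the package's logger (illustrative only, not the PR's code):

```go
// Hypothetical guard: refuse to silently overwrite an existing session.
s.lock.RLock()
_, exists := s.ctxs[roundId]
s.lock.RUnlock()
if exists {
	log.Warnf("tree signing session for round %s already exists; deleting it before recreating", roundId)
	// Delete cancels the stale watchers and closes the old channels, so the
	// previous round's waiters are unblocked rather than leaked.
	s.Delete(roundId)
}
```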


45-49: Lock scope in New is broader than needed and includes Redis I/O

New takes s.lock.Lock() around:

  • Redis writes (HSet), and
  • Per‑round map setup + watcher goroutine spawning.

Only the map mutations and the hand‑off of per‑round channels/ctxs actually require the mutex; holding the lock across HSet can unnecessarily serialize creation of independent sessions and block Delete/NoncesCollected/SignaturesCollected for other rounds if Redis is slow.

You could narrow the critical section roughly like:

 func (s *treeSigningSessionsStore) New(
   roundId string, uniqueSignersPubKeys map[string]struct{},
 ) *ports.MusigSigningSession {
-  s.lock.Lock()
-  defer s.lock.Unlock()
-
-  ctx := context.Background()
+  ctx := context.Background()
   metaKey := fmt.Sprintf(treeSessMetaKeyFmt, roundId)
   cosignersBytes, _ := json.Marshal(uniqueSignersPubKeys)
   meta := map[string]interface{}{
     "Cosigners":   cosignersBytes,
     "NbCosigners": len(uniqueSignersPubKeys) + 1, // operator included
   }
   s.rdb.HSet(ctx, metaKey, meta)
 
-  watchCtx, cancel := context.WithCancel(context.Background())
-  s.ctxs[roundId] = cancel
-  s.nonceChs[roundId] = make(chan struct{})
-  s.sigsChs[roundId] = make(chan struct{})
-
-  go s.watchNoncesCollected(watchCtx, roundId)
-  go s.watchSigsCollected(watchCtx, roundId)
+  watchCtx, cancel := context.WithCancel(context.Background())
+  s.lock.Lock()
+  s.ctxs[roundId] = cancel
+  s.nonceChs[roundId] = make(chan struct{})
+  s.sigsChs[roundId] = make(chan struct{})
+  s.lock.Unlock()
+
+  go s.watchNoncesCollected(watchCtx, roundId)
+  go s.watchSigsCollected(watchCtx, roundId)

This keeps correctness while avoiding holding the lock during Redis I/O.

Also applies to: 60-66


127-149: Delete semantics are safe, but round cleanup depends entirely on callers

The Delete implementation looks race‑safe:

  • You take the write lock while cancelling the ctx and closing/removing per‑round channels, so map access is well‑synchronized.
  • Closing channels under the lock and accessing them via RLock in watchers avoids concurrent map writes.

Two behavioral points to be aware of:

  1. If Delete(roundId) is never called for a completed/aborted round, the per‑round ctx cancel func and channels remain in the maps indefinitely. Over time this is a memory/leak risk on a long‑lived node.

  2. If Delete is called before the watcher has signaled, it will:

    • Cancel the watcher via ctx, causing it to return without signaling, and
    • Close the per‑round channel; any goroutine blocked on <-NoncesCollected(roundId)/<-SignaturesCollected(roundId) will then unblock due to close rather than an explicit send.

Both behaviors may be acceptable, but they are observable. I’d recommend:

  • Explicitly documenting that Delete both tears down the watcher and “unblocks waiters” via channel close, and
  • Ensuring call‑sites always pair New with Delete once the session is done.
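
To make the second point concrete, here is a hypothetical caller-side helper (not service.go's actual code) that pairs New with Delete and treats a close as "torn down" rather than "collected":

```go
// waitForNonces shows the intended pairing: Delete always runs, and a receive
// with ok == false means the channel was closed by Delete, not signalled.
func waitForNonces(
	store ports.TreeSigningSessionsStore,
	roundId string,
	cosigners map[string]struct{},
	timeout time.Duration,
) error {
	store.New(roundId, cosigners)
	defer store.Delete(roundId)

	select {
	case _, ok := <-store.NoncesCollected(roundId):
		if !ok {
			return fmt.Errorf("round %s: session deleted before nonces were collected", roundId)
		}
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("round %s: timed out waiting for nonces", roundId)
	}
}
```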

186-231: Watcher loops busy‑poll Redis and ignore some errors; use pollInterval and improve logging

Both watchNoncesCollected and watchSigsCollected:

  • Run a tight for loop with no sleep/backoff; on “not ready yet” they just continue, effectively hammering Redis and burning CPU.
  • Treat HGetAll errors the same as “missing meta” (len(meta) == 0) and just spin without logging or backoff.
  • Log parse errors and recovered panics without including roundId, which makes multi‑round debugging harder.
  • Duplicate almost identical logic between the two functions.

Given pollInterval already exists, you can introduce inexpensive backoff and better error logging with a small change. For example, in watchNoncesCollected:

func (s *treeSigningSessionsStore) watchNoncesCollected(ctx context.Context, roundId string) {
  metaKey := fmt.Sprintf(treeSessMetaKeyFmt, roundId)
  noncesKey := fmt.Sprintf(treeSessNoncesKeyFmt, roundId)
  for {
    select {
    case <-ctx.Done():
      return
    default:
-      meta, err := s.rdb.HGetAll(ctx, metaKey).Result()
-      if err != nil || len(meta) == 0 {
-        continue
-      }
+      meta, err := s.rdb.HGetAll(ctx, metaKey).Result()
+      if err != nil {
+        log.Warnf("watchNoncesCollected: failed to fetch meta for round %s: %v", roundId, err)
+        time.Sleep(s.pollInterval)
+        continue
+      }
+      if len(meta) == 0 {
+        time.Sleep(s.pollInterval)
+        continue
+      }
       nbCosigners := 0
       if _, err := fmt.Sscanf(meta["NbCosigners"], "%d", &nbCosigners); err != nil {
-        log.Warnf("watchNoncesCollected:failed to parse NbCosigners: %v", err)
+        log.Warnf("watchNoncesCollected: failed to parse NbCosigners for round %s: %v", roundId, err)
+        time.Sleep(s.pollInterval)
         continue
       }
       noncesMap, _ := s.rdb.HGetAll(ctx, noncesKey).Result()
       if len(noncesMap) == nbCosigners-1 {
         // unchanged send logic...
         return
       }
+      time.Sleep(s.pollInterval)
    }
  }
}

And mirror the same changes (including roundId in log messages and a time.Sleep(s.pollInterval) on “not ready / error” paths) in watchSigsCollected.

Optionally, you could also factor the common watcher logic into a shared helper that takes the key suffix and log prefix, to avoid duplication.

Also applies to: 233-278
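
For the optional refactor, one possible shape for that shared helper. This is a sketch: it assumes the fields and key-format constants referenced above, swaps fmt.Sscanf for strconv.Atoi, and omits the panic-recovery guard the PR keeps around the send.

```go
// watchCollected covers both watchers: they differ only in which Redis hash
// they poll, which per-round channel they signal, and what count is "complete".
func (s *treeSigningSessionsStore) watchCollected(
	ctx context.Context, roundId, itemsKeyFmt, logPrefix string,
	done chan struct{}, expected func(nbCosigners int) int,
) {
	metaKey := fmt.Sprintf(treeSessMetaKeyFmt, roundId)
	itemsKey := fmt.Sprintf(itemsKeyFmt, roundId)
	ticker := time.NewTicker(s.pollInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			meta, err := s.rdb.HGetAll(ctx, metaKey).Result()
			if err != nil {
				log.Warnf("%s: failed to fetch meta for round %s: %v", logPrefix, roundId, err)
				continue
			}
			if len(meta) == 0 {
				continue // session meta not written yet
			}
			nbCosigners, err := strconv.Atoi(meta["NbCosigners"])
			if err != nil {
				log.Warnf("%s: failed to parse NbCosigners for round %s: %v", logPrefix, roundId, err)
				continue
			}
			items, err := s.rdb.HGetAll(ctx, itemsKey).Result()
			if err != nil {
				continue
			}
			if len(items) == expected(nbCosigners) {
				// The PR guards this send with panic recovery in case Delete
				// closed the channel concurrently; that guard would stay.
				select {
				case done <- struct{}{}:
				case <-ctx.Done():
				}
				return
			}
		}
	}
}
```

The two existing watchers then become thin wrappers that pass their key format, log prefix, per-round channel, and expected-count function.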

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7afe52a and 02dc9a6.

📒 Files selected for processing (1)
  • internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/infrastructure/live-store/redis/tree_signing_session.go (1)
internal/core/ports/live_store.go (2)
  • TreeSigningSessionsStore (70-80)
  • MusigSigningSession (104-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit tests
  • GitHub Check: integration tests
  • GitHub Check: Build and Scan

Comment on lines 174 to 184
 func (s *treeSigningSessionsStore) NoncesCollected(roundId string) <-chan struct{} {
-	return s.nonceCh
+	s.lock.RLock()
+	defer s.lock.RUnlock()
+	return s.nonceChs[roundId]
 }

 func (s *treeSigningSessionsStore) SignaturesCollected(roundId string) <-chan struct{} {
-	return s.sigsCh
+	s.lock.RLock()
+	defer s.lock.RUnlock()
+	return s.sigsChs[roundId]
 }
@coderabbitai coderabbitai bot Nov 22, 2025


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

NoncesCollected / SignaturesCollected can return nil channels

Both access the maps under RLock and return the channel value directly:

s.lock.RLock()
defer s.lock.RUnlock()
return s.nonceChs[roundId]

If roundId is unknown (no New yet, or already Deleted), this returns a nil channel. Any caller doing a bare <-store.NoncesCollected(id) / <-store.SignaturesCollected(id) will then block forever, which is a change from the previous “single global channel” behavior.

At minimum, consider:

func (s *treeSigningSessionsStore) NoncesCollected(roundId string) <-chan struct{} {
- s.lock.RLock()
- defer s.lock.RUnlock()
- return s.nonceChs[roundId]
+ s.lock.RLock()
+ ch, ok := s.nonceChs[roundId]
+ s.lock.RUnlock()
+ if !ok {
+   log.Warnf("NoncesCollected called for unknown roundId %s", roundId)
+ }
+ return ch
}

(and same for SignaturesCollected).

Please also verify that all call‑sites only call these methods after a successful New(roundId, …), and that they handle the possibility of nil appropriately.


I'll verify the review comment by examining the code implementation and all call sites. Let me gather the necessary information.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find the file and examine its structure
fd -t f "tree_signing_session.go" -x cat {}
```

Length of output: 11190


🏁 Script executed:

#!/bin/bash
# Find all call sites of NoncesCollected and SignaturesCollected
rg -n "NoncesCollected|SignaturesCollected" --type go -B2 -A2

Length of output: 5938


🏁 Script executed:

#!/bin/bash
# Search for the New() and Delete() methods to understand lifecycle
rg -n "func.*New|func.*Delete" "internal/infrastructure/live-store/redis/tree_signing_session.go" -A10

Length of output: 1204


🏁 Script executed:

#!/bin/bash
# Check how channels are used in select statements or other patterns
rg -n "<-.*NoncesCollected|<-.*SignaturesCollected" --type go -B2 -A2

Length of output: 983


🏁 Script executed:

#!/bin/bash
# Look for nil channel handling patterns in the codebase
rg -n "if.*nil" "internal/infrastructure/live-store/redis/tree_signing_session.go" -B2 -A2

Length of output: 2684


Now let me examine the context around the call sites in service.go to verify whether New() is guaranteed to be called first:


🏁 Script executed:

#!/bin/bash
# Get the context around the NoncesCollected call in service.go
sed -n '2500,2530p' internal/core/application/service.go

Length of output: 1155


🏁 Script executed:

#!/bin/bash
# Get the context around the SignaturesCollected call in service.go
sed -n '2580,2610p' internal/core/application/service.go

Length of output: 1279


🏁 Script executed:

#!/bin/bash
# Check if there's a Get() call before NoncesCollected to verify session exists
sed -n '2500,2530p' internal/core/application/service.go | grep -A5 -B5 "Get\|New"

Length of output: 851


🏁 Script executed:

#!/bin/bash
# Look for the entire select block for nonces
sed -n '2480,2550p' internal/core/application/service.go

Length of output: 2341


🏁 Script executed:

#!/bin/bash
# Look for the entire select block for sigs
sed -n '2550,2620p' internal/core/application/service.go

Length of output: 2294


Add nil checks to NoncesCollected and SignaturesCollected with explicit error logging

The nil channel issue is confirmed: when roundId is unknown, both methods return a nil channel from the map. In the Redis implementation, this causes the select statement to ignore the channel case and fire the timeout instead, resulting in a misleading "signing session timed out" error rather than an explicit "session not found" message. While the timeout protection prevents actual hangs, the confusing error message masks the real problem.
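
As a standalone illustration of why the nil case is silent (this is generic Go behavior, not code from the repo): a receive from a nil channel blocks forever, so in a select only the timeout can ever fire.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	var ch <-chan struct{} // nil, like the value returned for an unknown roundId

	select {
	case <-ch:
		fmt.Println("nonces collected") // never reached: a nil channel blocks forever
	case <-time.After(100 * time.Millisecond):
		fmt.Println("signing session timed out") // what callers would observe instead
	}
}
```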

Call sites in service.go (lines 2486–2519 and 2486–2592) do call New(roundId, ...) before accessing these methods in normal flow, so the happy path is protected. However, the defensive approach suggested in the review is sound: adding explicit nil checks and logging improves clarity and error diagnostics.

Additionally, note that the in-memory implementation (internal/infrastructure/live-store/inmemory/tree_signing_session.go lines 119–125) lacks locks entirely on these methods, creating a race condition when reading from the maps—a more serious issue than the nil return problem.

@louisinger (Collaborator, Author)

application/service.go always calls New(roundId) before getting the channels

coderabbitai bot (Contributor)

@louisinger, you're absolutely right that application/service.go always calls New(roundId) before accessing the channels in the current implementation, so the happy path is protected.

The suggestion for nil checks was more about defensive programming—making the code more resilient to future changes or unexpected call patterns, and providing clearer error messages if something goes wrong (e.g., "session not found" vs. a timeout). However, since you have control over the call flow and it's working correctly, the defensive checks are optional.

Feel free to keep the implementation as-is if you're confident about the call ordering! 👍

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

191-212: Consider simplifying the signaling logic.

The nested selects and panic recovery add complexity:

  • The outer select (line 192) and inner select (line 205) both check ctx.Done(), which is redundant protection against a TOCTOU issue that the current lock pattern doesn't fully prevent anyway
  • The panic recovery (lines 197-204) guards against nil channel sends, but the nil check at line 190 already covers this

While this defensive coding isn't harmful, consider simplifying to improve maintainability:

if ch != nil {
    chOnce.Do(func() {
        select {
        case ch <- struct{}{}:
        case <-ctx.Done():
        }
    })
}
return
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02dc9a6 and 00d3664.

📒 Files selected for processing (1)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessionsStore (61-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Scan
  • GitHub Check: unit tests
  • GitHub Check: integration tests
🔇 Additional comments (4)
internal/infrastructure/live-store/redis/confirmation_sessions.go (4)

49-63: Constructor setup looks good.

The cancellable context and watcher initialization follow good practices for lifecycle management.
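
For context, a plausible shape of that constructor based on the walkthrough; the parameter list, struct name, and the 100ms default mentioned later in this review are assumptions about details not shown here.

```go
// NewConfirmationSessionsStore wires the store to a cancellable background
// context and starts the completion watcher; sketch only, not the PR's code.
func NewConfirmationSessionsStore(rdb redis.UniversalClient) *confirmationSessionsStore {
	ctx, cancel := context.WithCancel(context.Background())
	s := &confirmationSessionsStore{
		rdb:               rdb,
		ctx:               ctx,
		cancel:            cancel,
		pollInterval:      100 * time.Millisecond,
		sessionCompleteCh: make(chan struct{}),
	}
	go s.watchSessionCompletion(ctx) // lives until Reset or store shutdown cancels ctx
	return s
}
```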


171-175: LGTM!

Proper use of read lock to safely return the channel.


184-185: Use context.Background() for Redis operations.

The Redis Get() calls use the watcher's cancellation context ctx. If Reset() cancels this context while a Get operation is in progress, the operation could fail prematurely with a context cancellation error, even though the Redis operation itself is still valid.

Redis operations should use a separate context (e.g., context.Background() or a context with timeout) rather than the watcher's lifecycle context.

This is addressed in the diff in the previous comment.

⛔ Skipped due to learnings
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
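
Under that same rationale, the decoupled variant of the two reads suggested above would look roughly like this inside the polling loop (the per-call timeout and the completion condition are assumptions):

```go
// Hypothetical poll body: the watcher's ctx still controls when the loop exits,
// but each Redis read gets its own short-lived context, so a Reset or shutdown
// does not surface as a spurious read error mid-operation.
readCtx, cancelRead := context.WithTimeout(context.Background(), time.Second)
numIntents, errIntents := s.rdb.Get(readCtx, confirmationNumIntentsKey).Int()
numConfirmed, errConfirmed := s.rdb.Get(readCtx, confirmationNumConfirmedKey).Int()
cancelRead()
if errIntents != nil || errConfirmed != nil {
	log.Warnf("watchSessionCompletion: failed to read counters: %v / %v", errIntents, errConfirmed)
	continue // retry on the next tick
}
if numIntents > 0 && numConfirmed >= numIntents { // completion check as assumed here
	// ...signal completion exactly as the existing code does (once guard, safe send)
}
```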

142-163: Line number mismatch in original review; however, the race condition is valid but likely benign.

The review references signaling logic at "lines 191-212" but the actual watchSessionCompletion spans lines 177-208. The concern itself is correct: the old watcher can pass the ctx.Done() check (line 181) and hold a stale reference to sessionCompleteCh (read at line 188 under lock, then released at line 189). Between lock release and the nested select at lines 205-207, Reset() can cancel the context and create a new channel. However, the old watcher's send to the old channel is likely safe because:

  1. Anyone calling SessionCompleted() before Reset() gets the old channel reference and should receive its completion signal
  2. New callers of SessionCompleted() after Reset() get the new channel
  3. The multiple ctx.Done() checks (lines 192, 206) provide layered protection

The vulnerability remains: if the old watcher hasn't exited by the time the new session processes confirmations, stale Redis state (from before reset) could cause the old watcher to emit a completion signal while the new session is active. Verify this scenario won't cause issues in your usage pattern—specifically, confirm that old channel listeners from before Reset() won't interfere with the new session's logic.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

177-219: Watcher loop now uses pollInterval; consider minor context and error‑handling tweaks.

The reworked watchSessionCompletion correctly:

  • Uses a time.NewTicker(s.pollInterval) instead of busy-waiting,
  • Is cancelable via ctx.Done(),
  • Uses RWMutex to safely snapshot sessionCompleteCh,
  • Guards the notification with sync.Once and panic recovery.

A few potential improvements:

  1. Error handling on Redis reads.
    Both Get calls ignore errors:

    numIntents, _ := s.rdb.Get(ctx, confirmationNumIntentsKey).Int()
    numConfirmed, _ := s.rdb.Get(ctx, confirmationNumConfirmedKey).Int()

    This can mask Redis connectivity or serialization issues and silently stall completion detection. Even a lightweight if err != nil { log.Warnf(...) ; continue } would improve observability.

    # Check for other places where Redis errors are dropped in this package.
    rg -n "_, _ := s\.rdb\." internal/infrastructure/live-store/redis -C2
  2. Context choice for Redis Get calls.
    You’re currently using the watcher’s cancelable ctx for the Get operations. A previous review suggested using context.Background() for these reads so they aren’t aborted by watcher cancellation, relying only on the select around the loop for termination. Whether that’s preferable depends on your desired shutdown behavior (fast cancellation vs. completing the last read). Because this mirrors past feedback, treat it as an optional design choice rather than a bug.

    In go-redis/v9, what are the semantics of using a canceled context vs. context.Background()
    for simple GET operations in long-running polling loops? Are there recommended patterns
    for shutdown behavior in such watchers?
    
  3. Docs vs implementation for completion signaling.
    The header comment talks about “closing the local sessionCompleteCh channel when the session is complete”, but the implementation sends a single struct{}{} instead of closing. If multiple goroutines may wait on SessionCompleted(), closing is a broadcast whereas a single send only wakes one waiter. Either is valid, but it would be good to align the comment with the actual behavior or vice versa, depending on how call sites are written.
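
    To make the third point concrete, a small self-contained illustration of the difference (generic Go, not PR code): closing the channel wakes every waiter, while a single send wakes exactly one.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	done := make(chan struct{})
	var wg sync.WaitGroup

	// Several goroutines wait on the same completion channel, as multiple
	// callers of SessionCompleted() might.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-done // close(done) unblocks all of them; done <- struct{}{} would unblock only one
			fmt.Printf("waiter %d woke up\n", id)
		}(i)
	}

	close(done) // broadcast semantics
	wg.Wait()
}
```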

Overall, the loop changes themselves are a solid improvement over the previous busy-waiting implementation.
Based on learnings

🧹 Nitpick comments (2)
internal/infrastructure/live-store/redis/confirmation_sessions.go (2)

39-47: New fields look appropriate; ctx field currently unused outside Reset/New.

lock, cancel, and pollInterval make sense for coordinating the watcher and guarding sessionCompleteCh. The stored ctx field, however, is not read anywhere in this file; if it is not used by other packages via unsafe reflection/type assertions, you could drop it to reduce state surface area.


142-163: Race condition is confirmed, but callers have defensive timeout handling; refactoring suggestions remain valid but not critical.

The race condition you identified is real: if Reset() is called while watchSessionCompletion() is in the nested select (lines 208–212), the ctx.Done() case can win before the signal send, causing the completion to go unnoticed.

However, examination of the callers shows defensive handling: both the test (line 315–318) and the application (service.go ~2240) use select statements with timeout fallbacks. If the completion signal is missed, the timeout fires and execution continues—making this a timing inefficiency rather than a correctness failure.

The suggested refactors (delaying cancel(), using generation IDs in Redis) are sound improvements that eliminate the race window, but the current implementation is safe in practice due to these defensive timeouts.
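
For reference, a rough sketch of the generation-ID variant mentioned above. The generation field, the extra watcher parameter, and keeping the counter in memory rather than Redis are all assumptions; the Redis-side counter reset is omitted.

```go
// Reset bumps a generation counter so that any watcher started for a previous
// session refuses to signal completion, even if it observed stale Redis state.
func (s *confirmationSessionsStore) Reset() {
	s.lock.Lock()
	defer s.lock.Unlock()
	s.cancel()
	s.generation++
	ctx, cancel := context.WithCancel(context.Background())
	s.ctx, s.cancel = ctx, cancel
	s.sessionCompleteCh = make(chan struct{})
	go s.watchSessionCompletion(ctx, s.generation)
}

// signalCompletion is called by the watcher once it sees the session complete.
func (s *confirmationSessionsStore) signalCompletion(ctx context.Context, gen uint64) {
	s.lock.RLock()
	current, ch := s.generation, s.sessionCompleteCh
	s.lock.RUnlock()
	if gen != current {
		return // a Reset happened since this watcher started; stay silent
	}
	select {
	case ch <- struct{}{}:
	case <-ctx.Done(): // cancelled while waiting for a receiver
	}
}
```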

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00d3664 and abef48d.

📒 Files selected for processing (1)
  • internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.

Applied to files:

  • internal/infrastructure/live-store/redis/confirmation_sessions.go
🧬 Code graph analysis (1)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
internal/core/ports/live_store.go (1)
  • ConfirmationSessionsStore (61-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration tests
  • GitHub Check: unit tests
  • GitHub Check: Build and Scan
🔇 Additional comments (2)
internal/infrastructure/live-store/redis/confirmation_sessions.go (2)

49-61: Constructor wiring and watcher startup look sound.

Creating a store-level cancellable context from context.Background() and starting watchSessionCompletion(ctx) here matches the intended decoupling from request lifetimes, and a 100ms default pollInterval is a reasonable compromise between latency and load. No functional issues spotted in this segment.


171-175: RWMutex around SessionCompleted() accessor is appropriate.

Using RLock/RUnlock to guard reads of sessionCompleteCh ensures that callers won’t observe a torn update while Reset() is replacing the channel under a write lock. The returned type is <-chan struct{}, so external code cannot close or send on it, which keeps ownership clear.

@louisinger louisinger changed the title fix tree-signing-session.go fix Redis LiveStore Nov 22, 2025
@altafan altafan merged commit 099487a into master Nov 24, 2025
5 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Nov 25, 2025