Update live store #836
Conversation
Walkthrough

Propagates context.Context across live-store interfaces and implementations, changes many methods to return errors or (value, error), adds retry + WATCH/TxPipelined loops with retryDelay for Redis stores, threads ctx and explicit roundId through service orchestration, and updates tests to the ctx-aware API.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Service
    participant LiveStore
    participant Redis
    Note over Client,Service: Old flow (no ctx)
    Client->>Service: startRound()
    Service->>LiveStore: Push(intent)
    LiveStore-->>Service: success
    Note over Client,Redis: New flow (ctx-aware + retries)
    Client->>Service: startRound(ctx)
    Service->>LiveStore: Push(ctx, intent)
    alt In-memory store
        LiveStore-->>Service: (nil) or error
    else Redis-backed store
        LiveStore->>Redis: WATCH keys / TxPipelined (with retries & retryDelay)
        alt Success
            Redis-->>LiveStore: OK
            LiveStore-->>Service: nil
        else Retries exhausted / error
            Redis-->>LiveStore: error
            LiveStore-->>Service: error
        end
    end
    Service->>LiveStore: ViewAll(ctx, ids)
    LiveStore-->>Service: ([]TimedIntent, nil) or (nil, error)
```
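The shape of the API change, as a minimal Go sketch (the real interfaces in internal/core/ports/live_store.go carry more methods; the two shown here mirror the diagram and are illustrative only):

```go
// Before: calls could not be cancelled and failures were invisible.
//   Push(intent TimedIntent)
//   ViewAll(ids []string) []TimedIntent
//
// After: every method threads ctx and reports errors, which lets the
// Redis-backed implementations run WATCH/TxPipelined retry loops and
// honor caller cancellation.
type IntentStore interface {
	Push(ctx context.Context, intent TimedIntent) error
	ViewAll(ctx context.Context, ids []string) ([]TimedIntent, error)
	// ... remaining methods follow the same ctx-and-error pattern ...
}
```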
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)
88-93: Potential deadlock: sending on unbuffered channel while holding lock.
`AddNonces` sends on `nonceCollectedCh` (line 91) while holding the write lock. Since the channel is unbuffered, this blocks until a receiver is ready. If the receiver needs to call any method on this store (e.g., `Get`), it will deadlock waiting for the lock. Consider using a buffered channel or sending outside the lock:
```diff
 func (s *treeSigningSessionsStore) AddNonces(
 	_ context.Context, roundId string, pubkey string, nonces tree.TreeNonces,
 ) error {
 	s.lock.Lock()
-	defer s.lock.Unlock()
 	// ... validation ...
 	s.sessions[roundId].Nonces[pubkey] = nonces
+	shouldNotify := len(s.sessions[roundId].Nonces) == s.sessions[roundId].NbCosigners-1
+	ch := s.nonceCollectedCh[roundId]
+	s.lock.Unlock()
+
-	if len(s.sessions[roundId].Nonces) == s.sessions[roundId].NbCosigners-1 {
-		s.nonceCollectedCh[roundId] <- struct{}{}
+	if shouldNotify {
+		ch <- struct{}{}
 	}
 	return nil
 }
```

The same pattern applies to `AddSignatures` at lines 113-117.

internal/infrastructure/live-store/redis/round.go (1)
36-66: Potential stale-read issue in Upsert.

The `Get` call at line 39 is performed outside the `Watch` transaction, and the update function `fn(round)` is computed at line 46 before entering the retry loop. If another process modifies the round between `Get` and `Watch`, the update will be based on stale data. The `Watch` only protects the write, not the read-modify-write cycle.

Consider moving the `Get` and `fn` invocation inside the `Watch` callback to ensure atomicity:

```diff
 func (s *currentRoundStore) Upsert(
 	ctx context.Context, fn func(m *domain.Round) *domain.Round,
 ) error {
-	round, err := s.Get(ctx)
-	if err != nil {
-		return err
-	}
-	if round == nil {
-		round = &domain.Round{}
-	}
-	updated := fn(round)
-	val, err := json.Marshal(updated)
-	if err != nil {
-		return err
-	}
-
 	for range s.numOfRetries {
 		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
+			data, err := tx.Get(ctx, currentRoundKey).Bytes()
+			var round *domain.Round
+			if err != nil && !errors.Is(err, redis.Nil) {
+				return err
+			}
+			if errors.Is(err, redis.Nil) || len(data) == 0 {
+				round = &domain.Round{}
+			} else {
+				// unmarshal round from data...
+			}
+			updated := fn(round)
+			val, err := json.Marshal(updated)
+			if err != nil {
+				return err
+			}
 			_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
 				pipe.Set(ctx, currentRoundKey, val, 0)
 				return nil
 			})
 			return err
-		}); err == nil {
+		}, currentRoundKey); err == nil {
 			return nil
 		}
 		time.Sleep(s.retryDelay)
 	}
```

internal/infrastructure/live-store/redis/intents.go (1)
167-172: Unchecked Redis errors may cause silent failures.

The `SRem` and `SAdd` calls here don't check for errors. If Redis fails mid-operation, intents could be deleted from the KV store but not removed from the ID set, or vtxos could fail to be added to the removal list. Consider checking these errors or wrapping the operations in a transaction for atomicity.

```diff
-	s.rdb.SRem(ctx, intentStoreIdsKey, intent.Id)
+	if err := s.rdb.SRem(ctx, intentStoreIdsKey, intent.Id).Err(); err != nil {
+		return nil, fmt.Errorf("failed to remove intent id %s from set: %v", intent.Id, err)
+	}
 }
 if len(vtxosToRemove) > 0 {
-	s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)
+	if err := s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove...).Err(); err != nil {
+		return nil, fmt.Errorf("failed to add vtxos to remove list: %v", err)
+	}
 }
```
🧹 Nitpick comments (12)
internal/infrastructure/live-store/inmemory/offchain_txs.go (1)
31-36: Silently ignoring PSBT parse errors may cause inconsistent input tracking.

If `psbt.NewFromRawBytes` fails for a malformed checkpoint transaction, the inputs from that transaction won't be added to the `inputs` map. This could lead to inconsistent state where some inputs are tracked and others are not. Consider propagating the error or at least logging the failure:

```diff
 for _, tx := range offchainTx.CheckpointTxs {
-	ptx, _ := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	ptx, err := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	if err != nil {
+		return fmt.Errorf("failed to parse checkpoint tx: %w", err)
+	}
 	for _, in := range ptx.UnsignedTx.TxIn {
 		m.inputs[in.PreviousOutPoint.String()] = struct{}{}
 	}
 }
```

The same pattern exists in `Remove` at line 50.

internal/infrastructure/live-store/redis/offchain_txs.go (2)
49-66: Retry loop ignores context cancellation.

The retry loop sleeps unconditionally without checking whether the context has been cancelled, so the operation keeps retrying even after the caller has given up. Consider checking the context before retrying:

```diff
 for range s.numOfRetries {
 	if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
 		// ... pipeline logic ...
 	}); err == nil {
 		return nil
 	}
+	select {
+	case <-ctx.Done():
+		return fmt.Errorf("context cancelled while adding offchain tx %s: %w", offchainTx.ArkTxid, ctx.Err())
+	default:
+	}
 	time.Sleep(s.retryDelay)
 }
```

The same pattern applies to `Remove` at lines 94-108.
36-47: PSBT parse error silently ignored in `Add`.

Same issue as the in-memory implementation: if `psbt.NewFromRawBytes` fails, the inputs from that checkpoint transaction won't be tracked, which could make the `Includes` check return incorrect results.

```diff
 for _, tx := range offchainTx.CheckpointTxs {
-	ptx, _ := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	ptx, err := psbt.NewFromRawBytes(strings.NewReader(tx), true)
+	if err != nil {
+		return fmt.Errorf("failed to parse checkpoint tx: %w", err)
+	}
 	for _, in := range ptx.UnsignedTx.TxIn {
 		inputs = append(inputs, in.PreviousOutPoint.String())
 	}
 }
```

internal/infrastructure/live-store/redis/boarding_inputs.go (2)
97-120: Variable shadowing: `key` reused in loop.

The loop variable `key` on line 106 shadows the outer `key` variable from line 100. Consider renaming it to `field` for clarity.

```diff
-	for key, value := range values {
+	for field, value := range values {
 		rawSig := &sigsDTO{}
 		sig, err := rawSig.deserialize([]byte(value))
 		if err != nil {
-			return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", key, err)
+			return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", field, err)
 		}
-		inIndex, err := strconv.Atoi(key)
+		inIndex, err := strconv.Atoi(field)
```
122-125: Consider adding retry logic for consistency.
`DeleteSignatures` doesn't use retry logic, unlike `Set` and `AddSignatures`. While `Del` is simpler, adding retry logic would make the error handling consistent across write operations; a minimal sketch follows.
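A sketch of what that could look like, assuming the `numOfRetries`/`retryDelay` fields this PR adds to the other Redis stores; `boardingInputsSigsKey` is a hypothetical helper standing in for however the store derives the key for a commitment txid:

```go
// DeleteSignatures removes the collected signatures for a commitment tx,
// retrying transient Redis failures like the other write methods do.
func (b *boardingInputsStore) DeleteSignatures(ctx context.Context, commitmentTxid string) error {
	var err error
	for range b.numOfRetries {
		// boardingInputsSigsKey is a hypothetical key helper.
		if err = b.rdb.Del(ctx, boardingInputsSigsKey(commitmentTxid)).Err(); err == nil {
			return nil
		}
		time.Sleep(b.retryDelay)
	}
	return fmt.Errorf("failed to delete signatures for %s: %w", commitmentTxid, err)
}
```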
internal/infrastructure/live-store/redis/round.go (1)

163-164: Unknown event types are silently skipped.

When an unknown `eventType` is encountered, the code silently continues without logging or returning an error. While this may be intentional for forward compatibility, it could mask data corruption or version mismatches. Consider adding a warning log to aid debugging.

```diff
 default:
+	log.Warnf("unknown event type %d in round storage, skipping", eventType)
 	continue
 }
```

internal/infrastructure/live-store/redis/tree_signing_session.go (1)
278-279: Silently ignoring Redis errors in watch goroutine.

Errors from `HGetAll` are discarded. While transient errors should allow the loop to continue, persistent failures will cause silent infinite polling. Consider logging the error.

```diff
-	noncesMap, _ := s.rdb.HGetAll(ctx, noncesKey).Result()
+	noncesMap, err := s.rdb.HGetAll(ctx, noncesKey).Result()
+	if err != nil {
+		log.Warnf("watchNoncesCollected: failed to get nonces: %v", err)
+		continue
+	}
```

The same pattern applies at line 329 for `sigsMap`.

internal/core/application/service.go (2)
2275-2289: Inconsistent error handling on cache cleanup failures.

The cleanup operations in `startRound` log warnings/errors but continue execution. While this is reasonable for cleanup operations, the error messages reference `existingRound.Id`, which could be empty or nil if this is the first round.

```diff
 if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
-	log.WithError(err).Warnf(
-		"failed to delete forfeit txs from cache for round %s", existingRound.Id,
-	)
+	if existingRound != nil {
+		log.WithError(err).Warnf(
+			"failed to delete forfeit txs from cache for round %s", existingRound.Id,
+		)
+	} else {
+		log.WithError(err).Warn("failed to reset forfeit txs from cache")
+	}
 }
```

Similar patterns apply to the subsequent cleanup calls.
3183-3188: Address the TODO comment.

There's a TODO about testing broadcast before updating storage. This ordering could lead to inconsistent state if the broadcast succeeds but subsequent storage updates fail. Consider implementing the suggested pattern or creating a tracking issue.
Do you want me to open a new issue to track this improvement?
internal/infrastructure/live-store/redis/intents.go (2)
271-292: Inconsistent error handling in the Delete method.

The method returns errors from `Get` but ignores errors from `SRem` (lines 282, 289) and only logs errors from `Delete` (line 286). This inconsistency could leave orphaned entries in the vtxos set if Redis operations fail. If deletion is meant to be idempotent, consider at least logging the `SRem` failures.

```diff
 for _, vtxo := range intent.Inputs {
-	s.rdb.SRem(ctx, intentStoreVtxosKey, vtxo.Outpoint.String())
+	if err := s.rdb.SRem(ctx, intentStoreVtxosKey, vtxo.Outpoint.String()).Err(); err != nil {
+		log.Warnf("delete: failed to remove vtxo %s from set: %v", vtxo.Outpoint.String(), err)
+	}
 }
 if err := s.intents.Delete(ctx, id); err != nil {
 	log.Warnf("delete:failed to delete intent %s: %v", id, err)
 }
-s.rdb.SRem(ctx, intentStoreIdsKey, id)
+if err := s.rdb.SRem(ctx, intentStoreIdsKey, id).Err(); err != nil {
+	log.Warnf("delete: failed to remove intent id %s from set: %v", id, err)
+}
```
347-357: Consider returning an error from IncludesAny.

The method logs errors but cannot propagate them to callers due to the interface signature, so callers cannot distinguish between "not found" and a Redis error. If this distinction matters, consider updating the interface to return an error; a sketch follows.
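A possible signature change, assuming the method currently returns `(bool, string)` as described in the interface notes later in this review; the remaining methods are elided:

```go
// IncludesAny reports whether any of the given outpoints is already
// registered and, if so, which one. The added error return lets callers
// tell "not found" apart from a Redis failure. Sketch only: the real
// interface lives in internal/core/ports/live_store.go.
type IntentStore interface {
	IncludesAny(ctx context.Context, outpoints []domain.Outpoint) (bool, string, error)
	// ... remaining methods unchanged ...
}
```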
internal/infrastructure/live-store/live_store_test.go (1)
433-550: Minor style inconsistency in context usage.

The test uses `ctx := t.Context()` for early operations (lines 434-448) but then calls `t.Context()` directly for later operations (lines 515+). This works correctly but is slightly inconsistent; consider using the `ctx` variable throughout.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- internal/core/application/admin.go (2 hunks)
- internal/core/application/ban.go (1 hunks)
- internal/core/application/service.go (59 hunks)
- internal/core/ports/live_store.go (2 hunks)
- internal/infrastructure/live-store/inmemory/boarding_inputs.go (1 hunks)
- internal/infrastructure/live-store/inmemory/confirmation_session.go (4 hunks)
- internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
- internal/infrastructure/live-store/inmemory/intents.go (9 hunks)
- internal/infrastructure/live-store/inmemory/offchain_txs.go (3 hunks)
- internal/infrastructure/live-store/inmemory/round.go (1 hunks)
- internal/infrastructure/live-store/inmemory/tree_signing_session.go (3 hunks)
- internal/infrastructure/live-store/live_store_test.go (8 hunks)
- internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
- internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
- internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
- internal/infrastructure/live-store/redis/intents.go (10 hunks)
- internal/infrastructure/live-store/redis/offchain_txs.go (2 hunks)
- internal/infrastructure/live-store/redis/round.go (6 hunks)
- internal/infrastructure/live-store/redis/store.go (1 hunks)
- internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
- internal/infrastructure/live-store/redis/confirmation_sessions.go
- internal/infrastructure/live-store/inmemory/confirmation_session.go
- internal/infrastructure/live-store/inmemory/forfeits.go
- internal/core/application/service.go
- internal/infrastructure/live-store/live_store_test.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.
Applied to files:
internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.
Applied to files:
internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.
Applied to files:
internal/core/application/service.go
🧬 Code graph analysis (14)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (3)
- internal/core/ports/tx_builder.go (1): SignedBoardingInput (54-57)
- internal/infrastructure/live-store/redis/boarding_inputs.go (1): NewBoardingInputsStore (23-29)
- internal/core/ports/live_store.go (1): BoardingInputsStore (82-90)

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
- internal/core/ports/live_store.go (1): ConfirmationSessions (112-116)

internal/infrastructure/live-store/redis/boarding_inputs.go (3)
- internal/infrastructure/live-store/inmemory/boarding_inputs.go (1): NewBoardingInputsStore (17-22)
- internal/core/ports/live_store.go (1): BoardingInputsStore (82-90)
- internal/core/ports/tx_builder.go (1): SignedBoardingInput (54-57)

internal/infrastructure/live-store/redis/offchain_txs.go (2)
- internal/infrastructure/live-store/inmemory/offchain_txs.go (1): NewOffChainTxStore (19-24)
- internal/core/ports/live_store.go (1): OffChainTxStore (49-54)

internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
- internal/core/ports/live_store.go (1): ConfirmationSessions (112-116)

internal/infrastructure/live-store/redis/store.go (4)
- internal/infrastructure/live-store/redis/forfeits.go (1): NewForfeitTxsStore (29-38)
- internal/infrastructure/live-store/redis/offchain_txs.go (1): NewOffChainTxStore (28-34)
- internal/infrastructure/live-store/redis/round.go (1): NewCurrentRoundStore (28-34)
- internal/infrastructure/live-store/redis/tree_signing_session.go (1): NewTreeSigningSessionsStore (38-50)

internal/infrastructure/live-store/redis/tree_signing_session.go (3)
- internal/infrastructure/live-store/inmemory/tree_signing_session.go (1): NewTreeSigningSessionsStore (19-26)
- internal/core/ports/live_store.go (2): TreeSigningSessionsStore (70-80), MusigSigningSession (104-110)
- pkg/ark-lib/tree/musig2.go (2): TreeNonces (32-32), TreePartialSigs (98-98)

internal/infrastructure/live-store/inmemory/forfeits.go (5)
- pkg/ark-lib/tree/tx_tree.go (1): FlatTxTree (36-36)
- internal/infrastructure/db/postgres/sqlc/queries/models.go (2): Intent (33-38), Tx (178-185)
- internal/infrastructure/db/sqlite/sqlc/queries/models.go (2): Intent (31-36), Tx (165-172)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/core/ports/tx_builder.go (1): ValidForfeitTx (49-52)

internal/infrastructure/live-store/inmemory/tree_signing_session.go (2)
- internal/core/ports/live_store.go (1): MusigSigningSession (104-110)
- pkg/ark-lib/tree/musig2.go (2): TreeNonces (32-32), TreePartialSigs (98-98)

internal/infrastructure/live-store/redis/round.go (3)
- internal/infrastructure/live-store/inmemory/round.go (1): NewCurrentRoundStore (16-18)
- internal/core/ports/live_store.go (1): CurrentRoundStore (56-59)
- internal/core/domain/round_event.go (3): RoundStarted (17-20), RoundFinalizationStarted (22-30), IntentsRegistered (45-48)

internal/infrastructure/live-store/inmemory/intents.go (2)
- internal/core/ports/tx_builder.go (1): BoardingInput (44-47)
- internal/core/ports/live_store.go (1): TimedIntent (92-97)

internal/infrastructure/live-store/redis/intents.go (3)
- internal/infrastructure/live-store/inmemory/intents.go (1): NewIntentStore (22-33)
- internal/core/ports/live_store.go (2): IntentStore (22-36), TimedIntent (92-97)
- internal/infrastructure/live-store/redis/kv.go (1): NewRedisKVStore (18-20)

internal/core/application/service.go (3)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/core/domain/round.go (2): Round (41-59), ForfeitTx (36-39)
- internal/core/domain/round_event.go (2): RoundFailed (39-43), RoundEvent (9-12)

internal/infrastructure/live-store/live_store_test.go (6)
- internal/infrastructure/db/postgres/sqlc/queries/models.go (1): Intent (33-38)
- internal/infrastructure/db/sqlite/sqlc/queries/models.go (1): Intent (31-36)
- pkg/errors/errors.go (1): Error (39-46)
- internal/core/application/types.go (2): Outpoint (177-177), VOut (131-131)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/infrastructure/wallet/wallet_client.go (1): New (31-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration tests
- GitHub Check: Build and Scan
- GitHub Check: unit tests
🔇 Additional comments (47)
internal/infrastructure/live-store/inmemory/offchain_txs.go (1)
59-68: LGTM!

The `Get` method correctly returns `(nil, nil)` for not-found cases, aligning with the interface contract. Thread-safety is properly maintained with the read lock.

internal/infrastructure/live-store/inmemory/forfeits.go (1)
29-87: LGTM!

The `Init` method properly validates inputs, handles edge cases (no connectors, vtxos > connectors), and maintains thread-safety with the mutex. The context-aware signature aligns with the interface contract.

internal/core/application/ban.go (1)
150-161: Consider returning early if `GetUnsignedInputs` fails.

When `GetUnsignedInputs` returns an error, the code logs a warning but continues with a potentially nil/empty `unsignedVtxoKeys`. While this graceful degradation prevents crashes, it means the ban logic silently fails to identify users who should be banned.

If banning is a critical security measure, consider returning early on error:

```diff
 unsignedVtxoKeys, err := s.cache.ForfeitTxs().GetUnsignedInputs(ctx)
 if err != nil {
 	log.WithError(err).Warn("failed to get unsigned inputs from cache")
+	return
 }
```

Alternatively, if the current behavior is intentional (graceful degradation), add a comment explaining the rationale.
internal/core/application/admin.go (1)
314-375: LGTM!

The context propagation to `ViewAll`, `DeleteAll`, and `Delete` calls correctly aligns with the updated live-store interface signatures. Error handling remains intact.

internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)
54-69: LGTM!

The `Delete` method correctly closes channels before removing entries from maps, preventing goroutine leaks. The cleanup order is appropriate.

internal/infrastructure/live-store/redis/offchain_txs.go (1)
114-137: LGTM!

The `Get` and `Includes` methods properly handle `redis.Nil` for not-found cases and return appropriate error messages for other failures. The return semantics align with the interface contract.

internal/infrastructure/live-store/inmemory/round.go (2)
20-27: LGTM!

Context propagation correctly added to `Upsert`. The unused context parameter is appropriately ignored for the in-memory implementation.

28-32: LGTM!

`Get` now returns an error as the second value, consistent with the interface changes.

internal/infrastructure/live-store/redis/store.go (2)
21-22: LGTM!

Constructor calls correctly updated to pass `numOfRetries` to `NewForfeitTxsStore`, `NewOffChainTxStore`, and `NewTreeSigningSessionsStore`, enabling consistent retry behavior across all Redis-backed stores.

Also applies to: 25-25
internal/infrastructure/live-store/inmemory/confirmation_session.go (5)
28-41: LGTM!

`Init` correctly updated with context parameter and error return value.

43-67: LGTM!

`Confirm` properly accepts context. The existing logic for confirmation tracking and session completion signaling remains intact.

69-78: LGTM!

`Get` now returns the tuple `(*ports.ConfirmationSessions, error)` consistent with the interface changes.

80-89: LGTM!

`Reset` correctly updated with context and error return.

91-95: LGTM!

`Initialized` now accepts a context parameter.

internal/infrastructure/live-store/redis/boarding_inputs.go (5)
17-29: LGTM!

Constructor follows the established pattern for Redis-backed stores with retry configuration.

31-48: LGTM!

`Set` correctly implements retry logic with the Watch/TxPipelined pattern.

50-56: Consider handling `redis.Nil` for uninitialized state.

When the key doesn't exist, this returns `-1` with the `redis.Nil` error. Callers need to distinguish between "not set" and actual Redis errors. Consider returning `0, nil` for missing keys if that represents a valid uninitialized state, or document the expected error handling.

145-164: LGTM!

`newSigsDTO` correctly converts domain types to the DTO with hex encoding for cryptographic fields.

170-212: LGTM!

Deserialization properly reconstructs PSBT structures with appropriate type conversions for `SigHashType` and `TapscriptLeafVersion`.

internal/infrastructure/live-store/redis/confirmation_sessions.go (6)
46-47: LGTM!

Added `retryDelay` field for configurable retry timing, consistent with other Redis stores.

Also applies to: 59-59

65-92: LGTM!

`Init` correctly implements retry logic with proper transaction handling. The atomic delete-then-set pattern ensures clean initialization.

131-155: LGTM!

`Get` properly retrieves all confirmation session state and constructs the response struct.

157-191: LGTM!

`Reset` correctly handles cleanup with retry logic and properly restarts the watch goroutine with a new context. Based on learnings, using `context.Background()` for the watch goroutine is the correct pattern to decouple it from the request lifetime.

193-196: LGTM!

`Initialized` now accepts context for consistency with the interface.

204-246: LGTM!

The `watchSessionCompletion` implementation is well-designed, with proper panic recovery and context cancellation handling.

internal/infrastructure/live-store/redis/tree_signing_session.go (1)
82-88: LGTM!

Using `context.Background()` for the watch goroutines is correct. These goroutines need to outlive the request context that created the session, so decoupling them from the request lifetime is the right design. Based on learnings from PR 686.

internal/infrastructure/live-store/inmemory/intents.go (1)
35-46: LGTM!

The context-aware signatures with error returns align well with the Redis-backed store interface. Ignoring the context parameter is appropriate for in-memory implementations.
internal/infrastructure/live-store/redis/forfeits.go (1)
195-224: LGTM!

The `Pop` method correctly reads all data before calling `Reset`, and properly propagates the error if reset fails. The caller can retry if needed.

internal/core/application/service.go (1)
2598-2614: LGTM!

Good error recovery pattern. When the round cannot be retrieved from cache, the code creates a `RoundFailed` event with proper metadata, saves it, and starts a new round. This ensures the system can recover from cache failures gracefully.

internal/infrastructure/live-store/redis/intents.go (7)
22-36: LGTM on struct and constructor changes.

The addition of `retryDelay` as a configurable field with a sensible default of 10ms is appropriate for retry backoff logic.

38-56: LGTM on Len method.

Proper context propagation and error handling. Returning `-1` on error is a reasonable sentinel value.

58-123: LGTM on Push method with retry logic.

The use of `s.rdb.SIsMember` (not `tx.SIsMember`) inside the WATCH callback is intentional and correct: reads happen against the client while the WATCH mechanism detects conflicts, and writes are pipelined atomically.

187-193: LGTM on GetSelectedIntents.

Clean implementation with proper error propagation.

228-269: LGTM on Update method.

Proper validation and error handling. The input/output sum validation ensures data integrity.

294-321: LGTM on DeleteAll with retry logic.

The retry mechanism for the transactional set deletions is well-implemented. The individual KV deletions being outside the transaction is acceptable since they're keyed by ID and orphaned entries would be harmless.

323-345: LGTM on DeleteVtxos.

Well-structured retry logic with proper transactional cleanup.
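To make the read-vs-write split in the Push comment concrete, here is a minimal sketch of the pattern, assuming go-redis v9 (key names and the error string are illustrative, not the store's actual code):

```go
err := s.rdb.Watch(ctx, func(tx *redis.Tx) error {
	// Read with the watching client; if another client writes the
	// watched key after this point, the EXEC below fails and the
	// caller's retry loop runs again.
	exists, err := s.rdb.SIsMember(ctx, intentStoreIdsKey, intent.Id).Result()
	if err != nil {
		return err
	}
	if exists {
		return fmt.Errorf("duplicated intent %s", intent.Id)
	}
	// Writes are queued and executed atomically on EXEC.
	_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
		pipe.SAdd(ctx, intentStoreIdsKey, intent.Id)
		return nil
	})
	return err
}, intentStoreIdsKey)
```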
internal/core/ports/live_store.go (4)
22-36: LGTM on IntentStore interface updates.

The context and error additions are consistent and well-designed. The `IncludesAny` method returning `(bool, string)` without an error is a design choice that prioritizes simplicity for callers, though it means Redis errors are logged but not propagated.

38-54: LGTM on ForfeitTxsStore and OffChainTxStore interfaces.

All methods properly accept context and return errors where appropriate.

56-90: LGTM on remaining store interfaces.

The `Initialized` method returning just `bool` without an error is acceptable for a simple state check. Context propagation is consistent across all methods.

92-116: LGTM on data structures.

The `TimedIntent`, `MusigSigningSession`, and `ConfirmationSessions` structs are unchanged and appropriate for the updated interfaces.

internal/infrastructure/live-store/live_store_test.go (7)
76-98: LGTM on test setup.

Good pattern of testing both in-memory and Redis implementations with the same test suite, ensuring consistent behavior across implementations.

100-205: LGTM on IntentStore tests.

Comprehensive coverage of all IntentStore methods with proper context propagation and error assertions.

207-252: LGTM on ForfeitTxsStore tests.

Good coverage of the forfeit transaction lifecycle with proper context and error handling.

254-294: LGTM on OffChainTxStore tests.

Good coverage including edge cases like getting non-existent entries and removing non-existent entries.

296-308: LGTM on CurrentRoundStore tests.

Properly tests the basic round upsert and retrieval flow.

310-360: LGTM on ConfirmationSessionsStore tests.

Good async testing pattern with proper timeout handling and context propagation through the confirmation flow.

362-431: LGTM on TreeSigningSessionsStore tests.

Comprehensive testing of the signing session lifecycle. The use of `context.Background()` in the inner goroutine (line 388) is acceptable; it ensures the async operations complete independently of the test context's lifecycle.
Actionable comments posted: 1
♻️ Duplicate comments (2)
internal/core/application/service.go (2)
3344-3362: Error fetching selected intents still prevents RoundFailed propagation.

On `domain.RoundFailed` you still abort event propagation when reading selected intents fails:

```go
case domain.RoundFailed:
	intents, err := s.cache.Intents().GetSelectedIntents(ctx)
	if err != nil {
		log.WithError(err).Warn("failed to get selected intents")
		return
	}
	...
	lastEvent = RoundFailed{ev, topics}
```

If the cache is temporarily unavailable, clients never see the RoundFailed event, which is more harmful than omitting topic details. You can keep logging the error but fall back to an empty topic list:

```diff
-	intents, err := s.cache.Intents().GetSelectedIntents(ctx)
-	if err != nil {
-		log.WithError(err).Warn("failed to get selected intents")
-		return
-	}
-
-	topics := make([]string, 0, len(intents))
+	intents, err := s.cache.Intents().GetSelectedIntents(ctx)
+	if err != nil {
+		log.WithError(err).Warn("failed to get selected intents, propagating RoundFailed without topics")
+	}
+
+	topics := make([]string, 0, len(intents))
 	for _, intent := range intents {
 	...
```

This preserves the critical failure signal while degrading gracefully on cache issues.
3330-3334: Connector index cache error still aborts RoundFinalizationStarted event propagation.

On RoundFinalizationStarted you still return early if `GetConnectorsIndexes` fails:

```go
connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
if err != nil {
	log.WithError(err).Warn("failed to get connectors index")
	return
}
```

This prevents connector tree events (and thus parts of round finalization info) from reaching clients, even though the main round events are available. Consider logging and proceeding with an empty index to keep event propagation best-effort:

```diff
-	connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
-	if err != nil {
-		log.WithError(err).Warn("failed to get connectors index")
-		return
-	}
+	connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
+	if err != nil {
+		log.WithError(err).Warn("failed to get connectors index, propagating without connector topics")
+		connectorsIndex = make(map[string]domain.Outpoint)
+	}
```

This matches the earlier review suggestion and avoids silently dropping RoundFinalizationStarted propagation when Redis is flaky.
🧹 Nitpick comments (2)
internal/core/application/service.go (2)
2271-2292: Guard logging on `existingRound` to avoid relying on a non-nil contract.

Here you read `existingRound.Id` unconditionally for logging:

```go
existingRound, err := s.cache.CurrentRound().Get(ctx)
...
if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
	log.WithError(err).Warnf(
		"failed to delete forfeit txs from cache for round %s", existingRound.Id,
	)
}
...
if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
	log.WithError(err).Warnf(
		"failed to delete spent vtxos from cache after round %s", existingRound.Id,
	)
}
...
if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
	log.WithError(err).Errorf(
		"failed to reset confirmation session from cache for round %s", existingRound.Id,
	)
}
```

This assumes `CurrentRound().Get` will never return `(*domain.Round)(nil)` when `err == nil`. If that contract is ever broken (e.g., a "not found" mapped to `(nil, nil)`), these logs will panic even though you're just trying to reset caches.

Safer pattern:

```diff
 existingRound, err := s.cache.CurrentRound().Get(ctx)
 ...
+roundID := "<none>"
+if existingRound != nil {
+	roundID = existingRound.Id
+}
+
 if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
 	log.WithError(err).Warnf(
-		"failed to delete forfeit txs from cache for round %s", existingRound.Id,
+		"failed to delete forfeit txs from cache for round %s", roundID,
 	)
 }
 ...
 if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
 	log.WithError(err).Warnf(
-		"failed to delete spent vtxos from cache after round %s", existingRound.Id,
+		"failed to delete spent vtxos from cache after round %s", roundID,
 	)
 }
 if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
 	log.WithError(err).Errorf(
-		"failed to reset confirmation session from cache for round %s", existingRound.Id,
+		"failed to reset confirmation session from cache for round %s", roundID,
 	)
 }
```

This keeps behavior the same when a round exists, but protects against future changes in the cache implementation.
3454-3479: Consider using the service context instead of `context.Background()` in the forfeits/boarding watcher.

`checkForfeitsAndBoardingSigsSent` currently hard-codes `context.Background()`:

```go
func (s *service) checkForfeitsAndBoardingSigsSent(commitmentTxid string) {
	ctx := context.Background()
	...
	sigs, err := s.cache.BoardingInputs().GetSignatures(ctx, commitmentTxid)
	...
	numOfBoardingInputs, err := s.cache.BoardingInputs().Get(ctx)
	...
	allForfeitTxsSigned, err := s.cache.ForfeitTxs().AllSigned(ctx)
```

Since this function is tied to a specific round and is invoked from `SubmitForfeitTxs`/`SignCommitmentTx`, it arguably should respect the service lifecycle. Using `s.ctx` (or receiving a ctx parameter) would allow `Stop()` to cancel any long-running Redis calls rather than leaving them uncancellable.

Not a correctness bug, but you may want to align with the rest of the round pipeline, which already gates on `s.ctx.Done()`.

Based on learnings, this is about service-lifetime cancellation, not request-lifetime, so it shouldn't conflict with the intentional `context.Background()` usage in `reactToFraud`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/core/application/service.go (59 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.
Applied to files:
internal/core/application/service.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and Scan
- GitHub Check: integration tests
- GitHub Check: unit tests
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
internal/core/application/service.go (1)
2552-2587: Inconsistent error handling pattern may cause unexpected behavior.

Several error paths in `startConfirmation` call `round.Fail()` but don't set `roundAborted = true`, while the defer uses `roundAborted` to decide whether to proceed to finalization or restart. Lines 2553-2556, 2560, 2566, and 2585 call `round.Fail()` without setting `roundAborted = true`. This means:

- The defer will save the failed round events (correct)
- Then check `round.IsFailed()` and call `startRound()` (lines 2399-2402)
- But it will also potentially call `startFinalization()` if the failed check doesn't catch it

Compare this to other error paths (lines 2409-2412, 2426-2429, 2434-2437), which set `roundAborted = true` and return, causing the defer to call `startRound()` immediately without checking `round.IsFailed()`.

The inconsistency creates two different error-handling flows. Consider standardizing: either always set `roundAborted = true` when calling `round.Fail()`, or remove the `roundAborted` variable and rely solely on `round.IsFailed()`.

internal/infrastructure/live-store/redis/tree_signing_session.go (1)
52-91: `Watch` is called without specifying keys to watch.

The `New` method (and similarly `Delete`, `AddNonces`, `AddSignatures`) uses `Watch` without passing any keys. This pattern appears throughout the Redis stores in this PR. Without watched keys, the WATCH/MULTI/EXEC pattern doesn't provide the intended optimistic locking; it will always succeed on the first attempt (assuming no errors), making the retry loop effectively unused for concurrency control.

```diff
-	if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
+	if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
 		_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
 			pipe.HSet(ctx, metaKey, meta)
 			return nil
 		})
 		return err
-	}); err == nil {
+	}, metaKey); err == nil {
```

internal/infrastructure/live-store/redis/forfeits.go (1)
48-88: Potential out-of-bounds when building `connIndex`.

In `Init`, this loop can index past the end of `vtxosToSign`:

```go
if len(vtxosToSign) > len(connectorsOutpoints) {
	return fmt.Errorf("more vtxos to sign than outpoints, %d > %d", ...)
}
for i, connectorOutpoint := range connectorsOutpoints {
	connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
}
```

If `len(connectorsOutpoints) > len(vtxosToSign)`, the check passes but the loop still runs over all connectors and will panic once `i >= len(vtxosToSign)`. Limit the loop to the number of VTXOs (or loop over `vtxosToSign` and index `connectorsOutpoints`) to avoid panics:

```diff
-	for i, connectorOutpoint := range connectorsOutpoints {
-		connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
-	}
+	for i, vtxo := range vtxosToSign {
+		connectorOutpoint := connectorsOutpoints[i]
+		connIndex[connectorOutpoint.String()] = vtxo.Outpoint
+	}
```

This keeps the existing invariant check and ensures a safe one-to-one mapping.
internal/infrastructure/live-store/redis/intents.go (1)
125-185: Fix nil dereference and SAdd semantics in Pop (lines 138-145, 171).

Two critical issues confirmed in the `Pop` function:

1. Nil dereference at line 143: `s.intents.Get()` returns `(nil, nil)` when the entry is not found (via `redis.Nil`). The current error check only guards against non-nil errors, leaving a gap where `intent` can be nil when accessed at line 143:

```go
intent, err := s.intents.Get(ctx, id)
if err != nil {
	return nil, fmt.Errorf("failed to get intent %s: %v", id, err)
}
if len(intent.Receivers) > 0 { // panic if intent == nil
```

Add a nil check: `if intent != nil && len(intent.Receivers) > 0 {`

2. SAdd passes slice as a single member at line 171: calling `s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)` without `...` expansion passes the entire `[]string` as one `interface{}` argument instead of expanding each element. Convert to `[]interface{}` and expand:

```go
if len(vtxosToRemove) > 0 {
	members := make([]interface{}, len(vtxosToRemove))
	for i, v := range vtxosToRemove {
		members[i] = v
	}
	s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, members...)
}
```
♻️ Duplicate comments (5)
internal/core/application/service.go (2)
3334-3338: [Duplicate] Error in connector index fetch silently aborts event propagation.

If `GetConnectorsIndexes` fails, the function returns early without propagating any events. This leaves clients without notification of the round finalization. As noted in the previous review, consider logging the error and continuing with an empty connector topics map, or propagating a partial event:

```diff
 connectorsIndex, err := s.cache.ForfeitTxs().GetConnectorsIndexes(ctx)
 if err != nil {
 	log.WithError(err).Warn("failed to get connectors index")
-	return
+	connectorsIndex = make(map[string]domain.Outpoint)
 }
```
3348-3352: [Duplicate] Error fetching selected intents prevents RoundFailed propagation.

If `GetSelectedIntents` fails, the `RoundFailed` event is not propagated to clients. This is critical because clients need to know the round failed in order to handle their pending intents appropriately. As noted in the previous review, consider propagating the failure event with empty topics rather than silently returning:

```diff
 case domain.RoundFailed:
 	intents, err := s.cache.Intents().GetSelectedIntents(ctx)
 	if err != nil {
 		log.WithError(err).Warn("failed to get selected intents")
-		return
+		intents = []ports.TimedIntent{}
 	}
```
38-54: `batchId` parameter is still ignored across all signature methods.

The `batchId` parameter is ignored in `AddSignatures`, `GetSignatures`, and `DeleteSignatures`. This was previously flagged: signatures from different batches will be mixed together, which may cause incorrect behavior if multiple batches are processed.

Also applies to: 64-68
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
94-130: Race condition: `numConfirmed` is still read outside the transaction.

The `numConfirmed` value is read at line 111 but used inside the Watch callback at line 119. If another process confirms an intent between the read and the transaction, the count will be incorrect. The `Watch` call at line 116 should also watch `confirmationNumConfirmedKey`, and the read should happen inside the callback.
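A sketch of the usual go-redis pattern, assuming go-redis v9 and the key/field names from this review (`confirmationNumConfirmedKey`, `confirmationIntentsKey`, `numOfRetries`, `retryDelay`); the surrounding logic is illustrative only:

```go
// Confirm re-reads the confirmed counter inside the WATCH callback so the
// optimistic lock covers the full read-modify-write cycle.
func (s *confirmationSessionsStore) Confirm(ctx context.Context, intentId string) error {
	var err error
	for range s.numOfRetries {
		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
			// Read inside the transaction: if another client changes the
			// counter before EXEC, the transaction aborts and we retry.
			numConfirmed, err := tx.Get(ctx, confirmationNumConfirmedKey).Int()
			if err != nil && !errors.Is(err, redis.Nil) {
				return err
			}
			_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
				pipe.HSet(ctx, confirmationIntentsKey, intentId, true)
				pipe.Set(ctx, confirmationNumConfirmedKey, numConfirmed+1, 0)
				return nil
			})
			return err
		}, confirmationNumConfirmedKey); err == nil { // key is now watched
			return nil
		}
		time.Sleep(s.retryDelay)
	}
	return err
}
```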
internal/infrastructure/live-store/redis/forfeits.go (1)

176-193: Reset retry delay issue from earlier review is resolved.

`Reset` now uses the same `for range s.numOfRetries` + `Watch` + `TxPipelined` pattern as the other methods and includes `time.Sleep(s.retryDelay)` between attempts, preventing tight-loop retries.
🧹 Nitpick comments (10)
internal/infrastructure/live-store/redis/round.go (2)
52-64: Minor inefficiency: unnecessary sleep after the final retry.

The retry loop sleeps after every attempt, including the final failed one. The last sleep is wasteful since the function returns immediately afterward. Consider tracking the iteration to skip the sleep after the last attempt:

```diff
-	for range s.numOfRetries {
+	for i := range s.numOfRetries {
 		if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
 			_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
 				pipe.Set(ctx, currentRoundKey, val, 0)
 				return nil
 			})
 			return err
 		}); err == nil {
 			return nil
 		}
-		time.Sleep(s.retryDelay)
+		if i < s.numOfRetries-1 {
+			time.Sleep(s.retryDelay)
+		}
 	}
```
163-165: Consider whether silently skipping unknown event types is intended.

The `default` case at line 164 uses `continue` to silently skip unknown event types. This provides forward compatibility (older code can ignore new event types), but it could also hide bugs if an expected event type is missing due to a typo or version mismatch. If this silent skip is intentional for backward compatibility, consider adding a log statement or comment documenting the design decision. If not, consider returning an error for truly unknown types.
internal/core/application/service.go (1)
3213-3218: Verify VtxoTree is not nil before accessing.

After `round.EndFinalization()` completes, the code accesses `round.VtxoTree.Leaves()` and `len(round.VtxoTree)` without checking if `VtxoTree` is nil. While `EndFinalization` likely populates this field, if the round had no vtxo tree (e.g., only boarding inputs), this could panic. Consider adding a nil check for defensive programming:

```diff
+	totalOutputVtxos := 0
+	numOfTreeNodes := 0
+	if round.VtxoTree != nil {
+		totalOutputVtxos = len(round.VtxoTree.Leaves())
+		numOfTreeNodes = len(round.VtxoTree)
+	}
-	totalOutputVtxos := len(round.VtxoTree.Leaves())
-	numOfTreeNodes := len(round.VtxoTree)
```

internal/infrastructure/live-store/redis/boarding_inputs.go (2)
50-56: Consider handling the `redis.Nil` case explicitly in `Get`.

When the key doesn't exist, `Get` returns `-1` with a `redis.Nil` error. Callers may need to distinguish between "not initialized" (key doesn't exist) and actual errors. The in-memory version returns `0` by default when not set.

```diff
 func (b *boardingInputsStore) Get(ctx context.Context) (int, error) {
 	num, err := b.rdb.Get(ctx, boardingInputsKey).Int()
 	if err != nil {
+		if errors.Is(err, redis.Nil) {
+			return 0, nil
+		}
 		return -1, err
 	}
 	return num, nil
 }
```

You'll need to add `"errors"` to the imports.
122-125: Consider adding retry logic for consistency.
`DeleteSignatures` doesn't use retry logic, unlike the other write operations (`Set`, `AddSignatures`). While delete operations are generally idempotent, this creates inconsistency: if transient Redis errors are a concern for the other operations, they should be for this one too (see the retry sketch attached to the same comment in the first review).

internal/infrastructure/live-store/inmemory/forfeits.go (1)
126-132: Error from `Reset` is silently ignored.

The `// nolint` comment indicates that the error return from `Reset(ctx)` is intentionally ignored. While `Reset` currently always returns `nil` in the in-memory implementation, this pattern could mask issues if the implementation changes or if this pattern is copied to other stores. Consider logging or handling the error:

```diff
 defer func() {
 	m.lock.Unlock()
-	// nolint
-	m.Reset(ctx)
+	if err := m.Reset(ctx); err != nil {
+		// Log but don't fail - best effort cleanup
+		log.Warnf("failed to reset forfeit store after pop: %v", err)
+	}
 }()
```
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)

132-156: `Get` reads multiple keys non-atomically.

The `Get` method reads `confirmationIntentsKey`, `confirmationNumIntentsKey`, and `confirmationNumConfirmedKey` in separate Redis calls. If another process modifies the state between reads, the returned `ConfirmationSessions` could be inconsistent (e.g., `NumConfirmedIntents` doesn't match the count of confirmed entries in `IntentsHashes`). Consider using a Lua script or a transaction for atomic reads if consistency is important.
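One way to get a consistent snapshot without Lua, assuming go-redis v9: issue all three reads in a single MULTI/EXEC pipeline, since EXEC runs the queued commands atomically. Key names come from this review; the return shape is illustrative:

```go
// getAtomic reads all three confirmation-session keys in one EXEC so no
// concurrent writer can interleave between the reads.
func (s *confirmationSessionsStore) getAtomic(ctx context.Context) (map[string]string, int64, int64, error) {
	var (
		intents      *redis.MapStringStringCmd
		numIntents   *redis.StringCmd
		numConfirmed *redis.StringCmd
	)
	_, err := s.rdb.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
		intents = pipe.HGetAll(ctx, confirmationIntentsKey)
		numIntents = pipe.Get(ctx, confirmationNumIntentsKey)
		numConfirmed = pipe.Get(ctx, confirmationNumConfirmedKey)
		return nil
	})
	if err != nil && !errors.Is(err, redis.Nil) {
		return nil, 0, 0, err
	}
	// Missing keys surface as redis.Nil on the individual commands;
	// treat them as zero values here.
	n, _ := numIntents.Int64()
	c, _ := numConfirmed.Int64()
	return intents.Val(), n, c, nil
}
```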
internal/infrastructure/live-store/redis/tree_signing_session.go (1)
93-153: `Get` performs multiple non-atomic Redis reads.

The method reads `metaKey`, `noncesKey`, and `sigsKey` in separate `HGetAll` calls. If another process modifies the session between reads, the returned `MusigSigningSession` could be inconsistent. For read operations, consider a Lua script (or the single-EXEC snapshot approach sketched above for the confirmation store), or accept eventual consistency if that's acceptable for this use case.
internal/infrastructure/live-store/redis/forfeits.go (1)
196-225: Pop semantics and error handling look good.

`Pop`:

- Validates that each stored forfeit has JSON and that connectors aren't reused.
- Collects the signed txs and then calls `Reset(ctx)` to clear state, propagating any reset error.

The only (optional) improvement would be to wrap the initial `HGetAll` error with context (e.g., "failed to get forfeit txs: %v"), but behavior is functionally correct.
HGetAllerror with context (e.g.,"failed to get forfeit txs: %v"), but behavior is functionally correct.internal/infrastructure/live-store/redis/intents.go (1)
58-123: Retry logic should differentiate transient TxFailedErr from deterministic validation errors.

The suggestion to separate retry-worthy transient failures from application-logic errors is sound. In go-redis/v9, the sentinel `redis.TxFailedErr` is returned when EXEC returns no results because WATCHed keys were modified; it is recommended to treat it as a special, expected, retryable condition rather than an application-logic error.
Currently, the code retries on any error without differentiation, which means deterministic validation errors (like "duplicated intent %s" or "duplicated input...") waste retry iterations. Instead, use errors.Is(err, redis.TxFailedErr) to detect the transaction failure and retry, while returning other errors normally.
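A minimal sketch of that shape, assuming go-redis v9 and the `numOfRetries`/`retryDelay` fields used throughout this PR (`watchedKey` stands in for whichever keys the operation guards):

```go
// Only optimistic-lock conflicts (redis.TxFailedErr) are retried;
// deterministic validation errors surface immediately instead of
// burning retry iterations.
var err error
for range s.numOfRetries {
	err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
		// ... read, validate, then TxPipelined writes ...
		return nil
	}, watchedKey)
	if err == nil {
		return nil
	}
	if !errors.Is(err, redis.TxFailedErr) {
		return err // validation or I/O failure: do not retry
	}
	time.Sleep(s.retryDelay)
}
return err
```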
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- internal/core/application/service.go (59 hunks)
- internal/infrastructure/live-store/inmemory/boarding_inputs.go (1 hunks)
- internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
- internal/infrastructure/live-store/inmemory/round.go (1 hunks)
- internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
- internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
- internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
- internal/infrastructure/live-store/redis/intents.go (10 hunks)
- internal/infrastructure/live-store/redis/round.go (6 hunks)
- internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.
Applied to files:
internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.
Applied to files:
internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
- internal/infrastructure/live-store/redis/confirmation_sessions.go
- internal/infrastructure/live-store/redis/forfeits.go
- internal/infrastructure/live-store/inmemory/forfeits.go
- internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.
Applied to files:
internal/core/application/service.go
🧬 Code graph analysis (9)
internal/infrastructure/live-store/inmemory/boarding_inputs.go (3)
- internal/core/ports/tx_builder.go (1): SignedBoardingInput (54-57)
- internal/infrastructure/live-store/redis/boarding_inputs.go (1): NewBoardingInputsStore (23-29)
- internal/core/ports/live_store.go (1): BoardingInputsStore (82-90)

internal/infrastructure/live-store/redis/round.go (3)
- internal/infrastructure/live-store/inmemory/round.go (1): NewCurrentRoundStore (16-18)
- internal/core/ports/live_store.go (1): CurrentRoundStore (56-59)
- internal/core/domain/round_event.go (3): RoundStarted (17-20), RoundFinalizationStarted (22-30), IntentsRegistered (45-48)

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
- internal/core/ports/live_store.go (1): ConfirmationSessions (112-116)

internal/infrastructure/live-store/redis/boarding_inputs.go (3)
- internal/infrastructure/live-store/inmemory/boarding_inputs.go (1): NewBoardingInputsStore (18-23)
- internal/core/ports/live_store.go (1): BoardingInputsStore (82-90)
- internal/core/ports/tx_builder.go (1): SignedBoardingInput (54-57)

internal/infrastructure/live-store/redis/forfeits.go (5)
- internal/core/ports/tx_builder.go (2): TxBuilder (59-88), ValidForfeitTx (49-52)
- internal/infrastructure/live-store/inmemory/forfeits.go (1): NewForfeitTxsStore (22-27)
- internal/core/ports/live_store.go (1): ForfeitTxsStore (38-47)
- pkg/ark-lib/tree/tx_tree.go (1): FlatTxTree (36-36)
- internal/core/domain/vtxo.go (2): Vtxo (38-53), Outpoint (15-18)

internal/infrastructure/live-store/inmemory/forfeits.go (5)
- pkg/ark-lib/tree/tx_tree.go (1): FlatTxTree (36-36)
- internal/infrastructure/db/postgres/sqlc/queries/models.go (2): Intent (33-38), Tx (178-185)
- internal/infrastructure/db/sqlite/sqlc/queries/models.go (2): Intent (31-36), Tx (165-172)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/core/ports/tx_builder.go (1): ValidForfeitTx (49-52)

internal/infrastructure/live-store/redis/tree_signing_session.go (2)
- internal/core/ports/live_store.go (2): TreeSigningSessionsStore (70-80), MusigSigningSession (104-110)
- pkg/ark-lib/tree/musig2.go (2): TreeNonces (32-32), TreePartialSigs (98-98)

internal/core/application/service.go (4)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/core/domain/round.go (1): Round (41-59)
- internal/core/domain/round_event.go (2): RoundFailed (39-43), RoundEvent (9-12)
- pkg/ark-lib/tree/tx_tree.go (1): TxTree (19-22)

internal/infrastructure/live-store/redis/intents.go (3)
- internal/core/ports/live_store.go (2): IntentStore (22-36), TimedIntent (92-97)
- internal/infrastructure/live-store/redis/kv.go (1): NewRedisKVStore (18-20)
- internal/core/ports/tx_builder.go (1): BoardingInput (44-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit tests
- GitHub Check: integration tests
- GitHub Check: Build and Scan
🔇 Additional comments (28)
internal/infrastructure/live-store/redis/round.go (2)
28-34: LGTM! Clean initialization with sensible defaults.

The constructor properly initializes all fields, including the new `retryDelay` set to 10ms, which is appropriate for Redis retry operations.

84-84: Verify that error messages don't expose sensitive data.

Error messages throughout this function include raw JSON data from Redis (e.g., `string(data)`, `string(raw)`). If the round or event data contains sensitive information, these detailed error messages could leak it into logs or error tracking systems. Review the data model to confirm whether rounds and events may contain sensitive information. If so, consider sanitizing or truncating the raw data in error messages, or use a debug flag to control verbosity.
Also applies to: 91-95, 124-126, 132-135, 141-143, 149-151, 157-160
internal/infrastructure/live-store/inmemory/round.go (2)
20-27: LGTM: Proper locking and context-aware signature. The method correctly uses write locks and follows the new context-aware interface. Passing a potentially nil s.round to the callback function is appropriate for initialization scenarios.
28-32: Verify nil handling at Get() call sites. The files in internal/infrastructure/live-store/ cannot be located in the current repository state, preventing automated verification of whether callers properly check for nil round values returned by the Get() method. Please manually verify that all call sites of currentRoundStore.Get() check for nil returns before dereferencing the round value, to avoid potential panics.
internal/core/application/service.go (1)
2270-2312: Cache cleanup operations appropriately handle errors. The cache reset operations at round start (ForfeitTxs().Reset, Intents().DeleteVtxos, ConfirmationSessions().Reset, etc.) log errors as warnings rather than failing the round start. This is appropriate since:
- These are cleanup operations for the previous round
- Failing here would prevent new rounds from starting
- Worst case is stale data in cache, which the new round operations will overwrite
The pattern of logging errors but continuing is correct for this cleanup phase.
internal/infrastructure/live-store/inmemory/boarding_inputs.go (2)
56-62: Good fix: Using maps.Clone to prevent external mutation. The use of maps.Clone(b.sigs) correctly addresses the previous concern about returning internal map references that could be mutated by callers after the lock is released.
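For readers unfamiliar with the idiom, here is a minimal, self-contained sketch of the technique; the store type and field names below are illustrative, not the PR's actual types. Returning maps.Clone under a read lock hands the caller a snapshot it can mutate freely without touching the store's internal map.

package main

import (
    "fmt"
    "maps"
    "sync"
)

type sigStore struct {
    mu   sync.RWMutex
    sigs map[string]int
}

// Signatures returns a shallow copy of the internal map, taken under RLock,
// so callers can never mutate the store's state after the lock is released.
func (s *sigStore) Signatures() map[string]int {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return maps.Clone(s.sigs)
}

func main() {
    s := &sigStore{sigs: map[string]int{"a": 1}}
    snap := s.Signatures()
    snap["a"] = 99                   // mutates only the copy
    fmt.Println(s.Signatures()["a"]) // still 1
}

Note that maps.Clone is a shallow copy (stdlib "maps", Go 1.21+), which is sufficient when the map values are themselves immutable or copied by value.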
25-36: LGTM: Thread-safe Set/Get implementation. The Set and Get methods correctly use write and read locks, respectively, for thread-safe access to numOfInputs.
internal/infrastructure/live-store/redis/boarding_inputs.go (1)
145-164: LGTM: Clean DTO conversion. The newSigsDTO function correctly converts the domain type to the DTO representation, with proper hex encoding of binary fields.
internal/infrastructure/live-store/inmemory/forfeits.go (2)
152-163: Good fix: Added proper locking to AllSigned. The race condition flagged in the previous review has been addressed by adding RLock/RUnlock around the iteration over forfeitTxs.
29-87: LGTM: Robust initialization with proper validation. The Init method properly validates the connector tree against the vtxos count and correctly builds the connectors index with appropriate locking.
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
205-247: LGTM: Robust polling implementation with proper safeguards. The watchSessionCompletion goroutine correctly uses sync.Once to ensure the channel is signaled only once, includes panic recovery, and properly handles context cancellation. The polling approach is well documented in the file header comments.
internal/infrastructure/live-store/redis/tree_signing_session.go (2)
127-127: Good fix: Typo corrected. The previous typo "cosgienr" has been corrected to "cosigner" in the error messages at lines 127 and 142.
Also applies to: 142-142
257-306: LGTM: Well-structured polling with proper safeguards. The watchNoncesCollected goroutine (and similarly watchSigsCollected) correctly does the following; a minimal sketch of the shape follows the list:
- Uses a ticker for controlled polling
- Handles context cancellation
- Includes panic recovery for channel operations
- Properly signals only once when the condition is met
internal/infrastructure/live-store/redis/forfeits.go (7)
15-20: Struct and constructor changes look consistent with retry pattern. Adding builder, numOfRetries, and a small retryDelay, plus wiring them in NewForfeitTxsStore, is straightforward and matches the retry usage in the methods below. No issues here.
Also applies to: 29-37
40-47: Init transaction and retry logic are otherwise sound. Beyond the indexing issue, the marshalling, the basic shape validation (non-empty leaves, vtxo vs connector count), and the Watch + TxPipelined + bounded retry with a small sleep look reasonable for this store.
Also applies to: 57-61, 89-112
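For context, a minimal sketch of that Watch + TxPipelined + bounded-retry shape using the go-redis v9 API. The key name, retry count, and delay are illustrative, not the store's actual values; assumes imports of context, fmt, time, and github.com/redis/go-redis/v9.

// setWithRetry retries an optimistic Redis transaction: if the watched key
// changes between WATCH and EXEC, Watch returns an error and we retry.
func setWithRetry(
    ctx context.Context, rdb *redis.Client,
    key, value string, retries int, delay time.Duration,
) error {
    var err error
    for i := 0; i < retries; i++ {
        err = rdb.Watch(ctx, func(tx *redis.Tx) error {
            // writes queued here execute atomically unless a watched key changed
            _, txErr := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
                pipe.Set(ctx, key, value, 0)
                return nil
            })
            return txErr
        }, key)
        if err == nil {
            return nil
        }
        time.Sleep(delay) // brief pause before retrying the optimistic transaction
    }
    return fmt.Errorf("transaction failed after %d retries: %w", retries, err)
}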
114-174: Sign path correctly validates inputs and uses transactional updates. The Sign method now:
- Short-circuits on empty txs and on missing builder.
- Rehydrates VTXOs, connectors, and the connector index, with clear error messages on corruption.
- Delegates verification to TxBuilder.VerifyForfeitTxs and writes results via Watch + TxPipelined with bounded retries.
This looks correct and consistent with the Init/Reset patterns.
227-247: AllSigned correctly distinguishes "not all signed" from storage errors. The method:
- Returns (false, err) on Redis/unmarshal failures.
- Returns (false, nil) when any entry is missing JSON or has an empty Tx.
- Returns (true, nil) only when all entries successfully unmarshal and have a non-empty Tx.
This matches the expected semantics.
249-275: GetUnsignedInputs is robust against malformed and partially populated entries. The logic to:
- treat malformed keys or JSON as an error for the caller, and
- treat missing/empty JSON or an empty Tx as "unsigned" and include the outpoint
gives reasonable resilience to partial writes or decode issues.
277-283: Len implementation is straightforward. Using HLen on the txs hash and returning (-1, err) on failure is clear enough, assuming callers always check the error before trusting the count.
285-299: GetConnectorsIndexes API and error messages look fine. The method unmarshals the stored connector index map and returns descriptive errors when retrieval or JSON decode fails. The previous typo ("fet") in this error message is also fixed in this version.
internal/infrastructure/live-store/redis/intents.go (8)
22-27: Struct and constructor changes align with the retry/backoff pattern. Adding retryDelay and wiring it in NewIntentStore matches the other Redis-backed stores and keeps retry behavior centralized on the store.
Also applies to: 29-35
38-56: Len correctly filters out empty/invalid intents. Counting only non-nil intents with at least one receiver, and returning (-1, err) on Redis/KV failures, is clear and matches the intended API.
187-193: GetSelectedIntents now propagates list errors with a clear message. Wrapping ListRange errors as "failed to get selected intents: %v" while otherwise returning the slice directly is appropriate.
195-226: ViewAll behavior and error handling are consistent. The method:
- Uses GetMulti with nil-checks when specific IDs are provided.
- Falls back to scanning all IDs via Redis, then GetMulti over that set.
- Wraps both Redis and KV errors with context.
This matches the LiveStore interface expectations.
228-269: Update enforces input/output balance and handles missing intents correctly. Update:
- Fails fast if the intent doesn't exist.
- Enforces sum(inputs + boardingInputs) == sum(outputs) before mutating (a simplified illustration follows this list).
- Optionally updates cosigner pubkeys, then persists via Set with a contextual error message.
This is a solid invariant check and update flow.
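As promised above, a simplified illustration of that balance invariant, with amounts flattened to plain integers; the real check runs over the intent's typed inputs and receivers.

// balanced reports whether total funding equals total payout:
// sum(inputs + boardingInputs) == sum(outputs).
func balanced(inputs, boardingInputs, outputs []uint64) bool {
    var in, out uint64
    for _, v := range inputs {
        in += v
    }
    for _, v := range boardingInputs {
        in += v
    }
    for _, v := range outputs {
        out += v
    }
    return in == out
}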
271-291: Delete gracefully handles missing intents and cleans input vtxos. Delete:
- Fails with context if the underlying KV Get errors.
- Skips IDs whose intents are already nil.
- Removes each input outpoint from intentStoreVtxosKey and logs (but does not surface) errors from intents.Delete, which is acceptable for a best-effort cleanup.
Implementation is reasonable as-is.
294-321: DeleteAll correctly combines best-effort KV deletes with transactional key cleanup. The pattern of:
- iterating all IDs and tolerating individual KV delete failures (with warnings), and
- using Watch + TxPipelined + bounded retries to atomically clear all Redis keys for this store
looks consistent with other stores and gives a clean reset behavior.
347-356: IncludesAny behavior (log on error, return false) is acceptable for a best-effort probe. The method checks membership per outpoint, short-circuiting on the first positive hit and logging (but not surfacing) Redis errors. That's reasonable for a "does this conflict with anything?" helper.
session, err := s.cache.ConfirmationSessions().Get(ctx)
if err != nil {
    log.WithError(err).Error("failed to get confirmation session from cache")
    round.Fail(errors.INTERNAL_ERROR.New(
        "failed to get confirmation session from cache: %s", err,
    ))
    return
}
Past review concern appears resolved.
The previous review flagged that if s.cache.ConfirmationSessions().Get(ctx) returns an error, session could be nil and dereferencing session.IntentsHashes would panic.
The updated code now handles this:
- On error, it calls round.Fail() with an error message
- Returns early, preventing the nil dereference
- Sets the error in the round, which will be persisted
However, note that roundAborted is not set to true, which means the defer will attempt to save events and potentially call startFinalization instead of startRound. This is inconsistent with other error paths in this function.
Apply this diff to ensure consistent error handling:
case <-time.After(roundTiming.confirmationDuration()):
session, err := s.cache.ConfirmationSessions().Get(ctx)
if err != nil {
+ roundAborted = true
log.WithError(err).Error("failed to get confirmation session from cache")
round.Fail(errors.INTERNAL_ERROR.New(
"failed to get confirmation session from cache: %s", err,
))
return
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

session, err := s.cache.ConfirmationSessions().Get(ctx)
if err != nil {
    roundAborted = true
    log.WithError(err).Error("failed to get confirmation session from cache")
    round.Fail(errors.INTERNAL_ERROR.New(
        "failed to get confirmation session from cache: %s", err,
    ))
    return
}
🤖 Prompt for AI Agents
internal/core/application/service.go around lines 2514 to 2521: when s.cache.ConfirmationSessions().Get(ctx) returns an error, the code logs, calls round.Fail, and returns, but does not set roundAborted = true, which is inconsistent with other error paths and causes the defer to attempt normal finalization; set roundAborted = true before returning on this error so the deferred cleanup knows the round was aborted and follows the same abort path as other errors.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/infrastructure/live-store/redis/forfeits.go (1)
67-92: Fix out-of-bounds panic in connector index construction. The out-of-bounds issue is real and critical. However, the retry loop concern is unfounded: Go 1.22+ supports for range int syntax, and the codebase targets Go 1.25.3, so those loops compile correctly.

Connector index loop bug (lines 88–90 in Init, similar pattern in Sign and Reset): you only guard len(vtxosToSign) <= len(connectorsOutpoints), but then iterate over all connectorsOutpoints:

for i, connectorOutpoint := range connectorsOutpoints {
    connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
}

If there are more connector outpoints than vtxos, vtxosToSign[i] will panic when i >= len(vtxosToSign). Loop over vtxosToSign instead:

-for i, connectorOutpoint := range connectorsOutpoints {
-    connIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
-}
+for i, vtxo := range vtxosToSign {
+    connectorOutpoint := connectorsOutpoints[i]
+    connIndex[connectorOutpoint.String()] = vtxo.Outpoint
+}

This respects the guard and prevents out-of-bounds access.
♻️ Duplicate comments (2)
internal/infrastructure/live-store/redis/intents.go (1)
136-185: Handle nil intents in Pop and fix the remaining SAdd slice bug for vtxosToRemove. Two issues in Pop/DeleteVtxos:

- Possible nil dereference in Pop:

intent, err := s.intents.Get(ctx, id)
if err != nil {
    return nil, fmt.Errorf("failed to get intent %s: %v", id, err)
}
if len(intent.Receivers) > 0 { // intent may be nil here
    intentsByTime = append(intentsByTime, *intent)
}

If the KV store returns (nil, nil) for a missing intent, this will panic. Add a nil check:

 intent, err := s.intents.Get(ctx, id)
 if err != nil {
     return nil, fmt.Errorf("failed to get intent %s: %v", id, err)
 }
-
-if len(intent.Receivers) > 0 {
+if intent != nil && len(intent.Receivers) > 0 {
     intentsByTime = append(intentsByTime, *intent)
 }

- SAdd still called with []string instead of variadic members (previously flagged):

if len(vtxosToRemove) > 0 {
    s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)
}

Passing a []string without conversion/... sends the slice as a single member; Redis will not see individual vtxo strings, and DeleteVtxos will later remove the wrong thing. A prior review already called out this exact issue for Pop and DeleteVtxos; you've fixed DeleteVtxos (it now converts to []interface{} and uses members...), but Pop still has the old pattern.

Mirror the DeleteVtxos fix here:

-if len(vtxosToRemove) > 0 {
-    s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, vtxosToRemove)
-}
+if len(vtxosToRemove) > 0 {
+    members := make([]interface{}, len(vtxosToRemove))
+    for i, v := range vtxosToRemove {
+        members[i] = v
+    }
+    if err := s.rdb.SAdd(ctx, intentStoreVtxosToRemoveKey, members...).Err(); err != nil {
+        return nil, fmt.Errorf("failed to enqueue vtxos to remove: %v", err)
+    }
+}

With this, the vtxosToRemove set and DeleteVtxos's SRem logic will be consistent and actually clear VTXOs from intentStoreVtxosKey.

Also applies to: 326-352
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
113-134: Move the numConfirmed read inside the WATCH transaction to avoid lost updates.

numConfirmed is read once before the retry loop:

numConfirmed, err := s.rdb.Get(ctx, confirmationNumConfirmedKey).Int()

and then reused inside the Watch callback without re-reading under the WATCHed transaction. If another process confirms a different intent between this read and a successful transaction, its increment can be lost (the later transaction will overwrite confirmationNumConfirmedKey with staleNumConfirmed+1).

This is the same race that was already pointed out in a previous review. You can fix it by reading numConfirmed inside the WATCH callback:

 func (s *confirmationSessionsStore) Confirm(ctx context.Context, intentId string) error {
     // ... existing confirmed check ...
-    numConfirmed, err := s.rdb.Get(ctx, confirmationNumConfirmedKey).Int()
-    if err != nil && !errors.Is(err, redis.Nil) {
-        return fmt.Errorf("failed to get number of confirmed intents: %v", err)
-    }
     keys := []string{confirmationIntentsKey, confirmationNumConfirmedKey}
-    for range s.numOfRetries {
-        if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
-            _, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
-                pipe.HSet(ctx, confirmationIntentsKey, hashKey, 1)
-                pipe.Set(ctx, confirmationNumConfirmedKey, numConfirmed+1, 0)
-                return nil
-            })
-            return err
-        }, keys...); err == nil {
-            return nil
-        }
-        time.Sleep(s.retryDelay)
-    }
+    for i := 0; i < s.numOfRetries; i++ {
+        if err = s.rdb.Watch(ctx, func(tx *redis.Tx) error {
+            numConfirmed, err := tx.Get(ctx, confirmationNumConfirmedKey).Int()
+            if err != nil && !errors.Is(err, redis.Nil) {
+                return fmt.Errorf("failed to get number of confirmed intents: %v", err)
+            }
+            _, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
+                pipe.HSet(ctx, confirmationIntentsKey, hashKey, 1)
+                pipe.Set(ctx, confirmationNumConfirmedKey, numConfirmed+1, 0)
+                return nil
+            })
+            return err
+        }, keys...); err == nil {
+            return nil
+        }
+        if i < s.numOfRetries-1 {
+            time.Sleep(s.retryDelay)
+        }
+    }

This way each confirm observes the latest count under WATCH and increments it without losing concurrent confirmations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
internal/infrastructure/live-store/redis/confirmation_sessions.go (4 hunks)
internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
internal/infrastructure/live-store/redis/intents.go (10 hunks)
internal/infrastructure/live-store/redis/offchain_txs.go (2 hunks)
internal/infrastructure/live-store/redis/round.go (5 hunks)
internal/infrastructure/live-store/redis/tree_signing_session.go (5 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
internal/infrastructure/live-store/redis/confirmation_sessions.go
internal/infrastructure/live-store/redis/forfeits.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.
Applied to files:
internal/infrastructure/live-store/redis/round.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.
Applied to files:
internal/infrastructure/live-store/redis/round.go
🧬 Code graph analysis (6)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
- internal/core/ports/live_store.go (1): ConfirmationSessions (112-116)
internal/infrastructure/live-store/redis/round.go (3)
- internal/infrastructure/live-store/inmemory/round.go (1): NewCurrentRoundStore (16-18)
- internal/core/ports/live_store.go (1): CurrentRoundStore (56-59)
- internal/core/domain/round_event.go (3): RoundStarted (17-20), RoundFinalizationStarted (22-30), IntentsRegistered (45-48)
internal/infrastructure/live-store/redis/tree_signing_session.go (3)
- internal/infrastructure/live-store/inmemory/tree_signing_session.go (1): NewTreeSigningSessionsStore (19-26)
- internal/core/ports/live_store.go (2): TreeSigningSessionsStore (70-80), MusigSigningSession (104-110)
- pkg/ark-lib/tree/musig2.go (2): TreeNonces (32-32), TreePartialSigs (98-98)
internal/infrastructure/live-store/redis/offchain_txs.go (2)
- internal/infrastructure/live-store/inmemory/offchain_txs.go (1): NewOffChainTxStore (19-24)
- internal/core/ports/live_store.go (1): OffChainTxStore (49-54)
internal/infrastructure/live-store/redis/intents.go (3)
- internal/infrastructure/live-store/inmemory/intents.go (1): NewIntentStore (22-33)
- internal/core/ports/live_store.go (2): IntentStore (22-36), TimedIntent (92-97)
- internal/infrastructure/live-store/redis/kv.go (1): NewRedisKVStore (18-20)
internal/infrastructure/live-store/redis/forfeits.go (5)
- internal/core/ports/tx_builder.go (2): TxBuilder (59-88), ValidForfeitTx (49-52)
- internal/infrastructure/live-store/inmemory/forfeits.go (1): NewForfeitTxsStore (22-27)
- internal/core/ports/live_store.go (1): ForfeitTxsStore (38-47)
- pkg/ark-lib/tree/tx_tree.go (1): FlatTxTree (36-36)
- internal/core/domain/vtxo.go (2): Vtxo (38-53), Outpoint (15-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and Scan
- GitHub Check: integration tests
- GitHub Check: unit tests
🔇 Additional comments (4)
internal/infrastructure/live-store/redis/round.go (1)
36-66: The review comment's critical claim is incorrect. The code uses for range s.numOfRetries, where numOfRetries is an int. The review states this "won't compile," but this is false: Go 1.22 introduced the for range N syntax for iterating over integers, and the codebase uses Go 1.25.3, which fully supports it. The code will compile without errors. The syntax is valid and idiomatic modern Go; no changes are required to fix a compilation error.
The secondary point about CAS semantics (moving the read under WATCH) remains a valid design suggestion if strict compare-and-set semantics are needed, but that is a separate design consideration, not a compiler issue.
Likely an incorrect or invalid review comment.
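A standalone example of the Go 1.22+ integer range form discussed here (not code from this PR); both loops execute the body the same number of times:

package main

import "fmt"

func main() {
    n := 3
    for range n { // valid since Go 1.22
        fmt.Println("attempt")
    }
    for i := 0; i < n; i++ { // pre-1.22 equivalent
        fmt.Println("attempt", i)
    }
}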
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
71-96: The review comment is incorrect; the code will compile with Go 1.22+. Go 1.22 introduced the ability to range over integers directly. The syntax for range s.numOfRetries (where numOfRetries is an int) is valid and iterates from 0 to numOfRetries-1. Since the codebase targets Go 1.25+, this feature is fully supported and the code will compile without errors.
Likely an incorrect or invalid review comment.
internal/infrastructure/live-store/redis/tree_signing_session.go (1)
65-80: The review assumes Go < 1.22, but this project targets Go 1.25.3 (specified in go.mod and the CI workflows). Go 1.22 and later support for range over integers, making for range s.numOfRetries valid syntax that will compile without errors. The code at lines 66, 164, 206, and 230 is syntactically correct as written.
Likely an incorrect or invalid review comment.
internal/infrastructure/live-store/redis/intents.go (1)
62-123: The review comment is incorrect. Go 1.22 introduced support for ranging over integers, enabling syntax like for range 5 or for range numOfRetries. The project targets Go 1.25.3, which is well after this feature was introduced, making the code valid.
The code at lines 63, 308, and 332 correctly uses for range s.numOfRetries and compiles without error. The suggested fix to convert these to three-clause loops (for i := 0; i < s.numOfRetries; i++) is unnecessary.
Likely an incorrect or invalid review comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/infrastructure/live-store/inmemory/forfeits.go (1)
73-82: Prevent out-of-range panic when more connector outpoints than vtxosToSign.

The loop builds connectorsIndex over all connectorsOutpoints but indexes into vtxosToSign[i]. The only guard is len(vtxosToSign) > len(connectorsOutpoints), so when len(connectorsOutpoints) > len(vtxosToSign) this will panic with index out of range.

You already guarantee len(vtxosToSign) <= len(connectorsOutpoints), so just iterate up to len(vtxosToSign):

-for i, connectorOutpoint := range connectorsOutpoints {
-    connectorsIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
-}
+for i, connectorOutpoint := range connectorsOutpoints[:len(vtxosToSign)] {
+    connectorsIndex[connectorOutpoint.String()] = vtxosToSign[i].Outpoint
+}
♻️ Duplicate comments (1)
internal/infrastructure/live-store/inmemory/forfeits.go (1)
117-133: Fix deadlock between Pop and Reset, and avoid calling a locking method while holding the lock.

Pop acquires m.lock and defers a call to m.Reset(ctx), but Reset also acquires m.lock. Since sync.Mutex/RWMutex are not reentrant, this will deadlock. This is the same issue previously flagged in an earlier review.

Refactor the reset logic into a private helper that assumes the lock is held, use it from both Reset and Pop, and avoid calling Reset from inside the locked region:

-func (m *forfeitTxsStore) Reset(_ context.Context) error {
-    m.lock.Lock()
-    defer m.lock.Unlock()
-
-    m.forfeitTxs = make(map[domain.Outpoint]ports.ValidForfeitTx)
-    m.connectors = nil
-    m.connectorsIndex = nil
-    m.vtxos = nil
-    return nil
-}
-func (m *forfeitTxsStore) Pop(ctx context.Context) ([]string, error) {
-    m.lock.Lock()
-    defer func() {
-        m.lock.Unlock()
-        // nolint
-        m.Reset(ctx)
-    }()
+func (m *forfeitTxsStore) resetLocked() {
+    m.forfeitTxs = make(map[domain.Outpoint]ports.ValidForfeitTx)
+    m.connectors = nil
+    m.connectorsIndex = nil
+    m.vtxos = nil
+}
+
+func (m *forfeitTxsStore) Reset(_ context.Context) error {
+    m.lock.Lock()
+    defer m.lock.Unlock()
+
+    m.resetLocked()
+    return nil
+}
+
+func (m *forfeitTxsStore) Pop(_ context.Context) ([]string, error) {
+    m.lock.Lock()
+    defer m.lock.Unlock()
@@
-    return txs, nil
+    m.resetLocked()
+    return txs, nil
 }

This keeps the reset atomic with the Pop operation, removes the need for // nolint, and avoids the deadlock.
🧹 Nitpick comments (5)
internal/infrastructure/live-store/redis/boarding_inputs.go (4)
50-56: Consider handling redis.Nil explicitly for unset keys. When the key doesn't exist, redis.Nil is returned as an error. Callers may need to distinguish between "key not set" and "actual error"; consider returning a sentinel value or wrapping the error for clarity.

 func (b *boardingInputsStore) Get(ctx context.Context) (int, error) {
     num, err := b.rdb.Get(ctx, boardingInputsKey).Int()
     if err != nil {
+        if err == redis.Nil {
+            return 0, nil // Key not set, return default
+        }
         return -1, err
     }
     return num, nil
 }
79-95: Wrap the final error with context for consistency. Unlike the Set method, which wraps the error with a descriptive message, AddSignatures returns the raw error after exhausting retries. This makes debugging harder.

-    return err
+    return fmt.Errorf(
+        "failed to add boarding input signatures after max number of retries: %v", err,
+    )
 }
100-118: Variable shadowing: key is reused in the loop. The outer key (Redis key) is shadowed by the loop variable key (hash field name). While the code works correctly, this is confusing and a code smell.

 func (b *boardingInputsStore) GetSignatures(
     ctx context.Context, batchId string,
 ) (map[uint32]ports.SignedBoardingInput, error) {
     key := fmt.Sprintf("%s:%s", boardingInputSigsKey, batchId)
     values, err := b.rdb.HGetAll(ctx, key).Result()
     if err != nil {
         return nil, err
     }
     m := make(map[uint32]ports.SignedBoardingInput)
-    for key, value := range values {
+    for field, value := range values {
         rawSig := &sigsDTO{}
         sig, err := rawSig.deserialize([]byte(value))
         if err != nil {
-            return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", key, err)
+            return nil, fmt.Errorf("malformed signatures for input %s in storage: %v", field, err)
         }
-        inIndex, err := strconv.Atoi(key)
+        inIndex, err := strconv.Atoi(field)
         if err != nil {
             return nil, err
         }
         m[uint32(inIndex)] = *sig
     }
     return m, nil
 }
170-173: Value receiver on deserialize is unconventional. The method has a value receiver but immediately unmarshals into &s. While this works, it's confusing; consider using a pointer receiver or a standalone function for clarity.

-func (s sigsDTO) deserialize(buf []byte) (*ports.SignedBoardingInput, error) {
-    if err := json.Unmarshal(buf, &s); err != nil {
+func (s *sigsDTO) deserialize(buf []byte) (*ports.SignedBoardingInput, error) {
+    if err := json.Unmarshal(buf, s); err != nil {
         return nil, err
     }
internal/infrastructure/live-store/inmemory/intents.go (1)
102-136: LGTM: Context and error return added. The method signature correctly implements the updated interface. The sorting and selection logic remains sound.
For very large intent pools, consider checking ctx.Err() before/after the sort operation to allow early cancellation. However, since the operation holds a write lock, completing atomically may be the correct design choice.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
internal/infrastructure/live-store/inmemory/intents.go (9 hunks)
internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
internal/infrastructure/live-store/inmemory/forfeits.go
🧬 Code graph analysis (3)
internal/infrastructure/live-store/inmemory/forfeits.go (3)
- pkg/ark-lib/tree/tx_tree.go (1): FlatTxTree (36-36)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/core/ports/tx_builder.go (1): ValidForfeitTx (49-52)
internal/infrastructure/live-store/redis/boarding_inputs.go (3)
- internal/infrastructure/live-store/inmemory/boarding_inputs.go (1): NewBoardingInputsStore (18-23)
- internal/core/ports/live_store.go (1): BoardingInputsStore (82-90)
- internal/core/ports/tx_builder.go (1): SignedBoardingInput (54-57)
internal/infrastructure/live-store/inmemory/intents.go (2)
- internal/core/ports/tx_builder.go (1): BoardingInput (44-47)
- internal/core/ports/live_store.go (1): TimedIntent (92-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit tests
- GitHub Check: Build and Scan
- GitHub Check: integration tests
🔇 Additional comments (18)
internal/infrastructure/live-store/inmemory/forfeits.go (4)
153-163: Read locking in AllSigned correctly avoids races. Adding RLock/RUnlock around the iteration of m.forfeitTxs brings AllSigned in line with the other readers and eliminates the prior data-race risk. The new (bool, error) signature also matches the updated interface.
166-177: GetUnsignedInputs ctx+error surface and locking look consistent. The signature change to GetUnsignedInputs(_ context.Context) ([]domain.Outpoint, error) and the use of RLock around the map iteration are consistent with the rest of the store API. Returning a nil error keeps the in-memory implementation simple.
178-181: Len follows the same context/error and locking pattern. Len(_ context.Context) (int, error) guarded by RLock matches the updated interface and is safe for concurrent access.
184-189: Cloning connectorsIndex avoids external mutation of internal state. Switching GetConnectorsIndexes to return maps.Clone(m.connectorsIndex) under RLock prevents callers from mutating the internal map after the lock is released, addressing the earlier race/corruption concern.
internal/infrastructure/live-store/redis/boarding_inputs.go (4)
17-29: LGTM! The struct and constructor are well-defined, with appropriate fields for retry-enabled Redis operations.
31-48: LGTM! The Set method correctly implements optimistic locking, with Watch on the appropriate key and proper retry logic.
83-85: Verify that the HSetNX semantics are intentional. HSetNX only sets the field if it doesn't already exist: if a signature for an input index already exists, it will silently be preserved rather than updated. If overwriting is expected behavior, use HSet instead.
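A quick illustration of the difference with go-redis v9; the key and field names are made up for the example.

// demoHSetNX shows that HSetNX preserves an existing field while HSet overwrites.
func demoHSetNX(ctx context.Context, rdb *redis.Client) error {
    created, err := rdb.HSetNX(ctx, "sigs:batch1", "0", "old").Result()
    if err != nil {
        return err
    }
    fmt.Println(created) // true: field was created
    created, err = rdb.HSetNX(ctx, "sigs:batch1", "0", "new").Result()
    if err != nil {
        return err
    }
    fmt.Println(created) // false: field exists, so "old" is preserved
    // HSet, by contrast, overwrites unconditionally.
    return rdb.HSet(ctx, "sigs:batch1", "0", "new").Err()
}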
122-125: LGTM! The DeleteSignatures method is straightforward. Redis DEL is atomic and doesn't require the retry/watch pattern used for updates.
internal/infrastructure/live-store/inmemory/intents.go (10)
4-6: LGTM: Appropriate imports added. The context and slices imports are correctly added to support the updated method signatures and the use of slices.Clone for safe cloning.
36-47: LGTM: Context and error return added. The method signature correctly implements the updated interface. Returning a nil error is appropriate for this in-memory implementation.
49-100: LGTM: Context parameter added with preserved validation logic. The method correctly propagates context through the interface and maintains all existing duplicate detection logic for both regular inputs and boarding inputs.
138-142: LGTM: Defensive clone added for safety. The use of slices.Clone on line 141 is an excellent improvement that prevents external code from mutating the internal selected-intents slice.
144-183: LGTM: Context added with preserved validation logic. The balance validation correctly accounts for all input sources (intent inputs + boarding inputs) and ensures they match receiver outputs. The logic is sound.
185-200: LGTM: Context and error return added. The deletion logic correctly removes intents and cleans up associated vtxos. The idempotent behavior (skipping non-existent IDs) is appropriate.
202-210: LGTM: Context and error return added. The complete state reset logic is correct and thread-safe under the write lock.
212-221: LGTM: Context and error return added. The vtxo cleanup logic correctly removes tracked vtxos and resets the removal queue.
223-241: LGTM: Context and error return added. The view logic correctly handles both filtered (by ID) and unfiltered queries, returning dereferenced copies of the timed intents.
243-254: LGTM: Context parameter added. The outpoint existence check correctly returns both a boolean flag and the matching outpoint string when found.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/core/application/service.go (1)
2798-2813: Ignoring the error from Get could cause a nil pointer dereference. At lines 2798-2807, the error from s.cache.TreeSigingSessions().Get(ctx, roundId) is ignored. If the Get operation fails, signingSession could be nil, and accessing signingSession.Nonces at line 2801 or iterating over signingSession.Nonces at line 2808 would panic.

Apply this diff to handle the error:

 case <-time.After(thirdOfRemainingDuration):
-    signingSession, _ := s.cache.TreeSigingSessions().Get(ctx, roundId)
-    round.Fail(errors.SIGNING_SESSION_TIMED_OUT.New(
+    signingSession, err := s.cache.TreeSigingSessions().Get(ctx, roundId)
+    if err != nil || signingSession == nil {
+        round.Fail(errors.INTERNAL_ERROR.New("failed to get signing session from cache"))
+        return
+    }
+    round.Fail(errors.SIGNING_SESSION_TIMED_OUT.New(
         "musig2 signing session timed out (nonce collection), collected %d/%d nonces",
         len(signingSession.Nonces), len(uniqueSignerPubkeys),
     ))
♻️ Duplicate comments (3)
internal/core/application/service.go (3)
2876-2911: Error ignored from Get could cause a nil pointer dereference. At line 2877, the error from s.cache.TreeSigingSessions().Get(ctx, roundId) is ignored. If signingSession is nil, iterating over signingSession.Signatures at line 2880 would panic. This was flagged in a previous review.

Apply this diff:

 case <-s.cache.TreeSigingSessions().SignaturesCollected(roundId):
-    signingSession, _ := s.cache.TreeSigingSessions().Get(ctx, roundId)
+    signingSession, err := s.cache.TreeSigingSessions().Get(ctx, roundId)
+    if err != nil || signingSession == nil {
+        round.Fail(errors.INTERNAL_ERROR.New("failed to get signing session from cache"))
+        return
+    }
     cosignersToBan := make(map[string]domain.Crime)
3337-3355: Error handling still prevents event propagation. As flagged in previous reviews:
- Lines 3337-3341: If GetConnectorsIndexes fails, returning early prevents event propagation.
- Lines 3351-3355: If GetSelectedIntents fails, the RoundFailed event is not propagated.
These were previously suggested to continue with empty values rather than returning early.
2514-2521: roundAborted should be set to true on confirmation session retrieval failure. When s.cache.ConfirmationSessions().Get(ctx) fails, the code calls round.Fail() and returns, but roundAborted remains false. This causes the defer to attempt saving events and check round.IsFailed() instead of immediately restarting the round via the roundAborted path.

This was flagged in a previous review but the fix appears incomplete. Apply this diff:

 case <-time.After(roundTiming.confirmationDuration()):
     session, err := s.cache.ConfirmationSessions().Get(ctx)
     if err != nil {
+        roundAborted = true
         log.WithError(err).Error("failed to get confirmation session from cache")
         round.Fail(errors.INTERNAL_ERROR.New(
             "failed to get confirmation session from cache: %s", err,
         ))
         return
     }
🧹 Nitpick comments (2)
internal/infrastructure/live-store/redis/boarding_inputs.go (2)
23-29: Consider making the retry delay configurable. The retry delay is hardcoded to 10ms. While this appears consistent across the codebase, making it a constructor parameter would improve flexibility for different deployment scenarios or testing.

-func NewBoardingInputsStore(rdb *redis.Client, numOfRetries int) ports.BoardingInputsStore {
+func NewBoardingInputsStore(rdb *redis.Client, numOfRetries int, retryDelay time.Duration) ports.BoardingInputsStore {
     return &boardingInputsStore{
         rdb:          rdb,
         numOfRetries: numOfRetries,
-        retryDelay:   10 * time.Millisecond,
+        retryDelay:   retryDelay,
     }
 }
50-56: Consider returning the zero value instead of -1 on error. Returning -1 as a sentinel value is unconventional in Go. The idiomatic approach is to return the zero value (0) and rely on the error to indicate failure.

 func (b *boardingInputsStore) Get(ctx context.Context) (int, error) {
     num, err := b.rdb.Get(ctx, boardingInputsKey).Int()
     if err != nil {
-        return -1, err
+        return 0, err
     }
     return num, nil
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
internal/core/application/ban.go (1 hunks)
internal/core/application/service.go (58 hunks)
internal/infrastructure/live-store/redis/boarding_inputs.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.
Applied to files:
internal/core/application/service.go
🧬 Code graph analysis (2)
internal/infrastructure/live-store/redis/boarding_inputs.go (3)
- internal/infrastructure/live-store/inmemory/boarding_inputs.go (1): NewBoardingInputsStore (18-23)
- internal/core/ports/live_store.go (1): BoardingInputsStore (82-90)
- internal/core/ports/tx_builder.go (1): SignedBoardingInput (54-57)
internal/core/application/service.go (5)
- internal/core/domain/vtxo.go (1): Outpoint (15-18)
- internal/core/ports/live_store.go (1): ConfirmationSessions (112-116)
- internal/core/domain/round.go (2): Round (41-59), ForfeitTx (36-39)
- internal/core/application/service_event.go (1): RoundFailed (61-64)
- internal/core/domain/round_event.go (2): RoundFailed (39-43), RoundEvent (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit tests
- GitHub Check: integration tests
🔇 Additional comments (9)
internal/core/application/ban.go (1)
150-162: LGTM! Error handling properly added for GetUnsignedInputs. The context-aware call with proper error handling follows the established pattern. Logging the warning and returning early on failure is appropriate for this background operation.
internal/core/application/service.go (8)
516-532: LGTM! Context and error handling properly propagated for cache operations. The error handling pattern is consistent: logging the error for observability and returning a generic internal error to users. The control flow correctly separates the error case from the "already spent" case.
1107-1119: LGTM! Proper fallback pattern from cache to database. The error handling gracefully falls back to the database when the cache lookup fails or returns nil. This ensures resilience while maintaining the expected behavior.
1547-1557: Consistent error handling for cache operations. The pattern matches other usages in the codebase: log the error and return a generic internal error to the user.
1910-1930: LGTM! Context propagation with proper error handling. The early-return pattern ensures round.CommitmentTxid is only accessed when the round was successfully retrieved from the cache.
1932-1969: LGTM! Consistent cache access pattern. The error handling mirrors the pattern established in SubmitForfeitTxs.
2960-2977: Good defensive error handling for cache failure. The pattern of creating a RoundFailed event directly when the cache fails to retrieve the round is a robust fallback that ensures the round lifecycle completes even under adverse conditions.
3460-3492: LGTM! Comprehensive error handling for forfeit and boarding signature checks. The early returns on errors are appropriate here; signaling completion when the state is unknown would be incorrect.
2193-2226: LGTM! Context propagation for intent cache operations. The existing error handling is preserved while adding context support.
 domain.RoundTopic, func(events []domain.Event) {
     round := domain.NewRoundFromEvents(events)
-    go svc.propagateEvents(round)
+    go svc.propagateEvents(context.Background(), round)
Since in this PR, besides other things, we are handling context: shall we use one context at the service level for cancellation, so that everything is shut down gracefully when the service's Stop(), which also uses context.Background, is invoked?
we'll do in another pr
 // check if already spent by another offchain tx
-if s.cache.OffchainTxs().Includes(vtxo.Outpoint) {
+isSpent, err := s.cache.OffchainTxs().Includes(ctx, vtxo.Outpoint)
+if err != nil {
general comment for log handling introduced in this PR
if err != nil {
log.WithError(err).Errorf("failed to check spent status of inputs against tx in cache")
return nil, errors.INTERNAL_ERROR.New("something went wrong").
WithMetadata(map[string]any{"vtxos": vtxo.Outpoint.String()})
}
Issues:
1. Double handling: Logging is handling. Log+return handles the error twice
2. Lost context: Original error is discarded. "something went wrong" is not debuggable.
3. Error interceptor should check error and if internal hide it from users
I would recommend to return error and then handle logging + sanitization at the gRPC interceptor level:
- Log full error with context (once, at API boundary)
- Return sanitized message to client based on error type
This gives you: single log entry, full error chain for debugging, safe client messages.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally agree, we'll do in another pr or this will get too messy
     return session, nil
 }
-func (s *treeSigningSessionsStore) Delete(roundId string) {
+func (s *treeSigningSessionsStore) Delete(_ context.Context, roundId string) error {
Goroutine channel pattern needs refactoring
The changes to panic/timeout handling added in the previous PR were not perfect, imo.
The current pattern uses recover() to catch panics from writing to closed channels:
func() {
defer func() {
if r := recover(); r != nil {
log.Warnf("watchNoncesCollected:recovered from panic: %v", r)
}
}()
select {
case ch <- struct{}{}:
case <-ctx.Done():
}
}()
Problems:
1. recover() masks a real race condition instead of fixing it
2. Race exists because Delete() closes channels while watchers might be sending
3. Complex nested selects add unnecessary cognitive load
Recommended fix - Watcher Ownership Pattern:
1. Watchers own the channels - only they send and close via defer close(ch)
2. Delete() never closes channels - just cancels context
3. Channels passed as arguments to goroutines (not read from struct fields)
func (s *treeSigningSessionsStore) watchNoncesCollected(
ctx context.Context, roundId string, ch chan struct{},
) {
defer close(ch) // watcher owns the channel
ticker := time.NewTicker(s.pollInterval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return // defer close(ch) runs
case <-ticker.C:
if s.checkNoncesReady(ctx, roundId) {
select {
case ch <- struct{}{}:
case <-ctx.Done():
}
return // defer close(ch) runs
}
}
}
}
func (s *treeSigningSessionsStore) Delete(ctx context.Context, roundId string) error {
// ... redis cleanup ...
s.lock.Lock()
defer s.lock.Unlock()
if s.currentRoundID == roundId && s.cancelRound != nil {
s.cancelRound() // watchers exit via ctx.Done(), close their channels
s.cancelRound = nil
s.currentRoundID = ""
s.nonceCh = nil // just nil, don't close
s.sigsCh = nil
}
return nil
}
This eliminates the race entirely - no recover() needed.
This is applicable to watchSigsCollected, watchNoncesCollected, and watchSessionCompletion.
Done for the confirmation session; it's trickier for the tree signing session because we use a map of channels. It's ok anyway because the lock guarantees atomicity. We'll optimize in a later PR, so we can get rid of the recover.
     Signatures: make(map[string]tree.TreePartialSigs),
 }
-s.sessions[roundId] = sess
+s.sessions[roundId] = session
Single-round model
Support for multiple rounds in tree_signing_session.go was added in the previous PR as well:
nonceChs map[string]chan struct{}
sigsChs map[string]chan struct{}
Currently we agreed that we have only one round and no parallel rounds being processed, and a single server with no cluster, so we should simplify and be consistent with the other Redis live-store implementations that only support one round.
Current struct uses maps for channels:
type treeSigningSessionsStore struct {
nonceChs map[string]chan struct{}
sigsChs map[string]chan struct{}
ctxs map[string]context.CancelFunc
}
Since there's only ever one active round at a time (confirmed by startRound() always calling Delete(existingRound.Id) first), these maps add unnecessary complexity.
Simplify to:
type treeSigningSessionsStore struct {
rdb *redis.Client
mu sync.RWMutex
// single active round (not maps)
currentRoundID string
nonceCh chan struct{}
sigsCh chan struct{}
cancelRound context.CancelFunc
pollInterval time.Duration
numOfRetries int
retryDelay time.Duration
}
Benefits:
- Simpler code, fewer allocations
- No map cleanup needed
- Matches actual usage pattern
- Easier to reason about lifecycle
this is in-memory cache, we'll add support once we are ready to support multiple batches
yeah but if we keep this we introduce inconsistency since in other stores we only support one round
 if !exists {
     return errors.TX_NOT_FOUND.New("offchain tx %s not found", txid).
         WithMetadata(errors.TxNotFoundMetadata{Txid: txid})
 offchainTx, err := s.cache.OffchainTxs().Get(ctx, txid)
The fallback masks potential bugs. We expect the cache to always have the data, so a miss indicates something is wrong. Silently falling back to the DB means:
- We won't know cache sync is broken
- Performance degrades (DB hit instead of cache)
- Could indicate data integrity issues
It's actually ok. We need to store the offchain tx in the cache because clients may call FinalizeTx before the db is updated; otherwise we could rely fully on the db. If we don't find the tx in the cache nor in the db, we return an error. If data is not found in the cache, maybe we deleted it manually; this is intentional, I don't see drawbacks.
 ctx := context.Background()
-existingRound := s.cache.CurrentRound().Get(ctx)
+existingRound, err := s.cache.CurrentRound().Get(ctx)
Cleanup not resilient to cache failures
In startRound(), if CurrentRound().Get() fails, the function returns early and no cleanup happens:
existingRound, err := s.cache.CurrentRound().Get(ctx)
if err != nil {
log.WithError(err).Error("failed to get current round from cache")
return // NO CLEANUP!
}
If Redis is temporarily unavailable after a round failure:
- Old ForfeitTxs, ConfirmationSessions, TreeSigingSessions remain
- New round is never created (server stuck)
Fix: Move unconditional cleanup before the Get(), and don't return early.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this happens, it means we can't access the cache and can't do anything, not even clean it up, so we just return and stop any operation here; in another PR we'll improve the handling of this scenario.
Actionable comments posted: 4
♻️ Duplicate comments (5)
internal/core/application/service.go (5)
2514-2521: Missing roundAborted = true on confirmation session cache failure. Per the past review, when s.cache.ConfirmationSessions().Get(ctx) fails, the code logs the error and calls round.Fail() but does not set roundAborted = true. This is inconsistent with other error paths in this function and may cause the defer block to behave unexpectedly.

The past review already flagged this issue. Apply the fix:

 case <-time.After(roundTiming.confirmationDuration()):
     session, err := s.cache.ConfirmationSessions().Get(ctx)
     if err != nil {
+        roundAborted = true
         log.WithError(err).Error("failed to get confirmation session from cache")
         round.Fail(errors.INTERNAL_ERROR.New(
             "failed to get confirmation session from cache: %s", err,
         ))
         return
     }
3344-3348: Error in connector index fetch silently aborts event propagation. This was flagged in a past review. When GetConnectorsIndexes fails, the function returns early without propagating the RoundFinalizationStarted event. Clients won't be notified of the round finalization.
3358-3363: Error fetching selected intents prevents RoundFailed propagation. This was flagged in a past review. When GetSelectedIntents fails, the RoundFailed event is not propagated to clients, leaving them unaware that the round failed.
3391-3395: Error in Init prevents BatchStarted event propagation. This was flagged in a past review. When s.cache.ConfirmationSessions().Init(ctx, hashedIntentIds) fails, the function returns early and the BatchStarted event is never sent to clients.
2278-2292: Potential nil pointer dereference when logging cache cleanup errors. If existingRound is nil (valid when no round exists yet), accessing existingRound.Id in the log messages at lines 2280, 2285, and 2290 will panic. The nil check at line 2293 comes after these log statements.

Apply this diff to guard against nil:

 // Reset the cache for the new batch
 if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
-    log.WithError(err).Warnf(
-        "failed to delete forfeit txs from cache for round %s", existingRound.Id,
-    )
+    roundId := "<none>"
+    if existingRound != nil {
+        roundId = existingRound.Id
+    }
+    log.WithError(err).Warnf("failed to delete forfeit txs from cache for round %s", roundId)
 }
 if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
-    log.WithError(err).Warnf(
-        "failed to delete spent vtxos from cache after round %s", existingRound.Id,
-    )
+    roundId := "<none>"
+    if existingRound != nil {
+        roundId = existingRound.Id
+    }
+    log.WithError(err).Warnf("failed to delete spent vtxos from cache after round %s", roundId)
 }
 if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
-    log.WithError(err).Errorf(
-        "failed to reset confirmation session from cache for round %s", existingRound.Id,
-    )
+    roundId := "<none>"
+    if existingRound != nil {
+        roundId = existingRound.Id
+    }
+    log.WithError(err).Errorf("failed to reset confirmation session from cache for round %s", roundId)
 }

Alternatively, move these cleanup operations inside the if existingRound != nil block, or extract the round ID once with a nil check before all log statements.
🧹 Nitpick comments (1)
internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
60-64: Consider a potential goroutine leak if there is no receiver. The goroutine sending to sessionCompleteCh will block indefinitely if no receiver calls SessionCompleted(). This is likely acceptable given the expected usage pattern, but consider using a buffered channel of size 1 if the signal only needs to be sent once.

-    sessionCompleteCh: make(chan struct{}),
+    sessionCompleteCh: make(chan struct{}, 1),

And in Init:

-    c.sessionCompleteCh = make(chan struct{})
+    c.sessionCompleteCh = make(chan struct{}, 1)
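A tiny standalone demonstration of why the one-slot buffer removes the blocking hazard:

package main

func main() {
    done := make(chan struct{}, 1)
    done <- struct{}{} // buffered: the first send never blocks

    unbuffered := make(chan struct{})
    _ = unbuffered
    // unbuffered <- struct{}{} // would block forever without a concurrent receiver
}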
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
internal/core/application/service.go (56 hunks)
internal/infrastructure/live-store/inmemory/confirmation_session.go (4 hunks)
internal/infrastructure/live-store/inmemory/forfeits.go (5 hunks)
internal/infrastructure/live-store/live_store_test.go (8 hunks)
internal/infrastructure/live-store/redis/confirmation_sessions.go (5 hunks)
internal/infrastructure/live-store/redis/forfeits.go (8 hunks)
internal/infrastructure/live-store/redis/tree_signing_session.go (6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
internal/infrastructure/live-store/redis/confirmation_sessions.go
internal/infrastructure/live-store/inmemory/forfeits.go
internal/infrastructure/live-store/inmemory/confirmation_session.go
internal/infrastructure/live-store/redis/forfeits.go
internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.
Applied to files:
internal/core/application/service.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern.
Applied to files:
internal/infrastructure/live-store/live_store_test.go
📚 Learning: 2025-09-29T14:33:52.871Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 722
File: pkg/ark-lib/intent/proof.go:52-58
Timestamp: 2025-09-29T14:33:52.871Z
Learning: In btcsuite/btcd's psbt package, the NewFromRawBytes function's boolean parameter (b64) automatically handles base64 decoding when set to true, so passing a base64 string via strings.NewReader with b64=true is the correct usage pattern. A common bug is manually base64-decoding the string and then passing b64=true, which causes a double-decode error.
Applied to files:
internal/infrastructure/live-store/live_store_test.go
🧬 Code graph analysis (6)
internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
- internal/core/ports/live_store.go (1)
  - `ConfirmationSessions` (112-116)

internal/infrastructure/live-store/inmemory/forfeits.go (3)
- pkg/ark-lib/tree/tx_tree.go (1)
  - `FlatTxTree` (36-36)
- internal/core/domain/vtxo.go (1)
  - `Outpoint` (15-18)
- internal/core/ports/tx_builder.go (1)
  - `ValidForfeitTx` (49-52)

internal/infrastructure/live-store/redis/tree_signing_session.go (3)
- internal/infrastructure/live-store/inmemory/tree_signing_session.go (1)
  - `NewTreeSigningSessionsStore` (19-26)
- internal/core/ports/live_store.go (2)
  - `TreeSigningSessionsStore` (70-80)
  - `MusigSigningSession` (104-110)
- pkg/ark-lib/tree/musig2.go (2)
  - `TreeNonces` (32-32)
  - `TreePartialSigs` (98-98)

internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
- internal/core/ports/live_store.go (1)
  - `ConfirmationSessions` (112-116)

internal/core/application/service.go (5)
- internal/core/ports/live_store.go (2)
  - `ConfirmationSessions` (112-116)
  - `TimedIntent` (92-97)
- internal/core/domain/round.go (1)
  - `Round` (41-59)
- internal/core/application/service_event.go (1)
  - `RoundFailed` (61-64)
- internal/core/application/round_report.go (1)
  - `BuildCommitmentTxStage` (16-16)
- internal/core/domain/conviction.go (2)
  - `Crime` (36-40)
  - `CrimeTypeMusig2InvalidSignature` (16-16)

internal/infrastructure/live-store/live_store_test.go (4)
- internal/core/application/types.go (2)
  - `Outpoint` (177-177)
  - `VOut` (131-131)
- internal/core/domain/vtxo.go (1)
  - `Outpoint` (15-18)
- internal/core/ports/live_store.go (1)
  - `ConfirmationSessions` (112-116)
- pkg/ark-lib/tree/musig2.go (2)
  - `TreeNonces` (32-32)
  - `TreePartialSigs` (98-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and Scan
- GitHub Check: unit tests
- GitHub Check: integration tests
🔇 Additional comments (41)
internal/infrastructure/live-store/redis/tree_signing_session.go (8)
38-50: LGTM! Constructor properly initializes all fields, including the new `numOfRetries` and `retryDelay` for the retry logic.
52-91: LGTM! The `New` method properly implements retry logic with WATCH/TxPipelined for transactional safety. The use of `context.Background()` for the watch goroutines is appropriate to decouple their lifetime from the request context.
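For context, the optimistic-locking shape with go-redis v9 looks roughly like this; the key name, retry count, and delay are illustrative, not the store's actual values:

```go
package store

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// setWithRetry sketches the WATCH/TxPipelined pattern: WATCH the key,
// queue writes in a MULTI/EXEC pipeline, and retry with a delay when
// another client modified the key first.
func setWithRetry(ctx context.Context, rdb *redis.Client, key, value string) error {
	const maxRetries = 10
	const retryDelay = 50 * time.Millisecond

	for i := 0; i < maxRetries; i++ {
		err := rdb.Watch(ctx, func(tx *redis.Tx) error {
			// Reads performed via tx here are protected by the WATCH on key.
			_, err := tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
				pipe.Set(ctx, key, value, 0)
				return nil
			})
			return err
		}, key)
		switch {
		case err == nil:
			return nil
		case err == redis.TxFailedErr:
			time.Sleep(retryDelay) // lost the optimistic-lock race: back off and retry
		default:
			return err // real failure: do not retry
		}
	}
	return fmt.Errorf("set %s: optimistic lock retries exhausted", key)
}
```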
93-153: LGTM! The `Get` method correctly handles the `redis.Nil` case by returning `nil, nil` for non-existent sessions, with proper error propagation for other failures.
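The miss-as-nil convention in isolation (the key prefix and return shape are illustrative):

```go
package store

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// getSession treats redis.Nil (missing key) as "no session for this
// round" rather than an error; only real failures are propagated.
func getSession(ctx context.Context, rdb *redis.Client, roundId string) (string, bool, error) {
	raw, err := rdb.Get(ctx, "signing:"+roundId).Result()
	if err == redis.Nil {
		return "", false, nil // non-existent session is not an error
	}
	if err != nil {
		return "", false, fmt.Errorf("get signing session %s: %w", roundId, err)
	}
	return raw, true, nil
}
```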
155-195: LGTM! The `Delete` method properly cleans up Redis state and local resources (cancel functions, channels) with correct retry logic.
197-223: LGTM! The `AddNonces` method properly validates session existence before writing and uses consistent retry logic.
225-251: LGTM! The `AddSignatures` method follows the same consistent pattern as `AddNonces`, with proper validation and retry logic.
265-314: LGTM! The `watchNoncesCollected` goroutine properly uses a ticker for polling and includes appropriate panic recovery and context cancellation handling.
367-378: LGTM! The `checkSessionExists` helper provides clear validation with descriptive error messages.

internal/infrastructure/live-store/inmemory/confirmation_session.go (3)
21-26: LGTM! Constructor properly initializes the mutex and completion channel.
28-42: LGTM! The `Init` method properly creates a fresh `sessionCompleteCh` for each new session.
92-96: LGTM! The `Initialized` method properly uses a read lock for thread-safe access.

internal/infrastructure/live-store/redis/confirmation_sessions.go (4)
49-63: LGTM! Constructor properly initializes the store and starts the background watch goroutine with a cancellable context.
65-96: LGTM! The `Init` method now correctly passes all relevant keys to `Watch` for proper optimistic locking, addressing the previous review feedback.
135-159: LGTM! The `Get` method properly reads all confirmation state with appropriate error handling.
212-256: LGTM! The `watchSessionCompletion` goroutine properly handles polling and context cancellation, and includes panic recovery. The use of `sync.Once` ensures the completion signal is only sent once.
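Stripped of store details, the watcher pattern looks roughly like this; the poll interval, condition, and channel wiring are all illustrative:

```go
package store

import (
	"context"
	"log"
	"sync"
	"time"
)

// watchCompletion polls a condition until it holds or ctx is cancelled.
// The sync.Once guarantees the completion signal fires at most once
// even if several watchers race, and the deferred recover keeps a
// panic inside the goroutine from killing the process.
func watchCompletion(ctx context.Context, done func() bool, notify chan<- struct{}) {
	var once sync.Once
	go func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("watcher panicked: %v", r)
			}
		}()
		ticker := time.NewTicker(100 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				if done() {
					once.Do(func() {
						select {
						case notify <- struct{}{}:
						case <-ctx.Done():
						}
					})
					return
				}
			}
		}
	}()
}
```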
internal/infrastructure/live-store/inmemory/forfeits.go (5)

30-88: LGTM! The `Init` method properly initializes forfeit state with correct locking and validation.
90-116: LGTM! The `Sign` method properly validates state and uses the builder for transaction verification.
127-133: Deadlock risk resolved, but the Reset error is ignored.

The defer ordering now correctly unlocks before calling `Reset`, avoiding the deadlock. However, the `// nolint` comment suggests the `Reset` error return is intentionally ignored. Since `Reset` for the in-memory store always returns `nil`, this is acceptable, but documenting why the error is ignored would improve maintainability.
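Since deferred calls run last-in, first-out, the fix relies purely on registration order. A self-contained illustration with placeholder types:

```go
package store

import "sync"

type forfeits struct {
	lock sync.Mutex
	txs  map[string]string
}

// reset takes the lock itself, so it must only run after Pop's unlock.
func (f *forfeits) reset() {
	f.lock.Lock()
	defer f.lock.Unlock()
	f.txs = make(map[string]string)
}

// Pop registers reset first and Unlock second; defers run LIFO, so
// Unlock executes before reset and the re-lock inside reset cannot
// self-deadlock. Names are illustrative.
func (f *forfeits) Pop() map[string]string {
	f.lock.Lock()
	defer f.reset()       // runs second
	defer f.lock.Unlock() // runs first: lock released before reset
	out := f.txs
	f.txs = nil
	return out
}
```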
153-168: LGTM! The `AllSigned` method now properly acquires a read lock before iterating over `forfeitTxs`, addressing the previous data race concern.
188-194: LGTM! The `GetConnectorsIndexes` method now returns a cloned map via `maps.Clone`, preventing external mutation of internal state.
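The defensive-copy idiom in isolation, with placeholder types; the stdlib `maps` package requires Go 1.21+:

```go
package store

import "maps"

type connectors struct {
	indexes map[string]int
}

// GetConnectorsIndexes returns a shallow copy so callers cannot
// mutate the store's internal map. maps.Clone returns nil for a
// nil input, which is safe for callers that only read.
func (c *connectors) GetConnectorsIndexes() map[string]int {
	return maps.Clone(c.indexes)
}
```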
internal/infrastructure/live-store/redis/forfeits.go (7)

29-38: LGTM! Constructor properly initializes all fields, including the retry configuration.
40-116: LGTM! The `Init` method properly marshals data, validates constraints, and uses consistent retry logic with WATCH/TxPipelined.
118-182: LGTM! The `Sign` method properly validates state, verifies transactions through the builder, and stores results with consistent retry logic.
184-206: LGTM! The `Reset` method now includes `time.Sleep(s.retryDelay)` between retry attempts, addressing the previous review feedback.
208-237: LGTM! The `Pop` method properly validates forfeit data and handles the `Reset` error return, unlike the in-memory version.
239-264: LGTM! The `AllSigned` method properly checks all forfeits and includes error handling for malformed data.
266-316: LGTM! The remaining methods (`GetUnsignedInputs`, `Len`, `GetConnectorsIndexes`) properly handle Redis reads with appropriate error handling. The typo "fet" has been corrected to "get".

internal/core/application/service.go (7)
516-532: Context-aware cache check pattern looks correct.

The error handling for `s.cache.OffchainTxs().Includes(ctx, vtxo.Outpoint)` properly logs the error and returns an internal error to the client without exposing internal details. The separation of the error check from the spent check is clean.
1107-1119: Fallback to DB on cache miss is noted but acceptable.

Per a past review comment from sekulicd, falling back silently to the DB when the cache lookup fails can mask cache sync issues. However, the error is now logged at Error level (line 1109), which provides visibility. The fallback ensures the operation can complete even during transient cache issues.
2374-2380: Early return on cache error before defer setup.

When `s.cache.CurrentRound().Get(ctx)` fails, the function returns early before the defer is set up. This is acceptable since `s.startRound()` is called to restart the round lifecycle, and `startRound` handles its own cleanup. However, the events from this partial round attempt won't be saved.
2407-2418: Proper abort handling for cache errors.

The pattern correctly sets `roundAborted = true` before returning when `s.cache.Intents().Len(ctx)` fails. This ensures the defer block at line 2387 will properly restart the round instead of proceeding to finalization.
2605-2621: Good error recovery pattern with explicit event creation.

When `s.cache.CurrentRound().Get(ctx)` fails in `startFinalization`, the code creates a synthetic `RoundFailed` event and saves it to ensure clients are notified. This is a robust pattern that maintains event consistency even during cache failures.
2967-2984: Consistent error recovery pattern in finalizeRound.

The error handling when `s.cache.CurrentRound().Get(ctx)` fails mirrors the pattern in `startFinalization`: creating a synthetic `RoundFailed` event and saving it before restarting the round. This consistency is good for maintainability.
3468-3498: Context and error handling in checkForfeitsAndBoardingSigsSent.

The function correctly uses `context.Background()` since it runs as a goroutine that may outlive its caller. The error handling logs issues and returns early, which is appropriate for a status-checking helper function.

internal/infrastructure/live-store/live_store_test.go (7)
111-215: IntentStore test properly updated for the context-aware API.

The test correctly uses `t.Context()` for a per-test context and updates all store method calls to pass the context. Error handling assertions are properly placed after each operation that can fail.
234-243: Concurrent forfeit signing test is well-structured.

The test properly uses a `sync.WaitGroup` to coordinate concurrent `Sign` operations and waits for all goroutines to complete before asserting the final state. This tests the thread-safety of the forfeit signing mechanism.
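The shape of such a test, reduced to a standalone sketch (the `sign` closure stands in for the real store call):

```go
package store_test

import (
	"sync"
	"testing"
)

// TestConcurrentSign sketches the pattern: N goroutines hit the same
// operation, Wait blocks until all finish, then the final state is
// asserted exactly once.
func TestConcurrentSign(t *testing.T) {
	var (
		mu    sync.Mutex
		count int
	)
	sign := func() {
		mu.Lock()
		defer mu.Unlock()
		count++
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sign()
		}()
	}
	wg.Wait()

	if count != 8 {
		t.Fatalf("expected 8 signatures, got %d", count)
	}
}
```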
308-348: OffChainTxStore test has good coverage.

The test covers all key operations (Add, Get, Includes, Remove) with proper error handling assertions. It correctly tests both existing and non-existent entries for the Get and Remove operations.
374-389: Concurrent confirmation test with proper timeout handling.

The test uses a `sync.WaitGroup` for concurrent confirmations and a `select` with a timeout to wait for session completion. The 5-second timeout is reasonable for test stability.
717-750: Thorough concurrent signature submission test.

The test verifies both successful concurrent signature submission and that subsequent overwrite attempts don't modify existing signatures. The `sigsMatch` helper provides detailed error messages for debugging test failures.
45-54: Pre-computed intent ID hashes for confirmation tests.

The fixtures pre-compute SHA256 hashes of intent IDs, which aligns with how the confirmation session identifies intents by their hash. This approach keeps the test setup clean and deterministic.
533-541: No issue found. Project supports `sync.WaitGroup.Go`.

The project requires Go 1.25.3 (from go.mod), which fully supports `sync.WaitGroup.Go()`, added in Go 1.25. The code is compatible with the project's minimum Go version.
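For readers who haven't used it yet, a minimal sketch of the `sync.WaitGroup.Go` helper (assuming Go 1.25+; the loop count and closure are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu    sync.Mutex
		count int
	)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		// wg.Go wraps the Add(1)/Done bookkeeping around the function,
		// replacing the manual go func(){ defer wg.Done(); ... }() pattern.
		wg.Go(func() {
			mu.Lock()
			defer mu.Unlock()
			count++
		})
	}
	wg.Wait()
	fmt.Println(count) // 8
}
```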
Actionable comments posted: 2
♻️ Duplicate comments (7)
internal/core/application/service.go (7)
516-521: Error handling pattern: context discarded in the user-facing message.

The code logs the actual error but returns a generic "something went wrong" message. This pattern appears throughout the file and was previously flagged by reviewers as problematic because:
- Logs and returns handle the error twice
- Original error context is lost for the client
- Debugging becomes harder
However, per past review discussions, this was deferred to a future PR to avoid making this one too messy.
1107-1119: Silent fallback masks cache sync issues.

When `cache.OffchainTxs().Get()` returns nil (not an error), the code silently falls back to the database. As noted in a previous review by sekulicd, this fallback masks potential bugs:
- Cache is expected to always have the data
- A miss indicates cache sync is broken
- Silent DB fallback means degraded performance goes unnoticed
Consider logging a warning when falling back to the DB to make cache misses visible for debugging.
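One hedged way to make the fallback visible, assuming illustrative `txGetter` interfaces rather than the real cache/DB ports:

```go
package app

import (
	"context"

	log "github.com/sirupsen/logrus"
)

type Tx struct{ Txid string }

type txGetter interface {
	Get(ctx context.Context, txid string) (*Tx, error)
}

type service struct {
	cache txGetter
	db    txGetter
}

// getOffchainTx reads from the cache first; a miss or cache error is
// logged as a warning before falling back to the DB, so broken cache
// sync does not degrade silently.
func (s *service) getOffchainTx(ctx context.Context, txid string) (*Tx, error) {
	tx, err := s.cache.Get(ctx, txid)
	switch {
	case err != nil:
		log.WithError(err).Warnf("cache lookup failed for %s, falling back to DB", txid)
	case tx != nil:
		return tx, nil
	default:
		log.Warnf("cache miss for %s, falling back to DB", txid)
	}
	return s.db.Get(ctx, txid)
}
```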
3345-3349: Error in connector index fetch silently aborts event propagation.

When `GetConnectorsIndexes` fails, the function returns early without propagating the round finalization event. This leaves clients without notification of the round completion, which is critical for them to process results.

Consider logging the error and continuing with an empty connector map, or propagate a partial event that indicates connector data is unavailable.
3359-3364: RoundFailed event not propagated when the intent fetch fails.

If `GetSelectedIntents` fails, the `RoundFailed` event is never sent to clients. This is critical because clients need to know the round failed to properly handle their pending intents and retry.

Consider propagating the failure event with empty topics rather than silently returning, so clients are notified even when intent retrieval fails.
3392-3395: BatchStarted event suppressed on Init failure.

When `ConfirmationSessions().Init()` fails, the function returns early and the `BatchStarted` event is never propagated. This prevents clients from knowing that a batch was attempted, which could lead to stale state on the client side.

Consider propagating the event even if session initialization fails, or ensure callers can handle the missing event gracefully.
2515-2522: Missing `roundAborted = true` on the cache failure path.

When `s.cache.ConfirmationSessions().Get(ctx)` fails, the code calls `round.Fail()` and returns but does not set `roundAborted = true`. This is inconsistent with other error paths in this function (e.g., lines 2410-2413, 2427-2429).

The defer block at lines 2388-2406 checks `roundAborted` to decide whether to call `startRound()` or `startFinalization()`. Without setting `roundAborted = true`, the defer will attempt finalization instead of properly aborting and restarting the round.

Apply this diff to ensure consistent abort behavior:

```diff
 case <-time.After(roundTiming.confirmationDuration()):
 	session, err := s.cache.ConfirmationSessions().Get(ctx)
 	if err != nil {
+		roundAborted = true
 		log.WithError(err).Error("failed to get confirmation session from cache")
 		round.Fail(errors.INTERNAL_ERROR.New(
 			"failed to get confirmation session from cache: %s", err,
 		))
 		return
 	}
```
2553-2558: Missing `roundAborted = true` on boarding inputs cache failure.

Similar to the confirmation session error handling, when `s.cache.BoardingInputs().Set(ctx, numOfBoardingInputs)` fails, the code calls `round.Fail()` but does not set `roundAborted = true`. This inconsistency causes the defer block to attempt saving events and calling `startFinalization()` instead of properly aborting the round.

Apply this diff:

```diff
 if err := s.cache.BoardingInputs().Set(ctx, numOfBoardingInputs); err != nil {
+	roundAborted = true
 	round.Fail(errors.INTERNAL_ERROR.New(
 		"failed to update boarding inputs in cache: %s", err,
 	))
 	return
 }
```
🧹 Nitpick comments (1)
internal/core/application/service.go (1)
2293-2301: Redundant nil check inside non-nil block.

Line 2293 checks `if existingRound != nil` again, but this entire code block (lines 2277-2312) is already inside the `if existingRound != nil` check from line 2276. The inner check at line 2293 is redundant.

Apply this diff to remove the redundant check:

```diff
 	}
 	if existingRound != nil {
 		// Reset the cache for the new batch
 		if err := s.cache.ForfeitTxs().Reset(ctx); err != nil {
 			log.WithError(err).Warnf(
 				"failed to delete forfeit txs from cache for round %s", existingRound.Id,
 			)
 		}
 		if err := s.cache.Intents().DeleteVtxos(ctx); err != nil {
 			log.WithError(err).Warnf(
 				"failed to delete spent vtxos from cache after round %s", existingRound.Id,
 			)
 		}
 		if err := s.cache.ConfirmationSessions().Reset(ctx); err != nil {
 			log.WithError(err).Errorf(
 				"failed to reset confirmation session from cache for round %s", existingRound.Id,
 			)
 		}
-		if existingRound != nil {
-			if existingRound.Id != "" {
-				if err := s.cache.TreeSigingSessions().Delete(ctx, existingRound.Id); err != nil {
-					log.WithError(err).Errorf(
-						"failed to delete tree signing sessions for round from cache %s",
-						existingRound.Id,
-					)
-				}
-			}
-			if existingRound.CommitmentTxid != "" {
-				if err := s.cache.BoardingInputs().DeleteSignatures(
-					ctx, existingRound.CommitmentTxid,
-				); err != nil {
-					log.WithError(err).Errorf(
-						"failed to delete boarding input signatures from cache for round %s",
-						existingRound.Id,
-					)
-				}
-			}
+		if existingRound.Id != "" {
+			if err := s.cache.TreeSigingSessions().Delete(ctx, existingRound.Id); err != nil {
+				log.WithError(err).Errorf(
+					"failed to delete tree signing sessions for round from cache %s",
+					existingRound.Id,
+				)
+			}
+		}
+		if existingRound.CommitmentTxid != "" {
+			if err := s.cache.BoardingInputs().DeleteSignatures(
+				ctx, existingRound.CommitmentTxid,
+			); err != nil {
+				log.WithError(err).Errorf(
+					"failed to delete boarding input signatures from cache for round %s",
+					existingRound.Id,
+				)
+			}
 		}
 	}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- internal/core/application/service.go (56 hunks)
- internal/core/domain/offchain_tx.go (1 hunks)
- internal/infrastructure/live-store/inmemory/confirmation_session.go (4 hunks)
- internal/infrastructure/live-store/redis/confirmation_sessions.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-28T08:21:01.170Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 686
File: internal/core/application/fraud.go:47-61
Timestamp: 2025-08-28T08:21:01.170Z
Learning: In reactToFraud function in internal/core/application/fraud.go, the goroutine that waits for confirmation and schedules checkpoint sweep should use context.Background() instead of the request context, as this is intentional design to decouple the checkpoint sweep scheduling from the request lifetime.
Applied to files:
- internal/infrastructure/live-store/inmemory/confirmation_session.go
- internal/infrastructure/live-store/redis/confirmation_sessions.go
- internal/core/application/service.go
📚 Learning: 2025-08-19T10:58:41.042Z
Learnt from: louisinger
Repo: arkade-os/arkd PR: 691
File: internal/core/application/service.go:557-562
Timestamp: 2025-08-19T10:58:41.042Z
Learning: In the arkd SubmitOffchainTx method, using the checkpoint PSBT input's tapscript (forfeit path) for the VtxoInput.Tapscript field is the correct behavior, not a bug as initially thought. The system correctly handles the relationship between checkpoint inputs and Ark transaction inputs.
Applied to files:
internal/core/application/service.go
🧬 Code graph analysis (4)
internal/core/domain/offchain_tx.go (1)
- internal/core/domain/round.go (1)
  - `Stage` (30-34)

internal/infrastructure/live-store/inmemory/confirmation_session.go (1)
- internal/core/ports/live_store.go (1)
  - `ConfirmationSessions` (112-116)

internal/infrastructure/live-store/redis/confirmation_sessions.go (1)
- internal/core/ports/live_store.go (1)
  - `ConfirmationSessions` (112-116)

internal/core/application/service.go (8)
- api-spec/protobuf/gen/ark/v1/types.pb.go (6)
  - `Outpoint` (24-30)
  - `Outpoint` (43-43)
  - `Outpoint` (58-60)
  - `Intent` (441-447)
  - `Intent` (460-460)
  - `Intent` (475-477)
- internal/infrastructure/wallet/wallet_client.go (1)
  - `New` (31-56)
- internal/core/ports/live_store.go (1)
  - `ConfirmationSessions` (112-116)
- internal/core/domain/round.go (2)
  - `Round` (41-59)
  - `ForfeitTx` (36-39)
- internal/core/domain/round_event.go (2)
  - `RoundFailed` (39-43)
  - `RoundEvent` (9-12)
- internal/core/application/round_report.go (4)
  - `BuildCommitmentTxStage` (16-16)
  - `CreateTreeNoncesOp` (24-24)
  - `TreeSigningStage` (17-17)
  - `ForfeitTxsCollectionStage` (18-18)
- internal/core/domain/conviction.go (2)
  - `Crime` (36-40)
  - `CrimeTypeMusig2InvalidSignature` (16-16)
- pkg/ark-lib/tree/tx_tree.go (1)
  - `TxTree` (19-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and Scan
- GitHub Check: integration tests
- GitHub Check: unit tests
🔇 Additional comments (8)
internal/core/domain/offchain_tx.go (1)
244-245: LGTM: Clean state reset on finalization.

Explicitly clearing the failure flag and reason when an offchain transaction is finalized ensures a clean, consistent state. This defensive approach prevents stale failure information from persisting after successful finalization.
internal/core/application/service.go (1)
3469-3499: LGTM: Context usage aligns with design intent.

This function correctly uses `context.Background()` on line 3469. Based on learnings, background goroutines that manage asynchronous state checks should be decoupled from request lifetimes to prevent premature cancellation. The error handling appropriately logs and returns early when cache operations fail.

internal/infrastructure/live-store/inmemory/confirmation_session.go (2)
84-87: LGTM: Nil-safe channel management.

The nil check before closing and the creation of a new channel ensure safe lifecycle management. This addresses the previous review concern about potential panics from closing a nil channel.
28-99: LGTM: Context-aware API aligns with the live-store interface.

All public methods now accept `context.Context` and return appropriate error types. While the context is currently unused in the in-memory implementation (as indicated by the `_` prefix), this aligns the API surface with the Redis implementation and prepares for future context-aware operations like cancellation or timeout handling.

internal/infrastructure/live-store/redis/confirmation_sessions.go (4)
71-95: LGTM: Watch keys and retry delay properly implemented.

The `Init` method now correctly:

- specifies the keys to watch (line 90: `keys...`), enabling proper optimistic locking
- includes a retry delay between attempts (line 93)
- clears existing data before setting new values

These changes address previous review concerns about missing watch keys and tight retry loops.
114-127: LGTM: Transactional read within the Watch callback.

Line 116 now correctly uses `tx.Get` instead of `s.rdb.Get`, ensuring the read of `confirmationNumConfirmedKey` is part of the watched transaction. This prevents race conditions where another process modifies the counter between the read and the write.

The keys are properly passed to `Watch` at line 127, and the retry delay is included at line 130.
170-184: LGTM: Retry delay now included in Reset.

The retry loop now includes `time.Sleep(s.retryDelay)` at line 183, addressing the previous review concern about tight-loop retries during contention. This matches the retry pattern in `Init` and `Confirm`.
65-205: LGTM: Context-aware API with robust retry patterns.

The Redis implementation now provides a context-aware API surface that matches the in-memory version. Key improvements:

- all public methods accept `context.Context` for cancellation/timeout support
- consistent retry patterns with proper delays across all methods
- explicit error returns with informative messages
- optimistic locking with WATCH on the appropriate keys

The implementation addresses all concerns from previous reviews regarding race conditions, missing watch keys, and tight retry loops.
This adds a context parameter and an error return value to all methods of the interfaces that compose `ports.LiveStore`. In the app service, this moves away from relying fully on the cache: cached data is now stored in variables that are reused in the rest of the function bodies.
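A before/after sketch of the shape of the change; the real `ports.LiveStore` interfaces carry more methods and different signatures, so the two interfaces below are illustrative only:

```go
package ports

import "context"

// Before (illustrative): no context, no error reporting, so a
// Redis-backed implementation could neither honor cancellation
// nor surface I/O failures.
type confirmationSessionsBefore interface {
	Init(intentHashes [][32]byte)
	Confirm(intentId string)
	Initialized() bool
}

// After (illustrative): every method takes a ctx and can fail,
// letting callers propagate deadlines and handle store errors.
type confirmationSessionsAfter interface {
	Init(ctx context.Context, intentHashes [][32]byte) error
	Confirm(ctx context.Context, intentId string) error
	Initialized(ctx context.Context) (bool, error)
}
```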
Please @louisinger @sekulicd review.
Summary by CodeRabbit
Bug Fixes
Improvements
New Features
Tests