Skip to content

Conversation

@joroshiba
Copy link
Member

Summary

Gives conductor a larger backup when attempting to fetch a block commit and validator set, and adds a debug log so we can see if getting errors which are not retried through logs.

Background

When a sequencer restarts or ingress has a bounce, we tend to get lagging firm commits because we can't fetch the sequencer information and conductor drops the block from its queue. This change increases the amount of time it will go, noting the retries here are on an exponential backoff.

Changes

  • Backoff max time increased from 10s to 5m on fetch commit and fetch validators
  • new debug log

@joroshiba joroshiba requested a review from a team as a code owner November 3, 2025 19:51
@github-actions github-actions bot added the conductor pertaining to the astria-conductor crate label Nov 3, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Increased retry backoff max delay from 10s to 5 minutes for fetching sequencer commits and validators to handle ingress/sequencer restarts better.

  • Increased max_delay from 10s to 300s (5 minutes) in fetch_commit_with_retry and fetch_validators_with_retry
  • Added debug log when non-retryable errors are encountered (but with incorrect syntax)
  • Uses exponential backoff starting at 100ms, doubling until reaching the 5-minute cap
  • Only retries on HTTP/Timeout errors; other errors break immediately

Confidence Score: 3/5

  • This PR has a syntax error that will prevent logging from working correctly
  • The retry logic change is sensible and addresses a real operational issue. However, there's a syntax error in the new debug log that needs to be fixed. Additionally, the hard-coded 300s value should ideally be configurable per the custom instructions, though this is a style concern rather than a blocker
  • Pay close attention to crates/astria-conductor/src/celestia/verify.rs - fix the debug log syntax error before merging

Important Files Changed

File Analysis

Filename Score Overview
crates/astria-conductor/src/celestia/verify.rs 3/5 Increased retry max delay from 10s to 5m and added debug log with incorrect field syntax

Sequence Diagram

sequenceDiagram
    participant Conductor
    participant CometBftRetryStrategy
    participant SequencerClient
    participant Sequencer

    Conductor->>SequencerClient: fetch_commit_with_retry(height)
    activate SequencerClient
    
    loop Retry with exponential backoff (max 5 min)
        SequencerClient->>Sequencer: commit(height)
        alt Retryable Error (HTTP/Timeout)
            Sequencer-->>SequencerClient: Error
            CometBftRetryStrategy->>CometBftRetryStrategy: Check error type
            CometBftRetryStrategy->>CometBftRetryStrategy: Double delay (exponential backoff)
            Note over CometBftRetryStrategy: Max delay capped at 300s (5min)<br/>Previously 10s
            CometBftRetryStrategy-->>SequencerClient: RetryPolicy::Delay
        else Non-retryable Error
            Sequencer-->>SequencerClient: Error
            CometBftRetryStrategy->>CometBftRetryStrategy: Check error type
            CometBftRetryStrategy->>CometBftRetryStrategy: Log debug message
            Note over CometBftRetryStrategy: New debug log added
            CometBftRetryStrategy-->>SequencerClient: RetryPolicy::Break
            SequencerClient-->>Conductor: Return Error
        else Success
            Sequencer-->>SequencerClient: Commit Response
            SequencerClient-->>Conductor: Return Commit
        end
    end
    
    deactivate SequencerClient
    
    Conductor->>SequencerClient: fetch_validators_with_retry(height)
    activate SequencerClient
    
    loop Retry with exponential backoff (max 5 min)
        SequencerClient->>Sequencer: validators(height)
        alt Retryable Error (HTTP/Timeout)
            Sequencer-->>SequencerClient: Error
            CometBftRetryStrategy->>CometBftRetryStrategy: Check error type
            CometBftRetryStrategy->>CometBftRetryStrategy: Double delay (exponential backoff)
            Note over CometBftRetryStrategy: Max delay capped at 300s (5min)<br/>Previously 10s
            CometBftRetryStrategy-->>SequencerClient: RetryPolicy::Delay
        else Non-retryable Error
            Sequencer-->>SequencerClient: Error
            CometBftRetryStrategy->>CometBftRetryStrategy: Check error type
            CometBftRetryStrategy->>CometBftRetryStrategy: Log debug message
            Note over CometBftRetryStrategy: New debug log added
            CometBftRetryStrategy-->>SequencerClient: RetryPolicy::Break
            SequencerClient-->>Conductor: Return Error
        else Success
            Sequencer-->>SequencerClient: Validators Response
            SequencerClient-->>Conductor: Return Validators
        end
    end
    
    deactivate SequencerClient
Loading

1 file reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

let retry_config = RetryFutureConfig::new(u32::MAX)
.custom_backoff(CometBftRetryStrategy::new(Duration::from_millis(100)))
.max_delay(Duration::from_secs(10))
.max_delay(Duration::from_secs(300))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: hard-coded 300s (5min) max delay should be configurable via environment variable, similar to other timeout values in the codebase that have XXX: This should probably be configurable comments

Context Used: Rule from dashboard - Hard-coded values that may vary between different networks (local, mocha, mainnet) should be configu... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/astria-conductor/src/celestia/verify.rs
Line: 298:298

Comment:
**style:** hard-coded 300s (5min) max delay should be configurable via environment variable, similar to other timeout values in the codebase that have `XXX: This should probably be configurable` comments

**Context Used:** Rule from `dashboard` - Hard-coded values that may vary between different networks (local, mocha, mainnet) should be configu... ([source](https://app.greptile.com/review/custom-context?memory=3a7f89a9-1179-4b2d-b733-e3acbebacdde))

How can I resolve this? If you propose a fix, please make it concise.

let retry_config = RetryFutureConfig::new(u32::MAX)
.custom_backoff(CometBftRetryStrategy::new(Duration::from_millis(100)))
.max_delay(Duration::from_secs(10))
.max_delay(Duration::from_secs(300))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: hard-coded 300s (5min) max delay should be configurable via environment variable, similar to other timeout values in the codebase that have XXX: This should probably be configurable comments

Context Used: Rule from dashboard - Hard-coded values that may vary between different networks (local, mocha, mainnet) should be configu... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/astria-conductor/src/celestia/verify.rs
Line: 333:333

Comment:
**style:** hard-coded 300s (5min) max delay should be configurable via environment variable, similar to other timeout values in the codebase that have `XXX: This should probably be configurable` comments

**Context Used:** Rule from `dashboard` - Hard-coded values that may vary between different networks (local, mocha, mainnet) should be configu... ([source](https://app.greptile.com/review/custom-context?memory=3a7f89a9-1179-4b2d-b733-e3acbebacdde))

How can I resolve this? If you propose a fix, please make it concise.

@joroshiba joroshiba enabled auto-merge November 3, 2025 20:00
@joroshiba joroshiba added this pull request to the merge queue Nov 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conductor pertaining to the astria-conductor crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants