Implement end-to-end delivery acknowledgment for checkpoint safety #33

@writeameer

Problem

The ingester checkpoints (advances the LSN) as soon as it has written to the OS pipe, not after the output provider confirms delivery to the destination (e.g., ASB). If the output provider crashes after receiving data but before delivering it, that data is lost, because the ingester has already recorded it as delivered.

This is a non-negotiable design requirement: a payload must not be marked as saved without confirmation from the output provider.

Current Flow (broken)

Ingester writes to stdout (pipe) → checkpoint LSN ← TOO EARLY
    ↓
CLI relays to output provider stdin
    ↓
Output provider sends to ASB ← confirmation never flows back

Proposed Flow

Ingester writes to stdout (pipe) → does NOT checkpoint yet
    ↓
CLI relays to output provider stdin
    ↓
Output provider sends to ASB → writes ack to stdout: {"ack":N}
    ↓
CLI relays ack back to ingester stdin
    ↓
Ingester receives ack → NOW checkpoint LSN
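
To make the ack line in the proposed flow concrete, here is a minimal Go sketch of how a reader might recognise it. It assumes the `{"ack":N}` count form shown above; `ackLine` and `ParseAck` are hypothetical names, not existing dstream code, and the `{"ack_lsn": ...}` variant is not handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ackLine mirrors the {"ack": N} shape from the proposed flow (assumed format).
// The pointer distinguishes "field absent" from an ack of 0.
type ackLine struct {
	Count *int64 `json:"ack"`
}

// ParseAck is a hypothetical helper: it reports whether line is an ack
// and, if so, how many messages the ack covers.
func ParseAck(line []byte) (count int64, ok bool) {
	var a ackLine
	if err := json.Unmarshal(line, &a); err != nil || a.Count == nil {
		return 0, false
	}
	return *a.Count, true
}

func main() {
	for _, l := range []string{`{"ack":3}`, `{"id":1,"op":"insert"}`, `not json`} {
		n, ok := ParseAck([]byte(l))
		fmt.Printf("%-26s ack=%v count=%d\n", l, ok, n)
	}
}
```

One caveat with this approach: a data payload that happens to contain a top-level `ack` field would be misread as an ack, so the protocol spec may want a less ambiguous envelope.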

Design Considerations

  1. Ingester must read acks from stdin: it currently closes stdin after receiving its config. It needs to keep stdin open and read ack lines.
  2. Output provider writes acks to stdout: after each batch is confirmed by ASB, emit {"ack": <count>} or {"ack_lsn": "<lsn>"}.
  3. CLI relays acks (output provider stdout → ingester stdin): the CLI already reads output provider stdout (to forward it to os.Stdout). It needs to detect ack lines and route them back to the ingester.
  4. Batching: acks should cover batches, not individual messages. This aligns with ASB batched delivery, which is also needed (see #32, "Document checkpoint-delivery semantics across architectures").
  5. Timeout: if no ack arrives within a configurable window, the ingester should stop advancing the LSN and log an error.
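
The relay step (item 3) could look roughly like the Go sketch below. It is an illustration under assumptions, not the CLI's actual code: `RouteLines`, `isAck`, and `routeString` are hypothetical names, acks are assumed to use the `{"ack": <count>}` form, and ack lines are sent only to the ingester rather than also being forwarded to os.Stdout.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// isAck reports whether line is a JSON object with a numeric "ack" field.
func isAck(line []byte) bool {
	var probe struct {
		Ack *int64 `json:"ack"`
	}
	return json.Unmarshal(line, &probe) == nil && probe.Ack != nil
}

// RouteLines reads the output provider's stdout line by line. Ack lines go
// to ackW (the ingester's stdin); everything else goes to passW (os.Stdout
// in the real CLI).
func RouteLines(providerOut io.Reader, ackW, passW io.Writer) error {
	sc := bufio.NewScanner(providerOut)
	for sc.Scan() {
		dst := passW
		if isAck(sc.Bytes()) {
			dst = ackW
		}
		if _, err := fmt.Fprintf(dst, "%s\n", sc.Bytes()); err != nil {
			return err
		}
	}
	return sc.Err()
}

// routeString runs RouteLines over in-memory buffers for demonstration.
func routeString(in string) (acks, passthrough string, err error) {
	var a, p bytes.Buffer
	err = RouteLines(strings.NewReader(in), &a, &p)
	return a.String(), p.String(), err
}

func main() {
	acks, rest, err := routeString("{\"ack\":2}\nprovider started\n{\"ack\":5}\n")
	fmt.Printf("to ingester: %q\nto stdout:   %q\nerr: %v\n", acks, rest, err)
}
```

Note that `bufio.Scanner` rejects lines longer than 64 KiB by default; a real relay would either raise that limit with `Scanner.Buffer` or read with `bufio.Reader`.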

Impact

  • Changes in: dstream CLI (relay acks), dstream-ingester-mssql (read acks, defer checkpoint), dstream-out-asb (emit acks), dstream-sdk-dotnet (ack support in SDK)
  • Protocol extension: adds an ack message type to the output provider's stdout
  • The ChangePublisher interface in the ingester needs to change: PublishChanges must not report success until the ack is received
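
As a rough illustration of the deferred checkpoint and the timeout, the Go sketch below blocks until a matching ack arrives and only then advances the LSN. Everything here is hypothetical: `PublishBatch`, `Checkpointer`, `ErrAckTimeout`, and the string LSNs are stand-ins, and the real ChangePublisher signature may differ. Acks are assumed to arrive on a channel fed by whatever reads the ingester's stdin.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrAckTimeout is returned when no ack arrives within the configured window.
var ErrAckTimeout = errors.New("no ack within timeout")

// Checkpointer stands in for whatever persists the LSN.
type Checkpointer struct{ LSN string }

// PublishBatch writes a batch, waits for an ack covering it, then checkpoints.
// On any error the stored LSN is left untouched.
func PublishBatch(ctx context.Context, write func() error, acks <-chan int64,
	count int64, lsn string, cp *Checkpointer, timeout time.Duration) error {
	if err := write(); err != nil {
		return err
	}
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case n, ok := <-acks:
		if !ok {
			return errors.New("ack stream closed before delivery was confirmed")
		}
		if n != count {
			return fmt.Errorf("ack covers %d messages, batch had %d", n, count)
		}
	case <-timer.C:
		return ErrAckTimeout
	case <-ctx.Done():
		return ctx.Err()
	}
	cp.LSN = lsn // only reached after a matching ack
	return nil
}

// runDemo publishes one 2-message batch. If ackArrives is true, the ack is
// already queued; otherwise the call times out after 50ms.
func runDemo(ackArrives bool) (string, error) {
	cp := &Checkpointer{LSN: "lsn-0"}
	acks := make(chan int64, 1)
	if ackArrives {
		acks <- 2
	}
	err := PublishBatch(context.Background(), func() error { return nil },
		acks, 2, "lsn-1", cp, 50*time.Millisecond)
	return cp.LSN, err
}

func main() {
	fmt.Println(runDemo(true))  // LSN advances
	fmt.Println(runDemo(false)) // LSN stays put, timeout error
}
```

Blocking one batch at a time is the simplest way to keep the checkpoint behind confirmed delivery; a pipelined design with several batches in flight would need to track which ack belongs to which LSN.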

Acceptance Criteria

  • Ingester does NOT checkpoint until output provider confirms delivery
  • Output provider sends ack only after destination confirms (e.g., ASB SendMessagesAsync succeeds)
  • If output provider crashes, ingester does not advance LSN — next restart re-reads from last safe checkpoint
  • Documented in protocol spec
