Skip to content

fix: handle duplicate log messages idempotently#953

Draft
alexluong wants to merge 3 commits into
mainfrom
fix/logmq-intra-batch-duplicates
Draft

fix: handle duplicate log messages idempotently#953
alexluong wants to merge 3 commits into
mainfrom
fix/logmq-intra-batch-duplicates

Conversation

@alexluong

Copy link
Copy Markdown
Collaborator

Handles at-least-once duplicates landing in the same logmq batch, which previously failed the whole batch in pglogstore (ON CONFLICT DO UPDATE rejects the same key twice in one statement, SQLSTATE 21000) and re-ran alert evaluation once per copy — the root cause of #946. Fixes #947.

Dedup keys on Attempt.ID at three layers: the batch processor drops duplicate copies while parsing, InsertMany tolerates intra-batch duplicates as part of the driver contract (new drivertest conformance case), and the alert monitor skips replayed attempts only once a full evaluation has succeeded — partial failures still re-run on redelivery.

🤖 Generated with Claude Code

alexluong and others added 3 commits June 12, 2026 23:15
Duplicate log messages (MQ redelivery, producer re-publish) can land in
the same batch. pglogstore's attempts INSERT uses ON CONFLICT DO UPDATE,
which Postgres rejects when the same (time, id) appears twice in one
statement (SQLSTATE 21000) — failing the whole batch. chlogstore accepted
duplicates but paid merge overhead and surfaced transient duplicate reads
until ReplacingMergeTree compaction.

Make InsertMany tolerate intra-batch duplicates as part of the driver
contract: dedupe entries by Attempt.ID (shared driver helper) in
pglogstore and chlogstore. memlogstore already conforms via keyed upsert.

Add a drivertest conformance case: a batch containing duplicate entries
inserts without error and persists a single attempt row.

Part of #947

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
At-least-once redelivery and producer re-publish (deliverymq retrying
after a post-publish error) can land multiple copies of the same log
entry in one batch — possibly under different MQ message IDs, so dedup
keys on Attempt.ID. Previously duplicates flowed into InsertMany (where
they crashed pglogstore, see #946) and re-ran alert evaluation once per
copy.

Drop duplicates while parsing: ack the duplicate copy immediately and
keep the first. Safe because copies are byte-identical and the kept copy
stays un-acked until persisted — the at-least-once guarantee rides on it.
Alert eval now runs once per unique attempt per batch.

Part of #947

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Alert evaluation already dedups counting by attempt ID (SADD), but the
"newly added" signal was discarded — every replayed copy of an attempt
(MQ redelivery, producer re-publish) re-ran the full evaluation and
re-emitted alerts.

Track an evaluated marker per attempt: IncrementConsecutiveFailureCount
now returns FailureCountResult{Count, NewlyCounted, AlreadyEvaluated},
and the monitor marks the attempt evaluated (cfeval set, 24h TTL) only
after the full evaluation succeeds. Replays of fully evaluated attempts
are skipped; attempts counted but never marked (eval failed mid-way, the
message was nacked) fall through and re-run as recovery — a naive
early-exit on already-counted would silently drop those alerts.
Marking is non-fatal: on failure the attempt re-evaluates on replay,
matching previous behavior. Reset clears both keys.

Part of #947

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

logmq: handle duplicate log messages idempotently

1 participant