fix: handle duplicate log messages idempotently#953
Draft
alexluong wants to merge 3 commits into
Draft
Conversation
Duplicate log messages (MQ redelivery, producer re-publish) can land in the same batch. pglogstore's attempts INSERT uses ON CONFLICT DO UPDATE, which Postgres rejects when the same (time, id) appears twice in one statement (SQLSTATE 21000) — failing the whole batch. chlogstore accepted duplicates but paid merge overhead and surfaced transient duplicate reads until ReplacingMergeTree compaction. Make InsertMany tolerate intra-batch duplicates as part of the driver contract: dedupe entries by Attempt.ID (shared driver helper) in pglogstore and chlogstore. memlogstore already conforms via keyed upsert. Add a drivertest conformance case: a batch containing duplicate entries inserts without error and persists a single attempt row. Part of #947 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
At-least-once redelivery and producer re-publish (deliverymq retrying after a post-publish error) can land multiple copies of the same log entry in one batch — possibly under different MQ message IDs, so dedup keys on Attempt.ID. Previously duplicates flowed into InsertMany (where they crashed pglogstore, see #946) and re-ran alert evaluation once per copy. Drop duplicates while parsing: ack the duplicate copy immediately and keep the first. Safe because copies are byte-identical and the kept copy stays un-acked until persisted — the at-least-once guarantee rides on it. Alert eval now runs once per unique attempt per batch. Part of #947 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Alert evaluation already dedups counting by attempt ID (SADD), but the
"newly added" signal was discarded — every replayed copy of an attempt
(MQ redelivery, producer re-publish) re-ran the full evaluation and
re-emitted alerts.
Track an evaluated marker per attempt: IncrementConsecutiveFailureCount
now returns FailureCountResult{Count, NewlyCounted, AlreadyEvaluated},
and the monitor marks the attempt evaluated (cfeval set, 24h TTL) only
after the full evaluation succeeds. Replays of fully evaluated attempts
are skipped; attempts counted but never marked (eval failed mid-way, the
message was nacked) fall through and re-run as recovery — a naive
early-exit on already-counted would silently drop those alerts.
Marking is non-fatal: on failure the attempt re-evaluates on replay,
matching previous behavior. Reset clears both keys.
Part of #947
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Handles at-least-once duplicates landing in the same logmq batch, which previously failed the whole batch in pglogstore (
ON CONFLICT DO UPDATErejects the same key twice in one statement, SQLSTATE 21000) and re-ran alert evaluation once per copy — the root cause of #946. Fixes #947.Dedup keys on
Attempt.IDat three layers: the batch processor drops duplicate copies while parsing,InsertManytolerates intra-batch duplicates as part of the driver contract (new drivertest conformance case), and the alert monitor skips replayed attempts only once a full evaluation has succeeded — partial failures still re-run on redelivery.🤖 Generated with Claude Code