logmq: handle duplicate log messages idempotently

Root-cause fix for the insert failure in #946.

**Problem**

The log pipeline handles at-least-once redelivery idempotently **across** batches (store upserts; alert counting dedups by attempt ID via `SADD`). Duplicates landing **within the same batch** are unhandled:

- **pglogstore**: attempts use `ON CONFLICT (time, id) DO UPDATE`; Postgres rejects the same key twice in one statement → `ON CONFLICT DO UPDATE command cannot affect row a second time (SQLSTATE 21000)` → the whole batch fails, nacks, redelivers, and recurs (#946 covers the escalation to a dead consumer).
- **chlogstore**: no error, but redundant rows (merge overhead, transient duplicate reads until ReplacingMergeTree merges).
- **alert evaluation**: runs in full once per copy instead of once per attempt.

Duplicates arise from MQ redelivery and from producer re-publish (deliverymq retrying after a post-publish error), so they can carry different MQ message IDs — dedup must key on `Attempt.ID`.

**Proposal**

a) **BatchProcessor** (`processBatch` in `internal/logmq/batchprocessor.go`): dedup by `Attempt.ID` while parsing; on duplicate, ack immediately and drop. Safe because copies are byte-identical and the kept copy stays un-acked until persisted — the at-least-once guarantee rides on it. Keeps the existing 1:1 entry↔message pairing; alert eval runs once per unique attempt per batch.

b) **LogStore contract**: `InsertMany` must tolerate intra-batch duplicates regardless of caller. pglogstore currently violates this (the 21000) — exact approach TBD: dedup entries before insert, restructure the `ON CONFLICT` usage, or accept the conflict if that's expressible safely. chlogstore: best-effort dedup to avoid redundant rows. memlogstore already conforms (keyed upsert).

c) **Alert replay skip**: `SAdd` already returns whether the attempt was newly counted (`internal/alert/store.go` discards it today); use it to skip re-evaluating replayed attempts. Design constraint: must preserve partial-failure retry — eval errors are nacked for redelivery, and re-running disable/emit on replay is intentional recovery (`internal/alert/monitor.go`). A naive early-exit on already-counted would silently drop alerts when eval fails after counting; likely needs a completed-marker or restructuring of when the attempt is recorded.

**Tests**

- drivertest contract case (all drivers): batch with duplicate entries inserts without error, one attempt row.
- batchprocessor: duplicate messages → one entry inserted, all copies acked, alert eval once.
- alert: replayed attempt skips eval; partial-failure replay still retries.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

logmq: handle duplicate log messages idempotently #947

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

logmq: handle duplicate log messages idempotently #947

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions