
Reduce disk I/O of time-based compaction #6169

Open
lava wants to merge 1 commit into main from topic/reduce-compaction-disk-io


@lava lava commented May 13, 2026

🔍 Problem

When a time-based compaction rule deletes part of a partition's data, the
output partition's synopsis should reflect the time range of the slices that
survived. Instead, the old code copied the input partition's full
min_import_time / max_import_time onto the output synopsis.

As a result, an input partition that spanned a wide time range and was then
trimmed by compaction kept advertising the original wide range. Time-based
compaction rules use that range to pick eligible partitions, so the same
partition was selected and rewritten on every run, even though there was
nothing left to do.

🛠️ Solution

  • Aggregate min/max import time from the slices that actually end up in
    each output partition with std::min / std::max.
  • The default partition_synopsis sentinels (min_import_time = time::max(),
    max_import_time = time::min()) make the first slice seed the range
    correctly.
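
A minimal sketch of the aggregation described above, not the actual diff.
`slice_info`, `synopsis_sketch`, `add_slice`, and `aggregate` are hypothetical
stand-ins for the real slice and partition_synopsis types, and
`std::chrono::system_clock::time_point` is assumed in place of the engine's
own time type. It only shows why the sentinel defaults let the first slice
seed the range.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

using time_point = std::chrono::system_clock::time_point;

// Hypothetical stand-in for a slice that survived compaction.
struct slice_info {
  time_point import_time;
};

// Hypothetical stand-in for the output synopsis. The defaults mirror the
// sentinels named above: min starts at max(), max starts at min().
struct synopsis_sketch {
  time_point min_import_time = time_point::max();
  time_point max_import_time = time_point::min();
};

// Widen the range to cover one more slice. Because of the sentinels, the
// first call sets both bounds to that slice's import time.
inline void add_slice(synopsis_sketch& syn, const slice_info& slice) {
  syn.min_import_time = std::min(syn.min_import_time, slice.import_time);
  syn.max_import_time = std::max(syn.max_import_time, slice.import_time);
}

// Build the range from only the slices that end up in the output partition.
inline synopsis_sketch aggregate(const std::vector<slice_info>& kept) {
  synopsis_sketch syn;
  for (const auto& slice : kept) {
    add_slice(syn, slice);
  }
  return syn;
}
```

With this shape, an output partition that received no slices keeps the
sentinel values instead of inheriting the input partition's range.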

💬 Review

  • Side fix discovered while investigating TNZ-527; should not close that
    ticket.
  • No new test — the existing transformer/compaction tests don't assert on
    per-partition import-time ranges. Happy to add one if you want.
🎫 References TNZ-527

When the partition transformer (used by compaction) creates output
partitions, it was copying the source partition's full min/max import
time range onto every output partition synopsis. That makes the
compacted partition's time range overlap the original input range, so
time-based compaction rules consider it eligible for compaction again
on the next run.

Aggregate min/max import time from the slices that actually end up in
each output partition instead, so the resulting partition's range
reflects its actual contents.

@github-actions (bot) added the engine (Core pipeline and storage engine) and bugfix (Bug fix) labels on May 13, 2026