Skip to content

Antalya-26.3 Added support for TTL EXPORT#1810

Open
mkmkme wants to merge 19 commits into
antalya-26.3from
mkmkme/antalya-26.3/ttl-export-partition
Open

Antalya-26.3 Added support for TTL EXPORT#1810
mkmkme wants to merge 19 commits into
antalya-26.3from
mkmkme/antalya-26.3/ttl-export-partition

Conversation

@mkmkme
Copy link
Copy Markdown
Collaborator

@mkmkme mkmkme commented May 19, 2026

Fixes #1793

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

ReplicatedMergeTree gains support for TTL EXPORT TO TABLE

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@mkmkme mkmkme added antalya port-antalya PRs to be ported to all new Antalya releases antalya-26.3 labels May 19, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

Workflow [PR], commit [dd3a2ac]

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented May 19, 2026

With the new commit 04206 (syntax check) is now passing

@mkmkme mkmkme force-pushed the mkmkme/antalya-26.3/ttl-export-partition branch from 0f90fd6 to 1d0c1a7 Compare May 19, 2026 11:14
mkmkme and others added 9 commits May 20, 2026 15:02
Tests describe the contract for the upcoming `TTL ... EXPORT TO db.table`
action. They are added before the C++ implementation so they double as the
acceptance criteria.

Stateless (tests/queries/0_stateless):
- 04206_ttl_export_partition_syntax: parser/metadata round-trip and rejection
  of (a) two EXPORT TTLs to the same destination and (b) EXPORT TTL on a
  table without a partition key.
- 04207_ttl_export_partition_basic: happy path, plus an in-line assertion
  that a future-dated partition is not exported.
- 04208_ttl_export_partition_skip_already_exported: re-triggering after a
  partition has been exported does not duplicate it.

Integration (tests/integration/test_ttl_export_partition):
- test_basic_to_iceberg, test_only_one_replica_submits,
  test_failure_and_backoff, test_serial_across_partitions,
  test_replica_restart_mid_export,
  test_modify_ttl_picks_up_with_materialize, test_disabled_replica,
  test_dedup_via_high_water_mark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce parser, AST, and metadata plumbing for `TTL ... EXPORT TO db.table`.
No background scheduler or per-part TTL info yet — those land in follow-up
commits. The clause is recognised, round-trips through `SHOW CREATE TABLE`,
and the resulting `TTLDescription` is collected into `TTLTableDescription`'s
new `export_ttl` list (exposed via `StorageInMemoryMetadata::getExportTTLs`).

Validation in `TTLTableDescription::getTTLForTableFromAST`:
  * reject two `EXPORT` clauses to the same destination,
  * reject `EXPORT` TTL on a table with no partition key.

The destination-specific override of `TTLDescription::result_column`
(`"_export_" + db + "." + table`) is required so that future per-part TTL
info (keyed by `result_column`) keeps separate clocks per destination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stores per-part TTL info under `MergeTreeDataPartTTLInfos::export_ttl`,
keyed by `TTLDescription::result_column`. The map is:

  * populated at write time in `MergeTreeDataWriter` for every TTL
    returned by `getExportTTLs`,
  * recomputed during `MATERIALIZE TTL` and merge-time TTL recompute via
    `TTLCalcTransform` / `TTLTransform` (a new `TTLUpdateField::EXPORT_TTL`
    finalizes into the right map),
  * serialized in JSON under the `"export"` key (mirroring the
    `recompression` entry),
  * propagated across merges through the existing `update` aggregation,
  * surfaced through `hasAnyNonFinishedTTLs` and `checkAllTTLCalculated`
    so old parts that predate the TTL are flagged for `MATERIALIZE TTL`.

Adds the partition-wide helper `getPartitionExportTTLMax`: returns the
max `export_ttl.max` across all parts of a partition, or `nullopt` if
any part is missing the entry (with optional `missing_parts_out` for
the scheduler to log). Deliberate: no on-the-fly evaluation — the user
runs `ALTER TABLE ... MATERIALIZE TTL` to backfill, same UX as moves
and recompression TTLs.

Also pulls `export_ttl` into `hasAnyTableTTL` / `hasOnlyRowsTTL` and the
`getColumnDependencies` TTL-column-set walk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Align the TTL syntax with `ALTER ... EXPORT PARTITION TO TABLE`: the
keyword is now `EXPORT TO TABLE <db.table>` instead of `EXPORT TO
<db.table>`. Parser, formatter, exception messages, and tests updated to
match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an `ExportOrigin` enum (`alter` | `ttl`) to the manifest body so
manifests submitted manually (`ALTER ... EXPORT PARTITION`) can be told
apart from manifests submitted by the upcoming TTL scheduler. Surfaced
as `system.replicated_partition_exports.export_origin
Enum8('alter' = 0, 'ttl' = 1)`. Existing manifests in ZooKeeper that
don't carry the field read back as `alter` for backwards compatibility.

`ttl`-origin manifests are skipped by manifest-TTL eviction: the
background cleanup in `ExportPartitionManifestUpdatingTask` and the
overwrite path in `StorageReplicatedMergeTree::exportPartitionToTable`
both refuse to consider them expired. The existing
`export_merge_tree_partition_force_export` setting still overrides via
the unchanged gate.

The write site keeps the default `ExportOrigin::alter`; the TTL
submitter writes `ExportOrigin::ttl` in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new query-level setting `export_merge_tree_partition_mark_as_ttl`
(default false). When set on `ALTER ... EXPORT PARTITION`, the resulting
manifest is written with `export_origin = ttl` (same as what the TTL
scheduler will write in a follow-up). The TTL scheduler always sets this
implicitly when it submits.

Enforces the "at most one ttl-origin manifest per (src, dest)"
invariant at submission time: when a ttl-origin manifest is being
created, scan siblings under `<zk_root>/exports/` for an existing
ttl-origin marker at a different `partition_id`. If found at `P_old`,
reject the submission as a back-fill (`new < P_old`) unless
`export_merge_tree_partition_force_export` is set; otherwise
best-effort `tryRemoveRecursive` of the old marker before creating the
new one. Same-key collisions continue to be handled by the existing
block.

A plain `alter` over a ttl marker at a different partition is allowed
without friction — alter manifests coexist with the ttl marker, and
the TTL scheduler will filter by `export_origin = ttl` when reading
its own state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the readable-vs-insertable column diff and the partition-key
AST compare from `StorageReplicatedMergeTree::exportPartitionToTable`
into `ExportPartitionUtils::verifyExportDestinationCompatibility`, and
calls it from `TTLTableDescription::getTTLForTableFromAST` for every
`TTLMode::EXPORT` clause when not attaching. The destination is resolved
through `DatabaseCatalog::getTable`, matching the manual
`ALTER ... EXPORT PARTITION` flow (throws `UNKNOWN_TABLE` if missing).

The check is skipped under `is_attach=true` because the destination
table may not yet be loaded at server startup; submission-time
validation in `exportPartitionToTable` still covers that path.

Iceberg destinations skip the partition-key AST compare here; the
existing `verifyIcebergPartitionCompatibility` runs against the runtime
iceberg metadata at submission time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces `TTLExportScheduler`, a per-`StorageReplicatedMergeTree`
background driver that submits partition exports for tables with
`TTL ... EXPORT TO TABLE db.table`. The scheduler is stateless across
restarts: it reads the latest `export_origin = ttl` manifest from
ZooKeeper on every tick and acts on its status — no manifest →
submit the smallest eligible partition; PENDING → wait; COMPLETED →
walk forward to `partition_id > completed`; FAILED → resubmit with
`force_export=1` after per-partition exponential backoff; KILLED →
idle with a `LOG_WARNING` carrying the recovery recipe.

`submit` classifies outcomes as `Submitted | Transient | Failure` so
ZK CAS races and `UNKNOWN_TABLE` (destination dropped post-DDL) do
not bump backoff, while genuine submission errors do.

Adds three table-level settings used by the scheduler:
`export_merge_tree_partition_ttl_poll_interval_seconds` (default 5),
`export_merge_tree_partition_ttl_min_backoff_seconds` (default 1),
`export_merge_tree_partition_ttl_max_backoff_seconds` (default 60).

The scheduler is not yet wired into the background task pool; that
follows in a separate commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Declare the scheduler and its background task next to the other
`export_merge_tree_partition_*` task holders. Under the
`allow_experimental_export_merge_tree_partition` server gate:

- Construct the scheduler and create the `TTLExport` task, logging any
  exceptions from `run` via `tryLogCurrentException`.
- `ReplicatedMergeTreeRestartingThread::tryStartup` activates the task
  alongside the other export tasks; `partialShutdown` deactivates it.
- `alter` calls `ttl_export_task->schedule` when any `MODIFY TTL`
  command is in the alter so newly added EXPORT TTLs take effect
  immediately.
- `TTLExportScheduler::run` reschedules itself with
  `export_merge_tree_partition_ttl_poll_interval_seconds * jitter25`
  on the polling path. Early returns (shutdown, readonly, no EXPORT
  TTL, experimental gate off) intentionally skip the reschedule —
  deactivation, the `alter` hook, and server-level config drive those
  paths instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mkmkme mkmkme force-pushed the mkmkme/antalya-26.3/ttl-export-partition branch from 1d0c1a7 to f7d8a27 Compare May 20, 2026 13:10
mkmkme and others added 4 commits May 20, 2026 18:31
In `exportPartitionToTable`, the cross-partition swap used to delete the
existing ttl-origin marker before the `tryMulti` that creates the new
manifest. Any throw between the delete and the multi (parts.empty,
pending mutations, iceberg compatibility, or a ZK error mid-multi) left
the scheduler with no ttl marker at all, so the next tick of
`TTLExportScheduler` would treat the table as fresh and restart from the
oldest expired partition.

Now:

  - The sibling cache walk collects every stale ttl-origin entry for the
    destination, not just the first one encountered. The freshest by
    `create_time` is used for the back-fill check; the others are kept
    for cleanup. This makes the walk deterministic (no break-on-first-
    match in unspecified iteration order) and self-healing for any
    stragglers a previous cleanup may have missed.
  - The `tryRemoveRecursive` of those stale markers runs after the
    `tryMulti` succeeds. Failures during validation or in the multi
    itself now leave the existing marker intact, so the scheduler keeps
    its high-water mark.
  - The post-multi cleanup is best-effort; a ZK error there at worst
    leaves dead nodes, which the next ttl submission will reap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `export_merge_tree_partition_mark_as_ttl` to the Antalya 26.3 session
block, and the three TTL EXPORT scheduler MergeTree settings
(`export_merge_tree_partition_ttl_poll_interval_seconds`,
`export_merge_tree_partition_ttl_min_backoff_seconds`,
`export_merge_tree_partition_ttl_max_backoff_seconds`) to the 26.3 MergeTree
block, so `02995_new_settings_history` passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mkmkme mkmkme marked this pull request as ready for review May 20, 2026 17:42
@mkmkme mkmkme changed the title [WIP] Antalya-26.3 Added support for TTL EXPORT Antalya-26.3 Added support for TTL EXPORT May 20, 2026
instead of context's `current_database`

This should fix the stateless tests.
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented May 21, 2026

test_database_iceberg/test.py::test_namespace_filter is failing on the current antalya-26.3 as well, so seems unrelated to me

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented May 21, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ebd61019e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/Storages/MergeTree/TTLExportScheduler.cpp Outdated
Comment thread src/Storages/TTLDescription.cpp
Comment thread src/Parsers/ASTTTLElement.cpp Outdated
mkmkme and others added 2 commits May 21, 2026 12:34
Partition IDs are arbitrary strings derived from the partition expression;
lex comparison can permanently strand newer partitions when widths differ
(`PARTITION BY year` with values 9, 10, 11 lex-orders as "10", "11", "9",
so after exporting "9" the scheduler would skip "10" and "11" forever).

`pickPartition` now forward-walks by expiration max, with the floor's
expiration max recomputed from the source's current parts. The floor's
exact `partition_id` is still excluded by string equality so we don't
re-pick the partition we just completed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous representation flattened `dst_database` and `dst_table` into a
single `destination_name` string joined by a dot. That made the boundary
ambiguous for quoted table names that legitimately contain a dot — e.g.
``TTL ... EXPORT TO TABLE `a.b` `` round-tripped through the formatter as
`` `a`.`b` `` (database `a`, table `b`), changing semantics on metadata
reattach and sending exports to the wrong destination via
`QualifiedTableName::parseFromString` in the scheduler and the DDL-time
schema-compat check.

Mirror the convention already used by `ASTAlterQuery` for the matching
`ALTER ... EXPORT PARTITION ... TO TABLE` syntax: store `destination_database`
and `destination_name` as two separate strings on `ASTTTLElement` and
`TTLDescription`. The parser fills both via `parseDatabaseAndTableName`; the
formatter emits `backQuoteIfNeed(db) + "." + backQuoteIfNeed(table)` when
qualified, else just the table. Both `parseFromString` call sites use the
fields directly.

Per-part `export_ttl` map is keyed by `TTLDescription::getExportKey`, which
uses `backQuoteIfNeed` for both parts so qualified `(a, b)` and unqualified
single-table `a.b` produce distinct keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented May 21, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48097f2db9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/Storages/StorageReplicatedMergeTree.cpp Outdated
Comment thread src/Storages/TTLDescription.cpp
mkmkme and others added 3 commits May 21, 2026 13:10
`exportPartitionToTable` rejected a `mark_as_ttl=1` submission when the new
`partition_id` was lex-less than the existing ttl marker's `partition_id`.
With variable-width partition IDs (e.g. `PARTITION BY year` yielding "9", "10"),
the scheduler's natural forward step picks "10" after completing "9", the
guard throws BAD_ARGUMENTS because `"10" < "9"` lex, the scheduler classifies
that as `Failure`, bumps backoff, and retries with the same partition —
stalling TTL exports for that destination.

Mirror the `pickPartition` fix: locate the EXPORT TTL targeting this
destination, compute the expiration max for both the new partition and the
existing marker's partition from the source's current parts, and throw only
when the new expiration max is strictly less than the old. Skip the guard
when no matching TTL is found (orphaned marker after the TTL was dropped) or
when either partition's parts are gone (DELETE TTL already cleaned up).
`export_merge_tree_partition_force_export` continues to bypass the whole
guard, including legitimate backward moves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`TTLExportScheduler` is only constructed and activated by
`StorageReplicatedMergeTree`; the base `MergeTreeData::exportPartitionToTable`
throws `NOT_IMPLEMENTED`. Without a DDL-time guard, `CREATE TABLE ... ENGINE
= MergeTree ... TTL ... EXPORT TO TABLE` and the matching `ALTER` succeeded
but no background exporter ever submitted manifests — silent no-op.

Throw `NOT_IMPLEMENTED` at parse time:
- CREATE: in `registerStorageMergeTree.cpp` right after `getTTLForTableFromAST`,
  using the local `replicated` bool.
- ALTER: in `MergeTreeData::checkAlterIsPossible`, using virtual
  `supportsReplication`. The check can't live in `checkTTLExpressions`
  because that method is also called from `MergeTreeData`'s constructor,
  where virtual dispatch returns the base-class value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the full `std::map<String, DataPartsVector>` build + two lookups
with a single pass that fills two `DataPartsVector`s for the new and the
existing-marker partitions directly. `getPartitionExportTTLMax` returns
`nullopt` for an empty vector, so the `find`/`end` guards collapse into the
function call. No semantic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented May 21, 2026

@codex review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.3 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant