[WIP] Antalya-26.3 Added support for TTL EXPORT #1810
Draft
mkmkme wants to merge 11 commits into
With the new commit, 04206 (syntax check) is now passing.
0f90fd6 to 1d0c1a7
Tests describe the contract for the upcoming `TTL ... EXPORT TO db.table` action. They are added before the C++ implementation so they double as the acceptance criteria.
Stateless (tests/queries/0_stateless):
- 04206_ttl_export_partition_syntax: parser/metadata round-trip and rejection of (a) two EXPORT TTLs to the same destination and (b) EXPORT TTL on a table without a partition key.
- 04207_ttl_export_partition_basic: happy path, plus an in-line assertion that a future-dated partition is not exported.
- 04208_ttl_export_partition_skip_already_exported: re-triggering after a partition has been exported does not duplicate it.
Integration (tests/integration/test_ttl_export_partition):
- test_basic_to_iceberg, test_only_one_replica_submits, test_failure_and_backoff, test_serial_across_partitions, test_replica_restart_mid_export, test_modify_ttl_picks_up_with_materialize, test_disabled_replica, test_dedup_via_high_water_mark.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce parser, AST, and metadata plumbing for `TTL ... EXPORT TO db.table`. No background scheduler or per-part TTL info yet; those land in follow-up commits.
The clause is recognised, round-trips through `SHOW CREATE TABLE`, and the resulting `TTLDescription` is collected into `TTLTableDescription`'s new `export_ttl` list (exposed via `StorageInMemoryMetadata::getExportTTLs`).
Validation in `TTLTableDescription::getTTLForTableFromAST`:
* reject two `EXPORT` clauses to the same destination,
* reject `EXPORT` TTL on a table with no partition key.
The destination-specific override of `TTLDescription::result_column` (`"_export_" + db + "." + table`) is required so that future per-part TTL info (keyed by `result_column`) keeps separate clocks per destination.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
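The two validation rules and the per-destination `result_column` naming can be illustrated with a small standalone sketch. This is not the code from the PR: `ExportClause`, `exportResultColumn`, `validateExportClauses`, and the `has_partition_key` flag are hypothetical stand-ins, and only the `"_export_" + db + "." + table` scheme comes from the commit message.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one parsed EXPORT TTL clause.
struct ExportClause
{
    std::string database;
    std::string table;
};

// Naming scheme quoted in the commit message: "_export_" + db + "." + table.
// Distinct destinations therefore get distinct keys (separate clocks).
std::string exportResultColumn(const ExportClause & clause)
{
    assert(!clause.table.empty());
    return "_export_" + clause.database + "." + clause.table;
}

// Throws if the clause list violates either rule listed above.
// `has_partition_key` is an assumed flag standing in for table metadata.
void validateExportClauses(const std::vector<ExportClause> & clauses, bool has_partition_key)
{
    if (!clauses.empty() && !has_partition_key)
        throw std::invalid_argument("EXPORT TTL requires a partition key");

    std::set<std::string> seen;
    for (const auto & clause : clauses)
    {
        // insert().second is false when the key was already present.
        if (!seen.insert(exportResultColumn(clause)).second)
            throw std::invalid_argument(
                "duplicate EXPORT destination: " + clause.database + "." + clause.table);
    }
}
```

Keying the duplicate check on the same string that later keys the per-part TTL info means a duplicate destination is rejected before two clauses could ever share one clock.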
Stores per-part TTL info under `MergeTreeDataPartTTLInfos::export_ttl`,
keyed by `TTLDescription::result_column`. The map is:
* populated at write time in `MergeTreeDataWriter` for every TTL
returned by `getExportTTLs`,
* recomputed during `MATERIALIZE TTL` and merge-time TTL recompute via
`TTLCalcTransform` / `TTLTransform` (a new `TTLUpdateField::EXPORT_TTL`
finalizes into the right map),
* serialized in JSON under the `"export"` key (mirroring the
`recompression` entry),
* propagated across merges through the existing `update` aggregation,
* surfaced through `hasAnyNonFinishedTTLs` and `checkAllTTLCalculated`
so old parts that predate the TTL are flagged for `MATERIALIZE TTL`.
Adds the partition-wide helper `getPartitionExportTTLMax`: returns the
max `export_ttl.max` across all parts of a partition, or `nullopt` if
any part is missing the entry (with optional `missing_parts_out` for
the scheduler to log). Deliberate: no on-the-fly evaluation — the user
runs `ALTER TABLE ... MATERIALIZE TTL` to backfill, same UX as moves
and recompression TTLs.
Also pulls `export_ttl` into `hasAnyTableTTL` / `hasOnlyRowsTTL` and the
`getColumnDependencies` TTL-column-set walk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
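The "max across all parts, or `nullopt` if any part is missing the entry" contract of `getPartitionExportTTLMax` can be sketched as below. This is a minimal model, not the PR's implementation: `PartInfo`, `partitionExportTTLMax`, and the use of plain integer timestamps are assumptions made for illustration, and returning `nullopt` for a partition with no parts is a choice of this sketch that the commit message does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a data part: its name plus the per-destination
// export TTL maxima, keyed by result column.
struct PartInfo
{
    std::string name;
    std::map<std::string, std::int64_t> export_ttl_max;
};

// Returns the maximum TTL timestamp over all parts, or nullopt if any part
// lacks an entry for `result_column`. Names of such parts are appended to
// `missing_parts_out` when it is provided.
std::optional<std::int64_t> partitionExportTTLMax(
    const std::vector<PartInfo> & parts,
    const std::string & result_column,
    std::vector<std::string> * missing_parts_out = nullptr)
{
    assert(!result_column.empty());

    std::optional<std::int64_t> result;
    bool any_missing = false;

    for (const auto & part : parts)
    {
        auto it = part.export_ttl_max.find(result_column);
        if (it == part.export_ttl_max.end())
        {
            any_missing = true;
            if (missing_parts_out)
                missing_parts_out->push_back(part.name);
            continue; // keep scanning so every missing part is reported
        }
        result = result ? std::max(*result, it->second) : it->second;
    }

    if (any_missing)
        return std::nullopt; // no on-the-fly evaluation; MATERIALIZE TTL backfills
    return result;
}
```

Continuing the scan after the first missing entry costs little and lets the caller log the full list of parts that still need `MATERIALIZE TTL`.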
Align the TTL syntax with `ALTER ... EXPORT PARTITION TO TABLE`: the keyword is now `EXPORT TO TABLE <db.table>` instead of `EXPORT TO <db.table>`. Parser, formatter, exception messages, and tests updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an `ExportOrigin` enum (`alter` | `ttl`) to the manifest body so
manifests submitted manually (`ALTER ... EXPORT PARTITION`) can be told
apart from manifests submitted by the upcoming TTL scheduler. Surfaced
as `system.replicated_partition_exports.export_origin
Enum8('alter' = 0, 'ttl' = 1)`. Existing manifests in ZooKeeper that
don't carry the field read back as `alter` for backwards compatibility.
`ttl`-origin manifests are skipped by manifest-TTL eviction: the
background cleanup in `ExportPartitionManifestUpdatingTask` and the
overwrite path in `StorageReplicatedMergeTree::exportPartitionToTable`
both refuse to consider them expired. The existing
`export_merge_tree_partition_force_export` setting still overrides via
the unchanged gate.
The write site keeps the default `ExportOrigin::alter`; the TTL
submitter writes `ExportOrigin::ttl` in a follow-up commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
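The backwards-compatible read and the eviction rule described above can be modelled in a few lines. This is a sketch under assumptions: `parseExportOrigin` and `canTreatAsExpired` are hypothetical names, the manifest field is modelled as an optional string, and the `export_merge_tree_partition_force_export` override is deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Values mirror the Enum8('alter' = 0, 'ttl' = 1) column described above.
enum class ExportOrigin : std::int8_t { alter = 0, ttl = 1 };

// A manifest written before the field existed carries no value and reads
// back as `alter`.
ExportOrigin parseExportOrigin(const std::optional<std::string> & field)
{
    if (!field)
        return ExportOrigin::alter;
    if (*field == "alter")
        return ExportOrigin::alter;
    if (*field == "ttl")
        return ExportOrigin::ttl;
    throw std::invalid_argument("unknown export_origin: " + *field);
}

// ttl-origin manifests are never considered expired by manifest-TTL
// eviction; alter-origin manifests expire by age as before.
bool canTreatAsExpired(ExportOrigin origin, bool age_expired)
{
    assert(origin == ExportOrigin::alter || origin == ExportOrigin::ttl);
    return age_expired && origin != ExportOrigin::ttl;
}
```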
Adds a new query-level setting `export_merge_tree_partition_mark_as_ttl` (default false). When set on `ALTER ... EXPORT PARTITION`, the resulting manifest is written with `export_origin = ttl` (same as what the TTL scheduler will write in a follow-up). The TTL scheduler always sets this implicitly when it submits.
Enforces the "at most one ttl-origin manifest per (src, dest)" invariant at submission time: when a ttl-origin manifest is being created, scan siblings under `<zk_root>/exports/` for an existing ttl-origin marker at a different `partition_id`. If found at `P_old`, reject the submission as a back-fill (`new < P_old`) unless `export_merge_tree_partition_force_export` is set; otherwise best-effort `tryRemoveRecursive` of the old marker before creating the new one. Same-key collisions continue to be handled by the existing block.
A plain `alter` over a ttl marker at a different partition is allowed without friction: alter manifests coexist with the ttl marker, and the TTL scheduler will filter by `export_origin = ttl` when reading its own state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
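One reading of the submission-time invariant is sketched below as a pure decision function. The names `SubmitDecision` and `decideTTLSubmission` are hypothetical, and comparing `partition_id` values as plain strings is an assumption of this sketch; the commit message only states the condition `new < P_old`.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class SubmitDecision
{
    Create,           // no conflicting ttl-origin marker at another partition
    ReplaceOldMarker, // best-effort removal of the old marker, then create
    RejectBackfill    // new partition precedes the existing marker
};

// `existing_ttl_partition` is the partition_id of a ttl-origin marker found
// among the siblings, if any.
SubmitDecision decideTTLSubmission(
    const std::optional<std::string> & existing_ttl_partition,
    const std::string & new_partition,
    bool force_export)
{
    assert(!new_partition.empty());

    // Same-key collisions are handled by a separate, pre-existing code path.
    if (!existing_ttl_partition || *existing_ttl_partition == new_partition)
        return SubmitDecision::Create;

    // Assumption: string ordering stands in for the real partition ordering.
    if (new_partition < *existing_ttl_partition && !force_export)
        return SubmitDecision::RejectBackfill;

    return SubmitDecision::ReplaceOldMarker;
}
```

Keeping the decision separate from the ZooKeeper calls makes the back-fill rule testable without a coordination service.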
Extracts the readable-vs-insertable column diff and the partition-key AST compare from `StorageReplicatedMergeTree::exportPartitionToTable` into `ExportPartitionUtils::verifyExportDestinationCompatibility`, and calls it from `TTLTableDescription::getTTLForTableFromAST` for every `TTLMode::EXPORT` clause when not attaching. The destination is resolved through `DatabaseCatalog::getTable`, matching the manual `ALTER ... EXPORT PARTITION` flow (throws `UNKNOWN_TABLE` if missing).
The check is skipped under `is_attach=true` because the destination table may not yet be loaded at server startup; submission-time validation in `exportPartitionToTable` still covers that path.
Iceberg destinations skip the partition-key AST compare here; the existing `verifyIcebergPartitionCompatibility` runs against the runtime iceberg metadata at submission time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces `TTLExportScheduler`, a per-`StorageReplicatedMergeTree` background driver that submits partition exports for tables with `TTL ... EXPORT TO TABLE db.table`.
The scheduler is stateless across restarts: it reads the latest `export_origin = ttl` manifest from ZooKeeper on every tick and acts on its status:
* no manifest: submit the smallest eligible partition;
* PENDING: wait;
* COMPLETED: walk forward to `partition_id > completed`;
* FAILED: resubmit with `force_export=1` after per-partition exponential backoff;
* KILLED: idle with a `LOG_WARNING` carrying the recovery recipe.
`submit` classifies outcomes as `Submitted | Transient | Failure` so ZK CAS races and `UNKNOWN_TABLE` (destination dropped post-DDL) do not bump backoff, while genuine submission errors do.
Adds three table-level settings used by the scheduler:
* `export_merge_tree_partition_ttl_poll_interval_seconds` (default 5),
* `export_merge_tree_partition_ttl_min_backoff_seconds` (default 1),
* `export_merge_tree_partition_ttl_max_backoff_seconds` (default 60).
The scheduler is not yet wired into the background task pool; that follows in a separate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
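The per-tick decision and the backoff can be sketched as two pure functions. This is an illustration, not the scheduler from the PR: the enum and function names are hypothetical, and doubling per consecutive failure is an assumption, since the commit message says only "exponential backoff" bounded by the min and max settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

enum class ManifestStatus { Pending, Completed, Failed, Killed };

enum class Action
{
    SubmitSmallestEligible, // no ttl-origin manifest yet
    Wait,                   // export still in flight
    SubmitNextPartition,    // walk forward past the completed partition
    ResubmitWithForce,      // after the per-partition backoff has elapsed
    IdleAndWarn             // killed: log a warning, wait for the operator
};

// Maps the latest ttl-origin manifest status (if any) to the next action.
Action nextAction(const std::optional<ManifestStatus> & latest)
{
    if (!latest)
        return Action::SubmitSmallestEligible;
    switch (*latest)
    {
        case ManifestStatus::Pending:   return Action::Wait;
        case ManifestStatus::Completed: return Action::SubmitNextPartition;
        case ManifestStatus::Failed:    return Action::ResubmitWithForce;
        case ManifestStatus::Killed:    return Action::IdleAndWarn;
    }
    return Action::Wait; // not reached for valid enum values
}

enum class SubmitOutcome { Submitted, Transient, Failure };

// Only genuine submission errors advance the backoff counter.
bool bumpsBackoff(SubmitOutcome outcome)
{
    return outcome == SubmitOutcome::Failure;
}

// Delay after `failures` consecutive failures: min * 2^failures, capped at
// max. The doubling factor is an assumption of this sketch.
std::uint64_t backoffSeconds(std::uint64_t failures, std::uint64_t min_s, std::uint64_t max_s)
{
    assert(min_s > 0 && min_s <= max_s);
    std::uint64_t delay = min_s;
    for (std::uint64_t i = 0; i < failures && delay < max_s; ++i)
        delay = (delay > max_s / 2) ? max_s : delay * 2; // cap before overflow
    return std::min(delay, max_s);
}
```

With the default settings (min 1, max 60), this particular doubling scheme reaches the cap on the sixth consecutive failure.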
Declare the scheduler and its background task next to the other `export_merge_tree_partition_*` task holders. Under the `allow_experimental_export_merge_tree_partition` server gate:
- Construct the scheduler and create the `TTLExport` task, logging any exceptions from `run` via `tryLogCurrentException`.
- `ReplicatedMergeTreeRestartingThread::tryStartup` activates the task alongside the other export tasks; `partialShutdown` deactivates it.
- `alter` calls `ttl_export_task->schedule` when any `MODIFY TTL` command is in the alter so newly added EXPORT TTLs take effect immediately.
- `TTLExportScheduler::run` reschedules itself with `export_merge_tree_partition_ttl_poll_interval_seconds * jitter25` on the polling path. Early returns (shutdown, readonly, no EXPORT TTL, experimental gate off) intentionally skip the reschedule; deactivation, the `alter` hook, and server-level config drive those paths instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
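The jittered reschedule on the polling path might look like the sketch below. The commit message does not define `jitter25`; this sketch assumes it means a multiplicative factor drawn uniformly from [0.75, 1.25), and `jitteredInterval` is a hypothetical helper rather than a function from the PR.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Returns the base poll interval scaled by a random factor in [0.75, 1.25).
// The +/-25% range is an assumed interpretation of `jitter25`.
std::chrono::milliseconds jitteredInterval(std::chrono::seconds base, std::mt19937_64 & rng)
{
    assert(base.count() >= 0);
    std::uniform_real_distribution<double> factor(0.75, 1.25);
    const auto base_ms = std::chrono::duration_cast<std::chrono::milliseconds>(base).count();
    return std::chrono::milliseconds(static_cast<long long>(base_ms * factor(rng)));
}
```

Spreading the wake-ups this way keeps replicas that started together from polling ZooKeeper in lockstep.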
1d0c1a7 to f7d8a27
Fixes #1793
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
TBD
Documentation entry for user-facing changes
TBD
CI/CD Options
Exclude tests:
Regression jobs to run: