Skip to content

Releases: drt-hub/drt

v0.7.8 — Mixpanel destination + ClickHouse identifier fix + empty-batch contracts complete

04 Jun 23:47
v0.7.8
1771f75

Choose a tag to compare

Theme: Community-driven follow-up + ClickHouse identifier fix completion. Two contributor PRs accumulated on main since v0.7.7: a new Mixpanel destination (#608 by @Pawansingh3889, closes #417) and a ClickHouse _quote_ident identifier fix (#610 by @yodakanohoshi, closes #512 — closes the ClickHouse leg of the qualified-identifier fix family alongside Postgres #498 / MySQL #514). The ClickHouse fix is the urgency lever: v0.7.7 users running ClickHouse against a database-qualified table (db.tbl) hit a server-side Code: 62 syntax error from get_row_count's malformed identifier — this patch resolves that path plus the other raw-interpolation SQL paths. Also completes the empty-batch contract suite (#604 / #605 / #606 across HTTP API / special-transport / StagedDestination Protocol shapes — 25 of 25 registered destinations), which surfaced a real bug in staged_upload.finalize() also fixed here in #606. Ships user-facing sync.mode: mirror documentation (#607) — connector docs section + runnable cp -r example + skill option — plus the post-#608 Mixpanel wiring (#609) and i18n marker bump (#603). No breaking changes — drop-in upgrade from v0.7.7.

Breaking Changes

None. Drop-in upgrade from v0.7.7.

Added

  • Mixpanel destination (#417, PR #608): new HTTP destination for Mixpanel's two primary product-analytics APIs — /engage (endpoint: people_set, sets user-profile properties with the project token carried per-record) and /import (endpoint: import_events, ingests events with HTTP Basic auth via a Mixpanel service account + the numeric project_id). Both batch up to 2000 records per request (Mixpanel's hard limit, enforced via a batch_size field_validator that clamps at 2000), support EU data residency (region: eu routes to api-eu.mixpanel.com), and produce a deterministic $insert_id for the import path — derived from SHA256(canonical_json(record) + ":" + event_name)[:32] when insert_id_field is unset, so re-running the same sync does not double-count events server-side. Profile-set payload shape: {"$token": ..., "$distinct_id": ..., "$set": {<row props minus reserved>}}; event payload shape: {"event": ..., "properties": {"distinct_id": ..., "time": ..., "$insert_id": ..., <other row props>}}. Row-level errors surface in SyncResult.row_errors with the HTTP status code for httpx.HTTPStatusError paths and the response body (truncated to 500 chars) so debugging a bad payload doesn't require re-running with verbose logging. Two pydantic model_validators on MixpanelDestinationConfig cross-check endpoint against the required auth fields and (for import_events) require either a constant event_name or a per-row event_name_field. Optional properties_template (Jinja2 → JSON object) merges into the props dict after the row's plain columns. Composable with all existing infrastructure: RateLimiter for request pacing, with_retry/resolve_retry for transient-error handling, RowError/SyncResult shape consistent with the other 24 registered destinations. New docs/connectors/mixpanel.md covers both endpoints, the EU note, the deterministic insert_id behaviour, and links to the Mixpanel reference docs; new tests/unit/test_mixpanel_destination.py ships 13 tests across TestConfigValidation / TestPeopleSet / TestImportEvents / TestBatchingAndErrors. Empty-batch contract for Mixpanel landed in follow-up #609. Contributed by @Pawansingh3889.

Changed (Internal)

  • Mixpanel empty-batch contract + drt-create-sync skill destinations list refresh (follow-up to #608 Mixpanel destination): now that Mixpanel is registered, adds it to the API empty-batch contract framework as a one-line pytest.param(...) entry in tests/contracts/test_destination_api_empty_batch.py — Mixpanel's if not records: return SyncResult() short-circuit means it passes all three contracts (Protocol satisfaction / empty SyncResult / no httpx.Client.send calls) on first run, contract framework total now 36 API tests (12 destinations × 3). Also refreshes the destinations enumeration in the drt-create-sync skill (both .claude/commands/drt-create-sync.md and skills/drt/skills/drt-create-sync/SKILL.md) — the list had drifted significantly from the actual registry. Adds Mixpanel, Amplitude, Notion, Twilio, Intercom, Zendesk, Google Ads, Email SMTP, Snowflake, Salesforce Bulk all of which were already registered destinations but missing from the skill's prompt-to-user list. Out of scope for this PR: GitHub Topics is at the 20-max limit (would need to swap an existing topic to add mixpanel) — flagged as a separate decision rather than auto-swapped, since topics are externally visible shared state.

Documentation

  • sync.mode: mirror user-facing docs (follow-up to v0.7.7's #596#599 mirror landings): adds a ## Mirror mode (differential delete, #340 — v0.7.7+) section to docs/connectors/postgres.md covering the YAML shape (sync.mode: mirror + required upsert_key), the cost-shape comparison vs upsert / replace (new table), and all four safety guards (empty-source short-circuit / failed-rows exclusion / upsert_key required ValueError / composite key support). Calls out the memory-bound caveat and the MySQL / ClickHouse / Snowflake sibling implementations in the same section. New examples/postgres_to_postgres_mirror/ directory ships a complete runnable example (HR warehouse → operational Postgres mirror, full README + drt_project.yml + profiles.yml.example + syncs/mirror_active_employees.yml) that downstream users can cp -r as a starting point — uses on_error: fail and explains why (silently skipping a row in mirror mode would cause the destination counterpart to be DELETEd as "not observed"). The drt-create-sync skill (both .claude/commands/drt-create-sync.md and skills/drt/skills/drt-create-sync/SKILL.md) is updated to list mirror as a valid sync.mode option alongside the existing full / incremental / upsert / replace choices, with the upsert_key required and v0.7.7+ availability note inline. README.md / README.ja.md unchanged — mirror is already prominent in the v0.7.7 roadmap row, and a dedicated body section is content-design work outside this PR's scope.

Fixed

  • ClickHouse get_row_count rendered a malformed identifier for schema-qualified tables + raw table interpolation across SQL command paths (#512, ClickHouse counterpart to the Postgres #498 / MySQL #514 _quote_ident fixes): ClickHouseDestination.get_row_count built its quoted identifier with `".`".join(config.table.split(".")), so db.scores became `db.`scores` (3 backticks — a syntax error on the server, confirmed Code: 62 against ClickHouse 24.8) rather than the intended `db`.`scores`. There was no test for get_row_count with a qualified table, which is why it never surfaced. The _quote_ident helper (added for sync.mode: mirror in #598) already rendered this correctly, so the fix routes get_row_count through it and applies it to the remaining raw-interpolation command paths that were fragile for reserved words / mixed case / db.table names: TRUNCATE TABLE (replace mode), DROP/CREATE TABLE ... AS (swap shadow setup + on-error cleanup), EXCHANGE TABLES / DROP TABLE (finalize_sync swap), and client.insert(table, ...) on both the non-swap and swap paths — clickhouse-connect's client.insert() interpolates the table argument raw into INSERT INTO {table} ... FORMAT Native (see clickhouse_connect/driver/insert.py), so the destination pre-quotes it to match the established Postgres / MySQL precedent of quoting insert/upsert paths. Verified end-to-end against a real ClickHouse with a database-qualified table across the replace / swap / mirror / row-count paths.
  • staged_upload finalize() ran the full upload/trigger/poll lifecycle on empty input (PR for #340-adjacent Step 2e contract): a transient empty source — where every stage() call received [] — caused StagedUploadDestination.finalize() to serialize a 0-byte file, then POST it to the configured stage.url, then POST again to the configured trigger.url, then (if configured) poll. The result was a wasted upload + trigger + a zero-row job whose lifecycle still allocated quotas on the third-party API. SalesforceBulkDestination.finalize() already had the right guard (if not self._records: return SyncResult(rows_extracted=0) at the top); StagedUploadDestination.finalize() was missing it. Added the same short-circuit immediately after record_count = len(self._records) so no auth / upload / trigger / poll work runs when nothing was staged. Surfaced by the new contract test in this PR — the contract assertion against httpx.Client.send caught a single POST to https://upload.example.com/files during finalize() on empty input.

Changed (Internal)

  • Destination contract tests — Step 2e: StagedDestination Protocol (staged_upload + salesforce_bulk) (final follow-up to Step 2c PR [#60...
Read more

v0.7.7 — sync.mode: mirror across SQL destinations

31 May 23:46
v0.7.7
47e4c75

Choose a tag to compare

Theme: sync.mode: mirror across the SQL destination set + cli/main split completion. The first user-facing addition since v0.7.6 is the sync.mode: mirror differential-delete sync mode (#340), shipping in four landings across Postgres (Step 1), MySQL (Step 2), ClickHouse (Step 3), and Snowflake (Step 4) — all four SQL destinations now upsert source rows and then DELETE destination rows whose upsert_key was not observed in the source, without the TRUNCATE/re-insert overhead of replace mode. BigQuery follows when the contributor PR #584 lands. Also lands the cli/main.py split completion — Phase 2b PR (a) + PR (b) + final tighten finish the 1706 → 164 LOC split (-90%) begun in v0.7.5 — plus a FakeSource + destination contract test framework, a CI check-changelog-required guard, a GCS storage import mypy fix, and CI install line extension that unlocked ~102 silently-skipped SQL destination tests (raised total coverage 82.68 → 85.29). No breaking changes — drop-in upgrade from v0.7.6.

Breaking Changes

None. Drop-in upgrade from v0.7.6.

Added

  • sync.mode: mirror — Step 4: Snowflake support (#340 Step 4, follow-up to #596 + #597 + #598): Snowflake destination now supports sync.mode: mirror — the final SQL destination in the set. Same application-side diff semantics as the prior steps — accumulate upsert_key tuples seen across all batches during load(), then issue a single DELETE FROM <db>.<schema>.<table> WHERE key NOT IN (collected) from finalize_sync(). Snowflake's connector uses %s placeholders (same family as psycopg2 / pymysql) and does not auto-expand a tuple-of-tuples — so the placeholder shape is built explicitly, identical to MySQL Step 2: single-column form WHERE col NOT IN (%s, %s, ...) with a flat values list, composite form WHERE (c1, c2) NOT IN ((%s, %s), (%s, %s), ...) with values flattened row-major. Snowflake-specific wrinkle: the destination has its own pre-existing config.mode: "insert" | "merge" field (orthogonal to sync_options.mode) that controls whether the write path is plain INSERT or staging-table-plus-MERGE. Since mirror semantics intrinsically require upsert, sync.mode: mirror now forces the MERGE write path regardless of config.mode — users only need to set destination.upsert_key and sync.mode: mirror (no need to also flip destination.mode to merge). The same ValueError fail-fast applies before any INSERT touches Snowflake when upsert_key is absent. Also adds the first-ever finalize_sync method on Snowflake (the existing destination had no swap-replace finalize path), returning None for any non-mirror mode so the engine's existing dispatch is unchanged. New tests/unit/test_snowflake_mirror_mode.py ships 12 tests covering key accumulation, the merge-path forcing (verifies CREATE TEMP TABLE + MERGE INTO ran even with config.mode: insert), dedupe across overlapping batches, single + composite key DELETE structure (against ANALYTICS.PUBLIC.USER_SCORES to verify fully-qualified table emission), the empty-source safety path, state reset, the missing-upsert_key ValueError, row-error skip path, and the non-mirror finalize short-circuit. Tests use sys.modules injection (matching the existing test_snowflake_destination.py pattern) so no snowflake-connector-python install is required to run them. Closes #340 for the SQL destination set; the temp-table strategy for high-cardinality tables remains as a future follow-up.
  • sync.mode: mirror — Step 3: ClickHouse support (#340 Step 3, follow-up to #596 + #597): ClickHouse destination now supports sync.mode: mirror with the same application-side diff semantics as Postgres / MySQL — accumulate upsert_key tuples seen across all batches in load(), then issue a single mutation from finalize_sync() that removes destination rows whose key was not observed. ClickHouse uses ALTER TABLE ... DELETE WHERE key NOT IN (collected) with mutations_sync=1 so the call blocks until the mutation completes. Unlike Postgres / MySQL where the placeholder shape had to be assembled manually, clickhouse_connect's native {name:Type} parameter substitution accepts Array(String) (single column) and Array(Tuple(String, ...)) (composite) directly — so the call site is one parameter dict, not a flat placeholder list. Both column references and parameter values are coerced via toString() so the comparison works regardless of source column type (cost: the comparison can't use a column index on upsert_key — mirror mode is intended for small/medium reference tables, not high-volume fact tables; the temp-table strategy is a planned follow-up for high-cardinality cases). The mutation rewrites affected parts, which is expensive in ClickHouse — the new docstring notes this explicitly so misuse is hard. ClickHouseDestinationConfig.upsert_key is list[str] | None (it's informational only for the existing INSERT path, where dedup is handled by ReplacingMergeTree at merge time), so the runtime guard in load() raises ValueError early when mirror mode is requested without a populated key — fail-fast before any INSERT touches the table. Backtick-quoting for database-qualified table identifiers (db.table`db`.`table`) added via a new _quote_ident helper, matching the v0.7.4-hardened MySQL pattern. New tests/unit/test_clickhouse_mirror_mode.py ships 12 tests covering key accumulation, dedupe across overlapping batches, database-qualified DELETE shape, single + composite key DELETE structure, the empty-source safety path, state reset, the missing-upsert_key ValueError, row-error skip path, and coexistence with the existing EXCHANGE TABLES swap-finalize path. Snowflake follows in the next PR.
  • sync.mode: mirror — Step 2: MySQL support (#340 Step 2, follow-up to #596): MySQL destination now supports sync.mode: mirror with the same application-side diff semantics as Postgres — accumulate upsert_key tuples seen across all batches in load(), then issue a single DELETE WHERE key NOT IN (collected) from finalize_sync(). Same safety paths as Step 1: empty-source short-circuit (_mirror_keys never populated → no DELETE issued, so a transient empty source can't wipe the table), state reset after finalize_sync runs, ignores rows that failed during upsert (only successfully-loaded keys count as "source state"), and a ValueError at load time when upsert_key is empty. Single-column and composite upsert_key both supported. Because pymysql does not auto-expand a tuple-of-tuples into a NOT IN %s parameter the way psycopg2 does, the DELETE is built with explicit %s placeholders — single-column form WHERE \c` NOT IN (%s, %s, ...)with a flat values list, composite formWHERE (`c1`, `c2`) NOT IN ((%s, %s), (%s, %s), ...)with the values flattened in row-major order. Backtick-quoting + schema-qualified table handling reuses the existing_quote_identhelper that v0.7.4 hardened. Newtests/unit/test_mysql_mirror_mode.py ships 10 tests covering key accumulation, dedupe across overlapping batches, schema-qualified DELETE shape, single + composite key DELETE structure, the empty-source safety path, state reset, the missing-upsert_key` ValueError, and coexistence with the existing swap-finalize path. ClickHouse and Snowflake follow in subsequent PRs.
  • sync.mode: mirror — differential delete for Postgres + CI test coverage (#340, PR #596): new sync mode that upserts every source row (same as full) and then DELETEs destination rows whose upsert_key tuple was not observed in the source — without the TRUNCATE / re-insert overhead of replace mode. Strategy: application-side diff (collect upsert_key tuples in memory during load(), then issue a single DELETE WHERE key NOT IN (collected) from finalize_sync()). Memory-bound to the source key cardinality; a temp-table strategy is a planned follow-up for tables larger than a few million rows. Safety: when the source produces no batches with records, _mirror_keys stays empty and finalize_sync skips the DELETE — a transient empty source won't wipe the destination. Single-column and composite upsert_key both supported (composite uses (c1, c2) NOT IN ((v1a, v2a), …) shape). Postgres only in Step 1; MySQL / ClickHouse / Snowflake follow in subsequent PRs. 10 unit tests cover key accumulation, dedupe across overlapping batches, single + composite key DELETE shape, the empty-source safety path, state reset after finalize, and coexistence with the existing swap-finalize path. Also extends CI's install line to include [postgres,mysql,clickhouse] so the new mirror tests + the existing 43 postgres + 33 mysql + 26 clickhouse tests actually run in CI rather than being silently skipped by their pytest.importorskip guards — the pre-existing coverage gap that this PR's codecov delta surfaced.
  • Destination contract tests — Step 2b: SQL destinations + empty-batch invariant (PR #595, follow-up to #594): closes out the empty-batch contract suite with the four SQL destinations (postgres / mysql / clickhouse / snowflake). Two parametrised contracts × 4 destinations = 8 tests. Notable: no driver mocking required — CI's minimal install ([dev,mcp,duckdb]) deliberately excludes the SQL extras, and each d...
Read more

v0.7.6

28 May 10:53
v0.7.6
ea9cbe8

Choose a tag to compare

What's New

Theme: Small follow-up. Two additive features accumulated since v0.7.5 — a new Amplitude destination (#574, Identify API + HTTP V2 events API) and a new tojson_safe Jinja2 filter (#580) that unblocks datetime / Decimal / UUID columns flowing through REST API body_template rendering — plus a CLI --log-format typer-compatibility fix (#578), a follow-up retrofit of ErrorFormatter stage detection to an engine-emitted attribute (#571, supersedes the traceback-walk heuristic from #544), and Phase 2a of the cli/main.py split (#572, continues #565's Phase 1). No breaking changes — drop-in upgrade from v0.7.5.

Breaking Changes

None. Drop-in upgrade from v0.7.5.

Added

  • Amplitude destination (#574): Sync DWH rows to Amplitude Identify API (user properties) or HTTP V2 API (events). No extra dependencies. 18 unit tests.
  • tojson_safe Jinja2 filter (#580, PR #581): drop-in replacement for tojson in body_template rendering that tolerates datetime / date / time (encoded as ISO 8601), Decimal and UUID (encoded as string). Registered on both drt.templates.renderer and drt.destinations.staged_upload's local Jinja environments. The default tojson filter is unchanged — opt-in only, no behavioural change for existing templates. Unblocks BigQuery TIMESTAMP / Postgres numeric / uuid columns flowing into REST API destinations without CAST(... AS STRING) workarounds in model SQL. Docs: docs/connectors/rest-api.md.

Fixed

  • drt run --log-format typer 0.26.1 compatibility (#577, PR #578): the option was declared with str + click_type=click.Choice(...), a shape typer 0.26.1's revised stubs no longer accept (mypy fails on CI's resolved typer version while passing on the locally-pinned 0.24.1). Replaced with LogFormat(str, Enum) — the canonical typer pattern, version-agnostic, and removes the need for a # type: ignore. No user-visible behaviour change; CLI accepts the same text / json values as before.

Changed (Internal)

  • ErrorFormatter stage detection retrofitted to engine-emitted attr (PR #571, follow-up to #544): replaces the traceback-walk heuristic in drt.cli.errors.infer_stage with a _drt_stage string attribute set by engine/sync.py at the point of failure. Removes two failure modes of the heuristic (re-raise wrapping loses attribution; engine-vs-source ambiguity when source iterators fail mid-iteration). User-visible error_stage semantics unchanged — same source / destination / engine / state values, just sourced from the engine directly instead of inferred.
  • cli/main.py split Phase 2a (PR #572, continues #565's Phase 1): extracts drt sources / drt destinations / drt clean --orphans / drt serve into drt/cli/commands/{connectors,clean,serve}.py, with shared internals (resolve_profile_name, get_source, get_destination, get_watermark_storage) moved to drt/cli/_helpers.py. No CLI behaviour change; drt/cli/main.py shrinks accordingly. (Phase 2b extracts drt run itself — tracked in PR #579.)

Full Changelog: v0.7.5...v0.7.6
PyPI: https://pypi.org/project/drt-core/0.7.6/

v0.7.5 — Production Ready follow-up #3 + Tech Foundation Hardening

25 May 02:50
fddd7da

Choose a tag to compare

Theme: Production Ready follow-up #3 + Tech Foundation Hardening. Two streams shipping together: (1) the accumulated work since v0.7.4 — REST API polish, sync catalog (#499 P1+P2), MCP test tool, OTel Phase 1 config, hardcoded secret detection, lookup ambiguity warning, orphan shadow cleanup, drt init "Next steps:" block — and (2) the Tech Foundation Hardening epic (#538, 11 child issues, all closed) which locks in the foundations before v0.8 Cloud Destinations: CI reach (nightly + publish gate + CodeQL + pip-audit + SBOM), functional E2E coverage for the reverse-ETL paths (DuckDB harness + boundary cases), CLI/UX polish (ErrorFormatter, --detailed, --template), and load-bearing refactors (SyncObserver engine seam, destinations serializer + config base class consolidation, cli/main split Phase 1). No breaking changes — drop-in upgrade from v0.7.2 (v0.7.3 / v0.7.4 were patch-only cherry-picks).

Breaking Changes

None. Drop-in upgrade from v0.7.2 / v0.7.3 / v0.7.4.

Tech Foundation Hardening epic (#538)

  • CI: nightly schedule + publish-time lint/test gate (#539, PR #553): .github/workflows/ci.yml gains a weekly schedule: trigger (Mon 00:00 UTC) so env-drift / flaky regressions surface without waiting for the next PR. Both publish-drt-core.yml and publish-dagster-drt.yml gain a verify job (ruff + mypy + pytest on Python 3.12) that runs before build/upload — a tag pushed off a stale local can no longer ship without verification.
  • CI: supply-chain scans (CodeQL + pip-audit + SBOM) (#540, PR #557): new .github/workflows/codeql.yml (Python security-extended query pack, PR + weekly Mon 03:00 UTC schedule offset from the ci.yml nightly). pip-audit==2.10.0 step added to ci.yml (gated to Python 3.12 to avoid 4x duplicated work, OSV vulnerability service, --strict fails on warnings). CycloneDX SBOM (cyclonedx-bom==7.3.0) generated by both publish workflows and attached to the GitHub Release as drt-core-sbom.cdx.json / dagster-drt-sbom.cdx.json. SECURITY.md documents the scanning posture and allow-list policy.
  • Tests: DuckDB Source + reverse-ETL E2E harness (#541, PR #552): new tests/unit/test_duckdb.py (9 tests covering happy path, NULL, BIGINT/DOUBLE extremes, DATE/TIMESTAMP round-trip, connection success/failure) + new tests/integration/test_duckdb_e2e.py (4 tests driving real DuckDB → engine → pytest-httpserver REST destination end-to-end, no mocks). Establishes the harness pattern for future Source E2E modules (Postgres/MySQL/Snowflake): copy the duckdb_with_users fixture + assertion shape, swap the source. Documented in tests/integration/README.md. CI install line now includes the duckdb extra so the tests actually run.
  • Tests: boundary cases — idempotency, large batch, schema evolution, type conversion (#542, PR #555): 13 boundary tests across 4 files locking in the production-claim contracts. test_idempotency.py (3) — StateManager persistence + re-run semantics (drt does NOT dedupe at the engine layer; destinations own upsert). test_large_batch.py (3) — batch_size honored, no row drop/dupe, tracemalloc smoke ceiling guards against O(N) buffering. test_schema_evolution.py (3) — ALTER TABLE between runs surfaces in payload, locking the "no schema enforcement" stance. test_type_conversion.py (4) — NULL / BIGINT extremes / DOUBLE pass through cleanly; DATE / TIMESTAMP currently fail at httpx JSON serialization (locked for #317 to flip).
  • UX: drt sources --detailed / drt destinations --detailed + --format json (#543, PR #563): new drt/cli/_connector_detail.py introspects each connector's Pydantic / dataclass config to surface required env vars, optional env vars, required fields, and a copy-pasteable 3–8 line sample YAML — no hand-maintained tables, derived from the source of truth. --format json (machine-readable, plain print() so Rich line wrap doesn't corrupt the document) exposes the same data with a stable config_class field for advanced consumers. Registry-parity tests guard against new connectors landing without metadata. Establishes the extension point for v0.9 plugin authors (#297).
  • UX: ErrorFormatter for drt run failures (#544, PR #558): new drt/cli/errors.py wraps raw exceptions with stage (source / destination / engine / state, inferred today via traceback walk; will swap to engine-emitted tag once #527 OTel + #548 SyncObserver land), error type + message, and a suggested next step from a conservative keyword-rule table (connection / auth / rate-limit / state corruption). Rich panel rendering for terminal users; new error_type / error_stage / error_suggestion fields on --format json entries (preserving error string for back-compat). The log_json structured ERROR sync_complete line also gains the stage / type fields for log-aggregation grouping.
  • UX: drt init --template <connector> quickstart scaffolds + README rewrite (#545, PR #564): three curated static templates ship under drt/cli/templates/syncs/duckdb_to_rest (no accounts needed), postgres_to_slack (operational alerts), duckdb_to_hubspot (contacts upsert). drt init --template list enumerates them; drt init --template <name> writes the sync file plus a minimal drt_project.yml + .drt/.gitignore shell, never overwriting an existing project, then prints per-template next-steps text (env vars to set, sample tables to seed). README quickstart collapsed from 5 steps to 3 commands using --template; the interactive wizard demoted to a "Customizing your sync" subsection. docs/connectors/ discovery hint added above the Connectors table. README.ja.md mirrored.
  • Refactor: SyncObserver protocol — tighten engine I/O boundary (#548, PR #559): new drt/engine/observer.py introduces a fire-and-forget event surface (on_sync_started / on_watermark_resolved / on_warning / on_interrupted / on_sync_completed) with four concrete observers (NullObserver, LoggingObserver, StatePersistingObserver, CompositeObserver). engine/sync.py no longer imports logging or calls state_manager.save_sync(...) / watermark_storage.save(...) directly — all writes flow through the observer; the CLI composes the default LoggingObserver + StatePersistingObserver set. Restores the "no I/O beyond protocol calls" purity claim documented in CLAUDE.md, which is load-bearing for the v1.x Rust migration. Two boundary regression tests pin the contract by reading engine/sync.py as text and asserting the absence of direct calls. Engine API change: callers of run_sync that want state persistence must now pass observer=StatePersistingObserver(state_mgr, wm_storage). Internal-API only — CLI updated; affected tests updated in this PR.
  • Refactor: consolidate _serialize_value() across destinations/postgres + mysql (#547, PR #560): new drt/destinations/_serializer.py with one serialize_complex_value(value, column, json_columns, *, dict_encoder, list_encoder=None) function carrying the dialect-agnostic decision logic (validate dict/list against json_columns allowlist, error early on unlisted complex values). Postgres delegates with dict_encoder=_pg_dict_encoder (psycopg2 Json wrapper + json.dumps fallback) and list_encoder=None (pass-through to ARRAY adapter); MySQL delegates with both as json.dumps. ~50 LOC of duplication removed; 21 new matrix tests cover the shared module at 100% coverage. Future SQL destinations (Snowflake / Redshift / Databricks in v0.8) just supply their encoder — no third copy of the validation logic.
  • Refactor: extract BaseSqlDestinationConfig from config/models.py (#549, PR #562): nine connection fields (connection_string_env, host, host_env, port, user, user_env, password, password_env, lookups) lifted from Postgres / MySQL / ClickHouse destination configs into a shared base. Subclasses override port (Postgres 5432 / MySQL 3306 / ClickHouse 8123) and add their dialect-specific fields (dbname vs database, ssl vs secure, upsert_key, json_columns). _check_connection validator stays per-subclass because the database field name differs. Zero test changes — Pydantic v2 inheritance is transparent. v0.8 Cloud destinations (Redshift in particular) inherit for free.
  • Refactor: drt/cli/commands/ package — Phase 1 of cli/main.py split (#546, PR #565): new drt/cli/_app.py holds the shared typer.Typer instance; new drt/cli/commands/ package extracts six command modules — doctor, list_syncs, config (3 sub-commands), cloud (2 stub commands), docs (generate + serve), mcp (run). cli/main.py 1706 → 1470 LOC (-14%) while preserving every from drt.cli.main import ... import path that tests rely on (app, _run_one, _RunContext, _get_destination, etc.). Heavier commands (run / validate / status / test / init / serve / sources / destinations / clean) stay in main.py for a Phase 2 follow-up — the pattern this PR establishes makes that move mechanical.
  • Fix: escape Rich markup in print_error so [extra] hints aren't dropped (PR #566): discovered while writing coverage tests for the cli/main.py split. print_error("MCP server requires: pip install drt-core[mcp]") was rendering as "pip install drt-core" because Rich consumed [mcp] as a style tag. Fix in drt/cli/output.py:print_error calls rich.markup.escape on the message argument before interpolating into the styled template — guards every callsite at once (any future print_error(...) with bracketed content is now safe). Four parametrized tests cover drt-core[mcp], drt-core[postgres,duckdb], [INFO]-style markers, and plain-message regression.

Other shipped items (since v0.7.4)

  • REST API: lift extract_next_link into shared drt._http_utils (#530, item 1): removes the cross-domain import where drt/sources/rest_api.py reached into `RestApiDes...
Read more

v0.7.4 — MySQL Patch

23 May 14:27
v0.7.4
88d6c30

Choose a tag to compare

Theme: Patch release for MySQL schema-qualified identifier handling.

Cherry-pick of PR #514 onto the v0.7.3 release line — the MySQL counterpart to the Postgres Identifier() composition fix that shipped in v0.7.3. No new features, no breaking changes.

⚠️ Erratum on v0.7.3: Earlier drafts of the v0.7.3 CHANGELOG referenced this MySQL fix as if it had shipped in v0.7.3, but PR #514 actually landed on main two days after the v0.7.3 tag was cut. The wheel published to PyPI as drt-core==0.7.3 does not contain the MySQL fix. v0.7.4 is the release that actually delivers it. Users hitting #511 should upgrade to drt-core>=0.7.4.

Fixed

  • MySQL destination correctly quotes schema-qualified table names (#511, PR #514): mydb.scores now produces `mydb`.`scores` across the row-count, replace, insert, and upsert paths (previously the dotted name was treated as a single backtick-quoted identifier, so any MySQL destination configured with a schema-qualified table name failed at SQL execution). The _quote_ident helper is now applied consistently across all SQL-composition paths on the MySQL destination, matching the Postgres Identifier() composition fix that shipped in v0.7.3 (#442 / PR #498). Contributed by @Godzilaa.

Upgrade

pip install -U drt-core
# or
uv add drt-core

PyPI: https://pypi.org/project/drt-core/0.7.4/

Compared to v0.7.3

Drop-in upgrade. No config changes, no API changes, no schema changes. v0.8 work continues in parallel on main.

Full Changelog: v0.7.3...v0.7.4

v0.7.3

17 May 12:36
v0.7.3
90e6da6

Choose a tag to compare

drt-core v0.7.3

Patch release for Postgres schema-qualified identifier handling.

Cherry-pick of PR #485 + PR #498 onto the v0.7.2 release line — so users on drt-core~=0.7.2 get the fix without the v0.8.0 feature set. No new features, no breaking changes.

Install

pip install --upgrade drt-core

Fixed

  • Postgres: qualified schema.table identifiers now safely composed (#442, PR #498): the row-count, replace, swap, finalize, insert, and upsert SQL paths previously passed f"{schema}.{table}" style strings through psycopg2.sql.Identifier(), which double-quoted the entire dotted name into a single identifier ("marketing.email_events") — so any Postgres destination configured with a schema-qualified table name failed at SQL execution. The fix splits qualified names into separate schema and relation Identifier() components, while keeping swap shadow/old suffixes attached to the relation name only (so marketing.email_events becomes marketing."email_events__drt_swap", not a single quoted identifier). Contributed by @Photon101.
  • Postgres: swap path fully migrated to psycopg2.sql composition (PR #485, fixes #483): prerequisite for the qualified-identifier fix above. The swap-path SQL (DROP TABLE IF EXISTS, CREATE TABLE LIKE, INSERT, ALTER TABLE RENAME, cleanup DROP) was still using f-string formatting; this commit migrates it to psycopg2.sql.SQL + Identifier for safe composition. Originally queued for v0.7.2 but landed post-release; bundled into this patch because PR #498 depends on it. Contributed by @Photon101.

Who should upgrade

If you use a Postgres destination with schema-qualified table names (public.users, marketing.events, etc.) — upgrade now. v0.7.2 silently failed at SQL execution for these configs.

Versioning

Strict patch per VERSIONING.md. The v0.8.0 features queued on main ([Unreleased] in CHANGELOG) are not included here — they ship in v0.8.0 along with the Cloud Destinations theme.

Full Changelog: v0.7.2...v0.7.3

v0.7.2 — Production Ready follow-up #2

11 May 23:48
v0.7.2
c55dc5c

Choose a tag to compare

Theme: Production Ready follow-up #2. Closing out the v0.7 cycle items that didn't make v0.7.1 — opt-in anonymous telemetry, deprecation warnings in drt validate, and Postgres psycopg2.sql SQL composition hardening.

✨ Added

  • Opt-in anonymous usage telemetry (#263, PR #446) — A new drt/telemetry.py module sends one sync_completed event per drt run when the user explicitly opts in. Off by default. Honors DO_NOT_TRACK=1. Allow-list properties (drt_version, python_version, os, source_type, destination_type, sync_mode, rows_synced, duration_seconds, status) — sync names, model SQL, destination URLs, credentials, and project paths are never transmitted. The wire envelope additionally carries event, distinct_id, timestamp, and api_key. Configure with drt config set telemetry.enabled true or DRT_TELEMETRY=1. Endpoint defaults to PostHog Cloud EU; override with DRT_TELEMETRY_ENDPOINT / DRT_TELEMETRY_API_KEY for self-hosted PostHog. Privacy posture and GDPR disclosure live in docs/telemetry.md. Contributed by @kiwamizamurai.

  • Deprecation warnings in drt validate (#467, PR #478) — validate now reads drt/deprecations.py and surfaces ⚠️ warnings for any deprecated config keys it finds in your sync YAMLs, in text output and as a per-sync deprecations array under --output json. Exit code stays 0 (warnings are non-blocking). The registry currently has no active entries — it's the infrastructure for the next real deprecation per VERSIONING.md Step 1. Closes the Step 2 ("Add Tooling Support") TODO from #457. Contributed by @Muawiya-contact.

  • Release-time telemetry key injection workflow (PR #481) — publish-drt-core.yml sed-injects POSTHOG_WRITE_KEY from secrets into drt/telemetry.py:_DEFAULT_API_KEY before uv build, with a smoke check that asserts injection happened. Fail-safe when the secret is unset (community forks ship with telemetry physically disabled — is_enabled() short-circuits to False without a key).

🔧 Changed

  • Postgres destination: safe SQL composition via psycopg2.sql (#442, PR #452) — Replaced f-string interpolation of table and column identifiers in _load_replace, _load_upsert, _build_insert_sql, and _build_upsert_sql with psycopg2.sql.SQL / psycopg2.sql.Identifier. Eliminates a class of identifier-injection bugs in environments where table/column names are derived from config rather than hard-coded. Swap-path methods (_load_replace_swap / finalize_sync from #435 / #448) tracked for follow-up in #483. Contributed by @Khush-domadia.

📄 Documentation

  • GDPR disclosure section in docs/telemetry.md — Data controller (K. Masuda as natural person, transferable to a future legal entity), retention (1 year on Free tier; #482 tracks reducing to 90 days via API-based cleanup), erasure contact (drt.hub.dev@gmail.com), Art. 6(1)(a) opt-in consent + Art. 46(2)(c) Standard Contractual Clauses basis for the US-incorporated processor / EU-hosted storage split (DPA signed via PostHog's self-serve flow).
  • CHANGELOG correctness fix — The previous v0.7.1 release commit accidentally renamed the v0.7.0 heading to [0.7.1] without preserving v0.7.1 content. This release restores [0.7.0] under its correct date and inserts the actual v0.7.1 entries (drt diff #413, cursor fix #475, on_error fail alignment #463, VERSIONING.md #457) that were missing.

🚧 Followups

  • #482chore(telemetry): cap event retention at 90 days via PostHog API cleanup (Free plan workaround)
  • #483chore(postgres): extend psycopg2.sql migration to swap path (good first issue)

Upgrade

pip install -U drt-core==0.7.2

Drop-in upgrade from v0.7.1 — no breaking changes. Telemetry is off by default; nothing leaves your machine until you run drt config set telemetry.enabled true.

Triage Collaborator update

🎉 @PFCAaron12 joined the triage team during this cycle (accepted at discussion #432), bringing the Triage Collaborator count to three alongside @Pawansingh3889 and @Muawiya-contact.

Contributors

Thanks to @kiwamizamurai (#446), @Muawiya-contact (#478), @Khush-domadia (#452), and the wider community for keeping the v0.7 cycle moving forward.

Full Changelog: v0.7.1...v0.7.2

v0.7.1 — Production Ready follow-up

07 May 13:43
v0.7.1
e11abdc

Choose a tag to compare

Theme: Production Ready follow-up. Tail of the v0.7 cycle — record-level dry-run preview, watermark cursor correctness fix, on_error=fail alignment for the remaining HTTP destinations, and the new VERSIONING.md policy doc.

✨ Added

  • drt run --dry-run --diff (#413) — Record-level preview before deploying a sync. For queryable destinations (Postgres / MySQL / ClickHouse), shows added / updated (with field-level old → new diffs) / deleted records. For non-queryable destinations (REST API, Slack, HubSpot, Notion, file destinations) falls back to "sample mode" — first N records to be sent. Works in both text (rich tables) and --output json. New --diff-limit N flag (default 20). Doc: docs/guides/dry-run-and-diff.md. Follow-ups: #468 Snowflake support, #469 Protocol method, #470 perf, #471 --diff-fields, #472 API-based SaaS diff.
  • VERSIONING.md (#457, polished in #464) — Project's semver and deprecation policy pre-1.0. Documents what counts as a breaking change at each layer (CLI / config schema / Python API) and the deprecation cadence. Cross-linked from CONTRIBUTING.md PR checklist. Initial draft and follow-up polish contributed by @Muawiya-contact.

🐛 Fixed

  • Watermark advance for tz-aware cursor values (#475) — drt/engine/sync.py was calling str() directly on cursor field values, which for tz-aware datetimes (e.g. BigQuery TIMESTAMP columns from the Python BQ client) produced strings with a +00:00 suffix. When user SQL or default_value was tz-naive, the boundary row matched again and re-fired on every subsequent run. The engine now normalizes tz-aware datetimes to naive UTC before stringifying. Reported by @K-Masuda-SL after a prod incident.
  • on_error=fail not respected by Notion / REST API / Email SMTP destinations (#463, contributes to #365) — Three HTTP destinations continued processing the rest of the batch after the first failure even when on_error: fail was configured. Now all three short-circuit and return on the first failure, matching the documented contract and the behavior of every other destination. New tests lock the semantic in across the webhook surface.

🚧 Deferred to v0.7.2

  • Opt-in anonymous usage telemetry (#263, PR #446)
  • Postgres psycopg2.sql SQL composition (#442, PR #452)

Upgrade

pip install -U drt-core==0.7.1

Drop-in upgrade from v0.7.0 — no breaking changes.

Contributors

Thanks to @Muawiya-contact (#457, #464), @K-Masuda-SL (#475), and the wider community for keeping the v0.7 cycle moving forward.

Full Changelog: v0.7.0...v0.7.1

drt-core v0.7.0 — Production Ready

06 May 08:00
v0.7.0
3abcb87

Choose a tag to compare

[0.7.0] - 2026-05-06

Theme: Production Ready. Reliability, observability, and correctness for syncs that run in production environments — graceful shutdown on SIGTERM/SIGINT, retry knobs per destination, atomic zero-downtime table replace, sync execution history, FK existence filtering, opinionated JSON column handling. Plus the first DWH destination (Snowflake), the GitHub Codespaces playground for zero-setup onboarding, and OPEN_CORE.md documenting the open core boundary.

This release closes 9 v0.7 milestone issues plus several spillover items shipped early.

Breaking Changes

None. Drop-in upgrade from v0.6.x.

Added

  • Sync failure alerts (#414): Configure alerts.on_failure in sync YAML to send Slack or generic HTTP webhook notifications when a sync ends with failed > 0 or raises an exception. Two target types in v0.7: slack (Slack incoming webhook) and webhook (generic HTTP POST/PUT with optional body_template). Template variables: sync_name, error, rows_processed, duration_s, started_at. Dispatch is best-effort — alert failures are logged but never affect sync correctness or override the original exception.
  • Graceful shutdown on SIGTERM/SIGINT (#279): drt run now handles SIGTERM (container stop, K8s pod eviction, Airflow cancellation) and SIGINT (Ctrl+C) cooperatively. Signal handler sets a stop_event checked between batches so the in-flight batch always completes; state and watermark are persisted before exit. POSIX-conventional exit codes: 130 for SIGINT, 143 for SIGTERM. A 30-second watchdog force-exits if the current batch hangs — pair with K8s terminationGracePeriodSeconds: 60 for a bounded shutdown window. New SyncResult.interrupted flag lets integrations (Dagster / Airflow / Prefect) distinguish a clean cancellation from an error. See docs/guides/graceful-shutdown.md.
  • FK existence check via lookups.check_only (#354): Filter source rows by whether a foreign key exists in the destination, without resolving a value. Set check_only: true on a lookup (and omit select); rows whose match key is missing in the destination table are filtered per on_miss (skip or fail). Common use case: BigQuery has prd-like data but the destination DB (e.g. staging) holds only a subset — silently drop rows pointing at non-existent FKs instead of failing the sync. Source columns are always preserved (no implicit drop) since the target name is just a label, not a destination column. Works alongside regular value-resolving lookups in the same sync.
  • Snowflake destination (#353): Write rows back to Snowflake tables. Supports mode: insert (append) and mode: merge (upsert via temp staging table + MERGE statement using upsert_key). Auth via account_env / user_env / password_env. Install: pip install drt-core[snowflake]. Contributed by @PFCAaron12.
  • Zero-downtime replace via staging table swap (#338): sync.replace_strategy: swap enables truly atomic table replacement — drt writes to a shadow table ({table}__drt_swap) per batch and atomically renames it to the original at the end of the sync. Supported on PostgreSQL (transactional ALTER TABLE RENAME), MySQL (atomic RENAME TABLE), and ClickHouse (atomic EXCHANGE TABLES, requires 21.8+). Default remains truncate (existing TRUNCATE → INSERT behavior). Follow-ups tracked in #433 (orphan auto-cleanup) and #434 (Snowflake support).
  • Per-destination retry override (#277): Each HTTP destination can now declare its own retry: RetryConfig block in YAML to override the sync-level sync.retry. Useful when one destination has stricter rate limits or unusual failure modes (e.g. Notion 7 attempts while other destinations stay at the default 3). Priority: destination.retry > sync.retry > built-in defaults. Brings drt in line with the per-adapter retry knobs in dbt and dlt. Documentation: docs/guides/retry.md.
  • Sync execution history (#276): Every drt run now appends a record to .drt/history/<sync_name>.jsonl with timestamp, status, rows synced/failed, duration, and errors. Inspect via drt status --history [--sync NAME] [--limit N] [--output json] or via the new drt_get_history MCP tool. Configurable retention (history.retention_days, default 30) prunes old entries lazily on each append. Best-effort: history persistence never affects sync correctness. The CLI/MCP counterpart to the run-history UI in Census/Hightouch — brought to a Git-native, scriptable workflow. Documentation: docs/guides/sync-history.md.
  • GitHub Codespaces playground (#407, closes #283): Zero-setup "click and try" onboarding via the Open in GitHub Codespaces badge. The devcontainer installs drt-core[duckdb], seeds a sample DuckDB warehouse, and ships two runnable examples — examples/duckdb_to_file (DuckDB → CSV) and examples/duckdb_to_rest (DuckDB → REST API). Contributed by @safridwirizky.
  • json_columns config for explicit JSON serialization (#316): Declare which Postgres/MySQL destination columns hold JSON/JSONB data via json_columns: [col1, col2]. Listed columns are wrapped with the driver-native JSON adapter (psycopg2.extras.Json for Postgres, json.dumps for MySQL); unlisted columns receiving dict/list values raise an early ValueError pointing at the missing column instead of failing deep inside the driver. Backward-compatible — when omitted, all dict/list values are auto-wrapped as before. Contributed by @armorbreak001.
  • drt doctor command (#264): Diagnostics for the user's environment — checks Python version, drt version, project file existence, profile resolution, sync file count, optional extras installation status, and common environment variables. Pinpoints setup issues for new users without forcing them to read traceback output. Contributed by @pureqin.
  • --quiet / -q flag for drt run (#265): Suppresses banner / sync-result / summary / watermark output for CI and cron use cases where logs are noise. --quiet wins over --verbose when both are passed; --output json is unaffected so structured output still flows. Contributed by @Pawansingh3889.
  • drt test --output json and drt test --dry-run (#366, #371): Brings drt test to feature parity with drt run. JSON output gives CI integrations structured pass/fail data; dry-run prints the test plan (test name, target, type) without hitting any database. Contributed by @wahajahmed010.
  • drt cloud push stub command (#302): Placeholder Typer subcommand that prints an "enterprise cloud push" message and exits cleanly. Reserves the CLI surface so future enterprise integrations don't break user shell aliases / scripts. Re-landed under maintainer authorship after the original contributor (#308) didn't return to sign the CLA. See OPEN_CORE.md for what's free vs. enterprise.

Changed

  • Connector registry (#381): Replaced hardcoded isinstance chains in _get_destination() / _get_source() with a centralized registry (drt/connectors/registry.py). Adding a new connector no longer requires editing main.py. Error messages now list available connectors on typo. Contributed by @Muawiya-contact.
  • REST API destination pagination (#260): Fetch data from paginated APIs. 3 strategies: offset/limit, cursor-based, HTTP Link headers. Contributed by @Muawiya-contact.
  • Retry resolution helper (#277): Each HTTP destination now uses a single resolve_retry(config.retry, sync_options) helper instead of duplicating its own _DEFAULT_RETRY fallback. Removes ~110 lines of dead code (the per-destination _DEFAULT_RETRY constants were unreachable because SyncOptions.retry is always populated by default_factory).

Fixed

  • PostgreSQL destination: crash on dict values bound for JSONB columns — wrapped with psycopg2.extras.Json (#315). Contributed by @armorbreak001.
  • Notion destination: sync_options.retry override was silently ignored — Notion always used the hardcoded _DEFAULT_RETRY (3 attempts) regardless of user configuration. Now respects user-configured retry like every other HTTP destination (#438). Contributes to #365.
  • BasicAuth credentials from secrets.toml: BasicAuth was the only auth type that couldn't resolve credentials from .drt/secrets.toml — it called os.environ.get() directly while every other auth type went through resolve_env(). Users storing BasicAuth credentials in secrets.toml got a misleading "env var not set" error. Now matches the other auth types (#386). Contributed by @armorbreak001.
  • replace_strategy: swap ignored json_columns config (#448): When both replace_strategy: swap (#338) and json_columns (#316) were configured on a Postgres or MySQL destination, swap mode silently bypassed the explicit JSON column declarations because _load_replace_swap did not thread config.json_columns through to _serialize_value. Now both strategies honour the config consistently — swap-mode dict values in unlisted columns raise the same fail-fast ValueError as truncate mode. ClickHouse is unaffected (its connector handles JSON encoding driver-side and has no json_columns config). Discovered post-rebase of #435 onto #382 — neither feature was wrong in isolation; the gap was an interaction artifact of parallel development.

v0.6.2

20 Apr 11:50
b6b6ef1

Choose a tag to compare

What's New

Added

  • watermark.default_value (#390): Configure a fallback cursor value for first-run incremental syncs. Prevents broken SQL (e.g. WHERE TIMESTAMP(...) >= TIMESTAMP('')) when no watermark file exists yet. Without a default, drt now raises a clear error with actionable guidance instead of silently rendering an empty string.

    sync:
      mode: incremental
      cursor_field: completed_at
      watermark:
        storage: gcs
        bucket: my-bucket
        key: watermarks/my_sync.json
        default_value: "2026-04-20 00:00:00"  # used on first run
  • --cursor-value CLI option (#390): Override the cursor/watermark value at runtime for backfill and recovery scenarios. The override takes highest priority in the fallback chain and the resulting watermark is persisted on success.

    drt run --select my_sync --cursor-value '2026-01-01 00:00:00'
  • Watermark source observability (#391): Operators can now see where the cursor value came from — storage, default_value, or cli_override — via structured INFO logs, --output json fields (watermark_source, cursor_value_used), and an end-of-run summary in text mode.

Fallback chain (priority order)

1. --cursor-value CLI flag
2. Watermark storage (GCS / BigQuery / local)
3. State manager (.drt/state.json)
4. watermark.default_value (YAML config)
5. Cursor template present → ValueError / No template → full extract

Install / Upgrade

pip install drt-core==0.6.2
# or
uv add drt-core==0.6.2

Full Changelog: v0.6.1...v0.6.2