Releases: drt-hub/drt
v0.7.8 — Mixpanel destination + ClickHouse identifier fix + empty-batch contracts complete
Theme: Community-driven follow-up + ClickHouse identifier fix completion. Two contributor PRs accumulated on main since v0.7.7: a new Mixpanel destination (#608 by @Pawansingh3889, closes #417) and a ClickHouse _quote_ident identifier fix (#610 by @yodakanohoshi, closes #512 — closes the ClickHouse leg of the qualified-identifier fix family alongside Postgres #498 / MySQL #514). The ClickHouse fix is the urgency lever: v0.7.7 users running ClickHouse against a database-qualified table (db.tbl) hit a server-side Code: 62 syntax error from get_row_count's malformed identifier — this patch resolves that path plus the other raw-interpolation SQL paths. Also completes the empty-batch contract suite (#604 / #605 / #606 across HTTP API / special-transport / StagedDestination Protocol shapes — 25 of 25 registered destinations), which surfaced a real bug in staged_upload.finalize() also fixed here in #606. Ships user-facing sync.mode: mirror documentation (#607) — connector docs section + runnable cp -r example + skill option — plus the post-#608 Mixpanel wiring (#609) and i18n marker bump (#603). No breaking changes — drop-in upgrade from v0.7.7.
Breaking Changes
None. Drop-in upgrade from v0.7.7.
Added
- Mixpanel destination (#417, PR #608): new HTTP destination for Mixpanel's two primary product-analytics APIs —
/engage(endpoint: people_set, sets user-profile properties with the project token carried per-record) and/import(endpoint: import_events, ingests events with HTTP Basic auth via a Mixpanel service account + the numericproject_id). Both batch up to 2000 records per request (Mixpanel's hard limit, enforced via abatch_sizefield_validatorthat clamps at 2000), support EU data residency (region: euroutes toapi-eu.mixpanel.com), and produce a deterministic$insert_idfor the import path — derived fromSHA256(canonical_json(record) + ":" + event_name)[:32]wheninsert_id_fieldis unset, so re-running the same sync does not double-count events server-side. Profile-set payload shape:{"$token": ..., "$distinct_id": ..., "$set": {<row props minus reserved>}}; event payload shape:{"event": ..., "properties": {"distinct_id": ..., "time": ..., "$insert_id": ..., <other row props>}}. Row-level errors surface inSyncResult.row_errorswith the HTTP status code forhttpx.HTTPStatusErrorpaths and the response body (truncated to 500 chars) so debugging a bad payload doesn't require re-running with verbose logging. Two pydanticmodel_validators onMixpanelDestinationConfigcross-checkendpointagainst the required auth fields and (forimport_events) require either a constantevent_nameor a per-rowevent_name_field. Optionalproperties_template(Jinja2 → JSON object) merges into the props dict after the row's plain columns. Composable with all existing infrastructure:RateLimiterfor request pacing,with_retry/resolve_retryfor transient-error handling,RowError/SyncResultshape consistent with the other 24 registered destinations. Newdocs/connectors/mixpanel.mdcovers both endpoints, the EU note, the deterministic insert_id behaviour, and links to the Mixpanel reference docs; newtests/unit/test_mixpanel_destination.pyships 13 tests acrossTestConfigValidation/TestPeopleSet/TestImportEvents/TestBatchingAndErrors. Empty-batch contract for Mixpanel landed in follow-up #609. Contributed by @Pawansingh3889.
Changed (Internal)
- Mixpanel empty-batch contract + drt-create-sync skill destinations list refresh (follow-up to #608 Mixpanel destination): now that Mixpanel is registered, adds it to the API empty-batch contract framework as a one-line
pytest.param(...)entry intests/contracts/test_destination_api_empty_batch.py— Mixpanel'sif not records: return SyncResult()short-circuit means it passes all three contracts (Protocol satisfaction / emptySyncResult/ nohttpx.Client.sendcalls) on first run, contract framework total now 36 API tests (12 destinations × 3). Also refreshes the destinations enumeration in thedrt-create-syncskill (both.claude/commands/drt-create-sync.mdandskills/drt/skills/drt-create-sync/SKILL.md) — the list had drifted significantly from the actual registry. Adds Mixpanel, Amplitude, Notion, Twilio, Intercom, Zendesk, Google Ads, Email SMTP, Snowflake, Salesforce Bulk all of which were already registered destinations but missing from the skill's prompt-to-user list. Out of scope for this PR: GitHub Topics is at the 20-max limit (would need to swap an existing topic to addmixpanel) — flagged as a separate decision rather than auto-swapped, since topics are externally visible shared state.
Documentation
sync.mode: mirroruser-facing docs (follow-up to v0.7.7's #596–#599 mirror landings): adds a## Mirror mode (differential delete, #340 — v0.7.7+)section todocs/connectors/postgres.mdcovering the YAML shape (sync.mode: mirror+ requiredupsert_key), the cost-shape comparison vsupsert/replace(new table), and all four safety guards (empty-source short-circuit / failed-rows exclusion /upsert_keyrequired ValueError / composite key support). Calls out the memory-bound caveat and the MySQL / ClickHouse / Snowflake sibling implementations in the same section. Newexamples/postgres_to_postgres_mirror/directory ships a complete runnable example (HR warehouse → operational Postgres mirror, full README +drt_project.yml+profiles.yml.example+syncs/mirror_active_employees.yml) that downstream users cancp -ras a starting point — useson_error: failand explains why (silently skipping a row in mirror mode would cause the destination counterpart to be DELETEd as "not observed"). Thedrt-create-syncskill (both.claude/commands/drt-create-sync.mdandskills/drt/skills/drt-create-sync/SKILL.md) is updated to listmirroras a validsync.modeoption alongside the existingfull/incremental/upsert/replacechoices, with theupsert_key requiredand v0.7.7+ availability note inline. README.md / README.ja.md unchanged — mirror is already prominent in the v0.7.7 roadmap row, and a dedicated body section is content-design work outside this PR's scope.
Fixed
- ClickHouse
get_row_countrendered a malformed identifier for schema-qualified tables + raw table interpolation across SQL command paths (#512, ClickHouse counterpart to the Postgres #498 / MySQL #514_quote_identfixes):ClickHouseDestination.get_row_countbuilt its quoted identifier with`".`".join(config.table.split(".")), sodb.scoresbecame`db.`scores`(3 backticks — a syntax error on the server, confirmedCode: 62against ClickHouse 24.8) rather than the intended`db`.`scores`. There was no test forget_row_countwith a qualified table, which is why it never surfaced. The_quote_identhelper (added forsync.mode: mirrorin #598) already rendered this correctly, so the fix routesget_row_countthrough it and applies it to the remaining raw-interpolation command paths that were fragile for reserved words / mixed case /db.tablenames:TRUNCATE TABLE(replace mode),DROP/CREATE TABLE ... AS(swap shadow setup + on-error cleanup),EXCHANGE TABLES/DROP TABLE(finalize_syncswap), andclient.insert(table, ...)on both the non-swap and swap paths —clickhouse-connect'sclient.insert()interpolates the table argument raw intoINSERT INTO {table} ... FORMAT Native(seeclickhouse_connect/driver/insert.py), so the destination pre-quotes it to match the established Postgres / MySQL precedent of quoting insert/upsert paths. Verified end-to-end against a real ClickHouse with a database-qualified table across the replace / swap / mirror / row-count paths. staged_uploadfinalize() ran the full upload/trigger/poll lifecycle on empty input (PR for #340-adjacent Step 2e contract): a transient empty source — where everystage()call received[]— causedStagedUploadDestination.finalize()to serialize a 0-byte file, then POST it to the configuredstage.url, then POST again to the configuredtrigger.url, then (if configured) poll. The result was a wasted upload + trigger + a zero-row job whose lifecycle still allocated quotas on the third-party API.SalesforceBulkDestination.finalize()already had the right guard (if not self._records: return SyncResult(rows_extracted=0)at the top);StagedUploadDestination.finalize()was missing it. Added the same short-circuit immediately afterrecord_count = len(self._records)so no auth / upload / trigger / poll work runs when nothing was staged. Surfaced by the new contract test in this PR — the contract assertion againsthttpx.Client.sendcaught a single POST tohttps://upload.example.com/filesduring finalize() on empty input.
Changed (Internal)
- Destination contract tests — Step 2e:
StagedDestinationProtocol (staged_upload + salesforce_bulk) (final follow-up to Step 2c PR [#60...
v0.7.7 — sync.mode: mirror across SQL destinations
Theme: sync.mode: mirror across the SQL destination set + cli/main split completion. The first user-facing addition since v0.7.6 is the sync.mode: mirror differential-delete sync mode (#340), shipping in four landings across Postgres (Step 1), MySQL (Step 2), ClickHouse (Step 3), and Snowflake (Step 4) — all four SQL destinations now upsert source rows and then DELETE destination rows whose upsert_key was not observed in the source, without the TRUNCATE/re-insert overhead of replace mode. BigQuery follows when the contributor PR #584 lands. Also lands the cli/main.py split completion — Phase 2b PR (a) + PR (b) + final tighten finish the 1706 → 164 LOC split (-90%) begun in v0.7.5 — plus a FakeSource + destination contract test framework, a CI check-changelog-required guard, a GCS storage import mypy fix, and CI install line extension that unlocked ~102 silently-skipped SQL destination tests (raised total coverage 82.68 → 85.29). No breaking changes — drop-in upgrade from v0.7.6.
Breaking Changes
None. Drop-in upgrade from v0.7.6.
Added
sync.mode: mirror— Step 4: Snowflake support (#340 Step 4, follow-up to #596 + #597 + #598): Snowflake destination now supportssync.mode: mirror— the final SQL destination in the set. Same application-side diff semantics as the prior steps — accumulateupsert_keytuples seen across all batches duringload(), then issue a singleDELETE FROM <db>.<schema>.<table> WHERE key NOT IN (collected)fromfinalize_sync(). Snowflake's connector uses%splaceholders (same family as psycopg2 / pymysql) and does not auto-expand a tuple-of-tuples — so the placeholder shape is built explicitly, identical to MySQL Step 2: single-column formWHERE col NOT IN (%s, %s, ...)with a flat values list, composite formWHERE (c1, c2) NOT IN ((%s, %s), (%s, %s), ...)with values flattened row-major. Snowflake-specific wrinkle: the destination has its own pre-existingconfig.mode: "insert" | "merge"field (orthogonal tosync_options.mode) that controls whether the write path is plain INSERT or staging-table-plus-MERGE. Since mirror semantics intrinsically require upsert,sync.mode: mirrornow forces the MERGE write path regardless ofconfig.mode— users only need to setdestination.upsert_keyandsync.mode: mirror(no need to also flipdestination.modetomerge). The sameValueErrorfail-fast applies before any INSERT touches Snowflake whenupsert_keyis absent. Also adds the first-everfinalize_syncmethod on Snowflake (the existing destination had no swap-replace finalize path), returningNonefor any non-mirror mode so the engine's existing dispatch is unchanged. Newtests/unit/test_snowflake_mirror_mode.pyships 12 tests covering key accumulation, the merge-path forcing (verifiesCREATE TEMP TABLE+MERGE INTOran even withconfig.mode: insert), dedupe across overlapping batches, single + composite key DELETE structure (againstANALYTICS.PUBLIC.USER_SCORESto verify fully-qualified table emission), the empty-source safety path, state reset, the missing-upsert_keyValueError, row-error skip path, and the non-mirror finalize short-circuit. Tests usesys.modulesinjection (matching the existingtest_snowflake_destination.pypattern) so nosnowflake-connector-pythoninstall is required to run them. Closes #340 for the SQL destination set; the temp-table strategy for high-cardinality tables remains as a future follow-up.sync.mode: mirror— Step 3: ClickHouse support (#340 Step 3, follow-up to #596 + #597): ClickHouse destination now supportssync.mode: mirrorwith the same application-side diff semantics as Postgres / MySQL — accumulateupsert_keytuples seen across all batches inload(), then issue a single mutation fromfinalize_sync()that removes destination rows whose key was not observed. ClickHouse usesALTER TABLE ... DELETE WHERE key NOT IN (collected)withmutations_sync=1so the call blocks until the mutation completes. Unlike Postgres / MySQL where the placeholder shape had to be assembled manually, clickhouse_connect's native{name:Type}parameter substitution acceptsArray(String)(single column) andArray(Tuple(String, ...))(composite) directly — so the call site is one parameter dict, not a flat placeholder list. Both column references and parameter values are coerced viatoString()so the comparison works regardless of source column type (cost: the comparison can't use a column index onupsert_key— mirror mode is intended for small/medium reference tables, not high-volume fact tables; the temp-table strategy is a planned follow-up for high-cardinality cases). The mutation rewrites affected parts, which is expensive in ClickHouse — the new docstring notes this explicitly so misuse is hard.ClickHouseDestinationConfig.upsert_keyislist[str] | None(it's informational only for the existing INSERT path, where dedup is handled byReplacingMergeTreeat merge time), so the runtime guard inload()raisesValueErrorearly when mirror mode is requested without a populated key — fail-fast before any INSERT touches the table. Backtick-quoting for database-qualified table identifiers (db.table→`db`.`table`) added via a new_quote_identhelper, matching the v0.7.4-hardened MySQL pattern. Newtests/unit/test_clickhouse_mirror_mode.pyships 12 tests covering key accumulation, dedupe across overlapping batches, database-qualified DELETE shape, single + composite key DELETE structure, the empty-source safety path, state reset, the missing-upsert_keyValueError, row-error skip path, and coexistence with the existingEXCHANGE TABLESswap-finalize path. Snowflake follows in the next PR.sync.mode: mirror— Step 2: MySQL support (#340 Step 2, follow-up to #596): MySQL destination now supportssync.mode: mirrorwith the same application-side diff semantics as Postgres — accumulateupsert_keytuples seen across all batches inload(), then issue a singleDELETE WHERE key NOT IN (collected)fromfinalize_sync(). Same safety paths as Step 1: empty-source short-circuit (_mirror_keysnever populated → no DELETE issued, so a transient empty source can't wipe the table), state reset afterfinalize_syncruns, ignores rows that failed during upsert (only successfully-loaded keys count as "source state"), and aValueErrorat load time whenupsert_keyis empty. Single-column and compositeupsert_keyboth supported. Because pymysql does not auto-expand a tuple-of-tuples into aNOT IN %sparameter the way psycopg2 does, the DELETE is built with explicit%splaceholders — single-column formWHERE \c` NOT IN (%s, %s, ...)with a flat values list, composite formWHERE (`c1`, `c2`) NOT IN ((%s, %s), (%s, %s), ...)with the values flattened in row-major order. Backtick-quoting + schema-qualified table handling reuses the existing_quote_identhelper that v0.7.4 hardened. Newtests/unit/test_mysql_mirror_mode.pyships 10 tests covering key accumulation, dedupe across overlapping batches, schema-qualified DELETE shape, single + composite key DELETE structure, the empty-source safety path, state reset, the missing-upsert_key` ValueError, and coexistence with the existing swap-finalize path. ClickHouse and Snowflake follow in subsequent PRs.sync.mode: mirror— differential delete for Postgres + CI test coverage (#340, PR #596): new sync mode that upserts every source row (same asfull) and then DELETEs destination rows whoseupsert_keytuple was not observed in the source — without the TRUNCATE / re-insert overhead ofreplacemode. Strategy: application-side diff (collectupsert_keytuples in memory duringload(), then issue a singleDELETE WHERE key NOT IN (collected)fromfinalize_sync()). Memory-bound to the source key cardinality; a temp-table strategy is a planned follow-up for tables larger than a few million rows. Safety: when the source produces no batches with records,_mirror_keysstays empty andfinalize_syncskips the DELETE — a transient empty source won't wipe the destination. Single-column and compositeupsert_keyboth supported (composite uses(c1, c2) NOT IN ((v1a, v2a), …)shape). Postgres only in Step 1; MySQL / ClickHouse / Snowflake follow in subsequent PRs. 10 unit tests cover key accumulation, dedupe across overlapping batches, single + composite key DELETE shape, the empty-source safety path, state reset after finalize, and coexistence with the existing swap-finalize path. Also extends CI's install line to include[postgres,mysql,clickhouse]so the new mirror tests + the existing 43 postgres + 33 mysql + 26 clickhouse tests actually run in CI rather than being silently skipped by theirpytest.importorskipguards — the pre-existing coverage gap that this PR's codecov delta surfaced.- Destination contract tests — Step 2b: SQL destinations + empty-batch invariant (PR #595, follow-up to #594): closes out the empty-batch contract suite with the four SQL destinations (
postgres/mysql/clickhouse/snowflake). Two parametrised contracts × 4 destinations = 8 tests. Notable: no driver mocking required — CI's minimal install ([dev,mcp,duckdb]) deliberately excludes the SQL extras, and each d...
v0.7.6
What's New
Theme: Small follow-up. Two additive features accumulated since v0.7.5 — a new Amplitude destination (#574, Identify API + HTTP V2 events API) and a new tojson_safe Jinja2 filter (#580) that unblocks datetime / Decimal / UUID columns flowing through REST API body_template rendering — plus a CLI --log-format typer-compatibility fix (#578), a follow-up retrofit of ErrorFormatter stage detection to an engine-emitted attribute (#571, supersedes the traceback-walk heuristic from #544), and Phase 2a of the cli/main.py split (#572, continues #565's Phase 1). No breaking changes — drop-in upgrade from v0.7.5.
Breaking Changes
None. Drop-in upgrade from v0.7.5.
Added
- Amplitude destination (#574): Sync DWH rows to Amplitude Identify API (user properties) or HTTP V2 API (events). No extra dependencies. 18 unit tests.
tojson_safeJinja2 filter (#580, PR #581): drop-in replacement fortojsoninbody_templaterendering that toleratesdatetime/date/time(encoded as ISO 8601),DecimalandUUID(encoded as string). Registered on bothdrt.templates.rendereranddrt.destinations.staged_upload's local Jinja environments. The defaulttojsonfilter is unchanged — opt-in only, no behavioural change for existing templates. Unblocks BigQueryTIMESTAMP/ Postgresnumeric/uuidcolumns flowing into REST API destinations withoutCAST(... AS STRING)workarounds in model SQL. Docs: docs/connectors/rest-api.md.
Fixed
drt run --log-formattyper 0.26.1 compatibility (#577, PR #578): the option was declared withstr + click_type=click.Choice(...), a shape typer 0.26.1's revised stubs no longer accept (mypy fails on CI's resolved typer version while passing on the locally-pinned 0.24.1). Replaced withLogFormat(str, Enum)— the canonical typer pattern, version-agnostic, and removes the need for a# type: ignore. No user-visible behaviour change; CLI accepts the sametext/jsonvalues as before.
Changed (Internal)
ErrorFormatterstage detection retrofitted to engine-emitted attr (PR #571, follow-up to #544): replaces the traceback-walk heuristic indrt.cli.errors.infer_stagewith a_drt_stagestring attribute set byengine/sync.pyat the point of failure. Removes two failure modes of the heuristic (re-raise wrapping loses attribution; engine-vs-source ambiguity when source iterators fail mid-iteration). User-visibleerror_stagesemantics unchanged — samesource/destination/engine/statevalues, just sourced from the engine directly instead of inferred.cli/main.pysplit Phase 2a (PR #572, continues #565's Phase 1): extractsdrt sources/drt destinations/drt clean --orphans/drt serveintodrt/cli/commands/{connectors,clean,serve}.py, with shared internals (resolve_profile_name,get_source,get_destination,get_watermark_storage) moved todrt/cli/_helpers.py. No CLI behaviour change;drt/cli/main.pyshrinks accordingly. (Phase 2b extractsdrt runitself — tracked in PR #579.)
Full Changelog: v0.7.5...v0.7.6
PyPI: https://pypi.org/project/drt-core/0.7.6/
v0.7.5 — Production Ready follow-up #3 + Tech Foundation Hardening
Theme: Production Ready follow-up #3 + Tech Foundation Hardening. Two streams shipping together: (1) the accumulated work since v0.7.4 — REST API polish, sync catalog (#499 P1+P2), MCP test tool, OTel Phase 1 config, hardcoded secret detection, lookup ambiguity warning, orphan shadow cleanup, drt init "Next steps:" block — and (2) the Tech Foundation Hardening epic (#538, 11 child issues, all closed) which locks in the foundations before v0.8 Cloud Destinations: CI reach (nightly + publish gate + CodeQL + pip-audit + SBOM), functional E2E coverage for the reverse-ETL paths (DuckDB harness + boundary cases), CLI/UX polish (ErrorFormatter, --detailed, --template), and load-bearing refactors (SyncObserver engine seam, destinations serializer + config base class consolidation, cli/main split Phase 1). No breaking changes — drop-in upgrade from v0.7.2 (v0.7.3 / v0.7.4 were patch-only cherry-picks).
Breaking Changes
None. Drop-in upgrade from v0.7.2 / v0.7.3 / v0.7.4.
Tech Foundation Hardening epic (#538)
- CI: nightly schedule + publish-time lint/test gate (#539, PR #553):
.github/workflows/ci.ymlgains a weeklyschedule:trigger (Mon 00:00 UTC) so env-drift / flaky regressions surface without waiting for the next PR. Bothpublish-drt-core.ymlandpublish-dagster-drt.ymlgain averifyjob (ruff + mypy + pytest on Python 3.12) that runs before build/upload — a tag pushed off a stale local can no longer ship without verification. - CI: supply-chain scans (CodeQL + pip-audit + SBOM) (#540, PR #557): new
.github/workflows/codeql.yml(Pythonsecurity-extendedquery pack, PR + weekly Mon 03:00 UTC schedule offset from the ci.yml nightly).pip-audit==2.10.0step added toci.yml(gated to Python 3.12 to avoid 4x duplicated work, OSV vulnerability service,--strictfails on warnings). CycloneDX SBOM (cyclonedx-bom==7.3.0) generated by both publish workflows and attached to the GitHub Release asdrt-core-sbom.cdx.json/dagster-drt-sbom.cdx.json.SECURITY.mddocuments the scanning posture and allow-list policy. - Tests: DuckDB Source + reverse-ETL E2E harness (#541, PR #552): new
tests/unit/test_duckdb.py(9 tests covering happy path, NULL, BIGINT/DOUBLE extremes, DATE/TIMESTAMP round-trip, connection success/failure) + newtests/integration/test_duckdb_e2e.py(4 tests driving real DuckDB → engine → pytest-httpserver REST destination end-to-end, no mocks). Establishes the harness pattern for future Source E2E modules (Postgres/MySQL/Snowflake): copy theduckdb_with_usersfixture + assertion shape, swap the source. Documented intests/integration/README.md. CI install line now includes theduckdbextra so the tests actually run. - Tests: boundary cases — idempotency, large batch, schema evolution, type conversion (#542, PR #555): 13 boundary tests across 4 files locking in the production-claim contracts.
test_idempotency.py(3) — StateManager persistence + re-run semantics (drt does NOT dedupe at the engine layer; destinations own upsert).test_large_batch.py(3) —batch_sizehonored, no row drop/dupe, tracemalloc smoke ceiling guards against O(N) buffering.test_schema_evolution.py(3) — ALTER TABLE between runs surfaces in payload, locking the "no schema enforcement" stance.test_type_conversion.py(4) — NULL / BIGINT extremes / DOUBLE pass through cleanly; DATE / TIMESTAMP currently fail at httpx JSON serialization (locked for #317 to flip). - UX:
drt sources --detailed/drt destinations --detailed+--format json(#543, PR #563): newdrt/cli/_connector_detail.pyintrospects each connector's Pydantic / dataclass config to surface required env vars, optional env vars, required fields, and a copy-pasteable 3–8 line sample YAML — no hand-maintained tables, derived from the source of truth.--format json(machine-readable, plainprint()so Rich line wrap doesn't corrupt the document) exposes the same data with a stableconfig_classfield for advanced consumers. Registry-parity tests guard against new connectors landing without metadata. Establishes the extension point for v0.9 plugin authors (#297). - UX: ErrorFormatter for
drt runfailures (#544, PR #558): newdrt/cli/errors.pywraps raw exceptions with stage (source / destination / engine / state, inferred today via traceback walk; will swap to engine-emitted tag once #527 OTel + #548 SyncObserver land), error type + message, and a suggested next step from a conservative keyword-rule table (connection / auth / rate-limit / state corruption). Rich panel rendering for terminal users; newerror_type/error_stage/error_suggestionfields on--format jsonentries (preservingerrorstring for back-compat). Thelog_jsonstructured ERRORsync_completeline also gains the stage / type fields for log-aggregation grouping. - UX:
drt init --template <connector>quickstart scaffolds + README rewrite (#545, PR #564): three curated static templates ship underdrt/cli/templates/syncs/—duckdb_to_rest(no accounts needed),postgres_to_slack(operational alerts),duckdb_to_hubspot(contacts upsert).drt init --template listenumerates them;drt init --template <name>writes the sync file plus a minimaldrt_project.yml+.drt/.gitignoreshell, never overwriting an existing project, then prints per-template next-steps text (env vars to set, sample tables to seed). README quickstart collapsed from 5 steps to 3 commands using--template; the interactive wizard demoted to a "Customizing your sync" subsection.docs/connectors/discovery hint added above the Connectors table. README.ja.md mirrored. - Refactor:
SyncObserverprotocol — tighten engine I/O boundary (#548, PR #559): newdrt/engine/observer.pyintroduces a fire-and-forget event surface (on_sync_started/on_watermark_resolved/on_warning/on_interrupted/on_sync_completed) with four concrete observers (NullObserver,LoggingObserver,StatePersistingObserver,CompositeObserver).engine/sync.pyno longer importsloggingor callsstate_manager.save_sync(...)/watermark_storage.save(...)directly — all writes flow through the observer; the CLI composes the defaultLoggingObserver + StatePersistingObserverset. Restores the "no I/O beyond protocol calls" purity claim documented in CLAUDE.md, which is load-bearing for the v1.x Rust migration. Two boundary regression tests pin the contract by readingengine/sync.pyas text and asserting the absence of direct calls. Engine API change: callers ofrun_syncthat want state persistence must now passobserver=StatePersistingObserver(state_mgr, wm_storage). Internal-API only — CLI updated; affected tests updated in this PR. - Refactor: consolidate
_serialize_value()across destinations/postgres + mysql (#547, PR #560): newdrt/destinations/_serializer.pywith oneserialize_complex_value(value, column, json_columns, *, dict_encoder, list_encoder=None)function carrying the dialect-agnostic decision logic (validate dict/list againstjson_columnsallowlist, error early on unlisted complex values). Postgres delegates withdict_encoder=_pg_dict_encoder(psycopg2 Json wrapper + json.dumps fallback) andlist_encoder=None(pass-through to ARRAY adapter); MySQL delegates with both asjson.dumps. ~50 LOC of duplication removed; 21 new matrix tests cover the shared module at 100% coverage. Future SQL destinations (Snowflake / Redshift / Databricks in v0.8) just supply their encoder — no third copy of the validation logic. - Refactor: extract
BaseSqlDestinationConfigfromconfig/models.py(#549, PR #562): nine connection fields (connection_string_env,host,host_env,port,user,user_env,password,password_env,lookups) lifted from Postgres / MySQL / ClickHouse destination configs into a shared base. Subclasses overrideport(Postgres 5432 / MySQL 3306 / ClickHouse 8123) and add their dialect-specific fields (dbnamevsdatabase,sslvssecure,upsert_key,json_columns)._check_connectionvalidator stays per-subclass because the database field name differs. Zero test changes — Pydantic v2 inheritance is transparent. v0.8 Cloud destinations (Redshift in particular) inherit for free. - Refactor:
drt/cli/commands/package — Phase 1 of cli/main.py split (#546, PR #565): newdrt/cli/_app.pyholds the sharedtyper.Typerinstance; newdrt/cli/commands/package extracts six command modules —doctor,list_syncs,config(3 sub-commands),cloud(2 stub commands),docs(generate + serve),mcp(run).cli/main.py1706 → 1470 LOC (-14%) while preserving everyfrom drt.cli.main import ...import path that tests rely on (app,_run_one,_RunContext,_get_destination, etc.). Heavier commands (run/validate/status/test/init/serve/sources/destinations/clean) stay in main.py for a Phase 2 follow-up — the pattern this PR establishes makes that move mechanical. - Fix: escape Rich markup in
print_errorso[extra]hints aren't dropped (PR #566): discovered while writing coverage tests for the cli/main.py split.print_error("MCP server requires: pip install drt-core[mcp]")was rendering as"pip install drt-core"because Rich consumed[mcp]as a style tag. Fix indrt/cli/output.py:print_errorcallsrich.markup.escapeon the message argument before interpolating into the styled template — guards every callsite at once (any futureprint_error(...)with bracketed content is now safe). Four parametrized tests coverdrt-core[mcp],drt-core[postgres,duckdb],[INFO]-style markers, and plain-message regression.
Other shipped items (since v0.7.4)
- REST API: lift
extract_next_linkinto shareddrt._http_utils(#530, item 1): removes the cross-domain import wheredrt/sources/rest_api.pyreached into `RestApiDes...
v0.7.4 — MySQL Patch
Theme: Patch release for MySQL schema-qualified identifier handling.
Cherry-pick of PR #514 onto the v0.7.3 release line — the MySQL counterpart to the Postgres Identifier() composition fix that shipped in v0.7.3. No new features, no breaking changes.
⚠️ Erratum on v0.7.3: Earlier drafts of the v0.7.3 CHANGELOG referenced this MySQL fix as if it had shipped in v0.7.3, but PR #514 actually landed onmaintwo days after the v0.7.3 tag was cut. The wheel published to PyPI asdrt-core==0.7.3does not contain the MySQL fix. v0.7.4 is the release that actually delivers it. Users hitting #511 should upgrade todrt-core>=0.7.4.
Fixed
- MySQL destination correctly quotes schema-qualified table names (#511, PR #514):
mydb.scoresnow produces`mydb`.`scores`across the row-count, replace, insert, and upsert paths (previously the dotted name was treated as a single backtick-quoted identifier, so any MySQL destination configured with a schema-qualified table name failed at SQL execution). The_quote_identhelper is now applied consistently across all SQL-composition paths on the MySQL destination, matching the PostgresIdentifier()composition fix that shipped in v0.7.3 (#442 / PR #498). Contributed by @Godzilaa.
Upgrade
pip install -U drt-core
# or
uv add drt-corePyPI: https://pypi.org/project/drt-core/0.7.4/
Compared to v0.7.3
Drop-in upgrade. No config changes, no API changes, no schema changes. v0.8 work continues in parallel on main.
Full Changelog: v0.7.3...v0.7.4
v0.7.3
drt-core v0.7.3
Patch release for Postgres schema-qualified identifier handling.
Cherry-pick of PR #485 + PR #498 onto the v0.7.2 release line — so users on drt-core~=0.7.2 get the fix without the v0.8.0 feature set. No new features, no breaking changes.
Install
pip install --upgrade drt-coreFixed
- Postgres: qualified
schema.tableidentifiers now safely composed (#442, PR #498): the row-count, replace, swap, finalize, insert, and upsert SQL paths previously passedf"{schema}.{table}"style strings throughpsycopg2.sql.Identifier(), which double-quoted the entire dotted name into a single identifier ("marketing.email_events") — so any Postgres destination configured with a schema-qualified table name failed at SQL execution. The fix splits qualified names into separate schema and relationIdentifier()components, while keeping swap shadow/old suffixes attached to the relation name only (somarketing.email_eventsbecomesmarketing."email_events__drt_swap", not a single quoted identifier). Contributed by @Photon101. - Postgres: swap path fully migrated to
psycopg2.sqlcomposition (PR #485, fixes #483): prerequisite for the qualified-identifier fix above. The swap-path SQL (DROP TABLE IF EXISTS, CREATE TABLE LIKE, INSERT, ALTER TABLE RENAME, cleanup DROP) was still using f-string formatting; this commit migrates it topsycopg2.sql.SQL+Identifierfor safe composition. Originally queued for v0.7.2 but landed post-release; bundled into this patch because PR #498 depends on it. Contributed by @Photon101.
Who should upgrade
If you use a Postgres destination with schema-qualified table names (public.users, marketing.events, etc.) — upgrade now. v0.7.2 silently failed at SQL execution for these configs.
Versioning
Strict patch per VERSIONING.md. The v0.8.0 features queued on main ([Unreleased] in CHANGELOG) are not included here — they ship in v0.8.0 along with the Cloud Destinations theme.
Full Changelog: v0.7.2...v0.7.3
v0.7.2 — Production Ready follow-up #2
Theme: Production Ready follow-up #2. Closing out the v0.7 cycle items that didn't make v0.7.1 — opt-in anonymous telemetry, deprecation warnings in drt validate, and Postgres psycopg2.sql SQL composition hardening.
✨ Added
-
Opt-in anonymous usage telemetry (#263, PR #446) — A new
drt/telemetry.pymodule sends onesync_completedevent perdrt runwhen the user explicitly opts in. Off by default. HonorsDO_NOT_TRACK=1. Allow-listproperties(drt_version,python_version,os,source_type,destination_type,sync_mode,rows_synced,duration_seconds,status) — sync names, model SQL, destination URLs, credentials, and project paths are never transmitted. The wire envelope additionally carriesevent,distinct_id,timestamp, andapi_key. Configure withdrt config set telemetry.enabled trueorDRT_TELEMETRY=1. Endpoint defaults to PostHog Cloud EU; override withDRT_TELEMETRY_ENDPOINT/DRT_TELEMETRY_API_KEYfor self-hosted PostHog. Privacy posture and GDPR disclosure live in docs/telemetry.md. Contributed by @kiwamizamurai. -
Deprecation warnings in
drt validate(#467, PR #478) —validatenow readsdrt/deprecations.pyand surfaces⚠️ warnings for any deprecated config keys it finds in your sync YAMLs, in text output and as a per-syncdeprecationsarray under--output json. Exit code stays0(warnings are non-blocking). The registry currently has no active entries — it's the infrastructure for the next real deprecation per VERSIONING.md Step 1. Closes the Step 2 ("Add Tooling Support") TODO from #457. Contributed by @Muawiya-contact. -
Release-time telemetry key injection workflow (PR #481) —
publish-drt-core.ymlsed-injectsPOSTHOG_WRITE_KEYfrom secrets intodrt/telemetry.py:_DEFAULT_API_KEYbeforeuv build, with a smoke check that asserts injection happened. Fail-safe when the secret is unset (community forks ship with telemetry physically disabled —is_enabled()short-circuits toFalsewithout a key).
🔧 Changed
- Postgres destination: safe SQL composition via
psycopg2.sql(#442, PR #452) — Replaced f-string interpolation of table and column identifiers in_load_replace,_load_upsert,_build_insert_sql, and_build_upsert_sqlwithpsycopg2.sql.SQL/psycopg2.sql.Identifier. Eliminates a class of identifier-injection bugs in environments where table/column names are derived from config rather than hard-coded. Swap-path methods (_load_replace_swap/finalize_syncfrom #435 / #448) tracked for follow-up in #483. Contributed by @Khush-domadia.
📄 Documentation
- GDPR disclosure section in
docs/telemetry.md— Data controller (K. Masuda as natural person, transferable to a future legal entity), retention (1 year on Free tier; #482 tracks reducing to 90 days via API-based cleanup), erasure contact (drt.hub.dev@gmail.com), Art. 6(1)(a) opt-in consent + Art. 46(2)(c) Standard Contractual Clauses basis for the US-incorporated processor / EU-hosted storage split (DPA signed via PostHog's self-serve flow). - CHANGELOG correctness fix — The previous v0.7.1 release commit accidentally renamed the v0.7.0 heading to
[0.7.1]without preserving v0.7.1 content. This release restores[0.7.0]under its correct date and inserts the actual v0.7.1 entries (drt diff #413, cursor fix #475, on_error fail alignment #463, VERSIONING.md #457) that were missing.
🚧 Followups
- #482 —
chore(telemetry): cap event retention at 90 days via PostHog API cleanup (Free plan workaround) - #483 —
chore(postgres): extend psycopg2.sql migration to swap path(good first issue)
Upgrade
pip install -U drt-core==0.7.2Drop-in upgrade from v0.7.1 — no breaking changes. Telemetry is off by default; nothing leaves your machine until you run drt config set telemetry.enabled true.
Triage Collaborator update
🎉 @PFCAaron12 joined the triage team during this cycle (accepted at discussion #432), bringing the Triage Collaborator count to three alongside @Pawansingh3889 and @Muawiya-contact.
Contributors
Thanks to @kiwamizamurai (#446), @Muawiya-contact (#478), @Khush-domadia (#452), and the wider community for keeping the v0.7 cycle moving forward.
Full Changelog: v0.7.1...v0.7.2
v0.7.1 — Production Ready follow-up
Theme: Production Ready follow-up. Tail of the v0.7 cycle — record-level dry-run preview, watermark cursor correctness fix, on_error=fail alignment for the remaining HTTP destinations, and the new VERSIONING.md policy doc.
✨ Added
drt run --dry-run --diff(#413) — Record-level preview before deploying a sync. For queryable destinations (Postgres / MySQL / ClickHouse), shows added / updated (with field-levelold → newdiffs) / deleted records. For non-queryable destinations (REST API, Slack, HubSpot, Notion, file destinations) falls back to "sample mode" — first N records to be sent. Works in both text (rich tables) and--output json. New--diff-limit Nflag (default 20). Doc: docs/guides/dry-run-and-diff.md. Follow-ups: #468 Snowflake support, #469 Protocol method, #470 perf, #471--diff-fields, #472 API-based SaaS diff.- VERSIONING.md (#457, polished in #464) — Project's semver and deprecation policy pre-1.0. Documents what counts as a breaking change at each layer (CLI / config schema / Python API) and the deprecation cadence. Cross-linked from
CONTRIBUTING.mdPR checklist. Initial draft and follow-up polish contributed by @Muawiya-contact.
🐛 Fixed
- Watermark advance for tz-aware cursor values (#475) —
drt/engine/sync.pywas callingstr()directly on cursor field values, which for tz-aware datetimes (e.g. BigQueryTIMESTAMPcolumns from the Python BQ client) produced strings with a+00:00suffix. When user SQL ordefault_valuewas tz-naive, the boundary row matched again and re-fired on every subsequent run. The engine now normalizes tz-aware datetimes to naive UTC before stringifying. Reported by @K-Masuda-SL after a prod incident. on_error=failnot respected by Notion / REST API / Email SMTP destinations (#463, contributes to #365) — Three HTTP destinations continued processing the rest of the batch after the first failure even whenon_error: failwas configured. Now all three short-circuit and return on the first failure, matching the documented contract and the behavior of every other destination. New tests lock the semantic in across the webhook surface.
🚧 Deferred to v0.7.2
- Opt-in anonymous usage telemetry (#263, PR #446)
- Postgres
psycopg2.sqlSQL composition (#442, PR #452)
Upgrade
pip install -U drt-core==0.7.1Drop-in upgrade from v0.7.0 — no breaking changes.
Contributors
Thanks to @Muawiya-contact (#457, #464), @K-Masuda-SL (#475), and the wider community for keeping the v0.7 cycle moving forward.
Full Changelog: v0.7.0...v0.7.1
drt-core v0.7.0 — Production Ready
[0.7.0] - 2026-05-06
Theme: Production Ready. Reliability, observability, and correctness for syncs that run in production environments — graceful shutdown on SIGTERM/SIGINT, retry knobs per destination, atomic zero-downtime table replace, sync execution history, FK existence filtering, opinionated JSON column handling. Plus the first DWH destination (Snowflake), the GitHub Codespaces playground for zero-setup onboarding, and OPEN_CORE.md documenting the open core boundary.
This release closes 9 v0.7 milestone issues plus several spillover items shipped early.
Breaking Changes
None. Drop-in upgrade from v0.6.x.
Added
- Sync failure alerts (#414): Configure
alerts.on_failurein sync YAML to send Slack or generic HTTP webhook notifications when a sync ends withfailed > 0or raises an exception. Two target types in v0.7:slack(Slack incoming webhook) andwebhook(generic HTTP POST/PUT with optionalbody_template). Template variables:sync_name,error,rows_processed,duration_s,started_at. Dispatch is best-effort — alert failures are logged but never affect sync correctness or override the original exception. - Graceful shutdown on SIGTERM/SIGINT (#279):
drt runnow handlesSIGTERM(container stop, K8s pod eviction, Airflow cancellation) andSIGINT(Ctrl+C) cooperatively. Signal handler sets astop_eventchecked between batches so the in-flight batch always completes; state and watermark are persisted before exit. POSIX-conventional exit codes:130for SIGINT,143for SIGTERM. A 30-second watchdog force-exits if the current batch hangs — pair with K8sterminationGracePeriodSeconds: 60for a bounded shutdown window. NewSyncResult.interruptedflag lets integrations (Dagster / Airflow / Prefect) distinguish a clean cancellation from an error. See docs/guides/graceful-shutdown.md. - FK existence check via
lookups.check_only(#354): Filter source rows by whether a foreign key exists in the destination, without resolving a value. Setcheck_only: trueon a lookup (and omitselect); rows whose match key is missing in the destination table are filtered peron_miss(skiporfail). Common use case: BigQuery has prd-like data but the destination DB (e.g. staging) holds only a subset — silently drop rows pointing at non-existent FKs instead of failing the sync. Source columns are always preserved (no implicit drop) since the target name is just a label, not a destination column. Works alongside regular value-resolving lookups in the same sync. - Snowflake destination (#353): Write rows back to Snowflake tables. Supports
mode: insert(append) andmode: merge(upsert via temp staging table +MERGEstatement usingupsert_key). Auth viaaccount_env/user_env/password_env. Install:pip install drt-core[snowflake]. Contributed by @PFCAaron12. - Zero-downtime replace via staging table swap (#338):
sync.replace_strategy: swapenables truly atomic table replacement — drt writes to a shadow table ({table}__drt_swap) per batch and atomically renames it to the original at the end of the sync. Supported on PostgreSQL (transactionalALTER TABLE RENAME), MySQL (atomicRENAME TABLE), and ClickHouse (atomicEXCHANGE TABLES, requires 21.8+). Default remainstruncate(existing TRUNCATE → INSERT behavior). Follow-ups tracked in #433 (orphan auto-cleanup) and #434 (Snowflake support). - Per-destination retry override (#277): Each HTTP destination can now declare its own
retry: RetryConfigblock in YAML to override the sync-levelsync.retry. Useful when one destination has stricter rate limits or unusual failure modes (e.g. Notion 7 attempts while other destinations stay at the default 3). Priority:destination.retry>sync.retry> built-in defaults. Brings drt in line with the per-adapter retry knobs in dbt and dlt. Documentation:docs/guides/retry.md. - Sync execution history (#276): Every
drt runnow appends a record to.drt/history/<sync_name>.jsonlwith timestamp, status, rows synced/failed, duration, and errors. Inspect viadrt status --history [--sync NAME] [--limit N] [--output json]or via the newdrt_get_historyMCP tool. Configurable retention (history.retention_days, default 30) prunes old entries lazily on each append. Best-effort: history persistence never affects sync correctness. The CLI/MCP counterpart to the run-history UI in Census/Hightouch — brought to a Git-native, scriptable workflow. Documentation:docs/guides/sync-history.md. - GitHub Codespaces playground (#407, closes #283): Zero-setup "click and try" onboarding via the Open in GitHub Codespaces badge. The devcontainer installs
drt-core[duckdb], seeds a sample DuckDB warehouse, and ships two runnable examples —examples/duckdb_to_file(DuckDB → CSV) andexamples/duckdb_to_rest(DuckDB → REST API). Contributed by @safridwirizky. json_columnsconfig for explicit JSON serialization (#316): Declare which Postgres/MySQL destination columns hold JSON/JSONB data viajson_columns: [col1, col2]. Listed columns are wrapped with the driver-native JSON adapter (psycopg2.extras.Jsonfor Postgres,json.dumpsfor MySQL); unlisted columns receiving dict/list values raise an earlyValueErrorpointing at the missing column instead of failing deep inside the driver. Backward-compatible — when omitted, all dict/list values are auto-wrapped as before. Contributed by @armorbreak001.drt doctorcommand (#264): Diagnostics for the user's environment — checks Python version, drt version, project file existence, profile resolution, sync file count, optional extras installation status, and common environment variables. Pinpoints setup issues for new users without forcing them to read traceback output. Contributed by @pureqin.--quiet/-qflag fordrt run(#265): Suppresses banner / sync-result / summary / watermark output for CI and cron use cases where logs are noise.--quietwins over--verbosewhen both are passed;--output jsonis unaffected so structured output still flows. Contributed by @Pawansingh3889.drt test --output jsonanddrt test --dry-run(#366, #371): Bringsdrt testto feature parity withdrt run. JSON output gives CI integrations structured pass/fail data; dry-run prints the test plan (test name, target, type) without hitting any database. Contributed by @wahajahmed010.drt cloud pushstub command (#302): Placeholder Typer subcommand that prints an "enterprise cloud push" message and exits cleanly. Reserves the CLI surface so future enterprise integrations don't break user shell aliases / scripts. Re-landed under maintainer authorship after the original contributor (#308) didn't return to sign the CLA. See OPEN_CORE.md for what's free vs. enterprise.
Changed
- Connector registry (#381): Replaced hardcoded
isinstancechains in_get_destination()/_get_source()with a centralized registry (drt/connectors/registry.py). Adding a new connector no longer requires editingmain.py. Error messages now list available connectors on typo. Contributed by @Muawiya-contact. - REST API destination pagination (#260): Fetch data from paginated APIs. 3 strategies: offset/limit, cursor-based, HTTP Link headers. Contributed by @Muawiya-contact.
- Retry resolution helper (#277): Each HTTP destination now uses a single
resolve_retry(config.retry, sync_options)helper instead of duplicating its own_DEFAULT_RETRYfallback. Removes ~110 lines of dead code (the per-destination_DEFAULT_RETRYconstants were unreachable becauseSyncOptions.retryis always populated bydefault_factory).
Fixed
- PostgreSQL destination: crash on
dictvalues bound for JSONB columns — wrapped withpsycopg2.extras.Json(#315). Contributed by @armorbreak001. - Notion destination:
sync_options.retryoverride was silently ignored — Notion always used the hardcoded_DEFAULT_RETRY(3 attempts) regardless of user configuration. Now respects user-configured retry like every other HTTP destination (#438). Contributes to #365. BasicAuthcredentials fromsecrets.toml:BasicAuthwas the only auth type that couldn't resolve credentials from.drt/secrets.toml— it calledos.environ.get()directly while every other auth type went throughresolve_env(). Users storing BasicAuth credentials insecrets.tomlgot a misleading "env var not set" error. Now matches the other auth types (#386). Contributed by @armorbreak001.replace_strategy: swapignoredjson_columnsconfig (#448): When bothreplace_strategy: swap(#338) andjson_columns(#316) were configured on a Postgres or MySQL destination, swap mode silently bypassed the explicit JSON column declarations because_load_replace_swapdid not threadconfig.json_columnsthrough to_serialize_value. Now both strategies honour the config consistently — swap-mode dict values in unlisted columns raise the same fail-fastValueErroras truncate mode. ClickHouse is unaffected (its connector handles JSON encoding driver-side and has nojson_columnsconfig). Discovered post-rebase of #435 onto #382 — neither feature was wrong in isolation; the gap was an interaction artifact of parallel development.
v0.6.2
What's New
Added
-
watermark.default_value(#390): Configure a fallback cursor value for first-run incremental syncs. Prevents broken SQL (e.g.WHERE TIMESTAMP(...) >= TIMESTAMP('')) when no watermark file exists yet. Without a default, drt now raises a clear error with actionable guidance instead of silently rendering an empty string.sync: mode: incremental cursor_field: completed_at watermark: storage: gcs bucket: my-bucket key: watermarks/my_sync.json default_value: "2026-04-20 00:00:00" # used on first run
-
--cursor-valueCLI option (#390): Override the cursor/watermark value at runtime for backfill and recovery scenarios. The override takes highest priority in the fallback chain and the resulting watermark is persisted on success.drt run --select my_sync --cursor-value '2026-01-01 00:00:00' -
Watermark source observability (#391): Operators can now see where the cursor value came from —
storage,default_value, orcli_override— via structured INFO logs,--output jsonfields (watermark_source,cursor_value_used), and an end-of-run summary in text mode.
Fallback chain (priority order)
1. --cursor-value CLI flag
2. Watermark storage (GCS / BigQuery / local)
3. State manager (.drt/state.json)
4. watermark.default_value (YAML config)
5. Cursor template present → ValueError / No template → full extract
Install / Upgrade
pip install drt-core==0.6.2
# or
uv add drt-core==0.6.2Full Changelog: v0.6.1...v0.6.2