fix: DuckLake DuckDB-backed catalog attach incorrectly applies META_TYPE 'sqlite' #3871
Analect wants to merge 4 commits into
drivername='duckdb' was handled identically to 'sqlite' in
build_attach_statement, adding META_TYPE 'sqlite' / META_JOURNAL_MODE 'WAL'
parameters that cause DuckDB-format catalog files to fail with:
Failed to execute query "PRAGMA journal_mode=WAL": file is not a database
Split the combined elif branch so drivername='duckdb' generates a clean attach:
ATTACH IF NOT EXISTS 'ducklake:{path}' AS {name} (DATA_PATH '...')
The DuckLakeCredentials docstring explicitly lists duckdb:///catalog.duckdb
as a valid catalog URI, confirming this is a supported use case.
Fixes dlt-hub#3870
@burnash, is there anything I need to tweak for this to be considered? Thanks.
Hi @burnash and @rudolfix, I wanted to flag that CI is failing across most destinations, and I'd appreciate some input before I make changes. The original fix is limited to … Before I do anything, I wanted to check: were those extra commits added intentionally as part of the review process (e.g. to support testing), or should I revert them to bring the branch back to the minimal fix? I'm happy to clean up the branch either way; I just want to make sure I'm not undoing something deliberate.
Description
Problem
When using a DuckDB-backed DuckLake catalog (`catalog="duckdb:///catalog.duckdb"`), `pipeline.run()` fails with:

`Failed to execute query "PRAGMA journal_mode=WAL": file is not a database`

Root cause
`build_attach_statement` in `sql_client.py` handled both `drivername='sqlite'` and `drivername='duckdb'` in a single branch, always appending `META_TYPE 'sqlite', META_JOURNAL_MODE 'WAL', META_BUSY_TIMEOUT 1000`. `META_TYPE 'sqlite'` causes DuckLake to use SQLite operations on the catalog file. For a DuckDB-format file, this hits the SQLite `PRAGMA journal_mode=WAL` and raises "file is not a database".
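A simplified, hypothetical reconstruction of the combined branch makes the failure mode concrete. The function name `build_attach_combined` and its parameters are illustrative assumptions, not dlt's actual `build_attach_statement` signature; the option strings are the ones listed above, and the statement shape follows the attach format quoted in the commit message.

```python
def build_attach_combined(drivername: str, catalog_path: str, name: str, data_path: str) -> str:
    """Hypothetical sketch of the OLD behaviour (not dlt's real code)."""
    options = [f"DATA_PATH '{data_path}'"]
    # 'sqlite' and 'duckdb' share one branch, so the SQLite-only metadata
    # options are appended for a DuckDB-format catalog file as well.
    if drivername in ("sqlite", "duckdb"):
        options += [
            "META_TYPE 'sqlite'",
            "META_JOURNAL_MODE 'WAL'",
            "META_BUSY_TIMEOUT 1000",
        ]
    return f"ATTACH IF NOT EXISTS 'ducklake:{catalog_path}' AS {name} ({', '.join(options)})"


if __name__ == "__main__":
    # Even for a DuckDB catalog, the statement tells DuckLake to treat the file as SQLite.
    print(build_attach_combined("duckdb", "catalog.duckdb", "lake", "data/"))
```

Because the DuckDB case falls into the same branch, DuckLake receives `META_TYPE 'sqlite'` and `META_JOURNAL_MODE 'WAL'` for a file that is not an SQLite database.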
Fix
Split the combined `elif` into two separate branches. `drivername='duckdb'` now generates a clean attach without `META_TYPE`:

`ATTACH IF NOT EXISTS 'ducklake:{path}' AS {name} (DATA_PATH '...')`

Tests
Added `tests/load/pipeline/test_ducklake_attach.py` with unit tests for `build_attach_statement` covering both the sqlite and duckdb cases. The tests can be run without external credentials.
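The shape of the split branch and of the three tests named in the pytest output can be sketched as follows. This is not the contents of `tests/load/pipeline/test_ducklake_attach.py`: `build_attach_split` is a hypothetical stand-in for `build_attach_statement`, and raising `ValueError` for any other driver is an assumption made only to keep the sketch self-contained.

```python
# Hypothetical stand-in for build_attach_statement AFTER the fix (not dlt's code).
SQLITE_META_OPTIONS = (
    "META_TYPE 'sqlite'",
    "META_JOURNAL_MODE 'WAL'",
    "META_BUSY_TIMEOUT 1000",
)


def build_attach_split(drivername: str, catalog_path: str, name: str, data_path: str) -> str:
    options = [f"DATA_PATH '{data_path}'"]
    if drivername == "sqlite":
        # SQLite catalog file: keep the SQLite metadata options.
        options.extend(SQLITE_META_OPTIONS)
    elif drivername == "duckdb":
        # DuckDB-format catalog file: attach without any META_* options.
        pass
    else:
        # Assumption for this sketch only: other drivers are out of scope.
        raise ValueError(f"unsupported drivername in sketch: {drivername!r}")
    return f"ATTACH IF NOT EXISTS 'ducklake:{catalog_path}' AS {name} ({', '.join(options)})"


# pytest-style tests mirroring the three test names from the PR.
def test_sqlite_catalog_includes_meta_type_sqlite():
    stmt = build_attach_split("sqlite", "catalog.sqlite", "lake", "data/")
    assert "META_TYPE 'sqlite'" in stmt


def test_duckdb_catalog_excludes_meta_type_sqlite():
    stmt = build_attach_split("duckdb", "catalog.duckdb", "lake", "data/")
    assert "META_TYPE" not in stmt


def test_duckdb_catalog_attach_format():
    stmt = build_attach_split("duckdb", "catalog.duckdb", "lake", "data/")
    assert stmt == "ATTACH IF NOT EXISTS 'ducklake:catalog.duckdb' AS lake (DATA_PATH 'data/')"
```

Tests of this kind need no credentials or live DuckLake instance, because they only inspect the generated SQL string and never open a catalog.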
Related Issues
Additional Context
Tests pass without external credentials or a live DuckLake instance:
```
pytest tests/load/pipeline/test_ducklake_attach.py -v
tests/load/pipeline/test_ducklake_attach.py::test_sqlite_catalog_includes_meta_type_sqlite PASSED
tests/load/pipeline/test_ducklake_attach.py::test_duckdb_catalog_excludes_meta_type_sqlite PASSED
tests/load/pipeline/test_ducklake_attach.py::test_duckdb_catalog_attach_format PASSED
3 passed in 1.06s
```