Re-use metadata of unaltered row groups when checkpointing a table #18395
Merged
Conversation
Mytherin added a commit that referenced this pull request on Jul 25, 2025
…Metadata blocks (#18398)

Follow-up from #18395.

This PR adds storage for `extra_metadata_blocks` in the row-group metadata when the latest storage version (v1.4.0) is used. This is a list of metadata blocks that are referenced by the row-group metadata but **not** present in the list of data pointers (`data_pointers`). These blocks can be present when either (1) columns are very wide, e.g. due to being deeply nested, or (2) the final column pointer "crosses" the metadata block threshold, as `data_pointers` points only to the beginning of the column metadata. Usually this list is empty, so storing it does not take up much extra storage space. Its presence allows us to re-use metadata more efficiently, as we know exactly which metadata blocks a row group points to without having to do any additional deserialization.

### Only Flush Dirty Metadata Blocks

Previously, our metadata manager would flush all metadata blocks, incurring a lot of unnecessary I/O now that we are doing a lot of metadata re-use. This PR reworks this so that we keep track of which blocks are dirty and only flush the dirty blocks (a minimal sketch of this bookkeeping follows the performance table below).

### Performance

Running the benchmark in #18395 again, we now get the following timings:

| Operation  | v1.3.2 | Re-Use | New   |
|------------|--------|--------|-------|
| Checkpoint | 0.5s   | 0.11s  | 0.04s |
| Full       | 0.63s  | 0.13s  | 0.07s |
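To make the dirty-block bookkeeping concrete, here is a minimal sketch of the idea, assuming a simple map-based manager; all type and function names here are hypothetical and not DuckDB's actual `MetadataManager` API:

```c++
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical sketch: a metadata manager that remembers which blocks were
// written to since the last checkpoint, so a flush touches only those.
struct MetadataBlock {
	int64_t block_id = 0;
	std::vector<uint8_t> data;
};

class DirtyTrackingMetadataManager {
public:
	// Any write dirties the target block.
	void Write(int64_t block_id, std::vector<uint8_t> payload) {
		auto &block = blocks[block_id];
		block.block_id = block_id;
		block.data = std::move(payload);
		dirty_blocks.insert(block_id);
	}

	// Flush only the dirty blocks: metadata re-used from unaltered row
	// groups stays clean and incurs no write I/O at all.
	template <class WriteFn>
	void Flush(WriteFn &&write_to_disk) {
		for (auto block_id : dirty_blocks) {
			write_to_disk(blocks.at(block_id));
		}
		dirty_blocks.clear();
	}

private:
	std::unordered_map<int64_t, MetadataBlock> blocks;
	std::unordered_set<int64_t> dirty_blocks;
};
```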
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request on Jul 26, 2025
Re-use metadata of unaltered row groups when checkpointing a table (duckdb/duckdb#18395)
Tmonster added a commit to Tmonster/duckdb-r that referenced this pull request on Sep 8, 2025
bump iceberg to latest main
Mytherin added a commit that referenced this pull request on Sep 19, 2025
…erialize, and add logging to checkpoints (#19055)

This PR fixes an issue where column segments would not be re-aligned correctly upon `Deserialize` when the row group itself was re-aligned due to a vacuum operation. This could occur following an optimization done in #18395 that postpones de-serializing column data in `RowGroup::MoveToCollection`, and could lead to an internal exception on checkpoint: we would vacuum row groups and subsequently load previously unloaded segments. In addition, this PR adds logging to vacuums/checkpoints.
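To illustrate the class of bug being fixed, here is a minimal sketch, with hypothetical types that are not the actual `RowGroup`/`ColumnData` code: when deserialization is postponed, any shift of the row group's start row (e.g. from a vacuum) has to be applied when the segments are finally loaded:

```c++
#include <cstdint>
#include <vector>

// Hypothetical sketch of the hazard: a row group whose column data is
// deserialized lazily must re-align its segment pointers if the row group
// itself moved (was re-aligned) between serialization and loading.
struct SegmentPointer {
	uint64_t start_row; // absolute start row recorded at write time
};

struct LazyRowGroup {
	uint64_t current_start;               // start row after e.g. a vacuum
	uint64_t serialized_start;            // start row when the pointers were written
	std::vector<SegmentPointer> pointers; // postponed: not yet deserialized

	// Loading applies the start-row delta; skipping this step produces
	// segments whose row ranges disagree with the moved row group, which
	// can surface as an internal exception at the next checkpoint.
	std::vector<SegmentPointer> LoadSegments() const {
		std::vector<SegmentPointer> result = pointers;
		const int64_t delta =
		    static_cast<int64_t>(current_start) - static_cast<int64_t>(serialized_start);
		for (auto &ptr : result) {
			ptr.start_row = static_cast<uint64_t>(static_cast<int64_t>(ptr.start_row) + delta);
		}
		return result;
	}
};
```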
Mytherin added a commit that referenced this pull request on Nov 3, 2025
…r, as that pointer might not always be valid (#19588)

When the new [experimental metadata re-use](#18395) is enabled, the metadata of *some* row groups may be re-used. This can cause linked lists of metadata blocks to contain invalid references. For example, when writing a bunch of row groups, we might get this layout:

```
METADATA BLOCK 1
ROW GROUP 1
ROW GROUP 2 (pt 1)
NEXT BLOCK: 2
->
METADATA BLOCK 2
ROW GROUP 2 (pt 2)
ROW GROUP 3
```

Metadata is stored in a linked list (block 1 -> block 2), but we do not need to traverse this linked list fully: we store pointers to individual row groups and can start reading from their position. Now suppose we re-use the metadata of `ROW GROUP 1`, but not of the other row groups (because e.g. they have been updated/changed). Since `ROW GROUP 1` is fully contained in `METADATA BLOCK 1`, we can garbage collect `METADATA BLOCK 2`, leaving the following metadata block:

```
METADATA BLOCK 1
ROW GROUP 1
ROW GROUP 2 (pt 1)
NEXT BLOCK: 2
```

We can still safely read this block and the metadata for `ROW GROUP 1`; **however**, the block contains a reference to a metadata block that is no longer valid and might have been garbage collected.

This revealed a problem in the `MetadataReader`: when pointed towards a block, the current implementation would eagerly try to figure out the metadata location of *the next block*. This is normally not a problem, but with these invalid chains we might try to resolve a block that has already been freed up, causing an internal exception to trigger:

```
Failed to load metadata pointer (id %llu, idx %llu, ptr %llu)
```

This PR resolves the issue by making the `MetadataReader` lazy: instead of eagerly resolving the next pointer, we only do so when it is actually required.
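As a minimal sketch of the eager-versus-lazy distinction (hypothetical types, not the actual `MetadataReader`):

```c++
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical sketch: with metadata re-use, the NEXT pointer of the last
// block we need may reference a garbage-collected block, so it must not be
// resolved up front.
struct Block {
	int64_t id = 0;
	std::optional<int64_t> next_id; // may dangle after garbage collection
};

class LazyReader {
public:
	// load throws if the referenced block has been freed.
	using LoadFn = std::function<Block(int64_t)>;

	LazyReader(int64_t start_id, LoadFn load_fn) : load(std::move(load_fn)) {
		current = load(start_id);
		// An eager reader would already resolve *current.next_id here,
		// failing on a dangling pointer even when the caller only needs
		// data from this block.
	}

	// The next block is resolved only when reading actually continues.
	bool Advance() {
		if (!current.next_id) {
			return false;
		}
		current = load(*current.next_id);
		return true;
	}

	Block current;

private:
	LoadFn load;
};
```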
mach-kernel added a commit to spiceai/duckdb that referenced this pull request on Nov 14, 2025
Squashed commit of the following:
commit 68d7555f68bd25c1a251ccca2e6338949c33986a
Merge: 3d4d568674 9c6efc7d89
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 11 11:59:30 2025 +0100
Fix minor crypto issues (#19716)
commit 3d4d568674d1e05d221e8326c0d180336c350f18
Merge: 7386b4485d 0dea05daf8
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 11 10:58:18 2025 +0100
Logs to be case-insensitive also at enable_logging callsite (#19734)
Currently `CALL enable_logging('http');` would succeed, but then select
an empty subset of the available logs (`http` != `HTTP`), due to a quirk
in the code. This PR fixes that up.
commit 7386b4485d23bc99c9f6efab6ce0e33ecc23222b
Merge: 1ef3444f09 d4a77c801b
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 11 09:28:13 2025 +0100
Add explicit Initialize(HTTPParam&) method to HTTPClient (#19723)
This allows explicit re-initialization of specific parts of HTTPClient(s).
This diff would allow patterns such as reusing partially constructed (but
properly re-initialized) HTTPClient objects:
```c++
struct CrossQueryState {
	// in some state kept around
	unique_ptr<HTTPClient> &client;
};

void SomeFunction() {
	// ...
	http_util.Request(get_request, client);
	// some more logic, same query
	http_util.Request(get_request, client);
}

void SomeOtherFunction() {
	// Re-initialize part of the client, given some settings might have changed
	auto http_params = HTTPParams(http_util);
	client->Initialize(http_params);
	// ...
	http_util.Request(get_request, client);
	// some more logic, same query
	http_util.Request(get_request, client);
}
```
Note that this PR is fully opt-in for users; if you implement a
file-system abstraction inheriting from HTTPClient, you will get a
compiler error pointing you to implement the relevant function.
commit 9c6efc7d89ee5ca60598c7e43778c0e9b34b266b
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 11 08:09:02 2025 +0100
Fix typo
commit e52f71387731da1202fc33755922999a472218a1
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 11 08:08:32 2025 +0100
Add require to test
commit 1ef3444f09b1df6e4a7cc3ad1d67868ecaa1a6a4
Merge: 8090b8d52e dff5b7f608
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 11 08:07:17 2025 +0100
Bump the Postgres scanner extension (#19730)
commit 0dea05daf823237a2de28ec7c0fec53dbb006475
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Tue Nov 11 06:42:36 2025 +0100
Logs to be case-insensitive also at enable_logging callsite
commit 8090b8d52ed6bfd31b72013f6800cea89539cc2f
Merge: 6667c7a3ec 5e9f88863f
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 10 21:34:42 2025 +0100
[Dev] Fix assertion failure for empty ColumnData serialization (#19713)
The `PersistentColumnData` constructor asserts that the pointers aren't
empty.
This assertion will fail if we try to serialize the child of a list when
all lists are empty (as the child will then be entirely empty).
Backported fix for a problem found by #19674
commit 6667c7a3ecdc56cc144a9bcf8601001af66e6839
Merge: 3f0ad6958f 4a0f4b0b38
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 10 21:32:58 2025 +0100
Bump httpfs and resume testing on Windows (#19714)
commit dff5b7f608b732a0e7c5d9a68e7e8d7db3c48478
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Mon Nov 10 21:31:46 2025 +0100
Bump the Postgres scanner extension
commit 0e3d0b5af535fcde90d272d95b1d08cb5fb12d15
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 21:26:43 2025 +0100
remove deleted file from patch
commit ffb7be7cc5f27d9945d6868f76ef769a3f8a43d4
Merge: 2142f0b10d 3f0ad6958f
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 21:13:17 2025 +0100
Merge branch 'v1.4-andium' into fix-crypto-issue
commit 2142f0b10db72b89c9101fa65ead619182f8e5d1
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 20:55:18 2025 +0100
fix duplicate job id
commit 0a225cb99a130c2b1635d6ced03bc37f01ff9436
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 20:52:40 2025 +0100
fix ci for encryption
commit 3f0ad6958f1952a083bc499fc147f69504a3c6d2
Merge: f3fb834ef7 a1eeb0df6f
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 10 20:09:11 2025 +0100
Fix #19700: correctly sort output selection vector in nested selection operations (#19718)
Fixes #19700
This probably should be maintained during the actual select - but for
now just sorting it afterwards solves the issue.
commit f3fb834ef7153b90ef3908eb51a5b85efa580ca5
Merge: 7333a0ae84 c8ddca6f3c
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 10 20:09:03 2025 +0100
Fix #19355: correctly resolve subquery in MERGE INTO action condition (#19720)
Fixes #19355
commit 7333a0ae84d51729fffe91e67f12c3cee526af2a
Merge: 95fcb8f188 6595848a27
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 10 16:46:31 2025 +0100
Bump: delta, ducklake, httpfs (#19715)
This PR bumps the following extensions:
- `delta` from `0747c23791` to `6515bb2560`
- `ducklake` from `022cfb1373` to `77f2512a67`
- `httpfs` from `b80c680f86` to `041a782b0b`
commit 35f98411037cb0499e236d0cbe20d6b3a0dcc43f
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 14:51:29 2025 +0100
install curl
commit d4a77c801bb1a88e634c12bc64e185ef2f147d2d
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Mon Nov 10 14:37:42 2025 +0100
Add explicit Initialize(HTTPParams&) method to HTTPClient
This allows explicit re-initialization of specific parts of HTTPClient(s)
commit 6595848a27bd7fb271c63a99551d8326417320dd
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 11:30:34 2025 +0100
bump extensions
commit 7a7726214c86267d476a2edbc68656ebd6253fe8
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Mon Nov 10 11:28:32 2025 +0100
fix: ci issues
commit 4a0f4b0b38b9d5660c8a5c848d8a1c71bc3220de
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Mon Nov 10 11:07:58 2025 +0100
Bump httpfs and resume testing on Windows
commit 5e9f88863f5f519620ae01f4ff873f6a2869343f
Author: Tishj <t_b@live.nl>
Date: Mon Nov 10 10:58:03 2025 +0100
conditionally create the PersistentColumnData: if there are no segments (as could be the case for a list's child), there won't be any data pointers
commit 95fcb8f18819b1a77df079a7fcb753a8c2f52844
Merge: 396c86228b 4f3df42f20
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Mon Nov 10 10:50:38 2025 +0100
Bump: aws, ducklake, httpfs, iceberg (#19654)
This PR bumps the following extensions:
- `aws` from `18803d5e55` to `55bf3621fb`
- `ducklake` from `2554312f71` to `022cfb1373`
- `httpfs` from `8356a90174` to `b80c680f86`
- `iceberg` from `5e22d03133` to `db7c01e92`
commit c8ddca6f3c32aa0d3a9536371f9e3ca8cb00753e
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Mon Nov 10 09:19:31 2025 +0100
Fix #19355: correctly resolve subquery in MERGE INTO action condition
commit a1eeb0df6ffc2f129638a2dfaab9a70720c8db1b
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Mon Nov 10 09:00:35 2025 +0100
Fix #19700: correctly sort output selection vector in nested selection operations
commit 396c86228bda46929560affde7effdbab7d4e905
Merge: e3d242509e e501fcbd1a
Author: Mark <mark.raasveldt@gmail.com>
Date: Sat Nov 8 17:34:13 2025 +0100
Add missing query location to blob cast (#19689)
commit e3d242509e5710314921a0d7debd0bedb4d10a3e
Merge: 7ce99bc041 1ba198d711
Author: Mark <mark.raasveldt@gmail.com>
Date: Sat Nov 8 17:34:04 2025 +0100
Add request timing to HTTP log (#19691)
Demo:
```SQL
D call enable_logging('HTTP');
D from read_csv_auto('s3://duckdblabs-testing/test.csv');
D select request.type, request.url, request.start_time, request.duration_ms from duckdb_logs_parsed('HTTP');
┌─────────┬────────────────────────────────────────────────────────────────┬───────────────────────────────┬─────────────┐
│ type │ url │ start_time │ duration_ms │
│ varchar │ varchar │ timestamp with time zone │ int64 │
├─────────┼────────────────────────────────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ HEAD │ https://duckdblabs-testing.s3.us-east-1.amazonaws.com/test.csv │ 2025-11-07 10:17:56.052202+00 │ 417 │
│ GET │ https://duckdblabs-testing.s3.us-east-1.amazonaws.com/test.csv │ 2025-11-07 10:17:56.478847+00 │ 104 │
└─────────┴────────────────────────────────────────────────────────────────┴───────────────────────────────┴─────────────┘
```
commit ae518d0a4e439f80c768388fab8f51d667f7e4b7
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Fri Nov 7 13:58:22 2025 +0100
minor ci fixes
commit e501fcbd1af58cf147b80051b38ddf815d5e1b8c
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Fri Nov 7 12:37:21 2025 +0100
move
commit bc1a683d10150dfe15f2f4d69e505f6337c4fc27
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Fri Nov 7 11:22:35 2025 +0100
only load httpfs if necessary
commit 1ba198d71106a851fb8234ccfb208ec66b0e1d17
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Fri Nov 7 11:16:38 2025 +0100
fix: check if logger exists
commit f22e9a06ef6e1b6c999e8c7389b05e40ae9032fc
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Fri Nov 7 11:13:11 2025 +0100
add test for http log timing
commit f474ba123485377f94e5b57600fb720733050c98
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Fri Nov 7 11:00:19 2025 +0100
add http timings to logger
commit 02bb5d19b9fc7a702184ffcf7d9688b88f54071a
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Fri Nov 7 09:30:04 2025 +0100
Add query location to blob cast
commit 7ce99bc04130615dfc3a39dfb79177a8942fefba
Merge: 1555b0488e aea843492d
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Fri Nov 7 09:22:48 2025 +0100
Fix InsertRelation on attached database (#19583)
Fixes https://github.com/duckdb/duckdb/issues/18396
Related PR in duckdb-python:
https://github.com/duckdb/duckdb-python/pull/155
commit 1555b0488e322998e6fd06cc47e1909c7bb4eba4
Merge: 783f08ffd8 98e2c4a75f
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Fri Nov 7 08:31:05 2025 +0100
Log total probe matches in hash join (#19683)
This is usually evident from the number of tuples coming out of a join,
but it can be hard to understand what's going on when doing a
`LEFT`/`RIGHT`/`OUTER` join. This PR adds one log call at the end of the
hash join to report how many probe matches there were.
```sql
D CALL enable_logging('PhysicalOperator');
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ 0 rows │
└─────────┘
D SELECT count(*)
FROM range(3_000_000) t1(i)
LEFT JOIN range(1_000_000, 2_000_000) t2(i)
USING (i);
┌────────────────┐
│ count_star() │
│ int64 │
├────────────────┤
│ 3000000 │
│ (3.00 million) │
└────────────────┘
D CALL disable_logging();
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ 0 rows │
└─────────┘
D SELECT info.total_probe_matches::BIGINT total_probe_matches
FROM duckdb_logs_parsed('PhysicalOperator')
WHERE class = 'PhysicalHashJoin' AND event = 'GetData';
┌─────────────────────┐
│ total_probe_matches │
│ int64 │
├─────────────────────┤
│ 1000000 │
│ (1.00 million) │
└─────────────────────┘
```
Here we are able to see that the hash join produced 1M matches, but
emitted 3M tuples.
commit 783f08ffd89b1d1290b2d3dec0b3ba12d8c233bf
Merge: 6c6af22ea4 1d5c9f5f3d
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Thu Nov 6 15:57:35 2025 +0100
Fixup linking for LLVM (#19668)
See conversation at https://github.com/llvm/llvm-project/issues/77653
This allows:
```
brew install llvm
CMAKE_LLVM_PATH=/opt/homebrew/Cellar/llvm/21.1.5 GEN=ninja make
```
to just work again.
Arguably a very limited use case, but it can just as well be fixed.
commit 6c6af22ea45effc67dc9e76feec3fb73208750bb
Merge: 2892abafa7 f483e95d1c
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Thu Nov 6 15:56:49 2025 +0100
Categorize ParseLogMessage as CAN_THROW_RUNTIME_ERROR (#19672)
Currently we rely on the filter on query type AND the execution of the scalar
function `parse_duckdb_log_message` not being reordered.
This is somewhat brittle, and I have found cases locally where this causes
problems that result in wrong casts, such as:
```
Conversion Error:
Type VARCHAR with value 'ColumnDataCheckpointer FinalAnalyze(COMPRESSION_UNCOMPRESSED) result for main.big.0(VALIDITY): 15360' can't be cast to the destination type STRUCT(metric VARCHAR, "value" VARCHAR)
```
Looking at the executed plan, it would look like:
```
┌─────────────┴─────────────┐
│ FILTER │
│ ──────────────────── │
│ ((type = 'Metrics') AND │
│ (struct_extract │
│ (parse_duckdb_log_message(│
│ 'Metrics', message), │
│ 'metric') = 'CPU_TIME')) │
│ │
│ ~0 rows │
└─────────────┬─────────────┘
```
Tagging `parse_duckdb_log_message` as potentially throwing on some inputs
avoids the reordering, and thus the problem, while improving the usability
of logs.
An alternative solution would be to use an explicit DefaultTryCast (instead of
TryCast) at
https://github.com/duckdb/duckdb/blob/v1.4-andium/src/function/scalar/system/parse_log_message.cpp#L70;
either approach would solve the problem.
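As a rough illustration of why the tag helps, here is a hedged sketch, not DuckDB's actual optimizer, of a filter-reordering pass that refuses to hoist predicates that may throw; all names here are hypothetical:

```c++
#include <algorithm>
#include <vector>

// Hypothetical sketch: only side-effect-free predicates are reordered by
// estimated cost; predicates that may throw keep their (late) position, so
// they never see rows that an earlier guard would have filtered out.
struct Predicate {
	double estimated_cost = 0.0;
	bool can_throw = false;
};

inline std::vector<Predicate> ReorderFilters(std::vector<Predicate> preds) {
	std::vector<Predicate> safe, unsafe;
	for (auto &p : preds) {
		(p.can_throw ? unsafe : safe).push_back(p);
	}
	// Cheap, safe guards may run in any (cost-based) order.
	std::sort(safe.begin(), safe.end(), [](const Predicate &a, const Predicate &b) {
		return a.estimated_cost < b.estimated_cost;
	});
	// Throwing predicates run last, guarded by everything before them.
	safe.insert(safe.end(), unsafe.begin(), unsafe.end());
	return safe;
}
```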
commit 98e2c4a75f816eae6ef2893bbb581c9913293f2a
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Thu Nov 6 15:35:25 2025 +0100
log total probe matches in hash join
commit 2892abafa772fffc4402e5125cf16a26c094cb44
Merge: ecc73b2b4b 488069ec8d
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Thu Nov 6 14:21:05 2025 +0100
duckdb_logs_parsed to do case-insensitive matching (#19669)
This is something @Tmonster and I bumped into while helping a customer
debug an issue.
I think it's more intuitive and friendly for user-facing functions to be
case-insensitive, given that this is the general user expectation around SQL.
I am not sure `ILIKE` is the best way to do so (an alternative would be
filtering on `lower(1) = lower(2)`).
Note that passing `%` signs is currently checked elsewhere, for example:
```sql
SELECT message FROM duckdb_logs_parsed('query%') WHERE starts_with(message, 'SELECT 1');
```
would throw
```
Invalid Input Error: structured_log_schema: 'query%' not found
```
(while `querylog` already works, see the test case, since case-insensitive
comparison was already used there)
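A minimal sketch of what case-insensitive name matching amounts to; this is a hypothetical helper, not the actual implementation:

```c++
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical sketch: compare log type names case-insensitively, so that
// 'http' and 'HTTP' select the same log schema.
inline bool LogTypeMatches(const std::string &a, const std::string &b) {
	return std::equal(a.begin(), a.end(), b.begin(), b.end(),
	                  [](unsigned char x, unsigned char y) {
		                  return std::tolower(x) == std::tolower(y);
	                  });
}
```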
commit aea843492da3f40c30e6e88c12eb6da690348f2e
Author: Evert Lammerts <evert.lammerts@gmail.com>
Date: Thu Nov 6 11:40:11 2025 +0100
review feedback
commit 094a54b890a2466aad743b1c372809849cdef283
Author: Evert Lammerts <evert.lammerts@gmail.com>
Date: Sat Nov 1 11:22:34 2025 +0100
Fix InsertRelation on attached database
commit 4f3df42f208d5e6dc602d2e688911ef13758d3aa
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Thu Nov 6 11:31:58 2025 +0100
bump iceberg further
commit f483e95d1c3983c2ba5758ebba1272f7ff12cd0d
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Fri Oct 31 12:25:01 2025 +0100
Improve tests using now working FROM duckdb_logs_parsed()
commit 6554c84a73b6c7857d2ec5ebf6f2019ceb56e6dc
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Tue Nov 4 12:56:31 2025 +0100
parse_logs_message might throw
commit 488069ec8d726d3b19093e8d57101c6c6af8910b
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Thu Nov 6 09:29:49 2025 +0100
duckdb_logs_parsed to do case-insensitive matching
commit 1d5c9f5f3d18c73e27b0bc4353d549680c5c82d5
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Thu Nov 6 09:13:41 2025 +0100
Fixup linking for LLVM
See conversation at https://github.com/llvm/llvm-project/issues/77653
commit ecc73b2b4b10beb175968e55e24e69241d00df1b
Merge: 2d69f075ee 4cb677238f
Author: Mark <mark.raasveldt@gmail.com>
Date: Thu Nov 6 08:58:09 2025 +0100
Always remember extra_metadata_blocks when checkpointing (#19639)
This is a follow-up to https://github.com/duckdb/duckdb/pull/19588,
adding the following:
- Reenables block verification in a new test configuration. It further
adds new checks to ensure that the metadata blocks that the RowGroup
references after checkpointing correspond to those that it would see if
it were to reload them from disk (a rough sketch of this check follows
the list below). This verification would have caught
the issue addressed by https://github.com/duckdb/duckdb/pull/19588
- Adds a small tweak in `MetadataWriter::SetWrittenPointers`. This
ensures that the table writer does not track an `extra_metadata_block`
that never received any writes as part of that row group (as it
immediately skipped to the next block when calling
`writer.GetMetaBlockPointer()` after `writer.StartWritingColumns`). With
the added verification, not having this tweak fails e.g. the following
test:
```
test/sql/storage/compression/bitpacking/bitpacking_compression_ratio.test_slow
CREATE TABLE test_bitpacked AS SELECT i//2::INT64 AS i FROM range(0, 120000000) tbl(i);
================================================================================
TransactionContext Error: Failed to commit: Failed to create checkpoint because of error: Reloading blocks just written does not yield same blocks: Written: {block_id: 2 index: 32 offset: 0}, {block_id: 2 index: 33 offset: 8},
Read: {block_id: 2 index: 33 offset: 8},
Read Detailed: {block_id: 2 index: 33 offset: 8},
Start pointers: {block_id: 2 index: 33 offset: 8},
Metadata blocks: {block_id: 2 index: 32 offset: 0},
```
- Ensures that we always update `extra_metadata_blocks` after
checkpointing a row group. This speeds up subsequent checkpoints
significantly. Right now, if you have a large legacy database and don't
update these old row groups, this field is kept as is, and every
checkpoint needs to recompute it (even if the database isn't reloaded).
Making sure we always have `RowGroup::has_metadata_blocks == true` after
each checkpoint, even in the case of metadata re-use, benefits checkpointing
both for databases in old storage formats and when starting to use the
newer storage format on large legacy databases.
- Only tangentially related to the issue / PR, but while debugging I
noticed that the `deletes_is_loaded` variable is not correctly
initialized in all RowGroup constructors (can also be triggered with the
assertion I added in `RowGroup::HasChanges()`)
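As referenced in the first bullet above, a rough sketch of what the reload verification amounts to; this is a hypothetical helper, and the real check also reports the detailed pointer sets seen in the test output:

```c++
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical sketch: after checkpointing a row group, reload its metadata
// block pointers from disk and require that they match what was written.
using BlockPointerKey = std::string; // e.g. "{block_id: 2 index: 32 offset: 0}"

inline void VerifyWrittenBlocks(const std::set<BlockPointerKey> &written,
                                const std::set<BlockPointerKey> &reloaded) {
	if (written != reloaded) {
		throw std::runtime_error("Reloading blocks just written does not yield same blocks");
	}
}
```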
commit 46028940c8e429739e73f4d345ec3cab5eb5b01c
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Wed Nov 5 19:33:58 2025 +0100
bump extension entries
commit 2d69f075ee91c42ad4fe4208a4d1f06d0034faff
Merge: 7043621a83 e3fb2eb884
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Wed Nov 5 15:27:27 2025 +0100
Enable running all extensions tests as part of the build step (#19631)
This is enabled via
https://github.com/duckdb/extension-ci-tools/pull/278, which introduced a
way to hook into running tests for all extensions of a given
configuration (as opposed to a single one).
Also a few minor fixes I bumped into:
* disable unused platforms from the external extension builds
* remove `[persistence]` tests from always being run
* enable `vortex` tests
* avoid `httpfs` tests on Windows, to be reverted in a follow-up
commit 4cb677238f7f4ad4d747f1a1045396fd74765724
Merge: b48cd982e0 7043621a83
Author: Yannick Welsch <yannick@welsch.lu>
Date: Wed Nov 5 14:59:47 2025 +0100
Merge remote-tracking branch 'origin/v1.4-andium' into yw/metadata-reuse-tweaks
commit b48cd982e0c59a03cf78a37175ba7272438c2525
Author: Yannick Welsch <yannick@welsch.lu>
Date: Wed Nov 5 14:59:34 2025 +0100
newline
commit 490411ab5ae614064e3e4fa94f631dcbbeea68d8
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Wed Nov 5 13:55:19 2025 +0100
fix: add more places to securely clear key from memory
commit e3fb2eb8843f9ff90ad29fd69938ee6961b644dc
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Wed Nov 5 11:07:40 2025 +0100
Avoid testing httpfs on Windows (fix incoming)
commit e719c837851f016ea614b28380685de8794ccf39
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Wed Nov 5 11:04:57 2025 +0100
Revert "Add ducklake tests"
This reverts commit b77a9615117de845fa48463f09be20a89dea7434.
commit 4242618a8d43c2004f55b27b63535ad979302e92
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Wed Nov 5 11:03:48 2025 +0100
only autoload if crypto util is not set
commit 19232fc414dc7f861dcbad788ba5466d10c27a67
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Wed Nov 5 10:14:12 2025 +0100
bump extensions
commit 7043621a83d1be17ba6b278f0f7a3ec65df98d93
Merge: db845b80c7 3584a93938
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Wed Nov 5 09:18:39 2025 +0100
Bump MySQL scanner (#19643)
Updating the MySQL scanner to include the time zone handling fix to
duckdb/duckdb-mysql#166.
commit db845b80c76452054e26cf7a2d715769592de925
Merge: f50618b48c 7eccc643ae
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Wed Nov 5 09:15:52 2025 +0100
Remove `FlushAll` from `DETACH` (#19644)
This was initially added to reduce RSS after `DETACH`ing, but it is now
creating a large bottleneck for workloads that aggressively
`ATTACH`/`DETACH`. RSS will be freed by further allocation activity, or
when `SET allocator_background_threads=true;` is enabled.
commit 4978ccd8ec15e7631fd9ed741d338da663b0ff48
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Nov 4 16:34:16 2025 +0100
fix: add patch file
commit 6ec168d508d9395306b29c62cb0b163b6a77bafb
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Nov 4 16:13:18 2025 +0100
format
commit 67ec072c0ea6a237213f680709773e1342b11065
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Nov 4 15:59:04 2025 +0100
fix: tests
commit 7eccc643ae57a76a49e61b905f9a9a1857a00084
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Tue Nov 4 15:47:29 2025 +0100
remove flush all from detach
commit 3584a93938a4852b0510b0c3d6b3bb13861c4147
Author: Alex Kasko <alex@staticlibs.net>
Date: Tue Nov 4 14:33:21 2025 +0000
Bump MySQL scanner
Updating the MySQL scanner to include the time zone handling fix to
duckdb/duckdb-mysql#166.
commit 250b917ed6f423b56efbd855b2359a498fe2ef8d
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Nov 4 14:41:32 2025 +0100
fix: various issues with encryption
commit f50618b48c3dd04f77ae557e3bb4863f96f74a76
Merge: 66100df7ae 8257973295
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 4 14:26:16 2025 +0100
Fix #19455: correctly extract root table in merge into when running a join that contains single-sided predicates that are transformed into filters (#19637)
Fixes #19455
commit 82579732952d68dec2b2a44cc1ca04243ac57151
Merge: 6efd4a4fde 66100df7ae
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Tue Nov 4 14:25:42 2025 +0100
Merge branch 'v1.4-andium' into mergeintointernalerror
commit 66100df7aeb321d37f2434416df59dc274948987
Merge: d54d36faae c53eb7a562
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 4 14:24:10 2025 +0100
Detect invalid merge into action and throw exception (#19636)
`WHEN NOT MATCHED (BY TARGET)` cannot be combined with `DELETE` or
`UPDATE`, since there are no rows in the target table to delete or
update. This PR ensures we throw an error when this is attempted.
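As a sketch of a now-rejected statement (table and column names invented):
```sql
CREATE TABLE tgt (id INTEGER, v INTEGER);
CREATE TABLE src (id INTEGER, v INTEGER);
-- WHEN NOT MATCHED matches source rows with no target counterpart,
-- so there is no target row to delete; this now raises an error
MERGE INTO tgt USING src ON tgt.id = src.id
    WHEN NOT MATCHED THEN DELETE;
```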
commit ca88f5b2cf9480ac8e57f436fbc89d327d19422a
Author: Yannick Welsch <yannick@welsch.lu>
Date: Tue Nov 4 10:57:57 2025 +0100
Use reserve instead
commit 133a15ee61a64a831de46e4407f38d8bdd7b71f5
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Tue Nov 4 10:45:20 2025 +0100
Move also [persistence] tests back under ENABLE_UNITTEST_CPP_TESTS
commit eb322ce251b5c4347650afc455171d862c51bf34
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Tue Nov 4 10:41:40 2025 +0100
Switch from running on PRs wasm_mvp to wasm_eh
commit 9c5f82fa358fcf236cff21499351c1e739ca032a
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Tue Nov 4 10:40:15 2025 +0100
Currently no external extension works on wasm or windows or musl
To be expanded once that changes
commit d54d36faae00120f548b39d1e21d93ca25f17087
Merge: 97fdeddb2b c01c994085
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Tue Nov 4 09:03:51 2025 +0100
Bump: spatial (#19620)
This PR bumps the following extensions:
- `spatial` from `61ede09bec` to `d83faf88cd`
commit 6efd4a4fde180bf7d9c433977921818e5465c92a
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Tue Nov 4 08:13:56 2025 +0100
Fix #19455: correctly extract root table in merge into when running a join that contains single-sided predicates that are transformed into filters
commit c53eb7a56266157f0e9d97bd91be0d36285ec38b
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Tue Nov 4 08:01:24 2025 +0100
Detect invalid merge into action and throw exception
commit 97fdeddb2bd5c34862afd30177c9184f51f6dccd
Merge: a0a46d6ed0 87193fd5ab
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 4 07:48:43 2025 +0100
Try to prevent overshooting of `FILE_SIZE_BYTES` by pre-emptively increasing bytes written in Parquet writer (#19622)
Helps with #19552, but doesn't fully fix the problem. We should look
into a more robust fix for v1.5.0, but not for a bugfix release.
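For reference, a hedged usage sketch (output directory invented); the writer now accounts for buffered bytes earlier, so the produced files land closer to the requested size:
```sql
COPY (FROM range(10_000_000))
TO 'out_dir' (FORMAT parquet, FILE_SIZE_BYTES '100MB');
```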
commit a0a46d6ed06dd962a4d6eeb01f3e14f8b275cec4
Merge: 73c0d0db15 3838c4a1ed
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Nov 4 07:48:27 2025 +0100
Increase cast-cost of old-style implicit cast to string (#19621)
This PR fixes https://github.com/duckdb/duckdb-python/issues/148
The issue is that `list_extract` now has two overloads: one for a
templated list `LIST<T>` and one for concrete `VARCHAR` inputs. When
binding a function, we add a very high cost to selecting a templated
overload, to ensure we always pick something more specific if available.
With our current casting rules, we are unable to cast `VARCHAR[]` to
`VARCHAR`, and therefore fall back to the list template as expected. But
the old-style casting rules allow `VARCHAR[]` to `VARCHAR`, also with a
high cost penalty, yet that cost is still lower than the cost of casting
to the template, even though the template would be the better
alternative.
With old-style casting we basically always have a lower-cost "fallback"
option than selecting a template overload. While we should overhaul our
casting system to evaluate the cast cost along more axes than just
"score", this PR fixes this specific case by simply cranking up the cost
of old-style implicit to-string casts.
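A rough illustration of the intended binding behavior (literal values invented): with the increased to-string cast cost, the list input binds to the list overload instead of being implicitly cast to `VARCHAR`:
```sql
SELECT list_extract(['a', 'b', 'c'], 2); -- binds the LIST<T> overload: 'b'
SELECT list_extract('abc', 2);           -- binds the VARCHAR overload: 'b'
```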
commit c01c99408526b3c0d698028083481301af069824
Author: Max Gabrielsson <max@gabrielsson.com>
Date: Mon Nov 3 22:42:37 2025 +0100
extension entries
commit b77a9615117de845fa48463f09be20a89dea7434
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Mon Nov 3 17:35:39 2025 +0100
Add ducklake tests
commit bd58abcdfb4485a1a9dbb750bd0587803fd1c559
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Mon Nov 3 17:35:15 2025 +0100
Load vortex tests
commit 62fe1bff77a60fd690b9911aa7a38b7bc197f865
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Mon Nov 3 22:08:43 2025 +0100
Pass down extensions_test_selection -> complete
commit e2604e6f5259453f482e0c49ca10520e89ddf269
Author: Yannick Welsch <yannick@welsch.lu>
Date: Mon Nov 3 19:18:47 2025 +0100
Always has_metadata_blocks after checkpoint
commit 73c0d0db15621d3d1c2936816becf27e2c41e2ab
Merge: 286924e634 b518b2aa0b
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 3 18:24:26 2025 +0100
Improve error message around compression type deprecation/availability checks (#19619)
This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/6436
The old code kept only a list of "deprecated" types, and returned a
boolean, losing the context whether the compression type was available
at one point and is now deprecated OR is newly introduced and not
available yet in the storage version that is currently used.
commit 0e5a33dae35aab5209a8e959cf48d7525fa7ec8d
Author: Yannick Welsch <yannick@welsch.lu>
Date: Thu Oct 30 19:28:54 2025 +0100
Verify blocks
commit 286924e6348723138ca4dfd55b749d847bce59a9
Merge: 535f905874 c248313a1d
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 3 17:12:32 2025 +0100
bump iceberg (#19618)
commit 87193fd5abf342d6ddce9d984e69007a4ccdc7d2
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Mon Nov 3 14:43:08 2025 +0100
try to prevent overshooting by pre-emptively increasing write size
commit 3838c4a1edd83dc1373b6077dc6ee478bb996e50
Author: Max Gabrielsson <max@gabrielsson.com>
Date: Mon Nov 3 13:55:53 2025 +0100
increase fallback string cast cost
commit 535f90587495e0c8f5974a0968b06b15ad01b32e
Merge: d643cefe13 06df593c60
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Mon Nov 3 13:49:57 2025 +0100
[DevEx] Improve error message when FROM clause is omitted (#18995)
This PR fixes #18954
If the "similar bindings" is entirely empty, that means that there are
no bindings, which can only happen if the FROM clause is entirely
missing.
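For example (column name invented), a bare `SELECT` that references a column now hints at the missing `FROM` clause rather than only listing (empty) similar bindings:
```sql
SELECT some_column;
-- binder error: there is no FROM clause, so there are no bindings at all;
-- the error message now points this out
```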
commit 9268637337a21b9c03fdc7dceb0a88fbbe001a73
Author: Max Gabrielsson <max@gabrielsson.com>
Date: Mon Nov 3 12:35:30 2025 +0100
bump extensions
commit d643cefe13de6873f6fb0ecc0bca1c14111cde11
Merge: 5f8cf7d7f8 c6434fd89a
Author: Mark <mark.raasveldt@gmail.com>
Date: Mon Nov 3 12:28:33 2025 +0100
Avoid eagerly resolving the next on-disk pointer in the MetadataReader, as that pointer might not always be valid (#19588)
When enabling the new [experimental metadata
re-use](https://github.com/duckdb/duckdb/pull/18395), it is possible for
metadata of *some* row groups to be re-used. This can cause linked lists
of metadata blocks to contain invalid references.
For example, when writing a bunch of row groups, we might get this
layout:
```
METADATA BLOCK 1
ROW GROUP 1
ROW GROUP 2 (pt 1)
NEXT BLOCK: 2
->
METADATA BLOCK 2
ROW GROUP 2 (pt 2)
ROW GROUP 3
```
Metadata is stored in a linked list (block 1 -> block 2) - but we don't
need to traverse this linked list fully. We store pointers to individual
row groups, and can start reading from their position.
Now suppose we re-use metadata of `ROW GROUP 1`, but not of the other
row groups (because e.g. they have been updated / changed). Since this
is fully contained in `METADATA BLOCK 1`, we can garbage collect
`METADATA BLOCK 2`, leaving the following metadata block:
```
METADATA BLOCK 1
ROW GROUP 1
ROW GROUP 2 (pt 1)
NEXT BLOCK: 2
```
Now we can safely read this block and read the metadata for `ROW GROUP
1`; **however**, this block contains a reference to a metadata block
that is no longer valid and might have been garbage collected. This
revealed a problem in the `MetadataReader`: when pointing it towards a
block, it would eagerly try to figure out the metadata location of *the
next block*. This is normally not a problem, but with these invalid
chains we might try to resolve a block that has already been freed,
triggering an internal exception:
```
Failed to load metadata pointer (id %llu, idx %llu, ptr %llu)
```
This PR resolves the issue by making the MetadataReader lazy. Instead of
eagerly resolving the next pointer, we only do this when it is actually
required.
commit b518b2aa0b06372d583fb203f5cae0011a53a87f
Author: Tishj <t_b@live.nl>
Date: Mon Nov 3 12:24:43 2025 +0100
enum util fix
commit 5f8cf7d7f81981f4b2355959257fa82982c3dd11
Merge: 407720a348 2cdc7f922b
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Mon Nov 3 12:22:52 2025 +0100
add vortex external extension (#19580)
commit 7c2353cb06d867813b7725f893a6b1092821c807
Author: Tishj <t_b@live.nl>
Date: Mon Nov 3 11:21:32 2025 +0100
differentiate between deprecated/not available yet in the check, to improve error reporting
commit c248313a1dd40f1569b608b80bdec1229de0b6b4
Author: Tmonster <tom@ebergen.com>
Date: Mon Nov 3 10:54:40 2025 +0100
bump iceberg
commit 407720a34804f0da61d5ba6645c3c44ec6ddf0d8
Merge: 7764771eaa d4fb98d454
Author: Mark <mark.raasveldt@gmail.com>
Date: Sun Nov 2 15:01:29 2025 +0100
Wal index deletes (#19477)
This adds support for buffering and replaying Index delete operations
for WAL replay. During WAL replay, index operations are buffered since
the Indexes are not bound yet. During Index binding, the buffered
operations are applied to the Index. UnboundIndex is modified to support
buffering delete operations on top of inserts.
BoundIndex::ApplyBufferedAppends is changed to
BoundIndex::ApplyBufferedReplays, which supports replaying both inserts
and deletes.
Documentation along the relevant code paths is added, clarifying the
ordering of mapped_column_ids and the index_chunks being buffered.
Before, the mapping could be in any order, since it only came from Index
insert paths. Now buffering can come from both insert and delete paths,
so both need to buffer index chunks and mappings in the same order (the
sorted order of the physical Index column IDs).
There is also a bug fix for buffering index data on a table with
generated columns: the table chunk created for replaying buffered
operations previously contained all column types, including generated
columns, whereas now it only contains the physical column layout needed
for index operations. (ART Index operations take a chunk of data in
which only the index columns contain data; the non-indexed columns are
empty.)
A catch block is added to Transaction CleanupState::Flush, which was
silently throwing away any failures (and which masked this WAL replay
issue in the first place). Also added: some test coverage for ART
duplicate rowids, and a LookupInLeaf function that allows searching for
a rowid in a Leaf that is either inlined or a gate node to a nested ART.
@taniabogatsch
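A minimal sketch of the replay scenario this enables (schema invented): deletes on an indexed table that only exist in the WAL are buffered during replay and applied once the index is bound.
```sql
CREATE TABLE t (id INTEGER PRIMARY KEY, payload VARCHAR);
INSERT INTO t VALUES (1, 'a'), (2, 'b');
DELETE FROM t WHERE id = 1;
-- if the process restarts before a checkpoint, both the insert and the
-- delete are replayed from the WAL against the rebuilt ART index
```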
commit c6434fd89a7391e428f2cb31e6e3d676d5257b0d
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Sun Nov 2 14:54:33 2025 +0100
Fix lock order inversion
commit eb514c01e4ea4ad434fb87fde70307f64992d52a
Merge: 2f3d2db509 7764771eaa
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Sun Nov 2 09:45:34 2025 +0100
Merge branch 'v1.4-andium' into metadatareusefixes
commit 7764771eaa654cb44f5c731e99f5d989951aefb8
Merge: 9ea6e07a29 fc2bf610d0
Author: Mark <mark.raasveldt@gmail.com>
Date: Sun Nov 2 09:44:54 2025 +0100
Skip compiling remote optimizer test when TSAN Is enabled (#19590)
This test uses `fork` which seems to mess up the thread sanitizer,
causing strange errors to occur sporadically.
commit fc2bf610d0c9851d1e3f6ad273dcfb47b6ec60a6
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Sat Nov 1 23:27:43 2025 +0100
Skip compiling entirely
commit a68390e2b1a6f09b899d248881d331e5dbbab89a
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Sat Nov 1 23:23:18 2025 +0100
Skip fork test with tsan
commit 2f3d2db50968fd917f253c2c34cf488290dadfa4
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Sat Nov 1 15:27:51 2025 +0100
Avoid eagerly resolving the next on-disk pointer in the MetadataReader, as that pointer might not always be valid
commit 9ea6e07a290db878c9da097d407b3a866c43c8e0
Merge: 5f1ce8ba5c a740840f97
Author: Mark <mark.raasveldt@gmail.com>
Date: Sat Nov 1 09:25:59 2025 +0100
Fix edge case in uncompressed validity scan with offset and fix off-by-one in ArrayColumnData::Select (#19567)
This PR fixes an off-by-one in the consecutive-array-scan optimization
implemented in https://github.com/duckdb/duckdb/pull/16356 as well as an
edge case in our uncompressed validity data scan.
Fixes https://github.com/duckdb/duckdb/issues/19377
I can't figure out how to write a test for this; no matter what I do,
I'm unable to replicate the same storage characteristics as the database
file provided in the issue above.
In the repro we do a scan+skip+scan, where part of the first
`validity_t` in the second scan contains a bunch of zeroes at the
positions "before" the scan window, which remain even after shifting.
I've solved it by setting all lower bits up to `result_idx` in the first
`validity_t` we scan, but I'm not sure this is the most elegant
solution. Strangely enough, if we remove all the bitwise logic and just
use the same "fall-back" logic as ifdef'ed for `VECTOR_SIZE < 128`, it
all works, so the issue has to be in the bit manipulation.
commit 5f1ce8ba5c0000770412b35a763af417f8fb2b90
Merge: be0142d4ee dbe272dff0
Author: Mark <mark.raasveldt@gmail.com>
Date: Sat Nov 1 09:22:00 2025 +0100
[v1.4-andium] Add Profiler output to logger interface (#19572)
This is https://github.com/duckdb/duckdb/pull/19546 backported to
`v1.4-andium` branch, see conversation there.
---
The idea: if both the profiler and the logger are enabled, profiler
output can also be accessed via the logger.
This is on top of / independent of the current choices for where to
output the profiler (JSON / graphviz / query-tree / ...). While this
might be somewhat wasteful, it allows for an easier PR and leaves the
question of what the SQL interface should be unopinionated. Also, given
that the ToLog() call is inexpensive (in particular if the logger is
disabled), and that it's unclear whether the logger alone can satisfy
the profiler's needs, I think going additive is the best path here.
Demo:
```sql
ATTACH 'my_db.db';
USE my_db;
---- enable profiling to json file
PRAGMA profiling_output = 'profiling_output.json';
PRAGMA enable_profiling = 'json';
---- enable logging (to in-memory table)
call enable_logging();
----
CREATE TABLE small AS FROM range(100);
CREATE TABLE medium AS FROM range(10000);
CREATE TABLE big AS FROM range(1000000);
PRAGMA disable_profiling;
SELECT query_id, type, metric, value FROM duckdb_logs_parsed('Metrics') WHERE metric == 'CPU_TIME';
```
This will result in, for example:
```
┌──────────┬─────────┬──────────┬───────────────────────┐
│ query_id │ type │ metric │ value │
│ uint64 │ varchar │ varchar │ varchar │
├──────────┼─────────┼──────────┼───────────────────────┤
│ 10 │ Metrics │ CPU_TIME │ 8.1041e-05 │
│ 11 │ Metrics │ CPU_TIME │ 0.0002499510000000001 │
│ 12 │ Metrics │ CPU_TIME │ 0.02776677799999981 │
└──────────┴─────────┴──────────┴───────────────────────┘
```
A more complex example: with the DuckDB CLI, execute:
```sql
PRAGMA profiling_output = 'metrics_folder/tmp_profiling_output.json';
PRAGMA enable_profiling = 'json';
CALL enable_logging(storage='file', storage_path='./metrics_folder');
--- arbitrary queries
CREATE TABLE small AS FROM range(100);
CREATE TABLE medium AS FROM range(10000);
CREATE TABLE big AS FROM range(1000000);
```
Then close and restart the DuckDB CLI, and query what's persisted in the
`metrics_folder` folder:
```sql
PRAGMA disable_profiling;
CALL enable_logging(storage='file', storage_path='./metrics_folder');
SELECT queries.message, metrics.metric, TRY_CAST(metrics.value AS DOUBLE) as value
FROM duckdb_logs_parsed('QueryLog') queries,
duckdb_logs_parsed('Metrics') metrics
WHERE queries.query_id = metrics.query_id AND metrics.metric = 'CPU_TIME';
```
```
┌─────────────────────────────────────────────┬──────────┬─────────────────────────────────────┐
│ message │ metric │ TRY_CAST(metrics."value" AS DOUBLE) │
│ varchar │ varchar │ double │
├─────────────────────────────────────────────┼──────────┼─────────────────────────────────────┤
│ CREATE TABLE small AS FROM range(100); │ CPU_TIME │ 8.1041e-05 │
│ CREATE TABLE medium AS FROM range(10000); │ CPU_TIME │ 0.0002499510000000001 │
│ CREATE TABLE big AS FROM range(1000000); │ CPU_TIME │ 0.02776677799999981 │
└─────────────────────────────────────────────┴──────────┴─────────────────────────────────────┘
```
commit be0142d4ee0385262520ae2488e8dd11ac213735
Merge: b68a1696de 7df4151c0d
Author: Mark <mark.raasveldt@gmail.com>
Date: Sat Nov 1 09:21:19 2025 +0100
fix inconsistent behavior in remote read_file/blob, and prevent union… (#19531)
Closes https://github.com/duckdb/duckdb-fuzzer/issues/4208
Closes https://github.com/duckdb/duckdb/issues/19090
Our remote filesystem doesn't actually check that files exist when
"globbing" a non-glob pattern. Now we check that the file exists in the
read_blob/text function even if we just access the file name.
The diff is a bit bigger because I also moved a bunch of templated code
into the cpp file.
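A small sketch of the fixed behavior (URL invented): a remote, non-glob path that does not exist now raises an error instead of behaving inconsistently:
```sql
-- previously the remote filesystem skipped the existence check for non-glob paths
SELECT filename, size FROM read_blob('https://example.com/missing-file.bin');
```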
commit 06df593c60bb22973642d776c1c3c3aca85ee0d6
Author: Tishj <t_b@live.nl>
Date: Fri Oct 31 15:26:18 2025 +0100
fix up tests
commit 2cdc7f922bde5550aa1ecd24dabf23b05fbf202b
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Fri Oct 31 15:10:31 2025 +0100
add vortex external extension
commit b68a1696de1a603b59e39efc25da7fc2826a3135
Merge: 8169d4f15c 9414882f7f
Author: Mark <mark.raasveldt@gmail.com>
Date: Fri Oct 31 13:45:16 2025 +0100
Release relevant tests to still be run on all builds (#19559)
I would propose, at least for the Linux builds, adding back a minimal
set of tests on release builds as well.
They will ensure at a minimum that:
* for a given release, the corresponding storage_version is valid
* for a minor release, the corresponding name has been set
There are more tests, basic enough and connected to release-specific
behaviour, that we might want to add to the `release` tag.
Fixes https://github.com/duckdb/duckdb/issues/19354 (together with
https://github.com/duckdb/duckdb/pull/19525 that actually added the
name).
Note that since the current release process happens in advance, eventual
test failures are annoying but not fatal, though they will require code
changes. I am not sure whether it's worth having a
`keep_going_in_all_cases` option, basically turning the boolean into a
set, but I think that can be done when the need arises.
commit 8169d4f15cf556d0ca0ec68d9c876c2bb84aae09
Merge: d9028d09d5 6e2c195859
Author: Mark <mark.raasveldt@gmail.com>
Date: Fri Oct 31 13:44:30 2025 +0100
Fix race condition between `Append` and `Scan` (#19571)
Update `ColumnData::count` only after actually `Append`ing the data, to
avoid a race condition with `Scan`. See
https://github.com/duckdb/duckdb/issues/19570 for details.
commit d4fb98d45409bcaaf8c3030c7aa7e40b1f60b9d1
Merge: 0743b590d3 d9028d09d5
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Fri Oct 31 11:16:23 2025 +0100
Merge remote-tracking branch 'upstream/v1.4-andium' into wal-index-deletes
commit 0743b590d361041cc167f0634250f78c20f4d332
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Fri Oct 31 11:15:04 2025 +0100
remove C++ test, add extra interleaved index replay SQL test
commit 5ca334715faa6c871c8e96029c142aacf53969a7
Author: Tishj <t_b@live.nl>
Date: Fri Oct 31 10:43:03 2025 +0100
fix up tests
commit 0a6b5fb4919a8092b38e19051a9286eeaaeb392c
Merge: 0b1f0e320a d9028d09d5
Author: Tishj <t_b@live.nl>
Date: Fri Oct 31 10:38:56 2025 +0100
Merge branch 'v1.4-andium' into missing_from_clause_better_error
commit a740840f9772a1702a5ffeec43694c48be3526c5
Author: Max Gabrielsson <max@gabrielsson.com>
Date: Thu Oct 30 18:04:39 2025 +0100
fix consecutive array range calculation, fix validity scanning when bits before result offset are null
commit 6e2c195859a496f1f98c20fd887fac944ba0e344
Author: zhangxizhe <zhangxizhe.zxz@alibaba-inc.com>
Date: Fri Oct 31 13:43:19 2025 +0800
Update `ColumnData::count` only after actually `Append`ing the data, to
avoid a race condition with `Scan`. See issue #19570 for details.
commit d9028d09d56640599dd8307dd9ae6c8837267e9f
Merge: 307f9b41ff 6bc51dd58e
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Fri Oct 31 08:47:10 2025 +0100
Disable jemalloc on BSD (#19560)
Fixes https://github.com/duckdb/duckdb/issues/14363
commit dbe272dff0a63d0d01269cee05945a0b016d219f
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Wed Oct 29 23:51:42 2025 +0100
Add Profiler output to logger interface
The idea: if both the profiler and the logger are enabled, profiler output can also be accessed via the logger.
This is on top of / independent of the current choices for where to output the profiler (JSON / graphviz / query-tree / ...).
While this might be somewhat wasteful, it allows for an easier PR and leaves the question of what the SQL
interface should be unopinionated. Also, given that the ToLog() call is inexpensive (in particular if the logger is disabled), and that it's unclear whether the logger alone can satisfy
the profiler's needs, I think going additive is the best path here.
Demo:
```sql
ATTACH 'my_db.db';
USE my_db;
---- enable profiling to json file
PRAGMA profiling_output = 'profiling_output.json';
PRAGMA enable_profiling = 'json';
---- enable logging (to in-memory table)
call enable_logging();
----
CREATE TABLE small AS FROM range(1000);
CREATE TABLE medium AS FROM range(1000000);
CREATE TABLE big AS FROM range(1000000000);
PRAGMA disable_profiling;
SELECT * EXCLUDE timestamp FROM duckdb_logs() WHERE type == 'Metrics' ORDER BY message.split(',')[1], context_id;
```
This will result in, for example:
```
┌────────────┬─────────┬───────────┬────────────────────────────────────────────────────────────┐
│ context_id │ type │ log_level │ message │
│ uint64 │ varchar │ varchar │ varchar │
├────────────┼─────────┼───────────┼────────────────────────────────────────────────────────────┤
│ 39 │ Metrics │ INFO │ {'metric': CHECKPOINT_LATENCY, 'value': 0.0} │
│ 44 │ Metrics │ INFO │ {'metric': CHECKPOINT_LATENCY, 'value': 0.0} │
│ 49 │ Metrics │ INFO │ {'metric': CHECKPOINT_LATENCY, 'value': 0.017832} │
│ 39 │ Metrics │ INFO │ {'metric': COMMIT_WRITE_WAL_LATENCY, 'value': 0.000305292} │
│ 44 │ Metrics │ INFO │ {'metric': COMMIT_WRITE_WAL_LATENCY, 'value': 0.003793958} │
│ 49 │ Metrics │ INFO │ {'metric': COMMIT_WRITE_WAL_LATENCY, 'value': 0.0} │
│ 39 │ Metrics │ INFO │ {'metric': CPU_TIME, 'value': 0.000110209} │
│ 44 │ Metrics │ INFO │ {'metric': CPU_TIME, 'value': 0.009471759999999997} │
│ 49 │ Metrics │ INFO │ {'metric': CPU_TIME, 'value': 8.241736770029297} │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ 39 │ Metrics │ INFO │ {'metric': SYSTEM_PEAK_BUFFER_MEMORY, 'value': 36864} │
│ 44 │ Metrics │ INFO │ {'metric': SYSTEM_PEAK_BUFFER_MEMORY, 'value': 6625280} │
│ 49 │ Metrics │ INFO │ {'metric': SYSTEM_PEAK_BUFFER_MEMORY, 'value': 63510528} │
│ 39 │ Metrics │ INFO │ {'metric': TOTAL_BYTES_WRITTEN, 'value': 0} │
│ 44 │ Metrics │ INFO │ {'metric': TOTAL_BYTES_WRITTEN, 'value': 262144} │
│ 49 │ Metrics │ INFO │ {'metric': TOTAL_BYTES_WRITTEN, 'value': 12587008} │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
├────────────┴─────────┴───────────┴────────────────────────────────────────────────────────────┤
│ 57 rows (? shown) 4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
```
commit 307f9b41ff0464dba0e0f2504c75747c7ead2ecc
Merge: 1cba2e741b 08bf725300
Author: Mark <mark.raasveldt@gmail.com>
Date: Thu Oct 30 15:03:25 2025 +0100
[ported from main] Fix bug initializing std::vector for column names (#19555)
This 4-line fix was merged into main in #19444. It should be in
v1.4-andium as well so that it makes it into v1.4.2.
commit 1cba2e741b6622f5be156c061478a6fa66c0f819
Merge: ecb6bfe5b4 80554e4d59
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Thu Oct 30 14:47:58 2025 +0100
Bugfixes: Parquet JSON+DELTA_LENGTH_BYTE_ARRAY and sorting iterator (#19556)
This PR fixes an issue introduced in v1.4.1 with the Parquet reader when
combining a `JSON` column with `DELTA_LENGTH_BYTE_ARRAY` encoding. The
issue was caused by trying to validate an entire block of strings in one
go, which is OK for UTF-8, but not for JSON. This PR makes it so that we
validate individual strings if the column has the `JSON` type.
Fixes https://github.com/duckdb/duckdb/issues/19366
This PR also fixes an issue with the new sorting code, which had an
error in the calculation of subtraction under modulo. I've fixed this,
and unified the code for `InMemoryBlockIteratorState` and
`ExternalBlockIteratorState` with some templating, so now the erroneous
calculation should be gone from both state types.
Fixes https://github.com/duckdb/duckdb/issues/19498
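As a sketch of the affected read path (file and column names invented), simply scanning such a column exercises the new per-string validation:
```sql
-- 'payload' assumed to be a JSON-typed column stored with DELTA_LENGTH_BYTE_ARRAY
SELECT payload FROM read_parquet('events.parquet');
```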
commit 9414882f7fc81be58af0ec914cbe8c6045af3517
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Thu Oct 30 12:39:48 2025 +0100
Allow back basics tests also in release mode
commit 2987acd0d19656e583f30447a91852793ef188f7
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Thu Oct 30 12:36:32 2025 +0100
Add test on codename being registered, and tag it as release
commit 6bc51dd58edaf76725810b595a5300044749c0cf
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Thu Oct 30 13:24:45 2025 +0100
disable jemalloc BSD
commit 80554e4d592ec793676a80b180469a572a247f2a
Merge: 5974ef8c03 ecb6bfe5b4
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Thu Oct 30 09:57:58 2025 +0100
Merge branch 'v1.4-andium' into bugfixes_v1.4
commit 08bf725300335d34f05cd6f6f508f78ef57c477b
Author: Curt Hagenlocher <curt@hagenlocher.org>
Date: Fri Oct 17 14:08:52 2025 -0700
Fix bug initializing std::vector for column names
commit ecb6bfe5b483ffd1a2a490275b48ec91501680c4
Merge: 09a36d2f73 94471b8e04
Author: Hannes Mühleisen <227792+hannes@users.noreply.github.com>
Date: Thu Oct 30 09:01:41 2025 +0200
Follow up to staging move (#19551)
Follow up to #19539, CF does not like AWS regions
commit 94471b8e0472a2507623b2408808156f6ddde764
Author: Hannes Mühleisen <hannes@muehleisen.org>
Date: Thu Oct 30 07:49:34 2025 +0200
this region does not exist in cf
commit 09a36d2f73d1b2f93682e315761bb3c4973f8ac9
Merge: a23f54fb54 c2a4fc29dc
Author: Mark <mark.raasveldt@gmail.com>
Date: Wed Oct 29 21:51:05 2025 +0100
[Dev] Disable the use of `ZSTD` if the block_manager is the `InMemoryBlockManager` (#19543)
This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/6319
This has to be done because the InMemoryBlockManager doesn't support
GetFreeBlockId, which is required by the ZSTD compression method.
I couldn't produce a test for this because I can't reproduce the problem
in the unittester, only in the CLI
(I assume the storage version prevents in-memory compression?).
commit c2a4fc29dceb617c80ab9156d84f2320add29542
Author: Tishj <t_b@live.nl>
Date: Wed Oct 29 16:37:20 2025 +0100
add test for disabled zstd compression in memory
commit 5974ef8c03afcd01df670a42dd7be0bbb2a6c6ff
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Wed Oct 29 16:34:54 2025 +0100
properly set file path in test
commit a35ba26f267eca2fb144e07b14706af2b96270a8
Author: Tishj <t_b@live.nl>
Date: Wed Oct 29 15:19:03 2025 +0100
disable the use of ZSTD if the block_manager is the InMemoryBlockManager, since it doesnt support GetFreeBlockId
commit fd85508aa0065a18180a6f9af1d4c66842b28964
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Wed Oct 29 15:08:06 2025 +0100
re-add missing initialization
commit a23f54fb54c686614cdaf547778b4c6f47bcbf5c
Merge: f2e48a73d4 ab586dfaf6
Author: Hannes Mühleisen <227792+hannes@users.noreply.github.com>
Date: Wed Oct 29 14:52:40 2025 +0200
Creating separate OSX cli binaries for each arch (#19538)
Also no longer adding the shared library three times because of symlinks
commit f2e48a73d42ce538706529e51aec54cfd9f96d84
Merge: 5a6521ca7e ccefe12386
Author: Hannes Mühleisen <227792+hannes@users.noreply.github.com>
Date: Wed Oct 29 14:51:26 2025 +0200
Moving staging to cf and uploading to install bucket (#19539)
This adds a custom endpoint for staging uploads so we can move to R2 for
this. We also add functionality to upload to the R2 bucket behind
`install.duckdb.org`. Once merged, I will update/add the following
secrets:
- `S3_DUCKDB_STAGING_ENDPOINT`
- `S3_DUCKDB_STAGING_ID`
- `S3_DUCKDB_STAGING_KEY`
- `DUCKDB_INSTALL_S3_ENDPOINT`
- `DUCKDB_INSTALL_S3_ID`
- `DUCKDB_INSTALL_S3_SECRET`
commit f5bc9796be79b602ed1892484e060f0e79083610
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Wed Oct 29 13:43:05 2025 +0100
nicer templating and less code duplication
commit ccefe12386007dd65fae1fe3ff1d65bcb45df44d
Author: Hannes Mühleisen <227792+hannes@users.noreply.github.com>
Date: Wed Oct 29 14:18:15 2025 +0200
Update .github/workflows/StagedUpload.yml
Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
commit 41fc70ae3312599e425d140f7db770f56c2c5c38
Author: Hannes Mühleisen <227792+hannes@users.noreply.github.com>
Date: Wed Oct 29 14:00:41 2025 +0200
Update .github/workflows/StagedUpload.yml
Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
commit e8c2d9401b580c64ef5d3cad3cb8d301375ddbd3
Author: Hannes Mühleisen <hannes@muehleisen.org>
Date: Wed Oct 29 12:35:30 2025 +0200
moving staging to cf and uploading to install bucket
commit 7df4151c0d4967e2dd33eff7f426805df3c56442
Author: Max Gabrielsson <max@gabrielsson.com>
Date: Wed Oct 29 10:58:22 2025 +0100
remove named parameters
commit ab586dfaf6bf58fa8376944e599c51efea462cb8
Author: Hannes Mühleisen <hannes@muehleisen.org>
Date: Wed Oct 29 11:46:18 2025 +0200
creating separate osx cli binaries for each arch
commit 8f30296d7c05c277771bf1fe95b73fafe7fa9d0f
Merge: 5dac9f7504 5a6521ca7e
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Wed Oct 29 09:39:30 2025 +0100
Merge remote-tracking branch 'upstream/v1.4-andium' into wal-index-deletes
commit 5a6521ca7e744205e4c3b67cab8708e2df87073b
Merge: 8c7210f9b0 601d68526c
Author: Mark <mark.raasveldt@gmail.com>
Date: Wed Oct 29 07:55:06 2025 +0100
Add test that either 'latest' or 'vX.Y.Z' are supported STORAGE_VERSIONs (#19527)
Connected to https://github.com/duckdb/duckdb/pull/19525; adds a test
that would have triggered there.
That test is not built when actually building releases, so it's not
fool-proof, but I think adding it is helpful.
Tested locally to behave as intended both on a dev commit (success) and
on a tag (fails, fixed via the linked PR).
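For context, a hedged sketch of how storage versions are pinned on `ATTACH` (file names invented); the new test asserts that both 'latest' and concrete 'vX.Y.Z' names are accepted:
```sql
ATTACH 'file1.db' (STORAGE_VERSION 'v1.4.0');
ATTACH 'file2.db' (STORAGE_VERSION 'latest');
```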
commit 8c7210f9b0270517e1dba11502dc196a3f0cb13c
Merge: 7b5c16f2d5 99f26bde2d
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Oct 28 18:58:35 2025 +0100
add upcoming patch release to internal versions (#19525)
commit 7b5c16f2d51dda602c9ddfed58d71bb6ae3275a0
Merge: 23228babba 295603915b
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Oct 28 18:58:16 2025 +0100
Bump multiple extensions (#19522)
This PR bumps the following extensions:
- `avro` from `7b75062f63` to `93da8a19b4`
- `delta` from `03aaf0f073` to `0747c23791`
- `ducklake` from `f134ad86f2` to `2554312f71`
- `iceberg` from `4f3c5499e5` to `30a2c66f10`
- `spatial` from `a6a607fe3a` to `61ede09bec`
commit 23228babba519ec70b183b03ea6bc4457b3ed84c
Merge: 71a64b5ab4 6a38ac0f69
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Oct 28 18:58:00 2025 +0100
Bump: inet (#19526)
This PR bumps the following extensions:
- `inet` from `f6a2a14f06` to `fe7f60bb60 (patches removed: 1)`
commit 067d6eb0d5c56270f1d24951966191d9c12c3008
Author: Max Gabrielsson <max@gabrielsson.com>
Date: Tue Oct 28 17:33:43 2025 +0100
fix inconsistent behavior in remote read_file/blob, and prevent union_by_name from crashing
commit 601d68526c9e616ff08a0e08d949f00dcfb76060
Author: Carlo Piovesan <piovesan.carlo@gmail.com>
Date: Tue Oct 28 13:11:45 2025 +0100
Add test that either 'latest' or 'vX.Y.Z' are supported STORAGE_VERSIONs
commit c63c5060d01340dc11f39349bf7950fb8eaa455b
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Tue Oct 28 15:55:12 2025 +0100
fix #19498
commit 7e52dc5a75532c5413088fbb9f90e6a30f9e5d14
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Tue Oct 28 15:54:56 2025 +0100
add missing test
commit 71a64b5ab4005fd2eb63cb3912403fde29f4d7e0
Merge: 76ee047ce4 3856fa8ea8
Author: Mark <mark.raasveldt@gmail.com>
Date: Tue Oct 28 14:30:18 2025 +0100
Support non-standard NULL in Parquet again (#19523)
https://github.com/duckdb/duckdb/pull/19406 removed support for the
non-standard NULL by adding the safe enum casts.
Support for this was explicitly added in
https://github.com/duckdb/duckdb/pull/11774.
We could consider removing support for this, but it shouldn't be done
as part of a bug-fix release imo. This also currently breaks merging
v1.4 -> main.
commit 05fb1249cab3404bc396ccaee0cdb1959ae11481
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Tue Oct 28 14:19:50 2025 +0100
fix #19366
commit 5dac9f750490e1ea601b03d8e3d11db7a9cc0197
Merge: 0d4a78c90f 76ee047ce4
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Tue Oct 28 13:14:30 2025 +0100
Merge remote-tracking branch 'upstream/v1.4-andium' into wal-index-deletes
commit 0d4a78c90f6288abe842afab521ba1e7a075307f
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Tue Oct 28 13:12:44 2025 +0100
remove int types
commit 6a38ac0f699f2f85adda33d61c94c6ec054d89ca
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Oct 28 13:08:40 2025 +0100
bump extensions
commit 3cd616b89657c5489844d8a76d26169554e5af96
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Tue Oct 28 12:57:05 2025 +0100
PR review fixes + more C++ test coverage
commit 0fde0c573099c317b0710ed42d87864ee4b75c00
Merge: baa522991e 76ee047ce4
Author: Laurens Kuiper <laurens.kuiper@cwi.nl>
Date: Tue Oct 28 12:32:44 2025 +0100
Merge branch 'v1.4-andium' into bugfixes_v1.4
commit 99f26bde2d03e9958ac4bd37f5f8a0ac67b2fcd3
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Oct 28 12:07:39 2025 +0100
add upcoming patch release to internal versions
commit 3856fa8ea82bd8b9c11166102aab602ddf165ee2
Author: Mytherin <mark.raasveldt@gmail.com>
Date: Tue Oct 28 11:19:35 2025 +0100
Support non-standard NULL in Parquet again
commit 295603915b0ab3a1532cbbe6cf9547f9803e3c46
Author: Sam Ansmink <samansmink@hotmail.com>
Date: Tue Oct 28 10:58:22 2025 +0100
bump extensions
commit c1d826f2523bd8454426ad7401665e8e69f9dadc
Author: Artjom Plaunov <artyemnyc@gmail.com>
Date: Tue Oct 28 08:55:00 2025 +0100
unnamed name space
commit 76ee047ce45bab9472068ea360f9894a3a456a83
Merge: b62b03c4b3 bd3eb153b1
Author: Laurens Kuiper <laurens@duckdblabs.com>
Date: Tue Oct 28 08:34:42 2025 +0100
Make `DatabaseInstance::…
Mytherin added a commit that referenced this pull request Nov 20, 2025
DuckDB can significantly speed up checkpoints by [reusing existing metadata](#18395) when there have been no changes to a row group. Unfortunately, this does not apply to row groups with deletes: as soon as someone has run a simple `select count(*)` query and the deletes are loaded, the metadata re-use optimization on checkpoint stops working. This PR fixes the situation by allowing the optimization to still come into play, even when the deletes are loaded.
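A minimal sketch of the scenario (table contents invented):
```sql
CREATE TABLE t AS FROM range(10_000_000);
DELETE FROM t WHERE range = 0;
CHECKPOINT;
SELECT count(*) FROM t;    -- loads the delete information
INSERT INTO t VALUES (42);
CHECKPOINT;                -- unchanged row groups still re-use their metadata
```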
philippmd pushed a commit to motherduckdb/public-duckdb that referenced this pull request Nov 21, 2025
DuckDB can significantly speed up checkpoints by [reusing existing metadata](duckdb#18395) when there have been no changes to a row group. Unfortunately, this does not apply to row groups with deletes: as soon as someone has run a simple `select count(*)` query and the deletes are loaded, the metadata re-use optimization on checkpoint stops working. This PR fixes the situation by allowing the optimization to still come into play, even when the deletes are loaded.
Follow-up from #18390
This PR implements metadata re-use at the row group level: if we are e.g. appending to a large table, we no longer rewrite the metadata of unchanged row groups, and instead refer to the existing metadata on disk. In addition, this PR also performs a few fixes where we would eagerly load columns unnecessarily.
Performance
Consider a database storing TPC-H SF100. The biggest table (`lineitem`) has 600M rows.
Now insert a single row into the largest table (`lineitem`) and checkpoint. We measure two times: the time of the `CHECKPOINT` command, and the full runtime of opening the database, running the insert + checkpoint, and closing the database.
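A rough sketch of the measurement (file name invented; assumes a pre-built TPC-H SF100 database and the CLI's `.timer`):
```sql
.timer on
ATTACH 'tpch-sf100.db' AS tpch;
USE tpch;
INSERT INTO lineitem SELECT * FROM lineitem LIMIT 1; -- duplicate one existing row
CHECKPOINT; -- with metadata re-use, only the changed row group's metadata is rewritten
```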