Skip to content

Conversation

@Alex-Monahan
Copy link

@Alex-Monahan Alex-Monahan commented Dec 22, 2025

Hi folks!

This is a new PR meant to solve the same use case as #593, but addressing the PR feedback! Thank you for the guidance - it was super helpful. I am still open to any changes you recommend!

This adds a new table to the DuckLake spec: ducklake_sort_key. This keeps a version history of the sort settings for tables over time. (The prior PR used an option for this, but that has been removed based on the feedback).

Also, would you prefer the name ducklake_sort_info to align with ducklake_partition_info as the spec table name?

There are still a few limitations with this PR:

  1. This does not order during insert (only during compaction and inline flush).
    • I would love to try to do a follow up PR to add this!
  2. Only explicit column names can be used in the sorting, not expressions.
    • I tried to implement this but could not, so I could use some help with this piece!
    • There is a friendly error message (and tests) to document this limitation.
    • The spec has a column expression, so the intention was to make the spec itself forwards-compatible with expression-oriented sorting.
  3. Files are still selected for compaction based on insertion order. It could be better to sort the list of files by min/max metadata before selecting files for compaction.
    • Let me know if this is desirable and I can work on it after the "order during insert" in number 1!

I believe that I made this fully compatible with the batching code, but I was testing locally on a DuckDB catalog and not on Postgres. Any extra eyes on that side would be great!

If this looks good, I can also do any docs PRs that you recommend - happy to help there.

Thanks folks! CC @philippmd as well as an FYI

@Alex-Monahan
Copy link
Author

The failing test did not reproduce for me locally, so I made a commit with just a comment change to force a CI re-run in the hopes that it was a flaky test. The test is not an area of the code that I think I should have edited, but, there are reasons that we have tests...

Failing test was test/sql/data_inlining/data_inlining_flush_schema.test.

@Alex-Monahan
Copy link
Author

Hmm, this is a different failing test and does not appear related. Please let me know if you would like for me to investigate, otherwise it seems random...

@Alex-Monahan
Copy link
Author

Thinking ahead to adding the insert capability, is the right spot to add that in ducklake_insert.cpp in DuckLakeInsert::PlanCopyForInsert, right before creating the auto &physical_copy ?

@pdet
Copy link
Collaborator

pdet commented Dec 23, 2025

Hmm, this is a different failing test and does not appear related. Please let me know if you would like for me to investigate, otherwise it seems random...

There are a couple of spurious tests failing that I still have to track down, the ones I have seen here are unrelated to your PR.

@pdet
Copy link
Collaborator

pdet commented Dec 23, 2025

Thinking ahead to adding the insert capability, is the right spot to add that in ducklake_insert.cpp in DuckLakeInsert::PlanCopyForInsert, right before creating the auto &physical_copy ?

That sounds about right!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants