Skip to content

Conversation

@Tmonster
Copy link
Contributor

@Tmonster Tmonster commented Dec 2, 2025

# group: [sample]

# sample max on windows 32 bit is 10000000, otherwise 1000000000
require notwindows
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be require ram 16gb or similar - given the reservoir sample down below is pre-allocating loads of memory in the current implementation

Copy link
Contributor Author

@Tmonster Tmonster Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added require ram 16gb. Makes se wonder if I should lower the sample limit even further? 1,000,000,000 (1 billion) rows at 1 byte per row is already one GB. Maybe assume 1kb per row max? Which would make the max reservoir sample size something like 100,000 rows?

The error message could then include that you should use a percentage if you want more. Reservoir percentage samples are allocated as the table scan executes, and not all at once.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just not pre-allocate the entire reservoir if the reservoir exceeds some threshold instead of lowering the limit, but let's do that at a later point.

Copy link
Contributor Author

@Tmonster Tmonster Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 2, 2025 22:01
@Tmonster Tmonster marked this pull request as ready for review December 2, 2025 22:07
@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 3, 2025 08:43
@Tmonster
Copy link
Contributor Author

Tmonster commented Dec 3, 2025

First checking CI locally here

@taniabogatsch
Copy link
Contributor

I'll mirror this PR internally so we can track it for the release (it is related to InvokeCI failing, right)?

@Tmonster Tmonster marked this pull request as ready for review December 3, 2025 12:09
@Tmonster
Copy link
Contributor Author

Tmonster commented Dec 3, 2025

Looks like MacOS failed int he extensions, but that is due to a download failure here

@taniabogatsch
Copy link
Contributor

Maybe we can re-run the job cc @pdet or @lnkuiper ?

@taniabogatsch
Copy link
Contributor

Or is this error also happening on other PRs / something broke on our side?

@lnkuiper
Copy link
Collaborator

lnkuiper commented Dec 3, 2025

I can't selectively rerun that one CI run right now, but I think it will be possible once more CI runs are done

@Mytherin Mytherin merged commit a86af88 into duckdb:v1.4-andium Dec 3, 2025
61 of 62 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Dec 3, 2025

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 3, 2025
Windows must sample less (duckdb/duckdb#20018)
Bump MySQL scanner (duckdb/duckdb#20025)
Fix INSERT OR REPLACE BY NAME with partial columns(duckdb/duckdb#19845) (duckdb/duckdb#19989)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 3, 2025
Windows must sample less (duckdb/duckdb#20018)
Bump MySQL scanner (duckdb/duckdb#20025)
Fix INSERT OR REPLACE BY NAME with partial columns(duckdb/duckdb#19845) (duckdb/duckdb#19989)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants