-
Notifications
You must be signed in to change notification settings - Fork 2.8k
No sampling over 281TB #19978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No sampling over 281TB #19978
Conversation
| auto rows = sample_value.GetValue<int64_t>(); | ||
| if (rows < 0) { | ||
| throw ParserException("Sample rows %lld out of range, must be bigger than or equal to 0", rows); | ||
| if (rows < 0 || sample_value.GetValue<uint64_t>() >= Allocator::MAXIMUM_ALLOC_SIZE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample size is in rows but the allocator max size is in bytes - I suspect that sampling for 281474976710656-1 rows will still throw the allocation error. Should we reduce the limit further / add a test for this?
Alternatively, we might want to not allocate all the space for the sample up-front if the sample size is very large - but that's obviously a larger change.
|
Have updated it so that no more than 1 billion rows can be requested. No to get an internal error every row sampled would need 281kb of data for the allocation limit to be reached. I can reduce the max row limit if desired |
No sampling over 281TB (duckdb/duckdb#19978)
No sampling over 281TB (duckdb/duckdb#19978) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Disable sampling over the Allocator::MAXIMUM_ALLOC_SIZE.
Error is thrown in the same place as what might happen if someone samples -1 rows.
Fixes #17780
Fixes https://github.com/duckdblabs/duckdb-internal/issues/5065