-
Notifications
You must be signed in to change notification settings - Fork 2.8k
feat: Fix length of sort keys #19260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // call Finalize on the result | ||
| for (idx_t r = 0; r < size; r++) { | ||
| result_data[r].Finalize(); | ||
| result_data[r].SetSizeAndFinalize(offsets[r], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #18885 this sounds like it could also result in a problem here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Yes, I saw this. Do we need to take extra care in this PR?
Earlier, I wondered if SetSizeAndFinalize() should also zero out the remaining INLINE_LENGTH - size bytes that were not filled by the memcpy(), to satisfy the check in VerifyCharacters() .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it already does that
What it doesnt do is deal with the case where you allocate for > inlined_size and then SetSizeAndFinalize(<= inlined_size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it did? I'm seeing in my code:
void SetSizeAndFinalize(uint32_t size, idx_t allocated_size) {
value.inlined.length = size;
if (allocated_size > INLINE_LENGTH && IsInlined()) {
//! Data was written to the 'value.pointer.ptr', has to be copied to the inlined bytes
D_ASSERT(value.pointer.ptr);
auto ptr = value.pointer.ptr;
memcpy(GetDataWriteable(), ptr, size);
}
Finalize();
VerifyCharacters();
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tishj I think this should work properly here. The allocation could be greater than inlined size (while the actual written string is not), but SetSizeAndFinalize handles that case nicely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I thought that somehow landed in a dev branch, since I thought I only saw one argument to SetSizeAndFinalize but I was mistaken
Thats why I linked to that PR
Anyway, ignore this thread
lnkuiper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This should always truncate the string to the actual written size (stored in offsets), which excludes bytes for fixed-width columns that are NULL (that were initially added to constant_length).
|
Thanks! |
Pass lambda bindings to next binder (duckdb/duckdb#19144) feat: Fix length of sort keys (duckdb/duckdb#19260) Fix regex optimization to remove flags when converting to contains (duckdb/duckdb#19290) bump duckdb-azure ref for 1.4.1 (duckdb/duckdb#19275)
Pass lambda bindings to next binder (duckdb/duckdb#19144) feat: Fix length of sort keys (duckdb/duckdb#19260) Fix regex optimization to remove flags when converting to contains (duckdb/duckdb#19290) bump duckdb-azure ref for 1.4.1 (duckdb/duckdb#19275) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
This is in response to Valgrind errors observed in the R package.
In the presence of
NULLvalues (and perhaps with other conditions), sort keys can become shorter than the originally anticipated length. This leads to uninitialized memory access.To prevent UMA, we can either ensure that the uninitialized memory is not read (as this PR does), or fill it with some value.
Submitting the PR to get a second opinion and also to perhaps run the tests on CI/CD. Merging will make my patch in the R package obsolete.
Details
Failing valgrind run: https://github.com/duckdb/duckdb-r/actions/runs/18214180255/job/51860150272#step:6:1717
Reproducible example in R: