Skip to content

Conversation

@DavisVaughan
Copy link
Member

The struct forces an 8 byte alignment, so it costs us 24 bytes for this struct, when it should really only cost 20 bytes. Since we need 2 vectors of these, thats 8 wasted bytes per string that we can save by splitting them back up, at the cost of more bookkeeping and slightly lower loading speeds, because they are split across separate vectors now.

The `struct` forces an 8 byte alignment, so it costs us 24 bytes for this struct, when it should really only cost 20 bytes. Since we need 2 vectors of these, thats 8 wasted bytes per string that we can save by splitting them back up, at the cost of more bookkeeping.
* Track `NA`ness rather than whole `SEXP`

For 14 bytes less memory per string

* Go back to post check loading

* Handle `NA`s up front (#2122)

* Handle `NA`s up front

Avoiding the need for `p_x_string_nas_aux` altogether, and nicely simplifying and speeding up the radix and insertion paths

* Remove `p_lazy_x_string_nas` entirely

By having the `SEXP` be available in both `chr_order()` and `chr_order_chunk()`

* Split extraction and order rearrangement

* Simplify aux placement code

* Remove `x_string_sizes` as well (#2123)

* Remove size related vectors

* Advance `p_x[i]` in place, rather than accessing with `[pass]`

This keeps `p_x[i]` up to date after each pass, always pointing to the next byte.

This didn't seem to have much of a performance impact, but it nicely cleans up `chr_insertion_order()` and `chr_all_same_byte()` because they no longer need to worry about the `pass`, the pointers are always advanced to `pass` already
@DavisVaughan DavisVaughan merged commit 5af7352 into feature/no-truelength-order Dec 12, 2025
11 checks passed
@DavisVaughan DavisVaughan deleted the feature/no-truelength-order-split-info branch December 12, 2025 21:15
DavisVaughan added a commit that referenced this pull request Dec 15, 2025
* Remove dead lazy code

* Remove `TRUELENGTH` backed character sort

Still need to properly support `NA`, `decreasing`, and `na_last`, but this is a major part of it

* Add `decreasing`, `na_last`, and `NA` support in `chr_order_insertion()`

* Handle `decreasing`, `na_last` and `NA` in `chr_order_radix()`

* Move the all-same-byte check before the histogram

* Get `order.Rmd` working again

* Consistency tweaks

* Recognize that we can pointer swap `p_x` and `p_x_aux`

* Split `struct str_info` into individual components (#2120)

* Split `struct str_info` into individual components

The `struct` forces an 8 byte alignment, so it costs us 24 bytes for this struct, when it should really only cost 20 bytes. Since we need 2 vectors of these, thats 8 wasted bytes per string that we can save by splitting them back up, at the cost of more bookkeeping.

* Track `NA`ness rather than whole `SEXP` (#2121)

* Track `NA`ness rather than whole `SEXP`

For 14 bytes less memory per string

* Go back to post check loading

* Handle `NA`s up front (#2122)

* Handle `NA`s up front

Avoiding the need for `p_x_string_nas_aux` altogether, and nicely simplifying and speeding up the radix and insertion paths

* Remove `p_lazy_x_string_nas` entirely

By having the `SEXP` be available in both `chr_order()` and `chr_order_chunk()`

* Split extraction and order rearrangement

* Simplify aux placement code

* Remove `x_string_sizes` as well (#2123)

* Remove size related vectors

* Advance `p_x[i]` in place, rather than accessing with `[pass]`

This keeps `p_x[i]` up to date after each pass, always pointing to the next byte.

This didn't seem to have much of a performance impact, but it nicely cleans up `chr_insertion_order()` and `chr_all_same_byte()` because they no longer need to worry about the `pass`, the pointers are always advanced to `pass` already

* Tweak comment

* Another empty string test

* Make the `\0` early exit less hidden
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants