[FEA] Add 64-bit size type option at build-time for libcudf

Many libcudf users have expressed interest in using a 64-bit size type (see #3958 for reference). `cudf::size_type` is an `int32_t`, which limits the number of elements in a libcudf column to `INT_MAX` (about 2.1 billion). For string columns this imposes a ~2 GB limit, for int32 columns this imposes a ~8 GB limit, and for list columns this imposes a leaf element count <2.1 billion. Downstream libraries must partition their data to avoid these limits.
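The limits quoted above follow directly from the width of the size type. A minimal sketch of the arithmetic, assuming a local `size_type` alias that mirrors the 32-bit definition described in this issue (the helper names are illustrative and not part of libcudf):

```cpp
#include <cstdint>
#include <limits>

// Assumption: a local alias mirroring the 32-bit cudf::size_type described above.
using size_type = std::int32_t;

// Largest number of elements a single column can hold.
constexpr std::int64_t max_elements = std::numeric_limits<size_type>::max();

// Bytes of data in a fixed-width column filled to the element limit
// (the null mask is not counted).
template <typename T>
constexpr std::int64_t max_fixed_width_bytes()
{
  return max_elements * static_cast<std::int64_t>(sizeof(T));
}

// With size_type offsets into the character data, a strings column can
// address at most max_elements bytes of characters.
constexpr std::int64_t max_string_chars_bytes = max_elements;

static_assert(max_elements == 2147483647LL, "about 2.1 billion elements");
static_assert(max_fixed_width_bytes<std::int32_t>() == 8589934588LL, "just under 8 GiB");
static_assert(max_string_chars_bytes == 2147483647LL, "just under 2 GiB");
```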

We expect that using a 64-bit size type will incur significant penalties to memory footprint and data throughput. Memory footprint will double for all offset vectors, and runtime of most functions will increase due to the larger data sizes. Kernel performance may degrade even further due to increased register count and unoptimized shared memory usage.
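To make the offset-vector cost concrete, here is a rough sketch that assumes an offsets child holding one entry per row plus a final end offset. The `offsets_bytes` helper and the 100-million-row example are illustrative, not measurements from libcudf:

```cpp
#include <cstdint>

// Bytes used by an offsets vector for a strings or list column with
// `num_rows` rows: one offset per row plus a final end offset.
// Hypothetical helper for illustration only.
template <typename OffsetType>
constexpr std::int64_t offsets_bytes(std::int64_t num_rows)
{
  return (num_rows + 1) * static_cast<std::int64_t>(sizeof(OffsetType));
}

// Example: 100 million rows.
constexpr std::int64_t example_rows = 100000000;

static_assert(offsets_bytes<std::int32_t>(example_rows) == 400000004LL, "~0.4 GB with 32-bit offsets");
static_assert(offsets_bytes<std::int64_t>(example_rows) == 800000008LL, "~0.8 GB with 64-bit offsets");
```

The doubling applies only to the offsets themselves; the character or leaf data they point to keeps its size.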

As GPUs increase in memory, the limit from a 32-bit `cudf::size_type` will force data partitions to become smaller fractions of device memory. Excessive data partitioning also leads to performance penalties, so libcudf should enable its community to start experimenting with a 64-bit size type. Scoping for 64-bit size types in the cuDF-python layer will be tracked in a separate issue (#TBD).

- [ ] Consult with thrust/cub experts about outstanding issues with 64-bit indexing. Some libcudf functions may depend on upstream changes in CCCL; please see [cccl/47](https://github.com/NVIDIA/cccl/issues/47), [thrust/1271](https://github.com/NVIDIA/cccl/issues/744), and [cub/212](https://github.com/NVIDIA/cub/issues/212). `copy_if`, `reduce`, `parallel_for`, `merge` and `sort` may have unresolved issues.
- [ ] Consult with thrust/cub experts about making 32-bit kernels optional. Currently the 64-bit kernels are disabled in libcudf builds. Disabling the 32-bit kernels would avoid large increases in compile time and binary size when we enable 64-bit thrust/cub kernels.
- [ ] Verify compatibility of 64-bit size type with cuco data structures (needs additional scoping)
- [ ] Audit custom kernels in libcudf for the impact of a 64-bit size type. Introduce conditional logic to adjust shared memory allocations and threads per block as needed based on the size type. Identify implementation details that take a 32-bit size type for granted.
- [ ] Audit cuIO size types and their interaction with `cudf::size_type`
- [ ] Resolve compilation errors from using a 64-bit size type
- [ ] Resolve test failures from using a 64-bit size type
- [ ] Review performance impact of a 64-bit size type using libcudf microbenchmark results
- [ ] Add a build-time option for advanced users to use a 64-bit size type instead of a 32-bit size type.
- [ ] Add a CI step to build and test the 64-bit size type option.
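
The build-time option and the kernel audit items above could take roughly the following shape. This is only a sketch under stated assumptions: the macro `SKETCH_USE_64BIT_SIZE_TYPE`, the `sketch` namespace, and the `threads_per_block` helper are hypothetical and are not existing libcudf names or build flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {

// HYPOTHETICAL macro, not an existing libcudf build flag. A build system
// could define it on the compiler command line to opt in to 64-bit sizes.
#ifdef SKETCH_USE_64BIT_SIZE_TYPE
using size_type = std::int64_t;
#else
using size_type = std::int32_t;
#endif

static_assert(std::is_signed<size_type>::value, "this sketch assumes a signed size type");

// Pick a block size so that each thread can stage `indices_per_thread`
// indices of type IndexType within a fixed shared-memory budget.
// The result is capped at `max_threads`.
template <typename IndexType>
constexpr int threads_per_block(std::size_t shared_bytes,
                                std::size_t indices_per_thread,
                                int max_threads)
{
  const std::size_t fit = shared_bytes / (indices_per_thread * sizeof(IndexType));
  return fit < static_cast<std::size_t>(max_threads) ? static_cast<int>(fit) : max_threads;
}

// With a 48 KiB budget and 16 staged indices per thread, doubling the
// index width halves the number of threads that fit in one block.
static_assert(threads_per_block<std::int32_t>(48 * 1024, 16, 1024) == 768, "32-bit indices");
static_assert(threads_per_block<std::int64_t>(48 * 1024, 16, 1024) == 384, "64-bit indices");

}  // namespace sketch
```

A real kernel would likely also round the block size to a multiple of the warp size and account for register pressure, which this sketch ignores.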

After completing this stage, we will have a better sense of the impact and value of using a 64-bit size type with libcudf.

