[FEA] Add 64-bit size type option at build-time for libcudf #13159

@GregoryKimball

Many libcudf users have expressed interest in using a 64-bit size type (see #3958 for reference). cudf::size_type is currently an int32_t, which limits the number of elements in a libcudf column to INT_MAX (about 2.1 billion). For string columns this imposes a ~2 GB limit, for int32 columns a ~8 GB limit, and for list columns a leaf element count below 2.1 billion. Downstream libraries must partition their data to avoid these limits.
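As a rough check of the figures above, the limits can be derived from the width of the size type. This is a minimal sketch, not libcudf code: the local `size_type` alias stands in for cudf::size_type, the helper names are hypothetical, and the string limit assumes character data is indexed by size_type offsets, as the ~2 GB figure implies.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for cudf::size_type as described in this issue (32-bit signed).
using size_type = std::int32_t;

// Largest element count a single column can hold: INT_MAX.
constexpr std::int64_t max_elements = std::numeric_limits<size_type>::max();

// Byte ceiling for a fixed-width column whose elements are element_bytes wide.
// (Hypothetical helper for illustration.)
constexpr std::int64_t max_fixed_width_bytes(std::int64_t element_bytes)
{
  return max_elements * element_bytes;
}

// Assumption: string character data is addressed by size_type offsets,
// so the total character bytes per column are capped at INT_MAX.
constexpr std::int64_t max_string_bytes = max_elements;

static_assert(max_elements == 2147483647LL, "about 2.1 billion elements");
static_assert(max_fixed_width_bytes(sizeof(std::int32_t)) == 8589934588LL,
              "just under 8 GiB for an int32 column");
```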

We expect that using a 64-bit size type will incur significant penalties to memory footprint and data throughput. Memory footprint will double for all offset vectors, and runtime of most functions will increase due to the larger data sizes. Kernel performance may degrade even further due to increased register count and unoptimized shared memory usage.
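The doubling of offset vectors can be illustrated with simple arithmetic. The sketch below assumes an offsets vector holds one entry per row plus one terminating entry; `offsets_bytes` is a hypothetical helper, not a libcudf function.

```cpp
#include <cstdint>

// Bytes needed for an offsets vector covering num_rows rows,
// assuming num_rows + 1 entries of type OffsetT.
template <typename OffsetT>
constexpr std::int64_t offsets_bytes(std::int64_t num_rows)
{
  return (num_rows + 1) * static_cast<std::int64_t>(sizeof(OffsetT));
}

// For one billion rows, 64-bit offsets need exactly twice the memory.
static_assert(offsets_bytes<std::int32_t>(1000000000LL) == 4000000004LL,
              "about 4 GB with 32-bit offsets");
static_assert(offsets_bytes<std::int64_t>(1000000000LL) == 8000000008LL,
              "about 8 GB with 64-bit offsets");
```

This covers only the footprint of the offsets themselves; the register-count and shared-memory effects mentioned above would need to be measured with benchmarks.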

As GPUs increase in memory, the limit from a 32-bit cudf::size_type will force data partitions to become smaller fractions of device memory. Excessive data partitioning also leads to performance penalties, so libcudf should enable its community to start experimenting with a 64-bit size type. Scoping for 64-bit size types in the cuDF-python layer will be tracked in a separate issue (#TBD).
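To make the partitioning cost concrete, the sketch below counts how many column-sized partitions a dataset needs under a given per-column element limit. The helper name and the 10-billion-row figure are illustrative assumptions, not values from libcudf.

```cpp
#include <cstdint>
#include <limits>

// Number of partitions needed so that no partition exceeds max_per_column
// elements (ceiling division). Hypothetical helper for illustration.
constexpr std::int64_t partitions_needed(std::int64_t total_elements,
                                         std::int64_t max_per_column)
{
  return (total_elements + max_per_column - 1) / max_per_column;
}

constexpr std::int64_t limit_32 = std::numeric_limits<std::int32_t>::max();
constexpr std::int64_t limit_64 = std::numeric_limits<std::int64_t>::max();

// Illustrative dataset of 10 billion rows.
// Dividing first keeps the 64-bit case free of signed overflow.
static_assert(partitions_needed(10000000000LL, limit_32) == 5,
              "five partitions with a 32-bit size type");
static_assert(10000000000LL / limit_64 == 0,
              "fits in a single column with a 64-bit size type");
```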

  • Consult with thrust/cub experts about outstanding issues with 64-bit indexing. Some libcudf functions may depend on upstream changes in CCCL; please see cccl/47, thrust/1271, and cub/212. copy_if, reduce, parallel_for, merge, and sort may have unresolved issues.
  • Consult with thrust/cub experts about making 32-bit kernels optional. Currently the 64-bit kernels are disabled in libcudf builds. Disabling the 32-bit kernels would avoid large increases in compile time and binary size when we enable 64-bit thrust/cub kernels.
  • Verify compatibility of a 64-bit size type with cuco data structures (needs additional scoping).
  • Audit custom kernels in libcudf for the impact of a 64-bit size type. Introduce conditional logic to adjust shared memory allocations and threads per block as needed based on the size type. Identify implementation details that take a 32-bit size type for granted.
  • Audit cuIO size types and their interaction with cudf::size_type.
  • Resolve compilation errors from using a 64-bit size type.
  • Resolve test failures from using a 64-bit size type.
  • Review performance impact of a 64-bit size type using libcudf microbenchmark results.
  • Add a build-time option for advanced users to use a 64-bit size type instead of a 32-bit size type.
  • Add a CI step to build and test the 64-bit size type option.
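One way the build-time option and the size-dependent kernel tuning from the list above might fit together is sketched here. Everything in it is an assumption for illustration: the macro `EXAMPLE_USE_64BIT_SIZE_TYPE`, the `example` namespace, and the thread and shared-memory numbers are hypothetical and do not reflect an existing libcudf flag or tuned values.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace example {

// Hypothetical build-time switch, e.g. passed as -DEXAMPLE_USE_64BIT_SIZE_TYPE.
#if defined(EXAMPLE_USE_64BIT_SIZE_TYPE)
using size_type = std::int64_t;
#else
using size_type = std::int32_t;
#endif

constexpr bool wide_size_type = sizeof(size_type) == 8;

// Illustrative tuning hook: use fewer threads per block with the wider type,
// on the assumption that register pressure increases. Not measured values.
constexpr int threads_per_block = wide_size_type ? 128 : 256;

// Shared memory for one size_type staging slot per thread.
constexpr std::size_t shared_bytes = threads_per_block * sizeof(size_type);

// Guard against code that silently assumes a particular representation.
static_assert(std::is_signed<size_type>::value, "size_type must stay signed");
static_assert(shared_bytes == 1024, "staging buffer stays at 1 KiB in both modes");

}  // namespace example
```

Keeping the width-dependent choices in a few named constants like these would make it easier to find code that takes a 32-bit size type for granted during the audit.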

After completing these steps, we will have a better sense of the impact and value of using a 64-bit size type with libcudf.

Metadata

Assignees

No one assigned

    Labels

    0 - Backlog: In queue waiting for assignment
    feature request: New feature or request
    libcudf: Affects libcudf (C++/CUDA) code.
