Releases: NVIDIA/cccl
CCCL Python Libraries (v0.4.3)
These are the release notes for the cuda-cccl Python package version 0.4.3, dated December 18th, 2025. The previous release was v0.4.2.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features
Improvements and bug fixes
- Add missing OpKind docs entries (#6910)
- Unify operator handling in cuda.compute (#6938)
- [cuda.compute] Refactor code for creating void* wrappers (#6941)
- Remove need for hardcoded LevelT for histogram in c.parallel and cuda.compute (#6915)
- c.parallel: reuse CUB agent policies for histogram (#6974)
- [cuda.compute]: fix alignment not being set properly for gpu_struct types (#6995)
CCCL Python Libraries (v0.4.2)
These are the release notes for the cuda-cccl Python package version 0.4.2, dated December 9th, 2025. The previous release was v0.4.1.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features
Improvements and bug fixes
- Add explicit dependency on nvidia-nvvm (#6909)
CCCL Python Libraries (v0.4.1)
These are the release notes for the cuda-cccl Python package version 0.4.1, dated December 8th, 2025. The previous release was v0.4.0.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features
Improvements and bug fixes
- Fix issue with get_dtype() not working anymore for PyTorch arrays (#6882)
- Add fast path to extract PyTorch array pointer (#6884)
Breaking Changes
CCCL Python Libraries (v0.4.0)
These are the release notes for the cuda-cccl Python package version 0.4.0, dated December 3rd, 2025. The previous release was v0.3.4.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features
- Added select algorithm for filtering data (#6766)
- Support for nested structs (#6353)
- Added DiscardIterator (#6618)
- The cccl-python Python package can now be installed via conda (#6513)
Improvements and bug fixes
- Allow NumPy struct types as initial value for ZipIterator inputs (#6861)
- Allow using ZipIterator as an output in cuda.compute (#6518)
- Enable caching of advance/dereference methods for ZipIterator and PermutationIterator (#6753)
- Use wrapper with void* argument types for iterator advance/dereference signature (#6634)
- Fixes and improvements to function caching (#6758)
- Fix handling of wrapped cuda.jit functions (#6770)
- Use annotations if available to determine return type of transform op (#6760)
- Allow passing in None as init value for scan when using an iterator as input (#6499)
Breaking Changes
v3.1.3
What's Changed
🔄 Other Changes
- [Backport branch/3.1.x] Fix invalid reference type of cuda::strided_iterator by @github-actions[bot] in #6517
- [Backport branch/3.1.x] Fixes issue with select close to int_max by @github-actions[bot] in #6700
- Bump branch/3.1.x to 3.1.3. by @wmaxey in #6621
- Backport changes for XGBoost compatibility by @bdice in #6727
Full Changelog: v3.1.2...v3.1.3
v3.1.2
What's Changed
🔄 Other Changes
- [BACKPORT 3.1] Always include <new> when we need operator new for clang-cuda (#6310) by @miscco in #6445
- [Backport branch/3.1.x] Fix offset_iterator tests by @github-actions[bot] in #6446
- [BACKPORT 3.1] Add _CCCL_DECLSPEC_EMPTY_BASES to mdspan features (#6444) by @miscco in #6449
- Bump branch/3.1.x to 3.1.2. by @wmaxey in #6433
- [Backport 3.1] Fix clang 21 issues (#6404) by @davebayer in #6447
- [Backport branch/3.1.x] Ensure that detect_wrong_difference is a valid output iterator by @github-actions[bot] in #6453
- [Backport to 3.1] Fix cub.bench.radix_sort.keys.base regression on H200 (#6452) by @bernhardmgruber in #6458
- [Backport 3.1] Do not mark deduction guides as hidden (#6350) by @miscco in #6457
Full Changelog: v3.1.1...v3.1.2
python-0.3.4
These are the release notes for the cuda-cccl Python package version 0.3.4, dated November 5th, 2025. The previous release was v0.3.3.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features and improvements
- Introduced cuda.compute.segmented_sort API.
Bug Fixes
Breaking Changes
v3.1.1
What's Changed
🔄 Other Changes
- Bump branch/3.1.x to 3.1.1. by @wmaxey in #6235
- [Backport branch/3.1.x] Fix __compressed_movable_box by @github-actions[bot] in #6248
- [Backport branch/3.1.x] Fix __is_primary_std_template for libc++ by @github-actions[bot] in #6249
- [Backport 3.1] Fix invalid refactoring of #4377 (#6246) by @miscco in #6265
- [Backport branch/3.1.x] Fix using char as the index type of tabulate_output_iterator by @github-actions[bot] in #6273
- [Backport 3.1]: Fix missing qualifications for __construct_at (#6270) by @miscco in #6274
- [Backport branch/3.1.x] Fix missed constructor with compressed box by @github-actions[bot] in #6272
- [Backport 3.1] Fix string_view construction from std::string_view (#6291) by @davebayer in #6301
- [Backport 3.1] Include <math.h> in <cuda/std/cmath> headers unconditionally (#6333) by @davebayer in #6339
Full Changelog: v3.1.0...v3.1.1
python-0.3.3
These are the release notes for the cuda-cccl Python package version 0.3.3, dated October 21st, 2025. The previous release was v0.3.2.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features and improvements
- This is the first release that features Windows wheels published to PyPI. You can now pip install cuda-cccl[cu12] or pip install cuda-cccl[cu13] on Windows for Python versions 3.10, 3.11, 3.12, and 3.13.
Bug Fixes
Breaking Changes
python-0.3.2
These are the release notes for the cuda-cccl Python package version 0.3.2, dated October 17th, 2025. The previous release was v0.3.1.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the install instructions here
Features and improvements
- Allow passing in a device array or None as the initial value in scan.