Releases: NVIDIA/cccl

CCCL Python Libraries (1.0.0)

12 May 14:35
Immutable release. Only release title and notes can be modified.
e97a80f

CCCL Python Libraries (v1.0.0)

Previous release: v0.7.0.

This is the first stable release of the cuda-cccl Python package.

The cuda.compute module is now considered stable, and we will follow semantic versioning for changes to its public API going forward.

The cuda.coop module remains experimental and has been moved to cuda.coop._experimental to reflect that. See breaking changes below.

Installation

Please refer to the install instructions here.

API breaking changes

  • cuda.coop cooperative primitives moved to cuda.coop._experimental (#8788)

    The block, warp, and StatefulFunction entry points previously exported from cuda.coop have been moved to the cuda.coop._experimental submodule, signaling that their API is not yet stable and is expected to change in a future release. Top-level cuda.coop no longer re-exports these.

    Before:

    from cuda.coop import block, warp, StatefulFunction

    After:

    from cuda.coop._experimental import block, warp, StatefulFunction
  • cuda.cccl.cooperative legacy namespace removed (#8788)

    The deprecated cuda.cccl.cooperative package (previously kept as a transitional alias) has been removed entirely. Migrate any remaining imports to cuda.coop._experimental.
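For code that must run against both v0.7.0 and v1.0.0 during a migration, one option is a fallback import. The sketch below is a minimal illustration under stated assumptions: the helper name `import_first` is hypothetical and not part of cuda-cccl, and it relies only on the two module paths named above.

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that imports successfully."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )


# Intended usage (requires cuda-cccl to be installed):
#   coop = import_first(["cuda.coop._experimental", "cuda.coop"])
#   block, warp = coop.block, coop.warp
```

Listing the new location first means the old path is only tried on releases that predate the move.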

Features

  • Python 3.14 support: cuda-cccl is now built and tested against Python 3.14 in addition to 3.10–3.13 (#8870).

Bug Fixes / Packaging

  • Avoid incompatible numba-cuda versions — The dependency pin on numba-cuda was tightened to exclude 0.27.x, 0.28.x, 0.29.x, and 0.30.0, which contain regressions that break cuda-cccl (#8831).
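To check whether an environment has one of the excluded numba-cuda versions, a small predicate over the version string may help. This is a sketch only: the function names are hypothetical, the parsing is deliberately simplistic, and the actual dependency specifier in the package metadata may be expressed differently.

```python
def parse_version(text):
    """Parse 'X.Y.Z' into a tuple of three ints, ignoring any non-digit suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_excluded_numba_cuda(version_text):
    """True for 0.27.x, 0.28.x, 0.29.x and 0.30.0, the versions listed above."""
    major, minor, patch = parse_version(version_text)
    if major != 0:
        return False
    if minor in (27, 28, 29):
        return True
    return (minor, patch) == (30, 0)


# Possible usage, assuming numba-cuda is installed:
#   from importlib.metadata import version
#   print(is_excluded_numba_cuda(version("numba-cuda")))
```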

Known issues

  • cuda.coop._experimental may fail with RuntimeError: nvdisasm was not found or could not be executed if nvdisasm is not discoverable. Follow the suggestion in the error message to install nvdisasm. If it is already installed, set the CUDA_PATH environment variable (not PATH) to the root of the directory containing bin/nvdisasm:

    export CUDA_PATH=/path/to/cuda   # such that $CUDA_PATH/bin/nvdisasm exists
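Before importing cuda.coop._experimental, it can be useful to confirm that CUDA_PATH points at a directory with the layout described above. The helper below is a hypothetical sketch; it only checks for `bin/nvdisasm` under CUDA_PATH and does not reproduce whatever lookup logic the library itself uses.

```python
import os
from pathlib import Path


def find_nvdisasm(environ=None):
    """Return the path to $CUDA_PATH/bin/nvdisasm if that file exists, else None."""
    environ = os.environ if environ is None else environ
    cuda_path = environ.get("CUDA_PATH")
    if not cuda_path:
        return None
    # Assumption: on Windows the executable carries an .exe suffix.
    for name in ("nvdisasm", "nvdisasm.exe"):
        candidate = Path(cuda_path) / "bin" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_nvdisasm()
    print(found if found else "nvdisasm not found under CUDA_PATH")
```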

Notes

  • cuda.compute itself has no API changes in this release relative to v0.7.0. The 0.7.0 release contained the API cleanup (keyword-only arguments, parameter reordering, d_in_values/d_out_values rename in merge_sort); 1.0.0 is the formal stabilization of that API.

CCCL Python Libraries (v0.7.0)

05 May 15:15
1b6eeab

cuda-cccl Python package — version 0.7.0

Release date: May 5th, 2026. Previous release: v0.6.0.

cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

Installation

Please refer to the install instructions here

API breaking changes

  • All cuda.compute functions now require keyword-only arguments (#8772)

    Every top-level function and factory (make_*) in cuda.compute now enforces keyword-only call
    syntax (i.e., all parameters must be passed by name). Positional calls will raise a TypeError.

    Before:

    reduce_into(d_in, d_out, op, num_items, h_init)

    After:

    reduce_into(d_in=d_in, d_out=d_out, num_items=num_items, op=op, h_init=h_init)
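In Python, keyword-only parameters are declared by placing a bare `*` before them in the signature. The sketch below uses a hypothetical pure-Python stand-in, `reduce_into_sketch`, to show the resulting behaviour; it is not the cuda.compute implementation and runs on plain lists.

```python
def reduce_into_sketch(*, d_in, d_out, num_items, op, h_init):
    """Hypothetical stand-in: every parameter after the bare `*` is keyword-only."""
    acc = h_init
    for i in range(num_items):
        acc = op(acc, d_in[i])
    d_out[0] = acc


def add(a, b):
    return a + b


d_in = [1, 2, 3, 4]
d_out = [0]

# Keyword call: accepted.
reduce_into_sketch(d_in=d_in, d_out=d_out, num_items=len(d_in), op=add, h_init=0)

# Positional call: rejected by Python itself with a TypeError.
try:
    reduce_into_sketch(d_in, d_out, add, len(d_in), 0)
    positional_ok = True
except TypeError:
    positional_ok = False
```

Because arguments are matched by name, the order in which they are written at the call site no longer matters.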

Features

  • System CUDA toolkit install extras — New pip extras sysctk12 / sysctk13 (and
    minimal-sysctk12 / minimal-sysctk13) allow installing cuda-cccl without pulling in
    cuda-toolkit as a pip dependency, for users who already have CUDA installed system-wide
    (#8608):

    pip install cuda-cccl[sysctk13]          # full install, system CTK
    pip install cuda-cccl[minimal-sysctk13]  # no Numba, system CTK

Performance

  • Faster binary search: lower_bound / upper_bound are now implemented via transform
    with a small linear search for the final steps, improving throughput on modern GPUs (#8642)
  • Adaptive warpspeed scan — The scan tuning policy now automatically selects the warpspeed
    (lookahead) scan path when beneficial for the data type and architecture (#8158)
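The binary search item above combines two ideas: one independent search per query element (as in an element-wise transform) and a switch from halving to a short linear scan once the candidate range is small. The following CPU-side Python sketch illustrates that general idea only. The function names and the window size of 8 are assumptions, and the actual implementation is GPU code whose details are not given in these notes.

```python
LINEAR_WINDOW = 8  # hypothetical cutoff; the real value is not stated in the notes


def lower_bound_hybrid(sorted_values, needle, window=LINEAR_WINDOW):
    """Index of the first element >= needle, using binary then linear search."""
    lo, hi = 0, len(sorted_values)
    # Binary phase: halve [lo, hi) until it is at most `window` elements wide.
    while hi - lo > window:
        mid = (lo + hi) // 2
        if sorted_values[mid] < needle:
            lo = mid + 1
        else:
            hi = mid
    # Linear phase: scan the remaining small range.
    while lo < hi and sorted_values[lo] < needle:
        lo += 1
    return lo


def lower_bound_all(sorted_values, needles):
    """One independent search per needle, mirroring an element-wise transform."""
    return [lower_bound_hybrid(sorted_values, n) for n in needles]
```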

Bug Fixes

  • Fix incorrect minimum CUDA architecture targeted when building the cccl.c native extension
    (#8631)

v3.3.3

20 Apr 18:06
af8cce4

What's Changed

🔄 Other Changes

  • Bump branch/3.3.x to 3.3.3. by @wmaxey in #8409
  • [Backport branch/3.3.x] [libcu++] Add missing braces supression to other mempool types by @github-actions[bot] in #8166
  • [Backport branch/3.3.x] Fix order of _CCCL_API and CCCL_DEPRECATED by @github-actions[bot] in #8390
  • [backport 3.3] Fix family arch specific feature detection in <nv/target> (#8027) by @davebayer in #8294
  • [Backport branch/3.3.x] Fix codegen in 128bit atomic CAS by @github-actions[bot] in #8408
  • [Backport branch/3.3.x] [libcu++] Add missing bit_cast in the buffer construction (#8420) by @pciolkosz in #8425

Full Changelog: v3.3.2...v3.3.3

v3.3.2

14 Apr 15:19
8768676

What's Changed

🔄 Other Changes

  • Bump branch/3.3.x to 3.3.2. by @wmaxey in #7992
  • [Backport to 3.3]: Support non-copyable stream types in DeviceTransform (#7915) by @bernhardmgruber in #8011
  • [Backport branch/3.3.x] Support DLPack inclusion for both <dlpack/dlpack.h> and <dlpack.h> by @github-actions[bot] in #7910
  • [Backport branch/3.3.x] Add fallback for _CCCL_BUILTIN_EXPECT by @github-actions[bot] in #8049
  • [Backport 3.3] reformulate __as_type_list to avoid MSVC overload resolution bug (#7991) by @miscco in #8062
  • [Backport 3.3] Avoid deprecation warning with is_always_equal (#7674) by @miscco in #8078
  • [Backport branch/3.3.x] Fix use of EXPAND in token concatenation by @github-actions[bot] in #8077

Full Changelog: v3.3.1...v3.3.2

CCCL Python Libraries v0.6.0

09 Apr 13:27
318bef7

These are the release notes for the cuda-cccl Python package version 0.6.0, dated April 9th, 2026. The previous release was v0.5.1.

cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

Installation

Please refer to the install instructions here

API breaking changes

  • cuda.coop refactored to use maker factory functions (#7713)

Features

  • ShuffleIterator — New iterator type added to cuda.compute (#7721)
  • max_segment_size guarantee — Exposed in the public API (#8284)
  • LTO-IR support — Can now directly pass LTO-IR for custom operators (#7625)
  • Numba-optional install — Added a path to install cuda.compute without Numba as a dependency (#7633)

Performance

  • Faster TransformIterator construction (#7660)

Bug Fixes

  • Fix faulty pointer arithmetic in CUB dispatch (#7940)
  • Fix merge sort returning negative temp storage bytes (#7916)
  • Fix histogram build object caching when using privatized smem strategy (#7657)

v3.3.1

14 Apr 15:19
c262ef4

What's Changed

🔄 Other Changes

  • Bump 3.3.0 to 3.3.1. by @wmaxey in #7742
  • [Backport 3.3] #7787 and #7738 by @miscco in #7800
  • [Backport 3.3]: Avoid use of class static variable in device function (#7776) by @miscco in #7825
  • [Backport branch/3.3.x] Forward policy hub from dispatch_streaming_arg_reduce_t to reduce::dispatch by @github-actions[bot] in #7814
  • [Backport branch/3.3.x] cub: change {Lower,Upper}Bound to accept iterator and number of elements. by @github-actions[bot] in #7816
  • [Backport branch/3.3.x] Fix version guard for cudaDevAttrHostNumaMemoryPoolsSupported by @github-actions[bot] in #7842
  • [Backport 3.3] Buffer changes by @miscco in #7841
  • [Backport branch/3.3.x] [libcu++] Change default pool getters to return memory_pool_ref& by @github-actions[bot] in #7858
  • [Backport branch/3.3.x] Avoid compile issue with __iset by @github-actions[bot] in #7879
  • [Backport to 3.3] Require CUDA 12.9 for host numa implementation of pinned memory pool (#7856) by @pciolkosz in #7872
  • [Backport 3.3] Avoid GCC bug with dependent type template (#7857) by @miscco in #7860

Full Changelog: v3.3.0...v3.3.1

v3.3.0

27 Feb 22:39
09094af

Full Changelog: v3.3.0...v3.3.0

What's Changed

📚 Libcudacxx

  • [libcudacxx] Fix a typo in the documentation by @caugonnet in #7330
  • Add a test for <nv/target> to validate old dialect support. by @wmaxey in #7241

🔄 Other Changes

v3.2.1

12 Feb 01:03
d84981c

Full Changelog: v3.2.1...v3.2.1

What's Changed

🔄 Other Changes

  • Bump branch/3.2.x to 3.2.1. by @wmaxey in #7329
  • [Backport branch/3.2.x] Add accessor methods to shared_resource by @github-actions[bot] in #7322
  • [Backport branch/3.2.x] Fix clang warning about missing braces again by @github-actions[bot] in #7324
  • [Backport branch/3.2.x] part deux: make the abi of __basic_any compatible between c++17 and c++20 by @github-actions[bot] in #7421
  • [backport 3.2] Fix missing c2h symbol when compiling with clang-cuda (#7454) by @davebayer in #7600
  • [Backport branch/3.2.x] Remove recursion from __internal_is_address_from by @github-actions[bot] in #7573
  • [Backport branch/3.2.x] Fix ranges_overlap for nvc++ -cuda by @github-actions[bot] in #7598
  • [Backport branch/3.2.x] Fix cuda::device::current_arch_id by @github-actions[bot] in #7601
  • [Backport branch/3.2.x] Check for _GLIBCXX_USE_CXX11_ABI only when compiling with libstdc++ by @github-actions[bot] in #7630
  • [Backport branch/3.2.x] Fix cuda::barrier missing accounting of results in try_wait by @github-actions[bot] in #7634

Full Changelog: v3.2.0...v3.2.1

CCCL Python Libraries (v0.5.1)

07 Feb 10:24
37dc08c

These are the release notes for the cuda-cccl Python package version 0.5.1, dated February 6th, 2026. The previous release was v0.5.0.

cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

Installation

Please refer to the install instructions here

Features

Improvements

  • Restrict to numba-cuda less than 0.27 (#7529)

Bug Fixes

  • Fix caching of functions referencing numpy ufuncs (#7535)

CCCL Python Libraries (v0.5.0)

05 Feb 14:38
1836859

These are the release notes for the cuda-cccl Python package version 0.5.0, dated February 5th, 2026. The previous release was v0.4.5.

cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

Installation

Please refer to the install instructions here

⚠️ Breaking change

Object-based API requires passing operator to algorithm __call__ method

This API change affects only users of the object-based API (expert mode).

Previously, constructing an algorithm object required passing the operator as an argument, but invoking it did not:

# step 1: create algorithm object
transformer = cuda.compute.make_unary_transform(d_input, d_output, some_unary_op)

# step 2: invoke algorithm
transformer(d_in1, d_out1, num_items1)  # NOTE: not passing some_unary_op here

The new behaviour requires passing it in both places:

# step 1: create algorithm object
transformer = cuda.compute.make_unary_transform(d_input, d_output, some_unary_op)

# step 2: invoke algorithm
transformer(d_in1, d_out1, some_unary_op, num_items1)  # NOTE: need to pass some_unary_op here

This change was introduced because, in many situations (such as in a loop), the operator and the globals or closures it references can change between construction and invocation, or from one invocation to the next.
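The loop scenario can be illustrated with a pure-Python toy. `ToyTransform` and `make_scaler` below are hypothetical stand-ins that mirror the v0.5.0 call shape on plain lists; they are not the cuda.compute implementation and perform no compilation or caching.

```python
class ToyTransform:
    """Hypothetical stand-in for an algorithm object (not cuda.compute)."""

    def __init__(self, d_in, d_out, op):
        # In this toy, construction-time arguments are not stored;
        # the operator used is always the one passed to __call__.
        pass

    def __call__(self, d_in, d_out, op, num_items):
        for i in range(num_items):
            d_out[i] = op(d_in[i])


def make_scaler(factor):
    """Return a closure whose behaviour depends on the captured `factor`."""
    def scale(x):
        return x * factor
    return scale


data = [1, 2, 3]
out = [0, 0, 0]
transformer = ToyTransform(data, out, make_scaler(1))

results = []
for factor in (2, 3):
    # A fresh closure each iteration: it must be passed at call time
    # so the invocation sees the current captured state.
    transformer(data, out, make_scaler(factor), len(data))
    results.append(list(out))
```

Each iteration applies the operator built for that iteration, which is the situation the call-time argument is meant to handle.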

Features

Improvements

  • Avoid unnecessary recompilation of stateful operators (#7500)
  • Improved cache lookup performance (#7501)

Bug Fixes

  • Fix handling of boolean types in cuda.compute (#7389)