Releases: llnl/RAJA

v2025.12.0

22 Dec 01:04
e827035

This release contains mostly improvements to code robustness and testing, including evolving internal code implementations to use C++17.

Please download the RAJA-v2025.12.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependence on git submodules.

Notable changes include:

  • New features / API changes: NONE

  • Build changes/improvements:

    • Update Camp submodule to v2025.12.0 release.
    • Improve CMake support for configuring with Caliper and fix issue reported by a user.
  • Bug fixes/improvements:

    • Fixed a compilation failure that occurred when a downstream library or application was built without OpenMP while RAJA itself was built with OpenMP enabled. There may still be corner cases that violate the C++ ODR which we have not resolved and which users have not yet exposed.
    • Various internal code cleanups, simplifications, and improvements using C++17 features, with an eye toward C++20.
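The OpenMP mismatch described above typically arises when a consumer project links a RAJA built with OpenMP without enabling OpenMP itself. A minimal consumer CMake sketch that keeps the two consistent (project and target names here are illustrative, not prescribed by RAJA):

```cmake
# Illustrative consumer configuration: enable OpenMP in the consumer
# whenever the RAJA installation you link against was built with it.
cmake_minimum_required(VERSION 3.23)
project(raja-consumer CXX)

find_package(RAJA REQUIRED)      # locate an installed RAJA
find_package(OpenMP REQUIRED)    # match the OpenMP setting RAJA was built with

add_executable(app main.cpp)
target_link_libraries(app PRIVATE RAJA OpenMP::OpenMP_CXX)
```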

v2025.09.1

01 Oct 14:16
1e0756e

This release contains some bug fixes and build changes.

Please download the RAJA-v2025.09.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependence on git submodules.

  • This release contains the following bugfixes and build cleanups:
    • A build conflict with Caliper and ROCTX/NVTX has been fixed
    • BlueOS cmake scripts have been removed
    • toss3/clangcuda_6_0_0_nvcc_8_0.cmake has been removed
    • C++17 features have been used to simplify RAJA internals
    • Missing use of const has been corrected in the CompareFirst struct methods

v2025.09.0

12 Sep 16:13
ca75678

This release contains a variety of new features, bug fixes, and build changes.

Please download the RAJA-v2025.09.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependence on git submodules.

Notable changes include:

  • New features / API changes:

    • The RAJA::expt::Reduce interface now works with RAJA::kernel. There is an example of this usage in the RAJA/examples/kernel-reduction.cpp file.
    • Added builtin atomics for fixed-width unsigned integer types to support Windows builds.
    • Added a global function to turn Caliper profiling on and off when Caliper support is enabled.
    • Added some quality-of-life improvements to RAJA MultiView construct, such as the ability to construct and set a MultiView with const data, the ability to construct an empty MultiView, and new accessors to data and layout.
    • Added size methods to the RAJA IndexLayout construct. These can be used to check the size of the layout or to determine whether the size is non-zero. All the size methods directly call the base Layout implementation. Also added host-device decorators to the methods make_tuple_index and make_index_layout.
    • Added the grid_constant decorator to global function parameters for CUDA and make most global function parameters const for CUDA and HIP. This allows nvcc to better optimize parameter usage in some cases. ROCm compilers do not support this decorator and do not appear to optimize use of this parameter. For more details, please see https://docs.nvidia.com/cuda/cuda-c-programming-guide/#grid-constant
    • Added an experimental feature to support printing of arguments to GPU API functions on error. This capability will continue to improve and mature. Hopefully, it will help RAJA users understand what went wrong when a GPU kernel fails. This capability is currently supported for RAJA CUDA and HIP back-ends.
  • Build changes/improvements:

    • RAJA now requires C++17 as the minimum C++ standard.
    • Updated BLT submodule to v0.7.1 release.
    • Updated Camp submodule to v2025.09.2 release.
    • Updated to the NVTX3 profiling library to support CUDA 12.9 and later (compatibility extends back to CUDA 10).
    • [BREAKING CHANGE] Renamed CMake option RAJA_ENABLE_NV_TOOLS_EXT to RAJA_ENABLE_NVTX
    • Updated desul submodule to 6114dd25b54782678c555c0c1d2197f13cc8d2a0 commit.
    • The CUB and rocPRIM submodules in RAJA have been removed. Moving forward, the versions of these that are deployed with the CUDA and ROCm compiler stacks will be used.
    • Fixed the RAJA minimum architecture check, which did not work on Blackwell cards. Now, if CMAKE_CUDA_ARCHITECTURES is not set, the compiler will select a reasonable architecture default, which is guaranteed to be at least sm_35 for nvcc since RAJA now requires CUDA 11 as the minimum CUDA version.
  • Bug fixes/improvements:

    • Race conditions due to inconsistent usage of Camp resources (CUDA/HIP streams) in RAJA tests have been fixed.
    • Resolved a number of shadow-variable warnings reported by users.
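As a CPU-side illustration of the fixed-width unsigned atomic support added in this release (this sketch uses std::atomic rather than RAJA's builtin atomic layer, and the function names are hypothetical):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical CPU analogues of fixed-width unsigned atomic adds:
// each returns the value held before the addition, as GPU atomics do.
inline std::uint32_t atomic_add_u32(std::atomic<std::uint32_t>& ref,
                                    std::uint32_t v) {
  return ref.fetch_add(v, std::memory_order_relaxed);
}

inline std::uint64_t atomic_add_u64(std::atomic<std::uint64_t>& ref,
                                    std::uint64_t v) {
  return ref.fetch_add(v, std::memory_order_relaxed);
}
```

Fixed-width types matter here because unsigned overflow wraps predictably, which atomic update loops rely on across platforms.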

v2025.03.2

14 May 23:44
6e36a94

This release contains bugfixes.

Please download the RAJA-v2025.03.2.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

  • Build changes/improvements:
    • Removed unused variables related to kernel naming
    • Added missing host-device annotations to param reducers
    • Added a CMake build option to allow use of OpenMP 5.1 atomics for min/max operations; the option is on by default.
    • Restored full backwards compatibility of kernel naming and lambda-capture style reducers.
    • Removed compiler warnings related to NVCC and loop unrolling

v2025.03.1

15 Apr 19:33
ffa7b92

This release contains one new feature and a bug fix.

Please download the RAJA-v2025.03.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • Added initial support for Caliper to gather profiling data for kernels. See user docs and examples for configuration instructions and examples of usage.
  • Build changes/improvements:

    • None
  • Bug fixes/improvements:

    • Fix header file include issue when vectorization enabled in a HIP build.

v2025.03.0

17 Mar 15:28
1d70abf

This release contains new features, bug fixes, and updates to submodule dependencies.

Please download the RAJA-v2025.03.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • Added improved support for perfectly nested loops in RAJA::launch.
    • Added helper methods to simplify the creation of RAJA View objects with permutations of stride ordering. Examples and user docs have also been added.
    • Added GPU policies for CUDA and HIP that do not check loop bounds when they do not need to be checked in a kernel. This can help improve performance by up to 5%. The new policies are documented in the RAJA user guide and include direct_unchecked in their names.
    • Refactored the new (experimental) RAJA reduction interface to have consistent min/max/loc operator semantics and added type safety to reduce erroneous usage. Changes are described in the RAJA User Guide.
    • Added support for the new RAJA reduction interface to RAJA::dynamic_forall and moved dynamic_forall out of the RAJA::expt namespace.
    • Added RAJA_HIP_WAVESIZE CMake option to set the wave size for HIP builds. It defaults to 64 but can be set to 32, for example, to build RAJA to run on Radeon gaming cards.
  • Build changes/improvements:

    • Update BLT to v0.7.0 release.
    • Update camp submodule to v2025.03.0 release.
    • Update desul submodule to 6114dd25b54782678c555c0c1d2197f13cc8d2a0 commit.
    • Added clang-format CI check (clang 14) that must pass before a PR can be merged -- noted here so external contributors are aware.
  • Bug fixes/improvements:

    • Resolved undefined behavior related to constructing uniform_int_distribution with min > max. This was causing some Windows tests to fail.
    • Corrected call to wrong global function when using a fixed CUDA policy and reductions in RAJA::launch kernel -- potential performance issue.
    • Fixed memory leak in RAJA::launch OpenMP back-end.
    • Added missing host-device decorations to some math utility functions.
    • Fixed MSVC compilation failures with 64-bit intrinsics in x86 Windows builds.
    • Fixed issue so that a kernel will no longer be launched when there is no work for it to do; i.e., no active iteration space entries.
    • Removed invalid C++ usage in implementation of RAJA::kernel initLocalMem statement, which was causing large warning messages during compilation.
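The permuted View layouts mentioned above come down to choosing which dimension receives unit stride. A small standalone sketch of how strides follow a permutation listed from slowest- to fastest-varying dimension (the helper name is hypothetical, not RAJA API):

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: compute strides for a 3D layout, where `perm`
// lists dimensions from slowest- to fastest-varying. The last entry of
// `perm` is the unit-stride dimension.
std::array<std::size_t, 3> permuted_strides(std::array<std::size_t, 3> extents,
                                            std::array<std::size_t, 3> perm) {
  std::array<std::size_t, 3> strides{};
  std::size_t stride = 1;
  for (int i = 2; i >= 0; --i) {
    strides[perm[i]] = stride;   // assign stride to the i-th fastest dim
    stride *= extents[perm[i]];  // next-slower dim strides over this extent
  }
  return strides;
}
```

For a 2x3x4 extent, the identity permutation {0,1,2} yields strides {12,4,1} (dimension 2 is unit stride), while {2,1,0} yields {1,2,6} (dimension 0 is unit stride), analogous to swapping row-major for column-major ordering.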

v2024.07.0

24 Jul 19:55
4d7fcba

This release contains new features, improvements, and bugfixes.

Please download the RAJA-v2024.07.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • Added support for a "multi-reduction" operation, which allows users to perform a runtime-defined number of reduction operations in a kernel. Please see the RAJA User Guide for details and examples.
    • Added first couple of sections for a "RAJA Cookbook" in the RAJA User Guide. The goal is to provide users with more detailed guidance about using RAJA features, choosing execution policies, etc. Additional content will be provided in future releases.
    • Added atomicLoad and atomicStore routines for correctness in some use cases.
    • Added OpenMP 5.1 implementations for atomicMin and atomicMax.
    • Added SYCL reduction support in RAJA::launch.
  • Build changes/improvements:

    • Update camp submodule to v2024.07.0 release. There will be a version constraint for this release in RAJA Spack package when that is pushed upstream to Spack.
    • Minimum required CMake version bumped to 3.23.
  • Bug fixes/improvements:

    • Fix CMake issue for case when RAJA is used as a submodule dependency.
    • Various fixes and improvements to builtin atomic support.
    • Fixes and improvements to other atomic operations:
      • Modified HIP and CUDA generic atomic compare and swap algorithms to use atomic loads instead of relying on volatile.
      • Re-implemented atomic loads in terms of builtin atomics for CUDA and HIP so that the generic compare and swap functions can use it.
      • Removed the volatile qualifier from atomic function signatures.
      • Used cuda::atomic_ref in newer versions of CUDA to back atomicLoad/atomicStore.
      • Used atomicAdd as a fallback for atomicSub in CUDA.
      • Removed checks where CUDA_ARCH is less than 350 since RAJA requires that as the minimum supported architecture (CMake check).
    • Fixed issues with naming RAJA forall::kernels when using CUDA.
    • Fixes in SYCL back-end for RAJA::launch.
    • Fixed some issues in examples.
      • Bugfixes and cleanup in parts of the SYCL back-end needed to support several new SYCL kernels that will appear in a RAJA Performance Suite release.
    • Fix type naming issue that was exposed with a new version of the Intel oneAPI compiler.
    • Fix issue in User Guide documentation for configuring a project using RAJA CMake configuration.
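The generic compare-and-swap rework described above replaces volatile reads with atomic loads. A CPU sketch of that pattern using std::atomic (a sequential analogue for illustration, not RAJA's GPU implementation):

```cpp
#include <atomic>

// CPU sketch of a generic CAS-based atomic min: read the current value
// with an atomic load (not a volatile read), then retry
// compare_exchange until the update takes effect or is unnecessary.
template <typename T>
T atomic_min_cas(std::atomic<T>& ref, T value) {
  T old = ref.load(std::memory_order_relaxed);  // atomic load, no volatile
  while (old > value &&
         !ref.compare_exchange_weak(old, value, std::memory_order_relaxed)) {
    // On failure, compare_exchange_weak refreshes `old`; loop and retry.
  }
  return old;  // value observed before any update, as GPU atomics return
}
```

Building atomicMin and similar operations on atomic loads plus CAS lets one generic algorithm serve types with no native hardware atomic, which is why the release reimplements atomic loads in terms of builtins first.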

v2024.02.2

08 May 17:57
593f756

This release contains a bugfix and new execution policies that improve performance for GPU kernels with reductions.

Please download the RAJA-v2024.02.2.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • RAJA::loop_exec and associated policies (loop_reduce, etc.) have been removed. These were deprecated in an earlier release and type-aliased to RAJA::seq_exec, etc., which behave identically to the old loop_exec variants. When you update to this version of RAJA, please change uses of loop_exec to seq_exec in your code.
    • New GPU execution policies for CUDA and HIP added which provide improved performance for GPU kernels with reductions. Please see the RAJA User Guide for more information. Short summary:
      • Option added to change max grid size in policies that use the occupancy calculator.
      • Policies added to run with max occupancy, with a fraction of the max occupancy, or with a "concretizer" that lets a user decide how to run based on what the occupancy calculator determines about a kernel.
      • Additional options to tune kernels containing reductions, such as
        • an option to initialize data on host for reductions that use atomic operations
        • an option to avoid device scope memory fences
    • Changed SYCL thread index ordering in RAJA::launch to follow the SYCL "row-major" convention. Please see the RAJA User Guide for more information.
  • Build changes/improvements:

    • NONE.
  • Bug fixes/improvements:

    • Fixed issue in bump-style allocator used internally in RAJA::launch.

v2024.02.1

03 Apr 16:47
3ada095

This release contains submodule updates and minor RAJA improvements.

Please download the RAJA-v2024.02.1.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • NONE.
  • Build changes/improvements:

    • Update BLT submodule to v0.6.2 release.
    • Update camp submodule to v2024.02.1 release.
  • Bug fixes/improvements:

    • Various changes to quiet compiler warnings in SYCL builds related to deprecated usage.

v2024.02.0

14 Feb 20:43
82d1b92

This release contains several RAJA improvements and submodule updates.

Please download the RAJA-v2024.02.0.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.

Notable changes include:

  • New features / API changes:

    • BREAKING CHANGE (ALMOST): The loop_exec and associated policies such as loop_atomic, loop_reduce, etc. were deprecated in the v2023.06.0 release (please see the release notes for that version for details). Users should replace these with seq_exec and associated policies for sequential CPU execution. The code behavior will be identical to what you observed with loop_exec, etc. However, due to a request from some users with special circumstances, the loop_* policies still exist in this release as type aliases to their seq_* analogues. The loop_* policies will be removed in a future release.
    • BREAKING CHANGE: RAJA TBB back-end support has been removed. It was not feature complete and the TBB API has changed so that the code no longer compiles with newer Intel compilers. Since we know of no project that depends on it, we have removed it.
    • An IndexLayout concept was added, which allows accessing elements of a RAJA View via collections of indices, using a different indexing strategy along different dimensions of a multi-dimensional View. Please see the RAJA User Guide for more information.
    • Add support for SYCL reductions using the new RAJA reduction API.
    • Add support for new reduction API for all back-ends in RAJA::launch.
  • Build changes/improvements:

    • Update BLT submodule to v0.6.1 and incorporate its new macros for managing TPL targets in CMake.
    • Update camp submodule to v2024.02.0, which contains changes to support ROCm 6.x compilers.
    • Update desul submodule to afbd448.
    • Replace internal use of HIP and CUDA platform macros to their newer versions to support latest compilers.
  • Bug fixes/improvements:

    • Change internal memory allocation for HIP to use coarse-grained pinned memory, which improves performance because it can be cached on a device.
    • Fix compilation error resulting from incorrect namespacing of OpenMP execution policy.
    • Several fixes to internal implementation of Reducers and Operators.
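The IndexLayout concept introduced in this release pairs each View dimension with its own index list. A minimal standalone sketch of that indirect-indexing idea for a 2D row-major array (the function name is hypothetical, not RAJA API):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the IndexLayout idea: element (i, j) of the
// indexed view maps through per-dimension index lists, so the access is
// data[rows[i] * ncols + cols[j]] for a row-major 2D array.
double indexed_at(const std::vector<double>& data, std::size_t ncols,
                  const std::vector<std::size_t>& rows,
                  const std::vector<std::size_t>& cols,
                  std::size_t i, std::size_t j) {
  return data[rows[i] * ncols + cols[j]];
}
```

For a 3x4 array holding 0..11 with rows = {0, 2} and cols = {1, 3}, element (1, 0) of the indexed view reads data[2*4 + 1] = 9, showing how each dimension can follow its own (possibly non-contiguous) indexing strategy.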