[CUB] Refactor DeviceSelect::Unique to always take an environment#9386
[CUB] Refactor DeviceSelect::Unique to always take an environment#9386miscco wants to merge 4 commits into
DeviceSelect::Unique to always take an environment#9386Conversation
We want to use env based API to ensure that we take advantage of user provided tunings
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (5)
OverviewThis PR refactors cub::DeviceSelect::Unique so its API consistently accepts an execution environment (EnvT) instead of a raw CUDA stream. The change enables DeviceSelect and related PSTL call sites to propagate user-provided tunings via the environment/policy. It also updates DeviceFind overloads, the PSTL CUDA backends for find_if/unique, and introduces a CUDA backend specialization for unique_copy. The description notes this should be merged after PR Core API & Dispatch Changes
PSTL & libcudacxx Backends
Tests
Public API surface changes
Notes for reviewers
suggestion: WalkthroughThis PR changes env dispatch helpers to accept environments by const-ref and add caller-owned temp-storage overloads; converts DeviceFind/DeviceSelect APIs to env-based signatures; adapts PSTL CUDA backends (find_if, unique) and adds a new CUDA unique_copy backend; and extends tests for user-provided temporary storage across environment variants. ChangesEnvironment-based API migration
Possibly related PRs
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
cub/test/catch2_test_device_select_unique.cu (1)
210-212: ⚡ Quick winsuggestion: Remove unused variables
boundaryandnum_selected_std. The test validates output by comparing vectors (lines 241, 265) but never asserts the count matches the std::unique result. Either remove these variables or addREQUIRE(num_selected_out[0] == num_selected_std)assertions like other tests in this file do (see lines 328, 335).libcudacxx/include/cuda/std/__pstl/cuda/unique.h (2)
93-102: 💤 Low valuesuggestion: Error message on line 95 still references
cub::DispatchSelectIf::Dispatchbut the code now callscub::DeviceSelect::Unique. Update for clarity:_CCCL_TRY_CUDA_API( CUB_NS_QUALIFIER::DeviceSelect::Unique, - "__pstl_cuda_unique: kernel launch of cub::DispatchSelectIf::Dispatch failed", + "__pstl_cuda_unique: kernel launch of cub::DeviceSelect::Unique failed", __storage.__get_temp_storage(),
90-91: 💤 Low valuesuggestion: Comment says "as well as a copy of the input sequence as Unique is not inplace" but the code uses the in-place variant of
DeviceSelect::Unique. The temporary storage is only for__num_selectedand CUB internals. Consider updating the comment:- { // Create temporary storage for the return value as well as a copy of the input sequence as Unique is not inplace + { // Create temporary storage for the return value (num_selected) and CUB internal scratchlibcudacxx/include/cuda/std/__pstl/cuda/unique_copy.h (1)
83-93: 💤 Low valuesuggestion: Error messages reference
__pstl_cuda_uniquebut this isunique_copy. Update for debugging clarity:
- Line 85:
"__pstl_cuda_unique_copy: determination of device storage..."- Line 100:
"__pstl_cuda_unique_copy: kernel launch..."- Line 112:
"__pstl_cuda_unique_copy: copy of num_selected..."Also applies to: 98-108, 110-117
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2a9c4a23-bd8a-4562-b58e-775fd280c548
📒 Files selected for processing (10)
cub/cub/detail/env_dispatch.cuhcub/cub/device/device_find.cuhcub/cub/device/device_select.cuhcub/test/catch2_test_device_find_env.cucub/test/catch2_test_device_select_unique.culibcudacxx/include/cuda/std/__pstl/cuda/find_if.hlibcudacxx/include/cuda/std/__pstl/cuda/unique.hlibcudacxx/include/cuda/std/__pstl/cuda/unique_copy.hlibcudacxx/include/cuda/std/__pstl/dispatch.hlibcudacxx/include/cuda/std/__pstl/unique_copy.h
dcfa290 to
1a3118a
Compare
DeviceSelect::Unique to always take an environmentDeviceSelect::Unique to always take an environment
1a3118a to
ad308bb
Compare
🥳 CI Workflow Results🟩 Finished in 1h 27m: Pass: 100%/343 | Total: 3d 08h | Max: 1h 22m | Hits: 96%/494169See results here. |
| d_temp_storage, | ||
| temp_storage_bytes, | ||
| env, | ||
| [&]([[maybe_unused]] auto tuning_env, void* d_temp_storage, size_t& temp_storage_bytes, cudaStream_t stream) { |
There was a problem hiding this comment.
again, consider variable shadowing otherwise LGTM
This refactors
cub::DeviceSelect::Uniqueto always take an environment instead of just a stream.With that we can take advantage of tunings even for those overloads.
We also refactor the PSTL algorithms that use this API to always pass the policy so a user can tune them
DeviceFindIf#9318