[ROCm] Fix foreach profiler check and cholesky_solve nondeterminism #170964
This PR fixes two ROCm test failures by applying tolerances and skips that match existing patterns for similar issues.
Changes
1. Skip profiler check in foreach tests on ROCm
The foreach tests verify that multi_tensor_apply_kernel runs by checking profiler output. On ROCm, rocTracer sometimes fails to detect the kernel even when it executes correctly, which is the same issue NVIDIA has with CUPTI on CUDA 12.6/12.8.
We already skip this profiler check for those CUDA versions (#148681), so this extends that skip to ROCm (#97167).
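The skip condition described above can be sketched as a small predicate. This is only an illustration: the helper name `should_check_profiler` and the constant `AFFECTED_CUDA_VERSIONS` are hypothetical and are not taken from the PyTorch test suite. The two arguments are assumed to carry the same kind of values as `torch.version.hip` and `torch.version.cuda` (a version string, or `None` when the build does not target that platform).

```python
from typing import Optional

# Hypothetical constant: CUDA releases where CUPTI may miss the kernel,
# as stated in the PR description.
AFFECTED_CUDA_VERSIONS = {"12.6", "12.8"}


def should_check_profiler(hip_version: Optional[str], cuda_version: Optional[str]) -> bool:
    """Return True when the profiler trace is expected to list the kernel.

    hip_version / cuda_version are assumed to be version strings, or None
    when the build does not target that platform.
    """
    # ROCm build: rocTracer may miss the kernel, so skip the check.
    if hip_version is not None:
        return False
    # CUDA build: skip only on the releases known to be affected.
    if cuda_version is not None:
        major_minor = ".".join(cuda_version.split(".")[:2])
        if major_minor in AFFECTED_CUDA_VERSIONS:
            return False
    return True
```

A test would still run the foreach op on every platform and only guard the profiler assertion with this predicate, so correctness coverage is unchanged.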
2. Add nondeterminism tolerance for cholesky_solve gradcheck
hipSOLVER's backward pass for complex types has slight numerical variation between runs, which causes gradcheck's reentrant test to fail. Adding GRADCHECK_NONDET_TOL (1e-12) fixes this, the same approach used for addmm, mm, and other ops with similar behavior.
Fixes #164193
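To illustrate why a nondeterminism tolerance helps, here is a minimal pure-Python sketch of a reentrancy check: evaluate the same function twice on the same input and require the results to agree within a tolerance. It is not gradcheck's implementation; `is_reentrant` and `make_drifting_grad` are hypothetical names, and the 1e-14 drift is an assumed magnitude chosen for the example. In PyTorch itself the corresponding knob is the `nondet_tol` argument of `torch.autograd.gradcheck`.

```python
def is_reentrant(fn, x, nondet_tol=0.0):
    """Call fn twice on the same input and compare the outputs elementwise."""
    first = fn(x)
    second = fn(x)
    return all(abs(a - b) <= nondet_tol for a, b in zip(first, second))


def make_drifting_grad(drift):
    """Build a toy gradient of sum(v**2) whose result shifts by `drift`
    on every second call, imitating run-to-run numerical variation."""
    calls = {"count": 0}

    def grad(x):
        offset = drift * (calls["count"] % 2)  # 0 on even calls, drift on odd calls
        calls["count"] += 1
        return [2.0 * v + offset for v in x]

    return grad


xs = [1.0, -0.5, 3.0]

# With zero tolerance the tiny drift is reported as nondeterminism.
strict_ok = is_reentrant(make_drifting_grad(1e-14), xs, nondet_tol=0.0)

# With the 1e-12 tolerance from the PR the same drift is accepted.
tolerant_ok = is_reentrant(make_drifting_grad(1e-14), xs, nondet_tol=1e-12)
```

The tolerance only relaxes the comparison between repeated evaluations; it does not loosen the check of analytical against numerical gradients.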
Tests now passing
test_foreach_copy_with_multi_dtypes (5 dtype combinations)
test_fn_gradgrad_cholesky_solve_cuda_complex128