Conversation


@adolago adolago commented Dec 20, 2025

This PR fixes two ROCm test failures by applying a skip and a tolerance that follow existing patterns used for similar issues.

Changes

1. Skip profiler check in foreach tests on ROCm

The foreach tests verify that multi_tensor_apply_kernel runs by checking profiler output. On ROCm, rocTracer sometimes fails to detect the kernel even when it executes correctly—the same issue NVIDIA has with CUPTI on CUDA 12.6/12.8.

We already skip this profiler check for those CUDA versions (#148681), so this extends that skip to ROCm (#97167).
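
The guard is roughly the following shape (a sketch under assumptions, not the exact test code: TEST_WITH_ROCM is the existing flag in torch.testing._internal.common_utils, while the helper name and the literal CUDA version comparison are illustrative):

    # Sketch only; the real check in test_foreach.py may use different helpers
    # for the CUDA version comparison.
    import torch
    from torch.testing._internal.common_utils import TEST_WITH_ROCM

    cuda_version = torch.version.cuda  # None on ROCm builds
    skip_profiler_check = TEST_WITH_ROCM or cuda_version in ("12.6", "12.8")

    def assert_fastpath_kernel_seen(kernel_names):
        # kernel_names: kernel names collected from a profiler run (hypothetical helper).
        if skip_profiler_check:
            return  # rocTracer/CUPTI may miss the kernel even though it ran
        assert any("multi_tensor_apply_kernel" in name for name in kernel_names)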

2. Add nondeterminism tolerance for cholesky_solve gradcheck

hipSOLVER's backward pass for complex types has slight numerical variation between runs, which causes gradcheck's reentrant test to fail. Adding GRADCHECK_NONDET_TOL (1e-12) fixes this—the same approach used for addmm, mm, and other ops with similar behavior.
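
At the gradcheck level the tolerance behaves roughly as below (a minimal CPU-runnable sketch; the PR itself sets the tolerance on the op's test entry via gradcheck_nondet_tol rather than calling gradcheck directly):

    # Minimal sketch of what the tolerance means for gradcheck.
    import torch
    from torch.autograd import gradcheck

    # Lower-triangular Cholesky factor and right-hand side as leaf inputs.
    u = torch.tensor([[2.0, 0.0], [0.5, 1.5]], dtype=torch.complex128, requires_grad=True)
    b = torch.randn(2, 1, dtype=torch.complex128, requires_grad=True)

    # nondet_tol=0.0 (the default) reports any run-to-run backward difference as
    # "Backward is not reentrant"; 1e-12 absorbs hipSOLVER's tiny variation.
    assert gradcheck(torch.cholesky_solve, (b, u), nondet_tol=1e-12)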

Fixes #164193

Tests now passing

  • test_foreach_copy_with_multi_dtypes (5 dtype combinations)
  • test_fn_gradgrad_cholesky_solve_cuda_complex128

@adolago adolago requested a review from mruberry as a code owner December 20, 2025 23:25

pytorch-bot bot commented Dec 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/170964

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 742a068 with merge base 58fac80:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Dec 20, 2025

CLA Signed
The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added the module: rocm (AMD GPU support for Pytorch) and release notes: foreach_frontend (release notes category) labels Dec 20, 2025

ROCm's rocTracer has kernel name detection issues similar to NVIDIA's
CUPTI on CUDA 12.6/12.8. The profiler may fail to detect the
multi_tensor_apply_kernel even when it runs correctly.

This follows the same pattern as the existing CUDA 12.6/12.8 skip,
allowing the fastpath to run without attempting to verify via profiler.

References:
- CUDA profiler issue: pytorch#148681
- ROCm profiler issues: pytorch#97167

Enables 5+ foreach tests on ROCm that were failing due to profiler
detection issues, not actual fastpath problems.
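
For context, a profiler-based fastpath check looks roughly like this (illustrative only; the real test in test_foreach.py is structured differently and covers many ops):

    # Illustrative sketch of checking for the foreach fastpath kernel via the profiler.
    import torch
    from torch.profiler import profile, ProfilerActivity

    if torch.cuda.is_available():  # also true on ROCm builds
        tensors = [torch.randn(100, device="cuda") for _ in range(10)]
        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            torch._foreach_add_(tensors, 1.0)
        names = [evt.key for evt in prof.key_averages()]
        # On ROCm, rocTracer may not report the kernel even when the fastpath ran,
        # which is why the test no longer asserts on this there.
        print(any("multi_tensor_apply_kernel" in n for n in names))
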
ROCm's hipSOLVER exhibits nondeterministic behavior in backward passes
for complex types, causing gradcheck to fail with 'backward is not
reentrant' errors.

This adds GRADCHECK_NONDET_TOL (1e-12) to allow small numerical
differences in gradient reentrance checks, following the pattern used
by other ops with similar nondeterminism (addmm, mm, etc).

Fixes: pytorch#164193
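
Concretely, the reentrancy check boils down to computing the same gradient twice and comparing, with GRADCHECK_NONDET_TOL as the allowed difference. A CPU-runnable sketch (on CPU the difference is typically exactly zero; on ROCm it can be nonzero but tiny):

    # Sketch of the comparison gradcheck's reentrancy test performs.
    import torch

    def grad_once(b, u):
        b = b.detach().clone().requires_grad_(True)
        out = torch.cholesky_solve(b, u)
        (g,) = torch.autograd.grad(out.abs().sum(), b)
        return g

    u = torch.tensor([[2.0, 0.0], [0.5, 1.5]], dtype=torch.complex128)
    b = torch.randn(2, 1, dtype=torch.complex128)
    diff = (grad_once(b, u) - grad_once(b, u)).abs().max().item()
    # gradcheck fails with "backward is not reentrant" if diff exceeds nondet_tol;
    # GRADCHECK_NONDET_TOL (1e-12) bounds hipSOLVER's run-to-run variation.
    print(diff)
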
@adolago adolago force-pushed the fix-rocm-foreach-cholesky branch from 5c51272 to 742a068 on December 20, 2025 23:41