Conversation


@adolago adolago commented Dec 20, 2025

This PR fixes two ROCm test failures by applying a skip and a tolerance that follow existing patterns used for similar issues.

Changes

1. Skip profiler check in foreach tests on ROCm

The foreach tests verify that multi_tensor_apply_kernel runs by checking profiler output. On ROCm, rocTracer sometimes fails to detect the kernel even when it executes correctly—the same issue NVIDIA has with CUPTI on CUDA 12.6/12.8.

We already skip this profiler check for those CUDA versions (#148681), so this extends that skip to ROCm (#97167).
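
The guard is roughly the following shape (a sketch under assumptions, not the exact test code: TEST_WITH_ROCM is the existing flag in torch.testing._internal.common_utils, while the helper name and the literal CUDA version comparison are illustrative):

    # Sketch only; the real check in test_foreach.py may use different helpers
    # for the CUDA version comparison.
    import torch
    from torch.testing._internal.common_utils import TEST_WITH_ROCM

    cuda_version = torch.version.cuda  # None on ROCm builds
    skip_profiler_check = TEST_WITH_ROCM or cuda_version in ("12.6", "12.8")

    def assert_fastpath_kernel_seen(kernel_names):
        # kernel_names: kernel names collected from a profiler run (hypothetical helper).
        if skip_profiler_check:
            return  # rocTracer/CUPTI may miss the kernel even though it ran
        assert any("multi_tensor_apply_kernel" in name for name in kernel_names)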

2. Add nondeterminism tolerance for cholesky_solve gradcheck

hipSOLVER's backward pass for complex types has slight numerical variation between runs, which causes gradcheck's reentrant test to fail. Adding GRADCHECK_NONDET_TOL (1e-12) fixes this—the same approach used for addmm, mm, and other ops with similar behavior.
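
At the gradcheck level the tolerance behaves roughly as below (a minimal CPU-runnable sketch; the PR itself sets the tolerance on the op's test entry via gradcheck_nondet_tol rather than calling gradcheck directly):

    # Minimal sketch of what the tolerance means for gradcheck.
    import torch
    from torch.autograd import gradcheck

    # Lower-triangular Cholesky factor and right-hand side as leaf inputs.
    u = torch.tensor([[2.0, 0.0], [0.5, 1.5]], dtype=torch.complex128, requires_grad=True)
    b = torch.randn(2, 1, dtype=torch.complex128, requires_grad=True)

    # nondet_tol=0.0 (the default) reports any run-to-run backward difference as
    # "Backward is not reentrant"; 1e-12 absorbs hipSOLVER's tiny variation.
    assert gradcheck(torch.cholesky_solve, (b, u), nondet_tol=1e-12)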

Fixes #164193

Tests now passing

  • test_foreach_copy_with_multi_dtypes (5 dtype combinations)
  • test_fn_gradgrad_cholesky_solve_cuda_complex128

@adolago adolago requested a review from mruberry as a code owner December 20, 2025 23:25

pytorch-bot bot commented Dec 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/170964

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 742a068 with merge base 58fac80:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Dec 20, 2025

CLA Signed
The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added the module: rocm (AMD GPU support for Pytorch) and release notes: foreach_frontend (release notes category) labels Dec 20, 2025

ROCm's rocTracer has kernel name detection issues similar to NVIDIA's
CUPTI on CUDA 12.6/12.8. The profiler may fail to detect the
multi_tensor_apply_kernel even when it runs correctly.

This follows the same pattern as the existing CUDA 12.6/12.8 skip,
allowing the fastpath to run without attempting to verify via profiler.

References:
- CUDA profiler issue: pytorch#148681
- ROCm profiler issues: pytorch#97167

Enables 5+ foreach tests on ROCm that were failing due to profiler
detection issues, not actual fastpath problems.
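
For context, a profiler-based fastpath check looks roughly like this (illustrative only; the real test in test_foreach.py is structured differently and covers many ops):

    # Illustrative sketch of checking for the foreach fastpath kernel via the profiler.
    import torch
    from torch.profiler import profile, ProfilerActivity

    if torch.cuda.is_available():  # also true on ROCm builds
        tensors = [torch.randn(100, device="cuda") for _ in range(10)]
        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            torch._foreach_add_(tensors, 1.0)
        names = [evt.key for evt in prof.key_averages()]
        # On ROCm, rocTracer may not report the kernel even when the fastpath ran,
        # which is why the test no longer asserts on this there.
        print(any("multi_tensor_apply_kernel" in n for n in names))
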
ROCm's hipSOLVER exhibits nondeterministic behavior in backward passes
for complex types, causing gradcheck to fail with 'backward is not
reentrant' errors.

This adds GRADCHECK_NONDET_TOL (1e-12) to allow small numerical
differences in gradient reentrance checks, following the pattern used
by other ops with similar nondeterminism (addmm, mm, etc).

Fixes: pytorch#164193
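
Concretely, the reentrancy check boils down to computing the same gradient twice and comparing, with GRADCHECK_NONDET_TOL as the allowed difference. A CPU-runnable sketch (on CPU the difference is typically exactly zero; on ROCm it can be nonzero but tiny):

    # Sketch of the comparison gradcheck's reentrancy test performs.
    import torch

    def grad_once(b, u):
        b = b.detach().clone().requires_grad_(True)
        out = torch.cholesky_solve(b, u)
        (g,) = torch.autograd.grad(out.abs().sum(), b)
        return g

    u = torch.tensor([[2.0, 0.0], [0.5, 1.5]], dtype=torch.complex128)
    b = torch.randn(2, 1, dtype=torch.complex128)
    diff = (grad_once(b, u) - grad_once(b, u)).abs().max().item()
    # gradcheck fails with "backward is not reentrant" if diff exceeds nondet_tol;
    # GRADCHECK_NONDET_TOL (1e-12) bounds hipSOLVER's run-to-run variation.
    print(diff)
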
@adolago adolago force-pushed the fix-rocm-foreach-cholesky branch from 5c51272 to 742a068 on December 20, 2025 23:41