[Issue] Linux gfx1151: Run rocm-sdk sanity tests fails — rocminfo crashes with SIGSEGV #5779

@TriveniTadapaneni

Context

  • Workflow: Release portable Linux PyTorch Wheels
  • Workflow file: .github/workflows/release_portable_linux_pytorch_wheels.yml
  • Failing run: ↗ View run
  • Platform: Linux
  • Impacted Arch: gfx1151
  • PyTorch Versions: release/2.9, release/2.10, release/2.12, nightly
  • Python Versions: all (3.10, 3.11, 3.12, 3.13, 3.14)
  • ROCm nightly: 7.14.0a20260611
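As a quick cross-check of the job count, the version lists above can be expanded into a matrix. This is only a sketch: the names `PYTORCH_REFS`, `PYTHON_VERSIONS`, and `job_matrix` are hypothetical and are not taken from the workflow file, which may build its matrix differently.

```python
from itertools import product

# Values copied from the Context list; the variable names are illustrative only.
PYTORCH_REFS = ["release/2.9", "release/2.10", "release/2.12", "nightly"]
PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14"]


def job_matrix():
    """Return every (python_version, pytorch_ref) pair for one GPU arch."""
    return [(py, torch) for torch, py in product(PYTORCH_REFS, PYTHON_VERSIONS)]


jobs = job_matrix()
print(len(jobs))  # 4 PyTorch refs x 5 Python versions
```

Four PyTorch refs times five Python versions gives 20 combinations, which matches the 20 failing jobs and the 20 log entries listed under Full Logs.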

Summary

All 20 test jobs for gfx1151 fail in the Run rocm-sdk sanity tests step. The testConsoleScripts test in rocm_sdk.tests.core_test.ROCmCoreTest calls the rocminfo console-script wrapper, which crashes with SIGSEGV (exit signal 11). The failure is consistent across multiple distinct runners (linux-strix-halo-gpu-rocm-3, -6, -7, -9, CS-RORDMZ-DT239, CS-RORDMZ-DT241), ruling out a single-runner issue.

Error

ERROR: testConsoleScripts (rocm_sdk.tests.core_test.ROCmCoreTest) [Check console-script rocminfo]
  File ".../.venv/lib/python3.10/site-packages/rocm_sdk/tests/core_test.py", line 141, in testConsoleScripts
    output_text = subprocess.check_output(...)
subprocess.CalledProcessError: Command '[PosixPath('/.../rocminfo')]' died with <Signals.SIGSEGV: 11>.
FAILED (errors=1)
##[error]Process completed with exit code 1.
##[error]Executing the custom container implementation failed. Please contact your self hosted runner administrator.

Counts: 1 ERROR / 19 tests run, across all 20 jobs
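For anyone reproducing this outside the test suite, the "died with <Signals.SIGSEGV: 11>" message can be told apart from an ordinary non-zero exit with a few lines of Python. On POSIX, `subprocess` reports a child killed by signal N as return code -N. The sketch below assumes `rocminfo` is on `PATH` in the activated venv; the helper name `run_and_classify` is hypothetical and is not part of rocm_sdk.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd and report whether it succeeded, exited non-zero, or was killed.

    Returns a (status, detail) tuple:
      ("ok", <combined output>)  on exit code 0
      ("exit", "<code>")         on a non-zero exit code
      ("signal", "<SIGNAME>")    when the process was terminated by a signal
    """
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        if exc.returncode < 0:
            # POSIX: a negative return code -N means "killed by signal N".
            return ("signal", signal.Signals(-exc.returncode).name)
        return ("exit", str(exc.returncode))
    return ("ok", out.decode(errors="replace"))
```

Calling `run_and_classify(["rocminfo"])` on an affected gfx1151 runner would be expected to return `("signal", "SIGSEGV")`, going by the error above, whereas a packaging or path problem would more likely surface as an `"exit"` status or a `FileNotFoundError`.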

Root Cause

The rocminfo binary shipped in the rocm-core==7.14.0a20260611 package segfaults on gfx1151 (Strix Halo) hardware. The regression appears to be in the nightly ROCm SDK package for this build date and needs investigation in the rocminfo component or its runtime dependencies (the HSA runtime, hsa-rocr).

Full Logs

py 3.10, torch release/2.9
py 3.11, torch release/2.9
py 3.10, torch nightly
py 3.11, torch nightly
py 3.12, torch release/2.9
py 3.10, torch release/2.10
py 3.14, torch release/2.9
py 3.12, torch release/2.10
py 3.14, torch release/2.10
py 3.10, torch release/2.12
py 3.12, torch nightly
py 3.13, torch release/2.10
py 3.14, torch nightly
py 3.13, torch nightly
py 3.11, torch release/2.12
py 3.13, torch release/2.12
py 3.13, torch release/2.9
py 3.14, torch release/2.12
py 3.11, torch release/2.10
py 3.12, torch release/2.12
