@ysiraichi
Collaborator

This PR improves the error handling and error messages of the ParseDeviceString function, which parses a given device string into a BackendDevice. It also applies the same improvement to the GetDefaultDevice function, which relies on the parsing function.

Key Changes:

  • ParseDeviceString is deprecated in favor of SafeParseDeviceString
  • GetDefaultDevice is deprecated in favor of SafeGetDefaultDevice
  • The inner lambda in GetDefaultDevice that initialized the function-local static variable was refactored into a static function
  • A new test_device.cpp file was created to test the new SafeParseDeviceString error messages
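
To illustrate the split between a non-throwing parser and its throwing wrapper, here is a minimal Python sketch. It is not the C++ code in torch_xla/csrc/device.cpp: the names safe_parse_device_string, parse_device_string, and ParsedDevice are hypothetical, the tuple return value only stands in for a status-style result, and the sketch does not validate the device type. The error strings follow the messages shown in the examples further down.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedDevice:
    """Hypothetical stand-in for BackendDevice."""
    type: str
    index: int


def safe_parse_device_string(
    spec: str,
) -> Tuple[Optional[ParsedDevice], Optional[str]]:
    """Return (device, None) on success or (None, error message) on failure."""
    parts = spec.split(":")
    if len(parts) != 2:
        # Format error: the spec is not exactly "<type>:<index>".
        return None, (
            f"expected the device string `{spec}` to be in the format: "
            "`<type>:<index>`."
        )
    dev_type, raw_index = parts
    try:
        index = int(raw_index)
    except ValueError as e:
        # Integer parsing error: wrap the low-level message with context.
        return None, f"error while parsing the device spec `{spec}`: {e}"
    return ParsedDevice(dev_type, index), None


def parse_device_string(spec: str) -> ParsedDevice:
    """Throwing wrapper, analogous in spirit to the deprecated entry point."""
    device, error = safe_parse_device_string(spec)
    if error is not None:
        raise RuntimeError(error)
    return device
```

Callers that can handle the failure inspect the returned error themselves, while existing callers keep the raising behavior through the wrapper.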

Example

I'm using the wait_device_ops function here because it takes the device string as a parameter and runs ParseDeviceString, which in turn calls the safe version.

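import traceback
import torch_xla.core.xla_model as xm


def run(dev):
    print(f"==== Running: {dev}")
    try:
        xm.wait_device_ops([dev])
    except Exception:
        # Print the traceback and continue so every input is exercised.
        traceback.print_exc()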


run("")
run("aaaaa")
run("aaaaa:1")
run("xla:")

Example 1: Wrong Format

run("aaaaa")

Before:

Traceback (most recent call last):
  File "ext/examples/device.py", line 8, in run
    xm.wait_device_ops([dev])
  File "pytorch/xla/torch_xla/core/xla_model.py", line 1091, in wait_device_ops
    torch_xla._XLAC._xla_wait_device_ops(devices=devices)
RuntimeError: Check failed: device_spec_parts.size() == 2 (2 vs. 1)Invalid device specification: aaaaa (at torch_xla/csrc/device.cpp:67)

Exception raised from operator& at torch_xla/csrc/runtime/tf_logging.cpp:26 (most recent call first):

After:

Traceback (most recent call last):
  File "ext/examples/device.py", line 8, in run
    xm.wait_device_ops([dev])
  File "pytorch/xla/torch_xla/core/xla_model.py", line 1091, in wait_device_ops
    torch_xla._XLAC._xla_wait_device_ops(devices=devices)
RuntimeError: expected the device string `aaaaa` to be in the format: `<type>:<index>`.

Status Propagation Trace:
    From: SafeParseDeviceString at torch_xla/csrc/device.cpp:78 (error: expected the device string `aaaaa` to be in the format: `<type>:<index>`.)
    From: ParseDeviceString at torch_xla/csrc/device.cpp:68

Exception raised from ThrowStatusError at torch_xla/csrc/status.cpp:128 (most recent call first):

Example 2: Integer Parsing Error

run("xla:bbbbbbbbbb")

Before:

Traceback (most recent call last):
  File "ext/examples/device.py", line 8, in run
    xm.wait_device_ops([dev])
  File "pytorch/xla/torch_xla/core/xla_model.py", line 1091, in wait_device_ops
    torch_xla._XLAC._xla_wait_device_ops(devices=devices)
ValueError: stoi

After:

Traceback (most recent call last):
  File "ext/examples/device.py", line 8, in run
    xm.wait_device_ops([dev])
  File "pytorch/xla/torch_xla/core/xla_model.py", line 1091, in wait_device_ops
    torch_xla._XLAC._xla_wait_device_ops(devices=devices)
RuntimeError: error while parsing the device spec `xla:bbbbbbbbbb`: stoi

Status Propagation Trace:
    From: SafeParseDeviceString at torch_xla/csrc/device.cpp:90 (error: error while parsing the device spec `xla:bbbbbbbbbb`: stoi)
    From: ParseDeviceString at torch_xla/csrc/device.cpp:68

Exception raised from ThrowStatusError at torch_xla/csrc/status.cpp:128 (most recent call first):
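
The "Status Propagation Trace" in the outputs above can be read as an error value that records each function it passes through on its way up. The Python sketch below is only a guess at that mechanism, inferred from the printed output; the Status class and its propagate and render methods are hypothetical, and the actual machinery in torch_xla/csrc/status.cpp may work differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Status:
    """Hypothetical error status that remembers where it passed through."""
    message: str
    trace: List[str] = field(default_factory=list)

    def propagate(self, function: str, location: str) -> "Status":
        # Only the first frame repeats the error text, as in the trace above.
        suffix = f" (error: {self.message})" if not self.trace else ""
        self.trace.append(f"From: {function} at {location}{suffix}")
        return self

    def render(self) -> str:
        lines = [self.message, "", "Status Propagation Trace:"]
        lines += [f"    {frame}" for frame in self.trace]
        return "\n".join(lines)


status = Status("error while parsing the device spec `xla:bbbbbbbbbb`: stoi")
status.propagate("SafeParseDeviceString", "torch_xla/csrc/device.cpp:90")
status.propagate("ParseDeviceString", "torch_xla/csrc/device.cpp:68")
print(status.render())
```

Keeping the trace on the error value, rather than relying on an exception backtrace, lets the message show the path through the C++ functions that forwarded the error.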

@ysiraichi ysiraichi merged commit 37eee05 into master Nov 14, 2025
37 of 38 checks passed