Conversation


@stliibm stliibm commented Sep 18, 2025

On big-endian machines, there are many test failures due to missing or wrong byte swaps. I've created a couple of patches to fix those failures, in numpy_helper.py::from_array() and helper.py::make_tensor() and in various test cases.

Some of the fixed test cases (test_make_tensor_raw) are also mentioned in the issue from 2024-07-14 created by tehbone:
s390x test failures - test_make_tensor_raw #6181

Especially the last commit, "Adjust remaining tests which are using tobytes().", is not strictly required, as none of the corresponding tests failed before. But it makes the tests more consistent than before.

In the end I'm down to 7 remaining failures:

7 failed, 5314 passed, 3660 skipped, 31 warnings
FAILED onnx/test/helper_test.py::TestHelperTensorFunctions::test_make_bfloat16_tensor_raw
FAILED onnx/test/test_backend_reference.py::OnnxBackendNodeModelTest::test_cast_BFLOAT16_to_FLOAT_cpu
FAILED onnx/test/test_backend_reference.py::OnnxBackendNodeModelTest::test_cast_FLOAT_to_BFLOAT16_cpu
FAILED onnx/test/test_backend_reference.py::OnnxBackendNodeModelTest::test_castlike_BFLOAT16_to_FLOAT_cpu
FAILED onnx/test/test_backend_reference.py::OnnxBackendNodeModelTest::test_castlike_BFLOAT16_to_FLOAT_expanded_cpu
FAILED onnx/test/test_backend_reference.py::OnnxBackendNodeModelTest::test_castlike_FLOAT_to_BFLOAT16_cpu
FAILED onnx/test/test_backend_reference.py::OnnxBackendNodeModelTest::test_castlike_FLOAT_to_BFLOAT16_expanded_cpu

Those failures happen because the ml_dtypes bfloat16 datatype does not byteswap. I've already filed an issue:
"bfloat16: byteswap does not swap bytes #308"
There is already a pull request fixing this issue, but the assigned reviewer has not yet responded:
"Fix bfloat16 byteswap #311"
With a locally fixed ml_dtypes, the remaining tests are also passing.

How are those dependency issues solved in onnx?
Is the required version bumped as soon as fixed and released in ml_dtypes?
Or shall we implement a workaround with a different approach to swap bytes for the bfloat16 datatype in the onnx sources?
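If a workaround in the onnx sources is preferred, one option is to byteswap bfloat16 data through a uint16 view, since bfloat16 is a 2-byte type. A minimal sketch, assuming ml_dtypes is available (it is an onnx dependency) and using an illustrative value:

```python
import numpy as np
import ml_dtypes

# bfloat16(1.0) has the bit pattern 0x3F80 (top 16 bits of float32 1.0).
a = np.array([1.0], dtype=ml_dtypes.bfloat16)

# Workaround sketch: byteswap() on ml_dtypes.bfloat16 is currently a no-op,
# but viewing the buffer as uint16 lets numpy swap the two bytes itself.
swapped = a.view(np.uint16).byteswap().view(ml_dtypes.bfloat16)

print(hex(int(a.view(np.uint16)[0])))        # 0x3f80
print(hex(int(swapped.view(np.uint16)[0])))  # 0x803f
```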

On big-endian systems like s390x, the already existing byteswapping approach
does not work as expected.  If the array is accessed directly, it works fine:

    print(array[0][0])
    # -29045 = 0x8E8B

    array = array.view(array.dtype.newbyteorder("<"))

    print(array[0][0])
    # -29810 = 0x8B8E

But tobytes() returns the same bytes before and after the existing approach:

    print(array.tobytes().hex())
    # 8e8b...

Similar to to_array(), the new approach uses byteswap(), so tobytes()
really returns the swapped bytes.

Furthermore, I've removed _IS_LITTLE_ENDIAN and used sys.byteorder directly as
also done in to_array().
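The difference between the two operations can be demonstrated on any host with a small sketch (hypothetical values, not taken from the test suite):

```python
import numpy as np

a = np.array([0x1234], dtype=np.int16)

# view(newbyteorder) only reinterprets the existing buffer; the raw bytes
# returned by tobytes() stay the same, which is why raw_data stayed wrong.
viewed = a.view(a.dtype.newbyteorder())
print(viewed.tobytes() == a.tobytes())         # True

# byteswap() actually rewrites the buffer, so tobytes() returns swapped bytes.
swapped = a.byteswap()
print(swapped.tobytes() == a.tobytes()[::-1])  # True
```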

This fixes those tests on s390x:
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_int16 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_uint16 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_int32 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_int64 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_complex128 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_complex64 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_float FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_float16 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_from_dict_values_are_np_arrays_of_float FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_from_dict_values_are_np_arrays_of_int FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_from_dict_values_are_np_arrays_of_ints FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_00_FLOAT FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_03_UINT16 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_04_INT16 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_05_INT32 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_06_INT64 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_08_FLOAT16 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_09_DOUBLE FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_10_UINT32 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_11_UINT64 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_12_COMPLEX64 FAILED
onnx/test/numpy_helper_test.py::TestNumpyHelper::test_to_array_from_array_13_COMPLEX128 FAILED
onnx/test/test_external_data.py::TestLoadExternalData_0_protobuf::test_load_external_data FAILED
onnx/test/test_external_data.py::TestLoadExternalData_0_protobuf::test_load_external_data_for_model FAILED
onnx/test/test_external_data.py::TestLoadExternalData_0_protobuf::test_save_external_data FAILED
onnx/test/test_external_data.py::TestLoadExternalData_1_textproto::test_load_external_data FAILED
onnx/test/test_external_data.py::TestLoadExternalData_1_textproto::test_load_external_data_for_model FAILED
onnx/test/test_external_data.py::TestLoadExternalData_1_textproto::test_save_external_data FAILED
onnx/test/test_external_data.py::TestLoadExternalDataSingleFile_0_protobuf::test_load_external_single_file_data FAILED
onnx/test/test_external_data.py::TestLoadExternalDataSingleFile_0_protobuf::test_save_external_single_file_data FAILED
onnx/test/test_external_data.py::TestLoadExternalDataSingleFile_1_textproto::test_load_external_single_file_data FAILED
onnx/test/test_external_data.py::TestLoadExternalDataSingleFile_1_textproto::test_save_external_single_file_data FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_0_protobuf::test_convert_model_to_external_data_converts_attribute_values FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_0_protobuf::test_convert_model_to_external_data_from_one_file_with_location FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_0_protobuf::test_convert_model_to_external_data_without_size_threshold FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_0_protobuf::test_save_model_does_convert_and_saves_the_model FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_0_protobuf::test_save_model_without_loading_external_data FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_1_textproto::test_convert_model_to_external_data_converts_attribute_values FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_1_textproto::test_convert_model_to_external_data_from_one_file_with_location FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_1_textproto::test_convert_model_to_external_data_without_size_threshold FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_1_textproto::test_save_model_does_convert_and_saves_the_model FAILED
onnx/test/test_external_data.py::TestSaveAllTensorsAsExternalData_1_textproto::test_save_model_without_loading_external_data FAILED
onnx/test/inference_function_test.py::TestInferenceFunctionCall::test_reshape_inference FAILED
onnx/test/model_container_refeval_test.py::TestLargeOnnxReferenceEvaluator::test_large_multi_files FAILED
onnx/test/model_container_refeval_test.py::TestLargeOnnxReferenceEvaluator::test_large_one_weight_file FAILED
onnx/test/model_container_refeval_test.py::TestLargeOnnxReferenceEvaluator::test_large_onnx_no_large_initializer FAILED
onnx/test/reference_evaluator_model_test.py::TestReferenceEvaluatorModel::test_loop_fft FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_function_attribute_nested_nested_graph FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_if FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_if_function FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_00 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_01 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_02 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_03 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_04 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_05 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_06 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_07 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_08 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_09 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_10 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_11 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_12 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_13 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_14 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_15 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_16 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_op_reduce_17 FAILED
onnx/test/reference_evaluator_test.py::TestReferenceEvaluator::test_reference_evaluator_lr_clip FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_constant FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_constant_function FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_constant_graph FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_initializer FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_range FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_range_function FAILED
onnx/test/tools_test.py::TestToolsFunctions::test_replace_range_graph FAILED

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
On big-endian systems like s390x, those tests are failing because creating a
numpy array via numpy_helper.to_array(tensor) assumes that raw_data within the
tensor is formatted as little-endian and therefore performs a byte swap on
big-endian systems.

This fixes those tests on s390x:
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.INT16] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.INT32] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.INT64] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.FLOAT16] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.FLOAT] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.DOUBLE] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.COMPLEX64] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.COMPLEX128] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.UINT16] FAILED
onnx/test/helper_test.py::test_make_tensor_raw[TensorProto.UINT32] FAILED

Note: Some of those tests are also mentioned in the following issue from 2024-01-14:
s390x test failures - test_make_tensor_raw onnx#6181

The same applies to test_make_bfloat16_tensor_raw. Note that this test did
not fail before, as byteswapping an ml_dtypes.bfloat16 array currently does
not swap any bytes at all.  The test will pass as soon as either
ml_dtypes.bfloat16 is fixed or an alternative approach for byteswapping is
used in numpy_helper.to_array().
See ml_dtypes issue: "bfloat16: byteswap does not swap bytes onnx#308"
jax-ml/ml_dtypes#308

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
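For reference, the little-endian contract of raw_data can be sketched as follows (illustrative values, not the actual to_array implementation):

```python
import numpy as np

# raw_data in a TensorProto is defined to be little-endian. A reader must
# therefore interpret it with an explicit little-endian dtype and then
# convert to the native byte order, regardless of the host:
raw = bytes.fromhex("010000000200000003000000")  # three int32 values
arr = np.frombuffer(raw, dtype=np.dtype("<i4")).astype(np.int32)
print(arr.tolist())  # [1, 2, 3]
```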
On s390x, I get those "RuntimeWarning: invalid value encountered in cast"
in onnx/test/helper_test.py for functions test_make_tensor_vals and
test_make_tensor_raw and the UINT8, UINT16, UINT32, UINT64 data types.

Fixed those warnings by using only non-negative floats for those datatypes.

See similar commit a90d702
Use randint instead of randn to prevent undefined cast (float to uint) (onnx#4804)

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
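The fix follows the same idea as commit a90d702: avoid casting negative floats to unsigned integer types, which is undefined. A minimal sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Casting negative floats to unsigned ints is undefined behaviour and can
# emit "RuntimeWarning: invalid value encountered in cast" on some platforms.
# Taking the absolute value keeps the test data valid for the UINT* types:
vals = np.abs(rng.standard_normal(4)) * 100.0
as_uint16 = vals.astype(np.uint16)

print(as_uint16.dtype)  # uint16
```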
…) on s390x

On big-endian systems like s390x, some external-data tests are failing because
the tensors are created with big- instead of little-endian bytes.  After saving
and loading the model, the tensors are converted to a np.ndarray whose bytes
are byteswapped on big-endian machines from little- to big-endian.

Thus this patch creates the tensors in the correct byte order.

This fixes those tests on s390x:
onnx/test/test_external_data.py::TestExternalDataToArray_0_protobuf::test_save_model_with_external_data_multiple_times FAILED
onnx/test/test_external_data.py::TestExternalDataToArray_0_protobuf::test_to_array_with_external_data FAILED
onnx/test/test_external_data.py::TestExternalDataToArray_1_textproto::test_save_model_with_external_data_multiple_times FAILED
onnx/test/test_external_data.py::TestExternalDataToArray_1_textproto::test_to_array_with_external_data FAILED

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
If a numpy array is passed to helper.py::make_tensor(raw=True), it currently
just uses tobytes().  Compared to numpy_helper.py::from_array(), this is wrong
for the 4-bit datatypes, as those need to be packed.
Furthermore, the bytes need to be stored in little-endian byte order (only
relevant on big-endian machines like s390x).

Thus this patch adds the byteswap and the packing. Furthermore, the test case
onnx/test/helper_test.py::test_make_tensor_raw() is extended to also check
passing numpy arrays.

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
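The packing for the 4-bit types can be sketched as follows. This is a hypothetical helper, not the actual make_tensor code, packing the low nibble first as the ONNX raw_data layout for 4-bit types expects:

```python
import numpy as np

def pack_4bit(values: np.ndarray) -> bytes:
    """Pack two 4-bit values per byte, low nibble first (hypothetical helper)."""
    flat = values.astype(np.uint8).ravel()
    if flat.size % 2:
        flat = np.append(flat, np.uint8(0))  # pad odd-length input
    low = flat[0::2] & 0x0F
    high = (flat[1::2] & 0x0F) << 4
    return (low | high).astype(np.uint8).tobytes()

print(pack_4bit(np.array([1, 2, 3, 4])).hex())  # 2143
```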
On big-endian machines, we have to ensure that the bytes stored in a
tensor are converted to little-endian.  While searching the sources,
these are the remaining places - all test cases - where tobytes() is
used.  This patch either adds a byteswap or now passes a numpy array
to make_tensor, which then performs the byteswap, instead of using
tobytes().

Note that none of the corresponding tests failed before.  Thus in theory
we do not need those changes.  But they make the tests more consistent
than before.

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
@stliibm stliibm requested review from a team as code owners September 18, 2025 14:51
@github-project-automation github-project-automation bot moved this to In progress in PR Tracker Sep 18, 2025

@justinchuby justinchuby left a comment


Thanks for making the fixes!

@justinchuby

Is the required version bumped as soon as fixed and released in ml_dtypes?

Yes we can bump that for s390x systems.


codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.43%. Comparing base (05b413e) to head (0b9908a).
⚠️ Report is 82 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
onnx/numpy_helper.py 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7309      +/-   ##
==========================================
+ Coverage   54.38%   54.43%   +0.04%     
==========================================
  Files         511      511              
  Lines       31916    31930      +14     
  Branches     2870     2873       +3     
==========================================
+ Hits        17359    17381      +22     
+ Misses      13767    13760       -7     
+ Partials      790      789       -1     



andife commented Sep 19, 2025

@cyyever Maybe we should run the tests for s390x (big-endian) in some docker container once a week? So far we haven't tested anything there, neither the build process nor the tests.


cyyever commented Sep 19, 2025

@andife I agree


cyyever commented Sep 19, 2025

We should fix the serialisation endianness and record it in the documentation.

@justinchuby justinchuby self-assigned this Sep 19, 2025

stliibm commented Sep 19, 2025

Is the required version bumped as soon as fixed and released in ml_dtypes?

Yes we can bump that for s390x systems.

Sounds great. Thus we first have to wait until the pull request for ml_dtypes is reviewed and merged and a new release is created.

@justinchuby

@stliibm do you know what the best way of setting up an s390x CI on GitHub is?

This patch introduces a common function numpy_helper.py::tobytes_little_endian()
to convert a numpy array to bytes. On big-endian systems it also swaps the
bytes, which ensures that the returned bytes are in little-endian order.

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
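The helper might look roughly like this (a sketch of the idea, not the exact merged implementation):

```python
import sys
import numpy as np

def tobytes_little_endian(array: np.ndarray) -> bytes:
    """Return the array's bytes in little-endian order on any host."""
    byteorder = array.dtype.byteorder
    if byteorder == ">" or (byteorder == "=" and sys.byteorder == "big"):
        # astype also records the new byte order in the dtype,
        # unlike byteswap(), which only rewrites the buffer.
        array = array.astype(array.dtype.newbyteorder("<"))
    return array.tobytes()

print(tobytes_little_endian(np.array([1, 256], dtype=">i2")).hex())  # 01000001
```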

stliibm commented Sep 19, 2025

@stliibm do you know what the best way of setting up an s390x CI on GitHub is?

From a colleague, I've got an example of how to use a GitHub Actions runner on s390x. In the link below, it was requested for other projects:
"New Project: GGML-ORG / RamaLama #8"

Here is also an onboarding guide:
Welcome to GitHub Actions on Power, Z, and LinuxONE

I hope this helps.


@justinchuby justinchuby left a comment


Thanks!

@github-project-automation github-project-automation bot moved this from In progress to Reviewer approved in PR Tracker Sep 19, 2025
@github-project-automation github-project-automation bot moved this from Reviewer approved to Review in progress in PR Tracker Sep 20, 2025
This adds a doc-string and the versionadded directive to tobytes_little_endian().

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
The check for when to swap the bytes to little-endian is now more accurate, as
the byte order of the dtype is actually checked.
Furthermore, we avoid using byteswap() and instead use astype(), as the latter
also records the correct byte order.

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>
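The difference between the two approaches can be sketched with illustrative values:

```python
import numpy as np

# A big-endian int16 array, e.g. as read from raw data on s390x.
a = np.array([1], dtype=np.dtype(">i2"))

# byteswap() rewrites the buffer but keeps the dtype's byte-order marker,
# so the array now misreports its value (0x0001 is read back as 0x0100).
swapped = a.byteswap()
print(int(swapped[0]))      # 256

# astype() converts the values into the new dtype, so the bytes and the
# recorded byte order stay consistent.
converted = a.astype(a.dtype.newbyteorder("<"))
print(int(converted[0]))    # 1
print(converted.tobytes())  # b'\x01\x00'
```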
@github-project-automation github-project-automation bot moved this from Review in progress to Reviewer approved in PR Tracker Sep 26, 2025

cyyever commented Sep 26, 2025

nice

@cyyever cyyever added the run release CIs Use this label to trigger release tests in CI label Sep 26, 2025
@justinchuby justinchuby added the auto update doc Generate md/proto files automatically using the CI pipeline label Sep 26, 2025
justinchuby and others added 2 commits September 26, 2025 07:52
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@justinchuby justinchuby removed the auto update doc Generate md/proto files automatically using the CI pipeline label Sep 26, 2025
@justinchuby justinchuby enabled auto-merge (squash) September 26, 2025 15:08
@justinchuby justinchuby merged commit f307a8d into onnx:main Sep 26, 2025
34 checks passed
@github-project-automation github-project-automation bot moved this from Reviewer approved to Done in PR Tracker Sep 26, 2025

stliibm commented Sep 29, 2025

@justinchuby @cyyever Thanks a lot for the review and for merging the branch. I've just left a comment in the ml_dtypes PR "Fix bfloat16 byteswap #311". Hopefully it gets reviewed soon; then the remaining bfloat16 test cases will also pass.


cyyever commented Sep 29, 2025

@stliibm Nice job

@titaiwangms titaiwangms added the module: CI pipelines Issues related to the CI pipeline label Oct 30, 2025
stliibm added a commit to stliibm/onnx that referenced this pull request Nov 24, 2025
In order to get the testsuite clean on s390x (big-endian), we need
the already committed onnx-commit:
"Fix testsuite fails on s390x (big-endian) (onnx#7309)"
onnx@f307a8d

AND the ml_dtypes-commit:
"Fix bfloat16 byteswap not working"
jax-ml/ml_dtypes@fd7fa4a

The ml_dtypes one has now landed in ml_dtypes v0.5.4 release:
https://github.com/jax-ml/ml_dtypes/releases/tag/v0.5.4
"Fixed bug in byte-swap operation for custom floats (onnx#311)"

This patch bumps the requirement for ml_dtypes to v0.5.4 only on s390x
as discussed in the original pull-request:
onnx#7309 (comment)
All other architectures still require ml_dtypes v0.5.0 as before.
andife pushed a commit that referenced this pull request Nov 24, 2025
In order to get the testsuite clean on s390x (big-endian), we need the
already committed onnx-commit:
"Fix testsuite fails on s390x (big-endian) (#7309)"
f307a8d

AND the ml_dtypes-commit:
"Fix bfloat16 byteswap not working"

jax-ml/ml_dtypes@fd7fa4a

The ml_dtypes one has now landed in ml_dtypes v0.5.4 release:
https://github.com/jax-ml/ml_dtypes/releases/tag/v0.5.4 
"Fixed bug in byte-swap operation for custom floats (#311)"


This patch bumps the requirement for ml_dtypes to v0.5.4 only on s390x,
as discussed in the original pull request:
#7309 (comment)
All other architectures still require ml_dtypes v0.5.0 as before.

Signed-off-by: Stefan Liebler <stli@linux.ibm.com>