
[XPU] Fix XPU kernel errors for paddle.diagonal_scatter #79008

Open
YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/PAD-215-xpu-error

Conversation

@YqGe585
Member

YqGe585 commented May 15, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Fixes XPU kernel errors in paddle.diagonal_scatter: the underlying XDNN fill_diagonal_tensor function rejected several data types (float32, int64, int32, float16, bool) with XDNN_INVALID_PARAM.

Root Cause

The XPU kernel at paddle/phi/kernels/xpu/fill_diagonal_tensor_kernel.cc delegated the diagonal fill logic entirely to the XDNN library's xpu::fill_diagonal_tensor function. That function does not support every data type for which Paddle registers the kernel, so calls with the unsupported dtypes fail at runtime.

Fix

Replaced the XDNN fill_diagonal_tensor call with a CPU round-trip approach:

  1. Copy the output tensor from XPU to CPU
  2. Copy the fill tensor from XPU to CPU
  3. Compute diagonal positions using CalMatDims (as the CPU/GPU kernels do)
  4. Overwrite diagonal values on CPU
  5. Copy the result back to XPU
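The host-side part of steps 3 and 4 can be sketched as follows. This is a minimal C++ sketch on plain `std::vector` buffers, assuming a contiguous row-major layout and the `paddle.diagonal` convention that element i of the diagonal sits at index i along axis1 and i + offset along axis2 (shifted onto axis1 when offset is negative). `DiagonalFlatIndices` and `FillDiagonalOnHost` are hypothetical names; this is not Paddle's `CalMatDims`, and the device copies (steps 1, 2 and 5) are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not Paddle's CalMatDims: returns the row-major flat
// indices of the diagonal selected by (offset, dim1, dim2). Indices are
// ordered like the fill tensor: remaining "batch" dims in their original
// order (row-major), with the diagonal position varying fastest.
std::vector<int64_t> DiagonalFlatIndices(const std::vector<int64_t>& shape,
                                         int64_t offset, int dim1, int dim2) {
  const int rank = static_cast<int>(shape.size());
  if (dim1 < 0) dim1 += rank;  // accept negative axes, Python-style
  if (dim2 < 0) dim2 += rank;
  assert(dim1 >= 0 && dim1 < rank && dim2 >= 0 && dim2 < rank && dim1 != dim2);

  // Row-major strides of the host-side output buffer.
  std::vector<int64_t> strides(static_cast<std::size_t>(rank), 1);
  for (int i = rank - 2; i >= 0; --i) strides[i] = strides[i + 1] * shape[i + 1];

  // Element i of the diagonal sits at (start1 + i) along dim1 and
  // (start2 + i) along dim2; a positive offset shifts along dim2.
  const int64_t start1 = offset >= 0 ? 0 : -offset;
  const int64_t start2 = offset >= 0 ? offset : 0;
  const int64_t len = std::max<int64_t>(
      0, std::min(shape[dim1] - start1, shape[dim2] - start2));

  std::vector<int> batch;  // every dim except dim1 and dim2
  int64_t batch_count = 1;
  for (int d = 0; d < rank; ++d) {
    if (d != dim1 && d != dim2) {
      batch.push_back(d);
      batch_count *= shape[d];
    }
  }

  std::vector<int64_t> result;
  result.reserve(static_cast<std::size_t>(batch_count * len));
  std::vector<int64_t> coord(batch.size(), 0);  // odometer over batch dims
  for (int64_t b = 0; b < batch_count; ++b) {
    int64_t base = 0;
    for (std::size_t k = 0; k < batch.size(); ++k) base += coord[k] * strides[batch[k]];
    for (int64_t i = 0; i < len; ++i) {
      result.push_back(base + (start1 + i) * strides[dim1] +
                       (start2 + i) * strides[dim2]);
    }
    // Advance the odometer; the last batch dim changes fastest.
    for (int k = static_cast<int>(batch.size()) - 1; k >= 0; --k) {
      if (++coord[k] < shape[batch[k]]) break;
      coord[k] = 0;
    }
  }
  return result;
}

// Step 4 on the host: overwrite the diagonal of `out` (already copied from
// the device) with `fill`, element by element. Works for any copyable T.
template <typename T>
void FillDiagonalOnHost(std::vector<T>& out, const std::vector<T>& fill,
                        const std::vector<int64_t>& indices) {
  if (fill.size() != indices.size()) {
    throw std::invalid_argument("fill size does not match diagonal size");
  }
  for (std::size_t i = 0; i < indices.size(); ++i) {
    out[static_cast<std::size_t>(indices[i])] = fill[i];
  }
}
```

Because the fill is a plain element-wise assignment templated on `T`, nothing in the loop depends on the dtype, which is why a host-side fill of this kind can cover every registered type.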

This approach:

  • Uses the same diagonal index computation algorithm as the CPU and GPU kernels
  • Works for all data types registered by the kernel
  • Follows established XPU kernel patterns (similar to unique_kernel.cc, generate_proposals_kernel.cc)
  • Leaves interface definitions and API semantics unchanged

Test Results

All 13 testable cases from PaddleAPITest/all_config.txt pass with max_abs_diff=0 and max_rel_diff=0 (bitwise-identical XPU and GPU outputs):

  • bool, complex64, float16, float32 (offset=0/1/-2, axis1=0/1, axis2=0/1), float64, int8, int16, int32, int64, uint8

complex128 is skipped by the test framework (known XPU platform limitation).
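The two reported metrics can be illustrated with a small comparison helper. This is a sketch under assumed definitions (absolute difference |actual - expected|, relative difference scaled by |expected|); PaddleAPITest's exact formulas and tolerances may differ, and `CompareOutputs` and `DiffStats` are hypothetical names. Both buffers are assumed to be already on the host and converted to double.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

struct DiffStats {
  double max_abs_diff;
  double max_rel_diff;
};

// Assumed metric definitions (hypothetical, not PaddleAPITest's code):
//   abs_diff = |actual - expected|
//   rel_diff = abs_diff / max(|expected|, smallest positive double)
// NaN handling is deliberately omitted from this sketch.
DiffStats CompareOutputs(const std::vector<double>& actual,
                         const std::vector<double>& expected) {
  if (actual.size() != expected.size()) {
    throw std::invalid_argument("output sizes differ");
  }
  const double tiny = std::numeric_limits<double>::min();
  DiffStats stats{0.0, 0.0};
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const double abs_diff = std::fabs(actual[i] - expected[i]);
    const double rel_diff = abs_diff / std::max(std::fabs(expected[i]), tiny);
    stats.max_abs_diff = std::max(stats.max_abs_diff, abs_diff);
    stats.max_rel_diff = std::max(stats.max_rel_diff, rel_diff);
  }
  return stats;
}
```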

Does this PR introduce a precision change?

Yes. XPU results for the previously failing data types are corrected and now match GPU.

… xpu::fill_diagonal_tensor XDNN call with CPU-side diagonal fill

The XDNN fill_diagonal_tensor function rejected several dtypes (float32,
int64, int32, float16, bool) with XDNN_INVALID_PARAM. Replace it with a
CPU round-trip approach that computes diagonal positions via CalMatDims
and fills values on CPU, matching the CPU/GPU kernel algorithm. This
eliminates the XPU kernel dependency on XDNN for the core fill logic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@paddle-bot

paddle-bot Bot commented May 15, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.
