[XPU] Fix XPU kernel errors for paddle.diagonal_scatter by YqGe585 · Pull Request #79008 · PaddlePaddle/Paddle

YqGe585 · 2026-05-15T07:16:57Z

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Fix XPU kernel errors for paddle.diagonal_scatter where the underlying XDNN fill_diagonal_tensor function rejected several data types (float32, int64, int32, float16, bool) with XDNN_INVALID_PARAM.

Root Cause

The XPU kernel at paddle/phi/kernels/xpu/fill_diagonal_tensor_kernel.cc delegated the diagonal fill logic entirely to the XDNN library's xpu::fill_diagonal_tensor function. This function does not support all the data types that Paddle registers the kernel for, causing runtime errors for those dtypes.

Fix

Replaced the XDNN fill_diagonal_tensor call with a CPU round-trip approach:

Copy the output tensor from XPU to CPU
Copy the fill tensor from XPU to CPU
Compute diagonal positions using CalMatDims (same as CPU/GPU kernel)
Overwrite diagonal values on CPU
Copy the result back to XPU
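
The host-side part of the steps above (computing the diagonal positions and overwriting them) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes a contiguous row-major buffer that has already been copied to the CPU, it assumes the fill tensor is laid out as the remaining (non-diagonal) dimensions followed by the diagonal length, and the names RowMajorStrides and FillDiagonalOnHost are hypothetical. The XPU-to-CPU and CPU-to-XPU copies and the real CalMatDims helper are deliberately left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: strides of a contiguous row-major tensor.
std::vector<int64_t> RowMajorStrides(const std::vector<int64_t>& shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}

// Hypothetical sketch: overwrite the diagonal selected by (dim1, dim2, offset)
// in a host buffer `out` of the given shape with the values in `fill`.
// Assumed layout of `fill`: [remaining dims..., diagonal length], row-major.
template <typename T>
void FillDiagonalOnHost(std::vector<T>* out,
                        const std::vector<int64_t>& shape,
                        const std::vector<T>& fill,
                        int64_t offset, int dim1, int dim2) {
  const int rank = static_cast<int>(shape.size());
  if (dim1 < 0 || dim2 < 0 || dim1 >= rank || dim2 >= rank || dim1 == dim2) {
    throw std::invalid_argument("dim1 and dim2 must be distinct valid axes");
  }
  const std::vector<int64_t> strides = RowMajorStrides(shape);

  // First element of the diagonal: a positive offset shifts along dim2,
  // a negative offset shifts along dim1.
  const int64_t row0 = offset >= 0 ? 0 : -offset;
  const int64_t col0 = offset >= 0 ? offset : 0;
  const int64_t diag_len = std::max<int64_t>(
      0, std::min(shape[dim1] - row0, shape[dim2] - col0));

  // Collect the axes that are not part of the diagonal plane.
  std::vector<int> outer_dims;
  int64_t outer_count = 1;
  for (int d = 0; d < rank; ++d) {
    if (d != dim1 && d != dim2) {
      outer_dims.push_back(d);
      outer_count *= shape[d];
    }
  }
  if (static_cast<int64_t>(fill.size()) != outer_count * diag_len) {
    throw std::invalid_argument("fill size does not match the diagonal view");
  }

  for (int64_t o = 0; o < outer_count; ++o) {
    // Decode the flat outer index into a base offset in `out`.
    int64_t rem = o;
    int64_t base = 0;
    for (int k = static_cast<int>(outer_dims.size()) - 1; k >= 0; --k) {
      const int d = outer_dims[k];
      base += (rem % shape[d]) * strides[d];
      rem /= shape[d];
    }
    // Walk the diagonal: advance one step along both dim1 and dim2.
    for (int64_t i = 0; i < diag_len; ++i) {
      const int64_t pos =
          base + (row0 + i) * strides[dim1] + (col0 + i) * strides[dim2];
      (*out)[pos] = fill[o * diag_len + i];
    }
  }
}
```

Because the function is templated on the element type and only performs index arithmetic and assignment, the same code path serves every dtype, which is the property the round-trip approach relies on. The trade-off is two extra device-host transfers per call.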

This approach:

Uses the same diagonal index computation algorithm as the CPU and GPU kernels
Works for all data types registered by the kernel
Follows established XPU kernel patterns (similar to unique_kernel.cc, generate_proposals_kernel.cc)
Leaves interface definitions and behavior unchanged

Test Results

All 13 testable cases from PaddleAPITest/all_config.txt pass with max_abs_diff=0, max_rel_diff=0 (bitwise identical XPU vs GPU outputs):

bool, complex64, float16, float32 (offset=0/1/-2, axis1=0/1, axis2=0/1), float64, int8, int16, int32, int64, uint8

complex128 is skipped by the test framework (known XPU platform limitation).

Does this PR introduce a precision change?

Yes — XPU precision corrected to align with GPU for previously failing data types.

… xpu::fill_diagonal_tensor XDNN call with CPU-side diagonal fill

The XDNN fill_diagonal_tensor function rejected several dtypes (float32, int64, int32, float16, bool) with XDNN_INVALID_PARAM. Replace it with a CPU round-trip approach that computes diagonal positions via CalMatDims and fills values on CPU, matching the CPU/GPU kernel algorithm. This eliminates the XPU kernel dependency on XDNN for the core fill logic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

paddle-bot · 2026-05-15T08:25:15Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/PAD-215-xpu-error
