
SYCL: skip grid-stride loop when not needed for MDRange #9250

Draft
masterleinad wants to merge 2 commits into kokkos:develop from masterleinad:sycl_grid_stride_loop


@masterleinad

Contributor

This is the SYCL analogue of #9142. Copilot wrote the code, since I wanted to try it out and the change seemed simple enough; I did some cleanup afterwards.
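For readers unfamiliar with the pattern: a grid-stride loop lets a fixed number of launched work-items cover an index range of any length. Each work-item starts at its global id and steps by the total number of work-items. When at least one work-item is launched per index, that loop runs at most once per work-item, so it can be replaced by a single bounds check. The following is a minimal host-side sketch in plain C++ that simulates work-items with an ordinary loop over a flattened 1D range. It is not the Kokkos SYCL backend code, and `grid_stride`, `single_pass`, `needs_grid_stride` and `run` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not Kokkos code: the outer loop over `gid`
// stands in for the launched work-items; `hits[i]` counts how often
// index i was processed.

// Grid-stride variant: work-item gid handles gid, gid + n_items, ...
void grid_stride(std::vector<int>& hits, std::size_t n_items) {
  const std::size_t n = hits.size();
  for (std::size_t gid = 0; gid < n_items; ++gid)
    for (std::size_t i = gid; i < n; i += n_items) ++hits[i];
}

// Single-pass variant: no loop, just a bounds check per work-item.
void single_pass(std::vector<int>& hits, std::size_t n_items) {
  const std::size_t n = hits.size();
  assert(n_items >= n);  // precondition: one work-item per index
  for (std::size_t gid = 0; gid < n_items; ++gid)
    if (gid < n) ++hits[gid];  // surplus work-items do nothing
}

// The grid-stride loop is only needed if the launch cannot cover the range.
bool needs_grid_stride(std::size_t n, std::size_t n_items) {
  return n_items < n;
}

// Choose the variant once, at "launch" time.
void run(std::vector<int>& hits, std::size_t n_items) {
  if (needs_grid_stride(hits.size(), n_items))
    grid_stride(hits, n_items);
  else
    single_pass(hits, n_items);
}
```

For example, with 10 indices and 4 work-items `run` takes the grid-stride path, while with 16 work-items it takes the single-pass path and the 6 surplus work-items do nothing. In both cases every index is processed exactly once. In a real kernel the saving presumably comes from removing the loop control from the hot path; how much that matters depends on the backend and compiler.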

before:

1: OverlapMDRangePolicy/N:200/M:10000/R:10                                                                                                                                        0.058 s         0.058 s            12       0.0106696     5.34192m               0.0101212                  5.08093m    5.08321m          5.041u
1: MDRangeStencil_2D_MDRange_LayoutRight/size:512/tile_size:0/iterations:1/manual_time                                                                                            0.076 ms        0.078 ms            1              1          8         32
1: MDRangeStencil_3D_MDRange_LayoutLeft/size:128/tile_size:0/iterations:1/manual_time                                                                                             0.093 ms        0.094 ms            1              1         32          2          4
1: MDRangeStencil_4D_MDRange_LayoutRight/size:32/tile_size:0/iterations:1/manual_time                                                                                             0.073 ms        0.074 ms            1              1          2          2          2         16
      Start  4: Kokkos_PerformanceTest_MDRangePolicy_Stream
4: Test command: /home/darndt/kokkos/build/core/perf_test/Kokkos_PerformanceTest_MDRangePolicy_Stream "--benchmark_counters_tabular=true" "--benchmark_out=Kokkos_PerformanceTest_MDRangePolicy_Stream_2026-06-05_T19-10-24.json"
4: Running /home/darndt/kokkos/build/core/perf_test/Kokkos_PerformanceTest_MDRangePolicy_Stream
4: MDRangePolicy_Set<1>/22/manual_time        0.508 ms        0.508 ms         1529 1.78503k/s    907.039
4: MDRangePolicy_Set<3>/22/manual_time        0.613 ms        0.613 ms         1221 1.48012k/s    907.039
4: MDRangePolicy_Set<6>/22/manual_time         2.84 ms         2.84 ms          247  319.439/s    907.039
4: MDRangePolicy_Triad<1>/22/manual_time       1.38 ms         1.38 ms          510 1.96657k/s   2.72112k
4: MDRangePolicy_Triad<3>/22/manual_time       1.62 ms         1.62 ms          429 1.68467k/s   2.72112k
4: MDRangePolicy_Triad<6>/22/manual_time       3.81 ms         3.81 ms          185  714.687/s   2.72112k
 4/12 Test  #4: Kokkos_PerformanceTest_MDRangePolicy_Stream ...   Passed    6.58 sec

after:

1: OverlapMDRangePolicy/N:200/M:10000/R:10                                                                                                                                        0.058 s         0.058 s            12       0.0104729      5.3311m               0.0101351                  5.07832m    5.07214m          5.248u
1: MDRangeStencil_2D_MDRange_LayoutRight/size:512/tile_size:0/iterations:1/manual_time                                                                                            0.063 ms        0.064 ms            1              1          8         32
1: MDRangeStencil_3D_MDRange_LayoutLeft/size:128/tile_size:0/iterations:1/manual_time                                                                                             0.078 ms        0.078 ms            1              1         32          2          4
1: MDRangeStencil_4D_MDRange_LayoutRight/size:32/tile_size:0/iterations:1/manual_time                                                                                             0.079 ms        0.080 ms            1              1          2          2          2         16
      Start  4: Kokkos_PerformanceTest_MDRangePolicy_Stream
4: Test command: /home/darndt/kokkos/build/core/perf_test/Kokkos_PerformanceTest_MDRangePolicy_Stream "--benchmark_counters_tabular=true" "--benchmark_out=Kokkos_PerformanceTest_MDRangePolicy_Stream_2026-06-05_T19-10-24.json"
4: Running /home/darndt/kokkos/build/core/perf_test/Kokkos_PerformanceTest_MDRangePolicy_Stream
4: MDRangePolicy_Set<1>/22/manual_time        0.521 ms        0.521 ms         1492 1.74044k/s    907.039
4: MDRangePolicy_Set<3>/22/manual_time        0.640 ms        0.640 ms         1120 1.41771k/s    907.039
4: MDRangePolicy_Set<6>/22/manual_time         2.07 ms         2.07 ms          344  439.084/s    907.039
4: MDRangePolicy_Triad<1>/22/manual_time       1.38 ms         1.38 ms          511 1.96756k/s   2.72112k
4: MDRangePolicy_Triad<3>/22/manual_time       1.60 ms         1.60 ms          439 1.69945k/s   2.72112k
4: MDRangePolicy_Triad<6>/22/manual_time       3.09 ms         3.09 ms          227  880.916/s   2.72112k
 4/12 Test  #4: Kokkos_PerformanceTest_MDRangePolicy_Stream ...   Passed    6.25 sec

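So we see a clear improvement for `MDRangeStencil_3D_MDRange_LayoutLeft` (0.093 ms to 0.078 ms), `MDRangePolicy_Triad<6>` (3.81 ms to 3.09 ms) and `MDRangePolicy_Set<6>/22/manual_time` (2.84 ms to 2.07 ms), while most of the other times stay more or less the same. `MDRangeStencil_2D_MDRange_LayoutRight` also improves (0.076 ms to 0.063 ms), and `MDRangeStencil_4D_MDRange_LayoutRight` gets slightly slower (0.073 ms to 0.079 ms).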
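To quantify the changes, the relative change in run time can be computed from the first time column of the before/after output above. The helper below is written for this summary only: `Timing`, `relative_change`, `reported_timings` and `print_summary` are hypothetical names, and the values are copied from the benchmark output in milliseconds. The stencil benchmarks ran a single iteration each, so their differences are probably noisier than those of the Stream benchmarks.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One benchmark row: time in ms before and after the change.
struct Timing {
  std::string name;
  double before_ms;
  double after_ms;
};

// Relative change of the run time; negative values mean "after" is faster.
double relative_change(const Timing& t) {
  assert(t.before_ms > 0.0);
  return (t.after_ms - t.before_ms) / t.before_ms;
}

// Values copied from the "before:" and "after:" output in this PR.
std::vector<Timing> reported_timings() {
  return {
      {"MDRangeStencil_2D_MDRange_LayoutRight/size:512", 0.076, 0.063},
      {"MDRangeStencil_3D_MDRange_LayoutLeft/size:128", 0.093, 0.078},
      {"MDRangeStencil_4D_MDRange_LayoutRight/size:32", 0.073, 0.079},
      {"MDRangePolicy_Set<1>/22", 0.508, 0.521},
      {"MDRangePolicy_Set<3>/22", 0.613, 0.640},
      {"MDRangePolicy_Set<6>/22", 2.84, 2.07},
      {"MDRangePolicy_Triad<1>/22", 1.38, 1.38},
      {"MDRangePolicy_Triad<3>/22", 1.62, 1.60},
      {"MDRangePolicy_Triad<6>/22", 3.81, 3.09},
  };
}

// Print one line per benchmark with the signed percentage change.
void print_summary(const std::vector<Timing>& rows) {
  for (const Timing& t : rows)
    std::printf("%-50s %+6.1f%%\n", t.name.c_str(),
                100.0 * relative_change(t));
}
```

Calling `print_summary(reported_timings())` lists, for instance, roughly -27% for `MDRangePolicy_Set<6>` and -19% for `MDRangePolicy_Triad<6>`.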

masterleinad and others added 2 commits June 5, 2026 11:27
…kokkos#9142)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Daniel Arndt <arndtd@ornl.gov>
Signed-off-by: Daniel Arndt <arndtd@ornl.gov>

@yasahi-hpc left a comment

Contributor

LGTM

@Adrien-Tab

Member

Just a note: for the other backends (CUDA and HIP), the bandwidth gap between ranks 3 and 6 is less significant.

