Always ensure gpu_threads count >= warp size of 32 #8656
ac74c4b to 53b1567
Tile the image with `TailStrategy::GuardWithIf` whenever `can_parallelize(c, 3)` is encountered and when the split factor is smaller than GPU's warp size (=32). For 2D/3D tile schedules having mixed tail strategies, pick the most conservative one, i.e. `TailStrategy::GuardWithIf`. Enable `iir_blur` tests on the Github Actions/Buildbot.
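The two rules in this commit message can be sketched in plain C++. This is only an illustration, not the autoscheduler's code: the enum `Tail` and the helpers `tail_for_split` and `tail_for_tile` are hypothetical names, and `Tail::Auto` stands in for whatever strategy the schedule would otherwise have used.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for the tail strategies discussed above.
enum class Tail { Auto, GuardWithIf };

// Warp size quoted in the PR (=32).
constexpr int kWarpSize = 32;

// Rule 1: a split factor smaller than the warp size gets a guarded tail;
// otherwise the requested strategy is kept.
Tail tail_for_split(int split_factor, Tail requested = Tail::Auto) {
    assert(split_factor > 0);
    return split_factor < kWarpSize ? Tail::GuardWithIf : requested;
}

// Rule 2: a 2D/3D tile whose dimensions disagree uses the most
// conservative strategy, i.e. GuardWithIf if any dimension asks for it.
Tail tail_for_tile(std::initializer_list<Tail> per_dimension) {
    for (Tail t : per_dimension) {
        if (t == Tail::GuardWithIf) {
            return Tail::GuardWithIf;
        }
    }
    return Tail::Auto;
}
```

With these helpers, a channel split of 3 yields `Tail::GuardWithIf`, and a tile mixing `Auto` and `GuardWithIf` is guarded as a whole.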
53b1567 to 1452961
The Testbench passes for targets Error message:
antonysigma
left a comment
Help wanted to troubleshoot the buildbot failure for host-metal target.
The Halide buildbots are physical machines we have sitting in various offices and homes, so the cost is just electricity. Provided you aren't starving other PRs, it's fine to add experimental commits just to get more debugging info from the tests.
93076ae to 38b75b4
38b75b4 to 988e83f
antonysigma
left a comment
Hi @mcourteaux and @alexreinking,
Hardcoding the input and output image sizes for the iir_blur pipeline seems to have resolved the OSX Metal error. I also found that the channel-count estimate (=3) was inaccurate: the generated pipeline was fed rgba.png, which has 4 channels. So I also modified the test code to feed it a 3-channel image, rgb.png.
The buildbot now reports separate errors about the maximum number of GPU threads per threadgroup in other pipelines. I will resolve those in the next PR.
I have already documented the thread count limit at #8640. I will study it after this tail-strategy PR.
New errors:
/Users/halidenightly/build_bot/worker/halide-testbranch-main-llvm20-x86-64-osx-cmake/halide-build/apps/bilateral_grid/bilateral_grid_process "gray.png" "out.png" "0.1" "10"
2025-06-30 23:21:11.717 bilateral_grid_process[32169:21028558] Metal API Validation Enabled
-[MTLDebugComputeCommandEncoder _validateThreadsPerThreadgroup:]:1267: failed assertion `(threadsPerThreadgroup.width(64) * threadsPerThreadgroup.height(16) * threadsPerThreadgroup.depth(1))(1024) must be <= 896. (kernel threadgroup size limit)
/Users/halidenightly/build_bot/worker/halide-testbranch-main-llvm20-x86-64-osx-cmake/halide-build/apps/lens_blur/lens_blur_filter "rgb_small.png" "32" "13" "0.5" "32" "3" "out.png"
2025-06-30 23:26:02.260 lens_blur_filter[32272:21031150] Metal API Validation Enabled
-[MTLDebugComputeCommandEncoder _validateThreadsPerThreadgroup:]:1267: failed assertion `(threadsPerThreadgroup.width(32) * threadsPerThreadgroup.height(32) * threadsPerThreadgroup.depth(1))(1024) must be <= 896. (kernel threadgroup size limit)'
Manually-tuned time: 40.1912ms
-Antony
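The arithmetic behind the two Metal validation failures above can be reproduced with a small check. This is a sketch for illustration only: `ThreadgroupSize`, `total_threads`, and `fits_kernel_limit` are hypothetical names rather than Metal or Halide API, and the limit of 896 is taken directly from the log.

```cpp
#include <cassert>

// Hypothetical mirror of the width/height/depth triple printed in the log.
struct ThreadgroupSize {
    int width;
    int height;
    int depth;
};

// Total threads launched per threadgroup.
long long total_threads(ThreadgroupSize s) {
    assert(s.width > 0 && s.height > 0 && s.depth > 0);
    return 1LL * s.width * s.height * s.depth;
}

// True when the threadgroup fits within the kernel's reported limit.
bool fits_kernel_limit(ThreadgroupSize s, long long kernel_limit) {
    return total_threads(s) <= kernel_limit;
}
```

Both failing schedules (64 x 16 x 1 for bilateral_grid and 32 x 32 x 1 for lens_blur) come to 1024 threads, which exceeds the limit of 896 reported for those kernels.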
// Hardcode all CPU/GPU kernel bounds via Halide's constant bound propagation.
input.dim(0).set_bounds(0, 1536).set_stride(1);
input.dim(1).set_bounds(0, 2560).set_stride(1536);
input.dim(2).set_bounds(0, 3).set_stride(1536 * 2560);
output.dim(0).set_bounds(0, 1536).set_stride(1);
output.dim(1).set_bounds(0, 2560).set_stride(1536);
output.dim(2).set_bounds(0, 3).set_stride(1536 * 2560);
Re: OSX Metal buffer allocation error.
Hardcoding the input and output image sizes in the iir_blur pipeline appears to resolve the Metal GPU buffer size limit error.
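The stride values in the snippet under review follow from a dense planar layout: x is contiguous, each row is one image width apart, and each channel plane is width times height apart. A minimal sketch of that arithmetic, assuming such a layout (the helper `planar_strides` is a hypothetical name, not part of Halide):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Strides, in elements, for dimensions (x, y, c) of a dense planar image.
std::array<std::int64_t, 3> planar_strides(std::int64_t width, std::int64_t height) {
    assert(width > 0 && height > 0);
    return {1, width, width * height};
}
```

For the 1536 x 2560 image used here, this gives strides of 1, 1536, and 1536 * 2560 = 3932160, matching the `set_stride` arguments.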
bb0534d to 25c27b2
Hi @alexreinking, I am done with the bugfixes. There is one minor buildbot error though. What does it mean? How can I help resolve CMake errors?
I think that's a ccache bug... I'll clear that out.
@antonysigma - it was indeed ccache. Merged! 🙂
@alexreinking Thanks for merging it! The rest of the issues (e.g. the Metal GPU having a smaller register count than the Nvidia GPU on the buildbot) are somewhat minor, and they should not block the next Halide release. Let me know if you have other concerns related to bugfixes and/or scheduled releases.
Tile the image with `TailStrategy::GuardWithIf` whenever `can_parallelize(c, 3)` is encountered and when the split factor (3) is smaller than the GPU's warp size (=32).
For 2D/3D tile schedules having mixed tail strategies, pick the most conservative one, i.e. `TailStrategy::GuardWithIf`.
Enable `iir_blur` tests on the GitHub Actions/Buildbot.
Before:
After:
Note: We are not trying to compete with manual GPU schedules. We just want it to pass correctness tests.
cc'ed @alexreinking and @mcourteaux.
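To illustrate what a guarded tail means for the 3-channel case, here is a plain C++ sketch of a split loop whose body is wrapped in a bounds check. It does not use Halide; `guarded_split_indices` is a hypothetical helper, and widening the inner loop to 32 over an extent of 3 is an assumption made for the example.

```cpp
#include <cassert>
#include <vector>

// Walks an extent as an outer/inner split and records the indices whose
// guarded body would run; out-of-range inner iterations are skipped.
std::vector<int> guarded_split_indices(int extent, int factor) {
    assert(extent >= 0 && factor > 0);
    std::vector<int> visited;
    const int outer_extent = (extent + factor - 1) / factor;  // ceiling division
    for (int outer = 0; outer < outer_extent; ++outer) {
        for (int inner = 0; inner < factor; ++inner) {
            const int x = outer * factor + inner;
            if (x < extent) {  // the guard: idle iterations do no work
                visited.push_back(x);
            }
        }
    }
    return visited;
}
```

With an extent of 3 and a factor of 32, the inner loop runs 32 iterations but only indices 0, 1, and 2 pass the guard, so nothing is read or written out of bounds.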