wj-laskowski (Contributor) commented Dec 16, 2025

As titled. The following variants are added:

  • grouped_conv2d_fwd_dynamic_op
  • grouped_conv3d_fwd_dynamic_op
  • grouped_conv3d_fwd_bilinear
  • grouped_conv3d_fwd_convscale
  • grouped_conv3d_fwd_convinvscale
  • grouped_conv3d_fwd_convscale_add
  • grouped_conv3d_fwd_convscale_relu
  • grouped_conv3d_fwd_scale
  • grouped_conv3d_fwd_combconvscale
  • grouped_conv3d_fwd_scaleadd_scaleadd_relu
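
For readers unfamiliar with the naming pattern, the suffixes above name an elementwise operation fused onto the convolution output. The sketch below is a minimal, naive illustration of that idea on a 1D grouped convolution. All names in it (grouped_conv1d_fwd, make_scale, make_scale_relu, make_bilinear), the tensor layouts, and the exact formulas are assumptions made for illustration; they are not the instances or operator definitions added by this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive grouped 1D forward convolution (stride 1, no padding), illustration only.
// Assumed layouts: in [G][C][W], wei [G][K][C][X], out [G][K][Wo], Wo = W - X + 1.
// `epilogue(acc, i)` is applied to each accumulated value before it is stored
// at flat output index i, which is where a fused elementwise op would run.
template <typename Epilogue>
std::vector<float> grouped_conv1d_fwd(const std::vector<float>& in,
                                      const std::vector<float>& wei,
                                      std::size_t G, std::size_t C, std::size_t K,
                                      std::size_t W, std::size_t X,
                                      Epilogue epilogue)
{
    const std::size_t Wo = W - X + 1;
    std::vector<float> out(G * K * Wo);
    for(std::size_t g = 0; g < G; ++g)
        for(std::size_t k = 0; k < K; ++k)
            for(std::size_t wo = 0; wo < Wo; ++wo)
            {
                float acc = 0.0f;
                for(std::size_t c = 0; c < C; ++c)
                    for(std::size_t x = 0; x < X; ++x)
                        acc += in[(g * C + c) * W + wo + x] *
                               wei[((g * K + k) * C + c) * X + x];
                const std::size_t i = (g * K + k) * Wo + wo;
                out[i] = epilogue(acc, i);
            }
    return out;
}

// Hypothetical epilogues; the formulas are assumptions, not the library's definitions.

// out = scale * conv
inline auto make_scale(float scale)
{
    return [scale](float acc, std::size_t) { return scale * acc; };
}

// out = max(0, scale * conv)
inline auto make_scale_relu(float scale)
{
    return [scale](float acc, std::size_t) { return std::max(0.0f, scale * acc); };
}

// out = alpha * conv + beta * d, where d is an extra tensor with the output's shape.
// d is captured by reference and must outlive the returned callable.
inline auto make_bilinear(float alpha, float beta, const std::vector<float>& d)
{
    return [alpha, beta, &d](float acc, std::size_t i) { return alpha * acc + beta * d[i]; };
}
```

Passing the epilogue as a callable keeps the convolution loop identical across variants, which is roughly why each flavor can be a separate instance list over the same kernel structure.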

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

ancahamuraru and others added 30 commits May 15, 2025 10:36
The descriptors are larger than needed (even though the compiler doesn't allocate registers for unused values).
krithalith and others added 20 commits October 1, 2025 10:31
…unction. Was necessary to pass the bias_clamp_large_cases test.
…zed to avoid undefined strides. Not convinced this struct is properly initialized in other code / future code.
…ty. Use this for grouped conv fwd but not in general.
…function can be used. Remove splitK, and forceThreadTileTransfer for now. Also add CShuffle epilogue argument.
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
…3 to only be used for f8, just in case it hurts performance.
wj-laskowski force-pushed the streamhpc/grouped-conv-fwd-wmma-tuned-instances branch from 30fd906 to 598d41e on December 16, 2025 16:18
The following flavors are updated with a tuned instance list:
  - grouped_conv2d_fwd
  - grouped_conv2d_fwd_bias_clamp
  - grouped_conv2d_fwd_clamp
  - grouped_conv3d_fwd
  - grouped_conv3d_fwd_bias_clamp
  - grouped_conv3d_fwd_clamp
  - grouped_conv3d_fwd_scaleadd_ab

Refactored instance selection:
  - removed all the unnecessary instance tuples (comp/mem/16x16/generic)
  - removed all unnecessary layouts and data types
Base automatically changed from streamhpc/grouped-conv-fwd-wmma-tuned-instances to streamhpc/grouped-conv-fwd-wmma on December 16, 2025 16:21
wj-laskowski force-pushed the streamhpc/grouped-conv-fwd-extra-flavors branch from 4b4d491 to 532329c on December 16, 2025 16:53
As titled. The following variants are added:
- grouped_conv2d_fwd_dynamic_op
- grouped_conv3d_fwd_dynamic_op
- grouped_conv3d_fwd_bilinear
- grouped_conv3d_fwd_convscale
- grouped_conv3d_fwd_convinvscale
- grouped_conv3d_fwd_convscale_add
- grouped_conv3d_fwd_convscale_relu
- grouped_conv3d_fwd_scale
- grouped_conv3d_fwd_combconvscale
- grouped_conv3d_fwd_scaleadd_scaleadd_relu
wj-laskowski force-pushed the streamhpc/grouped-conv-fwd-extra-flavors branch from 532329c to 0b0aa06 on December 16, 2025 17:33
krithalith force-pushed the streamhpc/grouped-conv-fwd-wmma branch 3 times, most recently from 147d0f6 to 1983e16 on December 17, 2025 14:31
Base automatically changed from streamhpc/grouped-conv-fwd-wmma to develop on December 18, 2025 20:12
7 participants