Tags: ROCm/MIOpen
[gfx12][Solvers][Winograd] Winograd Base for gfx12 v40.6.0 (#3000) (#… …3846)

**Cherry-pick from develop branch**

## Motivation

Add Winograd Base 40.6.0 for gfx12 to improve the performance of convolution operations.

Related issues:

* ROCm/rocm-libraries#2567
* ROCm/rocm-libraries#897
* SWDEV-549814

## Technical Details

Added kernels that implement the Winograd convolution operation:

* Conv_Winograd_v40_6_0_gfx12_fp16_dot2_f2x3_dilation2.inc
* Conv_Winograd_v40_6_0_gfx12_fp16_dot2_f2x3_stride1.inc
* Conv_Winograd_v40_6_0_gfx12_fp16_dot2_f2x3_stride2.inc
* Conv_Winograd_v40_6_0_gfx12_fp16_dot2_f3x2_dilation2.inc
* Conv_Winograd_v40_6_0_gfx12_fp16_dot2_f3x2_stride1.inc
* Conv_Winograd_v40_6_0_gfx12_fp16_dot2_f3x2_stride2.inc
* Conv_Winograd_v40_6_0_gfx12_fp32_f2x3_dilation2.inc
* Conv_Winograd_v40_6_0_gfx12_fp32_f2x3_stride1.inc
* Conv_Winograd_v40_6_0_gfx12_fp32_f2x3_stride2.inc
* Conv_Winograd_v40_6_0_gfx12_fp32_f3x2_dilation2.inc
* Conv_Winograd_v40_6_0_gfx12_fp32_f3x2_stride1.inc
* Conv_Winograd_v40_6_0_gfx12_fp32_f3x2_stride2.inc

## Test Plan

- [x] Unit tests for the solver on gfx1201/gfx1200 and gfx1100
- [x] Performance tests based on resnet50 problems
- [x] e2e test with models: https://amd.atlassian.net/wiki/spaces/VPGFXAT/pages/1232601634/Support+of+Winograd+Convolution+kernels+for+gfx120x

## Test Result

For resnet50 problems:

`MIOpenDriver convfp16 -n 128 -c 3 -H 224 -W 224 -k 64 -y 7 -x 7 -p 3 -q 3 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1`

* gfx1201: GFLOPs improved from 6621 to 21745; time reduced from 4.562853 to 1.389383
* gfx1200: GFLOPs improved from 4641 to 11753; time reduced from 6.509030 to 2.58354

## Submission Checklist

- Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
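The GFLOPs and time figures under Test Result can be related to the driver command with a little arithmetic. The sketch below is illustrative only and is not part of MIOpen: the helper names (`conv_out_size`, `conv_fwd_gflop`, `gflops`, `speedup`) are hypothetical, it assumes the reported times are in milliseconds, and it assumes the usual direct-convolution FLOP count of 2 · N · K · (C/G) · Y · X · H_out · W_out for the forward pass. It also assumes the driver flags map to batch size (`-n`), input channels (`-c`), input height/width (`-H`/`-W`), output channels (`-k`), filter height/width (`-y`/`-x`), padding (`-p`/`-q`), stride (`-u`/`-v`), dilation (`-l`/`-j`), and group count (`-g`).

```python
def conv_out_size(in_size, pad, filt, stride, dilation=1):
    """Output extent of a convolution along one spatial dimension."""
    return (in_size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1


def conv_fwd_gflop(n, c, h, w, k, y, x, p, q, u, v, l=1, j=1, g=1):
    """Nominal forward-convolution work in GFLOP (multiply-add counted as 2)."""
    out_h = conv_out_size(h, p, y, u, l)
    out_w = conv_out_size(w, q, x, v, j)
    flops = 2 * n * k * (c // g) * y * x * out_h * out_w
    return flops / 1e9


# Problem taken from the MIOpenDriver command quoted in the Test Result section.
WORK_GFLOP = conv_fwd_gflop(n=128, c=3, h=224, w=224, k=64,
                            y=7, x=7, p=3, q=3, u=2, v=2, l=1, j=1, g=1)


def gflops(time_ms):
    """Throughput in GFLOP/s, assuming the reported time is in milliseconds."""
    return WORK_GFLOP / (time_ms * 1e-3)


def speedup(time_before, time_after):
    """Ratio of old to new execution time."""
    return time_before / time_after


if __name__ == "__main__":
    # gfx1201 timings as reported in the Test Result section.
    print(f"work per call:  {WORK_GFLOP:.3f} GFLOP")
    print(f"gfx1201 before: {gflops(4.562853):.0f} GFLOP/s")
    print(f"gfx1201 after:  {gflops(1.389383):.0f} GFLOP/s")
    print(f"gfx1201 speedup: {speedup(4.562853, 1.389383):.2f}x")
```

Under these assumptions the gfx1201 timings reproduce the reported 6621 and 21745 GFLOPs, which suggests the driver computes throughput from the nominal direct-convolution work rather than from the reduced operation count of the Winograd algorithm.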