Tags: ROCm/triton
Merge pull request #410 from ROCmSoftwarePlatform/ifu-231117 Ifu 231117
add bitcode for gfx941 and gfx942 (#403)
  Co-authored-by: Aleksandr Efimov <130555951+alefimov-amd@users.noreply.github.com>
Merge pull request #395 from ROCmSoftwarePlatform/ifu-231108 Ifu 231108
[Tutorial] Fix post IFU issues with FA (#398)
  * [Tutorial] Fix post IFU issues with FA
  * Remove redundant kernels in 06-fused-attention.py
  * Added README for scripts in perf-kernels dir
  * Fix bwd kernel
  ---------
  Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
Merge pull request #382 from ROCmSoftwarePlatform/ifu231005-rebase Ifu231005
Add OptimizeEpilogue pass. (#346)
  * optimize_epilogue
  * Add config
  * Remove licenses
  * Comment out Hopper specific parameters when printing out configs
  * Add benchmark parameters from flash-attention repo
  * Add Z and H in the key of autotuner
  ---------
  Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
use different int8 mfma instructions on different GPUs. (#368)
  * changes support to choose different int8 instructions
  * rename an instruction name
  Co-authored-by: Aleksandr Efimov <efimov.alexander@gmail.com>