Tags: ROCm/hipBLASLt
Revert bad logic - Low Offset Overflow (#2472) (#2647)

## Motivation

This change follows the debugging of a reported ticket. When moving to the next tile, the low 32 bits of the read address were incremented but the high 32 bits were not. This can fail when the workspace buffer is allocated at an address close to a 32-bit boundary: incrementing to the next tile can wrap the low 32 bits to 0, and because the carry was not handled correctly, the read address ends up out of bounds, before the beginning of the buffer.

Reverts the bad logic from ROCm/rocm-libraries#1080.

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: mahmoodw <44450175+mahmoodw@users.noreply.github.com>
Co-authored-by: mahmoodw <wmahmood@amd.com>
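The carry problem described in this entry can be illustrated with a small, self-contained sketch. This is not the hipBLASLt kernel code: the helper names, the base address, and the tile size below are all hypothetical and chosen only so that the low 32 bits wrap. It models a 64-bit address as two 32-bit halves and compares an increment that ignores the carry with one that propagates it.

```python
MASK32 = 0xFFFFFFFF

def split(addr):
    """Split a 64-bit address into (low 32 bits, high 32 bits)."""
    return addr & MASK32, (addr >> 32) & MASK32

def join(lo, hi):
    """Rebuild a 64-bit address from its 32-bit halves."""
    return (hi << 32) | lo

def advance_low_only(addr, step):
    """Hypothetical buggy update: only the low half is incremented, carry is dropped."""
    lo, hi = split(addr)
    lo = (lo + step) & MASK32
    return join(lo, hi)

def advance_with_carry(addr, step):
    """Update that propagates the carry from the low half into the high half."""
    lo, hi = split(addr)
    total = lo + step
    lo = total & MASK32
    hi = (hi + (total >> 32)) & MASK32
    return join(lo, hi)

# Assumed values: a buffer that starts just below a 32-bit boundary.
base = 0x00007F00FFFFF000
tile_bytes = 0x2000

buggy = advance_low_only(base, tile_bytes)
fixed = advance_with_carry(base, tile_bytes)

print(hex(buggy))  # low half wrapped, high half unchanged: address is below base
print(hex(fixed))  # equals base + tile_bytes
```

With these assumed values the carry-less update lands before the start of the buffer, which matches the out-of-bounds read described above, while the carry-aware update yields `base + tile_bytes`.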
Fix StreamK ExtraIters Bug (#1933) (#2008)

## Motivation

This PR fixes a bug in the StreamK extraIters calculation and improves the naming conventions for the parallel reduction path.

## Technical Details

Fixes a bug in the extraIters calculation that would cause incorrect results.

## Test Plan

Passed all CI tests.

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Ali Yazdani <ayazdani@amd.com>
Co-authored-by: Val Movsik <160653499+vamovsik@users.noreply.github.com>
[rocm-libraries] ROCm/rocm-libraries#1753 (commit 0a25de4) Cherry-Pick StreamK Changes to rocm 7.0

## Motivation

Some StreamK features/improvements are needed.

## Technical Details

This PR avoids multiple potential overflows in StreamK math.

## Test Plan

Locally on GFX950 and CI

## Test Result

[----------] Global test environment tear-down
[==========] 19997 tests from 12 test suites ran. (1601396 ms total)
[ PASSED ] 19997 tests.
hipBLASLt version: 100000
hipBLASLt git version: 20250912-42-17-gb1537e7cb6-dirty
command line: ./hipblaslt-test

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
[rocm-libraries] ROCm/rocm-libraries#1233 (commit 976b9c4) Origami lib for F8BS_TN_SABV (#521) This PR adds a library for F8BS_TN with row-wise scaling (SABV). These changes have been reviewed and validated, and they passed CI.
Fix CI errors: don't run layernorm API on unsupported platforms