Tags: ROCm/hipBLASLt

rocm-7.1.1

Revert bad logic - Low Offset Overflow (#2472) (#2647)

## Motivation
This change came out of debugging a ticket.

The low 32 bits of the read address were incremented when moving to the
next tile, but the high 32 bits were not. If the workspace buffer was
allocated at an address close to a 32-bit boundary, incrementing to the
next tile could wrap the low 32 bits past zero. Because the carry was not
propagated into the high 32 bits, the resulting read address fell out of
bounds, before the beginning of the buffer.
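
The wraparound described above can be illustrated with a minimal sketch.
This is not the hipBLASLt kernel code; the base address, tile stride, and
function names are hypothetical, chosen only to show what happens when a
64-bit address is held as two 32-bit halves and the carry is dropped. It
assumes a non-negative stride smaller than 2^32.

```python
MASK32 = 0xFFFFFFFF


def split64(addr):
    """Split a 64-bit address into (low, high) 32-bit halves."""
    return addr & MASK32, (addr >> 32) & MASK32


def join64(lo, hi):
    """Recombine 32-bit halves into a 64-bit address."""
    return (hi << 32) | lo


def advance_low_only(lo, hi, stride):
    """Faulty pattern: add the stride to the low half and drop the carry."""
    return (lo + stride) & MASK32, hi


def advance_with_carry(lo, hi, stride):
    """Add the stride to the low half and propagate the carry upward."""
    total = lo + stride
    carry = total >> 32
    return total & MASK32, (hi + carry) & MASK32


# Hypothetical values: a buffer that starts 4 KiB below a 4 GiB boundary.
base = 0x0000_0001_FFFF_F000
tile_stride = 0x2000

lo, hi = split64(base)
buggy = join64(*advance_low_only(lo, hi, tile_stride))    # 0x1_0000_1000
fixed = join64(*advance_with_carry(lo, hi, tile_stride))  # 0x2_0000_1000

print(hex(base), hex(buggy), hex(fixed))
```

With these values the faulty update lands almost 4 GiB below `base`, which
matches the "before the beginning of the buffer" symptom, while the
carry-aware update yields `base + tile_stride`.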

Reverting bad logic from
ROCm/rocm-libraries#1080

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: mahmoodw <44450175+mahmoodw@users.noreply.github.com>
Co-authored-by: mahmoodw <wmahmood@amd.com>

rocm-7.1.0

Fix StreamK ExtraIters Bug (#1933) (#2008)

## Motivation

This PR fixes a bug in the StreamK extraIters calculation and improves
naming conventions for the parallel reduction path.

## Technical Details

Fixes a bug in the extraIters calculation that would cause incorrect results.

## Test Plan

Passed all CI tests.

## Test Result


## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Ali Yazdani <ayazdani@amd.com>
Co-authored-by: Val Movsik <160653499+vamovsik@users.noreply.github.com>

rocm-7.0.2

[rocm-libraries] ROCm/rocm-libraries#1753 (commit 0a25de4)

Cherry-Pick StreamK Changes to rocm 7.0

## Motivation

Some StreamK features/improvements are needed.

## Technical Details

This PR avoids multiple potential overflows in StreamK math.

## Test Plan

Tested locally on GFX950 and in CI.

## Test Result

[----------] Global test environment tear-down
[==========] 19997 tests from 12 test suites ran. (1601396 ms total)
[  PASSED  ] 19997 tests.
hipBLASLt version: 100000
hipBLASLt git version: 20250912-42-17-gb1537e7cb6-dirty
command line: ./hipblaslt-test

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

rocm-7.0.1

[rocm-libraries] ROCm/rocm-libraries#1233 (commit 976b9c4)

Origami lib for F8BS_TN_SABV (#521)

This PR adds a library for F8BS_TN with row-wise scaling (SABV). These
changes have been reviewed and validated, and they passed CI.

rocm-7.0.0

[rocm-libraries] ROCm/rocm-libraries#1233 (commit 976b9c4)

Origami lib for F8BS_TN_SABV (#521)

This PR adds a library for F8BS_TN with row-wise scaling (SABV). These
changes have been reviewed and validated, and they passed CI.

rocm-6.4.4

Revert "Add BBS TN Equality" (#2276)

Reverts #2275

rocm-6.4.3

Revert "Add BBS TN Equality" (#2276)

Reverts #2275

rocm-6.4.2

Enable gfx1103, gfx1150 and gfx1151 (#1766) (#2067)

Auto-submit by Jenkins

rocm-6.4.1

Add 6.4.1 changelog entry and bump version number (#1976)

rocm-6.4.0

Fix CI errors: don't run layernorm API on unsupported platforms