forked from ROCm/MIOpen
Port var backward Op #43
Closed
et16kr
reviewed
Jul 29, 2024
Please check `make analyze`.
Author
Fixed.

Please separate the contiguous and non-contiguous cases in the performance measurement summary.
Author
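I measured the non-contiguous kernel's performance separately and added those test results.

The 1D cases should be excluded from the non-contiguous statistics. Other than that, it looks good. Thanks for your work.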
Compared with ROCm, the newly developed MIOpen var kernel shows a performance improvement for input sizes in the range 1024 to 1024 * 1024 * 2.
float32(contiguous)
float16(contiguous)
bfloat16(contiguous)
float32(noncontiguous)
float16(noncontiguous)
bfloat16(noncontiguous)
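For readers unfamiliar with the operation being benchmarked, here is a minimal Python sketch of what a var backward computes and why contiguous and non-contiguous inputs are measured separately. It is not the MIOpen kernel from this PR. It assumes the standard derivative of the variance, a reduction over all elements of a 1D input, and Bessel's correction by default. The names `var_backward` and `strided_view` are hypothetical and exist only for this illustration.

```python
def var_backward(x, grad_out, unbiased=True):
    """Gradient of var(x) with respect to each element of a 1-D list x.

    Uses d var / d x_i = 2 * (x_i - mean) / (n - correction),
    scaled by the incoming gradient grad_out.
    """
    n = len(x)
    correction = 1 if unbiased else 0  # assumption: Bessel's correction by default
    mean = sum(x) / n
    scale = 2.0 * grad_out / (n - correction)
    return [scale * (xi - mean) for xi in x]


def strided_view(buffer, offset, stride, length):
    """Read `length` elements from a flat buffer with a fixed stride,
    mimicking how a non-contiguous tensor is addressed."""
    return [buffer[offset + i * stride] for i in range(length)]


# Contiguous input: the elements sit next to each other in memory.
contiguous = [1.0, 2.0, 3.0, 4.0]
grad_contiguous = var_backward(contiguous, grad_out=1.0)

# Non-contiguous input: the same values interleaved with padding (stride 2).
buffer = [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0]
grad_strided = var_backward(strided_view(buffer, 0, 2, 4), grad_out=1.0)
```

Both calls produce the same gradient. Only the memory access pattern differs, which is why the two layouts can perform differently and are reported separately.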