[SDPA-CUDNN] Make CuDNN Attention Opt in #138522
Conversation
Dr. CI: ✅ No failures as of commit 0357a75 with merge base 7786869. See artifacts and rendered test results at hud.pytorch.org/pr/138522.
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
# Summary

Currently we have a `cudnn_order` that says: on H100 with a new enough cuDNN backend (we ship version 9.1 in OSS), try to run cuDNN attention first. We have already encountered a few bugs with the 2.5 release:

1. #138529
2. huggingface/diffusers#9704
3. #138354

In light of the above, we are making the cuDNN backend opt-in by default. Opting in is easy with the context manager for choosing backends, i.e.:

```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```

This PR puts the cuDNN backend at the lowest precedence in the backend list, meaning that the Math backend will always be chosen unless it is explicitly disabled (which is done via the context manager).

Cc @atalman
cc @mikaylagawarecki
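For illustration, a minimal self-contained sketch of the new behavior; the tensor shapes and dtype are arbitrary assumptions, and it assumes a CUDA device whose cuDNN build supports SDPA:

```python
# Minimal sketch: explicit opt-in to cuDNN SDPA after this PR.
# Assumes a CUDA device; the shapes and dtype below are arbitrary examples.
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Default dispatch: cuDNN now sits last in the precedence list, so it is
# not selected unless the other eligible backends are disabled.
out_default = F.scaled_dot_product_attention(q, k, v)

# Explicit opt-in: restrict dispatch to the cuDNN backend only.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out_cudnn = F.scaled_dot_product_attention(q, k, v)

# The two paths should agree up to numerical tolerance.
print(torch.allclose(out_default, out_cudnn, atol=1e-3, rtol=1e-3))
```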
Merge failed. Reason: new commits were pushed while merging. Please rerun the merge command.
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
@pytorchbot cherry-pick --onto release/2.5 -c critical
Pull Request resolved: #138522. Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet. (cherry picked from commit 9a9a0ab)
Cherry picking #138522: the cherry-pick PR is at #138587. It is recommended to link a critical cherry-pick PR with an issue.
@drisspg I think this change is failing on Windows: https://github.com/pytorch/pytorch/actions/runs/11464507611/job/31902166240#step:15:21915 (test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda, GH job link, HUD commit link). Could you help take a look?
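For context, a hedged sketch of the kind of assertion a fused-SDP-choice test makes; the private helper `torch._fused_sdp_choice` and the expected value are assumptions inferred from the test name, not details taken from this thread. Reordering the backend precedence changes which backend the dispatcher reports, which is why such a test's expectation had to be updated:

```python
# Hedged sketch of a backend-choice assertion; torch._fused_sdp_choice is a
# private helper and the expected value here is an assumption.
import torch
from torch.nn.attention import SDPBackend

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Integer id of the backend the default dispatcher would pick for these inputs.
choice = torch._fused_sdp_choice(q, k, v)

# After this PR, cuDNN is last in precedence, so the default choice should be
# one of the other backends even on H100.
assert choice != SDPBackend.CUDNN_ATTENTION.value
```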