
[SDPA-CUDNN] Make CuDNN Attention Opt in #138522

Closed
wants to merge 4 commits

Conversation

drisspg (Contributor) commented Oct 21, 2024

Stack from ghstack (oldest at bottom):

Summary

Currently, `cudnn_order` specifies that on H100, with a new enough cuDNN version (we ship 9.1 in OSS builds), the cuDNN attention backend is tried first. We have already encountered a few bugs since the release of 2.5:

  1. SDPA: CUDNN backend error w/ q_seq_len = 1 #138529 (a minimal sketch of this case follows the list)
  2. RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph. huggingface/diffusers#9704
  3. [cuDNN][SDPA] Match query's memory layout ordering for output in cuDNN SDPA #138354
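
For context, here is a minimal sketch of the q_seq_len = 1 case from the first issue above. The tensor shapes are illustrative assumptions based on the issue title, not a confirmed repro:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Hypothetical decode-style shapes: batch=2, heads=8, q_seq_len=1, head_dim=64.
q = torch.randn(2, 8, 1, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Forcing the cuDNN backend on a shape like this surfaced the error in #138529.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```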

In light of the above, we are making the cuDNN backend opt-in rather than enabled by default.

Opting in is easy with the context manager for choosing backends, e.g.:

```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# q, k, v are query/key/value tensors prepared elsewhere.
# Restrict SDPA to the cuDNN backend within this block.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```

This PR moves the cuDNN backend to the lowest precedence in the backend list, meaning the Math backend will always be chosen ahead of it unless it is explicitly disabled (which is what the context manager does).
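
To make the new behavior concrete, here is a minimal sketch (my illustration, not code from this PR) showing that `sdpa_kernel` also accepts a list of backends, so callers can allow cuDNN alongside the other fused backends rather than exclusively:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Illustrative inputs; the fused backends expect half precision on CUDA.
q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Enable cuDNN attention alongside flash and memory-efficient attention;
# SDPA dispatches to the highest-priority enabled backend that supports the inputs.
with sdpa_kernel([SDPBackend.CUDNN_ATTENTION,
                  SDPBackend.FLASH_ATTENTION,
                  SDPBackend.EFFICIENT_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
```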

Cc @atalman

cc @mikaylagawarecki

pytorch-bot (bot) commented Oct 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138522

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0357a75 with merge base 7786869:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg added a commit that referenced this pull request Oct 21, 2024 (ghstack-source-id: 58f70e7943348cd11908451eddc150a2c1b22cde)
drisspg added a commit that referenced this pull request Oct 21, 2024 (ghstack-source-id: 0864209b86022cefec2f8ad2029be0b4facb96f2)
drisspg added this to the 2.5.1 milestone Oct 22, 2024
drisspg (Contributor, Author) commented Oct 22, 2024

@pytorchbot merge

pytorch-bot added the ciflow/trunk label Oct 22, 2024
pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

drisspg added a commit that referenced this pull request Oct 22, 2024 (ghstack-source-id: 28290c32b53b82f8e49f67b44312a42435ad006b)
pytorchmergebot (Collaborator) commented:

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team (raised by workflow job)

drisspg added a commit that referenced this pull request Oct 22, 2024 (ghstack-source-id: 30a6d89d5c096b1da8bf95be136bd491d2bdf6de)
drisspg (Contributor, Author) commented Oct 22, 2024

@pytorchbot merge

pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator) commented:
This PR (#138522) was merged in 9a9a0ab but it is still open, likely due to a GitHub bug, so mergebot is closing it manually. If you think this is a mistake, please feel free to reopen and contact Dev Infra.

atalman (Contributor) commented Oct 22, 2024

@pytorchbot cherry-pick --onto release/2.5 -c critical

pytorchbot pushed a commit that referenced this pull request Oct 22, 2024

Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet

(cherry picked from commit 9a9a0ab)
pytorchbot (Collaborator) commented:
Cherry picking #138522

The cherry-pick PR is at #138587; it is recommended to link a critical cherry-pick PR to an issue.

Details for Dev Infra team (raised by workflow job)

huydhn (Contributor) commented Oct 22, 2024

@drisspg I think this change is failing on Windows https://github.com/pytorch/pytorch/actions/runs/11464507611/job/31902166240#step:15:21915

test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda (GH job link, HUD commit link)

Could you help take a look?

drisspg (Contributor, Author) commented Oct 22, 2024

@huydhn can I forward-fix this? #138641

It should fix the failure, but I don't have a Windows machine to test on.

kit1980 pushed a commit that referenced this pull request Oct 22, 2024

[SDPA-CUDNN] Make CuDNN Attention Opt in (#138522)

Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet

(cherry picked from commit 9a9a0ab)

Co-authored-by: drisspg <drisspguessous@gmail.com>
SamGinzburg pushed a commit that referenced this pull request Oct 28, 2024

Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet
Labels: ciflow/trunk, Merged, module: multi-headed-attention, topic: not user facing