Caffe2 usage of cuDNN RNNv6 API blocks upgrade to cuDNN v9+ #124790

Closed
eqy opened this issue Apr 23, 2024 · 5 comments
Labels
caffe2 module: bc-breaking Related to a BC-breaking change module: build Build system issues module: cudnn Related to torch.backends.cudnn, and CuDNN support module: rnn Issues related to RNN support (LSTM, GRU, etc) topic: build

Comments

@eqy
Collaborator

eqy commented Apr 23, 2024

🐛 Describe the bug

cuDNN dropped support for the RNNv6 API in v9. PyTorch itself was migrated to the RNNv8 API in #115719 and #120277, but Caffe2 was not, so attempting to upgrade cuDNN to v9+ currently breaks the build (e.g., #123475).

The two immediate possible solutions are:

  1. Drop Caffe2 from the breaking open-source build(s) (currently this seems to be just a single Bazel build)
  2. Remove the RNN operators from Caffe2

A third option, porting Caffe2 to RNNv8, is technically possible but seems wasteful considering, e.g., #122527.
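
As a rough illustration of option 2, the RNN operators could be gated on the cuDNN major version. The sketch below is hypothetical: the function names are made up for this example, and it assumes the usual integer encoding of the cuDNN version (major*1000 + minor*100 + patch before v9, e.g. 8902; major*10000 + minor*100 + patch from v9 on, e.g. 90100). A real build would apply the same check to the `CUDNN_VERSION` macro at compile time.

```python
def cudnn_major(version: int) -> int:
    """Decode the major version from a CUDNN_VERSION-style integer.

    Assumption: cuDNN < 9 encodes major*1000 + minor*100 + patch (e.g. 8902),
    while cuDNN 9+ encodes major*10000 + minor*100 + patch (e.g. 90100).
    """
    if version >= 90000:
        return version // 10000
    return version // 1000


def should_build_rnn_v6_ops(version: int) -> bool:
    """Hypothetical gate: only build RNNv6-based operators before cuDNN v9."""
    return cudnn_major(version) < 9


if __name__ == "__main__":
    for v in (7605, 8902, 90100):
        print(v, "build RNNv6 ops:", should_build_rnn_v6_ops(v))
```

With a gate like this, the RNN operators would simply be left out of any build against cuDNN v9+, rather than being ported to RNNv8.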

CC @atalman @malfet

Versions

Current upstream

cc @malfet @seemethere @csarofeen @ptrblck @xwang233 @ezyang @gchanan @mikaylagawarecki

@eqy eqy added module: build Build system issues module: cudnn Related to torch.backends.cudnn, and CuDNN support caffe2 module: rnn Issues related to RNN support (LSTM, GRU, etc) topic: build topic: bc_breaking labels Apr 23, 2024
@malfet malfet added module: bc-breaking Related to a BC-breaking change and removed topic: bc_breaking labels May 20, 2024
@ezyang
Contributor

ezyang commented May 21, 2024

Note that for FB internal, we would still be on the hook for dealing with the Caffe2 bindings somehow, since they are still going to be built in fbcode, and we will hit the same problem when we try to upgrade cuDNN internally. Also cc @r-barnes

@r-barnes
Contributor

r-barnes commented May 24, 2024

@ezyang It's possible we can remove the RNN operators from Caffe2 internally if no one is using them there.

@eqy If the blocking Caffe2 code still exists on GitHub, a PR removing the offending operators would be the most useful way to get this resolved quickly. If it doesn't exist, then I'll figure out how to sort things out internally.

@drisspg drisspg closed this as completed Oct 24, 2024
@r-barnes
Contributor

@drisspg - just wondering why you closed this?

@drisspg
Contributor

drisspg commented Oct 28, 2024

I was having a discussion with Alban about all the open issues we have. Alban said that anything marked Caffe2 can be closed. If that is not the case, feel free to re-open.

@eqy
Collaborator Author

eqy commented Oct 28, 2024

Somehow we upgraded to v9+ without any problems, so I'm fine with closing.
