Pytorch 2.0.1 pypi wheel does not install dependent cuda libraries #100974

@Martin4R

🐛 Describe the bug

With torch 2.0.1, the torch PyPI wheel no longer depends on the CUDA libraries. As a result, importing torch on a GPU-enabled machine fails with ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path (see the stack trace at the end).

Comparing the dependency trees for torch 2.0.1 and torch 2.0.0 with poetry (installed on the same machine, from the same dependency file as before) shows that torch 2.0.1 is missing the nvidia dependencies:

└── torch 2.0.1 
        ├── filelock * 
        ├── jinja2 * 
        │   └── markupsafe >=2.0 
        ├── networkx * 
        ├── sympy * 
        │   └── mpmath >=0.19 
        └── typing-extensions * 

└── torch 2.0.0 
        ├── filelock * 
        ├── jinja2 * 
        │   └── markupsafe >=2.0 
        ├── networkx * 
        ├── nvidia-cublas-cu11 11.10.3.66 
        │   ├── setuptools * 
        │   └── wheel * 
        ├── nvidia-cuda-cupti-cu11 11.7.101 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cuda-nvrtc-cu11 11.7.99 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cuda-runtime-cu11 11.7.99 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cudnn-cu11 8.5.0.96 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cufft-cu11 10.9.0.58 
        ├── nvidia-curand-cu11 10.2.10.91 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cusolver-cu11 11.4.0.1 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cusparse-cu11 11.7.4.91 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-nccl-cu11 2.14.3 
        ├── nvidia-nvtx-cu11 11.7.91 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── sympy * 
        │   └── mpmath >=0.19 
        ├── triton 2.0.0 
        │   ├── cmake * 
        │   ├── filelock * (circular dependency aborted here)
        │   ├── lit * 
        │   └── torch * (circular dependency aborted here)
        └── typing-extensions * 
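
The same comparison can be made without poetry by reading the requirements that the installed wheel declares in its own metadata. This is a minimal sketch using the standard-library importlib.metadata module; the helper names declared_requirements and nvidia_requirements are hypothetical and only serve the illustration.

```python
from importlib import metadata


def declared_requirements(dist_name: str) -> list[str]:
    """Return the requirement strings a distribution declares, or [] if it is not installed."""
    try:
        # metadata.requires() returns None when no requirements are declared.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


def nvidia_requirements(requirements: list[str]) -> list[str]:
    """Keep only the requirement strings whose project name starts with 'nvidia-'."""
    return [r for r in requirements if r.strip().lower().startswith("nvidia-")]


if __name__ == "__main__":
    found = nvidia_requirements(declared_requirements("torch"))
    if found:
        print("\n".join(found))
    else:
        print("torch declares no nvidia-* requirements (or torch is not installed)")
```

Given the trees above, an environment with the 2.0.0 wheel should list the nvidia-*-cu11 packages, while an environment with the 2.0.1 wheel should fall through to the last message.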

Here is the stack trace of the error at runtime:

  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/easyocr/recognition.py", line 2, in <module>
    import torch
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 189, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 154, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path ['/home/ray', '/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard', '/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/ray/thirdparty_files', '/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/workers', '/home/ray/anaconda3/envs/myenv/lib/python3.10', '/home/ray/anaconda3/envs/myenv/lib/python3.10/lib-dynload', '/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages']
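
To confirm that the library files are really absent from the environment, the same glob pattern can be searched for by hand across sys.path. The sketch below is a diagnostic aid, not torch's own lookup code. It assumes that the pip-installed NVIDIA wheels place their shared libraries under nvidia/<component>/lib/ inside site-packages, which is an assumption about the wheel layout and is not stated in this report. The function name find_cuda_libs is hypothetical.

```python
import glob
import os
import sys


def find_cuda_libs(pattern: str, search_paths: list[str]) -> list[str]:
    """Return files matching `pattern` under <path>/nvidia/*/lib for each search path.

    The nvidia/*/lib layout is an assumption about how the NVIDIA wheels unpack.
    """
    matches: list[str] = []
    for path in search_paths:
        # Escape the search path so only the trailing parts act as glob patterns.
        candidate = os.path.join(glob.escape(path), "nvidia", "*", "lib", pattern)
        matches.extend(glob.glob(candidate))
    return sorted(matches)


if __name__ == "__main__":
    hits = find_cuda_libs("libnvrtc.so.*[0-9].*[0-9]", sys.path)
    print("\n".join(hits) if hits else "no matching library found on sys.path")
```

An empty result in the failing environment would be consistent with the nvidia-* packages never having been installed.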

Versions

The issue occurs with the PyPI wheel of torch 2.0.1.

When trying to run python collect_env.py to collect the versions, two errors show up:

"OSError: libcurand.so.10: cannot open shared object file: No such file or directory"
During handling of the above exception, another exception occurred:
"ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path"
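
The first of these errors comes from the dynamic loader, so it can be reproduced without importing torch at all. A minimal sketch, assuming only the standard-library ctypes module; the helper name can_load is hypothetical, and libcurand.so.10 is the library name taken from the error above.

```python
import ctypes


def can_load(library_name: str) -> tuple[bool, str]:
    """Try to load a shared library by name; return (success, loader error message)."""
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        # The loader reports a missing library as OSError.
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, message = can_load("libcurand.so.10")
    print("loaded" if ok else f"failed: {message}")
```

If this fails in the same environment, the library is missing from the loader's search path regardless of what torch does afterwards.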

cc @ezyang @gchanan @zou3519 @seemethere @malfet

Labels

module: binaries (anything related to official binaries that we release to users)
module: regression (it used to work, and now it doesn't)
needs design (we want to add this feature but we need to figure out how first)
triage review
