Closed
Labels
module: binaries (Anything related to official binaries that we release to users)
module: regression (It used to work, and now it doesn't)
needs design (We want to add this feature but we need to figure out how first)
triage review
Description
🐛 Describe the bug
With torch 2.0.1, the torch PyPI wheel no longer depends on the CUDA libraries. As a result, importing torch on a GPU-enabled machine fails with ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path (see the stack trace at the end).
Showing the dependency trees for torch 2.0.1 and torch 2.0.0 with poetry (installed on the same machine from the same dependency file as before) makes it clear that torch 2.0.1 is missing the nvidia dependencies:
└── torch 2.0.1
├── filelock *
├── jinja2 *
│ └── markupsafe >=2.0
├── networkx *
├── sympy *
│ └── mpmath >=0.19
└── typing-extensions *
└── torch 2.0.0
├── filelock *
├── jinja2 *
│ └── markupsafe >=2.0
├── networkx *
├── nvidia-cublas-cu11 11.10.3.66
│ ├── setuptools *
│ └── wheel *
├── nvidia-cuda-cupti-cu11 11.7.101
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-cuda-nvrtc-cu11 11.7.99
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-cuda-runtime-cu11 11.7.99
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-cudnn-cu11 8.5.0.96
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-cufft-cu11 10.9.0.58
├── nvidia-curand-cu11 10.2.10.91
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-cusolver-cu11 11.4.0.1
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-cusparse-cu11 11.7.4.91
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── nvidia-nccl-cu11 2.14.3
├── nvidia-nvtx-cu11 11.7.91
│ ├── setuptools * (circular dependency aborted here)
│ └── wheel * (circular dependency aborted here)
├── sympy *
│ └── mpmath >=0.19
├── triton 2.0.0
│ ├── cmake *
│ ├── filelock * (circular dependency aborted here)
│ ├── lit *
│ └── torch * (circular dependency aborted here)
└── typing-extensions *
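The same comparison can be made without poetry by reading the metadata of the installed distribution. The following is a minimal sketch, not part of the original report: the helper names `nvidia_requirements` and `declared_nvidia_deps` are hypothetical, and the check only reflects what the installed wheel declares, which may differ from what a resolver such as poetry saw on PyPI.

```python
from importlib import metadata


def nvidia_requirements(requirements):
    """Return the requirement strings that name an nvidia-* package."""
    return [
        req for req in (requirements or [])
        if req.strip().lower().startswith("nvidia-")
    ]


def declared_nvidia_deps(dist_name="torch"):
    """List nvidia-* requirements declared by an installed distribution.

    Returns None when the distribution is not installed.
    """
    try:
        reqs = metadata.requires(dist_name)  # list of strings, or None
    except metadata.PackageNotFoundError:
        return None
    return nvidia_requirements(reqs)
```

Calling `declared_nvidia_deps()` in the affected environment and getting an empty list would be consistent with the 2.0.1 tree shown above.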
Here is the stack trace of the error at runtime:
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/easyocr/recognition.py", line 2, in <module>
    import torch
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 189, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 154, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path ['/home/ray', '/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard', '/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/ray/thirdparty_files', '/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/workers', '/home/ray/anaconda3/envs/myenv/lib/python3.10', '/home/ray/anaconda3/envs/myenv/lib/python3.10/lib-dynload', '/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages']
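The error message suggests that torch looks for a file matching the glob pattern `libnvrtc.so.*[0-9].*[0-9]` on every entry of `sys.path`. A rough diagnostic sketch of such a search follows. It is an assumption, not torch's actual code: it supposes that the nvidia wheels place their shared libraries under `<path>/nvidia/<component>/lib/`, and `find_nvidia_libs` is a hypothetical helper name.

```python
import glob
import os
import sys


def find_nvidia_libs(lib_name, lib_folder="*", search_paths=None):
    """Return files matching <path>/nvidia/<lib_folder>/lib/<lib_name>.

    The directory layout is an assumption about how the nvidia-* wheels
    are unpacked; lib_folder defaults to a wildcard for that reason.
    """
    paths = sys.path if search_paths is None else search_paths
    matches = []
    for path in paths:
        pattern = os.path.join(path, "nvidia", lib_folder, "lib", lib_name)
        matches.extend(sorted(glob.glob(pattern)))
    return matches
```

Running `find_nvidia_libs("libnvrtc.so.*[0-9].*[0-9]")` in the failing environment would be expected to return an empty list if the nvidia packages were never installed.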
Versions
The issue occurs with the PyPI wheel of torch 2.0.1.
When trying to run python collect_env.py to collect the versions, two errors show up:
"OSError: libcurand.so.10: cannot open shared object file: No such file or directory"
During handling of the above exception, another exception occurred:
"ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path"
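The first of these errors is the kind the dynamic loader raises when a shared library cannot be resolved. As a small sketch (the helper name `can_load` is hypothetical and not part of torch or collect_env.py), the same condition can be probed directly with `ctypes`:

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True
```

For example, `can_load("libcurand.so.10")` returning False would match the OSError quoted above.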