Replies: 1 comment
I think I have found the underlying problem:

Problem

When the TT (and I assume any other) decomposition function calls […]

Possible solutions

I see two possible solutions for this problem, which can be implemented in TensorLy.

In the figure below I show the difference in memory allocation between the current SVD computation (left) and `torch.linalg.svd` using the approximate SVD […]. In this case, the solutions I listed above decrease the memory required to decompose the tensor by a factor of ~2 (!), or in absolute terms by 4–6 GB, adjusting for base load. I would be interested in putting together a PR to migrate TensorLy (on the PyTorch backend) to […][^1]

[^1]: Note: […]
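For reference, here is a hypothetical micro-benchmark along these lines (not from the original reply): it compares the peak CUDA memory of a full thin SVD against `torch.svd_lowrank`, PyTorch's built-in approximate SVD, on a matrix shaped like the first TT unfolding of the tensor from the question below. The exact savings depend on hardware and shapes.

```python
# Hypothetical micro-benchmark (not part of the original reply): peak CUDA
# memory of a full thin SVD vs. torch.svd_lowrank on the first TT unfolding.
import torch

def peak_mem_mb(fn, *args, **kwargs):
    # Reset the peak-memory counter, run fn, and report the peak in MB.
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    fn(*args, **kwargs)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

# First TT unfolding of a (50, 50, 100, 100) tensor: 50 x 500000.
A = torch.randn(50, 50 * 100 * 100, device="cuda")

full = peak_mem_mb(torch.linalg.svd, A, full_matrices=False)
approx = peak_mem_mb(torch.svd_lowrank, A, q=20)  # rank-20 approximation
print(f"full SVD peak: {full:.0f} MB, svd_lowrank peak: {approx:.0f} MB")
```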
Context
I am trying to track down a potential issue that leads to OOM (CUDA) errors when decomposing large tensors on the PyTorch backend. For context, in my use case I use TensorLy to compute decompositions of noisy order-4 tensors with a total size of several GB (float32) on the GPU. My tensors have a comparable number of indices in each mode (roughly $\mathcal{T} \in \mathbb{R}^{50 \times 50 \times 100 \times 100}$). I mostly use the TT decomposition.
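For concreteness, here is a minimal sketch of this setup, with random data standing in for my tensors and assuming the `svd` keyword of `tensorly.decomposition.tensor_train` discussed below:

```python
# Minimal sketch of the setup (random data stands in for my actual tensors).
import torch
import tensorly as tl
from tensorly.decomposition import tensor_train

tl.set_backend("pytorch")

# Noisy order-4 tensor on the GPU (float32).
X = torch.randn(50, 50, 100, 100, device="cuda")

# TT decomposition with boundary ranks 1 and internal ranks 20;
# the `svd` keyword selects the SVD routine used for each unfolding.
factors = tensor_train(X, rank=[1, 20, 20, 20, 1], svd="truncated_svd")
```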
Question/Problem
I would like to better understand the difference between `svd="randomized_svd"` and `svd="truncated_svd"` with regard to memory allocation/usage when computing decompositions. I experimentally observe a large difference between the two options in terms of GPU memory allocation, as tracked by PyTorch. The figure below shows the allocated GPU memory over time.

In the figure you can see the large difference between `"truncated_svd"` (left) and `"randomized_svd"` (right) when computing the TT decomposition. With the truncated SVD, ~5.5 GB are allocated over what I expected, roughly 8 times the size of the tensor being decomposed. In the example I use ranks of 5, 10, and 20, respectively, for each of the peaks, decomposing the same tensor each time.

The (largest) spikes are connected to the following calls (read from bottom to top): […]
Is this expected behavior when computing the SVDs? Why is the difference so large between the randomized and truncated versions? Naively, I would have expected the truncated SVD to use less memory, especially since the ranks of the TT factors are so low. Unless this is expected behavior, the large memory-allocation spikes are problematic because they make it difficult to compute low-rank decompositions of large tensors for my downstream application using truncated SVDs. In my case I can simply default to the randomized version; however, I am still curious about how to interpret this difference.
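For reproducibility, here is a simplified sketch of how such a trace can be gathered with PyTorch's CUDA memory statistics (ranks and options as in the figure; exact numbers will of course differ by setup):

```python
# Sketch of the measurement: sample PyTorch's CUDA memory statistics around
# each tensor_train call (ranks 5, 10, 20 as in the figure above).
import torch
import tensorly as tl
from tensorly.decomposition import tensor_train

tl.set_backend("pytorch")
X = torch.randn(50, 50, 100, 100, device="cuda")

for svd in ("truncated_svd", "randomized_svd"):
    for r in (5, 10, 20):
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
        baseline = torch.cuda.memory_allocated()
        tensor_train(X, rank=[1, r, r, r, 1], svd=svd)
        torch.cuda.synchronize()
        spike = (torch.cuda.max_memory_allocated() - baseline) / 2**30
        print(f"{svd}, rank {r}: peak +{spike:.2f} GB over baseline")
```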
Thank you for the great package and for any help or insight you can offer on this.
Notes
Maybe related to #554; however, the tensor_train interface does not expose an `init='random'` option. Could also relate to #36.
Edit:
I see that `randomized_svd` calls `truncated_svd` under the hood, and I observe experimentally that, for slightly larger datasets, the same spikes in GPU memory allocation also occur with `randomized_svd`.
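For context, here is a minimal sketch of the standard randomized range-finder (Halko et al.) that randomized SVD implementations follow; this is not TensorLy's actual code, but it shows where an inner truncated SVD call reappears:

```python
# Minimal sketch of the standard randomized range-finder (Halko et al.);
# not TensorLy's actual code, but the same overall shape: the final SVD of
# the small projected matrix B is delegated to an ordinary (truncated) SVD,
# so a Vh factor of shape (rank + n_oversamples, n) is still materialized.
import torch

def randomized_svd_sketch(A, rank, n_oversamples=10):
    m, n = A.shape
    p = rank + n_oversamples
    # Sample the range of A with a Gaussian test matrix.
    Omega = torch.randn(n, p, device=A.device, dtype=A.dtype)
    Q, _ = torch.linalg.qr(A @ Omega)  # (m, p) orthonormal basis for range(A)
    B = Q.T @ A                        # small (p, n) projected matrix
    U_small, S, Vh = torch.linalg.svd(B, full_matrices=False)  # inner SVD
    U = Q @ U_small                    # lift back to the original space
    return U[:, :rank], S[:rank], Vh[:rank]
```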