PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity
Updated May 11, 2026 - Cuda
PyTorch Memory-Efficient Sparse-Sparse Matrix Multiplication
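Sparse-sparse matrix multiplication (SpGEMM) is commonly organised row by row, as in Gustavson's algorithm. Each output row is built by scaling and merging the rows of B selected by the nonzeros in the matching row of A. Below is a minimal plain-Python sketch of that idea. It is not the repository's PyTorch implementation, and the function name `spgemm_csr` and the `(indptr, indices, data, n_cols)` tuple layout are hypothetical.

```python
# Hypothetical helper: row-wise (Gustavson) sparse x sparse product in CSR.
# Each matrix is a tuple (indptr, indices, data, n_cols).

def spgemm_csr(a, b):
    """Multiply CSR matrices A (m x k) and B (k x n); return C = A @ B in CSR."""
    a_ptr, a_idx, a_val, a_cols = a
    b_ptr, b_idx, b_val, b_cols = b
    assert a_cols == len(b_ptr) - 1, "inner dimensions of A and B must match"
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C: column -> value
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0) + a_ik * b_val[q]
        # Emit row i with sorted column indices, then close the row.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val, b_cols
```

The dictionary accumulator only ever holds the nonzeros of a single output row. For example, with A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], the function returns the CSR arrays of [[0, 4], [15, 8]].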
Sparse Matrix Computations in CUDA
Code for sparse matrix-vector multiplication, parallelised using CUDA and MPI.
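For reference, the core of CSR sparse matrix-vector multiplication (SpMV) fits in a few lines. This is a sequential Python sketch with a hypothetical function name, not the repository's code. In a CUDA version each row would typically be assigned to one thread, and with MPI the rows would be partitioned across ranks. Both of these are assumptions about how such code is usually structured.

```python
def spmv_csr(indptr, indices, data, x):
    """Return y = A @ x for a CSR matrix A given by (indptr, indices, data)."""
    y = []
    # One iteration per row: a scalar CUDA kernel would map this loop body
    # to one thread; an MPI version would split the row range across ranks.
    for row in range(len(indptr) - 1):
        acc = 0.0
        for p in range(indptr[row], indptr[row + 1]):
            acc += data[p] * x[indices[p]]
        y.append(acc)
    return y
```

With A = [[1, 0, 2], [0, 3, 0]] stored as `indptr=[0, 2, 3]`, `indices=[0, 2, 1]`, `data=[1, 2, 3]` and `x=[1, 2, 3]`, the result is `[7.0, 6.0]`.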
CUDA sparse binary 2-D FFT with compact CSC input, Bluestein transforms, and cuFFT/SpFFT baselines.
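Because the input is binary, a compressed sparse column (CSC) layout can drop the value array entirely and keep only column pointers and row indices. Evaluating the 2-D DFT directly over just the nonzeros is a convenient correctness check on small inputs. The sketch below is an illustrative assumption, not the repository's Bluestein or cuFFT path. The function name is hypothetical.

```python
import cmath


def sparse_binary_dft2(n_rows, n_cols, colptr, rowidx):
    """Direct 2-D DFT of a binary matrix given in value-free CSC form.

    X[k][l] = sum over ones at (m, n) of exp(-2*pi*i*(k*m/n_rows + l*n/n_cols)).
    Cost is O(nnz * n_rows * n_cols), so this is only a reference check.
    """
    X = [[0j] * n_cols for _ in range(n_rows)]
    for n in range(n_cols):
        for p in range(colptr[n], colptr[n + 1]):
            m = rowidx[p]  # binary input: every stored entry equals 1
            for k in range(n_rows):
                for l in range(n_cols):
                    X[k][l] += cmath.exp(-2j * cmath.pi * (k * m / n_rows + l * n / n_cols))
    return X
```

A single one at (1, 0) in a 2 x 2 input gives rows of 1 and -1 (up to rounding), as expected from the DFT shift property.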
CUDA SpMV kernels (scalar, warp-per-row, ELL) on NVIDIA A100 benchmarked against cuSPARSE on SuiteSparse matrices, plus AVX2 + cache-tiled CPU baselines on Intel Xeon Gold. Vector kernel reaches 98-110% of HBM2 peak, beating cuSPARSE by 24-56% on regular matrices.
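The ELL (ELLPACK) format pads every row to the same number of slots and stores the slots column-major. When one GPU thread handles one row, neighbouring threads then read neighbouring addresses. The Python sketch below shows the conversion from CSR and the matching SpMV loop. The function names and the padding convention (column 0, value 0) are assumptions for illustration, not the benchmarked kernels.

```python
def csr_to_ell(indptr, indices, data):
    """Convert CSR to ELLPACK stored column-major (slot-major).

    Returns (width, ell_idx, ell_val); entry (row, slot) lives at
    slot * n_rows + row. Padding uses column 0 with value 0.
    """
    n_rows = len(indptr) - 1
    width = max((indptr[r + 1] - indptr[r] for r in range(n_rows)), default=0)
    ell_idx = [0] * (width * n_rows)
    ell_val = [0.0] * (width * n_rows)
    for r in range(n_rows):
        for s, p in enumerate(range(indptr[r], indptr[r + 1])):
            ell_idx[s * n_rows + r] = indices[p]
            ell_val[s * n_rows + r] = data[p]
    return width, ell_idx, ell_val


def spmv_ell(n_rows, width, ell_idx, ell_val, x):
    """Return y = A @ x for an ELL matrix produced by csr_to_ell."""
    y = [0.0] * n_rows
    for s in range(width):
        for r in range(n_rows):  # on a GPU, r would be the thread index
            k = s * n_rows + r  # consecutive r -> consecutive addresses
            y[r] += ell_val[k] * x[ell_idx[k]]
    return y
```

Padding wastes work when row lengths vary widely, which is why ELL tends to suit regular matrices better than irregular ones.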
University project on Sparse Matrix transposition with CUDA.
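Transposing a CSR matrix (equivalently, converting it to CSC) is usually done as a counting sort on the column indices: histogram, prefix sum, scatter. The sequential sketch below uses a hypothetical function name. On a GPU, the histogram and scatter would typically use atomics and the prefix sum a parallel scan. In that case the order of entries within each output row is no longer guaranteed without an extra sort.

```python
def transpose_csr(n_cols, indptr, indices, data):
    """Return the CSR arrays of A^T (equivalently, the CSC arrays of A)."""
    # 1. Histogram: count nonzeros per column of A (= per row of A^T).
    t_ptr = [0] * (n_cols + 1)
    for c in indices:
        t_ptr[c + 1] += 1
    # 2. Prefix sum turns the counts into row offsets of A^T.
    for c in range(n_cols):
        t_ptr[c + 1] += t_ptr[c]
    # 3. Scatter each entry (r, c) of A into row c of A^T.
    nnz = len(indices)
    t_idx = [0] * nnz
    t_val = [0] * nnz
    next_slot = t_ptr[:-1]  # slice copy: write cursor for each output row
    for r in range(len(indptr) - 1):
        for p in range(indptr[r], indptr[r + 1]):
            c = indices[p]
            dst = next_slot[c]
            t_idx[dst] = r
            t_val[dst] = data[p]
            next_slot[c] += 1
    return t_ptr, t_idx, t_val
```

For A = [[1, 0, 2], [0, 3, 0]], `transpose_csr(3, [0, 2, 3], [0, 2, 1], [1, 2, 3])` yields the CSR arrays of [[1, 0], [0, 3], [2, 0]].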
Reproducible Instruction Roofline analysis of cuSPARSE and Ginkgo SpMM on RTX 4090 using Nsight Compute metrics.
Sparse binary 2D FFT on CUDA/cuFFT with memory-footprint optimization, streaming tiles, Hermitian symmetry, and Nsight analysis.
Machine problems