SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. (C++ · Updated Jun 16, 2025)
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
A neat C++ custom Matrix class to perform super-fast GPU (or CPU) powered Matrix/Vector computations with minimal code, leveraging the power of cuBLAS where applicable.
Multiple GEMM operators are constructed with cutlass to support LLM inference.
Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
C++ CUDA-compatible template class that provides an interface for general-purpose matrix algorithms and computations, including Matlab-like functions. This is mainly an example of how to use CUDA code with C++; don't expect high performance.
HSD: Hierarchical Spherical Deformation for Cortical Surface Registration
CUDA Gemm Convolution implementation
VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the python-numpy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.
An old project implementing the Stable Fluids algorithm using CUDA, cuBLAS and cuSPARSE.
Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA.
Framework, toolkit and ready-to-use applications for numerical linear algebra dependent machine learning algorithms.