cublas

Here are 94 public repositories matching this topic...

cupy / cupy

NumPy & SciPy for GPU

python gpu numpy cuda cublas scipy tensor cudnn rocm cupy cusolver nccl curand cusparse nvrtc cutensor nvtx cusparselt

Updated Dec 18, 2025
Python

ZrobMiloudaa / jetson-orin-matmul-analysis

🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.

machine-learning robotics cuda cublas matrix-multiplication high-performance-computing gpu-computing performance-optimization autonomous-systems edge-computing nvidia-jetson embeded-systems tensor-cores ml-deployment jetson-orin-nano gpu-benchmarking power-efficiency-benchmark cuda-optimization

Updated Dec 18, 2025
Python

HyperKuvid-Labs / muon_exps

High-performance CUDA implementation of the Muon optimizer for LLM training. Features Newton-Schulz polar decomposition, cuBLAS acceleration, and a transpose optimization for 8x FLOP savings on transformer FFN layers. Benchmarked on an NVIDIA A100 with Llama 3.1 8B architectures (4096×11008 weights).

neural-network cublas mnist cuda-kernels gpu-optimization optimizers muon-optimizer newton-schulz

Updated Dec 18, 2025
Python
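
The Muon entry above mentions Newton-Schulz polar decomposition. As Muon is usually described, each 2-D update matrix G = U S V^T is replaced by an approximation of its orthogonal polar factor U V^T, computed with repeated matrix products rather than an SVD, which is where a GEMM library such as cuBLAS comes in. The following is a minimal pure-Python sketch of the textbook cubic Newton-Schulz iteration; the function names are hypothetical, and the repository may use a different (for example higher-order, tuned) polynomial and step count.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor U @ V.T of g = U @ S @ V.T.

    Classic cubic iteration X <- 1.5 X - 0.5 X X^T X: it keeps U and V fixed
    and pushes every singular value in (0, sqrt(3)) toward 1.
    """
    # Dividing by the Frobenius norm puts all singular values in (0, 1].
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # The two GEMMs per step are the part a cuBLAS-backed version would offload.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


# A symmetric positive-definite matrix has polar factor I, so this result
# should be very close to [[1, 0], [0, 1]].
result = newton_schulz_orthogonalize([[3.0, 1.0], [1.0, 2.0]])
```

Normalizing first matters: without it, singular values above sqrt(3) would diverge instead of converging to 1.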

chitono / StuCrs

A deep learning framework built in Rust. Implement it from scratch in Rust and explore the principles of deep learning!

rust framework ai neural-network cuda cublas scratch deeplearning deepneuralnetworks japanese-development

Updated Dec 18, 2025
Rust

deepreinforce-ai / CUDA-L2

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

reinforcement-learning cublas nvidia matrix-multiplication cuda-kernels large-language-models

Updated Dec 15, 2025
Cuda

chelsea0x3b / cudarc

Safe rust wrapper around CUDA toolkit

rust gpu cuda cublas gpu-acceleration cuda-kernels cudnn cuda-toolkit nccl curand cuda-programming nvrtc

Updated Dec 11, 2025
Rust

aditya2819 / CUDA-accelerated-linear-algebra-toolkit

High-performance GPU-accelerated linear algebra library for scientific computing. Custom kernels outperform cuBLAS+cuSPARSE by 2.4x in iterative solvers. Built for circuit simulation workloads.

benchmarking hpc optimization linear-algebra parallel-computing cuda cublas scientific-computing matrix-multiplication gpu-acceleration cuda-kernels gpu-computing numerical-methods conjugate-gradient circuit-simulation sparse-matrices performance-optimization cusparse custom-kernels

Updated Dec 6, 2025
Cuda
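
The linear algebra toolkit entry above targets iterative solvers and is tagged conjugate-gradient. A minimal sketch of unpreconditioned conjugate gradient in pure Python with a dense matrix follows; comments mark the steps a GPU version would typically hand to a sparse matrix-vector product (cuSPARSE) or BLAS-1 dot/axpy routines (cuBLAS). It is an illustration under those assumptions, not the toolkit's custom kernels, and `conjugate_gradient` is a hypothetical name.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, x):
    return [dot(row, x) for row in a]


def conjugate_gradient(a, b, tol=1e-12, max_iter=1000):
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - a @ x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)                                  # SpMV (e.g. cuSPARSE)
        alpha = rs_old / dot(p, ap)                        # dot (e.g. cuBLAS)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # axpy
        r = [ri - alpha * api for ri, api in zip(r, ap)]   # axpy
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones with respect to a.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Exact solution is [1/11, 7/11].
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each iteration costs one matrix-vector product plus a handful of vector operations, so on a GPU the memory bandwidth of those kernels, rather than GEMM throughput, tends to dominate.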

neur1n / x.h

Cross platform C/C++ utilities.

c cross-platform cpp logger logging cuda cublas

Updated Dec 4, 2025
C++

mnovak42 / leuven

Framework, toolkit, and ready-to-use applications for machine learning algorithms that depend on numerical linear algebra.

machine-learning-algorithms cublas blas lapack cusolver ls-svm sparse-kernel-spectral-clustering

Updated Nov 11, 2025
C++

Cre4T3Tiv3 / jetson-orin-matmul-analysis

Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x 5 matrix sizes on Jetson Orin Nano. 1,282 GFLOPS peak, 90% performance @ 88% power (25W mode), 99.5% accuracy validation, edge AI deployment guide.

Updated Oct 14, 2025
Python
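
The Jetson Orin Nano benchmark entry reports throughput in GFLOPS and a performance-versus-power trade-off. A hedged sketch of the usual arithmetic: a GEMM with A of size m × k and B of size k × n performs 2mnk floating-point operations when each multiply-add counts as two. The helper names are hypothetical, and reading "90% performance @ 88% power" as fractions of one common reference power mode is an assumption the listing does not spell out.

```python
def gemm_gflops(m, n, k, seconds):
    """Throughput of C = A @ B (A: m x k, B: k x n), counting each
    multiply-add as 2 floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


def relative_perf_per_watt(perf_fraction, power_fraction):
    """Perf-per-watt relative to a reference mode, given performance and
    power both expressed as fractions of that same reference."""
    return perf_fraction / power_fraction


# A 4096 x 4096 x 4096 GEMM finishing in 0.5 s:
print(round(gemm_gflops(4096, 4096, 4096, 0.5), 1))  # 274.9
# 90% of the reference performance at 88% of the reference power:
print(round(relative_perf_per_watt(0.90, 0.88), 3))  # 1.023
```

Under that reading, the 25W mode would deliver roughly 2% more work per joule than the reference mode.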

Bruce-Lee-LY / cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

gpu cublas nvidia cutlass gemm cublaslt llm matrix-multiply tensor-core

Updated Aug 3, 2025
C++

coderonion / awesome-cuda-and-hpc

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

Updated Aug 2, 2025

kevmo314 / scuda

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

networking gpu cuda cublas nvml remote-access cudnn mlops

Updated Jun 16, 2025
C++

dc-fukuoka / gpumm

gpumm: matrix-matrix multiplication using CUDA, cuBLAS, cuBLASXt, and OpenACC.

openmp cuda cublas high-performance-computing openacc cublasxt

Updated Jun 3, 2025
Cuda
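
cuBLAS follows the BLAS column-major convention, a common stumbling block in GEMM projects like gpumm. A widely used idiom for row-major data is to compute C^T = B^T A^T instead, i.e. call `cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k, &alpha, dB, n, dA, k, &beta, dC, n)` with the operands swapped. The pure-Python model below (the hypothetical `gemm_colmajor`, not the cuBLAS API itself) mirrors that parameter layout so the index arithmetic can be checked on the CPU; the gpumm description does not say whether it uses this idiom.

```python
def gemm_colmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Model of a non-transposed column-major GEMM on flat lists:
    C[m x n] = alpha * A[m x k] @ B[k x n] + beta * C.
    Element (i, j) of a matrix with leading dimension ld is at i + j * ld.
    (Unlike real BLAS, this model still reads C when beta == 0.)"""
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]


def matmul_rowmajor(a, b, m, n, k):
    """Row-major C[m x n] = A[m x k] @ B[k x n] via the column-major routine.
    A row-major buffer read as column-major is its transpose, so computing
    C^T = B^T @ A^T (n x m, column-major) leaves C in row-major order."""
    c = [0.0] * (m * n)
    gemm_colmajor(n, m, k, 1.0, b, n, a, k, 0.0, c, n)
    return c


# A = [[1, 2, 3], [4, 5, 6]], B = [[7, 8], [9, 10], [11, 12]], both row-major.
print(matmul_rowmajor([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], 2, 2, 3))
# [58.0, 64.0, 139.0, 154.0]
```

The swap avoids any explicit transpose or copy: only the order of the operands and the dimensions passed to the call change.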

eth-cscs / Tiled-MM

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

amd gpu cuda cublas nvidia matrix-multiplication rocm cublasxt matmul rocblasxt rocblas

Updated Apr 2, 2025
C++
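
Tiled-MM, like cublasXt, keeps the matrices in host memory and streams them through the GPU tile by tile. The sketch below shows only the loop structure such an approach relies on: each output tile of C accumulates the products of matching tiles of A and B, and the "upload" and "download" steps are plain list slices standing in for host-device copies. It omits streams, overlap of transfers with compute, and real GEMM calls; `tiled_matmul` and its `tile` parameter are hypothetical.

```python
def tiled_matmul(a, b, tile=2):
    """C = A @ B computed one output tile at a time (lists of rows)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Accumulator for one output tile (on a GPU: device memory).
            acc = [[0.0] * min(tile, n - j0) for _ in range(min(tile, m - i0))]
            for p0 in range(0, k, tile):
                # "Upload" one tile of A and one tile of B (here: slices).
                a_t = [row[p0:p0 + tile] for row in a[i0:i0 + tile]]
                b_t = [row[j0:j0 + tile] for row in b[p0:p0 + tile]]
                for i, a_row in enumerate(a_t):
                    for j in range(len(b_t[0])):
                        acc[i][j] += sum(a_row[p] * b_t[p][j] for p in range(len(b_t)))
            # "Download" the finished tile back into C.
            for i, row in enumerate(acc):
                c[i0 + i][j0:j0 + len(row)] = row
    return c


print(tiled_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], tile=1))
# [[19.0, 22.0], [43.0, 50.0]]
```

Edge tiles are simply smaller (the `min` calls), so the matrix sizes need not be multiples of the tile size.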

VORTICITY-INC / VTensor

VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the Python NumPy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.

gpu numpy cuda cublas xarray tensor xtensor rmm cusolver curand

Updated Apr 1, 2025
C++

machineko / SwiftCUBLAS

SwiftCUBLAS is a wrapper for cuBLAS APIs with extra utilities for ease of use, along with a suite of tests. The repository is tested against the CUDA runtime API v12.5 on both Linux and Windows.

swift cuda cublas matrix-multiplication gpu-acceleration gpu-computing

Updated Feb 22, 2025
Swift

enp1s0 / CULiP

Library for profiling the execution time of official CUDA library functions.

cuda cublas profiling

Updated Feb 15, 2025
Cuda

bokutotu / zenu

A deep learning framework with very few dependencies, written in Rust.

rust deep-neural-networks ai deep-learning hpc cuda autograd cublas blas gpu-computing cudnn

Updated Feb 14, 2025
Rust

yester31 / CUDA_EX

CUDA kernel functions

gpu cuda cublas matrix-multiplication cuda-kernels gemm cuda-programming bicubic-interpolation

Updated Dec 2, 2024
Cuda
