cublas

Here are 14 public repositories matching this topic...

cupy / cupy

NumPy & SciPy for GPU

python gpu numpy cuda cublas scipy tensor cudnn rocm cupy cusolver nccl curand cusparse nvrtc cutensor nvtx cusparselt

Updated Jun 11, 2026
Python

lebedov / scikit-cuda

Python interface to GPU-powered libraries

python gpu cuda cublas blas lapack numerical cufft pycuda cusolver

Updated Oct 15, 2023
Python

Cre4T3Tiv3 / jetson-orin-matmul-analysis

CUDA matrix multiplication benchmarking on Jetson Orin Nano. Four implementations, three power modes, five matrix sizes. 99.5% mathematical validation. C++/CUDA and Python.

Updated Apr 2, 2026
Python

TApplencourt / mkl-verbose-toolkit

Tools to run and parse MKL verbose mode

cublas mkl oneapi

Updated Jun 28, 2022
Python

gigernau / PCAHyperspectralClassifier

Classification of Hyperspectral Images (HSIs) with Principal Component Analysis (PCA) in CUDA (cuBLAS).

machine-learning deep-learning cuda cublas pca classification principal-component-analysis hyperspectral-image-classification edge-computing hyperspectral-imaging jetson-nano

Updated Jan 17, 2024
Python

parallelArchitect / spark-gpu-throttle-check

Enhanced GPU throttle diagnostic for DGX Spark (GB10): NVML direct telemetry, throttle cause decoder, PCIe link monitoring, baseline drift detection, timeline capture.

cuda cublas nvidia nvml pcie usb-pd gpu-monitoring power-delivery gb10 gpu-diagnostics dgx-spark throttle-detection clock-throttling

Updated Mar 22, 2026
Python

coderonion / cuda-beginner-course-python-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"

python rust cpp gpu cuda cublas nvidia cudnn nvcc cupy parallel-programming gpu-programming cuda-programming

Updated Mar 18, 2024
Python

ZrobMiloudaa / jetson-orin-matmul-analysis

🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.

machine-learning robotics cuda cublas matrix-multiplication high-performance-computing gpu-computing performance-optimization autonomous-systems edge-computing nvidia-jetson embeded-systems tensor-cores ml-deployment jetson-orin-nano gpu-benchmarking power-efficiency-benchmark cuda-optimization

Updated Jun 14, 2026
Python

HyperKuvid-Labs / muon_exps

High-performance CUDA implementation of Muon optimizer for LLM training. Features Newton-Schulz polar decomposition, cuBLAS acceleration, and transpose optimization for 8x FLOP savings on transformer FFN layers. Benchmarked on NVIDIA A100 with Llama 3.1 8B architectures (4096×11008 weights).

neural-network cublas mnist cuda-kernels gpu-optimization optimizers muon-optimizer newton-schulz

Updated Dec 21, 2025
Python

trnsci / trnblas

BLAS Levels 1–3 for AWS Trainium via NKI (cuBLAS-equivalent) — GEMM with stationary-tile reuse, batched GEMM, TRSM, validated DF-MP2 for quantum chemistry.

python linear-algebra pytorch cublas scientific-computing matrix-multiplication quantum-chemistry blas gemm nki aws-neuron aws-trainium

Updated Apr 28, 2026
Python

dendisuhubdy / cupy

NumPy-like API accelerated with CUDA

numpy cuda cublas cudnn cusolver

Updated Dec 9, 2019
Python

amacharla15 / gpu-profiling-cuda-kernels

GPU profiling suite & CUDA kernels on A100 80GB — ResNet-50 benchmarks, Nsight Systems profiling, tiled matrix multiplication with shared memory

cuda pytorch cublas profiling resnet-50 gpu-programming a100

Updated Mar 17, 2026
Python

miraliahmadli / YoloV2-C

python c cuda avx cublas openblas cnn-inference-engine

Updated Jun 5, 2020
Python

parallelArchitect / nvidia-gpu-val

NVIDIA GPU validation: PCIe transport, Unified Memory prefetch, SGEMM compute, drift detection.

Updated Feb 25, 2026
Python
