cublas
Here are 13 public repositories matching this topic...
CUDA matrix multiplication benchmarking on Jetson Orin Nano. Four implementations, three power modes, five matrix sizes. 99.5% mathematical validation. C++/CUDA and Python.
Updated Apr 2, 2026 - Python
Classification of Hyperspectral Images (HSIs) with Principal Component Analysis (PCA) in CUDA (cuBLAS).
Updated Jan 17, 2024 - Python
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)".
Updated Mar 18, 2024 - Python
Enhanced GPU throttle diagnostic for DGX Spark (GB10): NVML direct telemetry, throttle cause decoder, PCIe link monitoring, baseline drift detection, timeline capture.
Updated Mar 22, 2026 - Python
🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.
Updated Apr 3, 2026 - Python
High-performance CUDA implementation of Muon optimizer for LLM training. Features Newton-Schulz polar decomposition, cuBLAS acceleration, and transpose optimization for 8x FLOP savings on transformer FFN layers. Benchmarked on NVIDIA A100 with Llama 3.1 8B architectures (4096×11008 weights).
Updated Dec 21, 2025 - Python
NVIDIA GPU validation: PCIe transport, Unified Memory prefetch, SGEMM compute, drift detection.
Updated Feb 25, 2026 - Python