cublas
Here are 10 public repositories matching this topic...
Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x 5 matrix sizes on Jetson Orin Nano. 1,282 GFLOPS peak, 90% performance @ 88% power (25W mode), 99.5% accuracy validation, edge AI deployment guide.
-
Updated
Oct 14, 2025 - Python
Classification of Hyperspectral Images ( HSIs ) with Principal Component Analysis ( PCA ) in CUDA ( cuBLAS ).
-
Updated
Jan 17, 2024 - Python
bilibili视频【CUDA 12.x 并行编程入门(Python版)】配套代码
-
Updated
Mar 18, 2024 - Python
🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.
-
Updated
Dec 18, 2025 - Python
High-performance CUDA implementation of Muon optimizer for LLM training. Features Newton-Schulz polar decomposition, cuBLAS acceleration, and transpose optimization for 8x FLOP savings on transformer FFN layers. Benchmarked on NVIDIA A100 with Llama 3.1 8B architectures (4096×11008 weights).
-
Updated
Dec 18, 2025 - Python
Improve this page
Add a description, image, and links to the cublas topic page so that developers can more easily learn about it.
Add this topic to your repo
To associate your repository with the cublas topic, visit your repo's landing page and select "manage topics."