gemm
Here are 93 public repositories matching this topic...
My attempt of making a GEMM kernel... (Updated Apr 21, 2025, CUDA)
The fastest Tropical number matrix multiplication on GPU (Updated Aug 23, 2025, Julia)
Development of deep learning inference code using OpenCL kernel functions. (Updated Jun 1, 2022, C++)
My GEMM optimization on RPi (ARM) achieved a 170x performance boost, running faster than Eigen and close to OpenBLAS. (Updated Nov 17, 2024, C++)
Low Precision Arithmetic for Convolutional Neural Network Inference (Updated Oct 29, 2017, C++)
Performance comparison of naive, AVX2-optimized, and cBLAS matrix multiplication implementations in C. (Updated Nov 10, 2025, C)
Safety-hardened GEMM (matrix multiply) implementation achieving 169.8 GFLOPS on Intel i9-14900. Built for embedded systems and safety-critical applications where reliability matters as much as speed. 162x faster than naive, zero UB, fully validated. (Updated Nov 21, 2025, C)
Repo for the SPOGA Accelerator: Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels (Updated Apr 3, 2025, Python)
Mixed-precision GEMM library (Updated May 27, 2025, C++)