Stars: 9 results for source starred repositories written in CUDA
DeepEP: an efficient expert-parallel communication library
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
This is a series of GPU optimization topics. Here we will introduce how to optimize CUDA kernels in detail. I will introduce several basic kernel optimizations, including elementwise, reduce, s…
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.