Stars
8 stars written in CUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Sample code for my CUDA programming book
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Codebase for Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference