Highlights

  • Pro

Organizations

@AIDASLab

8 stars written in Cuda

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,318 stars · 823 forks · Updated Oct 17, 2025

Sample codes for my CUDA programming book

Cuda · 1,922 stars · 375 forks · Updated Feb 15, 2025

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda · 222 stars · 22 forks · Updated Sep 24, 2023

High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.

Cuda · 120 stars · 7 forks · Updated Jul 13, 2024

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Cuda · 68 stars · 7 forks · Updated Sep 8, 2024

Cuda · 28 stars · 2 forks · Updated Apr 2, 2025

Codebase for Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference

Cuda · 3 stars · Updated Oct 17, 2025