quaternior

Jinhyeok Kim (quaternior)

8 stars written in Cuda

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,318 823 Updated Oct 17, 2025

Sample codes for my CUDA programming book

Cuda 1,922 375 Updated Feb 15, 2025

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda 222 22 Updated Sep 24, 2023

High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.

Cuda 120 7 Updated Jul 13, 2024

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Cuda 68 7 Updated Sep 8, 2024

Cuda 56 3 Updated Nov 14, 2024

Cuda 28 2 Updated Apr 2, 2025

Codebase for Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference

Cuda 3 Updated Oct 17, 2025