9 repositories starred by tuanavu, written in Cuda

LLM training in simple, raw C/CUDA

Cuda 28,081 3,264 Updated Jun 26, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,324 825 Updated Nov 6, 2025

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,409 177 Updated Feb 24, 2025

A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,183 172 Updated Jul 29, 2023

CUDA Kernel Benchmarking Library

Cuda 759 90 Updated Oct 21, 2025

Alex Krizhevsky's original code from Google Code

Cuda 198 32 Updated Mar 10, 2016

Some CUDA example code with READMEs.

Cuda 176 26 Updated Mar 2, 2025

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda 175 30 Updated Nov 2, 2025

Build CUDA Neural Network From Scratch

Cuda 21 1 Updated Aug 28, 2024