lishuai-97 (Beijing)

24 results for source starred repositories written in Cuda

LLM training in simple, raw C/CUDA

Cuda 28,075 3,264 Updated Jun 26, 2025

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,030 2,017 Updated Oct 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,691 972 Updated Nov 5, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,318 823 Updated Oct 17, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,861 736 Updated Oct 15, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,018 558 Updated Nov 5, 2025

CUDA accelerated rasterization of gaussian splatting

Cuda 3,900 590 Updated Oct 2, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,622 257 Updated Oct 28, 2025

How to optimize some algorithms in CUDA.

Cuda 2,596 235 Updated Oct 30, 2025

Sample codes for my CUDA programming book

Cuda 1,922 375 Updated Feb 15, 2025

Learn CUDA Programming, published by Packt

Cuda 1,210 262 Updated Dec 30, 2023

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 959 97 Updated Dec 30, 2024

Hand-written CUDA operators and interview guide

Cuda 673 75 Updated Aug 23, 2025

UNet diffusion model in pure CUDA

Cuda 651 31 Updated Jun 28, 2024

CUDA Learning guide

Cuda 466 51 Updated Jun 20, 2024

Learnings and programs related to CUDA

Cuda 422 18 Updated Jun 29, 2025

A simple high performance CUDA GEMM implementation.

Cuda 414 42 Updated Jan 4, 2024

learning how CUDA works

Cuda 334 44 Updated Mar 3, 2025

A shift-window based transformer for 3D sparse tasks

Cuda 263 23 Updated Jun 25, 2023

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

Cuda 223 34 Updated Dec 14, 2024

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Cuda 144 20 Updated Aug 18, 2020

Improved 3DGS rasterizer.

Cuda 123 5 Updated Feb 26, 2025