Organizations

@Tech-JI

10 results for source starred repositories written in Cuda

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,920 stars · 876 forks · Updated Dec 4, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,803 stars · 1,025 forks · Updated Dec 5, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,953 stars · 774 forks · Updated Dec 8, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,211 stars · 594 forks · Updated Dec 15, 2025

How to optimize some algorithms in CUDA.

Cuda · 2,687 stars · 243 forks · Updated Dec 6, 2025

Examples demonstrating the available options for programming multiple GPUs in a single node or a cluster

Cuda · 840 stars · 145 forks · Updated Sep 26, 2025

Fastest kernels written from scratch

Cuda · 498 stars · 62 forks · Updated Sep 18, 2025

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

Cuda · 363 stars · 32 forks · Updated Nov 19, 2025

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda · 331 stars · 30 forks · Updated Jul 2, 2024

[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Cuda · 50 stars · 2 forks · Updated Dec 11, 2025