  • New York University
  • New York

Starred repositories

18 results for source starred repositories written in Cuda

LLM training in simple, raw C/CUDA

Cuda 28,460 3,338 Updated Jun 26, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,045 889 Updated Dec 24, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,996 778 Updated Dec 23, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,209 178 Updated Jul 29, 2023

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

Cuda 536 46 Updated Sep 8, 2025

Step-by-step optimization of CUDA SGEMM

Cuda 416 54 Updated Mar 30, 2022

A handy script for grabbing (occupying) GPUs

Cuda 387 40 Updated Dec 15, 2025

CUDA Matrix Multiplication Optimization

Cuda 247 24 Updated Jul 19, 2024

A set of hands-on tutorials for CUDA programming

Cuda 243 35 Updated Apr 8, 2024

easy cuda code

Cuda 92 45 Updated Dec 24, 2024

🌈 Solutions of LeetGPU

Cuda 59 9 Updated Nov 12, 2025

Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"

Cuda 43 9 Updated Mar 4, 2024

CUDA Embedding Lookup Kernel Library

Cuda 40 5 Updated Oct 21, 2025

some hpc project for learning

Cuda 26 4 Updated Aug 28, 2024

Build CUDA Neural Network From Scratch

Cuda 22 1 Updated Aug 28, 2024

source code for TaiChi (A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU)

Cuda 8 1 Updated Mar 20, 2023