This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,206 stars · 177 forks · Updated Jul 29, 2023

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda · 963 stars · 220 forks · Updated Dec 18, 2025

olcf / cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda · 921 stars · 336 forks · Updated Aug 19, 2024

ArchaeaSoftware / cudahandbook

Source code that accompanies The CUDA Handbook.

Cuda · 558 stars · 197 forks · Updated Oct 7, 2025

yassa9 / qwen600

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

Cuda · 534 stars · 44 forks · Updated Sep 8, 2025

eyalroz / cuda-kat

CUDA kernel author's tools

Cuda · 115 stars · 8 forks · Updated Apr 24, 2022

salykova / sgemm.cu

High-Performance SGEMM on CUDA devices

Cuda · 113 stars · 5 forks · Updated Jan 21, 2025

microsoft / TileFusion

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 104 stars · 6 forks · Updated Jun 28, 2025

phys-sim-book / solid-sim-tutorial-gpu

A curated set of C++ examples for optimization-based elastodynamic contact simulation using CUDA, emphasizing algorithmic convergence, penetration-free, and inversion-free conditions. Designed for …

Cuda · 104 stars · 6 forks · Updated Jun 29, 2025

CUDA-Tutorial / CodeSamples

Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"

Cuda · 94 stars · 32 forks · Updated Aug 14, 2023

BobMcDear / neural-network-cuda

Neural network from scratch in CUDA/C++

Cuda · 87 stars · 19 forks · Updated Sep 8, 2025
