ardacoskunses

Arda Coskunses ardacoskunses

0 followers · 10 following

Stars

9 stars written in Cuda

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,981 stars · 877 forks · Updated Dec 4, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,977 stars · 778 forks · Updated Dec 8, 2025

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda · 2,696 stars · 244 forks · Updated Dec 6, 2025

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,207 stars · 177 forks · Updated Jul 29, 2023

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 980 stars · 148 forks · Updated Sep 2, 2025

pranjalssh / fast.cu

Fastest kernels written from scratch

Cuda · 499 stars · 62 forks · Updated Sep 18, 2025

rbaygildin / learn-gpgpu

Algorithms implemented in CUDA + resources about GPGPU

Cuda · 62 stars · 15 forks · Updated Jan 18, 2022

SzymonOzog / FastSoftmax

Step by step implementation of a fast softmax kernel in CUDA

Cuda · 59 stars · 6 forks · Updated Jan 6, 2025

gcoe-dresden / cuda-gpu-tlb

TLB Benchmarks

Cuda · 35 stars · 10 forks · Updated Sep 11, 2017
