Arda Coskunses ardacoskunses

0 followers · 12 following

Stars

9 stars written in Cuda

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 9,635 stars · 952 forks · Updated Feb 5, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,166 stars · 817 forks · Updated Feb 3, 2026

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda · 2,819 stars · 256 forks · Updated Jan 31, 2026

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,237 stars · 179 forks · Updated Jul 29, 2023

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 1,046 stars · 159 forks · Updated Sep 2, 2025

pranjalssh / fast.cu

Fastest kernels written from scratch

Cuda · 533 stars · 64 forks · Updated Sep 18, 2025

rbaygildin / learn-gpgpu

Algorithms implemented in CUDA + resources about GPGPU

Cuda · 62 stars · 15 forks · Updated Jan 18, 2022

SzymonOzog / FastSoftmax

Step by step implementation of a fast softmax kernel in CUDA

Cuda · 60 stars · 6 forks · Updated Jan 6, 2025

gcoe-dresden / cuda-gpu-tlb

TLB Benchmarks

Cuda · 35 stars · 10 forks · Updated Sep 11, 2017
