Erik worstpractice

🌈

https://www.youtube.com/watch?v=TTaKdPS-qSE

79 followers · 931 following

Sweden

Achievements

Starred repositories

6 stars written in Cuda

Clear filter

HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Cuda 11,178 427 Updated Nov 21, 2024

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,984 777 Updated Dec 8, 2025

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,874 289 Updated Dec 11, 2025

brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book

Cuda 1,953 378 Updated Dec 14, 2025

thu-ml / SpargeAttn

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 839 71 Updated Dec 17, 2025

sdbds / SageAttention-for-windows

Forked from thu-ml/SageAttention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.

Cuda 94 7 Updated Nov 25, 2025