V1ammer

🦀

Vlad Gerasov V1ammer

12 followers · 72 following

team-73

Lists (1)

ferrous-systems

ferrous-systems repos

11 repositories

Starred repositories

9 stars written in Cuda

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,971 stars · 877 forks · Updated Dec 4, 2025

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda · 2,696 stars · 244 forks · Updated Dec 6, 2025

Tony-Tan / CUDA_Freshman

Cuda · 2,639 stars · 500 forks · Updated Jan 16, 2024

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda · 1,023 stars · 100 forks · Updated Dec 30, 2024

66RING / tiny-flash-attention

flash attention tutorial written in python, triton, cuda, cutlass

Cuda · 459 stars · 50 forks · Updated May 14, 2025

wangzyon / NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Cuda · 416 stars · 54 forks · Updated Mar 30, 2022

xlite-dev / ffpa-attn

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda · 241 stars · 12 forks · Updated Nov 18, 2025

Enigmatisms / cuda-pt

Writing a CUDA software ray tracing renderer with Analysis-Driven Optimization from scratch: a python-importable, distributed parallel renderer.

Cuda · 37 stars · 2 forks · Updated Oct 5, 2025

rishisankar / flashattention2

Flash Attention 2 CUDA implementations

Cuda · 8 stars · 1 fork · Updated Apr 29, 2025

Starred topics

low-level-design

Terminal

IPFS

Linux

Rust