This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,208 stars · 178 forks · Updated Jul 29, 2023

tuoxie007 / play_with_llvm

A book about LLVM & Clang (open-source book in Chinese: 玩转 LLVM, "Play with LLVM")

C++ · 401 stars · 52 forks · Updated Jul 3, 2020

AlibabaPAI / FLASHNN

Python · 103 stars · 8 forks · Updated Sep 9, 2024

ROCm / aotriton

Ahead of Time (AOT) Triton Math Library

Python · 84 stars · 35 forks · Updated Dec 12, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR · 17,914 stars · 2,465 forks · Updated Dec 23, 2025

JensenFire


Stars

zhaochenyang20 / Awesome-ML-SYS-Tutorial

xlite-dev / LeetCUDA

deepseek-ai / DeepEP

zhuzilin / ring-flash-attention

feifeibear / long-context-attention

kvcache-ai / Mooncake

flagos-ai / FlagGems

Liu-xiandong / How_to_optimize_in_GPU

tuoxie007 / play_with_llvm

AlibabaPAI / FLASHNN

ROCm / aotriton

triton-lang / triton