Starred repositories
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A series on GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including: elementwise, reduce, s…
Static, suckless, single-batch, CUDA-only qwen3-0.6B mini inference engine
Step-by-step optimization of CUDA SGEMM
CUDA Matrix Multiplication Optimization
A set of hands-on tutorials for CUDA programming
Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"
Build CUDA Neural Network From Scratch
Source code for TaiChi (A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU)