Starred repositories
Instant neural graphics primitives: lightning fast NeRF and more
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Source code that accompanies The CUDA Handbook.
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
Gallatin is a general-purpose memory manager for CUDA that lets threads quickly malloc and free memory of arbitrary size inside kernels.
Professional CUDA C Programming
This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at…