Starred repositories
12 stars written in Cuda
- DeepEP: an efficient expert-parallel communication library
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
- FlashInfer: a kernel library for LLM serving
- [ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models
- A static, suckless, single-batch, CUDA-only qwen3-0.6B mini inference engine
- Reference implementation of the Megalodon 7B model
- A flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
- llama3.cuda: a pure C/CUDA implementation of the Llama 3 model