Stars
FlashMLA: Efficient Multi-head Latent Attention Kernels
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Backup of course resources from the School of Computer Science, University of Science and Technology of China (USTC); for the latest version, see --->
Stanford computer networking labs: an elegant TCP/IP implementation
Optimized FP16/BF16 x FP4 GPU kernels for AMD GPUs