LLM Inference, AI Infra, CUDA
- Tsinghua University
- https://www.zhihu.com/people/mu-zi-zhi-6-28
- https://bruce-lee-ly.medium.com
Stars
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
DeepGEMM: Clean and efficient FP8 GEMM kernels with fine-grained scaling.
FlashMLA: Efficient Multi-head Latent Attention kernels.
CUTLASS: CUDA templates and Python DSLs for high-performance linear algebra.
SGLang: A high-performance serving framework for large language models and multimodal models.
Triton: Development repository for the Triton language and compiler.
open-gpu-kernel-modules: NVIDIA's Linux open GPU kernel module source.
FlashAttention: Fast and memory-efficient exact attention.
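For context on the last entry: FlashAttention computes the same exact softmax attention as a naive implementation, just without materializing the full score matrix. A minimal NumPy reference of that naive baseline (a sketch for comparison, not FlashAttention's tiled kernel; the function name and shapes are illustrative):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference softmax attention: O = softmax(Q K^T / sqrt(d)) V.

    Materializes the full (n, n) score matrix, which is exactly the
    O(n^2) memory cost that FlashAttention's tiled kernels avoid.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (n, n) attention scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # (n, d) output

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)
```

FlashAttention reproduces this result by processing K/V in blocks and rescaling the running softmax, so peak memory stays linear in sequence length.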