🎯
Focusing
Interested in AI for systems and efficient LLM training and serving!
Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
- Hong Kong
- https://jf-d.github.io/
Stars: 8 results for source starred repositories written in Cuda
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
A series on GPU optimization topics, explaining in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including: elementwise, reduce, s…
Step-by-step optimization of CUDA SGEMM
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity