Starred repositories written in CUDA
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
Additional utilities and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference