Stars: 10 results for starred repositories written in CUDA
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
An example project showing how to build a pip-installable Python package that invokes custom CUDA/C++ code
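The last entry describes the common pattern of packaging custom CUDA/C++ code as a pip-installable Python module. A minimal sketch of such a `setup.py` using PyTorch's extension helpers is below; the package name `my_cuda_ext` and the source file paths are hypothetical, not taken from that repository:

```python
# setup.py -- minimal sketch of a pip-installable CUDA/C++ extension.
# Assumes PyTorch is installed; module and file names here are hypothetical.
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

setup(
    name="my_cuda_ext",
    ext_modules=[
        CUDAExtension(
            name="my_cuda_ext",
            # C++ bindings plus the CUDA kernel source to compile with nvcc
            sources=["csrc/bindings.cpp", "csrc/kernel.cu"],
        )
    ],
    # BuildExtension handles mixed C++/CUDA compilation and ABI flags
    cmdclass={"build_ext": BuildExtension},
)
```

Running `pip install .` then compiles the `.cpp`/`.cu` sources and exposes the result as an importable Python module.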