Starred repositories
10 results for source starred repositories written in Cuda
A massively parallel, optimal functional runtime in Rust
FlashInfer: Kernel Library for LLM Serving
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Efficient GPU kernels for block-sparse matrix multiplication and convolution
cuVS - a library for vector search and clustering on the GPU
High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.
PyTorch half-precision GEMM library with fused optional bias and optional ReLU/GELU