Stars: 4 repositories written in CUDA
A massively parallel, optimal functional runtime in Rust
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Fast k nearest neighbor search using GPU
Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA