FlashInfer: Kernel Library for LLM Serving
Updated Oct 9, 2025 (CUDA)
cuGraph - RAPIDS Graph Analytics Library
GPU Accelerated t-SNE for CUDA with Python bindings
RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for writing high-performance applications more easily.
CUDA Kernel Benchmarking Library
Graphics Processing Units Molecular Dynamics
cuVS - a library for vector search and clustering on the GPU
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
PopSift is an implementation of the SIFT algorithm in CUDA.
GPU-accelerated decision optimization
A simple GPU hash table implemented in CUDA using lock-free techniques
SDK for GPU-accelerated genome assembly and analysis
GPU-accelerated triangle mesh processing