Stars: 10 repositories written in CUDA
DeepEP: an efficient expert-parallel communication library
FlashInfer: Kernel Library for LLM Serving
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser