Stars
4 results for source starred repositories written in Cuda
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Optimized parallel tiled approach to 2D convolution that takes advantage of lower-latency, higher-bandwidth shared memory as well as global constant memory cached aggressively within GPU …
CUDA-based implementation of image convolution - normal, tiled and cuDNN versions.