Stars: 7 repositories written in CUDA
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
How to optimize some algorithms in CUDA.
Distribution-Aware Coordinate Representation for Human Pose Estimation
InternLM / AdaptiveGEMM
Forked from deepseek-ai/DeepGEMM
AdaptiveGEMM: FP8 GEMM with Adaptation to Various Lengths of Group M
InternLM / GroupedGEMM
Forked from fanshiqing/grouped_gemm
PyTorch bindings for CUTLASS and cuBLAS Grouped GEMM, Permute and Unpermute.