Stars
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
A GPU implementation of Convolutional Neural Nets in C++
Code for KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
Unsupervised Learning of Video Representations using LSTMs
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups
Code for Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR2020)
zeyiwen / CUDA-GMM-MultiGPU
Forked from Corv/CUDA-GMM-MultiGPU
CUDA implementation of data clustering using expectation maximization with a Gaussian mixture model. Supports multiple GPUs on a single node.