Starred repositories

34 stars written in Cuda

LLM training in simple, raw C/CUDA

Cuda 29,285 3,454 Updated Jun 26, 2025

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,343 2,061 Updated Feb 2, 2026

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Cuda 4,469 926 Updated Aug 30, 2024

Fast parallel CTC.

Cuda 4,074 1,033 Updated Mar 4, 2024

Squeeze-and-Excitation Networks

Cuda 3,616 852 Updated Feb 25, 2019

Tile primitives for speedy kernels

Cuda 3,280 268 Updated Mar 28, 2026

How to optimize some algorithms in CUDA.

Cuda 2,893 266 Updated Mar 24, 2026

cuGraph - RAPIDS Graph Analytics Library

Cuda 2,151 347 Updated Mar 26, 2026

Sample codes for my CUDA programming book

Cuda 2,028 384 Updated Dec 14, 2025

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,825 464 Updated Oct 9, 2023

Fully Convolutional Instance-aware Semantic Segmentation

Cuda 1,565 408 Updated Sep 27, 2021

NCCL Tests

Cuda 1,471 360 Updated Mar 11, 2026

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,450 187 Updated Feb 24, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 878 149 Updated Sep 26, 2025

Reference implementation of real-time autoregressive wavenet inference

Cuda 747 125 Updated Jan 19, 2021

Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

Cuda 725 227 Updated Jan 25, 2018

Source code that accompanies The CUDA Handbook.

Cuda 570 198 Updated Mar 10, 2026

Integral Human Pose Regression

Cuda 486 76 Updated Apr 4, 2019

A UNIVERSAL MUSIC TRANSLATION NETWORK - a method for translating music across musical instruments and styles.

Cuda 465 71 Updated Aug 15, 2021

NVIDIA-accelerated zero latency video compression library for interactive remoting applications

Cuda 394 93 Updated Jun 3, 2020

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

Cuda 380 33 Updated Mar 18, 2026

A CUDA backend for Torch7

Cuda 339 207 Updated Sep 11, 2017

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Cuda 337 102 Updated Mar 12, 2026

Facebook's CUDA extensions.

Cuda 284 57 Updated Mar 27, 2019

Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).

Cuda 276 23 Updated Jul 16, 2025

CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups

Cuda 237 75 Updated Feb 27, 2025

Cuda 214 173 Updated Aug 27, 2019

Fast CUDA Kernels for ResNet Inference.

Cuda 182 47 Updated May 26, 2019

Improved 3DGS rasterizer.

Cuda 128 5 Updated Feb 26, 2025