honggui

Honggui honggui

47 followers · 37 following

Shanghai

Starred repositories

34 stars written in Cuda

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 29,285 stars · 3,454 forks · Updated Jun 26, 2025

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 17,343 stars · 2,061 forks · Updated Feb 2, 2026

leoxiaobin / deep-high-resolution-net.pytorch

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Cuda · 4,469 stars · 926 forks · Updated Aug 30, 2024

baidu-research / warp-ctc

Fast parallel CTC.

Cuda · 4,074 stars · 1,033 forks · Updated Mar 4, 2024

hujie-frank / SENet

Squeeze-and-Excitation Networks

Cuda · 3,616 stars · 852 forks · Updated Feb 25, 2019

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 3,280 stars · 268 forks · Updated Mar 28, 2026

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda · 2,893 stars · 266 forks · Updated Mar 24, 2026

rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library

Cuda · 2,151 stars · 347 forks · Updated Mar 26, 2026

brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book

Cuda · 2,028 stars · 384 forks · Updated Dec 14, 2025

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,825 stars · 464 forks · Updated Oct 9, 2023

msracver / FCIS

Fully Convolutional Instance-aware Semantic Segmentation

Cuda · 1,565 stars · 408 forks · Updated Sep 27, 2021

NVIDIA / nccl-tests

NCCL Tests

Cuda · 1,471 stars · 360 forks · Updated Mar 11, 2026

mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda · 1,450 stars · 187 forks · Updated Feb 24, 2025

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 878 stars · 149 forks · Updated Sep 26, 2025

NVIDIA / nv-wavenet

Reference implementation of real-time autoregressive wavenet inference

Cuda · 747 stars · 125 forks · Updated Jan 19, 2021

jzbontar / mc-cnn

Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

Cuda · 725 stars · 227 forks · Updated Jan 25, 2018

ArchaeaSoftware / cudahandbook

Source code that accompanies The CUDA Handbook.

Cuda · 570 stars · 198 forks · Updated Mar 10, 2026

JimmySuen / integral-human-pose

Integral Human Pose Regression

Cuda · 486 stars · 76 forks · Updated Apr 4, 2019

facebookresearch / music-translation

A UNIVERSAL MUSIC TRANSLATION NETWORK - a method for translating music across musical instruments and styles.

Cuda · 465 stars · 71 forks · Updated Aug 15, 2021

NVIDIA / NvPipe

NVIDIA-accelerated zero latency video compression library for interactive remoting applications

Cuda · 394 stars · 93 forks · Updated Jun 3, 2020

facebookresearch / dietgpu

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

Cuda · 380 stars · 33 forks · Updated Mar 18, 2026

torch / cutorch

A CUDA backend for Torch7

Cuda · 339 stars · 207 forks · Updated Sep 11, 2017

gpufit / Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Cuda · 337 stars · 102 forks · Updated Mar 12, 2026

facebookarchive / fbcuda

Facebook's CUDA extensions.

Cuda · 284 stars · 57 forks · Updated Mar 27, 2019

usyd-fsalab / fp6_llm

An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).

Cuda · 276 stars · 23 forks · Updated Jul 16, 2025

NVlabs / CGBN

CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups

Cuda · 237 stars · 75 forks · Updated Feb 27, 2025

torch / cunn

Cuda · 214 stars · 173 forks · Updated Aug 27, 2019

xuqiantong / CUDA-Winograd

Fast CUDA Kernels for ResNet Inference.

Cuda · 182 stars · 47 forks · Updated May 26, 2019

dendenxu / diff-gaussian-rasterization

Improved 3DGS rasterizer.

Cuda · 128 stars · 5 forks · Updated Feb 26, 2025

spectral-compute / scale-examples

Cuda · 67 stars · 2 forks · Updated Jul 10, 2024

Starred topics

Linux