Hopkins hopkins516

🏠

Working from home

17 followers · 318 following

Nanjing

Starred repositories

27 starred repositories written in Cuda (each entry lists language, stars, forks where shown, and last update)

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,255 2,050 Updated Feb 2, 2026

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,617 949 Updated Feb 5, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,967 1,090 Updated Feb 5, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,162 815 Updated Feb 3, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,124 236 Updated Feb 5, 2026

rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library

Cuda 2,120 344 Updated Feb 7, 2026

NVIDIA / nccl-tests

NCCL Tests

Cuda 1,426 346 Updated Feb 5, 2026

openai / blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda 1,063 198 Updated Jun 8, 2023

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 864 147 Updated Sep 26, 2025

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda 809 102 Updated Feb 5, 2026

NVIDIA / cuopt

GPU accelerated decision optimization

Cuda 697 122 Updated Feb 8, 2026

baidu-research / baidu-allreduce

Cuda 600 112 Updated Apr 6, 2018

ArchaeaSoftware / cudahandbook

Source code that accompanies The CUDA Handbook.

Cuda 566 197 Updated Oct 7, 2025

Yinghan-Li / YHs_Sample

Yinghan's Code Sample

Cuda 365 62 Updated Jul 25, 2022

pyscf / gpu4pyscf

A plugin to use Nvidia GPU in PySCF package

Cuda 267 51 Updated Feb 6, 2026

KuangjuX / NVSHMEM-Tutorial

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 162 14 Updated Sep 18, 2025

sith-lab / gpuhammer

Cuda 77 12 Updated Aug 29, 2025

uuudown / Tartan

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite

Cuda 68 15 Updated Sep 12, 2018

mapengfei-nwpu / ProfessionalCUDACProgramming

Professional CUDA C Programming

Cuda 30 8 Updated Jul 13, 2020

saltsystemslab / gallatin

Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside of kernels.

Cuda 25 4 Updated Feb 4, 2026

NVIDIA / HMM_sample_code

CUDA 12.2 HMM demos

Cuda 20 8 Updated Jul 26, 2024

SecureArch / gpu_mem_attack

Cuda 19 4 Updated Oct 24, 2024

carsonpo / quadmul

a fast and customizable CUDA int4 tensor core gemm

Cuda 15 3 Updated Aug 2, 2024

HydraQYH / cutlass_cute_experiments

Cuda 9 4 Updated Jun 18, 2024

rai-project / cupti_samples

CUPTI samples from NVIDIA

Cuda 6 2 Updated Jun 19, 2019

Awrsha / Advanced-CUDA-Programming-GPU-Architecture

This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at…

Cuda 5 1 Updated Nov 13, 2024

HydraQYH / GPU-Cache-Operator

Cuda 3 Updated Nov 4, 2024

Starred topics

cpp-projects

learn-cpp

Virtual reality

Unreal Engine

Unity

Ubuntu

Terminal

Operating system

OpenGL

MongoDB