hopkins516

🏠

Working from home

Hopkins hopkins516

16 followers · 286 following

Nanjing

Starred repositories

24 stars written in Cuda

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,150 2,039 Updated Dec 14, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,990 877 Updated Dec 4, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,818 1,033 Updated Dec 5, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,985 777 Updated Dec 8, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 4,312 607 Updated Dec 20, 2025

rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library

Cuda 2,091 342 Updated Dec 19, 2025

NVIDIA / nccl-tests

NCCL Tests

Cuda 1,376 337 Updated Nov 21, 2025

openai / blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda 1,063 198 Updated Jun 8, 2023

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 843 145 Updated Sep 26, 2025

NVIDIA / cuopt

GPU accelerated decision optimization

Cuda 627 104 Updated Dec 19, 2025

baidu-research / baidu-allreduce

Cuda 600 112 Updated Apr 6, 2018

ArchaeaSoftware / cudahandbook

Source code that accompanies The CUDA Handbook.

Cuda 558 197 Updated Oct 7, 2025

Yinghan-Li / YHs_Sample

Yinghan's Code Sample

Cuda 360 61 Updated Jul 25, 2022

pyscf / gpu4pyscf

A plugin to use Nvidia GPU in PySCF package

Cuda 249 48 Updated Dec 20, 2025

KuangjuX / NVSHMEM-Tutorial

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 147 13 Updated Sep 18, 2025

sith-lab / gpuhammer

Cuda 73 9 Updated Aug 29, 2025

uuudown / Tartan

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite

Cuda 68 15 Updated Sep 12, 2018

saltsystemslab / gallatin

Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside of kernels.

Cuda 25 4 Updated Sep 19, 2025

mapengfei-nwpu / ProfessionalCUDACProgramming

Professional CUDA C Programming

Cuda 23 7 Updated Jul 13, 2020

NVIDIA / HMM_sample_code

CUDA 12.2 HMM demos

Cuda 20 8 Updated Jul 26, 2024

SecureArch / gpu_mem_attack

Cuda 19 4 Updated Oct 24, 2024

carsonpo / quadmul

a fast and customizable CUDA int4 tensor core gemm

Cuda 14 2 Updated Aug 2, 2024

rai-project / cupti_samples

CUPTI samples from NVIDIA

Cuda 6 2 Updated Jun 19, 2019

Awrsha / Advanced-CUDA-Programming-GPU-Architecture

This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at…

Cuda 4 1 Updated Nov 13, 2024

Starred topics

cpp-projects

learn-cpp

Virtual reality

Unreal Engine

Unity

Ubuntu

Terminal

Operating system

OpenGL

MongoDB