jiangsy

Shengyi Jiang (jiangsy)

86 followers · 466 following

Stars

12 stars written in Cuda

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 17,374 stars · 2,055 forks · Updated Feb 2, 2026

HigherOrderCO / HVM2

A massively parallel, optimal functional runtime in Rust

Cuda · 11,237 stars · 436 forks · Updated Nov 21, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 3,331 stars · 276 forks · Updated Apr 29, 2026

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,235 stars · 201 forks · Updated Apr 30, 2026

PacktPublishing / Learn-CUDA-Programming

Learn CUDA Programming, published by Packt

Cuda · 1,243 stars · 260 forks · Updated Dec 30, 2023

alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Cuda · 1,107 stars · 179 forks · Updated Apr 30, 2026

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 883 stars · 149 forks · Updated Sep 26, 2025

vincentfpgarcia / kNN-CUDA

Fast k nearest neighbor search using GPU

Cuda · 546 stars · 111 forks · Updated Aug 6, 2018

yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda · 415 stars · 52 forks · Updated Jan 2, 2025

vchoutas / torch-mesh-isect

Cuda · 345 stars · 86 forks · Updated Oct 5, 2022

KemengHuang / GPU_IPC

This is the first fully GPU Optimized IPC framework

Cuda · 135 stars · 18 forks · Updated Mar 20, 2026

smoorjani / matrix-multiplication

Custom SpMM operations integrated into PyTorch

Cuda · 11 stars · Updated Apr 15, 2022