16 stars written in Cuda

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 17,133 stars · 2,037 forks · Updated Dec 2, 2025

A massively parallel, optimal functional runtime in Rust

Cuda · 11,175 stars · 425 forks · Updated Nov 21, 2024

Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189

Cuda · 6,058 stars · 615 forks · Updated Aug 2, 2021

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,808 stars · 464 forks · Updated Oct 9, 2023

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda · 1,293 stars · 232 forks · Updated Nov 19, 2025

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda · 1,061 stars · 198 forks · Updated Jun 8, 2023

Fast CUDA matrix multiplication from scratch

Cuda · 977 stars · 146 forks · Updated Sep 2, 2025

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda · 408 stars · 25 forks · Updated Dec 14, 2024

CUDA-accelerated Fully Homomorphic Encryption Library

Cuda · 236 stars · 61 forks · Updated Jul 7, 2021

A fast and highly scalable GPU dynamic memory allocator

Cuda · 110 stars · 9 forks · Updated Mar 11, 2015

A GPU algorithm for sparse matrix-matrix multiplication

Cuda · 73 stars · 16 forks · Updated Oct 1, 2020

CUDA implementation of parallel radix sort using Blelloch scan

Cuda · 66 stars · 16 forks · Updated Feb 29, 2024

Library of common noise functions for CUDA kernels

Cuda · 41 stars · 7 forks · Updated Aug 17, 2025

Efficient CUDA Stream Compaction Library

Cuda · 35 stars · 6 forks · Updated Jun 9, 2023

A simple library-less CUDA implementation of the OneSweep sorting algorithm.

Cuda · 11 stars · Updated Feb 26, 2024

Cuda · 4 stars · Updated Mar 24, 2025