crcrpar (NVIDIA, Tokyo)

18 stars written in Cuda

Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189

Cuda · 6,051 stars · 613 forks · Updated Aug 2, 2021

Fast parallel CTC.

Cuda · 4,073 stars · 1,033 forks · Updated Mar 4, 2024

Squeeze-and-Excitation Networks

Cuda · 3,629 stars · 850 forks · Updated Feb 25, 2019

CUDA Library Samples

Cuda · 2,384 stars · 457 forks · Updated Apr 20, 2026

Fully Convolutional Instance-aware Semantic Segmentation

Cuda · 1,564 stars · 407 forks · Updated Sep 27, 2021

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda · 1,065 stars · 198 forks · Updated Jun 8, 2023

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda · 1,002 stars · 231 forks · Updated Apr 30, 2026

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda · 977 stars · 356 forks · Updated Aug 19, 2024

CUDA Kernel Benchmarking Library

Cuda · 858 stars · 105 forks · Updated Apr 30, 2026

Reference implementation of real-time autoregressive wavenet inference

Cuda · 745 stars · 125 forks · Updated Jan 19, 2021

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda · 280 stars · 24 forks · Updated Jul 16, 2025

Parrot is an array-fusion GPU library built on NVIDIA's CCCL libraries (Thrust/CUB).

Cuda · 272 stars · 18 forks · Updated Apr 23, 2026

PyTorch bindings for CUTLASS grouped GEMM.

Cuda · 186 stars · 50 forks · Updated Apr 8, 2026

WholeGraph: large-scale Graph Neural Networks

Cuda · 106 stars · 37 forks · Updated Nov 25, 2024

Optimized parallel tiled approach to matrix multiplication that takes advantage of the lower-latency, higher-bandwidth shared memory within GPU thread blocks.

Cuda · 16 stars · 1 fork · Updated Sep 24, 2017
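The tiled approach this entry describes can be sketched as a minimal CUDA kernel. This is an illustrative sketch of the general technique, not code from the repository; the kernel name and tile size are assumptions.

```cuda
#define TILE 16  // illustrative tile width, not taken from the repository

// C = A * B for square N x N row-major matrices.
// Each thread block computes one TILE x TILE tile of C, staging the
// matching tiles of A and B in low-latency on-chip shared memory.
__global__ void tiledMatMul(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // C row this thread owns
    int col = blockIdx.x * TILE + threadIdx.x;  // C column this thread owns
    float acc = 0.0f;

    // Slide the tile window across the shared (inner) dimension.
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Cooperative loads, zero-padded at the matrix edges.
        As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // both tiles fully staged before any thread reads

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // all reads done before the next tile overwrites
    }
    if (row < N && col < N)
        C[row * N + col] = acc;
}
```

A host would launch this with a `dim3((N + TILE - 1) / TILE, (N + TILE - 1) / TILE)` grid and a `dim3(TILE, TILE)` block. Each element of A and B is fetched from global memory once per tile pass instead of once per multiply-accumulate, cutting global-memory traffic by roughly a factor of TILE.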