nzw0301

Organizations

@apache @optuna

9 stars written in Cuda

Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189

Cuda · 6,053 stars · 613 forks · Updated Aug 2, 2021

GPU Accelerated t-SNE for CUDA with Python bindings

Cuda · 1,926 stars · 137 forks · Updated Oct 2, 2024

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda · 994 stars · 228 forks · Updated Apr 10, 2026

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 825 stars · 171 forks · Updated Mar 10, 2026

Reference implementation of real-time autoregressive wavenet inference

Cuda · 745 stars · 125 forks · Updated Jan 19, 2021

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda · 337 stars · 31 forks · Updated Jul 2, 2024

Facebook's CUDA extensions.

Cuda · 284 stars · 57 forks · Updated Mar 27, 2019

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda · 239 stars · 22 forks · Updated Sep 24, 2023

Efficient reservoir sampling implementation for PyTorch

Cuda · 106 stars · 5 forks · Updated Sep 28, 2021