
FlashInfer: Kernel Library for LLM Serving

Python · 5,307 stars · 866 forks · Updated Apr 6, 2026

NCCL communication API layer and transport layer, created from first principles.

C++ · 16 stars · Updated Aug 20, 2025

NCCL Tests

Cuda · 1,480 stars · 362 forks · Updated Mar 11, 2026

A Quirky Assortment of CuTe Kernels

Python · 897 stars · 104 forks · Updated Apr 5, 2026

Optimized primitives for collective multi-GPU communication

C++ · 4,589 stars · 1,195 forks · Updated Apr 4, 2026

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it

Python · 705 stars · 146 forks · Updated Apr 3, 2026

The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface

Jupyter Notebook · 145 stars · 13 forks · Updated Feb 11, 2026

TRaSH-Guides is a comprehensive collection of guides for Radarr, Sonarr, and related media management applications.

Markdown · 2,889 stars · 301 forks · Updated Apr 5, 2026

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,734 stars · 324 forks · Updated Oct 19, 2024

The official PyTorch implementation of the paper "Human Motion Diffusion Model"

Python · 3,942 stars · 450 forks · Updated Oct 1, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,534 stars · 1,770 forks · Updated Apr 2, 2026

RTX compute samples

C++ · 70 stars · 13 forks · Updated Jun 17, 2023

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,827 stars · 463 forks · Updated Oct 9, 2023