14 stars written in Cuda

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,374 2,055 Updated Feb 2, 2026

A massively parallel, optimal functional runtime in Rust

Cuda 11,237 436 Updated Nov 21, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,829 1,093 Updated Apr 20, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,591 1,217 Updated Apr 29, 2026

Fast parallel CTC.

Cuda 4,073 1,033 Updated Mar 4, 2024

cuGraph - RAPIDS Graph Analytics Library

Cuda 2,167 350 Updated Apr 30, 2026

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda 1,332 233 Updated Apr 23, 2026

Fast CUDA matrix multiplication from scratch

Cuda 1,159 178 Updated Sep 2, 2025

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Cuda 1,107 179 Updated Apr 30, 2026

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Cuda 339 102 Updated Mar 12, 2026

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

Cuda 231 35 Updated Dec 10, 2025

Parallel CUDA implementation of non-maximum suppression

Cuda 81 19 Updated Sep 19, 2020

This repository contains the CUDA implementation of the paper "Work-efficient Parallel Non-Maximum Suppression Kernels".

Cuda 15 5 Updated Aug 21, 2020

TenTrans High-Performance Inference Toolkit

Cuda 6 1 Updated Mar 24, 2023