
CIH cih-y2k

40 followers · 35 following

Stars

17 stars written in Cuda


karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 28,121 stars · 3,283 forks · Updated Jun 26, 2025

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 17,052 stars · 2,025 forks · Updated Oct 8, 2025

HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Cuda · 11,151 stars · 427 forks · Updated Nov 21, 2024

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,713 stars · 983 forks · Updated Nov 6, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,380 stars · 829 forks · Updated Nov 6, 2025

baidu-research / warp-ctc

Fast parallel CTC.

Cuda · 4,073 stars · 1,037 forks · Updated Mar 4, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,045 stars · 562 forks · Updated Nov 10, 2025

nerfstudio-project / gsplat

CUDA-accelerated rasterization of Gaussian splatting

Cuda · 3,923 stars · 596 forks · Updated Oct 2, 2025

rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library

Cuda · 2,080 stars · 338 forks · Updated Nov 10, 2025

k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda · 1,279 stars · 231 forks · Updated Nov 4, 2025

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda · 967 stars · 99 forks · Updated Dec 30, 2024

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 935 stars · 139 forks · Updated Sep 2, 2025

gpufit / Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Cuda · 333 stars · 99 forks · Updated Nov 3, 2025

BlinkDL / RWKV-CUDA

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

Cuda · 224 stars · 35 forks · Updated Dec 14, 2024

jeetkanjani7 / Parallel_NMS

Parallel CUDA implementation of non-maximum suppression

Cuda · 80 stars · 19 forks · Updated Sep 19, 2020

hertasecurity / gpu-nms

This repository contains the CUDA implementation of the paper "Work-efficient Parallel Non-Maximum Suppression Kernels".

Cuda · 15 stars · 5 forks · Updated Aug 21, 2020

TenTrans / TenTrans-Decoding

TenTrans High-Performance Inference Toolkit

Cuda · 6 stars · 1 fork · Updated Mar 24, 2023
