cih-y2k

CIH cih-y2k

40 followers · 35 following

Stars

17 stars written in Cuda

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 28,090 stars · 3,266 forks · Updated Jun 26, 2025

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 17,036 stars · 2,018 forks · Updated Oct 8, 2025

HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Cuda · 11,149 stars · 428 forks · Updated Nov 21, 2024

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,696 stars · 973 forks · Updated Nov 6, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,332 stars · 826 forks · Updated Nov 6, 2025

baidu-research / warp-ctc

Fast parallel CTC.

Cuda · 4,073 stars · 1,036 forks · Updated Mar 4, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,021 stars · 558 forks · Updated Nov 6, 2025

nerfstudio-project / gsplat

CUDA-accelerated rasterization of Gaussian splatting

Cuda · 3,904 stars · 591 forks · Updated Oct 2, 2025

rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library

Cuda · 2,078 stars · 338 forks · Updated Nov 6, 2025

k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda · 1,277 stars · 230 forks · Updated Nov 4, 2025

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda · 961 stars · 97 forks · Updated Dec 30, 2024

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 930 stars · 138 forks · Updated Sep 2, 2025

gpufit / Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Cuda · 333 stars · 99 forks · Updated Nov 3, 2025

BlinkDL / RWKV-CUDA

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

Cuda · 223 stars · 34 forks · Updated Dec 14, 2024

jeetkanjani7 / Parallel_NMS

Parallel CUDA implementation of non-maximum suppression

Cuda · 80 stars · 19 forks · Updated Sep 19, 2020

hertasecurity / gpu-nms

CUDA implementation of the paper "Work-efficient Parallel Non-Maximum Suppression Kernels".

Cuda · 15 stars · 5 forks · Updated Aug 21, 2020

TenTrans / TenTrans-Decoding

TenTrans High-Performance Inference Toolkit

Cuda · 6 stars · 1 fork · Updated Mar 24, 2023
