erichocean
  • Xy Group Ltd
  • North Carolina

Organizations

@fohr

15 stars written in Cuda

A massively parallel, optimal functional runtime in Rust

Cuda 11,178 426 Updated Nov 21, 2024

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,981 778 Updated Dec 8, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,308 606 Updated Dec 20, 2025

Tile primitives for speedy kernels

Cuda 3,008 217 Updated Dec 9, 2025

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,809 464 Updated Oct 9, 2023

CUDA Data Parallel Primitives Library

Cuda 437 96 Updated Nov 9, 2018

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda 411 26 Updated Dec 14, 2024

NVIDIA-accelerated zero latency video compression library for interactive remoting applications

Cuda 392 93 Updated Jun 3, 2020

A CUDNN minimal deep learning training code sample using LeNet.

Cuda 268 93 Updated Jul 30, 2023

High-Performance SGEMM on CUDA devices

Cuda 113 5 Updated Jan 21, 2025

SCI-Solver_FEM is a C++/CUDA library written to solve an FEM linear system. It is designed to solve the FEM system quickly by using GPU hardware.

Cuda 97 30 Updated Feb 22, 2019

OptiX version of Pete Shirley's "Ray Tracing in One Weekend" (Final Chapter example only)

Cuda 88 4 Updated Sep 20, 2021

WIP for a k-d-tree implementation in CUDA

Cuda 35 4 Updated Mar 22, 2023

Highly-optimized spatially and temporally-blocked implementation of Diffusion 2D and 3D stencils for Intel FPGAs using OpenCL

Cuda 13 2 Updated Dec 25, 2023

GPU model checker

Cuda 11 2 Updated Apr 17, 2019