erichocean
  • Xy Group Ltd
  • North Carolina

Organizations

@fohr

15 results for source starred repositories written in Cuda

A massively parallel, optimal functional runtime in Rust

Cuda 11,238 436 Updated Nov 21, 2024

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,149 955 Updated Apr 24, 2026

Tile primitives for speedy kernels

Cuda 3,331 276 Updated Apr 29, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,235 201 Updated Apr 30, 2026

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda 454 30 Updated Dec 14, 2024

CUDA Data Parallel Primitives Library

Cuda 438 97 Updated Nov 9, 2018

NVIDIA-accelerated zero latency video compression library for interactive remoting applications

Cuda 394 93 Updated Jun 3, 2020

A minimal cuDNN deep learning training code sample using LeNet.

Cuda 268 93 Updated Jul 30, 2023

High-Performance FP32 GEMM on CUDA devices

Cuda 122 9 Updated Jan 21, 2025

SCI-Solver_FEM is a C++/CUDA library written to solve an FEM linear system. It is designed to solve the FEM system quickly by using GPU hardware.

Cuda 97 30 Updated Feb 22, 2019

OptiX version of Pete Shirley's "Ray Tracing in One Weekend" (Final Chapter example only)

Cuda 90 4 Updated Sep 20, 2021

High-Performance GPU Cuckoo Filter

Cuda 38 1 Updated Apr 29, 2026

WIP for a k-d-tree implementation in CUDA

Cuda 35 4 Updated Mar 22, 2023

Highly-optimized spatially and temporally-blocked implementation of Diffusion 2D and 3D stencils for Intel FPGAs using OpenCL

Cuda 13 2 Updated Dec 25, 2023

GPU model checker

Cuda 13 2 Updated Apr 17, 2019