Gui-Yom 🦀

Organizations

@chapi-com @Mercuri-Inc

Starred repositories

11 stars written in Cuda

A massively parallel, optimal functional runtime in Rust

Cuda · 11,178 stars · 427 forks · Updated Nov 21, 2024

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,984 stars · 777 forks · Updated Dec 8, 2025

Tile primitives for speedy kernels

Cuda · 3,008 stars · 217 forks · Updated Dec 9, 2025

CUDA Kernel Benchmarking Library

Cuda · 778 stars · 97 forks · Updated Dec 10, 2025

Fastest kernels written from scratch

Cuda · 500 stars · 62 forks · Updated Sep 18, 2025

State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda · 411 stars · 26 forks · Updated Dec 14, 2024

Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…

Cuda · 240 stars · 14 forks · Updated Dec 18, 2025

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 104 stars · 6 forks · Updated Jun 28, 2025

Cuda · 41 stars · 13 forks · Updated May 21, 2021

FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units (PPoPP'25)

Cuda · 6 stars · 1 fork · Updated Jan 9, 2025
Cuda 1 Updated Jul 30, 2023