rsohlot
🎯 Focusing


Starred repositories

10 stars written in Cuda

LLM training in simple, raw C/CUDA

Cuda · 28,414 stars · 3,330 forks · Updated Jun 26, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,811 stars · 1,029 forks · Updated Dec 5, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,966 stars · 776 forks · Updated Dec 8, 2025

CUDA-accelerated rasterization of Gaussian splatting

Cuda · 4,150 stars · 639 forks · Updated Nov 18, 2025

Tile primitives for speedy kernels

Cuda · 3,002 stars · 216 forks · Updated Dec 9, 2025

This package contains the original 2012 AlexNet code.

Cuda · 2,790 stars · 360 forks · Updated Mar 12, 2025

UNet diffusion model in pure CUDA

Cuda · 656 stars · 31 forks · Updated Jun 28, 2024

Reference implementation of Megalodon 7B model

Cuda · 527 stars · 54 forks · Updated May 17, 2025

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda · 229 stars · 22 forks · Updated Sep 24, 2023

Some CUDA example code with READMEs.

Cuda · 179 stars · 27 forks · Updated Nov 11, 2025