Starred repositories

18 results for source starred repositories written in Cuda
LLM training in simple, raw C/CUDA

Cuda 28,418 3,333 Updated Jun 26, 2025

A massively parallel, optimal functional runtime in Rust

Cuda 11,177 426 Updated Nov 21, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,971 877 Updated Dec 4, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,814 1,032 Updated Dec 5, 2025

This package contains the original 2012 AlexNet code.

Cuda 2,791 360 Updated Mar 12, 2025

Learn CUDA Programming, published by Packt

Cuda 1,217 262 Updated Dec 30, 2023

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,206 177 Updated Jul 29, 2023

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 963 220 Updated Dec 18, 2025

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 921 337 Updated Aug 19, 2024

Source code that accompanies The CUDA Handbook.

Cuda 558 197 Updated Oct 7, 2025

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

Cuda 535 44 Updated Sep 8, 2025

CUDA kernel author's tools

Cuda 115 8 Updated Apr 24, 2022

High-Performance SGEMM on CUDA devices

Cuda 113 5 Updated Jan 21, 2025

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 104 6 Updated Jun 28, 2025

A curated set of C++ examples for optimization-based elastodynamic contact simulation using CUDA, emphasizing algorithmic convergence, penetration-free, and inversion-free conditions. Designed for …

Cuda 104 6 Updated Jun 29, 2025

Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"

Cuda 94 32 Updated Aug 14, 2023

Neural network from scratch in CUDA/C++

Cuda 87 19 Updated Sep 8, 2025