andrewssobral
Starred repositories

24 results for starred source repositories written in CUDA

LLM training in simple, raw C/CUDA

Cuda 28,075 3,264 Updated Jun 26, 2025

Fast parallel CTC.

Cuda 4,073 1,036 Updated Mar 4, 2024

Squeeze-and-Excitation Networks

Cuda 3,580 851 Updated Feb 25, 2019

This package contains the original 2012 AlexNet code.

Cuda 2,762 356 Updated Mar 12, 2025

How to optimize some algorithms in CUDA.

Cuda 2,596 235 Updated Oct 30, 2025

Sample code for my CUDA programming book

Cuda 1,922 375 Updated Feb 15, 2025

MatConvNet: CNNs for MATLAB

Cuda 1,431 748 Updated Dec 21, 2021

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 950 217 Updated Nov 5, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 823 143 Updated Sep 26, 2025

Automatically exported from code.google.com/p/cuda-convnet2

Cuda 812 294 Updated Dec 3, 2015

UNet diffusion model in pure CUDA

Cuda 651 31 Updated Jun 28, 2024

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda 635 133 Updated Oct 20, 2025

A GPU implementation of Convolutional Neural Nets in C++

Cuda 504 229 Updated Oct 1, 2020

CUDA Learning guide

Cuda 466 51 Updated Jun 20, 2024

Unsupervised Learning of Video Representations using LSTMs

Cuda 362 112 Updated Mar 6, 2018

llama3.cuda is a pure C/CUDA implementation for Llama 3 model.

Cuda 344 25 Updated Apr 27, 2025

Alex Krizhevsky's original code from Google Code

Cuda 198 32 Updated Mar 10, 2016

CUDA Matrix Factorization Library with Alternating Least Square (ALS)

Cuda 180 46 Updated Aug 14, 2018

Some CUDA example code with READMEs.

Cuda 176 26 Updated Mar 2, 2025

Wang Yi's GPT solution

Cuda 142 7 Updated Dec 17, 2023

Cuda 87 17 Updated Jun 9, 2025

Deep Object Co-Segmentation

Cuda 56 14 Updated Aug 13, 2020

CUDA extension for the SPORCO project

Cuda 18 6 Updated Jul 5, 2021

This project optimizes multi-GPU parallelism for machine learning training by accelerating multi-GPU using fused gradient buffers, NCCL AllReduce, and CUDA C kernel-level optimizations including me…

Cuda 8 Updated May 13, 2025