

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,200 116 Updated Mar 19, 2026

My study notes and hands-on projects for CUDA-based GPU programming

Cuda 11 Updated Dec 11, 2025

NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

Python 230 18 Updated Jun 10, 2024

CUDA Python: Performance meets Productivity

Cython 3,216 270 Updated Apr 14, 2026

Finetuning BLOOM on a single GPU using gradient-accumulation

Python 31 4 Updated Mar 29, 2023

This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post.

Python 92 11 Updated Jul 14, 2023

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

Jupyter Notebook 1,181 260 Updated Jun 9, 2025

A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…

Shell 902 107 Updated Mar 29, 2025
Jupyter Notebook 5 3 Updated Dec 31, 2025

Some CUDA example code with READMEs.

Cuda 181 27 Updated Nov 11, 2025

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Python 1,143 149 Updated Mar 12, 2026

RAPIDS Deployment Documentation

Jupyter Notebook 15 33 Updated Apr 9, 2026

cuML - RAPIDS Machine Learning Library

C++ 5,174 621 Updated Apr 13, 2026

High-Performance FP32 GEMM on CUDA devices

Cuda 119 9 Updated Jan 21, 2025

A collection of benchmarks to measure basic GPU capabilities

C++ 513 82 Updated Oct 24, 2025
Cuda 469 83 Updated Dec 18, 2025

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,466 256 Updated Apr 14, 2026

Accessing all private registers of a warp from main thread of warp.

Cuda 3 Updated Sep 30, 2024

Vietnamese CLIP using PhoBERT

Jupyter Notebook 5 2 Updated May 21, 2024

A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

HTML 26,150 5,524 Updated Apr 8, 2026
Python 93 8 Updated Nov 11, 2025

Apply GPU in ML and DL

Jupyter Notebook 67 6 Updated Mar 23, 2026

NVIDIA Linux open GPU kernel module source

C 16,892 1,658 Updated Apr 3, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,569 1,784 Updated Apr 9, 2026
2 Updated Jun 25, 2024

My first pygame

HTML 2 Updated Jun 14, 2024
Python 2 Updated Jun 23, 2024

A CUDA tutorial for learning CUDA programming from scratch

Cuda 276 67 Updated Jul 9, 2024