View CisMine's full-sized avatar


CUB is a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming.

Cuda 1 Updated Mar 21, 2016

My study notes and hands-on projects for CUDA-based GPU programming

Cuda 9 Updated Dec 11, 2025

NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

Python 223 18 Updated Jun 10, 2024

CUDA Python: Performance meets Productivity

Cython 3,096 233 Updated Dec 19, 2025

Finetuning BLOOM on a single GPU using gradient-accumulation

Python 31 4 Updated Mar 29, 2023

This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post.

Python 92 11 Updated Jul 14, 2023

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

Jupyter Notebook 1,106 249 Updated Jun 9, 2025

A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…

Shell 821 94 Updated Mar 29, 2025
Jupyter Notebook 5 3 Updated Dec 2, 2024

Some CUDA example code with READMEs.

Cuda 179 27 Updated Nov 11, 2025

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Python 1,135 148 Updated Oct 23, 2025

RAPIDS Deployment Documentation

Jupyter Notebook 14 31 Updated Dec 19, 2025

cuML - RAPIDS Machine Learning Library

C++ 5,062 609 Updated Dec 19, 2025

High-Performance SGEMM on CUDA devices

Cuda 113 5 Updated Jan 21, 2025

A collection of benchmarks to measure basic GPU capabilities

C++ 475 72 Updated Oct 24, 2025
Cuda 417 74 Updated Dec 18, 2025

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,001 178 Updated Dec 12, 2025

Accessing all private registers of a warp from main thread of warp.

Cuda 2 Updated Sep 30, 2024

Vietnamese CLIP using PhoBERT

Jupyter Notebook 5 1 Updated May 21, 2024

A one-stop repository for generative AI research updates, interview resources, notebooks and much more!

22,395 4,834 Updated Nov 17, 2025
Python 86 8 Updated Nov 11, 2025

Apply GPU in ML and DL

Jupyter Notebook 55 5 Updated Sep 18, 2025

NVIDIA Linux open GPU kernel module source

C 16,484 1,544 Updated Dec 18, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,984 1,587 Updated Dec 19, 2025
2 Updated Jun 25, 2024

My first pygame

HTML 2 Updated Jun 14, 2024
Python 2 Updated Jun 23, 2024

A CUDA tutorial for learning CUDA programming from scratch

Cuda 262 65 Updated Jul 9, 2024