- Ho Chi Minh, Viet Nam
- https://viblo.asia/u/Giahuy
- https://medium.com/@giahuy04
- in/cismine
Stars
cfregly / cub
Forked from NVIDIA/cub
CUB is a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming.
My study notes and hands-on projects for CUDA-based GPU programming
NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.
CUDA Python: Performance meets Productivity
Finetuning BLOOM on a single GPU using gradient-accumulation
This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post.
A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Collection of benchmarks to measure basic GPU capabilities
NVIDIA curated collection of educational resources related to general purpose GPU programming.
Accessing all private registers of a warp from main thread of warp.
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
NVIDIA Linux open GPU kernel module source
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A CUDA tutorial for learning CUDA programming from scratch