Stars
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…
NVIDIA curated collection of educational resources related to general purpose GPU programming.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Godot Engine – Multi-platform 2D and 3D game engine
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Samples for CUDA developers demonstrating features in the CUDA Toolkit
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
High-performance C++ tensor library with NumPy/PyTorch-like API, SIMD vectorization, BLAS acceleration, and Metal GPU support.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Optimized primitives for collective multi-GPU communication
Claude Code for CUDA. A free AI assistant that actually understands GPU architecture
A tool for bandwidth measurements on NVIDIA GPUs.
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
SGLang is a high-performance serving framework for large language models and multimodal models.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.