Stars
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Introduction to Parallel Programming class code
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
A minimal cuDNN deep learning training code sample using LeNet.
Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
Implementation and analysis of five different GPU-based SpMV algorithms in CUDA
CUDA Dynamic Memory Allocator for SOA Data Layout
Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
This repository contains the source code for our ACM SIGMOD '22 paper (Evaluating Multi-GPU Sorting with Modern Interconnects)
Trying out the newly released `cudaLaunchCooperativeKernelMultiDevice()` in CUDA C++