Code for Sparse Matrix and Vector multiplication. Parallelised using CUDA and MPI
Updated May 8, 2017 - Cuda
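One common way to split sparse matrix-vector multiplication across MPI ranks is to store the matrix in CSR (compressed sparse row) format and give each rank a contiguous block of rows. Inside a rank, each row is then independent work for a GPU thread or thread group. The following is a minimal pure-Python sketch under those assumptions, not the repository's code. The function names are hypothetical, and the "ranks" are simulated by a loop rather than real MPI processes or CUDA threads.

```python
def csr_spmv_rows(indptr, indices, data, x, row_start, row_end):
    """Compute y[i] = sum_j A[i, j] * x[j] for rows row_start..row_end-1 of a CSR matrix."""
    y = []
    for i in range(row_start, row_end):
        acc = 0.0
        # The nonzeros of row i are data[indptr[i]:indptr[i + 1]], with column ids in indices.
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


def row_blocks(n_rows, n_ranks):
    """Split rows into contiguous, nearly equal blocks, one per (simulated) rank."""
    base, extra = divmod(n_rows, n_ranks)
    blocks, start = [], 0
    for r in range(n_ranks):
        end = start + base + (1 if r < extra else 0)
        blocks.append((start, end))
        start = end
    return blocks


def distributed_spmv(indptr, indices, data, x, n_ranks):
    """Each simulated rank computes its row block; concatenation mimics a gather in rank order."""
    n_rows = len(indptr) - 1
    y = []
    for start, end in row_blocks(n_rows, n_ranks):
        y.extend(csr_spmv_rows(indptr, indices, data, x, start, end))
    return y


# A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
print(distributed_spmv(indptr, indices, data, x, n_ranks=2))  # [7.0, 4.0, 18.0]
```

With row partitioning, every rank needs the whole input vector `x` (for example via a broadcast) but only its own rows of the matrix. The result pieces are joined in rank order, which is what a gather would do.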
MPI + NCCL tests with GPUDirect RDMA.
Meta-iterative MapReduce for performing regression massively in parallel on a cluster, using MPI and CUDA to support both GPU and CPU nodes.
Faster numerical integration using parallel processing.
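Numerical integration parallelises naturally because the subintervals of a quadrature rule are independent. The following is a minimal sketch, assuming the composite midpoint rule and a cyclic assignment of subintervals to workers. Both choices are illustrative and not taken from the repository. Each worker computes a partial sum, and the partials are added at the end. An MPI program would perform that final step with a reduction such as `MPI_Reduce`.

```python
import math


def midpoint_partial(f, a, b, n, rank, n_ranks):
    """Partial composite-midpoint sum for the subintervals owned by one worker.

    [a, b] is cut into n equal subintervals; worker `rank` owns indices
    rank, rank + n_ranks, rank + 2 * n_ranks, ... (cyclic distribution).
    """
    h = (b - a) / n
    s = 0.0
    for i in range(rank, n, n_ranks):
        s += f(a + (i + 0.5) * h)
    return s * h


def parallel_midpoint(f, a, b, n, n_ranks):
    """Combine the per-worker partial sums; an MPI code would use a reduction here."""
    return sum(midpoint_partial(f, a, b, n, r, n_ranks) for r in range(n_ranks))


approx = parallel_midpoint(math.sin, 0.0, math.pi, n=1000, n_ranks=4)
print(approx)  # close to the exact integral of sin over [0, pi], which is 2
```

The cyclic split keeps the work balanced even when `n` is not a multiple of the number of workers. Changing `n_ranks` only changes the order of the floating-point additions, not the mathematical result.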
Distributed, MPI-based heterogeneous GPU solver for Markov decision processes (MDPs)
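One way to distribute an MDP solver is synchronous (Jacobi-style) value iteration. Every state's Bellman backup reads only the previous value vector, so states can be partitioned across ranks or GPUs, with the new values exchanged after each sweep. The sketch below assumes that scheme and a hypothetical list-based layout: `P[s][a]` is a list of `(next_state, probability)` pairs and `R[s][a]` is an expected reward. It simulates the ranks in a loop and is not the repository's solver.

```python
def bellman_backup(V, P, R, gamma, states):
    """Synchronous Bellman backup for a subset of states.

    V_new(s) = max_a [ R[s][a] + gamma * sum_{s'} P(s' | s, a) * V(s') ].
    Only the previous vector V is read, so disjoint subsets can run in parallel.
    """
    return {
        s: max(
            R[s][a] + gamma * sum(prob * V[t] for t, prob in P[s][a])
            for a in range(len(P[s]))
        )
        for s in states
    }


def distributed_value_iteration(P, R, gamma, n_ranks, tol=1e-10, max_iter=10_000):
    """Value iteration with states dealt cyclically to simulated ranks."""
    n = len(P)
    V = [0.0] * n
    partitions = [range(r, n, n_ranks) for r in range(n_ranks)]
    for _ in range(max_iter):
        V_new = V[:]
        for states in partitions:  # each partition would live on its own rank/GPU
            for s, v in bellman_backup(V, P, R, gamma, states).items():
                V_new[s] = v  # merging the pieces stands in for an all-gather
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Two states: in state 0, action 0 stays (reward 1) and action 1 moves to state 1 (reward 0);
# state 1 has a single action that stays there with reward 3.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)]]]
R = [[1.0, 0.0], [3.0]]
print(distributed_value_iteration(P, R, gamma=0.5, n_ranks=2))  # approximately [3.0, 6.0]
```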
Parallel EWH
High-performance neural networks: NN on the fly with MPI/OpenMP/CUDA (alpha version)
Multiple implementations of distributed matrix product
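A typical starting point for distributed matrix multiplication is a 2D block decomposition. Process (i, j) of a grid owns block C_ij and accumulates C_ij = Σ_k A_ik B_kj. Schemes such as SUMMA or Cannon's algorithm differ mainly in how the A and B blocks reach each process. The sketch below shows only the block arithmetic, with a hypothetical `grid` parameter and the processes simulated by loops. It does not represent any particular implementation in the repository.

```python
def bounds(n, parts):
    """(start, end) pairs splitting range(n) into `parts` nearly equal pieces."""
    base, extra = divmod(n, parts)
    cuts = [0]
    for r in range(parts):
        cuts.append(cuts[-1] + base + (1 if r < extra else 0))
    return list(zip(cuts[:-1], cuts[1:]))


def block_matmul(A, B, grid):
    """C = A @ B on a simulated grid x grid process grid, one C block per process."""
    n, inner, m = len(A), len(B), len(B[0])
    rows, mids, cols = bounds(n, grid), bounds(inner, grid), bounds(m, grid)
    C = [[0.0] * m for _ in range(n)]
    for r0, r1 in rows:          # process row
        for c0, c1 in cols:      # process column: this process owns C[r0:r1][c0:c1]
            for k0, k1 in mids:  # step k: multiply block A(row, k) by block B(k, column)
                for i in range(r0, r1):
                    for k in range(k0, k1):
                        a = A[i][k]
                        for j in range(c0, c1):
                            C[i][j] += a * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(block_matmul(A, B, grid=2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each process only ever touches its own C block, which is why the outer two loops could run on separate ranks. Communication is needed only to deliver the A and B blocks for each step k.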
A many-GPU-centric two-phase flow simulation code implementing the Physalis method
This repo solves the all-pairs shortest path problem with CPU threads, then further accelerates the program with CUDA using the blocked Floyd-Warshall algorithm.
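Blocked Floyd-Warshall tiles the distance matrix and proceeds in three phases for each pivot block `kb`. First the pivot block itself is updated, then the blocks in the pivot row and pivot column, then all remaining blocks. The blocks in the last phase are mutually independent, so they map well onto CUDA thread blocks. A minimal sequential Python sketch of that ordering follows. The block size `bs` and the function names are illustrative assumptions, and no CPU threading or CUDA is involved.

```python
INF = float("inf")


def update_block(d, ib, jb, kb, bs, n):
    """Relax every d[i][j] in block (ib, jb) through each intermediate vertex k in block kb."""
    for k in range(kb * bs, min((kb + 1) * bs, n)):
        for i in range(ib * bs, min((ib + 1) * bs, n)):
            dik = d[i][k]
            if dik == INF:
                continue
            for j in range(jb * bs, min((jb + 1) * bs, n)):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]


def blocked_floyd_warshall(dist, bs):
    """All-pairs shortest paths on an adjacency matrix, processed in bs x bs tiles."""
    n = len(dist)
    d = [row[:] for row in dist]
    nb = (n + bs - 1) // bs
    for kb in range(nb):
        # Phase 1: the pivot block depends only on itself.
        update_block(d, kb, kb, kb, bs, n)
        # Phase 2: pivot-row and pivot-column blocks use the finished pivot block.
        for b in range(nb):
            if b != kb:
                update_block(d, kb, b, kb, bs, n)
                update_block(d, b, kb, kb, bs, n)
        # Phase 3: the remaining blocks are independent of each other (GPU-friendly).
        for ib in range(nb):
            for jb in range(nb):
                if ib != kb and jb != kb:
                    update_block(d, ib, jb, kb, bs, n)
    return d


graph = [
    [0, 5, INF, 10],
    [INF, 0, 3, INF],
    [INF, INF, 0, 1],
    [INF, INF, INF, 0],
]
print(blocked_floyd_warshall(graph, bs=2)[0])  # [0, 5, 8, 9]
```

The result matches plain Floyd-Warshall for any block size, including sizes that do not divide the number of vertices. Only the order of the relaxations changes.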
Uses ncclSend and ncclRecv to implement ncclSendrecv, ncclGather, ncclScatter, and ncclAlltoall.
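NCCL provides the point-to-point calls `ncclSend` and `ncclRecv`, and patterns such as scatter, gather, and all-to-all can be composed from them. Each rank posts its matching sends and receives between `ncclGroupStart()` and `ncclGroupEnd()` so that they progress together without deadlocking. The sketch below only simulates the resulting communication pattern in Python, using a hypothetical `FakeComm` message queue. It involves no NCCL, GPUs, or streams and is meant to show which rank sends which chunk to whom.

```python
from collections import defaultdict, deque


class FakeComm:
    """Hypothetical in-process stand-in for point-to-point messages between ranks.

    send(src, dst, x) enqueues x on the (src, dst) channel and recv(src, dst)
    dequeues it, mirroring how an ncclSend on src pairs with an ncclRecv on dst.
    """

    def __init__(self, n_ranks):
        self.n_ranks = n_ranks
        self.channels = defaultdict(deque)

    def send(self, src, dst, payload):
        self.channels[(src, dst)].append(payload)

    def recv(self, src, dst):
        return self.channels[(src, dst)].popleft()


def scatter(comm, root, chunks):
    """Root sends chunks[p] to rank p; returns what each rank received."""
    for p in range(comm.n_ranks):
        comm.send(root, p, chunks[p])
    return [comm.recv(root, r) for r in range(comm.n_ranks)]


def gather(comm, root, values):
    """Every rank r sends values[r] to root; returns root's receive buffer."""
    for r in range(comm.n_ranks):
        comm.send(r, root, values[r])
    return [comm.recv(r, root) for r in range(comm.n_ranks)]


def alltoall(comm, sendbufs):
    """sendbufs[r][p] goes from rank r to rank p; result[r][p] came from rank p."""
    n = comm.n_ranks
    for r in range(n):
        for p in range(n):
            comm.send(r, p, sendbufs[r][p])
    return [[comm.recv(p, r) for p in range(n)] for r in range(n)]


bufs = [[f"{r}->{p}" for p in range(3)] for r in range(3)]
print(alltoall(FakeComm(3), bufs)[1])  # ['0->1', '1->1', '2->1']
```

Posting every send before any receive is what keeps this sequential simulation from blocking. In NCCL, the group calls play that role.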
This repository contains a framework with a GPU implementation of generalized convolution operators. The framework is designed for large image data sets and can run in a distributed system.