This is the repository for the PPoPP'26 paper: ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities.
ROME is a GPU-optimized All-Pairs Shortest Path (APSP) system that regularizes irregular graph workloads via spatial computation restructuring and an asynchronous execution pipeline to achieve high hardware utilization.
We have tested our project on a server equipped with NVIDIA RTX 4090 (Ada Lovelace architecture, 24 GB).
- Ubuntu 20.04
- CUDA >= 12.0
- flex (2.6.4)
- Bison (3.5.1)
# Clone Repo
git clone --recurse-submodules https://github.com/LyleLuo/ROME.git
# Build METIS dependency
cd modules/metis
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_THREAD=Y -Dintsize=64 ..
cmake --build .
cd ../../..
# Build Scotch dependency
sudo apt install -y flex bison
cd modules/scotch
mkdir -p build && cd build
cmake .. -DBUILD_FORTRAN=OFF -DBUILD_PTSCOTCH=OFF -DINTSIZE=64 -DSCOTCH_DETERMINISTIC=FULL
make -j4
cd ../../..
# Build ROME
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_THREAD=Y ..
cmake --build .We support Matrix Market format (.mtx files). A sample graph delaunay_n13.mtx is provided in the mtx/ directory.
To run the experiments in the paper, please follow the instructions below:
OMP_NUM_THREADS=32 CUDA_VISIBLE_DEVICES=0 "$EXECUTABLE" "$MTX_PATH" ignoreFor example:
OMP_NUM_THREADS=32 CUDA_VISIBLE_DEVICES=0 ./rome mtx/delaunay_n13.mtx ignoreROME's core design includes the Merge Engine, Batch and Tile Engine, and Async Pipeline. These features can be manually disabled before building. In src/APSP/rome.cu, the following switches are available:
- If
NO_ASYNCis defined, the Pipeline will be disabled. - If
BATCH_TILE_ENGINEorMERGE_ENGINEis set tofalse, those engines will be disabled.
//#define NO_ASYNC
const bool BATCH_TILE_ENGINE = true;
const bool MERGE_ENGINE = true;Please refer to the paper for details on the functionality of these engines.
Please refer to this paper for more details.
@inproceedings{luo2026rome,
title={ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities},
author={Luo, Weile and Chen, Yuhan and Yu, Xiangrui and Wang, Qiang and Fan, Ruibo and Liu, Hongyuan and Chu, Xiaowen},
booktitle={Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming},
year={2026}
}