ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities

This is the repository for the PPoPP'26 paper: ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities.

ROME is a GPU-optimized All-Pairs Shortest Path (APSP) system that regularizes irregular graph workloads via spatial computation restructuring and an asynchronous execution pipeline to achieve high hardware utilization.

Requirements

Hardware: NVIDIA GPU

We have tested our project on a server equipped with NVIDIA RTX 4090 (Ada Lovelace architecture, 24 GB).

OS & Software:

Ubuntu 20.04
CUDA >= 12.0
flex (2.6.4)
Bison (3.5.1)

Setup

# Clone Repo
git clone --recurse-submodules https://github.com/LyleLuo/ROME.git

# Build METIS dependency
cd modules/metis
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_THREAD=Y -Dintsize=64 ..
cmake --build .
cd ../../..

# Build Scotch dependency
sudo apt install -y flex bison
cd modules/scotch
mkdir -p build && cd build
cmake .. -DBUILD_FORTRAN=OFF -DBUILD_PTSCOTCH=OFF -DINTSIZE=64 -DSCOTCH_DETERMINISTIC=FULL
make -j4
cd ../../..

# Build ROME
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_THREAD=Y ..
cmake --build .

How to Run

Test Graphs

We support Matrix Market format (.mtx files). A sample graph delaunay_n13.mtx is provided in the mtx/ directory.

ROME

To run the experiments in the paper, please follow the instructions below:

OMP_NUM_THREADS=32 CUDA_VISIBLE_DEVICES=0 "$EXECUTABLE" "$MTX_PATH" ignore

For example:

OMP_NUM_THREADS=32 CUDA_VISIBLE_DEVICES=0 ./rome mtx/delaunay_n13.mtx ignore

Engines

ROME's core design includes the Merge Engine, Batch and Tile Engine, and Async Pipeline. These features can be manually disabled before building. In src/APSP/rome.cu, the following switches are available:

If NO_ASYNC is defined, the Pipeline will be disabled.
If BATCH_TILE_ENGINE or MERGE_ENGINE is set to false, those engines will be disabled.

//#define NO_ASYNC
const bool BATCH_TILE_ENGINE = true;
const bool MERGE_ENGINE = true;

Please refer to the paper for details on the functionality of these engines.

Paper

Please refer to this paper for more details.

@inproceedings{luo2026rome,
  title={ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities},
  author={Luo, Weile and Chen, Yuhan and Yu, Xiangrui and Wang, Qiang and Fan, Ruibo and Liu, Hongyuan and Chu, Xiaowen},
  booktitle={Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming},
  year={2026}
}

Related Code

SuperFW: Acknowledgment - We reused some code from this repository that is not related to ROME's core framework, and integrated its SuperFW implementation into ROME for correctness verification.
FastAPSP
cuGraph

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
include		include
modules		modules
mtx		mtx
src		src
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
README.md		README.md
build.sh		build.sh
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities

Requirements

Hardware: NVIDIA GPU

OS & Software:

Setup

How to Run

Test Graphs

ROME

Engines

Paper

Related Code

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities

Requirements

Hardware: NVIDIA GPU

OS & Software:

Setup

How to Run

Test Graphs

ROME

Engines

Paper

Related Code

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages