GitHub - arthureleven/axn: Native PGAS-Based GCN Training

 █████╗ ██╗  ██╗███╗   ██╗
██╔══██╗╚██╗██╔╝████╗  ██║
███████║ ╚███╔╝ ██╔██╗ ██║
██╔══██║ ██╔██╗ ██║╚██╗██║
██║  ██║██╔╝ ██╗██║ ╚████║
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═══╝

Native PGAS-Based Graph Convolutional Network Training

AXN is a Graph Convolutional Network (GCN) training framework written natively in Chapel, leveraging its Partitioned Global Address Space (PGAS) model with GPU acceleration via Chapel's native CUDA backend.

It is the first framework to implement GCN training with full backpropagation natively in Chapel. The primary contribution is not raw speed, but the demonstration that Chapel's PGAS model provides a natural and productive abstraction for distributed graph learning.

Paper coming soon — arXiv · TMLR

Features

Forward and backward pass implemented natively in Chapel
GPU acceleration via Chapel's GPU locale model (on here.gpus[0])
Distributed forward pass via Chapel PGAS (coforall over locales)
Neighborhood sampler with Fisher-Yates k-hop sampling
Mini-batch training on ogbn-products (2.4M nodes, 126M edges)
Adam optimizer validated against PyTorch
Persistent GPU buffers — 1.82x speedup over per-step transfer
Full validation against PyG and PyTorch
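To illustrate the sampling idea named in the feature list, here is a minimal Python sketch of k-hop neighborhood sampling built on a partial Fisher-Yates shuffle. It is not AXN's Chapel code: the function names (sample_neighbors, sample_k_hop), the dictionary adjacency format, and the fanout values are all assumptions made for this example.

```python
import random


def sample_neighbors(neighbors, fanout, rng):
    """Pick up to `fanout` distinct neighbors with a partial Fisher-Yates shuffle."""
    pool = list(neighbors)  # copy so the caller's adjacency list is untouched
    k = min(fanout, len(pool))
    for i in range(k):
        # Swap a uniformly chosen element from the unshuffled tail into slot i.
        j = rng.randrange(i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]


def sample_k_hop(adj, seeds, fanouts, rng):
    """Expand `seeds` by one hop per entry in `fanouts`; return the node list per hop."""
    frontier = list(seeds)
    layers = [frontier]
    for fanout in fanouts:
        next_nodes = set()
        for node in frontier:
            next_nodes.update(sample_neighbors(adj.get(node, []), fanout, rng))
        frontier = sorted(next_nodes)
        layers.append(frontier)
    return layers


# Hypothetical toy graph: node id -> list of neighbor ids.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
rng = random.Random(0)
layers = sample_k_hop(adj, seeds=[0], fanouts=[2, 2], rng=rng)
```

The partial shuffle stops after `fanout` swaps, so sampling a node costs time proportional to the fanout rather than to the node's full degree (apart from the copy, which an in-place variant could avoid).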

Architecture

axn/
├── src/
│   ├── core/          # CSR, sparse subdomain
│   ├── layers/        # GCN forward/backward (CPU + GPU)
│   ├── data/          # binary loader, neighborhood sampler
│   ├── optim/         # Adam optimizer
│   ├── distributed/   # PGAS distributed forward
│   └── train.chpl     # training loop
├── benchmarks/        # Chapel + Python benchmark scripts
├── docs/              # setup, architecture, backprop derivation
├── experiments/       # results and analysis
├── scripts/           # data export and baseline utilities
└── tests/             # 14/14 passing

Full benchmark results and the empirical evaluation are in experiments/summary.md.
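For readers unfamiliar with what the optim/ component computes, the following is a minimal Python sketch of one standard Adam update with bias correction. It is not taken from AXN's Chapel source. The function name adam_step and the flat-list parameter layout are assumptions for illustration, and the default hyperparameters are those of torch.optim.Adam.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over flat lists; returns the new (param, m, v)."""
    new_p, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(param, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction, t starts at 1
        v_hat = v_i / (1.0 - beta2 ** t)
        new_p.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v


# Hypothetical two-parameter example, first step (t = 1).
params, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
params, m, v = adam_step(params, [0.5, -0.25], m, v, t=1)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so each parameter moves by roughly lr in the direction opposite its gradient's sign.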

Setup

Requirements

Chapel (compiled with CHPL_LOCALE_MODEL=gpu, CHPL_GPU=nvidia, CHPL_GPU_ARCH=sm_75)
CUDA 12+
Python 3.10+ with PyTorch, PyG, OGB (for baselines)

Install Python dependencies

pip install -r requirements.txt

Run tests

cd axn && bash tests/run_all_tests.sh

Validation

All components are validated against reference implementations:

Component	Max diff	Reference
Forward pass CPU	4.96e-05	PyG GCNConv
Forward pass GPU	2.98e-08	CPU forward
Backward pass	1.07e-06	PyG autograd
Adam optimizer	5.96e-08	PyTorch Adam
PGAS forward	4.96e-05	PyG GCNConv
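One plausible reading of the "Max diff" column is the largest elementwise absolute difference between an AXN output and the reference output. The Python sketch below shows that check under this assumption. The helper name max_abs_diff, the sample values, and the 1e-4 tolerance are hypothetical and are not taken from the repository's test scripts.

```python
def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equally shaped 2-D lists."""
    if len(a) != len(b):
        raise ValueError("row count mismatch")
    worst = 0.0
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            raise ValueError("column count mismatch")
        for x, y in zip(row_a, row_b):
            worst = max(worst, abs(x - y))
    return worst


# Hypothetical outputs: a reference result and a candidate that differs in one entry.
reference = [[0.10, -0.25], [1.50, 2.00]]
candidate = [[0.10, -0.25], [1.50, 2.00004]]
diff = max_abs_diff(candidate, reference)
passed = diff < 1e-4  # assumed tolerance, chosen only for this example
```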

Limitations

Chapel does not support real(16) — minimum float is real(32)
PGAS multi-locale benchmark requires CHPL_COMM=gasnet and cluster access
GPU kernels use generic forall — no cuSPARSE/cuBLAS bindings
CSR build time ~21s for 126M edges (sequential fill loop)
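To show why a CSR fill loop tends to be sequential, here is a minimal Python sketch of the usual three-pass construction: count the out-degrees, take a prefix sum, then fill. It illustrates the general technique and is not AXN's Chapel implementation. The function name build_csr and the (src, dst) edge-tuple format are assumptions.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (row_ptr, col_idx) from a list of (src, dst) edges."""
    # Pass 1: count the out-degree of each source node.
    row_ptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    # Pass 2: the prefix sum turns counts into row start offsets.
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]
    # Pass 3: sequential fill; each write advances a per-row cursor.
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1].copy()  # next free slot in each row
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


# Hypothetical 3-node graph.
edges = [(0, 1), (0, 2), (2, 0), (1, 2), (2, 1)]
row_ptr, col_idx = build_csr(3, edges)
```

The third pass is the awkward one to parallelize, because two edges with the same source contend for the same cursor. A parallel version would need atomic increments or a pre-sorted edge list.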
