█████╗ ██╗ ██╗███╗ ██╗
██╔══██╗╚██╗██╔╝████╗ ██║
███████║ ╚███╔╝ ██╔██╗ ██║
██╔══██║ ██╔██╗ ██║╚██╗██║
██║ ██║██╔╝ ██╗██║ ╚████║
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═══╝
Native PGAS-Based Graph Convolutional Network Training
AXN is a Graph Convolutional Network (GCN) training framework written natively in Chapel, leveraging its Partitioned Global Address Space (PGAS) model with GPU acceleration via Chapel's native CUDA backend.
It is the first framework to implement GCN training with full backpropagation natively in Chapel. The primary contribution is not raw speed, but the demonstration that Chapel's PGAS model provides a natural and productive abstraction for distributed graph learning.
- Forward and backward pass implemented natively in Chapel
- GPU acceleration via Chapel's `on here.gpus[0]` locale model
- Distributed forward pass via Chapel PGAS (`coforall` over locales)
- Neighborhood sampler with Fisher-Yates k-hop sampling
- Mini-batch training on ogbn-products (2.4M nodes, 126M edges)
- Adam optimizer validated against PyTorch
- Persistent GPU buffers — 1.82x speedup over per-step transfer
- Full validation against PyG and PyTorch
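
The neighborhood sampler above is described only by name, so here is a minimal Python sketch of what Fisher-Yates k-hop sampling over a CSR graph might look like. It is not AXN's Chapel code: the function names `sample_neighbors` and `sample_k_hop`, the `fanouts` parameter, and the toy graph are all assumptions made for illustration.

```python
import random

def sample_neighbors(indptr, indices, node, fanout, rng):
    """Pick up to `fanout` distinct neighbors of `node` via a partial Fisher-Yates shuffle."""
    nbrs = list(indices[indptr[node]:indptr[node + 1]])  # copy; the CSR arrays stay untouched
    k = min(fanout, len(nbrs))
    for i in range(k):
        j = rng.randrange(i, len(nbrs))  # uniform index in [i, len(nbrs))
        nbrs[i], nbrs[j] = nbrs[j], nbrs[i]
    return nbrs[:k]

def sample_k_hop(indptr, indices, seeds, fanouts, seed=0):
    """Expand `seeds` by one hop per entry of `fanouts`; return the sorted visited nodes."""
    rng = random.Random(seed)
    visited = set(seeds)
    frontier = list(seeds)
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            for nbr in sample_neighbors(indptr, indices, node, fanout, rng):
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return sorted(visited)

# Hypothetical toy graph in CSR form: 0-{1,2,3}, 1-{0,2}, 2-{0,1,3}, 3-{0,2}
indptr = [0, 3, 5, 8, 10]
indices = [1, 2, 3, 0, 2, 0, 1, 3, 0, 2]
batch_nodes = sample_k_hop(indptr, indices, seeds=[1], fanouts=[2, 2])
```

Stopping the shuffle after `fanout` swaps gives a uniform sample without replacement and avoids shuffling the whole neighbor list, which matters for high-degree nodes.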
axn/
├── src/
│ ├── core/ # CSR, sparse subdomain
│ ├── layers/ # GCN forward/backward (CPU + GPU)
│ ├── data/ # binary loader, neighborhood sampler
│ ├── optim/ # Adam optimizer
│ ├── distributed/ # PGAS distributed forward
│ └── train.chpl # training loop
├── benchmarks/ # Chapel + Python benchmark scripts
├── docs/ # setup, architecture, backprop derivation
├── experiments/ # results and analysis
├── scripts/ # data export and baseline utilities
└── tests/ # 14/14 passing
Full benchmark results and empirical evaluation are in `experiments/summary.md`.
Requirements
- Chapel (compiled with `CHPL_LOCALE_MODEL=gpu`, `CHPL_GPU=nvidia`, `CHPL_GPU_ARCH=sm_75`)
- CUDA 12+
- Python 3.10+ with PyTorch, PyG, OGB (for baselines)
Install Python dependencies

    pip install -r requirements.txt

Run tests

    cd axn && bash tests/run_all_tests.sh

All components validated against reference implementations:
| Component | Max diff | Reference |
|---|---|---|
| Forward pass CPU | 4.96e-05 | PyG GCNConv |
| Forward pass GPU | 2.98e-08 | CPU forward |
| Backward pass | 1.07e-06 | PyG autograd |
| Adam optimizer | 5.96e-08 | PyTorch Adam |
| PGAS forward | 4.96e-05 | PyG GCNConv |
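
To make the "Max diff" column concrete, the sketch below compares two NumPy implementations of a single GCN layer and reports the maximum absolute element-wise difference. It assumes the usual symmetric normalization with self-loops, D^{-1/2}(A + I)D^{-1/2} X W, on an undirected graph and omits the bias term. The function names and toy inputs are hypothetical, and this is not the repository's test harness.

```python
import numpy as np

def gcn_forward_dense(adj, x, w):
    """Reference: D^{-1/2} (A + I) D^{-1/2} X W in float64 with dense matrices."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return a_norm @ x @ w

def gcn_forward_csr(indptr, indices, x, w):
    """Same layer in float32, aggregating over CSR rows (self-loop added on the fly)."""
    n = len(indptr) - 1
    deg = np.diff(indptr).astype(np.float32) + 1.0  # +1 accounts for the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    xw = x.astype(np.float32) @ w.astype(np.float32)
    out = np.zeros_like(xw)
    for i in range(n):
        out[i] = d_inv_sqrt[i] * d_inv_sqrt[i] * xw[i]  # self-loop contribution
        for e in range(indptr[i], indptr[i + 1]):
            j = indices[e]
            out[i] += d_inv_sqrt[i] * d_inv_sqrt[j] * xw[j]
    return out

# Hypothetical undirected toy graph: 0-{1,2,3}, 1-{0,2}, 2-{0,1,3}, 3-{0,2}
indptr = [0, 3, 5, 8, 10]
indices = [1, 2, 3, 0, 2, 0, 1, 3, 0, 2]
adj = np.zeros((4, 4))
for i in range(4):
    adj[i, indices[indptr[i]:indptr[i + 1]]] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # node features
w = rng.standard_normal((8, 3))  # layer weights

max_diff = np.max(np.abs(gcn_forward_dense(adj, x, w)
                         - gcn_forward_csr(indptr, indices, x, w)))
```

A small nonzero `max_diff` is expected here because one path runs in float32 and the other in float64. The table's figures are likewise on the scale of single-precision rounding.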
- Chapel does not support `real(16)`; minimum float is `real(32)`
- PGAS multi-locale benchmark requires `CHPL_COMM=gasnet` and cluster access
- GPU kernels use generic `forall`; no cuSPARSE/cuBLAS bindings
- CSR build time ~21s for 126M edges (sequential fill loop)
MIT © Arthur da Costa, 2026