arthureleven/axn


 █████╗ ██╗  ██╗███╗   ██╗
██╔══██╗╚██╗██╔╝████╗  ██║
███████║ ╚███╔╝ ██╔██╗ ██║
██╔══██║ ██╔██╗ ██║╚██╗██║
██║  ██║██╔╝ ██╗██║ ╚████║
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═══╝

Native PGAS-Based Graph Convolutional Network Training



AXN is a Graph Convolutional Network (GCN) training framework written natively in Chapel. It builds on Chapel's Partitioned Global Address Space (PGAS) model and uses Chapel's native CUDA backend for GPU acceleration.

It is the first framework to implement GCN training with full backpropagation natively in Chapel. The primary contribution is not raw speed, but the demonstration that Chapel's PGAS model provides a natural and productive abstraction for distributed graph learning.

Paper coming soon — arXiv · TMLR


Features

  • Forward and backward pass implemented natively in Chapel
  • GPU acceleration via Chapel's GPU locale model (on here.gpus[0])
  • Distributed forward pass via Chapel PGAS (coforall over locales)
  • Neighborhood sampler with Fisher-Yates k-hop sampling
  • Mini-batch training on ogbn-products (2.4M nodes, 126M edges)
  • Adam optimizer validated against PyTorch
  • Persistent GPU buffers — 1.82x speedup over per-step transfer
  • Full validation against PyG and PyTorch
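
To make the "Fisher-Yates k-hop sampling" feature concrete, here is a minimal Python sketch of the general technique. It is an illustration only, not AXN's Chapel sampler: the names sample_neighbors and sample_k_hop, the dict-based adjacency, and the fanout values are all assumptions made for this example. A partial Fisher-Yates shuffle picks up to `fanout` distinct neighbors per node without shuffling the whole list, and the sampled sources become the frontier of the next hop.

```python
import random

def sample_neighbors(neighbors, fanout, rng):
    """Pick up to `fanout` distinct neighbors with a partial Fisher-Yates shuffle."""
    pool = list(neighbors)                 # copy so the caller's list is untouched
    k = min(fanout, len(pool))
    for i in range(k):
        j = rng.randrange(i, len(pool))    # uniform index in [i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]

def sample_k_hop(adj, seeds, fanouts, rng):
    """Expand `seeds` one hop per entry of `fanouts`; return sampled edges per hop."""
    frontier = list(seeds)
    hops = []
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for dst in frontier:
            for src in sample_neighbors(adj.get(dst, []), fanout, rng):
                edges.append((src, dst))   # message flows src -> dst
                next_frontier.add(src)
        hops.append(edges)
        frontier = sorted(next_frontier)
    return hops

# Hypothetical toy graph: node -> list of neighbors.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
hops = sample_k_hop(adj, seeds=[0], fanouts=[2, 2], rng=random.Random(0))
```

Only the first `fanout` positions are shuffled, so the cost per node is proportional to the fanout plus the copy of the neighbor list, not to a full shuffle.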

Architecture

axn/
├── src/
│   ├── core/          # CSR, sparse subdomain
│   ├── layers/        # GCN forward/backward (CPU + GPU)
│   ├── data/          # binary loader, neighborhood sampler
│   ├── optim/         # Adam optimizer
│   ├── distributed/   # PGAS distributed forward
│   └── train.chpl     # training loop
├── benchmarks/        # Chapel + Python benchmark scripts
├── docs/              # setup, architecture, backprop derivation
├── experiments/       # results and analysis
├── scripts/           # data export and baseline utilities
└── tests/             # 14/14 passing

Full benchmark results and the empirical evaluation are in experiments/summary.md.
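
For orientation on what a GCN layer forward pass computes over a CSR graph, the sketch below applies the standard symmetric normalization with self-loops, D^-1/2 (A + I) D^-1/2 X W, in plain Python. It is not the code in src/layers: the function name gcn_forward and the list-based matrices are assumptions for this example, the graph is assumed undirected (stored symmetrically), and the bias and activation are omitted.

```python
import math

def gcn_forward(row_ptr, col_idx, x, w):
    """Compute D^-1/2 (A + I) D^-1/2 @ x @ w for a CSR adjacency matrix."""
    n = len(row_ptr) - 1
    # Degree counts each stored neighbor plus the added self-loop.
    deg = [row_ptr[i + 1] - row_ptr[i] + 1 for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]

    # Dense transform first: xw[i] = x[i] @ w.
    in_dim, out_dim = len(w), len(w[0])
    xw = [[sum(x[i][k] * w[k][j] for k in range(in_dim)) for j in range(out_dim)]
          for i in range(n)]

    # Sparse aggregation over each node's neighbors and the node itself.
    out = [[0.0] * out_dim for _ in range(n)]
    for i in range(n):
        for j in col_idx[row_ptr[i]:row_ptr[i + 1]] + [i]:
            coeff = inv_sqrt[i] * inv_sqrt[j]
            for c in range(out_dim):
                out[i][c] += coeff * xw[j][c]
    return out

# Path graph 0 - 1 - 2 with one input feature and a 1x1 identity weight.
row_ptr = [0, 1, 3, 4]
col_idx = [1, 0, 2, 1]
x = [[1.0], [2.0], [3.0]]
w = [[1.0]]
out = gcn_forward(row_ptr, col_idx, x, w)
```

Each output row depends only on that node's own CSR row, so the outer loop has no cross-iteration dependencies.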


Setup

Requirements

  • Chapel (compiled with CHPL_LOCALE_MODEL=gpu, CHPL_GPU=nvidia, CHPL_GPU_ARCH=sm_75)
  • CUDA 12+
  • Python 3.10+ with PyTorch, PyG, OGB (for baselines)

Install Python dependencies

pip install -r requirements.txt

Run tests

cd axn && bash tests/run_all_tests.sh

Validation

All components are validated against reference implementations:

Component          Max diff    Reference
Forward pass CPU   4.96e-05    PyG GCNConv
Forward pass GPU   2.98e-08    CPU forward
Backward pass      1.07e-06    PyG autograd
Adam optimizer     5.96e-08    PyTorch Adam
PGAS forward       4.96e-05    PyG GCNConv
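
As an illustration of what a "max diff" check against a reference can look like, the Python sketch below runs one Adam step with the standard bias-corrected update and compares it with a closed-form reference. This is not AXN's test harness: adam_step, max_abs_diff, and the sample values are hypothetical, and the table may have been produced differently. The hyperparameter defaults (lr=1e-3, betas 0.9 and 0.999, eps=1e-8) match PyTorch's Adam defaults.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one in-place Adam update; `t` is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m[i] / (1.0 - beta1 ** t)              # bias correction
        v_hat = v[i] / (1.0 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

start = [0.5, -1.0, 2.0]
grads = [0.1, -0.2, 0.3]
params = list(start)
m = [0.0] * 3
v = [0.0] * 3
adam_step(params, grads, m, v, t=1)

# On the first step the bias-corrected update reduces to lr * g / (|g| + eps).
reference = [p - 1e-3 * g / (abs(g) + 1e-8) for p, g in zip(start, grads)]
diff = max_abs_diff(params, reference)
```

In a real comparison the reference values would come from the reference implementation (for example PyTorch) run on the same inputs, and small differences near float32 machine epsilon are expected.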

Limitations

  • Chapel does not support real(16) — minimum float is real(32)
  • PGAS multi-locale benchmark requires CHPL_COMM=gasnet and cluster access
  • GPU kernels use generic forall — no cuSPARSE/cuBLAS bindings
  • CSR build takes ~21 s for 126M edges (sequential fill loop)
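
For context on the last limitation, here is a generic two-pass CSR build in Python: count edges per row, prefix-sum the counts into row offsets, then fill column indices one edge at a time. It is a sketch of the usual technique, not the loop in src/core; build_csr and the (src, dst) edge-list input are assumptions for this example.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (row_ptr, col_idx) from a list of (src, dst) edges."""
    # Pass 1: count the out-degree of every source node.
    row_ptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns the counts into row start offsets.
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]
    # Pass 2: sequential fill; each row keeps a cursor to its next free slot.
    cursor = row_ptr[:-1]          # slicing copies, so row_ptr stays intact
    col_idx = [0] * len(edges)
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx

# Path graph 0 - 1 - 2 stored with both edge directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
row_ptr, col_idx = build_csr(3, edges)
```

The fill pass is the part that resists naive parallelization: every edge does a read-modify-write on its row's cursor, so a parallel version needs atomic cursor updates or edges pre-sorted by source.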

License

MIT © Arthur da Costa, 2026
