Code for the paper MoEPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training (NeurIPS 2025). It pre-trains neural operator transformers (from 30M to 0.5B parameters) on multiple PDE datasets. Pre-trained weights can be found here (we have also uploaded the corresponding training results for DPOT).
Our pre-trained MoE-POT achieves state-of-the-art performance on multiple PDE datasets and can be fine-tuned on different types of downstream PDE problems.
We provide pre-trained checkpoints of different sizes (see the table below). Pre-trained weights are available here.
| Size | Attention dim | MLP dim | Layers | Heads | Model size |
|---|---|---|---|---|---|
| Tiny | 512 | 512 | 4 | 4 | 30M |
| Small | 1024 | 1024 | 6 | 8 | 166M |
| Medium | 1024 | 2038 | 8 | 8 | 489M |
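A minimal sketch of loading one of these checkpoints for fine-tuning is shown below. It assumes the `MoEPOT` class from `models/moepot.py` and a plain PyTorch `state_dict` checkpoint; the constructor arguments and the checkpoint path/layout are illustrative, not the exact API of this repository.

```python
import torch
from models.moepot import MoEPOT  # model class from models/moepot.py

# Hypothetical example: build a Tiny-sized model and load pre-trained weights.
# Constructor arguments and checkpoint layout are assumptions; check the
# configs/ files and the released checkpoints for the exact settings.
model = MoEPOT(width=512, num_layers=4, num_heads=4)
state = torch.load("checkpoints/moepot_tiny.pth", map_location="cpu")
model.load_state_dict(state)
model.train()  # ready for fine-tuning on a downstream PDE dataset
```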
All datasets are stored in HDF5 format and contain a `data` field. Some datasets are stored as individual HDF5 files, while others are stored within a single HDF5 file.
data_generation/preprocess.py contains the scripts for preprocessing the datasets from each source. Download the original files from the sources below and preprocess them into the /data folder.
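For example, a preprocessed file can be inspected with h5py as sketched below. The field name `data` follows the description above; the file path and array layout are illustrative and vary between datasets.

```python
import h5py

# Open one preprocessed HDF5 file and read its 'data' field.
# The file name and the (T, H, W, C) layout here are illustrative only.
with h5py.File("data/ns2d_fno_1e-5/sample_0.hdf5", "r") as f:
    data = f["data"][...]
    print(data.shape, data.dtype)
```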
| Dataset | Link |
|---|---|
| FNO data | Here |
| PDEBench data | Here |
| PDEArena data | Here |
| CFDbench data | Here |
utils/make_master_file.py contains all dataset configurations. When new datasets are merged, you should add a configuration dict for them. It stores all paths as relative paths, so the code can be run from any location.
In the code, we refer to the datasets by identifiers that differ from the original dataset names; see the following table for the mapping. For the specific data processing, please refer to data_generation/preprocess.py:
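For illustration, a new entry would look roughly like the sketch below. The key names are hypothetical; follow the schema of the existing entries in utils/make_master_file.py.

```python
# Hypothetical configuration dict for a newly merged dataset.
# Key names are illustrative; copy the schema of the existing entries.
DATASET_DICT = {
    "ns2d_fno_1e-5": {
        "path": "data/ns2d_fno_1e-5",  # relative path, so the code runs anywhere
        "n_train": 1000,
        "n_test": 200,
        "t_in": 10,    # number of input time steps
        "t_out": 10,   # number of predicted time steps
    },
}
```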
| Code Identifier | Original dataset |
|---|---|
| ns2d_fno_1e-5 | NavierStokes_V1e-5_N1200_T20 |
| ns2d_fno_1e-4 | NavierStokes_V1e-4_N10000_T30 |
| ns2d_fno_1e-3 | NavierStokes_V1e-3_N5000_T50 |
| ns2d_pdb_M1e-1_eta1e-2_zeta1e-2 | 2D_CFD_Rand_M0.1_Eta0.01_Zeta0.01_periodic_128_Train.hdf5 |
| ns2d_pdb_M1_eta1e-2_zeta1e-2 | 2D_CFD_Rand_M1.0_Eta0.01_Zeta0.01_periodic_128_Train.hdf5 |
| swe_pdb | 2D_rdb_NA_NA.h5 |
| dr_pdb | 2D_diff-react_NA_NA.h5 |
| cfdbench | CFDBench |
| ns2d_pda | NavierStokes-2D |
| ns2d_cond_pda | NavierStokes-2D-conditoned |
```bash
python train_temporal.py
# or
python trainer.py --config_file ns2d_pretrain.yaml
# or multi-GPU pre-training (tiny, small, and medium models)
python parallel_trainer.py --config_file pretrain_tiny.yaml
```
We use YAML files for configuration. You can specify the parameters for the command-line arguments there. If you want to run multiple tasks, move the corresponding parameters into the `tasks` field:
```yaml
model: MoEPOT
width: 512
tasks:
  lr: [0.001, 0.0001]
  batch_size: [256, 32]
```
Submitting this configuration to trainer.py starts 2 tasks.
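Conceptually, the list-valued parameters under `tasks` are paired position-wise into separate runs (which is why the example above yields 2 tasks rather than 4). The snippet below is only a sketch of that expansion under this assumption, not the actual trainer.py code.

```python
# Sketch of how list-valued parameters under `tasks` expand into runs,
# assuming position-wise pairing (not a Cartesian product).
tasks = {"lr": [0.001, 0.0001], "batch_size": [256, 32]}

runs = [dict(zip(tasks.keys(), values)) for values in zip(*tasks.values())]
print(runs)
# [{'lr': 0.001, 'batch_size': 256}, {'lr': 0.0001, 'batch_size': 32}]
```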
Install the following packages via conda:
```bash
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install matplotlib scikit-learn scipy pandas h5py -c conda-forge
conda install timm einops tensorboard -c conda-forge
```
- README.md
- train_temporal.py: main code for single-GPU pre-training of the auto-regressive model
- trainer.py: framework for automatically scheduling training tasks for parameter tuning
- parallel_trainer.py: framework for automatically scheduling training tasks on multiple GPUs
- train_temporal_parallel.py: main code for multi-GPU pre-training of the auto-regressive model
- utils/
  - criterion.py: loss functions for relative error (a sketch is given after this list)
  - griddataset.py: dataset for the mixture of temporal uniform-grid datasets
  - make_master_file.py: dataset configuration file
  - normalizer: normalization methods
  - optimizer: Adam/AdamW/Lamb optimizers supporting complex numbers
  - utilities.py: other auxiliary functions
- configs/: configuration files for pre-training or fine-tuning
- models/
  - moepot.py: MoEPOT model
  - MoE_conv.py: MoE module
  - fno.py: FNO with group normalization
  - mlp.py
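As a reference for the relative-error loss mentioned for utils/criterion.py, the sketch below shows the relative L2 error commonly used for neural-operator training. It is an assumption about the definition, not code copied from this repository.

```python
import torch

def relative_l2_error(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Relative L2 error averaged over the batch (a common neural-operator loss).

    Assumed to match the definition in utils/criterion.py; not copied from it.
    """
    batch = pred.shape[0]
    diff = (pred - target).reshape(batch, -1).norm(dim=1)  # ||pred - target||_2 per sample
    norm = target.reshape(batch, -1).norm(dim=1)           # ||target||_2 per sample
    return (diff / norm).mean()
```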
We would like to express our gratitude to all collaborators, fellow students, and anonymous reviewers for their valuable assistance. Special thanks go to Zhongkai Hao and Kuan Xu for their significant support. We would also like to thank the following open-source projects and research works: DPOT for the model architecture and poseidon for the datasets.
If you use MoE-POT in your research, please use the following BibTeX entry.
```bibtex
@misc{wang2025mixtureofexpertsoperatortransformerlargescale,
  title={Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training},
  author={Hong Wang and Haiyang Xin and Jie Wang and Xuanze Yang and Fei Zha and Huanshuo Dong and Yan Jiang},
  year={2025},
  eprint={2510.25803},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.25803},
}
```