SIMSHIFT: A Benchmark for Adapting Neural Surrogates to Distribution Shifts

Paper: https://arxiv.org/abs/2506.12007

Authors: Paul Setinek, Gianluca Galletti, Thomas Gross, Dominik Schnürer, Johannes Brandstetter, Werner Zellinger

Overview

This repository contains the datasets, dataloaders, baseline models, unsupervised domain adaptation algorithms, and model selection strategies, together with the experiments and evaluation protocols, for SIMSHIFT, a benchmark designed to evaluate Unsupervised Domain Adaptation (UDA) methods for neural surrogates of physical simulations. The benchmark's datasets target real-world industrial scenarios and provide distribution shifts across parameter configurations in mesh-based PDE simulations.

Datasets:

Hot rolling
Sheet metal forming
Electric motor design
Heatsink design

All datasets are hosted on Hugging Face at https://huggingface.co/datasets/simshift/SIMSHIFT_data.
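As a rough sketch of how the data could be fetched programmatically, the snippet below uses `snapshot_download` from the `huggingface_hub` package with the repository id taken from the URL above. The helper name `download_simshift` and the `local_dir` argument are illustrative assumptions, and `huggingface_hub` is not installed by the steps in this README. The layout of the downloaded files is not described here, so no file filter is applied by default.

```python
from pathlib import Path
from typing import Optional, Sequence

REPO_ID = "simshift/SIMSHIFT_data"  # taken from the dataset URL above


def download_simshift(local_dir: str,
                      allow_patterns: Optional[Sequence[str]] = None) -> Path:
    """Download the dataset repository (or a filtered subset) into local_dir.

    Hypothetical helper, not part of the simshift package.
    """
    # Imported lazily so this file can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repository is hosted under /datasets/
        local_dir=local_dir,
        allow_patterns=list(allow_patterns) if allow_patterns else None,
    )
    return Path(path)
```

Calling `download_simshift("data/simshift")` would fetch the whole repository. Passing `allow_patterns` restricts the download to matching files, but the right patterns depend on the repository layout, which should be checked on the dataset page first.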

1. Installation

Clone the repo:

git clone https://github.com/psetinek/simshift.git
cd simshift

Create a new virtual environment (the code was developed and tested with Python 3.11):

conda create -n simshift python=3.11
conda activate simshift

First, install your desired torch version (the repo was tested with torch 2.6.0) by following the official PyTorch installation instructions. If you are on a Linux system with CUDA 12.6, the command is:

pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126

Additionally, we use PyTorch Geometric (PyG). Install it as follows:

pip install torch_geometric

We also need torch-scatter. To install it, first check your PyTorch and CUDA versions:

python -c "import torch; print(f'PyTorch {torch.__version__}, CUDA {torch.version.cuda}')"

Then install torch-scatter from the wheel index that matches those versions, as described in its documentation. For torch 2.6.0 and CUDA 12.6, the command is:

pip install torch-scatter -f https://data.pyg.org/whl/torch-2.6.0+cu126.html

Then install torch-cluster with:

pip install torch-cluster -f https://data.pyg.org/whl/torch-2.6.0+cu126.html
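The two wheel index URLs above follow the same pattern, so for other version pairs the URL can be derived from the output of the version check. The helper below is a minimal sketch of that pattern; the function name `pyg_wheel_index` is hypothetical, and the `cpu` tag for CPU-only builds is an assumption that should be verified against the PyG installation notes. Wheels are not necessarily published for every torch/CUDA combination.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the wheel index URL for a torch/CUDA pair.

    torch_version may carry a local suffix such as "2.6.0+cu126"; it is dropped.
    cuda_version is what torch.version.cuda reports, e.g. "12.6",
    or None on CPU-only builds.
    """
    base_version = torch_version.split("+")[0]
    if cuda_version is None:
        tag = "cpu"  # assumption: CPU-only wheels use the "cpu" tag
    else:
        major, minor = cuda_version.split(".")[:2]
        tag = f"cu{major}{minor}"
    return f"https://data.pyg.org/whl/torch-{base_version}+{tag}.html"


if __name__ == "__main__":
    print(pyg_wheel_index("2.6.0", "12.6"))
```

The resulting URL is what goes after `-f` in the `pip install torch-scatter` and `pip install torch-cluster` commands.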

Finally, install the simshift package in editable mode from the repository root:

pip install -e .
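To confirm that every package from the steps above is visible in the active environment, a quick check like the following can help. It is only a sketch: the helper `missing_modules` is hypothetical, the import name `simshift` is assumed from the package directory, and `find_spec` only tests whether a module can be located, not whether its compiled extensions load correctly.

```python
import importlib.util
from typing import Iterable, List

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ("torch", "torch_geometric", "torch_scatter",
                    "torch_cluster", "simshift")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All packages found.")
```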

2. Tutorial notebooks

To help you get familiar with this repository, we provide a tutorial notebook that walks through model training, result evaluation, and visualization.

Please also take a look at the package documentation. There, we also provide instructions on how to extend SIMSHIFT's current functionality, i.e., add a new model, dataset, or domain adaptation algorithm.

3. Custom domain splits

In the paper, we report results on 1D/2D splits, but you can define arbitrary n-D splits on the respective parameters of the SIMSHIFT datasets. We provide domain_splitter.py, which filters source samples by ranges on any subset of parameters, and puts everything else into target. Both domains are then split into train/(val)/test as requested.

You can use it as follows:

python domain_splitter.py \
    --data-path <path_to_dataset> \
    --config-file domain_splitting_configs/motor.json \
    --output-name splits.json

The arguments are:

| Argument | Required | Description |
| --- | --- | --- |
| `--data-path` | Yes | Path to the dataset directory (must contain `metadata.csv`) |
| `--config-file` | Yes | JSON configuration file defining domain ranges and split ratios |
| `--output-name` | No | Output filename (default: `splits.json`) |

The splitting configs used for the results in the manuscript can be found in domain_splitting_configs/. An example config could look like this:

{
    "seed": 42,
    "source_ratios": [0.6, 0.2, 0.2],
    "target_ratios": [0.6, 0.4],
    "split_column": "Geometry.Rotor.dr3",
    "source_range": [0, 119],
    "target_ranges": {
        "easy": [119, 121],
        "medium": [121, 123],
        "hard": [123, 126]
    }
}

The required parameters are:

| Parameter | Description |
| --- | --- |
| `seed` | Random seed for reproducibility |
| `source_ratios` | `[train, val, test]` ratios for source domain (must sum to 1.0) |
| `target_ratios` | `[train, test]` ratios for target domain (must sum to 1.0) |
| `split_column` | Column name in `metadata.csv` to use for domain splitting |
| `source_range` | `[min, max)` range for source domain samples |
| `target_ranges` | Dict mapping difficulty names to `[min, max)` ranges |
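Before running the splitter, it can be useful to sanity-check a config against the constraints in the table. The function below is a minimal sketch of such a check, not the validation performed by domain_splitter.py. The name `validate_split_config` is hypothetical, and the non-overlap rule for ranges is an assumption that goes beyond what the table states.

```python
import math


def validate_split_config(cfg: dict) -> None:
    """Raise ValueError if cfg violates the constraints listed in the table."""
    required = ("seed", "source_ratios", "target_ratios",
                "split_column", "source_range", "target_ranges")
    missing = [key for key in required if key not in cfg]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # Source needs [train, val, test]; target needs [train, test].
    for key, expected_len in (("source_ratios", 3), ("target_ratios", 2)):
        ratios = cfg[key]
        if len(ratios) != expected_len:
            raise ValueError(f"{key} needs {expected_len} entries")
        if not math.isclose(sum(ratios), 1.0, abs_tol=1e-9):
            raise ValueError(f"{key} must sum to 1.0")

    ranges = {"source_range": cfg["source_range"], **cfg["target_ranges"]}
    for name, (low, high) in ranges.items():
        if not low < high:
            raise ValueError(f"range '{name}' must satisfy min < max")

    # Assumption: half-open [min, max) ranges should not overlap,
    # so that each sample lands in at most one domain.
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])
    for (name_a, (_, high_a)), (name_b, (low_b, _)) in zip(ordered, ordered[1:]):
        if low_b < high_a:
            raise ValueError(f"ranges '{name_a}' and '{name_b}' overlap")
```

The example config shown earlier passes this check: its ranges share endpoints (119, 121, 123), but because each range excludes its upper bound, no value belongs to two domains.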

Additional requirements are:

The dataset directory must contain a metadata.csv file with:
- A sample_id column identifying each sample
- The conditioning column specified in split_column
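To make the splitting logic concrete, here is a minimal, standard-library sketch of range-based domain splitting driven by a config like the example above. It is not the code in domain_splitter.py: the function names, the shape of the returned dictionary, and the rounding scheme are all assumptions, and it follows the example config by assigning target samples through the explicit `target_ranges`.

```python
import random


def in_range(value: float, bounds) -> bool:
    low, high = bounds
    return low <= value < high  # half-open [min, max)


def partition(ids: list, ratios: list, rng: random.Random) -> list:
    """Shuffle ids and cut them into len(ratios) consecutive parts."""
    ids = sorted(ids)  # sort first so the result depends only on the seed
    rng.shuffle(ids)
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last part takes the remainder so rounding drops no sample.
        is_last = i == len(ratios) - 1
        end = len(ids) if is_last else start + round(ratio * len(ids))
        parts.append(ids[start:end])
        start = end
    return parts


def split_domains(rows: list, cfg: dict) -> dict:
    """rows: dicts with 'sample_id' and cfg['split_column'],
    for example as produced by csv.DictReader on metadata.csv."""
    rng = random.Random(cfg["seed"])
    column = cfg["split_column"]

    source_ids = [r["sample_id"] for r in rows
                  if in_range(float(r[column]), cfg["source_range"])]
    train, val, test = partition(source_ids, cfg["source_ratios"], rng)
    splits = {"source": {"train": train, "val": val, "test": test}}

    # One target domain per difficulty name in the config.
    for name, bounds in cfg["target_ranges"].items():
        target_ids = [r["sample_id"] for r in rows
                      if in_range(float(r[column]), bounds)]
        train, test = partition(target_ids, cfg["target_ratios"], rng)
        splits[name] = {"train": train, "test": test}
    return splits
```

Seeding a dedicated `random.Random` instance keeps the split reproducible without touching the global random state, which matches the purpose of the `seed` parameter.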

4. Paper results reproducibility

We try to be as transparent as possible. Therefore, we provide clear instructions on how we obtained the paper's results here.
