Paper: https://arxiv.org/abs/2506.12007
Authors: Paul Setinek, Gianluca Galletti, Thomas Gross, Dominik Schnürer, Johannes Brandstetter, Werner Zellinger
This repository contains the datasets, dataloaders, baseline models, unsupervised domain adaptation algorithms, and model selection strategies, together with experiments and evaluation protocols, for SIMSHIFT, a benchmark designed to evaluate Unsupervised Domain Adaptation (UDA) methods for neural surrogates of physical simulations. The benchmark's datasets target real-world industrial scenarios and provide distribution shifts across parameter configurations in mesh-based PDE simulations.
Datasets:
- Hot rolling
- Sheet metal forming
- Electric motor design
- Heatsink design
All datasets are hosted on Hugging Face at https://huggingface.co/datasets/simshift/SIMSHIFT_data.
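One way to fetch the data programmatically is through the `huggingface_hub` client. The sketch below is an assumption about usage, not part of this repository: the helper name `download_simshift` and the default target directory are hypothetical, and `huggingface_hub` has to be installed separately (`pip install huggingface_hub`).

```python
from pathlib import Path

REPO_ID = "simshift/SIMSHIFT_data"  # dataset repo named in this README


def download_simshift(local_dir="data/simshift", allow_patterns=None):
    """Download the dataset repo (or a subset of its files) and return the local path.

    Hypothetical helper. `allow_patterns` can restrict the download to files
    matching glob patterns; the repo's file layout is not assumed here.
    """
    # Imported lazily so this module can be imported without the client installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the default repo type is "model"
        local_dir=str(target),
        allow_patterns=allow_patterns,
    )
    return Path(path)
```

Calling `download_simshift()` downloads the full repository, which may be large; passing `allow_patterns` is a way to limit it once you know which files you need.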
Clone the repo:
git clone https://github.com/psetinek/simshift.git
cd simshift
Create a new virtual environment (the code was developed and tested with Python 3.11):
conda create -n simshift python=3.11
conda activate simshift
First, install your desired torch version (the repo was tested with torch 2.6.0), following the official PyTorch installation instructions. If you are on a Linux system with CUDA 12.6, the command is:
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
We also use PyTorch Geometric (PyG). Install it as follows:
pip install torch_geometric
We also need torch-scatter. To install it, first check your PyTorch and CUDA versions:
python -c "import torch; print(f'PyTorch {torch.__version__}, CUDA {torch.version.cuda}')"
Then install the package as described in its documentation. For torch 2.6.0 and CUDA 12.6, the command is:
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.6.0+cu126.html
Then install torch-cluster with:
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.6.0+cu126.html
Finally, install the simshift package:
pip install -e .
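If your torch or CUDA version differs from the tested one, the wheel index URL passed to `-f` for torch-scatter and torch-cluster changes accordingly. The sketch below builds that URL from the two version strings printed by the check command; the function name `pyg_wheel_index` is hypothetical, and whether prebuilt wheels exist for a given combination is not guaranteed.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the PyG wheel index URL for a torch/CUDA version pair.

    torch_version: value of torch.__version__, e.g. "2.6.0+cu126"
    cuda_version:  value of torch.version.cuda, e.g. "12.6" (None on CPU builds)
    """
    base = torch_version.split("+")[0]  # drop a local suffix such as "+cu126"
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


print(pyg_wheel_index("2.6.0+cu126", "12.6"))
# → https://data.pyg.org/whl/torch-2.6.0+cu126.html
```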
To get familiar with the capabilities of this repository, we provide a tutorial notebook that walks through model training, evaluation, and visualization of results.
Please also take a look at the documentation of the package. There, we also provide instructions on how to extend SIMSHIFT's current functionality, i.e. add a new model, dataset, or domain adaptation algorithm.
In the paper, we report results on 1D/2D splits, but you can define arbitrary n-D splits on the respective parameters of the SIMSHIFT datasets. We provide domain_splitter.py, which filters source samples by ranges on any subset of parameters, and puts everything else into target. Both domains are then split into train/(val)/test as requested.
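The idea can be sketched in a few lines of plain Python. This is an illustration of the described behavior, not the code in domain_splitter.py: the function names, the in-memory `samples` dictionary, the parameter names `thickness` and `width`, and the half-open range convention are assumptions made for the example.

```python
import random


def split_by_ratios(items, ratios, seed):
    """Shuffle `items` with a fixed seed and cut them into len(ratios) parts."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    parts, start, cumulative = [], 0, 0.0
    last = len(ratios) - 1
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        # The last part takes the remainder so rounding never drops a sample.
        end = len(shuffled) if i == last else round(cumulative * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts


def split_domains(samples, source_ranges, source_ratios, target_ratios, seed=42):
    """samples: {sample_id: {param: value}}; source_ranges: {param: (lo, hi)}.

    A sample is a source sample if every listed parameter lies in [lo, hi);
    all remaining samples form the target domain.
    """
    def in_source(params):
        return all(lo <= params[name] < hi for name, (lo, hi) in source_ranges.items())

    source = sorted(sid for sid, params in samples.items() if in_source(params))
    source_set = set(source)
    target = sorted(sid for sid in samples if sid not in source_set)
    return {
        "source": split_by_ratios(source, source_ratios, seed),  # train/val/test
        "target": split_by_ratios(target, target_ratios, seed),  # train/test
    }


# Toy data: 10 samples with two made-up parameters.
samples = {i: {"thickness": 115 + i, "width": 10.0} for i in range(10)}
splits = split_domains(samples, {"thickness": (0, 120)}, [0.6, 0.2, 0.2], [0.6, 0.4])
print([len(p) for p in splits["source"]], [len(p) for p in splits["target"]])
```

Adding a second entry to `source_ranges` (for example on `width`) turns the 1D split into a 2D one, since a sample must then satisfy both ranges to count as source.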
You can use it as follows:
python domain_splitter.py \
--data-path <path_to_dataset> \
--config-file domain_splitting_configs/motor.json \
--output-name splits.json
The arguments are:
| Argument | Required | Description |
|---|---|---|
| `--data-path` | Yes | Path to the dataset directory (must contain `metadata.csv`) |
| `--config-file` | Yes | JSON configuration file defining domain ranges and split ratios |
| `--output-name` | No | Output filename (default: `splits.json`) |
The splitting configs used for the results in the manuscript can be found here. An example config could look like this:
{
"seed": 42,
"source_ratios": [0.6, 0.2, 0.2],
"target_ratios": [0.6, 0.4],
"split_column": "Geometry.Rotor.dr3",
"source_range": [0, 119],
"target_ranges": {
"easy": [119, 121],
"medium": [121, 123],
"hard": [123, 126]
}
}
The required parameters are:
| Parameter | Description |
|---|---|
| `seed` | Random seed for reproducibility |
| `source_ratios` | `[train, val, test]` ratios for the source domain (must sum to 1.0) |
| `target_ratios` | `[train, test]` ratios for the target domain (must sum to 1.0) |
| `split_column` | Column name in `metadata.csv` to use for domain splitting |
| `source_range` | `[min, max)` range for source domain samples |
| `target_ranges` | Dict mapping difficulty names to `[min, max)` ranges |
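Before running the splitter, it can help to sanity-check a config against the rules in the table. The following is a minimal sketch under stated assumptions: `validate_config` and `assign_domain` are hypothetical helpers, not functions of this package, and the check that ranges must not overlap is an extra assumption rather than a documented requirement.

```python
EXAMPLE_CONFIG = {
    "seed": 42,
    "source_ratios": [0.6, 0.2, 0.2],
    "target_ratios": [0.6, 0.4],
    "split_column": "Geometry.Rotor.dr3",
    "source_range": [0, 119],
    "target_ranges": {"easy": [119, 121], "medium": [121, 123], "hard": [123, 126]},
}

REQUIRED_KEYS = {
    "seed", "source_ratios", "target_ratios",
    "split_column", "source_range", "target_ranges",
}


def validate_config(cfg):
    """Raise ValueError if the config violates the rules in the parameter table."""
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for key, length in (("source_ratios", 3), ("target_ratios", 2)):
        ratios = cfg[key]
        if len(ratios) != length or abs(sum(ratios) - 1.0) > 1e-9:
            raise ValueError(f"{key} must have {length} entries summing to 1.0")
    ranges = {"source": cfg["source_range"], **cfg["target_ranges"]}
    for name, (lo, hi) in ranges.items():
        if not lo < hi:
            raise ValueError(f"range '{name}' must satisfy min < max")
    # Assumption: half-open ranges may touch but should not overlap.
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])
    for (name_a, (_, hi_a)), (name_b, (lo_b, _)) in zip(ordered, ordered[1:]):
        if lo_b < hi_a:
            raise ValueError(f"ranges '{name_a}' and '{name_b}' overlap")


def assign_domain(value, cfg):
    """Return 'source', a difficulty name, or None for a split-column value."""
    lo, hi = cfg["source_range"]
    if lo <= value < hi:
        return "source"
    for name, (lo, hi) in cfg["target_ranges"].items():
        if lo <= value < hi:
            return name
    return None  # value falls outside every configured range


validate_config(EXAMPLE_CONFIG)
print(assign_domain(118.9, EXAMPLE_CONFIG), assign_domain(119, EXAMPLE_CONFIG))
# → source easy
```

Because the ranges are half-open, a boundary value such as 119 belongs to the range that starts there (`easy`), not the one that ends there (`source`).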
Additional requirements are:
- The dataset directory must contain a `metadata.csv` file with:
  - A `sample_id` column identifying each sample
  - The conditioning column specified in `split_column`
- A
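The `metadata.csv` column requirements listed above can be checked up front with the standard library. This is a hedged sketch: `check_metadata` is a hypothetical helper and not part of domain_splitter.py.

```python
import csv
from pathlib import Path


def check_metadata(data_path, split_column):
    """Check that metadata.csv exists and has the required columns.

    Returns the number of data rows; raises FileNotFoundError or ValueError
    otherwise.
    """
    metadata_file = Path(data_path) / "metadata.csv"
    if not metadata_file.is_file():
        raise FileNotFoundError(f"{metadata_file} not found")
    with metadata_file.open(newline="") as handle:
        reader = csv.DictReader(handle)
        columns = set(reader.fieldnames or [])
        missing = {"sample_id", split_column} - columns
        if missing:
            raise ValueError(f"metadata.csv is missing columns: {sorted(missing)}")
        return sum(1 for _ in reader)  # count the remaining data rows
```

For the example config, this would be called as `check_metadata("<path_to_dataset>", "Geometry.Rotor.dr3")`.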
We try to be as transparent as possible. Therefore, we provide clear instructions on how we obtained the paper's results here.