This repository contains the implementation of the paper *Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI*.
The GenCFD and CNO models used in this paper are adapted from their original repositories, and some dataloaders are adapted from the Poseidon implementation.
While the code is also available in this project, please note that the original implementations can be found in their respective repositories.
- Python 3.8+
- PyTorch + PyTorch Lightning
- Weights & Biases (wandb) account (optional, for logging)
Install dependencies (example):

```bash
pip install -r requirements.txt
```
- The Wave Equation datasets can be downloaded from this link, or can be generated with the Jupyter notebook `utils/generate_wave_data/generate_wave.ipynb` (a generic data-generation sketch follows after this list).
- The original MERRA-2 datasets can be found on the NASA website.
- The Navier–Stokes datasets can be downloaded from Poseidon on Hugging Face.
- The BraTS2020 dataset can be downloaded from BraTS2020.
- The classification datasets are the standard CIFAR-10 and MNIST.
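For orientation, here is a generic example of wave-equation data generation (our own construction under simple assumptions, not the notebook's code): a random superposition of standing modes is an exact solution of the 1D wave equation with homogeneous Dirichlet boundary conditions.

```python
import numpy as np

def wave_solution(n_modes=8, s=128, c=1.0, t=0.5, seed=0):
    """Analytic solution of u_tt = c^2 u_xx on [0, 1] with u = 0 at the
    boundary and zero initial velocity: a random sine superposition."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_modes) / np.arange(1, n_modes + 1)  # decaying spectrum
    x = np.linspace(0.0, 1.0, s)
    k = np.arange(1, n_modes + 1)[:, None] * np.pi
    u0 = (a[:, None] * np.sin(k * x)).sum(0)                      # u(x, 0)
    u_at_t = (a[:, None] * np.sin(k * x) * np.cos(c * k * t)).sum(0)  # u(x, t)
    return u0, u_at_t
```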
The 1D toy model can be run in the notebook `1d_notebooks/notebook_1d_diffusion.ipynb`.
It consists of:
- 1D training of a regression model
- 1D training of a diffusion model
- ODE-based and SDE-based sampling
- Likelihood estimation (a sketch of the idea follows below)
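For orientation, a hedged sketch of the likelihood-estimation idea (all names here, including `drift`, are placeholders rather than the notebook's API): the log-likelihood of a sample can be estimated by integrating the probability-flow ODE together with the divergence of its drift, approximated with the Skilling–Hutchinson trace estimator.

```python
import torch

def ode_log_likelihood(x, drift, t0=1e-3, t1=1.0, n_steps=100):
    """Estimate log p(x) for x of shape (batch, ...): integrate the state
    and the divergence of the PF-ODE drift from data (t0) to the prior (t1)."""
    x = x.detach().clone()
    dt = (t1 - t0) / n_steps
    delta_logp = torch.zeros(x.shape[0], device=x.device)
    eps = torch.randn_like(x)  # fixed Gaussian probe for the trace estimate
    t = t0
    for _ in range(n_steps):
        with torch.enable_grad():
            x.requires_grad_(True)
            f = drift(x, t)  # placeholder: the model's probability-flow drift
            # div f ~= eps^T (df/dx) eps  (Skilling-Hutchinson, via a VJP)
            vjp = torch.autograd.grad(f, x, grad_outputs=eps)[0]
            div = (vjp * eps).flatten(1).sum(-1)
        x = (x + dt * f).detach()  # Euler step of the ODE
        delta_logp = delta_logp + dt * div.detach()
        t += dt
    # Prior term: standard normal at t1 (an assumption; the correct prior
    # depends on the noise schedule, e.g. N(0, sigma_max^2) when exploding).
    prior_logp = torch.distributions.Normal(0.0, 1.0).log_prob(x).flatten(1).sum(-1)
    return prior_logp + delta_logp
```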
- Set the working directory in `train_regression_pl.py` (default: `trained_models`).
- Configure wandb: provide your wandb account for logging metrics. Otherwise, disable logging manually (see `regression/GeneralModule_pl.py` and `GenCFD/model/lightning_wrap/pl_wrapper.py`).
- Create a config file. Example:
```json
{
  "config": null,
  "device": "cpu",
  "which_model": "cno",
  "tag": "tmp",
  "loss": 1,
  "epochs": 100,
  "warmup_epochs": 0,
  "batch_size": 32,
  "peak_lr": 0.0001,
  "end_lr": 0.00001,
  "which_data": "wave",
  "in_dim": 1,
  "out_dim": 1,
  "N_train": 128,
  "ood_share": 0.0,
  "is_time": true,
  "is_masked": null,
  "max_num_time_steps": 1,
  "time_step_size": 1,
  "fix_input_to_time_step": null,
  "allowed_transitions": [0],
  "s": 128,
  "config_arch": "/configs/architectures/config_cno_very_small_att.json",
  "wandb_project_name": "your_project",
  "wandb_run_name": "_1"
}
```
Variable explanations:

- `config`: keep `null`
- `device`: `"cpu"` or `"cuda"`
- `which_model`: `"cno"`, `"unet"`, `"basic_vit3"`, or `"fno"`
- `tag`: string identifier for the model (important for saving)
- `loss`: loss function index
- `epochs`: number of epochs
- `warmup_epochs`: number of warmup epochs
- `batch_size`: training batch size
- `peak_lr`: peak learning rate
- `end_lr`: final learning rate at the end of training
- `which_data`: dataset/experiment (`wave`, `ns_mix`, `ns_pwc`, etc.)
- `in_dim`, `out_dim`: input and output channel dimensions
- `N_train`: number of training samples/trajectories
- `ood_share`: OOD fraction (used in classification)
- `is_time`: whether time conditioning is used
- `is_masked`: keep `null` unless using masks
- `max_num_time_steps`: maximum number of time steps (set to 7 for time-dependent NS)
- `time_step_size`: step size in the trajectory (set to 2 for NS)
- `fix_input_to_time_step`: keep `null` unless fixed inputs are required
- `allowed_transitions`: transitions allowed in the all2all strategy (set to `[1,2,3,4,5,6,7]` for NS; see the sketch below)
- `s`: resolution
- `config_arch`: path to the architecture config file (examples in `/configs/architectures`)
- `wandb_project_name`: project name for wandb logging
- `wandb_run_name`: run tag for wandb
Run the regression training:

```bash
python3 train_regression_pl.py --config=/path_to_config_file/
```
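For intuition, here is a minimal sketch of how `max_num_time_steps`, `time_step_size`, and `allowed_transitions` might combine under an all2all pairing strategy; the helper and its exact conventions are our own illustration, not the repository's dataloader:

```python
import random

def sample_time_pair(max_num_time_steps, time_step_size, allowed_transitions):
    """Draw an (input, target) snapshot pair; the temporal gap is drawn
    from `allowed_transitions`, in units of `time_step_size`."""
    k = random.choice(allowed_transitions)            # gap, e.g. 1..7 for NS
    t_in = random.randint(0, max_num_time_steps - k)  # any valid start index
    return t_in * time_step_size, (t_in + k) * time_step_size

# NS example from the config above: gaps of 1..7 steps, stride 2
print(sample_time_pair(7, 2, [1, 2, 3, 4, 5, 6, 7]))
```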
Create a config file. Example:
```json
{
  "config": null,
  "device": "cuda",
  "tag": "tmp",
  "epochs": 200,
  "warmup_epochs": 0,
  "batch_size": 40,
  "peak_lr": 0.0002,
  "end_lr": 0.00001,
  "which_data": "ns_pwc",
  "is_time": true,
  "is_masked": null,
  "max_num_time_steps": 10,
  "time_step_size": 2,
  "fix_input_to_time_step": null,
  "allowed_transitions": [1,2,3,4,5,6,7],
  "which_type": "x&y",
  "sigma": 100.0,
  "in_dim": 2,
  "out_dim": 2,
  "N_train": 1000,
  "ood_share": 0.0,
  "s": 128,
  "is_log_uniform": false,
  "log_uniform_frac": 1.0,
  "is_exploding": true,
  "ema_param": 0.999,
  "skip": true,
  "config_arch": "/configs/architectures/config_unet_base.json",
  "wandb_project_name": "your_project",
  "wandb_run_name": "_1"
}
```
Variable explanations (in addition to regression ones):
- `which_type`: `"x&y"` (joint) or `"x"` (input only)
- `sigma`: max noise level for the denoiser
- `is_log_uniform`: whether to use the log-uniform scheme
- `log_uniform_frac`: scaling factor for the log-uniform scheme
- `is_exploding`: whether to use the exploding schedule
- `ema_param`: exponential moving average decay parameter (see the sketch below)
- `skip`: whether the denoiser uses skip connections
Run the diffusion training:

```bash
python3 train_diffusion_pl.py --config=/path_to_config_file/
```
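The `ema_param` value is the decay of a standard exponential moving average over the network weights; a minimal sketch of such an update (our own illustration, not the repository's exact code):

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, ema_param=0.999):
    """Blend the live weights into the EMA copy:
    w_ema <- ema_param * w_ema + (1 - ema_param) * w."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(ema_param).add_(p, alpha=1.0 - ema_param)
```

The EMA copy is typically updated after every optimizer step and used at evaluation time.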
To obtain estimated likelihoods (or other diffusion-based certificates), you need a config file. Example:
```json
{
  "config_regression": "/path_to_regression_model/",
  "config_diffusion": "/path_to_diffusion_model/",
  "which_data": "ns_pwc",
  "tag_data": "3",
  "device": "cuda",
  "N_samples": 123,
  "ood_share": 0.0,
  "batch_size": 8,
  "baseline_avg_grad": null,
  "which_ckpt": null,
  "save_data": false,
  "is_diff": true,
  "is_ar": true,
  "is_time": true,
  "is_masked": null,
  "max_num_time_steps": 7,
  "time_step_size": 2,
  "fix_input_to_time_step": null,
  "allowed_transitions": [7],
  "regression_scheme": [1,1,1,1,1,1,1],
  "dt": 0.1,
  "inference_tag": "1",
  "num_gen": 0
}
```
Variable explanations (new ones):
- `config_regression`: path to the regression model config
- `config_diffusion`: path to the diffusion model config
- `tag_data`: tag for the OOD testing data
- `N_samples`: number of test samples
- `baseline_avg_grad`: if using gradient-based baselines, set to `"time"`, `"space"`, `"time_space"`, or `"grad_norm"`
- `which_ckpt`: checkpoint to load (set to `null` for the default)
- `save_data`: whether to save generated samples
- `is_diff`: whether to run diffusion inference, or only regression
- `is_ar`: whether to run autoregressive evaluation (sketched below)
- `regression_scheme`: AR scheme used during evaluation
- `dt`: autoregressive time step size
- `inference_tag`: identifier for the inference run
- `num_gen`: number of generations (set to `0`)
Run inference:

```bash
python3 inference.py --config=/path_to_config_file/
```
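For intuition, a hedged sketch of what an autoregressive evaluation driven by `regression_scheme` and `dt` could look like (the `model(state, lead_time)` signature is an assumption, not the actual `inference.py` API): with `regression_scheme = [1,1,1,1,1,1,1]` and `dt = 0.1`, the model is applied seven times, each call advancing the state by one step.

```python
import torch

@torch.no_grad()
def autoregressive_rollout(model, u0, regression_scheme=(1, 1, 1, 1, 1, 1, 1), dt=0.1):
    """Feed each prediction back as the next input; every entry of
    `regression_scheme` is the lead time (in units of dt) of one call."""
    u, trajectory = u0, [u0]
    for k in regression_scheme:
        u = model(u, k * dt)  # assumed signature: model(state, lead_time)
        trajectory.append(u)
    return torch.stack(trajectory)
```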
For classification tasks, run the script:

```bash
python3 train_classification.py --config=/path_to_config_file/
```
The config file for training is very similar to the one used for regression.
An important parameter is `ood_share`: the fraction of the OOD class used during training (a sketch of this mixing follows below).
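For illustration, one way such a fraction could be mixed in (our own sketch, not the repository's dataloader):

```python
import torch
from torch.utils.data import ConcatDataset, Subset

def mix_in_ood(id_dataset, ood_dataset, ood_share=0.1, seed=0):
    """Append ood_share * len(id_dataset) OOD samples to the ID training set."""
    g = torch.Generator().manual_seed(seed)
    n_ood = int(ood_share * len(id_dataset))
    idx = torch.randperm(len(ood_dataset), generator=g)[:n_ood].tolist()
    return ConcatDataset([id_dataset, Subset(ood_dataset, idx)])
```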
The inference for classification tasks is implemented in the Jupyter notebook `inference_classification.ipynb`.
Very helpful notebooks that we used in our research can be found here and here.
- The segmentation training follows this script.
- The backbone used for segmentation is the CNO model, with a binary segmentation head as the final layer (a sketch follows after this list).
- The dataloader for the brain segmentation task is located in `dataloader/dataloader.py`.
- Inference for segmentation is run on the predicted masks, in the same way as regression inference.
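A minimal sketch of this kind of architecture (our own illustration; `backbone` and `feat_channels` are placeholders for the repository's CNO model and its output width):

```python
import torch.nn as nn

class BinarySegmentationModel(nn.Module):
    """A backbone (e.g. a CNO) followed by a 1x1-convolution binary head."""
    def __init__(self, backbone: nn.Module, feat_channels: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Conv2d(feat_channels, 1, kernel_size=1)  # per-pixel logit

    def forward(self, x):
        return self.head(self.backbone(x))  # train with nn.BCEWithLogitsLoss
```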
For running other 1D experiments, please use the provided Jupyter notebooks: `1d_notebooks/notebook_1d.ipynb`.
If you use this work, please cite:

```bibtex
@misc{ood_certificate,
  title={Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI},
  author={Bogdan Raonić and Siddhartha Mishra and Samuel Lanthaler},
  year={2025},
  eprint={2509.25080},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.25080},
}
```