camlab-ethz/OOD_Detection_ScientificML

Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI

This repository contains the implementation of the paper Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI.

The GenCFD and CNO models used in this paper are adapted from their original repositories, and some dataloaders are adapted from the Poseidon implementation.

The adapted code is included in this project, but the original implementations can be found in their respective repositories.

⚠️ Note: The code has only been tested on GPUs. Running on CPUs may cause issues.

📦 Requirements

Install dependencies (example):

pip install -r requirements.txt

🗂️ Datasets

  • The Wave Equation datasets can be downloaded from this link, or can be generated using the Jupyter notebook:
    utils/generate_wave_data/generate_wave.ipynb

  • The original MERRA-2 datasets can be found on the NASA website

  • The Navier–Stokes datasets can be downloaded from Poseidon on Hugging Face

  • The BraTS2020 dataset can be downloaded from BraTS2020

  • The Classification datasets are standard CIFAR-10 and MNIST.

🔹 1D Toy Model

The 1D toy model can be run in the notebook:

  • 1d_notebooks/notebook_1d_diffusion.ipynb

It consists of:

  • 1D training of a regression model
  • 1D training of a diffusion model
  • ODE-based and SDE-based sampling
  • Likelihood estimation
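
As a rough illustration of ODE-based sampling, the sketch below integrates a probability-flow ODE of the form dx/dσ = -σ ∇ log p_σ(x) with plain Euler steps. It assumes a variance-exploding parameterization and Gaussian data with standard deviation `data_std`, so the score is known in closed form. The notebook trains a denoiser instead, and `gaussian_score`, `ode_sample`, `sigma_max`, and `n_steps` are hypothetical names and values chosen only for this example.

```python
import math


def gaussian_score(x, sigma, data_std=1.0):
    """Score of N(0, data_std^2 + sigma^2), i.e. d/dx log p_sigma(x).

    Assumption: the data are Gaussian, so no learned denoiser is needed.
    """
    return -x / (data_std ** 2 + sigma ** 2)


def ode_sample(x_start, sigma_max=10.0, n_steps=1000, data_std=1.0):
    """Euler integration of dx/dsigma = -sigma * score from sigma_max to 0."""
    x = x_start
    h = sigma_max / n_steps  # uniform step in noise level
    for i in range(n_steps):
        sigma = sigma_max - i * h
        drift = -sigma * gaussian_score(x, sigma, data_std)
        x = x - h * drift  # sigma decreases by h, so the step is negative
    return x


if __name__ == "__main__":
    x_noisy = 5.0  # a point drawn at noise level sigma_max
    x_clean = ode_sample(x_noisy)
    exact = x_noisy * 1.0 / math.sqrt(1.0 + 10.0 ** 2)
    print(f"Euler: {x_clean:.4f}, closed form: {exact:.4f}")
```

For this Gaussian case the ODE has the closed-form solution x_start · data_std / sqrt(data_std² + sigma_max²), which gives a simple check on the integrator before swapping in a learned score.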

🔹 2D Regression Models

Steps to Train

  1. Set the working directory in train_regression_pl.py (default: trained_models).

  2. Configure wandb: provide your wandb account to log metrics. To train without wandb, disable logging manually in regression/GeneralModule_pl.py and GenCFD/model/lightning_wrap/pl_wrapper.py.

  3. Create a config file. Example:

    {
        "config": null,
        "device": "cpu",
        "which_model": "cno",
        "tag": "tmp",
        "loss": 1,
        "epochs": 100,
        "warmup_epochs": 0,
        "batch_size": 32,
        "peak_lr": 0.0001,
        "end_lr": 0.00001,
        "which_data": "wave",
        "in_dim": 1,
        "out_dim": 1,
        "N_train": 128,
        "ood_share": 0.0,
        "is_time": true,
        "is_masked": null,
        "max_num_time_steps": 1,
        "time_step_size": 1,
        "fix_input_to_time_step": null,
        "allowed_transitions": [0],
        "s": 128,
        "config_arch": "/configs/architectures/config_cno_very_small_att.json",
        "wandb_project_name": "your_project",
        "wandb_run_name": "_1"
    }

Variable explanations:

  • config: keep null
  • device: "cpu" or "cuda"
  • which_model: "cno", "unet", "basic_vit3", or "fno"
  • tag: string identifier for the model (important for saving)
  • loss: loss function index
  • epochs: number of epochs
  • warmup_epochs: number of warmup epochs
  • batch_size: training batch size
  • peak_lr: peak learning rate
  • end_lr: final learning rate at end of training
  • which_data: dataset/experiment (wave, ns_mix, ns_pwc, etc.)
  • in_dim, out_dim: input and output channel dimensions
  • N_train: number of training samples/trajectories
  • ood_share: OOD fraction (used in classification)
  • is_time: whether time conditioning is used
  • is_masked: keep null unless using masks
  • max_num_time_steps: max number of time steps (set to 7 for time-dependent NS)
  • time_step_size: step size in trajectory (set to 2 for NS)
  • fix_input_to_time_step: keep null unless fixed inputs are required
  • allowed_transitions: transitions allowed in all2all strategy (set to [1,2,3,4,5,6,7] for NS)
  • s: resolution
  • config_arch: path to architecture config file (examples in /configs/architectures)
  • wandb_project_name: project name for wandb logging
  • wandb_run_name: run tag for wandb
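
The Navier–Stokes settings mentioned in the list above can be applied to a wave-equation config programmatically. The following is a minimal sketch, not part of the repository: `WAVE_TIME_SETTINGS` copies only the time-related keys from the example config, and `ns_overrides` and `to_json` are hypothetical helper names. Other keys (for example `in_dim`, `out_dim`, and `config_arch`) may also need to change for a Navier–Stokes run.

```python
import json

# Time-related keys copied from the wave-equation example config above.
WAVE_TIME_SETTINGS = {
    "which_data": "wave",
    "is_time": True,
    "max_num_time_steps": 1,
    "time_step_size": 1,
    "allowed_transitions": [0],
}


def ns_overrides(config, which_data="ns_pwc"):
    """Return a copy of `config` with the NS values named in the list above."""
    updated = dict(config)  # leave the caller's dict untouched
    updated.update(
        {
            "which_data": which_data,
            "max_num_time_steps": 7,
            "time_step_size": 2,
            "allowed_transitions": [1, 2, 3, 4, 5, 6, 7],
        }
    )
    return updated


def to_json(config):
    """Serialize a config dict in the same layout as the examples."""
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    print(to_json(ns_overrides(WAVE_TIME_SETTINGS)))
```

Writing the returned string to a file gives a config that can be passed to the training script through `--config`.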

Run Training

python3 train_regression_pl.py --config=/path_to_config_file/

🔹 2D Diffusion Models

Steps to Train

Create a config file. Example:

{

    "config": null,
    "device": "cuda",
    "tag": "tmp",
    "epochs": 200,
    "warmup_epochs": 0,
    "batch_size": 40,
    "peak_lr": 0.0002,
    "end_lr": 0.00001,
    "which_data": "ns_pwc",
    "is_time": true,
    "is_masked": null,
    "max_num_time_steps": 10,
    "time_step_size": 2,
    "fix_input_to_time_step": null,
    "allowed_transitions": [1,2,3,4,5,6,7],
    "which_type": "x&y",
    "sigma": 100.0,
    "in_dim": 2,
    "out_dim": 2,
    "N_train": 1000,
    "ood_share": 0.0,
    "s": 128,
    "is_log_uniform": false,
    "log_uniform_frac": 1.0,
    "is_exploding": true,
    "ema_param": 0.999,
    "skip": true,
    "config_arch": "/configs/architectures/config_unet_base.json",
    "wandb_project_name": "your_project",
    "wandb_run_name": "_1"
}

Variable explanations (in addition to regression ones):

  • which_type: "x&y" (joint) or "x" (input only)
  • sigma: max noise level for denoiser
  • is_log_uniform: whether to use log-uniform scheme
  • log_uniform_frac: scaling factor for log-uniform scheme
  • is_exploding: whether to use exploding schedule
  • ema_param: exponential moving average decay parameter
  • skip: whether the denoiser uses skip connections
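
To make the role of `ema_param` concrete, here is a minimal sketch of a standard exponential moving average over model weights, written on plain Python lists. It is an illustration under the usual definition of EMA, not the repository's code; `ema_update` is a hypothetical helper name.

```python
def ema_update(shadow, params, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


if __name__ == "__main__":
    shadow = [0.0, 0.0]    # averaged weights, initialized at zero here
    weights = [1.0, -2.0]  # stand-in for the current model weights
    for _ in range(1000):
        shadow = ema_update(shadow, weights, decay=0.999)
    print(shadow)
```

With a decay of 0.999 the average has an effective memory of roughly 1000 updates, so values closer to 1 give smoother but slower-moving weights.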

Run Training

python3 train_diffusion_pl.py --config=/path_to_config_file/

🔹 2D Inference

To obtain estimated likelihoods (or other diffusion-based certificates), you need a config file. Example:

{
    "config_regression": "/path_to_regression_model/",
    "config_diffusion": "/path_to_diffusion_model/",
    "which_data": "ns_pwc",
    "tag_data": "3",
    "device": "cuda",
    "N_samples": 123,
    "ood_share": 0.0,
    "batch_size": 8,
    "baseline_avg_grad": null,
    "which_ckpt": null,
    "save_data": false,
    "is_diff": true,
    "is_ar": true,
    "is_time": true,
    "is_masked": null,
    "max_num_time_steps": 7,
    "time_step_size": 2,
    "fix_input_to_time_step": null,
    "allowed_transitions": [7],
    "regression_scheme": [1,1,1,1,1,1,1],
    "dt": 0.1,
    "inference_tag": "1",
    "num_gen": 0
}

Variable explanations (new ones):

  • config_regression: path to regression model config
  • config_diffusion: path to diffusion model config
  • tag_data: tag for OOD testing data
  • N_samples: number of test samples
  • baseline_avg_grad: gradient-based baseline to use; one of "time", "space", "time_space", or "grad_norm" (keep null otherwise)
  • which_ckpt: checkpoint to load (set null for default)
  • save_data: whether to save generated samples
  • is_diff: whether to run diffusion inference, or only regression
  • is_ar: whether to run autoregressive evaluation
  • regression_scheme: AR scheme used during evaluation
  • dt: autoregressive time step size
  • inference_tag: identifier for inference run
  • num_gen: number of generations (set to 0)
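
The relation between `regression_scheme` and `allowed_transitions` in the example can be read as an autoregressive rollout: seven steps of lead 1 reach the target transition 7. The sketch below encodes that reading, which is an assumption rather than something the repository states; `autoregressive_rollout`, `scheme_reaches`, and the toy `step_fn` are hypothetical names.

```python
def autoregressive_rollout(step_fn, u0, regression_scheme):
    """Apply step_fn(u, lead) once per entry of the scheme, feeding outputs back in."""
    trajectory = [u0]
    u = u0
    for lead in regression_scheme:
        u = step_fn(u, lead)  # the prediction becomes the next input
        trajectory.append(u)
    return trajectory


def scheme_reaches(regression_scheme, target_transition):
    """Check that the per-step leads add up to the requested transition."""
    return sum(regression_scheme) == target_transition


if __name__ == "__main__":
    scheme = [1, 1, 1, 1, 1, 1, 1]

    # Toy stand-in for a trained model: it only tracks the time index.
    def step_fn(time_index, lead):
        return time_index + lead

    print(scheme_reaches(scheme, 7))
    print(autoregressive_rollout(step_fn, 0, scheme))
```

Under this reading, a scheme such as [2, 2, 3] would reach the same final time with fewer model calls.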

Run Inference

python3 inference.py --config=/path_to_config_file/

🔹 Classification

For classification tasks, run the script:

python3 train_classification.py --config=/path_to_config_file/

The config file for training is very similar to the one used for regression.
An important parameter is:

  • ood_share: fraction of the OOD class used during training.
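
One way to read `ood_share` is as the fraction of available OOD-class samples that are kept for training. The sketch below implements that reading on a list of sample indices. It is an assumption about the semantics, not the repository's dataloader, and `subsample_ood` is a hypothetical helper name.

```python
import random


def subsample_ood(ood_indices, ood_share, seed=0):
    """Keep a random `ood_share` fraction of the OOD-class sample indices."""
    if not 0.0 <= ood_share <= 1.0:
        raise ValueError("ood_share must lie in [0, 1]")
    ood_indices = list(ood_indices)
    n_keep = round(ood_share * len(ood_indices))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(rng.sample(ood_indices, n_keep))


if __name__ == "__main__":
    kept = subsample_ood(range(100), ood_share=0.1)
    print(len(kept), kept)
```

Setting `ood_share` to 0.0 keeps no OOD samples, which matches the value used in the regression and diffusion example configs.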

The inference for classification tasks is implemented in the Jupyter notebook:

  • inference_classification.ipynb

Very helpful notebooks that we used in our research can be found here and here.

🔹 Segmentation

  • The segmentation training follows this script.
  • The backbone used for segmentation is the CNO model, with the final layer being a binary segmentation head.
  • The dataloader for the brain segmentation task is located in dataloader/dataloader.py.
  • The inference for segmentation is done with the predicted masks, in the same way as in regression inference.

🔹 Other 1D Experiments

For other 1D experiments, use the provided Jupyter notebook:

  • 1d_notebooks/notebook_1d.ipynb

Citation

@misc{ood_certificate,
      title={Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI}, 
      author={Bogdan Raonić and Siddhartha Mishra and Samuel Lanthaler},
      year={2025},
      eprint={2509.25080},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.25080}, 
}

About

Official implementation of the paper "Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI"
