This repository contains the implementation of the paper *Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI*.
The GenCFD and CNO models used in this paper are adapted from their original repositories, and some dataloaders are adapted from the Poseidon implementation.
While the code is also available in this project, please note that the original implementations can be found in their respective repositories.
- Python 3.8+
- PyTorch + PyTorch Lightning
- Weights & Biases (wandb) account (optional, for logging)
Install dependencies (example):

```bash
pip install -r requirements.txt
```
- The Wave Equation datasets can be downloaded from this link, or can be generated with the Jupyter notebook `utils/generate_wave_data/generate_wave.ipynb` (a generic data-generation sketch follows after this list).
- The original MERRA-2 datasets can be found on the NASA website.
- The Navier–Stokes datasets can be downloaded from Poseidon on Hugging Face.
- The BraTS2020 dataset can be downloaded from BraTS2020.
- The classification datasets are the standard CIFAR-10 and MNIST.
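For orientation, here is a generic example of wave-equation data generation (our own construction under simple assumptions, not the notebook's code): a random superposition of standing modes is an exact solution of the 1D wave equation with homogeneous Dirichlet boundary conditions.

```python
import numpy as np

def wave_solution(n_modes=8, s=128, c=1.0, t=0.5, seed=0):
    """Analytic solution of u_tt = c^2 u_xx on [0, 1] with u = 0 at the
    boundary and zero initial velocity: a random sine superposition."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_modes) / np.arange(1, n_modes + 1)  # decaying spectrum
    x = np.linspace(0.0, 1.0, s)
    k = np.arange(1, n_modes + 1)[:, None] * np.pi
    u0 = (a[:, None] * np.sin(k * x)).sum(0)                      # u(x, 0)
    u_at_t = (a[:, None] * np.sin(k * x) * np.cos(c * k * t)).sum(0)  # u(x, t)
    return u0, u_at_t
```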
The 1D toy model can be run in the notebook `1d_notebooks/notebook_1d_diffusion.ipynb`.
It consists of:
- 1D training of a regression model
- 1D training of a diffusion model
- ODE-based and SDE-based sampling
- Likelihood estimation (a sketch of the idea follows below)
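For orientation, a hedged sketch of the likelihood-estimation idea (all names here, including `drift`, are placeholders rather than the notebook's API): the log-likelihood of a sample can be estimated by integrating the probability-flow ODE together with the divergence of its drift, approximated with the Skilling–Hutchinson trace estimator.

```python
import torch

def ode_log_likelihood(x, drift, t0=1e-3, t1=1.0, n_steps=100):
    """Estimate log p(x) for x of shape (batch, ...): integrate the state
    and the divergence of the PF-ODE drift from data (t0) to the prior (t1)."""
    x = x.detach().clone()
    dt = (t1 - t0) / n_steps
    delta_logp = torch.zeros(x.shape[0], device=x.device)
    eps = torch.randn_like(x)  # fixed Gaussian probe for the trace estimate
    t = t0
    for _ in range(n_steps):
        with torch.enable_grad():
            x.requires_grad_(True)
            f = drift(x, t)  # placeholder: the model's probability-flow drift
            # div f ~= eps^T (df/dx) eps  (Skilling-Hutchinson, via a VJP)
            vjp = torch.autograd.grad(f, x, grad_outputs=eps)[0]
            div = (vjp * eps).flatten(1).sum(-1)
        x = (x + dt * f).detach()  # Euler step of the ODE
        delta_logp = delta_logp + dt * div.detach()
        t += dt
    # Prior term: standard normal at t1 (an assumption; the correct prior
    # depends on the noise schedule, e.g. N(0, sigma_max^2) when exploding).
    prior_logp = torch.distributions.Normal(0.0, 1.0).log_prob(x).flatten(1).sum(-1)
    return prior_logp + delta_logp
```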
- Set the working directory in `train_regression_pl.py` (default: `trained_models`).
- Configure wandb: provide your wandb account for logging metrics. Otherwise, disable logging manually (see `regression/GeneralModule_pl.py` and `GenCFD/model/lightning_wrap/pl_wrapper.py`).
- Create a config file. Example:
```json
{
  "config": null,
  "device": "cpu",
  "which_model": "cno",
  "tag": "tmp",
  "loss": 1,
  "epochs": 100,
  "warmup_epochs": 0,
  "batch_size": 32,
  "peak_lr": 0.0001,
  "end_lr": 0.00001,
  "which_data": "wave",
  "in_dim": 1,
  "out_dim": 1,
  "N_train": 128,
  "ood_share": 0.0,
  "is_time": true,
  "is_masked": null,
  "max_num_time_steps": 1,
  "time_step_size": 1,
  "fix_input_to_time_step": null,
  "allowed_transitions": [0],
  "s": 128,
  "config_arch": "/configs/architectures/config_cno_very_small_att.json",
  "wandb_project_name": "your_project",
  "wandb_run_name": "_1"
}
```
Variable explanations:

- `config`: keep `null`
- `device`: `"cpu"` or `"cuda"`
- `which_model`: `"cno"`, `"unet"`, `"basic_vit3"`, or `"fno"`
- `tag`: string identifier for the model (important for saving)
- `loss`: loss function index
- `epochs`: number of epochs
- `warmup_epochs`: number of warmup epochs
- `batch_size`: training batch size
- `peak_lr`: peak learning rate
- `end_lr`: final learning rate at the end of training
- `which_data`: dataset/experiment (`wave`, `ns_mix`, `ns_pwc`, etc.)
- `in_dim`, `out_dim`: input and output channel dimensions
- `N_train`: number of training samples/trajectories
- `ood_share`: OOD fraction (used in classification)
- `is_time`: whether time conditioning is used
- `is_masked`: keep `null` unless using masks
- `max_num_time_steps`: maximum number of time steps (set to 7 for time-dependent NS)
- `time_step_size`: step size in the trajectory (set to 2 for NS)
- `fix_input_to_time_step`: keep `null` unless fixed inputs are required
- `allowed_transitions`: transitions allowed in the all2all strategy (set to `[1,2,3,4,5,6,7]` for NS; see the sketch below)
- `s`: resolution
- `config_arch`: path to the architecture config file (examples in `/configs/architectures`)
- `wandb_project_name`: project name for wandb logging
- `wandb_run_name`: run tag for wandb
Run the regression training:

```bash
python3 train_regression_pl.py --config=/path_to_config_file/
```
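For intuition, here is a minimal sketch of how `max_num_time_steps`, `time_step_size`, and `allowed_transitions` might combine under an all2all pairing strategy; the helper and its exact conventions are our own illustration, not the repository's dataloader:

```python
import random

def sample_time_pair(max_num_time_steps, time_step_size, allowed_transitions):
    """Draw an (input, target) snapshot pair; the temporal gap is drawn
    from `allowed_transitions`, in units of `time_step_size`."""
    k = random.choice(allowed_transitions)            # gap, e.g. 1..7 for NS
    t_in = random.randint(0, max_num_time_steps - k)  # any valid start index
    return t_in * time_step_size, (t_in + k) * time_step_size

# NS example from the config above: gaps of 1..7 steps, stride 2
print(sample_time_pair(7, 2, [1, 2, 3, 4, 5, 6, 7]))
```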
Create a config file. Example:
```json
{
  "config": null,
  "device": "cuda",
  "tag": "tmp",
  "epochs": 200,
  "warmup_epochs": 0,
  "batch_size": 40,
  "peak_lr": 0.0002,
  "end_lr": 0.00001,
  "which_data": "ns_pwc",
  "is_time": true,
  "is_masked": null,
  "max_num_time_steps": 10,
  "time_step_size": 2,
  "fix_input_to_time_step": null,
  "allowed_transitions": [1,2,3,4,5,6,7],
  "which_type": "x&y",
  "sigma": 100.0,
  "in_dim": 2,
  "out_dim": 2,
  "N_train": 1000,
  "ood_share": 0.0,
  "s": 128,
  "is_log_uniform": false,
  "log_uniform_frac": 1.0,
  "is_exploding": true,
  "ema_param": 0.999,
  "skip": true,
  "config_arch": "/configs/architectures/config_unet_base.json",
  "wandb_project_name": "your_project",
  "wandb_run_name": "_1"
}
```
Variable explanations (in addition to regression ones):
- `which_type`: `"x&y"` (joint) or `"x"` (input only)
- `sigma`: max noise level for the denoiser
- `is_log_uniform`: whether to use the log-uniform scheme
- `log_uniform_frac`: scaling factor for the log-uniform scheme
- `is_exploding`: whether to use the exploding schedule
- `ema_param`: exponential moving average decay parameter (see the sketch below)
- `skip`: whether the denoiser uses skip connections
Run the diffusion training:

```bash
python3 train_diffusion_pl.py --config=/path_to_config_file/
```
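The `ema_param` value is the decay of a standard exponential moving average over the network weights; a minimal sketch of such an update (our own illustration, not the repository's exact code):

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, ema_param=0.999):
    """Blend the live weights into the EMA copy:
    w_ema <- ema_param * w_ema + (1 - ema_param) * w."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(ema_param).add_(p, alpha=1.0 - ema_param)
```

The EMA copy is typically updated after every optimizer step and used at evaluation time.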
To obtain estimated likelihoods (or other diffusion-based certificates), you need a config file. Example:
```json
{
  "config_regression": "/path_to_regression_model/",
  "config_diffusion": "/path_to_diffusion_model/",
  "which_data": "ns_pwc",
  "tag_data": "3",
  "device": "cuda",
  "N_samples": 123,
  "ood_share": 0.0,
  "batch_size": 8,
  "baseline_avg_grad": null,
  "which_ckpt": null,
  "save_data": false,
  "is_diff": true,
  "is_ar": true,
  "is_time": true,
  "is_masked": null,
  "max_num_time_steps": 7,
  "time_step_size": 2,
  "fix_input_to_time_step": null,
  "allowed_transitions": [7],
  "regression_scheme": [1,1,1,1,1,1,1],
  "dt": 0.1,
  "inference_tag": "1",
  "num_gen": 0
}
```
Variable explanations (new ones):
- `config_regression`: path to the regression model config
- `config_diffusion`: path to the diffusion model config
- `tag_data`: tag for the OOD testing data
- `N_samples`: number of test samples
- `baseline_avg_grad`: if using gradient-based baselines, set to `"time"`, `"space"`, `"time_space"`, or `"grad_norm"`
- `which_ckpt`: checkpoint to load (set to `null` for the default)
- `save_data`: whether to save generated samples
- `is_diff`: whether to run diffusion inference, or only regression
- `is_ar`: whether to run autoregressive evaluation (sketched below)
- `regression_scheme`: AR scheme used during evaluation
- `dt`: autoregressive time step size
- `inference_tag`: identifier for the inference run
- `num_gen`: number of generations (set to `0`)
Run inference:

```bash
python3 inference.py --config=/path_to_config_file/
```
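For intuition, a hedged sketch of what an autoregressive evaluation driven by `regression_scheme` and `dt` could look like (the `model(state, lead_time)` signature is an assumption, not the actual `inference.py` API): with `regression_scheme = [1,1,1,1,1,1,1]` and `dt = 0.1`, the model is applied seven times, each call advancing the state by one step.

```python
import torch

@torch.no_grad()
def autoregressive_rollout(model, u0, regression_scheme=(1, 1, 1, 1, 1, 1, 1), dt=0.1):
    """Feed each prediction back as the next input; every entry of
    `regression_scheme` is the lead time (in units of dt) of one call."""
    u, trajectory = u0, [u0]
    for k in regression_scheme:
        u = model(u, k * dt)  # assumed signature: model(state, lead_time)
        trajectory.append(u)
    return torch.stack(trajectory)
```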
For classification tasks, run the script:

```bash
python3 train_classification.py --config=/path_to_config_file/
```
The config file for training is very similar to the one used for regression.
An important parameter is `ood_share`: the fraction of the OOD class used during training (a sketch of this mixing follows below).
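For illustration, one way such a fraction could be mixed in (our own sketch, not the repository's dataloader):

```python
import torch
from torch.utils.data import ConcatDataset, Subset

def mix_in_ood(id_dataset, ood_dataset, ood_share=0.1, seed=0):
    """Append ood_share * len(id_dataset) OOD samples to the ID training set."""
    g = torch.Generator().manual_seed(seed)
    n_ood = int(ood_share * len(id_dataset))
    idx = torch.randperm(len(ood_dataset), generator=g)[:n_ood].tolist()
    return ConcatDataset([id_dataset, Subset(ood_dataset, idx)])
```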
The inference for classification tasks is implemented in the Jupyter notebook `inference_classification.ipynb`.
Very helpful notebooks that we used in our research can be found here and here.
- The segmentation training follows this script.
- The backbone used for segmentation is the CNO model, with a binary segmentation head as the final layer (a sketch follows after this list).
- The dataloader for the brain segmentation task is located in `dataloader/dataloader.py`.
- Inference for segmentation is run on the predicted masks, in the same way as regression inference.
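A minimal sketch of this kind of architecture (our own illustration; `backbone` and `feat_channels` are placeholders for the repository's CNO model and its output width):

```python
import torch.nn as nn

class BinarySegmentationModel(nn.Module):
    """A backbone (e.g. a CNO) followed by a 1x1-convolution binary head."""
    def __init__(self, backbone: nn.Module, feat_channels: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Conv2d(feat_channels, 1, kernel_size=1)  # per-pixel logit

    def forward(self, x):
        return self.head(self.backbone(x))  # train with nn.BCEWithLogitsLoss
```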
For running other 1D experiments, please use the provided Jupyter notebooks: `1d_notebooks/notebook_1d.ipynb`.
If you use this work, please cite:

```bibtex
@misc{ood_certificate,
  title={Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI},
  author={Bogdan Raonić and Siddhartha Mishra and Samuel Lanthaler},
  year={2025},
  eprint={2509.25080},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.25080},
}
```