This repository contains the code and experiments for the Master's thesis "Out-of-Distribution Generalization in Deep Learning-Based Bayesian Causal Discovery".
The framework provides a scalable, Hydra-configured environment for benchmarking Bayesian Causal Discovery and Meta-Learning algorithms under distributional shift. It evaluates amortized inference methods (AviCi, BCNP) against explicit Bayesian methods (DiBS, BayesDAG) to test their robustness and the utility of posterior uncertainty.
The evaluation is built around a synthetic Structural Causal Model (SCM) generator that isolates specific distributional shifts between the training simulator and test environments:
- Evaluated Models: Amortized inference (AviCi, BCNP) and explicit dataset-specific inference (DiBS, BayesDAG).
- Distributional Shifts: The benchmark evaluates out-of-distribution (OOD) generalization under isolated changes in graph topology, mechanism priors (linear, MLP), exogenous noise, problem scale (node count), and sample size.
- Metrics & Diagnostics: Graph metrics (structural Hamming distance (SHD), structural intervention distance (SID), F1, AUROC) are reported alongside likelihood proxies and marginal posterior uncertainty diagnostics to quantify structural error and robustness.
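
To illustrate what an isolated shift means here, the following is a minimal sketch and not the repository's generator. It assumes a linear-Gaussian SCM over an Erdos-Renyi DAG. `SCMConfig` and all of its field names are hypothetical, and sharing one graph between the two environments is a simplification for the example. The test configuration is derived from the training configuration by changing only the exogenous noise scale, so any performance gap can be attributed to that single factor.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SCMConfig:
    """Hypothetical simulator settings; names are illustrative, not the repo's."""
    n_nodes: int = 5
    edge_prob: float = 0.3
    noise_scale: float = 1.0
    n_samples: int = 100


def sample_dag(cfg, rng):
    """Erdos-Renyi DAG as a strictly upper-triangular adjacency matrix."""
    n = cfg.n_nodes
    return [
        [1 if j > i and rng.random() < cfg.edge_prob else 0 for j in range(n)]
        for i in range(n)
    ]


def sample_weights(adj, rng):
    """Linear mechanism: one signed weight per edge, zero elsewhere."""
    n = len(adj)
    return [
        [rng.uniform(0.5, 2.0) * rng.choice((-1, 1)) if adj[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def sample_data(weights, cfg, rng):
    """Ancestral sampling; node order 0..n-1 is already topological."""
    n = len(weights)
    data = []
    for _ in range(cfg.n_samples):
        x = [0.0] * n
        for j in range(n):
            mean = sum(weights[i][j] * x[i] for i in range(j))
            x[j] = mean + rng.gauss(0.0, cfg.noise_scale)
        data.append(x)
    return data


# Training simulator and a test environment that differs in exactly one field.
train_cfg = SCMConfig()
test_cfg = replace(train_cfg, noise_scale=3.0)

graph_rng = random.Random(0)
adj = sample_dag(train_cfg, graph_rng)
weights = sample_weights(adj, graph_rng)

train_data = sample_data(weights, train_cfg, random.Random(1))
test_data = sample_data(weights, test_cfg, random.Random(2))
```

Deriving the test configuration with `dataclasses.replace` keeps every other field identical by construction, which is the property the isolated-shift design relies on. Shifts in topology, node count, or sample size could be expressed the same way by overriding `edge_prob`, `n_nodes`, or `n_samples` instead.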
The experiments are managed using uv for reproducible environment resolution and Hydra for configuration.
# Install the main environment (AviCi, BCNP, DiBS)
uv sync --extra cluster --extra wandb --frozen --no-editable
# Bootstrap the secondary environment (BayesDAG legacy stack)
scripts/bootstrap_uv.sh

Run a small, benchmark-shaped smoke config locally to verify the pipeline:
uv run causal-meta --config-name dg_2pretrain_smoke model=avici

Full experiments and ablations are designed to run on a Slurm cluster. Submission scripts are provided in scripts/:
# Run the main benchmark sweep across all models
scripts/submit_all_models.sh main
# Run the ablation suite
scripts/submit_ablation_suite.sh