This repository contains the code for our paper accepted at the ICML 2022 Workshop on Spurious Correlations, Invariance and Stability (SCIS): Causal Discovery using Model Invariance through Knockoff Interventions, by Wasim Ahmad, Maha Shadaydeh and Joachim Denzler.
- The work can be cited using the citation information below.
@inproceedings{ahmad2022causal,
  title={Causal Discovery using Model Invariance through Knockoff Interventions},
  author={Wasim Ahmad and Maha Shadaydeh and Joachim Denzler},
  booktitle={ICML 2022: Workshop on Spurious Correlations, Invariance and Stability},
  year={2022},
  url={https://openreview.net/forum?id=OcNeMVbIdCF}
}
We discover the full causal graph in multivariate nonlinear systems by testing model invariance against Knockoffs-based interventional environments:
- First, we train a deep network $f_i$ using data from the observational environment $E_i$.
- Then we expose the model to Knockoffs-based interventional environments $E_k$.
- For each pair of variables $\{z_i, z_j\}$ in the nonlinear system, we test model invariance across environments.
- We perform a Kolmogorov-Smirnov (KS) test over the distributions $R_i$, $R_k$ of the model residuals in the different environments. Our null hypothesis is that variable $z_i$ does not cause $z_j$, $H_0$: $R_i = R_k$; otherwise the alternative hypothesis $H_1$: $R_i \neq R_k$ is accepted.
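The invariance test in the last step can be illustrated with a minimal, standard-library-only sketch. It is not the code from this repository: the function names `ks_two_sample` and `rejects_invariance`, the significance level `alpha=0.05`, and the simulated Gaussian residuals are all assumptions made for illustration, and the p-value uses the plain asymptotic Kolmogorov approximation. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import math
import random


def ks_two_sample(a, b):
    """Return the two-sample KS statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov approximation of the p-value.
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is effectively 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def rejects_invariance(residuals_obs, residuals_knockoff, alpha=0.05):
    """True if H0 (R_i = R_k, i.e. z_i does not cause z_j) is rejected."""
    _, p_value = ks_two_sample(residuals_obs, residuals_knockoff)
    return p_value < alpha


# Simulated residuals (hypothetical data, not produced by the trained models).
random.seed(0)
r_obs = [random.gauss(0.0, 1.0) for _ in range(500)]      # observational E_i
r_same = [random.gauss(0.0, 1.0) for _ in range(500)]     # knockoff, no effect
r_shifted = [random.gauss(1.0, 1.0) for _ in range(500)]  # knockoff, residuals shift

print("reject H0 (same distribution):   ", rejects_invariance(r_obs, r_same))
print("reject H0 (shifted distribution):", rejects_invariance(r_obs, r_shifted))
```

Under this reading, a rejected $H_0$ for the pair $\{z_i, z_j\}$ means the residuals of the model for $z_j$ change when $z_i$ is replaced by its knockoff, which is taken as evidence for an edge from $z_i$ to $z_j$.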
We test our method on synthetic as well as real data, both of which can be found under the datasets/ directory. The synthetic data is generated using the file src/synthetic_dataset.py.
The real data we use are the average daily discharges of rivers in the upper Danube basin, measurements of which are made available by the Bavarian Environmental Agency at https://www.gkd.bayern.de.
src/main.py is our main file, where we model multivariate non-linear data using deep networks.
- src/deepcause.py for actual and counterfactual outcome generation using interventions.
- src/functions.py for preprocessing and methods.
- src/knockoffs.py generates knockoffs of the original variables.
- src/daignostics.py to determine the goodness of the generated knockoff copies.
- src/cdmi_examples.ipynb for testing various examples for causal discovery.
- DeepKnockoffs/ contains the knockoffs generation methods.
- datasets/ contains the generated synthetic data and the real dataset.
- model/ contains the trained models that we used for the different datasets.
We also extend CDMI to discover causal graphs in non-stationary time series. To do so, we first segment the multivariate non-stationary time series based on their dynamics using the Regime Identification method and then perform regime-oriented causal discovery.
requirements.txt lists all the packages that the project depends on.
To install them, create a new conda environment (use python=3.8.15) and run
pip install -r requirements.txt
This work is funded by the Carl Zeiss Foundation within the scope of the program line "Breakthroughs: Exploring Intelligent Systems" for "Digitization — explore the basics, use applications" and the DFG grant SH 1682/1-1.