Adversarial Missingness Attacks on Causal Structure Learning

Published: 06 December 2024

Abstract

Causality-informed machine learning has been proposed as an avenue for achieving many of the goals of modern machine learning, from ensuring generalization under domain shifts to attaining fairness, robustness, and interpretability. A key component of causal machine learning is the inference of causal structures from observational data; in practice, this data may be incompletely observed. Prior work has demonstrated that adversarial perturbations of completely observed training data may be used to force the learning of inaccurate structural causal models (SCMs). However, when the data can be audited for correctness (e.g., it is cryptographically signed by its source), this adversarial mechanism is invalidated. This work introduces a novel attack methodology wherein the adversary deceptively omits a portion of the true training data to bias the learned causal structures in a desired manner (under strong signed sample input validation, this behavior seems to be the only strategy available to the adversary). Under this model, theoretically sound attack mechanisms are derived for the case of arbitrary SCMs, and a sample-efficient learning-based heuristic is given. Experimental validation of these approaches on real and synthetic datasets, across a range of SCMs from the family of additive noise models (linear Gaussian, linear non-Gaussian, and non-linear Gaussian), demonstrates the effectiveness of adversarial missingness attacks at deceiving popular causal structure learning algorithms.
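To make the attack model concrete, the following is a minimal, hypothetical sketch in Python/NumPy; the three-variable linear Gaussian SCM, the threshold-based deletion rule, and the partial-correlation check are illustrative assumptions, not the attack construction from the paper. It shows how deceptively omitting a value-dependent subset of otherwise genuine samples can make two variables that are conditionally independent under the true SCM appear conditionally dependent to a structure learner that relies on such independences.

# Illustrative sketch only (not the paper's attack): value-dependent deletion
# of true samples induces a spurious conditional dependence between X2 and X3.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# True linear Gaussian SCM: X1 -> X2 and X1 -> X3, so X2 and X3 are
# conditionally independent given X1 and the true graph has no X2--X3 edge.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x1 + rng.normal(size=n)

def partial_corr(a, b, given):
    # Linear partial correlation: correlate the residuals of a and b
    # after regressing each on an intercept plus `given`.
    g = np.column_stack([np.ones_like(given), given])
    res_a = a - g @ np.linalg.lstsq(g, a, rcond=None)[0]
    res_b = b - g @ np.linalg.lstsq(g, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

print("full data:    rho(X2, X3 | X1) =", round(partial_corr(x2, x3, x1), 3))

# Adversarial missingness (illustrative rule): keep only rows where X2 + X3
# exceeds a threshold, so the deletion pattern depends on the data itself.
# Truncating on a function of X2 and X3 makes them dependent given X1, and a
# conditional-independence-based learner now sees evidence for an X2--X3 edge.
keep = (x2 + x3) > 1.0
print("kept fraction:", round(keep.mean(), 3))
print("after attack: rho(X2, X3 | X1) =",
      round(partial_corr(x2[keep], x3[keep], x1[keep]), 3))

In this sketch the full-data partial correlation is near zero while the post-deletion value is noticeably nonzero, which is the qualitative effect the attack exploits; the mechanisms derived in the paper go further by choosing which samples to omit so that the learned structure is biased toward a structure of the adversary's choosing.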

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 6
      December 2024, 727 pages
      EISSN: 2157-6912
      DOI: 10.1145/3613712
      Editor: Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 December 2024
      Online AM: 27 August 2024
      Accepted: 04 July 2024
      Revised: 08 May 2024
      Received: 20 October 2023
      Published in TIST Volume 15, Issue 6

      Author Tags

      1. Causal ML
      2. Causal Structure Learning
      3. Missing Data
      4. Adversarial ML
      5. Data Poisoning

      Qualifiers

      • Research-article
