Adversarial Missingness Attacks on Causal Structure Learning

Published: 06 December 2024

Abstract

Causality-informed machine learning has been proposed as an avenue for achieving many of the goals of modern machine learning, from ensuring generalization under domain shifts to attaining fairness, robustness, and interpretability. A key component of causal machine learning is the inference of causal structures from observational data; in practice, this data may be incompletely observed. Prior work has demonstrated that adversarial perturbations of completely observed training data may be used to force the learning of inaccurate structural causal models (SCMs). However, when the data can be audited for correctness (e.g., it is cryptographically signed by its source), this adversarial mechanism is invalidated. This work introduces a novel attack methodology wherein the adversary deceptively omits a portion of the true training data to bias the learned causal structures in a desired manner (under strong signed sample input validation, this behavior seems to be the only strategy available to the adversary). Under this model, theoretically sound attack mechanisms are derived for the case of arbitrary SCMs, and a sample-efficient learning-based heuristic is given. Experimental validation of these approaches on real and synthetic datasets, across a range of SCMs from the family of additive noise models (linear Gaussian, linear non-Gaussian, and non-linear Gaussian), demonstrates the effectiveness of adversarial missingness attacks at deceiving popular causal structure learning algorithms.
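To make the attack model concrete, the following is a minimal, hypothetical sketch in Python/NumPy; the three-variable linear Gaussian SCM, the threshold-based deletion rule, and the partial-correlation check are illustrative assumptions, not the attack construction from the paper. It shows how deceptively omitting a value-dependent subset of otherwise genuine samples can make two variables that are conditionally independent under the true SCM appear conditionally dependent to a structure learner that relies on such independences.

# Illustrative sketch only (not the paper's attack): value-dependent deletion
# of true samples induces a spurious conditional dependence between X2 and X3.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# True linear Gaussian SCM: X1 -> X2 and X1 -> X3, so X2 and X3 are
# conditionally independent given X1 and the true graph has no X2--X3 edge.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x1 + rng.normal(size=n)

def partial_corr(a, b, given):
    # Linear partial correlation: correlate the residuals of a and b
    # after regressing each on an intercept plus `given`.
    g = np.column_stack([np.ones_like(given), given])
    res_a = a - g @ np.linalg.lstsq(g, a, rcond=None)[0]
    res_b = b - g @ np.linalg.lstsq(g, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

print("full data:    rho(X2, X3 | X1) =", round(partial_corr(x2, x3, x1), 3))

# Adversarial missingness (illustrative rule): keep only rows where X2 + X3
# exceeds a threshold, so the deletion pattern depends on the data itself.
# Truncating on a function of X2 and X3 makes them dependent given X1, and a
# conditional-independence-based learner now sees evidence for an X2--X3 edge.
keep = (x2 + x3) > 1.0
print("kept fraction:", round(keep.mean(), 3))
print("after attack: rho(X2, X3 | X1) =",
      round(partial_corr(x2[keep], x3[keep], x1[keep]), 3))

In this sketch the full-data partial correlation is near zero while the post-deletion value is noticeably nonzero, which is the qualitative effect the attack exploits; the mechanisms derived in the paper go further by choosing which samples to omit so that the learned structure is biased toward a structure of the adversary's choosing.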

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 6
      December 2024, 727 pages
      EISSN: 2157-6912
      DOI: 10.1145/3613712
      Editor: Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 December 2024
      Online AM: 27 August 2024
      Accepted: 04 July 2024
      Revised: 08 May 2024
      Received: 20 October 2023
      Published in TIST Volume 15, Issue 6

      Author Tags

      1. Causal ML
      2. Causal Structure Learning
      3. Missing Data
      4. Adversarial ML
      5. Data Poisoning

      Qualifiers

      • Research-article
