article

Free access

Using Markov Blankets for Causal Structure Learning

Authors:

Jean-Philippe Pellet,

André ElisseeffAuthors Info & Claims

The Journal of Machine Learning Research, Volume 9

Pages 1295 - 1342

Published: 01 June 2008 Publication History

Abstract

We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children's parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TC_bw, with an explicit backward feature-selection heuristics, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TC_bw or Grow-Shrink scale better than the reference PC algorithm and provides higher structural accuracy.

References

[1]

B. Abramson, J. Brown, A. Murphy, and R. L. Winkler. Hailfinder: A Bayesian system for forecasting severe weather. International Journal of Forecasting, 12:57-71, 1996.

[2]

C. F. Aliferis, I. Tsamardinos, and A. Statnikov. HITON, a novel Markov blanket algorithm for optimal variable selection. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, pages 21-25, 2003.

[3]

S. Andreassen, R. Hovorka, J. Benn, Kristian G. Olesen, and E. R. Carson. A model-based approach to insulin adjustment. In Proc. of the Third Conf. on AI in Medicine, pages 239-248. Springer-Verlag, 1991.

[4]

K. Baba, R. Shibata, and M. Sibuya. Partial correlation and conditional correlation as measures of conditional independence. Australian & New Zealand Journal of Statistics, 46(4), 2004.

[5]

F. R. Bach and M. I. Jordan. Learning graphical models with Mercer kernels. In Advances in Neural Information Processing Systems 15, 2003.

[6]

I. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proc. of the Second European Conf. on AI in Medicine, pages 247-256, 1989.

[7]

J. Binder, D. Koller, S. Russell, and K. Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29, 1997.

Digital Library

[8]

D. M. Chickering. Optimal structure identification with greedy search. The Journal of Machine Learning Research, 3:507-554, 2002.

Digital Library

[9]

G. Elidan. Bayes net repository. Website, 2001. URL http://compbio.cs.huji.ac.il/Repository/.

[10]

N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks to analyze expression data. In RECOMB, pages 127-135, 2000.

Digital Library

[11]

L. D. Fu. A comparison of state-of-the-art algorithms for learning Bayesian network structures from continuous data. Master's thesis, Vanderbilt University, 2005.

[12]

I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182, 2003.

Digital Library

[13]

I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389-422, 2002.

Digital Library

[14]

I. Guyon, C. Aliferis, and A. Elisseeff. Causal feature selection. In H. Liu and H. Motoda, editors, Computational Methods of Feature Selection. Chapman and Hall/CRC Press, 2007.

[15]

D. Hardin, I. Tsamardinos, and C. F. Aliferis. A theoretical characterization of linear SVM-based feature selection. In Proceedings of the Twenty First International Conference on Machine Learning , 2004.

Digital Library

[16]

D. M. Hausman and J. Woodward. Independence, invariance and the causal Markov condition. British Journal for the Philosophy of Sciencev, 50:521-583, 1999.

[17]

G. H. John, R. Kohavi, and K. Pfleger. Irrelevant feature and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning, pages 121-129, 1994.

Digital Library

[18]

G. G. Judge, R. Carter Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee. Introduction to the Theory and Practice of Econometrics, 2nd Edition. Wiley, 1988.

[19]

R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97:273- 324, 1997.

Digital Library

[20]

S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157- 224, 1988.

[21]

P. Leray and O. François. BNT structure learning package, 2004. URL banquiseasi.insa-rouen.fr/projects/bnt-slp/.

[22]

D. Margaritis. Distribution-free learning of Bayesian network structure in continuous domains. In Proc. of the 20th National Conf. on AI, 2005.

Digital Library

[23]

D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12, 1999.

[24]

C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 1995.

Digital Library

[25]

R. Nilsson, J. M. Peña, J. Björkegren, and J. Tegnér. Consistent feature selection for pattern recognition in polynomial time. The Journal of Machine Learning Research, 8:589-612, 2007.

Digital Library

[26]

J. M. Peña, J. Björkegren, and J. Tegnér. Scalable, efficient and correct learning of Markov boundaries under the faithfulness assumption. In Proceedings of the Eighth European Conference on Symbolic and Quantitative Approaches to Reasoning under Uncertainty, pages 136-147, 2005.

Digital Library

[27]

J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

Digital Library

[28]

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Los Altos, 1988.

Digital Library

[29]

J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669-709, 1995.

[30]

J. Pearl and T. Verma. A theory of inferred causation. In Proc. of the Second Int. Conf. on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann, 1991.

Digital Library

[31]

J.-P. Pellet and A. Elisseeff. A partial correlation-based algorithm for causal structure discovery with continuous variables. In 7th International Symposium on Intelligent Data Analysis, 2007.

Digital Library

[32]

A. Raveh. On the use of the inverse of the correlation matrix in multivariate data analysis. The American Statistician, 39:39-42, 1985.

[33]

R. Scheines, P. Spirtes, C. Glymour, C. Meek, and T. Richardson. The TETRAD project: Constraint based aids to causal model specification. Technical report, Carnegie Mellon University, Dpt. of Philosophy, 1995.

[34]

A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Technical report, Neuro- COLT2, 1998.

[35]

P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search, Second Edition. The MIT Press, 2001. ISBN 0262194406.

[36]

A. Statnikov, D. Hardin, and C. F. Aliferis. Using SVM weight-based methods to identify causally relevant and non-causally relevant variables. Technical report, Vanderbilt University, USA, 2006.

[37]

D. Steel. Homogeneity, selection, and the faithfulness condition. Technical report, Michigan State University, Department of Philosophy, 2005.

[38]

M. Talih. Markov Random Fields on Time-Varying Graphs, with an Application to Portfolio Selection . PhD thesis, Hunter College, 2003.

[39]

R. Tibshirani. Regression shrinkage and selection via the lasso. Technical report, University of Toronto, 1994.

[40]

I. Tsamardinos and C. Aliferis. Towards principled feature selection: Relevancy. Artificial Intelligence and Statistics, 2003.

[41]

I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Time and sample efficient discovery of Markov blankets and direct causal relations. In ACM Press, editor, Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 673-678, 2003.

Digital Library

[42]

I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 2006.

Digital Library

Cited By

Wu XJiang BWu TChen HWilliams BChen YNeville J(2023)Practical Markov boundary learning without strong assumptionsProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i9.26236(10388-10398)Online publication date: 7-Feb-2023
https://dl.acm.org/doi/10.1609/aaai.v37i9.26236
Mokhtarian EKhorasani MEtesami JKiyavash NWilliams BChen YNeville J(2023)Novel ordering-based approaches for causal structure learning in the presence of unobserved variablesProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i10.26445(12260-12268)Online publication date: 7-Feb-2023
https://dl.acm.org/doi/10.1609/aaai.v37i10.26445
Yu KLing ZLiu LLi PWang HLi J(2023)Feature Selection for Efficient Local-to-global Bayesian Network Structure LearningACM Transactions on Knowledge Discovery from Data10.1145/362447918:2(1-27)Online publication date: 19-Sep-2023
https://dl.acm.org/doi/10.1145/3624479
Show More Cited By

Using Markov Blankets for Causal Structure Learning
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory

Recommendations

Towards Practical Federated Causal Structure Learning
Machine Learning and Knowledge Discovery in Databases: Research Track
Abstract
Understanding causal relations is vital in scientific discovery. The process of causal structure learning involves identifying causal graphs from observational data to understand such relations. Usually, a central server performs this task, but ...
Open problems in causal structure learning: A case study of COVID-19 in the UK
Abstract
Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which is necessary ...
Causal discovery with hidden confounders using the algorithmic Markov condition
UAI '23: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence

Causal sufficiency is a cornerstone assumption in causal discovery. It is, however, both unlikely to hold in practice as well as unverifiable. When it does not hold, existing methods struggle to return meaningful results. In this paper, we show how to ...

Comments

Information & Contributors

Information

Published In

The Journal of Machine Learning Research Volume 9, Issue

6/1/2008

1964 pages

ISSN:1532-4435

EISSN:1533-7928

Issue’s Table of Contents

Publisher

JMLR.org

Publication History

Published: 01 June 2008

Published in JMLR Volume 9

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

39
Total Citations
View Citations
617
Total Downloads

Downloads (Last 12 months)71
Downloads (Last 6 weeks)9

Reflects downloads up to 23 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wu XJiang BWu TChen HWilliams BChen YNeville J(2023)Practical Markov boundary learning without strong assumptionsProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i9.26236(10388-10398)Online publication date: 7-Feb-2023
https://dl.acm.org/doi/10.1609/aaai.v37i9.26236
Mokhtarian EKhorasani MEtesami JKiyavash NWilliams BChen YNeville J(2023)Novel ordering-based approaches for causal structure learning in the presence of unobserved variablesProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i10.26445(12260-12268)Online publication date: 7-Feb-2023
https://dl.acm.org/doi/10.1609/aaai.v37i10.26445
Yu KLing ZLiu LLi PWang HLi J(2023)Feature Selection for Efficient Local-to-global Bayesian Network Structure LearningACM Transactions on Knowledge Discovery from Data10.1145/362447918:2(1-27)Online publication date: 19-Sep-2023
https://dl.acm.org/doi/10.1145/3624479
Wu XJiang BZhong YChen H(2023)Multi-Target Markov Boundary Discovery: Theory, Algorithm, and ApplicationIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2022.319978445:4(4964-4980)Online publication date: 1-Apr-2023
https://dl.acm.org/doi/10.1109/TPAMI.2022.3199784
Dustdar SPujol VDonta P(2023)On Distributed Computing Continuum SystemsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.314285635:4(4092-4105)Online publication date: 1-Apr-2023
https://dl.acm.org/doi/10.1109/TKDE.2022.3142856
Yang JXie KAn NZhang ARangwala H(2022)Causal Discovery on Non-Euclidean DataProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539485(2202-2211)Online publication date: 14-Aug-2022
https://dl.acm.org/doi/10.1145/3534678.3539485
Wu XJiang BZhong YChen Hd'Aquin MDietze SHauff CCurry ECudre Mauroux P(2020)Tolerant Markov Boundary Discovery for Feature SelectionProceedings of the 29th ACM International Conference on Information & Knowledge Management10.1145/3340531.3415927(2261-2264)Online publication date: 19-Oct-2020
https://dl.acm.org/doi/10.1145/3340531.3415927
Zhang WFang WSun JChen Q(2020)Learning Bayesian Networks Structures with an Effective Knowledge-driven GA2020 IEEE Congress on Evolutionary Computation (CEC)10.1109/CEC48606.2020.9185884(1-8)Online publication date: 19-Jul-2020
https://dl.acm.org/doi/10.1109/CEC48606.2020.9185884
Xu SJia BLiang F(2019)Learning moral graphs in construction of high-dimensional bayesian networks for mixed dataNeural Computation10.1162/neco_a_0119031:6(1183-1214)Online publication date: 1-Jun-2019
https://dl.acm.org/doi/10.1162/neco_a_01190
de Carvalho WZarate LHung CPapadopoulos G(2019)Causality relationship among attributes applied in an educational data setProceedings of the 34th ACM/SIGAPP Symposium on Applied Computing10.1145/3297280.3297406(1271-1277)Online publication date: 8-Apr-2019
https://dl.acm.org/doi/10.1145/3297280.3297406
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents