skip to main content
article
Free access

Using Markov Blankets for Causal Structure Learning

Published: 01 June 2008 Publication History

Abstract

We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children's parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw, with an explicit backward feature-selection heuristics, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provides higher structural accuracy.

References

[1]
B. Abramson, J. Brown, A. Murphy, and R. L. Winkler. Hailfinder: A Bayesian system for forecasting severe weather. International Journal of Forecasting, 12:57-71, 1996.
[2]
C. F. Aliferis, I. Tsamardinos, and A. Statnikov. HITON, a novel Markov blanket algorithm for optimal variable selection. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, pages 21-25, 2003.
[3]
S. Andreassen, R. Hovorka, J. Benn, Kristian G. Olesen, and E. R. Carson. A model-based approach to insulin adjustment. In Proc. of the Third Conf. on AI in Medicine, pages 239-248. Springer-Verlag, 1991.
[4]
K. Baba, R. Shibata, and M. Sibuya. Partial correlation and conditional correlation as measures of conditional independence. Australian & New Zealand Journal of Statistics, 46(4), 2004.
[5]
F. R. Bach and M. I. Jordan. Learning graphical models with Mercer kernels. In Advances in Neural Information Processing Systems 15, 2003.
[6]
I. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proc. of the Second European Conf. on AI in Medicine, pages 247-256, 1989.
[7]
J. Binder, D. Koller, S. Russell, and K. Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29, 1997.
[8]
D. M. Chickering. Optimal structure identification with greedy search. The Journal of Machine Learning Research, 3:507-554, 2002.
[9]
G. Elidan. Bayes net repository. Website, 2001. URL http://compbio.cs.huji.ac.il/Repository/.
[10]
N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks to analyze expression data. In RECOMB, pages 127-135, 2000.
[11]
L. D. Fu. A comparison of state-of-the-art algorithms for learning Bayesian network structures from continuous data. Master's thesis, Vanderbilt University, 2005.
[12]
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182, 2003.
[13]
I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389-422, 2002.
[14]
I. Guyon, C. Aliferis, and A. Elisseeff. Causal feature selection. In H. Liu and H. Motoda, editors, Computational Methods of Feature Selection. Chapman and Hall/CRC Press, 2007.
[15]
D. Hardin, I. Tsamardinos, and C. F. Aliferis. A theoretical characterization of linear SVM-based feature selection. In Proceedings of the Twenty First International Conference on Machine Learning , 2004.
[16]
D. M. Hausman and J. Woodward. Independence, invariance and the causal Markov condition. British Journal for the Philosophy of Sciencev, 50:521-583, 1999.
[17]
G. H. John, R. Kohavi, and K. Pfleger. Irrelevant feature and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning, pages 121-129, 1994.
[18]
G. G. Judge, R. Carter Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee. Introduction to the Theory and Practice of Econometrics, 2nd Edition. Wiley, 1988.
[19]
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97:273- 324, 1997.
[20]
S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157- 224, 1988.
[21]
P. Leray and O. François. BNT structure learning package, 2004. URL banquiseasi.insa-rouen.fr/projects/bnt-slp/.
[22]
D. Margaritis. Distribution-free learning of Bayesian network structure in continuous domains. In Proc. of the 20th National Conf. on AI, 2005.
[23]
D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12, 1999.
[24]
C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 1995.
[25]
R. Nilsson, J. M. Peña, J. Björkegren, and J. Tegnér. Consistent feature selection for pattern recognition in polynomial time. The Journal of Machine Learning Research, 8:589-612, 2007.
[26]
J. M. Peña, J. Björkegren, and J. Tegnér. Scalable, efficient and correct learning of Markov boundaries under the faithfulness assumption. In Proceedings of the Eighth European Conference on Symbolic and Quantitative Approaches to Reasoning under Uncertainty, pages 136-147, 2005.
[27]
J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
[28]
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Los Altos, 1988.
[29]
J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669-709, 1995.
[30]
J. Pearl and T. Verma. A theory of inferred causation. In Proc. of the Second Int. Conf. on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann, 1991.
[31]
J.-P. Pellet and A. Elisseeff. A partial correlation-based algorithm for causal structure discovery with continuous variables. In 7th International Symposium on Intelligent Data Analysis, 2007.
[32]
A. Raveh. On the use of the inverse of the correlation matrix in multivariate data analysis. The American Statistician, 39:39-42, 1985.
[33]
R. Scheines, P. Spirtes, C. Glymour, C. Meek, and T. Richardson. The TETRAD project: Constraint based aids to causal model specification. Technical report, Carnegie Mellon University, Dpt. of Philosophy, 1995.
[34]
A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Technical report, Neuro- COLT2, 1998.
[35]
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search, Second Edition. The MIT Press, 2001. ISBN 0262194406.
[36]
A. Statnikov, D. Hardin, and C. F. Aliferis. Using SVM weight-based methods to identify causally relevant and non-causally relevant variables. Technical report, Vanderbilt University, USA, 2006.
[37]
D. Steel. Homogeneity, selection, and the faithfulness condition. Technical report, Michigan State University, Department of Philosophy, 2005.
[38]
M. Talih. Markov Random Fields on Time-Varying Graphs, with an Application to Portfolio Selection . PhD thesis, Hunter College, 2003.
[39]
R. Tibshirani. Regression shrinkage and selection via the lasso. Technical report, University of Toronto, 1994.
[40]
I. Tsamardinos and C. Aliferis. Towards principled feature selection: Relevancy. Artificial Intelligence and Statistics, 2003.
[41]
I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Time and sample efficient discovery of Markov blankets and direct causal relations. In ACM Press, editor, Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 673-678, 2003.
[42]
I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 2006.

Cited By

View all
  • (2023)Practical Markov boundary learning without strong assumptionsProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i9.26236(10388-10398)Online publication date: 7-Feb-2023
  • (2023)Novel ordering-based approaches for causal structure learning in the presence of unobserved variablesProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i10.26445(12260-12268)Online publication date: 7-Feb-2023
  • (2023)Feature Selection for Efficient Local-to-global Bayesian Network Structure LearningACM Transactions on Knowledge Discovery from Data10.1145/362447918:2(1-27)Online publication date: 19-Sep-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

The Journal of Machine Learning Research  Volume 9, Issue
6/1/2008
1964 pages
ISSN:1532-4435
EISSN:1533-7928
Issue’s Table of Contents

Publisher

JMLR.org

Publication History

Published: 01 June 2008
Published in JMLR Volume 9

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)71
  • Downloads (Last 6 weeks)9
Reflects downloads up to 23 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Practical Markov boundary learning without strong assumptionsProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i9.26236(10388-10398)Online publication date: 7-Feb-2023
  • (2023)Novel ordering-based approaches for causal structure learning in the presence of unobserved variablesProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i10.26445(12260-12268)Online publication date: 7-Feb-2023
  • (2023)Feature Selection for Efficient Local-to-global Bayesian Network Structure LearningACM Transactions on Knowledge Discovery from Data10.1145/362447918:2(1-27)Online publication date: 19-Sep-2023
  • (2023)Multi-Target Markov Boundary Discovery: Theory, Algorithm, and ApplicationIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2022.319978445:4(4964-4980)Online publication date: 1-Apr-2023
  • (2023)On Distributed Computing Continuum SystemsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.314285635:4(4092-4105)Online publication date: 1-Apr-2023
  • (2022)Causal Discovery on Non-Euclidean DataProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539485(2202-2211)Online publication date: 14-Aug-2022
  • (2020)Tolerant Markov Boundary Discovery for Feature SelectionProceedings of the 29th ACM International Conference on Information & Knowledge Management10.1145/3340531.3415927(2261-2264)Online publication date: 19-Oct-2020
  • (2020)Learning Bayesian Networks Structures with an Effective Knowledge-driven GA2020 IEEE Congress on Evolutionary Computation (CEC)10.1109/CEC48606.2020.9185884(1-8)Online publication date: 19-Jul-2020
  • (2019)Learning moral graphs in construction of high-dimensional bayesian networks for mixed dataNeural Computation10.1162/neco_a_0119031:6(1183-1214)Online publication date: 1-Jun-2019
  • (2019)Causality relationship among attributes applied in an educational data setProceedings of the 34th ACM/SIGAPP Symposium on Applied Computing10.1145/3297280.3297406(1271-1277)Online publication date: 8-Apr-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media