Learning graphical models from data

What is independent of what?



Learning the independence graph structure of a graphical model from data: a particular sparse model-selection problem in which the model is hierarchical.

Learning these models turns out to require conditional independence tests, an awareness of multiple-testing issues, and some graph theory.
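To make the first ingredient concrete, here is a minimal sketch (my own, not taken from any of the packages below) of the classic Gaussian conditional-independence test: read the partial correlation off the inverse correlation matrix, then apply the Fisher z-transform. This is the basic building block of constraint-based structure-learning algorithms such as PC.

```python
import numpy as np
from scipy import stats

def gaussian_ci_test(data, i, j, cond=()):
    """p-value for H0: X_i independent of X_j given X_cond,
    assuming joint Gaussianity. The partial correlation is read
    off the inverse correlation matrix of the relevant columns,
    then Fisher's z-transform gives an asymptotic normal test."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    return 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(0)
u = rng.normal(size=2000)               # common cause
x = u + 0.3 * rng.normal(size=2000)
y = u + 0.3 * rng.normal(size=2000)
data = np.column_stack([x, y, u])

p_marginal = gaussian_ci_test(data, 0, 1)       # tiny: x, y dependent
p_given_u = gaussian_ci_test(data, 0, 1, (2,))  # x, y independent given u
```

Running thousands of such tests while building a graph is exactly where the multiple-testing awareness comes in.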

  • skggm (python) does the Gaussian thing, but also offers nice sparsification options and a good explanation.
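skggm follows the scikit-learn estimator API; for a taste of what "the Gaussian thing" means, scikit-learn's built-in GraphicalLassoCV already does the basic graphical-lasso fit. A sketch on simulated chain-graph data (my own toy setup, not from skggm's docs):

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)

# Ground truth: a chain 0 - 1 - 2 - 3, i.e. a tridiagonal precision matrix.
theta = np.eye(4)
for k in range(3):
    theta[k, k + 1] = theta[k + 1, k] = -0.4

X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(theta), size=2000)

# Cross-validated graphical lasso: sparse estimate of the precision matrix.
P = GraphicalLassoCV().fit(X).precision_

# The undirected graph is the sparsity pattern of P: chain edges such as
# (0, 1) should carry much more weight than non-edges such as (0, 2).
edges = np.abs(P) > 0.05
```

The point of the exercise: zeros in the precision matrix are exactly the missing edges of the Gaussian independence graph, so sparse precision estimation *is* structure learning in this model class.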

Xun Zheng, Bryon Aragam and Chen Dan's blog post Learning DAGs with Continuous Optimization describes an exciting piece of work, AFAICT. Download the code from https://github.com/xunzheng/notears, and read the papers (Zheng et al. 2018; Zheng et al. 2020).

Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.

The key insight is:

The intuition behind this function is that the k-th power of the adjacency matrix of a graph counts the number of k-step paths from one node to another. In other words, if the diagonal of the matrix power turns out to be all zeros, there are no k-step cycles in the graph. Then to characterize acyclicity, we just need to set this constraint for all k = 1, 2, …, d, eliminating cycles of all possible lengths.
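That characterisation is easy to check numerically. The papers collapse the all-k power constraints into the single smooth function h(W) = tr(e^(W∘W)) − d, which is non-negative and zero exactly when W is acyclic, since the matrix exponential sums the weighted k-cycle counts for every k at once. A quick sketch, independent of the notears codebase:

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """NOTEARS acyclicity function h(W) = tr(exp(W ∘ W)) - d,
    where ∘ is the elementwise (Hadamard) product.
    Non-negative; zero iff the weighted digraph W has no directed cycles."""
    return np.trace(expm(W * W)) - W.shape[0]

# A DAG on three nodes: 0 -> 1 -> 2.
W_dag = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])

# Close the loop with 2 -> 0 to create a 3-cycle.
W_cyc = W_dag.copy()
W_cyc[2, 0] = 1.0

h(W_dag)  # 0: no cycles of any length
h(W_cyc)  # > 0: the 3-cycle shows up on the diagonals of the matrix powers
```

Because h is smooth in W, "is the graph acyclic?" becomes an equality constraint h(W) = 0 that a standard augmented-Lagrangian solver can handle, which is the whole trick.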

  • bnlearn learns belief networks

  • sparsebn:

    A new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:

    • Experimental data with interventions
    • Mixed observational / experimental data
    • High-dimensional data with p >> n
    • Datasets with thousands of variables (tested up to p=8000)
    • Continuous and discrete data

    The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. […] For more details on this package, including worked examples and the methodological background, please see our new preprint.

    Overview

    The main methods for learning graphical models are:

    • estimate.dag for directed acyclic graphs (Bayesian networks).
    • estimate.precision for undirected graphs (Markov random fields).
    • estimate.covariance for covariance matrices.

    Currently, estimation of precision and covariance matrices is limited to Gaussian data.

TETRAD

TETRAD (source, tutorial) is a tool for discovering, visualising, and estimating giant empirical DAGs, including general graphical inference and causality. It’s written by eminent causal-inference people.

Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is freeware that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform. …

The Tetrad programs describe causal models in three distinct parts or stages: a picture, representing a directed graph specifying hypothetical causal relations among the variables; a specification of the family of probability distributions and kinds of parameters associated with the graphical model; and a specification of the numerical values of those parameters.

misc

References

Azadkia, Mona, and Sourav Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat], December.
Bayati, M., and A. Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory 58 (4): 1997–2017.
Besserve, Michel, Arash Mehrjou, Rémy Sun, and Bernhard Schölkopf. 2019. “Counterfactuals Uncover the Modular Structure of Deep Generative Models.” In arXiv:1812.03253 [Cs, Stat].
Bühlmann, Peter, and Sara van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory and Applications. 2011 edition. Heidelberg; New York: Springer.
Buntine, W.L. 1996. “A Guide to the Literature on Learning Probabilistic Networks from Data.” IEEE Transactions on Knowledge and Data Engineering 8 (2): 195–210.
Cai, T. Tony. 2017. “Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures.” Annual Review of Statistics and Its Application 4 (1): 423–46.
Colombo, Diego, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. 2012. “Learning High-Dimensional Directed Acyclic Graphs with Latent and Selection Variables.” The Annals of Statistics 40 (1): 294–321.
Cox, D. R., and H. S. Battey. 2017. “Large Numbers of Explanatory Variables, a Semi-Descriptive Analysis.” Proceedings of the National Academy of Sciences 114 (32): 8592–95.
Dezfouli, Amir, Edwin V Bonilla, and Richard Nock. 2018. “Variational Network Inference: Strong and Stable with Concrete Support.” In, 10.
Drton, Mathias, and Marloes H. Maathuis. 2017. “Structure Learning in Graphical Modeling.” Annual Review of Statistics and Its Application 4 (1): 365–93.
Foygel, Rina, and Mathias Drton. 2010. “Extended Bayesian Information Criteria for Gaussian Graphical Models.” In Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 604–12. Curran Associates, Inc.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.
Fu, Fei, and Qing Zhou. 2013. “Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent.” Journal of the American Statistical Association 108 (501): 288–300.
Gao, Ming, Yi Ding, and Bryon Aragam. 2020. “A Polynomial-Time Algorithm for Learning Nonparametric Causal Graphs.” arXiv:2006.11970 [Cs, Math, Stat], November.
Geer, Sara van de. 2014. “Worst Possible Sub-Directions in High-Dimensional Models.” In arXiv:1403.7023 [Math, Stat]. Vol. 131.
Geng, Zhi, Yue Liu, Chunchen Liu, and Wang Miao. 2019. “Evaluation of Causal Effects and Local Structure Learning of Causal Networks.” Annual Review of Statistics and Its Application 6 (1): 103–24.
Gnecco, Nicola, Nicolai Meinshausen, Jonas Peters, and Sebastian Engelke. 2021. “Causal Discovery in Heavy-Tailed Models.” The Annals of Statistics 49 (3): 1755–78.
Gogate, Vibhav, William Webb, and Pedro Domingos. 2010. “Learning Efficient Markov Networks.” In Advances in Neural Information Processing Systems, 748–56.
Gu, Jiaying, and Qing Zhou. 2020. “Learning Big Gaussian Bayesian Networks: Partition, Estimation and Fusion.” Journal of Machine Learning Research 21 (158): 1–31.
Hallac, David, Jure Leskovec, and Stephen Boyd. 2015. “Network Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat], July.
Harris, Naftali, and Mathias Drton. 2013. “PC Algorithm for Nonparanormal Graphical Models.” Journal of Machine Learning Research 14 (1): 3365–83.
Hinton, Geoffrey E., Simon Osindero, and Kejie Bao. 2005. “Learning Causally Linked Markov Random Fields.” In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 128–35. Citeseer.
Huang, Biwei, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. 2020. “Causal Discovery from Heterogeneous/Nonstationary Data.” Journal of Machine Learning Research 21 (89): 1–53.
Janzing, Dominik, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniušis, Bastian Steudel, and Bernhard Schölkopf. 2012. “Information-Geometric Approach to Inferring Causal Directions.” Artificial Intelligence 182-183 (May): 1–31.
Jung, Alexander, Nguyen Tran Quang, and Alexandru Mara. 2017. “When Is Network Lasso Accurate?” arXiv:1704.02107 [Stat], April.
Khoshgnauz, Ehsan. 2012. “Learning Markov Network Structure Using Brownian Distance Covariance.” arXiv:1206.6361 [Cs, Stat], June.
Kocaoglu, Murat, Alex Dimakis, and Sriram Vishwanath. 2017. “Cost-Optimal Learning of Causal Graphs.” In PMLR, 1875–84.
Kocaoglu, Murat, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. 2017. “CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training.” arXiv:1709.02023 [Cs, Math, Stat], September.
Krämer, Nicole, Juliane Schäfer, and Anne-Laure Boulesteix. 2009. “Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics 10 (1): 384.
Lederer, Johannes. 2016. “Graphical Models for Discrete and Continuous Data.” arXiv:1609.05551 [Math, Stat], September.
Lee, Su-In, Varun Ganapathi, and Daphne Koller. 2006. “Efficient Structure Learning of Markov Networks Using $L_1$-Regularization.” In Advances in Neural Information Processing Systems, 817–24. MIT Press.
Li, Yunzhu, Antonio Torralba, Animashree Anandkumar, Dieter Fox, and Animesh Garg. 2020. “Causal Discovery in Physical Systems from Videos.” arXiv:2007.00631 [Cs, Stat], July.
Liu, Han, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. 2012. “The Nonparanormal SKEPTIC.” arXiv:1206.6488 [Cs, Stat], June.
Liu, Han, John Lafferty, and Larry Wasserman. 2009. “The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs.” Journal of Machine Learning Research 10 (December): 2295–2328.
Locatello, Francesco, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.” arXiv:1811.12359 [Cs, Stat], June.
Lorch, Lars, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. 2021. “DiBS: Differentiable Bayesian Structure Learning.” In.
Mansinghka, Vikash, Charles Kemp, Thomas Griffiths, and Joshua Tenenbaum. 2012. “Structured Priors for Structure Learning.” arXiv:1206.6852, June.
Mazumder, Rahul, and Trevor Hastie. 2012. “The Graphical Lasso: New Insights and Alternatives.” Electronic Journal of Statistics 6 (November): 2125–49.
Montanari, Andrea. 2012. “Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications, 394–438.
Mooij, Joris M., Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. 2016. “Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks.” Journal of Machine Learning Research 17 (32): 1–102.
Nair, Suraj, Yuke Zhu, Silvio Savarese, and Li Fei-Fei. 2019. “Causal Induction from Visual Observations for Goal Directed Tasks.” arXiv:1910.01751 [Cs, Stat], October.
Narendra, Tanmayee, Anush Sankaran, Deepak Vijaykeerthy, and Senthil Mani. 2018. “Explaining Deep Learning Models Using Causal Inference.” arXiv:1811.04376 [Cs, Stat], November.
Nauta, Meike, Doina Bucur, and Christin Seifert. 2019. “Causal Discovery with Attention-Based Convolutional Neural Networks.” Machine Learning and Knowledge Extraction 1 (1): 312–40.
Neapolitan, Richard E. 2003. Learning Bayesian Networks. Vol. 38. Prentice Hall, Paperback.
Ng, Ignavier, Zhuangyan Fang, Shengyu Zhu, Zhitang Chen, and Jun Wang. 2020. “Masked Gradient-Based Causal Structure Learning.” arXiv:1910.08527 [Cs, Stat], February.
Ng, Ignavier, Shengyu Zhu, Zhitang Chen, and Zhuangyan Fang. 2019. “A Graph Autoencoder Approach to Causal Structure Learning.” In Advances in Neural Information Processing Systems.
Obermeyer, Fritz, Eli Bingham, Martin Jankowiak, Du Phan, and Jonathan P. Chen. 2020. “Functional Tensors for Probabilistic Programming.” arXiv:1910.10775 [Cs, Stat], March.
Peters, Jonas, Joris Mooij, Dominik Janzing, and Bernhard Schoelkopf. 2012. “Identifiability of Causal Graphs Using Functional Models.” arXiv:1202.3757 [Cs, Stat], February.
Ramsey, Joseph, Madelyn Glymour, Ruben Sanchez-Romero, and Clark Glymour. 2017. “A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models, with an Application to Functional Magnetic Resonance Images.” International Journal of Data Science and Analytics 3 (2): 121–29.
Schelldorfer, Jürg, Lukas Meier, and Peter Bühlmann. 2014. “GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using ℓ1-Penalization.” Journal of Computational and Graphical Statistics 23 (2): 460–77.
Textor, Johannes, Alexander Idelberger, and Maciej Liśkiewicz. 2015. “Learning from Pairwise Marginal Independencies.” arXiv:1508.00280 [Cs], August.
Wu, Rui, R. Srikant, and Jian Ni. 2012. “Learning Graph Structures in Discrete Markov Random Fields.” In INFOCOM Workshops, 214–19.
Yang, Mengyue, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang. 2020. “CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models.” arXiv:2004.08697 [Cs, Stat], July.
Zhao, Tuo, Han Liu, Kathryn Roeder, John Lafferty, and Larry Wasserman. 2012. “The Huge Package for High-Dimensional Undirected Graph Estimation in R.” Journal of Machine Learning Research 13 (April): 1059–62.
Zheng, Xun, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. 2018. “DAGs with NO TEARS: Continuous Optimization for Structure Learning.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 9472–83. Curran Associates, Inc.
Zheng, Xun, Chen Dan, Bryon Aragam, Pradeep Ravikumar, and Eric Xing. 2020. “Learning Sparse Nonparametric DAGs.” In International Conference on Artificial Intelligence and Statistics, 3414–25. PMLR.
Zhou, Mingyuan, Yulai Cong, and Bo Chen. 2017. “Augmentable Gamma Belief Networks,” 44.
