Learning graphical models from data

Also, causal discovery, structure discovery



Learning the independence graph structure from data in a graphical model. A particular sparse model selection problem where the model is hierarchical, or possibly even non-directional. For bonus sophistication we might even try to learn causal effects from data.

There are a few ways we can learn graphical models. The obvious one, to my mind, is to use a Bayesian network to learn the structure. conditional independence test, an awareness of multiple testing and graph theory. But also Bayesian sampling from possible graph structures is a thing apparently. There are other approaches too.

Bayesian learning of Bayesian networks

I am indebted to Dario Draca for introducing this field to me. Her recommended references are:

I think the introduction to this paper (Scutari, Graafland, and Gutiérrez 2019) gives an accessible description of the Bayesian approach to Bayesian network structure learning, including how to define parameter priors for discrete and Gaussian networks (the former case being potentially relevant to your signal detection/classification problem).

If you are particularly interested, the introduction to this paper (Scutari 2018) also gives some more information on computing marginal likelihoods (i.e. likelihood of data conditional only on the graph structure) for discrete Bayesian networks.

This one (Kuipers, Moffa, and Heckerman 2014a) gives a rigorous derivation of the marginal graph likelihood for Gaussian networks:

I also wonder about this seminar:

Guido Consonni, Objective Bayes Model Selection of Gaussian Essential Graphs with Observational and Interventional Data.

Graphical models based on Directed Acyclic Graphs (DAGs) represent a powerful tool for investigating dependencies among variables. It is well known that one cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs) using only observational data. However, the space of all DAGs can be partitioned into Markov equivalence classes, each being represented by a unique Essential Graph (EG), also called Completed Partially Directed Graph (CPDAG). In some fields, in particular genomics, one can have both observational and interventional data, the latter being produced after an exogenous perturbation of some variables in the system, or from randomized intervention experiments. Interventions destroy the original causal structure, and modify the Markov property of the underlying DAG, leading to a finer partition of DAGs into equivalence classes, each one being represented by an Interventional Essential Graph (I-EG) (Hauser and Buehlmann). In this talk we consider Bayesian model selection of EGs under the assumption that the variables are jointly Gaussian. In particular, we adopt an objective Bayes approach, based on the notion of fractional Bayes factor, and obtain a closed form expression for the marginal likelihood of an EG. Next we construct a Markov chain to explore the EG space under a sparsity constraint, and propose an MCMC algorithm to approximate the posterior distribution over the space of EGs. Our methodology, which we name Objective Bayes Essential graph Search (OBES), allows to evaluate the inferential uncertainty associated to any features of interest, for instance the posterior probability of edge inclusion. An extension of OBES to deal simultaneously with observational and interventional data is also presented: this involves suitable modifications of the likelihood and prior, as well as of the MCMC algorithm.

Classic methods using independence tests on graphs

Many. In my time in the lectures of Marloes Maathuis I learnt some of the theory, but TBH everything has fallen out my head now, since I have not used them in practice. Notable works are Colombo and Maathuis (2014); Drton and Maathuis (2017); Heinze-Deml, Maathuis, and Meinshausen (2018); Maathuis, Kalisch, and Bühlmann (2009); Maathuis et al. (2010).

Most of these seem to boil down to the cases where belief propagation is well-behaved, to wit, linear-Gaussian and discrete RVs. In these, dependence is simple (in the Gaussian case, essentially, two things are independence if their shared entry in the cross-precision matrix is zero). If we can find a way of estimating actual zeros in that matrix, then what? Often we want causal distributions, so the question remains: can we convert the implied undirected graph into a directed one? tl;dr: sometimes. This is one of those cases where the Bayesian method comes out much cleaner; averaging over possible graphs is a natural way of thinking about this.

For a less causal/intervention focused method, see Nonparanormal skeptic (🏗) (H. Liu et al. 2012) which combines semiparametric regression with non-parametric graph inference.

Researcher inferring optimal graph surgery

Learning by continuous optimization

A new trick in the arsenal from those neural network nerds. Xun Zheng, Bryon Aragam and Chen Dan in their blog post Learning DAGs with Continuous Optimization introduce NOTEARS. This is an interesting bit of work AFAICT. Download from xunzheng/notears, and read the papers (Zheng et al. 2018; Zheng et al. 2020):

Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.

Key insight:

…that the k-th power of the adjacency matrix of a graph counts the number of k-step paths from one node to another. In other words, if the diagonal of the matrix power turns out to be all zeros, there [are] no k-step cycles in the graph. Then to characterize acyclicity, we just need to set this constraint for all k=1,2,…,d, eliminating cycles of all possible length.

Has this gone anywhere? TODO: read Zheng et al. (2020).

A different appproach is Lorch et al. (2021):

Continuous characterization of acyclic graphs Orthogonal to the work on Bayesian inference, Zheng et al. (2018); have recently proposed a differentiable characterization of acyclic graphs for structure learning. In this work, we adopt the formulation of Yu et al. (2019), who show that a graph with adjacency matrix \(\mathbf{G} \in\{0,1\}^{d \times d}\) does not have any cycles if and only if \(h(\mathbf{G})=0\), where \[ h(\mathbf{G}):=\operatorname{tr}\left[\left(\mathbf{I}+\frac{1}{d} \mathbf{G}\right)^{d}\right]-d . \]

Tools

DIBS

larslorch/dibs: Joint Bayesian inference of graph and parameters of general Bayes nets

This is the Python JAX implementation for DiBS (Lorch et al., 2021), a fully differentiable method for joint Bayesian inference of the DAG and parameters of general, causal Bayesian networks.

In this implementation, DiBS inference is performed with the particle variational inference method SVGD (Q. Liu and Wang 2019). Since DiBS and SVGD operate on continuous tensors and solely rely on Monte Carlo estimation and gradient ascent-like updates, the inference code leverages efficient vectorized operations, automatic differentiation, just-in-time compilation, and hardware acceleration, fully implemented with JAX.

Documentation.

Their code example is impressive:

In this example, we use DiBS to generate 10 DAG and parameter samples from the joint posterior over Gaussian Bayes nets with means modeled by neural networks.

from dibs.inference import JointDiBS
from dibs.target import make_nonlinear_gaussian_model
import jax.random as random
key = random.PRNGKey(0)

# simulate some data
key, subk = random.split(key)
data, model = make_nonlinear_gaussian_model(key=subk, n_vars=20)

# sample 10 DAG and parameter particles from the joint posterior
dibs = JointDiBS(x=data.x, inference_model=model)
key, subk = random.split(key)
gs, thetas = dibs.sample(key=subk, n_particles=10, steps=1000)

In the above, the keyword argument x for JointDiBS is a matrix of shape [N, d] and could be any real-world data set.

Cripes.

bnlearn

bnlearn learns belief networks, i.e. directed graphical models. It is restricted to the classic exact belief propgation case, i.e. multinomial or gaussian models.

bnlearn implements the following constraint-based structure learning algorithms:

  • PC (the stable version);
  • Grow-Shrink (GS);
  • Incremental Association Markov Blanket (IAMB);
  • Fast Incremental Association (Fast-IAMB);
  • Interleaved Incremental Association (Inter-IAMB);
  • Incremental Association with FDR Correction (IAMB-FDR);
  • Max-Min Parents & Children (MMPC);
  • Semi-Interleaved Hiton-PC (SI-HITON-PC);
  • Hybrid Parents & Children (HPC);

the following score-based structure learning algorithms:

  • Hill Climbing (HC);
  • Tabu Search (Tabu);

the following hybrid structure learning algorithms:

  • Max-Min Hill Climbing (MMHC);
  • Hybrid HPC (H2PC);
  • General 2-Phase Restricted Maximization (RSMAX2);

the following local discovery algorithms:

  • Chow-Liu;
  • ARACNE;

and the following Bayesian network classifiers:

  • naive Bayes;
  • Tree-Augmented naive Bayes (TAN).

Discrete (multinomial) and continuous (multivariate normal) data sets are supported, both for structure and parameter learning. The latter can be performed using either maximum likelihood or Bayesian estimators.

sparsebn

Compare with sparsebn:

A new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:

  • Experimental data with interventions
  • Mixed observational / experimental data
  • High-dimensional data with p >> n
  • Datasets with thousands of variables (tested up to p=8000)
  • Continuous and discrete data

The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. […] For more details on this package, including worked examples and the methodological background, please see our new preprint.

Overview

The main methods for learning graphical models are:

  • estimate.dag for directed acyclic graphs (Bayesian networks).
  • estimate.precision for undirected graphs (Markov random fields).
  • estimate.covariance for covariance matrices.

Currently, estimation of precision and covariances matrices is limited to Gaussian data.

caus2e

MLResearchAtOSRAM/cause2e: The cause2e package provides tools for performing an end-to-end causal analysis of your data.

The main contribution of cause2e is the integration of two established causal packages that have currently been separated and cumbersome to combine:

  • Causal discovery methods from the py-causal package, which is a Python wrapper around parts of the Java TETRAD software. It provides many algorithms for learning the causal graph from data and domain knowledge.
  • Causal reasoning methods from the DoWhy package, which is the current standard for the steps of a causal analysis starting from a known causal graph and data

TETRAD

TETRAD (source, tutorial) is a tool for discovering and visualising and calculating giant empirical DAGs, including general graphical inference and causality. It’s written by eminent causality inference people.

Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is freeware that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform. …

The Tetrad programs describe causal models in three distinct parts or stages: a picture, representing a directed graph specifying hypothetical causal relations among the variables; a specification of the family of probability distributions and kinds of parameters associated with the graphical model; and a specification of the numerical values of those parameters.

py-causal is a wrapper around this for python, and R-causal for R.

skgmm

skggm (python) does the Gaussian thing but also has a nice sparsification and good explanation.

The core estimator provided in skggm is QuicGraphLasso which is a scikit-learn compatible interface to QUIC, a proximal Newton-type algorithm that solves the graphical lasso (2) objective.

References

Azadkia, Mona, and Sourav Chatterjee. 2019. A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat], December.
Bayati, M., and A. Montanari. 2012. The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory 58 (4): 1997–2017.
Besserve, Michel, Arash Mehrjou, Rémy Sun, and Bernhard Schölkopf. 2019. Counterfactuals Uncover the Modular Structure of Deep Generative Models.” In arXiv:1812.03253 [Cs, Stat].
Bühlmann, Peter, and Sara van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory and Applications. 2011 edition. Heidelberg ; New York: Springer.
Buntine, W.L. 1996. A Guide to the Literature on Learning Probabilistic Networks from Data.” IEEE Transactions on Knowledge and Data Engineering 8 (2): 195–210.
Cai, T. Tony. 2017. Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures.” Annual Review of Statistics and Its Application 4 (1): 423–46.
Chau, Siu Lun, Jean-François Ton, Javier González, Yee Whye Teh, and Dino Sejdinovic. 2021. BayesIMP: Uncertainty Quantification for Causal Data Fusion,” June.
Colombo, Diego, and Marloes H. Maathuis. 2014. “Order-Independent Constraint-Based Causal Structure Learning.” J. Mach. Learn. Res. 15 (1): 3741–82.
Colombo, Diego, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. 2012. Learning High-Dimensional Directed Acyclic Graphs with Latent and Selection Variables.” The Annals of Statistics 40 (1): 294–321.
Cox, D. R., and H. S. Battey. 2017. Large Numbers of Explanatory Variables, a Semi-Descriptive Analysis.” Proceedings of the National Academy of Sciences 114 (32): 8592–95.
Dezfouli, Amir, Edwin V Bonilla, and Richard Nock. 2018. “Variational Network Inference: Strong and Stable with Concrete Support.” In, 10.
Drton, Mathias, and Marloes H. Maathuis. 2017. Structure Learning in Graphical Modeling.” Annual Review of Statistics and Its Application 4 (1): 365–93.
Foygel, Rina, and Mathias Drton. 2010. Extended Bayesian Information Criteria for Gaussian Graphical Models.” In Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 604–12. Curran Associates, Inc.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.
Fu, Fei, and Qing Zhou. 2013. Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent.” Journal of the American Statistical Association 108 (501): 288–300.
Gao, Ming, Yi Ding, and Bryon Aragam. 2020. A Polynomial-Time Algorithm for Learning Nonparametric Causal Graphs.” arXiv:2006.11970 [Cs, Math, Stat], November.
Geer, Sara van de. 2014. Worst Possible Sub-Directions in High-Dimensional Models.” In arXiv:1403.7023 [Math, Stat]. Vol. 131.
Geng, Zhi, Yue Liu, Chunchen Liu, and Wang Miao. 2019. Evaluation of Causal Effects and Local Structure Learning of Causal Networks.” Annual Review of Statistics and Its Application 6 (1): 103–24.
Gnecco, Nicola, Nicolai Meinshausen, Jonas Peters, and Sebastian Engelke. 2021. Causal Discovery in Heavy-Tailed Models.” The Annals of Statistics 49 (3): 1755–78.
Gogate, Vibhav, William Webb, and Pedro Domingos. 2010. Learning Efficient Markov Networks.” In Advances in Neural Information Processing Systems, 748–56.
Gu, Jiaying, and Qing Zhou. 2020. Learning Big Gaussian Bayesian Networks: Partition, Estimation and Fusion.” Journal of Machine Learning Research 21 (158): 1–31.
Hallac, David, Jure Leskovec, and Stephen Boyd. 2015. Network Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat], July.
Harris, Naftali, and Mathias Drton. 2013. PC Algorithm for Nonparanormal Graphical Models.” Journal of Machine Learning Research 14 (1): 3365–83.
Heinze-Deml, Christina, Marloes H. Maathuis, and Nicolai Meinshausen. 2018. Causal Structure Learning.” Annual Review of Statistics and Its Application 5 (1): 371–91.
Hinton, Geoffrey E., Simon Osindero, and Kejie Bao. 2005. Learning Causally Linked Markov Random Fields.” In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 128–35. Citeseer.
Hsieh, Cho-Jui, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep D. Ravikumar. 2014. QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation. Journal of Machine Learning Research 15 (1): 2911–47.
Huang, Biwei, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. 2020. Causal Discovery from Heterogeneous/Nonstationary Data.” Journal of Machine Learning Research 21 (89): 1–53.
Janzing, Dominik, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniušis, Bastian Steudel, and Bernhard Schölkopf. 2012. Information-Geometric Approach to Inferring Causal Directions.” Artificial Intelligence 182-183 (May): 1–31.
Jin, Ying, Weilin Fu, Jian Kang, Jiadong Guo, and Jian Guo. 2020. Bayesian Symbolic Regression.” arXiv:1910.08892 [Stat], January.
Jung, Alexander, Nguyen Tran Quang, and Alexandru Mara. 2017. When Is Network Lasso Accurate? arXiv:1704.02107 [Stat], April.
Khoshgnauz, Ehsan. 2012. Learning Markov Network Structure Using Brownian Distance Covariance.” arXiv:1206.6361 [Cs, Stat], June.
Kocaoglu, Murat, Alex Dimakis, and Sriram Vishwanath. 2017. Cost-Optimal Learning of Causal Graphs.” In PMLR, 1875–84.
Kocaoglu, Murat, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. 2017. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training.” arXiv:1709.02023 [Cs, Math, Stat], September.
Krämer, Nicole, Juliane Schäfer, and Anne-Laure Boulesteix. 2009. Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics 10 (1): 384.
Kuipers, Jack, Giusi Moffa, and David Heckerman. 2014a. Supplement to .. ‘Addendum on the Scoring of Gaussian Directed Acyclic Graphical Models’’ .. Deriving and Simplifying the BGe Score.” The Annals of Statistics, 9.
———. 2014b. Addendum on the Scoring of Gaussian Directed Acyclic Graphical Models.” The Annals of Statistics 42 (4).
Lederer, Johannes. 2016. Graphical Models for Discrete and Continuous Data.” arXiv:1609.05551 [Math, Stat], September.
Lee, Su-In, Varun Ganapathi, and Daphne Koller. 2006. Efficient Structure Learning of Markov Networks Using $ L_1 $-Regularization.” In Advances in Neural Information Processing Systems, 817–24. MIT Press.
Li, Yunzhu, Antonio Torralba, Animashree Anandkumar, Dieter Fox, and Animesh Garg. 2020. Causal Discovery in Physical Systems from Videos.” arXiv:2007.00631 [Cs, Stat], July.
Liu, Han, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. 2012. The Nonparanormal SKEPTIC.” arXiv:1206.6488 [Cs, Stat], June.
Liu, Han, John Lafferty, and Larry Wasserman. 2009. The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs.” Journal of Machine Learning Research 10 (December): 2295–2328.
Liu, Qiang, and Dilin Wang. 2019. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In Advances In Neural Information Processing Systems.
Locatello, Francesco, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.” arXiv:1811.12359 [Cs, Stat], June.
Lorch, Lars, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. 2021. DiBS: Differentiable Bayesian Structure Learning.” In.
Maathuis, Marloes H., Diego Colombo, Markus Kalisch, and Peter Bühlmann. 2010. Predicting Causal Effects in Large-Scale Systems from Observational Data.” Nature Methods 7 (4): 247–48.
Maathuis, Marloes H., Markus Kalisch, and Peter Bühlmann. 2009. Estimating High-Dimensional Intervention Effects from Observational Data.” The Annals of Statistics 37 (6A): 3133–64.
Mansinghka, Vikash, Charles Kemp, Thomas Griffiths, and Joshua Tenenbaum. 2012. Structured Priors for Structure Learning.” arXiv:1206.6852, June.
Mazumder, Rahul, and Trevor Hastie. 2012. The graphical lasso: New insights and alternatives.” Electronic Journal of Statistics 6 (November): 2125–49.
Montanari, Andrea. 2012. Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications, 394–438.
Mooij, Joris M., Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. 2016. Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks.” Journal of Machine Learning Research 17 (32): 1–102.
Nair, Suraj, Yuke Zhu, Silvio Savarese, and Li Fei-Fei. 2019. Causal Induction from Visual Observations for Goal Directed Tasks.” arXiv:1910.01751 [Cs, Stat], October.
Narendra, Tanmayee, Anush Sankaran, Deepak Vijaykeerthy, and Senthil Mani. 2018. Explaining Deep Learning Models Using Causal Inference.” arXiv:1811.04376 [Cs, Stat], November.
Nauta, Meike, Doina Bucur, and Christin Seifert. 2019. Causal Discovery with Attention-Based Convolutional Neural Networks.” Machine Learning and Knowledge Extraction 1 (1): 312–40.
Neapolitan, Richard E. 2003. Learning Bayesian Networks. Vol. 38. Prentice Hal, Paperback.
Ng, Ignavier, Zhuangyan Fang, Shengyu Zhu, Zhitang Chen, and Jun Wang. 2020. Masked Gradient-Based Causal Structure Learning.” arXiv:1910.08527 [Cs, Stat], February.
Ng, Ignavier, Shengyu Zhu, Zhitang Chen, and Zhuangyan Fang. 2019. A Graph Autoencoder Approach to Causal Structure Learning.” In Advances In Neural Information Processing Systems.
Obermeyer, Fritz, Eli Bingham, Martin Jankowiak, Du Phan, and Jonathan P. Chen. 2020. Functional Tensors for Probabilistic Programming.” arXiv:1910.10775 [Cs, Stat], March.
Peters, Jonas, Joris Mooij, Dominik Janzing, and Bernhard Schoelkopf. 2012. Identifiability of Causal Graphs Using Functional Models.” arXiv:1202.3757 [Cs, Stat], February.
Ramsey, Joseph, Madelyn Glymour, Ruben Sanchez-Romero, and Clark Glymour. 2017. A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models, with an Application to Functional Magnetic Resonance Images.” International Journal of Data Science and Analytics 3 (2): 121–29.
Schelldorfer, Jürg, Lukas Meier, and Peter Bühlmann. 2014. GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using ℓ1-Penalization.” Journal of Computational and Graphical Statistics 23 (2): 460–77.
Scutari, Marco. 2018. Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle.” arXiv:1708.00689 [Math, Stat], March.
Scutari, Marco, Catharina Elisabeth Graafland, and José Manuel Gutiérrez. 2019. Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms.” arXiv:1805.11908 [Stat], July.
Textor, Johannes, Alexander Idelberger, and Maciej Liśkiewicz. 2015. Learning from Pairwise Marginal Independencies.” arXiv:1508.00280 [Cs], August.
Wu, Rui, R. Srikant, and Jian Ni. 2012. “Learning Graph Structures in Discrete Markov Random Fields.” In INFOCOM Workshops, 214–19.
Yang, Mengyue, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang. 2020. CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models.” arXiv:2004.08697 [Cs, Stat], July.
Yu, Yue, Jie Chen, Tian Gao, and Mo Yu. 2019. DAG-GNN: DAG Structure Learning with Graph Neural Networks.” Proceedings of the 36th International Conference on Machine Learning. PMLR.
Zhao, Tuo, Han Liu, Kathryn Roeder, John Lafferty, and Larry Wasserman. 2012. The Huge Package for High-Dimensional Undirected Graph Estimation in R.” Journal of Machine Learning Research : JMLR 13 (April): 1059–62.
Zheng, Xun, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. 2018. DAGs with NO TEARS: Continuous Optimization for Structure Learning.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 9472–83. Curran Associates, Inc.
Zheng, Xun, Chen Dan, Bryon Aragam, Pradeep Ravikumar, and Eric Xing. 2020. Learning Sparse Nonparametric DAGs.” In International Conference on Artificial Intelligence and Statistics, 3414–25. PMLR.
Zhou, Mingyuan, Yulai Cong, and Bo Chen. 2017. “Augmentable Gamma Belief Networks,” 44.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.