Directed graphs of conditional independence are a convenient formalism for many models. They are also called Bayes nets (not to be confused with Bayesian inference).
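Here is a minimal sketch of what the graph buys us. A DAG says the joint distribution factorises as p(x_1, …, x_n) = ∏_i p(x_i | pa(x_i)), where pa(x_i) are the parents of x_i. The Python below assumes a hypothetical three-node rain/sprinkler/wet-grass network with made-up probability tables. It answers a query by brute-force enumeration over that factorised joint. Practical tools use smarter algorithms (variable elimination, belief propagation), but the factorisation they exploit is the same.

```python
from itertools import product

# Hypothetical network: Rain -> Sprinkler, Rain -> Wet, Sprinkler -> Wet.
# All probabilities below are made up for illustration.
P_RAIN = 0.2                                  # P(Rain=True)
P_SPRINKLER = {True: 0.01, False: 0.4}        # P(Sprinkler=True | Rain)
P_WET = {                                     # P(Wet=True | Sprinkler, Rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def bernoulli(p_true, value):
    """Probability that a binary variable equals `value`, given P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(rain, sprinkler, wet):
    """Joint probability from the DAG factorisation p(r) p(s | r) p(w | s, r)."""
    return (
        bernoulli(P_RAIN, rain)
        * bernoulli(P_SPRINKLER[rain], sprinkler)
        * bernoulli(P_WET[(sprinkler, rain)], wet)
    )


def posterior_rain_given_wet():
    """P(Rain=True | Wet=True) by summing the factorised joint over unobserved variables."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(posterior_rain_given_wet())
```

With these tables the posterior comes out at roughly 0.358. Enumeration is exponential in the number of variables, so this only illustrates the semantics of the factorisation.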
Once you have the graph, you can infer relations more detailed than mere conditional dependence or independence; this is precisely what hierarchical models emphasise.
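One concrete example of reading relations off the graph is d-separation. The graph alone determines which conditional independences hold between arbitrary sets of variables. The sketch below assumes the DAG is given as a dict mapping each node to its list of parents, a representation chosen here purely for illustration. It uses the moralised ancestral graph criterion, which is equivalent to d-separation, in four steps:

1. Keep X, Y, Z and their ancestors.
2. Marry co-parents.
3. Drop edge directions and delete Z.
4. Check whether X can still reach Y.

```python
from itertools import combinations


def ancestral_set(parents, nodes):
    """Return `nodes` plus all of their ancestors; `parents` maps node -> parent list."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """True if X and Y are d-separated given Z, via the moralised ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_set(parents, xs | ys | zs)
    # Moralise: link each child to its parents, and "marry" every pair of co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        pas = list(parents.get(child, ()))  # parents of kept nodes are themselves kept
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pas, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete Z, then search for any undirected path from X to Y.
    seen = xs - zs
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v] - zs:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


# Collider A -> C <- B with a descendant C -> D.
collider = {"C": ["A", "B"], "D": ["C"]}
print(d_separated(collider, {"A"}, {"B"}, set()))   # marginally independent
print(d_separated(collider, {"A"}, {"B"}, {"C"}))   # dependent given the collider
```

In the collider A → C ← B, A and B are d-separated marginally. They become dependent once we condition on C or on its descendant D ("explaining away"). A chain A → B → C behaves the other way round: conditioning on B blocks the path.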
These can even be causal graphical models, and when we can infer those, we are extracting Science (ONO) from observational data. This is really interesting; see causal graphical models.
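To make the causal reading a little more concrete: in a causal DAG, an intervention do(X = x) is modelled by deleting the factor p(x | pa(x)) from the factorisation (the truncated factorisation). In this example that coincides with back-door adjustment over the observed confounder Z. The Python sketch assumes a hypothetical binary model Z → X, Z → Y, X → Y with made-up probabilities. It contrasts ordinary conditioning with the interventional quantity.

```python
# Hypothetical binary causal model: Z -> X, Z -> Y, X -> Y.
# Z confounds X and Y; all numbers are made up for illustration.
P_Z = 0.5                                   # P(Z=1)
P_X = {0: 0.2, 1: 0.8}                      # P(X=1 | Z=z)
P_Y = {(0, 0): 0.1, (0, 1): 0.5,            # P(Y=1 | X=x, Z=z), keyed by (x, z)
       (1, 0): 0.2, (1, 1): 0.6}


def bern(value, p_one):
    """Probability that a binary variable equals `value`, given P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def conditional(x):
    """P(Y=1 | X=x): ordinary conditioning, which also reflects Z's influence on X."""
    num = sum(bern(z, P_Z) * bern(x, P_X[z]) * P_Y[(x, z)] for z in (0, 1))
    den = sum(bern(z, P_Z) * bern(x, P_X[z]) for z in (0, 1))
    return num / den


def interventional(x):
    """P(Y=1 | do(X=x)): drop the factor P(X | Z) and sum out Z (back-door adjustment)."""
    return sum(bern(z, P_Z) * P_Y[(x, z)] for z in (0, 1))


print("naive contrast:", conditional(1) - conditional(0))
print("causal contrast:", interventional(1) - interventional(0))
```

With these numbers the naive contrast P(Y=1 | X=1) − P(Y=1 | X=0) is 0.34, while the interventional contrast is 0.10. The gap is entirely due to Z. The calculation works only because the graph is assumed known and Z is observed.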
BayesNets is a Julia package for reasoning over directed graphical models.