Inference on graphical models

Given what I know about what I know, what do I know?

Given a graphical model and some observations of some nodes, what can I say about other nodes?
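
To make that question concrete, here is a minimal sketch of exact inference by enumeration on a tiny directed model. The three-node network (Rain, Sprinkler, WetGrass), its probability tables, and the function names are all made up for illustration; nothing here comes from the references below. We observe one node (the grass is wet) and ask what that says about another (did it rain?).

```python
# Hypothetical network: Rain -> Sprinkler, and both Rain and Sprinkler -> WetGrass.
# All numbers are illustrative assumptions.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {                      # P(Sprinkler | Rain), keyed by Rain first
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}
P_WET = {                            # P(WetGrass=True | Sprinkler, Rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factorised along the edges of the graph."""
    p_wet = P_WET[(sprinkler, rain)]
    return (
        P_RAIN[rain]
        * P_SPRINKLER[rain][sprinkler]
        * (p_wet if wet else 1.0 - p_wet)
    )


def posterior_rain(wet):
    """P(Rain | WetGrass=wet): sum out the unobserved Sprinkler, then normalise."""
    weights = {
        rain: sum(joint(rain, sprinkler, wet) for sprinkler in (True, False))
        for rain in (True, False)
    }
    total = sum(weights.values())
    return {rain: w / total for rain, w in weights.items()}


posterior = posterior_rain(wet=True)
print(posterior[True])  # larger than the prior P_RAIN[True] = 0.2
```

Enumeration like this costs time exponential in the number of unobserved nodes, which is why the message-passing and variational methods in the references exist.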

Introductory reading

Barber (2012) and Lauritzen (1996) are rigorous introductions. Murphy (2012) has a minimal introduction intermixed with some related models, in a more machine-learning-flavoured, more Bayesian formalism. For applications to causality, Pearl (2009) and Spirtes, Glymour, and Scheines (2001) are readable.

People recommend Koller and Friedman (2009) to me. It is probably the most comprehensive, but I found it hard to see the forest for the trees in that one. YMMV.

What’s special here is how we handle independence relations and reason about them. In one sense there is nothing special about graphical models; a graphical model is just a graph recording which variables are conditionally independent of which others. On the other hand, that graph is a powerful analytic tool: it tells you which effects are confounded with which others, and when, and therefore which experiments you do and do not need to run. Moreover, you can use conditional independence tests to construct that graph without necessarily constructing the whole model (e.g. Zhang et al. (2012)).
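
As a rough illustration of testing conditional independence without fitting the whole model, the sketch below simulates a linear-Gaussian chain X -> Z -> Y and compares the marginal correlation of X and Y with their partial correlation given Z. This is a deliberately simple stand-in, not the kernel-based test of Zhang et al. (2012); a vanishing partial correlation only implies conditional independence under joint Gaussianity. The chain, the coefficients, and the function names are assumptions chosen for illustration.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly adjusting both for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Simulate the chain X -> Z -> Y with unit-variance Gaussian noise at each node.
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = corr(x, y)                     # clearly nonzero: X and Y are dependent
r_xy_given_z = partial_corr(x, y, z)  # near zero: Z screens X off from Y

print(r_xy, r_xy_given_z)
```

In graph terms, the near-zero partial correlation is evidence for omitting the edge between X and Y, which is the basic move that constraint-based structure learners such as the PC algorithm repeat over many variable triples.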


Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. “Exponential Families for Conditional Random Fields.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Aragam, Bryon, Jiaying Gu, and Qing Zhou. 2017. “Learning Large-Scale Bayesian Networks with the Sparsebn Package.” March 11, 2017.
Aragam, Bryon, and Qing Zhou. 2015. “Concave Penalized Estimation of Sparse Gaussian Bayesian Networks.” Journal of Machine Learning Research 16: 2273–2328.
Aral, Sinan, Lev Muchnik, and Arun Sundararajan. 2009. “Distinguishing Influence-Based Contagion from Homophily-Driven Diffusion in Dynamic Networks.” Proceedings of the National Academy of Sciences 106 (51): 21544–49.
Arnold, Barry C., Enrique Castillo, and Jose M. Sarabia. 1999. Conditional Specification of Statistical Models. Springer Science & Business Media.
Baddeley, A. J., and Marie-Colette NM Van Lieshout. 1995. “Area-Interaction Point Processes.” Annals of the Institute of Statistical Mathematics 47 (4): 601–19.
Baddeley, A. J., Marie-Colette NM Van Lieshout, and J. Møller. 1996. “Markov Properties of Cluster Processes.” Advances in Applied Probability 28 (2): 346–55.
Baddeley, Adrian J, Jesper Møller, and Rasmus Plenge Waagepetersen. 2000. “Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns.” Statistica Neerlandica 54 (3): 329–50.
Baddeley, Adrian, and Jesper Møller. 1989. “Nearest-Neighbour Markov Point Processes and Random Sets.” International Statistical Review / Revue Internationale de Statistique 57 (2): 89–121.
Barber, David. 2012. Bayesian Reasoning and Machine Learning. Cambridge ; New York: Cambridge University Press.
Bareinboim, Elias, Jin Tian, and Judea Pearl. 2014. “Recovering from Selection Bias in Causal and Statistical Inference.” In AAAI, 2410–16.
Bartolucci, Francesco, and Julian Besag. 2002. “A Recursive Algorithm for Markov Random Fields.” Biometrika 89 (3): 724–30.
Besag, Julian. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society. Series B (Methodological) 36 (2): 192–236.
———. 1975. “Statistical Analysis of Non-Lattice Data.” Journal of the Royal Statistical Society. Series D (The Statistician) 24 (3): 179–95.
———. 1986. “On the Statistical Analysis of Dirty Pictures.” Journal of the Royal Statistical Society. Series B (Methodological) 48 (3): 259–302.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer.
Blake, Andrew, Pushmeet Kohli, and Carsten Rother, eds. 2011. Markov Random Fields for Vision and Image Processing. Cambridge, Mass: MIT Press.
Bloniarz, Adam, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, and Bin Yu. 2015. “Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments.” July 13, 2015.
Boyd, Stephen. 2010. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Vol. 3. Now Publishers Inc.
Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics 9 (1): 247–74.
Bu, Yunqi, and Johannes Lederer. 2017. “Integrating Additional Knowledge Into Estimation of Graphical Models.” April 10, 2017.
Bühlmann, Peter, Markus Kalisch, and Lukas Meier. 2014. “High-Dimensional Statistics with a View Toward Applications in Biology.” Annual Review of Statistics and Its Application 1 (1): 255–78.
Bühlmann, Peter, Philipp Rütimann, and Markus Kalisch. 2013. “Controlling False Positive Selections in High-Dimensional Regression and Causal Inference.” Statistical Methods in Medical Research 22 (5): 466–92.
Celeux, Gilles, Florence Forbes, and Nathalie Peyrard. 2003. “EM Procedures Using Mean Field-Like Approximations for Markov Model-Based Image Segmentation.” Pattern Recognition 36 (1): 131–44.
Cevher, Volkan, Marco F. Duarte, Chinmay Hegde, and Richard Baraniuk. 2009. “Sparse Signal Recovery Using Markov Random Fields.” In Advances in Neural Information Processing Systems, 257–64. Curran Associates, Inc.
Christakis, Nicholas A., and James H. Fowler. 2007. “The Spread of Obesity in a Large Social Network over 32 Years.” New England Journal of Medicine 357 (4): 370–79.
Clifford, P. 1990. “Markov Random Fields in Statistics.” In Disorder in Physical Systems: A Volume in Honour of John Hammersley, edited by G. R. Grimmett and D. J. A. Welsh. Oxford England : New York: Oxford University Press.
Crisan, Dan, and Joaquín Míguez. 2014. “Particle-Kernel Estimation of the Filter Density in State-Space Models.” Bernoulli 20 (4): 1879–929.
Dawid, A. P. 2001. “Separoids: A Mathematical Framework for Conditional Independence and Irrelevance.” Annals of Mathematics and Artificial Intelligence 32 (1-4): 335–72.
Dawid, A. Philip. 1979. “Conditional Independence in Statistical Theory.” Journal of the Royal Statistical Society. Series B (Methodological) 41 (1): 1–31.
———. 1980. “Conditional Independence for Statistical Operations.” The Annals of Statistics 8 (3): 598–617.
De Luna, Xavier, Ingeborg Waernbaum, and Thomas S. Richardson. 2011. “Covariate Selection for the Nonparametric Estimation of an Average Treatment Effect.” Biometrika, October, asr041.
Edwards, David, and Smitha Ankinakatte. 2015. “Context-Specific Graphical Models for Discrete Longitudinal Data.” Statistical Modelling 15 (4): 301–25.
Fixx, James F. 1977. Games for the Superintelligent. London: Muller.
Forbes, F., and N. Peyrard. 2003. “Hidden Markov Random Field Model Selection Criteria Based on Mean Field-Like Approximations.” IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (9): 1089–1101.
Frey, B. J., and Nebojsa Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9): 1392–1416.
Frey, Brendan J. 2003. “Extending Factor Graphs so as to Unify Directed and Undirected Graphical Models.” In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 257–64. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Fridman, Arthur. 2003. “Mixed Markov Models.” Proceedings of the National Academy of Sciences 100 (14): 8092–96.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.
Friel, Nial, and Håvard Rue. 2007. “Recursive Computing and Simulation-Free Inference for General Factorizable Models.” Biometrika 94 (3): 661–72.
Geyer, Charles J. 1991. “Markov Chain Monte Carlo Maximum Likelihood.”
Geyer, Charles J., and Jesper Møller. 1994. “Simulation Procedures and Likelihood Inference for Spatial Point Processes.” Scandinavian Journal of Statistics, 359–73.
Goldberg, David A. 2013. “Higher Order Markov Random Fields for Independent Sets.” January 9, 2013.
Grenander, Ulf. 1989. “Advances in Pattern Theory.” The Annals of Statistics 17 (1): 1–30.
Griffeath, David. 1976. “Introduction to Random Fields.” In Denumerable Markov Chains, 425–58. Graduate Texts in Mathematics 40. Springer New York.
Gu, Jiaying, Fei Fu, and Qing Zhou. 2014. “Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data.” March 10, 2014.
Häggström, Olle, Marie-Colette N. M. van Lieshout, and Jesper Møller. 1999. “Characterization Results and Markov Chain Monte Carlo Algorithms Including Exact Simulation for Some Spatial Point Processes.” Bernoulli 5 (4): 641–58.
Heckerman, David, David Maxwell Chickering, Christopher Meek, Robert Rounthwaite, and Carl Kadie. 2000. “Dependency Networks for Inference, Collaborative Filtering, and Data Visualization.” Journal of Machine Learning Research 1: 49–75.
Jensen, Jens Ledet, and Jesper Møller. 1991. “Pseudolikelihood for Exponential Family Models of Spatial Point Processes.” The Annals of Applied Probability 1 (3): 445–61.
Jordan, Michael I. 2004. “Graphical Models.” Statistical Science 19 (1): 140–55.
Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. “An Introduction to Variational Methods for Graphical Models.” Machine Learning 37 (2): 183–233.
Jordan, Michael I., and Yair Weiss. 2002a. “Graphical Models: Probabilistic Inference.” The Handbook of Brain Theory and Neural Networks, 490–96. honavar/jordan2.pdf.
———. 2002b. “Probabilistic Inference in Graphical Models.” Handbook of Neural Networks and Brain Theory.
Jordan, Michael Irwin. 1999. Learning in Graphical Models. Cambridge, Mass.: MIT Press.
Kalisch, Markus, and Peter Bühlmann. 2007. “Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm.” Journal of Machine Learning Research 8 (May): 613–36.
Kindermann, Ross P., and J. Laurie Snell. 1980. “On the Relation Between Markov Random Fields and Social Networks.” The Journal of Mathematical Sociology 7 (1): 1–13.
Kindermann, Ross, and J. Laurie Snell. 1980. Markov Random Fields and Their Applications. Vol. 1. Contemporary Mathematics. Providence, Rhode Island: American Mathematical Society.
Kjærulff, Uffe B., and Anders L. Madsen. 2008. Bayesian Networks and Influence Diagrams. Information Science and Statistics. New York, NY: Springer New York.
Koller, Daphne, and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA: MIT Press.
Krause, Andreas, and Carlos Guestrin. 2009. “Optimal Value of Information in Graphical Models.” J. Artif. Int. Res. 35 (1): 557–91.
Krämer, Nicole, Juliane Schäfer, and Anne-Laure Boulesteix. 2009. “Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics 10 (1): 384.
Kschischang, F. R., B. J. Frey, and H.-A. Loeliger. 2001. “Factor Graphs and the Sum-Product Algorithm.” IEEE Transactions on Information Theory 47 (2): 498–519.
Lauritzen, S. L., and D. J. Spiegelhalter. 1988. “Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems.” Journal of the Royal Statistical Society. Series B (Methodological) 50 (2): 157–224.
Lauritzen, Steffen L. 1996. Graphical Models. Clarendon Press.
Lavrenko, Victor, and Jeremy Pickens. 2003a. “Music Modeling with Random Fields.” In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, 389. ACM Press.
———. 2003b. “Polyphonic Music Modeling with Random Fields.” In Proceedings of the Eleventh ACM International Conference on Multimedia, 120. ACM Press.
Lederer, Johannes. 2016. “Graphical Models for Discrete and Continuous Data.” September 18, 2016.
Liu, Han, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. 2012a. “The Nonparanormal SKEPTIC.” June 27, 2012.
———. 2012b. “High-Dimensional Semiparametric Gaussian Copula Graphical Models.” The Annals of Statistics 40 (4): 2293–2326.
Liu, Han, Kathryn Roeder, and Larry Wasserman. 2010. “Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models.” In Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 1432–40. Curran Associates, Inc.
Loeliger, H.-A. 2004. “An Introduction to Factor Graphs.” IEEE Signal Processing Magazine 21 (1): 28–41.
Maathuis, Marloes H., and Diego Colombo. 2013. “A Generalized Backdoor Criterion.” 2013.
Maddage, Namunu C., Haizhou Li, and Mohan S. Kankanhalli. 2006. “Music Structure Based Vector Space Retrieval.” In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 67. ACM Press.
Malioutov, Dmitry M., Jason K. Johnson, and Alan S. Willsky. 2006. “Walk-Sums and Belief Propagation in Gaussian Graphical Models.” Journal of Machine Learning Research 7 (October): 2031–64.
Mao, Yongyi, Frank R. Kschischang, and Brendan J. Frey. 2004. “Convolutional Factor Graphs As Probabilistic Models.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 374–81. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Marbach, Daniel, Robert J. Prill, Thomas Schaffter, Claudio Mattiussi, Dario Floreano, and Gustavo Stolovitzky. 2010. “Revealing Strengths and Weaknesses of Methods for Gene Network Inference.” Proceedings of the National Academy of Sciences 107 (14): 6286–91.
McCallum, Andrew. 2012. “Efficiently Inducing Features of Conditional Random Fields.” October 19, 2012.
Meinshausen, Nicolai, and Peter Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics 34 (3): 1436–62.
Mihalkova, Lilyana, and Raymond J. Mooney. 2007. “Bottom-up Learning of Markov Logic Network Structure.” In Proceedings of the 24th International Conference on Machine Learning, 625–32. ACM.
Mohan, Karthika, and Judea Pearl. 2018. “Consistent Estimation Given Missing Data.” In International Conference on Probabilistic Graphical Models, 284–95.
Montanari, Andrea. 2011. “Lecture Notes for Stat 375 Inference in Graphical Models.”
Morgan, Jonathan Scott, Iman Barjasteh, Cliff Lampe, and Hayder Radha. 2014. “The Entropy of Attention and Popularity in Youtube Videos.” December 2, 2014.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. 1 edition. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press.
Osokin, A., D. Vetrov, and V. Kolmogorov. 2011. “Submodular Decomposition Framework for Inference in Associative Markov Networks with Global Constraints.” In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1889–96.
Pearl, Judea. 1982. “Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach.” In Proceedings of the National Conference on Artificial Intelligence, 133–36.
———. 1986. “Fusion, Propagation, and Structuring in Belief Networks.” Artificial Intelligence 29 (3): 241–88.
———. 2008. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Rev. 2. print., 12. [Dr.]. The Morgan Kaufmann Series in Representation and Reasoning. San Francisco, Calif: Kaufmann.
———. 2009. Causality: Models, Reasoning and Inference. Cambridge University Press.
Pereda, E, R Q Quiroga, and J Bhattacharya. 2005. “Nonlinear Multivariate Analysis of Neurophysiological Signals.” Progress in Neurobiology 77 (1-2): 1–37.
Pickens, Jeremy, and Costas S. Iliopoulos. 2005. “Markov Random Fields and Maximum Entropy Modeling for Music Information Retrieval.” In ISMIR, 207–14. Citeseer.
Pollard, Dave. 2004. “Hammersley-Clifford Theorem for Markov Random Fields.”
Rabbat, Michael G., Mário A. T. Figueiredo, and Robert D. Nowak. 2008. “Network Inference from Co-Occurrences.” IEEE Transactions on Information Theory 54 (9): 4053–68.
Ranzato, M. 2013. “Modeling Natural Images Using Gated MRFs.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9): 2206–22.
Ravikumar, Pradeep, Martin J. Wainwright, and John D. Lafferty. 2010. “High-Dimensional Ising Model Selection Using ℓ1-Regularized Logistic Regression.” The Annals of Statistics 38 (3): 1287–1319.
Reeves, R., and A. N. Pettitt. 2004. “Efficient Recursions for General Factorisable Models.” Biometrika 91 (3): 751–57.
Richardson, Matthew, and Pedro Domingos. 2006. “Markov Logic Networks.” Machine Learning 62 (1-2): 107–36.
Ripley, B. D., and F. P. Kelly. 1977. “Markov Point Processes.” Journal of the London Mathematical Society s2-15 (1): 188–92.
Schmidt, Mark W., and Kevin P. Murphy. 2010. “Convex Structure Learning in Log-Linear Models: Beyond Pairwise Potentials.” In International Conference on Artificial Intelligence and Statistics, 709–16.
Shachter, Ross D. 1998. “Bayes-Ball: Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams).” In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 480–87. UAI’98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Shalizi, Cosma Rohilla, and Edward McFowland III. 2016. “Controlling for Latent Homophily in Social Networks Through Inferring Latent Locations.” July 22, 2016.
Smith, David A., and Jason Eisner. 2008. “Dependency Parsing by Belief Propagation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 145–56. Association for Computational Linguistics.
Spirtes, Peter, Clark Glymour, and Richard Scheines. 2001. Causation, Prediction, and Search. Second Edition. Adaptive Computation and Machine Learning. The MIT Press.
Studený, Milan. 1997. “A Recovery Algorithm for Chain Graphs.” International Journal of Approximate Reasoning, Uncertainty in AI (UAI’96) Conference, 17 (2–3): 265–93.
———. 2005. Probabilistic Conditional Independence Structures. Information Science and Statistics. London: Springer.
Studený, Milan, and Jiřina Vejnarová. 1998. “On Multiinformation Function as a Tool for Measuring Stochastic Dependence.” In Learning in Graphical Models, 261–97. Cambridge, Mass.: MIT Press.
Su, Ri-Qi, Wen-Xu Wang, and Ying-Cheng Lai. 2012. “Detecting Hidden Nodes in Complex Networks from Time Series.” Phys. Rev. E 85 (6): 065201.
Sutton, Charles, and Andrew McCallum. 2010. “An Introduction to Conditional Random Fields.” November 17, 2010.
Tansey, Wesley, Oscar Hernan Madrid Padilla, Arun Sai Suggala, and Pradeep Ravikumar. 2015. “Vector-Space Markov Random Fields via Exponential Families.” In Journal of Machine Learning Research, 684–92.
Vetrov, Dmitry, and Anton Osokin. 2011. “Graph Preserving Label Decomposition in Discrete MRFs with Selfish Potentials.” In NIPS Workshop on Discrete Optimization in Machine Learning (DISCML NIPS).
Visweswaran, Shyam, and Gregory F. Cooper. 2014. “Counting Markov Blanket Structures.” July 9, 2014.
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.
Wainwright, Martin, and Michael I Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing. Vol. 155. MIT Press.
Wang, Chaohui, Nikos Komodakis, and Nikos Paragios. 2013. “Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: A Survey.” Computer Vision and Image Understanding 117 (11): 1610–27.
Wasserman, Larry, Mladen Kolar, and Alessandro Rinaldo. 2013. “Estimating Undirected Graphs Under Weak Assumptions.” September 26, 2013.
Weiss, Yair. 2000. “Correctness of Local Probability Propagation in Graphical Models with Loops.” Neural Computation 12 (1): 1–41.
Weiss, Yair, and William T. Freeman. 2001. “Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology.” Neural Computation 13 (10): 2173–2200.
Winn, John M., and Christopher M. Bishop. 2005. “Variational Message Passing.” In Journal of Machine Learning Research, 661–94.
Wright, Sewall. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics 5 (3): 161–215.
Wu, Rui, R. Srikant, and Jian Ni. 2013. “Learning Loosely Connected Markov Random Fields.” Stochastic Systems 3 (2): 362–404.
Yedidia, J. S., W. T. Freeman, and Y. Weiss. 2003. “Understanding Belief Propagation and Its Generalizations.” In Exploring Artificial Intelligence in the New Millennium, edited by G. Lakemeyer and B. Nebel, 239–69. Morgan Kaufmann Publishers.
Yedidia, Jonathan S., W. T. Freeman, and Y. Weiss. 2005. “Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms.” IEEE Transactions on Information Theory 51 (7): 2282–312.
Zhang, Kun, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. “Kernel-Based Conditional Independence Test and Application in Causal Discovery.” February 14, 2012.
Zhou, Mingyuan, Yulai Cong, and Bo Chen. 2017. “Augmentable Gamma Belief Networks,” 44.