# Probabilistic graphical models

Judea Pearl performing graph surgery

The term graphical model, in the context of statistics, means a particular thing: a family of ways to frame inference in multivariate models, specifically the calculation of marginal and conditional probabilities, in terms of graphs that describe how the probabilities factorise. Graph here is in the sense of network theory, i.e. a collection of nodes connected by edges. Here are some of those:

Loeliger’s 2004 zoo of the predominant graphical models

It turns out that switching back and forth between these different formalisms makes some things easier to do and, if you are lucky, also easier to understand. Within this area there are several specialties and a lot of material. This is a landing page pointing to the actual content.

Thematically, content related to this theme is scattered across graphical models in inference, learning graphs from data, diagramming graphical models, learning causation from data plus graphs, quantum graphical models, and yet more pages.

Barber (2012)’s Taxonomy of graphical models

## Introductory texts {#intro-texts}

Barber (2012) and Lauritzen (1996) are rigorous introductions. Murphy (2012) has a minimal introduction intermixed with particular related methods, so it takes you straight to applications, although personally I found that confusing. For use in causality, Pearl (2009) and Spirtes, Glymour, and Scheines (2001) are readable.

People recommend Koller and Friedman (2009) to me, which is probably the most comprehensive, but I found it too comprehensive, to the point that it was hard to see the forest for the trees. Maybe better as a text to deepen your understanding once you already know what is going on.

## What are plates?

Invented in Buntine (1994), plate notation is how we introduce multiple variables with a regular, i.e. conditionally independent, relation to existing variables. Plates are extremely important if you want to observe more than one data point. AFAICT, really digging deep into treating data as just another node is what makes Koller and Friedman (2009) a classic textbook; in lots of papers, especially early ones, the problem of estimating parameters from data is glossed over.
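To make the plate idea concrete, here is a minimal sketch in code (the model, the names `mu` and `y`, and the priors are all invented for illustration): one global latent variable drawn once, and a plate of conditionally i.i.d. observations, so the log-joint is just a sum over the plate.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal_pdf(x, mean, sd):
    # log density of N(mean, sd^2), written out so no scipy is needed
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

# One global latent mu, drawn once, *outside* the plate...
mu = rng.normal(0.0, 10.0)
# ...and a plate of N conditionally independent observations given mu.
N = 1000
y = rng.normal(mu, 1.0, size=N)

# The plate encodes conditional independence, so the joint factorises as
#   p(mu, y) = p(mu) * prod_n p(y_n | mu)
# and the log-joint is a sum over the plate:
log_joint = log_normal_pdf(mu, 0.0, 10.0) + log_normal_pdf(y, mu, 1.0).sum()
```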

Concretely, consider Dustin Tran’s example of the kind of dimensionality we are working with:

Dustin Tran: A hierarchical model, with latent variables $$\alpha_k$$ defined locally per group and latent variables $$\phi$$ defined globally to be shared across groups.

We are motivated by hierarchical models[…]. Formally, let $$y_{nk}$$ be the $$n^{th}$$ data point in group $$k$$, with a total of $$N_{k}$$ data points in group $$k$$ and $$K$$ many groups. We model the data using local latent variables $$\alpha_{k}$$ associated to a group $$k$$, and using global latent variables $$\phi$$ which are shared across groups.[…] (Figure) The posterior distribution of local variables $$\alpha_{k}$$ and global variables $$\phi$$ is $p(\alpha, \phi \mid \mathbf{y}) \propto p(\phi) \prod_{k=1}^{K}\left[p\left(\alpha_{k} \mid \phi\right) \prod_{n=1}^{N_{k}} p\left(y_{n k} \mid \alpha_{k}, \phi\right)\right]$ The benefit of distributed updates over the independent factors is immediate. For example, suppose the data consists of 1,000 data points per group (with 5,000 groups); we model it with 2 latent variables per group and 20 global latent variables.

How many dimensions are we integrating over now? 10,020. The number of intermediate dimensions in our data grows very rapidly as we add observations, even if the final marginal is of low dimension. However, some of those dimensions are independent of others, and so may be factored away.
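Writing out the bookkeeping from the example above (variable names are mine):

```python
# Counting the latent dimensions in the hierarchical example above.
K = 5_000             # number of groups
local_per_group = 2   # local latent variables alpha_k per group
global_dims = 20      # global latent variables phi shared across groups

total_latent_dims = K * local_per_group + global_dims
print(total_latent_dims)  # 10020
```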

## Directed graphs

Directed graphs of conditional independence are a convenient formalism for many models. These are also called Bayes nets, presumably because the relationships encoded in these graphs have utility in the automatic application of Bayes' rule. These models have a natural interpretation in terms of causation and structural models. See directed graphical models.
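As a minimal sketch of what the "automatic application of Bayes' rule" buys us, here is the classic rain/sprinkler/wet-grass net (all numbers invented for illustration), with inference done by brute-force enumeration:

```python
# A tiny directed graphical model. The DAG licenses the factorisation
#   P(r, s, w) = P(r) P(s | r) P(w | s, r).
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {          # P(sprinkler | rain)
    True:  {True: 0.01, False: 0.99},
    False: {True: 0.4,  False: 0.6},
}
p_wet = {                # P(wet | sprinkler, rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(r, s, w):
    # Multiply the local conditional tables along the arrows of the DAG.
    pw = p_wet[(s, r)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1.0 - pw)

# Bayes' rule by enumeration: P(rain | grass is wet).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(round(num / den, 4))  # ≈ 0.3577
```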

## Undirected, a.k.a. Markov graphs

a.k.a. Markov random fields, Markov networks… These have a natural interpretation in terms of energy-based models. See undirected graphical models.
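The energy-based interpretation in one toy example: an Ising chain with $p(x) \propto \exp(-E(x))$, where the energy decomposes over the edges (cliques) of the graph. The chain length and coupling strength here are arbitrary choices of mine.

```python
import itertools
import math

J = 0.5      # coupling strength (arbitrary)
nodes = 4    # a 4-node chain, spins in {-1, +1}

def energy(x):
    # Energy is a sum over the edges of the chain graph.
    return -J * sum(x[i] * x[i + 1] for i in range(nodes - 1))

# The normalising constant Z requires a sum over all 2^4 states — this
# exponential cost is exactly why inference in these models is hard.
states = list(itertools.product([-1, 1], repeat=nodes))
Z = sum(math.exp(-energy(x)) for x in states)
p = {x: math.exp(-energy(x)) / Z for x in states}
```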

## Factor graphs

A unifying formalism for the directed and undirected graphical models. Simpler in some ways, harder in others. See factor graphs.
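A sketch of why factor graphs unify the other two formalisms: the model is just a list of factors, each scoped to a subset of the variables, and the joint is the normalised product of factors, whichever formalism it came from. All the tables below are invented.

```python
import itertools

variables = ["a", "b", "c"]          # each binary: 0 or 1
factors = [                          # (scope, factor function) pairs
    (("a",),      lambda a: [0.6, 0.4][a]),
    (("a", "b"),  lambda a, b: [[0.9, 0.1], [0.2, 0.8]][a][b]),
    (("b", "c"),  lambda b, c: [[0.7, 0.3], [0.3, 0.7]][b][c]),
]

def unnormalised(assignment):
    # The joint is the product of all factors, up to normalisation.
    prob = 1.0
    for scope, f in factors:
        prob *= f(*(assignment[v] for v in scope))
    return prob

# Brute-force marginal p(c = 1):
num = den = 0.0
for vals in itertools.product([0, 1], repeat=len(variables)):
    assignment = dict(zip(variables, vals))
    w = unnormalised(assignment)
    den += w
    if assignment["c"] == 1:
        num += w
```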

## Inference on graphical models

A key use of this graphical structure is that it can make inference local: you can have different compute nodes, each of which examines part of the data or part of the model and passes messages back and forth to do inference over the entire thing. It is easy to say this, but making practical and performant algorithms this way is… well, it is a whole field. See graphical models in inference.
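A toy taste of that locality: sum-product message passing on the chain $x_1 - x_2 - x_3$ with binary variables and arbitrary positive pairwise potentials (invented here). The marginal at $x_2$ comes out of local messages, never the full joint table.

```python
import numpy as np

psi12 = np.array([[1.0, 0.5],       # psi12[i, j] = potential(x1=i, x2=j)
                  [0.5, 2.0]])
psi23 = np.array([[2.0, 1.0],       # psi23[j, k] = potential(x2=j, x3=k)
                  [1.0, 1.0]])

# Messages toward x2 from both ends of the chain:
m1_to_2 = psi12.sum(axis=0)         # sum out x1
m3_to_2 = psi23.sum(axis=1)         # sum out x3

# Belief at x2 is the normalised product of incoming messages.
belief = m1_to_2 * m3_to_2
belief = belief / belief.sum()

# Sanity check against brute-force marginalisation of the full joint:
joint = np.einsum("ij,jk->ijk", psi12, psi23)
brute = joint.sum(axis=(0, 2))
brute = brute / brute.sum()
```

On a chain this is exact; on graphs with loops the same local updates become loopy belief propagation, which is merely approximate.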

## Implementations

All of the probabilistic programming languages end up needing to account for graphical model structure in practice, so maybe start there.

## References

Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Aragam, Bryon, Jiaying Gu, and Qing Zhou. 2017. arXiv:1703.04025 [Cs, Stat], March.
Aragam, Bryon, and Qing Zhou. 2015. Journal of Machine Learning Research 16: 2273–2328.
Aral, Sinan, Lev Muchnik, and Arun Sundararajan. 2009. Proceedings of the National Academy of Sciences 106 (51): 21544–49.
Arnold, Barry C., Enrique Castillo, and Jose M. Sarabia. 1999. Conditional Specification of Statistical Models. Springer Science & Business Media.
Baddeley, A. J., and Marie-Colette NM Van Lieshout. 1995. Annals of the Institute of Statistical Mathematics 47 (4): 601–19.
Baddeley, A. J., Marie-Colette NM Van Lieshout, and J. Møller. 1996. Advances in Applied Probability 28 (2): 346–55.
Baddeley, Adrian J, Jesper Møller, and Rasmus Plenge Waagepetersen. 2000. Statistica Neerlandica 54 (3): 329–50.
Baddeley, Adrian, and Jesper Møller. 1989. International Statistical Review / Revue Internationale de Statistique 57 (2): 89–121.
Barber, David. 2012. Bayesian Reasoning and Machine Learning. Cambridge ; New York: Cambridge University Press.
Bareinboim, Elias, Jin Tian, and Judea Pearl. 2014. In AAAI, 2410–16.
Bartolucci, Francesco, and Julian Besag. 2002. Biometrika 89 (3): 724–30.
Besag, Julian. 1974. Journal of the Royal Statistical Society. Series B (Methodological) 36 (2): 192–236.
———. 1975. Journal of the Royal Statistical Society. Series D (The Statistician) 24 (3): 179–95.
———. 1986. Journal of the Royal Statistical Society. Series B (Methodological) 48 (3): 259–302.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer.
Blake, Andrew, Pushmeet Kohli, and Carsten Rother, eds. 2011. Markov Random Fields for Vision and Image Processing. Cambridge, Mass: MIT Press.
Bloniarz, Adam, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, and Bin Yu. 2015. arXiv:1507.03652 [Math, Stat], July.
Boyd, Stephen. 2010. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Vol. 3. Now Publishers Inc.
Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. The Annals of Applied Statistics 9 (1): 247–74.
Bu, Yunqi, and Johannes Lederer. 2017. arXiv:1704.02739 [Stat], April.
Bühlmann, Peter, Markus Kalisch, and Lukas Meier. 2014. Annual Review of Statistics and Its Application 1 (1): 255–78.
Bühlmann, Peter, Philipp Rütimann, and Markus Kalisch. 2013. Statistical Methods in Medical Research 22 (5): 466–92.
Buntine, W. L. 1994. Journal of Artificial Intelligence Research 2 (1): 159–225.
Celeux, Gilles, Florence Forbes, and Nathalie Peyrard. 2003. Pattern Recognition 36 (1): 131–44.
Cevher, Volkan, Marco F. Duarte, Chinmay Hegde, and Richard Baraniuk. 2009. In Advances in Neural Information Processing Systems, 257–64. Curran Associates, Inc.
Charniak, Eugene. 1991. “Bayesian Networks Without Tears.” AI Magazine 12 (4): 50.
Christakis, Nicholas A., and James H. Fowler. 2007. New England Journal of Medicine 357 (4): 370–79.
Clifford, P. 1990. “Markov random fields in statistics.” In Disorder in Physical Systems: A Volume in Honour of John Hammersley, edited by G. R. Grimmett and D. J. A. Welsh. Oxford England : New York: Oxford University Press.
Crisan, Dan, and Joaquín Míguez. 2014. Bernoulli 20 (4): 1879–929.
Da Costa, Lancelot, Karl Friston, Conor Heins, and Grigorios A. Pavliotis. 2021. arXiv:2106.13830 [Math-Ph, Physics:nlin, q-Bio], June.
Dawid, A. P. 2001. Annals of Mathematics and Artificial Intelligence 32 (1-4): 335–72.
Dawid, A. Philip. 1979. Journal of the Royal Statistical Society. Series B (Methodological) 41 (1): 1–31.
———. 1980. The Annals of Statistics 8 (3): 598–617.
De Luna, Xavier, Ingeborg Waernbaum, and Thomas S. Richardson. 2011. Biometrika, October, asr041.
Edwards, David, and Smitha Ankinakatte. 2015. Statistical Modelling 15 (4): 301–25.
Fixx, James F. 1977. Games for the superintelligent. London: Muller.
Forbes, F., and N. Peyrard. 2003. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (9): 1089–1101.
Frey, B.J., and Nebojsa Jojic. 2005. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9): 1392–1416.
Frey, Brendan J. 2003. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 257–64. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Fridman, Arthur. 2003. Proceedings of the National Academy of Sciences 100 (14): 8092–96.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. Biostatistics 9 (3): 432–41.
Friel, Nial, and Håvard Rue. 2007. Biometrika 94 (3): 661–72.
Geyer, Charles J. 1991.
Geyer, Charles J., and Jesper Møller. 1994. Scandinavian Journal of Statistics, 359–73.
Goldberg, David A. 2013. arXiv:1301.1762 [Math-Ph], January.
Grenander, Ulf. 1989. The Annals of Statistics 17 (1): 1–30.
Griffeath, David. 1976. In Denumerable Markov Chains, 425–58. Graduate Texts in Mathematics 40. Springer New York.
Gu, Jiaying, Fei Fu, and Qing Zhou. 2014. arXiv:1403.2310 [Stat], March.
Häggström, Olle, Marie-Colette N. M. van Lieshout, and Jesper Møller. 1999. Bernoulli 5 (4): 641–58.
Heckerman, David, David Maxwell Chickering, Christopher Meek, Robert Rounthwaite, and Carl Kadie. 2000. Journal of Machine Learning Research 1 (Oct): 49–75.
Jensen, Jens Ledet, and Jesper Møller. 1991. The Annals of Applied Probability 1 (3): 445–61.
Jordan, Michael I. 2004. Statistical Science 19 (1): 140–55.
Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. Machine Learning 37 (2): 183–233.
Jordan, Michael Irwin. 1999. Learning in Graphical Models. Cambridge, Mass.: MIT Press.
Jordan, Michael I., and Yair Weiss. 2002a. The Handbook of Brain Theory and Neural Networks, 490–96.
———. 2002b. Handbook of Neural Networks and Brain Theory.
Kalisch, Markus, and Peter Bühlmann. 2007. Journal of Machine Learning Research 8 (May): 613–36.
Kindermann, Ross P., and J. Laurie Snell. 1980. The Journal of Mathematical Sociology 7 (1): 1–13.
Kindermann, Ross, and J. Laurie Snell. 1980. Markov Random Fields and Their Applications. Vol. 1. Contemporary Mathematics. Providence, Rhode Island: American Mathematical Society.
Kjærulff, Uffe B., and Anders L. Madsen. 2008. Bayesian Networks and Influence Diagrams. Information Science and Statistics. New York, NY: Springer New York.
Koller, Daphne, and Nir Friedman. 2009. Probabilistic Graphical Models : Principles and Techniques. Cambridge, MA: MIT Press.
Krämer, Nicole, Juliane Schäfer, and Anne-Laure Boulesteix. 2009. BMC Bioinformatics 10 (1): 384.
Krause, Andreas, and Carlos Guestrin. 2009. “Optimal Value of Information in Graphical Models.” J. Artif. Int. Res. 35 (1): 557–91.
Kschischang, F.R., B.J. Frey, and H.-A. Loeliger. 2001. IEEE Transactions on Information Theory 47 (2): 498–519.
Lauritzen, S. L., and D. J. Spiegelhalter. 1988. Journal of the Royal Statistical Society. Series B (Methodological) 50 (2): 157–224.
Lauritzen, Steffen L. 1996. Graphical Models. Oxford Statistical Science Series. Clarendon Press.
Lavrenko, Victor, and Jeremy Pickens. 2003a. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, 389. ACM Press.
———. 2003b. In Proceedings of the Eleventh ACM International Conference on Multimedia, 120. ACM Press.
LeCun, Yann, Sumit Chopra, Raia Hadsell, M. Ranzato, and F. Huang. 2006. In Predicting Structured Data.
Lederer, Johannes. 2016. arXiv:1609.05551 [Math, Stat], September.
Levine, Sergey. 2018. arXiv:1805.00909 [Cs, Stat], May.
Liu, Han, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. 2012a. arXiv:1206.6488 [Cs, Stat], June.
———. 2012b. The Annals of Statistics 40 (4): 2293–2326.
Liu, Han, Kathryn Roeder, and Larry Wasserman. 2010. In Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 1432–40. Curran Associates, Inc.
Loeliger, H.-A. 2004. IEEE Signal Processing Magazine 21 (1): 28–41.
Maathuis, Marloes H., and Diego Colombo. 2013. arXiv Preprint arXiv:1307.5636.
Maddage, Namunu C., Haizhou Li, and Mohan S. Kankanhalli. 2006. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 67. ACM Press.
Malioutov, Dmitry M., Jason K. Johnson, and Alan S. Willsky. 2006. Journal of Machine Learning Research 7 (October): 2031–64.
Mao, Yongyi, Frank R. Kschischang, and Brendan J. Frey. 2004. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 374–81. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Marbach, Daniel, Robert J. Prill, Thomas Schaffter, Claudio Mattiussi, Dario Floreano, and Gustavo Stolovitzky. 2010. Proceedings of the National Academy of Sciences 107 (14): 6286–91.
McCallum, Andrew. 2012. arXiv:1212.2504 [Cs, Stat], October.
Meinshausen, Nicolai, and Peter Bühlmann. 2006. The Annals of Statistics 34 (3): 1436–62.
Mihalkova, Lilyana, and Raymond J. Mooney. 2007. In Proceedings of the 24th International Conference on Machine Learning, 625–32. ACM.
Mohan, Karthika, and Judea Pearl. 2018. In International Conference on Probabilistic Graphical Models, 284–95.
Montanari, Andrea. 2011.
Morgan, Jonathan Scott, Iman Barjasteh, Cliff Lampe, and Hayder Radha. 2014. arXiv:1412.1185 [Physics], December.
Murphy, Kevin P. 2012. Machine learning: a probabilistic perspective. 1 edition. Adaptive computation and machine learning series. Cambridge, MA: MIT Press.
Obermeyer, Fritz, Eli Bingham, Martin Jankowiak, Du Phan, and Jonathan P. Chen. 2020. arXiv:1910.10775 [Cs, Stat], March.
Osokin, A., D. Vetrov, and V. Kolmogorov. 2011. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1889–96.
Pearl, Judea. 1982. In Proceedings of the Second AAAI Conference on Artificial Intelligence, 133–36. AAAI’82. Pittsburgh, Pennsylvania: AAAI Press.
———. 1986. Artificial Intelligence 29 (3): 241–88.
———. 2008. Probabilistic reasoning in intelligent systems: networks of plausible inference. Rev. 2. print., 12. [Dr.]. The Morgan Kaufmann series in representation and reasoning. San Francisco, Calif: Kaufmann.
———. 2009. Causality: Models, Reasoning and Inference. Cambridge University Press.
Pearl, Judea, Dan Geiger, and Thomas Verma. 1989. Kybernetika 25 (7): 33–44.
Pereda, E, R Q Quiroga, and J Bhattacharya. 2005. “Nonlinear Multivariate Analysis of Neurophysiological Signals.” Progress in Neurobiology 77 (1-2): 1–37.
Pickens, Jeremy, and Costas S. Iliopoulos. 2005. In ISMIR, 207–14. Citeseer.
Pollard, Dave. 2004. “Hammersley-Clifford Theorem for Markov Random Fields.”
Rabbat, Michael G., Mário A. T. Figueiredo, and Robert D. Nowak. 2008. IEEE Transactions on Information Theory 54 (9): 4053–68.
Ranzato, M. 2013. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9): 2206–22.
Ravikumar, Pradeep, Martin J. Wainwright, and John D. Lafferty. 2010. The Annals of Statistics 38 (3): 1287–1319.
Reeves, R., and A. N. Pettitt. 2004. Biometrika 91 (3): 751–57.
Richardson, Matthew, and Pedro Domingos. 2006. Machine Learning 62 (1-2): 107–36.
Ripley, B. D., and F. P. Kelly. 1977. Journal of the London Mathematical Society s2-15 (1): 188–92.
Sadeghi, Kayvan. 2020. Electronic Journal of Statistics 14 (2): 2773–97.
Schmidt, Mark W., and Kevin P. Murphy. 2010. In International Conference on Artificial Intelligence and Statistics, 709–16.
Shachter, Ross D. 1998. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 480–87. UAI’98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Shalizi, Cosma Rohilla, and Edward McFowland III. 2016. arXiv:1607.06565 [Physics, Stat], July.
Smith, David A., and Jason Eisner. 2008. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 145–56. Association for Computational Linguistics.
Spirtes, Peter, Clark Glymour, and Richard Scheines. 2001. Causation, Prediction, and Search. Second Edition. Adaptive Computation and Machine Learning. The MIT Press.
Studený, Milan. 1997. International Journal of Approximate Reasoning, Uncertainty in AI (UAI’96) Conference, 17 (2–3): 265–93.
———. 2005. Probabilistic Conditional Independence Structures. Information Science and Statistics. London: Springer.
Studený, Milan, and Jiřina Vejnarová. 1998. “On Multiinformation Function as a Tool for Measuring Stochastic Dependence.” In Learning in Graphical Models, 261–97. Cambridge, Mass.: MIT Press.
Su, Ri-Qi, Wen-Xu Wang, and Ying-Cheng Lai. 2012. Phys. Rev. E 85 (6): 065201.
Sutton, Charles, and Andrew McCallum. 2010. arXiv:1011.4088, November.
Tansey, Wesley, Oscar Hernan Madrid Padilla, Arun Sai Suggala, and Pradeep Ravikumar. 2015. In Journal of Machine Learning Research, 684–92.
Vetrov, Dmitry, and Anton Osokin. 2011. In NIPS Workshop on Discrete Optimization in Machine Learning (DISCML NIPS).
Visweswaran, Shyam, and Gregory F. Cooper. 2014. arXiv:1407.2483 [Cs, Stat], July.
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.
Wainwright, Martin, and Michael I Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing. Vol. 155. MIT Press.
Wang, Chaohui, Nikos Komodakis, and Nikos Paragios. 2013. Computer Vision and Image Understanding 117 (11): 1610–27.
Wasserman, Larry, Mladen Kolar, and Alessandro Rinaldo. 2013. arXiv:1309.6933 [Cs, Math, Stat], September.
Weiss, Yair. 2000. Neural Computation 12 (1): 1–41.
Weiss, Yair, and William T. Freeman. 2001. Neural Computation 13 (10): 2173–2200.
Winn, John M., and Christopher M. Bishop. 2005. In Journal of Machine Learning Research, 661–94.
Wright, Sewall. 1934. The Annals of Mathematical Statistics 5 (3): 161–215.
Wu, Rui, R. Srikant, and Jian Ni. 2013. Stochastic Systems 3 (2): 362–404.
Yedidia, Jonathan S., W.T. Freeman, and Y. Weiss. 2005. IEEE Transactions on Information Theory 51 (7): 2282–312.
Yedidia, J.S., W.T. Freeman, and Y. Weiss. 2003. In Exploring Artificial Intelligence in the New Millennium, edited by G. Lakemeyer and B. Nebel, 239–69. Morgan Kaufmann Publishers.
Zhang, Kun, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. arXiv:1202.3775 [Cs, Stat], February.
Zhou, Mingyuan, Yulai Cong, and Bo Chen. 2017. “Augmentable Gamma Belief Networks,” 44.
