Variational inference by message-passing in graphical models

Variational inference in models that factorize over some graphical independence structure, which buys us cheap, distributed inference. I am currently particularly interested in this for latent GP models. Many things can be expressed as message-passing algorithms. The grandparent idea in this unification seems to be “belief propagation”, a.k.a. “sum-product message-passing”, credited to Pearl (Pearl 1986) for DAGs and then generalised to MRFs, PGMs, factor graphs etc. This definition subsumes such diverse algorithms as Viterbi and Baum-Welch, among others, and covers more or less any method that exploits the graphical conditional-independence structure to compute a big statistical model by local operations. There are many overviews (Tom P. Minka 2005; H.-A. Loeliger 2004; Yedidia, Freeman, and Weiss 2003; Sutton and Minka 2006; Wand 2017; Cox, van de Laar, and de Vries 2019). Dustin Tran does a good one discussing (Wand 2017).
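The local-computation idea is small enough to fit in a few lines. Here is a minimal sketch of sum-product on a three-variable chain (the potentials are arbitrary positive tables, invented for illustration): forward and backward messages recover the exact marginals, which we can verify against brute-force enumeration.

```python
import numpy as np

# Sum-product on a 3-variable chain x0 - x1 - x2 (binary states).
rng = np.random.default_rng(0)
phi = [rng.random(2) + 0.1 for _ in range(3)]        # unary potentials phi_i(x_i)
psi = [rng.random((2, 2)) + 0.1 for _ in range(2)]   # pairwise potentials psi_i(x_i, x_{i+1})

# fwd[i] is the message flowing left-to-right into node i; bwd[i] right-to-left.
fwd = [np.ones(2) for _ in range(3)]
bwd = [np.ones(2) for _ in range(3)]
for i in range(1, 3):
    fwd[i] = psi[i - 1].T @ (phi[i - 1] * fwd[i - 1])   # sum out x_{i-1}
for i in range(1, -1, -1):
    bwd[i] = psi[i] @ (phi[i + 1] * bwd[i + 1])         # sum out x_{i+1}

# Node marginal: local potential times both incoming messages, normalized.
marginals = []
for i in range(3):
    b = phi[i] * fwd[i] * bwd[i]
    marginals.append(b / b.sum())

# Check against brute-force enumeration of all 2^3 configurations.
joint = np.einsum('a,b,c,ab,bc->abc', phi[0], phi[1], phi[2], psi[0], psi[1])
joint /= joint.sum()
exact = [joint.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]
```

On a tree the messages are exact, and the total cost is linear in the number of nodes instead of exponential in it.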

Advice from (Tom P. Minka 2005):

The recipe to make a message-passing algorithm has four steps:

  • Pick an approximating family for q to be chosen from. For example, the set of fully-factorized distributions, the set of Gaussians, the set of k-component mixtures, etc.
  • Pick a divergence measure to minimize. For example, mean-field methods minimize the Kullback-Leibler divergence \(KL(q \| p)\), expectation propagation minimizes \(KL(p \| q)\), and power EP minimizes the α-divergence, \(D_\alpha(p \| q)\).
  • Construct an optimization algorithm for the chosen divergence measure and approximating family. Usually this is a fixed-point iteration obtained by setting the gradients to zero.
  • Distribute the optimization across the network, by dividing the network p into factors, and minimizing local divergence at each factor.
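As a toy instance of steps 1–3 of the recipe: take a fully-factorized (mean-field) approximating family, the divergence \(KL(q \| p)\), and a correlated bivariate Gaussian target; setting the gradients to zero yields the classic coordinate fixed-point updates (this is the standard textbook example, not code from the cited papers). Each factor \(q_i\) is Gaussian with precision \(\Lambda_{ii}\), and we iterate the means to convergence.

```python
import numpy as np

# Mean-field VB for a bivariate Gaussian target p = N(mu, Lambda^{-1}),
# with q(x) = q1(x1) q2(x2). Zeroing the KL(q||p) gradient gives
# fixed-point updates for the factor means; each factor variance is
# 1 / Lambda_ii.
mu = np.array([1.0, -2.0])
Lam = np.array([[2.0, 0.9], [0.9, 1.5]])  # precision matrix (positive definite)

m = np.zeros(2)  # current mean-field means
for _ in range(100):
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

Sigma = np.linalg.inv(Lam)  # true covariance, for comparison
```

The fixed point recovers the exact posterior mean, but the factor variance \(1/\Lambda_{ii}\) underestimates the true marginal variance \(\Sigma_{ii}\), the well-known compactness bias of minimizing \(KL(q \| p)\).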

There is an overview lecture by Thomas Orton, which connects this with the statistical mechanics of statistics:

Last week, we saw how certain computational problems like 3SAT exhibit a thresholding behavior, similar to a phase transition in a physical system. In this post, we’ll continue to look at this phenomenon by exploring a heuristic method, belief propagation (and the cavity method), which has been used to make hardness conjectures, and also has thresholding properties. In particular, we’ll start by looking at belief propagation for approximate inference on sparse graphs as a purely computational problem. After doing this, we’ll switch perspectives and see belief propagation motivated in terms of Gibbs free energy minimization for physical systems.
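To make the loopy case concrete, here is a toy sketch (potentials invented for illustration) of the same sum-product updates iterated on a graph with a cycle. The fixed point now corresponds to a stationary point of the Bethe free energy and only approximates the true marginals.

```python
import numpy as np

# Loopy BP on a 3-node cycle of binary variables with a shared edge
# potential and a biased unary field on node 0.
J = 0.3
psi = np.exp(J * np.array([[1.0, -1.0], [-1.0, 1.0]]))  # edge potential
phi = [np.array([0.7, 0.3]), np.ones(2), np.ones(2)]    # unary potentials

edges = [(0, 1), (1, 2), (2, 0)]
directed = edges + [(j, i) for i, j in edges]
msg = {e: np.ones(2) for e in directed}

for _ in range(100):  # parallel message updates
    new = {}
    for (i, j) in directed:
        prod = phi[i].copy()
        for (k, l) in directed:
            if l == i and k != j:       # incoming messages except from j
                prod = prod * msg[(k, i)]
        m = psi.T @ prod                # sum out x_i (psi is symmetric)
        new[(i, j)] = m / m.sum()
    msg = new

beliefs = []
for i in range(3):
    b = phi[i].copy()
    for (k, l) in directed:
        if l == i:
            b = b * msg[(k, i)]
    beliefs.append(b / b.sum())

# Exact marginals by brute-force enumeration of the 2^3 states.
joint = np.einsum('a,b,c,ab,bc,ca->abc', phi[0], phi[1], phi[2], psi, psi, psi)
joint /= joint.sum()
exact = [joint.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]
```

With this weak coupling the beliefs land close to the exact marginals; as the coupling strengthens, the loop correction grows and eventually the updates can fail to converge, which is where the phase-transition picture above kicks in.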

Interesting projects in this vein:

  • GAMP (generalized approximate message passing).

  • ForneyLab looks especially useful for me:

    The message passing paradigm offers a convenient method for leveraging model-specific structures, while remaining generally applicable. Message passing can be conveniently formulated on a Forney-style factor graph (FFG) representation of the model [2]. Inference tasks on the model can then be decomposed in local computations, represented by messages that flow across the graph. This locality allows for storing pre-computed message updates in a look-up table that can be re-used across models. Automated algorithm construction then amounts to scheduling these messages in the order required by the inference task (see also this conference paper at JuliaCon).

    ForneyLab (GitHub) is introduced in this paper as a novel Julia package that allows the user to specify a probabilistic model as an FFG and pose inference problems on this FFG. In return, ForneyLab automatically constructs a Julia program that executes a message passing-based (approximate) inference procedure. ForneyLab is designed with a focus on flexibility, extensibility and applicability to biologically plausible models for perception and decision making, such as the hierarchical Gaussian filter (HGF). With ForneyLab, the search for better models for perception and action can be accelerated.
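The look-up-table-plus-scheduling idea is simple to caricature. Here is a hypothetical miniature in Python (all names and tables invented; ForneyLab itself is a Julia package with its own API): message update rules are stored per factor type, and executing an inference query just means running the rules in topological order toward the query variable.

```python
import numpy as np

# A miniature of the lookup-table idea. RULES maps a factor type to its
# message update; "compiling" a query amounts to scheduling these updates.
RULES = {
    # factor type -> function(incoming message) -> outgoing message
    "prior":   lambda _: np.array([0.6, 0.4]),                      # p(x)
    "channel": lambda m: np.array([[0.9, 0.1], [0.2, 0.8]]).T @ m,  # p(y|x)
}

# A two-factor chain: prior -> x -> channel -> y. The schedule is the
# topological order of messages needed to marginalize y.
schedule = ["prior", "channel"]

msg = None
for factor in schedule:
    msg = RULES[factor](msg)

marginal_y = msg / msg.sum()  # [0.62, 0.38]
```

The point of the table is re-use: the same "channel" rule serves every model containing that factor type, and only the schedule changes per query.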

This idea has been baked into various probabilistic programming frameworks by now.


Aji, S.M., and R.J. McEliece. 2000. “The Generalized Distributive Law.” IEEE Transactions on Information Theory 46 (2): 325–43.
Barbier, Jean. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory.” arXiv:1511.01650 [cs, Math], November.
Barbier, Jean, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. 2017. “Phase Transitions, Optimal Errors and Optimality of Message-Passing in Generalized Linear Models.” arXiv:1708.03395 [cond-Mat, Physics:math-Ph], August.
Bayati, Mohsen, and Andrea Montanari. 2011. “The Dynamics of Message Passing on Dense Graphs, with Applications to Compressed Sensing.” IEEE Transactions on Information Theory 57 (2): 764–85.
Blake, Andrew, Pushmeet Kohli, and Carsten Rother, eds. 2011. Markov Random Fields for Vision and Image Processing. Cambridge, Mass: MIT Press.
Borgerding, Mark, and Philip Schniter. 2016. “Onsager-Corrected Deep Networks for Sparse Linear Inverse Problems.” arXiv:1612.01183 [cs, Math], December.
Cevher, Volkan, Marco F. Duarte, Chinmay Hegde, and Richard Baraniuk. 2009. “Sparse Signal Recovery Using Markov Random Fields.” In Advances in Neural Information Processing Systems, 257–64. Curran Associates, Inc.
Cox, Marco, Thijs van de Laar, and Bert de Vries. 2019. “A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms.” International Journal of Approximate Reasoning 104 (January): 185–204.
Dehaene, Guillaume P. 2016. “Expectation Propagation Performs a Smoothed Gradient Descent.” arXiv:1612.05053 [stat], December.
Donoho, David L., A. Maleki, and A. Montanari. 2010. “Message Passing Algorithms for Compressed Sensing: I. Motivation and Construction.” In 2010 IEEE Information Theory Workshop (ITW), 1–5.
Donoho, David L., Arian Maleki, and Andrea Montanari. 2009. “Message-Passing Algorithms for Compressed Sensing.” Proceedings of the National Academy of Sciences 106 (45): 18914–19.
———. 2010. “Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation.” In 2010 IEEE Information Theory Workshop (ITW), 1–5.
Donoho, David L., and Andrea Montanari. 2013. “High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing.” arXiv:1310.7320 [cs, Math, Stat], October.
Forney, G.D. 2001. “Codes on Graphs: Normal Realizations.” IEEE Transactions on Information Theory 47 (2): 520–48.
Frey, B.J., and Nebojsa Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9): 1392–1416.
Gemp, Ian, Brian McWilliams, Claire Vernade, and Thore Graepel. 2020. “EigenGame: PCA as a Nash Equilibrium.”
Jaggi, Martin, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, and Michael I Jordan. 2014. “Communication-Efficient Distributed Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3068–76. Curran Associates, Inc.
Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. “An Introduction to Variational Methods for Graphical Models.” Machine Learning 37 (2): 183–233.
Kirkley, Alec, George T. Cantwell, and M. E. J. Newman. 2020. “Message Passing for Probabilistic Models on Networks with Loops.” arXiv:2009.12246 [cond-Mat], September.
Kschischang, F.R., B.J. Frey, and H.-A. Loeliger. 2001. “Factor Graphs and the Sum-Product Algorithm.” IEEE Transactions on Information Theory 47 (2): 498–519.
Laar, Thijs van de, Marco Cox, Ismail Senoz, Ivan Bocharov, and Bert de Vries. n.d. “ForneyLab: A Toolbox for Biologically Plausible Free Energy Minimization in Dynamic Neural Models.”
Loeliger, H.-A. 2004. “An Introduction to Factor Graphs.” IEEE Signal Processing Magazine 21 (1): 28–41.
Loeliger, Hans-Andrea, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R. Kschischang. 2007. “The Factor Graph Approach to Model-Based Signal Processing.” Proceedings of the IEEE 95 (6): 1295–1322.
Ma, Chenxin, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, and Martin Takáč. 2015. “Adding Vs. Averaging in Distributed Primal-Dual Optimization.” arXiv:1502.03508 [cs], February.
Malioutov, Dmitry M., Jason K. Johnson, and Alan S. Willsky. 2006. “Walk-Sums and Belief Propagation in Gaussian Graphical Models.” Journal of Machine Learning Research 7 (October): 2031–64.
Minka, Thomas P. 2001. “Expectation Propagation for Approximate Bayesian Inference.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 362–69. UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Minka, Tom P. 2005. “Divergence Measures and Message Passing.” Technical report, Microsoft Research.
———. 2008. “EP: A Quick Reference.” Technical report.
Montanari, Andrea. 2012. “Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications, 394–438.
Murphy, Kevin P. 2012. Machine learning: a probabilistic perspective. 1 edition. Adaptive computation and machine learning series. Cambridge, MA: MIT Press.
Nguyen, Trung V., and Edwin V. Bonilla. 2014. “Automated Variational Inference for Gaussian Process Models.” In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, 1404–12. NIPS’14. Cambridge, MA, USA: MIT Press.
Pearl, Judea. 1986. “Fusion, Propagation, and Structuring in Belief Networks.” Artificial Intelligence 29 (3): 241–88.
Peleg, Tomer, Yonina C. Eldar, and Michael Elad. 2010. “Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery.” IEEE Transactions on Signal Processing 60 (5): 2286–2303.
Rajaei, Boshra, Sylvain Gigan, Florent Krzakala, and Laurent Daudet. 2017. “Robust Phase Retrieval with the Swept Approximate Message Passing (prSAMP) Algorithm.” Image Processing On Line 7 (January): 43–55.
Roychowdhury, Anirban, and Brian Kulis. 2015. “Gamma Processes, Stick-Breaking, and Variational Inference.” In Artificial Intelligence and Statistics, 800–808. PMLR.
Schniter, P., and S. Rangan. 2012. “Compressive Phase Retrieval via Generalized Approximate Message Passing.” In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 815–22.
Smith, David A., and Jason Eisner. 2008. “Dependency Parsing by Belief Propagation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 145–56. Association for Computational Linguistics.
Sutton, Charles, and Tom P Minka. 2006. “Local Training and Belief Propagation.” Technical Report TR-2006-121, Microsoft Research.
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.
Wainwright, Martin, and Michael I Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing. Vol. 155. MIT Press.
Wand, M. P. 2017. “Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing.” Journal of the American Statistical Association 112 (517): 137–68.
Welling, Max, Tom P Minka, and Yee Whye Teh. 2012. “Structured Region Graphs: Morphing EP into GBP.” arXiv:1207.1426 [cs], July.
Winn, John M., and Christopher M. Bishop. 2005. “Variational Message Passing.” Journal of Machine Learning Research 6: 661–94.
Xing, Eric P., Michael I. Jordan, and Stuart Russell. 2003. “A Generalized Mean Field Algorithm for Variational Inference in Exponential Families.” In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 583–91. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Yedidia, J.S., W.T. Freeman, and Y. Weiss. 2003. “Understanding Belief Propagation and Its Generalizations.” In Exploring Artificial Intelligence in the New Millennium, edited by G. Lakemeyer and B. Nebel, 239–69. Morgan Kaufmann Publishers.
Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98.
Yuille, Alan. 2011. “Loopy Belief Propagation, Mean Field Theory and Bethe Approximations.” In Markov Random Fields for Vision and Image Processing, edited by Andrew Blake, Pushmeet Kohli, and Carsten Rother. Cambridge, Mass: MIT Press.
