Gaussian belief propagation

Least squares at maximal elaboration



A particularly tractable model assumption for message-passing inference, which generalises classic Gaussian Kalman filters and Bayesian linear regression with a Gaussian prior, or, in a frequentist setting, [least squares regression](./least_squares.html). Essentially, we regard the various nodes in our system as jointly Gaussian RVs with given prior mean and covariance (i.e. we do not allow the variances themselves to be random; a Gaussian is not a valid prior for a variance).

How does this relate to least squares? Bickson (2009) puts it elegantly (assuming \(\mathbf{A}\) is symmetric positive definite): \[\begin{array}{c} \mathbf{A} \mathbf{x}=\mathbf{b}\\ \Downarrow\\ \mathbf{A x}-\mathbf{b}=0\\ \Downarrow\\ \min _{\mathbf{x}}\left(\frac{1}{2} \mathbf{x}^{T} \mathbf{A} \mathbf{x}-\mathbf{b}^{T} \mathbf{x}\right)\\ \Downarrow\\ \max _{\mathbf{x}}\left(-\frac{1}{2} \mathbf{x}^{T} \mathbf{A} \mathbf{x}+\mathbf{b}^{T} \mathbf{x}\right)\\ \Downarrow\\ \max _{\mathbf{x}} \exp \left(-\frac{1}{2} \mathbf{x}^{T} \mathbf{A} \mathbf{x}+\mathbf{b}^{T} \mathbf{x}\right) \end{array}\]
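A minimal numerical check of that chain (my own toy example, assuming \(\mathbf{A}\) symmetric positive definite; `scipy` is only used as a convenient generic minimiser):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite
b = rng.standard_normal(4)

# Solving the linear system directly...
x_solve = np.linalg.solve(A, b)

# ...coincides with minimising the quadratic 1/2 x^T A x - b^T x,
# i.e. maximising the (unnormalised) Gaussian exponent.
quadratic = lambda x: 0.5 * x @ A @ x - b @ x
x_min = minimize(quadratic, np.zeros(4)).x

print(np.allclose(x_solve, x_min, atol=1e-5))  # True
```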

GBP comes with various conveniences. For example, we can implement it mechanically, without knowing the structure of the graphical model in advance, and we can execute it concurrently, since the algorithm is robust to random execution schedules. This seems to be a happy coincidence of several features of multivariate Gaussians: the Gaussian is self-conjugate and thus passes through the factor and variable nodes in commensurable form, and all the required operations reduce to simple linear algebra.
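To make the “mechanical” claim concrete, here is a minimal synchronous scalar GaBP solver in numpy, following the standard update rules from Bickson (2009) in information form. The function and variable names are my own, not from any particular library, and convergence is only guaranteed in special cases (e.g. diagonally dominant or walk-summable systems):

```python
import numpy as np

def gabp(A, b, iters=50):
    """Synchronous scalar Gaussian BP for A x = b (A symmetric).
    Message i -> j carries a precision P[i, j] and information h[i, j]."""
    n = len(b)
    P = np.zeros((n, n))
    h = np.zeros((n, n))
    neighbours = [np.flatnonzero((A[i] != 0) & (np.arange(n) != i)) for i in range(n)]
    for _ in range(iters):
        P_new, h_new = np.zeros_like(P), np.zeros_like(h)
        for i in range(n):
            for j in neighbours[i]:
                # everything node i believes, minus what j told it
                P_i = A[i, i] + P[:, i].sum() - P[j, i]
                h_i = b[i] + h[:, i].sum() - h[j, i]
                P_new[i, j] = -A[i, j] ** 2 / P_i
                h_new[i, j] = -A[i, j] * h_i / P_i
        P, h = P_new, h_new
    # posterior marginal precision and mean per node
    P_post = A.diagonal() + P.sum(axis=0)
    mu_post = (b + h.sum(axis=0)) / P_post
    return mu_post, P_post

# a diagonally dominant test system: the GaBP means match the direct solve
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2
np.fill_diagonal(A, np.abs(A).sum(axis=1) + 1.0)
b = rng.standard_normal(5)
mu, _ = gabp(A, b)
print(np.allclose(mu, np.linalg.solve(A, b)))  # True
```

Note that the scheduling is embarrassingly parallel: each message update in a sweep reads only the previous sweep’s messages.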

NB: notation varies across the literature and can be confusing. In the original paper (Weiss and Freeman 2001) the variables are presented via pairwise linear models rather than as a classic graphical model, which is a good insight but takes some getting used to.

A famous interactive tutorial is gaussianbp.github.io, reprising Davison and Ortiz (2019). Loeliger et al. (2007) is also nice; it includes explicit accounting for Fourier transforms and arbitrary matrix products.

Parameterisation

Ortiz et al. (2021) summarise the common Gaussian parameterisations.

Gaussian BP is often conducted using a mean-and-precision parameterisation, where the prior at a particular node \(m\) may be written as: \[ p_{m}\left(\mathbf{x}_{m}\right)\propto e^{-\frac{1}{2}\left[\left(\mathbf{x}_{m}-\boldsymbol{\mu}_{m}\right)^{\top} \boldsymbol{\Lambda}_{m}\left(\mathbf{x}_{m}-\boldsymbol{\mu}_{m}\right)\right]} \] where \(\boldsymbol{\mu}_{m}\) is the mean of the distribution and \(\Lambda_{m}\) is its precision (i.e. inverse covariance).

Davison and Ortiz (2019) recommend the “information parameterisation” of Eustice, Singh, and Leonard (2006): \[ p_{m}\left(\mathbf{x}_{m}\right)\propto e^{-\frac{1}{2} \mathbf{x}_{m}^{\top} \Lambda_{m} \mathbf{x}_{m}+\boldsymbol{\eta}_{m}^{\top} \mathbf{x}_{m}} \] The information vector \(\boldsymbol{\eta}_{m}\) relates to the mean vector as \[ \boldsymbol{\eta}_{m}=\Lambda_{m} \boldsymbol{\mu}_{m}. \] The information form is convenient because multiplication of distributions (i.e. a Bayesian update) is handled simply by adding the information vectors and precision matrices.
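A small numpy illustration of that convenience (the names are mine): fusing two Gaussian factors over the same variable amounts to adding their \((\boldsymbol{\eta}, \Lambda)\) pairs.

```python
import numpy as np

def to_information(mu, Sigma):
    """Moment form (mu, Sigma) -> information form (eta, Lambda)."""
    Lam = np.linalg.inv(Sigma)
    return Lam @ mu, Lam

def to_moments(eta, Lam):
    """Information form (eta, Lambda) -> moment form (mu, Sigma)."""
    Sigma = np.linalg.inv(Lam)
    return Sigma @ eta, Sigma

# two Gaussian factors over the same 2-d variable
eta1, Lam1 = to_information(np.array([0.0, 1.0]), np.diag([1.0, 2.0]))
eta2, Lam2 = to_information(np.array([2.0, -1.0]), np.diag([0.5, 0.5]))

# their product (the Bayesian update) is componentwise addition
mu, Sigma = to_moments(eta1 + eta2, Lam1 + Lam2)
print(mu, Sigma)
```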

Linearisation

See GP noise transforms for a different take. In the current parameterisation, it works as follows.

Suppose the expected value of the measurement is given by a non-linear map \(h,\) in the sense that we observe Gaussian noise centered on \(h(\mathbf{x}_s)\). We define the associated Gaussian factor as follows: \[ f_{s}\left(\mathbf{x}_{s}\right)\propto e^{-\frac{1}{2}\left[\left(\mathbf{z}_{s}-\mathbf{h}_{s}\left(\mathbf{x}_{s}\right)\right)^{\top} \Lambda_{s}\left(\mathbf{z}_{s}-\mathbf{h}_{s}\left(\mathbf{x}_{s}\right)\right)\right]} \] We apply a first-order Taylor expansion of the non-linear measurement function \(\mathbf{h}_{s}\) to approximate its value for states \(\mathbf{x}_{s}\) close to \(\mathbf{x}_{0}\): \[ \mathbf{h}_{s}\left(\mathbf{x}_{s}\right) \approx \mathbf{h}_{s}\left(\mathbf{x}_{0}\right)+\mathrm{J}_{s}\left(\mathbf{x}_{s}-\mathbf{x}_{0}\right). \] Here \(\mathrm{J}_{s}\) is the Jacobian \(\left.\frac{\partial \mathbf{h}_{s}}{\partial \mathbf{x}_{s}}\right|_{\mathbf{x}_{s}=\mathbf{x}_{0}}.\) With a bit of rearrangement of quadratic forms, we can expand \(\mathbf{h}_{s}\left(\mathbf{x}_{s}\right)\) about \(\mathbf{x}_{0}\) and turn the factor into a Gaussian expressed in terms of \(\mathbf{x}_{s}\): a linear factor represented in information form by an information vector \(\boldsymbol{\eta}_{s}\) and precision matrix \(\Lambda_{s}^{\prime}\) calculated as follows: \[ \begin{aligned} \boldsymbol{\eta}_{s} &=\mathrm{J}_{s}^{\top} \Lambda_{s}\left[\mathrm{J}_{s} \mathbf{x}_{0}+\mathbf{z}_{s}-\mathbf{h}_{s}\left(\mathbf{x}_{0}\right)\right] \\ \Lambda_{s}^{\prime} &=\mathrm{J}_{s}^{\top} \Lambda_{s} \mathrm{J}_{s}. \end{aligned} \]
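A sketch of those two equations in numpy, with a toy range measurement and a finite-difference Jacobian (the function names and the example are mine):

```python
import numpy as np

def linearised_factor(h, z, Lam_meas, x0, eps=1e-6):
    """Linearise the measurement model h about x0 and return the Gaussian
    factor over x in information form (eta, Lambda'), per the equations above."""
    x0 = np.asarray(x0, dtype=float)
    h0 = np.atleast_1d(h(x0))
    # finite-difference Jacobian J = dh/dx at x0
    J = np.column_stack([
        (np.atleast_1d(h(x0 + eps * e)) - h0) / eps
        for e in np.eye(len(x0))
    ])
    eta = J.T @ Lam_meas @ (J @ x0 + z - h0)
    Lam_prime = J.T @ Lam_meas @ J
    return eta, Lam_prime

# toy example: a noisy range measurement of a 2-d point, taken from the origin
h = lambda x: np.array([np.hypot(x[0], x[1])])
z = np.array([5.1])
Lam_meas = np.array([[100.0]])  # measurement precision (1 / noise variance)
eta, Lam_prime = linearised_factor(h, z, Lam_meas, x0=[3.0, 4.0])
print(eta, Lam_prime)
```

Note that \(\Lambda_{s}^{\prime}\) is generally rank-deficient (here rank 1), which is fine: it is a factor, not a full posterior.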

Parallelisation

El-Kurdi’s thesis (Y. M. El-Kurdi 2014) exploits problem regularity and some other cunning tricks (which I have not yet read in detail) to attain fast convergence.

Use in PDEs

Finite element models can be expressed as locally-linear constraints and thus solved using GBP (Y. El-Kurdi et al. 2016; Y. M. El-Kurdi 2014; Y. El-Kurdi et al. 2015).
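For intuition, a linear-element discretisation of the 1-d Poisson equation yields a tridiagonal stiffness matrix, i.e. a chain-structured Gaussian model on which GBP is exact. A minimal setup (my own toy example; the `gabp()` sketch above would apply directly):

```python
import numpy as np

# 1-d Poisson: -u'' = 1 on (0, 1), u(0) = u(1) = 0, linear elements on a uniform mesh
n = 49                              # number of interior nodes
h = 1.0 / (n + 1)                   # element size
K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h   # tridiagonal stiffness matrix
f = np.full(n, h)                   # load vector for the constant source term
x = np.linspace(h, 1.0 - h, n)      # interior node locations

# the Gaussian model p(u) ∝ exp(-1/2 u^T K u + f^T u) has a chain factor graph,
# so GBP recovers the FEM solution u = K^{-1} f exactly
u = np.linalg.solve(K, f)
print(np.max(np.abs(u - x * (1.0 - x) / 2)) < 1e-8)  # matches the analytic solution at the nodes
```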

References

Barfoot, Timothy D. 2020. “Fundamental Linear Algebra Problem of Gaussian Inference.” arXiv:2010.08022 [Cs, Math, Stat], October.
Bickson, Danny. 2009. “Gaussian Belief Propagation: Theory and Application.” PhD thesis.
Bishop, Adrian N., and Arnaud Doucet. 2014. “Distributed Nonlinear Consensus in the Space of Probability Measures.” IFAC Proceedings Volumes, 19th IFAC World Congress, 47 (3): 8662–68.
Cattivelli, Federico S., Cassio G. Lopes, and Ali H. Sayed. 2008. “Diffusion Recursive Least-Squares for Distributed Estimation over Adaptive Networks.” IEEE Transactions on Signal Processing 56 (5): 1865–77.
Cattivelli, Federico S., and Ali H. Sayed. 2009. “Diffusion LMS Strategies for Distributed Estimation.” IEEE Transactions on Signal Processing 58 (3): 1035–48.
———. 2010. “Diffusion Strategies for Distributed Kalman Filtering and Smoothing.” IEEE Transactions on Automatic Control 55 (9): 2069–84.
Chen, Yan, and Dean S. Oliver. 2013. “Levenberg–Marquardt Forms of the Iterative Ensemble Smoother for Efficient History Matching and Uncertainty Quantification.” Computational Geosciences 17 (4): 689–703.
Davison, Andrew J., and Joseph Ortiz. 2019. “FutureMapping 2: Gaussian Belief Propagation for Spatial AI.” arXiv:1910.14139 [Cs], October.
Donoho, David L., and Andrea Montanari. 2013. “High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing.” arXiv:1310.7320 [Cs, Math, Stat], October.
Du, Jian, Shaodan Ma, Yik-Chung Wu, Soummya Kar, and José M. F. Moura. 2018. “Convergence Analysis of Belief Propagation on Gaussian Graphical Models.” arXiv:1801.06430 [Cs, Math], January.
El-Kurdi, Yousef Malek. 2014. “Parallel Finite Element Processing Using Gaussian Belief Propagation Inference on Probabilistic Graphical Models.” PhD thesis, McGill University.
El-Kurdi, Yousef, Maryam Mehri Dehnavi, Warren J. Gross, and Dennis Giannacopoulos. 2015. “Parallel Finite Element Technique Using Gaussian Belief Propagation.” Computer Physics Communications 193 (August): 38–48.
El-Kurdi, Yousef, David Fernandez, Warren J. Gross, and Dennis D. Giannacopoulos. 2016. “Acceleration of the Finite-Element Gaussian Belief Propagation Solver Using Minimum Residual Techniques.” IEEE Transactions on Magnetics 52 (3): 1–4.
Eustice, Ryan M., Hanumant Singh, and John J. Leonard. 2006. “Exactly Sparse Delayed-State Filters for View-Based SLAM.” IEEE Transactions on Robotics 22 (6): 1100–1114.
Jatavallabhula, Krishna Murthy, Ganesh Iyer, and Liam Paull. 2020. “∇SLAM: Dense SLAM Meets Automatic Differentiation.” In 2020 IEEE International Conference on Robotics and Automation (ICRA), 2130–37. Paris, France: IEEE.
Kamper, Francois, and Sarel J. Steel. n.d. “On the Convergence of Gaussian Belief Propagation with Nodes of Arbitrary Size,” 37.
Loeliger, Hans-Andrea, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R. Kschischang. 2007. “The Factor Graph Approach to Model-Based Signal Processing.” Proceedings of the IEEE 95 (6): 1295–1322.
Malioutov, Dmitry M., Jason K. Johnson, and Alan S. Willsky. 2006. “Walk-Sums and Belief Propagation in Gaussian Graphical Models.” Journal of Machine Learning Research 7 (October): 2031–64.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. 1st ed. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press.
Nguyen, Trung V., and Edwin V. Bonilla. 2014. “Automated Variational Inference for Gaussian Process Models.” In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, 1404–12. NIPS’14. Cambridge, MA, USA: MIT Press.
Ortiz, Joseph, Talfan Evans, and Andrew J. Davison. 2021. “A Visual Introduction to Gaussian Belief Propagation.” arXiv:2107.02308 [Cs], July.
Sayed, Ali. 2014. “Adaptation, Learning, and Optimization over Networks.” Foundations and Trends® in Machine Learning 7 (4-5): 311–801.
Song, Le, Arthur Gretton, Danny Bickson, Yucheng Low, and Carlos Guestrin. n.d. “Kernel Belief Propagation,” 9.
Su, Qinliang, and Yik-Chung Wu. 2015. “On Convergence Conditions of Gaussian Belief Propagation.” IEEE Transactions on Signal Processing 63 (5): 1144–55.
Wang, Shengdi, and Armin Dekorsy. 2020. “A Factor Graph-Based Distributed Consensus Kalman Filter.” IEEE Signal Processing Letters 27: 2039–43.
Weiss, Yair, and William T. Freeman. 2001. “Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology.” Neural Computation 13 (10): 2173–2200.
Zivojevic, Dino, Muhamed Delalic, Darijo Raca, Dejan Vukobratovic, and Mirsad Cosovic. 2021. “Distributed Weighted Least-Squares and Gaussian Belief Propagation: An Integrated Approach.” Preprint.
