Gaussian belief propagation
Least squares at maximal elaboration
November 25, 2014 — September 1, 2022
A particularly tractable model assumption for message-passing inference, which generalizes the classic Kalman filter and Bayesian linear regression with a Gaussian prior, or, in a frequentist setting, least-squares regression. Essentially, we regard the various nodes in our system as jointly Gaussian RVs with given prior mean and covariance (i.e. we do not allow the variances themselves to be random, as a Gaussian is not a valid prior for a variance). This works well because the Gaussian distribution is a natural exponential family, self-conjugate, and closed under affine transformations, so the message-passing updates are simple linear algebra operations.
How does this relate to least squares? Bickson (2009) frames GaBP as a distributed iterative solver for the linear system $Ax=b$ induced by the joint density $p(x)\propto\exp\bigl(-\tfrac{1}{2}x^{\top}Ax+b^{\top}x\bigr)$: the posterior mean is exactly the solution of that system, and hence of the corresponding least-squares problem.
GBP comes with various conveniences. For example, we can implement it mechanically, without knowing the structure of the graphical model in advance, and we can execute it concurrently, since the algorithm is robust to the order in which messages are scheduled. This seems to be a happy coincidence of several features of the multivariate Gaussian: it is self-conjugate, so messages pass through factor and variable nodes in commensurable form, and all the required operations reduce to simple linear algebra.
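To make that concrete, here is a minimal numpy sketch (my own toy, loosely following the scalar GaBP recipe in Bickson 2009; function and variable names are mine) that recovers the mean of $p(x)\propto\exp(-\tfrac{1}{2}x^{\top}Ax+b^{\top}x)$, i.e. the solution of $Ax=b$, by passing scalar canonical-form messages under a fully synchronous schedule:

```python
import numpy as np

def gabp(A, b, iters=50):
    """Scalar Gaussian BP for the mean of p(x) ∝ exp(-x'Ax/2 + b'x),
    i.e. the solution of A x = b, with A symmetric (and, for convergence,
    e.g. diagonally dominant). Messages are canonical-form pairs:
    P[i, j] = precision and h[i, j] = information of the message i -> j."""
    n = len(b)
    P = np.zeros((n, n))
    h = np.zeros((n, n))
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and A[i, j] != 0]
    for _ in range(iters):
        newP, newh = np.zeros_like(P), np.zeros_like(h)
        for i, j in edges:  # synchronous schedule: all messages use the old values
            alpha = A[i, i] + P[:, i].sum() - P[j, i]  # precision gathered at i, excluding j
            beta = b[i] + h[:, i].sum() - h[j, i]      # information gathered at i, excluding j
            newP[i, j] = -A[i, j] ** 2 / alpha
            newh[i, j] = -A[i, j] * beta / alpha
        P, h = newP, newh
    precision = np.diag(A) + P.sum(axis=0)   # marginal precisions
    return (b + h.sum(axis=0)) / precision   # marginal means = solution of A x = b

# diagonally dominant test system: GaBP means match the direct solve
A = np.array([[3.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
print(gabp(A, b), np.linalg.solve(A, b))
```

Note that the schedule really is arbitrary here: the same updates can be run edge by edge, in random order, or in parallel, which is the concurrency point made above.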
NB: notation varies across the literature and can be confusing. In the original paper (Weiss and Freeman 2001) the model is presented as a pairwise linear model rather than as a classic graphical model, which is a useful perspective but can also trip up a first reading.
A famous interactive tutorial is gaussianbp.github.io, reprising Davison and Ortiz (2019). Loeliger et al. (2007) is also nice; it includes explicit accounting for Fourier transforms and arbitrary matrix products.
1 Parameterization
Gaussian BP is often conducted using the canonical parameterization, in which a particular node prior $\mathcal{N}(x;\mu,\Sigma)$ is represented by its precision matrix $\Lambda=\Sigma^{-1}$ and information vector $\eta=\Sigma^{-1}\mu$, so that $p(x)\propto\exp\left(-\tfrac{1}{2}x^{\top}\Lambda x+\eta^{\top}x\right)$.
Davison and Ortiz (2019) recommend the “information parameterization” of Eustice, Singh, and Leonard (2006), which is the same canonical form: the belief over $x$ is stored as $(\eta,\Lambda)$ with $\Lambda=\Sigma^{-1}$ and $\eta=\Lambda\mu$, and the moments are recovered as $\Sigma=\Lambda^{-1}$, $\mu=\Lambda^{-1}\eta$ when needed.
Let us recap that in Gaussian tricks.
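A minimal numpy sketch of the bookkeeping (function names are mine, not from any library): converting between moment and canonical form, and the fact that multiplying Gaussian factors is just adding their $(\eta,\Lambda)$ parameters, which is what keeps the message updates cheap.

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Moment form (mu, Sigma) -> canonical/information form (eta, Lambda)."""
    Lam = np.linalg.inv(Sigma)
    return Lam @ mu, Lam

def to_moments(eta, Lam):
    """Canonical/information form (eta, Lambda) -> moment form (mu, Sigma)."""
    Sigma = np.linalg.inv(Lam)
    return Sigma @ eta, Sigma

def multiply_factors(factors):
    """Product of Gaussian factors = sum of their canonical parameters."""
    etas, Lams = zip(*factors)
    return sum(etas), sum(Lams)

# e.g. fusing two noisy observations of the same 2-d variable
f1 = to_canonical(np.array([0.0, 1.0]), np.eye(2) * 2.0)
f2 = to_canonical(np.array([1.0, 0.0]), np.eye(2) * 0.5)
mu_post, Sigma_post = to_moments(*multiply_factors([f1, f2]))
```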
2 Linearization
See GP noise transforms for a typical way to do it. In the information parameterization the rules look different. Suppose the expected value of the measurement is given by a non-linear map $h$, i.e. $z=h(x)+\epsilon$ with $\epsilon\sim\mathcal{N}(0,\Sigma_z)$. Linearizing about a current estimate $x_0$, $h(x)\approx h(x_0)+J(x-x_0)$ with Jacobian $J=\left.\partial h/\partial x\right|_{x_0}$, the measurement factor in information form has precision $\Lambda=J^{\top}\Sigma_z^{-1}J$ and information vector $\eta=J^{\top}\Sigma_z^{-1}\bigl(z-h(x_0)+Jx_0\bigr)$, which is the usual Gauss–Newton normal-equation contribution.
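A hedged numpy sketch of that rule (the helper name and the toy range measurement are mine): build the $(\eta,\Lambda)$ contribution of a nonlinear measurement by linearizing at the current estimate.

```python
import numpy as np

def linearized_factor(h, J, x0, z, Sigma_z):
    """Information-form (eta, Lambda) contribution of a measurement z = h(x) + noise,
    linearized about x0. h: measurement function, J: its Jacobian evaluated at x0."""
    W = np.linalg.inv(Sigma_z)       # measurement precision
    r = z - h(x0) + J @ x0           # shifted residual
    return J.T @ W @ r, J.T @ W @ J  # (eta, Lambda)

# toy example: a range measurement from the origin to a 2-d position
h = lambda x: np.array([np.hypot(x[0], x[1])])
def jac(x):
    r = np.hypot(x[0], x[1])
    return np.array([[x[0] / r, x[1] / r]])

x0 = np.array([1.0, 1.0])
eta, Lam = linearized_factor(h, jac(x0), x0, z=np.array([1.5]), Sigma_z=np.eye(1) * 0.01)
```

Re-linearizing as the estimate moves is what turns GBP into an iterative Gauss–Newton-style solver rather than a one-shot linear one.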
3 Variational inference
Intuition building. Note that the KL divergence between two multivariate Gaussians $\mathcal{N}_0(\mu_0,\Sigma_0)$ and $\mathcal{N}_1(\mu_1,\Sigma_1)$ on $\mathbb{R}^k$ is
$$D_{\mathrm{KL}}(\mathcal{N}_0\,\|\,\mathcal{N}_1)=\tfrac{1}{2}\left(\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)+(\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)-k+\ln\frac{\det\Sigma_1}{\det\Sigma_0}\right).$$
Suppose I have a Gaussian RV with some prior distribution, $x\sim\mathcal{N}(\mu_0,\Sigma_0)$.
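For reference, a small numpy implementation of that KL formula (function name is mine), handy for checking how far a GBP marginal sits from an exact or prior Gaussian:

```python
import numpy as np

def gaussian_kl(mu0, Sigma0, mu1, Sigma1):
    """KL( N(mu0, Sigma0) || N(mu1, Sigma1) ) for multivariate Gaussians."""
    k = mu0.shape[0]
    Sigma1_inv = np.linalg.inv(Sigma1)
    d = mu1 - mu0
    return 0.5 * (
        np.trace(Sigma1_inv @ Sigma0)
        + d @ Sigma1_inv @ d
        - k
        + np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma0))
    )

# sanity check: KL of a distribution with itself is zero
assert np.isclose(gaussian_kl(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)), 0.0)
```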
4 Parallelisation
El-Kurdi’s thesis (Y. M. El-Kurdi 2014) exploits an assortment of tricks to attain fast convergence in PDE-type models.
5 Robust
There are generalised belief propagations based on elliptical laws (Agarwal et al. 2013; Davison and Ortiz 2019), making this into a kind of elliptical belief propagation. In the factor-graph literature this usually appears as a robust factor or as dynamic covariance scaling. In practice it is done with robust losses such as Huber; I’m not sure why other elliptical distributions are unpopular.
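A rough sketch of the idea (my own simplification, not the exact scaling rule of any of the cited papers): downweight a factor’s canonical parameters when the current estimate is far from it, which is equivalent to inflating its covariance.

```python
import numpy as np

def huber_scaled_factor(eta, Lam, x0, k=1.345):
    """Downweight a Gaussian factor (eta, Lambda) when the current estimate x0
    sits far from the factor's mean, in the spirit of robust factors /
    dynamic covariance scaling. The weighting rule here is a simple
    Huber-style choice, not the exact scaling of any particular paper."""
    mu = np.linalg.solve(Lam, eta)   # factor mean
    r = x0 - mu
    M = np.sqrt(r @ Lam @ r)         # Mahalanobis distance of the residual
    w = 1.0 if M <= k else k / M     # Huber-style weight in (0, 1]
    return w * eta, w * Lam          # scaling precision by w = inflating covariance by 1/w

# an outlying estimate gets the factor's precision shrunk
eta, Lam = np.array([0.0]), np.array([[1.0]])
print(huber_scaled_factor(eta, Lam, x0=np.array([10.0])))
```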
6 Use in PDEs
Finite element models can be expressed as locally linear constraints and thus solved using GBP (Y. El-Kurdi et al. 2016; Y. M. El-Kurdi 2014; Y. El-Kurdi et al. 2015).
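As a toy illustration (mine, not from the El-Kurdi papers): a 1-D Poisson problem discretized with linear elements yields a sparse, locally coupled system $Ax=b$ in which each equation touches only a node and its neighbours, which is exactly the pairwise Gaussian structure that the GaBP sketch above can consume.

```python
import numpy as np

# 1-D Poisson: -u'' = f on (0, 1), u(0) = u(1) = 0, linear elements on a uniform mesh.
n = 9                   # interior nodes
h = 1.0 / (n + 1)
A = (np.diag(np.full(n, 2.0))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h   # stiffness matrix: tridiagonal, local coupling only
b = np.full(n, 1.0) * h                   # load vector for f = 1

# Each equation couples a node only to its two neighbours: a chain-structured
# Gaussian model with precision A and potential b, directly usable by GaBP.
u = np.linalg.solve(A, b)                 # reference solution for comparison
```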
7 Tools
7.1 jaxfg
7.2 gradslam
Differentiable GBP solver (Jatavallabhula, Iyer, and Paull 2020).
7.3 ceres solver
ceres-solver (C++), Google’s nonlinear least-squares solver, seems to solve this kind of problem. I am not sure where the covariance matrices go in. I occasionally see mention of “CUDA” in the source repo, so maybe it exploits GPUs these days.