# Posterior Gaussian process samples by updating prior samples

Matheron’s other weird trick

December 3, 2020 — May 8, 2024

functional analysis
Gaussian
generative
Hilbert space
kernel tricks
regression
spatial
stochastic processes
time series

$\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}$

Can we find a transformation that will turn a Gaussian process prior sample into a Gaussian process posterior sample? This is a special trick where we do GP regression by GP simulation.

The main tool is an old insight made useful for modern problems in J. T. Wilson et al. (2020) (brusque) and J. T. Wilson et al. (2021) (deep). Actioned in Ritter et al. (2021) to condition probabilistic neural nets somehow.

## 1 Matheron updates for Gaussian RVs

We start by examining a slightly different way of defining a Gaussian RV, starting from the recipe for sampling:

A random vector $$\boldsymbol{x}=\left(x_{1}, \ldots, x_{n}\right) \in \mathbb{R}^{n}$$ is said to be Gaussian if there exists a matrix $$\mathbf{L}$$ and vector $$\boldsymbol{\mu}$$ such that $\boldsymbol{x} \stackrel{\mathrm{d}}{=} \boldsymbol{\mu}+\mathbf{L} \boldsymbol{\zeta} \quad \boldsymbol{\zeta} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ where $$\mathcal{N}(\mathbf{0}, \mathbf{I})$$ is known as the standard version of a (multivariate) normal distribution, which is defined through its density.

This is the location-scale form of a Gaussian RV, as opposed to the canonical form which we use in Gaussian Belief Propagation. In location-scale form, a non-degenerate Gaussian RV’s distribution is given (uniquely) by its mean $$\boldsymbol{\mu}=\mathbb{E}(\boldsymbol{x})$$ and its covariance $$\boldsymbol{\Sigma}=\mathbb{E}\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\top}\right] .$$ In this notation the density, if defined, is $p(\boldsymbol{x})=\mathcal{N}(\boldsymbol{x} ; \boldsymbol{\mu}, \boldsymbol{\Sigma})=\frac{1}{\sqrt{|2 \pi \boldsymbol{\Sigma}|}} \exp \left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right).$

Since $$\boldsymbol{\zeta}$$ has identity covariance, any matrix square root of $$\boldsymbol{\Sigma}$$, such as the Cholesky factor $$\mathbf{L}$$ with $$\boldsymbol{\Sigma}=\mathbf{L L}^{\top}$$, may be used to draw $$\boldsymbol{x}=\boldsymbol{\mu}+\mathbf{L} \boldsymbol{\zeta}.$$
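The sampling recipe can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `sample_gaussian` and the example mean and covariance are made up for the demonstration.

```python
import numpy as np

def sample_gaussian(mu, Sigma, n_samples, rng):
    """Draw samples x = mu + L @ zeta with zeta ~ N(0, I) and Sigma = L L^T."""
    L = np.linalg.cholesky(Sigma)                     # lower-triangular square root
    zeta = rng.standard_normal((len(mu), n_samples))  # standard Gaussian variates
    return (mu[:, None] + L @ zeta).T                 # shape (n_samples, n)

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # illustrative mean
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])  # illustrative covariance
x = sample_gaussian(mu, Sigma, 200_000, rng)

# Empirical moments should approach mu and Sigma as the sample size grows.
emp_mean = x.mean(axis=0)
emp_cov = np.cov(x, rowvar=False)
```

Any other square root of the covariance (e.g. one from an eigendecomposition) would do equally well; Cholesky is just the cheapest common choice.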

tl;dr we can think about drawing any Gaussian RV as transforming a standard Gaussian. So much is basic entry-level stuff. What might a rule which updates a Gaussian prior into a data-conditioned posterior look like? Like this!

We write $$\cov(a,b)=\Sigma_{a,b}$$ for the covariance between two random variables:

Matheron’s Update Rule: Let $$\boldsymbol{a}$$ and $$\boldsymbol{b}$$ be jointly Gaussian, centered random variables. Then the random variable $$\boldsymbol{a}$$ conditional on $$\boldsymbol{b}=\boldsymbol{\beta}$$ may be expressed as $(\boldsymbol{a} \mid \boldsymbol{b}=\boldsymbol{\beta}) \stackrel{\mathrm{d}}{=} \boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}}{\boldsymbol{\Sigma}}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})$

Proof: Both sides are Gaussian, so comparing the mean and covariance on both sides affirms the result. For the mean, \begin{aligned} \mathbb{E}\left(\boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})\right) & =\boldsymbol{\mu}_{\boldsymbol{a}}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}\left(\boldsymbol{\beta}-\boldsymbol{\mu}_{\boldsymbol{b}}\right) \\ & =\mathbb{E}(\boldsymbol{a} \mid \boldsymbol{b}=\boldsymbol{\beta}) \end{aligned}

For the covariance, the cross terms between $$\boldsymbol{a}$$ and $$\boldsymbol{b}$$ contribute $$-2 \boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}$$, so \begin{aligned} \operatorname{Cov}\left(\boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})\right) &=\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \operatorname{Cov}(\boldsymbol{b}) \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}-2 \boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}} \\ & =\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}}-\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}\\ &=\operatorname{Cov}(\boldsymbol{a} \mid \boldsymbol{b} =\boldsymbol{\beta}) \end{aligned}
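A quick numerical sanity check of the update rule may help. The sketch below assumes NumPy and an arbitrary made-up joint covariance; it pushes joint prior samples through the update and compares the result with the textbook conditional moments, whose covariance is $$\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}}-\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}$$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary symmetric positive-definite joint covariance of (a, b), a and b in R^2.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
S_aa, S_ab = Sigma[:2, :2], Sigma[:2, 2:]
S_ba, S_bb = Sigma[2:, :2], Sigma[2:, 2:]

beta = np.array([0.5, -1.0])  # the value of b we condition on (illustrative)

# Centered joint prior samples, one per row.
L = np.linalg.cholesky(Sigma)
joint = rng.standard_normal((200_000, 4)) @ L.T
a, b = joint[:, :2], joint[:, 2:]

# Matheron update applied sample by sample: a + S_ab S_bb^{-1} (beta - b).
gain = np.linalg.solve(S_bb, S_ba).T  # equals S_ab S_bb^{-1} since S_bb is symmetric
a_post = a + (beta - b) @ gain.T

# Textbook conditional moments for comparison.
cond_mean = gain @ beta
cond_cov = S_aa - gain @ S_ba
```

Note that no sample from the conditional distribution is ever drawn directly; every posterior sample is a deterministic function of one prior sample.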

Can we find a transformation that will turn a Gaussian process prior sample (i.e. a function) into a Gaussian process posterior sample, and thus use prior samples, which are presumably pretty easy to draw, to create posterior ones, which are often hard? If we evaluate the sampled function at a finite number of points, then we can use the Matheron formula to do precisely that. Sometimes this can even be useful. The resulting algorithm uses tricks from both analytic GP regression and Monte Carlo.

The sample-based approximation to this is precisely the Ensemble Kalman Filter.
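To make that connection concrete, here is a hedged sketch of a perturbed-observation ensemble update in which the exact covariances in the Matheron rule are replaced by ensemble estimates. It assumes NumPy; the linear observation operator `H`, the prior covariance `P`, the noise scale and the observation are all hypothetical toy values, and the ensemble is far larger than one would use in practice so that the comparison with the exact answer is tight.

```python
import numpy as np

rng = np.random.default_rng(7)
n_ens, sigma = 20_000, 0.2
H = np.array([[1.0, 0.0, 0.5]])        # hypothetical linear observation operator
P = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])        # toy prior covariance of the state
y_obs = np.array([0.7])                # toy observation

ens = rng.standard_normal((n_ens, 3)) @ np.linalg.cholesky(P).T  # prior ensemble
pred = ens @ H.T + sigma * rng.standard_normal((n_ens, 1))       # simulated noisy observations

# Ensemble covariances stand in for Sigma_{a,b} and Sigma_{b,b}.
ens_c = ens - ens.mean(axis=0)
pred_c = pred - pred.mean(axis=0)
C_ab = ens_c.T @ pred_c / (n_ens - 1)
C_bb = pred_c.T @ pred_c / (n_ens - 1)
gain = np.linalg.solve(C_bb, C_ab.T).T  # C_ab C_bb^{-1}

# Matheron-style update of every ensemble member.
ens_post = ens + (y_obs - pred) @ gain.T

# Exact conditional moments for this linear-Gaussian toy problem.
S = H @ P @ H.T + sigma**2
K = P @ H.T / S
mean_ref = K @ y_obs
cov_ref = P - K @ H @ P
```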

## 2 “Exact” updates for Gaussian processes

Exact in the sense that we do not approximate the data. These updates are not exact if our basis function representation is only an approximation to the “true” GP (as with classic GPs) and not exact in the sense that we will be using samples to approximate measures. For now we assume that the observation likelihood is Gaussian.1

For a Gaussian process $$f \sim \mathcal{G P}(\mu, k)$$ with marginal $$\boldsymbol{f}_{m}=f(\mathbf{Z})$$, the process conditioned on $$\boldsymbol{f}_{m}=\boldsymbol{y}$$ admits, in distribution, the representation $\underbrace{(f \mid \boldsymbol{y})(\cdot)}_{\text {posterior }} \stackrel{\mathrm{d}}{=} \underbrace{f(\cdot)}_{\text {prior }}+\underbrace{k(\cdot, \mathbf{Z}) \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{y}-\boldsymbol{f}_{m}\right)}_{\text {update }}.$

If our observations are contaminated by additive i.i.d. Gaussian noise, $$\boldsymbol{y}=\boldsymbol{f} +\boldsymbol{\varepsilon}$$ with $$\boldsymbol{\varepsilon}\sim\mathcal{N}(\boldsymbol{0}, \sigma^2\mathbf{I}),$$ where $$\boldsymbol{f}$$ collects the process values at the $$n$$ training inputs, we find \begin{aligned} &\boldsymbol{f}_{*} \mid \boldsymbol{y} \stackrel{\mathrm{d}}{=} \boldsymbol{f}_{*}+\mathbf{K}_{*, n}\left(\mathbf{K}_{n, n}+\sigma^{2} \mathbf{I}\right)^{-1}(\boldsymbol{y}-\boldsymbol{f}-\boldsymbol{\varepsilon}) \end{aligned} When sampling from exact GPs we jointly draw $$\boldsymbol{f}_{*}$$ and $$\boldsymbol{f}$$ from the prior. Then, we combine $$\boldsymbol{f}$$ with noise variates $$\boldsymbol{\varepsilon} \sim \mathcal{N}\left(\mathbf{0}, \sigma^{2} \mathbf{I}\right)$$ such that $$\boldsymbol{f}+\boldsymbol{\varepsilon}$$ constitutes a draw from the prior distribution of $$\boldsymbol{y}$$.
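Here is a minimal sketch of that noisy pathwise update, assuming NumPy, a squared-exponential kernel, and toy 1-D data (the helper `rbf`, the inputs and the targets are all illustrative choices, not anything prescribed by the papers). It also computes the moment-based posterior under the same noise model so the two can be compared.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.5):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(2)
sigma = 0.1
X = np.array([-1.1, -0.3, 0.4, 1.2])   # toy training inputs
y = np.sin(3 * X)                      # toy observations
Xs = np.linspace(-1.5, 1.5, 7)         # test inputs
n, s = len(X), len(Xs)

# Joint prior over (f_*, f); the jitter keeps the Cholesky factorisation stable.
Xall = np.concatenate([Xs, X])
L = np.linalg.cholesky(rbf(Xall, Xall) + 1e-8 * np.eye(s + n))
n_samples = 100_000
prior = rng.standard_normal((n_samples, s + n)) @ L.T
f_star, f = prior[:, :s], prior[:, s:]
eps = sigma * rng.standard_normal((n_samples, n))   # noise variates

# Pathwise update: f_* + K_{*,n} (K_{n,n} + sigma^2 I)^{-1} (y - f - eps).
K_sn, K_reg = rbf(Xs, X), rbf(X, X) + sigma**2 * np.eye(n)
resid = y - f - eps
f_post = f_star + np.linalg.solve(K_reg, resid.T).T @ K_sn.T

# Moment-based GP regression with the same noise model, for comparison.
A = np.linalg.solve(K_reg, K_sn.T)     # shape (n, s)
mean_ref = A.T @ y
cov_ref = rbf(Xs, Xs) - K_sn @ A
```

The linear solve involves only the $$n \times n$$ training matrix; the test locations enter through the prior draw and a matrix product.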

Compare this to the equivalent distributional update from classical GP regression which updates the moments of a distribution, not samples from a path — the formulae are related though:

…the conditional distribution is the Gaussian $$\mathcal{N}\left(\boldsymbol{\mu}_{* \mid y}, \mathbf{K}_{*, * \mid y}\right)$$ with moments \begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned}

## 3 Using basis functions

For many purposes we need a basis function representation, a.k.a. the weight-space representation. We assert the GP can be written as a random function comprising basis functions $$\boldsymbol{\phi}=\left(\phi_{1}, \ldots, \phi_{\ell}\right)$$ with the Gaussian random weight vector $$\boldsymbol{w}$$ so that $f^{(w)}(\cdot)=\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot) \quad \boldsymbol{w} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\boldsymbol{w}}\right).$ $$f^{(w)}$$ is a random function satisfying $$\boldsymbol{f}^{(\boldsymbol{w})} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Phi}_{n} \boldsymbol{\Sigma}_{\boldsymbol{w}} \boldsymbol{\Phi}_{n}^{\top}\right)$$, where $$\boldsymbol{\Phi}_{n}=\boldsymbol{\phi}(\mathbf{X})$$ is a $$|\mathbf{X}| \times \ell$$ matrix of features. If we are lucky, the representation might not be too bad when the basis is truncated to a small size.

Taking $$\boldsymbol{\Sigma}_{\boldsymbol{w}}=\mathbf{I}$$, the posterior weight distribution $$\boldsymbol{w} \mid \boldsymbol{y} \sim \mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{w} \mid n}, \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n}\right)$$ is Gaussian with moments \begin{aligned} \boldsymbol{\mu}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \boldsymbol{\Phi}^{\top} \boldsymbol{y} \\ \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \sigma^{2} \end{aligned} where $$\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{X})$$ is an $$n \times \ell$$ feature matrix. We solve for the right-hand side at $$\mathcal{O}\left(\min \{\ell, n\}^{3}\right)$$ cost by applying the Woodbury identity as needed. So far there is nothing unusual here; the cool bit is realising we can represent this update as an operation on individual weight samples.
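To see why the Woodbury identity lets us trade the $$\ell \times \ell$$ solve for an $$n \times n$$ one, here is a short derivation sketch. It assumes a standard normal prior on the weights, which is what the moments above correspond to, and leaves the sizes of the identity matrices implicit. Multiplying out both sides shows that $\boldsymbol{\Phi}^{\top}\left(\boldsymbol{\Phi} \boldsymbol{\Phi}^{\top}+\sigma^{2} \mathbf{I}\right)=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right) \boldsymbol{\Phi}^{\top},$ and therefore $\begin{aligned} \boldsymbol{\mu}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \boldsymbol{\Phi}^{\top} \boldsymbol{y}=\boldsymbol{\Phi}^{\top}\left(\boldsymbol{\Phi} \boldsymbol{\Phi}^{\top}+\sigma^{2} \mathbf{I}\right)^{-1} \boldsymbol{y}, \\ \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n} &=\sigma^{2}\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1}=\mathbf{I}-\boldsymbol{\Phi}^{\top}\left(\boldsymbol{\Phi} \boldsymbol{\Phi}^{\top}+\sigma^{2} \mathbf{I}\right)^{-1} \boldsymbol{\Phi}. \end{aligned}$ The second line can be checked by multiplying the right-hand side by $$\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}$$ and using the first identity, which leaves $$\sigma^{2} \mathbf{I}$$.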

In the weight-space setting, the pathwise update given an initial weight vector $$\boldsymbol{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$ is $$\boldsymbol{w} \mid \boldsymbol{y} \stackrel{\mathrm{d}}{=} \boldsymbol{w}+\boldsymbol{\Phi}^{\top}\left(\boldsymbol{\Phi} \boldsymbol{\Phi}^{\top}+\sigma^{2} \mathbf{I}\right)^{-1}\left(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}-\boldsymbol{\varepsilon}\right).$$
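As a sanity check, the weight-space pathwise update can be simulated and compared with the closed-form posterior moments. This sketch assumes NumPy and uses a random stand-in feature matrix and stand-in observations; since the feature matrix has shape $$n \times \ell$$, the prior prediction at the training inputs is $$\boldsymbol{\Phi} \boldsymbol{w}$$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, ell, sigma = 6, 3, 0.3

Phi = rng.standard_normal((n, ell))   # stand-in n x ell feature matrix
y = rng.standard_normal(n)            # stand-in observations

n_samples = 200_000
w = rng.standard_normal((n_samples, ell))          # prior weights, N(0, I)
eps = sigma * rng.standard_normal((n_samples, n))  # noise variates

# Pathwise update: w + Phi^T (Phi Phi^T + sigma^2 I)^{-1} (y - Phi w - eps).
resid = y - w @ Phi.T - eps                        # row i is y - Phi w_i - eps_i
G = Phi @ Phi.T + sigma**2 * np.eye(n)
w_post = w + np.linalg.solve(G, resid.T).T @ Phi

# Closed-form posterior moments in the ell x ell form, for comparison.
H = Phi.T @ Phi + sigma**2 * np.eye(ell)
mu_ref = np.linalg.solve(H, Phi.T @ y)
Sigma_ref = sigma**2 * np.linalg.inv(H)
```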

So if we had a nice weight-space representation for everything already we could go home at this point. However, for many models we are not given that; we might find natural bases for the prior and posterior are not the same (the posterior should not be stationary usually, for one thing).

The innovation in J. T. Wilson et al. (2020) is to make different choices of functional bases for the prior and for the update. We can choose anything really, AFAICT. They suggest a Fourier basis for the prior and the canonical basis, i.e. the reproducing kernel basis $$k(\cdot,\vv{x})$$, for the update. Then we have $\underbrace{(f \mid \boldsymbol{y})(\cdot)}_{\text {posterior }} \stackrel{\mathrm{d}}{\approx} \underbrace{\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot)}_{\text {weight-space prior}} +\underbrace{\sum_{j=1}^{n} v_{j} k\left(\cdot, \boldsymbol{x}_{j}\right)}_{\text {function-space update}} ,$ where we have defined $$\boldsymbol{v}=\left(\mathbf{K}_{n, n}+\sigma^{2} \mathbf{I}\right)^{-1}\left(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}- \boldsymbol{\varepsilon}\right) .$$
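A rough sketch of this decoupled scheme follows, assuming NumPy, a 1-D squared-exponential kernel, and random Fourier features as the Fourier basis. The helpers `rbf`, `make_rff` and `draw_posterior_function`, along with the toy data, are names and values invented for illustration rather than the authors' implementation.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.5):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def make_rff(ell, lengthscale, rng):
    """Random Fourier features approximating the 1-D squared-exponential kernel."""
    theta = rng.standard_normal(ell) / lengthscale   # frequencies from the spectral density
    tau = rng.uniform(0.0, 2.0 * np.pi, ell)         # random phases
    return lambda X: np.sqrt(2.0 / ell) * np.cos(X[:, None] * theta + tau)

rng = np.random.default_rng(4)
sigma, lengthscale, ell = 0.1, 0.5, 500
X = np.array([-1.1, -0.3, 0.4, 1.2])   # toy training inputs
y = np.sin(3 * X)                      # toy observations
phi = make_rff(ell, lengthscale, rng)
K_reg = rbf(X, X, lengthscale) + sigma**2 * np.eye(len(X))

def draw_posterior_function():
    """Return one approximate posterior sample as a callable on new inputs."""
    w = rng.standard_normal(ell)                      # weight-space prior draw
    eps = sigma * rng.standard_normal(len(X))         # noise variate
    v = np.linalg.solve(K_reg, y - phi(X) @ w - eps)  # function-space update weights
    return lambda Xq: phi(Xq) @ w + rbf(Xq, X, lengthscale) @ v

Xs = np.linspace(-1.5, 1.5, 50)
paths = np.stack([draw_posterior_function()(Xs) for _ in range(2000)])
```

Each returned callable is an actual function that can be evaluated at any new input without re-solving a linear system, which is where the appeal lies for things like Thompson sampling. The prior part is only approximate because the feature expansion is truncated, while the update part uses the exact kernel.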

## 4 Sparse GPs

i.e. using inducing variables.

Given $$q(\boldsymbol{u})$$, we approximate posterior distributions as $p\left(\boldsymbol{f}_{*} \mid \boldsymbol{y}\right) \approx \int_{\mathbb{R}^{m}} p\left(\boldsymbol{f}_{*} \mid \boldsymbol{u}\right) q(\boldsymbol{u}) \mathrm{d} \boldsymbol{u} .$ If $$\boldsymbol{u} \sim \mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{u}}, \boldsymbol{\Sigma}_{\boldsymbol{u}}\right)$$, we compute this integral analytically to obtain a Gaussian distribution with mean and covariance \begin{aligned} \boldsymbol{m}_{* \mid m} &=\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1} \boldsymbol{\mu}_{\boldsymbol{u}} \\ \mathbf{K}_{*, * \mid m} &=\mathbf{K}_{*, *}+\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{\Sigma}_{\boldsymbol{u}}-\mathbf{K}_{m, m}\right) \mathbf{K}_{m, m}^{-1} \mathbf{K}_{m, *} \end{aligned}

The corresponding pathwise update is \begin{aligned} &\boldsymbol{f}_{*} \mid \boldsymbol{u} \stackrel{\mathrm{d}}{=} \boldsymbol{f}_{*}+\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{u}-\boldsymbol{f}_{m}\right) \end{aligned}

When sampling from sparse GPs we draw $$\boldsymbol{f}_{*}$$ and $$\boldsymbol{f}_{m}$$ together from the prior, and independently generate target values $$\boldsymbol{u} \sim q(\boldsymbol{u}) .$$ $\underbrace{(f \mid \boldsymbol{u})(\cdot)}_{\text {sparse posterior }} \stackrel{\mathrm{d}}{\approx} \underbrace{\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot)}_{\text {weight-space prior}} +\underbrace{\sum_{j=1}^{m} v_{j} k\left(\cdot, \boldsymbol{z}_{j}\right)}_{\text {function-space update}} ,$ where we have defined $$\boldsymbol{v}=\mathbf{K}_{m, m}^{-1}\left(\boldsymbol{u}-\boldsymbol{\Phi} \boldsymbol{w}\right)$$ and $$\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{Z})$$ is now the $$m \times \ell$$ feature matrix at the inducing locations.
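The sparse update can be checked the same way as the exact one. The sketch below assumes NumPy, a squared-exponential kernel, and a made-up variational distribution $$q(\boldsymbol{u})$$ (in practice it would come from fitting a sparse variational GP). It draws the prior exactly at a finite set of points rather than through basis functions, then compares sample moments with the mean $$\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1} \boldsymbol{\mu}_{\boldsymbol{u}}$$ and the covariance from the analytic integral.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.5):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(5)
Z = np.array([-1.0, 0.0, 1.0])          # inducing inputs (toy values)
Xs = np.array([-1.4, -0.5, 0.3, 0.8])   # test inputs (toy values)
m, s = len(Z), len(Xs)

# A made-up variational distribution q(u) = N(mu_u, Sigma_u).
mu_u = np.array([0.5, -0.2, 1.0])
B = 0.3 * rng.standard_normal((m, m))
Sigma_u = B @ B.T + 0.05 * np.eye(m)

# Joint prior draw of (f_*, f_m); the jitter keeps the Cholesky factorisation stable.
Xall = np.concatenate([Xs, Z])
L = np.linalg.cholesky(rbf(Xall, Xall) + 1e-8 * np.eye(s + m))
n_samples = 200_000
prior = rng.standard_normal((n_samples, s + m)) @ L.T
f_star, f_m = prior[:, :s], prior[:, s:]

# Independent target values u ~ q(u).
u = mu_u + rng.standard_normal((n_samples, m)) @ np.linalg.cholesky(Sigma_u).T

# Pathwise update: f_* + K_{*,m} K_{m,m}^{-1} (u - f_m).
K_sm, K_mm = rbf(Xs, Z), rbf(Z, Z)
A = np.linalg.solve(K_mm, K_sm.T).T     # K_{*,m} K_{m,m}^{-1}
f_post = f_star + (u - f_m) @ A.T

# Moments of the analytic Gaussian integral, for comparison.
mean_ref = A @ mu_u
cov_ref = rbf(Xs, Xs) + A @ (Sigma_u - K_mm) @ A.T
```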

## 5 More pathwise hacks

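Conditioning is all well and good, but what else might we want to do with pathwise updates? One possible use is density multiplication: Given $$\boldsymbol{a}_1\sim \mathcal{N}(\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_1)$$ and $$\boldsymbol{a}_2\sim \mathcal{N}(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_2)$$, can I find a pathwise update that gives me $$\boldsymbol{a}_{1,2}\sim \mathcal{N}(\boldsymbol{\mu}_{12},\boldsymbol{\Sigma}_{12})$$, where $$\mathcal{N}(\boldsymbol{x};\boldsymbol{\mu}_{12},\boldsymbol{\Sigma}_{12}) \propto \mathcal{N}(\boldsymbol{x};\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_1)\mathcal{N}(\boldsymbol{x};\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_2)$$? This ends up being useful in, e.g., belief propagation. A standard result is that the product of two Gaussian densities is proportional to another Gaussian density, with mean and covariance given by $\boldsymbol{\Sigma}_{12}=\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}, \quad \boldsymbol{\mu}_{12}=\boldsymbol{\Sigma}_{12}\left(\boldsymbol{\Sigma}_1^{-1} \boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1} \boldsymbol{\mu}_2\right).$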

To keep this simple let’s consider the case where $$\boldsymbol{\mu}_1=\boldsymbol{\mu}_2=0$$.

TBD
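One candidate construction for the zero-mean case, offered as an unverified-by-the-sources sketch rather than anything from the cited papers: take independent draws from the two factors and condition the first on the synthetic event that the two draws coincide, using Matheron's rule. The cross-covariance of the first draw with the difference is $$\boldsymbol{\Sigma}_1$$ and the covariance of the difference is $$\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2$$, so the update needs only one solve against the sum. The code assumes NumPy and arbitrary made-up covariances, and compares the sample covariance with the precision-form product covariance $$\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}$$.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3
B1, B2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Sigma1 = B1 @ B1.T + np.eye(d)   # arbitrary SPD covariance of the first factor
Sigma2 = B2 @ B2.T + np.eye(d)   # arbitrary SPD covariance of the second factor

n_samples = 200_000
a1 = rng.standard_normal((n_samples, d)) @ np.linalg.cholesky(Sigma1).T
a2 = rng.standard_normal((n_samples, d)) @ np.linalg.cholesky(Sigma2).T

# Matheron update of a1 conditioned on b = a1 - a2 taking the value 0:
# Cov(a1, b) = Sigma1 and Cov(b) = Sigma1 + Sigma2.
gain = np.linalg.solve(Sigma1 + Sigma2, Sigma1).T   # Sigma1 (Sigma1 + Sigma2)^{-1}
a12 = a1 - (a1 - a2) @ gain.T

# Product-density covariance in precision form, for comparison.
Sigma12 = np.linalg.inv(np.linalg.inv(Sigma1) + np.linalg.inv(Sigma2))
```

The explicit inverses appear only in the reference calculation; the pathwise move itself never inverts either covariance on its own.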

## 7 What pathwise updates are possible?

Another way to think about this is in terms of the action on $$\boldsymbol{x}$$, treating the correlated $$\boldsymbol{y}$$ as an auxiliary variable which we use to update $$\boldsymbol{x}$$. Does that get us anything useful?

## 8 Matrix GPs

Ritter et al. (2021, appendix D) reframe the Matheron update and generalise it to matrix Gaussians. TBC.

## 9 Stationary moves

Thus far we have talked about moves updating a prior to a posterior; how about moves within a posterior?

We could try Langevin sampling, for example, or SG MCMC, but these all seem to require inverting the covariance matrix, so they are not likely to be efficient in general. Can we do better?

## 11 References

Abrahamsen. 1997.
Abt, and Welch. 1998. Canadian Journal of Statistics.
Altun, Smola, and Hofmann. 2004. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. UAI ’04.
Alvarado, and Stowell. 2018. arXiv:1705.07104 [Cs, Stat].
Ambikasaran, Foreman-Mackey, Greengard, et al. 2015. arXiv:1403.6015 [Astro-Ph, Stat].
Bachoc, F., Gamboa, Loubes, et al. 2018. IEEE Transactions on Information Theory.
Bachoc, Francois, Suvorikova, Ginsbourger, et al. 2019. arXiv:1805.00753 [Stat].
Birgé, and Massart. 2006. Probability Theory and Related Fields.
Bonilla, Chai, and Williams. 2007. In Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS’07.
Bonilla, Krauth, and Dezfouli. 2019. Journal of Machine Learning Research.
Borovitskiy, Terenin, Mostowsky, et al. 2020. arXiv:2006.10160 [Cs, Stat].
Burt, Rasmussen, and Wilk. 2020. Journal of Machine Learning Research.
Calandra, Peters, Rasmussen, et al. 2016. In 2016 International Joint Conference on Neural Networks (IJCNN).
Cressie. 1990. Mathematical Geology.
———. 2015. Statistics for Spatial Data.
Cressie, and Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0.
Csató, and Opper. 2002. Neural Computation.
Csató, Opper, and Winther. 2001. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. NIPS’01.
Cunningham, Shenoy, and Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Cutajar, Bonilla, Michiardi, et al. 2017. In PMLR.
Dahl, and Bonilla. 2017. In Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy. Lecture Notes in Computer Science.
Dahl, and Bonilla. 2019. arXiv:1903.03986 [Cs, Stat].
Damianou, and Lawrence. 2013. In Artificial Intelligence and Statistics.
Damianou, Titsias, and Lawrence. 2011. In Advances in Neural Information Processing Systems 24.
Dezfouli, and Bonilla. 2015. In Advances in Neural Information Processing Systems 28. NIPS’15.
Domingos. 2020. arXiv:2012.00152 [Cs, Stat].
Dubrule. 2018. In Handbook of Mathematical Geosciences: Fifty Years of IAMG.
Dunlop, Girolami, Stuart, et al. 2018. Journal of Machine Learning Research.
Dutordoir, Hensman, van der Wilk, et al. 2021. In arXiv:2105.04504 [Cs, Stat].
Dutordoir, Saul, Ghahramani, et al. 2022.
Duvenaud. 2014.
Duvenaud, Lloyd, Grosse, et al. 2013. In Proceedings of the 30th International Conference on Machine Learning (ICML-13).
Ebden. 2015. arXiv:1505.02965 [Math, Stat].
Eleftheriadis, Nicholson, Deisenroth, et al. 2017. In Advances in Neural Information Processing Systems 30.
Emery. 2007. Mathematical Geology.
Evgeniou, Micchelli, and Pontil. 2005. Journal of Machine Learning Research.
Ferguson. 1973. The Annals of Statistics.
Finzi, Bondesan, and Welling. 2020. arXiv:2010.10876 [Cs].
Föll, Haasdonk, Hanselmann, et al. 2017. arXiv:1711.00799 [Stat].
Frigola, Chen, and Rasmussen. 2014. In Advances in Neural Information Processing Systems 27.
Frigola, Lindsten, Schön, et al. 2013. In Advances in Neural Information Processing Systems 26.
Gal, and Ghahramani. 2015. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Galliani, Dezfouli, Bonilla, et al. 2017. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.
Gal, and van der Wilk. 2014. arXiv:1402.1412 [Stat].
Gardner, Pleiss, Bindel, et al. 2018. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Gardner, Pleiss, Wu, et al. 2018. arXiv:1802.08903 [Cs, Stat].
Garnelo, Rosenbaum, Maddison, et al. 2018. arXiv:1807.01613 [Cs, Stat].
Garnelo, Schwarz, Rosenbaum, et al. 2018.
Ghahramani. 2013. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Gilboa, Saatçi, and Cunningham. 2015. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Girolami, and Rogers. 2005. In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05.
Gramacy. 2016. Journal of Statistical Software.
Gramacy, and Apley. 2015. Journal of Computational and Graphical Statistics.
Gratiet, Marelli, and Sudret. 2016. In Handbook of Uncertainty Quantification.
Grosse, Salakhutdinov, Freeman, et al. 2012. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Hartikainen, and Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing.
Hensman, Fusi, and Lawrence. 2013. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence. UAI’13.
Huber. 2014. Pattern Recognition Letters.
Huggins, Campbell, Kasprzak, et al. 2018. arXiv:1806.10234 [Cs, Stat].
Jankowiak, Pleiss, and Gardner. 2020. In Conference on Uncertainty in Artificial Intelligence.
Jordan. 1999. Learning in Graphical Models.
Karvonen, and Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
Kasim, Watson-Parris, Deaconu, et al. 2020. arXiv:2001.08055 [Physics, Stat].
Kingma, and Welling. 2014. In ICLR 2014 Conference.
Kocijan, Girard, Banko, et al. 2005. Mathematical and Computer Modelling of Dynamical Systems.
Ko, and Fox. 2009. In Autonomous Robots.
Krauth, Bonilla, Cutajar, et al. 2016. In UAI17.
Krige. 1951. Journal of the Southern African Institute of Mining and Metallurgy.
Kroese, and Botev. 2013. arXiv:1308.0399 [Stat].
Lawrence, Neil. 2005. Journal of Machine Learning Research.
Lawrence, Neil, Seeger, and Herbrich. 2003. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems.
Lawrence, Neil D., and Urtasun. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.
Lázaro-Gredilla, Quiñonero-Candela, Rasmussen, et al. 2010. Journal of Machine Learning Research.
Lee, Bahri, Novak, et al. 2018. In ICLR.
Leibfried, Dutordoir, John, et al. 2022.
Lenk. 2003. Journal of Computational and Graphical Statistics.
Lindgren, Rue, and Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Liutkus, Badeau, and Richard. 2011. IEEE Transactions on Signal Processing.
Lloyd, Duvenaud, Grosse, et al. 2014. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Louizos, Shi, Schutte, et al. 2019. In Advances in Neural Information Processing Systems.
Lu. 2022.
MacKay. 1998. NATO ASI Series. Series F: Computer and System Sciences.
———. 2002. In Information Theory, Inference & Learning Algorithms.
Matheron. 1963a. Traité de Géostatistique Appliquée. 2. Le Krigeage.
———. 1963b. Economic Geology.
Matthews, van der Wilk, Nickson, et al. 2016. arXiv:1610.08733 [Stat].
Mattos, Dai, Damianou, et al. 2016. In Proceedings of ICLR.
Mattos, Dai, Damianou, et al. 2017. Journal of Process Control, DYCOPS-CAB 2016.
Micchelli, and Pontil. 2005a. Journal of Machine Learning Research.
———. 2005b. Neural Computation.
Minh. 2022. SIAM/ASA Journal on Uncertainty Quantification.
Mohammadi, Challenor, and Goodfellow. 2021. arXiv:2104.14987 [Stat].
Moreno-Muñoz, Artés-Rodríguez, and Álvarez. 2019. arXiv:1911.00002 [Cs, Stat].
Nagarajan, Peters, and Nevat. 2018. SSRN Electronic Journal.
Nickisch, Solin, and Grigorevskiy. 2018. In International Conference on Machine Learning.
O’Hagan. 1978. Journal of the Royal Statistical Society: Series B (Methodological).
Papaspiliopoulos, Pokern, Roberts, et al. 2012. Biometrika.
Pinder, and Dodd. 2022. Journal of Open Source Software.
Pleiss, Gardner, Weinberger, et al. 2018. In.
Pleiss, Jankowiak, Eriksson, et al. 2020. Advances in Neural Information Processing Systems.
Qi, Abdel-Gawad, and Minka. 2010. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. UAI’10.
Quiñonero-Candela, and Rasmussen. 2005. Journal of Machine Learning Research.
Raissi, and Karniadakis. 2017. arXiv:1701.02440 [Cs, Math, Stat].
Rasmussen, and Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning.
Reece, and Roberts. 2010. In 2010 13th International Conference on Information Fusion.
Ritter, Kukla, Zhang, et al. 2021. arXiv:2105.14594 [Cs, Stat].
Riutort-Mayol, Bürkner, Andersen, et al. 2020. arXiv:2004.11408 [Stat].
Rossi, Heinonen, Bonilla, et al. 2021. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.
Saatçi. 2012.
Saatçi, Turner, and Rasmussen. 2010. In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10.
Saemundsson, Terenin, Hofmann, et al. 2020. arXiv:1910.09349 [Cs, Stat].
Salimbeni, and Deisenroth. 2017. In Advances In Neural Information Processing Systems.
Salimbeni, Eleftheriadis, and Hensman. 2018. In International Conference on Artificial Intelligence and Statistics.
Särkkä. 2011. In Artificial Neural Networks and Machine Learning – ICANN 2011. Lecture Notes in Computer Science.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3.
Särkkä, and Hartikainen. 2012. In Artificial Intelligence and Statistics.
Särkkä, Solin, and Hartikainen. 2013. IEEE Signal Processing Magazine.
Schulam, and Saria. 2017. In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Shah, Wilson, and Ghahramani. 2014. In Artificial Intelligence and Statistics.
Sidén. 2020. Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields. Linköping Studies in Statistics.
Smith, Alvarez, and Lawrence. 2018. arXiv:1809.02010 [Cs, Stat].
Snelson, and Ghahramani. 2005. In Advances in Neural Information Processing Systems.
Solin, and Särkkä. 2020. Statistics and Computing.
Tait, and Damoulas. 2020. arXiv:2006.15641 [Cs, Stat].
Tang, Zhang, and Banerjee. 2019. arXiv:1908.05726 [Math, Stat].
Titsias, Michalis K. 2009a. In International Conference on Artificial Intelligence and Statistics.
Titsias, Michalis, and Lawrence. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
Tokdar. 2007. Journal of Computational and Graphical Statistics.
Turner, Ryan, Deisenroth, and Rasmussen. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
Turner, Richard E., and Sahani. 2014. IEEE Transactions on Signal Processing.
van der Wilk, Wilson, and Rasmussen. 2014. “Variational Inference for Latent Variable Modelling of Correlation Structure.” In NIPS 2014 Workshop on Advances in Variational Inference.
Vanhatalo, Riihimäki, Hartikainen, et al. 2013. Journal of Machine Learning Research.
———, et al. 2015. arXiv:1206.5754 [Cs, Stat].
Walder, Christian, Kim, and Schölkopf. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Walder, C., Schölkopf, and Chapelle. 2006. Computer Graphics Forum.
Wang, Pleiss, Gardner, et al. 2019. In Advances in Neural Information Processing Systems.
Wikle, Cressie, and Zammit-Mangion. 2019. Spatio-Temporal Statistics with R.
Wilkinson, Andersen, Reiss, et al. 2019. arXiv:1901.11436 [Cs, Eess, Stat].
Wilkinson, Särkkä, and Solin. 2021.
Williams, Christopher, Klanke, Vijayakumar, et al. 2009. In Advances in Neural Information Processing Systems 21.
Williams, Christopher KI, and Seeger. 2001. In Advances in Neural Information Processing Systems.
Wilson, Andrew Gordon, and Adams. 2013. In International Conference on Machine Learning.
Wilson, James T, Borovitskiy, Terenin, et al. 2020. In Proceedings of the 37th International Conference on Machine Learning.
Wilson, James T, Borovitskiy, Terenin, et al. 2021. Journal of Machine Learning Research.
Wilson, Andrew Gordon, Dann, Lucas, et al. 2015. arXiv:1510.07389 [Cs, Stat].
Wilson, Andrew Gordon, and Ghahramani. 2011. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.
———. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science.
Wilson, Andrew Gordon, Knowles, and Ghahramani. 2012. In Proceedings of the 29th International Coference on International Conference on Machine Learning. ICML’12.
Wilson, Andrew Gordon, and Nickisch. 2015. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37. ICML’15.
Zhang, Walder, Bonilla, et al. 2020. In Proceedings of NeurIPS 2020.

## Footnotes

1. There are neat extensions to the non-Gaussian and sparse cases; that comes later.