Posterior Gaussian process samples by updating prior samples

Matheron’s other weird trick


Can we find a transformation that will turn a Gaussian process prior sample into a Gaussian process posterior sample? A special trick that lets us do GP regression by GP simulation.

The main tool is an old insight made useful for modern problems by J. T. Wilson et al. (2020) (brusque) and J. T. Wilson et al. (2021) (deep), and actioned in Ritter et al. (2021) to condition probabilistic neural nets.

Danger: notation updates in the pipeline.

We start with a slightly different way of defining a Gaussian RV, via the recipe for sampling it:

A random vector $$\boldsymbol{x}=\left(x_{1}, \ldots, x_{n}\right) \in \mathbb{R}^{n}$$ is said to be Gaussian if there exists a matrix $$\mathbf{L}$$ and vector $$\boldsymbol{\mu}$$ such that $\boldsymbol{x} \stackrel{\mathrm{d}}{=} \boldsymbol{\mu}+\mathbf{L} \boldsymbol{\zeta} \quad \boldsymbol{\zeta} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ where $$\mathcal{N}(\mathbf{0}, \mathbf{I})$$ is known as the standard version of a (multivariate) normal distribution, which is defined through its density.

This is the location-scale form of a Gaussian RV, as opposed to the canonical form which we use in Gaussian Belief Propagation. In location-scale form, a non-degenerate Gaussian RV’s distribution is given (uniquely) by its mean $$\boldsymbol{\mu}=\mathbb{E}(\boldsymbol{x})$$ and its covariance $$\boldsymbol{\Sigma}=\mathbb{E}\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\top}\right] .$$ In this notation the density, if defined, is $p(\boldsymbol{x})=\mathcal{N}(\boldsymbol{x} ; \boldsymbol{\mu}, \boldsymbol{\Sigma})=\frac{1}{\sqrt{|2 \pi \boldsymbol{\Sigma}|}} \exp \left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right).$

Since $$\zeta$$ has identity covariance, any matrix square root of $$\boldsymbol{\Sigma}$$, such as the Cholesky factor $$\mathbf{L}$$ with $$\boldsymbol{\Sigma}=\mathbf{L L}^{\top}$$, may be used to draw $$\boldsymbol{x}=\boldsymbol{\mu}+\mathbf{L} \boldsymbol{\zeta}.$$
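For concreteness, here is a minimal NumPy sketch of the location-scale recipe; the constants are arbitrary, purely for illustration.

```python
# Location-scale sampling: any matrix square root of Sigma works;
# here we use the (lower) Cholesky factor.
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T
zeta = rng.standard_normal((2, 100_000)) # zeta ~ N(0, I)
x = mu[:, None] + L @ zeta               # x ~ N(mu, Sigma)

print(np.cov(x))                         # ~ Sigma
```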

tl;dr we can think about drawing any Gaussian RV as transforming a standard Gaussian. So much is basic entry-level stuff. What might a rule that updates a Gaussian prior into a data-conditioned posterior look like? Like this!

We write $$\operatorname{Cov}(\boldsymbol{a},\boldsymbol{b})=\boldsymbol{\Sigma}_{\boldsymbol{a},\boldsymbol{b}}$$ for the covariance between two random vectors:

Matheron’s Update Rule: Let $$\boldsymbol{a}$$ and $$\boldsymbol{b}$$ be jointly Gaussian random vectors. Then the random vector $$\boldsymbol{a}$$ conditional on $$\boldsymbol{b}=\boldsymbol{\beta}$$ may be expressed as $(\boldsymbol{a} \mid \boldsymbol{b}=\boldsymbol{\beta}) \stackrel{\mathrm{d}}{=} \boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}}{\boldsymbol{\Sigma}}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b}).$ Proof: Both sides are affine functions of the jointly Gaussian pair $$(\boldsymbol{a}, \boldsymbol{b})$$ and hence Gaussian, so comparing the mean and covariance on both sides affirms the result: \begin{aligned} \mathbb{E}\left(\boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})\right) & =\boldsymbol{\mu}_{\boldsymbol{a}}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}\left(\boldsymbol{\beta}-\boldsymbol{\mu}_{\boldsymbol{b}}\right) \\ & =\mathbb{E}(\boldsymbol{a} \mid \boldsymbol{b}=\boldsymbol{\beta}) \end{aligned} \begin{aligned} \operatorname{Cov}\left(\boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})\right) &=\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}}-2\,\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \operatorname{Cov}(\boldsymbol{b}) \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}} \\ & =\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}}-\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}\\ &=\operatorname{Cov}(\boldsymbol{a} \mid \boldsymbol{b} =\boldsymbol{\beta}) \end{aligned}

Visualization of Matheron’s update rule for a bivariate normal distribution with correlation coefficient $$\rho=0.75 .$$ Left: Draws from $$p(\boldsymbol{a}, \boldsymbol{b})$$ are shown along with the marginal distributions. Right: Matheron’s update rule is used to update samples shown on the left subject to the condition $$\boldsymbol{b}=\boldsymbol{\beta}$$. This process is illustrated in full for one particular draw.
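The bivariate case is easy to check numerically. A sketch, using the same $$\rho=0.75$$ as the figure and an arbitrary conditioning value $$\beta$$:

```python
# Numerical check of Matheron's update rule on a bivariate Gaussian.
# Here Sigma_ab = rho and Sigma_bb = 1, so the update is scalar.
import numpy as np

rng = np.random.default_rng(1)
rho, beta = 0.75, 1.3

cov = np.array([[1.0, rho],
                [rho, 1.0]])
a, b = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

# Pathwise update: a | b = beta  =d  a + Sigma_ab Sigma_bb^{-1} (beta - b)
a_cond = a + rho * (beta - b)

print(a_cond.mean(), a_cond.var())   # ~ rho * beta and 1 - rho**2
```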

Can we find a transformation that will turn a Gaussian process prior sample into a Gaussian process posterior sample, and thus use prior samples, which are presumably easy to generate, to create posterior ones, which are often hard? If we only ever evaluate the function at a finite number of points, Matheron’s rule does precisely that. It turns out we can sometimes do it for whole functions too, and sometimes it is even useful. The resulting algorithm uses tricks from both analytic GP regression and Monte Carlo.

Exact in the sense that we do not approximate the data. These updates are not exact if our basis-function representation is only an approximation to the “true” GP. There are neat extensions to the non-Gaussian and sparse cases; those come later. For now we assume that the observation likelihood is Gaussian.

For a Gaussian process $$f \sim \mathcal{G P}(\mu, k)$$ with marginal $$\boldsymbol{f}_{m}=f(\mathbf{Z})$$, the process conditioned on $$\boldsymbol{f}_{m}=\boldsymbol{y}$$ admits, in distribution, the representation $\underbrace{(f \mid \boldsymbol{y})(\cdot)}_{\text {posterior }} \stackrel{\mathrm{d}}{=} \underbrace{f(\cdot)}_{\text {prior }}+\underbrace{k(\cdot, \mathbf{Z}) \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{y}-\boldsymbol{f}_{m}\right)}_{\text {update }}.$

If our observations are contaminated by additive i.i.d. Gaussian noise, $$\boldsymbol{y}=\boldsymbol{f}_{n} +\boldsymbol{\varepsilon}$$ with $$\boldsymbol{\varepsilon}\sim\mathcal{N}(\boldsymbol{0}, \sigma^2\mathbf{I}),$$ we find $\boldsymbol{f}_{*} \mid \boldsymbol{y} \stackrel{\mathrm{d}}{=} \boldsymbol{f}_{*}+\mathbf{K}_{*, n}\left(\mathbf{K}_{n, n}+\sigma^{2} \mathbf{I}\right)^{-1}(\boldsymbol{y}-\boldsymbol{f}_{n}-\boldsymbol{\varepsilon}).$ When sampling from exact GPs we jointly draw $$\boldsymbol{f}_{*}$$ and $$\boldsymbol{f}_{n}$$ from the prior. Then, we combine $$\boldsymbol{f}_{n}$$ with noise variates $$\boldsymbol{\varepsilon} \sim \mathcal{N}\left(\mathbf{0}, \sigma^{2} \mathbf{I}\right)$$ such that $$\boldsymbol{f}_{n}+\boldsymbol{\varepsilon}$$ constitutes a draw from the prior distribution of $$\boldsymbol{y}$$.

Compare this to the equivalent distributional update from classical GP regression, which updates the moments of a distribution rather than samples of a path; the formulae are clearly related:

…the conditional distribution is the Gaussian $$\mathcal{N}\left(\boldsymbol{\mu}_{* \mid y}, \mathbf{K}_{*, * \mid y}\right)$$ with moments \begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned}

Visual overview for pathwise conditioning of Gaussian processes. Left: The residual $$\boldsymbol{r}=\boldsymbol{y}-\boldsymbol{f}_{n}$$ (dashed black) of a draw $$f \sim \mathcal{G} \mathcal{P}(0, k)$$, shown in orange, given observations $$\boldsymbol{y}$$ (black). Middle: A pathwise update (purple) is constructed by Matheron’s update rule. Right: Prior and update are combined to represent the conditional (blue). Empirical moments (light blue) of $$10^{5}$$ conditioned paths are compared with those of the model (dashed black). The sample average, which matches the posterior mean, has been omitted for clarity.
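The experiment in that figure is straightforward to reproduce in miniature. Here is a sketch with an RBF kernel and made-up data; the kernel, inputs, and sample count are all illustrative. It draws prior paths jointly at test and training inputs, applies the noisy pathwise update, and checks the empirical mean of the conditioned paths against the analytic posterior mean:

```python
# Pathwise GP conditioning with noisy observations.
import numpy as np

rng = np.random.default_rng(2)

def k(A, B, ls=0.5):
    """RBF kernel on 1-d inputs."""
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

X = np.array([-1.0, 0.0, 1.5])     # observation inputs
y = np.array([0.5, -0.3, 1.0])     # observations
Xs = np.linspace(-2, 2, 50)        # test inputs
sigma2 = 0.1                       # noise variance
S = 10_000                         # number of paths

# Jointly draw (f_*, f_n) from the prior over [Xs, X].
Xj = np.concatenate([Xs, X])
Kj = k(Xj, Xj) + 1e-9 * np.eye(len(Xj))
f = np.linalg.cholesky(Kj) @ rng.standard_normal((len(Xj), S))
fs, fn = f[: len(Xs)], f[len(Xs):]

# f_* | y  =d  f_* + K_{*,n} (K_{n,n} + s2 I)^{-1} (y - f_n - eps)
eps = np.sqrt(sigma2) * rng.standard_normal(fn.shape)
A = np.linalg.solve(k(X, X) + sigma2 * np.eye(len(X)),
                    y[:, None] - fn - eps)
fs_post = fs + k(Xs, X) @ A

# Empirical mean of the conditioned paths vs analytic posterior mean.
mu_post = k(Xs, X) @ np.linalg.solve(k(X, X) + sigma2 * np.eye(len(X)), y)
print(np.max(np.abs(fs_post.mean(axis=1) - mu_post)))   # small
```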

Using basis functions

For many purposes we need a basis function representation, a.k.a. the weight-space representation. We assert the GP can be written as a random function comprising basis functions $$\boldsymbol{\phi}=\left(\phi_{1}, \ldots, \phi_{\ell}\right)$$ with the Gaussian random weight vector $$\boldsymbol{w}$$ so that $f^{(\boldsymbol{w})}(\cdot)=\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot) \quad \boldsymbol{w} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\boldsymbol{w}}\right).$ $$f^{(\boldsymbol{w})}$$ is a random function whose marginal at inputs $$\mathbf{X}$$ satisfies $$\boldsymbol{f}^{(\boldsymbol{w})} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Phi} \boldsymbol{\Sigma}_{\boldsymbol{w}} \boldsymbol{\Phi}^{\top}\right)$$, where $$\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{X})$$ is a $$|\mathbf{X}| \times \ell$$ matrix of features. If we are lucky, the representation might not be too bad when the basis is truncated to a small size.
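One classic choice of truncated basis is random Fourier features. A sketch, assuming the standard Rahimi-Recht construction for an RBF kernel (so that $$\boldsymbol{\Sigma}_{\boldsymbol{w}}=\mathbf{I}$$); the lengthscale and feature count are arbitrary:

```python
# Truncated weight-space prior via random Fourier features.
import numpy as np

rng = np.random.default_rng(3)
ell, ls = 200, 0.5                     # number of features, lengthscale

omega = rng.standard_normal(ell) / ls  # spectral frequencies for RBF
tau = rng.uniform(0, 2 * np.pi, ell)   # random phases

def phi(x):
    """|x| x ell feature matrix."""
    return np.sqrt(2.0 / ell) * np.cos(np.outer(x, omega) + tau)

w = rng.standard_normal(ell)           # w ~ N(0, I)
x_grid = np.linspace(-2, 2, 100)
f_prior = phi(x_grid) @ w              # one approximate prior draw

# Sanity check: Phi Phi^T approximates k(x, x') = exp(-(x-x')^2 / (2 ls^2))
K_true = np.exp(-0.5 * ((x_grid[:, None] - x_grid[None, :]) / ls) ** 2)
print(np.abs(phi(x_grid) @ phi(x_grid).T - K_true).max())
```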

The posterior weight distribution under a standard normal prior $$\boldsymbol{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$ is Gaussian, $$\boldsymbol{w} \mid \boldsymbol{y} \sim \mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{w} \mid n}, \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n}\right)$$, with moments \begin{aligned} \boldsymbol{\mu}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \boldsymbol{\Phi}^{\top} \boldsymbol{y} \\ \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \sigma^{2} \end{aligned} where $$\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{X})$$ is an $$n \times \ell$$ feature matrix. We solve for the right-hand side at $$\mathcal{O}\left(\min \{\ell, n\}^{3}\right)$$ cost by applying the Woodbury identity as needed. So far there is nothing unusual here; the cool bit is realising that we can apply this update pathwise, to individual samples.
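To see the Woodbury trade-off concretely: the posterior mean admits two algebraically equivalent forms, one requiring an $$\ell \times \ell$$ solve and one an $$n \times n$$ solve, and we pick whichever is cheaper. A sketch with stand-in random features:

```python
# (Phi^T Phi + s2 I)^{-1} Phi^T  =  Phi^T (Phi Phi^T + s2 I)^{-1}
import numpy as np

rng = np.random.default_rng(7)
n, ell, s2 = 50, 200, 0.1
Phi = rng.standard_normal((n, ell)) / np.sqrt(ell)
y = rng.standard_normal(n)

mu1 = np.linalg.solve(Phi.T @ Phi + s2 * np.eye(ell), Phi.T @ y)  # O(ell^3)
mu2 = Phi.T @ np.linalg.solve(Phi @ Phi.T + s2 * np.eye(n), y)    # O(n^3)
print(np.allclose(mu1, mu2))   # True
```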

GP function updates from J. T. Wilson et al. (2021).

In the weight-space setting, the pathwise update given an initial weight vector $$\boldsymbol{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$ is $$\boldsymbol{w} \mid \boldsymbol{y} \stackrel{\mathrm{d}}{=} \boldsymbol{w}+\boldsymbol{\Phi}^{\top}\left(\boldsymbol{\Phi} \boldsymbol{\Phi}^{\top}+\sigma^{2} \mathbf{I}\right)^{-1}\left(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}-\boldsymbol{\varepsilon}\right).$$
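A sketch of that update on synthetic features; everything here is a stand-in, and any $$n \times \ell$$ feature matrix would do:

```python
# Weight-space pathwise update with prior w ~ N(0, I).
import numpy as np

rng = np.random.default_rng(4)
n, ell, sigma2 = 20, 100, 0.1

Phi = rng.standard_normal((n, ell)) / np.sqrt(ell)  # stand-in features
y = rng.standard_normal(n)                          # stand-in targets

w = rng.standard_normal(ell)                        # prior weight draw
eps = np.sqrt(sigma2) * rng.standard_normal(n)      # noise variate

# w | y  =d  w + Phi^T (Phi Phi^T + s2 I)^{-1} (y - Phi w - eps)
w_post = w + Phi.T @ np.linalg.solve(Phi @ Phi.T + sigma2 * np.eye(n),
                                     y - Phi @ w - eps)

# Averaging many such updated draws recovers the posterior mean
# (Phi^T Phi + s2 I)^{-1} Phi^T y from the previous section.
```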

So if we already had a nice weight-space representation for everything, we could go home at this point. However, for many models we are not given that; the natural bases for the prior and the posterior need not be the same (the posterior is usually not stationary, for one thing).

The innovation in J. T. Wilson et al. (2020) is to make different choices of functional basis for the prior and for the posterior update. We can choose anything really, AFAICT. They suggest a Fourier basis for the prior and the canonical basis, i.e. the reproducing-kernel basis $$k(\cdot,\boldsymbol{x}_j)$$, for the update. Then we have $\underbrace{(f \mid \boldsymbol{y})(\cdot)}_{\text {sparse posterior }} \stackrel{\mathrm{d}}{\approx} \underbrace{\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot)}_{\text {weight-space prior}} +\underbrace{\sum_{j=1}^{m} v_{j} k\left(\cdot, \boldsymbol{x}_{j}\right)}_{\text {function-space update}} ,$ where we have defined $$\boldsymbol{v}=\left(\mathbf{K}_{n, n}+\sigma^{2} \mathbf{I}\right)^{-1}\left(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}- \boldsymbol{\varepsilon}\right) .$$
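Putting the pieces together, here is a sketch of decoupled sampling in this flavour: random Fourier features for the prior, the kernel basis at the training inputs for the update. The helper `sample_posterior_path` and all constants are illustrative, not taken from the paper’s code:

```python
# Decoupled pathwise sampling: RFF prior + canonical-basis update.
import numpy as np

rng = np.random.default_rng(5)
ls, sigma2, ell = 0.5, 0.05, 300

def k(A, B):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

omega = rng.standard_normal(ell) / ls
tau = rng.uniform(0, 2 * np.pi, ell)
phi = lambda x: np.sqrt(2.0 / ell) * np.cos(np.outer(x, omega) + tau)

X = np.array([-1.5, -0.5, 0.3, 1.2])   # training inputs
y = np.array([0.2, -0.8, 0.1, 1.1])    # training targets

def sample_posterior_path(x_new):
    """One approximate posterior function draw, evaluable anywhere."""
    w = rng.standard_normal(ell)                      # weight-space prior
    eps = np.sqrt(sigma2) * rng.standard_normal(len(X))
    v = np.linalg.solve(k(X, X) + sigma2 * np.eye(len(X)),
                        y - phi(X) @ w - eps)         # update coefficients
    return phi(x_new) @ w + k(x_new, X) @ v

xg = np.linspace(-2, 2, 200)
paths = np.stack([sample_posterior_path(xg) for _ in range(2000)])
print(paths.mean(axis=0)[:5])   # ~ analytic posterior mean at xg[:5]
```

Note that each path is a bona fide function of $$x$$: once $$\boldsymbol{w}$$ and $$\boldsymbol{v}$$ are drawn, evaluating it anywhere costs only $$\mathcal{O}(\ell+m)$$ per point, with no further covariance solves.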

Sparse GP setting

I.e. using inducing variables.

Given $$q(\boldsymbol{u})$$, we approximate the posterior as $p\left(\boldsymbol{f}_{*} \mid \boldsymbol{y}\right) \approx \int_{\mathbb{R}^{m}} p\left(\boldsymbol{f}_{*} \mid \boldsymbol{u}\right) q(\boldsymbol{u}) \mathrm{d} \boldsymbol{u} .$ If $$\boldsymbol{u} \sim \mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{u}}, \boldsymbol{\Sigma}_{\boldsymbol{u}}\right)$$, we compute this integral analytically to obtain a Gaussian distribution with mean and covariance \begin{aligned} \boldsymbol{m}_{* \mid m} &=\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1} \boldsymbol{\mu}_{\boldsymbol{u}} \\ \mathbf{K}_{*, * \mid m} &=\mathbf{K}_{*, *}+\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{\Sigma}_{\boldsymbol{u}}-\mathbf{K}_{m, m}\right) \mathbf{K}_{m, m}^{-1} \mathbf{K}_{m, *} \end{aligned}
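As a direct transcription of those moments; the variational parameters $$\boldsymbol{\mu}_{\boldsymbol{u}}, \boldsymbol{\Sigma}_{\boldsymbol{u}}$$ below are illustrative stand-ins, since we are not fitting anything here:

```python
# Sparse predictive moments for Gaussian q(u).
import numpy as np

rng = np.random.default_rng(8)
ls, m = 0.5, 5

def k(A, B):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

Z = np.linspace(-1.5, 1.5, m)     # inducing inputs
Xs = np.linspace(-2, 2, 50)       # test inputs
mu_u = np.sin(Z)                  # stand-in variational mean
Sigma_u = 0.05 * np.eye(m)        # stand-in variational covariance

Kmm = k(Z, Z) + 1e-9 * np.eye(m)
A = np.linalg.solve(Kmm, k(Xs, Z).T).T      # K_{*,m} K_{m,m}^{-1}

m_star = A @ mu_u
K_star = k(Xs, Xs) + A @ (Sigma_u - Kmm) @ A.T
print(m_star[:3], np.diag(K_star)[:3])
```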

The corresponding pathwise update conditions directly on the inducing values, so it needs no noise term: $\boldsymbol{f}_{*} \mid \boldsymbol{u} \stackrel{\mathrm{d}}{=} \boldsymbol{f}_{*}+\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{u}-\boldsymbol{f}_{m}\right).$

When sampling from sparse GPs we draw $$\boldsymbol{f}_{*}$$ and $$\boldsymbol{f}_{m}$$ together from the prior, and independently generate target values $$\boldsymbol{u} \sim q(\boldsymbol{u}) .$$ $\underbrace{(f \mid \boldsymbol{u})(\cdot)}_{\text {sparse posterior }} \stackrel{\mathrm{d}}{\approx} \underbrace{\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot)}_{\text {weight-space prior}} +\underbrace{\sum_{j=1}^{m} v_{j} k\left(\cdot, \boldsymbol{z}_{j}\right)}_{\text {function-space update}} ,$ where we have defined $$\boldsymbol{v}=\mathbf{K}_{m, m}^{-1}\left(\boldsymbol{u}-\boldsymbol{\Phi} \boldsymbol{w}\right)$$ with $$\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{Z})$$.
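And the corresponding sampler, a sketch under the same stand-in variational posterior as above: draw $$\boldsymbol{w}$$ from the weight-space prior, draw $$\boldsymbol{u} \sim q(\boldsymbol{u})$$, and apply the noise-free update through $$\mathbf{K}_{m,m}$$:

```python
# Sparse decoupled pathwise sampling.
import numpy as np

rng = np.random.default_rng(6)
ls, ell, m = 0.5, 300, 5

def k(A, B):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

omega = rng.standard_normal(ell) / ls
tau = rng.uniform(0, 2 * np.pi, ell)
phi = lambda x: np.sqrt(2.0 / ell) * np.cos(np.outer(x, omega) + tau)

Z = np.linspace(-1.5, 1.5, m)     # inducing inputs
mu_u = np.sin(Z)                  # stand-in variational mean
Lu = np.linalg.cholesky(0.05 * np.eye(m))  # stand-in variational covariance

def sample_sparse_path(x_new):
    w = rng.standard_normal(ell)               # weight-space prior draw
    u = mu_u + Lu @ rng.standard_normal(m)     # u ~ q(u)
    v = np.linalg.solve(k(Z, Z) + 1e-9 * np.eye(m), u - phi(Z) @ w)
    return phi(x_new) @ w + k(x_new, Z) @ v

print(sample_sparse_path(np.linspace(-2, 2, 5)))
```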

Matrix GPs

Ritter et al. (2021, appendix D) reframe the Matheron update and generalise it to matrix Gaussians. TBC.

The Matrix Gaussian pathwise update of Ritter et al. (2021).

Stationary moves

Thus far we have talked about moves updating a prior to a posterior; how about moves within a posterior?

We could try Langevin sampling, for example, or SG-MCMC, but these all seem to require inverting the covariance matrix, so they are not likely to be efficient in general. Can we do better?

References

Abrahamsen, Petter. 1997.
Abt, Markus, and William J. Welch. 1998. Canadian Journal of Statistics 26 (1): 127–37.
Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Alvarado, Pablo A., and Dan Stowell. 2018. arXiv:1705.07104 [Cs, Stat], November.
Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015. arXiv:1403.6015 [Astro-Ph, Stat], April.
Bachoc, F., F. Gamboa, J. Loubes, and N. Venet. 2018. IEEE Transactions on Information Theory 64 (10): 6620–37.
Bachoc, Francois, Alexandra Suvorikova, David Ginsbourger, Jean-Michel Loubes, and Vladimir Spokoiny. 2019. arXiv:1805.00753 [Stat], April.
Birgé, Lucien, and Pascal Massart. 2006. Probability Theory and Related Fields 138 (1-2): 33–73.
Bonilla, Edwin V., Kian Ming A. Chai, and Christopher K. I. Williams. 2007. In Proceedings of the 20th International Conference on Neural Information Processing Systems, 153–60. NIPS’07. USA: Curran Associates Inc.
Bonilla, Edwin V., Karl Krauth, and Amir Dezfouli. 2019. Journal of Machine Learning Research 20 (117): 1–63.
Borovitskiy, Viacheslav, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2020. arXiv:2006.10160 [Cs, Stat], June.
Burt, David R., Carl Edward Rasmussen, and Mark van der Wilk. 2020. Journal of Machine Learning Research 21 (131): 1–63.
Calandra, R., J. Peters, C. E. Rasmussen, and M. P. Deisenroth. 2016. In 2016 International Joint Conference on Neural Networks (IJCNN), 3338–45. Vancouver, BC, Canada: IEEE.
Cressie, Noel. 1990. Mathematical Geology 22 (3): 239–52.
———. 2015. Statistics for Spatial Data. John Wiley & Sons.
Cressie, Noel, and Christopher K. Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0. John Wiley and Sons.
Csató, Lehel, and Manfred Opper. 2002. Neural Computation 14 (3): 641–68.
Csató, Lehel, Manfred Opper, and Ole Winther. 2001. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 657–63. NIPS’01. Cambridge, MA, USA: MIT Press.
Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press.
Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. In PMLR.
Dahl, Astrid, and Edwin Bonilla. 2017. In Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy, edited by Wei Lee Woon, Zeyar Aung, Oliver Kramer, and Stuart Madnick, 94–106. Lecture Notes in Computer Science. Cham: Springer International Publishing.
Dahl, Astrid, and Edwin V. Bonilla. 2019. arXiv:1903.03986 [Cs, Stat], March.
Damianou, Andreas, and Neil Lawrence. 2013. In Artificial Intelligence and Statistics, 207–15.
Damianou, Andreas, Michalis K. Titsias, and Neil D. Lawrence. 2011. In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 2510–18. Curran Associates, Inc.
Dezfouli, Amir, and Edwin V. Bonilla. 2015. In Advances in Neural Information Processing Systems 28, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.
Domingos, Pedro. 2020. arXiv:2012.00152 [Cs, Stat], November.
Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018. Journal of Machine Learning Research 19 (1): 2100–2145.
Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021. arXiv:2105.04504 [Cs, Stat], May.
Duvenaud, David. 2014. PhD Thesis, University of Cambridge.
Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. 2013. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 1166–74.
Ebden, Mark. 2015. arXiv:1505.02965 [Math, Stat], May.
Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc.
Emery, Xavier. 2007. Mathematical Geology 39 (6): 607–23.
Evgeniou, Theodoros, Charles A. Micchelli, and Massimiliano Pontil. 2005. Journal of Machine Learning Research 6 (Apr): 615–37.
Ferguson, Thomas S. 1973. The Annals of Statistics 1 (2): 209–30.
Finzi, Marc, Roberto Bondesan, and Max Welling. 2020. arXiv:2010.10876 [Cs], October.
Föll, Roman, Bernard Haasdonk, Markus Hanselmann, and Holger Ulmer. 2017. arXiv:1711.00799 [Stat], November.
Frigola, Roger, Yutian Chen, and Carl Edward Rasmussen. 2014. In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3680–88. Curran Associates, Inc.
Frigola, Roger, Fredrik Lindsten, Thomas B Schön, and Carl Edward Rasmussen. 2013. In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 3156–64. Curran Associates, Inc.
Gal, Yarin, and Zoubin Ghahramani. 2015. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Gal, Yarin, and Mark van der Wilk. 2014. arXiv:1402.1412 [Stat], February.
Galliani, Pietro, Amir Dezfouli, Edwin V Bonilla, and Novi Quadrianto. n.d. “Gray-Box Inference for Structured Gaussian Process Models,” 9.
Gardner, Jacob R., Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31:7587–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Gardner, Jacob R., Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. arXiv:1802.08903 [Cs, Stat], February.
Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. arXiv:1807.01613 [Cs, Stat], July, 10.
Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. July.
Ghahramani, Zoubin. 2013. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371 (1984): 20110553.
Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2): 424–36.
Girolami, Mark, and Simon Rogers. 2005. In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 241–48. Bonn, Germany: ACM Press.
Gramacy, Robert B. 2016. Journal of Statistical Software 72 (1).
Gramacy, Robert B., and Daniel W. Apley. 2015. Journal of Computational and Graphical Statistics 24 (2): 561–78.
Gratiet, Loïc Le, Stefano Marelli, and Bruno Sudret. 2016. In Handbook of Uncertainty Quantification, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–37. Cham: Springer International Publishing.
Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Hartikainen, J., and S. Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.
Hensman, James, Nicolò Fusi, and Neil D. Lawrence. 2013. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 282–90. UAI’13. Arlington, Virginia, USA: AUAI Press.
Huber, Marco F. 2014. Pattern Recognition Letters 45 (August): 85–91.
Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018. arXiv:1806.10234 [Cs, Stat], June.
Jankowiak, Martin, Geoff Pleiss, and Jacob Gardner. 2020. In Conference on Uncertainty in Artificial Intelligence, 789–98. PMLR.
Jordan, Michael Irwin. 1999. Learning in Graphical Models. Cambridge, Mass.: MIT Press.
Karvonen, Toni, and Simo Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. Vietri sul Mare, Salerno, Italy: IEEE.
Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020. arXiv:2001.08055 [Physics, Stat], January.
Kingma, Diederik P., and Max Welling. 2014. In ICLR 2014 Conference.
Ko, Jonathan, and Dieter Fox. 2009. In Autonomous Robots, 27:75–90.
Kocijan, Juš, Agathe Girard, Blaž Banko, and Roderick Murray-Smith. 2005. Mathematical and Computer Modelling of Dynamical Systems 11 (4): 411–24.
Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. In UAI17.
Krige, D. G. 1951. Journal of the Southern African Institute of Mining and Metallurgy 52 (6): 119–39.
Kroese, Dirk P., and Zdravko I. Botev. 2013. arXiv:1308.0399 [Stat], August.
Lawrence, Neil. 2005. Journal of Machine Learning Research 6 (Nov): 1783–1816.
Lawrence, Neil D., and Raquel Urtasun. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning, 601–8. ICML ’09. New York, NY, USA: ACM.
Lawrence, Neil, Matthias Seeger, and Ralf Herbrich. 2003. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems, 609–16.
Lázaro-Gredilla, Miguel, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. 2010. Journal of Machine Learning Research 11 (Jun): 1865–81.
Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018. In ICLR.
Leibfried, Felix, Vincent Dutordoir, S. T. John, and Nicolas Durrande. 2021. arXiv:2012.13962 [Cs, Stat], June.
Lenk, Peter J. 2003. Journal of Computational and Graphical Statistics 12 (3): 548–65.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.
Liutkus, Antoine, Roland Badeau, and Gäel Richard. 2011. IEEE Transactions on Signal Processing 59 (7): 3155–67.
Lloyd, James Robert, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2014. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Louizos, Christos, Xiahan Shi, Klamer Schutte, and Max Welling. 2019. arXiv:1906.08324 [Cs, Stat], June.
MacKay, David J C. 1998. NATO ASI Series. Series F: Computer and System Sciences 168: 133–65.
———. 2002. In Information Theory, Inference & Learning Algorithms, Chapter 45. Cambridge University Press.
Matheron, Georges. 1963a. Traité de Géostatistique Appliquée. 2. Le Krigeage. Editions Technip.
———. 1963b. Economic Geology 58 (8): 1246–66.
Matthews, Alexander Graeme de Garis, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016. arXiv:1610.08733 [Stat], October.
Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Guilherme A. Barreto, and Neil D. Lawrence. 2017. Journal of Process Control, DYCOPS-CAB 2016, 60 (December): 82–94.
Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme A. Barreto, and Neil D. Lawrence. 2016. In Proceedings of ICLR.
Micchelli, Charles A., and Massimiliano Pontil. 2005a. Journal of Machine Learning Research 6 (Jul): 1099–1125.
———. 2005b. Neural Computation 17 (1): 177–204.
Minh, Hà Quang. 2022. SIAM/ASA Journal on Uncertainty Quantification, February, 96–124.
Mohammadi, Hossein, Peter Challenor, and Marc Goodfellow. 2021. arXiv:2104.14987 [Stat], April.
Moreno-Muñoz, Pablo, Antonio Artés-Rodríguez, and Mauricio A. Álvarez. 2019. arXiv:1911.00002 [Cs, Stat], October.
Nagarajan, Sai Ganesh, Gareth Peters, and Ido Nevat. 2018. SSRN Electronic Journal.
Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018. In International Conference on Machine Learning, 3789–98.
O’Hagan, A. 1978. Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24.
Papaspiliopoulos, Omiros, Yvo Pokern, Gareth O. Roberts, and Andrew M. Stuart. 2012. Biometrika 99 (3): 511–31.
Pinder, Thomas, and Daniel Dodd. 2022. Journal of Open Source Software 7 (75): 4455.
Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. arXiv:1803.06058 [Cs, Stat], June.
Pleiss, Geoff, Martin Jankowiak, David Eriksson, Anil Damle, and Jacob Gardner. 2020. Advances in Neural Information Processing Systems 33.
Quiñonero-Candela, Joaquin, and Carl Edward Rasmussen. 2005. Journal of Machine Learning Research 6 (Dec): 1939–59.
Raissi, Maziar, and George Em Karniadakis. 2017. arXiv:1701.02440 [Cs, Math, Stat], January.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.
Reece, S., and S. Roberts. 2010. In 2010 13th International Conference on Information Fusion, 1–9.
Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021. arXiv:2105.14594 [Cs, Stat], May.
Riutort-Mayol, Gabriel, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari. 2020. arXiv:2004.11408 [Stat], April.
Rossi, Simone, Markus Heinonen, Edwin V. Bonilla, Zheyang Shen, and Maurizio Filippone. 2020. March.
Saatçi, Yunus. 2012. Ph.D., University of Cambridge.
Saatçi, Yunus, Ryan Turner, and Carl Edward Rasmussen. 2010. In Proceedings of the 27th International Conference on International Conference on Machine Learning, 927–34. ICML’10. Madison, WI, USA: Omnipress.
Saemundsson, Steindor, Alexander Terenin, Katja Hofmann, and Marc Peter Deisenroth. 2020. arXiv:1910.09349 [Cs, Stat], March.
Salimbeni, Hugh, and Marc Deisenroth. 2017. In Advances In Neural Information Processing Systems.
Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. 2018. In International Conference on Artificial Intelligence and Statistics, 689–97.
Särkkä, Simo. 2011. In Artificial Neural Networks and Machine Learning – ICANN 2011, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.
Särkkä, Simo, and Jouni Hartikainen. 2012. In Artificial Intelligence and Statistics.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. IEEE Signal Processing Magazine 30 (4): 51–61.
Schulam, Peter, and Suchi Saria. 2017. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1696–706. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.
Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014. In Artificial Intelligence and Statistics, 877–85. PMLR.
Sidén, Per. 2020. Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields. Vol. 15. Linköping Studies in Statistics. Linköping: Linköping University Electronic Press.
Smith, Michael Thomas, Mauricio A. Alvarez, and Neil D. Lawrence. 2018. arXiv:1809.02010 [Cs, Stat], September.
Snelson, Edward, and Zoubin Ghahramani. 2005. In Advances in Neural Information Processing Systems, 1257–64.
Solin, Arno, and Simo Särkkä. 2020. Statistics and Computing 30 (2): 419–46.
Tait, Daniel J., and Theodoros Damoulas. 2020. arXiv:2006.15641 [Cs, Stat], June.
Tang, Wenpin, Lu Zhang, and Sudipto Banerjee. 2019. arXiv:1908.05726 [Math, Stat], August.
Titsias, Michalis K. 2009a. In International Conference on Artificial Intelligence and Statistics, 567–74. PMLR.
———. 2009b. Technical report, School of Computer Science, University of Manchester.
Titsias, Michalis, and Neil D. Lawrence. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 844–51.
Tokdar, Surya T. 2007. Journal of Computational and Graphical Statistics 16 (3): 633–55.
Turner, Richard E., and Maneesh Sahani. 2014. IEEE Transactions on Signal Processing 62 (23): 6171–83.
Turner, Ryan, Marc Deisenroth, and Carl Rasmussen. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 868–75.
Vanhatalo, Jarno, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. 2013. Journal of Machine Learning Research 14 (April): 1175–79.
———. 2015. arXiv:1206.5754 [Cs, Stat], July.
Walder, Christian, Kwang In Kim, and Bernhard Schölkopf. 2008. In Proceedings of the 25th International Conference on Machine Learning, 1112–19. ICML ’08. New York, NY, USA: ACM.
Walder, C., B. Schölkopf, and O. Chapelle. 2006. Computer Graphics Forum 25 (3): 635–44.
Wang, Ke, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2019. Advances in Neural Information Processing Systems 32: 14648–59.
Wikle, Christopher K., Noel Cressie, and Andrew Zammit-Mangion. 2019. Spatio-Temporal Statistics with R.
Wilk, Mark van der, Andrew G. Wilson, and Carl E. Rasmussen. 2014. “Variational Inference for Latent Variable Modelling of Correlation Structure.” In NIPS 2014 Workshop on Advances in Variational Inference.
Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019. arXiv:1901.11436 [Cs, Eess, Stat], January.
Wilkinson, William J., Simo Särkkä, and Arno Solin. 2021. arXiv.
Williams, Christopher KI, and Matthias Seeger. 2001. In Advances in Neural Information Processing Systems, 682–88.
Williams, Christopher, Stefan Klanke, Sethu Vijayakumar, and Kian M. Chai. 2009. In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 265–72. Curran Associates, Inc.
Wilson, Andrew Gordon, and Ryan Prescott Adams. 2013. In International Conference on Machine Learning.
Wilson, Andrew Gordon, Christoph Dann, Christopher G. Lucas, and Eric P. Xing. 2015. arXiv:1510.07389 [Cs, Stat], October.
Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.
———. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases, edited by Peter A. Flach, Tijl De Bie, and Nello Cristianini, 858–61. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Wilson, Andrew Gordon, David A. Knowles, and Zoubin Ghahramani. 2012. In Proceedings of the 29th International Coference on International Conference on Machine Learning, 1139–46. ICML’12. Madison, WI, USA: Omnipress.
Wilson, Andrew Gordon, and Hannes Nickisch. 2015. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37, 1775–84. ICML’15. Lille, France: JMLR.org.
Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020. In Proceedings of the 37th International Conference on Machine Learning, 10292–302. PMLR.
Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021. Journal of Machine Learning Research 22 (105): 1–47.
Zhang, Rui, Christian Walder, Edwin V. Bonilla, Marian-Andrei Rizoiu, and Lexing Xie. 2020. In Proceedings of NeurIPS 2020.
