# Gaussian process regression

And classification. And extensions.

December 3, 2019 — July 28, 2023

functional analysis
Gaussian
generative
Hilbert space
kernel tricks
nonparametric
regression
spatial
stochastic processes
time series

Gaussian random processes/fields are stochastic processes/fields with jointly Gaussian distributions of observations. While “Gaussian process regression” is not wrong per se, there is a common convention in stochastic process theory (and in pedagogy) to use process for a notionally time-indexed process and field for one with a space-like index and no special arrow of time. This leads to much confusion, because Gaussian field regression is what we usually want to talk about (although the arrow of time can pop up usefully). Hereafter I use “field” and “process” interchangeably, as everyone does in this corner of the discipline.

In machine learning, Gaussian fields are often used as a means of regression or classification, since it is fairly easy to condition a Gaussian field on data and produce a posterior distribution over functions. Because the resulting regression function can have some very funky and weird posterior distributions, we can think of this as a kind of nonparametric Bayesian inference, although as always with that term we should be careful; in fact GP regression typically has parameters.

I would further add that GPs are the crystal meth of machine learning methods, in terms of the addictiveness, and of the passion of the people who use them.

The central trick is a clever union of Hilbert space methods and probability that gives a probabilistic interpretation of functional regression as a kind of nonparametric Bayesian inference.

A useful side divergence into representer theorems and Karhunen-Loève expansions gives us a helpful interpretation. Regression using Gaussian processes is common in e.g. spatial statistics, where it arises as kriging. Cressie (1990) traces the history of this idea via Matheron (1963a) back to the work of Krige (1951).

## 1 Lavish intros

I am not the right guy to provide the canonical introduction, because it already exists. Specifically, Rasmussen and Williams (2006). Moreover, because GP regression is so popular and so elegant, there are many excellent interactive introductions online.

This lecture by the late David MacKay is probably good; the man could talk.

There is also a well-illustrated and elementary introduction by Yuge Shi. There are many, many more.

Gaussianprocess.org is a classic.

A Visual Exploration of Gaussian Processes recommends further introductions. If you want more of a hands-on experience, there are also many Python notebooks available.

Already read all those? Try the brutally quick intro.

## 2 Brutally quick intro

J. T. Wilson et al. (2021) give a dense and useful perspective. If you are used to this field, it might reboot your thinking. If you are new to the GP area, see the more instructive intros above.

A Gaussian process (GP) is a random function $$f: \mathcal{X} \rightarrow \mathbb{R}$$, such that, for any finite collection of points $$\mathbf{X} \subset \mathcal{X}$$, the random vector $$\boldsymbol{f}=f(\mathbf{X})$$ follows a Gaussian distribution. Such a process is uniquely identified by a mean function $$\mu: \mathcal{X} \rightarrow \mathbb{R}$$ and a positive semi-definite kernel $$k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$$. Hence, if $$f \sim \mathcal{G} \mathcal{P}(\mu, k)$$, then $$\boldsymbol{f} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})$$ is multivariate normal with mean $$\boldsymbol{\mu}=\mu(\mathbf{X})$$ and covariance $$\mathbf{K}=k(\mathbf{X}, \mathbf{X})$$.
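This definition translates directly into code. A minimal sketch of drawing one sample from a zero-mean GP prior, assuming a squared-exponential kernel (one of many possible choices) and a Cholesky factor of the jittered Gram matrix:

```python
import numpy as np

def sq_exp_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = v * exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 50)
K = sq_exp_kernel(X, X)                            # Gram matrix K = k(X, X)
# Small diagonal "jitter" keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))
# One draw from N(0, K): f = L z with z ~ N(0, I).
f = L @ rng.standard_normal(len(X))
```

Repeated draws of `f` trace out different smooth functions, all consistent with the same covariance model.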

[…] we investigate different ways of reasoning about the random variable $$\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}$$ for some non-trivial partition $$\boldsymbol{f}=\boldsymbol{f}_n \oplus \boldsymbol{f}_*$$. Here, $$\boldsymbol{f}_n=f\left(\mathbf{X}_n\right)$$ are process values at a set of training locations $$\mathbf{X}_n \subset \mathbf{X}$$ where we would like to introduce a condition $$\boldsymbol{f}_n=\boldsymbol{y}$$, while $$\boldsymbol{f}_*=f\left(\mathbf{X}_*\right)$$ are process values at a set of test locations $$\mathbf{X}_* \subset \mathbf{X}$$ where we would like to obtain a random variable $$\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}$$.

[…] we may obtain $$\boldsymbol{f}_* \mid \boldsymbol{y}$$ by first finding its conditional distribution. Since process values $$\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)$$ are defined as jointly Gaussian, this procedure closely resembles that of [the finite-dimensional case]: we factor out the marginal distribution of $$\boldsymbol{f}_n$$ from the joint distribution $$p\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)$$ and, upon canceling, identify the remaining distribution as $$p\left(\boldsymbol{f}_* \mid \boldsymbol{y}\right)$$. Having done so, we find that the conditional distribution is the Gaussian $$\mathcal{N}\left(\boldsymbol{\mu}_{* \mid \boldsymbol{y}}, \mathbf{K}_{*, * \mid \boldsymbol{y}}\right)$$ with moments

$$\begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}.\end{aligned}$$
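Those conditioning formulas can be sketched in a few lines of numpy. This is an illustrative implementation for a zero-mean prior, not a library API; the kernel `k` and the helper `gp_posterior` are hypothetical names, and the Cholesky-based solves avoid forming $$\mathbf{K}_{n,n}^{-1}$$ explicitly:

```python
import numpy as np

def k(a, b, lengthscale=1.0):
    """Illustrative squared-exponential kernel."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def gp_posterior(Xn, y, Xs, kernel, noise=0.0):
    """Moments of f(X_*) | f(X_n) = y for a zero-mean GP.

    `noise` adds observation variance to K_nn; 0 gives noiseless conditioning
    (a tiny jitter is still added for numerical stability).
    """
    Knn = kernel(Xn, Xn) + (noise + 1e-9) * np.eye(len(Xn))
    Ksn = kernel(Xs, Xn)
    Kss = kernel(Xs, Xs)
    L = np.linalg.cholesky(Knn)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_nn^{-1} (y - mu_n)
    V = np.linalg.solve(L, Ksn.T)                        # L^{-1} K_{n,*}
    mean = Ksn @ alpha                                   # mu_* + K_{*,n} K_nn^{-1} (y - mu_n)
    cov = Kss - V.T @ V                                  # K_{*,*} - K_{*,n} K_nn^{-1} K_{n,*}
    return mean, cov

Xn = np.array([0.0, 1.0, 2.0])
y = np.sin(Xn)
mean, cov = gp_posterior(Xn, y, np.array([0.5, 1.5]), k)
```

A sanity check: conditioning at the training locations themselves should return (approximately) the observed values with near-zero posterior variance.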

## 3 Kernels

a.k.a. covariance models.

GP regression models are kernel machines. As such, the covariance kernels are the parameters, more or less. One can also parameterise with a mean function, but (see next section) let us ignore that detail for now, since usually we do not use one.
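One concrete sense in which kernels act as parameters: valid kernels compose, since sums and elementwise products of positive semi-definite kernels are again valid kernels. A sketch with hypothetical `rbf` and `periodic` helpers (these names are illustrative, not from any particular library):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def periodic(a, b, period=1.0, lengthscale=1.0):
    """Standard periodic kernel built from distances wrapped by sin."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

X = np.linspace(0.0, 3.0, 20)
# Sum and (Schur) product of valid kernels: still a valid covariance model.
K = rbf(X, X, lengthscale=0.7) + rbf(X, X, lengthscale=2.0) * periodic(X, X)
# Any Gram matrix of a valid kernel is positive semi-definite
# (up to floating-point error).
assert np.linalg.eigvalsh(K).min() > -1e-9
```

Composing kernels this way is how structured covariance models (trend plus seasonality, say) are usually built.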

## 4 Prior with a mean function

Almost immediate but not quite trivial.

TODO: discuss identifiability.

## 5 Using state filtering

When one dimension of the input vector can be interpreted as a time dimension, we can use Kalman filtering for Gaussian processes, which has benefits in terms of speed and hipness.

## 7 On manifolds

I would like to read Terenin on GPs on manifolds, who also makes a suggestive connection to SDEs, which is the filtering-GPs trick again.

🏗

## 9 With inducing variables

“Sparse GP”. See Quiñonero-Candela and Rasmussen (2005). 🏗

## 10 By variational inference with inducing variables

See GP factoring.

## 13 Neural processes

See neural processes.

## 15 Observation likelihoods

A Gaussian process prior need not be paired with a Gaussian observation likelihood; this is how classification etc. work. TBD.

## 16 Density estimation

Can I infer a density using GPs? Yes. One popular method is apparently the logistic Gaussian process.

## 17 Approximation with dropout

Unconvincing in practice. See NN ensembles for some vague notes.

## 18 Inhomogeneous with covariates

The integrated nested Laplace approximation connects to the GP-as-SDE idea, I think?

e.g. GP-LVM. 🏗

See pathwise GP.

## 22 References

Abrahamsen. 1997.
Abt, and Welch. 1998. Canadian Journal of Statistics.
Altun, Smola, and Hofmann. 2004. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. UAI ’04.
Alvarado, and Stowell. 2018. arXiv:1705.07104 [Cs, Stat].
Ambikasaran, Foreman-Mackey, Greengard, et al. 2015. arXiv:1403.6015 [Astro-Ph, Stat].
Bachoc, F., Gamboa, Loubes, et al. 2018. IEEE Transactions on Information Theory.
Bachoc, Francois, Suvorikova, Ginsbourger, et al. 2019. arXiv:1805.00753 [Stat].
Birgé, and Massart. 2006. Probability Theory and Related Fields.
Bonilla, Chai, and Williams. 2007. In Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS’07.
Bonilla, Krauth, and Dezfouli. 2019. Journal of Machine Learning Research.
Borovitskiy, Terenin, Mostowsky, et al. 2020. arXiv:2006.10160 [Cs, Stat].
Burt, Rasmussen, and Wilk. 2020. Journal of Machine Learning Research.
Calandra, Peters, Rasmussen, et al. 2016. In 2016 International Joint Conference on Neural Networks (IJCNN).
Cressie. 1990. Mathematical Geology.
———. 2015. Statistics for Spatial Data.
Cressie, and Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0.
Csató, and Opper. 2002. Neural Computation.
Csató, Opper, and Winther. 2001. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. NIPS’01.
Cunningham, Shenoy, and Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Cutajar, Bonilla, Michiardi, et al. 2017. In PMLR.
Dahl, and Bonilla. 2017. In Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy. Lecture Notes in Computer Science.
Dahl, and Bonilla. 2019. arXiv:1903.03986 [Cs, Stat].
Damianou, and Lawrence. 2013. In Artificial Intelligence and Statistics.
Damianou, Titsias, and Lawrence. 2011. In Advances in Neural Information Processing Systems 24.
Dezfouli, and Bonilla. 2015. In Advances in Neural Information Processing Systems 28. NIPS’15.
Domingos. 2020. arXiv:2012.00152 [Cs, Stat].
Dubrule. 2018. In Handbook of Mathematical Geosciences: Fifty Years of IAMG.
Dunlop, Girolami, Stuart, et al. 2018. Journal of Machine Learning Research.
Dutordoir, Hensman, van der Wilk, et al. 2021. In arXiv:2105.04504 [Cs, Stat].
Dutordoir, Saul, Ghahramani, et al. 2022.
Duvenaud. 2014.
Duvenaud, Lloyd, Grosse, et al. 2013. In Proceedings of the 30th International Conference on Machine Learning (ICML-13).
Ebden. 2015. arXiv:1505.02965 [Math, Stat].
Eleftheriadis, Nicholson, Deisenroth, et al. 2017. In Advances in Neural Information Processing Systems 30.
Emery. 2007. Mathematical Geology.
Evgeniou, Micchelli, and Pontil. 2005. Journal of Machine Learning Research.
Ferguson. 1973. The Annals of Statistics.
Finzi, Bondesan, and Welling. 2020. arXiv:2010.10876 [Cs].
Föll, Haasdonk, Hanselmann, et al. 2017. arXiv:1711.00799 [Stat].
Frigola, Chen, and Rasmussen. 2014. In Advances in Neural Information Processing Systems 27.
Frigola, Lindsten, Schön, et al. 2013. In Advances in Neural Information Processing Systems 26.
Gal, and Ghahramani. 2015. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Galliani, Dezfouli, Bonilla, et al. 2017. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.
Gal, and van der Wilk. 2014. arXiv:1402.1412 [Stat].
Gardner, Pleiss, Bindel, et al. 2018. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Gardner, Pleiss, Wu, et al. 2018. arXiv:1802.08903 [Cs, Stat].
Garnelo, Rosenbaum, Maddison, et al. 2018. arXiv:1807.01613 [Cs, Stat].
Garnelo, Schwarz, Rosenbaum, et al. 2018.
Ghahramani. 2013. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Gilboa, Saatçi, and Cunningham. 2015. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Girolami, and Rogers. 2005. In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05.
Gramacy. 2016. Journal of Statistical Software.
Gramacy, and Apley. 2015. Journal of Computational and Graphical Statistics.
Gratiet, Marelli, and Sudret. 2016. In Handbook of Uncertainty Quantification.
Grosse, Salakhutdinov, Freeman, et al. 2012. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Hartikainen, and Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing.
Hensman, Fusi, and Lawrence. 2013. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence. UAI’13.
Huber. 2014. Pattern Recognition Letters.
Huggins, Campbell, Kasprzak, et al. 2018. arXiv:1806.10234 [Cs, Stat].
Jankowiak, Pleiss, and Gardner. 2020. In Conference on Uncertainty in Artificial Intelligence.
Jordan. 1999. Learning in Graphical Models.
Karvonen, and Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
Kasim, Watson-Parris, Deaconu, et al. 2020. arXiv:2001.08055 [Physics, Stat].
Kingma, and Welling. 2014. In ICLR 2014 Conference.
Kocijan, Girard, Banko, et al. 2005. Mathematical and Computer Modelling of Dynamical Systems.
Ko, and Fox. 2009. In Autonomous Robots.
Krauth, Bonilla, Cutajar, et al. 2016. In UAI17.
Krige. 1951. Journal of the Southern African Institute of Mining and Metallurgy.
Kroese, and Botev. 2013. arXiv:1308.0399 [Stat].
Lawrence, Neil. 2005. Journal of Machine Learning Research.
Lawrence, Neil, Seeger, and Herbrich. 2003. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems.
Lawrence, Neil D., and Urtasun. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.
Lázaro-Gredilla, Quiñonero-Candela, Rasmussen, et al. 2010. Journal of Machine Learning Research.
Lee, Bahri, Novak, et al. 2018. In ICLR.
Leibfried, Dutordoir, John, et al. 2022.
Lenk. 2003. Journal of Computational and Graphical Statistics.
Lindgren, Rue, and Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Liutkus, Badeau, and Richard. 2011. IEEE Transactions on Signal Processing.
Lloyd, Duvenaud, Grosse, et al. 2014. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Louizos, Shi, Schutte, et al. 2019. In Advances in Neural Information Processing Systems.
Lu. 2022.
MacKay. 1998. NATO ASI Series. Series F: Computer and System Sciences.
———. 2002. In Information Theory, Inference & Learning Algorithms.
Matheron. 1963a. Traité de Géostatistique Appliquée. 2. Le Krigeage.
———. 1963b. Economic Geology.
Matthews, van der Wilk, Nickson, et al. 2016. arXiv:1610.08733 [Stat].
Mattos, Dai, Damianou, et al. 2016. In Proceedings of ICLR.
Mattos, Dai, Damianou, et al. 2017. Journal of Process Control, DYCOPS-CAB 2016,.
Micchelli, and Pontil. 2005a. Journal of Machine Learning Research.
———. 2005b. Neural Computation.
Minh. 2022. SIAM/ASA Journal on Uncertainty Quantification.
Mohammadi, Challenor, and Goodfellow. 2021. arXiv:2104.14987 [Stat].
Moreno-Muñoz, Artés-Rodríguez, and Álvarez. 2019. arXiv:1911.00002 [Cs, Stat].
Nagarajan, Peters, and Nevat. 2018. SSRN Electronic Journal.
Nickisch, Solin, and Grigorevskiy. 2018. In International Conference on Machine Learning.
O’Hagan. 1978. Journal of the Royal Statistical Society: Series B (Methodological).
Papaspiliopoulos, Pokern, Roberts, et al. 2012. Biometrika.
Pinder, and Dodd. 2022. Journal of Open Source Software.
Pleiss, Gardner, Weinberger, et al. 2018. In.
Pleiss, Jankowiak, Eriksson, et al. 2020. Advances in Neural Information Processing Systems.
Qi, Abdel-Gawad, and Minka. 2010. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. UAI’10.
Quiñonero-Candela, and Rasmussen. 2005. Journal of Machine Learning Research.
Raissi, and Karniadakis. 2017. arXiv:1701.02440 [Cs, Math, Stat].
Rasmussen, and Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning.
Reece, and Roberts. 2010. In 2010 13th International Conference on Information Fusion.
Ritter, Kukla, Zhang, et al. 2021. arXiv:2105.14594 [Cs, Stat].
Riutort-Mayol, Bürkner, Andersen, et al. 2020. arXiv:2004.11408 [Stat].
Rossi, Heinonen, Bonilla, et al. 2021. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.
Saatçi. 2012.
Saatçi, Turner, and Rasmussen. 2010. In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10.
Saemundsson, Terenin, Hofmann, et al. 2020. arXiv:1910.09349 [Cs, Stat].
Salimbeni, and Deisenroth. 2017. In Advances In Neural Information Processing Systems.
Salimbeni, Eleftheriadis, and Hensman. 2018. In International Conference on Artificial Intelligence and Statistics.
Särkkä. 2011. In Artificial Neural Networks and Machine Learning – ICANN 2011. Lecture Notes in Computer Science.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3.
Särkkä, and Hartikainen. 2012. In Artificial Intelligence and Statistics.
Särkkä, Solin, and Hartikainen. 2013. IEEE Signal Processing Magazine.
Schulam, and Saria. 2017. In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Shah, Wilson, and Ghahramani. 2014. In Artificial Intelligence and Statistics.
Sidén. 2020. Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields. Linköping Studies in Statistics.
Smith, Alvarez, and Lawrence. 2018. arXiv:1809.02010 [Cs, Stat].
Snelson, and Ghahramani. 2005. In Advances in Neural Information Processing Systems.
Solin, and Särkkä. 2020. Statistics and Computing.
Tait, and Damoulas. 2020. arXiv:2006.15641 [Cs, Stat].
Tang, Zhang, and Banerjee. 2019. arXiv:1908.05726 [Math, Stat].
Titsias, Michalis K. 2009a. In International Conference on Artificial Intelligence and Statistics.
Titsias, Michalis, and Lawrence. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
Tokdar. 2007. Journal of Computational and Graphical Statistics.
Turner, Ryan, Deisenroth, and Rasmussen. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
Turner, Richard E., and Sahani. 2014. IEEE Transactions on Signal Processing.
van der Wilk, Wilson, and Rasmussen. 2014. “Variational Inference for Latent Variable Modelling of Correlation Structure.” In NIPS 2014 Workshop on Advances in Variational Inference.
Vanhatalo, Riihimäki, Hartikainen, et al. 2013. Journal of Machine Learning Research.
———, et al. 2015. arXiv:1206.5754 [Cs, Stat].
Walder, Christian, Kim, and Schölkopf. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Walder, C., Schölkopf, and Chapelle. 2006. Computer Graphics Forum.
Wang, Pleiss, Gardner, et al. 2019. In Advances in Neural Information Processing Systems.
Wikle, Cressie, and Zammit-Mangion. 2019. Spatio-Temporal Statistics with R.
Wilkinson, Andersen, Reiss, et al. 2019. arXiv:1901.11436 [Cs, Eess, Stat].
Wilkinson, Särkkä, and Solin. 2021.
Williams, Christopher, Klanke, Vijayakumar, et al. 2009. In Advances in Neural Information Processing Systems 21.
Williams, Christopher KI, and Seeger. 2001. In Advances in Neural Information Processing Systems.
Wilson, Andrew Gordon, and Adams. 2013. In International Conference on Machine Learning.
Wilson, James T, Borovitskiy, Terenin, et al. 2020. In Proceedings of the 37th International Conference on Machine Learning.
Wilson, James T, Borovitskiy, Terenin, et al. 2021. Journal of Machine Learning Research.
Wilson, Andrew Gordon, Dann, Lucas, et al. 2015. arXiv:1510.07389 [Cs, Stat].
Wilson, Andrew Gordon, and Ghahramani. 2011. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.
———. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science.
Wilson, Andrew Gordon, Knowles, and Ghahramani. 2012. In Proceedings of the 29th International Coference on International Conference on Machine Learning. ICML’12.
Wilson, Andrew Gordon, and Nickisch. 2015. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37. ICML’15.
Zhang, Walder, Bonilla, et al. 2020. In Proceedings of NeurIPS 2020.