# Gaussian process regression

## And classification. And extensions.

Gaussian random processes/fields are stochastic processes/fields whose finite-dimensional marginal distributions are jointly Gaussian. While “Gaussian process regression” is not wrong per se, there is a common convention in stochastic process theory (and also in pedagogy) to use *process* for a notionally time-indexed process and *field* for one with a space-like index and no presumption of an arrow of time. This leads to much confusion, because Gaussian field regression is what we usually want to talk about. What we want to use the arrow of time for is a whole other story. Regardless, hereafter I’ll use “field” and “process” interchangeably.

In machine learning, Gaussian fields are often used for regression or classification, since it is fairly easy to condition a Gaussian field on data and produce a posterior distribution over functions. They provide a nonparametric method of inferring regression functions, with a conveniently Bayesian interpretation and reasonably elegant learning and inference steps. I would further add that this is the crystal meth of machine learning methods, in terms of the addictiveness, and of the passion of the people who use it.

The central trick is a clever union of Hilbert space theory and probability, which gives a probabilistic interpretation of functional regression as a kind of nonparametric Bayesian posterior inference. There are useful side-divergences into representer theorems and Karhunen–Loève expansions for thinking about this. Regression using Gaussian processes is common in e.g. spatial statistics, where it arises as kriging. Cressie (1990) traces the history of this idea via Matheron (1963a) back to the work of Krige (1951).

> This web site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes. Although Gaussian processes have a long history in the field of statistics, they seem to have been employed extensively only in niche areas. With the advent of kernel machines in the machine learning community, models based on Gaussian processes have become commonplace for problems of regression (kriging) and classification as well as a host of more specialized applications.

I’ve not been enthusiastic about these in the past. It’s nice to have a principled nonparametric Bayesian formalism, but it has always seemed pointless having a formalism that is so computationally demanding that people don’t try to use more than a thousand data points, or spend most of a paper working out how to approximate this simple elegant model with a complex messy model. However, that previous sentence describes most of my career now, so I guess I must have come around.

Perhaps I should be persuaded by tricks such as AutoGP, which breaks some computational deadlocks by clever use of inducing variables and variational approximation, producing a compressed representation of the data with tractable inference and model selection (including kernel selection), and doing the whole thing in many dimensions simultaneously. There are other clever tricks in this vein, e.g. exploiting a lattice structure in the observations to make computation cheap.

## Quick intro

I am not the right guy to provide the canonical introduction, because it already exists: Rasmussen and Williams (2006). But here is a quick, simple special case, sufficient to start from.

We work with a centred (i.e. mean-zero) process, in which case for every finite set $$\mathbf{f}:=\{f(t_k);k=1,\dots,K\}$$ of realisations of that process, the joint distribution is centred Gaussian,

$$\begin{aligned}
f(t) &\sim \operatorname{GP}\left(0, \kappa(t, t';\mathbf{\theta})\right) \\
p(\mathbf{f}) &=(2\pi )^{-{\frac {K}{2}}}\det({\mathrm{K}})^{-{\frac {1}{2}}}\,e^{-{\frac{1}{2}}\mathbf {f}^{\mathsf {T}}{\mathrm{K}}^{-1}\mathbf {f}}\\
&=\mathcal{N}(\mathbf{f};\mathbf{0},\mathrm{K}),
\end{aligned}$$

where $$\mathrm{K}$$ is the covariance (Gram) matrix with entries $$\mathrm{K}_{jk}=\kappa(t_j,t_k).$$ Since the process is centred, specifying the second moments gives us all the remaining properties of the process. That is, the unobserved, continuous random function $$f$$ generates realisations $$\mathbf{f}\in\mathbb{R}^T$$ at discrete times $$\mathbf{t}=t_1,t_2,\dots,t_T.$$ The properties of this chap are explored under Gaussian processes. Now,

$$\begin{aligned}
f(t) &\sim \operatorname{GP}\left(0, \kappa(t, t';\mathbf{\theta})\right) & \text{Prior} \\
p(\mathbf{y}\mid\mathbf{f}) &= \prod_{k=1}^{T} p\left(y_{k} \mid f\left(t_{k}\right)\right). & \text{Likelihood}
\end{aligned}$$

To begin with, these will form a lattice $$\mathbf{t}=1,2,\dots,T.$$
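One way to make the prior concrete: on such a lattice we can draw sample paths by factorising the covariance matrix. Here is a toy sketch in plain Python (the squared-exponential kernel choice, the jitter term, and the function names are my own, not from any library):

```python
import math
import random

def k_se(s, t, ell=1.0):
    """Squared-exponential covariance kappa(s, t)."""
    return math.exp(-0.5 * ((s - t) / ell) ** 2)

def sample_gp_prior(ts, kernel=k_se, jitter=1e-4, seed=0):
    """Draw one realisation of a centred GP on the grid ts.

    Factor K = L L^T and return f = L z with z i.i.d. standard normal;
    the jitter regularises the often near-singular Gram matrix."""
    n = len(ts)
    K = [[kernel(ts[i], ts[j]) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # Cholesky factorisation
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = (math.sqrt(K[i][i] - s) if i == j
                       else (K[i][j] - s) / L[j][j])
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
```

Drawing $$\mathbf{f} = \mathrm{L}\mathbf{z}$$ with $$\mathrm{K} = \mathrm{L}\mathrm{L}^{\mathsf{T}}$$ is exactly the finite-dimensional statement of the centred Gaussian above.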

We allow that the observations may be distinct from the realisations in that the realisations may be observed with some noise. The observation noise will be Gaussian also, in the sense that

$y_k=f(t_k)+\epsilon_k,$ where $\epsilon_k \sim \mathcal{N}\left(0, \sigma_{y}^{2}\right)$ i.i.d.

We refer to the set of observations as $$\mathbf{y}\in\mathbb{R}^T$$. The data includes observations and coordinates, and is written $$\mathcal{D}:=\{(t_k, y_k)\}_{k=1,2,\dots,T}$$.

The main insight is that the Gaussian prior is conjugate to the Gaussian likelihood, which means that the posterior distribution is also Gaussian (although it will no longer be centred).

We can find the posterior over the latent function values given the observations by considering the joint distribution

$$\begin{aligned} \left(\begin{array}{c}{\mathbf{y}} \\ {\mathbf{f}}\end{array}\right) \sim \mathcal{N}\left(\mathbf{0},\left(\begin{array}{cc}{\mathbf{K}_{y}} & {\mathbf{K}} \\ {\mathbf{K}^{\top}} & {\mathbf{K}_{\mathbf{f}}}\end{array}\right)\right), \end{aligned}$$

where, for the Gaussian noise model above, $$\mathbf{K}_{y}=\mathbf{K}_{\mathbf{f}}+\sigma_{y}^{2}\mathbf{I}.$$ Conditioning a joint Gaussian on a subset of its coordinates is a standard linear-algebra exercise, yielding

$$\mathbf{f}\mid\mathbf{y} \sim \mathcal{N}\left(\mathbf{K}^{\top}\mathbf{K}_{y}^{-1}\mathbf{y},\; \mathbf{K}_{\mathbf{f}}-\mathbf{K}^{\top}\mathbf{K}_{y}^{-1}\mathbf{K}\right).$$
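Numerically, conditioning on the observations amounts to a Cholesky factorisation of the noisy Gram matrix and a couple of triangular solves. Here is a self-contained toy sketch in plain Python, assuming a squared-exponential kernel and scalar inputs (all names are mine, not any library’s):

```python
import math

def k_se(s, t, ell=1.0):
    """Squared-exponential covariance kappa(s, t)."""
    return math.exp(-0.5 * ((s - t) / ell) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small SPD matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = (math.sqrt(A[i][i] - s) if i == j
                       else (A[i][j] - s) / L[j][j])
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(ts, ys, t_star, noise2=1e-4):
    """Posterior mean and variance of f(t_star) given y observed at ts."""
    n = len(ts)
    Ky = [[k_se(ts[i], ts[j]) + (noise2 if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    L = cholesky(Ky)
    alpha = chol_solve(L, ys)                 # K_y^{-1} y
    k_star = [k_se(t, t_star) for t in ts]
    mean = sum(a * k for a, k in zip(alpha, k_star))
    v = chol_solve(L, k_star)                 # K_y^{-1} k_*
    var = k_se(t_star, t_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var
```

The `alpha` vector is the familiar weight vector of kernel regression; prediction at a new point is an inner product against it, which is one place the representer-theorem connection shows up.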

## Incorporating a mean function

Almost immediate, but not quite trivial.

## Density estimation

Can I infer a density using these? Yes. One popular method is apparently the logistic Gaussian process.

## Kernels

a.k.a. covariance models.

GP regression models are kernel machines; as such, the covariance kernel is, more or less, the parameter. One can also parameterise with a mean function, but let us ignore that detail for now, because usually we do not use one.
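One practical consequence of the kernel formalism: sums and products of positive-semidefinite kernels are again positive-semidefinite, which is how most composite covariance models are built. A sketch in plain Python (the function names are mine):

```python
import math

def se(ell):
    """Squared-exponential kernel with length-scale ell."""
    return lambda s, t: math.exp(-0.5 * ((s - t) / ell) ** 2)

def periodic(period, ell):
    """Periodic kernel built from the sine of the lag."""
    return lambda s, t: math.exp(
        -2.0 * math.sin(math.pi * abs(s - t) / period) ** 2 / ell ** 2)

def k_sum(k1, k2):
    """The sum of two kernels is a kernel."""
    return lambda s, t: k1(s, t) + k2(s, t)

def k_prod(k1, k2):
    """The product of two kernels is a kernel."""
    return lambda s, t: k1(s, t) * k2(s, t)

# A "locally periodic" kernel: periodic structure whose correlation
# decays over long horizons.
k = k_prod(periodic(1.0, 0.5), se(5.0))
```

This compositional algebra is the starting point of automated kernel-construction schemes such as Duvenaud et al. (2013).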

## Using state filtering

When one dimension of the input vector can be interpreted as a time dimension, we can perform GP inference by Kalman filtering, which has benefits in terms of speed.
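The simplest instance of this trick: a centred GP with the exponential (Matérn-1/2) kernel $$\kappa(s,t)=\sigma^2 e^{-|s-t|/\ell}$$ is a stationary Ornstein–Uhlenbeck process, so exact filtering costs $$\mathcal{O}(N)$$ rather than $$\mathcal{O}(N^3)$$. A plain-Python sketch (my own toy naming; inputs assumed sorted):

```python
import math

def ou_filter(ts, ys, ell=1.0, sigma2=1.0, noise2=0.01):
    """O(N) Kalman filter for a GP with exponential (Matern-1/2) kernel
    kappa(s, t) = sigma2 * exp(-|s - t| / ell), observed with Gaussian
    noise of variance noise2. ts must be sorted ascending.

    Returns the filtered posterior means and variances of f at each t."""
    m, P = 0.0, sigma2                 # stationary prior at the first time
    means, variances = [], []
    prev_t = None
    for t, y in zip(ts, ys):
        if prev_t is not None:
            phi = math.exp(-(t - prev_t) / ell)             # OU transition
            m = phi * m
            P = phi * phi * P + sigma2 * (1.0 - phi * phi)  # process noise
        S = P + noise2                 # innovation variance
        K = P / S                      # Kalman gain
        m = m + K * (y - m)            # measurement update
        P = (1.0 - K) * P
        means.append(m)
        variances.append(P)
        prev_t = t
    return means, variances
```

Higher-order Matérn kernels admit the same treatment with a larger state vector, which is the construction in Särkkä, Solin, and Hartikainen (2013).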

## On manifolds

I would like to read Terenin on GPs on manifolds, who also makes a suggestive connection to SDEs; that is the filtering-GPs trick again.

🏗

## With inducing variables

“Sparse GP”. See Quiñonero-Candela and Rasmussen (2005). 🏗
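The common ingredient in these methods is a low-rank (Nyström-type) approximation of the Gram matrix through $$M\ll N$$ inducing inputs, which is what buys the reduced complexity. A toy plain-Python sketch of just that building block (not Quiñonero-Candela and Rasmussen’s full framework; names are mine):

```python
import math

def k_se(s, t, ell=1.0):
    """Squared-exponential covariance kappa(s, t)."""
    return math.exp(-0.5 * ((s - t) / ell) ** 2)

def solve_multi(A, B):
    """Solve A X = B (A small and square, B a matrix of right-hand-side
    columns, both given as lists of rows) by Gauss-Jordan elimination."""
    m = len(A)
    M = [A[i][:] + B[i][:] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]       # partial pivoting
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[m:] for row in M]

def nystrom(ts, us, kernel=k_se):
    """Rank-M approximation K ~= K_nm K_mm^{-1} K_mn of the Gram matrix
    of inputs ts, through inducing inputs us."""
    Knm = [[kernel(t, u) for u in us] for t in ts]
    Kmm = [[kernel(u, v) for v in us] for u in us]
    Kmn = [[kernel(u, t) for t in ts] for u in us]
    W = solve_multi(Kmm, Kmn)                 # K_mm^{-1} K_mn
    n, m = len(ts), len(us)
    return [[sum(Knm[i][k] * W[k][j] for k in range(m))
             for j in range(n)] for i in range(n)]
```

The approximation is exact on any row passing through an inducing input, and it never overestimates the diagonal, since $$\mathbf{K}-\tilde{\mathbf{K}}$$ is positive semidefinite.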

## By variational inference with inducing variables

See GP factoring.

## Approximation with dropout

See NN ensembles.

## For dimension reduction

e.g. GP-LVM. 🏗

See pathwise GP.

This lecture by the late David Mackay is probably good; the man could talk.

There is also a well-illustrated and elementary introduction by Yuge Shi.

## Implementations

I think that GPy is a common default choice in Python, and GPflow, for example, has attempted to follow its API. For another value of default, scikit-learn has a GP implementation. Moreover, many generic Bayesian inference toolkits support GP models.

All things being equal, I want better-than-generic support for GP models. Making them go fast can be a subtle business, and there are all kinds of fancy bells and whistles I would generally like supported, such as inducing-point methods and sparse variational inference.

## Geostat Framework

This Framework was created within the PhD project of Sebastian Müller at the Computational Hydrosystems Department at the UFZ Leipzig.

### GPy

GPy originates in GP-urdaddy Neil Lawrence’s lab. It is well-tested and featureful. However, it is also crufty, confusing and slow. It predates some modern autodiff technology and rolls its own, in opaque ways. It is best regarded as a kind of reference implementation, and maybe not used in practice.

### Stheno

Stheno seems to be popular for Julia, and also comes in an alternative flavour, python stheno. AugmentedGaussianProcesses.jl by Théo Galy-Fajou looks nice, and has sparse approximation plus some nice variational approximation tricks.

It seems to be sponsored by Invenia as a by-product of their main business. They write excellent GP tutorials, e.g. Scaling multi-output Gaussian process models with exact inference.

### GPyTorch

Have you used GPyTorch? It’s the bomb; it is truly amazing. This tutorial is well worth the 15 minutes it takes. Crank the training and testing set size up to 10,000 and it is a breeze. I have tried GPy, GPflow and TensorFlow in the past, and they would struggle.

Bonus feature: integration with pyro, for more elaborate likelihood models.

## Plain pyro

There are native GP models in the PyTorch-based Pyro, which you can use without GPyTorch.

### George

George is a python library implementing Ambikasaran et al. (2015)’s fast method for time-series GPs, as explained in Scaling Gaussian Processes to big datasets.

### GPflow

GPflow (built on TensorFlow). The GPflow docs include the following clarification of its genealogy:

GPflow has origins in GPy…, and much of the interface is intentionally similar for continuity (though some parts of the interface may diverge in future). GPflow has a rather different remit from GPy though:

• GPflow leverages TensorFlow for faster/bigger computation
• GPflow has much less code than GPy, mostly because all gradient computation is handled by TensorFlow.
• GPflow focusses on variational inference and MCMC — there is no expectation propagation or Laplace approximation.
• GPflow does not have any plotting functionality.

In practice it is faster than GPy, but it does not seem any easier or less confusing to implement tricky stuff in.

### Misc python

PyMC3 also has GP support. ladax is a JAX-based option.

### Stan

Bayes workhorse Stan can do Gaussian Process regression just like almost everything else; see Michael Betancourt’s blog posts, 1. 2. 3.

### AutoGP

autogp (also using TensorFlow) incorporates much fancy GP technology at once. I believe it is no longer actively maintained, which is a pity, as it is the only one I have contributed to substantially.

### scikit-learn

The current scikit-learn has basic Gaussian processes. The introduction disposes me against using this implementation:

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.

The advantages of Gaussian processes are:

• The prediction interpolates the observations (at least for regular kernels).
• The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest.
• Versatile: different kernels can be specified. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of Gaussian processes include:

• They are not sparse, i.e., they use the whole samples/features information to perform the prediction.
• They lose efficiency in high dimensional spaces — namely when the number of features exceeds a few dozens.

This list of disadvantages contains, at best, imprecisions and, at worst, mistakes. Let us suppose that this description is supposed to draw a distinction between GP regression and some other regression model (note that Bayesian linear regression and GP regression are intimately linked).

So what is this telling us about the unusual qualities of GP regression?

The first point is strictly correct but not useful, in that sparse approximate GP inference is a whole industry; it is a statement that this implementation is missing a fairly standard approximate inference method, rather than a problem with the family of methods per se. (“The problem with cars is that they all run on diesel.”) The second point might be correct under a maximally generous interpretation. What do they mean by efficiency here?

In the least generous interpretation it is just plain wrong; I use GPs with dozens of inputs often; it works fine. In particular if they mean ‘efficiency’ in the sense that computational cost grows with input dimension, this is suspect. Naïve inference is $$\mathcal{O}(DN^3)$$ for $$N$$ observations and $$D$$ features. That is, dimensionality cost is no worse than linear regression for prediction and superior for training, although other models that have a linear complexity in sample dimension escape without such a warning in the scikit-learn docs.

Perhaps they mean statistical efficiency, i.e. the variance of the estimate for a fixed number of data points? That is a subtler question. Yes, for a fixed isotropic kernel we can get bad behaviour in high-dimensional spaces because, informally, low-dimensional projections all look kind of similar (TODO: make that precise). OTOH, if we use a good non-isotropic kernel or adaptive kernel selection, which is kinda standard, then the kernel can behave well in many dimensions by learning to adapt to the important ones. This is pretty standard. As the number of input dimensions grows larger still, the hyperparameter-selection problem grows less well-determined. However, that is also true of linear regression in general. If they wish to assert that this problem is even worse in GP regression than in linear regression, they could make that case, I suppose, but it would need something better than this one-sentence aside.
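For all that, the scikit-learn implementation itself is serviceable for small problems. A minimal usage sketch (the toy data here is of my own devising; with `optimizer=None` the kernel hyperparameters stay fixed rather than being fit by marginal likelihood):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy data: three observations of a bump.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])

# alpha plays the role of the observation-noise variance on the diagonal;
# optimizer=None skips marginal-likelihood optimisation of the kernel.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                               alpha=1e-4, optimizer=None)
gpr.fit(X, y)

# Posterior mean and standard deviation at a seen and an unseen input.
mean, std = gpr.predict(np.array([[1.0], [10.0]]), return_std=True)
```

As expected, the prediction (nearly) interpolates at the training input, and far from the data the posterior reverts to the prior.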

### Misc julia

Théo Galy-Fajou compares some options. There are many; the message I’m getting is that GPs are easy enough to implement that they invite bikeshedding. There is an attempt at unifying the Julia ecosystem in this area via AbstractGPs and other tools in the JuliaGaussianProcesses organisation, e.g. KernelFunctions.jl.

### MATLAB

Should mention the various MATLAB/Scilab options.

GPStuff is one for MATLAB/Octave that I have seen around the place.

## References

Abrahamsen, Petter. 1997.
Abt, Markus, and William J. Welch. 1998. Canadian Journal of Statistics 26 (1): 127–37.
Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Alvarado, Pablo A., and Dan Stowell. 2018. arXiv:1705.07104 [Cs, Stat], November.
Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015. arXiv:1403.6015 [Astro-Ph, Stat], April.
Bachoc, F., F. Gamboa, J. Loubes, and N. Venet. 2018. IEEE Transactions on Information Theory 64 (10): 6620–37.
Bachoc, Francois, Alexandra Suvorikova, David Ginsbourger, Jean-Michel Loubes, and Vladimir Spokoiny. 2019. arXiv:1805.00753 [Stat], April.
Birgé, Lucien, and Pascal Massart. 2006. Probability Theory and Related Fields 138 (1-2): 33–73.
Bonilla, Edwin V., Kian Ming A. Chai, and Christopher K. I. Williams. 2007. In Proceedings of the 20th International Conference on Neural Information Processing Systems, 153–60. NIPS’07. USA: Curran Associates Inc.
Bonilla, Edwin V., Karl Krauth, and Amir Dezfouli. 2019. Journal of Machine Learning Research 20 (117): 1–63.
Borovitskiy, Viacheslav, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2020. arXiv:2006.10160 [Cs, Stat], June.
Burt, David R., Carl Edward Rasmussen, and Mark van der Wilk. 2020. Journal of Machine Learning Research 21 (131): 1–63.
Calandra, R., J. Peters, C. E. Rasmussen, and M. P. Deisenroth. 2016. In 2016 International Joint Conference on Neural Networks (IJCNN), 3338–45. Vancouver, BC, Canada: IEEE.
Cressie, Noel. 1990. Mathematical Geology 22 (3): 239–52.
———. 2015. Statistics for Spatial Data. John Wiley & Sons.
Cressie, Noel, and Christopher K. Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0. John Wiley and Sons.
Csató, Lehel, and Manfred Opper. 2002. Neural Computation 14 (3): 641–68.
Csató, Lehel, Manfred Opper, and Ole Winther. 2001. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 657–63. NIPS’01. Cambridge, MA, USA: MIT Press.
Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press.
Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. In PMLR.
Dahl, Astrid, and Edwin Bonilla. 2017. In Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy, edited by Wei Lee Woon, Zeyar Aung, Oliver Kramer, and Stuart Madnick, 94–106. Lecture Notes in Computer Science. Cham: Springer International Publishing.
Dahl, Astrid, and Edwin V. Bonilla. 2019. arXiv:1903.03986 [Cs, Stat], March.
Damianou, Andreas, and Neil Lawrence. 2013. In Artificial Intelligence and Statistics, 207–15.
Damianou, Andreas, Michalis K. Titsias, and Neil D. Lawrence. 2011. In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 2510–18. Curran Associates, Inc.
Dezfouli, Amir, and Edwin V. Bonilla. 2015. In Advances in Neural Information Processing Systems 28, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.
Domingos, Pedro. 2020. arXiv:2012.00152 [Cs, Stat], November.
Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018. Journal of Machine Learning Research 19 (1): 2100–2145.
Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021. arXiv:2105.04504 [Cs, Stat], May.
Duvenaud, David. 2014. PhD Thesis, University of Cambridge.
Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. 2013. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 1166–74.
Ebden, Mark. 2015. arXiv:1505.02965 [Math, Stat], May.
Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc.
Emery, Xavier. 2007. Mathematical Geology 39 (6): 607–23.
Evgeniou, Theodoros, Charles A. Micchelli, and Massimiliano Pontil. 2005. Journal of Machine Learning Research 6 (Apr): 615–37.
Ferguson, Thomas S. 1973. The Annals of Statistics 1 (2): 209–30.
Finzi, Marc, Roberto Bondesan, and Max Welling. 2020. arXiv:2010.10876 [Cs], October.
Föll, Roman, Bernard Haasdonk, Markus Hanselmann, and Holger Ulmer. 2017. arXiv:1711.00799 [Stat], November.
Frigola, Roger, Yutian Chen, and Carl Edward Rasmussen. 2014. In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3680–88. Curran Associates, Inc.
Frigola, Roger, Fredrik Lindsten, Thomas B Schön, and Carl Edward Rasmussen. 2013. In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 3156–64. Curran Associates, Inc.
Gal, Yarin, and Zoubin Ghahramani. 2015. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Gal, Yarin, and Mark van der Wilk. 2014. arXiv:1402.1412 [Stat], February.
Galliani, Pietro, Amir Dezfouli, Edwin V Bonilla, and Novi Quadrianto. n.d. “Gray-Box Inference for Structured Gaussian Process Models,” 9.
Gardner, Jacob R., Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. arXiv:1802.08903 [Cs, Stat], February.
Gardner, Jacob, Geoff Pleiss, Kilian Q. Weinberger, David Bindel, and Andrew G. Wilson. 2018. Advances in Neural Information Processing Systems 31: 7576–86.
Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. arXiv:1807.01613 [Cs, Stat], July, 10.
Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. July.
Ghahramani, Zoubin. 2013. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371 (1984): 20110553.
Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2): 424–36.
Girolami, Mark, and Simon Rogers. 2005. In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 241–48. Bonn, Germany: ACM Press.
Gramacy, Robert B. 2016. Journal of Statistical Software 72 (1).
Gramacy, Robert B., and Daniel W. Apley. 2015. Journal of Computational and Graphical Statistics 24 (2): 561–78.
Gratiet, Loïc Le, Stefano Marelli, and Bruno Sudret. 2016. In Handbook of Uncertainty Quantification, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–37. Cham: Springer International Publishing.
Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Hartikainen, J., and S. Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.
Hensman, James, Nicolo Fusi, and Neil D. Lawrence. 2013. “Gaussian Processes for Big Data.” In Uncertainty in Artificial Intelligence, 282. Citeseer.
Huber, Marco F. 2014. Pattern Recognition Letters 45 (August): 85–91.
Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018. arXiv:1806.10234 [Cs, Stat], June.
Jankowiak, Martin, Geoff Pleiss, and Jacob Gardner. 2020. In Conference on Uncertainty in Artificial Intelligence, 789–98. PMLR.
Jordan, Michael Irwin. 1999. Learning in Graphical Models. Cambridge, Mass.: MIT Press.
Karvonen, Toni, and Simo Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. Vietri sul Mare, Salerno, Italy: IEEE.
Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020. arXiv:2001.08055 [Physics, Stat], January.
Kingma, Diederik P., and Max Welling. 2014. In ICLR 2014 Conference.
Ko, Jonathan, and Dieter Fox. 2009. In Autonomous Robots, 27:75–90.
Kocijan, Juš, Agathe Girard, Blaž Banko, and Roderick Murray-Smith. 2005. Mathematical and Computer Modelling of Dynamical Systems 11 (4): 411–24.
Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. In Uai17.
Krige, D. G. 1951. Journal of the Southern African Institute of Mining and Metallurgy 52 (6): 119–39.
Kroese, Dirk P., and Zdravko I. Botev. 2013. arXiv:1308.0399 [Stat], August.
Lawrence, Neil. 2005. Journal of Machine Learning Research 6 (Nov): 1783–1816.
Lawrence, Neil D., and Raquel Urtasun. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning, 601–8. ICML ’09. New York, NY, USA: ACM.
Lawrence, Neil, Matthias Seeger, and Ralf Herbrich. 2003. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems, 609–16.
Lázaro-Gredilla, Miguel, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. 2010. Journal of Machine Learning Research 11 (Jun): 1865–81.
Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018. In ICLR.
Leibfried, Felix, Vincent Dutordoir, S. T. John, and Nicolas Durrande. 2021. arXiv:2012.13962 [Cs, Stat], June.
Lenk, Peter J. 2003. Journal of Computational and Graphical Statistics 12 (3): 548–65.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.
Liutkus, Antoine, Roland Badeau, and Gäel Richard. 2011. IEEE Transactions on Signal Processing 59 (7): 3155–67.
Lloyd, James Robert, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2014. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Louizos, Christos, Xiahan Shi, Klamer Schutte, and Max Welling. 2019. arXiv:1906.08324 [Cs, Stat], June.
MacKay, David J C. 1998. NATO ASI Series. Series F: Computer and System Sciences 168: 133–65.
———. 2002. In Information Theory, Inference & Learning Algorithms, Chapter 45. Cambridge University Press.
Matheron, Georges. 1963a. Traité de Géostatistique Appliquée. 2. Le Krigeage. Editions Technip.
———. 1963b. Economic Geology 58 (8): 1246–66.
Matthews, Alexander Graeme de Garis, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016. arXiv:1610.08733 [Stat], October.
Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Guilherme A. Barreto, and Neil D. Lawrence. 2017. Journal of Process Control, DYCOPS-CAB 2016, 60 (December): 82–94.
Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme A. Barreto, and Neil D. Lawrence. 2016. In Proceedings of ICLR.
Micchelli, Charles A., and Massimiliano Pontil. 2005a. Journal of Machine Learning Research 6 (Jul): 1099–1125.
———. 2005b. Neural Computation 17 (1): 177–204.
Minh, Hà Quang. 2022. SIAM/ASA Journal on Uncertainty Quantification, February, 96–124.
Mohammadi, Hossein, Peter Challenor, and Marc Goodfellow. 2021. arXiv:2104.14987 [Stat], April.
Moreno-Muñoz, Pablo, Antonio Artés-Rodríguez, and Mauricio A. Álvarez. 2019. arXiv:1911.00002 [Cs, Stat], October.
Nagarajan, Sai Ganesh, Gareth Peters, and Ido Nevat. 2018. SSRN Electronic Journal.
Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018. In International Conference on Machine Learning, 3789–98.
O’Hagan, A. 1978. Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24.
Papaspiliopoulos, Omiros, Yvo Pokern, Gareth O. Roberts, and Andrew M. Stuart. 2012. Biometrika 99 (3): 511–31.
Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. arXiv:1803.06058 [Cs, Stat], June.
Pleiss, Geoff, Martin Jankowiak, David Eriksson, Anil Damle, and Jacob Gardner. 2020. Advances in Neural Information Processing Systems 33.
Quiñonero-Candela, Joaquin, and Carl Edward Rasmussen. 2005. Journal of Machine Learning Research 6 (Dec): 1939–59.
Raissi, Maziar, and George Em Karniadakis. 2017. arXiv:1701.02440 [Cs, Math, Stat], January.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.
Reece, S., and S. Roberts. 2010. In 2010 13th International Conference on Information Fusion, 1–9.
Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021. arXiv:2105.14594 [Cs, Stat], May.
Riutort-Mayol, Gabriel, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari. 2020. arXiv:2004.11408 [Stat], April.
Rossi, Simone, Markus Heinonen, Edwin V. Bonilla, Zheyang Shen, and Maurizio Filippone. 2020. March.
Saatçi, Yunus. 2012. Ph.D., University of Cambridge.
Saatçi, Yunus, Ryan Turner, and Carl Edward Rasmussen. 2010. In Proceedings of the 27th International Conference on International Conference on Machine Learning, 927–34. ICML’10. Madison, WI, USA: Omnipress.
Saemundsson, Steindor, Alexander Terenin, Katja Hofmann, and Marc Peter Deisenroth. 2020. arXiv:1910.09349 [Cs, Stat], March.
Salimbeni, Hugh, and Marc Deisenroth. 2017. In Advances In Neural Information Processing Systems.
Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. 2018. In International Conference on Artificial Intelligence and Statistics, 689–97.
Särkkä, Simo. 2011. In Artificial Neural Networks and Machine Learning – ICANN 2011, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.
Särkkä, Simo, and Jouni Hartikainen. 2012. In Artificial Intelligence and Statistics.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. IEEE Signal Processing Magazine 30 (4): 51–61.
Schulam, Peter, and Suchi Saria. 2017. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1696–706. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.
Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014. In Artificial Intelligence and Statistics, 877–85. PMLR.
Smith, Michael Thomas, Mauricio A. Alvarez, and Neil D. Lawrence. 2018. arXiv:1809.02010 [Cs, Stat], September.
Snelson, Edward, and Zoubin Ghahramani. 2005. In Advances in Neural Information Processing Systems, 1257–64.
Solin, Arno, and Simo Särkkä. 2020. Statistics and Computing 30 (2): 419–46.
Tait, Daniel J., and Theodoros Damoulas. 2020. arXiv:2006.15641 [Cs, Stat], June.
Tang, Wenpin, Lu Zhang, and Sudipto Banerjee. 2019. arXiv:1908.05726 [Math, Stat], August.
Titsias, Michalis K. 2009a. In International Conference on Artificial Intelligence and Statistics, 567–74. PMLR.
———. 2009b. Technical report, School of Computer Science, University of Manchester.
Titsias, Michalis, and Neil D. Lawrence. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 844–51.
Tokdar, Surya T. 2007. Journal of Computational and Graphical Statistics 16 (3): 633–55.
Turner, Richard E., and Maneesh Sahani. 2014. IEEE Transactions on Signal Processing 62 (23): 6171–83.
Turner, Ryan, Marc Deisenroth, and Carl Rasmussen. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 868–75.
Vanhatalo, Jarno, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. 2013. Journal of Machine Learning Research 14 (April): 1175−1179.
———. 2015. arXiv:1206.5754 [Cs, Stat], July.
Walder, Christian, Kwang In Kim, and Bernhard Schölkopf. 2008. In Proceedings of the 25th International Conference on Machine Learning, 1112–19. ICML ’08. New York, NY, USA: ACM.
Walder, C., B. Schölkopf, and O. Chapelle. 2006. Computer Graphics Forum 25 (3): 635–44.
Wang, Ke, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2019. Advances in Neural Information Processing Systems 32: 14648–59.
Wikle, Christopher K., Noel Cressie, and Andrew Zammit-Mangion. 2019. Spatio-Temporal Statistics with R.
Wilk, Mark van der, Andrew G. Wilson, and Carl E. Rasmussen. 2014. “Variational Inference for Latent Variable Modelling of Correlation Structure.” In NIPS 2014 Workshop on Advances in Variational Inference.
Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019. arXiv:1901.11436 [Cs, Eess, Stat], January.
Williams, Christopher KI, and Matthias Seeger. 2001. In Advances in Neural Information Processing Systems, 682–88.
Williams, Christopher, Stefan Klanke, Sethu Vijayakumar, and Kian M. Chai. 2009. In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 265–72. Curran Associates, Inc.
Wilson, Andrew Gordon, and Ryan Prescott Adams. 2013. In International Conference on Machine Learning.
Wilson, Andrew Gordon, Christoph Dann, Christopher G. Lucas, and Eric P. Xing. 2015. arXiv:1510.07389 [Cs, Stat], October.
Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.
———. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases, edited by Peter A. Flach, Tijl De Bie, and Nello Cristianini, 858–61. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Wilson, Andrew Gordon, David A. Knowles, and Zoubin Ghahramani. 2012. In Proceedings of the 29th International Coference on International Conference on Machine Learning, 1139–46. ICML’12. Madison, WI, USA: Omnipress.
Wilson, Andrew Gordon, and Hannes Nickisch. 2015. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37, 1775–84. ICML’15. Lille, France: JMLR.org.
Wilson, James T., Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021. Journal of Machine Learning Research 22 (105): 1–47.
Wilson, James, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020. In Proceedings of the 37th International Conference on Machine Learning, 10292–302. PMLR.
Zhang, Rui, Christian Walder, Edwin V. Bonilla, Marian-Andrei Rizoiu, and Lexing Xie. 2020. In Proceedings of NeurIPS 2020.
