Gaussian process regression

And classification. And extensions.



Gaussian random processes/fields are stochastic processes/fields with jointly Gaussian distributions of observations. While “Gaussian process regression” is not wrong per se, there is a common convention in stochastic process theory (and also in pedagogy) to use “process” to talk about some notionally time-indexed process and “field” to talk about ones that have some space-like index without a presumption of an arrow of time. This leads to much confusion, because Gaussian field regression is what we usually want to talk about. What we want to use the arrow of time for is a whole other story. Regardless, hereafter I’ll use “field” and “process” interchangeably.

In machine learning, Gaussian fields are often used as a means of regression or classification, since it is fairly easy to condition a Gaussian field on data and produce a posterior distribution over functions. They provide a nonparametric method of inferring regression functions, with a conveniently Bayesian interpretation and reasonably elegant learning and inference steps. I would further add that this is the crystal meth of machine learning methods, in terms of its addictiveness and the passion of the people who use it.

The central trick is using a clever union of Hilbert space tricks and probability to give a probabilistic interpretation of functional regression as a kind of nonparametric Bayesian inference.

There is a useful side divergence into representer theorems and Karhunen-Loève expansions for thinking about this. Regression using Gaussian processes is common in, e.g., spatial statistics, where it arises as kriging. Cressie (1990) traces the history of this idea via Matheron (1963a) to the work of Krige (1951).

Gaussianprocess.org:

This web site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes. Although Gaussian processes have a long history in the field of statistics, they seem to have been employed extensively only in niche areas. With the advent of kernel machines in the machine learning community, models based on Gaussian processes have become commonplace for problems of regression (kriging) and classification as well as a host of more specialized applications.

I’ve not been enthusiastic about these in the past. It’s nice to have a principled nonparametric Bayesian formalism, but it has always seemed pointless having a formalism that is so computationally demanding that people don’t try to use more than a thousand data points, or spend most of a paper working out how to approximate this simple elegant model with a complex messy model. However, that previous sentence describes most of my career now, so I guess I must have come around.

Perhaps I should be persuaded by tricks such as AutoGP (Krauth et al. 2016), which breaks some computational deadlocks by clever use of inducing variables and variational approximation to produce a compressed representation of the data with tractable inference and model selection, including kernel selection, and does the whole thing in many dimensions simultaneously. There are other clever tricks like this one; e.g. Saatçi (2012) shows how to use a lattice structure for observations to make computation cheap.

Quick intro

I am not the right guy to provide the canonical introduction, because it already exists. Specifically, Rasmussen and Williams (2006).

This lecture by the late David MacKay is probably good; the man could talk.

There is also a well-illustrated and elementary introduction by Yuge Shi. There are many, many more.

J. T. Wilson et al. (2021):

A Gaussian process (GP) is a random function \(f: \mathcal{X} \rightarrow \mathbb{R}\), such that, for any finite collection of points \(\mathbf{X} \subset \mathcal{X}\), the random vector \(\boldsymbol{f}=f(\mathbf{X})\) follows a Gaussian distribution. Such a process is uniquely identified by a mean function \(\mu: \mathcal{X} \rightarrow \mathbb{R}\) and a positive semi-definite kernel \(k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}\). Hence, if \(f \sim \mathcal{G} \mathcal{P}(\mu, k)\), then \(\boldsymbol{f} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})\) is multivariate normal with mean \(\boldsymbol{\mu}=\mu(\mathbf{X})\) and covariance \(\mathbf{K}=k(\mathbf{X}, \mathbf{X})\).

[…] we investigate different ways of reasoning about the random variable \(\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}\) for some non-trivial partition \(\boldsymbol{f}=\boldsymbol{f}_n \oplus \boldsymbol{f}_*\). Here, \(\boldsymbol{f}_n=f\left(\mathbf{X}_n\right)\) are process values at a set of training locations \(\mathbf{X}_n \subset \mathbf{X}\) where we would like to introduce a condition \(\boldsymbol{f}_n=\boldsymbol{y}\), while \(\boldsymbol{f}_*=f\left(\mathbf{X}_*\right)\) are process values at a set of test locations \(\mathbf{X}_* \subset \mathbf{X}\) where we would like to obtain a random variable \(\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}\).

[…] we may obtain \(\boldsymbol{f}_* \mid \boldsymbol{y}\) by first finding its conditional distribution. Since process values \(\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)\) are defined as jointly Gaussian, this procedure closely resembles that of [the finite-dimensional case]: we factor out the marginal distribution of \(\boldsymbol{f}_n\) from the joint distribution \(p\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)\) and, upon canceling, identify the remaining distribution as \(p\left(\boldsymbol{f}_* \mid \boldsymbol{y}\right)\). Having done so, we find that the conditional distribution is the Gaussian \(\mathcal{N}\left(\boldsymbol{\mu}_{* \mid \boldsymbol{y}}, \mathbf{K}_{*, * \mid \boldsymbol{y}}\right)\) with moments \[\begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned} \]
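
The two moments in the quoted passage translate fairly directly into linear algebra. The following is a minimal NumPy sketch of that conditioning step, not code from Wilson et al. The squared-exponential kernel, the zero default mean, the function names `rbf_kernel` and `gp_condition`, and the small `jitter` added for numerical stability are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-d input arrays (an assumed kernel choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_condition(x_train, y, x_test, kernel=rbf_kernel,
                 mean=lambda x: np.zeros_like(x), jitter=1e-9):
    """Mean and covariance of f_* | f_n = y, following the two moment formulas."""
    K_nn = kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_sn = kernel(x_test, x_train)
    K_ss = kernel(x_test, x_test)
    L = np.linalg.cholesky(K_nn)
    # alpha = K_nn^{-1} (y - mu_n), via two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean(x_train)))
    # V = L^{-1} K_ns, so that V^T V = K_sn K_nn^{-1} K_ns
    V = np.linalg.solve(L, K_sn.T)
    post_mean = mean(x_test) + K_sn @ alpha
    post_cov = K_ss - V.T @ V
    return post_mean, post_cov

# Toy data (made up for illustration)
x_train = np.array([-2.0, -1.0, 0.0, 1.5])
y = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 7)
post_mean, post_cov = gp_condition(x_train, y, x_test)
```

Using a Cholesky factor rather than an explicit inverse is the usual numerically safer choice; the cubic cost of that factorisation in the number of training points is the computational burden grumbled about earlier.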

Observation likelihoods

Classification etc. TBD

Incorporating a mean function

Almost immediate but not quite trivial (Rasmussen and Williams 2006, 2.7).
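
As a sketch of what that section covers, as far as I recall it (the symbols \(m\), \(h\), \(\boldsymbol{\beta}\), \(\boldsymbol{b}\), \(\mathbf{B}\) and \(g\) are my notation and should be checked against the book): with a fixed mean function \(m\) and zero-mean prior \(f \sim \mathcal{GP}(0, k)\), we regress on the residuals and add the mean back. With a mean built from basis functions \(h\) whose coefficients get a Gaussian prior, the coefficients can be integrated out to give another GP.

\[\begin{aligned}
\text{fixed mean:}\quad & \boldsymbol{\mu}_{* \mid \boldsymbol{y}} = m(\mathbf{X}_*) + \mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y} - m(\mathbf{X}_n)\right) \\
\text{basis-function mean:}\quad & g(x) = f(x) + h(x)^{\top} \boldsymbol{\beta}, \quad \boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{b}, \mathbf{B}) \\
& \Rightarrow\; g \sim \mathcal{GP}\left(h(x)^{\top} \boldsymbol{b},\; k(x, x') + h(x)^{\top} \mathbf{B}\, h(x')\right)
\end{aligned}\]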

TODO: discuss identifiability.

Density estimation

Can I infer a density using GPs? Yes. One popular method is apparently the logistic Gaussian process. (Tokdar 2007; Lenk 2003)
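
To make the construction concrete: a logistic Gaussian process density is obtained by exponentiating a GP draw and normalising it, \(p(x)=\exp f(x) / \int \exp f(s)\,\mathrm{d}s\). The sketch below only draws one random density from such a prior on a grid; it does not do the posterior inference that Tokdar (2007) and Lenk (2003) are concerned with. The grid, the squared-exponential kernel, the length scale and the Riemann-sum normalisation are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)
dx = grid[1] - grid[0]

# Squared-exponential covariance on the grid (assumed kernel), with jitter
# so that the Cholesky factorisation is numerically stable.
sq_dist = (grid[:, None] - grid[None, :]) ** 2
K = np.exp(-0.5 * sq_dist / 0.1**2) + 1e-6 * np.eye(len(grid))

# One draw f ~ N(0, K) of the latent GP at the grid points.
f = np.linalg.cholesky(K) @ rng.standard_normal(len(grid))

# Logistic transform: exponentiate, then normalise so the Riemann sum is 1.
# Subtracting f.max() does not change the result and avoids overflow.
unnormalised = np.exp(f - f.max())
density = unnormalised / (unnormalised.sum() * dx)
```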

Kernels

a.k.a. covariance models.

GP regression models are kernel machines. As such, covariance kernels are the parameters. More or less. One can also parameterise with a mean function, but let us ignore that detail for now because usually we do not use one.
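
Since the kernel is where the modelling happens, it may help to see that kernels compose: sums and products of positive semi-definite kernels are again positive semi-definite, so structure can be built up from simple parts. A small numerical sketch, in which the particular kernels (squared-exponential and a standard periodic kernel), their hyperparameters and the function names are illustrative assumptions:

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel on 1-d inputs."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

def periodic(xa, xb, period=1.0, lengthscale=1.0):
    """Standard periodic kernel on 1-d inputs."""
    s = np.sin(np.pi * np.abs(xa[:, None] - xb[None, :]) / period)
    return np.exp(-2.0 * s**2 / lengthscale**2)

x = np.linspace(0.0, 5.0, 40)
K_sum = rbf(x, x, 2.0) + periodic(x, x, 1.0)    # slow trend plus a periodic component
K_prod = rbf(x, x, 2.0) * periodic(x, x, 1.0)   # periodicity that decays with distance

# Smallest eigenvalue of each Gram matrix; non-negative up to rounding error.
min_eig = {name: np.linalg.eigvalsh(K).min()
           for name, K in [("sum", K_sum), ("product", K_prod)]}
```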

Using state filtering

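When one dimension of the input vector can be interpreted as a time dimension, we can do inference by Kalman filtering Gaussian processes, which has benefits in terms of speed: for kernels that admit a state-space representation, the cost grows linearly rather than cubically in the number of time points.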
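
As a minimal sketch of why this works, take the simplest case I can think of: a GP on the time axis with the exponential (Ornstein-Uhlenbeck, a.k.a. Matérn-1/2) kernel \(k(t,t')=\sigma^2 \exp(-|t-t'|/\ell)\), observed with Gaussian noise. That process is Markov with a scalar state, so a Kalman filter computes the log marginal likelihood in one pass over the sorted observations. The kernel choice, function names and toy data are assumptions; richer kernels need larger state vectors.

```python
import numpy as np

def ou_kalman_loglik(t, y, variance, lengthscale, noise):
    """Log marginal likelihood of an OU-kernel GP via a scalar Kalman filter; t must be sorted."""
    m, P = 0.0, variance              # stationary prior on the state
    loglik, t_prev = 0.0, t[0]
    for tk, yk in zip(t, y):
        # Predict: exact discretisation of the OU process over the time gap
        a = np.exp(-(tk - t_prev) / lengthscale)
        m, P = a * m, a * a * P + variance * (1.0 - a * a)
        # Update: condition on the noisy observation y_k = x_k + eps
        S = P + noise
        resid = yk - m
        loglik += -0.5 * (np.log(2.0 * np.pi * S) + resid**2 / S)
        gain = P / S
        m, P = m + gain * resid, (1.0 - gain) * P
        t_prev = tk
    return loglik

def dense_loglik(t, y, variance, lengthscale, noise):
    """Same quantity from the full covariance matrix, at cubic cost."""
    K = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)
    L = np.linalg.cholesky(K + noise * np.eye(len(t)))
    w = np.linalg.solve(L, y)
    return -0.5 * w @ w - np.log(np.diag(L)).sum() - 0.5 * len(t) * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 50))   # irregular but sorted observation times
y = rng.standard_normal(50)               # toy data
ll_kalman = ou_kalman_loglik(t, y, variance=1.5, lengthscale=2.0, noise=0.1)
ll_dense = dense_loglik(t, y, variance=1.5, lengthscale=2.0, noise=0.1)
```

The two numbers should agree to rounding error; the filter gets there without ever forming the covariance matrix.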

On lattice observations

Gaussian processes on lattices.
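
One common instance of the lattice trick, sketched here under assumptions of my own choosing: if the inputs form a full grid and the kernel factorises as a product across the grid axes, the covariance matrix is a Kronecker product, and solves against it reduce to eigendecompositions of the small per-axis matrices. The function names, kernels and grid sizes below are illustrative, and this is not claimed to be the specific algorithm of Saatçi (2012).

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel on 1-d inputs."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

def kron_solve(K_a, K_b, Y, noise):
    """Solve (kron(K_a, K_b) + noise * I) vec(A) = vec(Y) without forming the Kronecker product.

    Y has shape (len(a), len(b)); vec() is row-major flattening.
    """
    lam_a, Q_a = np.linalg.eigh(K_a)
    lam_b, Q_b = np.linalg.eigh(K_b)
    Z = Q_a.T @ Y @ Q_b                         # rotate into the joint eigenbasis
    Z = Z / (np.outer(lam_a, lam_b) + noise)    # the system is diagonal there
    return Q_a @ Z @ Q_b.T                      # rotate back

a = np.linspace(0.0, 1.0, 12)                   # grid along the first axis
b = np.linspace(0.0, 2.0, 15)                   # grid along the second axis
K_a, K_b = rbf(a, a, 0.3), rbf(b, b, 0.5)
rng = np.random.default_rng(0)
Y = rng.standard_normal((len(a), len(b)))       # toy observations on the grid
alpha = kron_solve(K_a, K_b, Y, noise=0.1)
```

Here the dense system is 180 by 180, but only a 12 by 12 and a 15 by 15 matrix are ever decomposed.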

On manifolds

I would like to read Terenin on GPs on manifolds, who also makes a suggestive connection to SDEs, which is the filtering GPs trick again.

By variational inference

🏗

Neural processes

See neural processes.

With inducing variables

“Sparse GP”. See Quiñonero-Candela and Rasmussen (2005). 🏗
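
As a placeholder illustration, here is a sketch of the predictive mean of one of the simplest inducing-point approximations, usually called subset of regressors, in which the training data only talk to the test points through \(m \ll n\) inducing inputs \(\mathbf{Z}\): \(\boldsymbol{\mu}_* \approx \mathbf{K}_{*u}\left(\sigma^2 \mathbf{K}_{uu} + \mathbf{K}_{uf}\mathbf{K}_{fu}\right)^{-1}\mathbf{K}_{uf}\,\boldsymbol{y}\). The kernel, the fixed inducing locations, the function names and the toy data are assumptions; in practice the inducing inputs are usually optimised.

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel on 1-d inputs."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

def sor_predict_mean(x_train, y, x_test, z, noise, lengthscale=1.0, jitter=1e-10):
    """Subset-of-regressors predictive mean with inducing inputs z."""
    K_uu = rbf(z, z, lengthscale)
    K_uf = rbf(z, x_train, lengthscale)
    K_su = rbf(x_test, z, lengthscale)
    # Only an m-by-m system is solved, where m = len(z)
    A = noise * K_uu + K_uf @ K_uf.T + jitter * np.eye(len(z))
    return K_su @ np.linalg.solve(A, K_uf @ y)

def exact_predict_mean(x_train, y, x_test, noise, lengthscale=1.0):
    """Exact GP predictive mean, for comparison; solves an n-by-n system."""
    K = rbf(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    return rbf(x_test, x_train, lengthscale) @ np.linalg.solve(K, y)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 10.0, 200)
y = np.sin(x_train) + 0.3 * rng.standard_normal(200)   # toy data
x_test = np.linspace(0.0, 10.0, 25)
z = np.linspace(0.0, 10.0, 15)                          # 15 inducing inputs for 200 points
approx_mean = sor_predict_mean(x_train, y, x_test, z, noise=0.09)
exact_mean = exact_predict_mean(x_train, y, x_test, noise=0.09)
```

When the inducing inputs coincide with the training inputs, the approximation collapses back to the exact predictive mean, which is a handy sanity check.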

By variational inference with inducing variables

See GP factoring.

With vector output

See vector Gaussian process regression.

Approximation with dropout

See NN ensembles.

Inhomogeneous with covariates

Integrated nested Laplace approximation connects to the GP-as-SDE idea, I think?

For dimension reduction

e.g. GP-LVM (N. Lawrence 2005). 🏗

Pathwise/Matheron updates

See pathwise GP.
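
In brief, rather than computing the conditional moments and then sampling from them, Matheron's rule draws from the joint prior and corrects the draw: \(\boldsymbol{f}_* \mid \boldsymbol{y} \overset{d}{=} \boldsymbol{f}_* + \mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}(\boldsymbol{y}-\boldsymbol{f}_n)\). A minimal noise-free sketch, in which the kernel, the jitter, the function names and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel on 1-d inputs."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

def matheron_update(f_star, f_n, K_sn, K_nn, y):
    """Turn a joint prior draw (f_n, f_star) into a draw from f_* | f_n = y."""
    return f_star + K_sn @ np.linalg.solve(K_nn, y - f_n)

rng = np.random.default_rng(1)
x_train = np.array([-2.0, -1.0, 0.0, 1.5])
y = np.sin(x_train)                              # toy observations
x_test = np.linspace(-3.0, 3.0, 50)
n = len(x_train)

# Joint prior covariance over training and test locations, with jitter for stability.
x_all = np.concatenate([x_train, x_test])
K_all = rbf(x_all, x_all) + 1e-8 * np.eye(len(x_all))

# One draw from the zero-mean joint prior, split into its two blocks.
prior_draw = np.linalg.cholesky(K_all) @ rng.standard_normal(len(x_all))
f_n, f_star = prior_draw[:n], prior_draw[n:]

# Pathwise conditioning: correct the prior draw by the data residual.
posterior_draw = matheron_update(f_star, f_n, K_all[n:, :n], K_all[:n, :n], y)
```

The appeal is that the expensive solve involves only the training block, while the prior draw can come from any cheap approximate sampler.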

References

Abrahamsen, Petter. 1997. “A Review of Gaussian Random Fields and Correlation Functions.”
Abt, Markus, and William J. Welch. 1998. “Fisher Information and Maximum-Likelihood Estimation of Covariance Parameters in Gaussian Stochastic Processes.” Canadian Journal of Statistics 26 (1): 127–37.
Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. “Exponential Families for Conditional Random Fields.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Alvarado, Pablo A., and Dan Stowell. 2018. “Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music.” arXiv:1705.07104 [Cs, Stat], November.
Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015. “Fast Direct Methods for Gaussian Processes.” arXiv:1403.6015 [Astro-Ph, Stat], April.
Bachoc, F., F. Gamboa, J. Loubes, and N. Venet. 2018. “A Gaussian Process Regression Model for Distribution Inputs.” IEEE Transactions on Information Theory 64 (10): 6620–37.
Bachoc, Francois, Alexandra Suvorikova, David Ginsbourger, Jean-Michel Loubes, and Vladimir Spokoiny. 2019. “Gaussian Processes with Multidimensional Distribution Inputs via Optimal Transport and Hilbertian Embedding.” arXiv:1805.00753 [Stat], April.
Birgé, Lucien, and Pascal Massart. 2006. “Minimal Penalties for Gaussian Model Selection.” Probability Theory and Related Fields 138 (1-2): 33–73.
Bonilla, Edwin V., Kian Ming A. Chai, and Christopher K. I. Williams. 2007. “Multi-Task Gaussian Process Prediction.” In Proceedings of the 20th International Conference on Neural Information Processing Systems, 153–60. NIPS’07. USA: Curran Associates Inc.
Bonilla, Edwin V., Karl Krauth, and Amir Dezfouli. 2019. “Generic Inference in Latent Gaussian Process Models.” Journal of Machine Learning Research 20 (117): 1–63.
Borovitskiy, Viacheslav, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2020. “Matérn Gaussian Processes on Riemannian Manifolds.” arXiv:2006.10160 [Cs, Stat], June.
Burt, David R., Carl Edward Rasmussen, and Mark van der Wilk. 2020. “Convergence of Sparse Variational Inference in Gaussian Processes Regression.” Journal of Machine Learning Research 21 (131): 1–63.
Calandra, R., J. Peters, C. E. Rasmussen, and M. P. Deisenroth. 2016. “Manifold Gaussian Processes for Regression.” In 2016 International Joint Conference on Neural Networks (IJCNN), 3338–45. Vancouver, BC, Canada: IEEE.
Cressie, Noel. 1990. “The Origins of Kriging.” Mathematical Geology 22 (3): 239–52.
———. 2015. Statistics for Spatial Data. John Wiley & Sons.
Cressie, Noel, and Christopher K. Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0. John Wiley and Sons.
Csató, Lehel, and Manfred Opper. 2002. “Sparse On-Line Gaussian Processes.” Neural Computation 14 (3): 641–68.
Csató, Lehel, Manfred Opper, and Ole Winther. 2001. “TAP Gibbs Free Energy, Belief Propagation and Sparsity.” In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 657–63. NIPS’01. Cambridge, MA, USA: MIT Press.
Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press.
Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR.
Dahl, Astrid, and Edwin Bonilla. 2017. “Scalable Gaussian Process Models for Solar Power Forecasting.” In Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy, edited by Wei Lee Woon, Zeyar Aung, Oliver Kramer, and Stuart Madnick, 94–106. Lecture Notes in Computer Science. Cham: Springer International Publishing.
Dahl, Astrid, and Edwin V. Bonilla. 2019. “Sparse Grouped Gaussian Processes for Solar Power Forecasting.” arXiv:1903.03986 [Cs, Stat], March.
Damianou, Andreas, and Neil Lawrence. 2013. “Deep Gaussian Processes.” In Artificial Intelligence and Statistics, 207–15.
Damianou, Andreas, Michalis K. Titsias, and Neil D. Lawrence. 2011. “Variational Gaussian Process Dynamical Systems.” In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 2510–18. Curran Associates, Inc.
Dezfouli, Amir, and Edwin V. Bonilla. 2015. “Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.
Domingos, Pedro. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.00152 [Cs, Stat], November.
Dubrule, Olivier. 2018. “Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In Handbook of Mathematical Geosciences: Fifty Years of IAMG, edited by B.S. Daya Sagar, Qiuming Cheng, and Frits Agterberg, 3–24. Cham: Springer International Publishing.
Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018. “How Deep Are Deep Gaussian Processes?” Journal of Machine Learning Research 19 (1): 2100–2145.
Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021. “Deep Neural Networks as Point Estimates for Deep Gaussian Processes.” arXiv:2105.04504 [Cs, Stat], May.
Dutordoir, Vincent, Alan Saul, Zoubin Ghahramani, and Fergus Simpson. 2022. “Neural Diffusion Processes.” arXiv.
Duvenaud, David. 2014. “Automatic Model Construction with Gaussian Processes.” PhD Thesis, University of Cambridge.
Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2013. “Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 1166–74.
Ebden, Mark. 2015. “Gaussian Processes: A Quick Introduction.” arXiv:1505.02965 [Math, Stat], May.
Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. “Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc.
Emery, Xavier. 2007. “Conditioning Simulations of Gaussian Random Fields by Ordinary Kriging.” Mathematical Geology 39 (6): 607–23.
Evgeniou, Theodoros, Charles A. Micchelli, and Massimiliano Pontil. 2005. “Learning Multiple Tasks with Kernel Methods.” Journal of Machine Learning Research 6 (Apr): 615–37.
Ferguson, Thomas S. 1973. “A Bayesian Analysis of Some Nonparametric Problems.” The Annals of Statistics 1 (2): 209–30.
Finzi, Marc, Roberto Bondesan, and Max Welling. 2020. “Probabilistic Numeric Convolutional Neural Networks.” arXiv:2010.10876 [Cs], October.
Föll, Roman, Bernard Haasdonk, Markus Hanselmann, and Holger Ulmer. 2017. “Deep Recurrent Gaussian Process with Variational Sparse Spectrum Approximation.” arXiv:1711.00799 [Stat], November.
Frigola, Roger, Yutian Chen, and Carl Edward Rasmussen. 2014. “Variational Gaussian Process State-Space Models.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3680–88. Curran Associates, Inc.
Frigola, Roger, Fredrik Lindsten, Thomas B Schön, and Carl Edward Rasmussen. 2013. “Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 3156–64. Curran Associates, Inc.
Gal, Yarin, and Zoubin Ghahramani. 2015. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Gal, Yarin, and Mark van der Wilk. 2014. “Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial.” arXiv:1402.1412 [Stat], February.
Galliani, Pietro, Amir Dezfouli, Edwin V Bonilla, and Novi Quadrianto. n.d. “Gray-Box Inference for Structured Gaussian Process Models,” 9.
Gardner, Jacob R., Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31:7587–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Gardner, Jacob R., Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. “Product Kernel Interpolation for Scalable Gaussian Processes.” arXiv:1802.08903 [Cs, Stat], February.
Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. “Conditional Neural Processes.” arXiv:1807.01613 [Cs, Stat], July, 10.
Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. “Neural Processes,” July.
Ghahramani, Zoubin. 2013. “Bayesian Non-Parametrics and the Probabilistic Approach to Modelling.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371 (1984): 20110553.
Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015. “Scaling Multidimensional Inference for Structured Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2): 424–36.
Girolami, Mark, and Simon Rogers. 2005. “Hierarchic Bayesian Models for Kernel Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 241–48. Bonn, Germany: ACM Press.
Gramacy, Robert B. 2016. “laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R.” Journal of Statistical Software 72 (1).
Gramacy, Robert B., and Daniel W. Apley. 2015. “Local Gaussian Process Approximation for Large Computer Experiments.” Journal of Computational and Graphical Statistics 24 (2): 561–78.
Gratiet, Loïc Le, Stefano Marelli, and Bruno Sudret. 2016. “Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expansions and Gaussian Processes.” In Handbook of Uncertainty Quantification, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–37. Cham: Springer International Publishing.
Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Hartikainen, J., and S. Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.
Hensman, James, Nicolò Fusi, and Neil D. Lawrence. 2013. “Gaussian Processes for Big Data.” In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 282–90. UAI’13. Arlington, Virginia, USA: AUAI Press.
Huber, Marco F. 2014. “Recursive Gaussian Process: On-Line Regression and Learning.” Pattern Recognition Letters 45 (August): 85–91.
Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018. “Scalable Gaussian Process Inference with Finite-Data Mean and Variance Guarantees.” arXiv:1806.10234 [Cs, Stat], June.
Jankowiak, Martin, Geoff Pleiss, and Jacob Gardner. 2020. “Deep Sigma Point Processes.” In Conference on Uncertainty in Artificial Intelligence, 789–98. PMLR.
Jordan, Michael Irwin. 1999. Learning in Graphical Models. Cambridge, Mass.: MIT Press.
Karvonen, Toni, and Simo Särkkä. 2016. “Approximate State-Space Gaussian Processes via Spectral Transformation.” In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. Vietri sul Mare, Salerno, Italy: IEEE.
Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.” arXiv:2001.08055 [Physics, Stat], January.
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference.
Ko, Jonathan, and Dieter Fox. 2009. “GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In Autonomous Robots, 27:75–90.
Kocijan, Juš, Agathe Girard, Blaž Banko, and Roderick Murray-Smith. 2005. “Dynamic Systems Identification with Gaussian Processes.” Mathematical and Computer Modelling of Dynamical Systems 11 (4): 411–24.
Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. “AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In UAI17.
Krige, D. G. 1951. “A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand.” Journal of the Southern African Institute of Mining and Metallurgy 52 (6): 119–39.
Kroese, Dirk P., and Zdravko I. Botev. 2013. “Spatial Process Generation.” arXiv:1308.0399 [Stat], August.
Lawrence, Neil. 2005. “Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.” Journal of Machine Learning Research 6 (Nov): 1783–1816.
Lawrence, Neil D., and Raquel Urtasun. 2009. “Non-Linear Matrix Factorization with Gaussian Processes.” In Proceedings of the 26th Annual International Conference on Machine Learning, 601–8. ICML ’09. New York, NY, USA: ACM.
Lawrence, Neil, Matthias Seeger, and Ralf Herbrich. 2003. “Fast Sparse Gaussian Process Methods: The Informative Vector Machine.” In Proceedings of the 16th Annual Conference on Neural Information Processing Systems, 609–16.
Lázaro-Gredilla, Miguel, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. 2010. “Sparse Spectrum Gaussian Process Regression.” Journal of Machine Learning Research 11 (Jun): 1865–81.
Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018. “Deep Neural Networks as Gaussian Processes.” In ICLR.
Leibfried, Felix, Vincent Dutordoir, S. T. John, and Nicolas Durrande. 2022. “A Tutorial on Sparse Gaussian Processes and Variational Inference.” arXiv.
Lenk, Peter J. 2003. “Bayesian Semiparametric Density Estimation and Model Verification Using a Logistic–Gaussian Process.” Journal of Computational and Graphical Statistics 12 (3): 548–65.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.
Liutkus, Antoine, Roland Badeau, and Gaël Richard. 2011. “Gaussian Processes for Underdetermined Source Separation.” IEEE Transactions on Signal Processing 59 (7): 3155–67.
Lloyd, James Robert, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2014. “Automatic Construction and Natural-Language Description of Nonparametric Regression Models.” In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Louizos, Christos, Xiahan Shi, Klamer Schutte, and Max Welling. 2019. “The Functional Neural Process.” In Advances in Neural Information Processing Systems. Vol. 32. Curran Associates, Inc.
Lu, Jun. 2022. “A Rigorous Introduction to Linear Models.” arXiv.
MacKay, David J C. 1998. “Introduction to Gaussian Processes.” NATO ASI Series. Series F: Computer and System Sciences 168: 133–65.
———. 2002. “Gaussian Processes.” In Information Theory, Inference & Learning Algorithms, Chapter 45. Cambridge University Press.
Matheron, Georges. 1963a. Traité de Géostatistique Appliquée. 2. Le Krigeage. Editions Technip.
———. 1963b. “Principles of Geostatistics.” Economic Geology 58 (8): 1246–66.
Matthews, Alexander Graeme de Garis, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016. “GPflow: A Gaussian Process Library Using TensorFlow.” arXiv:1610.08733 [Stat], October.
Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Guilherme A. Barreto, and Neil D. Lawrence. 2017. “Deep Recurrent Gaussian Processes for Outlier-Robust System Identification.” Journal of Process Control, DYCOPS-CAB 2016, 60 (December): 82–94.
Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme A. Barreto, and Neil D. Lawrence. 2016. “Recurrent Gaussian Processes.” In Proceedings of ICLR.
Micchelli, Charles A., and Massimiliano Pontil. 2005a. “Learning the Kernel Function via Regularization.” Journal of Machine Learning Research 6 (Jul): 1099–1125.
———. 2005b. “On Learning Vector-Valued Functions.” Neural Computation 17 (1): 177–204.
Minh, Hà Quang. 2022. “Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.” SIAM/ASA Journal on Uncertainty Quantification, February, 96–124.
Mohammadi, Hossein, Peter Challenor, and Marc Goodfellow. 2021. “Emulating Computationally Expensive Dynamical Simulators Using Gaussian Processes.” arXiv:2104.14987 [Stat], April.
Moreno-Muñoz, Pablo, Antonio Artés-Rodríguez, and Mauricio A. Álvarez. 2019. “Continual Multi-Task Gaussian Processes.” arXiv:1911.00002 [Cs, Stat], October.
Nagarajan, Sai Ganesh, Gareth Peters, and Ido Nevat. 2018. “Spatial Field Reconstruction of Non-Gaussian Random Fields: The Tukey G-and-H Random Process.” SSRN Electronic Journal.
Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018. “State Space Gaussian Processes with Non-Gaussian Likelihood.” In International Conference on Machine Learning, 3789–98.
O’Hagan, A. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24.
Papaspiliopoulos, Omiros, Yvo Pokern, Gareth O. Roberts, and Andrew M. Stuart. 2012. “Nonparametric Estimation of Diffusions: A Differential Equations Approach.” Biometrika 99 (3): 511–31.
Pinder, Thomas, and Daniel Dodd. 2022. “GPJax: A Gaussian Process Framework in JAX.” Journal of Open Source Software 7 (75): 4455.
Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. “Constant-Time Predictive Distributions for Gaussian Processes.” In. arXiv.
Pleiss, Geoff, Martin Jankowiak, David Eriksson, Anil Damle, and Jacob Gardner. 2020. “Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization.” Advances in Neural Information Processing Systems 33.
Qi, Yuan Alan, Ahmed H. Abdel-Gawad, and Thomas P. Minka. 2010. “Sparse-Posterior Gaussian Processes for General Likelihoods.” In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 450–57. UAI’10. Arlington, Virginia, USA: AUAI Press.
Quiñonero-Candela, Joaquin, and Carl Edward Rasmussen. 2005. “A Unifying View of Sparse Approximate Gaussian Process Regression.” Journal of Machine Learning Research 6 (Dec): 1939–59.
Raissi, Maziar, and George Em Karniadakis. 2017. “Machine Learning of Linear Differential Equations Using Gaussian Processes.” arXiv:1701.02440 [Cs, Math, Stat], January.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.
Reece, S., and S. Roberts. 2010. “An Introduction to Gaussian Processes for the Kalman Filter Expert.” In 2010 13th International Conference on Information Fusion, 1–9.
Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021. “Sparse Uncertainty Representation in Deep Learning with Inducing Weights.” arXiv:2105.14594 [Cs, Stat], May.
Riutort-Mayol, Gabriel, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari. 2020. “Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming.” arXiv:2004.11408 [Stat], April.
Rossi, Simone, Markus Heinonen, Edwin Bonilla, Zheyang Shen, and Maurizio Filippone. 2021. “Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations.” In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, 1837–45. PMLR.
Saatçi, Yunus. 2012. “Scalable inference for structured Gaussian process models.” Ph.D., University of Cambridge.
Saatçi, Yunus, Ryan Turner, and Carl Edward Rasmussen. 2010. “Gaussian Process Change Point Models.” In Proceedings of the 27th International Conference on Machine Learning, 927–34. ICML’10. Madison, WI, USA: Omnipress.
Saemundsson, Steindor, Alexander Terenin, Katja Hofmann, and Marc Peter Deisenroth. 2020. “Variational Integrator Networks for Physically Structured Embeddings.” arXiv:1910.09349 [Cs, Stat], March.
Salimbeni, Hugh, and Marc Deisenroth. 2017. “Doubly Stochastic Variational Inference for Deep Gaussian Processes.” In Advances In Neural Information Processing Systems.
Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. 2018. “Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models.” In International Conference on Artificial Intelligence and Statistics, 689–97.
Särkkä, Simo. 2011. “Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In Artificial Neural Networks and Machine Learning – ICANN 2011, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.
Särkkä, Simo, and Jouni Hartikainen. 2012. “Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In Artificial Intelligence and Statistics.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine 30 (4): 51–61.
Schulam, Peter, and Suchi Saria. 2017. “Reliable Decision Support Using Counterfactual Models.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1696–706. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.
Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014. “Student-t Processes as Alternatives to Gaussian Processes.” In Artificial Intelligence and Statistics, 877–85. PMLR.
Sidén, Per. 2020. Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields. Vol. 15. Linköping Studies in Statistics. Linköping: Linköping University Electronic Press.
Smith, Michael Thomas, Mauricio A. Alvarez, and Neil D. Lawrence. 2018. “Gaussian Process Regression for Binned Data.” arXiv:1809.02010 [Cs, Stat], September.
Snelson, Edward, and Zoubin Ghahramani. 2005. “Sparse Gaussian Processes Using Pseudo-Inputs.” In Advances in Neural Information Processing Systems, 1257–64.
Solin, Arno, and Simo Särkkä. 2020. “Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.” Statistics and Computing 30 (2): 419–46.
Tait, Daniel J., and Theodoros Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems.” arXiv:2006.15641 [Cs, Stat], June.
Tang, Wenpin, Lu Zhang, and Sudipto Banerjee. 2019. “On Identifiability and Consistency of the Nugget in Gaussian Spatial Process Models.” arXiv:1908.05726 [Math, Stat], August.
Titsias, Michalis K. 2009a. “Variational Learning of Inducing Variables in Sparse Gaussian Processes.” In International Conference on Artificial Intelligence and Statistics, 567–74. PMLR.
———. 2009b. “Variational Model Selection for Sparse Gaussian Process Regression: Technical Supplement.” Technical report, School of Computer Science, University of Manchester.
Titsias, Michalis, and Neil D. Lawrence. 2010. “Bayesian Gaussian Process Latent Variable Model.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 844–51.
Tokdar, Surya T. 2007. “Towards a Faster Implementation of Density Estimation With Logistic Gaussian Process Priors.” Journal of Computational and Graphical Statistics 16 (3): 633–55.
Turner, Richard E., and Maneesh Sahani. 2014. “Time-Frequency Analysis as Probabilistic Inference.” IEEE Transactions on Signal Processing 62 (23): 6171–83.
Turner, Ryan, Marc Deisenroth, and Carl Rasmussen. 2010. “State-Space Inference and Learning with Gaussian Processes.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 868–75.
Vanhatalo, Jarno, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. 2013. “GPstuff: Bayesian Modeling with Gaussian Processes.” Journal of Machine Learning Research 14 (April): 1175–79.
———. 2015. “Bayesian Modeling with Gaussian Processes Using the GPstuff Toolbox.” arXiv:1206.5754 [Cs, Stat], July.
Walder, Christian, Kwang In Kim, and Bernhard Schölkopf. 2008. “Sparse Multiscale Gaussian Process Regression.” In Proceedings of the 25th International Conference on Machine Learning, 1112–19. ICML ’08. New York, NY, USA: ACM.
Walder, C., B. Schölkopf, and O. Chapelle. 2006. “Implicit Surface Modelling with a Globally Regularised Basis of Compact Support.” Computer Graphics Forum 25 (3): 635–44.
Wang, Ke, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2019. “Exact Gaussian Processes on a Million Data Points.” In Advances in Neural Information Processing Systems, 32:14648–59. Red Hook, NY, USA.
Wikle, Christopher K., Noel Cressie, and Andrew Zammit-Mangion. 2019. Spatio-Temporal Statistics with R.
Wilk, Mark van der, Andrew G. Wilson, and Carl E. Rasmussen. 2014. “Variational Inference for Latent Variable Modelling of Correlation Structure.” In NIPS 2014 Workshop on Advances in Variational Inference.
Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019. “End-to-End Probabilistic Inference for Nonstationary Audio Analysis.” arXiv:1901.11436 [Cs, Eess, Stat], January.
Wilkinson, William J., Simo Särkkä, and Arno Solin. 2021. “Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.” arXiv.
Williams, Christopher KI, and Matthias Seeger. 2001. “Using the Nyström Method to Speed Up Kernel Machines.” In Advances in Neural Information Processing Systems, 682–88.
Williams, Christopher, Stefan Klanke, Sethu Vijayakumar, and Kian M. Chai. 2009. “Multi-Task Gaussian Process Learning of Robot Inverse Dynamics.” In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 265–72. Curran Associates, Inc.
Wilson, Andrew Gordon, and Ryan Prescott Adams. 2013. “Gaussian Process Kernels for Pattern Discovery and Extrapolation.” In International Conference on Machine Learning.
Wilson, Andrew Gordon, Christoph Dann, Christopher G. Lucas, and Eric P. Xing. 2015. “The Human Kernel.” arXiv:1510.07389 [Cs, Stat], October.
Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011. “Generalised Wishart Processes.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.
———. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases, edited by Peter A. Flach, Tijl De Bie, and Nello Cristianini, 858–61. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Wilson, Andrew Gordon, David A. Knowles, and Zoubin Ghahramani. 2012. “Gaussian Process Regression Networks.” In Proceedings of the 29th International Conference on Machine Learning, 1139–46. ICML’12. Madison, WI, USA: Omnipress.
Wilson, Andrew Gordon, and Hannes Nickisch. 2015. “Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP).” In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, 1775–84. ICML’15. Lille, France: JMLR.org.
Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning, 10292–302. PMLR.
Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021. “Pathwise Conditioning of Gaussian Processes.” Journal of Machine Learning Research 22 (105): 1–47.
Zhang, Rui, Christian Walder, Edwin V. Bonilla, Marian-Andrei Rizoiu, and Lexing Xie. 2020. “Quantile Propagation for Wasserstein-Approximate Gaussian Processes.” In Proceedings of NeurIPS 2020.
