Gaussian process regression software
And classification.
December 3, 2019 — July 29, 2022
Implementations of Gaussian process regression.
GPy was a common default choice in Python, and GPflow, for example, has attempted to follow its API. For another value of default, scikit-learn has a GP implementation. Moreover, many generic Bayesian inference toolkits can express GP models. All things being equal, though, I want better-than-generic support for GP models: making them go fast can be a subtle business, and there are all kinds of fancy bells and whistles I would generally like supported, such as inducing-point methods and sparse variational inference.
1 GPy
GPy originates in GP-urdaddy Neil Lawrence’s lab. It is well-tested and featureful. However, it is also crufty, confusing, and slow. It predates much modern autodiff technology and rolls its own, in opaque ways. It is best regarded as a kind of reference implementation, perhaps not one to use in practice.
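For orientation, the classic GPy workflow looks roughly like the following sketch (toy data and an arbitrary RBF kernel; details per the GPy docs):

```python
import numpy as np
import GPy

# Toy 1-d regression data
X = np.random.uniform(-3.0, 3.0, (50, 1))
y = np.sin(X) + 0.1 * np.random.randn(50, 1)

kernel = GPy.kern.RBF(input_dim=1)             # squared-exponential covariance
model = GPy.models.GPRegression(X, y, kernel)
model.optimize(messages=False)                 # type-II ML for hyperparameters
mean, var = model.predict(np.linspace(-3.0, 3.0, 100)[:, None])
```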
2 Stheno
Stheno seems to be popular in Julia and also comes in a Python flavour, stheno, targeting PyTorch, TensorFlow, or JAX. It seems to be sponsored by Invenia as a by-product of their main business. They write excellent GP tutorials, e.g. Scaling multi-output Gaussian process models with exact inference.
3 GPyTorch
Gardner et al. (2018)
Have you used GPyTorch? It’s the bomb; it is truly amazing. This tutorial is well worth the 15 minutes it takes. Crank up the training and testing set size to 10,000 and it is a breeze. I have tried GPy, GPflow and TensorFlow in the past and they would struggle.
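The exact-GP regression pattern from GPyTorch’s introductory tutorial looks something like this sketch (toy data; kernel and optimiser choices are placeholders and details may differ between versions):

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * 6.28) + 0.1 * torch.randn(100)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

model.train(); likelihood.train()
for _ in range(50):                      # maximise the marginal likelihood
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()

model.eval(); likelihood.eval()
with torch.no_grad():
    pred = likelihood(model(torch.linspace(0, 1, 51)))
    mean, var = pred.mean, pred.variance
```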
Bonus feature: integration (in the API sense) with pyro, for more elaborate likelihood models.
4 Plain pyro
There are native GP models in the PyTorch-based pyro, which you can use without GPyTorch.
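A minimal sketch of pyro’s native GP regression, following the pattern in the pyro GP tutorial (toy data; hyperparameter values are arbitrary):

```python
import torch
import pyro
import pyro.contrib.gp as gp

X = torch.linspace(0.0, 5.0, 50)
y = torch.sin(X) + 0.1 * torch.randn(50)

kernel = gp.kernels.RBF(input_dim=1)
gpr = gp.models.GPRegression(X, y, kernel, noise=torch.tensor(0.1))

# Fit kernel hyperparameters and noise by maximising the marginal likelihood
optimizer = torch.optim.Adam(gpr.parameters(), lr=0.01)
loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(gpr.model, gpr.guide)
    loss.backward()
    optimizer.step()

mean, cov = gpr(torch.linspace(0.0, 5.0, 100), full_cov=True)
```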
5 JuliaGaussianProcesses Ecosystem
JuliaGaussianProcesses includes various interoperable Julia GP regression packages.
AugmentedGaussianProcesses.jl by Théo Galy-Fajou looks nice and has sparse approximations plus some nice variational tricks. It has since been deprecated in favour of JuliaGaussianProcesses/AugmentedGPLikelihoods.jl.
6 GPJax
Via Cheng Soon Ong:
GPJax (Pinder and Dodd 2022) is a didactic Gaussian process library that supports GPU acceleration and just-in-time compilation. We seek to provide a flexible API as close as possible to how the underlying mathematics is written on paper to enable researchers to rapidly prototype and develop new ideas.
7 TinyGP
tinygp is an extremely lightweight library for building Gaussian Process (GP) models in Python, built on top of jax. It has a nice interface, and it’s pretty fast (see Benchmarks). Thanks to jax, tinygp supports things like GPU acceleration and automatic differentiation.
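A minimal sketch based on the tinygp documentation examples (toy data; exact method names may shift between versions):

```python
import numpy as np
from tinygp import kernels, GaussianProcess

X = np.sort(np.random.uniform(0, 10, 100))
y = np.sin(X) + 0.1 * np.random.randn(100)

kernel = kernels.ExpSquared(scale=1.5)      # squared-exponential covariance
gp = GaussianProcess(kernel, X, diag=0.01)  # diag adds observation noise
print(gp.log_probability(y))                # marginal log-likelihood

# Posterior predictive at new inputs
X_test = np.linspace(0, 10, 200)
_, cond_gp = gp.condition(y, X_test)
mu, var = cond_gp.loc, cond_gp.variance
```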
8 BayesNewton
Bayes-Newton (Wilkinson, Särkkä, and Solin 2021) is a library for approximate inference in Gaussian processes (GPs) in JAX (with objax), built and actively maintained by Will Wilkinson. Bayes-Newton provides a unifying view of approximate Bayesian inference, and allows for the combination of many models (e.g. GPs, sparse GPs, Markov GPs, sparse Markov GPs) with the inference method of your choice (VI, EP, Laplace, Linearization).
9 George
George is a Python library implementing Ambikasaran et al. (2015)’s fast method for time-series GPs, as explained in Scaling Gaussian Processes to big datasets.
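A usage sketch, per the George docs, using the HODLR solver that implements the fast method mentioned above (toy time-series data):

```python
import numpy as np
import george
from george import kernels

t = np.sort(np.random.uniform(0, 10, 100))   # 1-d inputs, e.g. observation times
yerr = 0.1 * np.ones_like(t)
y = np.sin(t) + yerr * np.random.randn(100)

kernel = np.var(y) * kernels.ExpSquaredKernel(metric=1.0)
gp = george.GP(kernel, solver=george.HODLRSolver)  # the fast hierarchical solver
gp.compute(t, yerr)                                # factorise the covariance matrix
print(gp.log_likelihood(y))

t_pred = np.linspace(0, 10, 500)
mu, var = gp.predict(y, t_pred, return_var=True)
```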
10 Geostat Framework
This Framework was created within the PhD project of Sebastian Müller at the Computational Hydrosystems Department at the UFZ Leipzig.
11 Stan
Bayes workhorse Stan can do Gaussian process regression just like almost everything else; see Michael Betancourt’s blog posts 1, 2 and 3.
12 scikit-learn
The current scikit-learn has basic Gaussian processes. The introduction disposes me against using this implementation:
Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.
The advantages of Gaussian processes are:
- The prediction interpolates the observations (at least for regular kernels).
- The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest.
- Versatile: different kernels can be specified. Common kernels are provided, but it is also possible to specify custom kernels.
The disadvantages of Gaussian processes include:
- They are not sparse, i.e., they use the whole samples/features information to perform the prediction.
- They lose efficiency in high dimensional spaces — namely when the number of features exceeds a few dozens.
This list of disadvantages is, at best, imprecise and, at worst, mistaken. Let us suppose the description is meant to draw a distinction between GP regression and some other regression model (noting that Bayesian linear regression and GP regression are intimately linked).
So what is this telling us about the unusual qualities of GP regression?
The first point is strictly correct but not useful, in that sparse approximate GPs are a whole industry; it tells us that this implementation is missing a fairly standard approximate inference method, not that there is a problem with the family of methods per se. (“The problem with cars is that they all run on diesel.”) The second point might be correct under a maximally generous interpretation. What do they mean by efficiency here?
In the least generous interpretation, it is just plain wrong; I often use GPs with dozens of inputs and it works fine. In particular, if they mean ‘efficiency’ in the sense that computational cost grows badly with input dimension, this is suspect. Naïve inference costs \(\mathcal{O}(DN^2)\) to build the Gram matrix plus \(\mathcal{O}(N^3)\) for the solve, for \(N\) observations and \(D\) features, so the dependence on dimension is merely linear. That is no worse than linear regression for prediction and arguably better for training, yet other models with linear complexity in the input dimension escape without such a warning in the scikit-learn docs.
Perhaps they mean statistical efficiency, i.e. the variance of the estimate for a fixed number of data points? That is a subtler question. Yes, for a fixed isotropic kernel we can get bad behaviour in high-dimensional spaces because, informally, pairwise distances concentrate and low-dimensional projections of the data all look kind of similar, so the kernel struggles to distinguish points. On the other hand, if we use a non-isotropic kernel or adaptive kernel selection, which is fairly standard, the kernel can behave well in many dimensions by learning to emphasise the important ones. As the number of input dimensions grows larger still, the hyperparameter selection problem becomes less well-determined; but that is also true of linear regression in general. If they wish to assert that this problem is even worse for GP regression than for linear regression, they could make that case, but it would need more than a one-sentence aside.
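For reference, basic usage looks like this (a minimal sketch with toy data and an arbitrary kernel):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.random.uniform(-3, 3, (50, 1))
y = np.sin(X).ravel() + 0.1 * np.random.randn(50)

# RBF covariance plus a learned observation-noise term
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gpr.fit(X, y)            # hyperparameters fit by maximising marginal likelihood

X_test = np.linspace(-3, 3, 100)[:, None]
mean, std = gpr.predict(X_test, return_std=True)
```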
13 GPflow
GPflow (Matthews et al. 2016) (using TensorFlow) is an option. The GPflow docs include the following clarification of its genealogy.
GPflow has origins in GPy…, and much of the interface is intentionally similar for continuity (though some parts of the interface may diverge in future). GPflow has a rather different remit from GPy though:
- GPflow leverages TensorFlow for faster/bigger computation
- GPflow has much less code than GPy, mostly because all gradient computation is handled by TensorFlow.
- GPflow focuses on variational inference and MCMC — there is no expectation propagation or Laplace approximation.
- GPflow does not have any plotting functionality.
In practice, it is faster than GPy but does not seem any easier or less confusing to implement tricky stuff in.
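For comparison with the GPy snippet above, a minimal exact-regression sketch in GPflow 2.x (toy data; API per the GPflow docs and may shift between versions):

```python
import numpy as np
import gpflow

X = np.random.uniform(-3.0, 3.0, (50, 1))
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)

model = gpflow.models.GPR(
    data=(X, Y),
    kernel=gpflow.kernels.SquaredExponential(),
)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

mean, var = model.predict_f(np.linspace(-3.0, 3.0, 100)[:, None])
```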
14 Misc Python
15 AutoGP
autogp (also using TensorFlow) packs a lot of fancy GP technology into one package. I believe it is no longer actively maintained, which is a pity, as it is the only one of these I have contributed to substantially.
16 MATLAB
Should mention the various MATLAB/Scilab options.
GPStuff is one for MATLAB/Octave that I have seen around the place.