Another way of cunningly chopping up the work of fitting a Gaussian process is to represent the process as a random function comprising basis functions $\{\phi_k\}_{k=1}^{K}$ with a Gaussian random weight vector $\mathbf{w}\sim\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}_{\mathbf{w}})$, so that $f=\boldsymbol{\Phi}\mathbf{w}$ is a random function satisfying $f\sim\mathcal{N}(\mathbf{0},\boldsymbol{\Phi}\boldsymbol{\Sigma}_{\mathbf{w}}\boldsymbol{\Phi}^{\top})$, where $\boldsymbol{\Phi}$ is a matrix of features. This is referred to as a weight-space approach in ML.
TODO: I just assumed centred weights here, but that is crazy. Update to relax that assumption.
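To make the weight-space picture concrete, here is a minimal sketch in NumPy, assuming (per the TODO above) centred weights, and using an arbitrary hand-rolled basis of Gaussian bumps rather than anything canonical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical basis: K Gaussian bumps with centres c_k and common width ell.
def features(x, centres, ell=0.2):
    """Feature matrix Phi with Phi[n, k] = phi_k(x_n)."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / ell) ** 2)

K = 30
centres = np.linspace(0.0, 1.0, K)
x = np.linspace(0.0, 1.0, 200)
Phi = features(x, centres)                    # (N, K) feature matrix

# Centred Gaussian weights w ~ N(0, Sigma_w); here Sigma_w = I / K for simplicity.
Sigma_w = np.eye(K) / K
w = rng.multivariate_normal(np.zeros(K), Sigma_w)

f = Phi @ w                                   # one draw of the random function
implied_K = Phi @ Sigma_w @ Phi.T             # implied covariance Phi Sigma_w Phi^T
```

Each draw of $\mathbf{w}$ gives a concrete function, and the implied kernel is exactly $\boldsymbol{\Phi}\boldsymbol{\Sigma}_{\mathbf{w}}\boldsymbol{\Phi}^{\top}$.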
We might imagine that this representation would be exact if we had countably many basis functions, and under sane conditions it is. We would like to know, further, that we can find a basis such that we do not need too many basis functions to represent the process well. Looking at the Karhunen-Loève theorem we might imagine that this can sometimes work out fine, and indeed it sometimes does.
This approach is a classic; Chapter 3 of Bishop (2006) is a nicely clear treatment. Cressie and Wikle (2011) target the spatiotemporal context.
Hijinks ensue when selecting the basis functions. If we took the natural Hilbert space here seriously we could consider identifying the basis with the eigenfunctions of the kernel. This is not generally easy. We tend to use either global bases, such as Fourier bases or more generally Karhunen-Loève bases, or construct local bases of limited overlap (usually piecewise polynomials AFAICT).
The kernel trick writes a kernel as an inner product in a corresponding reproducing kernel Hilbert space (RKHS) with a feature map $\varphi$, i.e. $k(\mathbf{x},\mathbf{x}')=\langle\varphi(\mathbf{x}),\varphi(\mathbf{x}')\rangle_{\mathcal{H}}$. In sufficiently nice cases the kernel is well approximated by $k(\mathbf{x},\mathbf{x}')\approx\boldsymbol{\phi}(\mathbf{x})^{\top}\boldsymbol{\phi}(\mathbf{x}')$, where $\boldsymbol{\phi}$ is a finite-dimensional feature map. TODO: What is the actual guarantee here?
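As a toy illustration (mine, not from the references) of a kernel that is exactly the inner product of a finite-dimensional feature map, take the quadratic kernel $k(\mathbf{x},\mathbf{x}')=(\mathbf{x}^{\top}\mathbf{x}'+1)^{2}$ in two dimensions:

```python
import numpy as np

def quad_kernel(x, y):
    """Quadratic kernel k(x, y) = (x.y + 1)**2."""
    return (x @ y + 1.0) ** 2

def quad_features(x):
    """Exact finite-dimensional feature map for the quadratic kernel when d = 2."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.5])
assert np.isclose(quad_kernel(x, y), quad_features(x) @ quad_features(y))
```

Most kernels we care about (e.g. the squared-exponential) admit no such exact finite feature map, which is where the approximate feature maps of the following sections come in.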
Fourier features
When the Fourier basis is natural for the problem we are in a pretty good situation. We can use the Wiener-Khintchine relations to analyse and simulate the process. Is there a connection to Fourier features in neural nets?
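A sketch of one way "simulate via Wiener-Khintchine" can look in practice, assuming a squared-exponential kernel and a regular grid of frequencies (both arbitrary choices of mine): weight sinusoids at each frequency by the spectral density and give them independent Gaussian amplitudes.

```python
import numpy as np

rng = np.random.default_rng(1)
ell, sigma2 = 0.3, 1.0                        # assumed SE kernel hyperparameters

def spectral_density(w):
    """Two-sided spectral density of k(tau) = sigma2 * exp(-tau**2 / (2 * ell**2)),
    with the convention k(tau) = integral S(w) exp(i w tau) dw."""
    return sigma2 * ell / np.sqrt(2 * np.pi) * np.exp(-0.5 * (ell * w) ** 2)

w_grid = np.linspace(0.01, 30.0, 300)         # grid of positive frequencies
dw = w_grid[1] - w_grid[0]
a = rng.standard_normal(w_grid.size)          # cosine amplitudes
b = rng.standard_normal(w_grid.size)          # sine amplitudes

x = np.linspace(0.0, 5.0, 500)[:, None]
amp = np.sqrt(2.0 * spectral_density(w_grid) * dw)
f = np.cos(x * w_grid) @ (amp * a) + np.sin(x * w_grid) @ (amp * b)
# E[f(x) f(x')] ~= sum_k 2 S(w_k) dw cos(w_k (x - x')) ~= k(x - x').
```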
Random Fourier features
The random Fourier features method (Rahimi and Recht 2007, 2008) constructs a Monte Carlo estimate of a stationary kernel by representing the inner product in terms of complex exponential basis functions, $k(\mathbf{x}-\mathbf{x}')\approx\frac{1}{S}\sum_{s=1}^{S}\exp\left(i\boldsymbol{\omega}_{s}^{\top}(\mathbf{x}-\mathbf{x}')\right),$ with frequency parameters $\boldsymbol{\omega}_{s}$ sampled proportionally to the spectral density $p(\boldsymbol{\omega})\propto\int k(\boldsymbol{\tau})e^{-i\boldsymbol{\omega}^{\top}\boldsymbol{\tau}}\,\mathrm{d}\boldsymbol{\tau}$.
This sometimes has a favourable error rate (Sutherland and Schneider 2015).
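A minimal sketch of the construction for the squared-exponential kernel, using the common real-valued cosine features instead of complex exponentials (hyperparameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d, D, ell = 3, 2000, 0.7                      # input dim, feature count, lengthscale

# For k(x, y) = exp(-||x - y||^2 / (2 ell^2)), the normalised spectral density
# is N(0, ell**-2 I), so we sample the random frequencies from it.
omega = rng.standard_normal((D, d)) / ell
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(x):
    """Random Fourier feature map z(x) with z(x) . z(y) ~= k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(x @ omega.T + phase)

x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * ell ** 2))
approx = rff(x) @ rff(y)
print(exact, approx)                          # error shrinks roughly like 1/sqrt(D)
```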
K-L basis
We recall from the Karhunen-Loève notebook that the mean-square-optimal finite basis for approximating a Gaussian process is found by truncating the Karhunen-Loève expansion $f(\mathbf{x})=\sum_{m=1}^{\infty}\sqrt{\lambda_{m}}\,z_{m}\,e_{m}(\mathbf{x}),\ z_{m}\sim\mathcal{N}(0,1),$ where $e_{m}$ and $\lambda_{m}$ are, respectively, the $m$-th (orthogonal) eigenfunction and eigenvalue of the covariance operator $\mathcal{K}$, written in decreasing order of $\lambda_{m}$. What is the orthogonal basis though? That depends on the problem and can be a lot of work to calculate.
In the case that our field is stationary on a “nice” domain, though, this can be easy: we simply have the Fourier features as the natural basis.
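A sketch of a truncated Karhunen-Loève expansion on a discrete grid, where the eigenfunctions reduce to eigenvectors of the covariance matrix (kernel and truncation level are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 400)
ell = 0.1
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)   # SE covariance matrix

# Discrete K-L: eigendecompose and keep the M largest eigenvalues.
lam, E = np.linalg.eigh(K)                    # eigh returns ascending eigenvalues
lam, E = np.clip(lam[::-1], 0.0, None), E[:, ::-1]
M = 30
lam_M, E_M = lam[:M], E[:, :M]

# Truncated expansion: f ~= sum_m sqrt(lam_m) z_m e_m, with z_m ~ N(0, 1).
z = rng.standard_normal(M)
f_trunc = E_M @ (np.sqrt(lam_M) * z)

print(lam_M.sum() / lam.sum())                # fraction of variance captured by M terms
```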
“Decoupled” bases
Cheng and Boots (2017); Salimbeni et al. (2018); Shi, Titsias, and Mnih (2020); Wilson et al. (2020).
References
Ambikasaran, Foreman-Mackey, Greengard, et al. 2015. “Fast Direct Methods for Gaussian Processes.” arXiv:1403.6015 [Astro-Ph, Stat].
Bishop. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics.
Cheng, and Boots. 2017. “Variational Inference for Gaussian Process Models with Linear Complexity.” In Advances in Neural Information Processing Systems.
Cressie, and Huang. 1999. “Classes of Nonseparable, Spatio-Temporal Stationary Covariance Functions.” Journal of the American Statistical Association.
Cressie, and Johannesson. 2008. “Fixed Rank Kriging for Very Large Spatial Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Cressie, Shi, and Kang. 2010. “Fixed Rank Filtering for Spatio-Temporal Data.” Journal of Computational and Graphical Statistics.
Cressie, and Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics.
———. 2014. “Space-Time Kalman Filter.” In Wiley StatsRef: Statistics Reference Online.
Ghanem, and Spanos. 1990. “Polynomial Chaos in Stochastic Finite Elements.” Journal of Applied Mechanics.
Gilboa, Saatçi, and Cunningham. 2015. “Scaling Multidimensional Inference for Structured Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Gulian, Frankel, and Swiler. 2020. “Gaussian Process Regression Constrained by Boundary Value Problems.” arXiv:2012.11857 [Cs, Math, Stat].
Le, Sarlós, and Smola. 2013. “Fastfood-Approximating Kernel Expansions in Loglinear Time.” In Proceedings of the International Conference on Machine Learning.
Lord, Powell, and Shardlow. 2014. An Introduction to Computational Stochastic PDEs. Cambridge Texts in Applied Mathematics.
Miller, Glennie, and Seaton. 2020. “Understanding the Stochastic Partial Differential Equation Approach to Smoothing.” Journal of Agricultural, Biological and Environmental Statistics.
Nguyen, Cressie, and Braverman. 2012. “Spatial Statistical Data Fusion for Remote Sensing Applications.” Journal of the American Statistical Association.
O’Hagan. 2013. “Polynomial Chaos: A Tutorial and Critique from a Statistician’s Perspective.”
Phillips, Seror, Hutchinson, et al. 2022. “Spectral Diffusion Processes.” In.
Queipo, Haftka, Shyy, et al. 2005. “Surrogate-Based Analysis and Optimization.” Progress in Aerospace Sciences.
Rahimi, and Recht. 2007. “Random Features for Large-Scale Kernel Machines.” In Advances in Neural Information Processing Systems.
Salimbeni, Cheng, Boots, et al. 2018. “Orthogonally Decoupled Variational Gaussian Processes.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Shi, Titsias, and Mnih. 2020. “Sparse Orthogonal Variational Inference for Gaussian Processes.” In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics.
Solin, and Kok. 2019. “Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features.” In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics.
Stein. 2008. “A Modeling Approach for Large Spatial Datasets.” Journal of the Korean Statistical Society.
Sutherland, and Schneider. 2015. “On the Error of Random Fourier Features.” In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence. UAI’15.
Wilson, Borovitskiy, Terenin, et al. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning.