Basis-functions in Gaussian process regression

a.k.a. Fixed Rank Kriging, basis function regression, weight space, spatial random effects



No, these are officers. You want low rank Gaussian processes.

Another way of cunningly chopping up the work of fitting a Gaussian process is to represent the process as a random function built from basis functions \(\boldsymbol{\phi}=\left(\phi_{1}, \ldots, \phi_{\ell}\right)\) with a Gaussian random weight vector \(\boldsymbol{w}\), so that \[ f^{(\boldsymbol{w})}(\cdot)=\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot), \quad \boldsymbol{w} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\boldsymbol{w}}\right). \] \(f^{(\boldsymbol{w})}\) is a random function whose values at the inputs \(\mathbf{X}\), \(\boldsymbol{f}^{(\boldsymbol{w})}=\boldsymbol{\Phi}_{n}\boldsymbol{w}\), satisfy \(\boldsymbol{f}^{(\boldsymbol{w})} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Phi}_{n} \boldsymbol{\Sigma}_{\boldsymbol{w}} \boldsymbol{\Phi}_{n}^{\top}\right)\), where \(\boldsymbol{\Phi}_{n}=\boldsymbol{\phi}(\mathbf{X})\) is the \(|\mathbf{X}| \times \ell\) matrix of features. The implied covariance therefore has rank at most \(\ell\). This is referred to as the weight-space approach in ML.
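Here is a quick numerical check of the weight-space construction, offered as a minimal sketch. The Gaussian-bump basis, its width, the choice of 10 basis functions and the diagonal \(\boldsymbol{\Sigma}_{\boldsymbol{w}}\) are all arbitrary assumptions made for illustration. Sampling weights and mapping them through the feature matrix should reproduce the covariance \(\boldsymbol{\Phi}_{n} \boldsymbol{\Sigma}_{\boldsymbol{w}} \boldsymbol{\Phi}_{n}^{\top}\), up to Monte Carlo error.

```python
import numpy as np

def gaussian_bump_features(x, centres, width):
    """Feature matrix with Phi[j, i] = phi_i(x_j) for Gaussian-bump basis functions (illustrative choice)."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)               # inputs X, so |X| = 50
centres = np.linspace(0.0, 1.0, 10)         # ell = 10 basis functions
Phi = gaussian_bump_features(x, centres, width=0.15)   # shape (50, 10)
Sigma_w = np.diag(np.linspace(1.0, 0.5, 10))           # assumed prior covariance of the weights

# Draw many weight vectors w ~ N(0, Sigma_w) and map each to function values f = Phi w.
n_draws = 20000
W = rng.multivariate_normal(np.zeros(10), Sigma_w, size=n_draws)  # (n_draws, 10)
F = W @ Phi.T                                                      # (n_draws, 50)

implied_cov = Phi @ Sigma_w @ Phi.T     # covariance implied by the weight-space model
empirical_cov = F.T @ F / n_draws       # Monte Carlo estimate (the mean is zero by construction)
print(np.max(np.abs(empirical_cov - implied_cov)))
print(np.linalg.matrix_rank(implied_cov))   # at most ell = 10, although the matrix is 50 x 50
```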

TODO: I just assumed centred weights here, but that is crazy. Update to relax that assumption.

We might imagine this representation would be exact if we had countably many basis functions, and under sane conditions it is. We would also like to know that we can find a basis for which not too many basis functions are needed to represent the process well. Looking at the Karhunen-Loève theorem we might imagine that this can sometimes work out fine, and indeed it does, sometimes.

Recent guides include Riutort-Mayol et al. (2020), for a version based on eigenfunctions of the Laplace operator, and Cressie, Shi, and Kang (2010) for the spatiotemporal context.
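The Laplace-operator idea, as we understand it, places an artificial boundary at \([-L, L]\) around the data and approximates a stationary kernel as \(k(x, x') \approx \sum_{j=1}^{m} S(\sqrt{\lambda_j})\, \phi_j(x) \phi_j(x')\). Here \(\phi_j\) and \(\lambda_j\) are the Dirichlet eigenfunctions and eigenvalues of the Laplacian on that interval, and \(S\) is the kernel's spectral density. What follows is a minimal 1-D sketch, assuming a squared-exponential kernel. The boundary `L` and basis size `m` are arbitrary choices, not recommendations from the paper.

```python
import numpy as np

variance, lengthscale = 1.0, 0.5

def se_spectral_density(omega):
    """Spectral density of the 1-D squared-exponential kernel
    k(r) = variance * exp(-r**2 / (2 * lengthscale**2))."""
    return variance * np.sqrt(2.0 * np.pi) * lengthscale * np.exp(-0.5 * (lengthscale * omega) ** 2)

def laplacian_eigenpairs(x, m, L):
    """First m Dirichlet eigenfunctions of the Laplacian on [-L, L], evaluated at x,
    and the square roots of the corresponding eigenvalues."""
    sqrt_lam = np.arange(1, m + 1) * np.pi / (2.0 * L)
    phi = np.sin(sqrt_lam[None, :] * (x[:, None] + L)) / np.sqrt(L)
    return phi, sqrt_lam

x = np.linspace(-1.0, 1.0, 41)   # inputs live in [-1, 1]
L, m = 2.0, 64                   # boundary and basis size: arbitrary choices

Phi, sqrt_lam = laplacian_eigenpairs(x, m, L)
# k(x, x') ~= sum_j S(sqrt(lambda_j)) phi_j(x) phi_j(x')
K_approx = (Phi * se_spectral_density(sqrt_lam)) @ Phi.T
K_exact = variance * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale ** 2)
print(np.max(np.abs(K_approx - K_exact)))   # small; the residual comes from the boundary at +/-L
```

Moving `L` closer to the data makes the boundary error grow. Shrinking the lengthscale requires a larger `m` before the spectral density has decayed.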

Hijinks ensue when selecting the basis functions. If we were to take the natural Hilbert space seriously, we could identify the basis with the eigenfunctions of the kernel's integral operator. This is not generally easy. In practice we tend either to use global bases, such as Fourier bases or, more generally, Karhunen-Loève bases, or to construct local bases of limited overlap (usually piecewise polynomials AFAICT).

The kernel trick writes a kernel \(k\) as an inner product in a corresponding reproducing kernel Hilbert space (RKHS) \(\mathcal{H}_{k}\) with a feature map \(\varphi: \mathcal{X} \rightarrow \mathcal{H}_{k}\). In sufficiently nice cases the kernel is well approximated by a finite-dimensional feature map, \[ k\left(\boldsymbol{x}, \boldsymbol{x}^{\prime}\right)=\left\langle\varphi(\boldsymbol{x}), \varphi\left(\boldsymbol{x}^{\prime}\right)\right\rangle_{\mathcal{H}_{k}} \approx \boldsymbol{\phi}(\boldsymbol{x})^{\top} \overline{\boldsymbol{\phi}\left(\boldsymbol{x}^{\prime}\right)}, \] where \(\boldsymbol{\phi}: \mathcal{X} \rightarrow \mathbb{C}^{\ell}\). TODO: What is the actual guarantee here?

Fourier features

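When the Fourier basis is natural for the problem we are in a pretty good situation. We can use the Wiener-Khintchine relations, which make the covariance function of a stationary process and its spectral density a Fourier-transform pair, to analyse and simulate the process.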
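For instance, the covariance function can be recovered from its spectral density by numerically integrating the inverse Fourier transform. This is a minimal sketch assuming a 1-D squared-exponential kernel with an arbitrary lengthscale. It uses the convention \(S(\omega)=\int k(r) e^{-i\omega r}\,\mathrm{d}r\).

```python
import numpy as np

lengthscale = 0.7   # arbitrary

def se_kernel(r):
    return np.exp(-0.5 * (r / lengthscale) ** 2)

def se_spectral_density(omega):
    # S(omega) = integral of k(r) exp(-i omega r) dr for the kernel above
    return np.sqrt(2.0 * np.pi) * lengthscale * np.exp(-0.5 * (lengthscale * omega) ** 2)

# Wiener-Khintchine: k(r) = (1 / 2pi) * integral over R of S(omega) exp(i omega r) d omega.
# S is even, so this equals (1 / pi) * integral over [0, inf) of S(omega) cos(omega r).
omega = np.linspace(0.0, 40.0, 200001)      # S is negligible beyond omega = 40 here
d_omega = omega[1] - omega[0]
r = np.linspace(0.0, 3.0, 7)
integrand = se_spectral_density(omega)[None, :] * np.cos(np.outer(r, omega))
# Trapezoid rule written out by hand, to avoid depending on a particular NumPy version.
k_from_spectrum = (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1])) * d_omega / np.pi
print(np.max(np.abs(k_from_spectrum - se_kernel(r))))
```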

Random Fourier features

The random Fourier features method (Rahimi and Recht 2007, 2008) constructs a Monte Carlo estimate of a stationary kernel by representing the inner product in terms of \(\ell\) complex exponential basis functions \(\phi_{j}(\boldsymbol{x})=\ell^{-1 / 2} \exp \left(i \boldsymbol{\omega}_{j}^{\top} \boldsymbol{x}\right)\). The frequency parameters \(\boldsymbol{\omega}_{j}\) are drawn independently, with density proportional to the spectral density \(\rho\left(\boldsymbol{\omega}\right)\).
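Here is a minimal sketch of that construction for the squared-exponential kernel \(k(\boldsymbol{x},\boldsymbol{x}')=\exp(-\|\boldsymbol{x}-\boldsymbol{x}'\|^2/2s^2)\). For this kernel the normalised spectral density is \(\mathcal{N}(\mathbf{0}, s^{-2}\mathbf{I})\). The lengthscale \(s\), input dimension, number of features and test points are assumptions chosen for illustration. The real part of \(\boldsymbol{\phi}(\boldsymbol{x})^{\top}\overline{\boldsymbol{\phi}(\boldsymbol{x}')}\) estimates the kernel, and the imaginary part is pure Monte Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(1)
lengthscale, d, n_features = 0.8, 2, 20000   # n_features plays the role of ell

# For k(x, x') = exp(-||x - x'||^2 / (2 lengthscale^2)) the normalised spectral
# density is N(0, lengthscale^-2 I), so the frequencies are sampled from it.
omegas = rng.normal(scale=1.0 / lengthscale, size=(n_features, d))

def features(X):
    """Complex random Fourier features phi_j(x) = ell^{-1/2} exp(i omega_j^T x)."""
    return np.exp(1j * X @ omegas.T) / np.sqrt(n_features)

X = rng.uniform(-1.0, 1.0, size=(10, d))
Phi = features(X)
K_rff = Phi @ Phi.conj().T            # phi(x)^T conj(phi(x')) for every pair of inputs
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists / lengthscale ** 2)
print(np.max(np.abs(K_rff.real - K_exact)), np.max(np.abs(K_rff.imag)))
```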

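This has a favourable error rate. Because the kernel estimate is an average of \(\ell\) i.i.d. terms, its pointwise error shrinks like \(\ell^{-1/2}\) whatever the input dimension. Sutherland and Schneider (2015) improve the uniform error bound and note that, of the two main real-valued variants of the features, the more widely used one has strictly higher variance for the Gaussian kernel.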
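The Monte Carlo rate can be checked empirically. In the sketch below, the root-mean-square error of the kernel estimate should roughly halve with each fourfold increase in the number of features. The Gaussian kernel, input dimension, point set and repeat count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
lengthscale, d = 1.0, 3
X = rng.normal(size=(30, d))                          # arbitrary inputs
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists / lengthscale ** 2)

def rff_rms_error(n_features, n_repeats=50):
    """RMS error of the complex random-Fourier-feature estimate of K_exact,
    averaged over independent redraws of the frequencies."""
    mse = []
    for _ in range(n_repeats):
        omegas = rng.normal(scale=1.0 / lengthscale, size=(n_features, d))
        Phi = np.exp(1j * X @ omegas.T) / np.sqrt(n_features)
        K_rff = (Phi @ Phi.conj().T).real
        mse.append(np.mean((K_rff - K_exact) ** 2))
    return np.sqrt(np.mean(mse))

# Keys are numbers of features (ell in the text); expect roughly ell^{-1/2} decay.
rms = {D: rff_rms_error(D) for D in [100, 400, 1600, 6400]}
for D, err in rms.items():
    print(D, err)
```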

K-L basis

We recall from the Karhunen-Loève notebook that the mean-square-optimal \(\ell\)-term \(f^{(\boldsymbol{w})}\) for approximating a Gaussian process \(f\) is found by truncating the Karhunen-Loève expansion \[ f(\cdot)=\sum_{i=1}^{\infty} w_{i} \phi_{i}(\cdot), \quad w_{i} \sim \mathcal{N}\left(0, \lambda_{i}\right) \text{ independently,} \] where \(\phi_{i}\) and \(\lambda_{i}\) are, respectively, the \(i\)-th (orthonormal) eigenfunction and eigenvalue of the covariance operator \(\psi \mapsto \int_{\mathcal{X}} \psi(\boldsymbol{x}) k(\boldsymbol{x}, \cdot) \mathrm{d} \boldsymbol{x}\), indexed in decreasing order of \(\lambda_{i}\). What is the orthonormal basis \(\{\phi_{i}\}_i\) though? That depends on the problem, and it can be a lot of work to calculate.
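Closed-form eigenfunctions are rare, but a crude numerical version is straightforward in a Nyström-style approach. Discretise the covariance operator with a quadrature rule, eigendecompose it, and truncate. The sketch below assumes an exponential kernel on \([0, 1]\) and a midpoint rule, both arbitrary choices. The Frobenius error of the truncated covariance equals the norm of the discarded eigenvalues, which is the matrix analogue of the optimality of truncation.

```python
import numpy as np

n = 200
x = (np.arange(n) + 0.5) / n          # midpoint grid on X = [0, 1]
h = 1.0 / n                           # quadrature weight of each grid point
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # exponential kernel, an arbitrary example

# Discretised covariance operator: (K psi)(x_a) ~= sum_b k(x_a, x_b) psi(x_b) h.
lam, V = np.linalg.eigh(h * K)
order = np.argsort(lam)[::-1]         # decreasing eigenvalues, as in the K-L expansion
lam, V = lam[order], V[:, order]
phi = V / np.sqrt(h)                  # eigenfunctions normalised so that h * sum(phi_i**2) = 1

def truncated_cov(m):
    """Covariance of the m-term truncated K-L expansion, sum_i lam_i phi_i phi_i^T."""
    return (phi[:, :m] * lam[:m]) @ phi[:, :m].T

errors = {m: np.linalg.norm(K - truncated_cov(m)) for m in [1, 5, 20, 200]}
for m, err in errors.items():
    print(m, err)
```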

In the case that our field is stationary on a “nice” domain, though, this can be easy: we simply have the Fourier features as the natural basis.
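On the simplest nice domain, a periodic grid, this can be checked directly. A stationary covariance matrix on the grid is circulant, and every discrete Fourier mode is one of its eigenvectors. The eigenvalues are given by the DFT of the covariance's first column, a discrete version of the Wiener-Khintchine relation. The sketch below assumes a squared-exponential kernel periodised onto a 64-point grid of the unit circle.

```python
import numpy as np

n, lengthscale = 64, 0.1
dx = 1.0 / n                                   # uniform grid on the unit circle
idx = np.arange(n)

# Stationary covariance on the circle: periodise a squared-exponential kernel by
# summing over periodic images, which keeps the kernel positive definite.
images = np.arange(-3, 4)[:, None]
c = np.exp(-0.5 * (((idx[None, :] + images * n) * dx) / lengthscale) ** 2).sum(axis=0)
C = c[(idx[:, None] - idx[None, :]) % n]       # circulant: C[j, m] = c[(j - m) mod n]

# Discrete Fourier modes v_k[j] = exp(2 pi i j k / n), one per column.
V = np.exp(2j * np.pi * np.outer(idx, idx) / n)
eigvals = np.fft.fft(c)                        # eigenvalue of mode k is the DFT of c at k

print(np.allclose(C @ V, V * eigvals))         # every Fourier mode is an eigenvector of C
print(np.abs(eigvals.imag).max(), eigvals.real.min())
```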

Compactly-supported basis functions

As seen in GPs as SDEs and FEMs (Lindgren, Rue, and Lindström 2011; Lord 2014).
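The appeal of compact support is sparsity in the feature matrix \(\boldsymbol{\Phi}_{n}\). This is a minimal sketch with piecewise-linear "hat" functions, the simplest finite-element basis. The knot count and evaluation grid are arbitrary assumptions, and the matrix is stored densely here for simplicity. Each input activates at most two basis functions.

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear 'hat' functions on uniform knots:
    phi_i(x) = max(0, 1 - |x - knot_i| / spacing)."""
    spacing = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / spacing)

x = np.linspace(0.0, 1.0, 500)
knots = np.linspace(0.0, 1.0, 21)        # ell = 21 basis functions
Phi = hat_basis(x, knots)                # shape (500, 21), mostly zeros

# Count numerically nonzero entries (the threshold guards against round-off at the knots).
nonzeros_per_row = (Phi > 1e-12).sum(axis=1)
print(nonzeros_per_row.max())            # each input touches at most 2 basis functions
print(np.allclose(Phi.sum(axis=1), 1.0)) # the hats form a partition of unity on [0, 1]
```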

“Decoupled” bases

Cheng and Boots (2017); Salimbeni et al. (2018); Shi, Titsias, and Mnih (2020); Wilson et al. (2020)
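One concrete example of mixing two bases is the pathwise sampler of Wilson et al. (2020). It draws prior sample paths in a weight-space (random Fourier feature) basis, then applies a data-dependent update written in the canonical kernel basis \(k(\cdot, \mathbf{X})\) via Matheron's rule. What follows is a minimal sketch of that idea, not the authors' code. The 1-D squared-exponential kernel, Gaussian noise variance and toy data are all assumptions for illustration. It checks that the average of the sampled paths matches the exact posterior mean.

```python
import numpy as np

rng = np.random.default_rng(3)
lengthscale, noise_var = 0.5, 0.01
n_features, n_samples = 1000, 4000

def k(a, b):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

# Toy data (hypothetical): sine values at a few inputs.
X = np.array([-1.0, -0.3, 0.2, 0.8, 1.3])
y = np.sin(3.0 * X)
X_test = np.array([-0.5, 0.0, 0.5])

# 1. Prior sample paths in weight space: real random Fourier features,
#    phi_j(x) = sqrt(2 / D) cos(omega_j x + b_j), with weights w ~ N(0, I).
omegas = rng.normal(scale=1.0 / lengthscale, size=n_features)
phases = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

def rff(x):
    return np.sqrt(2.0 / n_features) * np.cos(np.outer(x, omegas) + phases)

W = rng.normal(size=(n_features, n_samples))   # one weight vector per sample path
f_prior_X = rff(X) @ W
f_prior_test = rff(X_test) @ W

# 2. Update in the canonical basis k(., X) via Matheron's rule:
#    f_post(.) = f_prior(.) + k(., X) (K + noise_var I)^{-1} (y - f_prior(X) - eps)
K_noisy = k(X, X) + noise_var * np.eye(len(X))
eps = rng.normal(scale=np.sqrt(noise_var), size=(len(X), n_samples))
f_post_test = f_prior_test + k(X_test, X) @ np.linalg.solve(K_noisy, y[:, None] - f_prior_X - eps)

exact_mean = k(X_test, X) @ np.linalg.solve(K_noisy, y)
print(np.max(np.abs(f_post_test.mean(axis=1) - exact_mean)))
```

The prior part costs \(O(\ell)\) per evaluation point whatever the amount of data. The update part is exact given the prior draw, which is roughly why this split is attractive.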

References

Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015. “Fast Direct Methods for Gaussian Processes.” arXiv:1403.6015 [Astro-Ph, Stat], April.
Cheng, Ching-An, and Byron Boots. 2017. “Variational Inference for Gaussian Process Models with Linear Complexity.” In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc.
Cressie, Noel, and Hsin-Cheng Huang. 1999. “Classes of Nonseparable, Spatio-Temporal Stationary Covariance Functions.” Journal of the American Statistical Association 94 (448): 1330–39.
Cressie, Noel, and Gardar Johannesson. 2008. “Fixed Rank Kriging for Very Large Spatial Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (1): 209–26.
Cressie, Noel, Tao Shi, and Emily L. Kang. 2010. “Fixed Rank Filtering for Spatio-Temporal Data.” Journal of Computational and Graphical Statistics 19 (3): 724–45.
Cressie, Noel, and Christopher K. Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0. John Wiley and Sons.
———. 2014. “Space-Time Kalman Filter.” In Wiley StatsRef: Statistics Reference Online. American Cancer Society.
Dahl, Astrid, and Edwin V. Bonilla. 2019. “Sparse Grouped Gaussian Processes for Solar Power Forecasting.” arXiv:1903.03986 [Cs, Stat], March.
Finley, Andrew O., Sudipto Banerjee, and Alan E. Gelfand. 2015. “spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models.” Journal of Statistical Software 63 (February): 1–28.
Ghanem, Roger, and P. D. Spanos. 1990. “Polynomial Chaos in Stochastic Finite Elements.” Journal of Applied Mechanics 57 (1): 197–202.
Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015. “Scaling Multidimensional Inference for Structured Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2): 424–36.
Gulian, Mamikon, Ari Frankel, and Laura Swiler. 2020. “Gaussian Process Regression Constrained by Boundary Value Problems.” arXiv:2012.11857 [Cs, Math, Stat], December.
Hu, Xiangping, Ingelin Steinsland, Daniel Simpson, Sara Martino, and Håvard Rue. 2013. “Spatial Modelling of Temperature and Humidity Using Systems of Stochastic Partial Differential Equations,” July.
Le, Quoc, Tamás Sarlós, and Alex Smola. 2013. “Fastfood-Approximating Kernel Expansions in Loglinear Time.” In Proceedings of the International Conference on Machine Learning.
Lei, Huan, Jing Li, Peiyuan Gao, Panos Stinis, and Nathan Baker. 2018. “A Data-Driven Framework for Sparsity-Enhanced Surrogates with Arbitrary Mutually Dependent Randomness,” April.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.
Liu, Chong, Surajit Ray, and Giles Hooker. 2014. “Functional Principal Components Analysis of Spatially Correlated Data.” arXiv:1411.4681 [Math, Stat], November.
Lord, Gabriel J. 2014. An Introduction to Computational Stochastic PDEs. 1st edition. New York, NY, USA: Cambridge University Press.
Luo, Wuan. 2006. “Wiener Chaos Expansion and Numerical Solutions of Stochastic Partial Differential Equations.” PhD, California Institute of Technology.
Nguyen, Hai, Noel Cressie, and Amy Braverman. 2012. “Spatial Statistical Data Fusion for Remote Sensing Applications.” Journal of the American Statistical Association 107 (499): 1004–18.
Nowak, W., and A. Litvinenko. 2013. “Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques.” Mathematical Geosciences 45 (4): 411–35.
O’Hagan, Anthony. 2013. “Polynomial Chaos: A Tutorial and Critique from a Statistician’s Perspective,” 20.
Petra, Noemi, James Martin, Georg Stadler, and Omar Ghattas. 2014. “A Computational Framework for Infinite-Dimensional Bayesian Inverse Problems, Part II: Stochastic Newton MCMC with Application to Ice Sheet Flow Inverse Problems.” SIAM Journal on Scientific Computing 36 (4): A1525–55.
Queipo, Nestor V., Raphael T. Haftka, Wei Shyy, Tushar Goel, Rajkumar Vaidyanathan, and P. Kevin Tucker. 2005. “Surrogate-Based Analysis and Optimization.” Progress in Aerospace Sciences 41 (1): 1–28.
Rahimi, Ali, and Benjamin Recht. 2007. “Random Features for Large-Scale Kernel Machines.” In Advances in Neural Information Processing Systems, 1177–84. Curran Associates, Inc.
———. 2008. “Uniform Approximation of Functions with Random Bases.” In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, 555–61.
———. 2009. “Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In Advances in Neural Information Processing Systems, 1313–20. Curran Associates, Inc.
Riutort-Mayol, Gabriel, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari. 2020. “Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming.” arXiv:2004.11408 [Stat], April.
Salimbeni, Hugh, Ching-An Cheng, Byron Boots, and Marc Deisenroth. 2018. “Orthogonally Decoupled Variational Gaussian Processes.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31:8725–34. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine 30 (4): 51–61.
Shi, Jiaxin, Michalis Titsias, and Andriy Mnih. 2020. “Sparse Orthogonal Variational Inference for Gaussian Processes.” In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, 1932–42. PMLR.
Solin, Arno, and Manon Kok. 2019. “Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features.” In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 2193–2202. PMLR.
Stein, Michael L. 2008. “A Modeling Approach for Large Spatial Datasets.” Journal of the Korean Statistical Society 37 (1): 3–10.
Sutherland, Danica J., and Jeff Schneider. 2015. “On the Error of Random Fourier Features.” In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 862–71. UAI’15. Arlington, Virginia, USA: AUAI Press.
Wilson, James, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning, 10292–302. PMLR.
Zammit-Mangion, Andrew, and Noel Cressie. 2021. “FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets.” Journal of Statistical Software 98 (May): 1–48.
