\(\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}[1]{\mathbb{I}\{#1\}} \renewcommand{\inner}[2]{\langle #1,#2\rangle} \renewcommand{\Inner}[2]{\left\langle #1,#2\right\rangle} \renewcommand{\norm}[1]{\| #1\|} \renewcommand{\Norm}[1]{\left\| #1\right\|} \renewcommand{\argmax}{\mathop{\mathrm{argmax}}} \renewcommand{\argmin}{\mathop{\mathrm{argmin}}} \renewcommand{\omp}{\mathop{\mathrm{OMP}}}\)
Gaussian processes with a stationary kernel are faster to work with if your inputs lie on a grid of points. I have not used this trick, but I understand it involves various simplifications arising from the structure of the Gram matrices, which end up being Kronecker products of Toeplitz matrices under lexical ordering of the input points (sounds plausible, but I have not worked this out on paper). The keyword to search for in the ML literature is Kronecker inference (e.g. Saatçi 2012; Flaxman et al. 2015; A. G. Wilson and Nickisch 2015). Another keyword is circulant embedding, the embedding of a Toeplitz matrix in a larger circulant matrix, which apparently enables one to leverage fast Fourier transforms to calculate some quantities of interest, and gives some other nice linear algebra properties besides. That would imply that this method has its origins in Whittle likelihoods (P. Whittle 1953a; Peter Whittle 1952), if I am not mistaken.
Lattice GP methods complement, perhaps, the trick of filtering Gaussian processes, which can also exploit the structure of lattice inputs, although the setup differs between the two.
TBC.
Simulating Gaussian fields with the desired covariance structure
Following the introduction in (Dietrich and Newsam 1993):
Let's say we wish to generate a stationary Gaussian process \(Y(x)\) on a set of points \(\Omega=(x_0, x_1,\dots, x_m)\).
Stationary in this context means that the covariance function \(r\) is translation-invariant and depends only on distance, so that it may be written \(r(|x|)\). Without loss of generality, we assume that \(\bb E[Y(x)]=0\) and \(\var[Y(x)]=1\).
The problem then reduces to generating a vector \(\vv y=(Y(x_0), Y(x_1), \dots, Y(x_m) )\sim \mathcal{N}(0, R)\) where \(R\) has entries \(R[p,q]=r(|x_p-x_q|).\)
Note that if \(\vv \varepsilon\sim\mathcal{N}(0, \mm I)\) is an \((m+1)\)-dimensional standard normal random vector, and \(\mm A\mm A^\top=R\), then \(\vv y=\mm A \vv \varepsilon\) has the required distribution.
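In the general (non-lattice) case we can get such an \(\mm A\) from a Cholesky factorisation of \(R\), at \(\mathcal{O}(m^3)\) cost. A minimal sketch, assuming a squared-exponential covariance and nothing in particular about the input points:

```python
import numpy as np

# Illustrative brute-force sampler: y = A @ eps with A a Cholesky factor of R.
# The covariance r(t) = exp(-t^2 / 2) and the input points are assumptions
# made for the sake of the example.
x = np.linspace(0.0, 10.0, 200)                          # arbitrary points, no grid needed
R = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)        # R[p, q] = r(|x_p - x_q|)
A = np.linalg.cholesky(R + 1e-10 * np.eye(len(x)))       # jitter keeps the factorisation stable
eps = np.random.default_rng(0).standard_normal(len(x))   # eps ~ N(0, I)
y = A @ eps                                              # y ~ N(0, R)
```

The cubic cost of that factorisation is what the circulant trick below avoids.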
The circulant embedding trick
If we have additional structure, we can calculate this more efficiently.
Suppose further that our points form a grid, \(\Omega=(x_0, x_0+h,\dots, x_0+mh)\); specifically, equally-spaced points on a line.
We know that \(R\) has a Toeplitz structure. Moreover, it is non-negative definite, with \(\vv x^\top\mm R \vv x \geq 0\;\forall \vv x.\) (Why?)
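Here is a minimal sketch of the circulant-embedding sampler for this one-dimensional grid case, assuming the "minimal" embedding of \(R\) into a \(2m\times 2m\) circulant matrix happens to be nonnegative definite (it is not always; the usual remedy is to pad the grid further):

```python
import numpy as np

def circulant_embedding_sample(r, rng=None):
    """Sample a zero-mean stationary Gaussian process on the grid
    x_0, x_0 + h, ..., x_0 + m h, given r = [r(0), r(h), ..., r(m h)].
    Returns two independent draws from N(0, R)."""
    rng = np.random.default_rng(rng)
    m = len(r) - 1
    # First row of the 2m x 2m circulant matrix whose top-left
    # (m+1) x (m+1) block is the Toeplitz matrix R.
    c = np.concatenate([r, r[-2:0:-1]])
    lam = np.fft.fft(c).real                 # eigenvalues of the circulant matrix
    if np.any(lam < -1e-8 * lam.max()):
        raise ValueError("Embedding not nonnegative definite; pad the grid further.")
    lam = np.clip(lam, 0.0, None)
    n = len(c)
    eps = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    w = np.fft.fft(np.sqrt(lam / n) * eps)   # O(m log m) via the FFT
    # Real and imaginary parts are independent N(0, C) draws;
    # keep the first m + 1 coordinates, i.e. the original grid.
    return w.real[: m + 1], w.imag[: m + 1]

# e.g. an exponential (Ornstein-Uhlenbeck) covariance on a 512-step grid
r = np.exp(-np.abs(np.arange(513) * 0.01))
y1, y2 = circulant_embedding_sample(r, rng=42)
```

The sampling step is then \(\mathcal{O}(m\log m)\) in time and \(\mathcal{O}(m)\) in memory, versus \(\mathcal{O}(m^3)\) for the dense Cholesky approach above.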
J. T. Wilson et al. (2021) credits the following authors:
Well-known examples of this trend include banded and sparse matrices in the context of one-dimensional Gaussian processes and Gauss–Markov random fields (Rue and Held 2005; Loper et al. 2021; Durrande et al. 2019), as well as Kronecker and Toeplitz matrices when working with regularly-spaced grids \(X \subset \mathcal{X}\) (Dietrich and Newsam 1997; Grace Chan and Wood 1997).
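To make the Kronecker claim above concrete, here is a tiny numerical check (my own example, not taken from any of the cited papers): for a separable product kernel on a 2-D lattice, the full Gram matrix under lexical ordering of the grid points is exactly the Kronecker product of the per-dimension Gram matrices.

```python
import numpy as np

# Assumed setup: separable kernel k((a1, a2), (b1, b2)) = k1(a1, b1) * k2(a2, b2)
# on a 2-D lattice; the lengthscales 0.3 and 0.7 are arbitrary.
x1 = np.linspace(0.0, 1.0, 4)
x2 = np.linspace(0.0, 2.0, 5)
k = lambda a, b, ell: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)
K1, K2 = k(x1, x1, 0.3), k(x2, x2, 0.7)

# Gram matrix over the lexically ordered grid points (x1 varies slowest)
grid = np.array([(a, b) for a in x1 for b in x2])
d1 = grid[:, 0][:, None] - grid[:, 0][None, :]
d2 = grid[:, 1][:, None] - grid[:, 1][None, :]
K_full = np.exp(-0.5 * d1**2 / 0.3**2) * np.exp(-0.5 * d2**2 / 0.7**2)

assert np.allclose(K_full, np.kron(K1, K2))   # the Kronecker structure holds
```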
Flaxman et al. (2015) deals with the lattice structure trick in a non-Gaussian likelihood setting using Laplace approximation.
In regression
OK, so what does this allow us to do with posterior inference in GPs? Apparently quite a lot. The KISS-GP method (A. G. Wilson and Nickisch 2015) presumably leverages something similar. So do Saatçi (2012) and Flaxman et al. (2015).
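I have not dug into the details, but as a placeholder, here is roughly what exploiting this structure looks like in practice via GPyTorch's GridInterpolationKernel (the SKI / KISS-GP kernel of A. G. Wilson and Nickisch 2015); the model class, grid size and kernel choice below are my own illustrative guesses, not taken from the papers:

```python
import torch
import gpytorch

class KissGPModel(gpytorch.models.ExactGP):
    """Exact GP regression whose covariance is interpolated onto a regular
    grid, so the induced Gram matrix has Toeplitz/Kronecker structure."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.GridInterpolationKernel(
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
            grid_size=128,   # illustrative; tune to the data density
            num_dims=1,
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Illustrative usage on synthetic data
train_x = torch.linspace(0, 1, 2000)
train_y = torch.sin(train_x * 20.0) + 0.1 * torch.randn(train_x.size(0))
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = KissGPModel(train_x, train_y, likelihood)
```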
TBC
Incoming
- Scaling multi-output Gaussian process models with exact inference
- Time Series β Pyro documentation
- Gaussian Process Latent Variable Model
- Cox Processes (w/ Pyro/GPyTorch Low-Level Interface)
- Latent Function Inference with Pyro + GPyTorch (Low-Level Interface)
- Exact DKL (Deep Kernel Learning) Regression w/ KISS-GP
- Variational GPs w/ Multiple Outputs
- GP Regression with LOVE for Fast Predictive Variances and Sampling