As scalar Gaussian processes are to GP regression, so are vector Gaussian processes to vector GP regression.

We recall that a classic Gaussian random process/field over some index set \(\mathcal{T}\) is a random function \(f:\mathcal{T}\to\mathbb{R},\) specified by the (deterministic) functions giving its mean \[ m(t)=\mathbb{E}\{f(t)\} \] and covariance \[ K(s, t)=\mathbb{E}\{(f(s)-m(s))(f(t)-m(t))\}. \]
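Concretely, every finite set of evaluations of such a process is jointly Gaussian, which is all we need to simulate one. A minimal NumPy sketch, assuming a squared-exponential covariance and a zero mean function (both my illustrative choices, not anything mandated by the definition):

```python
import numpy as np

def rbf(s, t, ell=1.0):
    """Squared-exponential covariance; one common positive-definite choice of K."""
    return np.exp(-0.5 * (s - t) ** 2 / ell ** 2)

# A GP is characterised by the fact that any finite set of evaluations
# (f(t_1), ..., f(t_n)) is jointly Gaussian with mean m(t_i) and
# covariance K(t_i, t_j).
ts = np.linspace(0.0, 1.0, 50)
m = np.zeros_like(ts)                                  # zero mean function
K = np.array([[rbf(s, t) for t in ts] for s in ts])    # Gram matrix

rng = np.random.default_rng(0)
# Diagonal jitter guards against numerical loss of positive-definiteness.
path = rng.multivariate_normal(m, K + 1e-9 * np.eye(len(ts)))
```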

We can extend this to multivariate Gaussian fields \(f:\mathcal{T}\to\mathbb{R}^d\). This exposition follows (Adler, Taylor, and Worsley 2016).

For such fields we require that \(\left\langle\alpha, f(t)\right\rangle\) is a real-valued Gaussian field for every \(\alpha \in \mathbb{R}^{d}\).
In this case, the mean function becomes \(m:\mathcal{T}\to\mathbb{R}^{d}\) and the covariance function becomes matrix-valued, \(K:\mathcal{T}\times\mathcal{T}\to \mathbb{R}^{d \times d}\), subject to a positive-definiteness requirement that we make precise below.
In statistical practice we refer to the matrix \(K(s,t)\) by various names, but in my office we talk about the *cross-covariance*.
It has the interpretation
\[
K(s, t)=\mathbb{E}\left\{(f(s)-m(s))(f(t)-m(t))^{\top}\right\}.
\]
The individual elements of \(K\) are thus given by
\[
K_{i j}(s, t)=\mathbb{E}\left\{\left(f_{i}(s)-m_{i}(s)\right)\left(f_{j}(t)-m_{j}(t)\right)\right\} .
\]
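One standard way to manufacture a valid matrix-valued covariance is the intrinsic coregionalization model, which scales a single scalar kernel by a fixed positive semi-definite matrix \(B\). A sketch in NumPy, where the mixing matrix \(A\) and the squared-exponential kernel are my illustrative choices:

```python
import numpy as np

def k(s, t, ell=1.0):
    """Scalar squared-exponential kernel."""
    return np.exp(-0.5 * (s - t) ** 2 / ell ** 2)

d = 2                                    # number of outputs
A = np.array([[1.0, 0.0],
              [0.5, 0.8]])               # illustrative mixing matrix
B = A @ A.T                              # d x d, positive semi-definite by construction

def K(s, t):
    """Matrix-valued covariance: K_ij(s, t) = B_ij * k(s, t)."""
    return B * k(s, t)

# The joint covariance of all d outputs at n points is an (n*d) x (n*d)
# block matrix whose (a, b) block is K(ts[a], ts[b]).
ts = np.linspace(0.0, 1.0, 20)
Sigma = np.block([[K(s, t) for t in ts] for s in ts])

# This block matrix must be positive semi-definite for the field to exist.
assert np.linalg.eigvalsh(Sigma).min() > -1e-9
```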
We now have to consider two flavours of positive-definiteness:

- For each fixed \(t\), the matrix \(K(t, t)\) is positive semi-definite.
- For each fixed \(i\), the scalar function \(K_{i i}:\mathcal{T} \times \mathcal{T}\to\mathbb{R}\) is a positive-definite kernel.

Both follow from the defining requirement that the kernel be positive-definite as a whole: for any points \(t_{1}, \ldots, t_{n} \in \mathcal{T}\) and vectors \(\alpha_{1}, \ldots, \alpha_{n} \in \mathbb{R}^{d}\), we need \(\sum_{k, l} \alpha_{k}^{\top} K(t_{k}, t_{l}) \alpha_{l} \geq 0\). Note that the cross terms, \(K(s, t)\) for \(s \neq t\) and \(K_{i j}\) for \(i \neq j\), need not themselves be positive-definite.

We know that finding positive-definite scalar-valued functions can be difficult, so finding positive-definite matrix-valued functions threatens to be a burden both on our monkey brains and on our computers.

We can come up with other perspectives on this construction. One version is given by Eric Perim, Wessel Bruinsma, and Will Tebbutt in *Gaussian Processes: from one to many outputs*.

In their version, if we want to model \(p\) outputs over some input space \(\mathcal{T}\), we construct an extended input space \(\mathcal{T}_{\text{ext}}=\{1, \ldots, p\} \times \mathcal{T}\). Then the vector GP’s mean and covariance functions take inputs in the extended space, \(m: \mathcal{T}_{\text{ext}} \rightarrow \mathbb{R}\) and \(K: \mathcal{T}_{\text{ext}}^{2} \rightarrow \mathbb{R}.\) That is, we still have a scalar GP, but with inputs handled in a clever way. This does not, to me, make it easier to define covariance kernels, but it does make them look different.
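To make the extended-input view concrete, here is a sketch in which an input is a pair (output index, location), and the kernel on such pairs reproduces an intrinsic-coregionalization covariance \(K_{ij}(s,t)=B_{ij}\,k(s,t)\); the matrix \(B\) and the scalar kernel are again my illustrative choices:

```python
import numpy as np

def k_scalar(s, t, ell=1.0):
    return np.exp(-0.5 * (s - t) ** 2 / ell ** 2)

B = np.array([[1.0, 0.5],
              [0.5, 1.0]])   # illustrative positive semi-definite output covariance

def k_ext(x, y):
    """Scalar kernel on the extended input space {1, ..., p} x T.
    Each input is a pair (output index, location)."""
    (i, s), (j, t) = x, y
    return B[i, j] * k_scalar(s, t)

# Flatten p = 2 outputs at 10 locations into 20 extended inputs.
ts = np.linspace(0.0, 1.0, 10)
inputs = [(i, t) for t in ts for i in range(2)]
Gram = np.array([[k_ext(x, y) for y in inputs] for x in inputs])

# An ordinary scalar Gram matrix: the multi-output structure has been
# absorbed into the index set, so the usual GP machinery applies unchanged.
assert np.linalg.eigvalsh(Gram).min() > -1e-9
```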

Both of these constructions are useful when using multi-output kernels in practice. Have a look at some details from a computational perspective over at vector GP regression.

## References

Adler, Robert J., and Jonathan E. Taylor. 2007. *Random Fields and Geometry*. Springer Monographs in Mathematics. New York: Springer.

Adler, Robert J., Jonathan E. Taylor, and Keith J. Worsley. 2016. *Applications of Random Fields and Geometry*. Draft.

Álvarez, Mauricio A., and Neil D. Lawrence. 2011. “Computationally Efficient Convolved Multiple Output Gaussian Processes.” *Journal of Machine Learning Research* 12 (41): 1459–1500.

Álvarez, Mauricio A., Lorenzo Rosasco, and Neil D. Lawrence. 2012. “Kernels for Vector-Valued Functions: A Review.” *Foundations and Trends® in Machine Learning* 4 (3): 195–266.

Bruinsma, Wessel, Eric Perim, Will Tebbutt, Scott Hosking, Arno Solin, and Richard Turner. 2020. “Scalable Exact Inference in Multi-Output Gaussian Processes.” In *International Conference on Machine Learning*, 1190–1201. PMLR.

In *Handbook of Spatial Statistics*, edited by Alan Gelfand, Peter Diggle, Montserrat Fuentes, and Peter Guttorp, 495–515. Boca Raton: CRC Press, 2010.

*arXiv:2004.09455 [stat]*, April 2020.

*Stochastic Partial Differential Equations: An Introduction*. Springer.

*Advances in Neural Information Processing Systems*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 30:6681–90. Curran Associates, Inc.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. *Gaussian Processes for Machine Learning*. Adaptive Computation and Machine Learning. Cambridge, Mass.: MIT Press.

*Journal of Statistical Software* 63 (8): 1.
