Vector Gaussian processes



Adjusting the cross-covariance

As scalar Gaussian processes are to GP regression, so are vector Gaussian processes to vector GP regression.

We recall that a classic Gaussian random process/field over some index set \(\mathcal{T}\) is a random function \(f:\mathcal{T}\to\mathbb{R},\) specified by the (deterministic) functions giving its mean \[ m(t)=\mathbb{E}\{f(t)\} \] and covariance \[ K(s, t)=\mathbb{E}\{(f(s)-m(s))(f(t)-m(t))\}. \]

We can extend this to multivariate Gaussian fields \(f:\mathcal{T}\to\mathbb{R}^d\). This exposition follows (Robert J. Adler, Taylor, and Worsley 2016).

For such fields we require that \(\left\langle\alpha, f_{t}\right\rangle\) is a real-valued Gaussian field for every \(\alpha \in \mathbb{R}^{d}\). In this case, the mean function is now \(m:\mathcal{T}\to\mathbb{R}^{d}\) and the covariance function is now matrix-valued, \(K:\mathcal{T}\times\mathcal{T}\to \mathbb{R}^{d \times d}\), where every \(K(s,t)\) must be positive definite. In statistical practice we refer to a matrix \(K(s,t)\) by various names, but in my office we talk about cross-covariance. It has the interpretation \[ K(s, t)=\mathbb{E}\left\{(f(s)-m(s))(f(t)-m(t))^{\top}\right\}. \] The individual elements of \(K\) are thus given by \[ K_{i j}(s, t)=\mathbb{E}\left\{\left(f_{i}(s)-m_{i}(s)\right)\left(f_{j}(t)-m_{j}(t)\right)\right\} . \] We now have to consider two positive-definitenesses:

  1. For each fixed pair \(s, t\), the matrix \(K(s, t)\) is positive-definite.
  2. For each pair \(i, j\), the function \(K_{i j}:\mathcal{T} \times \mathcal{T}\to\mathbb{R}\) is positive-definite.
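To make the matrix-valued covariance concrete, here is a minimal sketch in Python with NumPy. It assumes a separable kernel \(K(s,t)=B\,k(s,t)\), where \(k\) is a scalar squared-exponential kernel and \(B\) is a fixed symmetric positive semi-definite \(d\times d\) matrix; this choice, and the names `rbf`, `separable_kernel` and `block_gram`, are my own illustration and are not taken from the reference above. What we can check numerically is that the Gram matrix built by stacking the \(d\times d\) blocks \(K(t_a,t_b)\) is symmetric and positive semi-definite, which is what a valid joint covariance over all outputs at all index points requires.

```python
import numpy as np

def rbf(s, t, lengthscale=1.0):
    """Scalar squared-exponential kernel k(s, t)."""
    return np.exp(-0.5 * (s - t) ** 2 / lengthscale**2)

def separable_kernel(s, t, B):
    """Matrix-valued kernel K(s, t) = B * k(s, t); returns a d x d matrix."""
    return B * rbf(s, t)

def block_gram(ts, B):
    """Stack the blocks K(t_a, t_b) into an (n*d) x (n*d) Gram matrix."""
    n, d = len(ts), B.shape[0]
    G = np.zeros((n * d, n * d))
    for a, s in enumerate(ts):
        for b, t in enumerate(ts):
            G[a * d:(a + 1) * d, b * d:(b + 1) * d] = separable_kernel(s, t, B)
    return G

# Assumed example values: two outputs, five index points in [0, 1].
A = np.array([[1.0, 0.0], [0.8, 0.6]])
B = A @ A.T  # symmetric positive semi-definite by construction
ts = np.linspace(0.0, 1.0, 5)

G = block_gram(ts, B)
min_eig = np.linalg.eigvalsh(G).min()

# Up to floating-point round-off, the smallest eigenvalue should be non-negative.
print("symmetric:", np.allclose(G, G.T))
print("numerically PSD:", min_eig > -1e-8)
```

Separable kernels are only the simplest family that passes this check; the hard part is finding richer matrix-valued kernels that still pass it.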

We know that finding positive-definite scalar-valued functions can be difficult. So finding positive-definite matrix-valued functions feels like it might be a burden on both our monkey brains and our computational power.

We can come up with other perspectives on this construction. One version of this is given by Eric Perim, Wessel Bruinsma, and Will Tebbutt, in Gaussian Processes: from one to many outputs.

In their version, if we want to model \(p\) outputs over some input space \(\mathcal{T}\) we construct an extended input space: \(\mathcal{T}_{\text {ext }}=\{1, \ldots, p\} \times \mathcal{T}\). Then the vector GP’s mean and covariance functions take inputs in the extended space: \(m: \mathcal{T}_{\text {ext }} \rightarrow \mathbb{R}\) and \(K: \mathcal{T}_{\text {ext }}^{2} \rightarrow \mathbb{R}.\) That is, we still have a scalar GP, but with inputs handled in a clever way. This does not, to me, make it easier to define covariance kernels, but it makes them look different.
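A minimal sketch of the extended-input view, again in Python with NumPy. Each input is a pair \((i, t)\) of an output index and a location, and the kernel is an ordinary scalar function of two such pairs. The particular kernel \(K((i,s),(j,t)) = B_{ij}\,k(s,t)\), the matrix `B`, the lengthscale and the name `k_ext` are assumptions of mine for illustration, not the construction from the post cited above. Note that the code indexes outputs from 0 rather than 1.

```python
import numpy as np

def k_ext(x, y, B, lengthscale=0.3):
    """Scalar kernel on the extended space: x = (i, s), y = (j, t)."""
    (i, s), (j, t) = x, y
    return B[i, j] * np.exp(-0.5 * (s - t) ** 2 / lengthscale**2)

# Assumed covariance between the two outputs.
B = np.array([[1.0, 0.8], [0.8, 1.0]])

# Extended inputs: output 0 observed at two locations, output 1 at three.
X = [(0, 0.1), (0, 0.5), (1, 0.2), (1, 0.5), (1, 0.9)]

# An ordinary Gram matrix, with one row and column per (output, location) pair.
K = np.array([[k_ext(x, y, B) for y in X] for x in X])

# Draw one zero-mean joint sample; the small jitter is for numerical stability.
rng = np.random.default_rng(0)
jitter = 1e-9 * np.eye(len(X))
sample = rng.multivariate_normal(np.zeros(len(X)), K + jitter)
```

One thing this view makes easy is that the outputs need not be observed at the same locations: the Gram matrix simply has one row per observed pair.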

Both of these constructions are useful when using multi-output kernels in practice. Have a look at some details from a computational perspective over at vector GP regression.

References

Adler, Robert J., and Jonathan E. Taylor. 2007. Random Fields and Geometry. Springer Monographs in Mathematics 115. New York: Springer. https://doi.org/10.1007/978-0-387-48116-6.
Adler, Robert J, Jonathan E Taylor, and Keith J Worsley. 2016. Applications of Random Fields and Geometry Draft. https://robert.net.technion.ac.il/files/2016/08/hrf1.pdf.
Álvarez, Mauricio A., and Neil D. Lawrence. 2011. “Computationally Efficient Convolved Multiple Output Gaussian Processes.” Journal of Machine Learning Research 12 (41): 1459–1500. http://jmlr.org/papers/v12/alvarez11a.html.
Álvarez, Mauricio A., Lorenzo Rosasco, and Neil D. Lawrence. 2012. “Kernels for Vector-Valued Functions: A Review.” Foundations and Trends® in Machine Learning 4 (3): 195–266. https://doi.org/10.1561/2200000036.
Bruinsma, Wessel, Eric Perim, William Tebbutt, Scott Hosking, Arno Solin, and Richard Turner. 2020. “Scalable Exact Inference in Multi-Output Gaussian Processes.” In International Conference on Machine Learning, 1190–1201. PMLR. http://proceedings.mlr.press/v119/bruinsma20a.html.
Gelfand, Alan, and Sudipto Banerjee. 2010. “Multivariate Spatial Process Models.” In Handbook of Spatial Statistics, edited by Alan Gelfand, Peter Diggle, Montserrat Fuentes, and Peter Guttorp, 20103158:495–515. CRC Press. https://doi.org/10.1201/9781420072884-c28.
Heaps, Sarah E. 2020. “Enforcing Stationarity Through the Prior in Vector Autoregressions.” arXiv:2004.09455 [stat], April. http://arxiv.org/abs/2004.09455.
Liu, Wei, and Michael Röckner. 2015. Stochastic Partial Differential Equations: An Introduction. Springer.
Parra, Gabriel, and Felipe Tobar. 2017. “Spectral Mixture Kernels for Multi-Output Gaussian Processes.” In Advances in Neural Information Processing Systems, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 30:6681–90. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/333cb763facc6ce398ff83845f224d62-Paper.pdf.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press. http://www.gaussianprocess.org/gpml/.
Schlather, Martin, Alexander Malinowski, Peter J. Menck, Marco Oesting, and Kirstin Strokorb. 2015. “Analysis, Simulation and Prediction of Multivariate Random Fields with Package Random Fields.” Journal of Statistical Software 63 (8): 1. https://doi.org/10.18637/jss.v063.i08.
