As scalar Gaussian processes are to GP regression, so are vector Gaussian processes to vector GP regression.

We recall that a classic Gaussian random process/field over some index set \(\mathcal{T}\) is a random function \(f:\mathcal{T}\to\mathbb{R},\) specified by the (deterministic) functions giving its mean \[ m(t)=\mathbb{E}\{f(t)\} \] and covariance \[ K(s, t)=\mathbb{E}\{(f(s)-m(s))(f(t)-m(t))\}. \]
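
To make the scalar case concrete, here is a minimal sketch that evaluates a mean and covariance function on a finite grid and draws sample paths. The zero mean, the squared-exponential kernel, the lengthscale, the grid, and the jitter value are all illustrative assumptions, not choices made in the text above.

```python
import numpy as np

def mean_fn(t):
    """Mean function m(t); zero here, purely for illustration."""
    return np.zeros_like(t, dtype=float)

def sq_exp_kernel(s, t, lengthscale=1.0):
    """Squared-exponential covariance K(s, t); broadcasts over arrays."""
    return np.exp(-0.5 * ((s - t) / lengthscale) ** 2)

# A finite set of index points t_1, ..., t_n in T = R.
ts = np.linspace(0.0, 5.0, 50)

# Gram matrix with entries K(t_a, t_b).
gram = sq_exp_kernel(ts[:, None], ts[None, :])

# Draw f(ts) ~ N(m(ts), gram) via a Cholesky factor. The small jitter
# guards against round-off making the matrix numerically indefinite.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(gram + 1e-6 * np.eye(len(ts)))
samples = mean_fn(ts) + (chol @ rng.standard_normal((len(ts), 3))).T

print(samples.shape)  # (3, 50)
```

Each row of `samples` is one realisation of the field at the grid points; any other valid kernel could be swapped in for `sq_exp_kernel`.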

We can extend this to multivariate Gaussian fields \(f:\mathcal{T}\to\mathbb{R}^d\). This exposition follows (Robert J. Adler, Taylor, and Worsley 2016).

For such fields we require that \(\left\langle\alpha, f(t)\right\rangle\) is a real-valued Gaussian field for every \(\alpha \in \mathbb{R}^{d}\).
In this case, the mean function is now \(m:\mathcal{T}\to\mathbb{R}^{d}\) and the covariance function is now matrix-valued, \(K:\mathcal{T}\times\mathcal{T}\to \mathbb{R}^{d \times d}\), where every \(K(t,t)\) must be positive semi-definite and \(K(s,t)=K(t,s)^{\top}\).
In statistical practice we refer to a matrix \(K(s,t)\) by various names, but in my office we talk about *cross-covariance*.
Treating \(f(t)\) as a column vector, it has the interpretation
\[
K(s, t)=\mathbb{E}\left\{(f(s)-m(s))(f(t)-m(t))^{\top}\right\}.
\]
The individual elements of \(K\) are thus given by
\[
K_{i j}(s, t)=\mathbb{E}\left\{\left(f_{i}(s)-m_{i}(s)\right)\left(f_{j}(t)-m_{j}(t)\right)\right\} .
\]
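
As a sanity check on this interpretation, the following sketch builds one particular matrix-valued kernel and compares it with a Monte Carlo estimate of the expected outer product. It assumes a zero-mean field of the form \(f(t)=A g(t)\), where \(g\) has independent squared-exponential components; the mixing matrix `A`, the lengthscales, and the points `s`, `t` are hypothetical values chosen only for illustration.

```python
import numpy as np

def sq_exp_kernel(s, t, lengthscale):
    """Squared-exponential covariance with unit variance."""
    return np.exp(-0.5 * ((s - t) / lengthscale) ** 2)

# Hypothetical mixing matrix (d = 2 outputs, 2 latent processes) and
# latent lengthscales; arbitrary illustrative values.
A = np.array([[1.0, 0.5],
              [-0.3, 1.2]])
lengthscales = np.array([0.5, 2.0])

def matrix_kernel(s, t):
    """K(s, t) = A diag(k_1(s, t), k_2(s, t)) A^T for f(t) = A g(t)."""
    latent = sq_exp_kernel(s, t, lengthscales)  # shape (2,)
    return A @ np.diag(latent) @ A.T

s, t = 0.3, 1.1
K_st = matrix_kernel(s, t)

# Monte Carlo estimate of E{f(s) f(t)^T} for the zero-mean field.
rng = np.random.default_rng(0)
n = 200_000
g_s = np.empty((n, 2))
g_t = np.empty((n, 2))
for q, ell in enumerate(lengthscales):
    k_st = sq_exp_kernel(s, t, ell)
    cov_q = np.array([[1.0, k_st],
                      [k_st, 1.0]])  # joint covariance of (g_q(s), g_q(t))
    draws = rng.multivariate_normal(np.zeros(2), cov_q, size=n)
    g_s[:, q], g_t[:, q] = draws[:, 0], draws[:, 1]

f_s, f_t = g_s @ A.T, g_t @ A.T  # rows are samples of f(s), f(t)
K_mc = f_s.T @ f_t / n           # entry (i, j) averages f_i(s) * f_j(t)

print(np.round(K_st, 3))
print(np.round(K_mc, 3))
```

The two printed matrices should agree up to Monte Carlo error, entry by entry, which is the elementwise formula for \(K_{ij}(s,t)\) in action.
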
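We now have to consider positive-definiteness in two senses:

- For each fixed \(t\), the matrix \(K(t, t)\) is positive semi-definite. For \(s\neq t\), \(K(s, t)\) need not even be symmetric; we only have \(K(s, t)=K(t, s)^{\top}\).
- For each \(i\), the function \(K_{i i}:\mathcal{T} \times \mathcal{T}\to\mathbb{R}\) is a positive-definite kernel. The cross terms \(K_{i j}\), \(i\neq j\), need not be.

Both follow from the joint requirement that, for any finite set of points \(t_1,\dots,t_n\), the \(nd\times nd\) block matrix with blocks \(K(t_a, t_b)\) is positive semi-definite.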
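
A small numerical sketch can show which of these conditions actually hold in a simple case. It assumes a two-output field \(f(t)=(g(t), g(t+\delta))\), where \(g\) is a scalar GP with a stationary squared-exponential covariance; the delay `DELTA` and the evaluation points are hypothetical. In this example \(K(s,t)\) is symmetric only when \(s=t\), while \(K(s,t)=K(t,s)^{\top}\) and the block Gram matrix over several points is positive semi-definite.

```python
import numpy as np

def kappa(h, lengthscale=1.0):
    """Stationary squared-exponential covariance as a function of the lag h."""
    return np.exp(-0.5 * (h / lengthscale) ** 2)

DELTA = 0.7  # hypothetical delay between the two outputs

def matrix_kernel(s, t):
    """Cross-covariance of f(t) = (g(t), g(t + DELTA)) for a scalar GP g."""
    h = s - t
    return np.array([[kappa(h),         kappa(h - DELTA)],
                     [kappa(h + DELTA), kappa(h)]])

s, t = 0.0, 0.5
K_st, K_ts, K_tt = matrix_kernel(s, t), matrix_kernel(t, s), matrix_kernel(t, t)

print(np.allclose(K_st, K_st.T))            # False: K(s, t) is not symmetric
print(np.allclose(K_st, K_ts.T))            # True: K(s, t) = K(t, s)^T
print(np.linalg.eigvalsh(K_tt).min() >= 0)  # True: K(t, t) is PSD

# Block Gram matrix over several points: blocks K(t_a, t_b).
ts = np.linspace(0.0, 3.0, 8)
gram = np.block([[matrix_kernel(a, b) for b in ts] for a in ts])

# Smallest eigenvalue, non-negative up to round-off.
print(np.linalg.eigvalsh(gram).min())
```

The condition that does the work is the last one: the full block matrix is a valid covariance for the stacked vector \((f(t_1),\dots,f(t_n))\).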

We know that finding positive-definite scalar-valued functions can be difficult. So finding positive-definite matrix-valued functions feels like it might be a burden on both our monkey brains and our computational power.

We can come up with other perspectives on this construction. One version is given by Eric Perim, Wessel Bruinsma, and Will Tebbutt in *Gaussian Processes: from one to many outputs*.

In their version, if we want to model \(p\) outputs over some input space \(\mathcal{T}\), we construct an extended input space \(\mathcal{T}_{\text {ext }}=\{1, \ldots, p\} \times \mathcal{T}\). Then the vector GP’s mean and covariance functions take inputs in the extended space, \(m: \mathcal{T}_{\text {ext }} \rightarrow \mathbb{R}\) and \(K: \mathcal{T}_{\text {ext }}^{2} \rightarrow \mathbb{R}\), so that \(K((i, s),(j, t))\) plays the role of \(K_{i j}(s, t)\) above, with \(p\) in place of \(d\). That is, we still have a scalar GP, but with the output index handled as part of the input. This does not, to me, make it easier to define covariance kernels, but it makes them look different.
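
The equivalence of the two views can be sketched numerically. The example below assumes a separable kernel \(K((i,s),(j,t)) = B_{ij}\,k(s,t)\) with a hypothetical positive semi-definite coupling matrix `B`; this is one convenient choice rather than anything prescribed by the text. Output indices are 0-based in the code, whereas the text uses \(1,\ldots,p\).

```python
import numpy as np
from itertools import product

def sq_exp_kernel(s, t, lengthscale=1.0):
    """Squared-exponential covariance on the original input space T."""
    return np.exp(-0.5 * ((s - t) / lengthscale) ** 2)

# Hypothetical positive semi-definite matrix coupling the p = 2 outputs.
B = np.array([[1.0, 0.6],
              [0.6, 2.0]])

def extended_kernel(x, y):
    """Scalar kernel on T_ext = {0, ..., p-1} x T (0-based output index)."""
    (i, s), (j, t) = x, y
    return B[i, j] * sq_exp_kernel(s, t)

ts = np.linspace(0.0, 2.0, 5)
p = B.shape[0]

# Extended inputs, ordered output by output:
# (0, t_1), ..., (0, t_n), (1, t_1), ..., (1, t_n).
ext_points = list(product(range(p), ts))
gram_ext = np.array([[extended_kernel(x, y) for y in ext_points]
                     for x in ext_points])

# The same numbers from the matrix-valued view K(s, t) = B * k(s, t).
gram_t = sq_exp_kernel(ts[:, None], ts[None, :])
print(np.allclose(gram_ext, np.kron(B, gram_t)))  # True
```

Ordering the extended inputs point by point instead would give `np.kron(gram_t, B)`, the block matrix with blocks \(K(t_a, t_b)\); the two differ only by a permutation of rows and columns.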

Both of these constructions are useful when working with multi-output kernels in practice. Have a look at some details from a computational perspective over at vector GP regression.

## References

*Random Fields and Geometry*. Springer Monographs in Mathematics 115. New York: Springer.

*Applications of Random Fields and Geometry Draft*.

*Journal of Machine Learning Research* 12 (41): 1459–1500.

*Foundations and Trends® in Machine Learning* 4 (3): 195–266.

*International Conference on Machine Learning*, 1190–1201. PMLR.

*Handbook of Spatial Statistics*, edited by Alan Gelfand, Peter Diggle, Montserrat Fuentes, and Peter Guttorp, 20103158:495–515. CRC Press.

*arXiv:2004.09455 [Stat]*, April.

*Stochastic Partial Differential Equations: An Introduction*. Springer.

*Advances in Neural Information Processing Systems*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 30:6681–90. Curran Associates, Inc.

*Gaussian Processes for Machine Learning*. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.

*Journal of Statistical Software* 63 (8): 1.
