# Learning covariance functions

Learning a family of covariances at once

September 16, 2019 — March 1, 2021

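The generalisation of covariance matrix estimation to the case of continuous index sets: instead of estimating a finite covariance matrix, we estimate a covariance function (a kernel). This usually comes up in the context of Gaussian processes, where everything can work out nicely if we are lucky.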

## 1 Selecting parametric kernel by maximising marginal likelihood

The goal for most of these methods is to maximise the marginal likelihood, a.k.a. model evidence, with respect to the kernel hyperparameters, as is conventional in Bayesian ML. But we could also place hyperpriors on the kernel parameters.
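To make the objective concrete, here is a minimal sketch of evidence maximisation for a zero-mean GP with Gaussian observation noise, where the log marginal likelihood is available in closed form. Everything specific here is my own assumption for illustration: the squared-exponential kernel, the fixed noise level, the synthetic data, and the crude grid search over a single lengthscale (in practice one would use a gradient-based optimiser over all hyperparameters).

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, variance=1.0, noise=0.1):
    """log p(y | x, theta) for a zero-mean GP with Gaussian noise.

    Computes -0.5 y^T K^{-1} y - 0.5 log|K| - 0.5 n log(2 pi),
    where K is the kernel matrix plus noise**2 on the diagonal.
    """
    n = len(x)
    K = rbf_kernel(x, x, lengthscale, variance) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))           # equals 0.5 * log|K|
            - 0.5 * n * np.log(2 * np.pi))

# Synthetic data: one draw from a GP with lengthscale 1 (an assumed toy setup).
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40)
K_true = rbf_kernel(x, x, lengthscale=1.0) + 0.1**2 * np.eye(len(x))
y = np.linalg.cholesky(K_true) @ rng.standard_normal(len(x))

# Select the lengthscale by maximising the evidence over a log-spaced grid.
grid = np.logspace(-1.5, 1.5, 61)
lml = np.array([log_marginal_likelihood(x, y, ls) for ls in grid])
best_lengthscale = grid[np.argmax(lml)]
```

The grid search is only there to keep the sketch dependency-free; the point is that "selecting a kernel" reduces to optimising one scalar function of the hyperparameters.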

## 2 Learning kernel composition

Automating kernel design by some composition of simpler atomic kernels. AFAICT this started from summaries like (Genton 2001) and went via Duvenaud’s aforementioned notes to become a small industry (Lloyd et al. 2014; D. K. Duvenaud, Nickisch, and Rasmussen 2011; D. Duvenaud et al. 2013; Grosse et al. 2012). A prominent example was the Automated Statistician project by David Duvenaud, James Robert Lloyd, Roger Grosse and colleagues, which works by greedy combinatorial search over possible compositions.
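A toy sketch of what greedy search over compositions can look like, to fix ideas. This is not the Automated Statistician's actual procedure: as simplifying assumptions of mine, the atomic kernels have fixed hyperparameters (a real system would re-optimise them for every candidate and typically score with a penalised criterion), and the names `ATOMS`, `log_evidence` and `greedy_search` are hypothetical.

```python
import numpy as np

# Atomic kernels on 1-D inputs, with fixed (assumed) hyperparameters.
def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

def periodic(a, b, period=1.0):
    return np.exp(-2.0 * np.sin(np.pi * (a[:, None] - b[None, :]) / period) ** 2)

def linear(a, b):
    return a[:, None] * b[None, :]

ATOMS = {"RBF": rbf, "PER": periodic, "LIN": linear}

# Sums and products of valid kernels are again valid kernels.
def add(k1, k2):
    return lambda a, b: k1(a, b) + k2(a, b)

def mul(k1, k2):
    return lambda a, b: k1(a, b) * k2(a, b)

def log_evidence(kernel, x, y, noise=0.1):
    """Log marginal likelihood of a zero-mean GP with the given kernel."""
    n = len(x)
    K = kernel(x, x) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2 * np.pi))

def greedy_search(x, y, depth=3):
    """Grow a kernel expression one operator at a time, keeping the best."""
    name, kernel = max(ATOMS.items(), key=lambda kv: log_evidence(kv[1], x, y))
    score = log_evidence(kernel, x, y)
    for _ in range(depth - 1):
        candidates = []
        for atom_name, atom in ATOMS.items():
            candidates.append((f"({name} + {atom_name})", add(kernel, atom)))
            candidates.append((f"({name} * {atom_name})", mul(kernel, atom)))
        scored = [(log_evidence(k, x, y), cand_name, k) for cand_name, k in candidates]
        best_score, best_name, best_kernel = max(scored, key=lambda t: t[0])
        if best_score <= score:   # no candidate improves the evidence: stop
            break
        name, kernel, score = best_name, best_kernel, best_score
    return name, score

# Toy data: a periodic signal plus a linear trend.
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
y = np.sin(2 * np.pi * x) + 0.5 * x + 0.1 * rng.standard_normal(len(x))
expression, evidence = greedy_search(x, y)
```

The search returns a human-readable expression string alongside its score, which is the feature that makes this family of methods attractive for automated model description.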

More fashionable, presumably, are the differentiable search methods. For example, the AutoGP system (Krauth et al. 2016; Bonilla, Krauth, and Dezfouli 2019) incorporates tricks like these to use gradient descent to design kernels for Gaussian processes. Sun et al. (2018) construct deep networks of composed kernels. I imagine the Deep Gaussian Process literature is also of this kind, but have not read it.
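The simplest differentiable relaxation I can think of is to replace the discrete choice of kernel with a non-negative weighted sum of base kernels and follow the gradient of the evidence with respect to the log-weights. The sketch below does that with a hand-derived gradient and a backtracking line search. It is not the AutoGP algorithm or the construction of Sun et al.; the base kernels, the fixed noise level, and the function names are all assumptions made for illustration.

```python
import numpy as np

def base_kernels(x):
    """Stack of fixed base kernel matrices (assumed, unit hyperparameters)."""
    d = x[:, None] - x[None, :]
    return np.stack([
        np.exp(-0.5 * d**2),                    # RBF
        np.exp(-2.0 * np.sin(np.pi * d) ** 2),  # periodic, period 1
        x[:, None] * x[None, :],                # linear
    ])

def evidence_and_grad(log_w, Ks, y, noise=0.1):
    """Log marginal likelihood and its gradient w.r.t. the log-weights."""
    n = len(y)
    w = np.exp(log_w)                           # positivity via exp
    K = np.tensordot(w, Ks, axes=1) + noise**2 * np.eye(n)
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    _, logdet = np.linalg.slogdet(K)
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)
    # d lml / d theta = 0.5 tr((alpha alpha^T - K^{-1}) dK/dtheta),
    # and here dK / d log_w_i = w_i * K_i.
    A = np.outer(alpha, alpha) - K_inv
    grad = np.array([0.5 * w_i * np.sum(A * K_i) for w_i, K_i in zip(w, Ks)])
    return lml, grad

def fit_weights(Ks, y, steps=200, lr=0.5):
    """Normalised gradient ascent with backtracking; evidence never decreases."""
    log_w = np.zeros(len(Ks))
    lml, grad = evidence_and_grad(log_w, Ks, y)
    for _ in range(steps):
        direction = grad / max(1.0, np.linalg.norm(grad))
        step = lr
        while step > 1e-8:
            cand = log_w + step * direction
            cand_lml, cand_grad = evidence_and_grad(cand, Ks, y)
            if cand_lml > lml:                  # accept only improvements
                log_w, lml, grad = cand, cand_lml, cand_grad
                break
            step *= 0.5
        else:
            break                               # no improving step found
    return np.exp(log_w), lml

# Toy data: a periodic signal plus a linear trend.
rng = np.random.default_rng(2)
x = np.linspace(0, 4, 50)
y = np.sin(2 * np.pi * x) + 0.5 * x + 0.1 * rng.standard_normal(len(x))
Ks = base_kernels(x)
weights, final_lml = fit_weights(Ks, y)
```

With an autodiff framework the hand-written gradient disappears, and the same idea extends to lengthscales, periods, and nested compositions.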

## 3 Via neural nets

🏗

## 4 Hyperkernels

Kernels on kernels, for learning kernels 🏗 (Ong, Smola, and Williamson 2005, 2002; Ong and Smola 2003; Kondor and Jebara 2006).

## 5 References

