t-processes, t-distributions
2021-11-09 — 2021-11-24
Stochastic processes with Student-\(t\) marginals. Much as Student-\(t\) distributions generalize Gaussian distributions, \(t\)-processes generalize Gaussian processes, which makes them natural candidates for the regression setting. They are another useful member of the family of elliptically contoured distributions. A further fun generalization of the Gaussian process is the q-exponential process.
I was exploring the implications of \(t\)-processes in various contexts, particularly in regression settings. My conclusion is that this is essentially a curiosity: the posterior converges very rapidly to a Gaussian process as the number of observations increases, so you may as well use a Gaussian. The other elliptic extensions might be more interesting, or a variational Gaussian process with a non-Gaussian likelihood, depending on the application.
1 Multivariate Student-\(t\)
The multivariate \(t\) (MVT) distribution \(\boldsymbol{X} \sim \boldsymbol{t}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, v)\), with location \(\boldsymbol{\mu}\), scale matrix \(\boldsymbol{\Sigma}\), and degrees of freedom \(v\), has the probability density function \[ f(\boldsymbol{x})=\frac{\Gamma\{(v+p) / 2\}}{\Gamma(v / 2)(v \pi)^{p / 2}|\boldsymbol{\Sigma}|^{1 / 2}}\left\{1+v^{-1}(\boldsymbol{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\}^{-(v+p) / 2} . \] There is a cool relationship to the multivariate normal: \[ \boldsymbol{X}=\boldsymbol{\mu}+\boldsymbol{\Sigma}^{1 / 2} \boldsymbol{Z} / \sqrt{q}, \] where \(\boldsymbol{Z}\) follows a \(p\)-dimensional standard normal distribution, \(q \sim \chi_v^2 / v\), and \(\boldsymbol{Z}\) is independent of \(q\). (Here \(W \sim \chi_b^2 / c\) denotes the scaled \(\chi^2\) distribution, with density proportional to \(w^{b / 2-1} e^{-c w / 2}\).) The MVT differs from the multivariate normal distribution \(\mathscr{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) only by the random scaling factor \(1/\sqrt{q}\).
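The normal/chi-square mixture representation above translates directly into a sampler. A minimal sketch in NumPy (the function name and interface are mine, not from any library):

```python
import numpy as np

def sample_mvt(mu, Sigma, v, n_samples, rng=None):
    """Draw samples from t_p(mu, Sigma, v) via the mixture
    X = mu + Sigma^{1/2} Z / sqrt(q),  Z ~ N(0, I_p),  q ~ chi^2_v / v."""
    rng = np.random.default_rng(rng)
    p = len(mu)
    L = np.linalg.cholesky(Sigma)            # one choice of Sigma^{1/2}
    Z = rng.standard_normal((n_samples, p))  # rows are iid N(0, I_p)
    q = rng.chisquare(v, n_samples) / v      # scaled chi-square mixing variable
    return mu + (Z @ L.T) / np.sqrt(q)[:, None]
```

For \(v > 2\) the covariance of the samples is \(\frac{v}{v-2}\boldsymbol{\Sigma}\) rather than \(\boldsymbol{\Sigma}\) itself, which is worth remembering when sanity-checking output.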
Ding (2016) uses this latter property to show that the conditional distribution of \(\boldsymbol{X}_2\) given \(\boldsymbol{X}_1=\boldsymbol{x}_1\) is \[ \boldsymbol{X}_2 \mid \boldsymbol{X}_1 \sim \boldsymbol{t}_{p_2}\left(\boldsymbol{\mu}_{2 \mid 1}, \frac{v+d_1}{v+p_1} \boldsymbol{\Sigma}_{22 \mid 1}, v+p_1\right), \] where \(\boldsymbol{\mu}_{2 \mid 1}=\boldsymbol{\mu}_2+\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1}(\boldsymbol{x}_1-\boldsymbol{\mu}_1)\) and \(\boldsymbol{\Sigma}_{22 \mid 1}=\boldsymbol{\Sigma}_{22}-\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{12}\) are the usual Gaussian conditional moments, and \(d_1=(\boldsymbol{x}_1-\boldsymbol{\mu}_1)^{\top} \boldsymbol{\Sigma}_{11}^{-1}(\boldsymbol{x}_1-\boldsymbol{\mu}_1)\) is the squared Mahalanobis distance of the conditioning observation.
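Ding's conditional parameters are cheap to compute. A sketch in NumPy (function name mine; it just transcribes the formulas above, splitting off the first \(p_1\) coordinates as \(\boldsymbol{X}_1\)):

```python
import numpy as np

def mvt_conditional(mu, Sigma, v, x1, p1):
    """Location, scale matrix, and degrees of freedom of X2 | X1 = x1
    for X ~ t_p(mu, Sigma, v), per Ding (2016)."""
    mu1, mu2 = mu[:p1], mu[p1:]
    S11 = Sigma[:p1, :p1]
    S12 = Sigma[:p1, p1:]
    S22 = Sigma[p1:, p1:]
    r = np.linalg.solve(S11, x1 - mu1)
    d1 = (x1 - mu1) @ r                       # squared Mahalanobis distance
    mu_cond = mu2 + S12.T @ r                 # same as the Gaussian conditional mean
    S_cond = S22 - S12.T @ np.linalg.solve(S11, S12)  # Schur complement
    return mu_cond, ((v + d1) / (v + p1)) * S_cond, v + p1
```

Note that, unlike the Gaussian case, the conditional scale depends on the observed value \(\boldsymbol{x}_1\) through \(d_1\): surprising observations inflate the predictive scale.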
2 t-process regression
There are a couple of classic cases in ML where \(t\)-processes arise, e.g. in Bayes NNs (Neal 1996) or the GP literature (Rasmussen and Williams 2006, §9.9). Recently there has been an uptick in actual applications of these processes in regression (Chen, Wang, and Gorban 2020; Shah, Wilson, and Ghahramani 2014; Tang et al. 2017; Tracey and Wolpert 2018). See Wilson and Ghahramani (2011) for a Generalized Wishart Process construction that may be helpful. This prior is available in GPyTorch. Recent papers (Shah, Wilson, and Ghahramani 2014; Tracey and Wolpert 2018) make implementation seem fairly straightforward.
At first blush it looks like it might be a more robust regression model than Gaussian process regression. Upon reflection, however, I'm not at all sold on the idea.
As Ding (2016) points out, the conditional distribution of \(\boldsymbol{X}_2\) given \(\boldsymbol{X}_1\) is itself \(t\)-distributed, with degrees of freedom \(v + p_1\) growing linearly in the number of observation sites, which is relatively intuitive. This means the posterior is essentially Gaussian for all but the smallest problems.
Some papers discuss \(t\)-process regression in terms of inference using Inverse Wishart distributions.
3 Markov t-process
A process with \(t\)-distributed increments is in fact a Lévy process, which follows from the fact that the Student-\(t\) distribution is infinitely divisible. As far as I can see, Grigelionis (2013) is the definitive collation of results on that observation.