t-processes, t-distributions



Stochastic processes with Student-\(t\) marginals. Much as Student-\(t\) distributions generalise Gaussian distributions, \(t\)-processes generalise Gaussian processes. They are another useful member of the family of elliptically contoured distributions.

Multivariate Student-\(t\)

The multivariate \(t\) (MVT) distribution \(\boldsymbol{X} \sim \boldsymbol{t}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, v)\), with location \(\boldsymbol{\mu}\), scale matrix \(\boldsymbol{\Sigma}\), and degrees of freedom \(v\), has the probability density function \[ f(\boldsymbol{x})=\frac{\Gamma\{(v+p) / 2\}}{\Gamma(v / 2)(v \pi)^{p / 2}|\boldsymbol{\Sigma}|^{1 / 2}}\left\{1+v^{-1}(\boldsymbol{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\}^{-(v+p) / 2} . \] There is a cool relationship to the multivariate normal: \[ \boldsymbol{X}=\boldsymbol{\mu}+\boldsymbol{\Sigma}^{1 / 2} \boldsymbol{Z} / \sqrt{q}, \] where \(\boldsymbol{Z}\) follows a \(p\)-dimensional standard normal distribution, \(q \sim \chi_v^2 / v\), and \(\boldsymbol{Z}\) is independent of \(q\). (\(W \sim \chi_b^2 / c\) denotes the scaled \(\chi^2\) distribution, with density proportional to \(w^{b / 2-1} e^{-c w / 2}\).) It differs from the multivariate normal distribution \(\mathscr{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) only by the random scaling factor \(\sqrt{q}\).
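The scale-mixture representation suggests a direct sampler. The following is a minimal sketch in NumPy, assuming a Cholesky factor as the matrix square root (any square root of \(\boldsymbol{\Sigma}\) gives the same distribution); the function name `sample_mvt` and the example parameters are made up for illustration.

```python
import numpy as np


def sample_mvt(mu, Sigma, v, size, rng):
    """Draw `size` samples from t_p(mu, Sigma, v) via X = mu + Sigma^{1/2} Z / sqrt(q)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.shape[0]
    L = np.linalg.cholesky(Sigma)          # assumed square root: L @ L.T == Sigma
    Z = rng.standard_normal((size, p))     # rows are Z ~ N_p(0, I)
    q = rng.chisquare(v, size=size) / v    # q ~ chi^2_v / v, drawn independently of Z
    return mu + (Z @ L.T) / np.sqrt(q)[:, None]


# Illustrative (hypothetical) parameters.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
v = 7.0

X = sample_mvt(mu, Sigma, v, size=200_000, rng=rng)

# For v > 2 the covariance is v / (v - 2) * Sigma, not Sigma itself.
emp_cov = np.cov(X, rowvar=False)
theo_cov = v / (v - 2.0) * Sigma
```

Comparing `emp_cov` with `theo_cov` is a quick sanity check that \(\boldsymbol{\Sigma}\) is a scale matrix rather than the covariance.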

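Ding (2016) uses this latter property to show that, when \(\boldsymbol{X}\) is partitioned into \(\boldsymbol{X}_1\) (of dimension \(p_1\)) and \(\boldsymbol{X}_2\) (of dimension \(p_2\)), with \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) partitioned conformably, the conditional distribution of \(\boldsymbol{X}_2\) given \(\boldsymbol{X}_1\) is \[ \boldsymbol{X}_2 \mid \boldsymbol{X}_1 \sim \boldsymbol{t}_{p_2}\left(\boldsymbol{\mu}_{2 \mid 1}, \frac{v+d_1}{v+p_1} \boldsymbol{\Sigma}_{22 \mid 1}, v+p_1\right), \] where \(\boldsymbol{\mu}_{2 \mid 1}=\boldsymbol{\mu}_2+\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1}(\boldsymbol{X}_1-\boldsymbol{\mu}_1)\) and \(\boldsymbol{\Sigma}_{22 \mid 1}=\boldsymbol{\Sigma}_{22}-\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{12}\) are the same as in the Gaussian case, and \(d_1=(\boldsymbol{X}_1-\boldsymbol{\mu}_1)^{\top} \boldsymbol{\Sigma}_{11}^{-1}(\boldsymbol{X}_1-\boldsymbol{\mu}_1)\) is the squared Mahalanobis distance of \(\boldsymbol{X}_1\) from \(\boldsymbol{\mu}_1\).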
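The conditional parameters are cheap to compute. The following is a minimal NumPy sketch, assuming the conditioned-on block \(\boldsymbol{X}_1\) is the first \(p_1\) coordinates, the usual Gaussian-style formulas for \(\boldsymbol{\mu}_{2 \mid 1}\) and \(\boldsymbol{\Sigma}_{22 \mid 1}\), and \(d_1\) taken as the squared Mahalanobis distance of the observed \(\boldsymbol{x}_1\) from \(\boldsymbol{\mu}_1\). The function name `mvt_conditional` and the example numbers are hypothetical.

```python
import numpy as np


def mvt_conditional(mu, Sigma, v, x1):
    """Return (location, scale matrix, dof) of X_2 | X_1 = x1 for X ~ t_p(mu, Sigma, v).

    Assumes X_1 is the first len(x1) coordinates of X.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    p1 = x1.shape[0]

    mu1, mu2 = mu[:p1], mu[p1:]
    S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
    S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]

    resid = x1 - mu1
    S11_inv_resid = np.linalg.solve(S11, resid)
    d1 = resid @ S11_inv_resid                       # squared Mahalanobis distance
    mu_cond = mu2 + S21 @ S11_inv_resid              # same as the Gaussian conditional mean
    S_cond = S22 - S21 @ np.linalg.solve(S11, S12)   # Schur complement, as for Gaussians
    scale_cond = (v + d1) / (v + p1) * S_cond        # data-dependent rescaling
    return mu_cond, scale_cond, v + p1


# Illustrative (hypothetical) numbers: condition a 3-d MVT on its first coordinate.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.3],
                  [0.0, 0.3, 1.0]])
mu_cond, scale_cond, dof_cond = mvt_conditional(mu, Sigma, v=4.0, x1=[2.0])
```

Unlike the Gaussian case, the conditional scale depends on the observed value through \(d_1\): an observation far from \(\boldsymbol{\mu}_1\) inflates the predictive scale.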

t-process regression

There are a couple of classic cases in ML where \(t\)-processes arise, e.g. in Bayesian neural networks (Neal 1996) or the GP literature (Rasmussen and Williams 2006, sec. 9.9). Recently there has been an uptick in actual applications of these processes in regression (Chen, Wang, and Gorban 2020; Shah, Wilson, and Ghahramani 2014; Tang et al. 2017; Tracey and Wolpert 2018). See Wilson and Ghahramani (2011) for a Generalised Wishart Process construction that may be helpful? This prior is available in GPyTorch. Recent papers (Shah, Wilson, and Ghahramani 2014; Tracey and Wolpert 2018) make it seem fairly straightforward.

At first blush it looks like it might be a more robust regression model than Gaussian process regression. However, I am not so sure. As Ding (2016) points out, when \(\boldsymbol{X}_1\) and \(\boldsymbol{X}_2\) are jointly \(t\)-distributed, the degrees of freedom of the conditional distribution of \(\boldsymbol{X}_2\) given \(\boldsymbol{X}_1\), namely \(v+p_1\), grow linearly in the number of observation sites, which means that the conditional is essentially just Gaussian for even small problems.
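One rough way to see how quickly the tails thin out is the excess kurtosis of a univariate Student-\(t\) marginal, which is \(6/(\text{df}-4)\) for \(\text{df}>4\) and \(0\) for a Gaussian. The sketch below plugs in the conditional degrees of freedom \(v+p_1\), with \(p_1\) equal to the number of observation sites; the prior value \(v=5\) is an arbitrary choice for illustration.

```python
def t_excess_kurtosis(df):
    """Excess kurtosis of a univariate Student-t; finite only for df > 4."""
    if df <= 4:
        raise ValueError("excess kurtosis is not finite for df <= 4")
    return 6.0 / (df - 4.0)


v = 5.0  # hypothetical prior degrees of freedom
# Conditional degrees of freedom after n_obs observation sites are v + n_obs.
kurtosis_by_n_obs = {
    n_obs: t_excess_kurtosis(v + n_obs) for n_obs in (0, 10, 100, 1000)
}
```

Under these assumptions the excess kurtosis drops from 6 with no observations to roughly 0.06 after 100 observation sites. Kurtosis is only one summary of tail weight, and the data-dependent scale factor \((v+d_1)/(v+p_1)\) remains a difference from the Gaussian process posterior.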

Some papers discuss \(t\)-process regression in terms of inference using Inverse Wishart distributions.

Markov t-process

A process with \(t\)-distributed increments is in fact a Lévy process, which follows from the fact that the Student-\(t\) distribution is infinitely divisible (Grosswald 1976). As far as I can see, Grigelionis (2013) is the definitive collation of results on that observation.

References

Aste, Tomaso. 2021. “Stress Testing and Systemic Risk Measures Using Multivariate Conditional Probability.” arXiv.
Bånkestad, Maria, Jens Sjölund, Jalil Taghia, and Thomas Schön. 2020. “The Elliptical Processes: A Family of Fat-Tailed Stochastic Processes.” arXiv.
Chen, Zexun, Bo Wang, and Alexander N. Gorban. 2020. “Multivariate Gaussian and Student-t Process Regression for Multi-Output Prediction.” Neural Computing and Applications 32 (8): 3005–28.
Ding, Peng. 2016. “On the Conditional Distribution of the Multivariate \(t\) Distribution.” arXiv.
Fang, Kai-Tai, Samuel Kotz, and Kai Wang Ng. 2017. Symmetric Multivariate and Related Distributions. Boca Raton: Chapman and Hall/CRC.
Grigelionis, Bronius. 2013. Student’s t-Distribution and Related Stochastic Processes. SpringerBriefs in Statistics. Berlin, Heidelberg: Springer Berlin Heidelberg.
Grosswald, E. 1976. “The Student t-Distribution of Any Degree of Freedom Is Infinitely Divisible.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 36 (2): 103–9.
Ismail, Mourad E. H. 1977. “Bessel Functions and the Infinite Divisibility of the Student \(t\)-Distribution.” The Annals of Probability 5 (4): 582–85.
Kibria, B M Golam, and Anwar H Joarder. n.d. “A Short Review of Multivariate t-Distribution.”
Lange, Kenneth L., Roderick J. A. Little, and Jeremy M. G. Taylor. 1989. “Robust Statistical Modeling Using the t Distribution.” Journal of the American Statistical Association 84 (408): 881–96.
Neal, Radford M. 1996. “Bayesian Learning for Neural Networks.” Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.
Roth, Michael. 2012. On the Multivariate t Distribution. Linköping University Electronic Press.
———. 2013. “Kalman Filters for Nonlinear Systems and Heavy-Tailed Noise.”
Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014. “Student-t Processes as Alternatives to Gaussian Processes.” In Artificial Intelligence and Statistics, 877–85. PMLR.
Song, Dae-Kun, Hyoung-Jin Park, and Hyoung-Moon Kim. 2014. “A Note on the Characteristic Function of Multivariate t Distribution.” Communications for Statistical Applications and Methods 21 (1): 81–91.
Tang, Qingtao, Li Niu, Yisen Wang, Tao Dai, Wangpeng An, Jianfei Cai, and Shu-Tao Xia. 2017. “Student-t Process Regression with Student-t Likelihood,” 2822–28.
Tracey, Brendan D., and David H. Wolpert. 2018. “Upgrading from Gaussian Processes to Student’s-T Processes.” 2018 AIAA Non-Deterministic Approaches Conference, January.
Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011. “Generalised Wishart Processes.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.
