Gaussian process inference by gradient descent



\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Notoriously, GP regression scales badly with dataset size: exact inference means solving a linear system in the $n \times n$ matrix of observation covariances, which costs $O(n^3)$ in general. But solving a symmetric positive-definite linear system is just minimising a quadratic objective, when you think about it. So can we do it by (stochastic) gradient descent and have it somehow come out cheaper? Maybe. That is roughly the question Chen et al. (2020) study for SGD, and the bet behind GPyTorch's matrix-multiplication-based inference (Gardner et al. 2018); the variational route of Hensman, Fusi, and Lawrence (2013) attacks the same scaling problem from a different direction.
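Concretely: with kernel matrix $K$, noise variance $\sigma^2$ and targets $\vv{y}$, the posterior mean needs the weights $\vv{\alpha}$ solving $(K + \sigma^2 I)\vv{\alpha} = \vv{y}$, and the solution of that system is the unique minimiser of the convex quadratic $J(\vv{\alpha}) = \tfrac{1}{2}\vv{\alpha}^{\top}(K + \sigma^2 I)\vv{\alpha} - \vv{y}^{\top}\vv{\alpha}$, whose gradient is just the residual. Any first-order method that only touches matrix-vector products can therefore chip away at it. Here is a minimal NumPy sketch of that idea on toy data; the kernel, step size and iteration count are illustrative choices of mine, not anything prescribed by the papers below.

```python
import numpy as np

# A minimal sketch (a toy, not the method of any paper cited below):
# solve the GP training system (K + sigma^2 I) alpha = y by gradient
# descent on J(alpha) = 0.5 alpha' A alpha - y' alpha, whose gradient
# is the residual A alpha - y.

rng = np.random.default_rng(0)

# Toy 1-d regression data and an RBF kernel; all settings are illustrative.
n = 100
X = np.sort(rng.uniform(-3.0, 3.0, size=n))
y = np.sin(X) + 0.5 * rng.standard_normal(n)
lengthscale, noise = 0.5, 0.5

K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / lengthscale**2)
A = K + noise**2 * np.eye(n)  # symmetric positive definite

# Step size 1 / lambda_max keeps gradient descent stable on this quadratic.
# (Computed exactly here for the toy; in practice one would estimate it,
# e.g. with a few power iterations, to stay at O(n^2) per step.)
step = 1.0 / np.linalg.eigvalsh(A)[-1]

alpha = np.zeros(n)
for _ in range(2000):
    alpha -= step * (A @ alpha - y)   # each step is one O(n^2) matvec

alpha_direct = np.linalg.solve(A, y)  # O(n^3) direct reference solution
print("max abs deviation from direct solve:",
      np.max(np.abs(alpha - alpha_direct)))
```

In this dense toy each step still costs $O(n^2)$ against an $O(n^3)$ direct solve, so nothing much is saved yet; the interesting question is whether stochastic or structured matrix-vector products make the per-step cost small enough to win, and whether the iteration count stays manageable. On ill-conditioned kernel matrices plain gradient descent crawls, which is one reason GPyTorch leans on conjugate gradients instead.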

References

Chen, Hao, Lili Zheng, Raed Al Kontar, and Garvesh Raskutti. 2020. “Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes.” In Proceedings of the 34th International Conference on Neural Information Processing Systems, 2722–33. NIPS’20. Red Hook, NY, USA: Curran Associates Inc.
Gardner, Jacob R., Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31:7587–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Hensman, James, Nicolò Fusi, and Neil D. Lawrence. 2013. “Gaussian Processes for Big Data.” In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 282–90. UAI’13. Arlington, Virginia, USA: AUAI Press.
Minh, Hà Quang. 2022. “Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.” SIAM/ASA Journal on Uncertainty Quantification, February, 96–124.
