Gaussian Processes as stochastic differential equations

Imposing time on things

🏗️🏗️🏗️ Under heavy construction 🏗️🏗️🏗️

Classic flavours together: Gaussian processes and state filters/stochastic differential equations, and random fields as stochastic differential equations.

Not covered here: another concept which shares the same keywords but is distinct, namely using Gaussian processes to define state process dynamics or observation distributions.

GP regression via state filtering

I am interested in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way that Kalman filtering does for Wiener filtering. This means you get to solve a GP as an SDE using a state filter.
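
A minimal sketch of the trick in its simplest case, assuming a zero-mean GP with the exponential (Matérn-1/2) kernel on sorted scalar inputs and Gaussian observation noise. That kernel is exactly the stationary covariance of an Ornstein–Uhlenbeck SDE, so a scalar Kalman filter followed by a Rauch-Tung-Striebel smoother should recover the dense GP posterior mean in $\mathcal{O}(n)$ rather than $\mathcal{O}(n^3)$. The function and parameter names (`ell`, `sigma2`, `noise`) are my own illustrative choices, not any library's API.

```python
import numpy as np

def ou_kernel(t1, t2, ell, sigma2):
    """Exponential (Matern-1/2) covariance sigma2 * exp(-|t - t'| / ell)."""
    return sigma2 * np.exp(-np.abs(t1[:, None] - t2[None, :]) / ell)

def dense_gp_mean(t, y, ell, sigma2, noise):
    """O(n^3) reference: posterior mean K (K + noise I)^{-1} y at the inputs."""
    K = ou_kernel(t, t, ell, sigma2)
    return K @ np.linalg.solve(K + noise * np.eye(len(t)), y)

def kalman_rts_mean(t, y, ell, sigma2, noise):
    """O(n) equivalent: the same GP is the OU SDE
    dx = -(x / ell) dt + sqrt(2 sigma2 / ell) dW, so filter forward and
    smooth backward. Assumes t is sorted in increasing order."""
    n = len(t)
    mf, Pf = np.zeros(n), np.zeros(n)  # filtered moments
    mp, Pp = np.zeros(n), np.zeros(n)  # one-step-ahead predicted moments
    m, P = 0.0, sigma2                 # stationary zero-mean prior at t[0]
    for k in range(n):
        if k > 0:
            A = np.exp(-(t[k] - t[k - 1]) / ell)  # exact discretisation of the SDE
            m, P = A * m, A * P * A + sigma2 * (1.0 - A**2)
        mp[k], Pp[k] = m, P
        G = P / (P + noise)            # Kalman gain for y_k = x(t_k) + noise
        m, P = m + G * (y[k] - m), (1.0 - G) * P
        mf[k], Pf[k] = m, P
    ms = mf.copy()
    for k in range(n - 2, -1, -1):     # Rauch-Tung-Striebel backward pass
        A = np.exp(-(t[k + 1] - t[k]) / ell)
        C = Pf[k] * A / Pp[k + 1]      # smoother gain
        ms[k] = mf[k] + C * (ms[k + 1] - mp[k + 1])
    return ms

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(t) + 0.1 * rng.standard_normal(t.size)
gap = np.max(np.abs(kalman_rts_mean(t, y, 1.0, 1.0, 0.01)
                    - dense_gp_mean(t, y, 1.0, 1.0, 0.01)))
print(gap)  # should be at round-off level
```

The state here is one-dimensional; smoother kernels need a larger state vector, but the filtering and smoothing recursions keep the same shape.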

The GP-filtering trick is explained in intro articles , based on various antecedents , possibly also . Aside: O’Hagan (1978) is an incredible paper that invented several research areas at once (GP regression and surrogate models for experiment design, as well as this) and AFAICT no one noticed at the time. Also Whittle did some foundational work, but I cannot find the original paper to read it.

The idea is that if the spectral density of your (stationary) GP covariance kernel is (or can be well approximated by) a rational function, then it is possible to factorise it into a tractable state space model, using a duality between random fields and stochastic differential equations. That sounds simple enough conceptually; I wonder about the practice. Of course, when you want some complications, such as non-stationary kernels or hierarchical models, this state space inference trick gets more complicated, and posterior distributions are no longer so simple. But possibly it can still go. (This is a research interest of mine.)
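
To make the rational-spectrum condition concrete: the Matérn-3/2 kernel $k(\tau)=\sigma^2(1+\lambda|\tau|)e^{-\lambda|\tau|}$ with $\lambda=\sqrt{3}/\ell$ has spectral density $4\lambda^3\sigma^2/(\lambda^2+\omega^2)^2$, which is rational, so it is the output of a two-dimensional linear SDE driven by white noise. The sketch below writes down one such state space model in companion form (other equivalent parametrisations exist) and checks numerically that its stationary covariance reproduces the kernel. The helper names are hypothetical; the closed-form transition relies on $F+\lambda I$ being nilpotent, so no general matrix exponential is needed.

```python
import numpy as np

def matern32_kernel(tau, ell, sigma2):
    """Matern-3/2 covariance as a function of the lag tau."""
    lam = np.sqrt(3.0) / ell
    tau = np.abs(tau)
    return sigma2 * (1.0 + lam * tau) * np.exp(-lam * tau)

def matern32_ssm(ell, sigma2):
    """LTI SDE dx = F x dt + L dW, f = H x, white-noise spectral density Qc."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])  # state x = (f, f')
    L = np.array([[0.0], [1.0]])                         # noise drives f'
    Qc = 4.0 * lam**3 * sigma2                           # numerator of the spectral density
    H = np.array([[1.0, 0.0]])                           # we observe f only
    Pinf = np.diag([sigma2, lam**2 * sigma2])            # stationary state covariance
    return F, L, Qc, H, Pinf

def matern32_transition(tau, ell):
    """expm(F tau) in closed form: (F + lam I)^2 = 0, hence
    expm(F tau) = exp(-lam tau) (I + (F + lam I) tau)."""
    lam = np.sqrt(3.0) / ell
    return np.exp(-lam * tau) * np.array([[1.0 + lam * tau, tau],
                                          [-lam**2 * tau, 1.0 - lam * tau]])

ell, sigma2 = 0.7, 2.0
F, L, Qc, H, Pinf = matern32_ssm(ell, sigma2)
# Pinf solves the Lyapunov equation F Pinf + Pinf F^T + L Qc L^T = 0
lyap = F @ Pinf + Pinf @ F.T + Qc * (L @ L.T)
# stationary covariance of f at lag tau >= 0: H expm(F tau) Pinf H^T
taus = np.linspace(0.0, 3.0, 7)
k_ssm = np.array([(H @ matern32_transition(tau, ell) @ Pinf @ H.T).item() for tau in taus])
print(np.max(np.abs(lyap)), np.max(np.abs(k_ssm - matern32_kernel(taus, ell, sigma2))))
```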

William J. Wilkinson et al. (2020) introduces a computational toolkit and many worked examples of inference algorithms. Cox, van de Laar, and de Vries (2019) looks like it might be solving a similar problem but I do not yet understand their framing.

This complements, perhaps, the trick of fast Gaussian process calculations on lattices.

Nickisch, Solin, and Grigorevskiy (2018) tries to introduce a vocabulary for inference based on this insight, by discussing it in terms of computational primitives:

In time-series data, with $D = 1$, the data sets tend to become long (or unbounded) when observations accumulate over time. For these time-series models, leveraging sequential state space methods from signal processing makes it possible to solve GP inference problems in linear time complexity $\mathcal{O}(n)$ if the underlying GP has Markovian structure . This reformulation is exact for Markovian covariance functions (see, e.g., Solin (2016)) such as the exponential, half-integer Matérn, noise, constant, linear, polynomial, Wiener, etc. (and their sums and products).…

While existing literature has focused on the connection between GP regression and state space methods, the computational primitives allowing for inference using general likelihoods in combination with the Laplace approximation (LA), variational Bayes (VB), and assumed density filtering (ADF, a.k.a. single-sweep expectation propagation, EP) schemes has been largely overlooked.… We present a unifying framework for solving computational primitives for non-Gaussian inference schemes in the state space setting, thus directly enabling inference to be done through LA, VB, KL, and ADF/EP.
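
The claim in the quote above about sums and products can be checked with the same machinery. A minimal sketch, assuming independent components: summing two Markovian kernels stacks their states block-diagonally and concatenates their observation rows, while multiplying them takes Kronecker products of the transition, observation and stationary covariance matrices, since by the mixed-product property $(H_1\otimes H_2)(A_1\otimes A_2)(P_1\otimes P_2)(H_1\otimes H_2)^\top = (H_1A_1P_1H_1^\top)(H_2A_2P_2H_2^\top)$. Each model is represented by a hypothetical triple `(A, H, Pinf)`, with `A(tau)` the transition matrix over a lag `tau >= 0`; none of these names come from a library.

```python
import numpy as np

def block_diag(*mats):
    """Place square matrices along the diagonal of a larger zero matrix."""
    n = sum(M.shape[0] for M in mats)
    out = np.zeros((n, n))
    i = 0
    for M in mats:
        k = M.shape[0]
        out[i:i + k, i:i + k] = M
        i += k
    return out

def ou_ssm(ell, sigma2):
    """Matern-1/2: scalar state."""
    A = lambda tau: np.array([[np.exp(-tau / ell)]])
    return A, np.array([[1.0]]), np.array([[sigma2]])

def matern32_ssm(ell, sigma2):
    """Matern-3/2: state (f, f')."""
    lam = np.sqrt(3.0) / ell
    A = lambda tau: np.exp(-lam * tau) * np.array([[1.0 + lam * tau, tau],
                                                   [-lam**2 * tau, 1.0 - lam * tau]])
    return A, np.array([[1.0, 0.0]]), np.diag([sigma2, lam**2 * sigma2])

def sum_ssm(*models):
    """k = k_1 + k_2 + ...: independent states stacked block-diagonally."""
    A = lambda tau: block_diag(*[Ai(tau) for Ai, _, _ in models])
    H = np.hstack([Hi for _, Hi, _ in models])
    Pinf = block_diag(*[Pi for _, _, Pi in models])
    return A, H, Pinf

def prod_ssm(m1, m2):
    """k = k_1 * k_2: Kronecker products of all three pieces."""
    (A1, H1, P1), (A2, H2, P2) = m1, m2
    return (lambda tau: np.kron(A1(tau), A2(tau))), np.kron(H1, H2), np.kron(P1, P2)

def ssm_cov(model, tau):
    """Stationary covariance of the observed output at lag tau >= 0."""
    A, H, Pinf = model
    return (H @ A(tau) @ Pinf @ H.T).item()

ou, m32 = ou_ssm(1.5, 0.5), matern32_ssm(0.8, 2.0)
for tau in (0.0, 0.5, 2.0):
    print(tau, ssm_cov(sum_ssm(ou, m32), tau), ssm_cov(prod_ssm(ou, m32), tau))
```

The state dimension adds for sums but multiplies for products, which is one reason products of many Markovian kernels get expensive.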

The following computational primitives allow to cast the covariance approximation in more generic terms:
1. Linear system with “regularized” covariance: $\operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{r}) := \left(\mathbf{K}+\mathbf{W}^{-1}\right)^{-1} \mathbf{r}$.
2. Matrix-vector multiplications: $\operatorname{mvm}_{\mathbf{K}}(\mathbf{r}) := \mathbf{K}\mathbf{r}$. For learning we also need $\frac{\partial \operatorname{mvm}_{\mathbf{K}}(\mathbf{r})}{\partial \boldsymbol{\theta}}$.
3. Log-determinants: $\operatorname{ld}_{\mathbf{K}}(\mathbf{W}) := \log |\mathbf{B}|$ with symmetric and well-conditioned $\mathbf{B} = \mathbf{I} + \mathbf{W}^{\frac{1}{2}} \mathbf{K} \mathbf{W}^{\frac{1}{2}}$. For learning, we need derivatives: $\frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \boldsymbol{\theta}}$, $\frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \mathbf{W}}$.
4. Predictions need latent mean $\mathbb{E}\left[f_{*}\right]$ and variance $\mathbb{V}\left[f_{*}\right]$.

Using these primitives, GP regression can be compactly written as
$$\mathbf{W} = \mathbf{I} / \sigma_{n}^{2}, \quad \boldsymbol{\alpha} = \operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{y}-\mathbf{m}),$$
and
$$\log Z_{\mathrm{GPR}} = -\frac{1}{2}\left[\boldsymbol{\alpha}^{\top} \operatorname{mvm}_{\mathbf{K}}(\boldsymbol{\alpha}) + \operatorname{ld}_{\mathbf{K}}(\mathbf{W}) + n \log \left(2 \pi \sigma_{n}^{2}\right)\right].$$
Approximate inference (LA, VB, KL, ADF/EP), in case of non-Gaussian likelihoods, requires these primitives as necessary building blocks. Depending on the covariance approximation method (e.g. exact, sparse, grid-based, or state space), the four primitives differ in their implementation and computational complexity.
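
A dense reference implementation of the first three primitives may help pin down the definitions, assuming a diagonal $\mathbf{W}$ (stored as a vector `w`, as for factorising likelihoods) and plain Cholesky factorisations at $\mathcal{O}(n^3)$ cost; a state space implementation would replace these with filtering recursions. The function names mirror the quoted notation but are my own. One caution when assembling $\log Z_{\mathrm{GPR}}$: since $(\mathbf{K}+\sigma_n^2\mathbf{I})\boldsymbol{\alpha}=\mathbf{y}-\mathbf{m}$, the data-fit term is $(\mathbf{y}-\mathbf{m})^\top\boldsymbol{\alpha}=\boldsymbol{\alpha}^\top\mathbf{K}\boldsymbol{\alpha}+\sigma_n^2\boldsymbol{\alpha}^\top\boldsymbol{\alpha}$, so as far as I can tell $\sigma_n^2\boldsymbol{\alpha}^\top\boldsymbol{\alpha}$ has to be added to $\boldsymbol{\alpha}^\top\operatorname{mvm}_{\mathbf{K}}(\boldsymbol{\alpha})$. The sketch does so, and the result can be checked against the direct Gaussian log density.

```python
import numpy as np

def _chol_B(K, w):
    """Cholesky factor of B = I + W^{1/2} K W^{1/2}, diagonal W stored as vector w."""
    sw = np.sqrt(w)
    B = np.eye(len(w)) + sw[:, None] * K * sw[None, :]
    return np.linalg.cholesky(B), sw

def solve_K(K, w, r):
    """(K + W^{-1})^{-1} r, using (K + W^{-1})^{-1} = W^{1/2} B^{-1} W^{1/2}."""
    Lc, sw = _chol_B(K, w)
    z = np.linalg.solve(Lc.T, np.linalg.solve(Lc, sw * r))
    return sw * z

def mvm_K(K, r):
    """K r."""
    return K @ r

def ld_K(K, w):
    """log |B|, read off the Cholesky diagonal."""
    Lc, _ = _chol_B(K, w)
    return 2.0 * np.sum(np.log(np.diag(Lc)))

def log_Z_gpr(K, y, m, noise):
    """Gaussian-likelihood log marginal likelihood assembled from the primitives."""
    n = len(y)
    w = np.full(n, 1.0 / noise)                              # W = I / sigma_n^2
    alpha = solve_K(K, w, y - m)
    quad = alpha @ mvm_K(K, alpha) + noise * alpha @ alpha   # equals (y - m)^T alpha
    return -0.5 * (quad + ld_K(K, w) + n * np.log(2.0 * np.pi * noise))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 5.0, 30))
K = np.exp(-np.abs(x[:, None] - x[None, :]))   # exponential kernel, unit scales
y = np.cos(x) + 0.1 * rng.standard_normal(x.size)
m = np.zeros(x.size)
print(log_Z_gpr(K, y, m, noise=0.01))
```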

Recent works I should also inspect include .

Ambikasaran et al. (2015) seems to be related but not quite the same — it operates time-wise over inputs but then constructs the GP posterior using rank-1 updates.

Latent force models

I am going to argue that some latent force models fit in this classification, if I ever get time to define them .

References

Adam, Vincent, Stefanos Eleftheriadis, Nicolas Durrande, Artem Artemev, and James Hensman. 2020. In AISTATS.
Álvarez, Mauricio A., David Luengo, and Neil D. Lawrence. 2013. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11): 2693–2705.
Álvarez, Mauricio, David Luengo, and Neil D. Lawrence. 2009. In Artificial Intelligence and Statistics, 9–16. PMLR.
Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015. arXiv:1403.6015 [Astro-Ph, Stat], April.
Bakka, Haakon, Håvard Rue, Geir-Arne Fuglstad, Andrea Riebler, David Bolin, Janine Illian, Elias Krainski, Daniel Simpson, and Finn Lindgren. 2018. WIREs Computational Statistics 10 (6): e1443.
Bolin, David, Alexandre B. Simas, and Jonas Wallin. 2022. May.
Chang, Paul E, William J Wilkinson, Mohammad Emtiyaz Khan, and Arno Solin. 2020. “Fast Variational Learning in State-Space Gaussian Process Models.” In MLSP.
Cotter, S. L., G. O. Roberts, A. M. Stuart, and D. White. 2013. Statistical Science 28 (3): 424–46.
Cox, Marco, Thijs van de Laar, and Bert de Vries. 2019. International Journal of Approximate Reasoning 104 (January): 185–204.
Cressie, Noel, Tao Shi, and Emily L. Kang. 2010. Journal of Computational and Graphical Statistics 19 (3): 724–45.
Cressie, Noel, and Christopher K. Wikle. 2014. In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons.
Csató, Lehel, and Manfred Opper. 2002. Neural Computation 14 (3): 641–68.
Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press.
Curtain, Ruth F. 1975. SIAM Journal on Control 13 (1): 89–104.
Durrande, Nicolas, Vincent Adam, Lucas Bordeaux, Stefanos Eleftheriadis, and James Hensman. 2019. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 2780–89. PMLR.
Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc.
Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2): 424–36.
Gorad, Ajinkya, Zheng Zhao, and Simo Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.”
Grigorievskiy, Alexander, and Juha Karhunen. 2016. In 2016 International Joint Conference on Neural Networks (IJCNN), 3354–63. Vancouver, BC, Canada: IEEE.
Grigorievskiy, Alexander, Neil Lawrence, and Simo Särkkä. 2017. In arXiv:1610.08035 [Stat].
Hartikainen, Jouni, and Simo Särkkä. 2011. “Sequential Inference for Latent Force Models.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 311–18. UAI’11. Arlington, Virginia, USA: AUAI Press.
Hartikainen, Jouni, Mari Seppänen, and Simo Särkkä. 2012. In Proceedings of the 29th International Conference on International Conference on Machine Learning, 723–30. ICML’12. Madison, WI, USA: Omnipress.
Hartikainen, J., and S. Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.
Heaps, Sarah E. 2020. arXiv:2004.09455 [Stat], April.
Hensman, James, Nicolas Durrande, and Arno Solin. 2018. Journal of Machine Learning Research 18 (151): 1–52.
Hildeman, Anders, David Bolin, and Igor Rychlik. 2019. arXiv:1906.00286 [Stat], June.
Hu, Xiangping, and Ingelin Steinsland. 2016. WIREs Computational Statistics 8 (2): 112–25.
Huber, Marco F. 2014. Pattern Recognition Letters 45 (August): 85–91.
Karvonen, Toni, and Simo Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. Vietri sul Mare, Salerno, Italy: IEEE.
Kuzin, Danil, Le Yang, Olga Isupova, and Lyudmila Mihaylova. 2018. 2018 21st International Conference on Information Fusion (FUSION), July, 39–46.
Lindgren, Finn, David Bolin, and Håvard Rue. 2021. arXiv:2111.01084 [Stat], November.
Lindgren, Finn, and Håvard Rue. 2015. Journal of Statistical Software 63 (19): 1–25.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.
Liu, Wei, and Michael Röckner. 2015. Stochastic Partial Differential Equations: An Introduction. Springer.
Loeliger, Hans-Andrea, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R. Kschischang. 2007. Proceedings of the IEEE 95 (6): 1295–1322.
Lord, Gabriel J. 2014. An Introduction to Computational Stochastic PDEs. 1st edition. New York, NY, USA: Cambridge University Press.
Mbalawata, Isambi S., Simo Särkkä, and Heikki Haario. 2013. Computational Statistics 28 (3): 1195–1223.
Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018. In International Conference on Machine Learning, 3789–98.
O’Hagan, A. 1978. Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24.
Opitz, Thomas, Raphaël Huser, Haakon Bakka, and Håvard Rue. 2018. Extremes 21 (3): 441–62.
Peluchetti, Stefano, and Stefano Favaro. 2020. In International Conference on Artificial Intelligence and Statistics, 1126–36. PMLR.
Rackauckas, Christopher, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, Ali Ramadhan, and Alan Edelman. 2020. arXiv:2001.04385 [Cs, Math, q-Bio, Stat], August.
Reece, S., and S. Roberts. 2010. In 2010 13th International Conference on Information Fusion, 1–9.
Reece, Steven, Siddhartha Ghosh, Alex Rogers, Stephen Roberts, and Nicholas R. Jennings. 2014. The Journal of Machine Learning Research 15 (1): 2337–97.
Remes, Sami, Markus Heinonen, and Samuel Kaski. 2017. In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4642–51. Curran Associates, Inc.
———. 2018. arXiv:1811.10978 [Cs, Stat], November.
Rue, Håvard, Andrea Riebler, Sigrunn H. Sørbye, Janine B. Illian, Daniel P. Simpson, and Finn K. Lindgren. 2016. arXiv:1604.00860 [Stat], September.
Saatçi, Yunus. 2012. Ph.D., University of Cambridge.
Särkkä, Simo. 2011. In Artificial Neural Networks and Machine Learning – ICANN 2011, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.
Särkkä, Simo, Mauricio A. Álvarez, and Neil D. Lawrence. 2019. IEEE Transactions on Automatic Control 64 (7): 2953–60.
Särkkä, Simo, and Jouni Hartikainen. 2012. In Artificial Intelligence and Statistics.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. IEEE Signal Processing Magazine 30 (4): 51–61.
Särkkä, Simo, and Arno Solin. 2019. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks 10. Cambridge ; New York, NY: Cambridge University Press.
Sigrist, Fabio, Hans R. Künsch, and Werner A. Stahel. 2015a. Journal of Statistical Software 63 (14).
———. 2015b. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77 (1): 3–33.
Singer, Hermann. 2011. AStA Advances in Statistical Analysis 95 (4): 375–413.
Solin, Arno. 2016. Aalto University.
Solin, Arno, and Simo Särkkä. 2013. Physical Review E 88 (5): 052909.
———. 2014. In Artificial Intelligence and Statistics, 904–12.
———. 2020. Statistics and Computing 30 (2): 419–46.
Tobar, Felipe. 2019. Advances in Neural Information Processing Systems 32: 12749–59.
Tzinis, Efthymios, Zhepei Wang, and Paris Smaragdis. 2020. “Sudo Rm -Rf: Efficient Networks for Universal Audio Source Separation.”
Valenzuela, C., and F. Tobar. 2019. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3367–71.
Vandenberg-Rodes, Alexander, and Babak Shahbaba. 2015. arXiv:1502.03466 [Stat], February.
Wilkinson, William J., M. Riis Andersen, J. D. Reiss, D. Stowell, and A. Solin. 2019a. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3352–56.
Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019b. arXiv:1901.11436 [Cs, Eess, Stat], January.
Wilkinson, William J., Paul E. Chang, Michael Riis Andersen, and Arno Solin. 2020. In ICML.
Wilkinson, William J, Paul E Chang, Michael Riis Andersen, and Arno Solin. 2019c. “Global Approximate Inference via Local Linearisation for Temporal Gaussian Processes.”
Wilkinson, William J., Simo Särkkä, and Arno Solin. 2021. arXiv:2111.01721 [Cs, Stat], November.
Wilson, James T., Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021. Journal of Machine Learning Research 22 (105): 1–47.
Wilson, James, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020. In Proceedings of the 37th International Conference on Machine Learning, 10292–302. PMLR.
Zammit-Mangion, Andrew, and Christopher K. Wikle. 2020. Spatial Statistics 37 (June): 100408.
