Gaussian Processes as stochastic differential equations

Imposing time on things

September 18, 2019 — November 25, 2021

algebra

approximation

Bayes

distributed

dynamical systems

Gaussian

generative

graphical models

Hilbert space

kernel tricks

linear algebra

machine learning

networks

optimization

probability

sciml

signal processing

state space models

statistics

stochastic processes

time series

🏗️🏗️🏗️ Under heavy construction 🏗️🏗️🏗️

Classic flavours together: Gaussian processes and state filters/stochastic differential equations, and random fields as stochastic differential equations.

Not covered: another concept which shares the same keywords but is distinct, namely using Gaussian processes to define state process dynamics or observation distributions.

1 GP regression via state filtering

I am interested in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way Kalman filtering does Wiener filtering. This means you get to solve a GP as an SDE using a state filter.
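A minimal sketch of what that looks like in the simplest case I can think of, the exponential (Ornstein-Uhlenbeck) kernel, where the hidden state is just the function value itself. The function name `ou_gp_filter`, its arguments, and the toy data are all hypothetical, chosen for illustration; nothing here is taken from the papers cited below.

```python
import math


def ou_gp_filter(times, ys, variance, lengthscale, noise):
    """Kalman filter for GP regression with kernel
    k(t, t') = variance * exp(-|t - t'| / lengthscale)
    and i.i.d. Gaussian observation noise of variance `noise`.

    `times` must be sorted in increasing order. Returns the log marginal
    likelihood and the filtered (mean, variance) of f at each time.
    """
    mean, var = 0.0, variance  # stationary prior on the hidden state f(t)
    loglik = 0.0
    filtered = []
    prev_t = None
    for t, y in zip(times, ys):
        if prev_t is not None:
            # Predict: exact discretisation of the OU SDE over the gap t - prev_t.
            a = math.exp(-(t - prev_t) / lengthscale)
            mean = a * mean
            var = a * a * var + variance * (1.0 - a * a)
        # Update: condition the scalar state on the noisy observation y.
        innovation = y - mean
        s = var + noise
        loglik -= 0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        gain = var / s
        mean += gain * innovation
        var -= gain * var
        filtered.append((mean, var))
        prev_t = t
    return loglik, filtered


# Hypothetical toy data, irregularly spaced in time.
loglik, filtered = ou_gp_filter(
    times=[0.0, 0.4, 1.5, 1.7],
    ys=[0.3, 0.1, -0.6, -0.4],
    variance=1.0,
    lengthscale=0.8,
    noise=0.05,
)
print(f"log marginal likelihood: {loglik:.4f}")
```

Each step costs O(1), so the whole pass is O(n) rather than the O(n³) of a dense solve. The last filtered entry coincides with the dense GP posterior at the final time; the earlier ones would need a backward smoothing pass, which I have left out.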

The GP-filtering trick is explained in intro articles (Särkkä, Solin, and Hartikainen 2013; Lindgren, Bolin, and Rue 2021), based on various antecedents (O’Hagan 1978; S. Reece and Roberts 2010; Lindgren, Rue, and Lindström 2011; Särkkä and Hartikainen 2012; J. Hartikainen and Särkkä 2010; Solin 2016; Huber 2014), possibly also (Csató and Opper 2002). Aside: O’Hagan (1978) is an incredible paper that invented several research areas at once (GP regression and surrogate models for experiment design, as well as this) and AFAICT no one noticed at the time. Whittle also did some foundational work, but I cannot find the original paper to read it.

The idea is that if the spectral density of your GP covariance kernel is (or can be well approximated by) a rational function, then it is possible to factorise it into a tractable state space model, using a duality between random fields and stochastic differential equations. That sounds simple enough conceptually; I wonder about the practice. Of course, when you want some complications, such as non-stationary kernels or hierarchical models, this state space inference trick gets more complicated, and posterior distributions are no longer so simple. But possibly it can still work. (This is a research interest of mine.)
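To make the factorisation concrete, here is the simplest worked case, the exponential (Ornstein-Uhlenbeck) kernel; the rational function in question is the kernel's spectral density. The symbols \(\sigma^{2}\), \(\lambda\) and \(q\) are my own notation for the kernel variance, the inverse length scale and the white-noise spectral density, and are not fixed anywhere above. The kernel and its spectral density are

\[ k(\tau)=\sigma^{2} e^{-\lambda|\tau|}, \qquad S(\omega)=\int_{-\infty}^{\infty} k(\tau) e^{-i \omega \tau} \,\mathrm{d} \tau=\frac{2 \sigma^{2} \lambda}{\lambda^{2}+\omega^{2}}. \]

Because \(S\) is rational in \(\omega^{2}\), it splits into a stable transfer function and its conjugate,

\[ S(\omega)=H(i \omega) \, q \, H(-i \omega), \qquad H(i \omega)=\frac{1}{\lambda+i \omega}, \qquad q=2 \sigma^{2} \lambda, \]

and \(H\) is the transfer function of the scalar linear SDE

\[ \mathrm{d} f(t)=-\lambda f(t) \,\mathrm{d} t+\sqrt{q} \,\mathrm{d} W(t), \]

whose stationary variance is \(q /(2 \lambda)=\sigma^{2}\), as it should be. Higher-order rational spectra, such as the half-integer Matérn family, work the same way but with a vector-valued hidden state.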

William J. Wilkinson et al. (2020) introduces a computational toolkit and many worked examples of inference algorithms. Cox, van de Laar, and de Vries (2019) looks like it might be solving a similar problem but I do not yet understand their framing.

This complements, perhaps, the trick of fast Gaussian process calculations on lattices.

Nickisch, Solin, and Grigorevskiy (2018) try to introduce a vocabulary for inference based on this insight, by discussing it in terms of computational primitives:

In time-series data, with D = 1, the data sets tend to become long (or unbounded) when observations accumulate over time. For these time-series models, leveraging sequential state space methods from signal processing makes it possible to solve GP inference problems in linear time complexity O(n) if the underlying GP has Markovian structure (S. Reece and Roberts 2010; J. Hartikainen and Särkkä 2010). This reformulation is exact for Markovian covariance functions (see, e.g., Solin (2016)) such as the exponential, half-integer Matérn, noise, constant, linear, polynomial, Wiener, etc. (and their sums and products).…

While existing literature has focused on the connection between GP regression and state space methods, the computational primitives allowing for inference using general likelihoods in combination with the Laplace approximation (LA), variational Bayes (VB), and assumed density filtering (ADF, a.k.a. single-sweep expectation propagation, EP) schemes has been largely overlooked.… We present a unifying framework for solving computational primitives for non-Gaussian inference schemes in the state space setting, thus directly enabling inference to be done through LA, VB, KL, and ADF/EP.

The following computational primitives allow to cast the covariance approximation in more generic terms:

1. Linear system with “regularized” covariance: \[ \operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{r}):=\left(\mathbf{K}+\mathbf{W}^{-1}\right)^{-1} \mathbf{r} \]
2. Matrix-vector multiplications: \(\operatorname{mvm}_{\mathbf{K}}(\mathbf{r}):=\mathbf{K r}\). For learning we also need \(\frac{\partial \operatorname{mvm}_{\mathbf{K}}(\mathbf{r})}{\partial \boldsymbol{\theta}}\).
3. Log-determinants: \(\operatorname{ld}_{\mathbf{K}}(\mathbf{W}):=\log |\mathbf{B}|\) with symmetric and well-conditioned \(\mathbf{B}=\mathbf{I}+\mathbf{W}^{\frac{1}{2}} \mathbf{K} \mathbf{W}^{\frac{1}{2}}\). For learning, we need derivatives: \(\frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \boldsymbol{\theta}}, \frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \mathbf{W}}\).
4. Predictions need latent mean \(\mathbb{E}\left[f_{*}\right]\) and variance \(\mathbb{V}\left[f_{*}\right]\).

Using these primitives, GP regression can be compactly written as \(\mathbf{W}=\mathbf{I} / \sigma_{n}^{2}\), \(\boldsymbol{\alpha}=\operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{y}-\mathbf{m})\), and \[ \log Z_{\mathrm{GPR}}=-\frac{1}{2}\left[\boldsymbol{\alpha}^{\top} \operatorname{mvm}_{\mathbf{K}}(\boldsymbol{\alpha})+\operatorname{ld}_{\mathbf{K}}(\mathbf{W})+n \log \left(2 \pi \sigma_{n}^{2}\right)\right] \] Approximate inference (LA, VB, KL, ADF/EP), in case of non-Gaussian likelihoods, requires these primitives as necessary building blocks. Depending on the covariance approximation method, e.g. exact, sparse, grid-based, or state space, the four primitives differ in their implementation and computational complexity.
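To check my reading of these primitives, here is a dense, dependency-free sketch of the first three and of the resulting GP regression evidence. The names `solve_K`, `mvm_K` and `ld_K` follow the quote; the helpers `cholesky`, `chol_solve` and `gpr_log_evidence`, the restriction to diagonal \(\mathbf{W}\), and the toy numbers are my own assumptions. One caveat: as far as I can tell, matching the textbook Gaussian log marginal likelihood needs the quadratic term to be \((\mathbf{y}-\mathbf{m})^{\top} \boldsymbol{\alpha}=\boldsymbol{\alpha}^{\top} \operatorname{mvm}_{\mathbf{K}}(\boldsymbol{\alpha})+\sigma_{n}^{2} \boldsymbol{\alpha}^{\top} \boldsymbol{\alpha}\), so the sketch includes the second summand, which is not in the quoted formula.

```python
import math


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def solve_K(K, w, r):
    """(K + W^{-1})^{-1} r, assuming W = diag(w)."""
    n = len(r)
    A = [[K[i][j] + (1.0 / w[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return chol_solve(cholesky(A), r)


def mvm_K(K, r):
    """K r."""
    return [sum(k_ij * r_j for k_ij, r_j in zip(row, r)) for row in K]


def ld_K(K, w):
    """log|B| with B = I + W^{1/2} K W^{1/2}, assuming W = diag(w)."""
    n = len(w)
    B = [[(1.0 if i == j else 0.0) + math.sqrt(w[i]) * K[i][j] * math.sqrt(w[j])
          for j in range(n)] for i in range(n)]
    L = cholesky(B)
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def gpr_log_evidence(K, y, mean, noise):
    """Log marginal likelihood of GP regression, assembled from the primitives."""
    n = len(y)
    w = [1.0 / noise] * n
    r = [y_i - m_i for y_i, m_i in zip(y, mean)]
    alpha = solve_K(K, w, r)
    K_alpha = mvm_K(K, alpha)
    # Data fit r^T alpha, written as alpha^T (K + noise * I) alpha (see caveat above).
    fit = sum(a * (ka + noise * a) for a, ka in zip(alpha, K_alpha))
    return -0.5 * (fit + ld_K(K, w) + n * math.log(2.0 * math.pi * noise))


# Hypothetical 2x2 example.
K = [[1.0, 0.5], [0.5, 2.0]]
print(gpr_log_evidence(K, y=[0.3, -0.4], mean=[0.0, 0.0], noise=0.2))
```

Everything here is O(n³) because the covariance is dense; the point of the state space version is to supply the same four primitives at O(n) cost.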

Recent works I should also inspect include (Chang et al. 2020; Gorad, Zhao, and Särkkä 2020; Nickisch, Solin, and Grigorevskiy 2018; Remes, Heinonen, and Kaski 2018; Solin 2016; William J. Wilkinson et al. 2019a; William J. Wilkinson et al. 2019c; Karvonen and Särkkä 2016).

Ambikasaran et al. (2015) seems to be related but not quite the same — it operates time-wise over inputs but then constructs the GP posterior using rank-1 updates.

2 Spatio-temporal usage

Solving PDEs by filtering! (Cressie and Wikle 2014; Karvonen and Särkkä 2016; Lindgren, Rue, and Lindström 2011; Särkkä and Hartikainen 2012; Särkkä, Solin, and Hartikainen 2013; Solin and Särkkä 2013; Solin 2016; Zammit-Mangion and Wikle 2020)

3 Latent force models

I am going to argue that some latent force models fit in this classification, if I ever get time to define them (M. Álvarez, Luengo, and Lawrence 2009; M. A. Álvarez, Luengo, and Lawrence 2013; Jouni Hartikainen and Särkkä 2011; Jouni Hartikainen, Seppänen, and Särkkä 2012; Steven Reece et al. 2014; Särkkä, Álvarez, and Lawrence 2019).

4 Miscellaneous notes towards implementation

Julia’s DynamicPolynomials.jl implements the MultivariatePolynomials interface, appears to be differentiable, and handles rational polynomials.
TemporalGPs.jl, introduced by Will Tebbutt, is a Julia implementation of this.
The abstract ForneyLab.jl system might relate to this, behind its abstruse framing (Cox, van de Laar, and de Vries 2019).
https://pyro.ai/examples/dkl.html
https://pyro.ai/examples/gp.html
https://pyro.ai/examples/ekf.html
https://julialang.org/blog/2019/01/fluxdiffeq
https://github.com/FluxML/model-zoo/tree/master/contrib/diffeq
http://pyro.ai/examples/gplvm.html
http://pyro.ai/examples/dmm.html
http://docs.pyro.ai/en/stable/contrib.gp.html
https://nbviewer.jupyter.org/github/SheffieldML/notebook/blob/master/GPy/index.ipynb
https://blog.dominodatalab.com/fitting-gaussian-process-models-python/

5 References

Adam, Eleftheriadis, Durrande, et al. 2020. “Doubly Sparse Variational Gaussian Processes.” In AISTATS.

Álvarez, Mauricio, Luengo, and Lawrence. 2009. “Latent Force Models.” In Artificial Intelligence and Statistics.

Álvarez, Mauricio A., Luengo, and Lawrence. 2013. “Linear Latent Force Models Using Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Ambikasaran, Foreman-Mackey, Greengard, et al. 2015. “Fast Direct Methods for Gaussian Processes.” arXiv:1403.6015 [Astro-Ph, Stat].

Bakka, Rue, Fuglstad, et al. 2018. “Spatial Modeling with R-INLA: A Review.” WIREs Computational Statistics.

Bolin. 2016. Models and Methods for Random Fields in Spatial Statistics with Computational Efficiency from Markov Properties.

Bolin, Simas, and Wallin. 2022. “Gaussian Whittle-Matérn Fields on Metric Graphs.”

Chang, Wilkinson, Khan, et al. 2020. “Fast Variational Learning in State-Space Gaussian Process Models.” In MLSP.

Cotter, Roberts, Stuart, et al. 2013. “MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster.” Statistical Science.

Cox, van de Laar, and de Vries. 2019. “A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms.” International Journal of Approximate Reasoning.

Cressie, Shi, and Kang. 2010. “Fixed Rank Filtering for Spatio-Temporal Data.” Journal of Computational and Graphical Statistics.

Cressie, and Wikle. 2014. “Space-Time Kalman Filter.” In Wiley StatsRef: Statistics Reference Online.

Csató, and Opper. 2002. “Sparse On-Line Gaussian Processes.” Neural Computation.

Cunningham, Shenoy, and Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.

Curtain. 1975. “Infinite-Dimensional Filtering.” SIAM Journal on Control.

Dowling, Sokół, and Park. 2021. “Hida-Matérn Kernel.”

Durrande, Adam, Bordeaux, et al. 2019. “Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era.” In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics.

Eleftheriadis, Nicholson, Deisenroth, et al. 2017. “Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30.

Gilboa, Saatçi, and Cunningham. 2015. “Scaling Multidimensional Inference for Structured Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Gorad, Zhao, and Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.”

Grigorievskiy, and Karhunen. 2016. “Gaussian Process Kernels for Popular State-Space Time Series Models.” In 2016 International Joint Conference on Neural Networks (IJCNN).

Grigorievskiy, Lawrence, and Särkkä. 2017. “Parallelizable Sparse Inverse Formulation Gaussian Processes (SpInGP).” arXiv:1610.08035 [Stat].

Hartikainen, J., and Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing.

Hartikainen, Jouni, and Särkkä. 2011. “Sequential Inference for Latent Force Models.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.

Hartikainen, Jouni, Seppänen, and Särkkä. 2012. “State-Space Inference for Non-Linear Latent Force Models with Application to Satellite Orbit Prediction.” In Proceedings of the 29th International Conference on Machine Learning. ICML’12.

Heaps. 2020. “Enforcing Stationarity Through the Prior in Vector Autoregressions.” arXiv:2004.09455 [Stat].

Hensman, Durrande, and Solin. 2018. “Variational Fourier Features for Gaussian Processes.” Journal of Machine Learning Research.

Hildeman, Bolin, and Rychlik. 2019. “Joint Spatial Modeling of Significant Wave Height and Wave Period Using the SPDE Approach.” arXiv:1906.00286 [Stat].

Huber. 2014. “Recursive Gaussian Process: On-Line Regression and Learning.” Pattern Recognition Letters.

Hu, and Steinsland. 2016. “Spatial Modeling with System of Stochastic Partial Differential Equations.” WIREs Computational Statistics.

Karvonen, and Särkkä. 2016. “Approximate State-Space Gaussian Processes via Spectral Transformation.” In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

Kavčić, and Moura. 2000. “Matrices with Banded Inverses: Inversion Algorithms and Factorization of Gauss-Markov Processes.” IEEE Transactions on Information Theory.

Kuzin, Yang, Isupova, et al. 2018. “Ensemble Kalman Filtering for Online Gaussian Process Regression and Learning.” 2018 21st International Conference on Information Fusion (FUSION).

Lemercier, Salvi, Cass, et al. 2021. “SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data.” In Proceedings of the 38th International Conference on Machine Learning.

Lindgren, Bolin, and Rue. 2021. “The SPDE Approach for Gaussian and Non-Gaussian Fields: 10 Years and Still Running.” arXiv:2111.01084 [Stat].

Lindgren, and Rue. 2015. “Bayesian Spatial Modelling with R-INLA.” Journal of Statistical Software.

Lindgren, Rue, and Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Liu, and Röckner. 2015. Stochastic Partial Differential Equations: An Introduction.

Loeliger, Dauwels, Hu, et al. 2007. “The Factor Graph Approach to Model-Based Signal Processing.” Proceedings of the IEEE.

Lord, Powell, and Shardlow. 2014. An Introduction to Computational Stochastic PDEs. Cambridge Texts in Applied Mathematics.

Mbalawata, Särkkä, and Haario. 2013. “Parameter Estimation in Stochastic Differential Equations with Markov Chain Monte Carlo and Non-Linear Kalman Filtering.” Computational Statistics.

Nickisch, Solin, and Grigorevskiy. 2018. “State Space Gaussian Processes with Non-Gaussian Likelihood.” In International Conference on Machine Learning.

O’Hagan. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological).

Opitz, Huser, Bakka, et al. 2018. “INLA Goes Extreme: Bayesian Tail Regression for the Estimation of High Spatio-Temporal Quantiles.” Extremes.

Peluchetti, and Favaro. 2020. “Infinitely Deep Neural Networks as Diffusion Processes.” In International Conference on Artificial Intelligence and Statistics.

Rackauckas, Ma, Martensen, et al. 2020. “Universal Differential Equations for Scientific Machine Learning.” arXiv.org.

Reece, Steven, Ghosh, Rogers, et al. 2014. “Efficient State-Space Inference of Periodic Latent Force Models.” The Journal of Machine Learning Research.

Reece, S., and Roberts. 2010. “An Introduction to Gaussian Processes for the Kalman Filter Expert.” In 2010 13th International Conference on Information Fusion.

Remes, Heinonen, and Kaski. 2017. “Non-Stationary Spectral Kernels.” In Advances in Neural Information Processing Systems 30.

———. 2018. “Neural Non-Stationary Spectral Kernel.” arXiv:1811.10978 [Cs, Stat].

Rue, Riebler, Sørbye, et al. 2016. “Bayesian Computing with INLA: A Review.” arXiv:1604.00860 [Stat].

Saatçi. 2012. “Scalable inference for structured Gaussian process models.”

Särkkä. 2011. “Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In Artificial Neural Networks and Machine Learning – ICANN 2011. Lecture Notes in Computer Science.

———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3.

Särkkä, Álvarez, and Lawrence. 2019. “Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems.” IEEE Transactions on Automatic Control.

Särkkä, and Hartikainen. 2012. “Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In Artificial Intelligence and Statistics.

Särkkä, and Solin. 2019. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks 10.

Särkkä, Solin, and Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine.

Sidén. 2020. Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields. Linköping Studies in Statistics.

Sigrist, Künsch, and Stahel. 2015a. “Spate: An R Package for Spatio-Temporal Modeling with a Stochastic Advection-Diffusion Process.” Journal of Statistical Software.

———. 2015b. “Stochastic Partial Differential Equation Based Modelling of Large Space-Time Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Singer. 2011. “Continuous-Discrete State-Space Modeling of Panel Data with Nonlinear Filter Algorithms.” AStA Advances in Statistical Analysis.

Solin. 2016. “Stochastic Differential Equation Methods for Spatio-Temporal Gaussian Process Regression.”

Solin, and Särkkä. 2013. “Infinite-Dimensional Bayesian Filtering for Detection of Quasiperiodic Phenomena in Spatiotemporal Data.” Physical Review E.

———. 2014. “Explicit Link Between Periodic Covariance Functions and State Space Models.” In Artificial Intelligence and Statistics.

———. 2020. “Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.” Statistics and Computing.

Tobar. 2019. “Band-Limited Gaussian Processes: The Sinc Kernel.” Advances in Neural Information Processing Systems.

Tzinis, Wang, and Smaragdis. 2020. “Sudo Rm -Rf: Efficient Networks for Universal Audio Source Separation.”

Valenzuela, and Tobar. 2019. “Low-Pass Filtering as Bayesian Inference.” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Vandenberg-Rodes, and Shahbaba. 2015. “Dependent Matérn Processes for Multivariate Time Series.” arXiv:1502.03466 [Stat].

Wilkinson, William J., Andersen, Reiss, et al. 2019a. “End-to-End Probabilistic Inference for Nonstationary Audio Analysis.” arXiv:1901.11436 [Cs, Eess, Stat].

Wilkinson, William J., Andersen, Reiss, et al. 2019b. “Unifying Probabilistic Models for Time-Frequency Analysis.” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Wilkinson, William J., Chang, Andersen, et al. 2019c. “Global Approximate Inference via Local Linearisation for Temporal Gaussian Processes.”

Wilkinson, William J., Chang, Andersen, et al. 2020. “State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes.” In ICML.

Wilkinson, William J., Särkkä, and Solin. 2021. “Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.”

Wilson, Borovitskiy, Terenin, et al. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning.

Wilson, Borovitskiy, Terenin, et al. 2021. “Pathwise Conditioning of Gaussian Processes.” Journal of Machine Learning Research.

Zammit-Mangion, and Wikle. 2020. “Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting.” Spatial Statistics.