Classic flavours together: Gaussian processes and state filters/stochastic differential equations.

I am interested here in the trick that makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way that Kalman filtering localises Wiener filtering. In effect, you get to solve a GP regression problem as an SDE, using filtering recursions.

Not covered here: there is another concept that involves the same keywords but is distinct, namely using Gaussian processes to define state-process dynamics or observation distributions. I have no use for that at the moment.

This trick is explained in an introductory article by Särkkä, Solin, and Hartikainen (2013), building on previous work (O’Hagan 1978; Reece and Roberts 2010; Lindgren, Rue, and Lindström 2011; Särkkä and Hartikainen 2012; Hartikainen and Särkkä 2010; Solin 2016; Huber 2014), possibly also (Csató and Opper 2002). Aside: O’Hagan (1978) is an incredible paper that invented several research areas at once (GP regression and surrogate experiment design, as well as this one) and AFAICT no one noticed at the time.

Recent works I should also check here include Karvonen and Särkkä (2016), Nickisch, Solin, and Grigorevskiy (2018), and Chang et al. (2020).

The idea is that if your covariance kernel has a spectral density that is, or can be well approximated by, a rational function, then it is possible to factorise it tractably into a state space model, which makes inference cheap thanks to the favourable properties of such models. That sounds simple enough conceptually; I wonder about the practice. Of course, when you add complications, such as non-stationary kernels or hierarchical models, this state space inference trick gets more complicated, and posterior distributions are no longer so simple.
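For concreteness, here is a minimal sketch (my own toy code and parameter choices, not taken from any of the cited papers) of what the factorisation buys you for the Matérn-3/2 kernel, whose spectral density is rational. The companion-form realisation below is the standard one (see e.g. Särkkä, Solin, and Hartikainen 2013); the kernel is recovered exactly from the matrix exponential of the 2-dimensional drift.

```python
import numpy as np
from scipy.linalg import expm

# Matern-3/2 kernel: k(tau) = s2 * (1 + lam*|tau|) * exp(-lam*|tau|), lam = sqrt(3)/ell.
# Its spectral density is rational, so it admits an exact 2-dimensional
# state-space realisation dx = F x dt + L dW, f(t) = H x(t).
ell, s2 = 0.7, 2.0
lam = np.sqrt(3.0) / ell
F = np.array([[0.0, 1.0],
              [-lam**2, -2.0 * lam]])   # companion-form drift
H = np.array([1.0, 0.0])                # observe the first state component
# Stationary state covariance (solves the Lyapunov equation F Pinf + Pinf F' + L q L' = 0):
Pinf = np.diag([s2, s2 * lam**2])

def k_state_space(tau):
    """Recover k(tau) = H expm(F tau) Pinf H' for tau >= 0."""
    return H @ expm(F * tau) @ Pinf @ H

def k_closed_form(tau):
    return s2 * (1.0 + lam * tau) * np.exp(-lam * tau)

taus = np.linspace(0.0, 3.0, 7)
err = max(abs(k_state_space(t) - k_closed_form(t)) for t in taus)
print(err)  # agrees to machine precision
```

The same construction handles any kernel whose spectrum is rational in \(\omega^2\); for half-integer Matérn kernels the state dimension is \(\nu + 1/2 + 1/2\).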

Nickisch, Solin, and Grigorevskiy (2018) try to introduce a vocabulary for inference based on this insight, discussing it in terms of *computational primitives*:

In time-series data, with D = 1, the data sets tend to become long (or unbounded) when observations accumulate over time. For these time-series models, leveraging sequential state space methods from signal processing makes it possible to solve GP inference problems in linear time complexity O(n) if the underlying GP has Markovian structure (Reece and Roberts 2010; Hartikainen and Särkkä 2010). This reformulation is exact for Markovian covariance functions (see, e.g., Solin (2016)) such as the exponential, half-integer Matérn, noise, constant, linear, polynomial, Wiener, etc. (and their sums and products).…
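As a sanity check of that claim, here is a self-contained sketch (my own toy code, not from the cited papers) showing that for the exponential (Matérn-1/2, i.e. Ornstein–Uhlenbeck) kernel, a Kalman filter plus RTS smoother over the equivalent one-dimensional SDE reproduces the exact O(n³) GP posterior mean in O(n):

```python
import numpy as np

# Exponential / Matern-1/2 kernel k(t,t') = s2 * exp(-|t-t'|/ell) is the
# covariance of the OU process dx = -(x/ell) dt + sqrt(2*s2/ell) dW.
ell, s2, noise = 1.0, 1.5, 0.1
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 40))
y = np.sin(t) + np.sqrt(noise) * rng.normal(size=t.size)

# Exact GP regression posterior mean, O(n^3):
K = s2 * np.exp(-np.abs(t[:, None] - t[None, :]) / ell)
gp_mean = K @ np.linalg.solve(K + noise * np.eye(t.size), y)

# Kalman filter + RTS smoother over the same model, O(n):
def kalman_smooth(t, y):
    n = t.size
    mf, Pf = np.zeros(n), np.zeros(n)     # filtered moments
    m, P = 0.0, s2                        # stationary prior
    for k in range(n):
        if k > 0:                         # exact discretisation of the SDE
            A = np.exp(-(t[k] - t[k - 1]) / ell)
            m, P = A * m, A * A * P + s2 * (1.0 - A * A)
        gain = P / (P + noise)            # measurement update
        m, P = m + gain * (y[k] - m), P - gain * P
        mf[k], Pf[k] = m, P
    ms = mf.copy()                        # backward (RTS) smoothing pass
    for k in range(n - 2, -1, -1):
        A = np.exp(-(t[k + 1] - t[k]) / ell)
        Pp = A * A * Pf[k] + s2 * (1.0 - A * A)
        G = Pf[k] * A / Pp
        ms[k] = mf[k] + G * (ms[k + 1] - A * mf[k])
    return ms

kf_mean = kalman_smooth(t, y)
print(np.max(np.abs(kf_mean - gp_mean)))  # agrees to numerical precision
```

The filter never forms an \(n \times n\) matrix; each observation costs O(1) work here because the state is one-dimensional (O(d³) in general, for state dimension d).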

William J. Wilkinson et al. (2020) introduces a computational toolkit and many worked examples of inference algorithms. Cox, van de Laar, and de Vries (2019) looks like it might be solving a similar problem, but I do not yet understand their framing.

This complements, perhaps, the trick of fast Gaussian process calculations on lattices.

While existing literature has focused on the connection between GP regression and state space methods, the computational primitives allowing for inference using general likelihoods in combination with the Laplace approximation (LA), variational Bayes (VB), and assumed density filtering (ADF, a.k.a. single-sweep expectation propagation, EP) schemes have been largely overlooked.… We present a unifying framework for solving computational primitives for non-Gaussian inference schemes in the state space setting, thus directly enabling inference to be done through LA, VB, KL, and ADF/EP.

The following computational primitives allow us to cast the covariance approximation in more generic terms:

1. Linear system with “regularized” covariance: \(\operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{r}):=\left(\mathbf{K}+\mathbf{W}^{-1}\right)^{-1} \mathbf{r}\).
2. Matrix-vector multiplications: \(\operatorname{mvm}_{\mathbf{K}}(\mathbf{r}):=\mathbf{K} \mathbf{r}\). For learning we also need \(\frac{\partial \operatorname{mvm}_{\mathbf{K}}(\mathbf{r})}{\partial \boldsymbol{\theta}}\).
3. Log-determinants: \(\operatorname{ld}_{\mathbf{K}}(\mathbf{W}):=\log |\mathbf{B}|\) with symmetric and well-conditioned \(\mathbf{B}=\mathbf{I}+\mathbf{W}^{\frac{1}{2}} \mathbf{K} \mathbf{W}^{\frac{1}{2}}\). For learning, we need the derivatives \(\frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \boldsymbol{\theta}}\) and \(\frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \mathbf{W}}\).
4. Predictions need the latent mean \(\mathbb{E}\left[f_{*}\right]\) and variance \(\mathbb{V}\left[f_{*}\right]\).

Using these primitives, GP regression can be compactly written as \(\mathbf{W}=\mathbf{I} / \sigma_{n}^{2}\), \(\boldsymbol{\alpha}=\operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{y}-\mathbf{m})\), and
\[
\log Z_{\mathrm{GPR}} = -\frac{1}{2}\left[\boldsymbol{\alpha}^{\top} \operatorname{mvm}_{\mathbf{K}}(\boldsymbol{\alpha})+\operatorname{ld}_{\mathbf{K}}(\mathbf{W})+n \log \left(2 \pi \sigma_{n}^{2}\right)\right].
\]
Approximate inference (LA, VB, KL, ADF/EP), required in the case of non-Gaussian likelihoods, uses these primitives as building blocks. Depending on the covariance approximation method, e.g. exact, sparse, grid-based, or state space, the four primitives differ in their implementation and computational complexity.
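For orientation, here are naive dense (O(n³)) reference implementations of those primitives; this is my own sketch, and the point of the state-space machinery is to replace each with an O(n) filtering pass. Note I write the quadratic term of the evidence as \((\mathbf{y}-\mathbf{m})^\top\boldsymbol{\alpha}\), the standard Gaussian marginal-likelihood form.

```python
import numpy as np

def solve_K(K, W, r):
    # solve_K(W, r) := (K + W^{-1})^{-1} r
    return np.linalg.solve(K + np.linalg.inv(W), r)

def mvm_K(K, r):
    # mvm_K(r) := K r
    return K @ r

def ld_K(K, W):
    # ld_K(W) := log|B|, B = I + W^{1/2} K W^{1/2}; Cholesky for stability
    Wh = np.sqrt(W)                      # assumes diagonal W
    B = np.eye(K.shape[0]) + Wh @ K @ Wh
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(B))))

def log_Z_gpr(K, y, m, sigma_n2):
    # Gaussian-likelihood GP evidence, written in terms of the primitives.
    n = y.size
    W = np.eye(n) / sigma_n2
    alpha = solve_K(K, W, y - m)
    quad = (y - m) @ alpha               # equals alpha' (K + sigma_n2 I) alpha
    return -0.5 * (quad + ld_K(K, W) + n * np.log(2.0 * np.pi * sigma_n2))

# Toy usage on an RBF Gram matrix:
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 5.0, 20))
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)
y = rng.normal(size=t.size)
print(log_Z_gpr(K, y, np.zeros(t.size), 0.2))
```

The \(\mathbf{B}\)-matrix formulation of the log-determinant is what keeps this numerically stable even when \(\mathbf{K}\) itself is near-singular, since \(\mathbf{B}\) stays well conditioned.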

Latest papers (Chang et al. 2020; Gorad, Zhao, and Särkkä 2020; Nickisch, Solin, and Grigorevskiy 2018; Remes, Heinonen, and Kaski 2018; Särkkä, Solin, and Hartikainen 2013; Solin 2016; William J. Wilkinson et al. 2019b; William J. Wilkinson et al. 2019).

## Spatio-temporal usage

Solving PDEs by filtering!

Latest papers (Cressie and Wikle 2014; Karvonen and Särkkä 2016; Lindgren, Rue, and Lindström 2011; Särkkä and Hartikainen 2012; Särkkä, Solin, and Hartikainen 2013; Solin and Särkkä 2013; Solin 2016; Zammit-Mangion and Wikle 2020)

## miscellaneous notes towards implementations

- Julia’s DynamicPolynomials.jl implements the MultivariatePolynomials.jl interface; it appears to be differentiable and handles rational polynomials.
- TemporalGPs.jl, introduced by Will Tebbutt, is a Julia implementation of this approach.
- The abstract ForneyLab.jl system might be doing this behind its abstruse framing (Cox, van de Laar, and de Vries 2019).
- http://pyro.ai/examples/dkl.html
- http://pyro.ai/examples/gp.html
- http://pyro.ai/examples/ekf.html
- https://julialang.org/blog/2019/01/fluxdiffeq
- https://github.com/FluxML/model-zoo/tree/master/contrib/diffeq
- http://pyro.ai/examples/gplvm.html
- http://pyro.ai/examples/dmm.html
- http://docs.pyro.ai/en/stable/contrib.gp.html
- https://nbviewer.jupyter.org/github/SheffieldML/notebook/blob/master/GPy/index.ipynb
- https://blog.dominodatalab.com/fitting-gaussian-process-models-python/

*AISTATS*. http://arxiv.org/abs/2001.05363.

*MLSP*, 6.

*Statistical Science*28 (3): 424–46. https://doi.org/10.1214/13-STS421.

*International Journal of Approximate Reasoning*104 (January): 185–204. https://doi.org/10.1016/j.ijar.2018.11.002.

*Wiley StatsRef: Statistics Reference Online*. American Cancer Society. https://doi.org/10.1002/9781118445112.stat07813.

*Neural Computation*14 (3): 641–68. https://doi.org/10.1162/089976602317250933.

*Proceedings of the 25th International Conference on Machine Learning*, 192–99. ICML ’08. New York, NY, USA: ACM Press. https://doi.org/10.1145/1390156.1390181.

*SIAM Journal on Control*13 (1): 89–104. https://doi.org/10.1137/0313005.

*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc. http://papers.nips.cc/paper/7115-identification-of-gaussian-process-state-space-models.pdf.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*37 (2): 424–36. https://doi.org/10.1109/TPAMI.2013.192.

*2016 International Joint Conference on Neural Networks (IJCNN)*, 3354–63. Vancouver, BC, Canada: IEEE. https://doi.org/10.1109/IJCNN.2016.7727628.

*2010 IEEE International Workshop on Machine Learning for Signal Processing*, 379–84. Kittila, Finland: IEEE. https://doi.org/10.1109/MLSP.2010.5589113.

*Journal of Machine Learning Research*18 (151): 1–52. http://jmlr.org/papers/v18/16-579.html.

*Pattern Recognition Letters*45 (August): 85–91. https://doi.org/10.1016/j.patrec.2014.03.004.

*2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)*, 1–6. Vietri sul Mare, Salerno, Italy: IEEE. https://doi.org/10.1109/MLSP.2016.7738812.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*73 (4): 423–98. https://doi.org/10.1111/j.1467-9868.2011.00777.x.

*Proceedings of the IEEE*95 (6): 1295–1322. https://doi.org/10.1109/JPROC.2007.896497.

*Computational Statistics*28 (3): 1195–1223. https://doi.org/10.1007/s00180-012-0352-y.

*International Conference on Machine Learning*, 3789–98. http://proceedings.mlr.press/v80/nickisch18a.html.

*Journal of the Royal Statistical Society: Series B (Methodological)*40 (1): 1–24. https://doi.org/10.1111/j.2517-6161.1978.tb01643.x.

*International Conference on Artificial Intelligence and Statistics*, 1126–36. PMLR. http://proceedings.mlr.press/v108/peluchetti20a.html.

*2010 13th International Conference on Information Fusion*, 1–9. https://doi.org/10.1109/ICIF.2010.5711863.

*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4642–51. Curran Associates, Inc. http://papers.nips.cc/paper/7050-non-stationary-spectral-kernels.pdf.

*Artificial Neural Networks and Machine Learning – ICANN 2011*, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-21738-8_20.

*Bayesian Filtering and Smoothing*. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.461.4042&rep=rep1&type=pdf.

*Artificial Intelligence and Statistics*. http://www.jmlr.org/proceedings/papers/v22/sarkka12.html.

*IEEE Signal Processing Magazine*30 (4, 4): 51–61. https://doi.org/10.1109/MSP.2013.2246292.

*AStA Advances in Statistical Analysis*95 (4): 375–413. https://doi.org/10.1007/s10182-011-0172-3.

*Physical Review E*88 (5): 052909. https://doi.org/10.1103/PhysRevE.88.052909.

*Artificial Intelligence and Statistics*, 904–12. http://proceedings.mlr.press/v33/solin14.html.

*Statistics and Computing*30 (2): 419–46. https://doi.org/10.1007/s11222-019-09886-w.

*Advances in Neural Information Processing Systems*32: 12749–59. https://proceedings.neurips.cc/paper/2019/hash/ccce2fab7336b8bc8362d115dec2d5a2-Abstract.html.

*ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 3367–71. https://doi.org/10.1109/ICASSP.2019.8683798.

*ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 3352–56. https://doi.org/10.1109/ICASSP.2019.8682306.

*ICML*. http://arxiv.org/abs/2007.05994.

*Spatial Statistics*37 (June): 100408. https://doi.org/10.1016/j.spasta.2020.100408.