# Hidden Markov Model inference for Gaussian Process regression

Two classic flavours together: Gaussian processes and state filters / stochastic differential equations.

I am interested here in the trick that makes certain Gaussian process regression problems tractable by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way that Kalman filtering makes Wiener filtering local. This means you get to solve a GP regression problem as an SDE.
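To make that concrete, here is a toy sketch (all names my own; not from any particular library) of O(n) GP regression with a Matérn-3/2 kernel, using its standard state-space form and a Kalman filter over the observation times:

```python
# Sketch: linear-time GP regression via Kalman filtering, assuming the
# standard state-space form of the Matérn-3/2 kernel. Function and
# variable names here are hypothetical, for illustration only.
import numpy as np
from scipy.linalg import expm

def matern32_ss(lengthscale, variance):
    """State-space form of the Matérn-3/2 kernel: dx = F x dt + L dW."""
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
    H = np.array([[1.0, 0.0]])                     # observe the first state
    Pinf = np.diag([variance, lam**2 * variance])  # stationary covariance
    return F, H, Pinf

def kalman_gp_regress(t, y, lengthscale=1.0, variance=1.0, noise=0.1):
    """Filtered posterior mean of f at the observation times, in O(n)."""
    F, H, Pinf = matern32_ss(lengthscale, variance)
    m, P = np.zeros(2), Pinf.copy()
    means, prev = [], t[0]
    for tk, yk in zip(t, y):
        dt = tk - prev
        A = expm(F * dt)
        Q = Pinf - A @ Pinf @ A.T  # exact discretisation, stationary prior
        m, P = A @ m, A @ P @ A.T + Q
        # Kalman update with Gaussian observation noise
        S = (H @ P @ H.T).item() + noise
        K = P @ H.T / S
        m = m + K.ravel() * (yk - (H @ m).item())
        P = P - K @ H @ P
        means.append((H @ m).item())
        prev = tk
    return np.array(means)
```

A smoother pass (Rauch-Tung-Striebel) would recover the full GP posterior; the filter alone already shows the linear-time structure.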

Not covered here: there is another concept which includes the same keywords but is distinct, namely using Gaussian processes to define the state process dynamics or the observation distribution. I have no use for that at the moment.

This trick is explained in an intro article in , based on previous work , possibly also . Aside: is an incredible paper that invented several research areas at once (GP regression, surrogate experiment design, as well as this) and AFAICT no one noticed at the time.

Recent works I should also check here include [;;;].

The idea is that if your covariance kernel has, or can be well approximated by, a rational spectral density, then it can be tractably factorised into a state space model, which makes inference cheap thanks to the favourable properties of such models. That sounds simple enough conceptually; I wonder about the practice. Of course, when you want some complications, such as non-stationary kernels or hierarchical models, this state-space inference trick gets more complicated, and posterior distributions are no longer so simple.
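The simplest instance of that factorisation I know of: the exponential (Matérn-1/2) kernel has rational spectral density $S(\omega) \propto (1/\ell^2 + \omega^2)^{-1}$, which factorises into an Ornstein-Uhlenbeck SDE. A minimal numerical check of this (my own sketch, assuming the standard OU parameterisation):

```python
# Check that the exponential kernel sigma2 * exp(-|tau| / ell) is the
# covariance of the OU process dx = -(1/ell) x dt + dW, where the driving
# white noise has spectral density q = 2 * sigma2 / ell.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

ell, sigma2 = 2.0, 1.5                  # illustrative hyperparameters
F = np.array([[-1.0 / ell]])            # OU drift matrix
L = np.array([[1.0]])                   # noise loading
q = 2.0 * sigma2 / ell                  # white-noise spectral density

# Stationary covariance solves the Lyapunov equation
# F Pinf + Pinf F^T + L q L^T = 0.
Pinf = solve_continuous_lyapunov(F, -q * L @ L.T)

# The state-space covariance k(tau) = Pinf exp(F^T tau) should reproduce
# the exponential kernel for tau >= 0.
taus = np.linspace(0.0, 5.0, 20)
k_ss = Pinf[0, 0] * np.exp(F[0, 0] * taus)
k_exact = sigma2 * np.exp(-taus / ell)
```

Higher-order Matérns work the same way with a higher-dimensional state; non-rational spectra (e.g. the squared exponential) need a rational approximation first.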

tries to introduce a vocabulary for inference based on this insight, by discussing it in terms of computational primitives:

> In time-series data, with D = 1, the data sets tend to become long (or unbounded) when observations accumulate over time. For these time-series models, leveraging sequential state space methods from signal processing makes it possible to solve GP inference problems in linear time complexity O(n) if the underlying GP has Markovian structure . This reformulation is exact for Markovian covariance functions (see, e.g., ) such as the exponential, half-integer Matérn, noise, constant, linear, polynomial, Wiener, etc. (and their sums and products). …

introduces a computational toolkit and many worked examples of inference algorithms. looks like it might be solving a similar problem, but I do not yet understand their framing.

This complements, perhaps, the trick of fast Gaussian process calculations on lattices.
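For reference, the lattice trick in miniature (my own toy example): for a product kernel evaluated on a Cartesian grid, the Gram matrix is a Kronecker product, so matrix-vector products and log-determinants only ever touch the small factors:

```python
# Toy demonstration of Kronecker structure for a product kernel on a grid.
# The kernel, grid sizes, and jitter are illustrative choices, not from
# any particular paper.
import numpy as np

def rbf(x, ell=0.5):
    """Squared-exponential Gram matrix for 1-d inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

n1, n2 = 5, 7
K1 = rbf(np.linspace(0, 1, n1)) + 1e-6 * np.eye(n1)
K2 = rbf(np.linspace(0, 1, n2)) + 1e-6 * np.eye(n2)

# On the full n1*n2 grid, the Gram matrix is the Kronecker product.
K_full = np.kron(K1, K2)

# Matrix-vector product without forming K_full:
# (K1 kron K2) vec(Y) = vec(K1 Y K2^T) for row-major vec.
rng = np.random.default_rng(0)
y = rng.normal(size=n1 * n2)
Kv_kron = (K1 @ y.reshape(n1, n2) @ K2.T).ravel()
Kv_full = K_full @ y

# Log-determinant from the small factors alone:
# log|K1 kron K2| = n2 log|K1| + n1 log|K2|.
ld_kron = n2 * np.linalg.slogdet(K1)[1] + n1 * np.linalg.slogdet(K2)[1]
ld_full = np.linalg.slogdet(K_full)[1]
```

The point is that nothing of size $n_1 n_2 \times n_1 n_2$ is ever needed, which is the same kind of structural saving the state-space trick buys in the time dimension.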

There is a good attempt in to harmonise the language and make this general, by discussing computational primitives:


> While existing literature has focused on the connection between GP regression and state space methods, the computational primitives allowing for inference using general likelihoods in combination with the Laplace approximation (LA), variational Bayes (VB), and assumed density filtering (ADF, a.k.a. single-sweep expectation propagation, EP) schemes has been largely overlooked. … We present a unifying framework for solving computational primitives for non-Gaussian inference schemes in the state space setting, thus directly enabling inference to be done through LA, VB, KL, and ADF/EP.

> The following computational primitives allow to cast the covariance approximation in more generic terms:
>
> 1. Linear system with “regularized” covariance: $$\operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{r}) := \left(\mathbf{K}+\mathbf{W}^{-1}\right)^{-1} \mathbf{r}.$$
> 2. Matrix-vector multiplications: $$\operatorname{mvm}_{\mathbf{K}}(\mathbf{r}) := \mathbf{K}\mathbf{r}.$$ For learning we also need $$\frac{\partial \operatorname{mvm}_{\mathbf{K}}(\mathbf{r})}{\partial \boldsymbol{\theta}}.$$
> 3. Log-determinants: $$\operatorname{ld}_{\mathbf{K}}(\mathbf{W}) := \log |\mathbf{B}|$$ with symmetric and well-conditioned $$\mathbf{B} = \mathbf{I} + \mathbf{W}^{\frac{1}{2}} \mathbf{K} \mathbf{W}^{\frac{1}{2}}.$$ For learning, we need derivatives: $$\frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \boldsymbol{\theta}}, \quad \frac{\partial \operatorname{ld}_{\mathbf{K}}(\mathbf{W})}{\partial \mathbf{W}}.$$
> 4. Predictions need latent mean $$\mathbb{E}\left[f_{*}\right]$$ and variance $$\mathbb{V}\left[f_{*}\right]$$.
>
> Using these primitives, GP regression can be compactly written as $$\mathbf{W} = \mathbf{I}/\sigma_n^2, \quad \boldsymbol{\alpha} = \operatorname{solve}_{\mathbf{K}}(\mathbf{W}, \mathbf{y}-\mathbf{m}),$$ and $$\log Z_{\mathrm{GPR}} = -\frac{1}{2}\left[\boldsymbol{\alpha}^{\top} \operatorname{mvm}_{\mathbf{K}}(\boldsymbol{\alpha}) + \operatorname{ld}_{\mathbf{K}}(\mathbf{W}) + n \log\left(2 \pi \sigma_n^2\right)\right].$$ Approximate inference (LA, VB, KL, ADF/EP), in case of non-Gaussian likelihoods, requires these primitives as necessary building blocks. Depending on the covariance approximation method, e.g. exact, sparse, grid-based, or state space, the four primitives differ in their implementation and computational complexity.
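For my own intuition, here are dense reference implementations of those primitives (my sketch; a state-space backend would expose the same interface but compute each in O(n) by filtering and smoothing instead of O(n³) dense algebra):

```python
# Dense versions of the quoted primitives, assuming a diagonal W.
import numpy as np

def solve_K(K, W, r):
    """solve_K(W, r) := (K + W^{-1})^{-1} r."""
    return np.linalg.solve(K + np.linalg.inv(W), r)

def mvm_K(K, r):
    """mvm_K(r) := K r."""
    return K @ r

def ld_K(K, W):
    """ld_K(W) := log|B|, B = I + W^{1/2} K W^{1/2} (W diagonal here)."""
    Wh = np.sqrt(W)
    B = np.eye(len(K)) + Wh @ K @ Wh
    return np.linalg.slogdet(B)[1]
```

For GP regression with $\mathbf{W} = \mathbf{I}/\sigma_n^2$, the primitive log-determinant relates to the usual one via $\log|\mathbf{K} + \sigma_n^2\mathbf{I}| = \operatorname{ld}_{\mathbf{K}}(\mathbf{W}) + n \log \sigma_n^2$, which is why $\mathbf{B}$ can stay well-conditioned even when $\sigma_n^2$ is tiny.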

## Spatio-temporal usage

Solving PDEs by filtering!

## Miscellaneous notes towards implementations

Adam, Vincent, Stefanos Eleftheriadis, Nicolas Durrande, Artem Artemev, and James Hensman. 2020. “Doubly Sparse Variational Gaussian Processes.” In AISTATS. http://arxiv.org/abs/2001.05363.
Chang, Paul E, William J Wilkinson, Mohammad Emtiyaz Khan, and Arno Solin. 2020. “Fast Variational Learning in State-Space Gaussian Process Models.” In MLSP, 6.
Cotter, S. L., G. O. Roberts, A. M. Stuart, and D. White. 2013. “MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster.” Statistical Science 28 (3): 424–46. https://doi.org/10.1214/13-STS421.
Cox, Marco, Thijs van de Laar, and Bert de Vries. 2019. “A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms.” International Journal of Approximate Reasoning 104 (January): 185–204. https://doi.org/10.1016/j.ijar.2018.11.002.
Cressie, Noel, and Christopher K. Wikle. 2014. “Space-Time Kalman Filter.” In Wiley StatsRef: Statistics Reference Online. American Cancer Society. https://doi.org/10.1002/9781118445112.stat07813.
Csató, Lehel, and Manfred Opper. 2002. “Sparse On-Line Gaussian Processes.” Neural Computation 14 (3): 641–68. https://doi.org/10.1162/089976602317250933.
Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press. https://doi.org/10.1145/1390156.1390181.
Curtain, Ruth F. 1975. “Infinite-Dimensional Filtering.” SIAM Journal on Control 13 (1): 89–104. https://doi.org/10.1137/0313005.
Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. “Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc. http://papers.nips.cc/paper/7115-identification-of-gaussian-process-state-space-models.pdf.
Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015. “Scaling Multidimensional Inference for Structured Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2): 424–36. https://doi.org/10.1109/TPAMI.2013.192.
Gorad, Ajinkya, Zheng Zhao, and Simo Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In, 6.
Grigorievskiy, Alexander, and Juha Karhunen. 2016. “Gaussian Process Kernels for Popular State-Space Time Series Models.” In 2016 International Joint Conference on Neural Networks (IJCNN), 3354–63. Vancouver, BC, Canada: IEEE. https://doi.org/10.1109/IJCNN.2016.7727628.
Grigorievskiy, Alexander, Neil Lawrence, and Simo Särkkä. 2017. “Parallelizable Sparse Inverse Formulation Gaussian Processes (SpInGP).” In. http://arxiv.org/abs/1610.08035.
Hartikainen, J., and S. Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE. https://doi.org/10.1109/MLSP.2010.5589113.
Hensman, James, Nicolas Durrande, and Arno Solin. 2018. “Variational Fourier Features for Gaussian Processes.” Journal of Machine Learning Research 18 (151): 1–52. http://jmlr.org/papers/v18/16-579.html.
Huber, Marco F. 2014. “Recursive Gaussian Process: On-Line Regression and Learning.” Pattern Recognition Letters 45 (August): 85–91. https://doi.org/10.1016/j.patrec.2014.03.004.
Karvonen, Toni, and Simo Särkkä. 2016. “Approximate State-Space Gaussian Processes via Spectral Transformation.” In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. Vietri sul Mare, Salerno, Italy: IEEE. https://doi.org/10.1109/MLSP.2016.7738812.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98. https://doi.org/10.1111/j.1467-9868.2011.00777.x.
Loeliger, Hans-Andrea, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R. Kschischang. 2007. “The Factor Graph Approach to Model-Based Signal Processing.” Proceedings of the IEEE 95 (6): 1295–1322. https://doi.org/10.1109/JPROC.2007.896497.
Mbalawata, Isambi S., Simo Särkkä, and Heikki Haario. 2013. “Parameter Estimation in Stochastic Differential Equations with Markov Chain Monte Carlo and Non-Linear Kalman Filtering.” Computational Statistics 28 (3): 1195–1223. https://doi.org/10.1007/s00180-012-0352-y.
Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018. “State Space Gaussian Processes with Non-Gaussian Likelihood.” In International Conference on Machine Learning, 3789–98. http://proceedings.mlr.press/v80/nickisch18a.html.
O’Hagan, A. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24. https://doi.org/10.1111/j.2517-6161.1978.tb01643.x.
Peluchetti, Stefano, and Stefano Favaro. 2020. “Infinitely Deep Neural Networks as Diffusion Processes.” In International Conference on Artificial Intelligence and Statistics, 1126–36. PMLR. http://proceedings.mlr.press/v108/peluchetti20a.html.
Rackauckas, Christopher, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, and Ali Ramadhan. 2020. “Universal Differential Equations for Scientific Machine Learning.” January 13, 2020. https://arxiv.org/abs/2001.04385v1.
Reece, S., and S. Roberts. 2010. “An Introduction to Gaussian Processes for the Kalman Filter Expert.” In 2010 13th International Conference on Information Fusion, 1–9. https://doi.org/10.1109/ICIF.2010.5711863.
Remes, Sami, Markus Heinonen, and Samuel Kaski. 2017. “Non-Stationary Spectral Kernels.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4642–51. Curran Associates, Inc. http://papers.nips.cc/paper/7050-non-stationary-spectral-kernels.pdf.
———. 2018. “Neural Non-Stationary Spectral Kernel.” November 27, 2018. http://arxiv.org/abs/1811.10978.
Saatçi, Yunus. 2012. “Scalable Inference for Structured Gaussian Process Models.” Ph.D., University of Cambridge. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610016.
Särkkä, Simo. 2011. “Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In Artificial Neural Networks and Machine Learning – ICANN 2011, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-21738-8_20.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.461.4042&rep=rep1&type=pdf.
Särkkä, Simo, and Jouni Hartikainen. 2012. “Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In Artificial Intelligence and Statistics. http://www.jmlr.org/proceedings/papers/v22/sarkka12.html.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine 30 (4): 51–61. https://doi.org/10.1109/MSP.2013.2246292.
Singer, Hermann. 2011. “Continuous-Discrete State-Space Modeling of Panel Data with Nonlinear Filter Algorithms.” AStA Advances in Statistical Analysis 95 (4): 375–413. https://doi.org/10.1007/s10182-011-0172-3.
Solin, Arno. 2016. “Stochastic Differential Equation Methods for Spatio-Temporal Gaussian Process Regression.” Aalto University. https://aaltodoc.aalto.fi:443/handle/123456789/19842.
Solin, Arno, and Simo Särkkä. 2013. “Infinite-Dimensional Bayesian Filtering for Detection of Quasiperiodic Phenomena in Spatiotemporal Data.” Physical Review E 88 (5): 052909. https://doi.org/10.1103/PhysRevE.88.052909.
———. 2014. “Explicit Link Between Periodic Covariance Functions and State Space Models.” In Artificial Intelligence and Statistics, 904–12. http://proceedings.mlr.press/v33/solin14.html.
———. 2020. “Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.” Statistics and Computing 30 (2): 419–46. https://doi.org/10.1007/s11222-019-09886-w.
Tobar, Felipe. 2019. “Band-Limited Gaussian Processes: The Sinc Kernel.” Advances in Neural Information Processing Systems 32: 12749–59. https://proceedings.neurips.cc/paper/2019/hash/ccce2fab7336b8bc8362d115dec2d5a2-Abstract.html.
Tzinis, Efthymios, Zhepei Wang, and Paris Smaragdis. 2020. “Sudo Rm -Rf: Efficient Networks for Universal Audio Source Separation.” In, 6.
Valenzuela, C., and F. Tobar. 2019. “Low-Pass Filtering as Bayesian Inference.” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3367–71. https://doi.org/10.1109/ICASSP.2019.8683798.
Vandenberg-Rodes, Alexander, and Babak Shahbaba. 2015. “Dependent Matérn Processes for Multivariate Time Series.” February 11, 2015. http://arxiv.org/abs/1502.03466.
Wilkinson, William J., M. Riis Andersen, J. D. Reiss, D. Stowell, and A. Solin. 2019a. “Unifying Probabilistic Models for Time-Frequency Analysis.” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3352–56. https://doi.org/10.1109/ICASSP.2019.8682306.
Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019b. “End-to-End Probabilistic Inference for Nonstationary Audio Analysis.” January 31, 2019. https://arxiv.org/abs/1901.11436v1.
Wilkinson, William J., Paul E. Chang, Michael Riis Andersen, and Arno Solin. 2020. “State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes.” In ICML. http://arxiv.org/abs/2007.05994.
Wilkinson, William J, Paul E Chang, Michael Riis Andersen, and Arno Solin. 2019. “Global Approximate Inference via Local Linearisation for Temporal Gaussian Processes,” 12.
Zammit-Mangion, Andrew, and Christopher K. Wikle. 2020. “Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting.” Spatial Statistics 37 (June): 100408. https://doi.org/10.1016/j.spasta.2020.100408.