# State filtering for hidden Markov models

Kalman and friends

June 22, 2015 — May 24, 2023

Bayes
dynamical systems
linear algebra
probability
signal processing
state space models
statistics
time series

Kalman–Bucy filters and variants, recursive estimation, predictive state models, data assimilation: a sub-field of signal processing concerned with models that have hidden state.

In statistical terms, state filters are online-updating hierarchical models for sequential observations of a dynamical system whose random state is unobserved, but for which you can compute an optimal estimate from incoming measurements and known parameters.

A unifying feature of all these methods is that, by assuming a sparse influence graph between observations and dynamics, we can estimate the system's behaviour using efficient message passing.

This is a twin problem to optimal control. If I wished to tackle this problem from the perspective of observations rather than the true state, perhaps I could do it in terms of Koopman operators.

## 1 Linear dynamical systems

In Kalman filtering per se, the default problem usually concerns multivariate real vector signals representing different axes of some telemetry data. In the degenerate case, where there is no observation noise, we can simply design a linear filter that solves the target problem.

The classic Kalman filter assumes a linear model with Gaussian noise, although it might work with not-quite-Gaussian, not-quite-linear models if you prod it. You can extend this flavour to somewhat more general dynamics; for that, see later.

NB I’m conflating linear observation and linear process models, for now. We can relax that when there are some concrete examples in play.

There are a large number of equivalent formulations of the Kalman filter. The notation of Fearnhead and Künsch (2018) is representative. They start from the usual state filter setting: the state process $$\left(\mathbf{X}_{t}\right)$$ is assumed to be Markovian, and the $$i$$-th observation $$\mathbf{Y}_{i}$$ depends only on the state at time $$i$$, $$\mathbf{X}_{i}$$, so that the evolution and observation variates are defined by

$$\begin{aligned} \mathbf{X}_{t} \mid\left(\mathbf{x}_{0: t-1}, \mathbf{y}_{1: t-1}\right) & \sim P\left(d \mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right), \quad \mathbf{X}_{0} \sim \pi_{0}\left(d \mathbf{x}_{0}\right) \\ \mathbf{Y}_{t} \mid\left(\mathbf{x}_{0: t}, \mathbf{y}_{1: t-1}\right) & \sim g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right) d \nu\left(\mathbf{y}_{t}\right) \end{aligned}$$

with joint distribution

$$\left(\mathbf{X}_{0: s}, \mathbf{Y}_{1: t}\right) \sim \pi_{0}\left(d \mathbf{x}_{0}\right) \prod_{i=1}^{s} P\left(d \mathbf{x}_{i} \mid \mathbf{x}_{i-1}\right) \prod_{j=1}^{t} g\left(\mathbf{y}_{j} \mid \mathbf{x}_{j}\right) \nu\left(d \mathbf{y}_{j}\right), \quad s \geq t.$$

Integrating out the path of the state process, we obtain

$$\begin{aligned} \mathbf{Y}_{1: t} &\sim p\left(\mathbf{y}_{1: t}\right) \prod_{j} \nu\left(d \mathbf{y}_{j}\right)\text{, where}\\ p\left(\mathbf{y}_{1: t}\right) &=\int \pi_{0}\left(d \mathbf{x}_{0}\right) \prod_{i=1}^{s} P\left(d \mathbf{x}_{i} \mid \mathbf{x}_{i-1}\right) \prod_{j=1}^{t} g\left(\mathbf{y}_{j} \mid \mathbf{x}_{j}\right). \end{aligned}$$

We wish to find the distribution $$\pi_{0: s \mid t}=\frac{p(\mathbf{y}_{1: t},\mathbf{x}_{0:s})}{p(\mathbf{y}_{1: t})}$$ (by Bayes’ rule). We deduce the recursion

$$\begin{aligned} \pi_{0: t \mid t-1}\left(d \mathbf{x}_{0: t} \mid \mathbf{y}_{1: t-1}\right) &=\pi_{0: t-1 \mid t-1}\left(d \mathbf{x}_{0: t-1} \mid \mathbf{y}_{1: t-1}\right) P\left(d \mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right) &\text{ prediction}\\ \pi_{0: t \mid t}\left(d \mathbf{x}_{0: t} \mid \mathbf{y}_{1: t}\right) &=\pi_{0: t \mid t-1}\left(d \mathbf{x}_{0: t} \mid \mathbf{y}_{1: t-1}\right) \frac{g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right)}{p\left(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1}\right)} &\text{ correction} \end{aligned}$$

where

$$p\left(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1}\right)=\frac{p\left(\mathbf{y}_{1: t}\right)}{p\left(\mathbf{y}_{1: t-1}\right)}=\int \pi_{t \mid t-1}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t-1}\right) g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right).$$

Integrating out all earlier states $$\mathbf{x}_{0: t-1}$$ gives us the one-step recursion

$$\begin{aligned} \pi_{t \mid t-1}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t-1}\right) &=\int \pi_{t-1}\left(d \mathbf{x}_{t-1} \mid \mathbf{y}_{1: t-1}\right) P\left(d \mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right) &\text{ prediction}\\ \pi_{t}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t}\right) &=\pi_{t \mid t-1}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t-1}\right) \frac{g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right)}{p\left(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1}\right)}&\text{ correction} \end{aligned}$$
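In the linear-Gaussian special case, both the prediction and correction steps have closed forms, and the one-step recursion above becomes the classic Kalman filter. Here is a minimal numpy sketch of one predict/correct cycle (the variable names are my own, not from Fearnhead and Künsch):

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    """One predict/correct cycle for the linear-Gaussian model
    x_t = A x_{t-1} + N(0, Q),  y_t = H x_t + N(0, R)."""
    # Prediction: push the filtering distribution through the dynamics.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Correction: condition on the new observation y.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```

For a scalar random walk (`A=H=1`), one step with `Q=0.1`, `R=0.5`, prior `N(0,1)` and observation `y=1` pulls the mean most of the way to the observation while shrinking the posterior variance below both prior and noise variance.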

If we approximate the filter distribution $$\pi_t$$ with a Monte Carlo sample, we are doing particle filtering, which Fearnhead and Künsch (2018) refer to as bootstrap filtering.
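A single bootstrap-filter step is short enough to sketch: propagate particles through the prior dynamics, weight them by the observation likelihood, and resample. This is a generic sketch under my own naming, with plain multinomial resampling (practical implementations prefer lower-variance schemes and adaptive resampling):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter_step(particles, y, propagate, loglik):
    """One bootstrap particle filter step: sample from the transition
    kernel (prediction), weight by g(y|x) (correction), resample."""
    particles = propagate(particles)      # prediction: draw from P(dx_t | x_{t-1})
    logw = loglik(y, particles)           # correction: unnormalised log weights
    w = np.exp(logw - logw.max())         # stabilise before normalising
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]                 # multinomial resampling
```

For a 1-D random walk with Gaussian observation noise, the resampled cloud concentrates between the prior mean and the observation, as the exact posterior would.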

TODO: implied Kalman gain etc.

## 2 Non-linear dynamical systems

Cute exercise: you can derive an analytic Kalman-type filter for any noise and process dynamics with a Bayesian conjugate structure, and this leads to filters with nonlinear behaviour. Multivariate distributions are a bit of a mess for non-Gaussians, though, and a beta-Kalman filter feels contrived.

The upshot is that the non-linear extensions don’t usually rely on non-Gaussian conjugate distributions and analytic forms, but rather make some Gaussian/linear approximation, or use randomised methods such as particle filters.
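The most venerable Gaussian/linear approximation is the extended Kalman filter, which linearises the dynamics and observation maps at the current mean via their Jacobians. A hedged numpy sketch (function and argument names are mine):

```python
import numpy as np

def ekf_step(m, P, y, f, F_jac, h, H_jac, Q, R):
    """Extended Kalman filter step: run the usual Kalman recursion
    with f and h linearised at the current mean estimate."""
    F = F_jac(m)                          # Jacobian of dynamics at m
    m_pred = f(m)                         # nonlinear mean prediction
    P_pred = F @ P @ F.T + Q              # linearised covariance prediction
    H = H_jac(m_pred)                     # Jacobian of observation map
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_new = m_pred + K @ (y - h(m_pred))  # correct with nonlinear h
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```

Sanity check: with identity `f` and `h` (so both Jacobians are the identity) this reduces exactly to the linear Kalman step, which is a handy unit test; its approximation error only appears once `f` or `h` curve appreciably over a posterior standard deviation.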

For some examples in Stan see Sinhrks’ stan-statespace.

## 3 As errors-in-variables models

see, e.g. Bagge Carlson (2018).

## 4 Discrete state Hidden Markov models

🏗 Viterbi algorithm.

## 5 Unscented Kalman filter

i.e. using the unscented transform.
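The core trick: instead of linearising with Jacobians, the unscented transform pushes a small deterministic set of sigma points through the nonlinearity and reconstitutes a mean and covariance from weighted sums. A minimal sketch with the common (α, β, κ) weighting convention — details of the scaling vary between references:

```python
import numpy as np

def unscented_transform(m, P, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Approximate the mean and covariance of f(X), X ~ N(m, P),
    using 2n+1 sigma points rather than Jacobian linearisation."""
    n = len(m)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)       # scaled matrix square root
    sigma = np.vstack([m, m + L.T, m - L.T])    # 2n+1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigma])         # propagate each point
    m_out = wm @ Y
    d = Y - m_out
    P_out = (wc[:, None] * d).T @ d             # weighted outer products
    return m_out, P_out
```

For an affine `f` the transform is exact, which makes a good sanity test; the payoff is that it captures some second-order effects of a genuinely nonlinear `f` without any derivatives.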

## 9 State filter inference

How about learning the parameters of the model generating your states? Ways you can do this in dynamical systems include basic linear system identification and general system identification.
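One concrete route, at least in the linear-Gaussian case: the prediction-error decomposition $$p(\mathbf{y}_{1:t})=\prod_t p(\mathbf{y}_t\mid\mathbf{y}_{1:t-1})$$ from above means a single filter pass yields the marginal likelihood of the parameters, which we can then maximise. A sketch (my own naming, same model convention as the Kalman step earlier):

```python
import numpy as np

def kalman_loglik(ys, A, Q, H, R, m0, P0):
    """Log marginal likelihood log p(y_{1:T}) via the prediction-error
    decomposition: each filter step contributes log p(y_t | y_{1:t-1})."""
    m, P, ll = m0, P0, 0.0
    for y in ys:
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        S = H @ P_pred @ H.T + R          # predictive covariance of y_t
        v = y - H @ m_pred                # innovation
        ll += -0.5 * (np.log(np.linalg.det(2 * np.pi * S))
                      + v @ np.linalg.solve(S, v))
        K = P_pred @ H.T @ np.linalg.inv(S)
        m = m_pred + K @ v
        P = P_pred - K @ S @ K.T
    return ll
```

Feeding this to a generic optimiser (or differentiating through it) over `A`, `Q`, `H`, `R` is essentially maximum-likelihood system identification; for production use you would compute the log-determinant via a Cholesky factor rather than `det`.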

## 10 References

Aasnaes, and Kailath. 1973. IEEE Transactions on Automatic Control.
Alliney. 1992. IEEE Transactions on Signal Processing.
Alzraiee, White, Knowling, et al. 2022. Environmental Modelling & Software.
Ansley, and Kohn. 1985. The Annals of Statistics.
Arulampalam, Maskell, Gordon, et al. 2002. IEEE Transactions on Signal Processing.
Bagge Carlson. 2018.
Battey, and Sancetta. 2013. Journal of Multivariate Analysis.
Batz, Ruttor, and Opper. 2017. arXiv:1702.05390 [Physics, Stat].
Becker, Pandya, Gebhardt, et al. 2019. In International Conference on Machine Learning.
Berkhout, and Zaanen. 1976. Geophysical Prospecting.
Bilmes. 1998. International Computer Science Institute.
Bishop, and Del Moral. 2016. SIAM Journal on Control and Optimization.
———. 2023. Mathematics of Control, Signals, and Systems.
Bishop, Del Moral, and Pathiraja. 2017. arXiv:1701.05978 [Math].
Bretó, He, Ionides, et al. 2009. The Annals of Applied Statistics.
Brunton, Proctor, and Kutz. 2016. Proceedings of the National Academy of Sciences.
Campbell, Shi, Rainforth, et al. 2021. In.
Carmi. 2013. Digital Signal Processing.
———. 2014. In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Cassidy, Rae, and Solo. 2015. IEEE Transactions on Medical Imaging.
Cauchemez, and Ferguson. 2008. Journal of The Royal Society Interface.
Charles, Balavoine, and Rozell. 2016. IEEE Transactions on Signal Processing.
Chen, Y., and Hero. 2012. IEEE Transactions on Signal Processing.
Chen, Bin, and Hong. 2012. Econometric Theory.
Chung, Kastner, Dinh, et al. 2015. In Advances in Neural Information Processing Systems 28.
Commandeur, and Koopman. 2007. An Introduction to State Space Time Series Analysis.
Cox, van de Laar, and de Vries. 2019. International Journal of Approximate Reasoning.
Cressie, and Huang. 1999. Journal of the American Statistical Association.
Cressie, Shi, and Kang. 2010. Journal of Computational and Graphical Statistics.
Cressie, and Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics 2.0.
Freitas, João FG de, Doucet, Niranjan, et al. 1998. “Global Optimisation of Neural Network Models via Sequential Sampling.” In Proceedings of the 11th International Conference on Neural Information Processing Systems. NIPS’98.
Freitas, J. F. G. de, Niranjan, Gee, et al. 1998. “Sequential Monte Carlo Methods for Optimisation of Neural Network Models.” Cambridge University Engineering Department, Cambridge, England, Technical Report TR-328.
Deisenroth, and Mohamed. 2012. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2. NIPS’12.
Del Moral, Kurtzmann, and Tugaut. 2017. SIAM Journal on Control and Optimization.
Doucet, Jacob, and Rubenthaler. 2013. arXiv:1304.5768 [Stat].
Durbin, and Koopman. 1997. Biometrika.
———. 2012. Time Series Analysis by State Space Methods. Oxford Statistical Science Series 38.
Duttweiler, and Kailath. 1973a. IEEE Transactions on Information Theory.
———. 1973b. IEEE Transactions on Information Theory.
Easley, and Berry. 2020. arXiv:2006.13429 [Cs, Math].
Eddy. 1996. Current Opinion in Structural Biology.
Eden, Frank, Barbieri, et al. 2004. Neural Computation.
Edwards, and Ankinakatte. 2015. Statistical Modelling.
Eleftheriadis, Nicholson, Deisenroth, et al. 2017. In Advances in Neural Information Processing Systems 30.
Fearnhead, and Künsch. 2018. Annual Review of Statistics and Its Application.
Finke, and Singh. 2016. arXiv:1606.08650 [Stat].
Föll, Haasdonk, Hanselmann, et al. 2017. arXiv:1711.00799 [Stat].
Fraccaro, Sønderby, Paquet, et al. 2016. In Advances in Neural Information Processing Systems 29.
Fraser. 2008. Hidden Markov Models and Dynamical Systems.
Friedlander, Kailath, and Ljung. 1975. In 1975 IEEE Conference on Decision and Control Including the 14th Symposium on Adaptive Processes.
Frigola, Chen, and Rasmussen. 2014. In Advances in Neural Information Processing Systems 27.
Frigola, Lindsten, Schön, et al. 2013. In Advances in Neural Information Processing Systems 26.
Friston. 2008. NeuroImage.
Gevers, and Kailath. 1973. IEEE Transactions on Automatic Control.
Gorad, Zhao, and Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In.
Gottwald, and Reich. 2020. arXiv:2007.07383 [Physics, Stat].
Gourieroux, and Jasiak. 2015. Journal of Time Series Analysis.
Gu, Johnson, Goel, et al. 2021. In Advances in Neural Information Processing Systems.
Haber, Lucka, and Ruthotto. 2018. arXiv:1805.08034 [Cs, Math].
Hamilton, Berry, and Sauer. 2016. arXiv:1611.05414 [Physics, Stat].
Hartikainen, and Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing.
Harvey, A., and Koopman. 2005. In Encyclopedia of Biostatistics.
Harvey, Andrew, and Luati. 2014. Journal of the American Statistical Association.
Hefny, Downey, and Gordon. 2015. arXiv:1505.05310 [Cs, Stat].
He, Ionides, and King. 2010. Journal of The Royal Society Interface.
Hong, Mitchell, Chen, et al. 2008. International Journal of Systems Science.
Hou, Lawrence, and Hero. 2016. arXiv:1610.00195 [Physics, Stat].
Hsiao, and Schultz. 2011. “Generalized Baum-Welch Algorithm and Its Implication to a New Extended Baum-Welch Algorithm.” In In Proceedings of INTERSPEECH.
Hsu, Kakade, and Zhang. 2012. Journal of Computer and System Sciences, JCSS Special Issue: Cloud Computing 2011,.
Huber. 2014. Pattern Recognition Letters.
Ionides, Edward L., Bhadra, Atchadé, et al. 2011. The Annals of Statistics.
Ionides, E. L., Bretó, and King. 2006. Proceedings of the National Academy of Sciences.
Johansen, Doucet, and Davy. 2006. Scis & Isis.
Johnson. 2012. arXiv:1204.2477 [Cs, Stat].
Julier, Uhlmann, and Durrant-Whyte. 1995. In American Control Conference, Proceedings of the 1995.
Kailath. 1971. IEEE Transactions on Information Theory.
———. 1974. IEEE Transactions on Information Theory.
Kailath, and Duttweiler. 1972. IEEE Transactions on Information Theory.
Kailath, and Geesey. 1971. IEEE Transactions on Automatic Control.
———. 1973. IEEE Transactions on Automatic Control.
Kailath, and Weinert. 1975. IEEE Transactions on Information Theory.
Kalman, R. 1959. IRE Transactions on Automatic Control.
Kalman, R. E. 1960. Journal of Basic Engineering.
Kalouptsidis, Mileounis, Babadi, et al. 2011. Signal Processing.
Karvonen, and Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
Kelly, Law, and Stuart. 2014. Nonlinearity.
Kirch, Edwards, Meier, et al. 2019. Bayesian Analysis.
Kitagawa. 1987. Journal of the American Statistical Association.
———. 1996. Journal of Computational and Graphical Statistics.
Kitagawa, and Gersch. 1996. Smoothness Priors Analysis of Time Series. Lecture notes in statistics 116.
Kobayashi, Mark, and Turin. 2011. Probability, Random Processes, and Statistical Analysis: Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance.
Koopman, and Durbin. 2000. Journal of Time Series Analysis.
Krishnan, Shalit, and Sontag. 2017. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.
Kulhavý. 1990. Automatica.
———. 1996. Recursive Nonlinear Estimation. Lecture Notes in Control and Information Sciences.
Kutschireiter, Surace, Sprekeler, et al. 2015. BMC Neuroscience.
Lázaro-Gredilla, Quiñonero-Candela, Rasmussen, et al. 2010. Journal of Machine Learning Research.
Le Gland, Monbet, and Tran. 2009. Report.
Lei, Bickel, and Snyder. 2009. Monthly Weather Review.
Levin. 2017. arXiv:1703.08596 [Cs, Math, Stat].
Lindgren, Rue, and Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Ljung, and Kailath. 1976. IEEE Transactions on Information Theory.
Ljung, Kailath, and Friedlander. 1975. In 1975 IEEE Conference on Decision and Control Including the 14th Symposium on Adaptive Processes.
Loeliger, Dauwels, Hu, et al. 2007. Proceedings of the IEEE.
Manton, Krishnamurthy, and Poor. 1998. IEEE Transactions on Signal Processing.
Mattos, Dai, Damianou, et al. 2016. In Proceedings of ICLR.
Mattos, Dai, Damianou, et al. 2017. Journal of Process Control, DYCOPS-CAB 2016,.
Meyer, Edwards, Maturana-Russel, et al. 2020. WIREs Computational Statistics.
Micchelli, and Olsen. 2000. Journal of Computational and Applied Mathematics.
Miller, Glennie, and Seaton. 2020. Journal of Agricultural, Biological and Environmental Statistics.
Nickisch, Solin, and Grigorevskiy. 2018. In International Conference on Machine Learning.
Olfati-Saber. 2005. In 44th IEEE Conference on Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC ’05.
Ollivier. 2017. arXiv:1703.00209 [Math, Stat].
Papadopoulos, Pachet, Roy, et al. 2015. In Principles and Practice of Constraint Programming. Lecture Notes in Computer Science.
Perry. 2010. IEEE Spectrum.
Picci. 1991. In Mathematical System Theory: The Influence of R. E. Kalman.
Psiaki. 2013. IEEE Control Systems.
Pugachev, V.S. 1982. Automatica.
Pugachev, V. S., and Sinitsyn. 2001. Stochastic Systems: Theory and Applications.
Quiñonero-Candela, and Rasmussen. 2005. Journal of Machine Learning Research.
Rabiner, L.R. 1989. Proceedings of the IEEE.
Rabiner, L., and Juang. 1986. IEEE ASSP Magazine.
Raol, and Sinha. 1987. In Stochastic Control. IFAC Symposia Series.
Reece, and Roberts. 2010. In 2010 13th International Conference on Information Fusion.
Reller. 2013.
Revach, Shlezinger, van Sloun, et al. 2021. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Robertson, Andrew N. 2011. In.
Robertson, Andrew, and Plumbley. 2007. In Proceedings of the 7th International Conference on New Interfaces for Musical Expression. NIME ’07.
Robertson, Andrew, Stark, and Davies. 2013. In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
Robertson, Andrew, Stark, and Plumbley. 2011. In Proceedings of the International Computer Music Conference 2011.
Rodriguez, and Ruiz. 2009. Journal of Time Series Analysis.
Roth, Hendeby, Fritsche, et al. 2017. EURASIP Journal on Advances in Signal Processing.
Rozet, and Louppe. 2023.
Rudenko. 2013. Journal of Computer and Systems Sciences International.
Särkkä, Simo. 2007. IEEE Transactions on Automatic Control.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3.
Särkkä, Simo, and Hartikainen. 2012. In Artificial Intelligence and Statistics.
Särkkä, S., and Hartikainen. 2013. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
Särkkä, Simo, and Nummenmaa. 2009. IEEE Transactions on Automatic Control.
Särkkä, Simo, Solin, and Hartikainen. 2013. IEEE Signal Processing Magazine.
Schein, Wallach, and Zhou. 2016. In Advances In Neural Information Processing Systems.
Schirmer, Zhang, and Nalisnick. 2024.
Schmidt, Krämer, and Hennig. 2021. arXiv:2103.10153 [Cs, Stat].
Segall, Davis, and Kailath. 1975. IEEE Transactions on Information Theory.
Šindelář, Vajda, and Kárný. 2008. Kybernetika.
Sorenson. 1970. IEEE Spectrum.
Städler, and Mukherjee. 2013. The Annals of Applied Statistics.
Surace, and Pfister. 2016. “Online Maximum Likelihood Estimation of the Parameters of Partially Observed Diffusion Processes.” In.
Tavakoli, and Panaretos. 2016. Journal of the American Statistical Association.
Thrun, and Langford. 1998.
Thrun, Langford, and Fox. 1999. In Proceedings of the International Conference on Machine Learning.
Turner, Deisenroth, and Rasmussen. 2010. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
Wikle, and Berliner. 2007. Physica D: Nonlinear Phenomena, Data Assimilation,.
Wikle, Berliner, and Cressie. 1998. Environmental and Ecological Statistics.
Zhao, and Cui. 2023.
Zoeter. 2007. In 2007 5th International Symposium on Image and Signal Processing and Analysis.