# State filtering for hidden Markov models

Kalman and friends

June 22, 2015 — May 24, 2023

Kalman-Bucy filter and variants, recursive estimation, predictive state models, data assimilation. A particular sub-field of signal processing for models with hidden state.

In statistics terms, state filters are a kind of online-updating hierarchical model for sequential observations of a dynamical system where the random state is unobserved, but you can estimate it, optimally under the model assumptions, from incoming measurements and known parameters.

A unifying feature of all these is that, by assuming a sparse influence graph between observations and dynamics, you can estimate behaviour using efficient message passing.

This is a twin problem to optimal control. If I wish to tackle this problem from the perspective of *observations* rather than true state, perhaps I could do it from the perspective of Koopman operators.

## 1 Linear dynamical systems

In Kalman filters *per se* the default problem is usually concerned with multivariate real vector signals representing different axes of some telemetry data. In the degenerate case, where there is no observation noise, we can just design a linear filter which solves the target problem.

The classic Kalman filter (R. E. Kalman 1960) assumes a linear model with Gaussian noise, although it might work with not-quite Gaussian, not-quite linear models if you prod it. You can extend this flavour to somewhat more general dynamics. For that, see later.

NB I’m conflating linear observation and linear process models, for now. We can relax that when there are some concrete examples in play.

There are a large number of equivalent formulations of the Kalman filter. The notation of Fearnhead and Künsch (2018) is representative. They start from the usual state filter setting: the state process \(\left(\mathbf{X}_{t}\right)\) is assumed to be Markovian and the \(i\)-th observation, \(\mathbf{Y}_{i}\), depends only on the state at time \(i\), \(\mathbf{X}_{i}\), so that the evolution and observation variates are defined by \[ \begin{aligned} \mathbf{X}_{t} \mid\left(\mathbf{x}_{0: t-1}, \mathbf{y}_{1: t-1}\right) & \sim P\left(d \mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right), \quad \mathbf{X}_{0} \sim \pi_{0}\left(d \mathbf{x}_{0}\right) \\ \mathbf{Y}_{t} \mid\left(\mathbf{x}_{0: t}, \mathbf{y}_{1: t-1}\right) & \sim g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right) d \nu\left(\mathbf{y}_{t}\right) \end{aligned} \] with joint distribution \[ \left(\mathbf{X}_{0: s}, \mathbf{Y}_{1: t}\right) \sim \pi_{0}\left(d \mathbf{x}_{0}\right) \prod_{i=1}^{s} P\left(d \mathbf{x}_{i} \mid \mathbf{x}_{i-1}\right) \prod_{j=1}^{t} g\left(\mathbf{y}_{j} \mid \mathbf{x}_{j}\right) \nu\left(d \mathbf{y}_{j}\right), \quad s \geq t. \]

Integrating out the path of the state process, we obtain that \[\begin{aligned} \mathbf{Y}_{1: t} &\sim p\left(\mathbf{y}_{1: t}\right) \prod_{j} \nu\left(d \mathbf{y}_{j}\right)\text{, where}\\ p\left(\mathbf{y}_{1: t}\right) &=\int \pi_{0}\left(d \mathbf{x}_{0}\right) \prod_{i=1}^{s} P\left(d \mathbf{x}_{i} \mid \mathbf{x}_{i-1}\right) \prod_{j=1}^{t} g\left(\mathbf{y}_{j} \mid \mathbf{x}_{j}\right). \end{aligned} \] We wish to find the conditional distribution of the states given the observations, which by Bayes’ rule is \(\pi_{0: s \mid t}=\frac{p(\mathbf{y}_{1: t},\mathbf{x}_{0:s})}{p(\mathbf{y}_{1: t})}\). We deduce the recursion \[ \begin{aligned} \pi_{0: t \mid t-1}\left(d \mathbf{x}_{0: t} \mid \mathbf{y}_{1: t-1}\right) &=\pi_{0: t-1 \mid t-1}\left(d \mathbf{x}_{0: t-1} \mid \mathbf{y}_{1: t-1}\right) P\left(d \mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right) &\text{ prediction}\\ \pi_{0: t \mid t}\left(d \mathbf{x}_{0: t} \mid \mathbf{y}_{1: t}\right) &=\pi_{0: t \mid t-1}\left(d \mathbf{x}_{0: t} \mid \mathbf{y}_{1: t-1}\right) \frac{g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right)}{p\left(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1}\right)} &\text{ correction} \end{aligned} \] where \[ p\left(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1}\right)=\frac{p\left(\mathbf{y}_{1: t}\right)}{p\left(\mathbf{y}_{1: t-1}\right)}=\int \pi_{t \mid t-1}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t-1}\right) g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right) . \]
Integrating out the earlier states \(\mathbf{x}_{0: t-1}\), so that only the latest state \(\mathbf{x}_{t}\) remains, gives us the one-step recursion \[ \begin{aligned} \pi_{t \mid t-1}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t-1}\right) &=\int \pi_{t-1}\left(d \mathbf{x}_{t-1} \mid \mathbf{y}_{1: t-1}\right) P\left(d \mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right) &\text{ prediction}\\ \pi_{t}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t}\right) &=\pi_{t \mid t-1}\left(d \mathbf{x}_{t} \mid \mathbf{y}_{1: t-1}\right) \frac{g\left(\mathbf{y}_{t} \mid \mathbf{x}_{t}\right)}{p\left(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1}\right)}&\text{ correction} \end{aligned} \]
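
On a finite state space the integrals in the one-step recursion become sums, and the whole thing is a few lines of linear algebra. Here is a minimal sketch of that special case, assuming a hypothetical two-state chain with a transition matrix `P`, an emission matrix `G` and an initial distribution `pi0` that I made up for illustration; none of these numbers come from Fearnhead and Künsch (2018).

```python
import numpy as np


def forward_filter(pi0, P, G, ys):
    """Prediction/correction recursion on a finite state space.

    pi0: (K,) initial distribution of X_0.
    P:   (K, K) transition matrix, P[i, j] = Pr(X_t = j | X_{t-1} = i).
    G:   (K, M) emission matrix, G[k, y] = g(y | x = k).
    ys:  observed symbols y_1, ..., y_T, integers in 0..M-1.
    Returns the filtering distributions, shape (T, K), and log p(y_{1:T}).
    """
    pi = np.asarray(pi0, dtype=float)
    filtered = []
    loglik = 0.0
    for y in ys:
        predicted = pi @ P            # prediction: pi_{t|t-1}
        unnorm = predicted * G[:, y]  # multiply by g(y_t | x_t)
        evidence = unnorm.sum()       # p(y_t | y_{1:t-1})
        pi = unnorm / evidence        # correction: pi_t
        filtered.append(pi)
        loglik += np.log(evidence)
    return np.array(filtered), loglik


# Hypothetical two-state example (illustrative numbers only).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
G = np.array([[0.8, 0.2], [0.3, 0.7]])
pi0 = np.array([0.5, 0.5])
ys = [0, 0, 1, 1, 1]
filtered, loglik = forward_filter(pi0, P, G, ys)
```

The normalising constants \(p(\mathbf{y}_{t} \mid \mathbf{y}_{1: t-1})\) accumulate into the log marginal likelihood for free, which is handy if we later want to estimate parameters.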

If we approximate the filter distribution \(\pi_t\) with a Monte Carlo sample, we are doing particle filtering, which Fearnhead and Künsch (2018) refer to as *bootstrap filtering*.
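
A minimal sketch of a bootstrap-style particle filter, assuming a scalar toy model of my own choosing (an AR(1) state with coefficient `a`, process variance `q` and observation variance `r`; these names and values are illustrative, not from the paper). Particles are proposed from the transition kernel \(P\), weighted by \(g\), then resampled multinomially.

```python
import numpy as np


def bootstrap_filter(ys, n_particles, sample_x0, propagate, log_g, rng):
    """Propose from P, weight by g, resample.

    sample_x0(n, rng) -> (n,) draws from pi_0.
    propagate(x, rng) -> one draw from P(dx_t | x_{t-1}) per particle.
    log_g(y, x)       -> log g(y | x) per particle, up to a constant.
    Returns Monte Carlo estimates of the filtering means E[X_t | y_{1:t}].
    """
    x = sample_x0(n_particles, rng)
    means = []
    for y in ys:
        x = propagate(x, rng)          # prediction step
        logw = log_g(y, x)             # correction: weights proportional to g
        w = np.exp(logw - logw.max())  # subtract the max for numerical stability
        w /= w.sum()
        means.append(np.sum(w * x))
        # Multinomial resampling back to equally weighted particles.
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
    return np.array(means)


# Toy model (assumed): x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r).
rng = np.random.default_rng(0)
a, q, r, T = 0.9, 0.5, 1.0, 50
ys = np.zeros(T)
x_prev = rng.normal(0.0, 1.0)          # X_0 ~ N(0, 1)
for t in range(T):
    x_prev = a * x_prev + rng.normal(0.0, np.sqrt(q))
    ys[t] = x_prev + rng.normal(0.0, np.sqrt(r))

pf_means = bootstrap_filter(
    ys,
    2000,
    sample_x0=lambda n, rng: rng.normal(0.0, 1.0, size=n),
    propagate=lambda x, rng: a * x + rng.normal(0.0, np.sqrt(q), size=x.shape),
    log_g=lambda y, x: -0.5 * (y - x) ** 2 / r,
    rng=rng,
)
```

Because this toy model is linear and Gaussian, the exact filter is available in closed form, which makes it a convenient sanity check for the particle approximation. Resampling at every step is the simplest choice; practical implementations often resample only when the effective sample size drops.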

TODO: implied Kalman gain etc.
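
In the meantime, here is a sketch of where the gain comes from. In the linear-Gaussian special case, with \(\mathbf{X}_{t}=A \mathbf{X}_{t-1}+\mathbf{w}_{t}\), \(\mathbf{w}_{t} \sim \mathcal{N}(0, Q)\) and \(\mathbf{Y}_{t}=H \mathbf{X}_{t}+\mathbf{v}_{t}\), \(\mathbf{v}_{t} \sim \mathcal{N}(0, R)\), the filtering distribution stays Gaussian, so the prediction and correction steps reduce to updates of a mean and a covariance. The symbols \(A, Q, H, R\) and the constant-velocity example are my assumptions for illustration; they are not the notation of Fearnhead and Künsch (2018).

```python
import numpy as np


def kalman_step(m, C, y, A, Q, H, R):
    """One prediction + correction step for the linear-Gaussian model.

    m, C: mean and covariance of the previous filter distribution N(m, C).
    Returns the new filter mean, covariance and the Kalman gain K.
    """
    # Prediction: push the Gaussian through x_t = A x_{t-1} + w_t.
    m_pred = A @ m
    C_pred = A @ C @ A.T + Q
    # Correction: condition on y_t = H x_t + v_t.
    S = H @ C_pred @ H.T + R              # innovation covariance
    K = np.linalg.solve(S, H @ C_pred).T  # gain K = C_pred H^T S^{-1}
    m_new = m_pred + K @ (y - H @ m_pred)
    C_new = (np.eye(len(m)) - K @ H) @ C_pred
    return m_new, C_new, K


# Hypothetical constant-velocity tracker: state is (position, velocity),
# and only the position is observed.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
m, C = np.zeros(2), np.eye(2)
for y in [1.1, 2.0, 2.9, 4.2]:
    m, C, K = kalman_step(m, C, np.array([y]), A, Q, H, R)
```

The gain `K` weights the innovation `y - H @ m_pred` by how uncertain the prediction is relative to the observation noise. The covariance update used here is the short textbook form; the Joseph form is more robust to rounding if that matters.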

## 2 Non-linear dynamical systems

Cute exercise: you can derive an analytic Kalman-style filter for any noise and process dynamics that form a Bayesian conjugate pair, and this leads to filters with nonlinear behaviour. Multivariate distributions are a bit of a mess for non-Gaussians, though, and a beta-Kalman filter feels contrived.

Upshot is, the non-linear extensions don’t usually rely on non-Gaussian conjugate distributions and analytic forms, but rather do some Gaussian/linear approximation, or use randomised methods such as particle filters.

For some examples in Stan see Sinhrks’ stan-statespace.

## 3 As errors-in-variables models

See, e.g., Bagge Carlson (2018).

## 5 Unscented Kalman filter

i.e. using the unscented transform.
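
A minimal sketch of the unscented transform itself, assuming the common scaled sigma-point parameterisation with parameters `alpha`, `beta` and `kappa`. The default values below and the polar-to-Cartesian example are arbitrary illustrative choices, not recommendations. The idea is to push a small deterministic set of sigma points through the nonlinearity `f` and read off a weighted mean and covariance.

```python
import numpy as np


def unscented_transform(f, m, C, alpha=1.0, beta=0.0, kappa=1.0):
    """Approximate the mean and covariance of f(X) for X ~ N(m, C)."""
    n = len(m)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * C)
    # 2n + 1 sigma points: the mean, then the mean plus and minus each column of L.
    sigma = np.vstack([m, m + L.T, m - L.T])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))  # weights for the mean
    wc = wm.copy()                                    # weights for the covariance
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigma])               # propagate the sigma points
    mean = wm @ Y
    dev = Y - mean
    cov = (wc[:, None] * dev).T @ dev
    return mean, cov


def polar_to_cartesian(x):
    """Hypothetical nonlinearity: (range, bearing) to (x, y)."""
    return np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])


mean_xy, cov_xy = unscented_transform(
    polar_to_cartesian,
    m=np.array([10.0, np.pi / 4]),
    C=np.diag([0.5**2, 0.1**2]),
)
```

For a linear `f` this recovers the exact mean and covariance. An unscented Kalman filter uses the same transform in place of the linear prediction and observation maps.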

## 6 Variational state filters

## 7 Kalman filtering Gaussian processes

## 8 Ensemble Kalman filters

## 9 State filter inference

How about learning the *parameters* of the model generating your states? Ways that you can do this in dynamical systems include basic linear system identification and general system identification.

## 10 References

*IEEE Transactions on Automatic Control*.

*IEEE Transactions on Signal Processing*.

*Environmental Modelling & Software*.

*The Annals of Statistics*.

*IEEE Transactions on Signal Processing*.

*Journal of Multivariate Analysis*.

*arXiv:1702.05390 [Physics, Stat]*.

*International Conference on Machine Learning*.

*Geophysical Prospecting*.

*International Computer Science Institute*.

*SIAM Journal on Control and Optimization*.

*Mathematics of Control, Signals, and Systems*.

*arXiv:1701.05978 [Math]*.

*The Annals of Applied Statistics*.

*Proceedings of the National Academy of Sciences*.

*Digital Signal Processing*.

*Compressed Sensing & Sparse Filtering*. Signals and Communication Technology.

*IEEE Transactions on Medical Imaging*.

*Journal of The Royal Society Interface*.

*IEEE Transactions on Signal Processing*.

*IEEE Transactions on Signal Processing*.

*Econometric Theory*.

*Advances in Neural Information Processing Systems 28*.

*Ecology*.

*An Introduction to State Space Time Series Analysis*.

*International Journal of Approximate Reasoning*.

*Journal of the American Statistical Association*.

*Journal of Computational and Graphical Statistics*.

*Statistics for Spatio-Temporal Data*. Wiley Series in Probability and Statistics 2.0.

*Proceedings of the 11th International Conference on Neural Information Processing Systems*. NIPS’98.

*Cambridge University Engineering Department, Cambridge, England, Technical Report TR-328*.

*Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2*. NIPS’12.

*SIAM Journal on Control and Optimization*.

*arXiv:1304.5768 [Stat]*.

*Biometrika*.

*Time Series Analysis by State Space Methods*. Oxford Statistical Science Series 38.

*IEEE Transactions on Information Theory*.

*IEEE Transactions on Information Theory*.

*arXiv:2006.13429 [Cs, Math]*.

*Current Opinion in Structural Biology*.

*Neural Computation*.

*Statistical Modelling*.

*Advances in Neural Information Processing Systems 30*.

*Annual Review of Statistics and Its Application*.

*arXiv:1606.08650 [Stat]*.

*arXiv:1711.00799 [Stat]*.

*Advances in Neural Information Processing Systems 29*.

*Hidden Markov Models and Dynamical Systems*.

*1975 IEEE Conference on Decision and Control Including the 14th Symposium on Adaptive Processes*.

*Advances in Neural Information Processing Systems 27*.

*Advances in Neural Information Processing Systems 26*.

*NeuroImage*.

*IEEE Transactions on Automatic Control*.

*arXiv:2007.07383 [Physics, Stat]*.

*Journal of Time Series Analysis*.

*Advances in Neural Information Processing Systems*.

*arXiv:1805.08034 [Cs, Math]*.

*arXiv:1611.05414 [Physics, Stat]*.

*2010 IEEE International Workshop on Machine Learning for Signal Processing*.

*Encyclopedia of Biostatistics*.

*Journal of the American Statistical Association*.

*arXiv:1505.05310 [Cs, Stat]*.

*Journal of The Royal Society Interface*.

*International Journal of Systems Science*.

*arXiv:1610.00195 [Physics, Stat]*.

*In Proceedings of INTERSPEECH*.

*Journal of Computer and System Sciences*, JCSS Special Issue: Cloud Computing 2011,.

*Pattern Recognition Letters*.

*The Annals of Statistics*.

*Proceedings of the National Academy of Sciences*.

*Scis & Isis*.

*arXiv:1204.2477 [Cs, Stat]*.

*American Control Conference, Proceedings of the 1995*.

*IEEE Transactions on Information Theory*.

*IEEE Transactions on Information Theory*.

*IEEE Transactions on Information Theory*.

*IEEE Transactions on Automatic Control*.

*IEEE Transactions on Automatic Control*.

*IEEE Transactions on Information Theory*.

*IRE Transactions on Automatic Control*.

*Journal of Basic Engineering*.

*Signal Processing*.

*2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)*.

*Nonlinearity*.

*Bayesian Analysis*.

*Journal of the American Statistical Association*.

*Journal of Computational and Graphical Statistics*.

*Smoothness Priors Analysis of Time Series*. Lecture notes in statistics 116.

*Probability, Random Processes, and Statistical Analysis: Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance*.

*Journal of Time Series Analysis*.

*Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence*.

*Automatica*.

*Recursive Nonlinear Estimation*. Lecture Notes in Control and Information Sciences.

*BMC Neuroscience*.

*Journal of Machine Learning Research*.

*Monthly Weather Review*.

*arXiv:1703.08596 [Cs, Math, Stat]*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*IEEE Transactions on Information Theory*.

*1975 IEEE Conference on Decision and Control Including the 14th Symposium on Adaptive Processes*.

*Proceedings of the IEEE*.

*IEEE Transactions on Signal Processing*.

*Proceedings of ICLR*.

*Journal of Process Control*, DYCOPS-CAB 2016,.

*WIREs Computational Statistics*.

*Journal of Computational and Applied Mathematics*.

*Journal of Agricultural, Biological and Environmental Statistics*.

*International Conference on Machine Learning*.

*44th IEEE Conference on Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC ’05*.

*arXiv:1703.00209 [Math, Stat]*.

*Principles and Practice of Constraint Programming*. Lecture Notes in Computer Science.

*IEEE Spectrum*.

*Mathematical System Theory: The Influence of R. E. Kalman*.

*IEEE Control Systems*.

*Automatica*.

*Stochastic systems: theory and applications*.

*Journal of Machine Learning Research*.

*Proceedings of the IEEE*.

*IEEE ASSP Magazine*.

*Stochastic Control*. IFAC Symposia Series.

*2010 13th International Conference on Information Fusion*.

*ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*.

*Proceedings of the 7th International Conference on New Interfaces for Musical Expression*. NIME ’07.

*Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases*.

*Proceedings of the International Computer Music Conference 2011*.

*Journal of Time Series Analysis*.

*EURASIP Journal on Advances in Signal Processing*.

*Journal of Computer and Systems Sciences International*.

*IEEE Transactions on Automatic Control*.

*Bayesian Filtering and Smoothing*. Institute of Mathematical Statistics Textbooks 3.

*Artificial Intelligence and Statistics*.

*2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)*.

*IEEE Transactions on Automatic Control*.

*IEEE Signal Processing Magazine*.

*Advances In Neural Information Processing Systems*.

*arXiv:2103.10153 [Cs, Stat]*.

*IEEE Transactions on Information Theory*.

*Kybernetika*.

*IEEE Spectrum*.

*The Annals of Applied Statistics*.

*Journal of the American Statistical Association*.

*Proceedings of the International Conference on Machine Learning*.

*Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics*.

*Physica D: Nonlinear Phenomena*, Data Assimilation,.

*Environmental and Ecological Statistics*.

*2007 5th International Symposium on Image and Signal Processing and Analysis*.