# Ensemble Kalman methods

## Data Assimilation; Data fusion; Sloppy filters for over-ambitious models


A random-sampling variant/generalisation of the Kalman-Bucy filter. That description also fits particle filters, but the randomisation here is different from theirs, and we can combine both types of randomisation. This method has a few tweaks that make it more tenable in tricky situations with high-dimensional state spaces or nonlinearities in inconvenient places. A popular data assimilation method for spatiotemporal models.

Ensemble Kalman filters make it somewhat easier to wring estimates out of data.

## Tutorial introductions

Katzfuss, Stroud, and Wikle (2016), Roth et al. (2017), and Fearnhead and Künsch (2018) are all pretty good. Schillings and Stuart (2017) is recommended by Haber, Lucka, and Ruthotto (2018) as the canonical modern treatment. Wikle and Berliner (2007) present the broad data assimilation context for these methods, although it is too curt to be helpful for me. Mandel (2009) is helpfully longer. The inventor of the method explains it in Geir Evensen (2003), but I could make neither head nor tail of that, since it leans too heavily on oceanography terminology. Roth et al. (2017) is probably the best fit for my background, so let us copy their notation.

We start from discrete-time state-space models; the basic one is the linear system \begin{aligned} x_{k+1} &=F x_{k}+G v_{k}, \\ y_{k} &=H x_{k}+e_{k}, \end{aligned} with state $$x\in\mathbb{R}^n$$ and measurement $$y\in\mathbb{R}^m$$. The initial state $$x_{0}$$, the process noise $$v_{k}$$, and the measurement noise $$e_{k}$$ are mutually independent and Gaussian, with \begin{aligned} \Ex x_{0}&=\hat{x}_{0}\\ \Ex v_{k}&=0\\ \Ex e_{k}&=0\\ \cov x_{0} &=P_{0}\\ \cov v_{k} & =Q\\ \cov e_{k}&=R. \end{aligned}
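To fix ideas, here is a minimal numpy sketch that simulates a trajectory from this linear-Gaussian model. All dimensions and system matrices are arbitrary placeholders, not anything canonical.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, T = 4, 2, 100                  # state dim, observation dim, horizon
F = 0.95 * np.eye(n)                 # placeholder dynamics
G = np.eye(n)                        # noise-input matrix
H = rng.standard_normal((m, n))      # placeholder observation matrix
Q = 0.1 * np.eye(n)                  # process noise covariance
R = 0.5 * np.eye(m)                  # measurement noise covariance
x0_hat, P0 = np.zeros(n), np.eye(n)  # prior moments of the initial state

x = rng.multivariate_normal(x0_hat, P0)
xs, ys = [], []
for k in range(T):
    x = F @ x + G @ rng.multivariate_normal(np.zeros(n), Q)
    y = H @ x + rng.multivariate_normal(np.zeros(m), R)
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)  # (T, n) states, (T, m) observations
```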

The Kalman filter propagates state estimates $$\hat{x}_{k \mid k}$$ and covariance matrices $$P_{k \mid k}$$ for this model. The KF prediction, or forecast, step is \begin{aligned} &\hat{x}_{k+1 \mid k}=F \hat{x}_{k \mid k} \\ &P_{k+1 \mid k}=F P_{k \mid k} F^{\top}+G Q G^{\top}. \end{aligned} We predict the observations from these state estimates via \begin{aligned} \hat{y}_{k \mid k-1} &=H \hat{x}_{k \mid k-1}, \\ S_{k} &=H P_{k \mid k-1} H^{\top}+R . \end{aligned} Given these and an actual observation, we update the state estimates using a gain matrix $$K_{k}$$, \begin{aligned} \hat{x}_{k \mid k} &=\hat{x}_{k \mid k-1}+K_{k}\left(y_{k}-\hat{y}_{k \mid k-1}\right) \\ &=\left(I-K_{k} H\right) \hat{x}_{k \mid k-1}+K_{k} y_{k}, \\ P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1}\left(I-K_{k} H\right)^{\top}+K_{k} R K_{k}^{\top}, \end{aligned} in what geoscience types refer to as an analysis update. The variance-minimising gain is given by $K_{k}=P_{k \mid k-1} H^{\top} S_{k}^{-1}=M_{k} S_{k}^{-1},$ where $$M_{k}$$ is the cross-covariance between the state and output predictions.
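For later comparison with the ensemble version, a single exact-KF cycle under the placeholder model above might look like this sketch; the Joseph-form covariance update matches the equations here.

```python
import numpy as np

def kf_step(x_hat, P, y, F, G, H, Q, R):
    """One exact Kalman filter cycle: forecast, then analysis update."""
    # Forecast / prediction
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + G @ Q @ G.T
    # Predicted observation and innovation covariance
    y_pred = H @ x_pred
    S = H @ P_pred @ H.T + R
    # Variance-minimising gain K = M S^{-1} with M = P_pred H^T
    K = np.linalg.solve(S, H @ P_pred).T   # uses symmetry of S
    # Analysis update in Joseph form, as above
    x_new = x_pred + K @ (y - y_pred)
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new
```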

In the Ensemble Kalman filter, we approximate some of these quantities of interest using samples; this allows us to relax the assumption of Gaussianity and gets us computational savings in certain problems of interest. That does sound very similar to particle filters, and indeed there is a relation.

Various extensions of Kalman filters as per Katzfuss, Stroud, and Wikle (2016).

Instead of maintaining the $$n$$-dimensional estimate $$\hat{x}_{k \mid k}$$ and the $$n \times n$$ covariance $$P_{k \mid k}$$ as such, we maintain an ensemble of $$N<n$$ sampled state realizations $X_{k}:=\left[x_{k}^{(i)}\right]_{i=1}^{N}.$ This notation is intended to imply that we treat these realisations as an $$n \times N$$ matrix $$X_{k \mid k}$$ with columns $$x_{k}^{(i)}$$. We introduce the following notation for ensemble moments: \begin{aligned} &\bar{x}_{k \mid k}=\frac{1}{N} X_{k \mid k} \one \\ &\bar{P}_{k \mid k}=\frac{1}{N-1} \widetilde{X}_{k \mid k} \widetilde{X}_{k \mid k}^{\top}, \end{aligned} where $$\one=[1, \ldots, 1]^{\top}$$ is an $$N$$-dimensional vector and $\widetilde{X}_{k \mid k}=X_{k \mid k}-\bar{x}_{k \mid k} \one^{\top}=X_{k \mid k}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right)$ is an ensemble of anomalies/deviations from $$\bar{x}_{k \mid k}$$, which I would call the centred version. We attempt to match the moments of the ensemble to those realised by a true Kalman filter, in the sense that \begin{aligned} &\bar{x}_{k \mid k}:=\frac{1}{N} \sum_{i=1}^{N} x_{k}^{(i)} \approx \hat{x}_{k \mid k}, \\ &\bar{P}_{k \mid k}:=\frac{1}{N-1} \sum_{i=1}^{N}\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)^{\top} \approx P_{k \mid k} . \end{aligned} The forecast step computes $$X_{k+1 \mid k}$$ such that its moments are close to $$\hat{x}_{k+1 \mid k}$$ and $$P_{k+1 \mid k}$$. An ensemble of $$N$$ independent process noise realizations $$V_{k}:=\left[v_{k}^{(i)}\right]_{i=1}^{N}$$ with zero mean and covariance $$Q$$ is used in $X_{k+1 \mid k}=F X_{k \mid k}+G V_{k}.$
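A sketch of the ensemble bookkeeping and forecast step, with the ensemble stored as an $$n \times N$$ matrix as above; this assumes the same placeholder model as earlier.

```python
import numpy as np

def ensemble_moments(X):
    """Mean, anomalies (centred ensemble) and sample covariance
    of an n x N ensemble matrix X."""
    N = X.shape[1]
    x_bar = X.mean(axis=1, keepdims=True)   # (1/N) X 1
    X_tilde = X - x_bar                     # X (I - (1/N) 1 1^T)
    P_bar = X_tilde @ X_tilde.T / (N - 1)
    return x_bar.ravel(), X_tilde, P_bar

def enkf_forecast(X, F, G, Q, rng):
    """Propagate each member with an independent process-noise draw."""
    N = X.shape[1]
    V = rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size=N).T  # q x N
    return F @ X + G @ V
```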

Next, $$X_{k \mid k-1}$$ is adjusted to obtain the filtering ensemble $$X_{k \mid k}$$ by applying an update to each ensemble member: with some gain matrix $$\bar{K}_{k}$$, the KF update is applied to the ensemble via $X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top} .$ This does not yet reproduce the full Kalman covariance update, since the term $$\bar{K}_{k} R \bar{K}_{k}^{\top}$$ is missing; we have a choice of how to implement that.

### Stochastic EnKF update

In the stochastic method, we use artificial zero-mean measurement noise realizations $$E_{k}:=\left[e_{k}^{(i)}\right]_{i=1}^{N}$$ with covariance $$R$$: $X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top}-\bar{K}_{k} E_{k} .$ The resulting $$X_{k \mid k}$$ has, in expectation, the correct ensemble mean and covariance, $$\hat{x}_{k \mid k}$$ and $$P_{k \mid k}$$.

If we define a predicted output ensemble $Y_{k \mid k-1}=H X_{k \mid k-1}+E_{k},$ which encodes the information in $$\hat{y}_{k \mid k-1}$$ and $$S_{k}$$ from the classic Kalman update, we can rewrite the above as an update that resembles the Kalman one: $X_{k \mid k}=X_{k \mid k-1}+\bar{K}_{k}\left(y_{k} \one^{\top}-Y_{k \mid k-1}\right) .$

Now, the gain matrix $$\bar{K}_{k}$$ in the classic KF is computed from the covariance matrices of the predicted state and output. In the EnKF, the required $$M_{k}$$ and $$S_{k}$$ must be estimated from the prediction ensembles. The obvious way of doing that is to once again centre the ensembles, \begin{aligned} &\widetilde{X}_{k \mid k-1}=X_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \\ &\widetilde{Y}_{k \mid k-1}=Y_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \end{aligned} and use the empirical ensemble covariances \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}, \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} . \end{aligned} The gain $$\bar{K}_{k}$$ is then the solution of the linear system $\bar{K}_{k} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}=\widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}.$
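Putting the pieces together, a minimal sketch of the stochastic analysis step could read as follows; the 1/(N−1) factors cancel in the gain, so they are omitted.

```python
import numpy as np

def enkf_analysis_stochastic(X_pred, y, H, R, rng):
    """Stochastic EnKF update: perturbed observations, empirical gain."""
    n, N = X_pred.shape
    # Artificial measurement noise and predicted-output ensemble
    E = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T  # m x N
    Y_pred = H @ X_pred + E
    # Centre both ensembles
    X_t = X_pred - X_pred.mean(axis=1, keepdims=True)
    Y_t = Y_pred - Y_pred.mean(axis=1, keepdims=True)
    # Gain solves K (Y_t Y_t^T) = X_t Y_t^T
    K = np.linalg.solve(Y_t @ Y_t.T, Y_t @ X_t.T).T
    # Shift every member towards the observation
    return X_pred + K @ (y[:, None] - Y_pred)
```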

### Deterministic update

The resemblance to unscented/sigma-point filtering is also apparent here. TBD.

The additive measurement noise, which we introduced the $$e_{k}$$ for, should not affect the cross-covariance $$M_k$$. Thus it is reasonable to make the substitution $\widetilde{Y}_{k \mid k-1}\longrightarrow \widetilde{Z}_{k \mid k-1}=H \widetilde{X}_{k \mid k-1}$ to get a less noisy update \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top} \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Z}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top}+R. \end{aligned} The Kalman gain $$\bar{K}_{k}$$ is then computed as in the KF. Alternatively, we can take a matrix square root $$R^{\frac{1}{2}}$$ with $$R^{\frac{1}{2}} R^{\frac{\top}{2}}=R$$ and factorise $\bar{S}_{k}=\left[\begin{array}{cc}\frac{1}{\sqrt{N-1}} \widetilde{Z}_{k \mid k-1}\quad R^{\frac{1}{2}}\end{array}\right] \left[\begin{array}{c}\frac{1}{\sqrt{N-1}} \widetilde{Z}^{\top}_{k \mid k-1} \\ R^{\frac{\top}{2}}\end{array}\right].$
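A sketch of the corresponding gain computation from the noise-free output anomalies, with $$R$$ added exactly rather than sampled:

```python
import numpy as np

def enkf_gain_deterministic(X_pred, H, R):
    """Gain from noise-free output anomalies Z = H X_tilde, with R exact."""
    N = X_pred.shape[1]
    X_t = X_pred - X_pred.mean(axis=1, keepdims=True)
    Z_t = H @ X_t
    M_bar = X_t @ Z_t.T / (N - 1)       # state-output cross-covariance
    S_bar = Z_t @ Z_t.T / (N - 1) + R   # innovation covariance
    return np.linalg.solve(S_bar, M_bar.T).T  # K = M S^{-1}, S symmetric
```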

TBD: the EAKF and ETKF, which deterministically propagate a square-root estimate $P_{k \mid k}^{\frac{1}{2}} P_{k \mid k}^{\frac{\top}{2}}=P_{k \mid k}$ and thereby introduce less sampling noise. Roth et al. (2017) explain this as rewriting the measurement update in terms of a square root $$P_{k \mid k-1}^{\frac{1}{2}}$$, in particular the ensemble approximation $$P_{k \mid k-1}^{\frac{1}{2}} \approx \frac{1}{\sqrt{N-1}} \widetilde{X}_{k \mid k-1}$$: \begin{aligned} P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1} \\ &=P_{k \mid k-1}^{\frac{1}{2}}\left(I-P_{k \mid k-1}^{\frac{\top}{2}} H^{\top} S_{k}^{-1} H P_{k \mid k-1}^{\frac{1}{2}}\right) P_{k \mid k-1}^{\frac{\top}{2}} \\ & \approx \frac{1}{N-1} \widetilde{X}_{k \mid k-1}\left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right) \widetilde{X}_{k \mid k-1}^{\top}. \end{aligned} Factorising $\left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right)=\Pi_{k}^{\frac{1}{2}} \Pi_{k}^{\frac{\top}{2}},$ the factor $$\Pi_{k}^{\frac{1}{2}}\in\mathbb{R}^{N\times N}$$ can be used to create a deviation ensemble $\widetilde{X}_{k \mid k}=\widetilde{X}_{k \mid k-1} \Pi_{k}^{\frac{1}{2}}$ that correctly encodes $$P_{k \mid k}$$ without using random perturbations. The actual filtering is achieved by updating the ensemble mean according to $\bar{x}_{k \mid k}=\left(I-\bar{K}_{k} H\right) F \bar{x}_{k-1 \mid k-1}+\bar{K}_{k} y_{k},$ where $$\bar{K}_{k}$$ is computed from the deviation ensembles.
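As a sketch: since $$\Pi_{k}$$ is symmetric positive semidefinite, one convenient square root comes from its eigendecomposition; the symmetric choice also preserves the zero mean of the anomalies, because $$\Pi_{k} \one = \one$$.

```python
import numpy as np

def etkf_deviation_update(X_t_pred, Z_t, S_bar):
    """Right-multiply the deviation ensemble by a symmetric square root
    of Pi = I - Z^T S^{-1} Z / (N-1), so the analysis covariance is
    encoded without random perturbations."""
    N = X_t_pred.shape[1]
    Pi = np.eye(N) - Z_t.T @ np.linalg.solve(S_bar, Z_t) / (N - 1)
    # Symmetric square root via eigendecomposition (Pi is symmetric PSD)
    w, U = np.linalg.eigh(Pi)
    Pi_half = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T
    return X_t_pred @ Pi_half
```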

## As least-squares

TBD. This formulation permits carrying out the above operations without ever forming explicit covariance matrices.

## Monte Carlo moves in the ensemble

The ensemble covariance is rank deficient, since $$N<n$$. Question: when can we sample other states from the ensemble, via moves that leave the posterior stationary, to improve the rank?

TBD

## Ensemble methods in smoothing

Katzfuss, Stroud, and Wikle (2016) claim there are two major approaches to smoothing: reverse-pass methods in the style of Stroud et al. (2010), and the EnKS, which augments the states with lagged copies rather than making a reverse pass.

Here are some other papers I saw: N. K. Chada, Chen, and Sanz-Alonso (2021); Luo et al. (2015); White (2018); Zhang et al. (2018).

## System identification

Can we use ensemble methods for online parameter estimation? Apparently: G. Evensen (2009); Malartic, Farchi, and Bocquet (2021); Moradkhani et al. (2005); Fearnhead and Künsch (2018).
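One standard recipe, which appears in several of these references in various guises, is state augmentation: append the unknown parameters to the state vector with (near-)trivial dynamics, and let the analysis update nudge them through their sampled correlations with the predicted observations. A hypothetical sketch, where `step(x, theta)` stands in for whatever model is being fitted:

```python
import numpy as np

def augmented_forecast(Xa, n, step, rng, jitter=1e-3):
    """Xa ((n+p) x N) stacks state (first n rows) and parameters.
    Each member's state is propagated with its own parameter draw;
    the parameters get a small random-walk jitter so that their
    ensemble spread does not collapse."""
    X, Theta = Xa[:n], Xa[n:]
    X_new = np.column_stack([step(X[:, i], Theta[:, i])
                             for i in range(Xa.shape[1])])
    Theta_new = Theta + jitter * rng.standard_normal(Theta.shape)
    return np.vstack([X_new, Theta_new])
```

The analysis step is then unchanged; the empirical gain simply acquires extra rows for the parameters.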

## Theoretical basis for probabilists

Bishop and Del Moral (2020); Del Moral, Kurtzmann, and Tugaut (2017); Garbuno-Inigo et al. (2020); Kelly, Law, and Stuart (2014); Le Gland, Monbet, and Tran (2009); Taghvaei and Mehta (2019).

## Lanczos trick in precision estimates

Pleiss et al. (2018); Ubaru, Chen, and Saad (2017).

## Relation to particle filters

Intimate. See particle filters.

## Schillings’ filter

Claudia Schillings’ filter is an elegant version that looks somehow more general than the original, yet also simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in their section 3.2.

## Hutchinson trace estimator

Shakir Mohamed mentions Hutchinson’s trick, and was introduced to it, as I was, by Dr Maurizio Filippone. The trick also works efficiently with the ensemble Kalman filter, where the randomised matrix-vector products are cheap.
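For concreteness, a minimal sketch of the estimator itself; in the EnKF setting, a `matvec` against the ensemble covariance costs only $$O(nN)$$ via the anomalies, which is what makes the combination cheap.

```python
import numpy as np

def hutchinson_trace(matvec, n, n_probes=64, rng=None):
    """Estimate tr(A) using only matrix-vector products:
    E[z^T A z] = tr(A) for zero-mean probes z with identity covariance."""
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe
        total += z @ matvec(z)
    return total / n_probes

# e.g. the trace of the ensemble covariance, without forming it,
# given a centred n x N anomaly matrix X_tilde:
# tr_P = hutchinson_trace(lambda z: X_tilde @ (X_tilde.T @ z) / (N - 1), n)
```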

TBD

## References

Alsup, Terrence, Luca Venturi, and Benjamin Peherstorfer. 2022. In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, 93–117. PMLR.
Alzraiee, Ayman H., Jeremy T. White, Matthew J. Knowling, Randall J. Hunt, and Michael N. Fienen. 2022. Environmental Modelling & Software 150 (April): 105284.
Anderson, Jeffrey L. 2007. Physica D: Nonlinear Phenomena, Data Assimilation, 230 (1): 99–111.
———. 2009. IEEE Control Systems Magazine 29 (3): 66–82.
Anderson, Jeffrey, Tim Hoar, Kevin Raeder, Hui Liu, Nancy Collins, Ryan Torn, and Avelino Avellano. 2009. Bulletin of the American Meteorological Society 90 (9): 1283–96.
Bickel, Peter J., and Elizaveta Levina. 2008. The Annals of Statistics 36 (1): 199–227.
Bishop, Adrian N., and Pierre Del Moral. 2020. arXiv:2006.08843 [Math, Stat], June.
Bocquet, Marc, Carlos A. Pires, and Lin Wu. 2010. Monthly Weather Review 138 (8): 2997–3023.
Chada, Neil K., Yuming Chen, and Daniel Sanz-Alonso. 2021. Foundations of Data Science 3 (3): 331.
Chada, Neil, and Xin Tong. 2022. Mathematics of Computation 91 (335): 1247–80.
Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022. The Journal of Supercomputing, June.
Chen, Yuming, Daniel Sanz-Alonso, and Rebecca Willett. 2021. arXiv:2107.07687 [Cs, Stat], July.
Del Moral, P., A. Kurtzmann, and J. Tugaut. 2017. SIAM Journal on Control and Optimization 55 (1): 119–55.
Dubrule, Olivier. 2018. In Handbook of Mathematical Geosciences: Fifty Years of IAMG, edited by B.S. Daya Sagar, Qiuming Cheng, and Frits Agterberg, 3–24. Cham: Springer International Publishing.
Duffin, Connor, Edward Cripps, Thomas Stemler, and Mark Girolami. 2021. Proceedings of the National Academy of Sciences 118 (2).
Dunbar, Oliver R. A., Andrew B. Duncan, Andrew M. Stuart, and Marie-Therese Wolfram. 2022. SIAM Journal on Applied Dynamical Systems 21 (2): 1539–72.
Evensen, G. 2009. IEEE Control Systems 29 (3): 83–104.
Evensen, Geir. 2003. Ocean Dynamics 53 (4): 343–67.
———. 2004. Ocean Dynamics 54 (6): 539–60.
———. 2009. Data Assimilation - The Ensemble Kalman Filter. Berlin; Heidelberg: Springer.
Evensen, Geir, and Peter Jan van Leeuwen. 2000. Monthly Weather Review 128 (6): 1852–67.
Fearnhead, Paul, and Hans R. Künsch. 2018. Annual Review of Statistics and Its Application 5 (1): 421–49.
Finn, Tobias Sebastian, Gernot Geppert, and Felix Ament. 2021. Preprint. Catchment hydrology/Modelling approaches.
Furrer, R., and T. Bengtsson. 2007. Journal of Multivariate Analysis 98 (2): 227–55.
Furrer, Reinhard, Marc G Genton, and Douglas Nychka. 2006. Journal of Computational and Graphical Statistics 15 (3): 502–23.
Galy-Fajou, Théo, Valerio Perrone, and Manfred Opper. 2021. Entropy 23 (8): 990.
Garbuno-Inigo, Alfredo, Franca Hoffmann, Wuchen Li, and Andrew M. Stuart. 2020. SIAM Journal on Applied Dynamical Systems 19 (1): 412–41.
Grooms, Ian, and Gregor Robinson. 2021. PLOS ONE 16 (3): e0248266.
Guth, Philipp A., Claudia Schillings, and Simon Weissmann. 2020. arXiv.
Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. arXiv:1805.08034 [Cs, Math], May.
Hou, Elizabeth, Earl Lawrence, and Alfred O. Hero. 2016. arXiv:1610.00195 [Physics, Stat], October.
Houtekamer, P. L., and Herschel L. Mitchell. 2001. Monthly Weather Review 129 (1): 123–37.
Houtekamer, P. L., and Fuqing Zhang. 2016. Monthly Weather Review 144 (12): 4489–4532.
Huang, Daniel Zhengyu, Tapio Schneider, and Andrew M. Stuart. 2022. Journal of Computational Physics 463 (August): 111262.
Kantas, Nikolas, Arnaud Doucet, Sumeetpal S. Singh, Jan Maciejowski, and Nicolas Chopin. 2015. Statistical Science 30 (3): 328–51.
Katzfuss, Matthias, Jonathan R. Stroud, and Christopher K. Wikle. 2016. The American Statistician 70 (4): 350–57.
Kelly, D. T. B., K. J. H. Law, and A. M. Stuart. 2014. Nonlinearity 27 (10): 2579.
Kovachki, Nikola B., and Andrew M. Stuart. 2019. Inverse Problems 35 (9): 095005.
Kuzin, Danil, Le Yang, Olga Isupova, and Lyudmila Mihaylova. 2018. 2018 21st International Conference on Information Fusion (FUSION), July, 39–46.
Lakshmivarahan, S., and David J. Stensrud. 2009. IEEE Control Systems Magazine 29 (3): 34–46.
Law, Kody J. H., Hamidou Tembine, and Raul Tempone. 2016. SIAM Journal on Scientific Computing 38 (3).
Le Gland, François, Valerie Monbet, and Vu-Duc Tran. 2009. 25.
Lei, Jing, Peter Bickel, and Chris Snyder. 2009. Monthly Weather Review 138 (4): 1293–1306.
Luo, Xiaodong, Andreas S. Stordal, Rolf J. Lorentzen, and Geir Nævdal. 2015. SPE Journal 20 (05): 962–82.
Malartic, Quentin, Alban Farchi, and Marc Bocquet. 2021. arXiv:2107.11253 [Nlin, Physics:physics, Stat], July.
Mandel, Jan. 2009. arXiv:0901.3725 [Physics], January.
Mitchell, Herschel L., and P. L. Houtekamer. 2000. Monthly Weather Review 128 (2): 416.
Moradkhani, Hamid, Soroosh Sorooshian, Hoshin V. Gupta, and Paul R. Houser. 2005. Advances in Water Resources 28 (2): 135–47.
Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. In. arXiv.
Popov, Andrey Anatoliyevich. 2022. ETD. Virginia Tech.
Reich, Sebastian, and Simon Weissmann. 2019. November.
Roth, Michael, Gustaf Hendeby, Carsten Fritsche, and Fredrik Gustafsson. 2017. EURASIP Journal on Advances in Signal Processing 2017 (1): 56.
Sainsbury-Dale, Matthew, Andrew Zammit-Mangion, and Raphaël Huser. 2022. arXiv.
Schillings, Claudia, and Andrew M. Stuart. 2017. SIAM Journal on Numerical Analysis 55 (3): 1264–90.
Schneider, Tapio, Andrew M. Stuart, and Jin-Long Wu. 2022. Journal of Computational Physics 470 (December): 111559.
Shumway, Robert H., and David S. Stoffer. 2011. Time Series Analysis and Its Applications. Springer Texts in Statistics. New York, NY: Springer New York.
Spantini, Alessio, Ricardo Baptista, and Youssef Marzouk. 2022. SIAM Review 64 (4): 921–53.
Stordal, Andreas S., Rafael J. Moraes, Patrick N. Raanes, and Geir Evensen. 2021. Mathematical Geosciences 53 (3): 375–93.
Stroud, Jonathan R., Matthias Katzfuss, and Christopher K. Wikle. 2018. Monthly Weather Review 146 (1): 373–86.
Stroud, Jonathan R., Michael L. Stein, Barry M. Lesht, David J. Schwab, and Dmitry Beletsky. 2010. Journal of the American Statistical Association 105 (491): 978–90.
Taghvaei, Amirhossein, and Prashant G. Mehta. 2019. October.
———. 2021. IEEE Transactions on Automatic Control 66 (7): 3052–67.
Tamang, Sagar K., Ardeshir Ebtehaj, Peter J. van Leeuwen, Dongmian Zou, and Gilad Lerman. 2021. Nonlinear Processes in Geophysics 28 (3): 295–309.
Tippett, Michael K., Jeffrey L. Anderson, Craig H. Bishop, Thomas M. Hamill, and Jeffrey S. Whitaker. 2003. Monthly Weather Review 131 (7): 1485–90.
Ubaru, Shashanka, Jie Chen, and Yousef Saad. 2017. SIAM Journal on Matrix Analysis and Applications 38 (4): 1075–99.
Wen, Linjie, and Jinglai Li. 2022. Statistics and Computing 32 (6): 97.
White, Jeremy T. 2018. Environmental Modelling & Software 109 (November): 191–201.
Wikle, Christopher K., and L. Mark Berliner. 2007. Physica D: Nonlinear Phenomena, Data Assimilation, 230 (1): 1–16.
Wikle, Christopher K., and Mevin B. Hooten. 2010. TEST 19 (3): 417–51.
Yang, Biao, Jonathan R. Stroud, and Gabriel Huerta. 2018. Bayesian Analysis 13 (4): 1137–61.
Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020. In Machine Learning, Optimization, and Data Science, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.
Zhang, Jiangjiang, Guang Lin, Weixuan Li, Laosheng Wu, and Lingzao Zeng. 2018. Water Resources Research 54 (3): 1716–33.
