# Ensemble Kalman methods

Data Assimilation; Data fusion; Sloppy updates for messy models

June 22, 2015 — October 30, 2024

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]

A random-sampling variant/generalisation of the Kalman-Bucy filter. That also describes particle filters, but the randomisation in ensemble methods is different from those. We can do both types of randomisation. This sent has a few tweaks that make it more tenable in tricky situations with high-dimensional state spaces or nonlinearities in inconvenient places. A popular data assimilation method for spatiotemporal models.

## 1 Tutorial introductions

Katzfuss, Stroud, and Wikle (2016), Roth et al. (2017) and Fearnhead and Künsch (2018), are all pretty good. Schillings and Stuart (2017) has been recommended by Haber, Lucka, and Ruthotto (2018) as the canonical ‘modern’ version. Wikle and Berliner (2007) presents a broad data assimilation context, although it is too curt to be helpful for me. Mandel (2009) is helpfully longer. The inventor of the method explains it in Evensen (2003), but I found that book hard going, since it uses lots of oceanography terminology, which a barrier to entry for non-oceanographers. Roth et al. (2017) is probably the best for my background. Let us copy their notation.

We start from the discrete-time state-space models; the basic one is the linear system: \[ \begin{aligned} x_{k+1} &=F x_{k}+G v_{k}, \\ y_{k} &=H x_{k}+e_{k}, \end{aligned} \] with state \(x\in\mathbb{R}^n\) and the measurement \(y\in\mathbb{R}^m\). The initial state \(x_{0}\), the process noise \(v_{k}\), and the measurement noise \(e_{k}\) are mutually independent such that \[\begin{aligned} \Ex x_{0}&=\hat{x}_{0}\\ \Ex v_{k}&=0\\ \Ex e_{k}&=0\\ \cov x_{0} &=P_{0}\\ \cov v_{k} & =Q\\ \cov e_{k}&=R \end{aligned}\] and all are Gaussian.

The Kalman filter propagates state estimates \(\hat{x}_{k \mid k}\) and covariance matrices \(P_{k \mid k}\) for this model. The KF *update* or *prediction* or *forecast* is given by the step \[
\begin{aligned}
&\hat{x}_{k+1 \mid k}=F \hat{x}_{k \mid k} \\
&P_{k+1 \mid k}=F P_{k \mid k} F^{\top}+G Q G^{\top}
\end{aligned}
\] We predict the observation statistics implied by these state estimates as \[
\begin{aligned}
\hat{y}_{k \mid k-1} &=H \hat{x}_{k \mid k-1}, \\
S_{k} &=H P_{k \mid k-1} H^{\top}+R .
\end{aligned}
\] Given these, and an actual observation, we update the state estimates using a *gain matrix*, \(K_{k}\) \[
\begin{aligned}
\hat{x}_{k \mid k} &=\hat{x}_{k \mid k-1}+K_{k}\left(y_{k}-\hat{y}_{k \mid k-1}\right) \\
&=\left(I-K_{k} H\right) \hat{x}_{k \mid k-1}+K_{k} y_{k}, \\
P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1}\left(I-K_{k} H\right)^{\top}+K_{k} R K_{k}^{\top}.
\end{aligned}
\] in what geoscience types refer to as an *analysis* update. The variance-minimising gain is given \[
K_{k}=P_{k \mid k-1} H^{\top} S_{k}^{-1}=M_{k} S_{k}^{-1},
\] where \(M_{k}\) is the cross-covariance between the state and output predictions.

In the Ensemble Kalman filter, we approximate some of these quantities of interest using samples; this allows us to relax the assumption of Gaussianity and gets us computational savings in certain problems of interest. That does sound very similar to particle filters, and indeed there is a relation.

Instead of maintaining the \(n\)-dimensional estimate \(\hat{x}_{k \mid k}\) and the \(n \times n\) covariance \(P_{k \mid k}\) as such, we maintain an ensemble of \(N<n\) sampled state realizations \[X_{k}:=\left[x_{k}^{(i)}\right]_{i=1}^{N}.\] This notation is intended to imply that we are treating these realisations as an \(n \times N\) matrix \(X_{k \mid k}\) with columns \(x_{k}^{(i)}\). We introduce the following notation for ensemble moments: \[
\begin{aligned}
&\bar{x}_{k \mid k}=\frac{1}{N} X_{k \mid k} \one \\
&\bar{P}_{k \mid k}=\frac{1}{N-1} \widetilde{X}_{k \mid k} \widetilde{X}_{k \mid k}^{\top},
\end{aligned}
\] where \(\one=[1, \ldots, 1]^{\top}\) is an \(N\)-dimensional vector and \[
\widetilde{X}_{k \mid k}=X_{k \mid k}-\bar{x}_{k \mid k} \one^{\top}=X_{k \mid k}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right)
\] is an ensemble of *anomalies*/*deviations* from \(\bar{x}_{k \mid k}\), which I would call the *centred version*. We attempt to match the moments of the ensemble with those realised by a true Kalman filter, in the sense that \[
\begin{aligned}
&\bar{x}_{k \mid k}:=\frac{1}{N} \sum_{i=1}^{N} x_{k}^{(i)} \approx \hat{x}_{k \mid k}, \\
&\bar{P}_{k \mid k}:=\frac{1}{N-1} \sum_{i=1}^{N}\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)^{\top} \approx P_{k \mid k} .
\end{aligned}
\] The forecast step computes \(X_{k+1 \mid k}\) such that its moments are close to \(\hat{x}_{k+1 \mid k}\) and \(P_{k+1 \mid k}\). An ensemble of \(N\) independent process noise realizations \(V_{k}:=\left[v_{k}^{(i)}\right]_{i=1}^{N}\) with zero mean and covariance \(Q\), is used in \[
X_{k+1 \mid k}=F X_{k \mid k}+G V_{k}.
\]

Next the \(X_{k \mid k-1}\) is adjusted to obtain the filtering ensemble \(X_{k \mid k}\) by applying an update to each ensemble member: With some gain matrix \(\bar{K}_{k}\) the KF update is applied to the ensemble by the update \[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top} . \] This does not yet approximate the update of the full Kalman observation — there is no term \(\bar{K}_{k} R \bar{K}_{k}^{\top}\); We have a choice how to implement that.

### 1.1 “Stochastic” EnKF update

In the stochastic method, we use artificial zero-mean measurement noise realisations \(E_{k}:=\left[e_{k}^{(i)}\right]_{i=1}^{N}\) with covariance \(R\). \[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top}-\bar{K}_{k} E_{k} . \] The resulting \(X_{k \mid k}\) has the correct ensemble mean and covariance, \(\hat{x}_{k \mid k}\) and \(P_{k \mid k}\).

If we define a predicted output ensemble \[ Y_{k \mid k-1}=H X_{k \mid k-1}+E_{k} \] that evokes the classic Kalman update (and encapsulates information about) \(\hat{y}_{k \mid k-1}\) and \(S_{k}\), we can rewrite this update into one that resembles the Kalman update: \[ X_{k \mid k}=X_{k \mid k-1}+\bar{K}_{k}\left(y_{k} \one^{\top}-Y_{k \mid k-1}\right) . \]

Now, the gain matrix \(\bar{K}_{k}\) in the classic KF is computed from the covariance matrices of the predicted state and output. In the EnKF, the required \(M_{k}\) and \(S_{k}\) must be estimated from the prediction ensembles. The obvious way of doing that is to once again centre the ensemble, \[ \begin{aligned} &\widetilde{X}_{k \mid k-1}=X_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \\ &\widetilde{Y}_{k \mid k-1}=Y_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \end{aligned} \] and use the empirical ensemble covariances \[ \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{X}_{k \mid k-1}^{\top}, \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} . \end{aligned} \] The gain \(\bar{K}_{k}\) is then the solution to the system of linear equations, \[ \bar{K}_{k} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}=\widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} \]

### 1.2 “Deterministic” update

Resemblance to unscented/sigma-point filtering also apparent. TBD.

The additive measurement noise model we have used the \(e_{k}\) for should not affect the cross covariance \(M_k\). Thus it is reasonable to make the substitution \[ \widetilde{Y}_{k \mid k-1}\longrightarrow \widetilde{Z}_{k \mid k-1}=H \widetilde{X}_{k \mid k-1} \] to get a less noisy update \[ \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top} \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Z}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top}+R \end{aligned} \] The Kalman gain \(\bar{K}_{k}\) is then computed as in the KF. Or we can interpret it as a matrix square-root \(R^{\frac{1}{2}}\) with \(R^{\frac{1}{2}} R^{\frac{\top}{2}}=R\) and then factorize \[ \bar{S}_{k}=\left[\begin{array}{cc}\frac{1}{\sqrt{N-1}} \widetilde{Z}_{k \mid k-1}\quad R^{\frac{1}{2}}\end{array}\right] \left[\begin{array}{c}\frac{1}{\sqrt{N-1}} \widetilde{Z}^{\top}_{k \mid k-1} \\ R^{\frac{\top}{2}}\end{array}\right]. \]

### 1.3 Square root versions

EAKF and ETKF (Tippett et al. 2003) which propagate an estimate \[ P_{k \mid k}^{\frac{1}{2}} P_{k \mid k}^{\frac{\top}{2}}=P_{k \mid k} \] which seems to be more stable. Roth et al. (2017) explain it as rewriting the measurement update to use a square root \(P_{k \mid k-1}^{\frac{1}{2}}\) and in particular the ensemble approximation \(\frac{1}{N-1} \widetilde{X}_{k \mid k-1}\) : \[ \begin{aligned} P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1} \\ &=P_{k \mid k-1}^{\frac{1}{2}}\left(I-P_{k \mid k-1}^{\frac{\top}{2}} H^{\top} S_{k}^{-1} H P_{k \mid k-1}^{\frac{1}{2}}\right) P_{k \mid k-1}^{\frac{\top}{2}} \\ & \approx \frac{1}{N-1} \widetilde{X}_{k \mid k-1}\left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right) \widetilde{X}_{k \mid k-1}^{\top}. \end{aligned} \] Factorising, \[ \left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right)=\Pi_{k}^{\frac{1}{2}} \Pi_{k}^{\frac{\top}{2}}, \] The \(\Pi_{k}^{\frac{1}{2}}\in\mathbb{R}^{N\times N}\) can be used to create a deviation ensemble \[ \tilde{X}_{k \mid k}=\tilde{X}_{k \mid k-1} \Pi_{k}^{\frac{1}{2}} \] that correctly encodes \(P_{k \mid k}\) without using random perturbations. The actual filtering is achieved by updating each sample according to \[ \bar{x}_{k \mid k}=\left(I-\bar{K}_{k} H\right) F_{x_{k-1 \mid k-1}}+\bar{K}_{k} y_{k}, \] where \(\bar{K}_{k}\) is computed from the deviation ensembles.

### 1.4 As an empirical Matheron update

The EnKF is an empirical Matheron update.

## 2 As Approximate Bayesian Computation

Nott, Marshall, and Ngoc (2012) uses Beaumont, Zhang, and Balding (2002), Blum and François (2010) and Lei and Bickel (2009) to interpret EnKF as an Approximate Bayesian computation method.

## 3 Going nonlinear

TBD

## 4 Monte Carlo moves in the ensemble

The ensemble is rank deficient. Question: When can we sample other states from the ensemble to improve the rank by stationary posterior moves?

## 5 Use in smoothing

Katzfuss, Stroud, and Wikle (2016) claims there are two major approaches to smoothing: Reverse methods (Stroud et al. 2010) and the EnKS (Evensen and van Leeuwen 2000) which augments the states with lagged copies rather than doing a reverse pass.

There seem to be many tweaks on this idea (N. K. Chada, Chen, and Sanz-Alonso 2021; Luo et al. 2015; White 2018; Zhang et al. 2018).

## 6 Use in system identification

Can we use ensemble methods for online parameter estimation? Apparently (Evensen 2009b; Malartic, Farchi, and Bocquet 2021; Moradkhani et al. 2005; Fearnhead and Künsch 2018).

## 7 Theoretical basis for probabilists

Various (Bishop and Del Moral 2023; P. Del Moral, Kurtzmann, and Tugaut 2017; Garbuno-Inigo et al. 2020; Kelly, Law, and Stuart 2014; Le Gland, Monbet, and Tran 2009; Taghvaei and Mehta 2021).

## 8 Lanczos trick in precision estimates

## 9 Localisation

I’m not sure what this means, but it seems to be a thing (Hunt, Kostelich, and Szunyogh 2007; Ott et al. 2004).

## 10 Relation to particle filters

Intimate. See particle filters.

## 11 Schilling’s filter

Claudia Schilling’s filter (Schillings and Stuart 2017) is an version which looks somehow more general than the original but also simpler. I would like to work out what is going on there.

Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.

## 12 Handy low-rank tricks for

See low-rank tricks.

## 13 Incoming

- DART | The Data Assimilation Research Testbed (Fortran, …matlab?) has nice tutorials, e.g. DART Tutorial
- OpenDA: Integrating models and observations (python and c++?)

## 14 References

*Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference*.

*Environmental Modelling & Software*.

*Applied Mathematics and Computation*.

*Physica D: Nonlinear Phenomena*, Data Assimilation,.

*IEEE Control Systems Magazine*.

*Bulletin of the American Meteorological Society*.

*Genetics*.

*The Annals of Statistics*.

*Electronic Journal of Probability*.

*Mathematics of Control, Signals, and Systems*.

*Annales de l’Institut Henri Poincaré, Probabilités Et Statistiques*.

*arXiv:1701.05978 [Math]*.

*Statistics and Computing*.

*Monthly Weather Review*.

*Foundations of Data Science*.

*Mathematics of Computation*.

*The Journal of Supercomputing*.

*Computational Geosciences*.

*SIAM Journal on Control and Optimization*.

*Handbook of Mathematical Geosciences: Fifty Years of IAMG*.

*Proceedings of the National Academy of Sciences*.

*SIAM Journal on Applied Dynamical Systems*.

*Journal of Geophysical Research: Oceans*.

*Ocean Dynamics*.

*Ocean Dynamics*.

*Data Assimilation - The Ensemble Kalman Filter*.

*IEEE Control Systems*.

*Monthly Weather Review*.

*Annual Review of Statistics and Its Application*.

*Journal of Multivariate Analysis*.

*Journal of Computational and Graphical Statistics*.

*Entropy*.

*SIAM Journal on Applied Dynamical Systems*.

*PLOS ONE*.

*arXiv:1805.08034 [Cs, Math]*.

*Monthly Weather Review*.

*arXiv:1610.00195 [Physics, Stat]*.

*Monthly Weather Review*.

*Monthly Weather Review*.

*Journal of Computational Physics*.

*Physica D: Nonlinear Phenomena*, Data Assimilation,.

*Signal Processing, Sensor Fusion, and Target Recognition VI*.

*Statistical Science*.

*The American Statistician*.

*Nonlinearity*.

*Inverse Problems*.

*2018 21st International Conference on Information Fusion (FUSION)*.

*Flow, Turbulence and Combustion*.

*IEEE Control Systems Magazine*.

*SIAM Journal on Scientific Computing*.

*University of California, Berkeley, Rep*.

*Monthly Weather Review*.

*SPE Journal*.

*arXiv:2107.11253 [Nlin, Physics:physics, Stat]*.

*Proceedings of the 34th International Conference on Neural Information Processing Systems*. NIPS’20.

*Monthly Weather Review*.

*Advances in Water Resources*.

*Statistics and Computing*.

*Handbook of Spatial Statistics*.

*Journal of Climate*.

*Journal of Climate*.

*Mathematical Geosciences*.

*Tellus A: Dynamic Meteorology and Oceanography*.

*Nonlinear Processes in Geophysics*.

*EURASIP Journal on Advances in Signal Processing*.

*Advanced Numerical Modeling and Data Assimilation Techniques for Tropical Cyclone Prediction*.

*Monthly Weather Review*.

*SIAM Journal on Numerical Analysis*.

*Journal of Computational Physics*.

*Time Series Analysis and Its Applications*. Springer Texts in Statistics.

*SIAM Review*.

*Mathematical Geosciences*.

*Monthly Weather Review*.

*Journal of the American Statistical Association*.

*IEEE Transactions on Automatic Control*.

*Nonlinear Processes in Geophysics*.

*Monthly Weather Review*.

*SIAM Journal on Matrix Analysis and Applications*.

*Stochastic Hydrology and Hydraulics*.

*Statistics and Computing*.

*Environmental Modelling & Software*.

*Physica D: Nonlinear Phenomena*, Data Assimilation,.

*TEST*.

*Bayesian Analysis*.

*Machine Learning, Optimization, and Data Science*.

*Water Resources Research*.