Ensemble Kalman methods
Data Assimilation; Data fusion; Sloppy updates for messy models
2015-06-22 — 2025-09-07
Wherein ensemble approximations are employed to propagate low-rank state covariances via N-member anomalies, and updates are effected in the N−1 ensemble subspace using perturbed or square‑root observation transforms
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]
A random-sampling variant/generalisation of the Kalman-Bucy filter. That also describes particle filters, but the randomisation in ensemble methods is different from those. We can do both types of randomisation. This method has a few tweaks that make it more tenable in tricky situations with high-dimensional state spaces or nonlinearities in inconvenient places. A popular data assimilation method for spatiotemporal models.
1 Tutorial introductions
Katzfuss, Stroud, and Wikle (2016), Roth et al. (2017) and Fearnhead and Künsch (2018) are all pretty good. Schillings and Stuart (2017) has been recommended by Haber, Lucka, and Ruthotto (2018) as the canonical ‘modern’ version. Wikle and Berliner (2007) presents a broad data assimilation context, although it is too curt to be helpful for me. Mandel (2009) is helpfully longer. The inventor of the method explains it in Evensen (2003), but I found that paper hard going, since it uses lots of oceanography terminology, which is a barrier to entry for non-oceanographers. Roth et al. (2017) is probably the best for my background. Let us copy their notation.
We start from the discrete-time state-space models; the basic one is the linear system: \[ \begin{aligned} x_{k+1} &=F x_{k}+G v_{k}, \\ y_{k} &=H x_{k}+e_{k}, \end{aligned} \] with state \(x\in\mathbb{R}^n\) and measurement \(y\in\mathbb{R}^m\). The initial state \(x_{0}\), the process noise \(v_{k}\), and the measurement noise \(e_{k}\) are mutually independent such that \[\begin{aligned} \Ex x_{0}&=\hat{x}_{0}\\ \Ex v_{k}&=0\\ \Ex e_{k}&=0\\ \cov x_{0} &=P_{0}\\ \cov v_{k} & =Q\\ \cov e_{k}&=R \end{aligned}\] and all are Gaussian.
The Kalman filter propagates state estimates \(\hat{x}_{k \mid k}\) and covariance matrices \(P_{k \mid k}\) for this model.
Time (prediction/forecast) step \[ \hat{x}_{k+1 \mid k}=F \hat{x}_{k \mid k},\qquad P_{k+1 \mid k}=F P_{k \mid k} F^{\top}+G Q G^{\top}. \] Predicted observation and its covariance \[ \hat{y}_{k \mid k-1} =H \hat{x}_{k \mid k-1},\qquad S_{k} =H P_{k \mid k-1} H^{\top}+R . \]
Measurement (update/analysis) step \[ \begin{aligned} \hat{x}_{k \mid k} &=\hat{x}_{k \mid k-1}+K_{k}\left(y_{k}-\hat{y}_{k \mid k-1}\right), \\ P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1}\left(I-K_{k} H\right)^{\top}+K_{k} R K_{k}^{\top}, \end{aligned} \] with variance-minimising gain \[ K_{k}=P_{k \mid k-1} H^{\top} S_{k}^{-1}=M_{k} S_{k}^{-1}, \] where \(M_{k}\) is the cross-covariance between the state and predicted output, \(M_k=P_{k|k-1}H^\top\).
In the Ensemble Kalman filter (EnKF), we approximate these statistics with samples. That relaxes strict Gaussianity and, crucially, enables low-rank computation when ensembles are small.
Instead of maintaining the \(n\times n\) covariance \(P_{k \mid k}\) explicitly, we maintain an ensemble of \(N\) state realisations \[ X_{k}:=\left[x_{k}^{(i)}\right]_{i=1}^{N}\in\mathbb R^{n\times N}. \] Ensemble moments: \[ \bar{x}_{k \mid k}=\frac{1}{N} X_{k \mid k} \one,\qquad \widetilde{X}_{k \mid k}=X_{k \mid k}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right),\qquad \bar{P}_{k \mid k}=\frac{1}{N-1} \widetilde{X}_{k \mid k} \widetilde{X}_{k \mid k}^{\top}. \] We attempt to match the ensemble moments to the KF moments: \[ \bar{x}_{k \mid k}\approx \hat{x}_{k \mid k},\qquad \bar{P}_{k \mid k}\approx P_{k \mid k}. \] Forecasting with process noise samples \(V_k=[v_k^{(i)}]_{i=1}^N\) (zero-mean, covariance \(Q\)): \[ X_{k+1 \mid k}=F X_{k \mid k}+G V_{k}. \]
Next the \(X_{k \mid k-1}\) is adjusted to obtain the filtering ensemble \(X_{k \mid k}\) by applying an update to each ensemble member: With some gain matrix \(\bar{K}_{k}\) the KF update is applied to the ensemble by the update \[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top} . \] This does not yet approximate the update of the full Kalman observation — there is no term \(\bar{K}_{k} R \bar{K}_{k}^{\top}\); We have a choice how to implement that.
1.1 “Stochastic” EnKF update
Using measurement-noise realisations \(E_k=[e_k^{(i)}]_{i=1}^N\) with covariance \(R\), \[ X_{k \mid k}=X_{k \mid k-1}+\bar{K}_{k}\left(y_{k} \one^{\top}-Y_{k \mid k-1}\right), \quad Y_{k \mid k-1}:=H X_{k \mid k-1}+E_{k}. \] This gives the correct mean and covariance in expectation. The gain is solved from sample (co)variances: \[ \widetilde{Y}_{k \mid k-1}=Y_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right),\qquad \bar{M}_{k}=\frac{1}{N-1}\widetilde{X}_{k \mid k-1}\,\widetilde{Y}_{k \mid k-1}^{\top},\qquad \bar{S}_{k}=\frac{1}{N-1}\widetilde{Y}_{k \mid k-1}\,\widetilde{Y}_{k \mid k-1}^{\top}, \] and \[ \bar{K}_{k}\,\bar{S}_{k}=\bar{M}_{k}. \]
1.2 “Deterministic” / square-root EnKF
To reduce Monte Carlo noise, replace perturbed observations with a deterministic transform that updates both the mean and the anomalies (no \(E_k\) needed). For additive noise and linear \(H\): \[ \widetilde{Z}_{k \mid k-1}:=H\widetilde{X}_{k \mid k-1},\qquad \bar{S}_{k}=\frac{1}{N-1}\widetilde{Z}_{k \mid k-1}\widetilde{Z}_{k \mid k-1}^{\top}+R,\qquad \bar{M}_{k}=\frac{1}{N-1}\widetilde{X}_{k \mid k-1}\widetilde{Z}_{k \mid k-1}^{\top}. \] Square-root EnKF variants (EAKF/ETKF) apply a right-side transform to the anomalies, \[ \widetilde{X}_{k \mid k}=\widetilde{X}_{k \mid k-1}\,\Pi_k^{1/2}, \] chosen so that the sample covariance matches the Kalman analysis covariance (equivalently, a carefully constructed low-rank Joseph update).
2 Low-rank ensemble-space form (ETKF / Woodbury trick)
Let \(m=N-1\) be the ensemble subspace dimension. Define whitened obs anomalies and innovation \[ \widehat Y=\;R^{-1/2}\,\widetilde{Z}_{k \mid k-1}\in\mathbb R^{m_y\times m},\qquad \widehat d=\;R^{-1/2}\,(y_k-\hat y_{k|k-1})\in\mathbb R^{m_y}, \] and \[ S=\frac{\widehat Y}{\sqrt m},\qquad A=I_m+S^\top S. \] Then the mean increment and anomaly transform are \[ \Delta\bar x\;=\;\frac{\widetilde X_{k|k-1}}{\sqrt m}\,A^{-1}S^\top \widehat d,\qquad T\;=\;A^{-1/2},\quad \widetilde X_{k|k}=\frac{\widetilde X_{k|k-1}}{\sqrt m}\,T\,\sqrt m. \] Everything is solved in \(\mathbb R^{m\times m}\) — the Woodbury / ensemble-space trick — so cost scales with \(m=N-1\), not with state or obs dimension.
3 Localization
Tapering the covariance by spatial distance reduces spurious long-range correlations (Ott et al. 2004). A naive Schur product on full covariances, \(\tilde P = \rho \odot P\), destroys the low-rank factorisation and the efficiency above. Two ensemble-space–safe ways to localise are:
3.1 Square-root Schur localisation (global, keeps low rank)
Build SPSD tapers \(C_y\) (obs–obs) and optionally \(C_x\) (state–state). Use (matrix) square roots on anomalies: \[ \widetilde Y'\leftarrow C_y^{1/2}\,\widetilde Y',\qquad \widetilde X'\leftarrow C_x^{1/2}\,\widetilde X'. \] Whiten and proceed with the same ETKF algebra, \[ \widehat{\widetilde Y}=R^{-1/2}\,\widetilde Y',\quad \widetilde S=\widehat{\widetilde Y}/\sqrt m,\quad \widetilde A=I_m+\widetilde S^\top\widetilde S, \] \[ \Delta\bar x=\frac{\widetilde X'}{\sqrt m}\,\widetilde A^{-1}\,\widetilde S^\top\,(R^{-1/2}d),\qquad \widetilde X_a'=\frac{\widetilde X'}{\sqrt m}\,\widetilde A^{-1/2}\,\sqrt m. \] This realises the effect of a Schur taper at the covariance level while preserving the ensemble-space structure (all inversions remain \(m\times m\)). In practice, \(C_y^{1/2}\) is applied with sparse matvecs using compactly supported kernels (e.g. Gaspari–Cohn).
3.2 Local Ensemble Transform Kalman Filter (LETKF)
The LETKF (Hunt, Kostelich, and Szunyogh 2007) reduces computational burden and improves robustness by restricting each update to a small spatial window (or tile) around each state location, while keeping the low-rank ensemble-space algebra intact.
For each state location (or tile) \(s\), select nearby observations \(\mathcal J(s)\) and form local anomalies/innovation: \[ \widehat Y_s=R_s^{-1/2}\,Y'_s,\quad \widehat d_s=R_s^{-1/2}\,d_s,\quad S_s=\widehat Y_s/\sqrt m,\quad A_s=I_m+S_s^\top S_s. \] Then \[ \Delta\bar x_s=\frac{X'_s}{\sqrt m}\,A_s^{-1}S_s^\top \widehat d_s,\qquad X'_{a,s}=\frac{X'_s}{\sqrt m}\,A_s^{-1/2}\,\sqrt m. \] Apply these only to the local state entries (blend overlaps smoothly). Locality is enforced by construction; all inversions remain \(m\times m\).
3.3 As an empirical Matheron update
4 As Approximate Bayesian Computation
Nott, Marshall, and Ngoc (2012) uses Beaumont, Zhang, and Balding (2002), Blum and François (2010) and Lei and Bickel (2009) to interpret EnKF as an Approximate Bayesian computation method.
5 Convergence and consistency
Seems to be complicated (P. Del Moral, Kurtzmann, and Tugaut 2017; Kelly, Law, and Stuart 2014; Kwiatkowski and Mandel 2015; Le Gland, Monbet, and Tran 2009; Mandel, Cobb, and Beezley 2011).
6 Going nonlinear
TBD
7 Monte Carlo moves in the ensemble
The ensemble is rank deficient. Question: When can we sample other states from the ensemble to improve the rank by stationary posterior moves?
8 Use in smoothing
Katzfuss, Stroud, and Wikle (2016) claims there are two major approaches to smoothing: Reverse methods (Stroud et al. 2010) and the EnKS (Evensen and van Leeuwen 2000) which augments the states with lagged copies rather than doing a reverse pass.
There seem to be many tweaks on this idea (N. K. Chada, Chen, and Sanz-Alonso 2021; Luo et al. 2015; White 2018; Zhang et al. 2018).
9 Use in system identification
Can we use ensemble methods for online parameter estimation? Apparently there are some tricks to enable this (Evensen 2009b; Malartic, Farchi, and Bocquet 2021; Moradkhani et al. 2005; Fearnhead and Künsch 2018; Bocquet, Farchi, and Malartic 2020). The canonical method in my own opinion, which is highly biased, is the Gaussian Ensemble Belief Propagation trick.
10 Theoretical basis of EnKF for probabilists
Various works quantify this filter in terms of its convergence to interesting densities (Bishop and Del Moral 2023a; P. Del Moral, Kurtzmann, and Tugaut 2017; Garbuno-Inigo et al. 2020; Kelly, Law, and Stuart 2014; Le Gland, Monbet, and Tran 2009; Taghvaei and Mehta 2021).
11 Lanczos trick in precision estimates
12 Relation to particle filters
Intimate. See particle filters.
13 Schilling’s filter
Claudia Schilling’s filter (Schillings and Stuart 2017) is a version which looks somehow more general than the original but simple. I would like to work out what is going on there.
Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.
14 Handy low-rank tricks for
See low-rank tricks.
15 Tooling
Every grad student and every climate modeling lab makes thier own implementation.
I use the simple research library, nansencenter/DAPPER: Data Assimilation with Python: a Package for Experimental Research.
The also provide us with a helpful overview of the field. The following two tables are quoted verbatim from their work
A table of industrial Ensemble Kalman data assimilation frameworks
Name | Developers | Purpose (approximately) |
---|---|---|
DART | NCAR | General |
PDAF | AWI | General |
JEDI | JCSDA (NOAA, NASA, ++) | General |
OpenDA | TU Delft | General |
EMPIRE | Reading (Met) | General |
ERT | Statoil | History matching (Petroleum DA) |
PIPT | CIPR | History matching (Petroleum DA) |
MIKE | DHI | Oceanographic |
OAK | Liège | Oceanographic |
Siroco | OMP | Oceanographic |
Verdandi | INRIA | Biophysical DA |
PyOSSE | Edinburgh, Reading | Earth-observation DA |
A list of projects that research how to do data assimilation:
Name | Developers | Notes |
---|---|---|
DAPPER | Raanes, Chen, Grudzien | Python |
SANGOMA | Conglomerate* | Fortran, Matlab |
hIPPYlib | Villa, Petra, Ghattas | Python, adjoint-based PDE methods |
FilterPy | R. Labbe | Python. Engineering oriented. |
[DASoftware][13] | Yue Li, Stanford | Matlab. Large inverse probs. |
Pomp | U of Michigan | R |
EnKF-Matlab | Sakov | Matlab |
EnKF-C | Sakov | C. Light-weight, off-line DA |
pyda | Hickman | Python |
PyDA | Shady-Ahmed | Python |
DasPy | Xujun Han | Python |
DataAssim.jl | Alexander-Barth | Julia |
DataAssimilationBenchmarks.jl | Grudzien | Julia, Python |
EnsembleKalmanProcesses.jl | Clim. Modl. Alliance | Julia, EKI (optim) |
Datum | Raanes | Matlab |
IEnKS code | Bocquet | Python |
*
: AWI/Liege/CNRS/NERSC/Reading/Delft
16 Incoming
- DART | The Data Assimilation Research Testbed (Fortran, …matlab?) has nice tutorials, e.g. DART Tutorial
- OpenDA: Integrating models and observations (python and c++?)