Ensemble Kalman methods
Data Assimilation; Data fusion; Sloppy updates for messy models
June 22, 2015 — October 29, 2024
A random-sampling variant/generalisation of the Kalman-Bucy filter. That description also fits particle filters, but the randomisation in ensemble methods is different, and we can combine both types of randomisation. This family of methods has a few tweaks that make it more tenable in tricky situations with high-dimensional state spaces or nonlinearities in inconvenient places. A popular data assimilation method for spatiotemporal models.
1 Tutorial introductions
Katzfuss, Stroud, and Wikle (2016), Roth et al. (2017) and Fearnhead and Künsch (2018) are all pretty good. Schillings and Stuart (2017) has been recommended by Haber, Lucka, and Ruthotto (2018) as the canonical ‘modern’ version. Wikle and Berliner (2007) presents a broad data assimilation context, although it is too curt to be helpful for me. Mandel (2009) is helpfully longer. The inventor of the method explains it in Evensen (2003), but I found that hard going, since it uses lots of oceanography terminology, which is a barrier to entry for non-oceanographers. Roth et al. (2017) is probably the best for my background; let us copy their notation.
We start from discrete-time state-space models; the basic one is the linear system
$$x_{k+1} = F x_k + w_k, \qquad y_k = H x_k + e_k,$$
with process noise $w_k \sim \mathcal{N}(0, Q)$ and measurement noise $e_k \sim \mathcal{N}(0, R)$.

The Kalman filter propagates state estimates $\hat{x}_{k|k}$ and covariances $P_{k|k}$ exactly through alternating time and measurement updates, which is tractable because everything remains Gaussian.
In the Ensemble Kalman filter, we approximate some of these quantities of interest using samples; this allows us to relax the assumption of Gaussianity and gets us computational savings in certain problems of interest. That does sound very similar to particle filters, and indeed there is a relation.
Instead of maintaining the estimate $\hat{x}_{k|k}$ and covariance $P_{k|k}$ explicitly, we maintain an ensemble of $N$ state realisations $X_k = [x_k^1, \dots, x_k^N]$, whose sample statistics
$$\bar{x}_k = \frac{1}{N}\sum_{i=1}^{N} x_k^i, \qquad \bar{P}_k = \frac{1}{N-1}\widetilde{X}_k\widetilde{X}_k^{\mathsf{T}}, \quad\text{where } \widetilde{X}_k = X_k - \bar{x}_k\mathbf{1}^{\mathsf{T}},$$
stand in for them.

Next, the time update propagates each member through the dynamics with an independent process noise draw, $x_{k+1}^i = F x_k^i + w_k^i$ with $w_k^i \sim \mathcal{N}(0, Q)$; a code sketch follows. The measurement update is where the variants differ.
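For concreteness, a minimal NumPy sketch of the time update (the function name and the columns-are-members convention are my own):

```python
import numpy as np

def enkf_forecast(X, F, Q, rng):
    """Time update: push each ensemble member through the linear
    dynamics with an independent process-noise draw."""
    n, N = X.shape
    W = rng.multivariate_normal(np.zeros(n), Q, size=N).T  # (n, N) noise draws
    return F @ X + W
```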
1.1 “Stochastic” EnKF update
In the stochastic method, we use artificial zero-mean measurement noise realisations $e_k^i \sim \mathcal{N}(0, R)$, drawn independently for each ensemble member.

If we define a predicted output ensemble $Y_k = [y_k^1, \dots, y_k^N]$ with $y_k^i = H x_k^i + e_k^i$, and its deviations $\widetilde{Y}_k = Y_k - \bar{y}_k \mathbf{1}^{\mathsf{T}}$ about the ensemble mean $\bar{y}_k$, then $\frac{1}{N-1}\widetilde{Y}_k\widetilde{Y}_k^{\mathsf{T}}$ estimates the innovation covariance $H \bar{P}_k H^{\mathsf{T}} + R$.

Now, the gain matrix is built from sample covariances,
$$\bar{K}_k = \widetilde{X}_k \widetilde{Y}_k^{\mathsf{T}} \big( \widetilde{Y}_k \widetilde{Y}_k^{\mathsf{T}} \big)^{-1},$$
which estimates the Kalman gain, and each member is nudged towards the observation,
$$x_k^i \leftarrow x_k^i + \bar{K}_k \big( y_k - y_k^i \big).$$
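A matching NumPy sketch of this update, assuming a linear observation map and additive Gaussian noise (same conventions as above; the solve avoids forming an explicit inverse):

```python
import numpy as np

def enkf_stochastic_update(X, y, H, R, rng):
    """'Stochastic' EnKF measurement update.

    X: (n, N) ensemble (columns are members), y: (m,) observation,
    H: (m, n) observation matrix, R: (m, m) noise covariance.
    """
    m, N = len(y), X.shape[1]
    # Artificial measurement-noise draws, one per member
    E = rng.multivariate_normal(np.zeros(m), R, size=N).T  # (m, N)
    Y = H @ X + E                                          # predicted outputs
    # Deviations (anomalies); the 1/(N-1) factors cancel in the gain
    Xt = X - X.mean(axis=1, keepdims=True)
    Yt = Y - Y.mean(axis=1, keepdims=True)
    # Gain K = X'Y'^T (Y'Y'^T)^{-1}, via a solve on the m-by-m system
    K = np.linalg.solve(Yt @ Yt.T, Yt @ Xt.T).T
    # Nudge each member towards the observation
    return X + K @ (y[:, None] - Y)
```

One full filter step is then `enkf_stochastic_update(enkf_forecast(X, F, Q, rng), y, H, R, rng)` with `rng = np.random.default_rng()`.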
1.2 “Deterministic” update
The resemblance to unscented/sigma-point filtering is also apparent. TBD.

Under the additive measurement noise model we have used, the deterministic update dispenses with the artificial noise draws: the exact $R$ replaces the sampled perturbation covariance, so the gain becomes $\bar{K}_k = \bar{P}_k H^{\mathsf{T}} (H \bar{P}_k H^{\mathsf{T}} + R)^{-1}$, and the ensemble deviations are then transformed deterministically so that the updated sample covariance matches the Kalman filter covariance update.
1.3 Square root versions
The EAKF and ETKF (Tippett et al. 2003) are square-root variants which propagate an estimate of the covariance through a square-root factor (in effect, the ensemble deviations themselves), avoiding both the sampling noise of the stochastic update and the explicit formation of a full covariance matrix.
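A sketch of the ETKF transform in the ensemble-space form of Hunt, Kostelich, and Szunyogh (2007), without localisation or inflation; it assumes a linear observation map, and the function name is mine:

```python
import numpy as np

def etkf_update(X, y, H, R):
    """One ETKF measurement update (square-root form, no artificial noise).

    X: (n, N) ensemble, y: (m,) observation, H: (m, n), R: (m, m).
    """
    n, N = X.shape
    xbar = X.mean(axis=1, keepdims=True)
    Xt = X - xbar                      # state deviations
    Yt = H @ Xt                        # output deviations
    Rinv_Yt = np.linalg.solve(R, Yt)   # R^{-1} Y', shape (m, N)
    # Analysis covariance in ensemble space: [(N-1)I + Y'^T R^{-1} Y']^{-1}
    A = (N - 1) * np.eye(N) + Yt.T @ Rinv_Yt
    evals, evecs = np.linalg.eigh(A)
    A_inv = evecs @ np.diag(1.0 / evals) @ evecs.T
    # Mean-update weights, and the symmetric square-root transform
    wbar = A_inv @ Rinv_Yt.T @ (y - (H @ xbar).ravel())
    W = evecs @ np.diag(np.sqrt((N - 1) / evals)) @ evecs.T
    return xbar + Xt @ (wbar[:, None] + W)
```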
1.4 As an empirical Matheron update

The stochastic update above can be read as an empirical Matheron update (pathwise conditioning): a sample from a Gaussian conditional is obtained by shifting a sample from the joint,
$$(x \mid y = y^*) \overset{d}{=} x + \operatorname{Cov}(x, y)\operatorname{Cov}(y, y)^{-1}(y^* - y),$$
and the EnKF simply plugs ensemble estimates into the covariances.
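A toy numerical check of that reading (all names mine): we condition joint Gaussian samples on an observation purely by shifting, with the empirical gain standing in for the exact one:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.standard_normal(N)                # prior draws, x ~ N(0, 1)
y = x + 0.5 * rng.standard_normal(N)      # joint draws of y = x + noise
y_star = 1.0                              # the observation to condition on
C = np.cov(x, y)                          # empirical joint covariance
K = C[0, 1] / C[1, 1]                     # empirical gain Cov(x,y)/Var(y)
x_cond = x + K * (y_star - y)             # shifted samples ~ p(x | y = y*)
print(x_cond.mean(), x_cond.var())        # ~0.8 and ~0.2, the exact moments
```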
2 As Approximate Bayesian Computation
Nott, Marshall, and Ngoc (2012) uses Beaumont, Zhang, and Balding (2002), Blum and François (2010) and Lei and Bickel (2009) to interpret the EnKF as an approximate Bayesian computation method.
3 Convergence and consistency
Seems to be complicated (P. Del Moral, Kurtzmann, and Tugaut 2017; Kelly, Law, and Stuart 2014; Kwiatkowski and Mandel 2015; Le Gland, Monbet, and Tran 2009; Mandel, Cobb, and Beezley 2011).
4 Going nonlinear
TBD
5 Monte Carlo moves in the ensemble
The ensemble is rank deficient. Question: When can we sample other states from the ensemble to improve the rank by stationary posterior moves?
6 Use in smoothing
Katzfuss, Stroud, and Wikle (2016) claims there are two major approaches to smoothing: reverse methods (Stroud et al. 2010), and the EnKS (Evensen and van Leeuwen 2000), which augments the state with lagged copies rather than doing a reverse pass.
There seem to be many tweaks on this idea (N. K. Chada, Chen, and Sanz-Alonso 2021; Luo et al. 2015; White 2018; Zhang et al. 2018).
7 Use in system identification
Can we use ensemble methods for online parameter estimation? Apparently so (Evensen 2009b; Malartic, Farchi, and Bocquet 2021; Moradkhani et al. 2005; Fearnhead and Künsch 2018; Bocquet, Farchi, and Malartic n.d.). The usual trick is state augmentation: append the unknown parameters to the state vector with trivial (or slow random-walk) dynamics and let the filter update them alongside the state, as sketched below.
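A sketch of the augmentation trick (a hypothetical helper of my own; `f` is whatever dynamics map you have, and the random-walk scale `tau` is a tuning knob):

```python
import numpy as np

def augmented_dynamics(z, f, n_x, rng, tau=1e-3):
    """Propagate one augmented ensemble member z = [x; theta].

    The state x follows the model dynamics f(x, theta); the parameters
    theta follow a slow random walk so their ensemble spread does not
    collapse. The measurement update then acts on z unchanged."""
    x, theta = z[:n_x], z[n_x:]
    x_next = f(x, theta)
    theta_next = theta + tau * rng.standard_normal(theta.shape)
    return np.concatenate([x_next, theta_next])
```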
8 Theoretical basis for probabilists
Various works quantify this filter in terms of its convergence to interesting densities (Bishop and Del Moral 2023a; P. Del Moral, Kurtzmann, and Tugaut 2017; Garbuno-Inigo et al. 2020; Kelly, Law, and Stuart 2014; Le Gland, Monbet, and Tran 2009; Taghvaei and Mehta 2021).
9 Lanczos trick in precision estimates
10 Localisation
Tapering the covariance by spatial distance to reduce spurious global correlation (Ott et al. 2004).
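A sketch of the mechanics, using a simple exponential taper for illustration; in practice a compactly supported function such as Gaspari-Cohn is preferred so that distant entries become exactly zero:

```python
import numpy as np

def exponential_taper(coords, length_scale):
    """Pairwise taper weights from spatial distance.

    coords: (n, d) coordinates of the state components.
    Returns an (n, n) matrix of weights in (0, 1]."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / length_scale)

# Localisation is then a Schur (element-wise) product with the
# sample covariance: P_loc = exponential_taper(coords, L) * P_hat
```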
11 Local Ensemble Transform Kalman Filter
The LETKF (Hunt, Kostelich, and Szunyogh 2007) is a variant that uses a localisation step to reduce the computational burden of the EnKF. I am not sure what that means exactly, but I have seen it mentioned so am tracking it.
12 Relation to particle filters
Intimate. See particle filters.
13 Schillings’ filter

Claudia Schillings’ filter (Schillings and Stuart 2017) is a version which looks at once more general and simpler than the original. I would like to work out what is going on there.
Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.
14 Handy low-rank tricks for ensemble methods
See low-rank tricks.
15 Incoming
- DART | The Data Assimilation Research Testbed (Fortran, …matlab?) has nice tutorials, e.g. DART Tutorial
- OpenDA: Integrating models and observations (python and c++?)