Ensemble Kalman updates are empirical Matheron updates

2022-06-22 — 2025-08-11

Bayes
distributed
dynamical systems
generative
linear algebra
machine learning
Monte Carlo
optimization
particle
probabilistic algorithms
probability
sciml
signal processing
state space models
statistics
stochastic processes
time series

Assumed audience:

Machine learning researchers and geospatial statisticians, together in peace at last

A note about a fact I use a lot: the Ensemble Kalman Filter can be formulated as, and IMO is more conveniently explained as, an empirical version of the Matheron update, a handy tool from Gaussian process regression.

This is one of those facts that I am amazed is not more widely used. I am not the first person to notice it (it is constantly rediscovered), but it remains somehow “unknown”, in that no-one seems to exploit it practically.

If you prefer your maths in PDF form, check out the arXiv report that this post is translated from (MacKinlay 2025). The fact that this was translated from a semi-formal paper might also warn you that the language here will be stilted and jargony. I may fix that if I have time some day.

TL;DR

Given an ensemble of joint prior samples \(\{(x^{(i)},y^{(i)})\}_{i=1}^N\), the empirical Matheron update

\[ x^{(i)}_{\text{post}} \;=\; x^{(i)} \;+\; \widehat C_{xy}\,\widehat C_{yy}^{-1}\,\bigl(y^*-y^{(i)}\bigr) \]

is exactly the stochastic EnKF analysis step (a.k.a. EnKF with perturbed observations). That’s the whole fact.

This is really handy in high dimension

If we want posterior samples (and in large \(d\) we usually do want posterior samples rather than densities), the Matheron/EnKF rule is an affine pushforward that sends a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). No posterior \(d\times d\) factorization, no storage of dense covariances. We work in observation space, reuse the same ensemble as data arrive, and estimate any posterior functional by Monte Carlo. We can think of the posterior as a set of paths, not a giant matrix we need to write down; the information we need to compute is contained in the observations we have of the process.
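
To make that concrete, here is a minimal NumPy sketch of the update in its naive direct-covariance form (the demo in Section 2 below uses an ensemble-space solve instead). The function name and the arrays X, Y, y_star are placeholders: a prior state ensemble, a prior observation ensemble with observation noise already added to each member, and the observed vector.

import numpy as np

def empirical_matheron_update(X, Y, y_star, ridge=1e-9):
    # X: (d, N) prior state ensemble; Y: (m, N) noisy prior observation ensemble; y_star: (m,) data
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)                               # empirical cross-covariance
    C_yy = Yc @ Yc.T / (N - 1) + ridge * np.eye(Y.shape[0])  # empirical obs covariance + tiny ridge
    # Affine pushforward: each prior member is corrected by its own residual
    return X + C_xy @ np.linalg.solve(C_yy, y_star[:, None] - Y)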

1 Set‑up

Let \(x\in\mathbb R^d\) be the latent state with prior \(x\sim\mathcal N(m,\Sigma)\), and \(y=Hx+\varepsilon\) the observation with \(\varepsilon\sim\mathcal N(0,R)\). The classical Gaussian conditioning gives

\[ \begin{aligned} K &= \Sigma H^\top\,(H\Sigma H^\top + R)^{-1},\\ x\mid (y=y^*) &\sim \mathcal N\Bigl(m + K(y^*-Hm),\;\Sigma - KH\Sigma\Bigr). \end{aligned} \]

Matheron’s pathwise update draws a posterior sample without forming the posterior covariance explicitly:

\[ x_{\text{post}} \stackrel{d}{=} x + \underbrace{\mathrm{Cov}(x,y)}_{C_{xy}}\;\underbrace{\mathrm{Cov}(y,y)^{-1}}_{C_{yy}^{-1}}\,(y^*-y). \]
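
A quick sanity check, spelled out for completeness: because \((x,y)\) are jointly Gaussian and the map is affine, \(x_{\text{post}}\) is Gaussian with

\[ \begin{aligned} \mathbb E[x_{\text{post}}] &= m + C_{xy}C_{yy}^{-1}\,(y^*-Hm),\\ \mathrm{Cov}(x_{\text{post}}) &= \Sigma - C_{xy}C_{yy}^{-1}C_{yx} - C_{xy}C_{yy}^{-1}C_{yx} + C_{xy}C_{yy}^{-1}C_{yy}C_{yy}^{-1}C_{yx} = \Sigma - C_{xy}C_{yy}^{-1}C_{yx}, \end{aligned} \]

which, using \(C_{xy}=\Sigma H^\top\) and \(C_{yy}=H\Sigma H^\top + R\), is exactly the conditional law displayed above.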

If we replace true covariances with empirical covariances computed from a joint ensemble \((X,Y)\), we get the empirical Matheron rule:

\[ x^{(i)}_{\text{post}} = x^{(i)} + \widehat C_{xy}\,\widehat C_{yy}^{-1}\,(y^*-y^{(i)}). \]

1.1 The EnKF variant

Stochastic EnKF (perturbed obs) forms an ensemble of noisy observations \(Y = HX + \varepsilon\) (each column has independent \(\varepsilon^{(i)}\)). Then the update

\[ X' = X + \widehat C_{xy}\,\widehat C_{yy}^{-1}\,(y^*\mathbf 1^\top - Y) \]

is the empirical Matheron update applied column‑wise.
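
Concretely, with centered ensembles \(X_c = X - \bar X\mathbf 1^\top\) and \(Y_c = Y - \bar Y\mathbf 1^\top\), the empirical covariances are

\[ \widehat C_{xy} = \tfrac{1}{N-1}\,X_cY_c^\top, \qquad \widehat C_{yy} = \tfrac{1}{N-1}\,Y_cY_c^\top, \]

and as \(N\to\infty\) they converge to \(C_{xy}=\Sigma H^\top\) and \(C_{yy}=H\Sigma H^\top + R\) (the observation noise enters through the perturbed \(Y\)), so the column-wise update recovers the exact Matheron update in the large-ensemble limit.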

  • Deterministic / square‑root EnKF variants don’t perturb observations; they add \(R\) explicitly in the inversion and apply an ensemble transform to match the posterior mean and covariance, but not the full pathwise law.

1.2 Computing the update in ensemble space (no \(m\times m\) inverse)

Let \(X_c = X - \bar X \mathbf 1^\top\), \(Y_c = Y - \bar Y \mathbf 1^\top\), and \(U \coloneqq Y_c/\sqrt{N-1}\). Pick a diagonal \(\Gamma\) as:

  • Stochastic EnKF (perturbed obs): \(\Gamma=\lambda I\) (tiny ridge only; the sample covariance of \(Y\) already contains \(R\)).
  • Deterministic / square-root: \(\Gamma=R\) (plus a tiny ridge).

Define

\[ A \;=\; U^\top \Gamma^{-1} U \in \mathbb{R}^{N\times N},\qquad B \;=\; U^\top \Gamma^{-1}(y^*\mathbf 1^\top - Y) \in \mathbb{R}^{N\times N}. \]

Solve the \(N\times N\) system

\[ (I_N + A) \, Q \;=\; B, \]

and update the ensemble via

\[ \Delta X \;=\; \frac{1}{\sqrt{N-1}}\, X_c \, Q,\qquad X' \;=\; X + \Delta X. \]

This is algebraically equivalent to \(X' = X + \widehat C_{xy}\widehat C_{yy}^{-1}(y^*\mathbf 1^\top - Y)\) but avoids forming \(K\) or inverting any \(m\times m\) matrix. It’s also robust to rank deficiency when \(m\gg N\).
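
For completeness, the equivalence is the push-through identity applied to \(\widehat C_{yy} = UU^\top + \Gamma\): writing \(\widehat C_{xy} = X_cU^\top/\sqrt{N-1}\),

\[ U^\top\bigl(UU^\top + \Gamma\bigr)^{-1} \;=\; \bigl(I_N + U^\top\Gamma^{-1}U\bigr)^{-1}U^\top\Gamma^{-1}, \]

so that

\[ \widehat C_{xy}\,\widehat C_{yy}^{-1}\,(y^*\mathbf 1^\top - Y) \;=\; \frac{1}{\sqrt{N-1}}\,X_c\,U^\top\bigl(UU^\top+\Gamma\bigr)^{-1}(y^*\mathbf 1^\top - Y) \;=\; \frac{1}{\sqrt{N-1}}\,X_c\,(I_N + A)^{-1}B. \]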

2 Demo

The code below does three things:

  1. builds a tiny 1D GP prior (just a squared‑exp kernel over a grid),
  2. generates one “truth” and a noisy observation \(y^*\) at a handful of points,
  3. draws an empirical posterior ensemble via the Matheron/EnKF rule and checks it against the analytic Gaussian posterior.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from livingthing.matplotlib_style import set_livingthing_style, reset_default_style

set_livingthing_style()

rng = np.random.default_rng(11)

# Dimensions
d, m, N = 60, 10, 300

# Prior: squared-exponential on a 1D grid
def se_cov(n, ell=12.0, sigma=1.0, jitter=1e-8):
    x = np.arange(n)[:, None]
    S = sigma**2 * np.exp(-0.5 * (x - x.T)**2 / ell**2)
    return S + jitter*np.eye(n)

m0 = np.zeros(d)
Sigma = se_cov(d, ell=12.0, sigma=1.0)
L = np.linalg.cholesky(Sigma)

# Observation operator (m point picks) and noise
idx = np.sort(rng.choice(d, size=m, replace=False))
H = np.zeros((m, d)); H[np.arange(m), idx] = 1.0
rho = 0.15
R = (rho**2) * np.eye(m)

# One truth and its noisy observations
x_true = m0 + L @ rng.standard_normal(d)
y_star = H @ x_true + rho * rng.standard_normal(m)

# Prior ensemble
X = m0[:, None] + L @ rng.standard_normal((d, N))

# Choose EnKF flavour
stochastic = True  # True: perturbed obs; False: deterministic/square-root

if stochastic:
    Eps = rho * rng.standard_normal((m, N))
    Y = H @ X + Eps
    gamma_diag = 1e-9 * np.ones(m)   # only tiny ridge (R already inside sample cov through Y)
else:
    Y = H @ X
    gamma_diag = np.diag(R) + 1e-9

# Centered ensembles
Xc = X - X.mean(axis=1, keepdims=True)   # d x N
Yc = Y - Y.mean(axis=1, keepdims=True)   # m x N

# Woodbury factors
sqrtN1 = np.sqrt(N - 1.0)
U = Yc / sqrtN1                          # m x N
Gamma_inv = 1.0 / gamma_diag             # length-m

# residuals in obs space
Res = (y_star[:, None] - Y)              # m x N

# Build A = U^T Γ^{-1} U (N x N) and B = U^T Γ^{-1} Res (N x N) efficiently
U_over_Gamma = U * Gamma_inv[:, None]    # m x N (diag multiply)
A = U.T @ U_over_Gamma                   # N x N
B = U.T @ (Res * Gamma_inv[:, None])     # N x N

# Solve (I + A) Q = B  for Q, then ΔX = (1/sqrt(N-1)) Xc @ Q
I_N = np.eye(N)
Q = np.linalg.solve(I_N + A, B)          # N x N
dX = (Xc @ Q) / sqrtN1                   # d x N

X_post_enkf = X + dX

# Analytic posterior ensemble for comparison
S = H @ Sigma @ H.T + R
K = Sigma @ H.T @ np.linalg.inv(S)
mu_post = m0 + K @ (y_star - H @ m0)
Sigma_post = Sigma - K @ H @ Sigma
L_post = np.linalg.cholesky(Sigma_post + 1e-9*np.eye(d))
X_post_analytic = mu_post[:, None] + L_post @ rng.standard_normal((d, N))

# Plotting (single axes)
alpha = float(N)**(-0.75)
alpha = max(min(alpha, 0.25), 0.003)
x_axis = np.arange(d)

c_enkf     = 'tab:blue'
c_analytic = 'tab:orange'
c_obs      = 'black'

y_min = min(X_post_enkf.min(), X_post_analytic.min(), y_star.min())
y_max = max(X_post_enkf.max(), X_post_analytic.max(), y_star.max())
pad = 0.05*(y_max - y_min + 1e-12)
ylims = (y_min - pad, y_max + pad)

plt.figure(figsize=(9, 4.8), dpi=120)
for i in range(N):
    plt.plot(x_axis, X_post_enkf[:, i], color=c_enkf, alpha=alpha, linewidth=1.0)
    plt.plot(x_axis, X_post_analytic[:, i], color=c_analytic, alpha=alpha, linewidth=1.0)
plt.scatter(idx, y_star, s=32, color=c_obs, marker='o', alpha=0.9, zorder=3)

proxy_enkf = Line2D([0], [0], color=c_enkf,     lw=2, alpha=0.9, label='Empirical Matheron / EnKF (Woodbury, ΔX)')
proxy_an   = Line2D([0], [0], color=c_analytic, lw=2, alpha=0.9, label='Analytic Gaussian')
proxy_obs  = Line2D([0], [0], color=c_obs, marker='o', lw=0, label='Observations (m=10)')
plt.legend(handles=[proxy_enkf, proxy_an, proxy_obs], loc='upper right', frameon=False)

plt.title(f'Posterior samples (N={N}, {"stochastic" if stochastic else "deterministic"} EnKF)')
plt.xlabel('state index')
plt.ylabel('value')
plt.ylim(*ylims)
plt.xlim(0, d-1)
plt.tight_layout()
plt.show()

# Quick numeric check
mean_rel_err = np.linalg.norm(X_post_enkf.mean(axis=1) - mu_post) / (np.linalg.norm(mu_post) + 1e-12)
cov_rel_err  = np.linalg.norm(np.cov(X_post_enkf, bias=False) - Sigma_post, 'fro') / (np.linalg.norm(Sigma_post, 'fro') + 1e-12)
print("Relative mean error (EnKF vs analytic):", f"{mean_rel_err:.3e}")
print("Relative cov  error (EnKF vs analytic):", f"{cov_rel_err:.3e}")
Figure 1: Posterior samples via empirical Matheron/EnKF (Woodbury, ΔX form) vs analytic Gaussian; single axes with two overlaid posterior ensembles and m=10 observation points.
Relative mean error (EnKF vs analytic): 5.756e-02
Relative cov  error (EnKF vs analytic): 8.156e-02

3 How to read it

  • Everything is standard linear‑Gaussian conditioning; the only twist is replacing true covariances by ensemble covariances.
  • With a noisy observation ensemble \(Y=HX+\varepsilon\), the empirical \(C_{yy}\) already contains \(R\); do not add it again.
  • With a noise‑free observation ensemble \(Y=HX\), add \(R\) explicitly to \(\widehat C_{yy}\) (and you’ll typically use a square‑root transform to update the ensemble).

4 Why I care

  • It gives a pathwise sampler: samples are obtained by an affine residual correction—no Cholesky of the posterior needed.
  • It unifies toolboxes: localisation, inflation, and square‑root tricks from EnKF port straight over to pathwise GP sampling; conversely, autodiff and sparse‑GP ideas go the other way.
  • It clarifies the zoo: stochastic EnKF and conditional GP (Matheron) sampling share the same core update; the differences are entirely in how we build \(\widehat C\) and whether we perturb \(y\).

The reason I particularly care about this is that we are moving towards the world of generative AI. These days we work in samples, not densities.

If what we actually need are posterior samples, then the Matheron/EnKF rule is the “right thing”: it’s an affine pushforward that takes a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). This avoids building or storing any \(d\times d\) covariance at all, needs only matrix–vector products with \(H\) and solves in observation space, and gives us Monte Carlo access to any posterior functional we care about. In other words, computing the full posterior covariance is not just expensive in large \(d\); it’s usually irrelevant. The sample-to-sample map is also a reparameterization, so it’s differentiable, and it streams naturally as new data arrive: we update the same ensemble rather than refactoring a giant matrix. We can treat the posterior as a set of paths, not a density we must summarize, and the implementation choices make themselves. See the demo above for a minimal example of this empirical posterior sampler, and the sketch below for the streaming point.
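
To illustrate the streaming point, here is a hypothetical sketch that conditions the same ensemble on two successive observation batches (a static state, so each step is just sequential conditioning; all array names and the toy data are placeholders):

import numpy as np

rng = np.random.default_rng(0)
d, N = 60, 200
X = rng.standard_normal((d, N))                  # placeholder prior ensemble (stand-in for a real forecast)

def assimilate(X, H, noise_std, y_obs, rng):
    # One stochastic-EnKF / empirical-Matheron analysis step (naive covariance form).
    N = X.shape[1]
    Y = H @ X + noise_std * rng.standard_normal((H.shape[0], N))   # perturbed observations
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)
    C_yy = Yc @ Yc.T / (N - 1) + 1e-9 * np.eye(H.shape[0])
    return X + C_xy @ np.linalg.solve(C_yy, y_obs[:, None] - Y)

# Two observation batches arrive one after the other; we keep updating the same ensemble.
for obs_idx in (np.arange(0, d, 12), np.arange(6, d, 12)):
    H = np.eye(d)[obs_idx]                       # point observations at obs_idx
    y_obs = rng.standard_normal(len(obs_idx))    # placeholder data
    X = assimilate(X, H, 0.1, y_obs, rng)
# X is now an ensemble of (approximate) draws from the posterior given both batches.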

5 Implications

Understanding the EnKF as an empirical Matheron update opens up several possibilities. Both the Matheron update and Ensemble Kalman methods have been extensively studied in their respective fields. In recent years, numerous developments in Bayesian machine learning have advanced both approaches; see the many developments in Ensemble Kalman methods in the Ensemble Kalman notebook and especially the Neural networks meet Kalman filters notebook.

If we interpret the pathwise Matheron update in the context of the empirical measures used in the EnKF, it suggests the potential to import developments from one field to the other. The vast literature on the EnKF indicates that many ideas developed there might be applicable to Gaussian process regression via the Matheron update. For instance, the clear probabilistic interpretation of the ensemble update could inspire improved regularization and localization strategies. One example is the Local Ensemble Transform Kalman Filter (Hunt, Kostelich, and Szunyogh 2007; see Bocquet, Farchi, and Malartic 2020 for its use in online learning of state and dynamics), which provides state-of-the-art performance in high-dimensional filtering and system identification settings and could be adapted to the Matheron update (homework exercise).

@misc{MacKinlay2025Ensemble,
  title = {The {{Ensemble Kalman Update}} Is an {{Empirical Matheron Update}}},
  author = {MacKinlay, Dan},
  year = {2025},
  month = feb,
  number = {arXiv:2502.03048},
  eprint = {2502.03048},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2502.03048},
  archiveprefix = {arXiv}
}

6 Appendix: Mistakes I made while writing this up

  • \(K = C_{xy}\,C_{yy}^{-1}\). Don’t sneak an extra \(H^\top\) in there.
  • Don’t double‑count the noise: if we drew noisy \(Y\), we don’t add \(R\) again in \(\widehat C_{yy}\).
  • Regularise \(\widehat C_{yy}\) a little (e.g. \(10^{-9}I\)) if \(m>N\) or the ensemble is small.

6.1 Appendix: the deterministic (square‑root) variant in one line

If we prefer not to perturb observations, set \(Y=HX\) and use

\[ K=\widehat C_{xy}\,\bigl(\widehat C_{yy} + R\bigr)^{-1}. \]

This matches the posterior mean and covariance; producing an actual posterior sample then requires an ensemble transform (ETKF/LETKF, etc.) rather than the residual‑add form above.
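
For reference, here is a minimal sketch of one such transform, the symmetric-square-root ETKF in its weight-space form (following Hunt, Kostelich, and Szunyogh 2007); the function and array names are placeholders, and no localisation or inflation is included:

import numpy as np

def etkf_analysis(X, H, R, y_star):
    # X: (d, N) forecast ensemble; H: (m, d); R: (m, m); y_star: (m,) observations.
    N = X.shape[1]
    x_bar = X.mean(axis=1)
    Xc = X - x_bar[:, None]                          # state perturbations
    Yb = H @ X
    y_bar = Yb.mean(axis=1)
    Yc = Yb - y_bar[:, None]                         # predicted-observation perturbations (no perturbed obs)
    A = Yc.T @ np.linalg.solve(R, Yc)                # Yc^T R^{-1} Yc
    evals, V = np.linalg.eigh((N - 1) * np.eye(N) + A)
    P_tilde = V @ np.diag(1.0 / evals) @ V.T         # [(N-1) I + Yc^T R^{-1} Yc]^{-1}
    w_bar = P_tilde @ (Yc.T @ np.linalg.solve(R, y_star - y_bar))
    W = V @ np.diag(np.sqrt((N - 1) / evals)) @ V.T  # symmetric square root of (N-1) P_tilde
    # Each analysis member is the forecast mean plus a weighted combination of forecast perturbations.
    return x_bar[:, None] + Xc @ (w_bar[:, None] + W)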

7 References

Bocquet, Farchi, and Malartic. 2020. “Online Learning of Both State and Dynamics Using Ensemble Kalman Filters.” Foundations of Data Science.
Borovitskiy, Terenin, Mostowsky, et al. 2023. “Matérn Gaussian Processes on Riemannian Manifolds.” In Advances in Neural Information Processing Systems.
Brajard, Carrassi, Bocquet, et al. 2020. “Combining Data Assimilation and Machine Learning to Emulate a Dynamical Model from Sparse and Noisy Observations: A Case Study with the Lorenz 96 Model.” Journal of Computational Science.
Cheng, Quilodrán-Casas, Ouala, et al. 2023. “Machine Learning With Data Assimilation and Uncertainty Quantification for Dynamical Systems: A Review.” IEEE/CAA Journal of Automatica Sinica.
Chen, and Oliver. 2012. “Ensemble Randomized Maximum Likelihood Method as an Iterative Ensemble Smoother.” Mathematical Geosciences.
Chilès, and Desassis. 2018. “Fifty Years of Kriging.” In Handbook of Mathematical Geosciences.
Chilès, and Lantuéjoul. 2005. “Prediction by Conditional Simulation: Models and Algorithms.” In Space, Structure and Randomness: Contributions in Honor of Georges Matheron in the Field of Geostatistics, Random Sets and Mathematical Morphology. Lecture Notes in Statistics.
Del Moral, Kurtzmann, and Tugaut. 2017. “On the Stability and the Uniform Propagation of Chaos of a Class of Extended Ensemble Kalman-Bucy Filters.” SIAM Journal on Control and Optimization.
Doucet. 2010. “A Note on Efficient Conditional Simulation of Gaussian Distributions.” Technical Report.
Dubrule. 2018. “Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In Handbook of Mathematical Geosciences: Fifty Years of IAMG.
Evensen. 2003. “The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation.” Ocean Dynamics.
———. 2009a. Data Assimilation - The Ensemble Kalman Filter.
———. 2009b. “The Ensemble Kalman Filter for Combined State and Parameter Estimation.” IEEE Control Systems.
Hunt, Kostelich, and Szunyogh. 2007. “Efficient Data Assimilation for Spatiotemporal Chaos: A Local Ensemble Transform Kalman Filter.” Physica D: Nonlinear Phenomena, Data Assimilation.
Iglesias, Law, and Stuart. 2013. “Ensemble Kalman Methods for Inverse Problems.” Inverse Problems.
Kelly, Law, and Stuart. 2014. “Well-Posedness and Accuracy of the Ensemble Kalman Filter in Discrete and Continuous Time.” Nonlinearity.
Kuzin, Yang, Isupova, et al. 2018. “Ensemble Kalman Filtering for Online Gaussian Process Regression and Learning.” In 2018 21st International Conference on Information Fusion (FUSION).
Kwiatkowski, and Mandel. 2015. “Convergence of the Square Root Ensemble Kalman Filter in the Large Ensemble Limit.” SIAM/ASA Journal on Uncertainty Quantification.
Le Gland, Monbet, and Tran. 2011. “Large Sample Asymptotics for the Ensemble Kalman Filter.” In The Oxford Handbook of Nonlinear Filtering.
MacKinlay. 2025. “The Ensemble Kalman Update Is an Empirical Matheron Update.”
MacKinlay, Tsuchida, Pagendam, et al. 2025. “Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional Systems.” In Proceedings of the International Conference on Learning Representations (ICLR).
Malartic, Farchi, and Bocquet. 2021. “State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.” arXiv:2107.11253 [nlin, physics, stat].
Mandel, Cobb, and Beezley. 2011. “On the Convergence of the Ensemble Kalman Filter.” Applications of Mathematics.
Petersen, and Pedersen. 2012. “The Matrix Cookbook.”
Raanes, Chen, Grudzien, et al. 2024. “DAPPER: Data Assimilation with Python: A Package for Experimental Research.”
Schillings, and Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” SIAM Journal on Numerical Analysis.
Spantini, Baptista, and Marzouk. 2022. “Coupling Techniques for Nonlinear Ensemble Filtering.” SIAM Review.
Wikle, and Berliner. 2007. “A Bayesian Tutorial for Data Assimilation.” Physica D: Nonlinear Phenomena, Data Assimilation.
Wilson, Borovitskiy, Terenin, et al. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning.
Wilson, Borovitskiy, Terenin, et al. 2021. “Pathwise Conditioning of Gaussian Processes.” Journal of Machine Learning Research.