Ensemble Kalman updates are empirical Matheron updates
2022-06-22 — 2025-11-23
Wherein the Ensemble Kalman update is presented as an empirical Matheron rule, the stochastic variant with perturbed observations being shown to coincide with it exactly, and updates being performed in observation space so that no d×d covariance is inverted.
Bayes
distributed
dynamical systems
generative
linear algebra
machine learning
Monte Carlo
optimization
particle
probabilistic algorithms
probability
sciml
signal processing
state space models
statistics
stochastic processes
time series
Assumed audience:
Machine learning researchers and geospatial statisticians, together in peace at last
It’s one of those facts I’m amazed isn’t widely used. I’m not the first to notice — the idea keeps getting rediscovered — yet it remains somehow “unknown”, because no one seems to exploit it in practice.
This post was revised and expanded into a much fancier document published in the AJCAI proceedings (MacKinlay 2025). I now recommend that one, since it has more details and more convincing experiments.
TL;DR
Given an ensemble of joint prior samples \(\{(x^{(i)},y^{(i)})\}_{i=1}^N\), the empirical Matheron update \[
x^{(i)} \;\mapsto\; x^{(i)} + \widehat C_{xy}\widehat C_{yy}^{-1}\bigl(y^* - y^{(i)}\bigr)
\] is exactly the stochastic EnKF analysis step (a.k.a. EnKF with perturbed observations). That’s the whole fact.
Here we denote \(\widehat C_{xy}\) and \(\widehat C_{yy}\) as the empirical cross‑ and auto‑covariances of the joint ensemble \((X,Y)\).
In EnKF code, these are often written using ensemble matrices \(X\in\mathbb R^{d\times N}\) and \(Y\in\mathbb R^{m\times N}\), with deviations \(X_c=X-\bar X\mathbf 1^\top\) and \(Y_c=Y-\bar Y\mathbf 1^\top\).
If we want posterior samples (and in large \(d\) we usually do want posterior samples rather than densities), the Matheron/EnKF rule is an affine pushforward that sends a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). No posterior \(d\times d\) factorization, no storage of dense covariances. We work in observation space, reuse the same ensemble as data arrive, and estimate any posterior functional by Monte Carlo. We can think of the posterior as a set of paths, not a giant matrix we need to write down; the information we need to compute is contained in the observations we have of the process.
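For concreteness, here is a minimal numpy sketch of that pushforward (mine, not from the paper): it forms the empirical covariances explicitly, which is fine when the observation dimension is small, and it assumes the observation ensemble `Y` already carries perturbed-observation noise, as in the stochastic flavour.

import numpy as np

def empirical_matheron_update(X, Y, y_star, ridge=1e-9):
    """Map each prior column of X (d x N) to a posterior draw, given a paired
    observation ensemble Y (m x N, observation noise included) and data y_star (m,)."""
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)        # d x m empirical cross-covariance
    C_yy = Yc @ Yc.T / (N - 1)        # m x m; already contains R through Y's noise
    W = np.linalg.solve(C_yy + ridge * np.eye(len(y_star)), y_star[:, None] - Y)
    return X + C_xy @ W               # affine pushforward, applied column-wise

With the demo variables defined below, `empirical_matheron_update(X, H @ X + Eps, y_star)` would be the stochastic flavour of the update.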
1 Set‑up
Let \(x\in\mathbb R^d\) denote the latent state with prior \(x\sim\mathcal N(m_0,\Sigma)\), and let \(y=Hx+\varepsilon\) denote the observation with \(\varepsilon\sim\mathcal N(0,R)\). Classical Gaussian conditioning gives \[
x \mid y^* \;\sim\; \mathcal N\bigl(m_0 + K(y^* - Hm_0),\; \Sigma - KH\Sigma\bigr),
\qquad
K = \Sigma H^\top\bigl(H\Sigma H^\top + R\bigr)^{-1}.
\]
Stochastic EnKF (perturbed obs) forms an ensemble of noisy observations \(Y = HX + E\), where each column of \(E\) is an independent draw \(\varepsilon^{(i)}\sim\mathcal N(0,R)\). The update is \[
X' \;=\; X + \widehat C_{xy}\widehat C_{yy}^{-1}\bigl(y^*\mathbf 1^\top - Y\bigr).
\]
This is the empirical Matheron update applied column‑wise.
Deterministic / square‑root EnKF variants don’t perturb the observations. Instead, they add \(R\) explicitly in the inversion and apply an ensemble transform to match the posterior mean and covariance, but not the full pathwise law.
Tip: Computing the update in ensemble space (no \(m\times m\) inverse)
Let \(X_c = X - \bar X \mathbf 1^\top\), \(Y_c = Y - \bar Y \mathbf 1^\top\), and \(U \coloneqq Y_c/\sqrt{N-1}\). Pick a diagonal \(\Gamma\) as:
Stochastic EnKF (perturbed obs): \(\Gamma=\lambda I\) (tiny ridge only; the sample covariance of \(Y\) already contains \(R\)).
Deterministic / square-root: \(\Gamma=R\) (plus a tiny ridge).
Define \[
A \;=\; U^\top \Gamma^{-1} U \in \mathbb{R}^{N\times N},\qquad
B \;=\; U^\top \Gamma^{-1}(y^*\mathbf 1^\top - Y) \in \mathbb{R}^{N\times N}.
\] Solve the \(N\times N\) system \[
(I_N + A) \, Q \;=\; B,
\] and update the ensemble via \[
\Delta X \;=\; \frac{1}{\sqrt{N-1}}\, X_c \, Q,\qquad
X' \;=\; X + \Delta X.
\] This is algebraically equivalent to \(X' = X + \widehat C_{xy}\widehat C_{yy}^{-1}(y^*\mathbf 1^\top - Y)\) but avoids forming \(K\) or inverting any \(m\times m\) matrix. It’s also robust to rank deficiency when \(m\gg N\).
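For completeness, the claimed equivalence follows from the standard push-through identity \(U^\top(\Gamma+UU^\top)^{-1}=(I_N+U^\top\Gamma^{-1}U)^{-1}U^\top\Gamma^{-1}\): with \(\widehat C_{xy}=\tfrac{1}{\sqrt{N-1}}X_cU^\top\) and \(\widehat C_{yy}=UU^\top+\Gamma\), \[
\widehat C_{xy}\widehat C_{yy}^{-1}
= \tfrac{1}{\sqrt{N-1}}\,X_c\,U^\top(\Gamma+UU^\top)^{-1}
= \tfrac{1}{\sqrt{N-1}}\,X_c\,(I_N+U^\top\Gamma^{-1}U)^{-1}U^\top\Gamma^{-1},
\] so \(\widehat C_{xy}\widehat C_{yy}^{-1}(y^*\mathbf 1^\top - Y)=\tfrac{1}{\sqrt{N-1}}X_c(I_N+A)^{-1}B=\tfrac{1}{\sqrt{N-1}}X_cQ\), which is the \(\Delta X\) above.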
2 Demo
Here we do three things:
build a simple 1D GP prior (a squared‑exp kernel over a grid),
generate a single “truth” and a noisy observation \(y^*\) at a handful of points,
draw an empirical posterior ensemble via the Matheron/EnKF rule and check it against the analytic Gaussian posterior.
Code
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from livingthing.matplotlib_style import set_livingthing_style, reset_default_style

set_livingthing_style()
rng = np.random.default_rng(11)

# Dimensions
d, m, N = 60, 10, 300

# Prior: squared-exponential on a 1D grid
def se_cov(n, ell=12.0, sigma=1.0, jitter=1e-8):
    x = np.arange(n)[:, None]
    S = sigma**2 * np.exp(-0.5 * (x - x.T)**2 / ell**2)
    return S + jitter * np.eye(n)

m0 = np.zeros(d)
Sigma = se_cov(d, ell=12.0, sigma=1.0)
L = np.linalg.cholesky(Sigma)

# Observation operator (m point picks) and noise
idx = np.sort(rng.choice(d, size=m, replace=False))
H = np.zeros((m, d)); H[np.arange(m), idx] = 1.0
rho = 0.15
R = (rho**2) * np.eye(m)

# One truth and its noisy observations
x_true = m0 + L @ rng.standard_normal(d)
y_star = H @ x_true + rho * rng.standard_normal(m)

# Prior ensemble
X = m0[:, None] + L @ rng.standard_normal((d, N))

# Choose EnKF flavour
stochastic = True  # True: perturbed obs; False: deterministic/square-root
if stochastic:
    Eps = rho * rng.standard_normal((m, N))
    Y = H @ X + Eps
    gamma_diag = 1e-9 * np.ones(m)  # only tiny ridge (R already inside sample cov through Y)
else:
    Y = H @ X
    gamma_diag = np.diag(R) + 1e-9

# Centered ensembles
Xc = X - X.mean(axis=1, keepdims=True)  # d x N
Yc = Y - Y.mean(axis=1, keepdims=True)  # m x N

# Woodbury factors
sqrtN1 = np.sqrt(N - 1.0)
U = Yc / sqrtN1               # m x N
Gamma_inv = 1.0 / gamma_diag  # length-m

# residuals in obs space
Res = (y_star[:, None] - Y)   # m x N

# Build A = U^T Γ^{-1} U (N x N) and B = U^T Γ^{-1} Res (N x N) efficiently
U_over_Gamma = U * Gamma_inv[:, None]    # m x N (diag multiply)
A = U.T @ U_over_Gamma                   # N x N
B = U.T @ (Res * Gamma_inv[:, None])     # N x N

# Solve (I + A) Q = B for Q, then ΔX = (1/sqrt(N-1)) Xc @ Q
I_N = np.eye(N)
M = np.linalg.inv(I_N + A)
Q = M @ B                    # N x N
dX = (Xc @ Q) / sqrtN1       # d x N
X_post_enkf = X + dX

# Analytic posterior ensemble for comparison
S = H @ Sigma @ H.T + R
K = Sigma @ H.T @ np.linalg.inv(S)
mu_post = m0 + K @ (y_star - H @ m0)
Sigma_post = Sigma - K @ H @ Sigma
L_post = np.linalg.cholesky(Sigma_post + 1e-9 * np.eye(d))
X_post_analytic = mu_post[:, None] + L_post @ rng.standard_normal((d, N))

# Plotting (single axes)
alpha = float(N)**(-0.75)
alpha = max(min(alpha, 0.25), 0.003)
x_axis = np.arange(d)
c_enkf = 'tab:blue'
c_analytic = 'tab:orange'
c_obs = 'black'
y_min = min(X_post_enkf.min(), X_post_analytic.min(), y_star.min())
y_max = max(X_post_enkf.max(), X_post_analytic.max(), y_star.max())
pad = 0.05 * (y_max - y_min + 1e-12)
ylims = (y_min - pad, y_max + pad)

plt.figure(figsize=(9, 4.8), dpi=120)
for i in range(N):
    plt.plot(x_axis, X_post_enkf[:, i], color=c_enkf, alpha=alpha, linewidth=1.0)
    plt.plot(x_axis, X_post_analytic[:, i], color=c_analytic, alpha=alpha, linewidth=1.0)
plt.scatter(idx, y_star, s=32, color=c_obs, marker='o', alpha=0.9, zorder=3)
proxy_enkf = Line2D([0], [0], color=c_enkf, lw=2, alpha=0.9, label='Empirical Matheron / EnKF (Woodbury, ΔX)')
proxy_an = Line2D([0], [0], color=c_analytic, lw=2, alpha=0.9, label='Analytic Gaussian')
proxy_obs = Line2D([0], [0], color=c_obs, marker='o', lw=0, label='Observations (m=10)')
plt.legend(handles=[proxy_enkf, proxy_an, proxy_obs], loc='upper right', frameon=False)
plt.title(f'Posterior samples (N={N}, {"stochastic" if stochastic else "deterministic"} EnKF)')
plt.xlabel('state index')
plt.ylabel('value')
plt.ylim(*ylims)
plt.xlim(0, d - 1)
plt.tight_layout()
plt.show()

# Quick numeric check
mean_rel_err = np.linalg.norm(X_post_enkf.mean(axis=1) - mu_post) / (np.linalg.norm(mu_post) + 1e-12)
cov_rel_err = np.linalg.norm(np.cov(X_post_enkf, bias=False) - Sigma_post, 'fro') / (np.linalg.norm(Sigma_post, 'fro') + 1e-12)
print("Relative mean error (EnKF vs analytic):", f"{mean_rel_err:.3e}")
print("Relative cov error (EnKF vs analytic):", f"{cov_rel_err:.3e}")
Figure 1: Posterior samples via empirical Matheron/EnKF (Woodbury, ΔX form) vs analytic Gaussian; m=10 observations.
Relative mean error (EnKF vs analytic): 5.756e-02
Relative cov error (EnKF vs analytic): 8.156e-02
3 How to read it
Everything is standard linear‑Gaussian conditioning; the only twist is that we replace true covariances with ensemble covariances.
With a noisy observation ensemble \(Y=HX+\varepsilon\), the empirical \(C_{yy}\) already contains \(R\); don’t add it again.
With a noise‑free observation ensemble \(Y=HX\), add \(R\) explicitly to \(\widehat C_{yy}\) (and we’ll typically use a square‑root transform to update the ensemble); a short snippet below makes the contrast concrete.
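As promised, a tiny sketch of the two conventions, reusing the demo's variable names (`H`, `X`, `Eps`, `R`):

# Stochastic flavour: the noise is already inside the observation ensemble,
# so the plain sample covariance of Y estimates H Sigma H^T + R. Do not add R again.
C_yy_stochastic = np.cov(H @ X + Eps)

# Deterministic flavour: the observation ensemble is noise-free, so add R explicitly.
C_yy_deterministic = np.cov(H @ X) + R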
4 Why I care
It gives a pathwise sampler: we obtain samples by an affine residual correction—no Cholesky of the posterior needed.
It unifies toolboxes: localization, inflation, and square‑root tricks from EnKF port straight over to pathwise GP sampling; conversely, autodiff and sparse‑GP ideas go the other way.
It clarifies the zoo: stochastic EnKF and conditional GP simulation share the same core update; the differences are entirely in how we build \(\widehat C\) and whether we perturb \(y\).
I particularly care because we’re moving towards the world of generative AI. These days we work with samples, not densities.
If what we actually need are posterior samples, the Matheron/EnKF rule is the “right thing”: it’s an affine pushforward that maps a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). It avoids building or storing any \(d\times d\) covariance at all; it only needs matrix–vector products with \(H\), performs solves in the observation space, and gives us Monte‑Carlo access to any posterior functional we care about. In other words, computing the full posterior covariance isn’t just expensive for large \(d\); it’s usually irrelevant. The sample-to-sample map is also a reparameterization, so it’s differentiable and streams naturally as new data arrive (update the same ensemble, don’t refactor a giant matrix). We can treat the posterior as a set of paths, not a density we must summarize, and the implementation choices present themselves. See the toy code above for a minimal demonstration of this empirical posterior sampler.
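To illustrate the streaming point, here is a sketch (mine, not the post's) that assimilates observation batches one at a time, reusing the `empirical_matheron_update` helper sketched in the TL;DR; the names `batches`, `H_t`, `R_t`, `y_t`, and `threshold` are hypothetical, and `rng` is the generator from the demo.

# Assumed: X is the current d x N ensemble; batches yields (H_t, R_t, y_t) triples.
for H_t, R_t, y_t in batches:
    # perturbed observations: push the current ensemble through H_t and add fresh noise
    Eps_t = np.linalg.cholesky(R_t) @ rng.standard_normal((len(y_t), X.shape[1]))
    Y_t = H_t @ X + Eps_t
    X = empirical_matheron_update(X, Y_t, y_t)  # same ensemble, updated in place

# Any posterior functional is now a Monte Carlo average over the columns of X,
# e.g. an exceedance probability: (X > threshold).mean(axis=1)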
5 Implications
Understanding the EnKF as an empirical Matheron update opens up several possibilities. Both the Matheron update and Ensemble Kalman methods have been extensively studied in their respective fields. In recent years, numerous developments in Bayesian machine learning have advanced both approaches; see the many developments in Ensemble Kalman methods in the Ensemble Kalman notebook and especially the Neural networks meet Kalman filters notebook.
If we interpret the pathwise Matheron update in the context of the empirical measures used in the EnKF, this suggests we can import developments from one field to the other. The vast literature on the EnKF indicates that many ideas developed there might be applicable to Gaussian Process regression via the Matheron update. For instance, the clear probabilistic interpretation of the ensemble update could inspire improved regularization and localization strategies. One example is the Local Ensemble Transform Kalman Filter (Bocquet, Farchi, and Malartic 2020), which provides state‑of‑the‑art performance in high-dimensional filtering and system identification settings and could be adapted to the Matheron update (homework exercise).
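To give a flavour of that homework, here is a sketch of one such import: Schur-product covariance localization applied to the empirical Matheron gain, with a Gaussian taper standing in for the usual compactly supported taper. The function, the taper, and the coordinate arguments are illustrative assumptions of mine, not anything prescribed by the LETKF literature or by the paper.

import numpy as np

def localized_matheron_update(X, Y, y_star, coords_x, coords_y, ell_loc, ridge=1e-9):
    """Empirical Matheron update with distance-based covariance localization.
    coords_x (d,) and coords_y (m,) locate state and observation components."""
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)
    C_yy = Yc @ Yc.T / (N - 1)
    # taper damps spurious long-range sample correlations from the finite ensemble
    taper_xy = np.exp(-0.5 * (coords_x[:, None] - coords_y[None, :])**2 / ell_loc**2)
    taper_yy = np.exp(-0.5 * (coords_y[:, None] - coords_y[None, :])**2 / ell_loc**2)
    W = np.linalg.solve(taper_yy * C_yy + ridge * np.eye(len(y_star)), y_star[:, None] - Y)
    return X + (taper_xy * C_xy) @ W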
@inproceedings{MacKinlay2025Ensemble,
  title = {The {{Ensemble Kalman Update}} Is an {{Empirical Matheron Update}}},
  booktitle = {Australian {{Joint Conference}} on {{AI}}},
  author = {MacKinlay, Dan},
  year = 2025,
  month = dec,
  series = {Lecture Notes in {{Computer Science}}},
  publisher = {Springer},
  address = {Canberra},
  doi = {10.48550/arXiv.2502.03048},
  archiveprefix = {arXiv}
}
6 Appendix: the deterministic (square‑root) variant in one line
If we prefer not to perturb the observations, set \(Y=HX\) and use the gain \[
\widehat K \;=\; \widehat C_{xy}\bigl(\widehat C_{yy}+R\bigr)^{-1},
\] adding \(R\) explicitly because the observation ensemble now carries no noise.
This matches the posterior mean and covariance. To produce an ensemble with the right spread (rather than just the right mean), we need an ensemble transform (ETKF/LETKF, etc.) to update the deviations, rather than the residual‑add form above, which without perturbed observations understates the posterior spread.
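For reference, a minimal ensemble-transform sketch in the standard ensemble-space form, under the demo's variable names (`X`, `R`, `y_star`, `N`) and with \(Y=HX\) noise-free; this is my own sketch, not the post's code.

import numpy as np

# Y = H @ X, noise-free observation ensemble
Xbar = X.mean(axis=1, keepdims=True); Xc = X - Xbar
Ybar = Y.mean(axis=1, keepdims=True); Yc = Y - Ybar

S = Yc.T @ np.linalg.solve(R, Yc)                    # N x N
Pa_tilde = np.linalg.inv((N - 1) * np.eye(N) + S)    # ensemble-space posterior covariance
w_bar = Pa_tilde @ (Yc.T @ np.linalg.solve(R, y_star - Ybar.ravel()))

# symmetric square root of (N - 1) * Pa_tilde
evals, evecs = np.linalg.eigh((N - 1) * Pa_tilde)
W = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T

X_post_etkf = Xbar + Xc @ (w_bar[:, None] + W)       # matches posterior mean and covariance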
Chilès, and Desassis. 2018. “Fifty Years of Kriging.” In Handbook of Mathematical Geosciences.
Chilès, and Lantuéjoul. 2005. “Prediction by Conditional Simulation: Models and Algorithms.” In Space, Structure and Randomness: Contributions in Honor of Georges Matheron in the Field of Geostatistics, Random Sets and Mathematical Morphology. Lecture Notes in Statistics.