Ensemble Kalman updates are empirical Matheron updates

2022-06-22 — 2025-08-11

Bayes
distributed
dynamical systems
generative
linear algebra
machine learning
Monte Carlo
optimization
particle
probabilistic algorithms
probability
sciml
signal processing
state space models
statistics
stochastic processes
time series

Assumed audience:

Machine learning researchers and geospatial statisticians, together in peace at last

A note about a fact I use a lot: the Ensemble Kalman Filter can be formulated as, and IMO is more conveniently explained as, an empirical version of the Matheron update, a handy tool from Gaussian process regression.

This is one of those facts that I am amazed is not more widely used. I am not the first person to notice it (it is constantly rediscovered), but it remains somehow “unknown”, in that no-one seems to exploit it practically.

If you prefer your maths in PDF form, check out the arXiv report that this post is translated from (MacKinlay 2025). The fact that this was translated from a semi-formal paper might also warn you that the language here will be stilted and jargony. I may fix that if I have time some day.

TL;DR

Given an ensemble of joint prior samples \(\{(x^{(i)},y^{(i)})\}_{i=1}^N\), the empirical Matheron update

\[ x^{(i)}_{\text{post}} \;=\; x^{(i)} \;+\; \widehat C_{xy}\,\widehat C_{yy}^{-1}\,\bigl(y^*-y^{(i)}\bigr) \]

is exactly the stochastic EnKF analysis step (a.k.a. EnKF with perturbed observations). That’s the whole fact.

This is really handy in high dimension

If we want posterior samples (and in large \(d\) we usually do want posterior samples rather than densities), the Matheron/EnKF rule is an affine pushforward that sends a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). No posterior \(d\times d\) factorization, no storage of dense covariances. We work in observation space, reuse the same ensemble as data arrive, and estimate any posterior functional by Monte Carlo. We can think of the posterior as a set of paths, not a giant matrix we need to write down; the information we need to compute is contained in the observations we have of the process.
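
To make that concrete, here is a minimal NumPy sketch of the update in its naive direct-covariance form (the demo in Section 2 below uses an ensemble-space solve instead). The function name and the arrays X, Y, y_star are placeholders: a prior state ensemble, a prior observation ensemble with observation noise already added to each member, and the observed vector.

import numpy as np

def empirical_matheron_update(X, Y, y_star, ridge=1e-9):
    # X: (d, N) prior state ensemble; Y: (m, N) noisy prior observation ensemble; y_star: (m,) data
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)                               # empirical cross-covariance
    C_yy = Yc @ Yc.T / (N - 1) + ridge * np.eye(Y.shape[0])  # empirical obs covariance + tiny ridge
    # Affine pushforward: each prior member is corrected by its own residual
    return X + C_xy @ np.linalg.solve(C_yy, y_star[:, None] - Y)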

1 Set‑up

Let \(x\in\mathbb R^d\) be the latent state with prior \(x\sim\mathcal N(m,\Sigma)\), and \(y=Hx+\varepsilon\) the observation with \(\varepsilon\sim\mathcal N(0,R)\). The classical Gaussian conditioning gives

\[ \begin{aligned} K &= \Sigma H^\top\,(H\Sigma H^\top + R)^{-1},\\ x\mid (y=y^*) &\sim \mathcal N\Bigl(m + K(y^*-Hm),\;\Sigma - KH\Sigma\Bigr). \end{aligned} \]

Matheron’s pathwise update draws a posterior sample without forming the posterior covariance explicitly:

\[ x_{\text{post}} \stackrel{d}{=} x + \underbrace{\mathrm{Cov}(x,y)}_{C_{xy}}\;\underbrace{\mathrm{Cov}(y,y)^{-1}}_{C_{yy}^{-1}}\,(y^*-y). \]
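
A quick sanity check, spelled out for completeness: because \((x,y)\) are jointly Gaussian and the map is affine, \(x_{\text{post}}\) is Gaussian with

\[ \begin{aligned} \mathbb E[x_{\text{post}}] &= m + C_{xy}C_{yy}^{-1}\,(y^*-Hm),\\ \mathrm{Cov}(x_{\text{post}}) &= \Sigma - C_{xy}C_{yy}^{-1}C_{yx} - C_{xy}C_{yy}^{-1}C_{yx} + C_{xy}C_{yy}^{-1}C_{yy}C_{yy}^{-1}C_{yx} = \Sigma - C_{xy}C_{yy}^{-1}C_{yx}, \end{aligned} \]

which, using \(C_{xy}=\Sigma H^\top\) and \(C_{yy}=H\Sigma H^\top + R\), is exactly the conditional law displayed above.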

If we replace true covariances with empirical covariances computed from a joint ensemble \((X,Y)\), we get the empirical Matheron rule:

\[ x^{(i)}_{\text{post}} = x^{(i)} + \widehat C_{xy}\,\widehat C_{yy}^{-1}\,(y^*-y^{(i)}). \]

1.1 The EnKF variant

Stochastic EnKF (perturbed obs) forms an ensemble of noisy observations \(Y = HX + \varepsilon\) (each column has independent \(\varepsilon^{(i)}\)). Then the update

\[ X' = X + \widehat C_{xy}\,\widehat C_{yy}^{-1}\,(y^*\mathbf 1^\top - Y) \]

is the empirical Matheron update applied column‑wise.
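
Concretely, with centered ensembles \(X_c = X - \bar X\mathbf 1^\top\) and \(Y_c = Y - \bar Y\mathbf 1^\top\), the empirical covariances are

\[ \widehat C_{xy} = \tfrac{1}{N-1}\,X_cY_c^\top, \qquad \widehat C_{yy} = \tfrac{1}{N-1}\,Y_cY_c^\top, \]

and as \(N\to\infty\) they converge to \(C_{xy}=\Sigma H^\top\) and \(C_{yy}=H\Sigma H^\top + R\) (the observation noise enters through the perturbed \(Y\)), so the column-wise update recovers the exact Matheron update in the large-ensemble limit.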

  • Deterministic / square‑root EnKF variants don’t perturb observations; they add \(R\) explicitly in the inversion and apply an ensemble transform to match the posterior mean and covariance, but not the full pathwise law.

1.2 Computing the update in ensemble space (no \(m\times m\) inverse)

Let \(X_c = X - \bar X \mathbf 1^\top\), \(Y_c = Y - \bar Y \mathbf 1^\top\), and \(U \coloneqq Y_c/\sqrt{N-1}\). Pick a diagonal \(\Gamma\) as:

  • Stochastic EnKF (perturbed obs): \(\Gamma=\lambda I\) (tiny ridge only; the sample covariance of \(Y\) already contains \(R\)).
  • Deterministic / square-root: \(\Gamma=R\) (plus a tiny ridge).

Define

\[ A \;=\; U^\top \Gamma^{-1} U \in \mathbb{R}^{N\times N},\qquad B \;=\; U^\top \Gamma^{-1}(y^*\mathbf 1^\top - Y) \in \mathbb{R}^{N\times N}. \]

Solve the \(N\times N\) system

\[ (I_N + A) \, Q \;=\; B, \]

and update the ensemble via

\[ \Delta X \;=\; \frac{1}{\sqrt{N-1}}\, X_c \, Q,\qquad X' \;=\; X + \Delta X. \]

This is algebraically equivalent to \(X' = X + \widehat C_{xy}\widehat C_{yy}^{-1}(y^*\mathbf 1^\top - Y)\) but avoids forming \(K\) or inverting any \(m\times m\) matrix. It’s also robust to rank deficiency when \(m\gg N\).
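
For completeness, the equivalence is the push-through identity applied to \(\widehat C_{yy} = UU^\top + \Gamma\): writing \(\widehat C_{xy} = X_cU^\top/\sqrt{N-1}\),

\[ U^\top\bigl(UU^\top + \Gamma\bigr)^{-1} \;=\; \bigl(I_N + U^\top\Gamma^{-1}U\bigr)^{-1}U^\top\Gamma^{-1}, \]

so that

\[ \widehat C_{xy}\,\widehat C_{yy}^{-1}\,(y^*\mathbf 1^\top - Y) \;=\; \frac{1}{\sqrt{N-1}}\,X_c\,U^\top\bigl(UU^\top+\Gamma\bigr)^{-1}(y^*\mathbf 1^\top - Y) \;=\; \frac{1}{\sqrt{N-1}}\,X_c\,(I_N + A)^{-1}B. \]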

2 Demo

The code below does three things:

  1. builds a tiny 1D GP prior (just a squared‑exp kernel over a grid),
  2. generates one “truth” and a noisy observation \(y^*\) at a handful of points,
  3. draws an empirical posterior ensemble via the Matheron/EnKF rule and checks it against the analytic Gaussian posterior.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from livingthing.matplotlib_style import set_livingthing_style, reset_default_style

set_livingthing_style()

rng = np.random.default_rng(11)

# Dimensions
d, m, N = 60, 10, 300

# Prior: squared-exponential on a 1D grid
def se_cov(n, ell=12.0, sigma=1.0, jitter=1e-8):
    x = np.arange(n)[:, None]
    S = sigma**2 * np.exp(-0.5 * (x - x.T)**2 / ell**2)
    return S + jitter*np.eye(n)

m0 = np.zeros(d)
Sigma = se_cov(d, ell=12.0, sigma=1.0)
L = np.linalg.cholesky(Sigma)

# Observation operator (m point picks) and noise
idx = np.sort(rng.choice(d, size=m, replace=False))
H = np.zeros((m, d)); H[np.arange(m), idx] = 1.0
rho = 0.15
R = (rho**2) * np.eye(m)

# One truth and its noisy observations
x_true = m0 + L @ rng.standard_normal(d)
y_star = H @ x_true + rho * rng.standard_normal(m)

# Prior ensemble
X = m0[:, None] + L @ rng.standard_normal((d, N))

# Choose EnKF flavour
stochastic = True  # True: perturbed obs; False: deterministic/square-root

if stochastic:
    Eps = rho * rng.standard_normal((m, N))
    Y = H @ X + Eps
    gamma_diag = 1e-9 * np.ones(m)   # only tiny ridge (R already inside sample cov through Y)
else:
    Y = H @ X
    gamma_diag = np.diag(R) + 1e-9

# Centered ensembles
Xc = X - X.mean(axis=1, keepdims=True)   # d x N
Yc = Y - Y.mean(axis=1, keepdims=True)   # m x N

# Woodbury factors
sqrtN1 = np.sqrt(N - 1.0)
U = Yc / sqrtN1                          # m x N
Gamma_inv = 1.0 / gamma_diag             # length-m

# residuals in obs space
Res = (y_star[:, None] - Y)              # m x N

# Build A = U^T Γ^{-1} U (N x N) and B = U^T Γ^{-1} Res (N x N) efficiently
U_over_Gamma = U * Gamma_inv[:, None]    # m x N (diag multiply)
A = U.T @ U_over_Gamma                   # N x N
B = U.T @ (Res * Gamma_inv[:, None])     # N x N

# Solve (I + A) Q = B  for Q, then ΔX = (1/sqrt(N-1)) Xc @ Q
I_N = np.eye(N)
Q = np.linalg.solve(I_N + A, B)          # N x N
dX = (Xc @ Q) / sqrtN1                   # d x N

X_post_enkf = X + dX

# Analytic posterior ensemble for comparison
S = H @ Sigma @ H.T + R
K = Sigma @ H.T @ np.linalg.inv(S)
mu_post = m0 + K @ (y_star - H @ m0)
Sigma_post = Sigma - K @ H @ Sigma
L_post = np.linalg.cholesky(Sigma_post + 1e-9*np.eye(d))
X_post_analytic = mu_post[:, None] + L_post @ rng.standard_normal((d, N))

# Plotting (single axes)
alpha = float(N)**(-0.75)
alpha = max(min(alpha, 0.25), 0.003)
x_axis = np.arange(d)

c_enkf     = 'tab:blue'
c_analytic = 'tab:orange'
c_obs      = 'black'

y_min = min(X_post_enkf.min(), X_post_analytic.min(), y_star.min())
y_max = max(X_post_enkf.max(), X_post_analytic.max(), y_star.max())
pad = 0.05*(y_max - y_min + 1e-12)
ylims = (y_min - pad, y_max + pad)

plt.figure(figsize=(9, 4.8), dpi=120)
for i in range(N):
    plt.plot(x_axis, X_post_enkf[:, i], color=c_enkf, alpha=alpha, linewidth=1.0)
    plt.plot(x_axis, X_post_analytic[:, i], color=c_analytic, alpha=alpha, linewidth=1.0)
plt.scatter(idx, y_star, s=32, color=c_obs, marker='o', alpha=0.9, zorder=3)

proxy_enkf = Line2D([0], [0], color=c_enkf,     lw=2, alpha=0.9, label='Empirical Matheron / EnKF (Woodbury, ΔX)')
proxy_an   = Line2D([0], [0], color=c_analytic, lw=2, alpha=0.9, label='Analytic Gaussian')
proxy_obs  = Line2D([0], [0], color=c_obs, marker='o', lw=0, label='Observations (m=10)')
plt.legend(handles=[proxy_enkf, proxy_an, proxy_obs], loc='upper right', frameon=False)

plt.title(f'Posterior samples (N={N}, {"stochastic" if stochastic else "deterministic"} EnKF)')
plt.xlabel('state index')
plt.ylabel('value')
plt.ylim(*ylims)
plt.xlim(0, d-1)
plt.tight_layout()
plt.show()

# Quick numeric check
mean_rel_err = np.linalg.norm(X_post_enkf.mean(axis=1) - mu_post) / (np.linalg.norm(mu_post) + 1e-12)
cov_rel_err  = np.linalg.norm(np.cov(X_post_enkf, bias=False) - Sigma_post, 'fro') / (np.linalg.norm(Sigma_post, 'fro') + 1e-12)
print("Relative mean error (EnKF vs analytic):", f"{mean_rel_err:.3e}")
print("Relative cov  error (EnKF vs analytic):", f"{cov_rel_err:.3e}")
Figure 1: Posterior samples via empirical Matheron/EnKF (Woodbury, ΔX form) vs analytic Gaussian; single axes with two overlaid posterior ensembles and m=10 observation points.
Relative mean error (EnKF vs analytic): 5.756e-02
Relative cov  error (EnKF vs analytic): 8.156e-02

3 How to read it

  • Everything is standard linear‑Gaussian conditioning; the only twist is replacing true covariances by ensemble covariances.
  • With a noisy observation ensemble \(Y=HX+\varepsilon\), the empirical \(C_{yy}\) already contains \(R\); do not add it again.
  • With a noise‑free observation ensemble \(Y=HX\), add \(R\) explicitly to \(\widehat C_{yy}\) (and you’ll typically use a square‑root transform to update the ensemble).

4 Why I care

  • It gives a pathwise sampler: samples are obtained by an affine residual correction—no Cholesky of the posterior needed.
  • It unifies toolboxes: localisation, inflation, and square‑root tricks from EnKF port straight over to pathwise GP sampling; conversely, autodiff and sparse‑GP ideas go the other way.
  • It clarifies the zoo: stochastic EnKF and conditional GP (Matheron) sampling share the same core update; the differences are entirely in how we build \(\widehat C\) and whether we perturb \(y\).

The reason I particularly care about this is that we are moving towards the world of generative AI. These days we work in samples, not densities.

If what we actually need are posterior samples, then the Matheron/EnKF rule is the “right thing”: it’s an affine pushforward that takes a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). This avoids building or storing any \(d\times d\) covariance at all, needs only matrix–vector products with \(H\) and solves in observation space, and gives us Monte Carlo access to any posterior functional we care about. In other words, computing the full posterior covariance is not just expensive in large \(d\); it’s usually irrelevant. The sample-to-sample map is also a reparameterization, so it’s differentiable, and it streams naturally as new data arrive: we update the same ensemble rather than refactoring a giant matrix. We can treat the posterior as a set of paths, not a density we must summarize, and the implementation choices make themselves. See the demo above for a minimal example of this empirical posterior sampler, and the sketch below for the streaming point.
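
To illustrate the streaming point, here is a hypothetical sketch that conditions the same ensemble on two successive observation batches (a static state, so each step is just sequential conditioning; all array names and the toy data are placeholders):

import numpy as np

rng = np.random.default_rng(0)
d, N = 60, 200
X = rng.standard_normal((d, N))                  # placeholder prior ensemble (stand-in for a real forecast)

def assimilate(X, H, noise_std, y_obs, rng):
    # One stochastic-EnKF / empirical-Matheron analysis step (naive covariance form).
    N = X.shape[1]
    Y = H @ X + noise_std * rng.standard_normal((H.shape[0], N))   # perturbed observations
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)
    C_yy = Yc @ Yc.T / (N - 1) + 1e-9 * np.eye(H.shape[0])
    return X + C_xy @ np.linalg.solve(C_yy, y_obs[:, None] - Y)

# Two observation batches arrive one after the other; we keep updating the same ensemble.
for obs_idx in (np.arange(0, d, 12), np.arange(6, d, 12)):
    H = np.eye(d)[obs_idx]                       # point observations at obs_idx
    y_obs = rng.standard_normal(len(obs_idx))    # placeholder data
    X = assimilate(X, H, 0.1, y_obs, rng)
# X is now an ensemble of (approximate) draws from the posterior given both batches.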

5 Implications

Understanding the EnKF as an empirical Matheron update opens up several possibilities. Both the Matheron update and Ensemble Kalman methods have been extensively studied in their respective fields. In recent years, numerous developments in Bayesian machine learning have advanced both approaches; see the many developments in Ensemble Kalman methods in the Ensemble Kalman notebook and especially the Neural networks meet Kalman filters notebook.

If we interpret the pathwise Matheron update in the context of the empirical measures used in the EnKF, it suggests the potential to import developments from one field to the other. The vast literature on the EnKF indicates that many ideas developed there might be applicable to Gaussian process regression via the Matheron update. For instance, the clear probabilistic interpretation of the ensemble update could inspire improved regularization and localization strategies. One example is the Local Ensemble Transform Kalman Filter (Hunt, Kostelich, and Szunyogh 2007; see Bocquet, Farchi, and Malartic 2020 for its use in online learning of state and dynamics), which provides state-of-the-art performance in high-dimensional filtering and system identification settings and could be adapted to the Matheron update (homework exercise).

@misc{MacKinlay2025Ensemble,
  title = {The {{Ensemble Kalman Update}} Is an {{Empirical Matheron Update}}},
  author = {MacKinlay, Dan},
  year = {2025},
  month = feb,
  number = {arXiv:2502.03048},
  eprint = {2502.03048},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2502.03048},
  archiveprefix = {arXiv}
}

6 Appendix: Mistakes I made while writing this up

  • \(K = C_{xy}\,C_{yy}^{-1}\). Don’t sneak an extra \(H^\top\) in there.
  • Don’t double‑count the noise: if we drew noisy \(Y\), we don’t add \(R\) again in \(\widehat C_{yy}\).
  • Regularise \(\widehat C_{yy}\) a little (e.g. \(10^{-9}I\)) if \(m>N\) or the ensemble is small.

6.1 Appendix: the deterministic (square‑root) variant in one line

If we prefer not to perturb observations, set \(Y=HX\) and use

\[ K=\widehat C_{xy}\,\bigl(\widehat C_{yy} + R\bigr)^{-1}. \]

This matches the posterior mean and covariance; producing an actual posterior sample then requires an ensemble transform (ETKF/LETKF, etc.) rather than the residual‑add form above.
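
For reference, here is a minimal sketch of one such transform, the symmetric-square-root ETKF in its weight-space form (following Hunt, Kostelich, and Szunyogh 2007); the function and array names are placeholders, and no localisation or inflation is included:

import numpy as np

def etkf_analysis(X, H, R, y_star):
    # X: (d, N) forecast ensemble; H: (m, d); R: (m, m); y_star: (m,) observations.
    N = X.shape[1]
    x_bar = X.mean(axis=1)
    Xc = X - x_bar[:, None]                          # state perturbations
    Yb = H @ X
    y_bar = Yb.mean(axis=1)
    Yc = Yb - y_bar[:, None]                         # predicted-observation perturbations (no perturbed obs)
    A = Yc.T @ np.linalg.solve(R, Yc)                # Yc^T R^{-1} Yc
    evals, V = np.linalg.eigh((N - 1) * np.eye(N) + A)
    P_tilde = V @ np.diag(1.0 / evals) @ V.T         # [(N-1) I + Yc^T R^{-1} Yc]^{-1}
    w_bar = P_tilde @ (Yc.T @ np.linalg.solve(R, y_star - y_bar))
    W = V @ np.diag(np.sqrt((N - 1) / evals)) @ V.T  # symmetric square root of (N-1) P_tilde
    # Each analysis member is the forecast mean plus a weighted combination of forecast perturbations.
    return x_bar[:, None] + Xc @ (w_bar[:, None] + W)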

7 References

Bocquet, Farchi, and Malartic. 2020. “Online Learning of Both State and Dynamics Using Ensemble Kalman Filters.” Foundations of Data Science.
Borovitskiy, Terenin, Mostowsky, et al. 2023. “Matérn Gaussian Processes on Riemannian Manifolds.” In Advances in Neural Information Processing Systems.
Brajard, Carrassi, Bocquet, et al. 2020. “Combining Data Assimilation and Machine Learning to Emulate a Dynamical Model from Sparse and Noisy Observations: A Case Study with the Lorenz 96 Model.” Journal of Computational Science.
Cheng, Quilodrán-Casas, Ouala, et al. 2023. “Machine Learning With Data Assimilation and Uncertainty Quantification for Dynamical Systems: A Review.” IEEE/CAA Journal of Automatica Sinica.
Chen, and Oliver. 2012. “Ensemble Randomized Maximum Likelihood Method as an Iterative Ensemble Smoother.” Mathematical Geosciences.
Chilès, and Desassis. 2018. “Fifty Years of Kriging.” In Handbook of Mathematical Geosciences.
Chilès, and Lantuéjoul. 2005. “Prediction by Conditional Simulation: Models and Algorithms.” In Space, Structure and Randomness: Contributions in Honor of Georges Matheron in the Field of Geostatistics, Random Sets and Mathematical Morphology. Lecture Notes in Statistics.
Del Moral, Kurtzmann, and Tugaut. 2017. “On the Stability and the Uniform Propagation of Chaos of a Class of Extended Ensemble Kalman-Bucy Filters.” SIAM Journal on Control and Optimization.
Doucet. 2010. “A Note on Efficient Conditional Simulation of Gaussian Distributions.” Technical Report.
Dubrule. 2018. “Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In Handbook of Mathematical Geosciences: Fifty Years of IAMG.
Evensen. 2003. “The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation.” Ocean Dynamics.
———. 2009a. Data Assimilation - The Ensemble Kalman Filter.
———. 2009b. “The Ensemble Kalman Filter for Combined State and Parameter Estimation.” IEEE Control Systems.
Hunt, Kostelich, and Szunyogh. 2007. “Efficient Data Assimilation for Spatiotemporal Chaos: A Local Ensemble Transform Kalman Filter.” Physica D: Nonlinear Phenomena, Data Assimilation.
Iglesias, Law, and Stuart. 2013. “Ensemble Kalman Methods for Inverse Problems.” Inverse Problems.
Kelly, Law, and Stuart. 2014. “Well-Posedness and Accuracy of the Ensemble Kalman Filter in Discrete and Continuous Time.” Nonlinearity.
Kuzin, Yang, Isupova, et al. 2018. “Ensemble Kalman Filtering for Online Gaussian Process Regression and Learning.” In 2018 21st International Conference on Information Fusion (FUSION).
Kwiatkowski, and Mandel. 2015. “Convergence of the Square Root Ensemble Kalman Filter in the Large Ensemble Limit.” SIAM/ASA Journal on Uncertainty Quantification.
Le Gland, Monbet, and Tran. 2011. “Large Sample Asymptotics for the Ensemble Kalman Filter.” In The Oxford Handbook of Nonlinear Filtering.
MacKinlay. 2025. “The Ensemble Kalman Update Is an Empirical Matheron Update.”
MacKinlay, Tsuchida, Pagendam, et al. 2025. “Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional Systems.” In Proceedings of the International Conference on Learning Representations (ICLR).
Malartic, Farchi, and Bocquet. 2021. “State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.” arXiv:2107.11253 [nlin, physics, stat].
Mandel, Cobb, and Beezley. 2011. “On the Convergence of the Ensemble Kalman Filter.” Applications of Mathematics.
Petersen, and Pedersen. 2012. “The Matrix Cookbook.”
Raanes, Chen, Grudzien, et al. 2024. “DAPPER: Data Assimilation with Python: A Package for Experimental Research.”
Schillings, and Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” SIAM Journal on Numerical Analysis.
Spantini, Baptista, and Marzouk. 2022. “Coupling Techniques for Nonlinear Ensemble Filtering.” SIAM Review.
Wikle, and Berliner. 2007. “A Bayesian Tutorial for Data Assimilation.” Physica D: Nonlinear Phenomena, Data Assimilation.
Wilson, Borovitskiy, Terenin, et al. 2020. “Efficiently Sampling Functions from Gaussian Process Posteriors.” In Proceedings of the 37th International Conference on Machine Learning.
Wilson, Borovitskiy, Terenin, et al. 2021. “Pathwise Conditioning of Gaussian Processes.” Journal of Machine Learning Research.