This is one of those facts that I am amazed does not seem to be widely used. I am not the first person to notice this simple fact — in fact, it is constantly rediscovered — but then it is somehow still “unknown” in that no-one seems to exploit it practically.
If you prefer your maths in PDF form, check out the arXiv report that this post is translated from (MacKinlay 2025). The fact that this was translated from a semi-formal paper might also warn you that the language here will be stilted and jargony; I may fix that if I have time some day.
TL;DR
Given an ensemble of joint prior samples \(\{(x^{(i)},y^{(i)})\}_{i=1}^N\), the empirical Matheron update \[
x'^{(i)} \;=\; x^{(i)} + \widehat C_{xy}\,\widehat C_{yy}^{-1}\bigl(y^* - y^{(i)}\bigr)
\] is exactly the stochastic EnKF analysis step (a.k.a. EnKF with perturbed observations). That’s the whole fact.
Here \(\widehat C_{xy}\) and \(\widehat C_{yy}\) are the empirical cross‑ and auto‑covariances of the joint ensemble \((X,Y)\).
In EnKF code this is often written with ensemble matrices: \(X\in\mathbb R^{d\times N}\), \(Y\in\mathbb R^{m\times N}\), deviations \(X_c=X-\bar X\mathbf 1^\top\), \(Y_c=Y-\bar Y\mathbf 1^\top\), and \[
\widehat C_{xy} \;=\; \tfrac{1}{N-1}\,X_c Y_c^\top,\qquad
\widehat C_{yy} \;=\; \tfrac{1}{N-1}\,Y_c Y_c^\top,\qquad
X' \;=\; X + \widehat C_{xy}\widehat C_{yy}^{-1}\bigl(y^*\mathbf 1^\top - Y\bigr).
\]
If we want posterior samples (and in large \(d\) we usually do want posterior samples rather than densities), the Matheron/EnKF rule is an affine pushforward that sends a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). No posterior \(d\times d\) factorization, no storage of dense covariances. We work in observation space, reuse the same ensemble as data arrive, and estimate any posterior functional by Monte Carlo. We can think of the posterior as a set of paths, not a giant matrix we need to write down; the information we need to compute is contained in the observations we have of the process.
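To make the claim concrete, here is a minimal sketch of that update as plain NumPy (the function name is mine; a fuller, checked version appears in the demo below). It assumes \(Y\) is the perturbed predicted-observation ensemble, so its sample covariance already contains \(R\).

import numpy as np

def empirical_matheron(X, Y, y_star, jitter=1e-9):
    # X: d x N prior draws, Y: m x N perturbed predicted observations,
    # y_star: length-m observed vector. Returns a d x N posterior ensemble.
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxy = Xc @ Yc.T / (N - 1)                 # empirical cross-covariance
    Cyy = Yc @ Yc.T / (N - 1)                 # empirical obs covariance (already contains R)
    Cyy += jitter * np.eye(Cyy.shape[0])      # tiny ridge for numerical safety
    return X + Cxy @ np.linalg.solve(Cyy, y_star[:, None] - Y)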
1 Set‑up
Let \(x\in\mathbb R^d\) be the latent state with prior \(x\sim\mathcal N(m,\Sigma)\), and \(y=Hx+\varepsilon\) the observation with \(\varepsilon\sim\mathcal N(0,R)\). The classical Gaussian conditioning gives \[
x \mid y^* \;\sim\; \mathcal N\bigl(m + K(y^* - Hm),\; \Sigma - KH\Sigma\bigr),\qquad
K \;=\; \Sigma H^\top (H\Sigma H^\top + R)^{-1}.
\]
Stochastic EnKF (perturbed obs) forms an ensemble of noisy observations \(Y = HX + \varepsilon\) (each column has independent \(\varepsilon^{(i)}\sim\mathcal N(0,R)\)). Then the update \[
X' \;=\; X + \widehat C_{xy}\widehat C_{yy}^{-1}\bigl(y^*\mathbf 1^\top - Y\bigr)
\] is the empirical Matheron update applied column‑wise.
Deterministic / square‑root EnKF variants don’t perturb observations; they add \(R\) explicitly in the inversion and apply an ensemble transform to match the posterior mean and covariance but not the full pathwise law.
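For a flavour of what such a transform looks like, here is a rough ETKF-style sketch (my illustration, assuming a diagonal \(R\) passed as `R_diag`; it is not the residual-add form used elsewhere in this post).

import numpy as np

def etkf_update(X, H, R_diag, y_star):
    # Deterministic (ETKF-style) analysis: matches posterior mean and covariance.
    d, N = X.shape
    x_bar = X.mean(axis=1, keepdims=True)
    Xc = X - x_bar
    Y = H @ X                                   # no observation perturbation here
    y_bar = Y.mean(axis=1, keepdims=True)
    Yc = Y - y_bar
    C = Yc.T / R_diag                           # N x m, i.e. Yc^T R^{-1}
    Pa = np.linalg.inv((N - 1) * np.eye(N) + C @ Yc)   # ensemble-space posterior cov
    w_bar = Pa @ C @ (y_star - y_bar.ravel())          # mean-update weights
    # symmetric square root of (N-1) Pa gives the deviation transform
    evals, evecs = np.linalg.eigh((N - 1) * Pa)
    W = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    return x_bar + Xc @ w_bar[:, None] + Xc @ W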
Computing the update in ensemble space (no \(m\times m\) inverse)
Let \(X_c = X - \bar X \mathbf 1^\top\), \(Y_c = Y - \bar Y \mathbf 1^\top\), and \(U \coloneqq Y_c/\sqrt{N-1}\). Pick a diagonal \(\Gamma\) as:
Stochastic EnKF (perturbed obs): \(\Gamma=\lambda I\) (tiny ridge only; the sample covariance of \(Y\) already contains \(R\)).
Deterministic / square-root: \(\Gamma=R\) (plus a tiny ridge).
Define \[
A \;=\; U^\top \Gamma^{-1} U \in \mathbb{R}^{N\times N},\qquad
B \;=\; U^\top \Gamma^{-1}(y^*\mathbf 1^\top - Y) \in \mathbb{R}^{N\times N}.
\] Solve the \(N\times N\) system \[
(I_N + A) \, Q \;=\; B,
\] and update the ensemble via \[
\Delta X \;=\; \frac{1}{\sqrt{N-1}}\, X_c \, Q,\qquad
X' \;=\; X + \Delta X.
\] This is algebraically equivalent to \(X' = X + \widehat C_{xy}\widehat C_{yy}^{-1}(y^*\mathbf 1^\top - Y)\) but avoids forming \(K\) or inverting any \(m\times m\) matrix. It’s also robust to rank deficiency when \(m\gg N\).
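Packaged as a function, the recipe above might look like the following sketch (the helper name, the diagonal-\(\Gamma\) assumption, and the ridge default are mine, but the algebra is exactly the \(N\times N\) solve just described).

import numpy as np

def ensemble_matheron_update(X, Y, y_star, gamma_diag, ridge=1e-9):
    # Empirical Matheron / EnKF update in ensemble space: no m x m inverse.
    # gamma_diag: diagonal of Gamma (tiny ridge for perturbed obs, diag(R) otherwise).
    d, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    U = Yc / np.sqrt(N - 1.0)                     # m x N
    Ginv = 1.0 / (gamma_diag + ridge)             # length m
    A = U.T @ (U * Ginv[:, None])                 # N x N
    B = U.T @ ((y_star[:, None] - Y) * Ginv[:, None])   # N x N
    Q = np.linalg.solve(np.eye(N) + A, B)
    return X + (Xc @ Q) / np.sqrt(N - 1.0)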
2 Demo
The code below does three things:
builds a tiny 1D GP prior (just a squared‑exp kernel over a grid),
generates one “truth” and a noisy observation \(y^*\) at a handful of points,
draws an empirical posterior ensemble via the Matheron/EnKF rule and checks it against the analytic Gaussian posterior.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from livingthing.matplotlib_style import set_livingthing_style, reset_default_style

set_livingthing_style()
rng = np.random.default_rng(11)

# Dimensions
d, m, N = 60, 10, 300

# Prior: squared-exponential on a 1D grid
def se_cov(n, ell=12.0, sigma=1.0, jitter=1e-8):
    x = np.arange(n)[:, None]
    S = sigma**2 * np.exp(-0.5 * (x - x.T)**2 / ell**2)
    return S + jitter * np.eye(n)

m0 = np.zeros(d)
Sigma = se_cov(d, ell=12.0, sigma=1.0)
L = np.linalg.cholesky(Sigma)

# Observation operator (m point picks) and noise
idx = np.sort(rng.choice(d, size=m, replace=False))
H = np.zeros((m, d)); H[np.arange(m), idx] = 1.0
rho = 0.15
R = (rho**2) * np.eye(m)

# One truth and its noisy observations
x_true = m0 + L @ rng.standard_normal(d)
y_star = H @ x_true + rho * rng.standard_normal(m)

# Prior ensemble
X = m0[:, None] + L @ rng.standard_normal((d, N))

# Choose EnKF flavour
stochastic = True  # True: perturbed obs; False: deterministic/square-root
if stochastic:
    Eps = rho * rng.standard_normal((m, N))
    Y = H @ X + Eps
    gamma_diag = 1e-9 * np.ones(m)  # only tiny ridge (R already inside sample cov through Y)
else:
    Y = H @ X
    gamma_diag = np.diag(R) + 1e-9

# Centered ensembles
Xc = X - X.mean(axis=1, keepdims=True)  # d x N
Yc = Y - Y.mean(axis=1, keepdims=True)  # m x N

# Woodbury factors
sqrtN1 = np.sqrt(N - 1.0)
U = Yc / sqrtN1            # m x N
Gamma_inv = 1.0 / gamma_diag  # length-m

# Residuals in obs space
Res = (y_star[:, None] - Y)  # m x N

# Build A = U^T Γ^{-1} U (N x N) and B = U^T Γ^{-1} Res (N x N) efficiently
U_over_Gamma = U * Gamma_inv[:, None]  # m x N (diag multiply)
A = U.T @ U_over_Gamma                 # N x N
B = U.T @ (Res * Gamma_inv[:, None])   # N x N

# Solve (I + A) Q = B for Q, then ΔX = (1/sqrt(N-1)) Xc @ Q
I_N = np.eye(N)
M = np.linalg.inv(I_N + A)
Q = M @ B                  # N x N
dX = (Xc @ Q) / sqrtN1     # d x N
X_post_enkf = X + dX

# Analytic posterior ensemble for comparison
S = H @ Sigma @ H.T + R
K = Sigma @ H.T @ np.linalg.inv(S)
mu_post = m0 + K @ (y_star - H @ m0)
Sigma_post = Sigma - K @ H @ Sigma
L_post = np.linalg.cholesky(Sigma_post + 1e-9 * np.eye(d))
X_post_analytic = mu_post[:, None] + L_post @ rng.standard_normal((d, N))

# Plotting (single axes)
alpha = float(N)**(-0.75)
alpha = max(min(alpha, 0.25), 0.003)
x_axis = np.arange(d)
c_enkf = 'tab:blue'
c_analytic = 'tab:orange'
c_obs = 'black'
y_min = min(X_post_enkf.min(), X_post_analytic.min(), y_star.min())
y_max = max(X_post_enkf.max(), X_post_analytic.max(), y_star.max())
pad = 0.05 * (y_max - y_min + 1e-12)
ylims = (y_min - pad, y_max + pad)
plt.figure(figsize=(9, 4.8), dpi=120)
for i in range(N):
    plt.plot(x_axis, X_post_enkf[:, i], color=c_enkf, alpha=alpha, linewidth=1.0)
    plt.plot(x_axis, X_post_analytic[:, i], color=c_analytic, alpha=alpha, linewidth=1.0)
plt.scatter(idx, y_star, s=32, color=c_obs, marker='o', alpha=0.9, zorder=3)
proxy_enkf = Line2D([0], [0], color=c_enkf, lw=2, alpha=0.9, label='Empirical Matheron / EnKF (Woodbury, ΔX)')
proxy_an = Line2D([0], [0], color=c_analytic, lw=2, alpha=0.9, label='Analytic Gaussian')
proxy_obs = Line2D([0], [0], color=c_obs, marker='o', lw=0, label='Observations (m=10)')
plt.legend(handles=[proxy_enkf, proxy_an, proxy_obs], loc='upper right', frameon=False)
plt.title(f'Posterior samples (N={N}, {"stochastic" if stochastic else "deterministic"} EnKF)')
plt.xlabel('state index')
plt.ylabel('value')
plt.ylim(*ylims)
plt.xlim(0, d - 1)
plt.tight_layout()
plt.show()

# Quick numeric check
mean_rel_err = np.linalg.norm(X_post_enkf.mean(axis=1) - mu_post) / (np.linalg.norm(mu_post) + 1e-12)
cov_rel_err = np.linalg.norm(np.cov(X_post_enkf, bias=False) - Sigma_post, 'fro') / (np.linalg.norm(Sigma_post, 'fro') + 1e-12)
print("Relative mean error (EnKF vs analytic):", f"{mean_rel_err:.3e}")
print("Relative cov error (EnKF vs analytic):", f"{cov_rel_err:.3e}")
Figure 1: Posterior samples via empirical Matheron/EnKF (Woodbury, ΔX form) vs analytic Gaussian; m=10 observations.
Relative mean error (EnKF vs analytic): 5.756e-02
Relative cov error (EnKF vs analytic): 8.156e-02
3 How to read it
Everything is standard linear‑Gaussian conditioning; the only twist is replacing true covariances by ensemble covariances.
With a noisy observation ensemble \(Y=HX+\varepsilon\), the empirical \(C_{yy}\) already contains \(R\); do not add it again.
With a noise‑free observation ensemble \(Y=HX\), add \(R\) explicitly to \(\widehat C_{yy}\) (and you’ll typically use a square‑root transform to update the ensemble); both flavours are sketched in code just below.
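In code the distinction is just this (an illustrative fragment, reusing `H`, `X`, `R`, `rho`, and `rng` from the demo above):

# Stochastic flavour: perturb the predicted observations; R enters through the
# perturbations, so do NOT add it again.
Y_pert = H @ X + rho * rng.standard_normal((H.shape[0], X.shape[1]))
Cyy_stochastic = np.cov(Y_pert)

# Deterministic flavour: noise-free predicted observations; add R explicitly.
Cyy_deterministic = np.cov(H @ X) + R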
4 Why I care
It gives a pathwise sampler: samples are obtained by an affine residual correction—no Cholesky of the posterior needed.
It unifies toolboxes: localisation, inflation, and square‑root tricks from EnKF port straight over to pathwise GP sampling; conversely, autodiff and sparse‑GP ideas go the other way.
It clarifies the zoo: stochastic EnKF and conditional GP sampling (the Matheron update) share the same core update; the differences are entirely in how we build \(\widehat C\) and whether we perturb \(y\).
The reason I particularly care about this is that we are moving towards a world of generative AI, in which we work with samples, not densities.
If what we actually need are posterior samples, then the Matheron/EnKF rule is the “right thing”: it’s an affine pushforward that takes a prior draw \(x\) to a posterior draw \(x' = x + C_{xy}C_{yy}^{-1}(y^* - y)\). This avoids building or storing any \(d\times d\) covariance at all, needs only matrix–vector products with \(H\) and solves in observation space, and gives us Monte Carlo access to any posterior functional we care about. In other words, computing the full posterior covariance is not just expensive in large \(d\); it’s usually irrelevant. The sample-to-sample map is also a reparameterization, so it’s differentiable and streams naturally as new data arrive (update the same ensemble, don’t refactor a giant matrix). We can treat the posterior as a set of paths, not a density we must summarize, and the implementation choices make themselves. See the toy code above for a minimal demonstration of this empirical posterior sampler.
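To illustrate the streaming point, here is a toy sketch that assimilates two successive observation batches by updating the same ensemble in place (it reuses `x_true`, `m0`, `L`, `d`, `rho`, and `rng` from the demo, plus the hypothetical `ensemble_matheron_update` helper sketched earlier; nothing \(d\times d\) is ever stored or refactored).

# Streaming sketch: two batches of observations arrive over time.
X_ens = m0[:, None] + L @ rng.standard_normal((d, 200))          # fresh prior ensemble
for _ in range(2):
    idx_t = np.sort(rng.choice(d, size=5, replace=False))        # 5 new observation sites
    H_t = np.zeros((len(idx_t), d)); H_t[np.arange(len(idx_t)), idx_t] = 1.0
    y_t = H_t @ x_true + rho * rng.standard_normal(len(idx_t))   # the new data
    Y_t = H_t @ X_ens + rho * rng.standard_normal((len(idx_t), X_ens.shape[1]))
    X_ens = ensemble_matheron_update(X_ens, Y_t, y_t, gamma_diag=np.zeros(len(idx_t)))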
5 Implications
Understanding the EnKF as an empirical Matheron update opens up several possibilities. Both the Matheron update and Ensemble Kalman methods have been extensively studied in their respective fields. In recent years, numerous developments in Bayesian machine learning have advanced both approaches; see the many developments in Ensemble Kalman methods in the Ensemble Kalman notebook and especially the Neural networks meet Kalman filters notebook.
If we interpret the pathwise Matheron update in the context of the empirical measures used in the EnKF, it suggests the potential to import developments from one field to the other. The vast literature on the EnKF indicates that many ideas developed there might be applicable to Gaussian Process regression via the Matheron update. For instance, the clear probabilistic interpretation of the ensemble update could inspire improved regularization and localization strategies. One example is the Local Ensemble Transform Kalman Filter (Bocquet, Farchi, and Malartic 2020), which provides state-of-the-art performance in high-dimensional filtering and system identification settings and could be adapted to the Matheron update (homework exercise).
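To give a flavour of what “localization ported to the Matheron update” could mean, here is a toy Schur-product taper on the empirical covariances. This is a sketch under my own assumptions (1D positions for states and observations, a Gaussian taper rather than the usual Gaspari–Cohn one), not the LETKF itself.

import numpy as np

def localized_matheron_update(X, Y, y_star, state_pos, obs_pos, loc_len, jitter=1e-9):
    # Empirical Matheron / EnKF update with elementwise (Schur) covariance localization.
    # Y is the perturbed-observation ensemble, so its sample covariance already contains R.
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxy = Xc @ Yc.T / (N - 1)
    Cyy = Yc @ Yc.T / (N - 1) + jitter * np.eye(Y.shape[0])
    # distance-based tapers damp spurious long-range sample correlations
    taper_xy = np.exp(-0.5 * (state_pos[:, None] - obs_pos[None, :])**2 / loc_len**2)
    taper_yy = np.exp(-0.5 * (obs_pos[:, None] - obs_pos[None, :])**2 / loc_len**2)
    return X + (taper_xy * Cxy) @ np.linalg.solve(taper_yy * Cyy, y_star[:, None] - Y)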
A note on the deterministic / square‑root choice \(\Gamma=R\): it matches the posterior mean and covariance, but producing an actual posterior sample then requires an ensemble transform (ETKF/LETKF, etc.) rather than the residual‑add form above.
Chilès, and Desassis. 2018. “Fifty Years of Kriging.” In Handbook of Mathematical Geosciences.
Chilès, and Lantuéjoul. 2005. “Prediction by Conditional Simulation: Models and Algorithms.” In Space, Structure and Randomness: Contributions in Honor of Georges Matheron in the Field of Geostatistics, Random Sets and Mathematical Morphology. Lecture Notes in Statistics.