Noise outsourcing
June 3, 2024
A useful result from probability theory, with applications to reparameterization and to learning with symmetries, among other things.
Bloem-Reddy and Teh (2020):
Noise outsourcing is a standard technical tool from measure-theoretic probability, where it is also known by other names such as transfer […]. For any two random variables \(X\) and \(Y\) taking values in nice spaces (e.g., Borel spaces), noise outsourcing says that there exists a functional representation of samples from the conditional distribution \(P_{Y \mid X}\) in terms of \(X\) and independent noise: \(Y \stackrel{\text{a.s.}}{=} f(\eta, X)\). […] the relevant property of \(\eta\) is its independence from \(X\), and the uniform distribution could be replaced by any other random variable taking values in a Borel space, for example a standard normal on \(\mathbb{R}\), and the result would still hold, albeit with a different \(f\).
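For real-valued \(Y\), one concrete choice of \(f\) is the conditional quantile function, \(f(\eta, x) = F^{-1}_{Y \mid X = x}(\eta)\), i.e. the familiar inverse-CDF trick. Below is a minimal NumPy sketch, assuming a toy model chosen purely for illustration (not from the source): \(X \sim \mathcal{N}(0,1)\) and \(Y \mid X = x \sim \operatorname{Exp}(1 + x^2)\). The names `rate`, `f_uniform`, `std_normal_cdf` and `f_normal` are hypothetical. It also illustrates swapping the noise distribution: composing with the standard normal CDF \(\Phi\) gives a different function, \(\tilde f(\xi, x) = f(\Phi(\xi), x)\) with \(\xi \sim \mathcal{N}(0,1)\), that produces the same conditional law.

```python
import math
import numpy as np

# Toy model, chosen only for illustration (an assumption, not from the source):
#   X ~ N(0, 1),   Y | X = x ~ Exponential with rate 1 + x**2.
def rate(x):
    return 1.0 + x**2

def f_uniform(eta, x):
    """Noise outsourcing with eta ~ Unif[0, 1]: the conditional quantile
    function of Exp(rate(x)), i.e. F^{-1}(eta | x) = -log(1 - eta) / rate(x)."""
    return -np.log1p(-eta) / rate(x)

_erf = np.vectorize(math.erf)  # elementwise erf without needing SciPy

def std_normal_cdf(z):
    """Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))

def f_normal(xi, x):
    """Same conditional law, driven by xi ~ N(0, 1) instead: a different f.
    (Phi rounds to 1 for xi above roughly 8, which would give inf; such
    draws are astronomically unlikely at this sample size.)"""
    return f_uniform(std_normal_cdf(xi), x)

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
eta = rng.uniform(size=n)    # uniform noise, drawn independently of x
xi = rng.standard_normal(n)  # normal noise, drawn independently of x

y_unif = f_uniform(eta, x)
y_norm = f_normal(xi, x)

# Y * rate(X) | X ~ Exp(1), so both sample means should be close to 1.
print(np.mean(y_unif * rate(x)), np.mean(y_norm * rate(x)))
```

In this toy model \(Y(1 + X^2)\) is \(\operatorname{Exp}(1)\) given \(X\), so both samplers should give averages of `y * rate(x)` near 1 and medians near \(\log 2\), even though they consume different noise.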
Basic noise outsourcing can be refined in the presence of conditional independence. Let \(S: \mathcal{X} \rightarrow \mathcal{S}\) be a statistic such that \(Y\) and \(X\) are conditionally independent given \(S(X)\): \(Y ⫫_{S(X)} X\). The following basic result […] says that if there is a statistic \(S\) that d-separates \(X\) and \(Y\), then it is possible to represent \(Y\) as a noise-outsourced function of \(S\).
Lemma 5. Let \(X\) and \(Y\) be random variables with joint distribution \(P_{X, Y}\). Let \(\mathcal{S}\) be a standard Borel space and \(S: \mathcal{X} \rightarrow \mathcal{S}\) a measurable map. Then \(S(X)\) d-separates \(X\) and \(Y\) if and only if there is a measurable function \(f:[0,1] \times \mathcal{S} \rightarrow \mathcal{Y}\) such that \[ (X, Y) \stackrel{\text{a.s.}}{=} (X, f(\eta, S(X))) \quad \text{where } \eta \sim \operatorname{Unif}[0,1] \text{ and } \eta ⫫ X. \]
In particular, \(Y=f(\eta, S(X))\) has distribution \(P_{Y \mid X}\). […] Note that in general, \(f\) is measurable but need not be differentiable or otherwise have desirable properties, although for modeling purposes it can be limited to functions belonging to a tractable class (e.g., differentiable, parameterized by a neural network). Note also that the identity map \(S(X)=X\) trivially d-separates \(X\) and \(Y\), so that \(Y \stackrel{\text{a.s.}}{=} f(\eta, X)\), which is standard noise outsourcing (e.g., Austin (2015), Lem. 3.1).
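Here is a minimal sketch of Lemma 5 in a case where \(f\) has a closed form. It assumes a Gaussian conjugate model picked for illustration (not from the source): \(\theta \sim \mathcal{N}(0,1)\), \(X_1, \dots, X_n \mid \theta\) i.i.d. \(\mathcal{N}(\theta, 1)\), and \(Y = \theta\). With \(S(X) = \sum_i X_i\), the posterior is \(\theta \mid X \sim \mathcal{N}\big(S(X)/(n+1),\, 1/(n+1)\big)\). It depends on \(X\) only through \(S(X)\), so \(S(X)\) d-separates \(X\) and \(Y\), and \(Y = S(X)/(n+1) + \xi/\sqrt{n+1}\), with standard normal \(\xi\) standing in for the uniform \(\eta\). The helper names `S` and `f` are hypothetical. The code draws fresh noise rather than constructing \(\eta\) from \((X, Y)\), so it reproduces the lemma's joint law (equality in distribution), not the almost-sure coupling.

```python
import numpy as np

# Toy conjugate model, chosen only for illustration (an assumption):
#   theta ~ N(0, 1),  X_1, ..., X_n | theta ~ iid N(theta, 1),  Y = theta.
# The sum S(X) is sufficient for theta, and Y depends on X only through it,
# so S(X) d-separates X and Y.

def S(x):
    """Statistic S: R^n -> R, the sample sum along the last axis."""
    return np.sum(x, axis=-1)

def f(xi, s, n):
    """Y = f(xi, S(X)) with xi ~ N(0, 1) (normal noise in place of uniform).
    Posterior: theta | X ~ N(s / (n + 1), 1 / (n + 1))."""
    return s / (n + 1) + xi / np.sqrt(n + 1)

rng = np.random.default_rng(1)
n, m = 5, 200_000                                 # n observations per dataset, m datasets
theta = rng.standard_normal(m)
x = theta[:, None] + rng.standard_normal((m, n))  # shape (m, n)
xi = rng.standard_normal(m)                       # outsourced noise, independent of x

y_true = theta
y_model = f(xi, S(x), n)  # touches x only through S(x)

# Datasets with the same sum get identical outputs for the same noise draw.
a = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
b = np.array([6.0, 0.0, 0.0, 0.0, 0.0])
print(f(0.3, S(a), n) == f(0.3, S(b), n))
```

A quick sanity check: the residual \(Y - S(X)/(n+1)\) should have mean near 0, variance near \(1/(n+1)\), and no correlation with \(S(X)\), whether \(Y\) is the true \(\theta\) or the noise-outsourced sample.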