Smooth transforms of Gaussian noise

Delta method, error propagation, unscented transform, Taylor expansion, in finite and infinite dimensional spaces

I have a nonlinear transformation of a Gaussian process. What is its distribution? Delta methods, influence functions and other locally-Gaussian transformations of noises. The workhorse of Bayesian filtering and smoothing; as such see Särkkä (2013) for a broad introduction.

See transforms of RVs for non-Gaussian results.

Taylor expansion

Not complicated, but it can be a little subtle. For a general exposition which handles first- and second-order transforms in the context of all the other “easy” transforms, I recommend Gustafsson and Hendeby (2012), which as a bonus shows that some things which look obvious are not, in fact, obvious to prove, and disproves some things which looked obvious to me. Arras (1998) is also good.

Consider a general nonlinear, twice-differentiable transformation \(g:\mathbb{R}^{n_{x}}\to\mathbb{R}^{n_{z}}\) applied to a random variable \(x,\) defining \(z:=g(x).\) Let \(\mathrm{E}(x)=\mu_{x}\) and \(\operatorname{Var}(x)=P_{x}.\) The Jacobian of \(g\) is denoted \(g^{\prime}\) and the Hessian of the \(i^{\text{th}}\) component of \(g\) is denoted \(g_{i}^{\prime \prime}.\) \([x_i]_i\) is the vector whose \(i\)th element is \(x_i\), and \([a_{ij}]_{ij}\) is the matrix whose \((i,j)\)th element is \(a_{ij}\). We approximate \(z\) using the second-order Taylor expansion about the mean, \[z\approx g\left(\mu_{x}\right)+g^{\prime}\left(\mu_{x}\right)\left(x-\mu_{x}\right)+\left[\frac{1}{2}\left(x-\mu_{x}\right)^{T} g_{i}^{\prime \prime}\left(\mu_{x}\right)\left(x-\mu_{x}\right)\right]_{i}.\]

Leave aside for now the question of when this approximation is accurate, and assume that it is. We then approximate the distribution of \(z\) by a moment-matched Gaussian, \(z\sim\mathcal{N}(\mu_z,P_z)\). Under the expansion, the mean of \(z\) is \[ \mu_{z}=g\left(\mu_{x}\right)+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(\mu_{x}\right) P_{x}\right)\right]_{i}. \]

If, further, \(x \sim \mathcal{N}\left(\mu_{x}, P_{x}\right)\), then the covariance of \(z\) is \[ P_{z}=g^{\prime}\left(\mu_{x}\right) P_{x}\left(g^{\prime}\left(\mu_{x}\right)\right)^{T}+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(\mu_{x}\right) P_{x} g_{j}^{\prime \prime}\left(\mu_{x}\right) P_{x}\right)\right]_{i j} \] with \(i, j=1, \ldots, n_{z}.\) The Gaussian assumption is what makes the third central moments of \(x\) vanish and fixes the fourth, so the covariance formula does not hold for general \(x\).
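To make the formulas concrete, here is a minimal NumPy sketch of the second-order moment approximation. The function name `taylor2_moments` and the convention that the caller supplies the Jacobian and the stacked Hessians as callables are my own assumptions for illustration, not taken from the references. The test function is quadratic, so for Gaussian \(x\) the second-order expansion happens to be exact and the output can be checked by hand.

```python
import numpy as np

def taylor2_moments(g, jac, hess, mu_x, P_x):
    """Second-order Taylor approximation of the mean and covariance of z = g(x).

    g    : callable, R^{n_x} -> R^{n_z}
    jac  : callable returning the Jacobian at a point, shape (n_z, n_x)
    hess : callable returning the stacked Hessians, shape (n_z, n_x, n_x)
    The covariance formula assumes x ~ N(mu_x, P_x).
    """
    J = jac(mu_x)
    H = hess(mu_x)
    HP = H @ P_x  # HP[i] = g_i''(mu_x) P_x, via broadcasting
    # mu_z = g(mu_x) + 1/2 [tr(g_i'' P_x)]_i
    mu_z = g(mu_x) + 0.5 * np.trace(HP, axis1=1, axis2=2)
    # [tr(g_i'' P_x g_j'' P_x)]_ij = sum_ab HP[i,a,b] * HP[j,b,a]
    second_order = 0.5 * np.einsum("iab,jba->ij", HP, HP)
    P_z = J @ P_x @ J.T + second_order
    return mu_z, P_z

# Hypothetical example: g(x) = (x_0^2, x_0 x_1)
g = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
jac = lambda x: np.array([[2.0 * x[0], 0.0], [x[1], x[0]]])
hess = lambda x: np.array([[[2.0, 0.0], [0.0, 0.0]],
                           [[0.0, 1.0], [1.0, 0.0]]])

mu_x = np.array([1.0, 2.0])
P_x = np.array([[0.5, 0.1], [0.1, 0.3]])
mu_z, P_z = taylor2_moments(g, jac, hess, mu_x, P_x)
print(mu_z)
print(P_z)
```

In practice one would probably get `jac` and `hess` from an autodiff library rather than writing them by hand; they are explicit here to keep the sketch dependency-free.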

This approach is finite dimensional, but it also generalises to Gaussian processes, in that we can, at any finite number of test locations, once again find a first-order approximation. See the non-parametric case.

Note that here I have assumed that we have the luxury of expanding about the mean of the distribution, in which case I would usually take only a first-order Taylor expansion. Since I have a second-order expansion here, I should also give the expansion about an arbitrary point, not necessarily the mean, to make the generality worthwhile.
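A sketch of what that looks like, under the same assumptions as before (Gaussian \(x\), and the second-order expansion taken at face value). The expansion point \(x_0\), the offset \(d:=\mu_x-x_0\) and the shifted Jacobian \(\tilde{g}^{\prime}\) are notation I am introducing here, not taken from the references. Writing \(x-x_0=d+(x-\mu_x)\) and collecting terms, the expansion about \(x_0\) is again a quadratic in \(x-\mu_x\), with the same Hessians but with Jacobian rows \(\tilde{g}_i^{\prime}:=g_i^{\prime}(x_0)+d^{T}g_i^{\prime\prime}(x_0)\). Then

\[ \mu_{z}\approx g\left(x_0\right)+g^{\prime}\left(x_0\right)d+\frac{1}{2}\left[d^{T} g_{i}^{\prime \prime}\left(x_0\right) d+\operatorname{tr}\left(g_{i}^{\prime \prime}\left(x_0\right) P_{x}\right)\right]_{i} \]

\[ P_{z}\approx \tilde{g}^{\prime} P_{x}\left(\tilde{g}^{\prime}\right)^{T}+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(x_0\right) P_{x} g_{j}^{\prime \prime}\left(x_0\right) P_{x}\right)\right]_{i j} \]

Setting \(x_0=\mu_x\) gives \(d=0\) and recovers the expressions about the mean. Dropping the Hessian terms gives the first-order version, \(\mu_z\approx g(x_0)+g^{\prime}(x_0)d\) and \(P_z\approx g^{\prime}(x_0)P_x\left(g^{\prime}(x_0)\right)^{T}\).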

Unscented transform

The great invention of Uhlmann and Julier is the unscented transform, which uses a cunningly-chosen non-random empirical sample at so-called \(\sigma\)-points to approximate the transformed distribution via its moments. I think that anything using sigma points is an unscented transform? Otherwise it is just garden-variety moment-matching.

Mostly seen in the context of Kalman filtering.

What the Unscented Transform does is to replace the mean vector and its associated error covariance matrix with a special set of points with the same mean and covariance. In the case of the mean and covariance representing the current position estimate for a target, the UT is applied to obtain a set of points, referred to as sigma points, to which the full nonlinear equations of motion can be applied directly. In other words, instead of having to derive a linearized approximation, the equations could simply be applied to each of the points as if it were the true state of the target. The result is a transformed set of points, and the mean and covariance of that set represents the estimate of the predicted state of the target.
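The procedure described above can be sketched in a few lines of NumPy. This is a minimal version of the basic (unscaled) transform with a single tuning parameter `kappa`. The function names, the default `kappa=1.0` and the polar-to-Cartesian example are my own choices for illustration. The scaled variants with extra parameters that appear in the filtering literature are not covered.

```python
import numpy as np

def sigma_points(mu, P, kappa=1.0):
    """2n+1 sigma points and weights whose weighted mean and covariance are (mu, P)."""
    n = mu.shape[0]
    # Matrix square root via Cholesky: L @ L.T = (n + kappa) * P
    L = np.linalg.cholesky((n + kappa) * P)
    # Rows are points: the mean, then mean +/- each column of L
    pts = np.vstack([mu, mu + L.T, mu - L.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return pts, w

def unscented_transform(g, mu, P, kappa=1.0):
    """Approximate mean and covariance of g(x) for x with mean mu and covariance P."""
    pts, w = sigma_points(mu, P, kappa)
    Z = np.array([g(p) for p in pts])  # push each sigma point through g
    mu_z = w @ Z                       # weighted sample mean
    D = Z - mu_z
    P_z = (w[:, None] * D).T @ D       # weighted sample covariance
    return mu_z, P_z

# Hypothetical example: a noisy (range, bearing) measurement mapped to Cartesian coordinates
to_cartesian = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
mu = np.array([1.0, 0.5])
P = np.diag([0.01, 0.05])
mu_z, P_z = unscented_transform(to_cartesian, mu, P)
print(mu_z)
print(P_z)
```

Note that no derivatives of `g` are needed, which is the selling point relative to the Taylor expansion. With `kappa > 0` all weights are positive, so the covariance estimate is guaranteed positive semi-definite. Negative `kappa` is sometimes used but loses that guarantee.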

See, e.g., Roth, Hendeby, and Gustafsson (2016) and a comparison with the Taylor expansion in Gustafsson and Hendeby (2012).

Question: What would we need to do to apply the unscented transform to non-Gaussian distributions? See Ebeigbe et al. (2021).

Gaussian processes

Propagating error through Gaussian process inputs. See Emmanuel Johnson’s Linearized GP site (mildly idiosyncratic notation and very idiosyncratic website navigation) and Arras (1998). The following references from Emmanuel Johnson’s lit review look promising: Deisenroth and Mohamed (2012); Girard and Murray-Smith (2003); Ko and Fox (2009); and McHutchon and Rasmussen (2011).
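As a rough illustration of the simplest version of this idea, the sketch below fits a GP regression and propagates Gaussian input uncertainty through a first-order expansion of the posterior mean. The predictive variance at the input mean is inflated by the gradient of the posterior mean sandwiching the input covariance. All function names, the squared-exponential kernel and the hyperparameter values are assumptions for illustration. The cited papers go further, using second-order terms or exact moments.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / ell**2)

def gp_fit(X, y, ell=1.0, sf2=1.0, noise=1e-2):
    """Cholesky factor of the noisy kernel matrix and the weights alpha = K^{-1} y."""
    K = rbf(X, X, ell, sf2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def gp_predict(x, X, L, alpha, ell=1.0, sf2=1.0):
    """Posterior mean, latent variance and gradient of the mean at one input x (d,)."""
    k = rbf(x[None, :], X, ell, sf2)[0]             # (n,)
    mean = k @ alpha
    v = np.linalg.solve(L, k)
    var = sf2 - v @ v                               # variance of latent f, no noise term
    # d k_i / d x = -k_i (x - X_i) / ell^2 for the squared-exponential kernel
    dk = -(k[:, None] * (x[None, :] - X)) / ell**2  # (n, d)
    grad = dk.T @ alpha                             # (d,)
    return mean, var, grad

def linearised_predict(mu, Sigma, X, L, alpha, ell=1.0, sf2=1.0):
    """First-order propagation of x ~ N(mu, Sigma) through the GP posterior."""
    mean, var, grad = gp_predict(mu, X, L, alpha, ell, sf2)
    return mean, var + grad @ Sigma @ grad

# Hypothetical 1-d example: noiseless samples of sin, uncertain test input
X = np.linspace(-3.0, 3.0, 20)[:, None]
y = np.sin(X[:, 0])
L, alpha = gp_fit(X, y)
mu = np.array([0.5])
Sigma = np.array([[0.04]])
mean_lin, var_lin = linearised_predict(mu, Sigma, X, L, alpha)
print(mean_lin, var_lin)
```

The extra variance term is exactly the first-order Taylor covariance formula applied to the posterior mean function, so this reuses the finite-dimensional machinery at a single test location.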



References

Arras, Kai Oliver. 1998. “An Introduction To Error Propagation: Derivation, Meaning and Examples of Equation \(C_Y = F_X C_X F_X^T\),” 22.
Bickson, Danny. 2009. “Gaussian Belief Propagation: Theory and Application.” PhD.
Bishop, Adrian N., and Arnaud Doucet. 2014. “Distributed Nonlinear Consensus in the Space of Probability Measures.” IFAC Proceedings Volumes, 19th IFAC World Congress, 47 (3): 8662–68.
Candela, J. Quinonero. 2004. “Learning with Uncertainty-Gaussian Processes and Relevance Vector Machines.” Technical University of Denmark, Copenhagen.
Davison, Andrew J., and Joseph Ortiz. 2019. “FutureMapping 2: Gaussian Belief Propagation for Spatial AI.” arXiv:1910.14139 [Cs], October.
Deisenroth, Marc Peter, and Shakir Mohamed. 2012. “Expectation Propagation in Gaussian Process Dynamical Systems.” In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2, 25:2609–17. NIPS’12. Red Hook, NY, USA: Curran Associates Inc.
———. 2016. “Expectation Propagation in Gaussian Process Dynamical Systems: Extended Version.” arXiv:1207.2940 [Cs, Stat], August.
Ebeigbe, Donald, Tyrus Berry, Michael M. Norton, Andrew J. Whalen, Dan Simon, Timothy Sauer, and Steven J. Schiff. 2021. “A Generalized Unscented Transformation for Probability Distributions.” ArXiv, April, arXiv:2104.01958v1.
Girard, Agathe, and Roderick Murray-Smith. 2003. “Learning a Gaussian Process Model with Uncertain Inputs,” 10.
Girard, Agathe, Carl Edward Rasmussen, and Roderick Murray-Smith. 2002. “Gaussian Process Priors with Uncertain Inputs: Multiple-Step-Ahead Prediction,” 18.
Grosse, Roger. 2021. “Chapter 2: Taylor Approximations,” 22.
Gustafsson, Fredrik, and Gustaf Hendeby. 2008. “On Nonlinear Transformations of Stochastic Variables and Its Application to Nonlinear Filtering.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 3617–20.
———. 2012. “Some Relations Between Extended and Unscented Kalman Filters.” IEEE Transactions on Signal Processing 60 (2): 545–55.
Jankowiak, Martin, Geoff Pleiss, and Jacob Gardner. 2020. “Deep Sigma Point Processes.” In Conference on Uncertainty in Artificial Intelligence, 789–98. PMLR.
Jin, He-hui, Kenneth L Judd, and Hoover Institution. n.d. “Perturbation Methods for General Dynamic Stochastic Models,” 44.
Ko, Jonathan, and Dieter Fox. 2009. “GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In Autonomous Robots, 27:75–90.
Lin, Wu, Mohammad Emtiyaz Khan, and Mark Schmidt. 2019. “Stein’s Lemma for the Reparameterization Trick with Exponential Family Mixtures.” arXiv:1910.13398 [Cs, Stat], October.
Majumdar, Rajeshwari, and Suman Majumdar. 2019. “On the Conditional Distribution of a Multivariate Normal Given a Transformation – the Linear Case.” Heliyon 5 (2): e01136.
McHutchon, Andrew, and Carl Edward Rasmussen. 2011. “Gaussian Process Training with Input Noise.” In Proceedings of the 24th International Conference on Neural Information Processing Systems, 24:1341–49. NIPS’11. Red Hook, NY, USA: Curran Associates Inc.
Minka, Thomas P. 2001. “Expectation Propagation for Approximate Bayesian Inference.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 362–69. UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Ortiz, Joseph, Talfan Evans, and Andrew J. Davison. 2021. “A Visual Introduction to Gaussian Belief Propagation.” arXiv:2107.02308 [Cs], July.
Papadopoulos, G., P.J. Edwards, and A.F. Murray. 2001. “Confidence Estimation Methods for Neural Networks: A Practical Comparison.” IEEE Transactions on Neural Networks 12 (6): 1278–87.
Roth, Michael, Gustaf Hendeby, and Fredrik Gustafsson. 2016. “Nonlinear Kalman Filters Explained: A Tutorial on Moment Computations and Sigma Point Methods” 11 (1): 24.
Särkkä, Simo. 2007. “On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems.” IEEE Transactions on Automatic Control 52 (9): 1631–41.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.
Wilkinson, William J., Paul E. Chang, Michael Riis Andersen, and Arno Solin. 2020. “State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes.” In ICML.
Wilkinson, William J, Paul E Chang, Michael Riis Andersen, and Arno Solin. 2019. “Global Approximate Inference via Local Linearisation for Temporal Gaussian Processes,” 12.
Wilkinson, William J., Simo Särkkä, and Arno Solin. 2021. “Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.” arXiv:2111.01721 [Cs, Stat], November.
Wolter, Kirk M. 2007. Introduction to Variance Estimation. 2nd ed. Statistics for Social and Behavioral Sciences. New York: Springer.
