Transforms of Gaussian noise

Delta method, error propagation, unscented transform, Taylor expansion…

November 25, 2014 — May 22, 2024

approximation
Bayes
dynamical systems
Gaussian
Hilbert space
linear algebra
Lévy processes
Markov processes
networks
optimization
probability
SDEs
signal processing
state space models
statistics
stochastic processes
time series
Figure 1

I have a nonlinear transformation of a Gaussian process. What is its distribution? Delta methods, influence functions and other locally-Gaussian transformations of noises. A workhorse of Bayesian filtering and smoothing; as such see Simo Särkkä (2013) for a broad introduction to applications.

See transforms of RVs for non-Gaussian results.

1 Taylor expansion

Figure 2: The 1D Taylor approximation according to Arras (1998)

A.k.a. propagation of errors, or the 𝛿-method (Dorfman 1938; Ver Hoef 2012). Not complicated but it can be a little subtle. For a general exposition which handles first and second-order transforms, I recommend Gustafsson and Hendeby (2012), which as a bonus proves some things which seem obvious but are not, in fact, obvious to prove, and disproves some things which seemed obviously true to me. Arras (1998) is possibly the simplest introduction.

Taylor expansion works if the transformation in question is smooth enough and the approximation only needs to be accurate about the expansion point.

Todo: treat expansion point and mean separately.

Consider a general nonlinear, twice-differentiable transformation \(g\) and its second-order Taylor expansion. We apply \(g:\mathbb{R}^{n_{x}}\to\mathbb{R}^{n_{z}}\) to a variable \(x,\) defining \(z:=g(x).\) Let \(\mathrm{E}(x)=\mu_{x}\) and \(\operatorname{Var}(x)=P_{x}.\) The Hessian of the \(i^{\text{th}}\) component of \(g\) is denoted \(g_{i}^{\prime \prime}.\) \([x_i]_i\) is the vector whose \(i\)th element is \(x_i\), and \([a_{ij}]_{ij}\) is the matrix whose \((i,j)\)th element is \(a_{ij}\). We approximate \(z\) using the Taylor expansion about \(\mu_x\), \[z\approx g\left(\mu_{x}\right)+g^{\prime}\left(\mu_{x}\right)\left(x-\mu_{x}\right)+\left[\frac{1}{2}\left(x-\mu_{x}\right)^{T} g_{i}^{\prime \prime}\left(\mu_{x}\right)\left(x-\mu_{x}\right)\right]_{i}.\]

Leaving aside for now the question of when this expansion is accurate, we assume that it is, and assert that approximately \(z\sim\mathcal{N}(\mu_z,P_z)\). The first moment of \(z\) is given by \[ \mu_{z}=g\left(\mu_{x}\right)+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(\mu_{x}\right) P_{x}\right)\right]_{i}. \]

If, further, \(x \sim \mathcal{N}\left(\mu_{x}, P_{x}\right)\), then the covariance (second central moment) of \(z\) is given by \[ P_{z}=g^{\prime}\left(\mu_{x}\right) P_{x}\left(g^{\prime}\left(\mu_{x}\right)\right)^{T}+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(\mu_{x}\right) P_{x} g_{j}^{\prime \prime}\left(\mu_{x}\right) P_{x}\right)\right]_{i j} \] with \(i, j=1, \ldots, n_{z}.\)
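The two moment formulas above can be sketched in a few lines of NumPy. This is a minimal sketch, not a library API: the function name `taylor2_moments`, the convention that the caller supplies the Jacobian and the stacked Hessians, and the test function \(g(x)=(x_0^2, x_0 x_1)\) are all assumptions made for illustration.

```python
import numpy as np


def taylor2_moments(g, jac, hess, mu_x, P_x):
    """Approximate mean and covariance of z = g(x) by a second-order
    Taylor expansion about mu_x, assuming x ~ N(mu_x, P_x)."""
    J = jac(mu_x)    # Jacobian g'(mu_x), shape (n_z, n_x)
    H = hess(mu_x)   # stacked Hessians g_i''(mu_x), shape (n_z, n_x, n_x)
    HP = H @ P_x     # HP[i] = g_i''(mu_x) P_x
    mu_z = g(mu_x) + 0.5 * np.trace(HP, axis1=1, axis2=2)
    # second_order[i, j] = 0.5 * tr(g_i'' P_x g_j'' P_x)
    second_order = 0.5 * np.einsum("iab,jba->ij", HP, HP)
    P_z = J @ P_x @ J.T + second_order
    return mu_z, P_z


# Hypothetical test function g(x) = (x_0^2, x_0 x_1), derivatives worked by hand.
def g(x):
    return np.array([x[0] ** 2, x[0] * x[1]])


def jac(x):
    return np.array([[2.0 * x[0], 0.0], [x[1], x[0]]])


def hess(x):
    # Hessians are constant because g is quadratic.
    return np.array([[[2.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]])


mu_x = np.array([1.0, 2.0])
P_x = np.array([[0.5, 0.1], [0.1, 0.3]])
mu_z, P_z = taylor2_moments(g, jac, hess, mu_x, P_x)
print(mu_z)
print(P_z)
```

Because this particular \(g\) is quadratic, the second-order expansion is exact, so the output can be checked against the closed-form moments of products of Gaussians. For a general \(g\) the result is only as good as the truncation.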

This approach is finite-dimensional, but it also generalises to Gaussian processes, in that we can, at any finite number of test locations, once again find a first-order approximation. See the non-parametric case.

Note that here I have assumed that we have the luxury of expanding about the mean, which is exactly the setting where I would be tempted to get away with only a first-order Taylor expansion. Since I have bothered to take a second-order expansion here, I should give the expansion about an arbitrary point which is not necessarily the mean, to make the generality worthwhile.
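A sketch of that more general expansion, under the same assumptions as before (Gaussian \(x\), and the second-order truncation treated as exact). The symbols \(x_0\), \(d\) and \(\tilde{J}\) are introduced here for illustration only. Expanding about an arbitrary point \(x_0\) and writing \(d:=\mu_x-x_0\), \[z\approx g\left(x_0\right)+g^{\prime}\left(x_0\right)\left(x-x_0\right)+\left[\frac{1}{2}\left(x-x_0\right)^{T} g_{i}^{\prime \prime}\left(x_0\right)\left(x-x_0\right)\right]_{i}.\] Taking expectations and using \(\mathrm{E}\left[(x-x_0)^{T} A (x-x_0)\right]=\operatorname{tr}(A P_x)+d^{T} A d\) gives \[\mu_z\approx g\left(x_0\right)+g^{\prime}\left(x_0\right) d+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(x_0\right) P_x\right)+d^{T} g_{i}^{\prime \prime}\left(x_0\right) d\right]_{i}.\] For the covariance, substitute \(x-x_0=d+e\) with \(e\sim\mathcal{N}(0,P_x)\). The terms linear in \(e\) have coefficient matrix \(\tilde{J}:=g^{\prime}\left(x_0\right)+\left[d^{T} g_{i}^{\prime \prime}\left(x_0\right)\right]_{i}\) (rows stacked over \(i\)), and since odd central moments of a Gaussian vanish, the linear and quadratic terms are uncorrelated, so \[P_z\approx \tilde{J} P_x \tilde{J}^{T}+\frac{1}{2}\left[\operatorname{tr}\left(g_{i}^{\prime \prime}\left(x_0\right) P_x g_{j}^{\prime \prime}\left(x_0\right) P_x\right)\right]_{i j}.\] Setting \(x_0=\mu_x\) gives \(d=0\) and recovers the formulas for expansion about the mean. Dropping the Hessian terms leaves the first-order version, \(\mu_z\approx g(x_0)+g^{\prime}(x_0)d\) and \(P_z\approx g^{\prime}(x_0)P_x g^{\prime}(x_0)^{T}\).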

Question: In what metric, if any, have we minimised the error of our approximation by doing this?

2 Monte Carlo moment matching

Classic Monte Carlo methods use a sample to approximate the moments of a distribution, as seen in ensemble Kalman methods.
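A minimal sketch of plain Monte Carlo moment matching, assuming we can sample from the Gaussian input and evaluate the transformation on a batch of samples. The names `mc_moments` and `g_batch`, the sample size, and the test function are illustrative assumptions, not taken from any particular ensemble Kalman implementation.

```python
import numpy as np


def mc_moments(g_batch, mu_x, P_x, n_samples=100_000, seed=0):
    """Estimate mean and covariance of z = g(x), x ~ N(mu_x, P_x), by sampling."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu_x, P_x, size=n_samples)  # (n_samples, n_x)
    z = g_batch(x)                                          # (n_samples, n_z)
    mu_z = z.mean(axis=0)
    # Sample covariance; atleast_2d keeps a matrix shape when n_z == 1.
    P_z = np.atleast_2d(np.cov(z, rowvar=False))
    return mu_z, P_z


# Hypothetical test function g(x) = (x_0^2, x_0 x_1), applied row-wise to a batch.
def g_batch(x):
    return np.stack([x[:, 0] ** 2, x[:, 0] * x[:, 1]], axis=1)


mu_x = np.array([1.0, 2.0])
P_x = np.array([[0.5, 0.1], [0.1, 0.3]])
mu_z_mc, P_z_mc = mc_moments(g_batch, mu_x, P_x)
print(mu_z_mc)
print(P_z_mc)
```

The estimates carry sampling error that shrinks at the usual \(1/\sqrt{n}\) Monte Carlo rate, so the sample size trades accuracy against cost.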

3 Monte Carlo gradient descent in some metric

If we choose some Monte Carlo method then we can use gradient information to approximate the target in any useful probability metric. This is not special to Gaussian processes, but works with any old stochastic variational method.

3.1 In terms of KL

Suppose we consider the approximation problem in terms of the Kullback-Leibler divergence between the approximation and the truth. TBC.

3.2 In terms of Wasserstein

TBC.

4 Unscented transform

The great invention of Uhlmann and Julier is the unscented transform, which uses a cunningly-chosen non-random empirical sample at so-called sigma-points to approximate the transformed distribution via its moments. I think that anything using sigma points is an unscented transform? Otherwise it is just garden-variety moment-matching.

Often seen in the context of Kalman filtering.

What the Unscented Transform does is to replace the mean vector and its associated error covariance matrix with a special set of points with the same mean and covariance. In the case of the mean and covariance representing the current position estimate for a target, the UT is applied to obtain a set of points, referred to as sigma points, to which the full nonlinear equations of motion can be applied directly. In other words, instead of having to derive a linearized approximation, the equations could simply be applied to each of the points as if it were the true state of the target. The result is a transformed set of points, and the mean and covariance of that set represents the estimate of the predicted state of the target.

See, e.g., Roth, Hendeby, and Gustafsson (2016) and a comparison with the Taylor expansion in Gustafsson and Hendeby (2012).
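The sigma-point recipe described above can be sketched as follows. This is a minimal version of the basic symmetric sigma-point set with a single scaling parameter `kappa`; the function name, the default `kappa=1.0` and the use of a Cholesky factor as the matrix square root are assumptions for illustration, and the scaled variants used in practical filters carry extra tuning parameters that are not shown.

```python
import numpy as np


def unscented_transform(g, mu_x, P_x, kappa=1.0):
    """Approximate mean and covariance of z = g(x) from 2n + 1 sigma points."""
    n = mu_x.shape[0]
    # Columns of L are the scaled square-root directions: L @ L.T = (n + kappa) P_x.
    L = np.linalg.cholesky((n + kappa) * P_x)
    # Sigma points: the mean, then mean +/- each column of L; shape (2n + 1, n).
    sigma = np.vstack([mu_x, mu_x + L.T, mu_x - L.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Push every sigma point through the full nonlinear map.
    z = np.array([g(s) for s in sigma])
    mu_z = w @ z
    dz = z - mu_z
    P_z = (w[:, None] * dz).T @ dz
    return mu_z, P_z


mu_x = np.array([1.0, 2.0])
P_x = np.array([[0.5, 0.1], [0.1, 0.3]])

# Hypothetical linear map: the sigma points reproduce its moments exactly.
A = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([0.5, 0.0])
mu_lin, P_lin = unscented_transform(lambda x: A @ x + b, mu_x, P_x)

# Hypothetical quadratic map g(x) = (x_0^2, x_0 x_1).
mu_quad, P_quad = unscented_transform(
    lambda x: np.array([x[0] ** 2, x[0] * x[1]]), mu_x, P_x
)
print(mu_lin, mu_quad)
```

No derivatives of `g` are needed, which is the practical appeal relative to the Taylor approach. The sigma points match the input mean and covariance by construction, so linear maps are handled exactly; how well the covariance of a nonlinear map is captured depends on the choice of `kappa`.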

Question: What would we need to do to apply the unscented transform to non-Gaussian distributions? See Ebeigbe et al. (2021).

5 Subsampled

See GP by GD.

6 Fisher information

Fisher information between two Gaussians

7 Chaos expansions

See chaos expansions.

8 Gaussian processes

Propagating error of Gaussian process inputs is a functional GP problem. TBD.

Related: propagating error through a GP regression. See Emmanuel Johnson’s Linearized GP site (mildly idiosyncratic notation and very idiosyncratic website navigation).

The following references from Emmanuel Johnson’s lit review look promising: Deisenroth and Mohamed (2012); Girard and Murray-Smith (2003); Ko and Fox (2009); and McHutchon and Rasmussen (2011).

I am curious what, if anything, they add to Murray-Smith and Pearlmutter (2005).

9 References

Arras. 1998. “An Introduction To Error Propagation: Derivation, Meaning and Examples of Equation \(C_Y = F_X C_X F_X^T\).”
Bickson. 2009. “Gaussian Belief Propagation: Theory and Application.”
Bishop, and Doucet. 2014. “Distributed Nonlinear Consensus in the Space of Probability Measures.” IFAC Proceedings Volumes, 19th IFAC World Congress.
Calandra, Peters, Rasmussen, et al. 2016. “Manifold Gaussian Processes for Regression.” In 2016 International Joint Conference on Neural Networks (IJCNN).
Davison, and Ortiz. 2019. “FutureMapping 2: Gaussian Belief Propagation for Spatial AI.” arXiv:1910.14139 [Cs].
Deisenroth, and Mohamed. 2012. “Expectation Propagation in Gaussian Process Dynamical Systems.” In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2. NIPS’12.
———. 2016. “Expectation Propagation in Gaussian Process Dynamical Systems: Extended Version.” arXiv:1207.2940 [Cs, Stat].
Dorfman. 1938. “A Note on the 𝛿-Method for Finding Variance Formulae.” Biometric Bulletin.
Ebeigbe, Berry, Norton, et al. 2021. “A Generalized Unscented Transformation for Probability Distributions.” ArXiv.
Gao, Sitharam, and Roitberg. 2020. “Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions.”
Girard, and Murray-Smith. 2003. “Learning a Gaussian Process Model with Uncertain Inputs.”
Girard, Rasmussen, and Murray-Smith. 2002. “Gaussian Process Priors with Uncertain Inputs: Multiple-Step-Ahead Prediction.”
Grosse. 2021. “Taylor Approximations.” In CSC2541 Winter 2021.
Gustafsson, and Hendeby. 2008. “On Nonlinear Transformations of Stochastic Variables and Its Application to Nonlinear Filtering.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
———. 2012. “Some Relations Between Extended and Unscented Kalman Filters.” IEEE Transactions on Signal Processing.
Hegde, Heinonen, Lähdesmäki, et al. 2018. “Deep Learning with Differential Gaussian Process Flows.” arXiv:1810.04066 [Cs, Stat].
Hendeby, and Gustafsson. 2007. “On Nonlinear Transformations of Gaussian Distributions.”
Holderrieth, Hutchinson, and Teh. 2021. “Equivariant Learning of Stochastic Fields: Gaussian Processes and Steerable Conditional Neural Processes.” In Proceedings of the 38th International Conference on Machine Learning.
Jankowiak, Pleiss, and Gardner. 2020. “Deep Sigma Point Processes.” In Conference on Uncertainty in Artificial Intelligence.
Jin, Judd, and Insitution. n.d. “Perturbation Methods for General Dynamic Stochastic Models.”
Ko, and Fox. 2009. “GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In Autonomous Robots.
Lin, Khan, and Schmidt. 2019. “Stein’s Lemma for the Reparameterization Trick with Exponential Family Mixtures.” arXiv:1910.13398 [Cs, Stat].
Liou, Su, Chiang, et al. 2011. “Gamma Random Field Simulation by a Covariance Matrix Transformation Method.” Stochastic Environmental Research and Risk Assessment.
Majumdar, and Majumdar. 2019. “On the Conditional Distribution of a Multivariate Normal Given a Transformation – the Linear Case.” Heliyon.
Marzouk, Moselhy, Parno, et al. 2016. “Sampling via Measure Transport: An Introduction.” In Handbook of Uncertainty Quantification.
McHutchon, and Rasmussen. 2011. “Gaussian Process Training with Input Noise.” In Proceedings of the 24th International Conference on Neural Information Processing Systems. NIPS’11.
Meyer, Hlinka, and Hlawatsch. 2014. “Sigma Point Belief Propagation.” IEEE Signal Processing Letters.
Minka. 2001. “Expectation Propagation for Approximate Bayesian Inference.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. UAI’01.
Murray-Smith, and Pearlmutter. 2005. “Transformations of Gaussian Process Priors.” In Deterministic and Statistical Methods in Machine Learning. Lecture Notes in Computer Science.
Oehlert. 1992. “A Note on the Delta Method.” The American Statistician.
Ortiz, Evans, and Davison. 2021. “A Visual Introduction to Gaussian Belief Propagation.” arXiv:2107.02308 [Cs].
Papadopoulos, Edwards, and Murray. 2001. “Confidence Estimation Methods for Neural Networks: A Practical Comparison.” IEEE Transactions on Neural Networks.
Quiñonero-Candela. 2004. “Learning with Uncertainty-Gaussian Processes and Relevance Vector Machines.” Technical University of Denmark, Copenhagen.
Roth, Hendeby, and Gustafsson. 2016. “Nonlinear Kalman Filters Explained: A Tutorial on Moment Computations and Sigma Point Methods.” Journal of Advances in Information Fusion.
Ruiz, Titsias, and Blei. 2016. “The Generalized Reparameterization Gradient.” In Advances In Neural Information Processing Systems.
Särkkä, Simo. 2007. “On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems.” IEEE Transactions on Automatic Control.
———. 2013. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks 3.
Särkkä, S., and Hartikainen. 2013. “Non-Linear Noise Adaptive Kalman Filtering via Variational Bayes.” In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
Simic. 2008. “On a Global Upper Bound for Jensen’s Inequality.” Journal of Mathematical Analysis and Applications.
Spantini, Baptista, and Marzouk. 2022. “Coupling Techniques for Nonlinear Ensemble Filtering.” SIAM Review.
Tran, Dusenberry, van der Wilk, et al. 2018. “Bayesian Layers: A Module for Neural Network Uncertainty.”
Tran, Ranganath, and Blei. 2015. “The Variational Gaussian Process.” In Proceedings of ICLR.
Ver Hoef. 2012. “Who Invented the Delta Method?” The American Statistician.
Wilkinson, William J, Chang, Andersen, et al. 2019. “Global Approximate Inference via Local Linearisation for Temporal Gaussian Processes.”
Wilkinson, William J., Chang, Andersen, et al. 2020. “State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes.” In ICML.
Wilkinson, William J., Särkkä, and Solin. 2021. “Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.”
Wolter. 2007. Introduction to Variance Estimation. Statistics for Social and Behavioral Sciences.