Reparameterization trick.
A trick whereby we cleverly transform random variables so as to sample from tricky target distributions (and compute their Jacobians) via a "nice" source distribution.
Useful in e.g. variational inference, especially variational autoencoders, for density estimation in probabilistic deep learning.
Pairs well with normalizing flows to get powerful target distributions.
Storchastic credits pathwise gradients to Glasserman and Ho (1991) as *perturbation analysis*.
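The canonical instance is the Gaussian: instead of sampling \(z \sim \mathcal{N}(\mu, \sigma^2)\) directly, sample \(\epsilon\) from a standard normal and push it through a deterministic, differentiable map. A minimal numpy sketch (the function name `reparameterized_normal` is illustrative, not from any library):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterized_normal(mu, sigma, n):
    # Sample from the "nice" source distribution: eps ~ N(0, 1).
    eps = rng.standard_normal(n)
    # Transform deterministically; the parameters (mu, sigma) enter only
    # through the differentiable map g(eps; mu, sigma) = mu + sigma * eps,
    # so gradients w.r.t. them can flow through samples.
    return mu + sigma * eps

z = reparameterized_normal(2.0, 0.5, 100_000)
```

The resulting samples are distributed as \(\mathcal{N}(2, 0.5^2)\), but the randomness lives entirely in the parameter-free source distribution.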

## Tutorials

- Shakir Mohamed, Machine Learning Trick of the Day (4): Reparameterisation Tricks:

Suppose we want the gradient of an expectation of a smooth function \(f\): \[ \nabla_\theta \mathbb{E}_{p(z;\theta)}[f(z)] = \nabla_\theta \int p(z;\theta) f(z)\,dz \] […] This gradient is often difficult to compute because the integral is typically unknown and the parameters \(\theta\), with respect to which we are computing the gradient, are of the distribution \(p(z;\theta)\).

Now we suppose that we know some function \(g\) such that for some easy distribution \(p(\epsilon)\), \(z | \theta=g(\epsilon, \theta)\). Now we can try to estimate the gradient of the expectation by Monte Carlo:

\[ \nabla_\theta \mathbb{E}_{p(z;\theta)}[f(z)] = \mathbb{E}_{p(\epsilon)}\left[\nabla_\theta f(g(\epsilon, \theta))\right] \]

Let's derive this expression and explore the implications of it for our optimisation problem. One-liners give us a transformation from a distribution \(p(\epsilon)\) to another \(p(z)\), thus the differential area (mass of the distribution) is invariant under the change of variables. This property implies that:

\[ p(z) = \left|\frac{d\epsilon}{dz}\right| p(\epsilon) \Longrightarrow |p(z)\,dz| = |p(\epsilon)\,d\epsilon| \]

Re-expressing the troublesome stochastic optimisation problem using random variate reparameterisation, we find:

\[ \begin{aligned} \nabla_\theta \mathbb{E}_{p(z;\theta)}[f(z)] &= \nabla_\theta \int p(z;\theta) f(z)\,dz \\ &= \nabla_\theta \int p(\epsilon) f(z)\,d\epsilon = \nabla_\theta \int p(\epsilon) f(g(\epsilon, \theta))\,d\epsilon \\ &= \nabla_\theta \mathbb{E}_{p(\epsilon)}[f(g(\epsilon, \theta))] = \mathbb{E}_{p(\epsilon)}\left[\nabla_\theta f(g(\epsilon, \theta))\right] \end{aligned} \]

- Yuge Shi’s variational inference tutorial is a tour of cunning reparameterisation gradient tricks, written to accompany her paper Shi et al. (2019). She punts some details to Mohamed et al. (2020), which in turn tells me that this adventure continues at reparameterization gradients and Monte Carlo gradient estimation: Figurnov, Mohamed, and Mnih (2018), Devroye (2006), and Jankowiak and Obermeyer (2018).
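The pathwise estimator derived above can be sanity-checked numerically. Take \(z = g(\epsilon, \theta) = \theta + \epsilon\) with \(\epsilon \sim \mathcal{N}(0,1)\) and \(f(z) = z^2\); then \(\mathbb{E}[f(z)] = \theta^2 + 1\), so the true gradient is \(2\theta\). A minimal sketch, assuming nothing beyond numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 1.5
n = 200_000

# Source noise eps ~ p(eps) = N(0, 1); z = g(eps, theta) = theta + eps.
eps = rng.standard_normal(n)

# f(z) = z**2, so the pointwise gradient is
#   d/dtheta f(g(eps, theta)) = 2 * (theta + eps).
# The pathwise (reparameterization) estimator averages this over samples.
grad_est = np.mean(2.0 * (theta + eps))

# Analytic check: E[z^2] = theta^2 + 1, so the true gradient is 2 * theta.
```

With `theta = 1.5` the Monte Carlo estimate should land close to 3.0; the point is that the gradient passed *through* the samples rather than through the density, as the score-function estimator would require.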

## Normalizing flows

Cunning reparameterization maps with desirable properties for nonparametric inference. See normalizing flows.
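The simplest flow is a single affine map, which already exercises the change-of-variables formula \(p(z) = p(\epsilon)\,|d\epsilon/dz|\) from above. A sketch (the function `affine_flow_logpdf` is a hypothetical name for illustration):

```python
import numpy as np

def affine_flow_logpdf(z, a, b):
    """Log density of z = a * eps + b, where eps ~ N(0, 1)."""
    # Invert the map to recover the source variable.
    eps = (z - b) / a
    # Standard-normal log density of the source variable.
    log_p_eps = -0.5 * (eps**2 + np.log(2 * np.pi))
    # Change-of-variables correction: log |d eps / d z| = -log|a|.
    return log_p_eps - np.log(abs(a))
```

This reproduces the \(\mathcal{N}(b, a^2)\) log density exactly; deeper flows compose many such invertible maps and accumulate their log-Jacobian terms the same way.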

## General measure transport

See transport maps.

## Tooling

## Incoming

Universal representation theorems? There are probably many; here are some I have seen: Perekrestenko, Müller, and Bölcskei (2020); Perekrestenko, Eberhard, and Bölcskei (2021).

## References

*Gradient Flows: In Metric Spaces and in the Space of Probability Measures*. 2nd ed. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel.

*arXiv:1707.01069 [Cs, Stat]*, July.

*UAI18*.

*Advances in Neural Information Processing Systems*.

*arXiv:2105.04471 [Cs, Stat]*, March.

*arXiv:1709.01179 [Stat]*, September.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.

*Simulation*, edited by Shane G. Henderson and Barry L. Nelson, 13:83–121. Handbooks in Operations Research and Management Science. Elsevier.

*Advances In Neural Information Processing Systems*.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 441–52. Curran Associates, Inc.

*Gradient Estimation Via Perturbation Analysis*. Springer Science & Business Media.

*arXiv:1810.01367 [Cs, Stat]*, October.

*arXiv:1804.00779 [Cs, Stat]*, April.

*International Conference on Machine Learning*, 2235–44.

*Advances in Neural Information Processing Systems 29*. Curran Associates, Inc.

*Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2*, 2575–83. NIPS’15. Cambridge, MA, USA: MIT Press.

*ICLR 2014 Conference*.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 10236–45. Curran Associates, Inc.

*arXiv:2010.01155 [Cs, Stat]*, October.

*arXiv:1910.13398 [Cs, Stat]*, October.

*PMLR*, 2218–27.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Handbook of Uncertainty Quantification*, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1:1–41. Cham: Springer Heidelberg.

*arXiv:2003.08063 [Cs, Math, Stat]*, March.

*Journal of Machine Learning Research* 21 (132): 1–62.

*arXiv:2007.00248 [Stat]*, July.

*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2338–47. Curran Associates, Inc.

*Journal of Machine Learning Research* 22 (57): 1–64.

*Partial Differential Equations and Applications* 2 (5): 64.

*Neural Computation* 29 (5): 1151–1203.

*International Conference on Machine Learning*, 1530–38. ICML’15. Lille, France: JMLR.org.

*Proceedings of ICML*.

*arXiv:1302.5125 [Cs, Stat]*, February.

*Advances In Neural Information Processing Systems*.

*arXiv:1911.03393 [Cs, Stat]*, November.

*SIAM Review* 64 (4): 921–53.

*Journal of Machine Learning Research* 19 (66): 2639–709.

*Communications on Pure and Applied Mathematics* 66 (2): 145–64.

*Communications in Mathematical Sciences* 8 (1): 217–33.

*Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)*, 284–94. Minneapolis, Minnesota: Association for Computational Linguistics.

*arXiv:1809.10330 [Cs, Stat]*, September.

*arXiv:2101.12353 [Cs, Math, Stat]*, January.

*arXiv:1801.07922 [Math]*, January.

*Journal of Geophysical Research: Solid Earth* 126 (7): e2021JB022320.
