# Reparameterization methods for MC gradient estimation

April 4, 2018 — May 2, 2023

approximation
Bayes
density
likelihood free
Monte Carlo
nonparametric
optimization
probabilistic algorithms
probability
sciml
statistics

Reparameterization trick. A trick where we cleverly transform RVs so that we can sample from tricky target distributions, and compute the Jacobians of the transform, starting from a “nice” source distribution. Useful in e.g. variational inference, especially autoencoders, and for density estimation in probabilistic deep learning. Pairs well with normalizing flows to get powerful target distributions. Storchastic credits pathwise gradients to Glasserman and Ho (1991), where they appear as perturbation analysis.
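As a rough illustration of the "transform a nice source distribution" idea, here is a minimal sketch using only the Python standard library. The two maps (a Gaussian location-scale map and an exponential inverse-CDF map) and all function names are my own illustrative choices, not taken from any of the references.

```python
import math
import random


def sample_gaussian(mu, sigma, eps):
    """Location-scale map: eps ~ N(0, 1) becomes z ~ N(mu, sigma^2)."""
    return mu + sigma * eps


def sample_exponential(rate, u):
    """Inverse-CDF map: u ~ Uniform[0, 1) becomes z ~ Exponential(rate)."""
    return -math.log(1.0 - u) / rate


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 100_000

# Draw from the "nice" sources, then push the draws through the maps.
gauss = [sample_gaussian(2.0, 0.5, rng.gauss(0.0, 1.0)) for _ in range(n)]
expo = [sample_exponential(4.0, rng.random()) for _ in range(n)]

gauss_mean = sum(gauss) / n  # should be near mu = 2.0
expo_mean = sum(expo) / n    # should be near 1 / rate = 0.25
```

The point is that the randomness lives entirely in the source draws; the parameters only enter through a deterministic map, which is what later lets us differentiate through the samples.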

## 1 Tutorials

Suppose we want the gradient of an expectation of a smooth function $$f$$: $\nabla_\theta \mathbb{E}_{p(z; \theta)}[f(z)]=\nabla_\theta \int p(z; \theta) f(z) d z$ […] This gradient is often difficult to compute because the integral is typically unknown and because the parameters $$\theta$$, with respect to which we are computing the gradient, belong to the distribution $$p(z; \theta)$$ that we integrate against.

Now suppose that we know some function $$g$$ and some easy distribution $$p(\epsilon)$$ such that, for $$\epsilon \sim p(\epsilon)$$, $$z | \theta=g(\epsilon, \theta)$$. Then we can try to estimate the gradient of the expectation by Monte Carlo:

$\nabla_\theta \mathbb{E}_{p(z; \theta)}[f(z)]=\mathbb{E}_{p(\epsilon)}\left[\nabla_\theta f(g(\epsilon, \theta))\right]$

Let’s derive this expression and explore the implications of it for our optimisation problem. One-liners, i.e. sampling rules of the form $$z=g(\epsilon, \theta)$$, give us a transformation from a distribution $$p(\epsilon)$$ to another $$p(z)$$, thus the differential area (mass of the distribution) is invariant under the change of variables. This property implies that:

$p(z)=\left|\frac{d \epsilon}{d z}\right| p(\epsilon) \Longrightarrow |p(z)\, d z|=|p(\epsilon)\, d \epsilon|$

Re-expressing the troublesome stochastic optimisation problem using random variate reparameterisation, we find:

$\begin{aligned} & \nabla_\theta \mathbb{E}_{p(z; \theta)}[f(z)]=\nabla_\theta \int p(z; \theta) f(z)\, d z \\ = & \nabla_\theta \int p(\epsilon) f(z)\, d \epsilon=\nabla_\theta \int p(\epsilon) f(g(\epsilon, \theta))\, d \epsilon \\ = & \nabla_\theta \mathbb{E}_{p(\epsilon)}[f(g(\epsilon, \theta))]=\mathbb{E}_{p(\epsilon)}\left[\nabla_\theta f(g(\epsilon, \theta))\right] \end{aligned}$

The final step moves the gradient inside the expectation, which is permissible because $$p(\epsilon)$$ does not depend on $$\theta$$ and $$f \circ g$$ is assumed smooth enough in $$\theta$$.
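To make the estimator concrete, here is a minimal sketch for a case where the answer is known in closed form. I assume (purely for illustration) that $$f(z)=z^2$$ and $$z=g(\epsilon, \theta)=\mu+\sigma \epsilon$$ with $$\epsilon \sim \mathcal{N}(0,1)$$ and $$\theta=(\mu, \sigma)$$, so that $$\mathbb{E}[f(z)]=\mu^2+\sigma^2$$ and the exact gradient is $$(2\mu, 2\sigma)$$. The function names are hypothetical, and the derivatives are written by hand rather than with an autodiff library.

```python
import random


def df_dz(z):
    """Derivative of the illustrative test function f(z) = z**2."""
    return 2.0 * z


def reparam_grad(mu, sigma, n_samples, rng):
    """Monte Carlo estimate of grad_theta E[f(z)] with z = mu + sigma * eps."""
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # draw from the easy distribution p(eps)
        z = mu + sigma * eps       # z = g(eps, theta)
        # Chain rule: d f(g) / d theta = f'(z) * dg/dtheta,
        # with dg/dmu = 1 and dg/dsigma = eps.
        g_mu += df_dz(z)
        g_sigma += df_dz(z) * eps
    return g_mu / n_samples, g_sigma / n_samples


rng = random.Random(1)  # fixed seed so the sketch is repeatable
mu, sigma = 1.5, 0.7
grad_mu, grad_sigma = reparam_grad(mu, sigma, 200_000, rng)

# Closed form for comparison: gradient of mu^2 + sigma^2.
exact = (2.0 * mu, 2.0 * sigma)
```

In practice one would let an autodiff framework handle the chain rule; the hand-written version is only meant to show that the estimator averages $$\nabla_\theta f(g(\epsilon, \theta))$$ over draws of $$\epsilon$$.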

## 2 Normalizing flows

Cunning reparameterization maps with desirable properties for nonparametric inference. See normalizing flows.
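The simplest possible "flow" is a single affine map, and it shows the bookkeeping that fancier flows automate. The sketch below is my own toy example, not drawn from any flow library: it evaluates the density of $$z=\text{shift}+\text{scale} \cdot \epsilon$$ with $$\epsilon \sim \mathcal{N}(0,1)$$ via the change-of-variables rule $$p(z)=\left|\frac{d \epsilon}{d z}\right| p(\epsilon)$$ and checks it against the Gaussian log-density written out directly.

```python
import math


def std_normal_logpdf(eps):
    """Log-density of the source distribution N(0, 1)."""
    return -0.5 * (eps * eps + math.log(2.0 * math.pi))


def affine_flow_logpdf(z, shift, scale):
    """log p(z) for z = shift + scale * eps, eps ~ N(0, 1), scale != 0."""
    eps = (z - shift) / scale            # invert the map
    log_abs_det = -math.log(abs(scale))  # log |d eps / d z|
    return std_normal_logpdf(eps) + log_abs_det


def normal_logpdf(z, mean, std):
    """Log-density of N(mean, std^2), written out directly for comparison."""
    return (-0.5 * ((z - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


flow_value = affine_flow_logpdf(0.3, 1.0, 2.0)
direct_value = normal_logpdf(0.3, 1.0, 2.0)  # agrees up to rounding
```

Real normalizing flows compose many invertible maps and sum their log-Jacobian terms, but each layer contributes the same two ingredients: an inverse and a log-determinant.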

## 3 General measure transport

See transport maps.

## 5 Incoming

Universal representation theorems? Probably many, here are some I saw: Perekrestenko, Müller, and Bölcskei (2020); Perekrestenko, Eberhard, and Bölcskei (2021).

## 6 References

Albergo, Goldstein, Boffi, et al. 2023.
Albergo, and Vanden-Eijnden. 2023.
Ambrosio, Gigli, and Savare. 2008. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics. ETH Zürich.
Bamler, and Mandt. 2017. arXiv:1707.01069 [Cs, Stat].
Caterini, Doucet, and Sejdinovic. 2018. In Advances in Neural Information Processing Systems.
Charpentier, Borchert, Zügner, et al. 2022. arXiv:2105.04471 [Cs, Stat].
Chen, Changyou, Li, Chen, et al. 2017. arXiv:1709.01179 [Stat].
Chen, Tian Qi, Rubanova, Bettencourt, et al. 2018. In Advances in Neural Information Processing Systems 31.
Devroye. 2006. In Simulation. Handbooks in Operations Research and Management Science.
Dinh, Sohl-Dickstein, and Bengio. 2016. In Advances In Neural Information Processing Systems.
Figurnov, Mohamed, and Mnih. 2018. In Advances in Neural Information Processing Systems 31.
Glasserman, and Ho. 1991. Gradient Estimation Via Perturbation Analysis.
Grathwohl, Chen, Bettencourt, et al. 2018. arXiv:1810.01367 [Cs, Stat].
Huang, Krueger, Lacoste, et al. 2018. arXiv:1804.00779 [Cs, Stat].
Jankowiak, and Obermeyer. 2018. In International Conference on Machine Learning.
Kingma, Durk P, and Dhariwal. 2018. In Advances in Neural Information Processing Systems 31.
Kingma, Diederik P., Salimans, Jozefowicz, et al. 2016. In Advances in Neural Information Processing Systems 29.
Kingma, Diederik P., Salimans, and Welling. 2015. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. NIPS’15.
Kingma, Diederik P., and Welling. 2014. In ICLR 2014 Conference.
Koehler, Mehta, and Risteski. 2020. arXiv:2010.01155 [Cs, Stat].
Lin, Khan, and Schmidt. 2019. arXiv:1910.13398 [Cs, Stat].
Lipman, Chen, Ben-Hamu, et al. 2023.
Louizos, and Welling. 2017. In PMLR.
Lu, and Huang. 2020. In Advances in Neural Information Processing Systems.
Marzouk, Moselhy, Parno, et al. 2016. In Handbook of Uncertainty Quantification.
Massaroli, Poli, Bin, et al. 2020. arXiv:2003.08063 [Cs, Math, Stat].
Mohamed, Rosca, Figurnov, et al. 2020. Journal of Machine Learning Research.
Ng, and Zammit-Mangion. 2020. arXiv:2007.00248 [Stat].
Papamakarios. 2019.
Papamakarios, Murray, and Pavlakou. 2017. In Advances in Neural Information Processing Systems 30.
Papamakarios, Nalisnick, Rezende, et al. 2021. Journal of Machine Learning Research.
Perekrestenko, Eberhard, and Bölcskei. 2021. Partial Differential Equations and Applications.
Perekrestenko, Müller, and Bölcskei. 2020.
Pfau, and Rezende. 2020. “Integrable Nonparametric Flows.”
Ran, and Hu. 2017. Neural Computation.
Rezende, and Mohamed. 2015. In International Conference on Machine Learning. ICML’15.
Rezende, Mohamed, and Wierstra. 2015. In Proceedings of ICML.
Rippel, and Adams. 2013. arXiv:1302.5125 [Cs, Stat].
Ruiz, Titsias, and Blei. 2016. In Advances In Neural Information Processing Systems.
Shi, Siddharth, Paige, et al. 2019. arXiv:1911.03393 [Cs, Stat].
Spantini, Baptista, and Marzouk. 2022. SIAM Review.
Spantini, Bigoni, and Marzouk. 2017. Journal of Machine Learning Research.
Tabak, E. G., and Turner. 2013. Communications on Pure and Applied Mathematics.
Tabak, Esteban G., and Vanden-Eijnden. 2010. Communications in Mathematical Sciences.
van den Berg, Hasenclever, Tomczak, et al. 2018. In UAI18.
Wang, and Wang. 2019. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
Wehenkel, and Louppe. 2021. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.
Xu, Zuheng, Chen, and Campbell. 2023.
Xu, Ming, Quiroz, Kohn, et al. 2018. arXiv:1809.10330 [Cs, Stat].
Yang, Li, and Wang. 2021. arXiv:2101.12353 [Cs, Math, Stat].
Zahm, Constantine, Prieur, et al. 2018. arXiv:1801.07922 [Math].
Zhang, and Curtis. 2021. Journal of Geophysical Research: Solid Earth.