Reparameterization methods for MC gradient estimation

Pathwise gradient estimation, a.k.a. the reparameterization trick.



Reparameterization trick. A trick where we cleverly transform RVs so that we can sample from tricky target distributions, together with the Jacobians of the transformation, via a “nice” source distribution. Useful in e.g. variational inference, especially variational autoencoders, and for density estimation in probabilistic deep learning. Pairs well with normalizing flows to get powerful target distributions. Storchastic credits pathwise gradients to Glasserman and Ho (1991), under the name perturbation analysis.
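To make that concrete, here is a minimal sketch (illustrative, not taken from any of the cited papers) of the Gaussian location-scale reparameterization in PyTorch: we sample \(z = \mu + \sigma\epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\), so the sample is a differentiable function of the parameters and plain backpropagation yields a pathwise gradient estimate.

```python
# Minimal sketch of the Gaussian reparameterization trick (illustrative only).
# We estimate the gradient of E_{z~N(mu, sigma^2)}[f(z)] by sampling
# z = mu + sigma * eps with eps ~ N(0, 1), so z is differentiable in (mu, sigma).
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)  # parameterize sigma on the log scale

def f(z):
    return z ** 2  # any smooth test function

eps = torch.randn(100_000)             # eps ~ N(0, 1), independent of the parameters
z = mu + torch.exp(log_sigma) * eps    # the "one-liner" z = g(eps, theta)
loss = f(z).mean()                     # Monte Carlo estimate of E[f(z)]
loss.backward()

# For f(z) = z^2, E[f(z)] = mu^2 + sigma^2, so grad_mu = 2*mu = 1.0
# and grad_log_sigma = 2*sigma^2 = 2.0 (up to Monte Carlo error).
print(mu.grad, log_sigma.grad)
```

The essential point is that the randomness enters only through \(\epsilon\), whose distribution does not depend on the parameters, so gradients can pass through the sampling step.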

Tutorials

Suppose we want the gradient of an expectation of a smooth function \(f\):

\[ \nabla_\theta \mathbb{E}_{p(z;\theta)}[f(z)] = \nabla_\theta \int p(z;\theta) f(z)\, dz \]

[…] This gradient is often difficult to compute because the integral is typically unknown and the parameters \(\theta\), with respect to which we are computing the gradient, are parameters of the distribution \(p(z;\theta)\).

Now suppose that we know some function \(g\) such that, for some easy distribution \(p(\epsilon)\), \(z \mid \theta = g(\epsilon, \theta)\) in distribution. We can then estimate the gradient of the expectation by Monte Carlo:

\[ \nabla_\theta \mathbb{E}_{p(z;\theta)}[f(z)] = \mathbb{E}_{p(\epsilon)}\left[\nabla_\theta f(g(\epsilon, \theta))\right] \]

Let’s derive this expression and explore the implications of it for our optimisation problem. One-liners give us a transformation from a distribution \(p(\epsilon)\) to another \(p(z)\), thus the differential area (mass of the distribution) is invariant under the change of variables. This property implies that:

\[ p(z) = \left|\frac{d\epsilon}{dz}\right| p(\epsilon) \Longrightarrow |p(z)\, dz| = |p(\epsilon)\, d\epsilon| \]

Re-expressing the troublesome stochastic optimisation problem using random variate reparameterisation, we find:

\[ \begin{aligned} \nabla_\theta \mathbb{E}_{p(z;\theta)}[f(z)] &= \nabla_\theta \int p(z;\theta) f(z)\, dz \\ &= \nabla_\theta \int p(\epsilon) f(z)\, d\epsilon = \nabla_\theta \int p(\epsilon) f(g(\epsilon, \theta))\, d\epsilon \\ &= \nabla_\theta \mathbb{E}_{p(\epsilon)}[f(g(\epsilon, \theta))] = \mathbb{E}_{p(\epsilon)}\left[\nabla_\theta f(g(\epsilon, \theta))\right] \end{aligned} \]
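As a quick sanity check on this identity (my own illustration, not from the quoted tutorial), the sketch below computes the pathwise estimator by hand for \(f(z) = z^2\) with \(z \sim \mathcal{N}(\mu, \sigma^2)\) and the one-liner \(z = \mu + \sigma\epsilon\); the Monte Carlo averages land near the analytic gradients \(2\mu\) and \(2\sigma\).

```python
# Sanity check of the reparameterized gradient, done by hand with NumPy
# (illustrative; the parameter values are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.3
n = 100_000

eps = rng.standard_normal(n)       # eps ~ p(eps) = N(0, 1)
z = mu + sigma * eps               # one-liner: z = g(eps, theta)

# f(z) = z^2, so by the chain rule the pathwise gradient has components
#   (df/dz) * (dz/dmu) = 2z * 1   and   (df/dz) * (dz/dsigma) = 2z * eps
grad_mu = np.mean(2 * z)           # estimates d/dmu E[z^2] = 2*mu
grad_sigma = np.mean(2 * z * eps)  # estimates d/dsigma E[z^2] = 2*sigma

print(grad_mu, 2 * mu)             # both approximately 1.0
print(grad_sigma, 2 * sigma)       # both approximately 2.6
```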

Normalizing flows

Cunning reparameterization maps with desirable properties for nonparametric inference. See normalizing flows.
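The density identity \(|p(z)\, dz| = |p(\epsilon)\, d\epsilon|\) above is exactly what such maps exploit; a tiny hypothetical sketch for a single affine map (my own illustration, not any flow library’s API) is:

```python
# Change-of-variables density under a simple invertible map (illustrative).
# For z = a*eps + b with eps ~ N(0, 1): log p(z) = log p(eps) - log|dz/deps| = log p(eps) - log|a|.
import numpy as np
from scipy.stats import norm

a, b = 2.0, -1.0

def log_density(z):
    eps = (z - b) / a                          # invert the map
    return norm.logpdf(eps) - np.log(abs(a))   # base log-density minus log|det J|

# Should agree with the N(b, a^2) density that the affine map induces.
print(log_density(0.3), norm.logpdf(0.3, loc=b, scale=abs(a)))
```

Stacking many such invertible maps, each contributing a \(\log|\det J|\) term, is the normalizing-flow recipe.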

General measure transport

See transport maps.

Tooling

Storchastic.

Incoming

Universal representation theorems? There are probably many; here are some I have seen: Perekrestenko, Müller, and Bölcskei (2020); Perekrestenko, Eberhard, and Bölcskei (2021).

References

Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savare. 2008. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. 2nd ed. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel.
Bamler, Robert, and Stephan Mandt. 2017. Structured Black Box Variational Inference for Latent Time Series Models.” arXiv:1707.01069 [Cs, Stat], July.
Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018. Sylvester Normalizing Flows for Variational Inference.” In UAI 2018.
Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018. Hamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems.
Charpentier, Bertrand, Oliver Borchert, Daniel Zügner, Simon Geisler, and Stephan Günnemann. 2022. Natural Posterior Network: Deep Bayesian Uncertainty for Exponential Family Distributions.” arXiv:2105.04471 [Cs, Stat], March.
Chen, Changyou, Chunyuan Li, Liqun Chen, Wenlin Wang, Yunchen Pu, and Lawrence Carin. 2017. Continuous-Time Flows for Efficient Inference and Density Estimation.” arXiv:1709.01179 [Stat], September.
Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.
Devroye, Luc. 2006. Chapter 4 Nonuniform Random Variate Generation.” In Simulation, edited by Shane G. Henderson and Barry L. Nelson, 13:83–121. Handbooks in Operations Research and Management Science. Elsevier.
Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. 2016. Density Estimation Using Real NVP.” In Advances In Neural Information Processing Systems.
Figurnov, Mikhail, Shakir Mohamed, and Andriy Mnih. 2018. Implicit Reparameterization Gradients.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 441–52. Curran Associates, Inc.
Glasserman, Paul, and Yu-Chi Ho. 1991. Gradient Estimation Via Perturbation Analysis. Springer Science & Business Media.
Grathwohl, Will, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. 2018. FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models.” arXiv:1810.01367 [Cs, Stat], October.
Huang, Chin-Wei, David Krueger, Alexandre Lacoste, and Aaron Courville. 2018. Neural Autoregressive Flows.” arXiv:1804.00779 [Cs, Stat], April.
Jankowiak, Martin, and Fritz Obermeyer. 2018. Pathwise Derivatives Beyond the Reparameterization Trick.” In International Conference on Machine Learning, 2235–44.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
Kingma, Diederik P., Tim Salimans, and Max Welling. 2015. Variational Dropout and the Local Reparameterization Trick.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2575–83. NIPS’15. Cambridge, MA, USA: MIT Press.
Kingma, Diederik P., and Max Welling. 2014. Auto-Encoding Variational Bayes.” In ICLR 2014 Conference.
Kingma, Durk P, and Prafulla Dhariwal. 2018. Glow: Generative Flow with Invertible 1x1 Convolutions.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 10236–45. Curran Associates, Inc.
Koehler, Frederic, Viraj Mehta, and Andrej Risteski. 2020. Representational Aspects of Depth and Conditioning in Normalizing Flows.” arXiv:2010.01155 [Cs, Stat], October.
Lin, Wu, Mohammad Emtiyaz Khan, and Mark Schmidt. 2019. Stein’s Lemma for the Reparameterization Trick with Exponential Family Mixtures.” arXiv:1910.13398 [Cs, Stat], October.
Louizos, Christos, and Max Welling. 2017. Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR, 2218–27.
Lu, You, and Bert Huang. 2020. Woodbury Transformations for Deep Generative Flows.” In Advances in Neural Information Processing Systems. Vol. 33.
Marzouk, Youssef, Tarek Moselhy, Matthew Parno, and Alessio Spantini. 2016. Sampling via Measure Transport: An Introduction.” In Handbook of Uncertainty Quantification, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1:1–41. Cham: Springer Heidelberg.
Massaroli, Stefano, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, and Hajime Asama. 2020. Stable Neural Flows.” arXiv:2003.08063 [Cs, Math, Stat], March.
Mohamed, Shakir, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. 2020. Monte Carlo Gradient Estimation in Machine Learning.” Journal of Machine Learning Research 21 (132): 1–62.
Ng, Tin Lok James, and Andrew Zammit-Mangion. 2020. Non-Homogeneous Poisson Process Intensity Modeling and Estimation Using Measure Transport.” arXiv:2007.00248 [Stat], July.
Papamakarios, George. 2019. Neural Density Estimation and Likelihood-Free Inference.” The University of Edinburgh.
Papamakarios, George, Iain Murray, and Theo Pavlakou. 2017. Masked Autoregressive Flow for Density Estimation.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2338–47. Curran Associates, Inc.
Papamakarios, George, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. 2021. Normalizing Flows for Probabilistic Modeling and Inference.” Journal of Machine Learning Research 22 (57): 1–64.
Perekrestenko, Dmytro, Léandre Eberhard, and Helmut Bölcskei. 2021. High-Dimensional Distribution Generation Through Deep Neural Networks.” Partial Differential Equations and Applications 2 (5): 64.
Perekrestenko, Dmytro, Stephan Müller, and Helmut Bölcskei. 2020. Constructive Universal High-Dimensional Distribution Generation Through Deep ReLU Networks.”
Pfau, David, and Danilo Rezende. 2020. “Integrable Nonparametric Flows.”
Ran, Zhi-Yong, and Bao-Gang Hu. 2017. Parameter Identifiability in Statistical Machine Learning: A Review.” Neural Computation 29 (5): 1151–1203.
Rezende, Danilo Jimenez, and Shakir Mohamed. 2015. Variational Inference with Normalizing Flows.” In International Conference on Machine Learning, 1530–38. ICML’15. Lille, France: JMLR.org.
Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2015. Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” In Proceedings of ICML.
Rippel, Oren, and Ryan Prescott Adams. 2013. High-Dimensional Probability Estimation with Deep Density Models.” arXiv:1302.5125 [Cs, Stat], February.
Ruiz, Francisco J. R., Michalis K. Titsias, and David M. Blei. 2016. The Generalized Reparameterization Gradient.” In Advances In Neural Information Processing Systems.
Shi, Yuge, N. Siddharth, Brooks Paige, and Philip H. S. Torr. 2019. Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models.” arXiv:1911.03393 [Cs, Stat], November.
Spantini, Alessio, Ricardo Baptista, and Youssef Marzouk. 2022. Coupling Techniques for Nonlinear Ensemble Filtering.” SIAM Review 64 (4): 921–53.
Spantini, Alessio, Daniele Bigoni, and Youssef Marzouk. 2017. Inference via Low-Dimensional Couplings.” Journal of Machine Learning Research 19 (66): 2639–709.
Tabak, E. G., and Cristina V. Turner. 2013. A Family of Nonparametric Density Estimation Algorithms.” Communications on Pure and Applied Mathematics 66 (2): 145–64.
Tabak, Esteban G., and Eric Vanden-Eijnden. 2010. Density Estimation by Dual Ascent of the Log-Likelihood.” Communications in Mathematical Sciences 8 (1): 217–33.
Wang, Prince Zizhuang, and William Yang Wang. 2019. Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 284–94. Minneapolis, Minnesota: Association for Computational Linguistics.
Xu, Ming, Matias Quiroz, Robert Kohn, and Scott A. Sisson. 2018. Variance Reduction Properties of the Reparameterization Trick.” arXiv:1809.10330 [Cs, Stat], September.
Xu, Zuheng, Naitong Chen, and Trevor Campbell. 2023. MixFlows: Principled Variational Inference via Mixed Flows.” arXiv.
Yang, Yunfei, Zhen Li, and Yang Wang. 2021. On the Capacity of Deep Generative Networks for Approximating Distributions.” arXiv:2101.12353 [Cs, Math, Stat], January.
Zahm, Olivier, Paul Constantine, Clémentine Prieur, and Youssef Marzouk. 2018. Gradient-Based Dimension Reduction of Multivariate Vector-Valued Functions.” arXiv:1801.07922 [Math], January.
Zhang, Xin, and Andrew Curtis. 2021. Bayesian Geophysical Inversion Using Invertible Neural Networks.” Journal of Geophysical Research: Solid Earth 126 (7): e2021JB022320.
