Hamiltonian and Langevin Monte Carlo

Physics might be on to something



Hamiltonians, energy conservation in sampling. Handy. Summary would be nice.

Note salad from a Betancourt seminar

Michael Betancourt’s heuristic explanation of Hamiltonian Monte Carlo: sets of high density are no good on their own; we need the “typical set”, a set where the product of differential volume and density is high. He motivates Markov Chain Monte Carlo on this basis: it is a way of exploring the typical set given points already in it, or of getting closer to the typical set if we start outside it. How to get a central limit theorem? “Geometric” ergodicity results. Hamiltonian Monte Carlo is a procedure for generating measure-preserving flows over phase space.
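To make the typical-set heuristic concrete, here is a minimal sketch assuming a standard normal target (my choice of example, not something from the seminar). The density is highest at the origin, yet in high dimensions essentially no draws land near it; they concentrate in a thin shell at radius roughly the square root of the dimension. The function name `gaussian_radii` is made up for illustration.

```python
import numpy as np

def gaussian_radii(dim, n_samples=10_000, seed=0):
    """Distances from the mode for draws from a standard normal in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))
    return np.linalg.norm(x, axis=1)

# The mode is at radius 0, but the draws sit near sqrt(dim) with O(1) spread.
for dim in (1, 10, 100, 1000):
    r = gaussian_radii(dim, n_samples=2_000)
    print(f"dim={dim:5d}  mean radius={r.mean():7.2f}  "
          f"sd={r.std():.2f}  sqrt(dim)={np.sqrt(dim):7.2f}")
```

The spread of the radius stays roughly constant while its mean grows, so the shell becomes relatively thinner as the dimension increases. That shell is where a sampler has to spend its time.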

The Hamiltonian is \[H(q,p)=-\log(\pi(p|q)\pi(q))\] so the gradient of my log probability density influences the particle momentum. We can use symplectic integrators to walk through trajectories (if I knew more numerical quadrature I might know more about the benefits of this) in between random momentum perturbations. Some more stuff about building trajectories adaptively, stopping when they begin to double back, and sampling a point from along the trajectory in a way that corrects for numerical error; this is the NUTS extension to HMC (Hoffman and Gelman 2011).
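A rough sketch of plain HMC (not NUTS) may help fix ideas. It assumes the momentum is standard normal and independent of position, so that \(H(q,p)=U(q)+\tfrac12 p^\top p\) with \(U(q)=-\log\pi(q)\), and it uses the leapfrog integrator as the symplectic scheme. The function names, step size and trajectory length are illustrative choices of mine, not taken from any particular library.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps):
    """Leapfrog integration of H(q, p) = U(q) + p.p / 2."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)          # initial half step in momentum
    for i in range(n_steps):
        q += step_size * p                    # full step in position
        if i < n_steps - 1:
            p -= step_size * grad_U(q)        # full step in momentum
    p -= 0.5 * step_size * grad_U(q)          # final half step in momentum
    return q, p

def hmc(U, grad_U, q0, n_samples=1000, step_size=0.1, n_steps=20, seed=0):
    """Basic HMC with fixed trajectory length and a Metropolis correction."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    n_accept = 0
    for t in range(n_samples):
        p = rng.standard_normal(q.size)       # random momentum perturbation
        h_old = U(q) + 0.5 * p @ p
        q_new, p_new = leapfrog(q, p, grad_U, step_size, n_steps)
        h_new = U(q_new) + 0.5 * p_new @ p_new
        # Accept with probability min(1, exp(h_old - h_new)); this removes
        # the bias introduced by the integrator's energy error.
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
            n_accept += 1
        samples[t] = q
    return samples, n_accept / n_samples

# Example target (an assumption for illustration): a 2-d standard normal.
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q
samples, accept_rate = hmc(U, grad_U, q0=np.zeros(2), n_samples=2000)
```

The exact flow conserves \(H\), so with a good integrator the energy error is small and nearly every proposal is accepted, even though each proposal may travel far from the current point.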

Discontinuous likelihood

The solution is MOAR PHYSICS; we can construct Hamiltonians whose trajectories reflect or refract at discontinuities of the density in the augmented state space; see Afshar and Domke (2015); Nishimura, Dunson, and Lu (2020).
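As I understand the reflection/refraction idea, when a trajectory hits a boundary where the potential energy jumps by some amount, the momentum component normal to the boundary either pays for the jump and carries on (refraction) or bounces back (reflection), so total energy is conserved either way. Below is a one-boundary sketch under the assumption of unit mass. The function `cross_boundary` and its arguments are hypothetical names of mine, and this is only loosely modelled on Afshar and Domke (2015), not a reproduction of their algorithm.

```python
import numpy as np

def cross_boundary(p_perp, delta_U):
    """Normal momentum after meeting a potential jump of size delta_U.

    p_perp  : momentum component normal to the boundary (unit mass assumed)
    delta_U : potential on the far side minus potential on the near side
    """
    kinetic = 0.5 * p_perp**2
    if kinetic > delta_U:
        # Refract: cross the boundary, trading kinetic for potential energy
        # so that 0.5 * p_new**2 + delta_U == 0.5 * p_perp**2.
        return np.sign(p_perp) * np.sqrt(p_perp**2 - 2.0 * delta_U)
    # Reflect: not enough energy to climb the jump, so bounce back.
    return -p_perp
```

A downhill jump (negative `delta_U`) always refracts and speeds the particle up; an uphill jump refracts only when the normal kinetic energy exceeds it.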

Incoming

Manifold Monte Carlo.

George Ho, Understanding NUTS and HMC:

“In terms of reading code, I'd recommend looking through Colin Carroll's minimc for a minimal working example of NUTS in Python, written for pedagogy rather than actual sampling. For a "real world" implementation of NUTS/HMC, I'd recommend looking through my littlemcmc for a standalone version of PyMC3's NUTS/HMC samplers.”

References

Afshar, Hadi Mohasel, and Justin Domke. 2015. “Reflection, Refraction, and Hamiltonian Monte Carlo,” 9.
Bales, Ben, Arya Pourzanjani, Aki Vehtari, and Linda Petzold. 2019. “Selecting the Metric in Hamiltonian Monte Carlo.” arXiv:1905.11916 [Stat], May.
Betancourt, Michael. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [Stat], January.
Betancourt, Michael. 2018. “The Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” Annalen Der Physik, March.
Betancourt, Michael, Simon Byrne, Sam Livingstone, and Mark Girolami. 2017. “The Geometric Foundations of Hamiltonian Monte Carlo.” Bernoulli 23 (4A): 2257–98.
Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164.
Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018. “Hamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems.
Dai, Hanjun, Rishabh Singh, Bo Dai, Charles Sutton, and Dale Schuurmans. 2020. “Learning Discrete Energy-Based Models via Auxiliary-Variable Local Exploration,” 13.
Devlin, Lee, Paul Horridge, Peter L Green, and Simon Maskell. 2021. “The No-U-Turn Sampler as a Proposal Distribution in a Sequential Monte Carlo Sampler with a Near-Optimal L-Kernel,” 5.
Durmus, Alain, and Eric Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [Math, Stat], May.
Girolami, Mark, and Ben Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2): 123–214.
Goodrich, Ben, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Bob Carpenter, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).
Hoffman, M D, and A Gelman. 2011. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” arXiv Preprint arXiv:1111.4246.
Liu, Meng, Haoran Liu, and Shuiwang Ji. 2021. “Gradient-Guided Importance Sampling for Learning Discrete Energy-Based Models,” November.
Ma, Yi-An, Tianqi Chen, and Emily B. Fox. 2015. “A Complete Recipe for Stochastic Gradient MCMC.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2917–25. NIPS’15. Cambridge, MA, USA: MIT Press.
Mangoubi, Oren, and Aaron Smith. 2017. “Rapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions.” arXiv:1708.07114 [Math, Stat], August.
Margossian, Charles C., Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020. “Hamiltonian Monte Carlo Using an Adjoint-Differentiated Laplace Approximation: Bayesian Inference for Latent Gaussian Models and Beyond.” arXiv:2004.12550 [Stat], October.
Meent, Jan-Willem van de, Brooks Paige, Hongseok Yang, and Frank Wood. 2021. “An Introduction to Probabilistic Programming.” arXiv:1809.10756 [Cs, Stat], October.
Mototake, Yoh-ichi. 2019. “Conservation Law Estimation by Extracting the Symmetry of a Dynamical System Using a Deep Neural Network.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS), 8.
Neal, Radford M. 2011. “MCMC Using Hamiltonian Dynamics.” In Handbook for Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng. Boca Raton: Taylor & Francis.
Nishimura, Akihiko, David B Dunson, and Jianfeng Lu. 2020. “Discontinuous Hamiltonian Monte Carlo for Discrete Parameters and Discontinuous Likelihoods.” Biometrika 107 (2): 365–80.
Norton, Richard A., and Colin Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat], October.
Robert, Christian P., Víctor Elvira, Nick Tawn, and Changye Wu. 2018. “Accelerating MCMC Algorithms.” WIREs Computational Statistics 10 (5): e1435.
Sansone, Emanuele. 2022. “LSB: Local Self-Balancing MCMC in Discrete Spaces.” In Proceedings of the 39th International Conference on Machine Learning, 19205–20. PMLR.
Strathmann, Heiko, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, and Arthur Gretton. 2015. “Gradient-Free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, 955–63. NIPS’15. Montreal, Canada: MIT Press.
Xifara, T., C. Sherlock, S. Livingstone, S. Byrne, and M. Girolami. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters 91 (Supplement C): 14–19.
Xu, Kai, Hong Ge, Will Tebbutt, Mohamed Tarek, Martin Trapp, and Zoubin Ghahramani. 2019. “AdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms,” October.
