Michael Betancourt’s heuristic explanation of Hamiltonian Monte Carlo: sets of high density are no good on their own; we need the “typical set,” a set where the product of differential volume and density is high. He motivates Markov Chain Monte Carlo on this basis, as a way of exploring the typical set given points already in it, or of getting closer to the typical set if we start outside it. How to get a central limit theorem? “Geometric” ergodicity results. Hamiltonian Monte Carlo is a procedure for generating measure-preserving flows over phase space.
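To make the typical set less abstract, here is a minimal sketch, assuming a standard Gaussian target in `dim` dimensions (my choice of example, and `radius_quantiles` is a made-up helper name). The density is highest at the origin, yet the samples sit in a thin shell at radius about $\sqrt{d}$, because that is where density times volume is largest.

```python
import numpy as np

def radius_quantiles(dim, n_samples=2000, seed=0):
    """5%, 50% and 95% quantiles of ||x|| for x ~ N(0, I_dim)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))
    r = np.linalg.norm(x, axis=1)  # distance of each draw from the mode
    return np.quantile(r, [0.05, 0.5, 0.95])

for dim in (1, 10, 100, 1000):
    lo, med, hi = radius_quantiles(dim)
    print(f"d={dim:5d}  5%={lo:6.2f}  median={med:6.2f}  "
          f"95%={hi:6.2f}  sqrt(d)={np.sqrt(dim):6.2f}")
```

As `dim` grows, the 5% quantile moves well away from zero, so essentially no draws land near the mode even though the density peaks there.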

The Hamiltonian is $H(q, p) = -\log\left(\pi(p \mid q)\,\pi(q)\right)$, so the gradient of my log probability density influences the particle momentum. We can use symplectic integrators to walk along trajectories (if I knew more about numerical integration I might know more about the benefits of this) in between random momentum perturbations. There is some more stuff about sampling a state from the trajectory so that numerical error does not bias the result, and about choosing the trajectory length adaptively, which is the NUTS extension to HMC.
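A minimal sketch of plain HMC (not NUTS) may help fix ideas. It assumes the momentum is standard Gaussian and independent of $q$, so that $-\log \pi(p \mid q) = \tfrac{1}{2} p^\top p$ up to a constant, and it uses the leapfrog scheme as the symplectic integrator. The names `leapfrog`, `hmc_sample`, `U` and `grad_U` are mine, with `U` standing for $-\log \pi(q)$; the step size and trajectory length are arbitrary untuned values.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps):
    """Leapfrog integration of dq/dt = p, dp/dt = -grad U(q)."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)      # initial half step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                # full step for position
        p -= step_size * grad_U(q)        # full step for momentum
    q += step_size * p
    p -= 0.5 * step_size * grad_U(q)      # final half step for momentum
    return q, p

def hmc_sample(U, grad_U, q0, n_samples=2000, step_size=0.1, n_steps=20, seed=0):
    """Basic HMC with a Metropolis correction for integration error."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    n_accept = 0
    for i in range(n_samples):
        p = rng.standard_normal(q.size)   # random momentum perturbation
        h_old = U(q) + 0.5 * p @ p
        q_new, p_new = leapfrog(q, p, grad_U, step_size, n_steps)
        h_new = U(q_new) + 0.5 * p_new @ p_new
        # Accept with probability min(1, exp(h_old - h_new)).
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
            n_accept += 1
        samples[i] = q
    return samples, n_accept / n_samples

# Toy target (an assumption for illustration): standard normal in 2 dimensions.
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q
samples, accept_rate = hmc_sample(U, grad_U, q0=np.zeros(2))
print(samples.mean(axis=0), samples.var(axis=0), accept_rate)
```

The accept/reject step is what removes the bias from discretisation; an exact integrator would conserve $H$ and accept every proposal.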

2 Discontinuous likelihood

The solution is MOAR PHYSICS; we can construct Hamiltonians which sample based on reflection/refraction dynamics in the augmented state space; see Afshar and Domke (2015); Nishimura, Dunson, and Lu (2020).
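As a rough sketch of the reflection/refraction idea, here is my reading of the basic energy-conservation rule at a surface where the potential jumps by `delta_U`, assuming Gaussian momentum with unit mass. The function name `cross_discontinuity` and its arguments are hypothetical, and this is not the full algorithm from either paper. If the kinetic energy normal to the surface exceeds the jump, the particle crosses and loses that much energy (refraction); otherwise the normal momentum flips (reflection).

```python
import numpy as np

def cross_discontinuity(p, normal, delta_U):
    """Momentum update on hitting a surface where the potential jumps.

    p        momentum on arrival at the surface
    normal   vector normal to the surface (any length, normalised here)
    delta_U  potential on the far side minus potential on the near side
    Returns (new momentum, True if the particle crossed the surface).
    """
    normal = normal / np.linalg.norm(normal)
    p_perp_mag = p @ normal               # signed normal component
    p_perp = p_perp_mag * normal
    p_par = p - p_perp                    # tangential part is unchanged
    if 0.5 * p_perp_mag**2 > delta_U:
        # Refraction: pay delta_U out of the normal kinetic energy.
        new_mag = np.sqrt(p_perp_mag**2 - 2.0 * delta_U)
        return p_par + np.sign(p_perp_mag) * new_mag * normal, True
    # Reflection: not enough energy to climb the step, so bounce back.
    return p_par - p_perp, False

p = np.array([3.0, 1.0])
normal = np.array([1.0, 0.0])
print(cross_discontinuity(p, normal, delta_U=4.0))  # enough energy to cross
print(cross_discontinuity(p, normal, delta_U=5.0))  # bounces back
```

In both branches the total energy (kinetic plus potential) is conserved, which is what lets the resulting dynamics keep the target distribution invariant despite the discontinuity.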

3 Incoming

Manifold Monte Carlo.

George Ho, Understanding NUTS and HMC:

In terms of reading code, I’d recommend looking through Colin Carroll’s minimc for a minimal working example of NUTS in Python, written for pedagogy rather than actual sampling. For a “real world” implementation of NUTS/HMC, I’d recommend looking through my littlemcmc for a standalone version of PyMC3’s NUTS/HMC samplers.

4 References

Afshar, and Domke. 2015. “Reflection, Refraction, and Hamiltonian Monte Carlo.”

Bales, Pourzanjani, Vehtari, et al. 2019. “Selecting the Metric in Hamiltonian Monte Carlo.” arXiv:1905.11916 [Stat].

Betancourt. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [Stat].

———. 2018. “The Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” Annalen Der Physik.

Betancourt, Byrne, Livingstone, et al. 2017. “The Geometric Foundations of Hamiltonian Monte Carlo.” Bernoulli.

Carpenter, Hoffman, Brubaker, et al. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164.

Caterini, Doucet, and Sejdinovic. 2018. “Hamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems.

Dai, Singh, Dai, et al. 2020. “Learning Discrete Energy-Based Models via Auxiliary-Variable Local Exploration.”

Devlin, Horridge, Green, et al. 2021. “The No-U-Turn Sampler as a Proposal Distribution in a Sequential Monte Carlo Sampler with a Near-Optimal L-Kernel.”

Durmus, and Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [Math, Stat].

Girolami, and Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Goodrich, Gelman, Hoffman, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software.

Hoffman, and Gelman. 2011. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” arXiv Preprint arXiv:1111.4246.

Liu, Liu, and Ji. 2021. “Gradient-Guided Importance Sampling for Learning Discrete Energy-Based Models.”

Ma, Chen, and Fox. 2015. “A Complete Recipe for Stochastic Gradient MCMC.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. NIPS’15.

Mangoubi, and Smith. 2017. “Rapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions.” arXiv:1708.07114 [Math, Stat].

Margossian, Vehtari, Simpson, et al. 2020. “Hamiltonian Monte Carlo Using an Adjoint-Differentiated Laplace Approximation: Bayesian Inference for Latent Gaussian Models and Beyond.” arXiv:2004.12550 [Stat].

Mototake. 2019. “Conservation Law Estimation by Extracting the Symmetry of a Dynamical System Using a Deep Neural Network.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).

Neal. 2011. “MCMC Using Hamiltonian Dynamics.” In Handbook for Markov Chain Monte Carlo.

Nishimura, Dunson, and Lu. 2020. “Discontinuous Hamiltonian Monte Carlo for Discrete Parameters and Discontinuous Likelihoods.” Biometrika.

Norton, and Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat].

Robert, Elvira, Tawn, et al. 2018. “Accelerating MCMC Algorithms.” WIREs Computational Statistics.

Sansone. 2022. “LSB: Local Self-Balancing MCMC in Discrete Spaces.” In Proceedings of the 39th International Conference on Machine Learning.

Strathmann, Sejdinovic, Livingstone, et al. 2015. “Gradient-Free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. NIPS’15.

van de Meent, Paige, Yang, et al. 2021. “An Introduction to Probabilistic Programming.” arXiv:1809.10756 [Cs, Stat].

Xifara, Sherlock, Livingstone, et al. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters.

Xu, Ge, Tebbutt, et al. 2019. “AdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms.”