Markov Chain Monte Carlo methods



This chain pump is not a good metaphor for how a Markov chain Monte Carlo sampler works, but it does correctly evoke the effort involved.

Despite working in this area, I have nothing to say about MCMC broadly, but there are a few things I wish to keep notes on.

Connection to variational inference

Deep. See Salimans, Kingma, and Welling (2015).

Adaptive MCMC

See Adaptive MCMC.

Langevin Monte Carlo

See log concave distributions for now.
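Pending proper notes, here is a minimal sketch of the unadjusted Langevin algorithm (ULA), one of the basic methods in this family (cf. Durmus and Moulines 2016): discretise the Langevin diffusion whose stationary law is the target, driving each step with the gradient of the log density. The step size and the absence of a Metropolis correction both introduce bias; the target and tuning choices below are purely illustrative.

```python
import numpy as np

def ula(grad_log_target, x0, step=1e-2, n_steps=10_000, seed=0):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretisation of the Langevin diffusion."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    out = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        # drift towards high density plus Gaussian noise; no accept/reject step
        x = x + 0.5 * step * grad_log_target(x) + np.sqrt(step) * rng.normal(size=x.shape)
        out[t] = x
    return out

# e.g. a standard bivariate Gaussian target, for which grad log pi(x) = -x
samples = ula(lambda x: -x, x0=np.zeros(2))
```

Adding a Metropolis accept/reject step on top of this proposal gives MALA, which removes the discretisation bias at the cost of occasional rejections.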

Tempering

See e.g. Ge, Lee, and Risteski (2020); Syed et al. (2020). Saifuddin Syed can explain this quite well. Or, as Lee and Risteski explain:

The main idea is to create a meta-Markov chain (the simulated tempering chain) which has two types of moves: change the current “temperature” of the sample, or move “within” a temperature. The main intuition behind this is that at higher temperatures, the distribution is flatter, so the chain explores the landscape faster (see the figure below).
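To make that concrete, here is a toy simulated-tempering sketch (mine, not from any of the papers above): a meta-chain on a bimodal 1-d target that alternates between random-walk moves within the current temperature and proposals to hop to a neighbouring rung of a hand-picked inverse-temperature ladder. It assumes uniform pseudo-priors over the rungs, which real implementations must tune.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # toy bimodal target: equal mixture of N(-4, 1) and N(4, 1), unnormalised
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

betas = [0.05, 0.2, 0.5, 1.0]  # inverse-temperature ladder; beta = 1 is the target

def simulated_tempering(n_steps=100_000, step=1.0):
    x, k = 0.0, len(betas) - 1
    samples = []
    for _ in range(n_steps):
        if rng.random() < 0.5:
            # "within a temperature": random-walk Metropolis targeting pi(x)^beta_k
            prop = x + step * rng.normal()
            if np.log(rng.random()) < betas[k] * (log_target(prop) - log_target(x)):
                x = prop
        else:
            # "change the temperature": propose a neighbouring rung of the ladder
            k_prop = k + (1 if rng.random() < 0.5 else -1)
            if 0 <= k_prop < len(betas):
                # uniform pseudo-priors over rungs assumed; in practice these weights need tuning
                if np.log(rng.random()) < (betas[k_prop] - betas[k]) * log_target(x):
                    k = k_prop
        if k == len(betas) - 1:
            samples.append(x)  # only samples drawn at beta = 1 estimate the target
    return np.array(samples)

samples = simulated_tempering()
```

At the flat end of the ladder the chain hops between modes easily; the temperature moves then carry that mobility back down to the target distribution.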

Mixing rates

See ergodic theorems and mixing.

Debiasing via coupling

Pierre E. Jacob, John O’Leary, and Yves Atchadé, building on Glynn and Rhee (2014), construct MCMC estimators with no finite-time bias by coupling pairs of chains, which is nice for parallelisation (Jacob, O’Leary, and Atchadé 2017).
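A rough sketch of the idea, not their implementation: run two Metropolis chains, one lagged a step behind the other, through a coupled kernel (maximally coupled proposals, a shared acceptance uniform) so that they meet exactly at some random time τ, then add telescoping correction terms to the usual estimate. The toy target, the tuning constants, and the single-term estimator H_k below are illustrative simplifications; Jacob et al. average over a range of start times.

```python
import numpy as np

def log_target(x):
    return -0.5 * x ** 2  # toy target: standard normal, unnormalised

def mh_step(x, sigma, rng):
    """One plain random-walk Metropolis step (gives X its one-step head start)."""
    prop = x + sigma * rng.normal()
    return prop if np.log(rng.random()) < log_target(prop) - log_target(x) else x

def maximal_coupling(mu_x, mu_y, sigma, rng):
    """Draw proposals from N(mu_x, sigma^2) and N(mu_y, sigma^2) that coincide as often as possible."""
    logq = lambda z, m: -0.5 * ((z - m) / sigma) ** 2
    px = mu_x + sigma * rng.normal()
    if np.log(rng.random()) + logq(px, mu_x) <= logq(px, mu_y):
        return px, px  # proposals meet
    while True:
        py = mu_y + sigma * rng.normal()
        if np.log(rng.random()) + logq(py, mu_y) > logq(py, mu_x):
            return px, py

def unbiased_estimate(h=lambda z: z, k=100, sigma=1.0, seed=0, max_iter=100_000):
    rng = np.random.default_rng(seed)
    xs = [rng.normal()]
    xs.append(mh_step(xs[0], sigma, rng))  # X_1: X runs one step ahead of Y
    ys = [rng.normal()]                    # Y_0
    tau, t = None, 1
    while (tau is None or t < k) and t < max_iter:
        # coupled kernel: maximally coupled proposals, one shared uniform for both accept tests
        px, py = maximal_coupling(xs[t], ys[t - 1], sigma, rng)
        log_u = np.log(rng.random())
        xs.append(px if log_u < log_target(px) - log_target(xs[t]) else xs[t])
        ys.append(py if log_u < log_target(py) - log_target(ys[t - 1]) else ys[t - 1])
        t += 1
        if tau is None and xs[t] == ys[t - 1]:
            tau = t  # meeting time: the chains agree from here on
    if tau is None:
        raise RuntimeError("chains did not meet within max_iter")
    # single-term unbiased estimator: H_k = h(X_k) + sum_{t=k+1}^{tau-1} (h(X_t) - h(Y_{t-1}))
    return h(xs[k]) + sum(h(xs[t]) - h(ys[t - 1]) for t in range(k + 1, tau))

print(unbiased_estimate())  # averaging many independent replicates estimates E[h(X)] with no burn-in bias
```

Because each replicate is unbiased and finishes at a random but finite time, one can farm replicates out across machines and simply average them.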

References

Andrieu, Christophe, and Johannes Thoms. 2008. “A Tutorial on Adaptive MCMC.” Statistics and Computing 18 (4): 343–73. https://doi.org/10.1007/s11222-008-9110-y.
Atchadé, Yves, Gersende Fort, Eric Moulines, and Pierre Priouret. 2011. “Adaptive Markov Chain Monte Carlo: Theory and Methods.” In Bayesian Time Series Models, edited by David Barber, A. Taylan Cemgil, and Silvia Chiappa, 32–51. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511984679.003.
Bach, Francis. 2015. “On the Equivalence Between Kernel Quadrature Rules and Random Feature Expansions.” arXiv Preprint arXiv:1502.06800. http://arxiv.org/abs/1502.06800.
Bales, Ben, Arya Pourzanjani, Aki Vehtari, and Linda Petzold. 2019. “Selecting the Metric in Hamiltonian Monte Carlo.” arXiv:1905.11916 [stat], May. http://arxiv.org/abs/1905.11916.
Betancourt, Michael. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [stat], January. http://arxiv.org/abs/1701.02434.
———. 2018. “The Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” Annalen Der Physik, March. https://doi.org/10.1002/andp.201700214.
———. 2021. “A Short Review of Ergodicity and Convergence of Markov Chain Monte Carlo Estimators.” arXiv:2110.07032 [math, Stat], October. http://arxiv.org/abs/2110.07032.
Betancourt, Michael, Simon Byrne, Sam Livingstone, and Mark Girolami. 2017. “The geometric foundations of Hamiltonian Monte Carlo.” Bernoulli 23 (4A): 2257–98. https://doi.org/10.3150/16-BEJ810.
Bousquet, Olivier, Ulrike von Luxburg, and Gunnar Rätsch. 2004. Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, Tübingen, Germany, August 4-16, 2003, Revised Lectures. Springer.
Brosse, Nicolas, Alain Durmus, and Eric Moulines. 2018. “The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics.” In Advances in Neural Information Processing Systems 31.
Calderhead, Ben. 2014. “A General Construction for Parallelizing Metropolis–Hastings Algorithms.” Proceedings of the National Academy of Sciences 111 (49): 17408–13. https://doi.org/10.1073/pnas.1408184111.
Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164. http://arxiv.org/abs/1509.07164.
Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018. “Hamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems. http://arxiv.org/abs/1805.11328.
Chakraborty, Saptarshi, Suman K. Bhattacharya, and Kshitij Khare. 2019. “Estimating Accuracy of the MCMC Variance Estimator: A Central Limit Theorem for Batch Means Estimators.” arXiv:1911.00915 [math, Stat], November. http://arxiv.org/abs/1911.00915.
Cornish, Robert, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. 2019. “Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets.” arXiv:1901.09881 [cs, Stat], January. http://arxiv.org/abs/1901.09881.
Cotter, S. L., G. O. Roberts, A. M. Stuart, and D. White. 2013. “MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster.” Statistical Science 28 (3): 424–46. https://doi.org/10.1214/13-STS421.
Diaconis, Persi, and David Freedman. 1999. “Iterated Random Functions.” SIAM Review 41 (1): 45–76. https://doi.org/10.1137/S0036144598338446.
Durmus, Alain, and Eric Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [math, Stat], May. http://arxiv.org/abs/1605.01559.
Ge, Rong, Holden Lee, and Andrej Risteski. 2020. “Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition.” arXiv:1812.00793 [cs, Math, Stat], September. http://arxiv.org/abs/1812.00793.
Girolami, Mark, and Ben Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2): 123–214. https://doi.org/10.1111/j.1467-9868.2010.00765.x.
Glynn, Peter W., and Chang-Han Rhee. 2014. “Exact Estimation for Markov Chain Equilibrium Expectations.” Journal of Applied Probability 51 (A): 377–89. https://doi.org/10.1239/jap/1417528487.
Goodman, Noah, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Daniel Tarlow. 2012. “Church: A Language for Generative Models.” arXiv:1206.3255, June. http://arxiv.org/abs/1206.3255.
Goodrich, Ben, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Bob Carpenter, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.
Hodgkinson, Liam, Robert Salomone, and Fred Roosta. 2019. “Implicit Langevin Algorithms for Sampling From Log-Concave Densities.” arXiv:1903.12322 [cs, Stat], March. http://arxiv.org/abs/1903.12322.
Jacob, Pierre E., John O’Leary, and Yves F. Atchadé. 2017. “Unbiased Markov Chain Monte Carlo with Couplings.” arXiv:1708.03625 [stat], August. http://arxiv.org/abs/1708.03625.
Korattikara, Anoop, Yutian Chen, and Max Welling. 2015. “Sequential Tests for Large-Scale Learning.” Neural Computation 28 (1): 45–70. https://doi.org/10.1162/NECO_a_00796.
Lele, S. R., B. Dennis, and F. Lutscher. 2007. “Data Cloning: Easy Maximum Likelihood Estimation for Complex Ecological Models Using Bayesian Markov Chain Monte Carlo Methods.” Ecology Letters 10 (7): 551. https://doi.org/10.1111/j.1461-0248.2007.01047.x.
Lele, Subhash R., Khurram Nadeem, and Byron Schmuland. 2010. “Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning.” Journal of the American Statistical Association 105 (492): 1617–25. https://doi.org/10.1198/jasa.2010.tm09757.
Liu, Jun S. 1996. “Metropolized Independent Sampling with Comparisons to Rejection Sampling and Importance Sampling.” Statistics and Computing 6 (2): 113–19. https://doi.org/10.1007/BF00162521.
Ma, Yi-An, Tianqi Chen, and Emily B. Fox. 2015. “A Complete Recipe for Stochastic Gradient MCMC.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2917–25. NIPS’15. Cambridge, MA, USA: MIT Press. http://arxiv.org/abs/1506.04696.
Mangoubi, Oren, and Aaron Smith. 2017. “Rapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions.” arXiv:1708.07114 [math, Stat], August. http://arxiv.org/abs/1708.07114.
Neal, Radford M. 1993. “Probabilistic Inference Using Markov Chain Monte Carlo Methods.” Technical Report CRG-TR-93-1. Toronto, Canada: Department of Computer Science, University of Toronto. https://www.cs.princeton.edu/courses/archive/fall07/cos597C/readings/Neal1993.pdf.
———. 2004. “Improving Asymptotic Variance of MCMC Estimators: Non-Reversible Chains Are Better.” arXiv:math/0407281, July. http://arxiv.org/abs/math/0407281.
———. 2011. “MCMC Using Hamiltonian Dynamics.” In Handbook for Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng. Boca Raton: Taylor & Francis. http://arxiv.org/abs/1206.1901.
Nitanda, Atsushi, Denny Wu, and Taiji Suzuki. 2020. “Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis.” arXiv:2012.15477 [cs, Stat], December. http://arxiv.org/abs/2012.15477.
Norton, Richard A., and Colin Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [math, Stat], October. http://arxiv.org/abs/1610.00781.
Propp, James Gary, and David Bruce Wilson. 1996. “Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics.” Random Structures & Algorithms 9 (1–2): 223–52. https://doi.org/10.1002/(SICI)1098-2418(199608/09)9:1/2<223::AID-RSA14>3.0.CO;2-O.
———. 1998. “Coupling from the Past: A User’s Guide.” In Microsurveys in Discrete Probability, edited by David Aldous and James Gary Propp, 41:181–92. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Providence, Rhode Island: American Mathematical Society. https://doi.org/10.1090/dimacs/041.
Robert, Christian P., Víctor Elvira, Nick Tawn, and Changye Wu. 2018. “Accelerating MCMC Algorithms.” WIREs Computational Statistics 10 (5): e1435. https://doi.org/10.1002/wics.1435.
Roberts, G.O., and A.F.M. Smith. 1994. “Simple Conditions for the Convergence of the Gibbs Sampler and Metropolis-Hastings Algorithms.” Stochastic Processes and Their Applications 49 (2): 207–16. https://doi.org/10.1016/0304-4149(94)90134-1.
Roberts, Gareth O., and Jeffrey S. Rosenthal. 2004. “General State Space Markov Chains and MCMC Algorithms.” Probability Surveys 1 (0): 20–71. https://doi.org/10.1214/154957804100000024.
Rubinstein, Reuven Y., and Dirk P. Kroese. 2016. Simulation and the Monte Carlo Method. 3rd edition. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley.
Rubinstein, Reuven Y., Ad Ridder, and Radislav Vaisman. 2014. Fast Sequential Monte Carlo Methods for Counting and Optimization. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley.
Rubinstein, Reuven Y., and Dirk P. Kroese. 2004. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. New York, NY: Springer New York. http://dx.doi.org/10.1007/978-1-4757-4321-0.
Salimans, Tim, Diederik Kingma, and Max Welling. 2015. “Markov Chain Monte Carlo and Variational Inference: Bridging the Gap.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1218–26. ICML’15. Lille, France: JMLR.org. http://proceedings.mlr.press/v37/salimans15.html.
Schuster, Ingmar, Heiko Strathmann, Brooks Paige, and Dino Sejdinovic. 2017. “Kernel Sequential Monte Carlo.” In ECML-PKDD 2017. http://arxiv.org/abs/1510.03105.
Sisson, S. A., Y. Fan, and Mark M. Tanaka. 2007. “Sequential Monte Carlo Without Likelihoods.” Proceedings of the National Academy of Sciences 104 (6): 1760–65. https://doi.org/10.1073/pnas.0607208104.
Syed, Saifuddin, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. 2020. “Non-Reversible Parallel Tempering: A Scalable Highly Parallel MCMC Scheme.” arXiv:1905.02939 [stat], November. http://arxiv.org/abs/1905.02939.
Welling, Max, and Yee Whye Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11).
Xifara, T., C. Sherlock, S. Livingstone, S. Byrne, and M. Girolami. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters 91 (Supplement C): 14–19. https://doi.org/10.1016/j.spl.2014.04.002.
Xu, Kai, Hong Ge, Will Tebbutt, Mohamed Tarek, Martin Trapp, and Zoubin Ghahramani. 2019. “AdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms,” October. https://openreview.net/forum?id=rJgzckn4tH.
Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98. http://www.jmlr.org/papers/v11/yoshida10a.html.