Markov Chain Monte Carlo methods



This chain pump is not a good metaphor for how a Markov chain Monte Carlo sampler works, but it does correctly evoke the effort involved.

Despite studying within this area, I have nothing to say about MCMC broadly, but I do have some things I wish to keep notes on.

Connection to variational inference

Deep. See (Salimans, Kingma, and Welling 2015)

Adaptive MCMC

See Adaptive MCMC.

Stochastic Gradient Monte carlo

See SGD MCMC.

Tempering

e.g. Ge, Lee, and Risteski (2020); Syed et al. (2020). Saif Syed can explain this quite well. Or, as Lee and Risteski put it:

The main idea is to create a meta-Markov chain (the simulated tempering chain) which has two types of moves: change the current β€œtemperature” of the sample, or move β€œwithin” a temperature. The main intuition behind this is that at higher temperatures, the distribution is flatter, so the chain explores the landscape faster.

Debiasing via coupling

Pierre E. Jacob, John O’Leary, Yves AtchadΓ©, crediting Glynn and Rhee (2014), made MCMC estimators without finite-time-bias, which is nice for parallelisation (Jacob, O’Leary, and AtchadΓ© 2017).

Affine invariant

J. Goodman and Weare (2010)

We propose a family of Markov chain Monte Carlo methods whose performance is unaffected by affine tranformations of space. These algorithms are easy to construct and require little or no additional computational overhead. They should be particularly useful for sampling badly scaled distributions. Computational tests show that the affine invariant methods can be significantly faster than standard MCMC methods on highly skewed distributions.

Implemented in, e.g. emcee (Foreman-Mackey et al. 2013).

Efficiency of

Want to adaptively tune the MCMC? See tuning MCMC.

References

Andrieu, Christophe, and Johannes Thoms. 2008. β€œA Tutorial on Adaptive MCMC.” Statistics and Computing 18 (4): 343–73.
AtchadΓ©, Yves, Gersende Fort, Eric Moulines, and Pierre Priouret. 2011. β€œAdaptive Markov Chain Monte Carlo: Theory and Methods.” In Bayesian Time Series Models, edited by David Barber, A. Taylan Cemgil, and Silvia Chiappa, 32–51. Cambridge: Cambridge University Press.
Bach, Francis. 2015. β€œOn the Equivalence Between Kernel Quadrature Rules and Random Feature Expansions.” arXiv Preprint arXiv:1502.06800.
Bales, Ben, Arya Pourzanjani, Aki Vehtari, and Linda Petzold. 2019. β€œSelecting the Metric in Hamiltonian Monte Carlo.” arXiv:1905.11916 [Stat], May.
Betancourt, Michael. 2017. β€œA Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [Stat], January.
β€”β€”β€”. 2018. β€œThe Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” Annalen Der Physik, March.
β€”β€”β€”. 2021. β€œA Short Review of Ergodicity and Convergence of Markov Chain Monte Carlo Estimators.” arXiv:2110.07032 [Math, Stat], October.
Betancourt, Michael, Simon Byrne, Sam Livingstone, and Mark Girolami. 2017. β€œThe Geometric Foundations of Hamiltonian Monte Carlo.” Bernoulli 23 (4A): 2257–98.
Bousquet, Olivier, Ulrike von Luxburg, and Gunnar Rtsch. 2004. Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, T Bingen, Germany, August 4-16, 2003, Revised Lectures. Springer.
Brosse, Nicolas, Γ‰ric Moulines, and Alain Durmus. 2018. β€œThe Promises and Pitfalls of Stochastic Gradient Langevin Dynamics.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 8278–88. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Calderhead, Ben. 2014. β€œA General Construction for Parallelizing Metropolisβˆ’Hastings Algorithms.” Proceedings of the National Academy of Sciences 111 (49): 17408–13.
Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. β€œThe Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164.
Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018. β€œHamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems.
Chakraborty, Saptarshi, Suman K. Bhattacharya, and Kshitij Khare. 2019. β€œEstimating Accuracy of the MCMC Variance Estimator: A Central Limit Theorem for Batch Means Estimators.” arXiv:1911.00915 [Math, Stat], November.
Cornish, Robert, Paul Vanetti, Alexandre Bouchard-CΓ΄tΓ©, George Deligiannidis, and Arnaud Doucet. 2019. β€œScalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets.” arXiv:1901.09881 [Cs, Stat], January.
Cotter, S. L., G. O. Roberts, A. M. Stuart, and D. White. 2013. β€œMCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster.” Statistical Science 28 (3): 424–46.
Dhaka, Akash Kumar, and Alejandro Catalina. 2020. β€œRobust, Accurate Stochastic Optimization for Variational Inference,” 13.
Diaconis, Persi, and David Freedman. 1999. β€œIterated Random Functions.” SIAM Review 1 (1): 45–76.
Durmus, Alain, and Eric Moulines. 2016. β€œHigh-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [Math, Stat], May.
Fan, Y., and S. A. Sisson. 2010. β€œReversible Jump Markov Chain Monte Carlo.” arXiv.
Foreman-Mackey, Daniel, David W. Hogg, Dustin Lang, and Jonathan Goodman. 2013. β€œEmcee: The MCMC Hammer.” Publications of the Astronomical Society of the Pacific 125 (925): 306.
Ge, Rong, Holden Lee, and Andrej Risteski. 2020. β€œSimulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition.” arXiv:1812.00793 [Cs, Math, Stat], September.
Girolami, Mark, and Ben Calderhead. 2011. β€œRiemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2): 123–214.
Glynn, Peter W., and Chang-Han Rhee. 2014. β€œExact Estimation for Markov Chain Equilibrium Expectations.” Journal of Applied Probability 51 (A): 377–89.
Goodman, Jonathan, and Jonathan Weare. 2010. β€œEnsemble Samplers with Affine Invariance.” Communications in Applied Mathematics and Computational Science 5 (1): 65–80.
Goodman, Noah, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Daniel Tarlow. 2012. β€œChurch: A Language for Generative Models.” arXiv:1206.3255, June.
Goodrich, Ben, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Bob Carpenter, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. β€œStan : A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).
Hodgkinson, Liam, Robert Salomone, and Fred Roosta. 2019. β€œImplicit Langevin Algorithms for Sampling From Log-Concave Densities.” arXiv:1903.12322 [Cs, Stat], March.
Huang, Zaijing, and Andrew Gelman. 2005. β€œSampling for Bayesian Computation with Large Datasets.” SSRN Electronic Journal.
Jacob, Pierre E., John O’Leary, and Yves F. AtchadΓ©. 2017. β€œUnbiased Markov Chain Monte Carlo with Couplings.” arXiv:1708.03625 [Stat], August.
β€”β€”β€”. 2019. β€œUnbiased Markov Chain Monte Carlo with Couplings.” arXiv:1708.03625 [Stat], July.
Jolicoeur-Martineau, Alexia, Ke Li, RΓ©mi PichΓ©-Taillefer, Tal Kachman, and Ioannis Mitliagkas. 2021. β€œGotta Go Fast When Generating Data with Score-Based Models.” arXiv.
Korattikara, Anoop, Yutian Chen, and Max Welling. 2015. β€œSequential Tests for Large-Scale Learning.” Neural Computation 28 (1): 45–70.
Lele, S. R., B. Dennis, and F. Lutscher. 2007. β€œData Cloning: Easy Maximum Likelihood Estimation for Complex Ecological Models Using Bayesian Markov Chain Monte Carlo Methods.” Ecology Letters 10 (7): 551.
Lele, Subhash R., Khurram Nadeem, and Byron Schmuland. 2010. β€œEstimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning.” Journal of the American Statistical Association 105 (492): 1617–25.
Liu, Jun S. 1996. β€œMetropolized Independent Sampling with Comparisons to Rejection Sampling and Importance Sampling.” Statistics and Computing 6 (2): 113–19.
Ma, Yi-An, Tianqi Chen, and Emily B. Fox. 2015. β€œA Complete Recipe for Stochastic Gradient MCMC.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2917–25. NIPS’15. Cambridge, MA, USA: MIT Press.
Mangoubi, Oren, and Aaron Smith. 2017. β€œRapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions.” arXiv:1708.07114 [Math, Stat], August.
Margossian, Charles C., Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020. β€œHamiltonian Monte Carlo Using an Adjoint-Differentiated Laplace Approximation: Bayesian Inference for Latent Gaussian Models and Beyond.” arXiv:2004.12550 [Stat], October.
Neal, Radford M. 1993. β€œProbabilistic Inference Using Markov Chain Monte Carlo Methods.” Technical Report CRGTR-93-1. Toronto Canada: Department of Computer Science, University of Toronto,.
β€”β€”β€”. 2004. β€œImproving Asymptotic Variance of MCMC Estimators: Non-Reversible Chains Are Better.” arXiv:math/0407281, July.
β€”β€”β€”. 2011. β€œMCMC Using Hamiltonian Dynamics.” In Handbook for Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng. Boca Raton: Taylor & Francis.
Nitanda, Atsushi, Denny Wu, and Taiji Suzuki. 2020. β€œParticle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis.” arXiv:2012.15477 [Cs, Stat], December.
Norton, Richard A., and Colin Fox. 2016. β€œTuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat], October.
Propp, James Gary, and David Bruce Wilson. 1996. β€œExact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics.” In Random Structures & Algorithms, 9:223–52. New York, NY, USA: John Wiley & Sons, Inc.
β€”β€”β€”. 1998. β€œCoupling from the Past: A User’s Guide.” In Microsurveys in Discrete Probability, edited by David Aldous and James Gary Propp, 41:181–92. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Providence, Rhode Island: American Mathematical Society.
Robert, Christian P., VΓ­ctor Elvira, Nick Tawn, and Changye Wu. 2018. β€œAccelerating MCMC Algorithms.” WIREs Computational Statistics 10 (5): e1435.
Roberts, Gareth O., and Jeffrey S. Rosenthal. 2004. β€œGeneral State Space Markov Chains and MCMC Algorithms.” Probability Surveys 1 (0): 20–71.
Roberts, G.O., and A.F.M. Smith. 1994. β€œSimple Conditions for the Convergence of the Gibbs Sampler and Metropolis-Hastings Algorithms.” Stochastic Processes and Their Applications 49 (2): 207–16.
Rubinstein, Reuven Y., and Dirk P. Kroese. 2016. Simulation and the Monte Carlo Method. 3 edition. Wiley series in probability and statistics. Hoboken, New Jersey: Wiley.
Rubinstein, Reuven Y, and Dirk P Kroese. 2004. The Cross-Entropy Method a Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. New York, NY: Springer New York.
Rubinstein, Reuven Y., Ad Ridder, and Radislav Vaisman. 2014. Fast Sequential Monte Carlo Methods for Counting and Optimization. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley.
Salimans, Tim, Diederik Kingma, and Max Welling. 2015. β€œMarkov Chain Monte Carlo and Variational Inference: Bridging the Gap.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1218–26. ICML’15. Lille, France: JMLR.org.
Schuster, Ingmar, Heiko Strathmann, Brooks Paige, and Dino Sejdinovic. 2017. β€œKernel Sequential Monte Carlo.” In ECML-PKDD 2017.
Sisson, S. A., Y. Fan, and Mark M. Tanaka. 2007. β€œSequential Monte Carlo Without Likelihoods.” Proceedings of the National Academy of Sciences 104 (6): 1760–65.
Syed, Saifuddin, Alexandre Bouchard-CΓ΄tΓ©, George Deligiannidis, and Arnaud Doucet. 2020. β€œNon-Reversible Parallel Tempering: A Scalable Highly Parallel MCMC Scheme.” arXiv:1905.02939 [Stat], November.
Vehtari, Aki, Andrew Gelman, Tuomas Sivula, Pasi JylΓ€nki, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, and Christian Robert. 2019. β€œExpectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data.” arXiv:1412.4869 [Stat], November.
Wang, Xiangyu, and David B. Dunson. 2013. β€œParallelizing MCMC via Weierstrass Sampler,” December.
Welling, Max, and Yee Whye Teh. 2011. β€œBayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on International Conference on Machine Learning, 681–88. ICML’11. Madison, WI, USA: Omnipress.
Xifara, T., C. Sherlock, S. Livingstone, S. Byrne, and M. Girolami. 2014. β€œLangevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters 91 (Supplement C): 14–19.
Xu, Kai, Hong Ge, Will Tebbutt, Mohamed Tarek, Martin Trapp, and Zoubin Ghahramani. 2019. β€œAdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms,” October.
Yoshida, Ryo, and Mike West. 2010. β€œBayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.