Tuning an MCMC sampler



The process of adapting a sampler to its target, optimally

Designing the MCMC transition density, typically via the proposal density in the accept/reject step, by optimising for good mixing.

The simplest way to do this is a β€œpilot” run: estimate a good mixing kernel from a throwaway chain, then use the adapted kernel for the production run, discarding the pilot samples as suspect. This wastes some effort but is theoretically simple. Alternatively you can adapt dynamically, online, which is called Adaptive MCMC. There are then some theoretical wrinkles, since a kernel that keeps changing in response to the chain’s history can destroy the stationary distribution unless the adaptation is suitably controlled (e.g. diminishing adaptation; see Roberts and Rosenthal 2009).
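A minimal sketch of the pilot-run recipe, assuming a toy Gaussian target and a random-walk Metropolis kernel (the function names here are hypothetical): run a throwaway chain with a naΓ―ve isotropic proposal, rescale its empirical covariance by the classic 2.38Β²/d rule of thumb, and use that as the production proposal.

```python
import numpy as np

def rw_metropolis(logpdf, x0, n_steps, prop_cov, rng):
    """Random-walk Metropolis with a fixed Gaussian proposal covariance."""
    x = np.array(x0, dtype=float)
    lp = logpdf(x)
    chol = np.linalg.cholesky(prop_cov)
    chain = np.empty((n_steps, len(x)))
    for t in range(n_steps):
        prop = x + chol @ rng.standard_normal(len(x))
        lp_prop = logpdf(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            x, lp = prop, lp_prop
        chain[t] = x
    return chain

rng = np.random.default_rng(0)
# Toy anisotropic Gaussian target (stand-in for the real posterior).
target_cov = np.diag([1.0, 25.0])
logpdf = lambda x: -0.5 * x @ np.linalg.solve(target_cov, x)

# Pilot run with a naive isotropic proposal; these samples are discarded.
pilot = rw_metropolis(logpdf, np.zeros(2), 5_000, 0.5 * np.eye(2), rng)

# Adapt: rescale the pilot chain's empirical covariance by the classic
# 2.38**2 / d factor for random-walk Metropolis (Roberts and Rosenthal 2009).
d = pilot.shape[1]
tuned_cov = (2.38**2 / d) * np.cov(pilot, rowvar=False)

# Production run with the adapted kernel, started from the pilot's last state.
chain = rw_metropolis(logpdf, pilot[-1], 50_000, tuned_cov, rng)
```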

I do wish to maximise the mixing rate by some criterion. But if I already know my mixing is bad without optimising (which is why I am optimising), how do I get the simulations against which to conduct the optimisation? And how do I trade off taking large steps against keeping the rejection rate low? Fearnhead and Taylor (2013) summarise some options for an objective function. One that seems sufficient for publication of typical MCMC papers is Expected Squared Jump Distance, ESJD (more precisely, an expected squared Mahalanobis distance between successive samples); maximising it roughly minimises the lag-1 autocorrelation, which in practice is most of what we care about.
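A sketch of how ESJD might be estimated from a chain and used as a tuning objective (the esjd helper and the grid of scales are hypothetical; the usage comment reuses the rw_metropolis sketch above):

```python
import numpy as np

def esjd(chain, precision=None):
    """Expected squared jump distance of a chain with shape (n_steps, d).

    With a `precision` matrix (inverse target covariance) this is the
    expected squared Mahalanobis distance between successive samples.
    Rejected moves contribute zero, so maximising ESJD trades off step
    size against rejection rate and roughly minimises the lag-1
    autocorrelation.
    """
    diffs = np.diff(chain, axis=0)
    if precision is None:
        return np.mean(np.sum(diffs**2, axis=1))
    return np.mean(np.einsum("ti,ij,tj->t", diffs, precision, diffs))

# Crude tuning by grid search over proposal scales, reusing rw_metropolis above:
# scales = [0.1, 0.5, 1.0, 2.38**2 / 2]
# best = max(scales, key=lambda s: esjd(
#     rw_metropolis(logpdf, np.zeros(2), 5_000, s * np.eye(2), rng)))
```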

Proposal density

Designing the proposal density is often easy for an independent rejection sampler; that is precisely the cross-entropy method. For a Markov chain, though, the success criterion is muddier. AFAICT the cross-entropy trick does not apply to non-i.i.d. samples.
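For the independent-sampler case, a minimal sketch of the cross-entropy idea with a Gaussian proposal family (all names hypothetical): draw from the current proposal, form self-normalised importance weights against the target, and refit the proposal to the weighted sample.

```python
import numpy as np
from scipy import stats

def ce_fit_gaussian_proposal(log_target, d, n_iters=20, n_samples=2_000, rng=None):
    """Cross-entropy-style fit of an independent Gaussian proposal.

    Each iteration draws from the current proposal, weights the draws by
    target / proposal (self-normalised), and refits the Gaussian's mean
    and covariance by weighted maximum likelihood, i.e. minimising the
    cross-entropy from the target to the proposal family.
    """
    rng = rng or np.random.default_rng()
    mu, cov = np.zeros(d), np.eye(d)
    for _ in range(n_iters):
        xs = rng.multivariate_normal(mu, cov, size=n_samples)
        logw = np.array([log_target(x) for x in xs])
        logw -= stats.multivariate_normal(mu, cov).logpdf(xs)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        mu = w @ xs
        centred = xs - mu
        cov = centred.T @ (centred * w[:, None]) + 1e-6 * np.eye(d)  # jitter for stability
    return mu, cov
```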

Transition density

πŸ—

Adaptive SMC

In Sequential Monte Carlo, which is not MCMC, we do not need to be so cautious about changing the proposal parameters on the fly, since correctness does not rest on a stationary-distribution argument. See Fearnhead and Taylor (2013).

Variational inference

What is Hamiltonian Variational Inference? Does that fit under this heading? πŸ— (Caterini, Doucet, and Sejdinovic 2018; Salimans, Kingma, and Welling 2015)

References

Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018. β€œHamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems.
Fearnhead, Paul, and Benjamin M. Taylor. 2013. β€œAn Adaptive Sequential Monte Carlo Sampler.” Bayesian Analysis 8 (2): 411–38.
Mathew, B, A M Bauer, P Koistinen, T C Reetz, J LΓ©on, and M J SillanpÀÀ. 2012. β€œBayesian Adaptive Markov Chain Monte Carlo Estimation of Genetic Parameters.” Heredity 109 (4): 235–45.
Norton, Richard A., and Colin Fox. 2016. β€œTuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat], October.
Roberts, Gareth O., and Jeffrey S. Rosenthal. 2009. β€œExamples of Adaptive MCMC.” Journal of Computational and Graphical Statistics 18 (2): 349–67.
β€”β€”β€”. 2014. β€œMinimising MCMC Variance via Diffusion Limits, with an Application to Simulated Tempering.” Annals of Applied Probability 24 (1): 131–49.
Salimans, Tim, Diederik Kingma, and Max Welling. 2015. β€œMarkov Chain Monte Carlo and Variational Inference: Bridging the Gap.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1218–26. ICML’15. Lille, France: JMLR.org.
Schuster, Ingmar, Heiko Strathmann, Brooks Paige, and Dino Sejdinovic. 2017. β€œKernel Sequential Monte Carlo.” In ECML-PKDD 2017.
