Bayesian sparsity
January 8, 2019 — November 19, 2024
What if I like the flavours of both Bayesian inference and the implicit model selection of sparse inference? Can I cook Bayesian-Frequentist fusion cuisine with this novelty ingredient? It turns out that yes, we can add a variety of imitation sparsity flavours to Bayes model selection. The resulting methods are handy in, for example, symbolic system identification.
1 Laplace Prior
Laplace priors on linear regression coefficients give us the classic lasso as a MAP estimate.
Pro: it is easy to derive the frequentist LASSO as a MAP estimate from this prior (derivation sketched below).
Con: the full posterior is not sparse; only the MAP estimate is.
I have no need for this right now, but if I did, I might start with Dan Simpson’s critique.
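For the record, here is the standard derivation behind the Pro, in my own notation: with Gaussian noise of variance $\sigma^2$ and independent Laplace priors of scale $b$ on the coefficients,

$$
\begin{aligned}
p(\beta \mid y, X) &\propto \exp\left(-\tfrac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2\right)\prod_j \tfrac{1}{2b}\exp\left(-\tfrac{\lvert\beta_j\rvert}{b}\right)\\
\Rightarrow\quad \hat{\beta}_{\text{MAP}} &= \operatorname*{arg\,min}_\beta\; \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda \lVert\beta\rVert_1, \qquad \lambda = \frac{\sigma^2}{b},
\end{aligned}
$$

which is exactly the lasso objective, with the penalty weight fixed by the noise variance and the prior scale rather than by cross-validation.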
2 Spike-and-slab prior
The fully probabilistic counterpart of a LASSO-type prior: a hierarchical mixture with an explicit inclusion probability for each coefficient (one standard parameterisation is sketched below).
Pro: the full posterior can be sparse in a meaningful sense, since each coefficient has positive posterior probability of being exactly zero.
Con: mixtures of discrete and continuous variables like this are fiddly to handle in MCMC, and indeed in general.
🏗
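To be concrete about what I mean by a hierarchical mixture, here is one standard parameterisation (the point-mass version; continuous-spike variants swap $\delta_0$ for a narrow Gaussian):

$$
\begin{aligned}
\gamma_j &\sim \operatorname{Bernoulli}(\pi),\\
\beta_j \mid \gamma_j &\sim \gamma_j\,\mathcal{N}(0, \tau^2) + (1 - \gamma_j)\,\delta_0,
\end{aligned}
$$

so the posterior over the inclusion indicators $\gamma$ is literally a distribution over which coefficients are in the model.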
3 Horseshoe prior
Michael Betancourt (of Stan fame) surveys the issues that LASSO-type inference raises for Bayesians, with a slant towards horseshoe-type priors over spike-and-slab, possibly because discrete mixtures like spike-and-slab are awkward, though not impossible, in Stan.
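For intuition about what the horseshoe does, here is a minimal forward-simulation sketch (my own toy numpy code, not anything from Betancourt): coefficients are conditionally Gaussian with half-Cauchy local scales $\lambda_j$ and a half-Cauchy global scale $\tau$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_horseshoe_prior(n_coef, n_draws, tau_scale=1.0):
    """Forward-simulate the horseshoe prior of Carvalho, Polson and Scott."""
    # Global shrinkage: tau ~ C+(0, tau_scale), one per draw.
    tau = np.abs(tau_scale * rng.standard_cauchy(size=(n_draws, 1)))
    # Local shrinkage: lambda_j ~ C+(0, 1), one per coefficient per draw.
    lam = np.abs(rng.standard_cauchy(size=(n_draws, n_coef)))
    # Conditionally Gaussian coefficients: beta_j | lambda_j, tau ~ N(0, (lambda_j * tau)^2).
    return rng.normal(0.0, lam * tau)

draws = sample_horseshoe_prior(n_coef=5, n_draws=10_000)
# Most of the mass piles up near zero, but the Cauchy tails leave room for large signals.
print(np.quantile(np.abs(draws), [0.5, 0.9, 0.99]))
```

The point of the shape: the pole at zero shrinks small coefficients aggressively, while the heavy tails leave genuinely large coefficients nearly untouched, which is behaviour the LASSO penalty cannot give us in full-posterior form.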
4 Thompson sampling for large model spaces
This looks cool: Liu and Ročková (2023).
5 Transdimensional inference
See transdimensional.
6 Global-local shrinkage hierarchy
Not quite sure what this is, but here are some papers: Bhadra et al. (2016); Polson and Scott (2012); Schmidt and Makalic (2020); Xu et al. (2017).
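As far as I can tell (my own gloss, so take it with salt), the common thread is a conditionally Gaussian hierarchy with one global scale and one local scale per coefficient,

$$
\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2 \tau^2), \qquad \lambda_j \sim p(\lambda), \qquad \tau \sim p(\tau),
$$

with $\tau$ shrinking everything at once and the $\lambda_j$ letting individual coefficients escape; the horseshoe above is the special case where both scales are half-Cauchy.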