What if I like the flavours of both Bayesian inference and the implicit model selection of sparse inference? Can I cook Bayesian-Frequentist fusion cuisine with this novelty ingredient? It turns out that yes, we can add a variety of imitation sparsity flavours to Bayes model selection. The resulting methods are handy in, for example, symbolic system identification.
Laplace prior
Laplace (double-exponential) priors on linear regression coefficients recover the classic LASSO as the MAP estimate.
Pro: It is easy to derive the frequentist LASSO as a MAP estimate from this prior (see the derivation sketch below).
Con: The full posterior is not sparse; only the MAP estimate is.
I have no need for this right now, but if I did, I might start with Dan Simpson’s critique.
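To see why, here is the standard sketch (writing $b$ for the Laplace scale and $\sigma^2$ for the Gaussian noise variance): under $y \mid \beta \sim \mathcal{N}(X\beta, \sigma^2 I)$ with i.i.d. $\beta_j \sim \operatorname{Laplace}(0, b)$, the negative log posterior is, up to an additive constant,

$$
-\log p(\beta \mid y) = \frac{1}{2\sigma^{2}}\lVert y - X\beta\rVert_2^{2} + \frac{1}{b}\lVert \beta\rVert_1 + \mathrm{const},
$$

so the MAP estimate solves the LASSO problem $\min_\beta \tfrac{1}{2}\lVert y - X\beta\rVert_2^{2} + \lambda\lVert\beta\rVert_1$ with penalty $\lambda = \sigma^{2}/b$.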
Spike-and-slab prior
A probabilistic analogue of LASSO-type selection: a hierarchical mixture prior that assigns each coefficient an explicit inclusion probability.
Pro: The full posterior itself can be sparse, in the sense that each coefficient gets a posterior probability of inclusion (or of sitting in the spike at zero), not just a point estimate.
Con: Mixtures of discrete and continuous variables like this are fiddly to handle in MCMC, and in general; see the Gibbs sketch below.
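To make that concrete, here is a minimal Gibbs sketch for a George-and-McCulloch-style *continuous* spike-and-slab, where each coefficient is drawn from either a narrow "spike" normal or a wide "slab" normal and the discrete inclusion indicators are sampled alongside. Everything below (the toy data, the hyperparameters, the fixed noise variance) is an illustrative assumption, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy sparse-regression data (purely illustrative) -----------------
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]            # only 3 of 20 coefficients active
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# --- Continuous spike-and-slab prior -----------------------------------
#   beta_j | gamma_j = 1 ~ N(0, tau1^2)   (slab: broad)
#   beta_j | gamma_j = 0 ~ N(0, tau0^2)   (spike: nearly zero)
#   gamma_j ~ Bernoulli(pi0)
tau0, tau1, pi0 = 0.01, 2.0, 0.2
sigma2 = 0.25                               # noise variance held fixed for brevity
n_iter, burn_in = 2000, 500

XtX, Xty = X.T @ X, X.T @ y
gamma = np.ones(p, dtype=int)
gamma_draws = np.zeros((n_iter, p))

for it in range(n_iter):
    # 1. beta | gamma, y : conjugate Gaussian update
    prior_prec = np.where(gamma == 1, 1.0 / tau1**2, 1.0 / tau0**2)
    cov = np.linalg.inv(XtX / sigma2 + np.diag(prior_prec))
    mean = cov @ (Xty / sigma2)
    beta = rng.multivariate_normal(mean, cov)

    # 2. gamma_j | beta_j : Bernoulli, comparing spike vs slab densities
    log_slab = np.log(pi0) - 0.5 * beta**2 / tau1**2 - np.log(tau1)
    log_spike = np.log(1 - pi0) - 0.5 * beta**2 / tau0**2 - np.log(tau0)
    p_slab = 1.0 / (1.0 + np.exp(log_spike - log_slab))
    gamma = (rng.random(p) < p_slab).astype(int)

    gamma_draws[it] = gamma

# Posterior inclusion probabilities: Monte Carlo mean of the indicators
print(gamma_draws[burn_in:].mean(axis=0).round(2))
```

The posterior inclusion probabilities are just Monte Carlo averages of the indicators; that is the sense in which the posterior carries sparsity information, even though the $\beta$ draws themselves are never exactly zero under this continuous relaxation. Swapping the spike for a point mass at zero (Mitchell and Beauchamp 1988) makes individual draws sparse too, at the cost of messier conditionals.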
🏗
Thompson sampling for large model spaces
This looks cool: Liu and Ročková (2023).
Global-local shrinkage hierarchy
Not quite sure what this is, but here are some papers: Bhadra et al. (2016); Polson and Scott (2012); Schmidt and Makalic (2020); Xu et al. (2017).
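My working understanding, for what it is worth: a global-local prior gives each coefficient its own local scale $\lambda_j$ on top of a shared global scale $\tau$, so the global scale shrinks everything towards zero while heavy-tailed local scales let individual signals escape. The horseshoe of Carvalho, Polson, and Scott (2009) is the canonical example:

$$
\beta_j \mid \lambda_j, \tau \sim \mathcal{N}\!\left(0, \lambda_j^{2}\tau^{2}\right), \qquad
\lambda_j \sim \mathrm{C}^{+}(0,1), \qquad
\tau \sim \mathrm{C}^{+}(0,1),
$$

where $\mathrm{C}^{+}(0,1)$ is the standard half-Cauchy distribution.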
References
Babacan, Luessi, Molina, et al. 2012. “Sparse Bayesian Methods for Low-Rank Matrix Estimation.” IEEE Transactions on Signal Processing.
Bhadra, Datta, Polson, et al. 2019. “Lasso Meets Horseshoe: A Survey.”
Brodersen, Gallusser, Koehler, et al. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics.
Carvalho, Polson, and Scott. 2009. “Handling Sparsity via the Horseshoe.” In Artificial Intelligence and Statistics.
Castillo, Schmidt-Hieber, and van der Vaart. 2015. “Bayesian Linear Regression with Sparse Priors.” The Annals of Statistics.
George, and McCulloch. 1997. “Approaches for Bayesian Variable Selection.” Statistica Sinica.
Herzet, and Drémeau. 2014. “Bayesian Pursuit Algorithms.”
Liu, and Ročková. 2023. “Variable Selection Via Thompson Sampling.” Journal of the American Statistical Association.
Mitchell, and Beauchamp. 1988. “Bayesian Variable Selection in Linear Regression.” Journal of the American Statistical Association.
Polson, and Scott. 2012. “Local Shrinkage Rules, Lévy Processes and Regularized Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Ročková, and George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association.
Schniter, Potter, and Ziniel. 2008. “Fast Bayesian Matching Pursuit.” In 2008 Information Theory and Applications Workshop.
Scott, and Varian. 2013. “Predicting the Present with Bayesian Structural Time Series.” SSRN Scholarly Paper ID 2304426.
Seeger, Steinke, and Tsuda. 2007. “Bayesian Inference and Optimal Design in the Sparse Linear Model.” In Artificial Intelligence and Statistics.
Titsias, and Lázaro-Gredilla. 2011. “Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning.” In Advances in Neural Information Processing Systems 24.
Wang, Sarkar, Carbonetto, et al. 2020. “A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
Zhou, Chen, Paisley, et al. 2009. “Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems. NIPS’09.