Bayesian sparsity


What if you like the flavours of both Bayesian inference and the implicit model selection of sparse inference? Can you cook Bayesian-Frequentist fusion cuisine with this novelty ingredient?

Laplace Prior

Put independent Laplace priors on the linear regression coefficients; the ordinary LASSO estimate is then the MAP estimate under that prior.
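For concreteness, here is a sketch of that correspondence, assuming a Gaussian likelihood with known noise variance. The symbols $\sigma^2$ (noise variance), $b$ (Laplace scale) and $\lambda$ (penalty weight) are my notation, not anything fixed by the sources below.

$$
\begin{aligned}
y \mid \beta &\sim \mathcal{N}(X\beta, \sigma^2 I), \qquad p(\beta_j) = \frac{1}{2b}\exp\left(-\frac{|\beta_j|}{b}\right) \text{ independently for each } j,\\
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\|y - X\beta\|_2^2 + \frac{1}{b}\|\beta\|_1 + \text{const},\\
\hat\beta_{\text{MAP}} &= \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda\|\beta\|_1, \qquad \lambda = \frac{2\sigma^2}{b}.
\end{aligned}
$$

The last line is the usual LASSO objective, so a tighter prior (smaller $b$) or noisier data (larger $\sigma^2$) means a heavier penalty.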

Pro: It is easy to derive frequentist LASSO as a MAP estimate from this prior.

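Con: Not actually sparse for non-MAP uses. The posterior is continuous, so posterior samples are never exactly zero, and the posterior mean generally is not either.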
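A toy illustration of that pro and con, assuming the simplest possible setting: one coefficient, one observation y ~ N(beta, sigma^2), and a Laplace(0, b) prior. The function names and the numbers here are hypothetical, chosen only to make the point. The MAP estimate is soft-thresholding and lands exactly on zero; the posterior mean, computed by brute-force quadrature on a grid, does not.

```python
import numpy as np


def laplace_map(y, sigma=1.0, b=1.0):
    """MAP of beta for y ~ N(beta, sigma^2), beta ~ Laplace(0, b).

    Minimising (y - beta)^2 / (2 sigma^2) + |beta| / b gives
    soft-thresholding at sigma^2 / b.
    """
    threshold = sigma**2 / b
    return float(np.sign(y) * max(abs(y) - threshold, 0.0))


def laplace_posterior_mean(y, sigma=1.0, b=1.0, half_width=30.0, n_grid=200_001):
    """Posterior mean of beta by numerical integration on a uniform grid."""
    grid = np.linspace(-half_width, half_width, n_grid)
    log_post = -0.5 * ((y - grid) / sigma) ** 2 - np.abs(grid) / b
    weights = np.exp(log_post - log_post.max())  # unnormalised posterior
    return float(np.sum(grid * weights) / np.sum(weights))


y_obs = 0.5  # a small observation, below the threshold sigma^2 / b = 1
map_estimate = laplace_map(y_obs)
posterior_mean = laplace_posterior_mean(y_obs)
print(f"MAP estimate:   {map_estimate}")
print(f"posterior mean: {posterior_mean:.3f}")
```

The MAP is zero because the observation sits below the threshold, while the posterior mean is merely shrunk towards zero.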

I have no need for this right now, but if I did I might start with Dan Simpson’s critique.

Spike-and-slab prior

🏗

Horseshoe prior

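Michael Betancourt, of Stan fame, discusses some issues with LASSO-type inference for Bayesians, with a slant towards horseshoe-type priors in preference to spike-and-slab. That preference is possibly because hierarchical mixtures like spike-and-slab are not that great in Stan: its samplers do not handle discrete parameters directly, so the inclusion indicators have to be marginalized out, which is possible but awkward.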
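To see why the horseshoe suits gradient-based samplers, it helps to write it as a purely continuous scale mixture, following Carvalho, Polson and Scott: beta_j given lambda_j is N(0, lambda_j^2 tau^2), with lambda_j half-Cauchy(0, 1). The sketch below draws from that prior with NumPy. The function name, the seed, the fixed global scale tau = 1 and the bin edges are my choices for illustration. It also computes the shrinkage weight kappa = 1 / (1 + tau^2 lambda^2), which applies in the unit-noise, single-observation case; kappa near 1 means "shrunk to zero" and kappa near 0 means "left alone".

```python
import numpy as np


def sample_horseshoe_prior(n_draws, tau=1.0, seed=0):
    """Draw coefficients and shrinkage weights from a horseshoe prior."""
    rng = np.random.default_rng(seed)
    # Local scales: half-Cauchy(0, 1), i.e. the absolute value of a Cauchy draw.
    local_scale = np.abs(rng.standard_cauchy(n_draws))
    # Coefficients: Gaussian conditional on the local and global scales.
    beta = rng.normal(0.0, 1.0, n_draws) * local_scale * tau
    # Shrinkage weight for unit noise variance and a single observation.
    shrinkage = 1.0 / (1.0 + (tau * local_scale) ** 2)
    return beta, shrinkage


beta, kappa = sample_horseshoe_prior(200_000)
frac_near_zero = float(np.mean(kappa < 0.1))  # essentially unshrunk
frac_middle = float(np.mean((kappa > 0.45) & (kappa < 0.55)))
frac_near_one = float(np.mean(kappa > 0.9))  # essentially shrunk to zero
print(f"kappa < 0.1: {frac_near_zero:.3f}")
print(f"kappa in (0.45, 0.55): {frac_middle:.3f}")
print(f"kappa > 0.9: {frac_near_one:.3f}")
```

With tau = 1 the weights follow a Beta(1/2, 1/2) distribution, whose U-shaped density gives the prior its name: most of the mass is at "keep" or "kill", with little in between, yet no discrete indicator variable appears anywhere.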

Bondell, Howard D., and Brian J. Reich. 2012. “Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions.” Journal of the American Statistical Association 107 (500): 1610–24. https://doi.org/10.1080/01621459.2012.716344.

Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics 9 (1): 247–74. https://doi.org/10.1214/14-AOAS788.

Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2009. “Handling Sparsity via the Horseshoe.” In Artificial Intelligence and Statistics, 73–80. http://proceedings.mlr.press/v5/carvalho09a.html.

———. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80. https://doi.org/10.1093/biomet/asq017.

George, Edward I., and Robert McCulloch. 1997. “Approaches for Bayesian Variable Selection.” Statistica Sinica 7 (2): 339–73. http://www.rob-mcculloch.org/some_papers_and_talks/papers/published/fastN96.pdf.

Ishwaran, Hemant, and J. Sunil Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics 33 (2): 730–73. https://doi.org/10.1214/009053604000001147.

Madigan, David, and Adrian E. Raftery. 1994. “Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window.” Journal of the American Statistical Association 89 (428): 1535–46. https://doi.org/10.1080/01621459.1994.10476894.

Mitchell, T. J., and J. J. Beauchamp. 1988. “Bayesian Variable Selection in Linear Regression.” Journal of the American Statistical Association 83 (404): 1023–32. https://doi.org/10.1080/01621459.1988.10478694.

Piironen, Juho, and Aki Vehtari. 2017. “Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors.” Electronic Journal of Statistics 11 (2): 5018–51. https://doi.org/10.1214/17-EJS1337SI.

Ročková, Veronika. 2018. “Bayesian Estimation of Sparse Signals with a Continuous Spike-and-Slab Prior.” The Annals of Statistics 46 (1): 401–37. https://doi.org/10.1214/17-AOS1554.

Ročková, Veronika, and Edward I. George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association 113 (521): 431–44. https://doi.org/10.1080/01621459.2016.1260469.

Scott, Steven L., and Hal R. Varian. 2013. “Predicting the Present with Bayesian Structural Time Series.” SSRN Scholarly Paper ID 2304426. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=2304426.

Smith, Michael, and Robert Kohn. 1996. “Nonparametric Regression Using Bayesian Variable Selection.” Journal of Econometrics 75 (2): 317–43. https://doi.org/10.1016/0304-4076(95)01763-1.

Titsias, Michalis K., and Miguel Lázaro-Gredilla. 2011. “Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning.” In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 2339–47. Curran Associates, Inc. http://papers.nips.cc/paper/4305-spike-and-slab-variational-inference-for-multi-task-and-multiple-kernel-learning.