Bayesian model selection

Frequentist model selection is not the only kind, but I know less about the Bayesian version. What does model selection mean in a Bayesian context? Surely no candidate model ever ends up with exactly zero posterior probability? In my intro Bayesian classes I learned that one simply keeps all the models and weights them by posterior probability when making predictions. But sometimes we wish to discard some models. When does this work, and when does it not? Typically this seems to be done by comparing the marginal evidence of the models.
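To make the "keep everything, weighted by posterior probability" recipe concrete, here is a minimal sketch for a toy problem of my own choosing, not one from the references: coin flips explained either by a fair coin (no free parameter) or by a coin with unknown bias under a uniform Beta(1, 1) prior. The data (14 heads in 20 flips), the equal prior model probabilities, and all function names are illustrative assumptions.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_fair(k, n):
    """Log marginal likelihood of a specific flip sequence under theta = 0.5."""
    return n * math.log(0.5)


def log_evidence_uniform(k, n):
    """Log marginal likelihood of the same sequence under theta ~ Beta(1, 1)."""
    return log_beta(k + 1, n - k + 1)


def posterior_model_probs(log_evidences, prior_probs):
    """Normalise evidence times prior model probability, working in log space."""
    log_joint = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    m = max(log_joint)
    weights = [math.exp(lj - m) for lj in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data: k heads in n flips.
k, n = 14, 20
log_ev = [log_evidence_fair(k, n), log_evidence_uniform(k, n)]
post = posterior_model_probs(log_ev, [0.5, 0.5])

# Each model's predictive probability that the next flip is heads.
pred = [0.5, (k + 1) / (n + 2)]

# Bayesian model averaging: weight each prediction by posterior model probability.
bma_pred = sum(w * p for w, p in zip(post, pred))

print("posterior model probabilities:", post)
print("model-averaged P(next flip is heads):", bma_pred)
```

Neither model receives zero posterior probability here; "selecting" one of them amounts to an extra decision, such as keeping only the model with the largest weight.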



Interesting special case: Bayesian sparsity.

Cross-validation and Bayes

There is a relation between cross-validation and the Bayesian evidence, a.k.a. the marginal likelihood; see (Claeskens and Hjort 2008; Fong and Holmes 2019).
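One elementary way to see why such a relation is plausible (a sketch, not the full result of the cited works) is the chain rule: the log evidence equals the sum of one-step-ahead log posterior predictive scores, each of which scores a data point that has not yet been conditioned on. The check below uses a Beta-Bernoulli model; the data, the Beta(1, 1) prior, and the function names are assumptions made for illustration.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence(ys, a=1.0, b=1.0):
    """Closed-form log marginal likelihood of a binary sequence under Beta(a, b)."""
    k, n = sum(ys), len(ys)
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def sequential_log_predictive(ys, a=1.0, b=1.0):
    """Sum of log p(y_i | y_1, ..., y_{i-1}): each point is scored before it is seen."""
    total = 0.0
    heads = 0
    for i, y in enumerate(ys):
        p_one = (a + heads) / (a + b + i)  # posterior predictive P(y_i = 1)
        total += math.log(p_one if y == 1 else 1.0 - p_one)
        heads += y
    return total


# Hypothetical binary data.
ys = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# The model is exchangeable, so any ordering of the data gives the same total.
shuffled = ys[:]
random.Random(0).shuffle(shuffled)

print(log_evidence(ys))
print(sequential_log_predictive(ys))
print(sequential_log_predictive(shuffled))
```

Because the total does not depend on the ordering, it can also be written as an average over orderings, which is where held-out predictive scores of the kind used in cross-validation enter.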

Evidence/marginal likelihood/type II maximum likelihood

See model selection by model evidence maximisation.
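As a toy illustration of what type II maximum likelihood means, assume a normal-means model of my own choosing: each observation is y_i ~ N(theta_i, 1) with theta_i ~ N(0, tau2), so that marginally y_i ~ N(0, 1 + tau2). Maximising the evidence over the hyperparameter tau2 then has a closed form, which the sketch below checks against a crude grid search. The data and the function names are hypothetical.

```python
import math


def log_evidence(ys, tau2):
    """Log marginal likelihood of ys after integrating out each theta_i."""
    n = len(ys)
    var = 1.0 + tau2  # marginal variance of each y_i
    sum_sq = sum(y * y for y in ys)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sum_sq / (2.0 * var)


def type2_ml_tau2(ys):
    """Closed-form evidence maximiser: mean(y^2) - 1, truncated at zero."""
    mean_sq = sum(y * y for y in ys) / len(ys)
    return max(0.0, mean_sq - 1.0)


# Hypothetical observations.
ys = [2.1, -0.3, 1.7, -2.4, 0.2, 3.0, -1.1, 0.8]

tau2_hat = type2_ml_tau2(ys)

# Grid search over tau2 in [0, 10] with step 0.01, as a sanity check.
grid = [i * 0.01 for i in range(1001)]
tau2_grid = max(grid, key=lambda t: log_evidence(ys, t))

print("closed-form estimate:", tau2_hat)
print("grid-search estimate:", tau2_grid)
```

When the data are no more dispersed than the noise alone would predict, the estimate is exactly zero, which shrinks every theta_i to zero; this is one small way that evidence maximisation can behave like model selection.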


John Mount on applied variable selection

We have also always felt a bit exposed in this, as feature selection seems unjustified in standard explanations of regression. One feels that if a coefficient were meant to be zero, the fitting procedure would have set it to zero. Under this misapprehension, stepping in and removing some variables feels unjustified.

Regardless of intuition or feelings, it is a fair question: is variable selection a natural justifiable part of modeling? Or is it something that is already done (therefore redundant). Or is it something that is not done for important reasons (such as avoiding damaging bias)?

In this note we will show that feature selection is in fact an obvious justified step when using a sufficiently sophisticated model of regression. This note is long, as it defines so many tiny elementary steps. However this note ends with a big point: variable selection is justified. It naturally appears in the right variation of Bayesian Regression. You should select variables, using your preferred methodology. And you shouldn’t feel bad about selecting variables.


Bhadra, Anindya, Jyotishka Datta, Nicholas G. Polson, and Brandon Willard. 2016. “Default Bayesian Analysis with Global-Local Shrinkage Priors.” Biometrika 103 (4): 955–69.
Bondell, Howard D., and Brian J. Reich. 2012. “Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions.” Journal of the American Statistical Association 107 (500): 1610–24.
Bürkner, Paul-Christian, Jonah Gabry, and Aki Vehtari. 2020. “Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation 90 (14): 2499–2523.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80.
Chipman, Hugh, Edward I. George, Robert E. McCulloch, and P Lahiri. 2001. “The Practical Implementation of Bayesian Model Selection.” In Model Selection. Vol. 38. IMS Lecture Notes - Monograph Series. Beachwood, OH: Institute of Mathematical Statistics.
Claeskens, Gerda, and Nils Lid Hjort. 2008. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge ; New York: Cambridge University Press.
Efron, Bradley. 2012. “Bayesian Inference and the Parametric Bootstrap.” The Annals of Applied Statistics 6 (4): 1971–97.
Filippone, Maurizio, and Raphael Engler. 2015. “Enabling Scalable Stochastic Gradient-Based Inference for Gaussian Processes by Employing the Unbiased LInear System SolvEr (ULISSE).” In Proceedings of the 32nd International Conference on Machine Learning, 1015–24. PMLR.
Fong, Edwin, and Chris Holmes. 2019. “On the Marginal Likelihood and Cross-Validation.” arXiv:1905.08737 [Stat], May.
Gelman, Andrew, and Donald B. Rubin. 1995. “Avoiding Model Selection in Bayesian Social Research.” Sociological Methodology 25: 165–73.
George, Edward I., and Robert McCulloch. 1997. “Approaches for Bayesian Variable Selection.” Statistica Sinica 7 (2): 339–73.
Hirsh, Seth M., David A. Barajas-Solano, and J. Nathan Kutz. 2022. “Sparsifying Priors for Bayesian Uncertainty Quantification in Model Discovery.” Royal Society Open Science 9 (2): 211823.
Ishwaran, Hemant, and J. Sunil Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics 33 (2): 730–73.
Kadane, Joseph B., and Nicole A. Lazar. 2004. “Methods and Criteria for Model Selection.” Journal of the American Statistical Association 99 (465): 279–90.
Laud, Purushottam W., and Joseph G. Ibrahim. 1995. “Predictive Model Selection.” Journal of the Royal Statistical Society. Series B (Methodological) 57 (1): 247–62.
Li, Meng, and David B. Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models.” arXiv:1611.01241 [Stat], November.
Linden, Sander van der, and Breanne Chryst. 2017. “No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis.” Frontiers in Applied Mathematics and Statistics 3.
Lorch, Lars, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. 2021. “DiBS: Differentiable Bayesian Structure Learning.” In.
MacKay, David J. C. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems 6 (3): 469–505.
MacKay, David J. C. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation 11 (5): 1035–68.
Madigan, David, and Adrian E. Raftery. 1994. “Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window.” Journal of the American Statistical Association 89 (428): 1535–46.
Navarro, Danielle J. 2019. “Between the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection.” Computational Brain & Behavior 2 (1): 28–34.
Ohn, Ilsang, and Yongdai Kim. 2021. “Posterior Consistency of Factor Dimensionality in High-Dimensional Sparse Factor Models.” Bayesian Analysis, January.
Ohn, Ilsang, and Lizhen Lin. 2021. “Adaptive Variational Bayes: Optimality, Computation and Applications.” arXiv:2109.03204 [Math, Stat], September.
Ormerod, John T., Michael Stewart, Weichang Yu, and Sarah E. Romanes. 2017. “Bayesian Hypothesis Tests with Diffuse Priors: Can We Have Our Cake and Eat It Too?” arXiv:1710.09146 [Math, Stat], October.
Piironen, Juho, and Aki Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27 (3): 711–35.
Polson, Nicholas G., and James G. Scott. 2012. “Local Shrinkage Rules, Lévy Processes and Regularized Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 (2): 287–311.
Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology 25: 111–63.
Ročková, Veronika, and Edward I. George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association 113 (521): 431–44.
Schmidt, Daniel F., and Enes Makalic. 2020. “Log-Scale Shrinkage Priors and Adaptive Bayesian Global-Local Shrinkage Estimation.” arXiv.
Stein, Michael L. 2008. “A Modeling Approach for Large Spatial Datasets.” Journal of the Korean Statistical Society 37 (1): 3–10.
Tang, Xueying, Xiaofan Xu, Malay Ghosh, and Prasenjit Ghosh. 2016. “Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors.” arXiv.
Vehtari, Aki, and Janne Ojanen. 2012. “A Survey of Bayesian Predictive Methods for Model Assessment, Selection and Comparison.” Statistics Surveys 6: 142–228.
Wieringen, Wessel N. van. 2021. “Lecture Notes on Ridge Regression.” arXiv:1509.09169 [Stat], May.
Xu, Zemei, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper. 2017. “Bayesian Sparse Global-Local Shrinkage Regression for Selection of Grouped Variables.” arXiv.
