Model/hyperparameter selection

Choosing which of an ensemble of models to use, or, what is more or less the same thing, the number of predictors, or the strength of regularisation. This is a kind of complement to statistical learning theory, where you hope to quantify how complicated a model you should bother fitting to a given amount of data.

If your predictors are discrete and small in number, you can do this in the traditional fashion, by stepwise model selection, and you might discuss the degrees of freedom of the model and of the data. If you are in the luxurious position of having a small, tractable number of parameters and the ability to perform controlled trials, then you do ANOVA.
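To make the stepwise idea concrete, here is a minimal sketch of greedy forward selection scored by Gaussian AIC. Everything here (data, function names) is invented for illustration; real stepwise routines live in, e.g., R's `step`.

```python
import numpy as np

def ols_aic(X, y):
    """Gaussian AIC for an OLS fit: n*log(RSS/n) + 2*(k+1), counting the noise variance as a parameter."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (k + 1)

def forward_stepwise(X, y):
    """Repeatedly add the predictor that most improves AIC; stop when none does."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    best_aic = ols_aic(np.ones((n, 1)), y)  # intercept-only baseline
    while remaining:
        scores = {
            j: ols_aic(np.column_stack([np.ones(n)] + [X[:, c] for c in chosen + [j]]), y)
            for j in remaining
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_aic:       # no strict improvement: stop
            break
        best_aic = scores[j_best]
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(forward_stepwise(X, y))  # with this strong a signal, columns 0 and 3 should appear
```

Note the well-known caveat: greedy search explores only a sliver of the 2^p candidate models, and AIC's light penalty tends to let spurious predictors sneak in.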

When there are penalisation parameters, we sometimes phrase this as regularisation and talk about regularisation-parameter selection, or hyperparameter selection, which we can do in various ways: degrees-of-freedom penalties, cross-validation, and so on. However, I’m not yet sure how to make that work in sparse regression.
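The cross-validation route, at least, is mechanical. A hedged sketch (data and grid invented): choose a ridge penalty by k-fold cross-validation, i.e. pick the lambda with the smallest held-out squared error.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept, for brevity)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean held-out squared error over k folds."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = X[:, :3].sum(axis=1) + rng.normal(size=100)
lambdas = np.logspace(-3, 3, 13)
best = min(lambdas, key=lambda lam: cv_mse(X, y, lam))
print(best)
```

The same loop works for any hyperparameter you can grid over, which is exactly why cross-validation is the default answer here.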

Multiple testing is model selection writ large: you consider many hypothesis tests, possibly effectively infinitely many, or you face a combinatorial explosion of possible predictors to include.

πŸ— document connection with graphical models and thus conditional independence tests.


Bayesian model selection is also a thing, although the framing is a little different. In the classic Bayesian method I keep all my models around, although some might become very unlikely. But apparently I can also throw some out entirely? Presumably for reasons of computational tractability, or what-have-you.
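The "keep everything, reweight" framing is just Bayes' rule over a model index. A toy sketch with invented marginal likelihoods: posterior model probabilities are prior-weighted marginal likelihoods, normalised (in log space for numerical safety).

```python
import numpy as np

# log p(data | model_i); these numbers are made up for illustration
log_marginal = np.array([-120.0, -118.5, -135.0])
log_prior = np.log(np.full(3, 1.0 / 3.0))          # uniform prior over models

log_post = log_marginal + log_prior
log_post -= np.logaddexp.reduce(log_post)          # normalise in log space
posterior = np.exp(log_post)
print(posterior.round(4))  # the third model is effectively ruled out
```

"Throwing a model out entirely" then amounts to truncating negligible posterior weights, which changes nothing practical but saves you carrying dead models through the computation.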


If the model order itself is the parameter of interest, how do you do consistent inference on that? AIC, for example, is derived to optimise prediction loss, not to identify the true model. (BIC does better on that score: its heavier log n penalty makes it model-selection consistent under standard conditions, at some cost in predictive efficiency.)
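The contrast is easy to see numerically. A toy sketch (all data invented): score nested polynomial fits with a per-parameter penalty of 2 (AIC) versus log n (BIC) and compare the selected degrees.

```python
import numpy as np

def gaussian_ic(X, y, penalty_per_param):
    """n*log(RSS/n) plus a linear penalty per fitted parameter (incl. noise variance)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + penalty_per_param * (k + 1)

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)             # true polynomial order: degree 1
selected = {}
for name, pen in [("AIC", 2.0), ("BIC", np.log(n))]:
    scores = [gaussian_ic(np.vander(x, d + 1), y, pen) for d in range(6)]
    selected[name] = int(np.argmin(scores))
    print(name, "selects degree", selected[name])
```

Because the penalties are linear in the parameter count, the heavier-penalised criterion can never select a larger model than the lighter one on the same fits; as n grows, log n eventually dominates AIC's fixed 2, which is the mechanism behind BIC's consistency.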

An exhausting, exhaustive review of various model selection procedures, with an eye to consistency, is given in C. R. Rao and Wu (2001).

Cross validation

See cross validation.

For mixture models

See mixture models.

Under sparsity

See sparse model selection.

For time series

See model selection in time series.


Aghasi, Alireza, Nam Nguyen, and Justin Romberg. 2016. β€œNet-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks.” arXiv:1611.05162 [Cs, Stat], November.
Alquier, Pierre, and Olivier Wintenberger. 2012. β€œModel Selection for Weakly Dependent Time Series Forecasting.” Bernoulli.
Andersen, Per Kragh, Ornulf Borgan, Richard D. Gill, and Niels Keiding. 1997. Statistical models based on counting processes. Corr. 2. print. Springer series in statistics. New York, NY: Springer.
Andrews, Donald W. K. 1991. β€œAsymptotic Optimality of Generalized CL, Cross-Validation, and Generalized Cross-Validation in Regression with Heteroskedastic Errors.” Journal of Econometrics 47 (2): 359–77.
Ansley, Craig F., and Robert Kohn. 1985. β€œEstimation, Filtering, and Smoothing in State Space Models with Incompletely Specified Initial Conditions.” The Annals of Statistics 13 (4): 1286–316.
Barber, Rina Foygel, and Emmanuel J. CandΓ¨s. 2015. β€œControlling the False Discovery Rate via Knockoffs.” The Annals of Statistics 43 (5): 2055–85.
Benjamini, Yoav, and Yulia Gavrilov. 2009. β€œA Simple Forward Selection Procedure Based on False Discovery Rate Control.” The Annals of Applied Statistics 3 (1): 179–98.
Bickel, Peter J., Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, TeΓ³filo ValdΓ©s, Carlos Rivero, Jianqing Fan, and Aad van der Vaart. 2006. β€œRegularization in Statistics.” Test 15 (2): 271–344.
BirgΓ©, Lucien. 2008. β€œModel Selection for Density Estimation with L2-Loss.” arXiv:0808.1416 [Math, Stat], August.
BirgΓ©, Lucien, and Pascal Massart. 2006. β€œMinimal Penalties for Gaussian Model Selection.” Probability Theory and Related Fields 138 (1-2): 33–73.
Bloniarz, Adam, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, and Bin Yu. 2015. β€œLasso Adjustments of Treatment Effect Estimates in Randomized Experiments.” arXiv:1507.03652 [Math, Stat], July.
Broersen, Petrus MT. 2006. Automatic Autocorrelation and Spectral Analysis. Secaucus, NJ, USA: Springer.
BΓΌhlmann, Peter, and Hans R KΓΌnsch. 1999. β€œBlock Length Selection in the Bootstrap for Time Series.” Computational Statistics & Data Analysis 31 (3): 295–310.
Burman, P., and D. Nolan. 1995. β€œA General Akaike-Type Criterion for Model Selection in Robust Regression.” Biometrika 82 (4): 877–86.
Burnham, Kenneth P., and David Raymond Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer.
Cai, T. Tony, and Wenguang Sun. 2017. β€œLarge-Scale Global and Simultaneous Inference: Estimation and Testing in Very High Dimensions.” Annual Review of Economics 9 (1): 411–39.
CandΓ¨s, Emmanuel J., Yingying Fan, Lucas Janson, and Jinchi Lv. 2016. β€œPanning for Gold: Model-Free Knockoffs for High-Dimensional Controlled Variable Selection.” arXiv Preprint arXiv:1610.02351.
CandΓ¨s, Emmanuel J., Michael B. Wakin, and Stephen P. Boyd. 2008. β€œEnhancing Sparsity by Reweighted β„“ 1 Minimization.” Journal of Fourier Analysis and Applications 14 (5-6): 877–905.
Cawley, Gavin C., and Nicola L. C. Talbot. 2010. “On Over-Fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation.” Journal of Machine Learning Research 11 (July): 2079–2107.
Chan, Ngai Hang, Ye Lu, and Chun Yip Yau. 2016. “Factor Modelling for High-Dimensional Time Series: Inference and Model Selection.” Journal of Time Series Analysis, January.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2016. β€œDouble/Debiased Machine Learning for Treatment and Causal Parameters.” arXiv:1608.00060 [Econ, Stat], July.
Chernozhukov, Victor, Christian Hansen, Yuan Liao, and Yinchu Zhu. 2018. β€œInference For Heterogeneous Effects Using Low-Rank Estimations.” arXiv:1812.08089 [Math, Stat], December.
Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh. 2018. β€œLearning L2 Continuous Regression Functionals via Regularized Riesz Representers.” arXiv:1809.05224 [Econ, Math, Stat], September.
Claeskens, Gerda, and Nils Lid Hjort. 2008. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge ; New York: Cambridge University Press.
Cox, D. R., and H. S. Battey. 2017. β€œLarge Numbers of Explanatory Variables, a Semi-Descriptive Analysis.” Proceedings of the National Academy of Sciences 114 (32): 8592–95.
Dai, Ran, and Rina Foygel Barber. 2016. β€œThe Knockoff Filter for FDR Control in Group-Sparse and Multitask Regression.” arXiv Preprint arXiv:1602.03589.
Ding, J., V. Tarokh, and Y. Yang. 2018. β€œModel Selection Techniques: An Overview.” IEEE Signal Processing Magazine 35 (6): 16–34.
Efron, Bradley. 1986. β€œHow Biased Is the Apparent Error Rate of a Prediction Rule?” Journal of the American Statistical Association 81 (394): 461–70.
Elhamifar, E., and R. Vidal. 2013. β€œSparse Subspace Clustering: Algorithm, Theory, and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11): 2765–81.
Fan, Jianqing, and Runze Li. 2001. β€œVariable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60.
Fan, Jianqing, and Jinchi Lv. 2008. β€œSure Independence Screening for Ultrahigh Dimensional Feature Space.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (5): 849–911.
Geman, Stuart, and Chii-Ruey Hwang. 1982. β€œNonparametric Maximum Likelihood Estimation by the Method of Sieves.” The Annals of Statistics 10 (2): 401–14.
Guyon, Isabelle, and AndrΓ© Elisseeff. 2003. β€œAn Introduction to Variable and Feature Selection.” Journal of Machine Learning Research 3 (Mar): 1157–82.
Hong, X., R. J. Mitchell, S. Chen, C. J. Harris, K. Li, and G. W. Irwin. 2008. β€œModel Selection Approaches for Non-Linear System Identification: A Review.” International Journal of Systems Science 39 (10): 925–46.
Ishwaran, Hemant, and J. Sunil Rao. 2005. β€œSpike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics 33 (2): 730–73.
Jamieson, Kevin, and Ameet Talwalkar. 2015. β€œNon-Stochastic Best Arm Identification and Hyperparameter Optimization.” arXiv:1502.07943 [Cs, Stat], February.
Janson, Lucas, William Fithian, and Trevor J. Hastie. 2015. β€œEffective Degrees of Freedom: A Flawed Metaphor.” Biometrika 102 (2): 479–85.
Johnson, Jerald B., and Kristian S. Omland. 2004. β€œModel Selection in Ecology and Evolution.” Trends in Ecology & Evolution 19 (2): 101–8.
Kloft, Marius, Ulrich RΓΌckert, and Peter L. Bartlett. 2010. β€œA Unifying View of Multiple Kernel Learning.” In Machine Learning and Knowledge Discovery in Databases, edited by JosΓ© Luis BalcΓ‘zar, Francesco Bonchi, Aristides Gionis, and MichΓ¨le Sebag, 66–81. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Konishi, Sadanori, and G. Kitagawa. 2008. Information Criteria and Statistical Modeling. Springer Series in Statistics. New York: Springer.
Konishi, Sadanori, and Genshiro Kitagawa. 1996. β€œGeneralised Information Criteria in Model Selection.” Biometrika 83 (4): 875–90.
Li, Ker-Chau. 1987. β€œAsymptotic Optimality for \(C_p, C_L\), Cross-Validation and Generalized Cross-Validation: Discrete Index Set.” The Annals of Statistics 15 (3): 958–75.
Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2016. β€œEfficient Hyperparameter Optimization and Infinitely Many Armed Bandits.” arXiv:1603.06560 [Cs, Stat], March.
Lundberg, Scott M, and Su-In Lee. 2017. β€œA Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc.
Machado, JosΓ© A.F. 1993. β€œRobust Model Selection and M-Estimation.” Econometric Theory 9 (03): 478–93.
Massart, Pascal. 2007. Concentration Inequalities and Model Selection: Ecole d’EtΓ© de ProbabilitΓ©s de Saint-Flour XXXIII - 2003. Lecture Notes in Mathematics 1896. Berlin ; New York: Springer-Verlag.
Meinshausen, Nicolai, and Bin Yu. 2009. β€œLasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics 37 (1): 246–70.
Navarro, Danielle J. 2019. β€œBetween the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection.” Computational Brain & Behavior 2 (1): 28–34.
Paparoditis, Efstathios, and Theofanis Sapatinas. 2014. β€œBootstrap-Based Testing for Functional Data.” arXiv:1409.4317 [Math, Stat], September.
Qian, Guoqi. 1996. β€œOn Model Selection in Robust Linear Regression.”
Qian, Guoqi, and Hans R. KΓΌnsch. 1996. β€œSome Notes on Rissanen’s Stochastic Complexity.”
Qian, Guoqi, and Hans R. KΓΌnsch. 1998. β€œOn Model Selection via Stochastic Complexity in Robust Linear Regression.” Journal of Statistical Planning and Inference 75 (1): 91–116.
Rao, C. R., and Y. Wu. 2001. β€œOn Model Selection.” In Institute of Mathematical Statistics Lecture Notes - Monograph Series, 38:1–57. Beachwood, OH: Institute of Mathematical Statistics.
Rao, Radhakrishna, and Yuehua Wu. 1989. β€œA Strongly Consistent Procedure for Model Selection in a Regression Problem.” Biometrika 76 (2): 369–74.
RočkovΓ‘, Veronika, and Edward I. George. 2018. β€œThe Spike-and-Slab LASSO.” Journal of the American Statistical Association 113 (521): 431–44.
Ronchetti, E. 2000. β€œRobust Regression Methods and Model Selection.” In Data Segmentation and Model Selection for Computer Vision, edited by Alireza Bab-Hadiashar and David Suter, 31–40. Springer New York.
Royall, Richard M. 1986. β€œModel Robust Confidence Intervals Using Maximum Likelihood Estimators.” International Statistical Review / Revue Internationale de Statistique 54 (2): 221–26.
Shao, Jun. 1996. β€œBootstrap Model Selection.” Journal of the American Statistical Association 91 (434): 655–65.
Shen, Xiaotong, and Hsin-Cheng Huang. 2006. β€œOptimal Model Assessment, Selection, and Combination.” Journal of the American Statistical Association 101 (474): 554–68.
Shen, Xiaotong, Hsin-Cheng Huang, and Jimmy Ye. 2004. β€œAdaptive Model Selection and Assessment for Exponential Family Distributions.” Technometrics 46 (3): 306–17.
Shen, Xiaotong, and Jianming Ye. 2002. β€œAdaptive Model Selection.” Journal of the American Statistical Association 97 (457): 210–21.
Shibata, Ritei. 1989. β€œStatistical Aspects of Model Selection.” In From Data to Model, edited by Professor Jan C. Willems, 215–40. Springer Berlin Heidelberg.
Stein, Charles M. 1981. β€œEstimation of the Mean of a Multivariate Normal Distribution.” The Annals of Statistics 9 (6): 1135–51.
Stone, M. 1977. β€œAn Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 44–47.
Takeuchi, Kei. 1976. β€œDistribution of informational statistics and a criterion of model fitting.” Suri-Kagaku (Mathematical Sciences) 153 (1): 12–18.
Taylor, Jonathan, Richard Lockhart, Ryan J. Tibshirani, and Robert Tibshirani. 2014. β€œExact Post-Selection Inference for Forward Stepwise and Least Angle Regression.” arXiv:1401.3889 [Stat], January.
Tharmaratnam, Kukatharmini, and Gerda Claeskens. 2013. β€œA Comparison of Robust Versions of the AIC Based on M-, S- and MM-Estimators.” Statistics 47 (1): 216–35.
Tibshirani, Ryan J., Alessandro Rinaldo, Robert Tibshirani, and Larry Wasserman. 2015. β€œUniform Asymptotic Inference and the Bootstrap After Model Selection.” arXiv:1506.06266 [Math, Stat], June.
Tibshirani, Ryan J., and Jonathan Taylor. 2012. β€œDegrees of Freedom in Lasso Problems.” The Annals of Statistics 40 (2): 1198–1232.
Vansteelandt, Stijn, Maarten Bekaert, and Gerda Claeskens. 2012. β€œOn Model Selection and Model Misspecification in Causal Inference.” Statistical Methods in Medical Research 21 (1): 7–30.
Wahba, Grace. 1985. β€œA Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem.” The Annals of Statistics 13 (4): 1378–1402.
Zhao, Peng, Guilherme Rocha, and Bin Yu. 2006. β€œGrouped and Hierarchical Model Selection Through Composite Absolute Penalties.”
β€”β€”β€”. 2009. β€œThe Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics 37 (6A): 3468–97.
Zhao, Peng, and Bin Yu. 2006. β€œOn Model Selection Consistency of Lasso.” Journal of Machine Learning Research 7 (Nov): 2541–63.
Zou, Hui, and Runze Li. 2008. β€œOne-Step Sparse Estimates in Nonconcave Penalized Likelihood Models.” The Annals of Statistics 36 (4): 1509–33.
