Choosing which of an ensemble of models to use, or, what is more or less the same thing, how many predictors to include, or how much to regularise. This is a kind of complement to statistical learning theory, where you hope to quantify how complicated a model you should bother fitting to a given amount of data.

If your predictors are discrete and few in number, you can do this in the
traditional fashion, by *stepwise model selection*, and
you might discuss the degrees of freedom
of the model and the data.
If you are in the luxurious position of having a small, tractable number of
parameters and the ability to perform controlled trials, then you do
ANOVA.

When the model is controlled by penalisation parameters, we sometimes phrase this as
regularisation
and talk about *regularisation parameter selection*,
or hyperparameter selection, which we can do in various ways.
Methods for this include
degrees-of-freedom penalties,
cross-validation etc.
However, I'm not yet sure how to make that work in
sparse regression.
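For concreteness, here is a minimal sketch of the cross-validation route: selecting a ridge penalty by K-fold cross-validation. The synthetic data, the candidate grid, and the fold count are all made-up assumptions for illustration.

```python
# Sketch: choosing a ridge penalty lambda by K-fold cross-validation.
# Data, grid, and fold count are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 8
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.0, 0.5]          # only 3 informative predictors
y = X @ true_w + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean held-out squared error over k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((y[test_idx] - X[test_idx] @ w) ** 2))
    return float(np.mean(errs))

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam))
print("selected lambda:", best_lam)
```

The same loop works for any penalised estimator; only `ridge_fit` would change.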

Multiple testing is model selection writ large: you are considering many hypothesis tests, possibly effectively infinitely many, or you face a combinatorial explosion of possible predictors to include.
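One standard device in that large-scale setting is false-discovery-rate control. A minimal sketch of the Benjamini–Hochberg step-up procedure, with invented p-values:

```python
# Sketch: Benjamini-Hochberg step-up procedure for FDR control.
# The p-values are made up for illustration.
def benjamini_hochberg(pvals, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank with p <= q * rank / m.
        if pvals[i] <= q * rank / m:
            k_max = rank
    return sorted(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.96]
print(benjamini_hochberg(pvals))  # -> [0, 1]
```

Only the two smallest p-values survive here, even though five of them would pass a naive 0.05 threshold.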

TODO: document the connection with graphical models and thus conditional independence tests.

## Bayesian

Bayesian model selection is also a thing, although the framing is a little different. In the classic Bayesian method I keep all my models around, although some might become very unlikely. But apparently I can also throw some out entirely? Presumably for reasons of computational tractability or what-have-you.

## Consistency

If the model order *itself* is the parameter of interest, how do you do
consistent inference on that? AIC, for example, is derived for optimising prediction loss, not model selection.
(Doesn't BIC do better?)
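BIC's per-parameter penalty is log n rather than AIC's 2, which is heavier for any realistic n and is what buys model-selection consistency under the usual regularity conditions. A toy comparison on synthetic polynomial data, where the data-generating quadratic and the degree grid are my own assumptions:

```python
# Sketch: AIC vs BIC for choosing a polynomial degree.
# The quadratic truth and the degree grid are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.linspace(-2, 2, n)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=n)  # true degree 2

def gaussian_ic(rss, n, k):
    """Return (AIC, BIC) up to constants, for Gaussian errors; k = #parameters."""
    fit_term = n * np.log(rss / n)
    return fit_term + 2 * k, fit_term + k * np.log(n)

degrees = range(7)
aic, bic = [], []
for d in degrees:
    coeffs = np.polyfit(x, y, d)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    a, b = gaussian_ic(rss, n, d + 1)
    aic.append(a)
    bic.append(b)

aic_degree = int(np.argmin(aic))
bic_degree = int(np.argmin(bic))
print("AIC picks degree", aic_degree, "; BIC picks degree", bic_degree)
```

Because the penalties differ only in the per-parameter constant, the BIC-selected degree can never exceed the AIC-selected one on the same fits.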

An exhausting, exhaustive review of various model selection procedures with an eye to consistency, is given in C. R. Rao and Wu (2001).

## Cross validation

See cross validation.

## For mixture models

See mixture models.

## Under sparsity

## Hyperparameter search

How do you choose your hyperparameters? NB hyperparameters are not always
about model selection *per se*; some are about, e.g.,
convergence rate. Anyway. Also, one could well regard hyperparameters as normal
parameters.

Turns out you can cast this as a bandit problem, or a sequential Bayesian optimisation problem.
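Before reaching for bandits or Bayesian optimisation, the baseline is plain random search over a held-out objective. A minimal sketch, where the toy `validation_loss` and the log-scale search ranges are stand-in assumptions for an expensive train-and-validate run:

```python
# Sketch: random search over hyperparameters scored by held-out loss.
# validation_loss is a hypothetical stand-in for training + validating a model.
import math
import random

random.seed(0)

def validation_loss(lr, reg):
    """Toy objective, minimised near lr = 1e-2, reg = 1e-1."""
    return (math.log10(lr) + 2) ** 2 + (math.log10(reg) + 1) ** 2

def random_search(n_trials=50):
    best = None
    for _ in range(n_trials):
        # Sample on a log scale, the usual choice for rate-like parameters.
        lr = 10 ** random.uniform(-5, 0)
        reg = 10 ** random.uniform(-4, 2)
        loss = validation_loss(lr, reg)
        if best is None or loss < best[0]:
            best = (loss, lr, reg)
    return best

loss, lr, reg = random_search()
print(f"best loss {loss:.3f} at lr={lr:.2e}, reg={reg:.2e}")
```

The bandit and Bayesian-optimisation formulations improve on this by spending trials adaptively rather than uniformly, but the interface is the same: propose a configuration, observe a noisy held-out score, repeat.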

## For time series

## References

*arXiv:1611.05162 [Cs, Stat]*, November.

*Bernoulli*.

*Statistical Models Based on Counting Processes*. Corr. 2. print. Springer Series in Statistics. New York, NY: Springer.

*Journal of Econometrics* 47 (2): 359–77.

*The Annals of Statistics* 13 (4): 1286–316.

*The Annals of Statistics* 43 (5): 2055–85.

*The Annals of Applied Statistics* 3 (1): 179–98.

*Test* 15 (2): 271–344.

*arXiv:0808.1416 [Math, Stat]*, August.

*Probability Theory and Related Fields* 138 (1–2): 33–73.

*arXiv:1507.03652 [Math, Stat]*, July.

*Automatic Autocorrelation and Spectral Analysis*. Secaucus, NJ, USA: Springer.

*Computational Statistics & Data Analysis* 31 (3): 295–310.

*Biometrika* 82 (4): 877–86.

*Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach*. 2nd ed. New York: Springer.

*Annual Review of Economics* 9 (1): 411–39.

*arXiv Preprint arXiv:1610.02351*.

*Journal of Fourier Analysis and Applications* 14 (5–6): 877–905.

*Journal of Machine Learning Research* 11 (July): 2079–2107.

*Journal of Time Series Analysis*, January, n/a–.

*arXiv:1608.00060 [Econ, Stat]*, July.

*arXiv:1812.08089 [Math, Stat]*, December.

*arXiv:1809.05224 [Econ, Math, Stat]*, September.

*Model Selection and Model Averaging*. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge; New York: Cambridge University Press.

*Proceedings of the National Academy of Sciences* 114 (32): 8592–95.

*arXiv Preprint arXiv:1602.03589*.

*IEEE Signal Processing Magazine* 35 (6): 16–34.

*Journal of the American Statistical Association* 81 (394): 461–70.

*IEEE Transactions on Pattern Analysis and Machine Intelligence* 35 (11): 2765–81.

*Journal of the American Statistical Association* 96 (456): 1348–60.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70 (5): 849–911.

*The Annals of Statistics* 10 (2): 401–14.

*Journal of Machine Learning Research* 3 (Mar): 1157–82.

*International Journal of Systems Science* 39 (10): 925–46.

*The Annals of Statistics* 33 (2): 730–73.

*arXiv:1502.07943 [Cs, Stat]*, February.

*Biometrika* 102 (2): 479–85.

*Trends in Ecology & Evolution* 19 (2): 101–8.

*Machine Learning and Knowledge Discovery in Databases*, edited by José Luis Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag, 66–81. Lecture Notes in Computer Science. Springer Berlin Heidelberg.

*Information Criteria and Statistical Modeling*. Springer Series in Statistics. New York: Springer.

*Biometrika* 83 (4): 875–90.

*The Annals of Statistics* 15 (3): 958–75.

*arXiv:1603.06560 [Cs, Stat]*, March.

*Advances in Neural Information Processing Systems*. Vol. 30. Curran Associates, Inc.

*Econometric Theory* 9 (3): 478–93.

*Concentration Inequalities and Model Selection: Ecole d'Eté de Probabilités de Saint-Flour XXXIII – 2003*. Lecture Notes in Mathematics 1896. Berlin; New York: Springer-Verlag.

*The Annals of Statistics* 37 (1): 246–70.

*arXiv:1409.4317 [Math, Stat]*, September.

*Journal of Statistical Planning and Inference* 75 (1): 91–116.

*Institute of Mathematical Statistics Lecture Notes – Monograph Series*, 38:1–57. Beachwood, OH: Institute of Mathematical Statistics.

*Biometrika* 76 (2): 369–74.

*Journal of the American Statistical Association* 113 (521): 431–44.

*Data Segmentation and Model Selection for Computer Vision*, edited by Alireza Bab-Hadiashar and David Suter, 31–40. Springer New York.

*International Statistical Review / Revue Internationale de Statistique* 54 (2): 221–26.

*Journal of the American Statistical Association* 91 (434): 655–65.

*Journal of the American Statistical Association* 101 (474): 554–68.

*Technometrics* 46 (3): 306–17.

*Journal of the American Statistical Association* 97 (457): 210–21.

*From Data to Model*, edited by Professor Jan C. Willems, 215–40. Springer Berlin Heidelberg.

*The Annals of Statistics* 9 (6): 1135–51.

*Journal of the Royal Statistical Society. Series B (Methodological)* 39 (1): 44–47.

*Suri-Kagaku (Mathematical Sciences)* 153 (1): 12–18.

*arXiv:1401.3889 [Stat]*, January.

*Statistics* 47 (1): 216–35.

*arXiv:1506.06266 [Math, Stat]*, June.

*The Annals of Statistics* 40 (2): 1198–1232.

*Statistical Methods in Medical Research* 21 (1): 7–30.

*The Annals of Statistics* 13 (4): 1378–1402.

*The Annals of Statistics* 37 (6A): 3468–97.

*Journal of Machine Learning Research* 7 (Nov): 2541–63.

*The Annals of Statistics* 36 (4): 1509–33.
