# Model/hyperparameter selection

April 15, 2016 — August 20, 2017

Choosing which of an ensemble of models to use, or, what is more or less the same thing, how many predictors to include, or how much regularisation to apply. This is a kind of complement to statistical learning theory, where you hope to quantify how complicated a model you should bother fitting to a given amount of data.

If your predictors are discrete and small in number, you can do this in the traditional fashion, by *stepwise model selection*, and you might discuss the degrees of freedom of the model and the data. If you are in the luxurious position of having a small, tractable number of parameters and the ability to perform controlled trials, then you do ANOVA.
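A minimal sketch of what forward stepwise selection looks like in practice, on hypothetical data where only 3 of 8 candidate predictors truly matter; all names and the AIC-based stopping rule here are illustrative choices, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: only columns 0, 3 and 5 carry signal.
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + X[:, 5] + rng.normal(size=n)

def aic(y, yhat, k):
    # Gaussian log-likelihood up to constants: n*log(RSS/n) + 2k.
    rss = np.sum((y - yhat) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * k

def fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def forward_stepwise(X, y):
    # Greedily add the predictor that most improves AIC; stop when
    # no addition helps.
    chosen, remaining = [], list(range(X.shape[1]))
    best = aic(y, np.full_like(y, y.mean()), 1)
    while remaining:
        score, j = min(
            (aic(y, fit(X[:, chosen + [j]], y), len(chosen) + 2), j)
            for j in remaining
        )
        if score >= best:
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return sorted(chosen)

selected = forward_stepwise(X, y)
print(selected)
```

Note the classic caveat: AIC's penalty of 2 per parameter is tuned for prediction, so a greedy search like this can still admit the odd spurious predictor.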

When the model has penalisation parameters, we sometimes phrase this as regularisation and talk about *regularisation parameter selection*, or hyperparameter selection, which can be done in various ways. Methods for this include degrees-of-freedom penalties, cross-validation, and so on. However, I’m not yet sure how to make that work in sparse regression.
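As a concrete instance of the cross-validation route, here is a sketch of choosing a ridge penalty by K-fold cross-validation; the data, grid, and fold count are all arbitrary illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical regression: 5 of 20 coefficients are nonzero.
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.concatenate([np.ones(5), np.zeros(p - 5)])
y = X @ beta_true + rng.normal(size=n)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution (X'X + lam I)^{-1} X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    # K-fold cross-validation estimate of prediction error.
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(y), bool)
        mask[fold] = False
        beta = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return np.mean(errs)

lambdas = np.logspace(-3, 3, 13)
best = min(lambdas, key=lambda lam: cv_error(X, y, lam))
print(best)
```

The same scaffolding works for any penalised fit; only `ridge_fit` changes. For the lasso the penalty path is where things get subtler, which is the sparse-regression wrinkle mentioned above.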

Multiple testing is model selection writ large: you consider many hypothesis tests, possibly effectively infinitely many, or you have a combinatorial explosion of possible predictors to include.

🏗 Document the connection with graphical models and thus conditional independence tests.

## 1 Bayesian

Bayesian model selection is also a thing, although the framing is a little different. In the classic Bayesian method I keep all my models around, although some might become very unlikely. But apparently I can also throw some out entirely? Presumably for reasons of computational tractability, or what-have-you.

## 2 Consistency

If the model order *itself* is the parameter of interest, how do you do consistent inference on that? AIC, for example, is derived for optimising prediction loss, not for identifying the true model order. (BIC, with its heavier $\log n$ penalty per parameter, is consistent for model selection under broad conditions.)
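The AIC/BIC contrast is easy to see numerically. A sketch, on a hypothetical polynomial-order-selection problem where the true order is 2: both criteria share the Gaussian goodness-of-fit term and differ only in the per-parameter penalty (2 vs $\log n$).

```python
import numpy as np

rng = np.random.default_rng(2)

# True model is quadratic; we score candidate orders 0..8.
n = 500
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.5, size=n)

def gaussian_ic(y, yhat, k, penalty):
    # n*log(RSS/n) + penalty*k; penalty=2 gives AIC, log(n) gives BIC.
    rss = np.sum((y - yhat) ** 2)
    return len(y) * np.log(rss / len(y)) + penalty * k

def poly_fit(x, y, order):
    X = np.vander(x, order + 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

orders = range(9)
aic = [gaussian_ic(y, poly_fit(x, y, m), m + 2, 2.0) for m in orders]
bic = [gaussian_ic(y, poly_fit(x, y, m), m + 2, np.log(n)) for m in orders]
print(int(np.argmin(aic)), int(np.argmin(bic)))
```

Because BIC's penalty grows with $n$ while AIC's does not, BIC's chosen order is never larger than AIC's, and as $n \to \infty$ BIC concentrates on the true order while AIC retains some probability of overshooting.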

An exhausting, exhaustive review of various model selection procedures with an eye to consistency is given in C. R. Rao and Wu (2001).

## 3 Cross validation

See cross validation.

## 4 For mixture models

See mixture models.

## 5 Under sparsity

## 6 Hyperparameter search

How do you choose your hyperparameters? NB hyperparameters are not always about model selection *per se*; some govern, e.g., the convergence rate of the fitting algorithm. Anyway. One could also quite reasonably regard hyperparameters as normal parameters.

It turns out you can cast this as a bandit problem, or as a sequential Bayesian optimisation problem.
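The bandit view is easiest to see in successive halving (the building block of Hyperband): spend a little budget on many configurations, discard the worst, and concentrate the budget on survivors. A sketch, with a made-up noisy objective standing in for validation loss; the learning-rate parameterisation and noise model are purely illustrative:

```python
import math
import random

random.seed(4)

def successive_halving(configs, evaluate, budget=1, eta=3):
    # Evaluate all configs on a small budget, keep the best 1/eta,
    # multiply the budget by eta, and repeat until one survives.
    rung = list(configs)
    while len(rung) > 1:
        rung = sorted(rung, key=lambda c: evaluate(c, budget))
        rung = rung[: max(1, len(rung) // eta)]
        budget *= eta
    return rung[0]

# Hypothetical objective: noisy validation loss of a learning rate,
# where more budget (e.g. training epochs) means a cleaner estimate.
def evaluate(lr, budget):
    true_loss = (math.log10(lr) + 2) ** 2  # minimised at lr = 1e-2
    return true_loss + random.gauss(0, 0.5 / math.sqrt(budget))

candidates = [10 ** random.uniform(-5, 0) for _ in range(27)]
best = successive_halving(candidates, evaluate)
print(best)
```

The bandit flavour is in the budget allocation: bad arms get pulled only a few times before being dropped, so most of the total budget goes to promising configurations. Bayesian optimisation differs in *proposing* new configurations from a surrogate model rather than drawing them all up front.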

## 7 For time series

## 8 References

*arXiv:1611.05162 [Cs, Stat]*.

*Bernoulli*.

*Statistical models based on counting processes*. Springer series in statistics.

*Journal of Econometrics*.

*The Annals of Statistics*.

*The Annals of Statistics*.

*The Annals of Applied Statistics*.

*Test*.

*arXiv:0808.1416 [Math, Stat]*.

*Probability Theory and Related Fields*.

*arXiv:1507.03652 [Math, Stat]*.

*Automatic Autocorrelation and Spectral Analysis*.

*Computational Statistics & Data Analysis*.

*Biometrika*.

*Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach*.

*Annual Review of Economics*.

*arXiv Preprint arXiv:1610.02351*.

*Journal of Fourier Analysis and Applications*.

*Journal of Machine Learning Research*.

*Journal of Time Series Analysis*.

*The Econometrics Journal*.

*arXiv:1812.08089 [Math, Stat]*.

*arXiv:1809.05224 [Econ, Math, Stat]*.

*Model Selection and Model Averaging*. Cambridge Series in Statistical and Probabilistic Mathematics.

*Proceedings of the National Academy of Sciences*.

*arXiv Preprint arXiv:1602.03589*.

*IEEE Signal Processing Magazine*.

*Journal of the American Statistical Association*.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*.

*Journal of the American Statistical Association*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*The Annals of Statistics*.

*Journal of Machine Learning Research*.

*International Journal of Systems Science*.

*The Annals of Statistics*.

*arXiv:1502.07943 [Cs, Stat]*.

*Biometrika*.

*Trends in Ecology & Evolution*.

*Machine Learning and Knowledge Discovery in Databases*. Lecture Notes in Computer Science.

*Biometrika*.

*Information Criteria and Statistical Modeling*. Springer Series in Statistics.

*The Annals of Statistics*.

*arXiv:1603.06560 [Cs, Stat]*.

*Advances in Neural Information Processing Systems*.

*Econometric Theory*.

*Concentration Inequalities and Model Selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII - 2003*. Lecture Notes in Mathematics 1896.

*The Annals of Statistics*.

*arXiv:1409.4317 [Math, Stat]*.

*Journal of Statistical Planning and Inference*.

*Biometrika*.

*Institute of Mathematical Statistics Lecture Notes - Monograph Series*.

*Journal of the American Statistical Association*.

*Data Segmentation and Model Selection for Computer Vision*.

*International Statistical Review / Revue Internationale de Statistique*.

*Journal of the American Statistical Association*.

*Journal of the American Statistical Association*.

*Technometrics*.

*Journal of the American Statistical Association*.

*From Data to Model*.

*The Annals of Statistics*.

*Journal of the Royal Statistical Society. Series B (Methodological)*.

*Suri-Kagaku (Mathematical Sciences)*.

*arXiv:1401.3889 [Stat]*.

*Statistics*.

*arXiv:1506.06266 [Math, Stat]*.

*The Annals of Statistics*.

*Statistical Methods in Medical Research*.

*The Annals of Statistics*.

*The Annals of Statistics*.

*Journal of Machine Learning Research*.

*The Annals of Statistics*.