On substituting simulation for analysis in model selection, e.g. choosing the “right” regularisation parameter for sparse regression.
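To make that concrete, here is a minimal sketch of choosing a lasso penalty by 5-fold cross-validation; the data-generating process, penalty grid, and fold count are all illustrative assumptions, not anything from the source:

```python
# A minimal sketch: pick a lasso penalty by 5-fold cross-validation.
# Data-generating process, penalty grid, and fold count are illustrative.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
beta = np.zeros(50)
beta[:5] = 2.0                        # sparse truth: 5 active coefficients
y = X @ beta + rng.normal(size=200)

lambdas = np.logspace(-3, 1, 20)      # candidate regularisation parameters
folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
cv_mse = []
for lam in lambdas:
    errs = []
    for train, test in folds:
        fit = Lasso(alpha=lam).fit(X[train], y[train])
        errs.append(np.mean((y[test] - fit.predict(X[test])) ** 2))
    cv_mse.append(np.mean(errs))

print("lambda chosen by CV:", lambdas[int(np.argmin(cv_mse))])
```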
Asymptotically equivalent to generalised Akaike information criteria (e.g. Stone 1977).
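A rough statement of that equivalence (a sketch only; regularity conditions and remainder terms are glossed over): write $\hat\theta$ for the full-data maximum-likelihood estimate, $\hat\theta_{-i}$ for the estimate with observation $i$ left out, $\ell$ for the log-likelihood, and $k$ for the parameter count. Then

$$
\sum_{i=1}^{n} \log f\!\left(y_i \mid \hat\theta_{-i}\right)
\;\approx\; \ell\!\left(\hat\theta\right) - k
\;=\; -\tfrac{1}{2}\,\mathrm{AIC},
$$

so ranking models by leave-one-out predictive log-score agrees, asymptotically, with ranking them by AIC.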
Related to the bootstrap in various ways.
The computationally expensive default option when your model has no obvious shortcut for complexity regularisation, for example when AIC cannot be shown to work.
To learn: how this interacts with Bayesian inference.
The vtreat introduction mentions their article on why you need hold-out data, and also Perlich and Świrszcz (2011):
Cross-methods such as cross-validation and cross-prediction are effective tools for many machine learning, statistics, and data science related applications.
They are useful for parameter selection, model selection, impact/target encoding of high cardinality variables, stacking models, and super learning.
As cross-methods simulate access to an out-of-sample data set the same size as the original data, they are more statistically efficient (lower variance) than partitioning training data into calibration/training/holdout sets.
However, cross-methods do not satisfy the full exchangeability conditions that full hold-out methods have.
This introduces some additional statistical trade-offs when using cross-methods, beyond the obvious increases in computational cost.
Specifically, cross-methods can introduce an information leak into the modeling process.
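As a concrete illustration of that leak, here is a toy sketch in the spirit of Perlich and Świrszcz (2011): naive in-sample impact/target encoding of a high-cardinality variable appears predictive on pure noise, while cross-fit encoding does not. The names, sizes, and the encoder itself are illustrative assumptions, not vtreat’s implementation:

```python
# Toy sketch of the information leak: in-sample target encoding of a
# high-cardinality variable looks predictive even though y is pure noise;
# cross-fit encoding largely removes the effect. All sizes are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n, n_levels = 2000, 500
cat = rng.integers(0, n_levels, size=n)   # high-cardinality categorical
y = rng.normal(size=n)                    # pure-noise target: nothing to learn

def target_encode(train_cat, train_y, apply_cat, prior):
    # Per-level mean of train_y, falling back to the prior for unseen levels.
    sums = np.bincount(train_cat, weights=train_y, minlength=n_levels)
    counts = np.bincount(train_cat, minlength=n_levels)
    means = np.where(counts > 0, sums / np.maximum(counts, 1), prior)
    return means[apply_cat]

# Leaky: encode on the full sample, then evaluate on the same sample.
x = target_encode(cat, y, cat, y.mean()).reshape(-1, 1)
print("R^2, in-sample encoding:", LinearRegression().fit(x, y).score(x, y))

# Cross-fit: each fold is encoded using only the other folds' data.
x_cf = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(cat):
    x_cf[test] = target_encode(cat[train], y[train], cat[test], y[train].mean())
x_cf = x_cf.reshape(-1, 1)
print("R^2, cross-fit encoding:", LinearRegression().fit(x_cf, y).score(x_cf, y))
```

On a typical run the first R² comes out well above zero, because each encoded value contains its own observation’s response, while the cross-fit version sits near zero.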
Andrews, Donald W. K. 1991. “Asymptotic Optimality of Generalized $C_L$, Cross-Validation, and Generalized Cross-Validation in Regression with Heteroskedastic Errors.” Journal of Econometrics 47 (2): 359–77. https://doi.org/10.1016/0304-4076(91)90107-O
Giordano, Ryan, Michael I. Jordan, and Tamara Broderick. 2019. “A Higher-Order Swiss Army Infinitesimal Jackknife.” July 28, 2019. http://arxiv.org/abs/1907.12116
Giordano, Ryan, Will Stephenson, Runjing Liu, Michael I. Jordan, and Tamara Broderick. 2019. “A Swiss Army Infinitesimal Jackknife.”
Golub, Gene H., Michael Heath, and Grace Wahba. 1979. “Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter.” Technometrics 21 (2): 215–23. https://doi.org/10.1080/00401706.1979.10489751
Hall, Peter, Jeff Racine, and Qi Li. 2004. “Cross-Validation and the Estimation of Conditional Probability Densities.” Journal of the American Statistical Association 99 (468): 1015–26. https://doi.org/10.1198/016214504000000548
Laan, Mark J. van der, Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). https://doi.org/10.2202/1544-6115.1309
Li, Ker-Chau. 1987. “Asymptotic Optimality for $C_p$, $C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set.” The Annals of Statistics 15 (3): 958–75. https://doi.org/10.1214/aos/1176350486
Perlich, Claudia, and Grzegorz Świrszcz. 2011. “On Cross-Validation and Stacking: Building Seemingly Predictive Models on Random Data.” ACM SIGKDD Explorations Newsletter 12 (2): 11–15. https://doi.org/10.1145/1964897.1964901
Polley, Eric, and Mark van der Laan. 2010. “Super Learner In Prediction.” U.C. Berkeley Division of Biostatistics Working Paper Series, May. https://biostats.bepress.com/ucbbiostat/paper266
Stone, M. 1977. “An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 44–47. https://doi.org/10.1111/j.2517-6161.1977.tb01603.x
Wood, S. 1994. “Monotonic Smoothing Splines Fitted by Cross Validation.” SIAM Journal on Scientific Computing 15 (5): 1126–33. https://doi.org/10.1137/0915069