The computationally expensive default option when your model has no obvious shortcuts for complexity regularization, for example when AIC cannot be shown to work.
To learn: how this interacts with Bayesian inference.
Basic Cross Validation
Generalised Cross Validation
🏗 Hat matrix, smoother matrix. Define the hat matrix and note the comparative computational efficiency; a provisional definition and sketch follow.
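Filling in the definition provisionally: for a linear smoother, the fitted values are $\hat{y} = H(\lambda) y$, and $H(\lambda)$ is the hat matrix (a.k.a. smoother matrix). Leave-one-out CV for such models has a well-known shortcut in terms of the leverages $H_{ii}$; generalised cross-validation (Golub, Heath, and Wahba 1979) replaces each $H_{ii}$ with the average $\operatorname{tr}(H)/n$, giving

$$\operatorname{GCV}(\lambda)=\frac{\tfrac{1}{n}\lVert(I-H(\lambda))y\rVert^{2}}{\bigl(\tfrac{1}{n}\operatorname{tr}(I-H(\lambda))\bigr)^{2}},$$

which needs one fit per candidate $\lambda$ rather than $n$ refits, hence the comparative computational efficiency. A minimal sketch for ridge regression, where $H(\lambda)=X(X^\top X+\lambda I)^{-1}X^\top$ (assuming numpy; the helper name `ridge_gcv` is my own):

```python
import numpy as np

def ridge_gcv(X, y, lambdas):
    """Choose a ridge penalty by GCV. Sketch only: assumes a dense
    n x p design X (already centred/scaled) and a 1-d response y."""
    n = X.shape[0]
    # One thin SVD makes tr(H(lam)) and the residuals cheap for every lam.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    scores = []
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)        # eigenvalues of H(lam)
        resid = y - U @ (shrink * Uty)      # (I - H(lam)) y
        edf = shrink.sum()                  # tr(H(lam)), the effective dof
        scores.append((resid @ resid / n) / (1.0 - edf / n) ** 2)
    return lambdas[int(np.argmin(scores))]
```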
Cross-methods, such as cross-validation and cross-prediction, are effective tools for many applications in machine learning, statistics, and data science. They are useful for parameter selection, model selection, impact/target encoding of high-cardinality variables, stacking models, and super learning. Because cross-methods simulate access to an out-of-sample data set of the same size as the original data, they are more statistically efficient (lower variance) than partitioning the data into training/calibration/holdout sets. However, cross-methods do not satisfy the full exchangeability conditions that true hold-out methods enjoy, which introduces additional statistical trade-offs beyond the obvious increase in computational cost.
Specifically, cross-methods can introduce an information leak into the modeling process.
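To make the mechanism concrete, here is a minimal sketch of cross-prediction, assuming numpy and a scikit-learn-style fit/predict interface; the helper name `cross_predict` is my own. Every observation receives a prediction from a model that was fit without it, which is the shared ingredient of stacking, super learning, and out-of-fold target encoding.

```python
import numpy as np

def cross_predict(model_factory, X, y, n_splits=5, seed=0):
    """Out-of-fold predictions: y_hat[i] comes from a model that was
    fit without observation i. model_factory() must return an object
    with scikit-learn-style fit(X, y) / predict(X) methods."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_splits   # balanced random folds
    y_hat = np.empty(len(y), dtype=float)
    for k in range(n_splits):
        train, test = fold != k, fold == k
        model = model_factory()
        model.fit(X[train], y[train])
        y_hat[test] = model.predict(X[test])
    return y_hat
```

For instance, `cross_predict(LinearRegression, X, y)` would give out-of-fold predictions one could stack as a feature; reusing those predictions for further selection on the same data is where the kind of leak documented by Perlich and Świrszcz (2011) creeps in.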
References

Andrews, Donald W. K. 1991. “Asymptotic Optimality of Generalized $C_L$, Cross-Validation, and Generalized Cross-Validation in Regression with Heteroskedastic Errors.” Journal of Econometrics 47 (2): 359–77. https://doi.org/10.1016/0304-4076(91)90107-O.
Giordano, Ryan, Michael I. Jordan, and Tamara Broderick. 2019. “A Higher-Order Swiss Army Infinitesimal Jackknife,” July. http://arxiv.org/abs/1907.12116.
Giordano, Ryan, Will Stephenson, Runjing Liu, Michael I. Jordan, and Tamara Broderick. 2019. “A Swiss Army Infinitesimal Jackknife.” In AISTATS. http://arxiv.org/abs/1806.00550.
Golub, Gene H., Michael Heath, and Grace Wahba. 1979. “Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter.” Technometrics 21 (2): 215–23. https://doi.org/10.1080/00401706.1979.10489751.
Hall, Peter, Jeff Racine, and Qi Li. 2004. “Cross-Validation and the Estimation of Conditional Probability Densities.” Journal of the American Statistical Association 99 (468): 1015–26. https://doi.org/10.1198/016214504000000548.
Laan, Mark J. van der, Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). https://doi.org/10.2202/1544-6115.1309.
Li, Ker-Chau. 1987. “Asymptotic Optimality for $C_p, C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set.” The Annals of Statistics 15 (3): 958–75. https://doi.org/10.1214/aos/1176350486.
Perlich, Claudia, and Grzegorz Świrszcz. 2011. “On Cross-Validation and Stacking: Building Seemingly Predictive Models on Random Data.” ACM SIGKDD Explorations Newsletter 12 (2): 11–15. https://doi.org/10.1145/1964897.1964901.
Polley, Eric, and Mark van der Laan. 2010. “Super Learner in Prediction.” U.C. Berkeley Division of Biostatistics Working Paper Series, May. https://biostats.bepress.com/ucbbiostat/paper266.
Stone, M. 1977. “An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 44–47. http://www.stat.washington.edu/courses/stat527/s14/readings/Stone1977.pdf.
Wood, S. 1994. “Monotonic Smoothing Splines Fitted by Cross Validation.” SIAM Journal on Scientific Computing 15 (5): 1126–33. https://doi.org/10.1137/0915069.