Model averaging

On keeping many incorrect hypotheses and using them all as one goodish one

Train a bunch of different models and use them all. Fashionable in the form of blending, stacking or staging in machine learning competitions, but also popular in classic inference.
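As a rough illustration of the blending idea, here is a minimal sketch that fits a single weight for combining two models' predictions by least squares. Everything in it is an assumption for illustration: the function names are hypothetical, the numbers are made up, and the two "models" are just lists of predictions rather than anything trained.

```python
def fit_blend_weight(preds_a, preds_b, targets):
    """Least-squares weight w for the blend w * a + (1 - w) * b, clipped to [0, 1]."""
    num = sum((a - b) * (y - b) for a, b, y in zip(preds_a, preds_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    if den == 0.0:
        return 0.5  # identical predictions: every weight gives the same blend
    return min(1.0, max(0.0, num / den))


def blend(preds_a, preds_b, weight):
    """Convex combination of two prediction vectors."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(preds_a, preds_b)]


def mean_squared_error(preds, targets):
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


# Hypothetical held-out targets and predictions from two imperfect models.
holdout_targets = [1.0, 2.0, 3.0, 4.0]
model_a_preds = [1.5, 2.5, 3.5, 4.5]  # biased high by 0.5
model_b_preds = [0.0, 1.0, 2.0, 3.0]  # biased low by 1.0

w = fit_blend_weight(model_a_preds, model_b_preds, holdout_targets)
blended = blend(model_a_preds, model_b_preds, w)

print("blend weight on model A:", round(w, 4))
print("MSE of A, B, blend:",
      mean_squared_error(model_a_preds, holdout_targets),
      mean_squared_error(model_b_preds, holdout_targets),
      mean_squared_error(blended, holdout_targets))
```

The weight should be fitted on predictions for data the base models did not train on; stacking in the sense of Wolpert (1992) and Breiman (1996) gets such predictions by cross-validation. Fitting and evaluating on the same points, as this toy does, flatters the blend.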

A mere placeholder. For now, see Ensemble learning on Wikipedia. I’ve seen the idea pop up in disconnected areas recently. Specifically: a Bayesian heuristic for dropout in neural nets, AIC for frequentist model averaging, neural net ensembles, boosting/bagging, and in a statistical learning context for optimal time series prediction.
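For the AIC flavour of frequentist model averaging, the usual recipe (as I understand it, in the style of Buckland, Burnham and Augustin 1997) is to turn AIC differences into Akaike weights, $w_i \propto \exp(-\Delta_i / 2)$ with $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$, and then average the per-model estimates with those weights. A minimal sketch, where the AIC values and predictions are invented and the function names are hypothetical:

```python
import math


def akaike_weights(aic_values):
    """Normalised Akaike weights: w_i proportional to exp(-0.5 * (AIC_i - min AIC))."""
    best = min(aic_values)
    # Subtracting the minimum keeps the exponentials in a safe numeric range.
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_average(values, weights):
    """Model-averaged estimate: weighted sum of the per-model estimates."""
    return sum(w * v for w, v in zip(weights, values))


# Made-up AIC scores and point predictions for three candidate models.
aic_values = [100.0, 102.0, 110.0]
model_predictions = [1.0, 1.5, 3.0]

weights = akaike_weights(aic_values)
averaged = weighted_average(model_predictions, weights)

print("Akaike weights:", [round(w, 4) for w in weights])
print("model-averaged prediction:", round(averaged, 4))
```

The model with the lowest AIC gets the largest weight, and a model that is worse by 2 AIC units gets $e^{-1}$ times the weight of the best one.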

This vexingly incomplete article points out that something like model averaging might work for any convex loss thanks to Jensen’s inequality. I am most used to it with K-L loss.
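To spell out the Jensen step: if the loss $\ell(\hat{y}, y)$ is convex in the prediction $\hat{y}$, and the weights $w_i \ge 0$ sum to one, then $\ell\left(\sum_i w_i \hat{y}_i, y\right) \le \sum_i w_i \ell(\hat{y}_i, y)$. The averaged prediction is therefore never worse than the weighted average of the individual losses. A small numerical check with squared error, using made-up predictions and a made-up target:

```python
def squared_error(prediction, target):
    """Squared-error loss, which is convex in the prediction."""
    return (prediction - target) ** 2


def averaged_prediction(predictions, weights):
    """Convex combination of the individual predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


def jensen_gap(predictions, weights, target, loss=squared_error):
    """Weighted mean of individual losses minus the loss of the averaged prediction.

    Non-negative whenever `loss` is convex in the prediction and the
    weights are non-negative and sum to one.
    """
    mean_loss = sum(w * loss(p, target) for w, p in zip(weights, predictions))
    loss_of_mean = loss(averaged_prediction(predictions, weights), target)
    return mean_loss - loss_of_mean


# Hypothetical predictions from three models for a single target value.
predictions = [1.0, 2.0, 4.0]
weights = [1 / 3, 1 / 3, 1 / 3]
target = 2.5

gap = jensen_gap(predictions, weights, target)
print(f"Jensen gap: {gap:.4f}")
```

For squared error the gap is the weighted variance of the predictions, so more diverse models give a bigger win over the average member. The guarantee is only against the average member: the best single model can still beat the ensemble.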


Bates, J. M., and C. W. J. Granger. 1969. “The Combination of Forecasts.” Journal of the Operational Research Society 20 (4): 451–68.
Breiman, Leo. 1996. “Stacked Regressions.” Machine Learning 24 (1): 49–64.
Buckland, S. T., K. P. Burnham, and N. H. Augustin. 1997. “Model Selection: An Integral Part of Inference.” Biometrics 53 (2): 603–18.
Claeskens, Gerda, and Nils Lid Hjort. 2008. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge; New York: Cambridge University Press.
Clyde, Merlise, and Edward I. George. 2004. “Model Uncertainty.” Statistical Science 19 (1): 81–94.
Fragoso, Tiago M., and Francisco Louzada Neto. 2015. “Bayesian Model Averaging: A Systematic Review and Conceptual Classification.” September 29, 2015.
Hansen, Bruce E. 2007. “Least Squares Model Averaging.” Econometrica 75 (4): 1175–89.
He, Bobby, Balaji Lakshminarayanan, and Yee Whye Teh. 2020. “Bayesian Deep Ensembles via the Neural Tangent Kernel.” In Advances in Neural Information Processing Systems. Vol. 33.
Hinne, Max, Quentin Frederik Gronau, Don van den Bergh, and Eric-Jan Wagenmakers. 2019. “A Conceptual Introduction to Bayesian Model Averaging.” Preprint. PsyArXiv.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. 2015. “Distilling the Knowledge in a Neural Network.” March 9, 2015.
Hjort, Nils Lid, and Gerda Claeskens. 2003. “Frequentist Model Average Estimators.” Journal of the American Statistical Association 98 (464): 879–99.
Hoeting, Jennifer A., David Madigan, Adrian E. Raftery, and Chris T. Volinsky. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14 (4): 382–417.
Hu, Feifang, and James V. Zidek. 2002. “The Weighted Likelihood.” The Canadian Journal of Statistics / La Revue Canadienne de Statistique 30 (3): 347–71.
Laan, Mark J. van der, Eric C Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).
Lawless, J. F., and Marc Fredette. 2005. “Frequentist Prediction Intervals and Predictive Distributions.” Biometrika 92 (3): 529–42.
Le, Tri, and Bertrand Clarke. 2017. “A Bayes Interpretation of Stacking for $\mathcal{M}$-Complete and $\mathcal{M}$-Open Settings.” Bayesian Analysis 12 (3): 807–29.
Leung, G., and A. R. Barron. 2006. “Information Theory and Mixing Least-Squares Regressions.” IEEE Transactions on Information Theory 52 (8): 3396–3410.
Phillips, Robert F. 1987. “Composite Forecasting: An Integrated Approach and Optimality Reconsidered.” Journal of Business & Economic Statistics 5 (3): 389–95.
Piironen, Juho, and Aki Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27 (3): 711–35.
Polley, Eric, and Mark van der Laan. 2010. “Super Learner In Prediction.” U.C. Berkeley Division of Biostatistics Working Paper Series, May.
Shen, Xiaotong, and Hsin-Cheng Huang. 2006. “Optimal Model Assessment, Selection, and Combination.” Journal of the American Statistical Association 101 (474): 554–68.
Wang, Haiying, Xinyu Zhang, and Guohua Zou. 2009. “Frequentist Model Averaging Estimation: A Review.” Journal of Systems Science and Complexity 22 (4): 732.
Wolpert, David H. 1992. “Stacked Generalization.” Neural Networks 5 (2): 241–59.
Zhang, Xinyu, and Hua Liang. 2011. “Focused Information Criterion and Model Averaging for Generalized Additive Partial Linear Models.” The Annals of Statistics 39 (1): 174–200.
