Ensemble methods: mixing predictions from simple learners to get sophisticated predictions.
Fast to train, fast to use. Gets you results. May not get you answers. So, like neural networks but from the previous hype cycle.
Jeremy Kun: Why Boosting Doesn’t Overfit:
Boosting, which we covered in gruesome detail previously, has a natural measure of complexity represented by the number of rounds you run the algorithm for. Each round adds one additional “weak learner” weighted vote. So running for a thousand rounds gives a vote of a thousand weak learners. Despite this, boosting doesn’t overfit on many datasets. In fact, and this is a shocking fact, researchers observed that Boosting would hit zero training error, they kept running it for more rounds, and the generalization error kept going down! It seemed like the complexity could grow arbitrarily without penalty. … this phenomenon is a fact about voting schemes, not boosting in particular.
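To make "each round adds one additional weak learner weighted vote" concrete, here is a minimal sketch of AdaBoost with decision stumps in plain NumPy. It is not Kun's code; the function names (`fit_stump`, `adaboost`, `predict`), the toy dataset and the round counts are all assumptions made for illustration.

```python
import numpy as np

def stump_predict(X, j, t, s):
    """A decision stump: threshold feature j at t, with sign s in {-1, +1}."""
    return s * np.where(X[:, j] <= t, -1, 1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error (w sums to 1)."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            err = w[stump_predict(X, j, t, 1) != y].sum()
            # Flipping the sign of a stump turns error e into 1 - e.
            for s, e in ((1, err), (-1, 1.0 - err)):
                if e < best[3]:
                    best = (j, t, s, e)
    return best

def adaboost(X, y, rounds):
    """Run AdaBoost for `rounds` rounds; each round adds one weighted voter."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        j, t, s, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this learner's vote
        w = w * np.exp(-alpha * y * stump_predict(X, j, t, s))
        w /= w.sum()                             # upweight the mistakes
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of all weak learners."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data (an assumption): a diagonal boundary that no single stump can match.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(200, 2)), rng.normal(size=(1000, 2))
y_train = np.where(X_train.sum(axis=1) > 0, 1, -1)
y_test = np.where(X_test.sum(axis=1) > 0, 1, -1)

for rounds in (1, 10, 50):
    model = adaboost(X_train, y_train, rounds)
    train_err = np.mean(predict(model, X_train) != y_train)
    test_err = np.mean(predict(model, X_test) != y_test)
    print(rounds, train_err, test_err)
```

Printing the held-out error alongside the training error as the number of rounds grows is the experiment the quote describes. Whether the held-out error keeps falling after the training error reaches zero depends on the dataset, so treat this as something to try rather than a guaranteed outcome.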
In a different context, I’ve run into model averaging. How does this relate to voting algorithms?
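One small piece of the answer can be sketched directly. Averaging predicted probabilities and taking a majority vote over hard class labels are both ways of combining models, but they can disagree. The probabilities and weights below are made up for illustration, and `hard_vote` and `average` are hypothetical helper names.

```python
import numpy as np

# Made-up predicted probabilities of class 1 from three models (rows)
# for four inputs (columns).
probs = np.array([
    [0.90, 0.40, 0.45, 0.20],
    [0.60, 0.45, 0.45, 0.30],
    [0.10, 0.95, 0.90, 0.60],
])

def hard_vote(probs):
    """Each model votes for its most likely class; the majority wins."""
    votes = (probs > 0.5).astype(int)
    return (votes.mean(axis=0) > 0.5).astype(int)

def average(probs, weights=None):
    """Average the predicted probabilities, then threshold the average."""
    mean = np.average(probs, axis=0, weights=weights)
    return (mean > 0.5).astype(int)

print(hard_vote(probs))                        # [1 0 0 0]
print(average(probs))                          # [1 1 1 0]
print(average(probs, weights=[0.5, 0.4, 0.1])) # [1 0 0 0]
```

On the second and third inputs, one confident model outweighs two lukewarm ones under plain averaging but is outvoted under majority voting. Non-uniform weights, which in Bayesian model averaging would be posterior model probabilities (here they are simply assumed), change the outcome again.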
How do you phrase ensemble algorithms in a Bayesian context? If it were Bayesian model averaging, this would be easy, but what about when the learners are all ill-posed?
Random trees, forests, jungles
Balog, Matej, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, and Yee Whye Teh. 2016. “The Mondrian Kernel,” June. http://arxiv.org/abs/1606.05241.
Balog, Matej, and Yee Whye Teh. 2015. “The Mondrian Process for Machine Learning,” July. http://arxiv.org/abs/1507.05181.
Bickel, Peter J., Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, Teófilo Valdés, Carlos Rivero, Jianqing Fan, and Aad van der Vaart. 2006. “Regularization in Statistics.” Test 15 (2): 271–344. https://doi.org/10.1007/BF02607055.
Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–40. https://doi.org/10.1007/BF00058655.
Bühlmann, Peter, and Sara van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory and Applications. 2011 edition. Heidelberg ; New York: Springer.
Criminisi, Antonio, Jamie Shotton, and Ender Konukoglu. 2012. “Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning.” Foundations and Trends® in Computer Graphics and Vision 7 (2-3). https://doi.org/10.1561/0600000035.
Criminisi, A., J. Shotton, and E. Konukoglu. 2011. “Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning.” MSR-TR-2011-114. Microsoft Research. http://research.microsoft.com/apps/pubs/default.aspx?id=155552.
Díaz-Avalos, Carlos, P. Juan, and J. Mateu. 2012. “Similarity Measures of Conditional Intensity Functions to Test Separability in Multidimensional Point Processes.” Stochastic Environmental Research and Risk Assessment 27 (5): 1193–1205. https://doi.org/10.1007/s00477-012-0654-1.
Fernández-Delgado, Manuel, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. “Do We Need Hundreds of Classifiers to Solve Real World Classification Problems?” Journal of Machine Learning Research 15 (1): 3133–81. http://jmlr.org/papers/v15/delgado14a.html.
Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29 (5): 1189–1232. http://www.jstor.org/stable/2699986.
———. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis, Nonlinear Methods and Data Mining, 38 (4): 367–78. https://doi.org/10.1016/S0167-9473(01)00065-2.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2000. “Additive Logistic Regression: A Statistical View of Boosting (with Discussion and a Rejoinder by the Authors).” The Annals of Statistics 28 (2): 337–407. https://doi.org/10.1214/aos/1016218223.
Gall, J., and V. Lempitsky. 2013. “Class-Specific Hough Forests for Object Detection.” In Decision Forests for Computer Vision and Medical Image Analysis, edited by A. Criminisi and J. Shotton, 143–57. Advances in Computer Vision and Pattern Recognition. Springer London. http://www.iai.uni-bonn.de/~gall/download/jgall_houghforest_cvpr09.pdf.
Johnson, R., and Tong Zhang. 2014. “Learning Nonlinear Functions Using Regularized Greedy Forest.” IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (5): 942–54. https://doi.org/10.1109/TPAMI.2013.159.
Lakshminarayanan, Balaji, Daniel M Roy, and Yee Whye Teh. 2014. “Mondrian Forests: Efficient Online Random Forests.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3140–8. Curran Associates, Inc. http://papers.nips.cc/paper/5234-mondrian-forests-efficient-online-random-forests.pdf.
Rahimi, Ali, and Benjamin Recht. 2009. “Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In Advances in Neural Information Processing Systems, 1313–20. Curran Associates, Inc. http://papers.nips.cc/paper/3495-weighted-sums-of-random-kitchen-sinks-replacing-minimization-with-randomization-in-learning.
Schapire, Robert E., Yoav Freund, Peter Bartlett, and Wee Sun Lee. 1998. “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods.” The Annals of Statistics 26 (5): 1651–86. https://doi.org/10.1214/aos/1024691352.
Scornet, Erwan. 2014. “On the Asymptotics of Random Forests,” September. http://arxiv.org/abs/1409.2090.
Scornet, Erwan, Gérard Biau, and Jean-Philippe Vert. 2014. “Consistency of Random Forests,” May. http://arxiv.org/abs/1405.2881.
Shotton, Jamie, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, and Antonio Criminisi. 2013. “Decision Jungles: Compact and Rich Models for Classification.” In NIPS. http://research.microsoft.com/apps/pubs/default.aspx?id=205439.