Arora, Sanjeev, and Yi Zhang. 2021. “Rip van Winkle’s Razor: A Simple Estimate of Overfit to Test Data.” arXiv:2102.13189 [Cs, Stat].
Blum, Avrim, and Moritz Hardt. 2015. “The Ladder: A Reliable Leaderboard for Machine Learning Competitions.” arXiv:1502.04585 [Cs].
Brockman, Greg, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. “OpenAI Gym.” arXiv:1606.01540 [Cs].
Fleming, Philip J., and John J. Wallace. 1986. “How Not to Lie with Statistics: The Correct Way to Summarize Benchmark Results.” Communications of the ACM 29 (3): 218–21.
Geirhos, Robert, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. 2020. “Shortcut Learning in Deep Neural Networks.” arXiv:2004.07780 [Cs, q-Bio].
Hutson, Matthew. 2022. “Taught to the Test.” Science 376 (6593): 570–73.
Hyndman, Rob J. 2020. “A Brief History of Forecasting Competitions.” International Journal of Forecasting, M4 Competition, 36 (1): 7–14.
Kistowski, Jóakim v., Jeremy A. Arnold, Karl Huppler, Klaus-Dieter Lange, John L. Henning, and Paul Cao. 2015. “How to Build a Benchmark.” In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 333–36. ICPE ’15. New York, NY, USA: Association for Computing Machinery.
Lathuilière, Stéphane, Pablo Mesejo, Xavier Alameda-Pineda, and Radu Horaud. 2020. “A Comprehensive Analysis of Deep Regression.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (9): 2065–81.
Makridakis, Spyros, Evangelos Spiliotis, and Vassilios Assimakopoulos. 2020. “The M4 Competition: 100,000 Time Series and 61 Forecasting Methods.” International Journal of Forecasting, M4 Competition, 36 (1): 54–74.
Musgrave, Kevin, Serge Belongie, and Ser-Nam Lim. 2020. “A Metric Learning Reality Check.” arXiv:2003.08505 [Cs].
Mytkowicz, Todd, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2009. “Producing Wrong Data Without Doing Anything Obviously Wrong!” In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 265–76. ASPLOS XIV. New York, NY, USA: Association for Computing Machinery.
Olson, Randal S., William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. “PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison.” BioData Mining 10 (1): 36.