Reproducibility in Machine Learning research

2024-05-05 — 2024-05-05

How does reproducible science happen for ML models? How can we responsibly communicate how the latest sexy paper is likely to work in practice?

1 Difficulties of foundation models in particular

Reproducibility gets harder when the model is both too large and too secretive to be interrogated. TBD

2 Connection to domain adaptation

How do we know that our models generalize to data in the wild? See Domain adaptation.

3 Benchmarks

See ML benchmarks.

4 Incoming

Reproducibility Checklist - AAAI

REFORMS: Reporting standards for ML-based science:

The REFORMS checklist consists of 32 items across 8 sections. It is based on an extensive review of the pitfalls and best practices in adopting ML methods. We created an accompanying set of guidelines for each item in the checklist. We include expectations about what it means to address the item sufficiently. To aid researchers new to ML-based science, we identify resources and relevant past literature.

The REFORMS checklist differs from the large body of past work on checklists in two crucial ways. First, we aimed to make our reporting standards field-agnostic, so that they can be used by researchers across fields. To that end, the items in our checklist broadly apply across fields that use ML methods. Second, past checklists for ML methods research focus on reproducibility issues that arise commonly when developing ML methods. But these issues differ from the ones that arise in scientific research. Still, past work on checklists in both scientific research and ML methods research has helped inform our checklist.

Syntheses of the state of play appear from time to time, e.g. Albertoni et al. (2023) and Pineau et al. (2020).

5 References

Albertoni, Colantonio, Skrzypczyński, et al. 2023. “Reproducibility of Machine Learning: Terminology, Recommendations and Open Issues.”

Bender, Gebru, McMillan-Major, et al. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.

Bommasani, Hudson, Adeli, et al. 2022. “On the Opportunities and Risks of Foundation Models.”

Crawford. 2021. The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence.

Gibney. 2019. “This AI Researcher Is Trying to Ward Off a Reproducibility Crisis.” Nature.

Hardt. 2025. The Emerging Science of Machine Learning Benchmarks.

Kellogg, Valentine, and Christin. 2020. “Algorithms at Work: The New Contested Terrain of Control.” Academy of Management Annals.

Liang, Tadesse, Ho, et al. 2022. “Advances, Challenges and Opportunities in Creating Data for Trustworthy AI.” Nature Machine Intelligence.

Liu, Miao, Zhan, et al. 2019. “Large-Scale Long-Tailed Recognition in an Open World.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR ’19.

Madaio, Stark, Wortman Vaughan, et al. 2020. “Co-Designing Checklists to Understand Organizational Challenges and Opportunities Around Fairness in AI.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. CHI ’20.

Mitchell, Shira, Potash, Barocas, et al. 2021. “Algorithmic Fairness: Choices, Assumptions, and Definitions.” Annual Review of Statistics and Its Application.

Mitchell, Margaret, Wu, Zaldivar, et al. 2019. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency. FAT* ’19.

Pineau, Vincent-Lamarre, Sinha, et al. 2020. “Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program).”

Pushkarna, Zaldivar, and Kjartansson. 2022. “Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI.” In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. FAccT ’22.

Raji, Bender, Paullada, et al. 2021. “AI and the Everything in the Whole Wide World Benchmark.”

Raji, Smart, White, et al. 2020. “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* ’20.

Wang, Fu, Du, et al. 2023. “Scientific Discovery in the Age of Artificial Intelligence.” Nature.