Big data ML best practice


A grab bag of links I have found pragmatically useful in the topsy-turvy world of ML research, where even though we have big data about the world, we still have only small data about our own experimental models of it, because those models are so computationally expensive to run.

See also Surrogate optimisation of experiments.

Martin Zinkevich’s Rules of ML for engineers, and Google’s broad-brush workflow overview. Andrej Karpathy’s Recipe for training neural networks. Zayd Enam on why debugging machine learning is hard.
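One cheap sanity check from Karpathy's recipe is to verify the loss at initialisation: a softmax classifier whose final layer is sensibly initialised should start near -log(1/n_classes), because its predictions are close to uniform. The sketch below is a minimal, standard-library-only illustration of that check. It assumes a hypothetical "well-initialised" final layer that emits tiny random logits, and the helper names are illustrative rather than taken from the recipe.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    # Negative log-probability assigned to the true class.
    return -math.log(softmax(logits)[label])


def mean_loss_at_init(n_classes, n_examples=1000, scale=1e-3, seed=0):
    # Hypothetical stand-in for a freshly initialised model: logits drawn
    # from N(0, scale^2), labels drawn uniformly at random.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_examples):
        logits = [rng.gauss(0.0, scale) for _ in range(n_classes)]
        label = rng.randrange(n_classes)
        total += cross_entropy(logits, label)
    return total / n_examples


n_classes = 10
expected = -math.log(1.0 / n_classes)  # log(10), about 2.303
observed = mean_loss_at_init(n_classes)
print(f"expected {expected:.3f}, observed {observed:.3f}")
assert abs(observed - expected) < 0.01, "loss at init is off; check the final layer"
```

With a badly scaled initialisation (try `scale=10.0`) the starting loss lands far above log(10). That kind of surprise is what the check is meant to catch before you spend compute on a full run.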

Jeremy Jordan on writing tests for ML.
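Roughly in the spirit of the pre-training checks Jordan describes, here is a minimal pytest-style sketch of a few tests you can run before committing to an expensive training job. The checks confirm that the output shape matches the labels, that a single gradient step lowers the loss, and that the model can overfit a tiny batch. The linear model, data generator and thresholds are hypothetical stand-ins for a real pipeline, and the code uses only the standard library.

```python
import random


def predict(w, b, xs):
    # Linear model: one scalar prediction per input row.
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]


def mse(w, b, xs, ys):
    # Mean squared error over the batch.
    preds = predict(w, b, xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def gd_step(w, b, xs, ys, lr=0.1):
    # One full-batch gradient-descent step on the MSE loss.
    n = len(ys)
    errs = [p - y for p, y in zip(predict(w, b, xs), ys)]
    grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errs, xs)) for j in range(len(w))]
    grad_b = 2.0 / n * sum(errs)
    return [wj - lr * gj for wj, gj in zip(w, grad_w)], b - lr * grad_b


def make_batch(n=16, d=3, seed=0):
    # Hypothetical toy data: noiseless targets from a random linear function,
    # so a perfect fit exists and the model should be able to memorise it.
    rng = random.Random(seed)
    true_w = [rng.uniform(-2, 2) for _ in range(d)]
    xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    ys = [sum(t * x for t, x in zip(true_w, row)) + 0.3 for row in xs]
    return xs, ys


def test_output_shape():
    # One prediction per label.
    xs, ys = make_batch()
    assert len(predict([0.0] * 3, 0.0, xs)) == len(ys)


def test_single_step_reduces_loss():
    # A single gradient step on a batch should lower the loss on that batch.
    xs, ys = make_batch()
    w, b = [0.0] * 3, 0.0
    before = mse(w, b, xs, ys)
    w, b = gd_step(w, b, xs, ys)
    assert mse(w, b, xs, ys) < before


def test_can_overfit_small_batch():
    # If the model cannot memorise 16 noiseless points, something is broken.
    xs, ys = make_batch()
    w, b = [0.0] * 3, 0.0
    for _ in range(5000):
        w, b = gd_step(w, b, xs, ys, lr=0.2)
    assert mse(w, b, xs, ys) < 1e-6


if __name__ == "__main__":
    test_output_shape()
    test_single_step_reduces_loss()
    test_can_overfit_small_batch()
    print("all checks passed")
```

You can run the file directly, or point pytest at it so that pytest collects the `test_` functions. On a real model you would replace `gd_step` with your framework's own training step and keep the assertions. These tests are cheap, which matters when full training runs are the scarce resource.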

The Turing Way, by the Alan Turing Institute, covers many reproducible-research and open-notebook-science ideas, including some tips applicable to ML research.

Figure: Jeremy Jordan’s test-ML pipeline.

Tools

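gin-config is a lightweight configuration framework for Python. It lets you set the default parameters of functions and classes from config files, which suits ML experiments with many knobs.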
