Tracking experiments in machine learning

Experiment tracking, specialised for ML and in particular neural nets. ML experiments typically involve a long optimisation process, and we care a lot about the whole trajectory of that optimisation and the various metrics calculated along it. This is the nuts-and-bolts end of how we allow for reproducibility in AI, even while we have a complicated model-fitting process and many long, slow computation steps, and even as the code changes through a complicated development process.

Neptune reviews a few options including their own product.


If it is just a few configuration parameters that need tracking, then we do not need to be too fancy. Hydra, for example, allows us to store the configuration alongside the output data. We can probably output other useful metrics too. But if we want to do some non-trivial analytics on our massive NN then more elaborate tools are needed, and a nice way of visualising them, and dynamically updating them… You could possibly pump them into some nice data store and a data dashboard, but at this point, most people discover they have reinvented tensorboard and just switch to that.
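To see how little that roll-your-own approach takes (and why it tends to grow until it becomes tensorboard), here is a minimal sketch: one directory per run, the configuration saved as JSON, and per-step metrics appended to a JSON-lines file. All names and the directory layout here are illustrative, not any particular tool's convention.

```python
import json
import pathlib
import time


def log_run(run_dir, config, metric_stream):
    """A hypothetical minimal tracker: store the run's config plus a
    JSON-lines stream of per-step metrics, one record per step."""
    run_dir = pathlib.Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    # Keep the configuration next to the outputs, Hydra-style.
    (run_dir / "config.json").write_text(json.dumps(config))
    with open(run_dir / "metrics.jsonl", "w") as f:
        for step, metrics in enumerate(metric_stream):
            record = {"step": step, "time": time.time(), **metrics}
            f.write(json.dumps(record) + "\n")


# Usage: pretend these are losses from an optimisation loop.
log_run(
    "runs/demo",
    config={"lr": 0.01, "layers": 3},
    metric_stream=({"loss": 1.0 / (s + 1)} for s in range(5)),
)
```

The JSON-lines file is trivially loadable into pandas or similar for ad hoc plots, which is exactly the point where the dashboarding urge strikes.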


Tensorboard is the de facto standard debugging/tracking tool. It is easy-ish to install, hard to modify and works well enough (but not on our internal cluster). It is a handy tool to have around, handy enough that I gave it its own page.

Weights and biases

Weights & Biases is a full-featured model-training-tracking system using a third-party host. In practice, by shunting stuff off to the cloud it is a little easier than tensorboard, and the SDKs for various platforms abstract away some boring details.

Documentation is here. For my purposes the most handy entry point is their Experiment Tracking.

Track and visualize experiments in real time, compare baselines, and iterate quickly on ML projects

Use the wandb Python library to track machine learning experiments with a few lines of code. If you’re using a popular framework like PyTorch or Keras, we have lightweight integrations.

You can then review the results in an interactive dashboard or export your data to Python for programmatic access using our Public API.

Neptune bills itself as a Metadata Store for MLOps. It has many team collaboration features. TBD.


For Julia there is DrWatson, which automatically attaches code versions to simulations and does some other work to generally keep simulations tracked and reproducible.

DrWatson is a scientific project assistant software. Here is what it can do:

  • Project Setup: A universal project structure and functions that allow you to consistently and robustly navigate through your project, no matter where it is located.
  • Naming Simulations: A robust and deterministic scheme for naming and handling your containers.
  • Saving Tools: Tools for safely saving and loading your data, tagging the Git commit ID to your saved files, safety when tagging with dirty repos, and more.
  • Running & Listing Simulations: Tools for producing tables of existing simulations/data, adding new simulation results to the tables, preparing batch parameter containers, and more.

See the DrWatson Workflow Tutorial page to get a quick overview of all of these functionalities.



DVC, the data-versioning tool, can apparently do some useful experiment tracking. TBD


A dual problem to experiment tracking is experiment configuration: how do I even set up those parameters in the first place?
