Tracking experiments in machine learning

Experiment tracking, specialised for ML and in particular for neural nets. This is the nuts-and-bolts end of reproducibility in AI: keeping track of what produced which result, even though model fitting is complicated, computations are long and slow, and the code itself changes through a messy development process.

Neptune reviews a few options including their own product.


If it is just a few configuration parameters that need tracking, then we do not need to be too fancy. hydra, for example, allows us to store the configuration with the output data, and we can probably write out other useful metrics too. But if we want to do complex analytics on our massive NN, then more elaborate tools are needed.
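hydra's own API is out of scope here, but the principle it automates — serialise the exact configuration next to the run's outputs, so every result is traceable to the parameters that produced it — can be sketched with the standard library alone. Filenames and parameter names below are illustrative, not hydra conventions:

```python
import json
from pathlib import Path

def run_experiment(config: dict, outdir: Path) -> None:
    """Save the exact config beside the outputs so the run is reproducible."""
    outdir.mkdir(parents=True, exist_ok=True)
    # Persist the configuration first, so even a crashed run is identifiable.
    (outdir / "config.json").write_text(json.dumps(config, indent=2))
    # ... fit the model here ...
    metrics = {"loss": 0.123}  # placeholder metric
    (outdir / "metrics.json").write_text(json.dumps(metrics, indent=2))

run_experiment({"lr": 1e-3, "layers": 4}, Path("runs/demo"))
```

Later, pairing any `metrics.json` with its sibling `config.json` recovers exactly which parameters produced which numbers.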


NB: The last time I configured tensorboard manually was 2019; this advice may be out of date.

Tensorboard is the de facto standard debugging/tracking tool. It is easy-ish to install, hard to modify, and works well enough (though not on our internal cluster). I recommend reading Li Yin’s explanation.

I tend to find it slightly annoying; it clutters up with crap pretty easily.


tensorboard --logdir=path/to/log-directory

or, more usually,

tensorboard --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2 --host=localhost

or, lazily (in bash),

tensorboard --logdir=$(ls -dm *.logs |tr -d ' \n\r') --host=localhost


or, in fish,

tensorboard --logdir=(string join , (for f in *.logs; echo (basename $f .logs):$f; end)) --host=localhost

In fact, that sometimes does not work so well for me; Tensorboard reeeeally wants me to specify my folder names explicitly, so I script it:

#!/bin/env python3

from pathlib import Path
from subprocess import run

p = Path('./')

# Build `name:path` pairs from every `*.logs` directory,
# e.g. `experiment1:experiment1.logs`.
logdirstring = '--logdir=' + ','.join([
    str(d)[:-5] + ":" + str(d)
    for d in p.glob('*.logs')
])

proc = run(
    ['tensorboard', logdirstring, '--host=localhost']
)
  • Projector visualises embeddings:

    TensorBoard has a built-in visualizer, called the Embedding Projector, for interactive visualization and analysis of high-dimensional data like embeddings. It is meant to be useful for developers and researchers alike. It reads from the checkpoint files where you save your tensorflow variables. Although it’s most useful for embeddings, it will load any 2D tensor, potentially including your training weights.

Weights and Biases

A full-featured model-training-tracking system using a third-party host. In practice, shunting stuff off to the cloud makes it a little easier than Tensorboard, and the SDKs for various platforms abstract away some boring details.

For my purposes the most handy entry point is their Experiment Tracking.

Track and visualize experiments in real time, compare baselines, and iterate quickly on ML projects

Use the wandb Python library to track machine learning experiments with a few lines of code. If you’re using a popular framework like PyTorch or Keras, we have lightweight integrations.

You can then review the results in an interactive dashboard or export your data to Python for programmatic access using our Public API.

Neptune bills itself as a Metadata Store for MLOps.


For Julia there is DrWatson, which automatically attaches code versions to simulations and does some other work to generally keep simulations tracked and reproducible.

DrWatson is a scientific project assistant software. Here is what it can do:

  • Project Setup : A universal project structure and functions that allow you to consistently and robustly navigate through your project, no matter where it is located.
  • Naming Simulations : A robust and deterministic scheme for naming and handling your containers.
  • Saving Tools : Tools for safely saving and loading your data, tagging the Git commit ID to your saved files, safety when tagging with dirty repos, and more.
  • Running & Listing Simulations: Tools for producing tables of existing simulations/data, adding new simulation results to the tables, preparing batch parameter containers, and more.

See the DrWatson Workflow Tutorial page to get a quick overview over all of these functionalities.
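DrWatson is Julia-only, but its deterministic naming idea translates directly to other languages. A hypothetical Python analogue of its `savename` (the function name, separator, and suffix here are my own choices, not DrWatson's API) might look like:

```python
def savename(params: dict, suffix: str = "json") -> str:
    """Deterministic filename from a parameter dict, in the spirit of
    DrWatson's `savename`: keys are sorted, so the same parameters
    always map to the same name, however the dict was constructed."""
    parts = [f"{k}={params[k]}" for k in sorted(params)]
    return "_".join(parts) + "." + suffix

print(savename({"beta": 0.5, "alpha": 2}))  # alpha=2_beta=0.5.json
```

The sorting is the important part: it makes the name a pure function of the parameters, so a simulation run can be located (or deduplicated) by its configuration alone.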



As seen in data versioning. TBD



Is this still current?

ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developer and data scientist workflows. MLMD is an integral part of TensorFlow Extended (TFX), but is designed so that it can be used independently.

Every run of a production ML pipeline generates metadata containing information about the various pipeline components, their executions (e.g. training runs), and resulting artifacts (e.g. trained models). In the event of unexpected pipeline behavior or errors, this metadata can be leveraged to analyze the lineage of pipeline components and debug issues. Think of this metadata as the equivalent of logging in software development.

MLMD helps you understand and analyze all the interconnected parts of your ML pipeline instead of analyzing them in isolation and can help you answer questions about your ML pipeline such as:

  • Which dataset did the model train on?
  • What were the hyperparameters used to train the model?
  • Which pipeline run created the model?
  • Which training run led to this model?
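MLMD's actual API and schema are richer than this, but the core lineage record — artifacts, executions, and the input/output edges between them — fits in a few relational tables. A toy stdlib sketch (table and column names are mine, not MLMD's) of answering the first question above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE artifact  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE execution (id INTEGER PRIMARY KEY, name TEXT);
    -- direction is 'input' (artifact -> execution) or 'output'
    CREATE TABLE event (artifact_id INT, execution_id INT, direction TEXT);
""")
con.executemany("INSERT INTO artifact VALUES (?, ?)",
                [(1, "dataset-v3"), (2, "model-v1")])
con.execute("INSERT INTO execution VALUES (1, 'training-run-42')")
con.executemany("INSERT INTO event VALUES (?, ?, ?)",
                [(1, 1, "input"), (2, 1, "output")])

# "Which dataset did the model train on?" -- walk the lineage backwards,
# from the model, through the execution that output it, to its inputs.
row = con.execute("""
    SELECT a_in.name FROM artifact a_out
    JOIN event e_out ON e_out.artifact_id = a_out.id
                    AND e_out.direction = 'output'
    JOIN event e_in  ON e_in.execution_id = e_out.execution_id
                    AND e_in.direction = 'input'
    JOIN artifact a_in ON a_in.id = e_in.artifact_id
    WHERE a_out.name = 'model-v1'
""").fetchone()
print(row[0])  # dataset-v3
```

The other questions are the same graph walk in different directions, which is why one small schema can answer all of them.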


A dual problem to experiment tracking is experiment configuring. How do I even set up those parameters?
