October 23, 2020 — February 2, 2024

Figure 1

An experiment tracking tool made famous by its use on neural nets, and in particular by the fact that its default visualisations work OK for NNs and that it runs on almost any server without further dependencies.

Tensorboard is a de facto standard debugging/tracking tool. It is easy-ish to install, hard to modify and works well enough (but not on our internal cluster). I recommend reading Li Yin’s explanation. It looks like it is closely coupled to tensorflow, because it used to be, but these days it may be installed kinda-sorta separately. Its file-based design is also a strength: since it works by writing files to the filesystem, it can get around the horrible network lockdowns that we often experience in HPC hell. The torch manual shows a worked example, but the best walk-through IMO is Derek Mwiti’s TensorBoard Tutorial for neptune.ai, which is regularly updated for new technology and across platforms.

I tend to find installation nonetheless slightly annoying; it clutters up with crap pretty easily, and esoteric surprise dependencies are common. If you look at it funny it will slyly install tensorflow for you anyway.

There are two parts to using Tensorboard:

  1. Writing data that Tensorboard can read (Tensorflow TFEvents)
  2. Looking at the data

1 Writing data

Part 1 looks like this in pytorch.

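# torch
import os

from torch.utils.tensorboard import SummaryWriter

LOG_DIR = "debug/"
# NOTE: the arguments to os.path.join were cut off in the source;
# "run_name" is a hypothetical placeholder for a per-run subdirectory.
SUBLOG_PATH = os.path.join(LOG_DIR, "run_name")
writer = SummaryWriter(log_dir=SUBLOG_PATH)

# net, data, running_loss, epoch and i come from your own training loop
writer.add_graph(net, data[0])

# …log the running loss
writer.add_scalar(
    'training loss',
    running_loss / 1000,
    epoch * len(data) + i)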

# …log a Matplotlib Figure showing the model’s predictions on a
# random mini-batch
writer.add_figure('predictions vs. actuals',
                plot_classes_preds(net, inputs, labels),
                global_step=epoch * len(trainloader) + i)

See also torch.utils.tensorboard API Docs. There is lots of cool stuff that can be logged, like 3d meshes and audio files.
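The fragments above assume a training loop that already exists. Here is a minimal self-contained sketch of part 1, assuming torch and tensorboard are both installed. `make_run_dir`, `log_toy_run`, the timestamp naming scheme and the fake loss values are hypothetical, invented for illustration; only `SummaryWriter`, `add_scalar` and `close` are actual torch API.

```python
import os
import time
from typing import Optional


def make_run_dir(log_dir: str, run_name: Optional[str] = None) -> str:
    """Return a per-run subdirectory path under log_dir.

    Hypothetical helper: naming runs by timestamp is one common
    convention, not something Tensorboard requires.
    """
    if run_name is None:
        run_name = time.strftime("%Y%m%d-%H%M%S")
    return os.path.join(log_dir, run_name)


def log_toy_run(log_dir: str = "debug/", n_steps: int = 100) -> str:
    """Write a fake, decaying 'training loss' curve as TFEvents files."""
    # Imported lazily so the path helper above works without torch.
    from torch.utils.tensorboard import SummaryWriter

    run_dir = make_run_dir(log_dir)
    writer = SummaryWriter(log_dir=run_dir)
    for step in range(n_steps):
        fake_loss = 1.0 / (step + 1)  # stand-in for a real loss value
        writer.add_scalar("training loss", fake_loss, step)
    writer.close()  # flush pending events to disk
    return run_dir
```

Calling `log_toy_run()` should leave an events file under a fresh subdirectory of `debug/`. Giving each run its own subdirectory lets Tensorboard show the runs as separate curves.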

2 Viewing data

Part 2, minimally,

tensorboard --logdir=path/to/log-directory

Supposedly we can run part 2 inside VS Code too. This has never worked for me; the tensorboard instance never sees any log data and just sits there grinning inanely. I cannot find anyone else on the internet who has experienced this problem despite much searching, so I gave up. Maybe using TensorBoard in Notebooks works better? Running it from the command line is fine.
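Since the command line is the reliable route, it can be handy to start the viewer from a Python script, say alongside a training job. This is a minimal sketch that assumes the `tensorboard` executable is on the PATH. `tensorboard_command` and `launch_tensorboard` are hypothetical helper names; `--logdir`, `--port` and `--host` are standard tensorboard flags, and 6006 is its default port.

```python
import shlex
import subprocess
from typing import List


def tensorboard_command(
    logdir: str, port: int = 6006, host: str = "localhost"
) -> List[str]:
    """Build the argument list for the tensorboard CLI."""
    return [
        "tensorboard",
        f"--logdir={logdir}",
        f"--port={port}",
        f"--host={host}",
    ]


def launch_tensorboard(logdir: str, port: int = 6006) -> subprocess.Popen:
    """Start tensorboard as a child process and return its handle."""
    cmd = tensorboard_command(logdir, port)
    print("Running:", shlex.join(cmd))
    # Popen returns immediately; call .terminate() on the handle to stop it.
    return subprocess.Popen(cmd)
```

On a locked-down remote machine, the port can usually be forwarded over SSH (something like `ssh -L 6006:localhost:6006 user@host`, with `user@host` as a placeholder) and viewed in a local browser.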

Handy trick: Projector visualises embeddings:

TensorBoard has a built-in visualizer, called the Embedding Projector, for interactive visualization and analysis of high-dimensional data like embeddings. It is meant to be useful for developers and researchers alike. It reads from the checkpoint files where you save your tensorflow variables. Although it’s most useful for embeddings, it will load any 2D tensor, potentially including your training weights.

Data from tensorboard experiments may be loaded back into python as a DataFrame using tensorboard.data.experimental. There are also community options.
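For event files sitting on local disk, one option that does not go through `tensorboard.data.experimental` is the `EventAccumulator` class that ships with tensorboard. The sketch below swaps in that technique and assumes tensorboard and pandas are installed. `scalar_events_to_rows` and `load_scalars` are hypothetical helper names, and the column layout is my own choice.

```python
from typing import Any, Dict, Iterable, List


def scalar_events_to_rows(tag: str, events: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten scalar events (objects with wall_time, step, value) into rows."""
    return [
        {"tag": tag, "step": e.step, "wall_time": e.wall_time, "value": e.value}
        for e in events
    ]


def load_scalars(run_dir: str):
    """Read every scalar series in one run directory into a DataFrame."""
    # Imported lazily so the row helper above works without these packages.
    import pandas as pd
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # A size guidance of 0 asks the accumulator to keep every point
    # instead of subsampling long series.
    acc = EventAccumulator(run_dir, size_guidance={"scalars": 0})
    acc.Reload()  # parse the event files on disk

    rows: List[Dict[str, Any]] = []
    for tag in acc.Tags()["scalars"]:
        rows.extend(scalar_events_to_rows(tag, acc.Scalars(tag)))
    return pd.DataFrame(rows, columns=["tag", "step", "wall_time", "value"])
```

The result is a long-format table with one row per logged point. Something like `df.pivot(index="step", columns="tag", values="value")` should then give one column per metric.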