Experiment tracking for machine learning



A dual problem to experiment tracking is experiment configuration: how can I keep reproducible files which neatly define experiments?

DID YOU SAY A FRAMEWORK FOR ELEPHANTLY CONFIGURING COMPLEX APPLICATIONS

AllenNLP Params

I am quite fond of AllenNLP’s Params system as a kind of introductory training-wheels system, but it comes with a lot of baggage: installing it will slurp in a large number of fragile and fussy dependencies for language parsing.

Hydra

Hydra is a generic configuration system which happens to work well for NLP. It is built on the OmegaConf library. It supports configuration by YAML, and in particular its authors have thought about configuring experiments. It can optionally use a typing system to detect misconfigurations, and it allows importing many configurations and overriding them.

It also provides, incidentally, a command-line system with extremely flexible abilities to configure complex parameter hierarchies.

The main problem for me is that because it is so generic, powerful and clean, it is not clear where to even begin in building a config system. Fortunately, there are HOWTOs.

Julien Beaulieu describes Building A Flexible Configuration System For Deep Learning Models.

For some do-by-example, here are some template examples showing hydra in action:

Important non-obvious patterns:

Instantiating objects with hydra.utils.instantiate: this allows you to specify the class to construct using a special field, e.g.

_target_: src.models.mnist_model.MNISTLitModel
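The idea behind the _target_ field can be sketched with the standard library alone. This is not Hydra's implementation, only an illustration of the pattern: resolve a dotted path to a callable, then call it with the remaining keys as keyword arguments. The function name instantiate_sketch is hypothetical, and datetime.timedelta stands in for a model class so that the example runs anywhere.

```python
import importlib
from typing import Any, Mapping


def instantiate_sketch(config: Mapping[str, Any]) -> Any:
    """Build the object named by `_target_`, passing the other keys as keyword arguments."""
    kwargs = dict(config)
    # Split "package.module.ClassName" into module path and attribute name.
    module_path, _, attr_name = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**kwargs)


# A standard-library class stands in for a model class here.
config = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}
obj = instantiate_sketch(config)
```

In a real project I would call hydra.utils.instantiate on the loaded config instead; as far as I understand, it also handles nested targets, which this sketch does not attempt.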

Paths are complicated:

# path to original working directory
# hydra hijacks working directory by changing it to the current log directory,
# so it’s useful to have this path as a special variable
# https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
work_dir: ${hydra:runtime.cwd}

# path to folder with data
data_dir: ${work_dir}/data/

Variable interpolation is supported, but it is a little bit weird because the same syntax covers both other config values and system environment variables. Environment variables are read through the oc.env resolver:

local_path: ${oc.env:LOCAL_PATH}
