Hydra for tracking machine learning experiments
October 20, 2021 — July 19, 2023
1 What?
When I configure my ML experiments, i.e. try various combinations of parameters to find good ones, my one-stop shop (at least for Python) is hydra. Related: hyperparameter search (which I also do using hydra).
Hydra solves two problems for me:
- Configuring experiments, by which we usually mean “finding good neural net parameters”. It generates a command-line interface in which I can express all kinds of parameters and hyperparameters to test out different candidate NN configurations. It also lets me set sane defaults and store complicated settings in text files, in a standardised and readable format.
- Logging outputs: whenever I set up an experiment using the lovely configuration interface, it gets its own automatic output folder, which keeps all the experiment parameters for later reference and analysis.
It has a few other features as well (e.g. automatically running jobs in parallel, performing parameter sweeps, or integrating with hyperparameter tuners), but those are bonuses for me.
Anyway, the fact that it can do all those things is not necessarily obvious, because the documentation targets grizzled veterans, not new users. And THAT is why this notebook exists. Hydra is ostensibly a generic configuration system, but its authors have been careful to provide good affordances for ML in particular. That this is not obvious (IMO) from the docs is Hydra's worst feature; nonetheless, it is good enough to be worth pushing through that difficulty.
1.1 Features
- explicit, flexible config syntax (YAML-based) that is easy to read, write and process
- they have thought about configuring ML experiments in particular
- optional typing system to detect misconfigurations (although in practice I will never have time to actually set this up)
- a command-line system with sophisticated further configuration and overrides
- hierarchical configurations may be overridden in various ways
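To make the override idea concrete, here is a toy, stdlib-only sketch of what a dotted command-line override such as model.lr=0.1 does to a nested config of defaults. The names `apply_overrides` and `_parse` are hypothetical, and this is not Hydra's implementation. Real Hydra parses overrides with its own grammar and also handles config groups, lists, and the `+`/`++` prefixes.

```python
from copy import deepcopy
from typing import Any, Dict, List


def _parse(raw: str) -> Any:
    """Interpret an override value as int, then float, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(defaults: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `defaults` with 'group.key=value' overrides applied."""
    cfg = deepcopy(defaults)  # never mutate the stored defaults
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node[key]  # walk down the hierarchy
        if leaf not in node:
            # Refuse typos instead of silently adding a new key.
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = _parse(raw)
    return cfg


defaults = {"model": {"lr": 0.001, "layers": 2}, "seed": 0}
print(apply_overrides(defaults, ["model.lr=0.1", "seed=42"]))
# {'model': {'lr': 0.1, 'layers': 2}, 'seed': 42}
```

The toy raises an error for a typo such as model.lrr=0.1 instead of quietly adding a key. This is in the same spirit as Hydra, which rejects overrides of keys that are not in the config unless you prefix them with `+`.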
1.2 Misfeatures
- Most of the documentation and examples are about PyTorch, presumably because of the Facebook connection, so the examples are not generic.
- The manual is flat, affectless and unopinionated, even where an opinion would be useful and where some affect would make it more engaging and memorable.
- hierarchical configurations may be overridden in too many ways and TBH are a bit confusing. I would prefer less flexibility for the sake of simplicity.
This all comes to a head when starting a new project: because hydra is so generic, powerful and clean, it is not clear where to even begin. Fortunately, there are third-party HOWTOs and boilerplate examples to copy-paste. These are IMO worth the overhead, since setting things up this way saves a lot of time later on.
2 Tutorials and examples
- Best tutorial: Kushajveer Singh, Complete tutorial on how to use Hydra in Machine Learning projects
- Julien Beaulieu, Building A Flexible Configuration System For Deep Learning Models.
- Simone Scardapane, Learning Hydra for configuring ML experiments
- Omry Yadan (hydra author) writes Hydra — A fresh look at configuration for machine learning projects
I generally hate video documentation but I will make a rare exception here:
Here are some examples of hydra in action which might be useful as templates:
- worked examples from the project manual
- lucmos/nn-template: Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, DVC, and Streamlit.
- ashleve/lightning-hydra-template: PyTorch Lightning + Hydra. A very general, feature-rich template for rapid and scalable ML experimentation with best practices. ⚡🔥⚡
In practice I found those templates far too fancy and feature-stuffed. It was easiest to start from scratch, without all the overblown dependencies and chaos.
More community documentation:
3 Instantiating arbitrary python classes
Instantiating objects with hydra.utils.instantiate: this allows us to specify which Python class to instantiate using a special field, _target_, that holds the class's dotted import path.
Corollary: config files are a security risk. Do not use a config file from anyone you do not trust.
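Here is a minimal sketch of the mechanism that needs only the standard library. The real entry point is `hydra.utils.instantiate(cfg)`. The `locate` and `toy_instantiate` functions below are hypothetical stand-ins written to show the idea: import whatever dotted path `_target_` names, then call it with the remaining keys as keyword arguments. The comment at the call site sums up the security corollary in one line.

```python
import importlib
from typing import Any


def locate(path: str) -> Any:
    """Resolve a dotted path such as 'datetime.timedelta' to the object it names."""
    module_path, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def toy_instantiate(cfg: Any) -> Any:
    """Recursively build objects from dicts that carry a '_target_' key.

    Hypothetical toy, not Hydra's code: nested configs are built first and
    passed as keyword arguments; dicts without '_target_' stay plain dicts.
    """
    if isinstance(cfg, dict):
        kwargs = {k: toy_instantiate(v) for k, v in cfg.items() if k != "_target_"}
        if "_target_" in cfg:
            # Whatever importable callable is named here gets called.
            return locate(cfg["_target_"])(**kwargs)
        return kwargs
    if isinstance(cfg, list):
        return [toy_instantiate(v) for v in cfg]
    return cfg


# Usage: a config that builds a timedelta nested inside a plain mapping.
schedule = toy_instantiate({
    "warmup": {"_target_": "datetime.timedelta", "minutes": 5},
    "epochs": 10,
})
print(schedule)  # {'warmup': datetime.timedelta(seconds=300), 'epochs': 10}
```

In Hydra proper, the same structure would normally live in a YAML file and be handed to hydra.utils.instantiate. The principle is the same: the config file decides what code runs.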
4 Environment variable interpolation
Variable interpolation is supported but a little weird, because it covers both internal variables (references to other values in the config) and system environment variables, and it takes a while to work out which is which. Hydra supports system environment variables but, confusingly, does not expand any paths it finds in them, as it would for internal hydra variables. 🤷 I gather this is because environment variables arrive via the lower-level OmegaConf resolvers upon which hydra is built.
Pro tip: this is handy with an environment variable config system like dotenv.
Hydra supports additional environment variable management.
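A toy resolver can illustrate the behaviour described above. It rests on some assumptions. `toy_resolve` is hypothetical and much simpler than OmegaConf: it has no nested keys, no nested braces and no type conversion. Only the `${oc.env:NAME,default}` spelling is borrowed from OmegaConf 2.1+, where `oc.env` is the built-in environment resolver. In the toy, internal references are resolved recursively. By contrast, an environment variable's value is returned verbatim, so a path like `${work_dir}/elsewhere` stored in an environment variable stays unexpanded.

```python
import os
import re

# Matches ${...} with no nested braces: enough for this toy.
_INTERP = re.compile(r"\$\{([^}]+)\}")


def toy_resolve(value: str, cfg: dict) -> str:
    """Resolve ${key} and ${oc.env:NAME[,default]} in a string.

    Internal references are resolved recursively; environment values are
    returned verbatim, so any ${...} inside them is left unexpanded.
    """
    def substitute(match):
        expr = match.group(1)
        if expr.startswith("oc.env:"):
            name, has_default, default = expr[len("oc.env:"):].partition(",")
            if name in os.environ:
                return os.environ[name]
            if has_default:
                return default
            raise KeyError(f"environment variable {name!r} is not set")
        # Internal reference: look it up and keep resolving.
        return toy_resolve(str(cfg[expr]), cfg)

    return _INTERP.sub(substitute, value)


cfg = {"work_dir": "/home/me/proj", "data_dir": "${work_dir}/data"}
print(toy_resolve("${data_dir}/train", cfg))  # /home/me/proj/data/train
```

This pairs naturally with a dotenv-style setup: machine-specific values live in the environment, and everything derived from them lives in the config.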
5 Paths
Paths are complicated. I find it useful to be aware of the following patterns:
Under some configurations hydra hijacks the working directory by changing it to the current log directory. It’s useful to have the original working directory path available as a special variable.
work_dir: ${hydra:runtime.cwd}
# example of using the above to define some other folder
data_dir: ${work_dir}/data/
Alternatively, we can get at the original working directory with hydra.utils.get_original_cwd, and resolve configured relative paths against it with hydra.utils.to_absolute_path.
Configuring that output directory is possible. The default looks like this:
hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
IMO a more useful one optionally uses a custom output dir set by local environment variables, and includes the job name.
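A hypothetical pattern in that spirit might be hydra.run.dir: ${oc.env:EXPERIMENT_ROOT,outputs}/${hydra.job.name}/${now:%Y-%m-%d_%H-%M-%S}. Here `EXPERIMENT_ROOT` is a made-up variable name and `now` is Hydra's timestamp resolver. Hydra assembles that path itself from the YAML. The Python below (`run_dir` is a hypothetical helper) only emulates the assembly, to show which piece comes from where.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_dir(job_name: str, now: Optional[datetime] = None,
            env_var: str = "EXPERIMENT_ROOT", default_root: str = "outputs") -> Path:
    """Emulate ${oc.env:EXPERIMENT_ROOT,outputs}/${hydra.job.name}/${now:%Y-%m-%d_%H-%M-%S}."""
    root = os.environ.get(env_var, default_root)  # machine-local override, if set
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return Path(root) / job_name / stamp
```

On a laptop with nothing set, runs land under outputs/<job_name>/. On a cluster, exporting EXPERIMENT_ROOT=/scratch/... redirects them without touching the config.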
NB: hijacking is now optional; set hydra.job.chdir=False to disable it.
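Why the original directory matters: under hijacking, a relative path like data/ suddenly points inside the run directory. Hydra provides hydra.utils.get_original_cwd and hydra.utils.to_absolute_path for exactly this. The context manager below is a hypothetical stdlib emulation of the hijack (roughly what hydra.job.chdir=True does), not Hydra's code. `hijacked_cwd` and `to_original_absolute` are made-up names.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def hijacked_cwd(run_dir):
    """Emulate hydra.job.chdir=True: create run_dir, chdir into it, restore on exit.

    Yields the original working directory, i.e. what
    hydra.utils.get_original_cwd() would report inside a real Hydra job.
    """
    original = Path.cwd()
    target = Path(run_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    os.chdir(target)
    try:
        yield original
    finally:
        os.chdir(original)


def to_original_absolute(path, original):
    """Resolve a relative path against the original cwd (cf. hydra.utils.to_absolute_path)."""
    p = Path(path)
    return p if p.is_absolute() else original / p
```

Inside `with hijacked_cwd("outputs/demo") as original:`, Path("data").resolve() lands in outputs/demo/data, while to_original_absolute("data", original) gives the data/ directory the config author presumably meant.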
Paths are still slightly unintuitive. Hydra’s facility for creating nice paths is great, but they do not necessarily interpolate in the obvious way. tl;dr: if all we want is to change into the path that hydra lets us define with lots of luxurious command substitution, everything is fine.
If we wish to interrogate hydra at arbitrary locations in the config file to do arbitrary things (e.g. using the hydra job name or job id as a parameter for something else), it does not seem to be so nice; the hydra object is not available there.
The path of least resistance (ahem) is to do everything in the folder that hydra gives us but if we have some artefact that needs to go somewhere else, we can import the hydra object explicitly and interrogate it for the variables of interest. This is slightly ugly and leaky.
6 Logs
Logs are by default configured through hydra to persist to a disk file as well as stdout, which is usually what I want. See Customizing logging.
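Hydra's job_logging configs follow the schema of Python's logging.config.dictConfig, so we can approximate the default behaviour directly. The dict below is an approximation of Hydra's default, not a copy of it: a console handler on stdout plus a file handler writing <output_dir>/<job_name>.log, with a similar format string. `configure_logging` and its parameters are hypothetical names.

```python
import logging
import logging.config
from pathlib import Path


def configure_logging(output_dir, job_name, level="INFO"):
    """Log to stdout and to <output_dir>/<job_name>.log, roughly like Hydra's default."""
    log_file = Path(output_dir) / f"{job_name}.log"
    log_file.parent.mkdir(parents=True, exist_ok=True)
    logging.config.dictConfig({
        "version": 1,
        "formatters": {
            "simple": {"format": "[%(asctime)s][%(name)s][%(levelname)s] - %(message)s"},
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "simple",
                "stream": "ext://sys.stdout",
            },
            "file": {
                "class": "logging.FileHandler",
                "formatter": "simple",
                "filename": str(log_file),
            },
        },
        "root": {"level": level, "handlers": ["console", "file"]},
        # Keep loggers that libraries created before this call.
        "disable_existing_loggers": False,
    })
```

Because the format is plain dictConfig, customising Hydra's logging amounts to writing this same structure in YAML under hydra/job_logging.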
7 Python packages with hydra configs
Feasible. See the example.
8 Go parallel with hydra
Hydra will sweep parameters in parallel via plugins:
- Joblib Launcher plugin (for local jobs)
- Submitit Launcher plugin (for SLURM clusters)
The docs are a little confusing, but check out this example.
Looking at joblib, for example, we might want to enable a param sweep from the command line. AFAICT this should launch
TBC
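For orientation, a joblib sweep is typically launched along the lines of python my_app.py --multirun hydra/launcher=joblib lr=0.01,0.1 batch_size=32,64. This assumes the hydra-joblib-launcher plugin is installed; my_app.py, lr and batch_size are hypothetical. The toy below emulates what happens next: comma-separated values expand into the cartesian product of jobs, as Hydra's basic sweeper does, and a thread pool stands in for joblib's workers. None of it is Hydra's code.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def expand_sweep(overrides):
    """Turn ['lr=0.01,0.1', 'bs=32,64'] into one override list per job."""
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # Cartesian product: every value of every key is combined with every other.
    return [[f"{k}={v}" for k, v in zip(keys, combo)]
            for combo in itertools.product(*choices)]


def launch(job_fn, overrides, n_jobs=2):
    """Run job_fn once per combination in parallel; results keep the job order."""
    jobs = expand_sweep(overrides)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(job_fn, jobs))


print(expand_sweep(["lr=0.01,0.1", "bs=32,64"]))
# [['lr=0.01', 'bs=32'], ['lr=0.01', 'bs=64'], ['lr=0.1', 'bs=32'], ['lr=0.1', 'bs=64']]
```

The number of jobs is the product of the list lengths, so two sweeps of two values each already means four runs. Each run gets its own output folder under the multirun directory.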