Configuring machine learning experiments

2021-10-20 — 2025-06-15

A dual problem to experiment tracking is experiment configuration: how can I nicely define experiments and the parameters that make them go? Ideally, how can I do both of these things at the same time?

For a problem that we all need to solve constantly, and which has what feels like hundreds of packages devoted to it, this remains remarkably unsolved.

Almost everyone gets annoyed by the friction of configuring experiments at some stage. The trigger is usually a complicated situation: we have some messy hierarchical neural net, and we wish to keep its configuration in a file while still being able to override it from the command line. And it’s a deeply nested hierarchical configuration. And we are editing the code. And sometimes we invoke it programmatically from a notebook, and sometimes from a command-line script. We want to make it easy to tweak function arguments. Easy to say, but… how? How do we make it “easy” without introducing many more moving parts? I would rather not need “configuration objects” and “configuration files” and “configuration decorators” and “configuration classes” and “configuration functions” and “configuration arguments” and “configuration overrides” and “configuration defaults” and “configuration validation” and “configuration parsing”. Yet many solutions seem to involve a remarkably large number of such components. Somehow I end up writing 200 lines of code when I just want to tweak the learning rate.

That’s the setting. Further, since all my experiments at the moment are in Python, I will assume Python hereafter.

The competing design goals of experiment configuration are hard to reconcile. I want my core business logic to be plain Python functions. I also want to invoke them easily from various systems, such as the CLI, which need to know a lot about my business logic in order to call it automatically. Is that argument always an integer? Can it be negative? Oh, that argument is an nn.Module? Which of its constructor arguments might we change? What are the default values for the others?
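To make that tension concrete, here is a minimal standard-library sketch of the kind of introspection a CLI generator does: it reads a plain function’s annotations and defaults with `inspect.signature` and builds an `argparse` parser from them. `train` and `cli_from_signature` are hypothetical names for illustration, not any library’s API. The sketch handles `int`, `float` and `str` arguments and deliberately gives up on anything else, which is exactly where an `nn.Module` argument would land.

```python
import argparse
import inspect


def train(learning_rate: float = 0.01, epochs: int = 10, model_name: str = "unet") -> str:
    """Hypothetical business logic: a plain function with no config imports."""
    return f"{model_name}: lr={learning_rate}, epochs={epochs}"


def cli_from_signature(fn, argv=None):
    """Build an argparse CLI for fn from its annotations and defaults, then call fn.

    Only parameters annotated int, float or str are supported, because those
    types can parse a command-line string; anything else raises TypeError.
    """
    parser = argparse.ArgumentParser(prog=fn.__name__)
    for name, param in inspect.signature(fn).parameters.items():
        kind = param.annotation
        if kind not in (int, float, str):
            # e.g. an nn.Module argument: no obvious way to spell it as a string
            raise TypeError(f"cannot build a CLI flag for {name!r} of type {kind!r}")
        parser.add_argument(
            f"--{name}",
            type=kind,
            default=param.default,
            # Parameters without a default become required flags.
            required=param.default is inspect.Parameter.empty,
        )
    args = parser.parse_args(argv)  # argv=None means "use sys.argv"
    return fn(**vars(args))


print(cli_from_signature(train, ["--epochs", "3"]))
```

Everything beyond scalars (nested modules, which constructor arguments are tunable, what their defaults are) is information the function signature simply does not carry, which is where the configuration libraries start piling on machinery.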

Of course, we are in neural network land, which is typically an uneasy truce between functional and object-oriented programming, with late binding, leaky abstractions, and a lot of late resolution of dimensions and other such annoyances that are seductively easy to configure if you are doing exactly the example in the user manual, but turn out to be brittle to automate in, like, actual applications (I’m looking at you, UNet). The question of who bears responsibility for which parts of the complexity is not settled; every library tries a different approach. Which is to say, it is not clear what the simplest thing is.

And there are so, so many libraries once you start looking for them. It is like mushroom picking: first you see one, then another, and then you realise you are standing in a field of mushrooms with no idea which ones are edible and which will have you hallucinating for a week. This problem seems to hit a psychological weak spot in Python, where it tips over from empowering software to enabling yak shaving. ML config is in the Dunning-Kruger liminal zone where the waves of duck typing lap upon the beach of solid software engineering, and demolish all you build upon it. Python has just enough metaprogramming facilities that an overconfident developer can believe they will automate some of the labour, at the cost of code that is ugly and hard to reason about, but how hard could it be? Just because a hundred people before you have tried and failed doesn’t mean you will fail. Maybe THEY hadn’t read PEP 484! Or PEP 557! Or thought about implementing everything using a brand new Param class that is a dataclass but also a decorator and also a context manager and also a type hint and also a function signature, and generates CLIs by introspection!!!

So, onto this beach littered with the flotsam and jetsam of abandoned configuration systems, and also, somehow, hallucinogenic mushrooms, and probably landmines too, I will try to sell you my most premium sandcastle real estate.

I have provisionally decided that the following things are anti-patterns:

Recursively specifying the configuration in a file in some special language like JSON or YAML. It sounds elegant, but in practice it ends up being a sad journey into a mess of mapping arguments to functions. I am open to being persuaded otherwise. A system that allowed incrementally adding configuration to a function call, rather than replacing the whole configuration, might work.
Auto-config. Fiddle tried to do that to make my life easier, and all it did was tank my LLM budget as I tried to debug Fiddle.
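For what incremental configuration might look like without any library, `functools.partial` already gets part of the way: each layer adds or overrides a few keyword arguments without restating the rest, and the accumulated keywords can be inspected and logged before anything runs. This is a sketch under the assumption that the configuration can be expressed as keyword arguments; `make_optimizer` is a hypothetical stand-in that returns its settings rather than constructing anything.

```python
from functools import partial


def make_optimizer(name: str = "adam", lr: float = 1e-3, weight_decay: float = 0.0) -> dict:
    """Hypothetical stand-in for building an optimizer; just returns its settings."""
    return {"name": name, "lr": lr, "weight_decay": weight_decay}


# Each layer adds or overrides a few keyword arguments
# without restating the ones set by earlier layers.
base = partial(make_optimizer, name="adamw")
experiment = partial(base, weight_decay=0.01)  # layered on top of `base`
sweep_point = partial(experiment, lr=3e-4)     # e.g. one point in a sweep

# Nested partials are flattened, so the accumulated keywords are visible
# in one place and can be logged before anything is actually built.
print(sweep_point.keywords)
print(sweep_point())
```

The limitation is that this only layers arguments onto the outermost call; reaching into the arguments of a nested constructor (say, the encoder inside a model) needs something more like a tree of deferred calls.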

But, you know what? I haven’t found anything that works so you should ignore my opinions and try everything anyway.

1 yaml_config_override

See sashank-tirumala/yaml_config_override.

This library does one thing: it automatically creates command-line arguments to override values in a YAML configuration file. I know I just said that I don’t like special configuration languages, but this library is so simple that maybe I should suck it up and have a go?

Extremely Lightweight: It’s a tiny library with a single purpose.
Non-Intrusive: You can add it to your existing workflow with just a couple of lines of code. Your business logic remains completely separate from the configuration logic.
Easy CLI Integration: This is its core feature. You get command-line overrides for free, without writing any argparse boilerplate.

Here’s how you might use it:

# main.py
from yaml_config_override import add_arguments
import yaml
from pathlib import Path

# Assume you have a 'config.yaml'
# outer:
#   x: 0
#   inner:
#     y: 1

my_config_path = 'config.yaml'
conf = yaml.safe_load(Path(my_config_path).read_text())
conf = add_arguments(conf)

print(conf)

# You can now run from the command line:
# python main.py --outer.x 2 --outer.inner.y 3

Looks like it will suck if one of those arguments is an nn.Module.
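I have not read yaml_config_override’s internals, so the following is not its implementation. It is a hypothetical standard-library sketch (`apply_override` is my name) of the general dotted-override idea, which also shows why objects are a problem: every override arrives as a string, so the best we can do is parse it as a Python literal with `ast.literal_eval`. Note that this is not the same as YAML’s parsing rules; for example, `true` stays a string here.

```python
import ast
import copy


def apply_override(config: dict, dotted_key: str, raw_value: str) -> dict:
    """Return a copy of a nested dict with one dotted-path key overridden.

    raw_value arrives from the command line as a string; we try to read it
    as a Python literal (int, float, list, ...) and otherwise keep it as a
    string. There is no way to spell an object such as an nn.Module here.
    """
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    updated = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = updated
    for key in parents:
        node = node[key]  # KeyError for a typo'd path, rather than a silent new key
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = value
    return updated


conf = {"outer": {"x": 0, "inner": {"y": 1}}}
conf = apply_override(conf, "outer.x", "2")
conf = apply_override(conf, "outer.inner.y", "3")
print(conf)
```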

2 argbind

pseeth/argbind: Simple package for binding functions to CLI or config files.

Looks simple conceptually. However, it’s intrusive; I do not enjoy the philosophy of my function signatures being the canonical definition of my experiment’s parameters.

import argbind

# Your logic is decorated, tying it to the config system
@argbind.bind
def train(learning_rate: float = 0.01, epochs: int = 10):
    print(f"Training for {epochs} epochs with learning rate {learning_rate}")

if __name__ == "__main__":
    args = argbind.parse_args()
    with argbind.scope(args):
        train() # Arguments are magically passed in

3 fastargs

If you’re looking for a more programmatic and powerful solution, maybe fastargs? It allows you to define your configuration directly in Python and use decorators to inject configuration values into your functions.

Incremental Adoption: The decorator-based approach means you can start by configuring just a few functions and expand from there.
Separation of Concerns: It helps keep your configuration logic separate from your main code, but in a more programmatic way than YAML files.
Powerful Features for ML: It can handle complex types like Python modules as arguments, allowing you to specify things like which optimizer to use directly from the command line.
Clear Boilerplate: While it does involve some boilerplate with decorators, it’s very clear what needs to be written, making it easy to use with an LLM assistant.
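To see why the decorator style makes me uneasy, here is a standard-library sketch of the general pattern of injecting configured values into a function through a decorator. It is not fastargs’ API; `CONFIG` and `from_config` are hypothetical names, and the global dict stands in for whatever a library would fill from config files and CLI flags.

```python
import functools

# Hypothetical global store, standing in for whatever a library would
# populate from config files and command-line flags.
CONFIG = {"train.lr": 0.1, "train.epochs": 5}


def from_config(section: str, *names: str):
    """Decorator: fill the named keyword arguments from CONFIG at call time."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for name in names:
                # Explicit keyword arguments from the caller win; passing one of
                # these positionally would clash with the injected keyword.
                if name not in kwargs:
                    kwargs[name] = CONFIG[f"{section}.{name}"]
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@from_config("train", "lr", "epochs")
def train(model: str, lr: float, epochs: int) -> str:
    return f"{model}: lr={lr}, epochs={epochs}"


print(train("resnet"))
print(train("resnet", epochs=1))
```

Note that `train`’s signature no longer tells you where `lr` comes from; to find out, you have to read the decorator and the global state.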

I dunno though, decorators are already a red flag for me.

4 YACS

rbgirshick/yacs: YACS – Yet Another Configuration System. It looks so ugly to configure, but people have won Kaggle competitions using it, so it might be good, right?

Worked example: Building A Flexible Configuration System For Deep Learning Models · Julien Beaulieu.

5 Hydra

This does more or less everything, in fact too many things to even understand, and it is very opinionated about them. See my Hydra page. In hindsight, it was too heavy and confusing for me in practice.

6 Fiddle

Fiddle seems to resolve some problems in Gin, so I gave it a go. On paper, the API is much simpler to use. It employs a code-based config system, where we define Python functions that mutate the configuration. In practice it is a land-war-in-Indochina situation, with shifting APIs, lagging documentation, counterintuitive patterns that are only simple in hindsight, and massive complexity that tanks my productivity. See my Fiddle notebook.
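The idea of “functions that mutate the configuration” can be sketched without Fiddle itself. Below, a hypothetical `Call` object records a function and its keyword arguments without calling it, a mutator function edits that tree in place, and a hypothetical `build` executes the tree. The names here are mine; Fiddle’s real API (`fdl.Config`, `fdl.build`, and much more) is considerably richer, and this only gestures at the underlying idea.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Call:
    """A not-yet-executed call: a function plus its (mutable) keyword arguments."""
    fn: Callable[..., Any]
    kwargs: dict = field(default_factory=dict)


def build(node: Any) -> Any:
    """Recursively execute nested Call objects, innermost first."""
    if isinstance(node, Call):
        return node.fn(**{k: build(v) for k, v in node.kwargs.items()})
    return node


# Hypothetical business logic: plain functions that know nothing about Call.
def make_encoder(width: int = 64, depth: int = 4) -> dict:
    return {"width": width, "depth": depth}


def make_model(encoder: dict, dropout: float = 0.1) -> dict:
    return {"encoder": encoder, "dropout": dropout}


def base_config() -> Call:
    return Call(make_model, {"encoder": Call(make_encoder), "dropout": 0.1})


def wide_variant(cfg: Call) -> None:
    """A mutator: edits the configuration tree in place before anything is built."""
    cfg.kwargs["encoder"].kwargs["width"] = 256


cfg = base_config()
wide_variant(cfg)
model = build(cfg)
print(model)
```

The appeal is that a nested argument (the encoder width) is reachable without the outer function having to re-export it; the cost is that the configuration is now a second, parallel structure mirroring the code.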

7 Gin

gin-config configures default parameters in a useful way for ML experiments. It is made by Googlers, as opposed to Hydra, which is made by Facebookers. It is more limited than Hydra, but slightly lighter. The main thing I miss from Hydra is CLI parsing for overrides. It seems to be abandoned, or deprecated in favour of Fiddle.

8 Spock

Spock is a parameter configuration framework that uses decorated Python classes. It offers a comprehensive approach to managing complex configurations with strong typing.

Key features include:

Type-checked, immutable parameters
Support for inheritance and complex parameter dependencies
Multiple configuration file formats (YAML, TOML, JSON)
Automatic CLI argument generation
Serialization for reproducibility
Hierarchical configuration through composition

Spock seems quite feature-rich, supporting everything from simple parameter definitions to complex nested types and even hyperparameter tuning. The code has not been touched since 2023, though. Does that mean it is finished, or unmaintained? It appears fairly complete, but Spock aims to solve so many problems that I am suspicious it could ever be finished. idk.
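Several of those features (type-checked, immutable, hierarchical through composition, serialisable) can be approximated with standard-library frozen dataclasses plus a small runtime type check. This is a hypothetical sketch of the pattern, not Spock’s API (Spock has its own decorator); `check_types`, `OptimConfig` and `TrainConfig` are illustrative names, and the type check only handles plain classes, not `Optional[...]` or container types.

```python
import dataclasses
import json


def check_types(obj) -> None:
    """Raise TypeError if any field value is not an instance of its annotated class.

    Only plain classes (int, float, str, nested dataclasses) are handled.
    """
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        if not isinstance(value, f.type):
            raise TypeError(
                f"{type(obj).__name__}.{f.name}: expected {f.type.__name__}, "
                f"got {type(value).__name__}"
            )


@dataclasses.dataclass(frozen=True)
class OptimConfig:
    lr: float = 0.01
    momentum: float = 0.9

    def __post_init__(self):
        check_types(self)


@dataclasses.dataclass(frozen=True)
class TrainConfig:
    epochs: int = 10
    optim: OptimConfig = dataclasses.field(default_factory=OptimConfig)  # composition

    def __post_init__(self):
        check_types(self)


cfg = TrainConfig(epochs=3, optim=OptimConfig(lr=0.1))
# Immutable: "changing" a value means building a new, re-validated object.
faster = dataclasses.replace(cfg, optim=dataclasses.replace(cfg.optim, lr=0.5))
# Serialise the fully resolved configuration for reproducibility.
print(json.dumps(dataclasses.asdict(faster), sort_keys=True))
```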

10 Pyrallis

eladrich/pyrallis seems to be a reimagining of Hydra. It seems a little less opinionated, which is relaxing, while leaving the user with fewer choices about how to map the configuration to the code.

The major trick is treating dataclasses, a Python 3.7 feature, as first-class citizens. It looks elegant, but does not seem to be actively maintained.

There is a fork, dlwh/draccus: Configuration with Dataclasses+YAML+Argparse. Fork of Pyrallis.
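The dataclass-as-config trick is easy to sketch with the standard library: walk the (possibly nested) dataclass fields, emit one dotted `--section.field` flag per leaf, and rebuild the dataclass from whatever the user actually passed. This is a hypothetical minimal version for illustration; `add_dataclass_args`, `build_dataclass` and `parse_dataclass` are my names, not pyrallis or draccus APIs, and it ignores config files, help text and `Optional`/container types.

```python
import argparse
import dataclasses


@dataclasses.dataclass
class OptimConfig:
    lr: float = 1e-3
    weight_decay: float = 0.0


@dataclasses.dataclass
class TrainConfig:
    exp_name: str = "default_exp"
    epochs: int = 10
    optim: OptimConfig = dataclasses.field(default_factory=OptimConfig)


def add_dataclass_args(parser, cls, prefix=""):
    """Add one --dotted.flag per leaf field, recursing into nested dataclasses."""
    for f in dataclasses.fields(cls):
        name = prefix + f.name
        if dataclasses.is_dataclass(f.type):
            add_dataclass_args(parser, f.type, prefix=name + ".")
        else:
            # SUPPRESS: flags the user did not pass are absent from the namespace,
            # so the dataclass defaults remain the single source of defaults.
            parser.add_argument(f"--{name}", type=f.type, default=argparse.SUPPRESS)


def build_dataclass(cls, values, prefix=""):
    """Instantiate cls, taking overrides from a flat {'a.b': value} dict."""
    kwargs = {}
    for f in dataclasses.fields(cls):
        name = prefix + f.name
        if dataclasses.is_dataclass(f.type):
            kwargs[f.name] = build_dataclass(f.type, values, prefix=name + ".")
        elif name in values:
            kwargs[f.name] = values[name]
    return cls(**kwargs)


def parse_dataclass(cls, argv=None):
    parser = argparse.ArgumentParser()
    add_dataclass_args(parser, cls)
    return build_dataclass(cls, vars(parser.parse_args(argv)))


cfg = parse_dataclass(TrainConfig, ["--epochs", "3", "--optim.lr", "0.01"])
print(cfg)
```

Even this toy shows where the complexity creeps in: as soon as a field is an `nn.Module` rather than a dataclass of scalars, there is no flag to generate.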

11 AllenNLP Params

AllenNLP’s Params system is a kind of introductory training-wheels configuration system. It comes with a lot of baggage: installing it brings in many fragile and fussy dependencies for language parsing. Once I had used it for a while, I realised all the reasons I would want a better system, and I no longer recommend it in general.

12 DIY

Why use an external library for this? I could, of course, roll my own. I have done that quite a few times. It is a surprisingly large amount of work, and remarkably easy to get wrong.