Experiment tracking for computation science

Closely-related problem, in this modern data-driven age: versioning and managing data, build tools.


Sacred is a tool to configure, organize, log and reproduce computational experiments. It is designed to introduce only minimal overhead, while encouraging modularity and configurability of experiments.

The ability to conveniently make experiments configurable is at the heart of Sacred. If the parameters of an experiment are exposed in this way, it will help you to:

  • keep track of all the parameters of your experiment
  • easily run your experiment for different settings
  • save configurations for individual runs in files or a database
  • reproduce your results


ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developer and data scientist workflows. MLMD is an integral part of TensorFlow Extended (TFX), but is designed so that it can be used independently.

Every run of a production ML pipeline generates metadata containing information about the various pipeline components, their executions (e.g. training runs), and resulting artifacts(e.g. trained models). In the event of unexpected pipeline behavior or errors, this metadata can be leveraged to analyze the lineage of pipeline components and debug issues.Think of this metadata as the equivalent of logging in software development.

MLMD helps you understand and analyze all the interconnected parts of your ML pipeline instead of analyzing them in isolation and can help you answer questions about your ML pipeline such as:

  • Which dataset did the model train on?
  • What were the hyperparameters used to train the model?
  • Which pipeline run created the model?
  • Which training run led to this model?

Dr Watson

DrWatson is a scientific project assistant software package. Here is what it can do:

  • Project Setup : A universal project structure and functions that allow you to consistently and robustly navigate through your project, no matter where it is located on your hard drive.
  • Naming Simulations : A robust and deterministic scheme for naming and handling your containers.
  • Saving Tools : Tools for safely saving and loading your data, tagging the Git commit ID to your saved files, safety when tagging with dirty repos, and more.
  • Running & Listing Simulations: Tools for producing tables of existing simulations/data, adding new simulation results to the tables, preparing batch parameter containers, and more.

Think of these core aspects of DrWatson as independent islands connected by bridges. If you don’t like the approach of one of the islands, you don’t have to use it to take advantage of DrWatson!

Applications of DrWatson are demonstrated the Real World Examples page. All of these examples are taken from code of real scientific projects that use DrWatson.

Please note that DrWatson is not a data management system.


Lancet comes from neuroscience.

Lancet is designed to help you organize the output of your research tools, store it, and dissect the data you have collected. The output of a single simulation run or analysis rarely contains all the data you need; Lancet helps you generate data from many runs and analyse it using your own Python code.

Its special selling point is that it integrates exploratory parameter sweeps. (can it do such interactively, I wonder?)

Parameter spaces often need to be explored for the purpose of plotting, tuning, or analysis. Lancet helps you extract the information you care about from potentially enormous volumes of data generated by such parameter exploration.

Natively supports the over-engineered campus jobs manager PlatformLSF, and it has a thoughtful workflow for reproducible notebook computation. Less focussed on the dependency management side.


Some kind of “researcher-friendly” MySQL frontend for managing experiments. I’m not sure how well this truly integrates into a workflow of solving problems I actually have.q

CaosDB - Research Data Management for Complex, Changing, and Automated Research Workflows

Here we present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data. Its primary purpose is the management of data from biomedical sciences, both from simulations and experiments during the complete research data lifecycle. An RDMS for this domain faces particular challenges: Research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around workflows of the scientists and practices and thus support changes in workflow and data structure. Nevertheless it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of the CaosDB Server, its data model and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it.


TBC. Forge is an attempt to create a custom experiment-oriented build tool by Adam Kosiorek.

Forge makes it easier to configure experiments and allows easier model inspection and evaluation due to smart checkpoints. With Forge, you can configure and build your dataset and model in separate files and load them easily in an experiment script or a jupyter notebook. Once the model is trained, it can be easily restored from a snapshot (with the corresponding dataset) without the access to the original config files.


Tensorflow-specific. tf.estimator.


Artemis seems to be mostly an experiment wrapper

The Artemis Experiment Framework helps you to keep track of your experiments and their results. It is an alternative to Sacred, with the goal of being more intuitive to use. … Using this module, you can turn your main function into an “Experiment”, which, when run, stores all console output, plots, and computed results to disk (in ~/.artemis/experiments)


pachyderm “is a data lake that offers complete version control for data and leverages the container ecosystem to provide reproducible data processing.”

AFAICT that means it is a cloudimificated build/pipeline tool with data versioning baked in. For the curious, it uses Kubernetes to manage container deployments, which rather presumes you are happy to rent out servers from someone, or have some container-compatibly cluster lying around which it is economic for you to admin and also use.


Sumatra is about tracking and reproducing simulation or analysis parameters for sciencey types. Exploratory data pipelines, especially.



[…] is a framework for heterogenous computing. It primarily provides the communication mechanisms for configuring and launching parallel computations across heterogenous resources. Pathos provides stagers and launchers for parallel and distributed computing, where each launcher contains the syntactic logic to configure and launch jobs in an execution environment. Some examples of included launchers are: a queue-less MPI-based launcher, a ssh-based launcher, and a multiprocessing launcher. Pathos also provides a map-reduce algorithm for each of the available launchers, thus greatly lowering the barrier for users to extend their code to parallel and distributed resources. Pathos provides the ability to interact with batch schedulers and queuing systems, thus allowing large computations to be easily launched on high-performance computing resources.

Integrates well with your jupyter notebook which is the main thing, but much like jupyter notebooks themselves, you are on your own when it comes to reproducibility and might want to use it in concert with one of the other solutions here to achieve that.


Ruffus is about setting up your exploratory simulation and automation pipelines, especially for science.


The original, and still the default. For connoisseurs of fragile whitespace handling.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.