Parsl
Quiet but deadly Python HPC workflow manager
2025-11-17 — 2025-11-21
Wherein Parsl is described as a Python‑native workflow engine whose DAG is built at runtime and whose Slurm provider is shown to enable scaling from laptop to cluster while methods for unwrapping worker errors are given.
I recently switched my primary workflow engine from Snakemake to Parsl (“Parallel Scripting Library”). Parsl isn’t heavily promoted, and it doesn’t have many sexy, hipster design patterns or much graphic design in its online presence. However, it turned out to solve many of my problems.
- Code is here: Parsl/parsl
- Docs are here: Parsl
Parsl originated in academia and the high-performance computing community and was motivated by a need for a scalable, flexible way to orchestrate complex scientific workflows while staying in the Python ecosystem. It aims to address the demands of “big data” and the plateau in sequential processing power with a focus on the needs of people in public research institutions, enabling researchers to scale from a laptop to a supercomputer with minimal changes to their code.
Parsl is designed to execute data-oriented workflows in parallel where possible and in serial where needed. Like Snakemake or Make, it manages dependencies in a DAG (Directed Acyclic Graph). However, Parsl has better affordances for exploratory methodologies, by virtue of being more dynamic-feeling. While Snakemake requires us to define the entire workflow upfront in a static file, Parsl builds the dependency graph implicitly and dynamically at runtime, entirely within Python.
1 The API
Snakemake uses its own DSL (Domain-Specific Language) to configure jobs. I hated that. It broke IDE support and linting, was an unnecessary abstraction over Python, and cemented my side-eyed distrust of DSLs.
Parsl avoids that by being pure Python.
In Parsl, we define workflows by annotating standard Python functions with decorators. We call these annotated functions “Apps”.
import parsl
from parsl import python_app, bash_app
# A Parsl App that executes Python code
@python_app
def process_data(input_data):
    # ... complex processing ...
    return result

# A Parsl App that executes a shell command
@bash_app
def run_simulation(inputs, outputs):
    return f"my-simulator --in {inputs[0]} --out {outputs[0]}"

When we call a Parsl App, it doesn’t execute immediately. Instead, it returns a Future—an object representing the eventual result of the computation. This is a relatively modern approach in Python and is probably the ‘right’ way to do things.
If we pass the Future returned by one App as an argument to another App, Parsl automatically recognizes the dependency.
# Call the first app
future1 = process_data(initial_value)
# Call the second app, passing the future from the first
# Parsl knows this task depends on the completion of the first
future2 = process_data(future1)
# Execution happens asynchronously. We only block when we ask for the result.
print(future2.result())

Parsl constructs the DAG from these implicit data flows. When all inputs (Futures) for a task are ready, Parsl schedules the task on available resources. This pure-Python approach feels intuitive, integrates well with modern development tools, and lets us write complex, dynamic logic that’s impossible in a static DSL.
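For example, here is a minimal sketch (not from the Parsl docs; the app names and chunk count are made up) of the kind of runtime-dependent fan-out that a static rule file struggles to express:

from parsl import python_app

@python_app
def count_chunks(path):
    # Pretend this inspects the input and decides how much work there is.
    return 4

@python_app
def process_chunk(path, index):
    return f"{path}:chunk-{index}"

# Assumes parsl.load(config) has already been called.
# The number of downstream tasks depends on a runtime result: block briefly
# for the count, then keep building the graph dynamically in plain Python.
n_chunks = count_chunks("data.bin").result()
chunk_futures = [process_chunk("data.bin", i) for i in range(n_chunks)]
results = [f.result() for f in chunk_futures]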
2 How Cluster Execution Works
The main reason I tolerate workflow managers at all is that they handle the nightmare campus cluster horrors. Parsl has tight integration with batch schedulers like Slurm, PBS, SGE, and HTCondor, as well as with cloud providers and Kubernetes.
In Snakemake, we combine rules (with resource hints) and a separate YAML “profile” to map those hints to scheduler flags.
In Parsl, we handle configuration entirely within the Python script using a Config object. This object defines where and how tasks should run by combining Executors and Providers.
- Providers: These handle the interaction with the resource manager (e.g., SlurmProvider). They’re responsible for requesting, scaling, and terminating “blocks” of resources (like N nodes on a cluster).
- Executors: These manage task execution on the resources acquired by a Provider. The HighThroughputExecutor (HTEX) is commonly used for HPC scenarios, efficiently distributing tasks across many workers.
Here is a simple example of a configuration for a Slurm cluster:
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider
import os
# Get dynamic values from the environment
slurm_account = os.getenv("SLURM_ACCOUNT", "default_account")
partition = os.getenv("SLURM_PARTITION", "standard")
config = Config(
    executors=[
        HighThroughputExecutor(
            label="my_hpc_cluster",
            max_workers_per_node=48,
            provider=SlurmProvider(
                account=slurm_account,
                partition=partition,
                nodes_per_block=10,  # Request 10 nodes per Slurm job
                init_blocks=1,
                max_blocks=5,  # Scale up to 5 blocks (50 nodes total)
                walltime="02:00:00",
                # Optional: Add specific scheduler options
                # scheduler_options="#SBATCH --gpus-per-node=4"
            ),
        )
    ]
)

# Load the configuration before executing the workflow
# import parsl
# parsl.load(config)

When the script runs, Parsl uses the SlurmProvider to submit sbatch jobs (the “blocks”). Once those jobs start, the HighThroughputExecutor connects to the allocated nodes and begins distributing the workflow tasks across them.
3 Python Configuration
In my experience with Snakemake (especially version 8+), configuring cluster execution was frustrating. Snakemake’s YAML-based executor profiles didn’t let us use environment variables.
This rigidity forced me to hard-code site-specific details, producing a proliferation of near-identical config files and terrible portability. Ultimately, I was generating YAML files programmatically just to work around the limitations.
Parsl’s configuration system is a massive improvement because it’s just Python. See the Parsl example above. We can use os.getenv() to dynamically pull the Slurm account or partition. We can use conditional logic, loops, and functions to construct the configuration object. We can easily integrate it with automatic config tools.
This flexibility is neat for writing portable workflows. We can define a single script that intelligently adapts to the environment it’s running in, whether that’s a local machine or a different HPC cluster, without the configuration-boilerplate nightmare that plagued my Snakemake setup.
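As a concrete illustration, here is a rough sketch (not from the Parsl docs; the detection logic and resource numbers are placeholders) of a config factory that falls back to local execution when Slurm isn’t available:

import os
import shutil

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider, SlurmProvider

def make_config() -> Config:
    """Build a Parsl config that adapts to wherever the script is running."""
    if shutil.which("sbatch"):  # crude Slurm detection; adjust to taste
        provider = SlurmProvider(
            account=os.getenv("SLURM_ACCOUNT", "default_account"),
            partition=os.getenv("SLURM_PARTITION", "standard"),
            nodes_per_block=1,
            walltime="01:00:00",
        )
    else:
        provider = LocalProvider()
    return Config(
        executors=[HighThroughputExecutor(label="adaptive_htex", provider=provider)]
    )

# parsl.load(make_config())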
4 Local execution
The oldest local executor is the ThreadPoolExecutor. Threads are almost always more trouble than they’re worth in Python, IMO. If I wanted to spend time debugging non-deterministic segfaults, I’d switch to writing C.
The HighThroughputExecutor, however, runs its workers as separate processes, so it also serves as a sane process-based backend for local execution.
5 Providers
Parsl supports various Providers for public cloud backends (AWS, Google Cloud, Azure) and Kubernetes. Because the workflow logic (the Apps and their dependencies) is completely decoupled from the execution configuration (the Config object), running the same analysis on-prem or in the cloud often just means loading a different configuration object. It probably gets less seamless when I need to manage massive data assets, but I’ve managed to avoid that so far. TBC
6 Debugging Parsl workers
When Parsl tasks fail, the default error reporting is often opaque and unhelpful.
What’s happening:
- Parsl workers run in separate processes (via HTEX)
- When a worker crashes, the exception is wrapped in Parsl’s internal classes
- The actual Python traceback and error message are hidden inside the wrapper
- We only see “Dependency failure” without knowing the root cause
This is infuriating.
- No visibility into what actually went wrong
- We can’t distinguish between import errors, type errors, or logic bugs
- We have to manually inspect worker log files (if they exist)
- Errors propagate through dependency chains, hiding the original failure
What follows are some design patterns and code snippets to improve error visibility in Parsl workflows.
6.1 Error Extraction Wrapper
Add this function to our Parsl workflow file (e.g., parsl_task.py):
import logging

log = logging.getLogger(__name__)  # module-level logger used below


def unwrap_parsl_future(future, name: str):
    """Extract and surface exceptions from Parsl futures with full diagnostics.

    Args:
        future: Parsl AppFuture to unwrap
        name: Descriptive name for logging (e.g., "build_tgt_abc123")

    Returns:
        Future result if successful

    Raises:
        Original exception with enhanced logging of remote stdout/stderr
    """
    try:
        return future.result()
    except Exception as e:
        import traceback
        log.error(f"[{name}] FAILED in worker:")
        log.error("".join(traceback.format_exception(type(e), e, e.__traceback__)))
        # Dump remote debug info if available
        if hasattr(e, 'stdout') and e.stdout:
            log.error("---- WORKER STDOUT ----\n%s", e.stdout)
        if hasattr(e, 'stderr') and e.stderr:
            log.error("---- WORKER STDERR ----\n%s", e.stderr)
        # Log exception attributes for debugging Parsl wrappers
        log.error("Exception type: %s", type(e).__name__)
        log.error("Exception attributes: %s", dir(e))
        raise

This wrapper function:
- Catches all exceptions raised by .result() calls
- Formats the full traceback (including nested causes via __traceback__)
- Extracts remote stdout and stderr if they’re attached to the exception
- Logs exception metadata to help debug the wrapper’s behaviour
- Re-raises the original exception so the workflow still fails
Wrap All Future.result() Calls
Before (opaque errors):
for i, (future, record) in enumerate(zip(run_futures, run_records), 1):
    try:
        future.result()  # ❌ Hides worker exceptions
        log.info(" [%d/%d] ✓ %s", i, len(run_futures), record['run_id'])
    except Exception as e:
        log.error(" [%d/%d] ✗ FAILED: %s", i, len(run_futures), record['run_id'])
        log.error(" Error: %s", str(e))  # Only sees "Dependency failure"

After (full visibility):
for i, (future, record) in enumerate(zip(run_futures, run_records), 1):
    try:
        name = f"{record['target_id']}_{record['sampler']}_{record['run_id']}"
        unwrap_parsl_future(future, name)  # ✅ Extracts full traceback
        log.info(" [%d/%d] ✓ %s", i, len(run_futures), record['run_id'])
    except Exception as e:
        log.error(" [%d/%d] ✗ FAILED: %s", i, len(run_futures), record['run_id'])
        log.error(" Error: %s", str(e))

Now we can see:
- Exact line number where the error occurred (sample_cmd.py:251)
- Full Python traceback with nested calls
- Actual exception type (TypeError)
- Specific error message (“Object of type ArrayImpl is not JSON serializable”)
The next few sections are bonus options that might also be helpful.
6.2 Force worker logs to a visible directory
By default, Parsl workers may write logs to hidden or temp directories. We can force them to a known location:
In Parsl config YAML (config/parsl/local.yaml):
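The YAML itself isn’t reproduced here; the equivalent setting, if we build the executor directly in Python, is the worker_logdir_root argument on HighThroughputExecutor (the path below is a placeholder):

from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# Placeholder path: pick a directory that survives the run and is easy to find.
htex = HighThroughputExecutor(
    label="local_htex",
    worker_logdir_root="/path/to/project/logs/parsl_workers",
    provider=LocalProvider(),
)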
Benefits:
- All workers’ stdout/stderr in one place
- Easier to inspect logs when remote exceptions don’t capture everything
- Can tail -f worker logs during execution
6.3 Instrument Entry Points
Add debug logging at the start of the Parsl app entry functions:
@python_app
def build_target_app(cfg_yaml, target_id, experiment):
    """Build a target via direct command call."""
    import logging
    log = logging.getLogger(__name__)
    log.info("[WORKER START] build_target_app(target_id=%s)", target_id)
    from lambda_hat.commands.build_cmd import build_entry
    result = build_entry(cfg_yaml, target_id, experiment)
    log.info("[WORKER END] build_target_app(target_id=%s)", target_id)
    return result

What this catches:
- Import-time failures (if a worker can’t load modules)
- Crashes before any logging happens in the command
- Helps distinguish “worker started but crashed” vs “worker never started”
7 Parsl Worker Error Zoo
I extracted this list of failures from the logs of my recent Parsl runs and had the LLM summarise them.
7.1 Error: “Dependency failure for task N”
Meaning: This task depends on another task (via inputs=[...]) that failed.
Solution:
1. Look for earlier task failures in the logs
2. Use unwrap_parsl_future() to inspect the root cause
3. Fix the upstream task, not the dependent one
7.2 Error: “ModuleNotFoundError” in worker
Meaning: The worker can’t import a required module.
Common causes:
- Virtual environment not activated in the worker
- Missing worker_init entry in the provider config
- Module installed in a different environment than the worker uses
Solution: make sure each block initialises the same environment the submit script uses, typically via the provider’s worker_init (see the sketch below).
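A minimal sketch of that fix, assuming a modules-based cluster and a conda environment called my_env (the account, partition, module line, and environment name are all placeholders):

from parsl.providers import SlurmProvider

# worker_init runs on the allocated nodes before workers start, so it is the
# right place to reproduce the submit-side environment.
provider = SlurmProvider(
    account="my_account",    # placeholder
    partition="standard",    # placeholder
    worker_init="module load python/3.11 && source activate my_env",
)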
7.3 Error: “PickleError: Can’t pickle <object>”
Meaning: Parsl can’t serialize the function arguments or return value.
Common culprits:
- Passing file handles, database connections, or threads
- Returning non-serializable objects (e.g. JAX arrays can be tricky to serialize)
Solution:
- Pass file paths (strings), not file objects
- Convert JAX arrays to numpy before returning: return np.asarray(jax_array) (a sketch follows below)
- Use primitives (int, float, str, dict, list) for return values
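To make the JAX case concrete, here is a hedged sketch of an App that keeps the heavy arrays on the worker and returns only plain, picklable values (the app name and statistics are arbitrary):

from parsl import python_app

@python_app
def summarise(values):
    # Imports live inside the app because the function body runs on a worker.
    import jax.numpy as jnp

    x = jnp.asarray(values)
    # Convert JAX scalars to plain floats so the return value pickles cleanly.
    return {"mean": float(jnp.mean(x)), "std": float(jnp.std(x))}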
7.4 Error: Worker crashes silently
Meaning: The worker process died without raising a Python exception.
Common causes:
- Out of memory (OOM killer)
- Segfault in a C extension (JAX, NumPy, etc.)
- Signal received (SIGKILL, timeout)
Solution:
1. Check worker_logdir_root for stderr files
2. Look for “Killed” messages (OOM)
3. Add memory limits to the executor config (see the sketch below)
4. Run the task manually outside Parsl to reproduce
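For the memory-limit option, the relevant knob (as far as I can tell) is mem_per_worker on HighThroughputExecutor, specified in GB; the numbers here are placeholders to tune per workload:

from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# With mem_per_worker set, HTEX limits how many workers start per node so
# that each worker has roughly that much memory available. Placeholder values.
htex = HighThroughputExecutor(
    label="memory_capped_htex",
    mem_per_worker=8,          # GB per worker
    max_workers_per_node=4,
    provider=LocalProvider(),
)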
7.5 Worker Error Catching Variations
7.5.1 Variation 1: Accumulate Errors and Continue
Instead of failing immediately, we collect errors and report them at the end:
failures = []
for i, (future, record) in enumerate(zip(run_futures, run_records), 1):
    try:
        unwrap_parsl_future(future, record['run_id'])
        log.info(" [%d/%d] ✓ %s", i, len(run_futures), record['run_id'])
    except Exception as e:
        log.error(" [%d/%d] ✗ FAILED: %s", i, len(run_futures), record['run_id'])
        failures.append({'record': record, 'error': str(e)})

# Report all failures at end
if failures:
    log.error("⚠ FAILURE SUMMARY: %d of %d runs failed", len(failures), len(run_futures))
    for f in failures:
        log.error(" • %s: %s", f['record']['run_id'], f['error'])

7.5.2 Variation 2: Retry Failed Tasks
Wrap the extraction in retry logic:
def unwrap_with_retry(future, name: str, max_retries: int = 3):
    """Extract error and optionally retry on transient failures."""
    # Note: calling .result() again on the same future returns the same
    # outcome; this only helps for transient errors raised while waiting.
    # A real retry means resubmitting the app call.
    for attempt in range(max_retries):
        try:
            return unwrap_parsl_future(future, name)
        except (TimeoutError, ConnectionError) as e:
            if attempt < max_retries - 1:
                log.warning(f"[{name}] Retry {attempt+1}/{max_retries} after: {e}")
                continue
            raise

7.5.3 Variation 3: Extract to Structured Log
For machine-readable error tracking, we extract errors to a structured log.
import json
from pathlib import Path


def unwrap_to_json(future, name: str, output_path: Path):
    """Extract error and write structured JSON for analysis."""
    try:
        result = future.result()
        output_path.write_text(json.dumps({
            'name': name,
            'status': 'success',
            'result': str(result),
        }))
        return result
    except Exception as e:
        import traceback
        output_path.write_text(json.dumps({
            'name': name,
            'status': 'failed',
            'error_type': type(e).__name__,
            'error_message': str(e),
            'traceback': traceback.format_exception(type(e), e, e.__traceback__),
        }, indent=2))
        raise

7.6 How this all works
7.7 Parsl’s Exception Wrapping Mechanism
When a Parsl worker raises an exception:
- Worker process: Exception occurs in the @python_app function
- Parsl serialization: Exception is pickled and sent back to the main process
- Parsl DataFlowKernel: Wraps the exception in AppException or DependencyError
- Future.result(): Re-raises the wrapped exception
The problem: By default, we only see the wrapper (DependencyError), not the original exception.
Our solution:
- Call .result() to unwrap it
- Catch the exception and extract __cause__ and __traceback__
- Format it with traceback.format_exception() to show the full chain
- Check for Parsl-specific attributes, like stdout and stderr
With the future wrapper approach, we recover:
- ✅ Full exception traceback (all nested calls)
- ✅ Exception message and type
- ✅ Line numbers in source files
- ✅ Local variables (in traceback context)
- ✅ Chained exceptions (__cause__, __context__)
We still need to do manual work to extract certain info:
- ❌ Worker’s stdout (unless we configure worker_logdir_root)
  - Fix: Set worker_logdir_root in the executor config.
- ❌ Worker’s stderr (same)
  - Fix: Same as above.
- ❌ Worker’s environment variables
  - Fix: Log them in the worker entry point (see the sketch after this list).
- ❌ Import-time errors (before the function runs)
  - Fix: Add instrumentation at the start of the function.
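For the environment-variable case, a small sketch of what that logging could look like at the top of an App (the variable selection and app name are arbitrary):

from parsl import python_app

@python_app
def diagnosed_app(arg):
    # Log a few environment variables at worker start so environment
    # mismatches show up in the worker logs.
    import logging
    import os

    log = logging.getLogger(__name__)
    for key in ("HOSTNAME", "SLURM_JOB_ID", "VIRTUAL_ENV", "CONDA_PREFIX", "PATH"):
        log.info("[WORKER ENV] %s=%s", key, os.environ.get(key))
    return arg  # actual work would go here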
