Highly performative computing

In Soviet Russia, job puts YOU in queue



As we call it, riding the pantomime dragon

On getting things done on the Big Computer your supervisor is convinced will solve that Big Problem with Big Data because that was the phrasing used in the previous funding application.

I’m told the experience of HPC on campus is different for, e.g., physicists, who work on a software stack that has co-evolved with HPC clusters over decades and who start from different preconceptions about what computers do. For us machine-learning types, though, the usual scenario is as follows (and I have been on-boarded 4 times to different clusters with similar pain points by now):

Our cluster uses some job manager pre-dating many modern trends in cloud computing, including the word “cloud” applied to computing. Perhaps the cluster is based upon Platform LSF, Torque or Slurm. Typically I don’t care about the details, since I have a vague suspicion that the entire world of traditional HPC is asymptotically approaching 0% market share as measured by share of jobs, and that the main question is merely whether that approach is fast enough to save me. All I know is that my fancy-pants modern cloud-kubernetes-containerized-apache-spark-hogwild-SGD project, leveraging some disreputable stuff I found on GitHub, will not run here in the manner Google intended. And that if I ask for containerization the system responds with the error:

>>> GET OFF MY LAWN

But indifference is no solution: I need to work in this environment now, because my department prefers

  • to shovel labour into the sunk-cost pit at the bottom of which awaits the giant computing cluster they bought a share in 5 years ago, to
  • the horrifying prospect of giving me billing privileges for the cloud.

You read the help documents that your campus IT staff wrote, but they are targeted at people who have never heard of the terminal before, and in their effort to hide complexity they hide practicality too. Rarely do they let slip a few useful keywords to search for; instead they favour cargo-cult instructions to run some Fortran software that I have never heard of and that has suspiciously few stars on GitHub.

So how can we get usable computation out of these cathedrals of computing, while filling our time and brain space with the absolute minimum we possibly can about their magnificently baroque, obsolescent architecture, which would otherwise detract from writing job applications for hip dotcoms?

There is a gradual convergence here between classic-flavoured campus clusters and trendier self-service clouds. To get a feeling for how this happens, I recommend Simon Yin’s test drive of “Nimbus”, an Australian research-oriented hybrid of the two.

UPDATE: I am no longer a grad student. I am a postdoc at CSIRO’s Data61, where we have HPC resources in an intermediate state: some containerization, some cloud and much SLURM.

Job management

Slurm, Torque and Platform LSF all implement a similar API providing concurrency guarantees specified by the famous Byzantine committee-designed greasy-totem-pole priority system. Empirical observation: the IT department for any given cluster often seems reluctant to document which one they are using. Typically a campus cluster will come with some gruff example commands that worked for that guy that time, but little more. Usually that guy that time was running a molecular simulation package written in some language I have never heard of. Presumably this is a combination of the understandable desire not to write documentation and a kind of availability-through-obscurity demand management. They are typically less eager to allocate GPUs, slightly confused by all this modern neural-network stuff, and downright flabbergasted by containers.
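If the documentation will not tell you which scheduler is in play, the login node usually will. A minimal sketch that just sniffs for the standard submission commands; nothing here is specific to any particular campus:

import shutil

# Probe the login node for the usual scheduler submission commands.
candidates = {
    "sbatch": "Slurm",
    "qsub": "Torque / PBS / Grid Engine (ambiguous, alas)",
    "bsub": "Platform LSF",
}
for cmd, name in candidates.items():
    path = shutil.which(cmd)
    if path:
        print(f"{name}: found {cmd} at {path}")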

To investigate: apparently there is a modern programmatic API to some of these schedulers called DRMAA (Distributed Resource Management Application API), which allows fairly generic job definition and which works on my local cluster.

Poor person’s parallelism

As a penniless grad student, what I needed to do to get value from the cluster was to schedule as many small single-core jobs as I could, so that they would be slotted into the gaps between all the multicore horrors that the physics folks ran. This is a common problem, but I did not find any generic solutions for it; I simply rolled my own ad hoc ones, along the lines of the sketch below.
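For what it is worth, here is a minimal sketch of the kind of ad hoc thing I mean, assuming a Slurm cluster; the script name my_experiment.py and the resource numbers are made up, and a real version would want error handling:

import subprocess
from pathlib import Path

# One cheap single-core task per parameter setting (hypothetical example).
params = [f"--seed={i}" for i in range(1, 101)]
Path("params.txt").write_text("\n".join(params) + "\n")

# A job array of small one-core requests slips into scheduling gaps
# that a single monolithic many-core request would queue behind for days.
script = """#!/bin/bash
#SBATCH --array=1-100
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=00:30:00
ARGS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
python my_experiment.py $ARGS
"""
Path("array_job.sh").write_text(script)
subprocess.run(["sbatch", "array_job.sh"], check=True)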

Job submission via the cluster job management system

Running deep-learning-style jobs using classic HPC tools.

submitit is a recent entrant which programmatically submits jobs from inside Python. It looks like this:

import submitit

def add(a, b):
    return a + b

# executor is the submission interface (logs are dumped in the folder)
executor = submitit.AutoExecutor(folder="log_test")
# set timeout in min, and partition for running the job
executor.update_parameters(timeout_min=1, slurm_partition="dev")
job = executor.submit(add, 5, 7)  # will compute add(5, 7)
print(job.job_id)  # ID of your job

output = job.result()  # waits for completion and returns output
assert output == 12  # 5 + 7 = 12...  your addition was computed in the cluster

An alternative option for many use cases is test-tube, a “Python library to easily log experiments and parallelize hyperparameter search for neural networks”. AFAICT there is nothing neural-network-specific in this, and it will happily schedule a whole bunch of useful types of deployment for you, generating the necessary scripts and keeping track of what is going on. This functionality is not obvious from the front-page description of the library, but see test-tube/SlurmCluster.md. (Thanks for pointing me to this, Chris Jackett.)

See also DRMAA Python, which is a Python wrapper around the DRMAA API.
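A minimal sketch of what using it looks like, assuming the admins have set DRMAA_LIBRARY_PATH to point at the scheduler’s DRMAA library; the script path and arguments are hypothetical:

import drmaa

with drmaa.Session() as s:
    jt = s.createJobTemplate()
    jt.remoteCommand = "/home/me/run_experiment.sh"  # hypothetical script
    jt.args = ["--seed", "1"]
    jt.joinFiles = True  # merge stdout and stderr

    job_id = s.runJob(jt)
    print("submitted", job_id)

    # Block until the scheduler reports the job done.
    info = s.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print("exit status", info.exitStatus)

    s.deleteJobTemplate(jt)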

In Julia there is a rather fancy system JuliaParallel/ClusterManagers.jl which supports most major HPC job managers automatically.

There is also a bare-bones cth/QsubCmds.jl: Run Julia external (shell) commands on a HPC cluster.

Other ones I looked at: Andrea Zonca wrote a script that allows spawning jobs on your legacy HPC monstrosity. After several iterations and improvements it is now called batchspawner.

snakemake, which supports make-like build workflows, but on horrible campus clusters. Seems general and powerful, but complicated.

hanythingondemand provides a set of scripts to easily set up an ad-hoc Hadoop cluster through PBS jobs.

Request a multicore job from the scheduler and manage that like a mini cluster

Dask.distributed works well here apparently, and (via the companion dask-jobqueue package) will even spawn the Slurm jobs for you.
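A minimal sketch of that, with a made-up partition name and resource request:

from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import dask.array as da

# Each dask-jobqueue "job" is one Slurm allocation hosting dask workers.
cluster = SLURMCluster(
    queue="dev",         # hypothetical partition
    cores=8,
    memory="16GB",
    walltime="01:00:00",
)
cluster.scale(jobs=4)    # ask Slurm for 4 such allocations
client = Client(cluster)

# Ordinary dask work now fans out across whatever Slurm grants us.
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
print(x.mean().compute())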

Easily distributing a parallel IPython Notebook on a cluster:

Have you ever asked yourself: “Do I want to spend 2 days adjusting this analysis to run on the cluster and wait 2 days for the jobs to finish, or do I just run it locally with no extra work and just wait a week?”

Or: ipython-cluster-helper automates that.

“Quickly and easily parallelize Python functions using IPython on a cluster, supporting multiple schedulers. Optimizes IPython defaults to handle larger clusters and simultaneous processes.” […]

ipython-cluster-helper creates a throwaway parallel IPython profile, launches a cluster and returns a view. On program exit it shuts the cluster down and deletes the throwaway profile.

It works on Platform LSF, Sun Grid Engine, Torque and SLURM. Strictly Python.
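A minimal sketch of the usage as I understand it from the README; the scheduler and queue names are placeholders for whatever your cluster calls them:

from cluster_helper.cluster import cluster_view

def fit_one(seed):
    # Hypothetical per-task work unit.
    return seed ** 2

# Spins up a throwaway IPython cluster of 10 engines via the scheduler,
# maps the work across them, then tears everything down on exit.
with cluster_view(scheduler="slurm", queue="dev", num_jobs=10) as view:
    results = list(view.map(fit_one, range(100)))

print(sum(results))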

Luxury parallelism with pipelines and coordination

More modern tools facilitate very sophisticated workflows with execution graphs and pipelines and such. One that was briefly pitched to us that I did not ultimately use: nextflow

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.

Its fluent DSL simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

Nextflow supports Docker and Singularity containers technology.

This, along with the integration of the GitHub code sharing platform, allows you to write self-contained pipelines, manage versions and to rapidly reproduce any former configuration.

It provides out of the box executors for SGE, LSF, SLURM, PBS and HTCondor batch schedulers and for Kubernetes, Amazon AWS and Google Cloud platforms.

I think that we are actually going to be given D2iQ Kaptain: End-to-End Machine Learning Platform instead? TBC.

Software dependency management

With high likelihood, the campus cluster is running some ancient, decrepit edition of RHEL, with too elderly a version of everything to run anything you need. There will be some weird semi-recent version of Python that you can activate, but in practice even that will be some awkward version that does not mesh with the latest code you are using (you are doing research, after all, right?). You will need to install everything you use yourself. That is fine, but be aware it’s a little different from provisioning virtual machines on your new-fangled fancy cloud thingy. This is to teach you important and hugely useful lessons for later in life, such as which compile flags to set to get the matrix kernel library version 8.7.3patch3 to compile against Python 2.5.4rc3 for the Itanium architecture as at May 8, 2009. Why, think on how many times you will use that skill after you leave your current job! (We call such contemplation void meditation, and it is a powerful mindfulness technique.)

There are lots of package managers, and I suspect most of them work on HPC machines. I use homebrew and then additional Python environments on top of that.
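Concretely, the one place on the cluster you can reliably write to is your home (or scratch) directory, so that is where the Python environments go. A minimal sketch using the standard-library venv module; the path is hypothetical:

import os
import venv

# Build a per-project environment somewhere you have write access.
env_dir = os.path.expanduser("~/envs/myproject")  # hypothetical path
venv.EnvBuilder(with_pip=True).create(env_dir)
print(f"activate with: source {env_dir}/bin/activate")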

For HPC specifically, spack has also been recommended to me; it also lets you prototype on macOS.

Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. With Spack, you can build a package with multiple versions, configurations, platforms, and compilers, and all of these builds can coexist on the same machine.

Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers. Use Spack to install in your home directory, to manage shared installations and modules on a cluster, or to build combinatorial versions of software for testing.

Deployment management

DIY cluster. Normally HPC means requesting jobs via Slurm and some shared file system. For my end of the budget this also means breaking the job up into small chunks and squeezing them in around the edges of people with a proper CPU allocation. However! Some people are blessed with the ability to request simultaneous control of a predictable number of machines. For these people, you can roll your own deployment of execution across some machines, which might be useful. This might also work if you have a bunch of unmanaged machines not on the campus cluster, which I have personally never experienced.

If you have a containerized deployment, as with all the cloud providers these days, perhaps see the containerized HPC solution Singularity, if you are blessed with admins who support it.

For traditional cluster scripting, there is ClusterShell:

ClusterShell is an event-driven open source Python library, designed to run local or distant commands in parallel on server farms or on large Linux clusters. No need to reinvent the wheel: you can use ClusterShell as a building block to create cluster aware administration scripts and system applications in Python. It will take care of common issues encountered on HPC clusters, such as operating on groups of nodes, running distributed commands using optimized execution algorithms, as well as gathering results and merging identical outputs, or retrieving return codes. ClusterShell takes advantage of existing remote shell facilities already installed on your systems, like SSH.

ClusterShell’s primary goal is to improve the administration of high-performance clusters by providing a lightweight but scalable Python API for developers. It also provides clush, clubak and nodeset/cluset, convenient command-line tools that allow traditional shell scripts to benefit from some of the library features.
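A minimal sketch of the Python API side, going by the ClusterShell documentation; the node names are placeholders:

from ClusterShell.Task import task_self
from ClusterShell.NodeSet import NodeSet

task = task_self()
# Run a command in parallel over a node range (hypothetical node names).
task.run("uname -r", nodes="node[01-08]", timeout=10)

# Identical outputs from different nodes are merged, which is the main charm.
for output, nodes in task.iter_buffers():
    print(NodeSet.fromlist(nodes), output.decode())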

A little rawer: pdsh:

pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs commands on a single remote host, pdsh can run multiple remote commands in parallel. pdsh uses a “sliding window” (or fanout) of threads to conserve resources on the initiating host while allowing some connections to time out.

For example, the following would duplicate using the ssh module to run hostname(1) across the hosts foo[0-10]:

pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname
