Python cluster computing

Parallel computing in which a head process spawns workers that each execute some Python function.

Various cluster options.



[…] is a framework for heterogeneous computing. It primarily provides the communication mechanisms for configuring and launching parallel computations across heterogeneous resources. Pathos provides stagers and launchers for parallel and distributed computing, where each launcher contains the syntactic logic to configure and launch jobs in an execution environment. Some examples of included launchers are: a queue-less MPI-based launcher, an ssh-based launcher, and a multiprocessing launcher. Pathos also provides a map-reduce algorithm for each of the available launchers, thus greatly lowering the barrier for users to extend their code to parallel and distributed resources. Pathos provides the ability to interact with batch schedulers and queuing systems, thus allowing large computations to be easily launched on high-performance computing resources.
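The idea of one map-reduce call that runs unchanged on different launchers can be sketched with the standard library's `concurrent.futures`. This is an analogy under stated assumptions, not pathos's API: `map_reduce` and the `"serial"`/`"threads"`/`"processes"` launcher names are hypothetical, and these executors only cover a single machine (no ssh or MPI).

```python
"""Toy analogue of 'one map-reduce interface, many launchers' (not pathos's API)."""
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import reduce
from operator import add
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Hypothetical launcher names mapped to stdlib executor classes.
LAUNCHERS = {
    "threads": ThreadPoolExecutor,
    "processes": ProcessPoolExecutor,  # mapper must be picklable (a top-level function)
}


def map_reduce(
    mapper: Callable[[T], R],
    reducer: Callable[[R, R], R],
    data: Iterable[T],
    initial: R,
    launcher: str = "serial",
    workers: int = 2,
) -> R:
    """Apply `mapper` to every item, then fold the results with `reducer`.

    The call site is identical whichever launcher is chosen; only the
    execution environment changes.
    """
    if launcher == "serial":
        # Run everything in the calling process; handy for debugging.
        return reduce(reducer, map(mapper, data), initial)
    with LAUNCHERS[launcher](max_workers=workers) as pool:
        # pool.map yields results in input order, so the fold is deterministic.
        return reduce(reducer, pool.map(mapper, data), initial)


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    for name in ("serial", "threads"):
        print(name, map_reduce(square, add, range(10), 0, launcher=name))
```

Switching to `launcher="processes"` sidesteps the GIL for CPU-bound work, at the cost of pickling the function and its arguments; on platforms that spawn fresh worker processes, that call must also sit under an `if __name__ == "__main__":` guard.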

It integrates well with your Jupyter notebook, which is the main thing; but, much like Jupyter notebooks themselves, you are on your own when it comes to reproducibility, and you might want to use it in concert with one of the other solutions here to achieve that.


dask seems to parallelize certain Python tasks well and claims to scale up elastically. It is purely for Python.


dispy (HT cpill) seems to be a Python solution if I have a mess of machines lying around to borrow.

dispy is a comprehensive, yet easy to use framework for creating and using compute clusters to execute computations in parallel across multiple processors in a single machine (SMP), among many machines in a cluster, grid or cloud. dispy is well suited for the data parallel (SIMD) paradigm, where a computation (Python function or standalone program) is evaluated with different (large) datasets independently with no communication among computation tasks (except for computation tasks sending Provisional/Intermediate Results or Transferring Files to the client).
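The data-parallel pattern described in that quote (the same computation applied to several datasets, with no communication between tasks) can be sketched locally with `concurrent.futures`. This is not dispy's API: threads stand in for cluster nodes, and `compute` and `run_independent_jobs` are hypothetical names chosen for illustration.

```python
"""Local sketch of independent, data-parallel jobs (threads stand in for nodes)."""
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean
from typing import Dict, List, Tuple


def compute(dataset_id: str, values: List[float]) -> Tuple[str, float]:
    # Each job sees only its own dataset and never talks to other jobs.
    return dataset_id, mean(values)


def run_independent_jobs(
    datasets: Dict[str, List[float]], workers: int = 4
) -> Dict[str, float]:
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit one job per dataset, analogous to submitting jobs to a cluster.
        futures = [pool.submit(compute, key, values) for key, values in datasets.items()]
        # Collect results in completion order, not submission order.
        for future in as_completed(futures):
            dataset_id, value = future.result()
            results[dataset_id] = value
    return results


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": [10, 20], "c": [5]}
    print(run_independent_jobs(data))
```

Because the jobs share nothing, `compute` would not need to change if the thread pool were replaced by processes or remote workers; for CPU-bound pure-Python work, threads are limited by the GIL, which is one reason to farm such jobs out to separate processes or machines.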

IPython native

IPython spawning overview. ipyparallel is the built-in Jupyter option: less pluggable, but much easier to use.


joblib is a simple Python scientific computing library with basic map-reduce and some nice caching, which integrate well. Not fancy, but super easy, which is what an academic usually wants, since fancy would imply we have a personnel budget.

>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
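joblib's caching (`joblib.Memory`) memoizes function results on disk, so repeated calls with the same arguments skip the computation. The toy decorator below sketches that idea using only the standard library; it is not joblib's implementation, `disk_cache` is a hypothetical name, and keying on hashed pickled arguments is a simplifying assumption that is only reliable for simple arguments such as numbers and strings.

```python
"""Toy on-disk memoization in the spirit of joblib.Memory (not its implementation)."""
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cache(cache_dir):
    """Hypothetical helper: cache a function's return values as pickle files."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function's qualified name plus its pickled arguments.
            key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            path = cache_path / (hashlib.sha256(key).hexdigest() + ".pkl")
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:

        @disk_cache(tmp)
        def slow_square(x):
            print(f"computing {x}")
            return x * x

        print(slow_square(4))  # computes, then writes the cache file
        print(slow_square(4))  # served from the cache file
```

This sketch never notices when the function's code changes, so stale results would be returned after an edit; a real caching layer has to deal with that.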

dask.distributed is a similar project that expands slightly on joblib to handle networked computer clusters, and it also does load management even without a cluster. In fact, it integrates with joblib.


has special needs. See PyTorch distributed.
