Python, the universal glue language, famously does numerics too. The experience is… just OK: easier than invoking naked Fortran, but not as smooth as running MATLAB.

Tools built on the core matrix libraries, numpy and scipy, mostly do what I want. But when, for example, I really need some fancy spline type I read about on the internet, or I want someone to have put real effort into ensuring such-and-such a recursive filter is stable, I might find I need to implement it myself. As with many software systems, the trade-off is openness versus usability.

We also cannot ignore the fact that Python isn’t really doing the heavy lifting. Ultimately the actual computation is done by some library which Python invokes under the hood, and about which you must eventually learn. numpy, for example, the current incarnation of the oldest Python numerics lineage, defers most calculations to the classic Fortran linear algebra libraries. Modern systems make different choices; there are several underlying numerics libraries which can be invoked from Python, as with any language with a decent FFI. For example, TensorFlow invokes Eigen, and PyArmadillo invokes Armadillo. See also the strange adjacent ecosystem of GPU libraries.

A lot of useful machine-learning-type functionality, which I won’t discuss in detail here, lives in the Python deep learning toolkits such as TensorFlow, PyTorch, and JAX; you might want to check those pages too. Also, plotting is a whole separate issue.

1 Arrays in Python

For many developers, numpy is the default number wrangler. It is easier than FORTRAN but not actually that easy, because its gradually-evolved design is not that great. In practice, lots of development time goes not on the problem itself, but on debugging what the hell just went wrong with the ant-nest of punctuation that you just dumped on the page, which was supposed to do something fancy to an array.

Case in point: numpy’s array broadcasting. Broadcasting describes how numpy treats arrays with different shapes during arithmetic operations, allowing for efficient, C-speed vectorized operations without making explicit copies of data. Sometimes it works and feels like magic. Other times it works, but at the price of a baffling pile of invocations of confusingly named helper functions. When it fails, it leads to confusing errors or, worse, incorrect results that fail silently.

My favourite introduction to this is the DYNOMIGHT post “DumPy: NumPy except it’s OK if you’re dum”. The article argues that numpy’s syntax can be powerful but irritating, leading to code that is hard to read and debug. For example, to add a 1-D array to a 2-D array, a developer might need to insert a new axis with np.newaxis or None, like matrix + vector[:, None]. Is this the syntax we would have designed for ourselves? The author’s dumpy library proposes something more explicit, offering functions like dumpy.add_vec_to_cols(matrix, vector) or dumpy.add_vec_to_rows(matrix, vector). This approach makes the code’s intent unambiguous and self-documenting, getting the grug-brained seal of approval. The fact that dumpy’s implementation took only 700 lines of code is some kind of vindication of Python, I guess.
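
To make the broadcasting pain concrete, here is a minimal sketch (plain numpy, toy shapes of my own invention) of the row/column asymmetry in question:

import numpy as np

matrix = np.zeros((3, 4))
row = np.arange(4)  # one entry per column
col = np.arange(3)  # one entry per row

# Adding to each row "just works": (3, 4) + (4,) -> (3, 4),
# because broadcasting aligns trailing dimensions.
by_rows = matrix + row

# Adding to each column needs the explicit new axis:
# (3, 1) broadcasts against (3, 4).
by_cols = matrix + col[:, None]

# matrix + col  # ValueError: operands could not be broadcast together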

I’ll confess that I don’t actually use dumpy myself; instead I try to make most of my numeric computations happen using Einstein operations.

2 Pretty printing numbers

Getting the numbers to show up on the screen in a way that doesn’t make my eyes bleed is a constant, low-level chore. The defaults are rarely what I want. This annoys everyone and so there are many confusingly documented tools in the standard library and then many other libraries besides.

2.1 Floats

For simple floats, I rely on Python’s built-in format specifiers. After years of using different string formatting methods, I’ve settled on f-strings.

x = 1234.56789
# Format x with 4 decimal places, right-aligned to fill 10 columns
print(f"{x:10.4f}")  # ' 1234.5679'

There are a million options for this, covering alignment, signs, padding, and more. Instead of memorizing them, I just keep a bookmark to pyformat.info, which has every possible combination laid out beautifully. Or, ask ChatGPT.

2.2 Numpy Arrays

The default numpy printout is often too verbose or not verbose enough. I usually run this snippet at the top of my scripts or notebooks to get a more informative display:

import numpy as np
np.set_printoptions(edgeitems=5,
  linewidth=85, precision=4,
  suppress=True, threshold=500)

This setup shows more items at the edges of large arrays, keeps lines from wrapping too early, dials back the decimal precision, suppresses scientific notation (which I find unreadable for most things), and increases the threshold before numpy gives up and just prints a summary.

If I ever need to go back to the factory settings, this is the command:

np.set_printoptions(edgeitems=3, infstr='inf', linewidth=75, nanstr='nan',
  precision=8, suppress=False, threshold=1000, formatter=None)

For cases where I just want to format a single array for a one-off print statement without changing my global settings, I use np.array_str.
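
A quick sketch of that, with a throwaway array:

import numpy as np

x = np.random.default_rng(0).normal(size=(3, 3)) * 1e-4
# One-off formatting; global print options stay untouched
print(np.array_str(x, precision=2, suppress_small=True))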

2.3 For a Nicer Tensor Life

When working with deep learning frameworks, a raw printout of a tensor is next to useless. Is that wall of numbers an image, a batch of weights, or a list of tokens? Who knows.

It feels hacky but I do like Lovely Tensors. It answers the questions I actually have about a tensor: What’s its shape? What are its min, max, and mean? Does it contain any nan or inf values? It gives a compact, useful summary instead of a screen full of digits, and it is usually what I want when I’m debugging an NN in situ on some HPC cluster.
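
Setup is, as I recall, a single monkey-patch call. A minimal sketch, assuming PyTorch and the lovely_tensors package:

import torch
import lovely_tensors as lt

lt.monkey_patch()  # replaces tensor __repr__ globally

t = torch.randn(64, 3, 224, 224)
t[0, 0, 0, 0] = float('nan')
print(t)  # one line: shape, dtype, min/max/mean/std, plus a NaN flag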

For a more thorough, interactive experience in a notebook, I’ve had Treescope recommended. It’s a full-blown interactive HTML pretty-printer. It replaces the default display and lets me:

  • Expand and collapse parts of deeply nested objects (like a complex neural network).
  • See faceted visualizations of tensors right in the output, so I can instantly grasp their shape and value distributions.
  • Color-code parts of a model to see shared structures.

It works with JAX, Flax, Equinox, and PyTorch models too.

3 Einstein operations

Lots of array operations end up being easily expressed by shorthand from physics: the Einstein summation convention, or einsum for short if you are writing code. The core idea is simple: if an index variable appears twice in a term, it implies summation over all possible values of that index. So, instead of writing the dot product of two vectors $a$ and $b$ with an explicit sum symbol, like $\sum_i a_i b_i$, you can just write $a_i b_i$. The repeated index $i$ tells you everything you need to know.

This convention elegantly generalizes to more complex operations like matrix multiplication, transpositions, and tensor contractions. It’s a compact and powerful way to describe operations on multi-dimensional arrays. For a deeper dive, there are some excellent tutorials out there.

3.1 The Classic: numpy.einsum

Universally available and oldest: numpy.einsum. It lets you specify complex operations in a single, dense string. For example, matrix multiplication, which you might write as np.matmul(A, B), can be expressed with einsum like this:

np.einsum('ik,kj->ij', A, B)

Here, the string 'ik,kj->ij' is the recipe. It says: “Take a matrix with dimensions i and k, and another with dimensions k and j. Multiply them and sum over the repeated index k. The resulting matrix will have dimensions i and j.” It looks like it doesn’t necessarily optimise the order of operations super well, and so we pay a performance price.
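
A runnable version of that, with a couple of other classic contractions thrown in (plain numpy, toy data):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))

# Matrix multiplication as above
C = np.einsum('ik,kj->ij', A, B)
assert np.allclose(C, A @ B)

# The same notation covers other contractions:
At = np.einsum('ij->ji', A)                      # transpose
sq = np.einsum('ij,ij->', A, A)                  # sum of squared entries
outer = np.einsum('i,j->ij', A[:, 0], B[0, :])   # outer product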

The same function exists in modern deep learning frameworks, for example, torch.einsum in PyTorch.

3.2 For Speed: opt_einsum

While numpy.einsum is clever, it isn’t always smart about the order in which it performs contractions. When you’re chaining many operations, the computational cost can vary wildly depending on the path you take. Enter opt_einsum, a library that finds an efficient contraction order and intermediate arrays to speed up your calculation. It’s a drop-in replacement for numpy.einsum that makes the computer’s life easier without you having to change your code, and it notionally works with nearly any array library (NumPy, PyTorch, JAX, etc.) and thus GPUs.
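
A sketch of the drop-in usage, assuming the opt_einsum package; contract_path reports the contraction order it chose:

import numpy as np
import opt_einsum as oe

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 20))
B = rng.normal(size=(20, 30))
C = rng.normal(size=(30, 5))

# Same string as np.einsum, but the contraction order is optimised
D = oe.contract('ij,jk,kl->il', A, B, C)
assert np.allclose(D, A @ B @ C)

# Inspect the chosen path and its estimated cost
_, path_info = oe.contract_path('ij,jk,kl->il', A, B, C)
print(path_info)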

3.3 For Sanity: einops

einops generalises and redesigns the UX for classic einsum. It was created to make tensor operations, especially the reshaping and shuffling common in deep learning, more intuitive and readable. Instead of just index juggling, einops uses a more descriptive, recipe-like syntax. For example, to rearrange an image batch from (batch, height, width, channel) to (batch, channel, height, width), you don’t need a chain of transpose calls; you just write:

rearrange(images, 'b h w c -> b c h w')

It reads like what it does, which is a win for writing and maintaining code. The library’s tutorials, like the one on einops basics, are excellent not just at explaining the code, but at explaining Einstein summation itself.
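
A self-contained sketch of that, plus einops’ reduce, which folds pooling-style operations into the same notation (toy shapes of my own choosing):

import numpy as np
from einops import rearrange, reduce

images = np.zeros((8, 32, 32, 3))  # (batch, height, width, channel)

nchw = rearrange(images, 'b h w c -> b c h w')
assert nchw.shape == (8, 3, 32, 32)

# 2x2 mean-pooling over the spatial axes, in the same notation
pooled = reduce(images, 'b (h h2) (w w2) c -> b h w c', 'mean', h2=2, w2=2)
assert pooled.shape == (8, 16, 16, 3)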

3.4 The Next Generation: einx

Einx takes the ideas from einops and pushes them further, aiming to create a universal language for tensor operations. It keeps the Einstein-inspired notation but adds new concepts to make it more expressive. A key innovation is the [] bracket notation, which expresses vectorization directly. For example, einx.sum("a [b]", x) is a clearer way to say you’re summing over the b axis. It also improves how ellipses (…) work and makes expressions more composable. The goal is to create a notation that is not only readable but also powerful enough to cover a wider range of operations, all while supporting modern features like JIT-compilation (which may or may not be the same as optimising the order of operations; I haven’t checked).
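
A small sketch, assuming the einx package is installed; the bracketed axis is the one being reduced:

import numpy as np
import einx

x = np.arange(12).reshape(3, 4)

# Sum over the b axis; a survives, so the result has shape (3,)
row_sums = einx.sum('a [b]', x)
assert np.allclose(row_sums, x.sum(axis=1))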

4 Optimisation

See optimisation.

5 Compiling

Think you can go faster by compiling some numeric code and invoking it from Python? Or by compiling some Python subset?

Specific CPU and GPU support in precompiled Python binaries tends to be generic and outdated; for maximum performance we should be compiling for our own hardware.

See compiling Python, e.g. for performance or for invoking external binaries.

5.1 Aesara

I’ve been keeping an eye on Aesara as it’s the official successor to the once-mighty Theano library. Aesara seems to be something like a library for symbolic mathematics, much like SymPy, but integrated with some kind of optimizing compiler for mathematical expressions, especially those involving large, multi-dimensional arrays.

The basic idea is that I can define a complex computation symbolically, and Aesara’s job is to analyze that computation, apply a bunch of optimizations (like constant folding, arithmetic simplification, and using efficient BLAS operations), and then compile it down to fast, low-level code. It can target various backends including C, Numba, and even JAX. This makes it a powerful tool for research where I might be developing novel algorithms and need both the flexibility of symbolic definition and the speed of a compiled implementation, without having to write the C or JAX code myself.
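
A minimal sketch of that workflow, following the classic Theano idiom that Aesara inherits (I haven’t benchmarked this):

import aesara
import aesara.tensor as at

# Define the computation symbolically
x = at.dvector('x')
y = at.dvector('y')
z = at.dot(x, y) + (x ** 2).sum()

# Compile it down to an optimised callable
f = aesara.function([x, y], z)
print(f([1.0, 2.0], [3.0, 4.0]))  # 1*3 + 2*4 + 1 + 4 = 16.0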

6 HDF5: h5py and PyTables

I use h5py frequently for saving and loading large arrays. It’s a straightforward, Pythonic interface to the HDF5 file format, and it feels like a natural extension of NumPy. My data gets saved in a hierarchical, self-describing format, which is great. The only quirk I’ve run into is some occasional weirdness when passing serialized h5py objects on macOS, which doesn’t happen on Linux.
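
My typical usage is just this (a sketch; the file and dataset names are my own):

import numpy as np
import h5py

arr = np.random.default_rng(0).normal(size=(1000, 3))

# Write: files hold groups (dict-like) holding datasets (array-like)
with h5py.File('data.h5', 'w') as f:
    dset = f.create_dataset('experiment/samples', data=arr, compression='gzip')
    dset.attrs['description'] = 'toy data'

# Read: slicing a dataset loads only the requested part from disk
with h5py.File('data.h5', 'r') as f:
    first_rows = f['experiment/samples'][:10]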

I’ve also bookmarked PyTables, another HDF5 interface, but I’ve never been entirely clear on its value proposition over h5py. After looking into it, the distinction seems to be that h5py is a direct, NumPy-centric mapping to the HDF5 library, whereas PyTables builds more on top of it. PyTables adds a database-like layer, offering features like indexing and querying for searching within datasets. For my typical use case of just saving and loading entire arrays, h5py feels simpler. But if I had a massive HDF5 file and needed to efficiently query for specific rows or metadata without loading the whole thing, PyTables sounds like it might add some value?

Much more info about this at data file formats in general, plus also see my hdf-specific notebook.

7 zarr

Zarr is a format for storing chunked, N-dimensional arrays that I’ve bookmarked because it feels like a modern take on HDF5. Its main value proposition seems to be its flexibility and cloud-friendliness. Unlike HDF5, which is a more monolithic format, zarr can store array chunks in lots of different backends—on disk, in a Zip file, or, most interestingly, directly in a cloud object store like Amazon S3.
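
A small sketch of the on-disk case, using the zarr v2-style API:

import numpy as np
import zarr

# Each 100x100 chunk is stored as its own object under example.zarr/
z = zarr.open('example.zarr', mode='w', shape=(1000, 1000),
              chunks=(100, 100), dtype='f4')
z[:] = np.random.default_rng(0).normal(size=(1000, 1000))

# Reads touch only the chunks that cover the requested slice
block = z[:200, :200]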

This design makes it particularly good for parallel and distributed computing. It’s built to allow multiple threads or processes to read and write to the same array concurrently, which is a known pain point with some HDF5 configurations. While HDF5 has broader support in other languages like C++, zarr seems like the more forward-looking choice for pure Python, cloud-based workflows.

8 Dask

Dask is a library I’m aware of for parallel computing in Python. It’s interesting because it tackles the “big data” problem by mimicking the APIs of libraries I already use every day. It provides parallel versions of NumPy arrays and Pandas DataFrames. This means I can, in theory, apply the same code I wrote for in-memory data to datasets that are too large to fit on my machine, and Dask handles the distribution and parallel execution under the hood.

It’s built on two core components: a dynamic task scheduler for coordinating the work, and the “big data” collections themselves. This makes it flexible enough for simple, single-machine parallelism (taking advantage of all my CPU cores) all the way up to large, multi-machine clusters. For some examples of what this looks like in practice, Matthew Rocklin has a helpful how-to guide.
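
A toy sketch of the numpy-mimicking API; nothing actually runs until .compute() is called:

import dask.array as da

# Lazy 10000x10000 array, split into 1000x1000 chunks
x = da.random.random((10000, 10000), chunks=(1000, 1000))

# Build a task graph with the familiar numpy-ish API...
result = (x + x.T).mean(axis=0)

# ...then execute it chunk-wise across all cores
print(result.compute())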

9 xarray

xarray solves a problem that has bitten me many times: keeping track of what the dimensions of my NumPy arrays actually mean. xarray wraps raw NumPy arrays and lets me attach labels to the dimensions (like time, latitude, longitude) and coordinates to the axes, which is useful for statistical analysis.

This makes the code more intuitive, self-documenting, and less prone to bugs. Instead of data.sum(axis=2), I can write data.sum(dim='channel'). This labeled approach is borrowed from the netCDF file format, which is a standard in the geosciences. I’ve also seen it popping up more in deep learning workflows, especially for handling complex, multi-dimensional data like satellite imagery or climate simulations, where keeping track of axes by number alone is a recipe for disaster. It seems to be a key tool for bridging the gap between raw numerical data and machine learning frameworks.
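
A minimal sketch (the dimension names here are my own):

import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 3, 2)),
    dims=('time', 'location', 'channel'),
    coords={'channel': ['left', 'right']},
)

means = data.mean(dim='channel')   # name the axis, not its position
left = data.sel(channel='left')    # label-based selection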

10 References

Rogozhnikov. 2022. “Einops: Clear and Reliable Tensor Manipulations with Einstein-Like Notation.”
Smith, and Gray. 2018. “Opt_einsum - A Python Package for Optimizing Contraction Order for Einsum-Like Expressions.” Journal of Open Source Software.
van der Walt, Colbert, and Varoquaux. 2011. “The NumPy Array: A Structure for Efficient Numerical Computation.” Computing in Science & Engineering.