Python, the universal glue language, famously does numerics too. The experience is… just OK: easier than invoking naked Fortran, but not as smooth as running MATLAB.

Tools built on the core matrix libraries, numpy and scipy, mostly do what I want. But when, for example, I really need some fancy spline type I read about on the internet, or I want someone to have put real effort into ensuring such-and-such a recursive filter is stable, I might find I need to implement it myself. As with many software systems, the trade-off is openness versus usability.

We also cannot ignore the fact that Python isn’t really doing the heavy lifting. Ultimately the actual computation is done by some library which Python invokes under the hood, and about which you must eventually learn. numpy, for example, the current incarnation of the oldest Python numerics lineage, defers most calculations to the classic Fortran linear algebra libraries. Modern systems make different choices; there are several underlying numerics libraries which can be invoked from Python, as with any language with a decent FFI. For example, TensorFlow invokes Eigen, and PyArmadillo invokes Armadillo. See also the strange adjacent ecosystem of GPU libraries.

A lot of useful machine-learning-type functionality, which I won’t discuss in detail here, lives in the Python deep learning toolkits such as TensorFlow, PyTorch, and JAX; you might want to check those pages too. Also, plotting is a whole separate issue.

1 Arrays in Python

For many developers, numpy is the default number wrangler. It is easier than FORTRAN but not actually that easy, because its gradually-evolved design is not that great. In practice, lots of development time goes not on the problem itself, but on debugging what the hell just went wrong with the ant-nest of punctuation that you just dumped on the page, which was supposed to do something fancy to an array.

Case in point: numpy’s array broadcasting. Broadcasting describes how numpy treats arrays with different shapes during arithmetic operations, allowing for efficient, C-speed vectorized operations without making explicit copies of data. Sometimes it works and feels like magic. Other times it works, but at the price of a baffling pile of invocations of confusingly named helper functions. When it fails, it leads to confusing errors or, worse, incorrect results that fail silently.

My favourite introduction to this is the DYNOMIGHT post “DumPy: NumPy except it’s OK if you’re dum”. The article argues that numpy’s syntax can be powerful but irritating, leading to code that is hard to read and debug. For example, to add a 1-D array to a 2-D array, a developer might need to insert a new axis with np.newaxis or None, like matrix + vector[:, None]. Is this the syntax we would have designed for ourselves? The author’s dumpy library proposes something more explicit, offering functions like dumpy.add_vec_to_cols(matrix, vector) or dumpy.add_vec_to_rows(matrix, vector). This approach makes the code’s intent unambiguous and self-documenting, getting the grug-brained seal of approval. The fact that dumpy’s implementation took only 700 lines of code is some kind of vindication of Python, I guess.
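
To make the broadcasting pain concrete, here is a minimal sketch (plain numpy, toy shapes of my own invention) of the row/column asymmetry in question:

import numpy as np

matrix = np.zeros((3, 4))
row = np.arange(4)  # one entry per column
col = np.arange(3)  # one entry per row

# Adding to each row "just works": (3, 4) + (4,) -> (3, 4),
# because broadcasting aligns trailing dimensions.
by_rows = matrix + row

# Adding to each column needs the explicit new axis:
# (3, 1) broadcasts against (3, 4).
by_cols = matrix + col[:, None]

# matrix + col  # ValueError: operands could not be broadcast together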

I’ll confess that I don’t actually use dumpy myself; instead I try to make most of my numeric computations happen using Einstein operations.

2 Pretty printing numbers

Getting the numbers to show up on the screen in a way that doesn’t make my eyes bleed is a constant, low-level chore. The defaults are rarely what I want. This annoys everyone and so there are many confusingly documented tools in the standard library and then many other libraries besides.

2.1 Floats

For simple floats, I rely on Python’s built-in format specifiers. After years of using different string formatting methods, I’ve settled on f-strings.

x = 1234.56789
# Format x with 4 decimal places, right-aligned to fill 10 columns
print(f"{x:10.4f}")  # ' 1234.5679'

There are a million options for this, covering alignment, signs, padding, and more. Instead of memorizing them, I just keep a bookmark to pyformat.info, which has every possible combination laid out beautifully. Or, ask ChatGPT.

2.2 Numpy Arrays

The default numpy printout is often too verbose or not verbose enough. I usually run this snippet at the top of my scripts or notebooks to get a more informative display:

import numpy as np
np.set_printoptions(edgeitems=5,
  linewidth=85, precision=4,
  suppress=True, threshold=500)

This setup shows more items at the edges of large arrays, keeps lines from wrapping too early, dials back the decimal precision, suppresses scientific notation (which I find unreadable for most things), and increases the threshold before numpy gives up and just prints a summary.

If I ever need to go back to the factory settings, this is the command:

np.set_printoptions(edgeitems=3, infstr='inf', linewidth=75, nanstr='nan',
  precision=8, suppress=False, threshold=1000, formatter=None)

For cases where I just want to format a single array for a one-off print statement without changing my global settings, I use np.array_str.
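
A quick sketch of that, with a throwaway array:

import numpy as np

x = np.random.default_rng(0).normal(size=(3, 3)) * 1e-4
# One-off formatting; global print options stay untouched
print(np.array_str(x, precision=2, suppress_small=True))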

2.3 For a Nicer Tensor Life

When working with deep learning frameworks, a raw printout of a tensor is next to useless. Is that wall of numbers an image, a batch of weights, or a list of tokens? Who knows.

It feels hacky but I do like Lovely Tensors. It answers the questions I actually have about a tensor: What’s its shape? What are its min, max, and mean? Does it contain any nan or inf values? It gives a compact, useful summary instead of a screen full of digits, and it is usually what I want when I’m debugging an NN in situ on some HPC cluster.
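
Setup is, as I recall, a single monkey-patch call. A minimal sketch, assuming PyTorch and the lovely_tensors package:

import torch
import lovely_tensors as lt

lt.monkey_patch()  # replaces tensor __repr__ globally

t = torch.randn(64, 3, 224, 224)
t[0, 0, 0, 0] = float('nan')
print(t)  # one line: shape, dtype, min/max/mean/std, plus a NaN flag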

For a more thorough, interactive experience in a notebook, I’ve had Treescope recommended. It’s a full-blown interactive HTML pretty-printer. It replaces the default display and lets me:

  • Expand and collapse parts of deeply nested objects (like a complex neural network).
  • See faceted visualizations of tensors right in the output, so I can instantly grasp their shape and value distributions.
  • Color-code parts of a model to see shared structures.

It works with JAX, Flax, Equinox, and PyTorch models too.

3 Einstein operations

Lots of array operations end up being easily expressed by shorthand from physics: the Einstein summation convention, or einsum for short if you are writing code. The core idea is simple: if an index variable appears twice in a term, it implies summation over all possible values of that index. So, instead of writing the dot product of two vectors $a$ and $b$ with an explicit sum symbol, like $\sum_i a_i b_i$, you can just write $a_i b_i$. The repeated index $i$ tells you everything you need to know.

This convention elegantly generalizes to more complex operations like matrix multiplication, transpositions, and tensor contractions. It’s a compact and powerful way to describe operations on multi-dimensional arrays. For a deeper dive, there are some excellent tutorials out there.

3.1 The Classic: numpy.einsum

Universally available and oldest: numpy.einsum. It lets you specify complex operations in a single, dense string. For example, matrix multiplication, which you might write as np.matmul(A, B), can be expressed with einsum like this:

np.einsum('ik,kj->ij', A, B)

Here, the string 'ik,kj->ij' is the recipe. It says: “Take a matrix with dimensions i and k, and another with dimensions k and j. Multiply them and sum over the repeated index k. The resulting matrix will have dimensions i and j.” It looks like it doesn’t necessarily optimise the order of operations super well, and so we pay a performance price.
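
A runnable version of that, with a couple of other classic contractions thrown in (plain numpy, toy data):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))

# Matrix multiplication as above
C = np.einsum('ik,kj->ij', A, B)
assert np.allclose(C, A @ B)

# The same notation covers other contractions:
At = np.einsum('ij->ji', A)                      # transpose
sq = np.einsum('ij,ij->', A, A)                  # sum of squared entries
outer = np.einsum('i,j->ij', A[:, 0], B[0, :])   # outer product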

The same function exists in modern deep learning frameworks, for example, torch.einsum in PyTorch.

3.2 For Speed: opt_einsum

While numpy.einsum is clever, it isn’t always smart about the order in which it performs contractions. When you’re chaining many operations, the computational cost can vary wildly depending on the path you take. Enter opt_einsum, a library that finds an efficient contraction order and intermediate arrays to speed up your calculation. It’s a drop-in replacement for numpy.einsum that makes the computer’s life easier without you having to change your code, and it notionally works with nearly any array library (NumPy, PyTorch, JAX, etc.) and thus GPUs.
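
A sketch of the drop-in usage, assuming the opt_einsum package; contract_path reports the contraction order it chose:

import numpy as np
import opt_einsum as oe

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 20))
B = rng.normal(size=(20, 30))
C = rng.normal(size=(30, 5))

# Same string as np.einsum, but the contraction order is optimised
D = oe.contract('ij,jk,kl->il', A, B, C)
assert np.allclose(D, A @ B @ C)

# Inspect the chosen path and its estimated cost
_, path_info = oe.contract_path('ij,jk,kl->il', A, B, C)
print(path_info)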

3.3 For Sanity: einops

einops generalises and redesigns the UX for classic einsum. It was created to make tensor operations, especially the reshaping and shuffling common in deep learning, more intuitive and readable. Instead of just index juggling, einops uses a more descriptive, recipe-like syntax. For example, to rearrange an image batch from (batch, height, width, channel) to (batch, channel, height, width), you don’t need a chain of transpose calls; you just write:

rearrange(images, 'b h w c -> b c h w')

It reads like what it does, which is a win for writing and maintaining code. The library’s tutorials, like the one on einops basics, are excellent not just at explaining the code, but at explaining Einstein summation itself.
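
A self-contained sketch of that, plus einops’ reduce, which folds pooling-style operations into the same notation (toy shapes of my own choosing):

import numpy as np
from einops import rearrange, reduce

images = np.zeros((8, 32, 32, 3))  # (batch, height, width, channel)

nchw = rearrange(images, 'b h w c -> b c h w')
assert nchw.shape == (8, 3, 32, 32)

# 2x2 mean-pooling over the spatial axes, in the same notation
pooled = reduce(images, 'b (h h2) (w w2) c -> b h w c', 'mean', h2=2, w2=2)
assert pooled.shape == (8, 16, 16, 3)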

3.4 The Next Generation: einx

Einx takes the ideas from einops and pushes them further, aiming to create a universal language for tensor operations. It keeps the Einstein-inspired notation but adds new concepts to make it more expressive. A key innovation is the [] bracket notation, which expresses vectorization directly. For example, einx.sum("a [b]", x) is a clearer way to say you’re summing over the b axis. It also improves how ellipses (…) work and makes expressions more composable. The goal is to create a notation that is not only readable but also powerful enough to cover a wider range of operations, all while supporting modern features like JIT-compilation (which may or may not be the same as optimising the order of operations; I haven’t checked).
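
A small sketch, assuming the einx package is installed; the bracketed axis is the one being reduced:

import numpy as np
import einx

x = np.arange(12).reshape(3, 4)

# Sum over the b axis; a survives, so the result has shape (3,)
row_sums = einx.sum('a [b]', x)
assert np.allclose(row_sums, x.sum(axis=1))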

4 Optimisation

See optimisation.

5 Compiling

Think you can go faster by compiling some numeric code and invoking it from Python? Or by compiling some Python subset?

Specific CPU and GPU support in precompiled Python binaries tends to be generic and outdated; for maximum performance we should be compiling for our own hardware.

See compiling Python, e.g. for performance or for invoking external binaries.

5.1 Aesara

I’ve been keeping an eye on Aesara as it’s the official successor to the once-mighty Theano library. Aesara seems to be something like a library for symbolic mathematics, much like SymPy, but integrated with some kind of optimizing compiler for mathematical expressions, especially those involving large, multi-dimensional arrays.

The basic idea is that I can define a complex computation symbolically, and Aesara’s job is to analyze that computation, apply a bunch of optimizations (like constant folding, arithmetic simplification, and using efficient BLAS operations), and then compile it down to fast, low-level code. It can target various backends including C, Numba, and even JAX. This makes it a powerful tool for research where I might be developing novel algorithms and need both the flexibility of symbolic definition and the speed of a compiled implementation, without having to write the C or JAX code myself.
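
A minimal sketch of that workflow, following the classic Theano idiom that Aesara inherits (I haven’t benchmarked this):

import aesara
import aesara.tensor as at

# Define the computation symbolically
x = at.dvector('x')
y = at.dvector('y')
z = at.dot(x, y) + (x ** 2).sum()

# Compile it down to an optimised callable
f = aesara.function([x, y], z)
print(f([1.0, 2.0], [3.0, 4.0]))  # 1*3 + 2*4 + 1 + 4 = 16.0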

6 HDF5: h5py and PyTables

I use h5py frequently for saving and loading large arrays. It’s a straightforward, Pythonic interface to the HDF5 file format, and it feels like a natural extension of NumPy. My data gets saved in a hierarchical, self-describing format, which is great. The only quirk I’ve run into is some occasional weirdness when passing serialized h5py objects on macOS, which doesn’t happen on Linux.
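
My typical usage is just this (a sketch; the file and dataset names are my own):

import numpy as np
import h5py

arr = np.random.default_rng(0).normal(size=(1000, 3))

# Write: files hold groups (dict-like) holding datasets (array-like)
with h5py.File('data.h5', 'w') as f:
    dset = f.create_dataset('experiment/samples', data=arr, compression='gzip')
    dset.attrs['description'] = 'toy data'

# Read: slicing a dataset loads only the requested part from disk
with h5py.File('data.h5', 'r') as f:
    first_rows = f['experiment/samples'][:10]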

I’ve also bookmarked PyTables, another HDF5 interface, but I’ve never been entirely clear on its value proposition over h5py. After looking into it, the distinction seems to be that h5py is a direct, NumPy-centric mapping to the HDF5 library, whereas PyTables builds more on top of it. PyTables adds a database-like layer, offering features like indexing and querying for searching within datasets. For my typical use case of just saving and loading entire arrays, h5py feels simpler. But if I had a massive HDF5 file and needed to efficiently query for specific rows or metadata without loading the whole thing, PyTables sounds like it might add some value?

Much more info about this at data file formats in general, plus also see my hdf-specific notebook.

7 zarr

Zarr is a format for storing chunked, N-dimensional arrays that I’ve bookmarked because it feels like a modern take on HDF5. Its main value proposition seems to be its flexibility and cloud-friendliness. Unlike HDF5, which is a more monolithic format, zarr can store array chunks in lots of different backends—on disk, in a Zip file, or, most interestingly, directly in a cloud object store like Amazon S3.
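
A small sketch of the on-disk case, using the zarr v2-style API:

import numpy as np
import zarr

# Each 100x100 chunk is stored as its own object under example.zarr/
z = zarr.open('example.zarr', mode='w', shape=(1000, 1000),
              chunks=(100, 100), dtype='f4')
z[:] = np.random.default_rng(0).normal(size=(1000, 1000))

# Reads touch only the chunks that cover the requested slice
block = z[:200, :200]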

This design makes it particularly good for parallel and distributed computing. It’s built to allow multiple threads or processes to read and write to the same array concurrently, which is a known pain point with some HDF5 configurations. While HDF5 has broader support in other languages like C++, zarr seems like the more forward-looking choice for pure Python, cloud-based workflows.

8 Dask

Dask is a library I’m aware of for parallel computing in Python. It’s interesting because it tackles the “big data” problem by mimicking the APIs of libraries I already use every day. It provides parallel versions of NumPy arrays and Pandas DataFrames. This means I can, in theory, apply the same code I wrote for in-memory data to datasets that are too large to fit on my machine, and Dask handles the distribution and parallel execution under the hood.

It’s built on two core components: a dynamic task scheduler for coordinating the work, and the “big data” collections themselves. This makes it flexible enough for simple, single-machine parallelism (taking advantage of all my CPU cores) all the way up to large, multi-machine clusters. For some examples of what this looks like in practice, Matthew Rocklin has a helpful how-to guide.
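
A toy sketch of the numpy-mimicking API; nothing actually runs until .compute() is called:

import dask.array as da

# Lazy 10000x10000 array, split into 1000x1000 chunks
x = da.random.random((10000, 10000), chunks=(1000, 1000))

# Build a task graph with the familiar numpy-ish API...
result = (x + x.T).mean(axis=0)

# ...then execute it chunk-wise across all cores
print(result.compute())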

9 xarray

xarray solves a problem that has bitten me many times: keeping track of what the dimensions of my NumPy arrays actually mean. xarray wraps raw NumPy arrays and lets me attach labels to the dimensions (like time, latitude, longitude) and coordinates to the axes, which is useful for statistical analysis.

This makes the code more intuitive, self-documenting, and less prone to bugs. Instead of data.sum(axis=2), I can write data.sum(dim='channel'). This labeled approach is borrowed from the netCDF file format, which is a standard in the geosciences. I’ve also seen it popping up more in deep learning workflows, especially for handling complex, multi-dimensional data like satellite imagery or climate simulations, where keeping track of axes by number alone is a recipe for disaster. It seems to be a key tool for bridging the gap between raw numerical data and machine learning frameworks.
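
A minimal sketch (the dimension names here are my own):

import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 3, 2)),
    dims=('time', 'location', 'channel'),
    coords={'channel': ['left', 'right']},
)

means = data.mean(dim='channel')   # name the axis, not its position
left = data.sel(channel='left')    # label-based selection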

10 References

Rogozhnikov. 2022. “Einops: Clear and Reliable Tensor Manipulations with Einstein-Like Notation.”
Smith, and Gray. 2018. “Opt_einsum - A Python Package for Optimizing Contraction Order for Einsum-Like Expressions.” Journal of Open Source Software.
van der Walt, Colbert, and Varoquaux. 2011. “The NumPy Array: A Structure for Efficient Numerical Computation.” Computing in Science & Engineering.