Python

Syntactic saccharine for compiled code

April 18, 2011 — December 17, 2024

computers are awful
python

Assumed audience:

Active but not deep users of python

Figure 1: Guido van Rossum and Python, by Sir Frederick Leighton.

A Swiss army knife coding tool. Good matrix library, general scientific tools, statistics tools, web servers, art tools, and interoperation with many other tools in C, C++, FORTRAN etc. That is, all the other fruits of a thriving diverse community with a common trade language. Most especially, it includes many variants of all those things mentioned above.

Python is most useful as a lingua franca. It is a true lingua franca, in that

  1. It allows intercommunication of many other languages, and
  2. it is a Frankish creation.

Fast enough, easy to debug, garbage-collected. If some bit is too slow, you call into a compiled language; otherwise, you relax. A good default if you’d rather get stuff done than write code.

Not because it is easy to do stuff in Python as such — meh, it’s fine I guess, state-of-the-art for 1990s open source — but because the chances are good that someone else has already done most of the stuff you need and released it as a library.

I typically do my stats and graphs in R. My user interface is javascript. My linear algebra library is Fortran, and I use julia for my other scientific calculations. Python is the thread that stitches this Frankensteinisch monster together.

Of course, it could be better. Clojure is more elegant, scala is easier to parallelize, julia prioritises science more highly, MATLAB is what your academic advisor is obsessed with because they did their thesis before 2010.

In terms of using a well-supported, satisfactory and unexciting language that goes on my computer right now, and requires me to reinvent few wheels, and which is transferable across number crunching, web development, UIs, text processing, graphics and sundry other domains, and does not require heavy licensing costs… this is an adequate one. See Hillel Wayne, The hard part of learning a language, which lists all the main friction points that various languages have, and which I read as a low-key endorsement of the acceptable mediocrity of this particular language. Maybe that mediocrity is why the tooling is rich. If a language is too powerful, maybe its community is not incentivised to work together on getting it to go.

What a pitch! Wow! Python for the marginal, begrudging win! Now, let’s look at the horrid things that are wrong with it.

1 Debugging, profiling and testing

See Python debugging, profiling and testing.

2 Linting and formatting

There are many linters. Here are some I have used.

  • ruff is my favourite. The defaults are good and it is fast. Downside: one needs to install an additional binary to use it. However, it pip installs easily.
  • Pylint whose defaults I like, but it is slow.
  • flake8 which is too alarmist per default (CRITICAL RED ALERT: SPLIT INFINITIVE) but is adequately fast.
Figure 2: Python linter speeds according to ruff

I briefly experimented with writing custom rules for my linter, but decided that this is something that only makes sense at a corporate scale, and otherwise is yak shaving in the derogatory sense.

3 Pyodide

Python from the browser, including mathematical libraries.

HT Emma Krantz for showing me this while solving my web interactive maths problems for me.

4 Idiom

Python people set great value on mastering python idiom. For example, python idiom for accessing a nullable member is a and a.x. I never remember this and always write getattr(a, 'x', None). Python idiom for ‘python person’ is pythonista. Python idiom for ‘idiomatic’ is pythonic. Much has been written on how to be pythonic.
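
The two spellings are not quite interchangeable. Here is a small sketch of the difference, using a hypothetical `Point` class invented for illustration: `a and a.x` hands back `a` itself whenever `a` is falsy, while `getattr(a, "x", None)` returns `None` whenever the attribute is missing.

```python
class Point:
    """Hypothetical example class with a single attribute."""

    def __init__(self, x):
        self.x = x


def x_via_and(a):
    # `a and a.x` short-circuits: it returns `a` itself when `a` is falsy.
    return a and a.x


def x_via_getattr(a):
    # getattr with a default returns None whenever the attribute is absent.
    return getattr(a, "x", None)


p = Point(3)
print(x_via_and(p), x_via_getattr(p))        # 3 3
print(x_via_and(None), x_via_getattr(None))  # None None
print(x_via_and(0), x_via_getattr(0))        # 0 None
```

The last line is the catch: a falsy non-`None` object leaks through the `and` idiom unchanged.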

I think learning can also benefit from negative examples. Here, may I proffer to you John Burkardt’s Hypersphere integrals? That is the least idiomatic python code I have seen. It is a line-for-line translation of C code into slow, verbose python. If you are going to do that, write C code.

5 ipython

The interactive python upgrade. The python-specific part of jupyter, which can also run without jupyter. Long story. But think of it as a REPL, a CLI-style execution environment, that is a little friendlier and has colourisation and autocomplete and such. If that sounds useful, read on.

6 Pro tip: store config in .env

There are many, many python tools to load environment variables from local files. See .env.
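
To show roughly what those tools do, here is a minimal standard-library sketch. It is a stand-in I wrote for illustration, not the API of any particular dotenv library; the function names and the `EXAMPLE_*` variables are hypothetical, and a real parser handles many more cases (escapes, `export` prefixes, multi-line values).

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments. A toy parser."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of simple quoting.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(text, override=False):
    """Copy parsed values into os.environ; existing values win unless override."""
    for key, value in parse_env(text).items():
        if override or key not in os.environ:
            os.environ[key] = value


# Hypothetical file contents; in practice this would be read from a .env file.
example = """
# hypothetical settings
EXAMPLE_DATA_DIR=/tmp/data
EXAMPLE_API_TOKEN="not-a-real-token"
"""
load_env(example)
print(os.environ["EXAMPLE_DATA_DIR"])
```

Letting the real environment take precedence by default means a value exported in the shell is not silently clobbered by the file.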

7 Pretty printing

For a non-graphical, non-fancy terminal, the most common useful rich display probably simply wants nice formatting of big data structures.

from pprint import pprint, pformat
# display it
pprint(obj)
# get a formatted representation
pretty = pformat(obj)

8 String formatting/interpolation

8.1 Non obvious hacks

What a nightmare is that manual for the string formatting. While all the information I need is in there, it is arranged in perverse inversion of a comprehensible order as measured by the frequency and the priority with which I use it. See instead Marcus Kazmierczak’s cookbook.

Highlights:

## float precision
>>> print("{:.2f}".format(3.1415926))
3.14
## left padding
>>> print("{:0>2d}".format(5))
05

## power tip which the manual does not make clear:
## variable formatting of variable formatting
>>> pi = 3.1415926
>>> precision = 4
>>> print( "{:.{}f}".format( pi, precision ) )
3.1416

8.2 f-strings

f-strings make things somewhat easier for python 3.6+ because they don’t need to mess around with naming things for the .format(foo=foo) call:

>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."

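The format specifications from the cookbook highlights carry over to f-strings unchanged, including the nested "variable formatting of variable formatting" trick. A short sketch, reusing the same values as above (the variable `n` is just an illustrative name):

```python
pi = 3.1415926
precision = 4
n = 5

print(f"{pi:.2f}")            # 3.14
print(f"{n:0>2d}")            # 05
print(f"{pi:.{precision}f}")  # 3.1416
print(f"{n=}")                # n=5  (the = specifier needs Python 3.8+)
```
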
8.3 Timestamps

Why is a now timestamp in UTC not the first line in every academic research workbook/paper/data analysis? Because it’s tedious to look up the different bits.

Here you go:

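from datetime import datetime, timezone
# utcnow() is deprecated since Python 3.12; this gives a timezone-aware UTC stamp
datetime.now(timezone.utc).isoformat(timespec='seconds')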

Or if we want less punctuation in there:

datetime.now().strftime("%Y%m%d-%H%M%S")

8.4 Rendering HTML output

You have a quick and dirty chunk o’ HTML you need to output? You aren’t writing some damnable webapp with a nested hierarchy of template rendering and CSS integration into some design framework? You just want to pump out some markup?

I recommend yattag which is fast, simple, good and has a 1-page manual. It works real good.

8.5 Rendering markdown as HTML

from markdown import markdown
html_string = markdown("""
# Title

Paragraph. *emphasis*.
""")

9 Compiling and FFI

See Accelerating python.

10 Packaging and environments

See python packaging and environments.

11 Asynchrony in python

See asynchronous python.

12 functional-style programming

The functional programming style is often conceptually useful, and it can sometimes even be made performant in python. It rapidly gets clunky because the type system is bad.

toolz and the accelerated version cytoolz provide many useful functional abstractions which look fairly natural.
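
To give a flavour of those abstractions without installing anything, here are standard-library stand-ins for two of them. toolz ships functions named `pipe` and `compose` with the same argument order; the versions below are my own simplified sketches, not the toolz source, and `double` and `increment` are illustrative helpers.

```python
from functools import reduce


def pipe(value, *funcs):
    """Thread value through funcs, left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def compose(*funcs):
    """Compose funcs right to left, so compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)


def double(x):
    return 2 * x


def increment(x):
    return x + 1


print(pipe(3, double, increment))     # 7
print(compose(double, increment)(3))  # 8
```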

13 IDEs

I meant to list some, but in fact I only ever use VS Codium so that is the only current advice I would be able to give.

PyCharm also looks very good but I am hopelessly out of date on it.

14 Search path stunts

Add something to the path (e.g. to add some binary to the path which I forgot to include on the HPC):

import os

# Add a directory to the PATH
new_path = "/path/to/your/programs"
os.environ["PATH"] += os.pathsep + new_path

# Verify the updated PATH
print(os.environ["PATH"])

Capture a path update from a shell script:

import subprocess
import os

# Run the bash script and capture the new PATH
result = subprocess.run("source /path/to/modify_path_script.sh && echo $PATH",
                        shell=True, executable='/bin/bash', text=True, capture_output=True)

# Update the Python environment with the modified PATH
new_path = result.stdout.strip()
os.environ["PATH"] = new_path

# Verify the updated PATH
print(os.environ["PATH"])

15 Typing

Python 3.6+ includes type hinting, and projects such as mypy support static analysis using type hints.1 There were not many tutorials on the details of this at time of writing. Here’s one.

Short version: you go from this:

def fib(n):
  a, b = 0, 1
  while a < n:
    yield a
    a, b = b, a+b

to this:

from typing import Iterator

def fib(n: int) -> Iterator[int]:
  a, b = 0, 1
  while a < n:
    yield a
    a, b = b, a+b

A version with actual corporate backing is pyright, which uses type checking to provide usability improvements in VS Codium. Might be useful.
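
One thing worth seeing once: the hints are metadata for external tools and are not enforced when the code runs. A small sketch, reusing `fib` from above, which a static checker such as mypy or pyright would be expected to complain about but which Python itself runs happily:

```python
from typing import Iterator, get_type_hints


def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b


# The hints are stored as metadata on the function...
print(get_type_hints(fib))

# ...but nothing checks them at run time: a float argument is accepted.
print(list(fib(10.5)))  # [0, 1, 1, 2, 3, 5, 8]
```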

15.1 Multiple dispatch

Would it not be great if python supported multiple dispatch like julia?

Here are some hacks to shoehorn multiple dispatch into Python that look too hairy to actually use:
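
For a sense of what such hacks involve, here is a toy sketch of my own (not taken from any of those projects). It keeps a registry keyed on the exact types of all positional arguments, and ignores subclassing, keyword arguments and ambiguity resolution, which is where the real hairiness lives. The names `dispatch` and `combine` are hypothetical.

```python
_registry = {}


def dispatch(*types):
    """Register an implementation for a tuple of exact argument types."""

    def decorator(func):
        _registry[(func.__name__, types)] = func

        def dispatcher(*args):
            # Look up the implementation by the runtime types of every argument.
            key = (func.__name__, tuple(type(arg) for arg in args))
            try:
                impl = _registry[key]
            except KeyError:
                raise TypeError(f"no implementation for {key}") from None
            return impl(*args)

        return dispatcher

    return decorator


@dispatch(int, int)
def combine(a, b):
    return a + b


@dispatch(str, int)
def combine(a, b):
    return a * b


print(combine(2, 3))     # 5
print(combine("ab", 2))  # abab
```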

16 Command line parsers

I am furious at how many options there are to build python CLI scripts. This is the new contender for most bike-shedded function now that the python web server world has calmed down a bit. AFAICT if you write some python software and don’t write a new command-line parsing library then you will be regarded as somehow feckless. If you are indeed so dissolute as to not make your own quirky artisanal command-line parser, you could try the listing at python CLIs.

17 Exceptions

TIL about raise from. See Python’s raise: Effectively Raising Exceptions in Your Code – Real Python.
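
A minimal sketch of the idea, with a hypothetical `ConfigError` class and `parse_port` function: `raise ... from err` attaches the original exception as `__cause__`, so the traceback shows both the low-level failure and the higher-level error that wraps it.

```python
class ConfigError(Exception):
    """Hypothetical application-level error."""


def parse_port(raw):
    try:
        return int(raw)
    except ValueError as err:
        # `from err` records the original exception as __cause__.
        raise ConfigError(f"invalid port: {raw!r}") from err


try:
    parse_port("eighty")
except ConfigError as err:
    print(repr(err))
    print(repr(err.__cause__))
```

Writing `from None` instead suppresses the chained context, which is handy when the underlying exception is just noise.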

18 Miscellaneous stuff I always need to look up

19 Progress meters

tqdm seems to be a de facto standard.

Basic version with a console progress bar:

pip install tqdm

Use it:

from tqdm import tqdm

for x in tqdm(my_list):
    ...  # do the work for x; the bar advances on each iteration

Fancy integrated HTML display is available in jupyter. Install it:

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension
jupyter labextension install @jupyter-widgets/jupyterlab-manager

Use it:

%%capture
from tqdm.notebook import tqdm  # replaces the deprecated tqdm_notebook alias
tqdm().pandas()

with tqdm(total=len(my_list)) as pbar:
    for x in my_list:
        pbar.update(1)

20 Images

Loading and handling images is balkanised, janky and tedious for complicated historical reasons.

These days I use Willow image library, which is a meta-library which mediates between many squabbling backend libraries. It will load many challenging formats such as SVG, HEIF and AVIF, as well as animated GIFs etc.

pip install pillow-heif Willow
# or
pip install Willow[heif]

Previously I have used

  • Wand (which I think is an ImageMagick wrapper?)
  • Pillow

Also handy if using Pillow:

pip install pillow-avif-plugin

We also need to import it:

from PIL import Image
import pillow_avif
...

Now that I know about libvips I would probably use that instead next time. pyvips seems to be pretty good.

21 ExitStack

If you are acquiring several context managers, there is a modern handy idiom called ExitStack. For an explanation see On the Beauty of Python’s ExitStack by Nikolaus Rath.

from contextlib import ExitStack

with ExitStack() as cm:
    res1 = acquire_resource_one()
    cm.callback(release_resource, res1)
    # do stuff with res1
    res2 = acquire_resource_two()
    cm.callback(release_resource, res2)
    # do stuff with res1 and res2
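
Where this earns its keep is when the number of resources is only known at run time. A self-contained sketch using `enter_context`, with throwaway temporary files standing in for real resources (the `concatenate` helper is a hypothetical name):

```python
import os
import tempfile
from contextlib import ExitStack


def concatenate(paths):
    """Open every path at once and return their contents joined together."""
    with ExitStack() as stack:
        # enter_context registers each file to be closed when the block exits.
        files = [stack.enter_context(open(p)) for p in paths]
        return "".join(f.read() for f in files)


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"part{i}.txt")
        with open(path, "w") as f:
            f.write(f"chunk {i}\n")
        paths.append(path)
    combined = concatenate(paths)

print(combined)
```

If opening the third file fails, the first two are still closed, which is awkward to get right with nested `with` statements.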

22 Everything is files

  • fsspec: Filesystem interfaces for Python

    Filesystem Spec (fsspec) is a project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage.

    There are many places to store bytes, from in memory, to the local disk, cluster distributed storage, to the cloud. Many files also contain internal mappings of names to bytes, maybe in a hierarchical directory-oriented tree. Working with all these different storage media, and their associated libraries, is a pain. fsspec exists to provide a familiar API that will work the same whatever the storage backend. As much as possible, we iron out the quirks specific to each implementation, so you need do no more than provide credentials for each service you access (if needed) and thereafter not have to worry about the implementation again.

    fsspec provides two main concepts: a set of filesystem classes with uniform APIs (i.e., functions such as cp, rm, cat, mkdir, …) supplying operations on a range of storage systems; and top-level convenience functions like fsspec.open(), to allow you to quickly get from a URL to a file-like object that you can use with a third-party library or your own code.

  • andrewcb/cope: Continuous Processing Engine, a small Python framework for turning input files into output files on an ongoing basis

22.1 Watching files for changes

Does this inotify solution work for non-Linux? Because macOS uses FSEvents and Windows uses I-don’t-even-know.

watchdog asserts that it is cross-platform (source).

emcrisostomo/fswatch claims to be a universal file watcher. There is a python frontend.
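
If all else fails, naive polling works on every platform. The following is a sketch of that fallback, which is not how inotify, FSEvents, watchdog or fswatch operate: take a snapshot of a directory, take another later, and compare. The function names are my own.

```python
import os
import tempfile


def snapshot(directory):
    """Map each file name in directory to its (mtime_ns, size)."""
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                stat = entry.stat()
                state[entry.name] = (stat.st_mtime_ns, stat.st_size)
    return state


def diff(old, new):
    """Return (added, removed, modified) sets of file names."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {name for name in set(old) & set(new) if old[name] != new[name]}
    return added, removed, modified


with tempfile.TemporaryDirectory() as tmp:
    before = snapshot(tmp)
    with open(os.path.join(tmp, "new.txt"), "w") as f:
        f.write("hello")
    after = snapshot(tmp)
    changes = diff(before, after)

print(changes)
```

In real use one would call `snapshot` in a loop with a sleep between iterations; the cost is latency and wasted work, which is exactly what the event-based watchers avoid.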

23 Misc recommendations

  • Let’s get weird: Hy is a lisp that compiles to python syntax trees.

  • More Itertools: A bunch of essential (IMO) extra iterators for python, essential for streaming work.

24 Piping functions

Everyone wants shell-style function piping to work in python, and so there are many implementations, e.g. here are two recent ones.

How is it supposed to interact with generators? Many questions. Will ignore for now.
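
As a partial answer, here is a minimal sketch of how such libraries tend to work, under my own assumptions rather than copied from any of them: a wrapper class (hypothetically named `Pipe`) implements `__ror__`, so that `value | Pipe(f)` calls `f(value)`. Generators pass through lazily, because each stage simply receives the previous stage's iterator.

```python
class Pipe:
    """Wrap a function so that `value | Pipe(f)` calls f(value)."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Called for `value | self` when value's type does not handle `|` itself.
        return self.func(value)


evens = Pipe(lambda xs: (x for x in xs if x % 2 == 0))
squares = Pipe(lambda xs: (x * x for x in xs))
total = Pipe(sum)

result = range(10) | evens | squares | total
print(result)  # 120
```

Nothing is computed until `total` consumes the chain, so the intermediate stages never materialise a list.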

25 Python 2 vs 3

New to python 3? These days that makes you very old. Here are some notes for people porting legacy code, or who have been frozen in a glacier for the last decade.

Sebastian Raschka, The key differences between Python 2.7.x and Python 3.x with examples.

25.1 Neat python 3 stuff

Alex Rogozhnikov, Python 3 with pleasure highlights some tricks which landed in the python 3 lineage. There are a lot of those.

Useful for me is a friendlier python struct-like thing, the data class Geir Arne Hjelle explains.
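
A quick sketch of what that buys you, with a hypothetical `Experiment` record: the decorator generates `__init__`, `__repr__` and `__eq__` from the annotated fields.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Experiment:
    """Hypothetical record type."""

    name: str
    learning_rate: float = 0.01
    tags: list = field(default_factory=list)  # mutable defaults need a factory


run = Experiment("baseline", tags=["pilot"])
print(run)
print(run == Experiment("baseline", tags=["pilot"]))  # True
print(asdict(run))
```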

Also nice, we can override module accessors.

Asynchrony is less awful (although still fairly awful).

25.2 Python 2 and 3 compatibility

TL;DR: I am no employee of a giant enterprise-type business with a gigantic legacy code base, and so I don’t use python 2. My code is not python 2 compatible. Python 3 is more productive, and no-one is paying me to be less productive right now. Python 2 code is usually easy to port to python 3. It is possible to write code which is compatible with python 2 and 3, but then I would miss out on some of the work that has gone into making python 3 easier and better, and waste time porting elegant easy python 3 things to hard boring python 2 things.

🏗 six versus future.

26 Incoming

Footnotes

  1. If I am going to that much trouble, I would possibly prefer to be using julia, which takes type-hinting further, by actually using it to JIT-compile optimised code, and do smart dispatch.↩︎