Asynchronous Python

It can’t be premature optimisation if it took 20 years to start

2018-03-24 — 2024-12-06

Wherein asynchronous Python ecosystems are surveyed, uvloop is flagged as a performance alternative, Trio and asyncio are contrasted, and common libraries such as HTTPX, aiohttp and pyzmq are noted.

computers are awful

concurrency hell

python

Not covered: Niceties of asynchrony, when threads run truly concurrently (it’s complicated), when evented poll systems are “truly” asynchronous (never, but it doesn’t matter).

🏗 cover uvloop.

1 baseline asyncio ecosystem

Modern Python async-style stuff.

Raw asyncio is getting civilised these days and might be worth using, but there is a complicated relationship between the various bits. I no longer need to do this, so my advice might be stale. G’luck.
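For orientation, here is a minimal sketch of what "raw asyncio" looks like using only the standard library. The `fetch` coroutine and its delays are made up for illustration; `asyncio.sleep` stands in for real I/O such as a network request.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for real I/O; awaiting sleep yields to the event loop.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Run the three coroutines concurrently.
    # gather returns results in argument order, not completion order.
    return await asyncio.gather(
        fetch("a", 0.03),
        fetch("b", 0.01),
        fetch("c", 0.02),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The whole thing takes roughly as long as the slowest `fetch`, not the sum of all three, which is the point of the exercise.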

Sometimes with older code it has been easier to use the event loop from tornado or pyzmq. They are comparatively easy and well-documented.

Here are some links I found helpful for getting asynchronous Python working:

BBC’s tutorial Python Asyncio Part 1 – Basic Concepts and Patterns
AnyIO mostly adapts the structured concurrency idiom of trio to asyncio.
HTTPX “is a fully featured HTTP client for Python 3, which provides sync and async APIs, and support for both HTTP/1.1 and HTTP/2.”

Seems to aim to be the future version of the popular Python requests library.
aiohttp seems to be the ascendant asynchronous server/client Swiss army knife for HTTP stuff.
sanic is a hip, Python 3.8+-only, Flask-like web server. Supports websocket and graphql extensions if you really want them.
pallets/quart seems similar to sanic, but with an even more Flask-like API.
backoff is a handy Python library for a menial and common task, retrying with a slightly longer delay.
rx exists for Python as rxpy and is tornado compatible.
terminado provides a terminal for tornado, for quick and dirty interaction.

This feels over-engineered to me, but looks easy for some common cases.
0mq itself is attractive because it already uses tornado loops, and can pass numpy arrays without copying.
- Zerorpc is an RPC layer over 0mq.
aiomonitor injects a REPL into async Python programs.
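The retry-with-a-longer-delay chore that backoff handles (see the list above) can be sketched by hand in plain asyncio. This is not backoff's API. The names `retry` and `make_flaky` and all the default values are assumptions made up for this illustration.

```python
import asyncio


async def retry(func, max_tries=5, base_delay=0.01, factor=2.0):
    """Await the coroutine function `func` until it succeeds or tries run out."""
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return await func()
        except Exception:
            if attempt == max_tries:
                raise  # out of tries: let the last error propagate
            await asyncio.sleep(delay)
            delay *= factor  # wait longer before the next attempt


def make_flaky(failures):
    """Hypothetical helper: a coroutine function that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    async def flaky():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("transient failure")
        return calls["n"]  # number of the call that finally succeeded

    return flaky
```

Usage would look like `asyncio.run(retry(make_flaky(2)))`, which fails twice and succeeds on the third call. A real library adds jitter, per-exception filtering and logging, which is why I would rather not maintain this myself.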

2 Trio

trio is what my colleagues seem to use for green-field developments. It is built upon a design pattern now called structured concurrency.

The Trio project’s goal is to produce a production-quality, permissively licensed, async/await-native I/O library for Python. Like all async libraries, its main purpose is to help you write programs that do multiple things at the same time with parallelized I/O. A web spider that wants to fetch lots of pages in parallel, a web server that needs to juggle lots of downloads and websocket connections at the same time, a process supervisor monitoring multiple subprocesses… that sort of thing. Compared to other libraries, Trio attempts to distinguish itself with an obsessive focus on usability and correctness. Concurrency is complicated; we try to make it easy to get things right.

Trio was built from the ground up to take advantage of the latest Python features, and draws inspiration from many sources, in particular Dave Beazley’s Curio. The resulting design is radically simpler than older competitors like asyncio and Twisted, yet just as capable. Trio is the Python I/O library I always wanted; I find it makes building I/O-oriented programs easier, less error-prone, and just plain more fun. Perhaps you’ll find the same.

The essay that explains why there is a different asynchronous ecosystem: Nathaniel J. Smith, Some thoughts on asynchronous API design in a post-async/await world.

3 Alternative asynchronous ecosystems

There is also the ancient and justified Twisted, which is a monster but has a lot of features.

4 Idioms

Check datasette for an example of integrating threads with the event loop (asyncio.get_event_loop().run_in_executor())
Multiprocessing channels-over-sockets are available
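A minimal sketch of the `run_in_executor` idiom mentioned above, using only the standard library. It is not taken from datasette. The `blocking_io` function is a hypothetical stand-in for any synchronous call, and I use `asyncio.get_running_loop()` here because the code already runs inside a coroutine.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(n):
    # Hypothetical blocking call, e.g. a synchronous database query.
    time.sleep(0.05)
    return n * 2


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each call runs in a worker thread; the event loop stays responsive
        # and we await the wrapped futures like any other awaitable.
        futures = [loop.run_in_executor(pool, blocking_io, n) for n in range(4)]
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

On Python 3.9+ `asyncio.to_thread(blocking_io, n)` is a shorter spelling for the common case where the default thread pool is good enough.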

5 Threaded asynchrony

Sometimes we need it? But I don’t have much to say, and am not an expert.

For threaded and multi-proc concurrency we sometimes need simple shared variables. Here is, e.g., a counters HOWTO.
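For the threaded case, a shared counter can be sketched with a plain `threading.Lock`. The `Counter` class and `count_in_threads` helper are made up for illustration. The multi-process case needs different machinery (e.g. `multiprocessing.Value`) and is not shown.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is not atomic, so guard it with the lock.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_threads(n_threads=8, n_increments=1000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the final value is always `n_threads * n_increments`. Without it, updates can be lost.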

6 Locking resources

If we are doing parallel stuff, we need locking to avoid two things doing something at the same time that should not be done at the same time. portalocker is a handy tool to lock files and optionally other stuff.
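To make the file-locking idea concrete, here is a sketch using the standard library's `fcntl.flock` rather than portalocker. It is Unix-only, and the helper names `try_lock`, `release` and `demo` are assumptions for this illustration. portalocker wraps the same kind of advisory lock behind a friendlier, cross-platform interface.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive advisory lock without waiting.

    Returns the open file on success, or None if someone else holds the lock.
    """
    fh = open(path, "a")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def release(fh):
    fcntl.flock(fh, fcntl.LOCK_UN)
    fh.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.lock")
        first = try_lock(path)   # takes the lock
        second = try_lock(path)  # refused while `first` holds it
        release(first)
        third = try_lock(path)   # succeeds again after the release
        got_third = third is not None
        if third is not None:
            release(third)
        return first is not None, second is None, got_third
```

These locks are advisory: they only protect against other code that also asks for the lock, so every participant has to play along.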