Python packaging, environment and dependency management

“import antigravity” raises an exception if you want CUDA antigravity

2011-04-18 — 2025-03-10



This week’s recommended Python packaging system is

~~conda~~ ~~pip+venv~~ ~~poetry~~ uv.

How do I install the right versions of everything for some Python code I’m developing? How do I deploy it reliably? How do I share it with others? How do I minimise the tedious manual labour of doing the above?

As with many other vibrant, dynamic languages, maintaining consistent project dependencies in Python is a complicated mess. See also software package managers, software dependency managers and intractable standards battles…

Python is especially terrible for this—IMO, worse even than the notoriously bad node.js. While node.js has been criticised for settling on a bad package manager, Python settled on several bad package managers. It is confusing, chaotic, inconvenient and dangerous. In my experience, this fact is the single biggest barrier to entry for new Python users, both technically and socially, because all the systems are plagued by long-running arguments, some of which are highly personal. To use a language, you should not have to develop opinions about so many community disputes. Here is one angry overview of the situation.

Although Python is one of the world’s most popular programming languages, there are no easy answers to basic questions about how to install it, and sometimes even the hard answers are insane.

Harm reduction tips follow.

1 Things we can safely ignore

In the before-times, many Python packaging standards existed. AFAICT, unless migrating extremely old code or performing digital archaeology, I should ignore everything about these.

The following are all deprecated or irrelevant: distutils (removed from the standard library in Python 3.12), easy_install, virtualenv (largely superseded by the standard library’s venv)… Any HOWTOs that include them are probably not going to be useful.

Are you considering rye? Rye is kinda merging with uv so maybe you should consider uv instead.

Next decision point: Are you doing machine learning on the GPU? If so, go to the GPU quagmire section, because that is some stuff you cannot ignore.

If you are CUDA-free, some of the more horrible complexities can be ignored. If I were you, o blessed innocent, I would simply use uv if I want to be modern. I should be modern, indeed, because tradition has even less to recommend it than the awkward and confusing modernity.

That said, I would not mindlessly follow the latest fad because that strategy is dangerous too. If I took the latest fad every time I started a new project, I would have migrated between conda, flit, poetry, uv, mamba, pipenv, etc and would not have had any time to write code. If something is new, consider ignoring it for a year or two.

2 `pip`-like systems

pip is the default Python package installer.

See pip-like python packaging.

3 Conda-like systems

Conda is a parallel system to pip: more general (it also packages non-Python dependencies) and kinda-sorta compatible with it. It replaces many build difficulties with portability and intellectual-property difficulties. See conda-like python packaging.

4 Build systems

pyproject.toml now includes build system declarations. The Python Packaging User Guide explains:

Tools like pip and build do not actually convert your sources into a distribution package (like a wheel); that job is performed by a build backend. The build backend determines how your project will specify its configuration, including metadata (information about the project, for example, the name and tags that are displayed on PyPI) and input files. Build backends have different levels of functionality, such as whether they support building extension modules, and you should choose one that suits your needs and preferences.

You can choose from a number of backends; this tutorial uses Hatchling by default, but it will work identically with Setuptools, Flit, PDM, and others that support the [project] table for metadata.

Setuptools. Old. Popular for historical reasons.

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

Hatchling, the build backend from Hatch, seems to be the uv default at the moment:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Poetry’s build system:

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

PDM’s build system seems popular for people with heavy C/C++ dependencies:

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

Flit’s build system (pure-Python packages only):

[build-system]
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"

Maturin builds Rust-backed packages, so it is very zeitgeisty.

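We do not normally need to install the build backend ourselves: frontends such as pip and build read the requires list and install it into an isolated build environment before building.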

5 Non-python-specific dependency managers

Does Python’s slapstick, ongoing shambles of a failed consensus on dependency management fill you with distrust? Do you have the vague feeling that perhaps you should use something else to manage Python since Python cannot manage itself? See generic dependency managers for an overview. Some of these are known to be an OK means of managing Python specifically, even though they are more general.

5.1 Spack

Supercomputing dep manager spack has Python-specific support.

PyPI has hundreds of thousands of packages that are not yet in Spack, and pip may be a perfectly valid alternative to using Spack. The main advantage of Spack over pip is its ability to compile non-Python dependencies. It can also build cythonized versions of a package or link to an optimised BLAS/LAPACK library like MKL, resulting in calculations that run orders of magnitudes faster. Spack does not offer a significant advantage over other Python-management systems for installing and using tools like flake8 and sphinx. But if you need packages with non-Python dependencies like numpy and scipy, Spack will be very valuable to you.

Anaconda is another great alternative to Spack and comes with its own conda package manager. Like Spack, Anaconda is capable of compiling non-Python dependencies. Anaconda contains many Python packages not yet in Spack, and Spack contains many Python packages not yet in Anaconda. The main advantage of Spack over Anaconda is its ability to choose a specific compiler and BLAS/LAPACK or MPI library. Spack also has better platform support for supercomputers and can build optimised binaries for your specific microarchitecture.

5.2 Meson

meson-python is a build backend that uses the Meson build system to build Python packages, including ones with compiled extensions. Is that… good?

6 Writing a package

- Least nerdview guide: Vicki Boykis, Alice in Python projectland.
- Simplest readable guide is python-packaging.
- PyPI Quick and Dirty, includes good tips such as using twine to make it more automatic.
- Open-sourcing a Python project the right way.
- Official docs are no longer awful but are slightly stale, and are especially perfunctory for compilation.
- There is a community effort to document the issues of compiled packages in pypackaging-native (tl;dr: it is hard).
- Kenneth Reitz shows rather than tells with a heavily documented setup.py.
- Try Zed Shaw’s signature aggressively cynical and reasonably practical explanation of project structure, with bonus explication of how you should expect much time-wasting yak shaving if you want to do software.
  - Or copy pyskel.
  - Or generate a project structure with a templating/scaffolding system.
- Updated: What the heck is pyproject.toml?

6.1 Scaffolding and templating

Generating all those files is boring. The Python packaging ecosystem has several tools to automate it.

6.1.1 Copier

Copier is a slightly newer and more flexible alternative to cookiecutter for generating projects from templates.

6.1.2 cookiecutter

Classic. Works. See cookiecutter.

6.2 Documenting my package

Here are two famous options. I’ve used Sphinx, and it is adequate and well-integrated, but its default markup language, reStructuredText, is little used outside the Python world. MkDocs seems less blessed by the mainstream Python foundation, but it uses Markdown, which is more widely used and has a rich ecosystem of tools.

7 Python versions

If we use conda or uv, the Python version is handled for us, as it is with generic dependency managers. With pip, we need to manage it ourselves. Poetry is in between: it knows about Python versions but cannot install Python for us.

7.1 `pyenv`

I find pyenv baffling as it interacts with all the other tools in the Python packaging ecosystem in a way that is not immediately obvious to me.

I prefer to avoid it completely and let uv handle Python versions, which it does in a relatively seamless way, without me needing to consider local and global versions and remember which sub-sub-version of Python I compiled with what for which.

pyenv is the core tool of an ecosystem that eases and automates switching between Python versions. It manages Python and thus can be used as a manager for all the other managers.

BUT WHO MANAGES THE VIRTUALENV MANAGER MANAGER? Also, what is going on in this ecosystem of bits? Logan Jones explains:

pyenv manages multiple versions of Python itself.
virtualenv/venv manage virtual environments for a specific Python version.
pyenv-virtualenv manages virtual environments across varying Python versions.

Anyway, pyenv compiles a custom version of Python and is extremely isolated from everything else. An introduction with emphasis on my area: Intro to Pyenv for Machine Learning.

# initialize pyenv
pyenv init
# install a specific Python version
pyenv install 3.8.13
# ensure we can find that version
pyenv rehash
# switch to that version
pyenv shell 3.8.13
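After switching versions it is easy to lose track of which interpreter is actually running. A small sketch, using only the standard library, that reports the version, the executable path and whether a virtual environment is active. The function name `describe_interpreter` is my own invention, and the virtualenv check assumes a venv-style environment (one that sets sys.base_prefix).

```python
import sys


def describe_interpreter() -> dict:
    """Summarise the running interpreter (hypothetical helper, stdlib only)."""
    return {
        # For example "3.8.13" if the pyenv commands above took effect.
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Path of the binary that is executing this code.
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the environment while
        # sys.base_prefix still points at the underlying installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this before and after activating an environment is a cheap way to check that the shell, pyenv and the virtualenv all agree about which Python is in charge.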

Of course, because this is adjacent to Python packaging, it is infected with the same brainworms. Everything immediately becomes complicated and confusing when I try to interact with the rest of the ecosystem, e.g.,

pyenv-virtualenvwrapper is different from pyenv-virtualenv, which provides extended commands like pyenv virtualenv 3.4.1 project_name to directly help out with managing virtualenvs. pyenv-virtualenvwrapper helps in interacting with virtualenvwrapper, but pyenv-virtualenv provides more convenient commands, where virtualenvs are first-class pyenv versions, that can be (de)activated. That’s to say, pyenv and virtualenvwrapper are still separated while pyenv-virtualenv is a nice combination.

Huh. I am already too bored to think. However, I did work out a command which installed a pyenv TensorFlow with an isolated virtualenv:

brew install pyenv pyenv-virtualenv
pyenv install 3.8.6
pyenv virtualenv 3.8.6 tf2.4
pyenv activate tf2.4
pip install --upgrade pip wheel
pip install 'tensorflow-probability>=0.12' 'tensorflow<2.5' jupyter

For fish shell I needed to add some special lines to config.fish:

set -x PYENV_ROOT $HOME/.pyenv
set -x PATH $PYENV_ROOT/bin $PATH
## fish <3.1
# status --is-interactive; and . (pyenv init -|psub)
# status --is-interactive; and . (pyenv virtualenv-init -|psub)
## fish >=3.1
status --is-interactive; and pyenv init - | source
status --is-interactive; and pyenv virtualenv-init - | source

For bash/zsh (resp. .bashrc/.zshrc) it is as follows:

export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"

8 Sorry, GPU/TPU/etc users

Users of GPUs must ignore any other options, no matter how attractive all the other options might seem at first glance. The stupid drudge work of venv is the price of hardware support for now. Only pip and conda support hardware specification in practice.

UPDATE: poetry now supports Pytorch with CUDA. uv has had a crack at it too.

Terminology you need to learn: Many packages specify local versions for particular architectures as a part of their functionality. For example, pytorch comes in various flavours, which when using pip, can be selected in the following fashion:

# CPU flavour
pip install torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
# GPU flavour
pip install torch==1.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

The local version is given by the +cpu or +cu113 bit, and it changes what code will be executed when using these packages. Specifying a GPU version is essential for many machine learning projects (essential, that is, if I don’t want my code to run orders of magnitude slower). The details of how this can be controlled within the Python packaging ecosystem are contentious and complicated, and thus were long unsupported by new-wave options like poetry or pipenv. Brian Wilson argues,

During my dive into the open-source abyss that is ML packages and +localVersions I discovered lots of people have strong opinions about what it should not be and like to tell other people they’re wrong. Other people with opinions about what it could be are too afraid of voicing them lest there be some unintended consequence. PSF has asserted what they believe to be the intended state in PEP-440 (no local versions published) but the solution (PEP-459) is not an ML Model friendly solution because the installation providers (pip, pipenv, poetry) don’t have enough standardised hooks into the underlying hardware (cpu vs gpu vs cuda lib stack) to even understand which version to pull, let alone the Herculean effort it would take to get even just pytorch to update their package metadata.

There is no evidence that this logjam will resolve any time soon. However, this machine learning thing is not going away, ML projects use GPUs, and packaging projects with GPU code is hard. Since I do neural network stuff and thus use GPU/CPU versions of packages, I can effectively ignore most of the Python environment alternatives on this page. The two that work are conda and pip. Both support a de facto minimum viable local-version package system which does what we want. If you want something fancier, try containerization using a GPU-compatible system such as apptainer.
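To make the local-version terminology concrete: the local version label is whatever follows the + in a version string. Here is a minimal sketch that splits such a string with plain string handling. Both function names are hypothetical, nothing here validates the version against PEP 440, and the reading of cu113 as CUDA 11.3 is an assumed naming convention rather than anything the packaging standards define.

```python
from typing import Optional, Tuple


def split_local_version(version: str) -> Tuple[str, Optional[str]]:
    """Split "1.10.0+cu113" into its public version and local label.

    Hypothetical helper; it does no validation of the version string.
    """
    public, plus, local = version.partition("+")
    return public, (local if plus else None)


def describe_flavour(version: str) -> str:
    """Guess the hardware flavour from a PyTorch-style local label."""
    _, local = split_local_version(version)
    if local is None:
        return "unspecified"
    if local == "cpu":
        return "CPU only"
    digits = local[2:]
    if local.startswith("cu") and digits.isdigit() and len(digits) >= 2:
        # Assumed convention: the last digit is the CUDA minor version,
        # so "cu113" reads as CUDA 11.3.
        return f"CUDA {digits[:-1]}.{digits[-1]}"
    return f"other ({local})"


if __name__ == "__main__":
    for candidate in ("1.10.0+cpu", "1.10.0+cu113", "1.10.0"):
        print(candidate, "->", describe_flavour(candidate))
```

An installer sees only an opaque label here; any link between that label and the hardware in the machine lives in conventions like this one, which is roughly the gap the quote above complains about.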