pip is the default Python package installer. It is the most widely used and most widely supported package manager for Python, especially if you include the many derivative versions which extend the baseline implementation.
1 Basics
pip is best invoked as python -m pip, which makes sure the pip we run belongs to the Python interpreter we think we are using.
2 pip without venv
I assume throughout that we are using venv to manage our Python environments, as opposed to installing things globally. This is the best practice, and I recommend it.
Note that we can use pip to install packages outside of a virtual environment. It will happily execute outside an active environment without complaint.
But don’t do that. That is burning down your house to toast a sandwich. Doing so does extremely weird things, installing possibly-conflicting packages into a global Python. I have no idea what it would be good for, apart from creating confusing errors and introducing bugs. Depending on which platform I am on, it will either work, fail, or introduce subtle problems that I will notice later when other things break. I have not found a use case for installing packages outside a contained virtual environment in more than a decade of Python programming. Maybe in a Docker container or something?
Anyway, read on for the important bit.
3 pip + venv
venv is the built-in Python virtual environment system in Python 3, replacing virtualenv, which we still find documented about the place. It creates self-contained environments that do not interfere with each other, which is what most people want.
venv works, for all that I would like it to work more smoothly. While it doesn’t support Python 2 (but, also, let Python 2 go unless someone is paying you money to keep a grip on it), it fixes various problems with its predecessor: for example, it supports framework Python on macOS, which is important for GUIs, and it is covered by the Python docs in the Python virtual environment introduction. venv is a good default choice: widely supported, with an adequate, if not awesome, workflow.
# Create venv in the current folder
python3 -m venv ./venv --prompt some_arbitrary_name
# or if we want to use system packages:
python3 -m venv ./venv --prompt some_arbitrary_name --system-site-packages
# Use venv from fish OR
source ./venv/bin/activate.fish
# Use venv from bash
source ./venv/bin/activate
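A quick sanity check that activation worked (the path will vary with where you created the venv):
# The interpreter should now resolve to the venv's copy
which python
# e.g. /home/me/project/venv/bin/python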
Hereafter, I assume we are in an active venv. Now we use pip. I always begin by upgrading pip itself and installing wheel, which is some bit of installation infrastructure that is helpful in practice. Thereafter, everything else should install more correctly, except when it doesn’t.
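Concretely, something like the following, using the python -m pip invocation recommended above:
# Upgrade pip itself and install wheel inside the active venv
python -m pip install --upgrade pip wheel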
To snapshot dependencies in requirements.txt:
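# One way to do it: dump exactly what is installed right now
pip freeze > requirements.txt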
I do not recommend using the freeze command except as a first draft. It is too specific and includes very precise version numbers and obscure, locally specific sub-dependencies. Best keep a tally of the actual hard dependencies and let pip sort out the details.
To restore dependencies from a requirements.txt:
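# Install everything listed in requirements.txt into the active venv
pip install -r requirements.txt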
Version specification in requirements.txt looks something like this:
MyProject
YourProject == 1.3
SomeProject >= 1.2, < 2.0
SomeOtherProject[foo, bar]
OurProject ~= 1.4.2
TheirProject == 5.4 ; python_version < '3.8'
HerProject ; sys_platform == 'win32'
requests [security] >= 2.8.1, == 2.8.* ; python_version < "2.7"
The ~= is a handy lazy shortcut; it permits point releases, but not minor releases, so e.g. ~=1.3.0 will also satisfy itself with version 1.3.9 but not 1.4.0.
Gotcha: pip’s requirements.txt does not actually specify the version of Python itself when you install from it, although you might think it from the python_version specifier. See Python versions to see how to stipulate the Python version at package development time.
4 uv
The one I mostly use these days. See uv.
5 Poetry
One I briefly used. See Poetry.
6 pipenv
⛔️⛔️UPDATE⛔️⛔️: Note that the pipenv system does not support “local versions” and is therefore unusable for machine learning applications. This project is dead to me. (Bear in mind that my opinions will become increasingly outdated depending on when you read this.)
venv has a higher-level, er, …wrapper (?) interface called pipenv.
Pipenv is a production-ready tool that aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, pip, and virtualenv into one single command.
I switched to pipenv from poetry because it looked less chaotic than poetry. I think it is, although not by much.
HOWEVER, it is still pretty awful for my use-case. To be honest, I’d just use plain pip and requirements.txt, which, while primitive and broken, are at least broken and primitive in a well-understood way.
At the time of writing, the pipenv website was 3 weeks into an outage, because dependency management is a quagmire of sadness and comically broken management with terrible Bus factor. However, the backup docs site is semi-functional, albeit too curt to be useful and, as far as I can tell, outdated. The documentation site inside GitHub is readable. See also an introduction showing pipenv and venv used together.
The dependency resolver is, as the poetry devs point out, broken in its own special ways. The procedure to install modern ML frameworks, for example, is gruelling.
For my system, the important settings are as follows. To get the venv inside the project (required for sanity on my HPC) I need the following:
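# Make pipenv create ./.venv inside the project directory
export PIPENV_VENV_IN_PROJECT=1
This is an environment variable, so it needs to be set before invoking pipenv, e.g. in your shell profile.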
Pipenv will automatically load dotenv files, which is a nice touch.
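For example (MY_SECRET being a made-up variable name for illustration):
# pipenv run (and pipenv shell) pick up .env from the project root
echo "MY_SECRET=hunter2" > .env
pipenv run python -c "import os; print(os.environ['MY_SECRET'])"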
7 pipx
Pro tip: pipx:
pipx is made specifically for application installation, as it adds isolation yet still makes the apps available in your shell: pipx creates an isolated environment for each application and its associated packages.
That is, pipx is an application that installs global applications for you.
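For example (using ruff as an arbitrary stand-in for any Python CLI app):
# Install a CLI tool into its own isolated venv, exposed on $PATH
pipx install ruff
# Or run a tool once without installing it permanently
pipx run ruff --version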
8 Rye
9 PDM
PDM is a modern Python package and dependency manager supporting the latest PEP standards. But it is more than a package manager. It boosts your development workflow in various aspects. The most significant benefit is it installs and manages packages in a similar way to npm that doesn’t need to create a virtualenv at all!
10 Flit
Make the easy things easy and the hard things possible is an old motto from the Perl community. Flit is focused on the easy things part of that, and leaves the hard things up to other tools.
Specifically, the easy things are pure Python packages with no build steps (neither compiling C code, nor bundling Javascript, etc.). The vast majority of packages on PyPI are like this: plain Python code, with maybe some static data files like icons included.
It’s easy to underestimate the challenges involved in distributing and installing code, because it seems like you just need to copy some files into the right place. There’s a whole lot of metadata and tooling that has to work together around that fundamental step. But with the right tooling, a developer who wants to release their code doesn’t need to know about most of that.
What, specifically, does Flit make easy?
- flit init helps you set up the information Flit needs about your package.
- Subpackages are automatically included: you only need to specify the top-level package.
- Data files within a package directory are automatically included. Missing data files have been a common packaging mistake with other tools.
- The version number is taken from your package’s __version__ attribute, so it always matches the version that tools like pip see.
- flit publish uploads a package to PyPI, so you don’t need a separate tool to do this.
Setuptools, the most common tool for Python packaging, now has shortcuts for many of the same things. But it has to stay compatible with projects published many years ago, which limits what it can do by default.
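The whole workflow is pleasingly short. A sketch (check the Flit docs for current details):
# Inside the project venv
python -m pip install flit
flit init      # interactively generates pyproject.toml metadata
flit build     # build sdist and wheel
flit publish   # build and upload to PyPI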
11 Hatch
Hatch is a modern, extensible Python project manager.
Features:
- Standardised build system with reproducible builds by default
- Robust environment management with support for custom scripts
- Easy publishing to PyPI or other indexes
- Version management
- Configurable project generation with sane defaults
- Responsive CLI, ~2-3x faster than equivalent tools
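In practice it looks something like this (project name invented; see the Hatch docs before taking my word for it):
# Scaffold a new project with sane defaults
hatch new my-project
# Build sdist and wheel
hatch build
# Upload to PyPI
hatch publish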
12 Python versions
If we use conda or uv then the Python version is handled for us, as it is with generic dependency managers. With pip, we need to manage it ourselves. Poetry is in between: it knows about Python versions but cannot install Python for us.
See pyenv for the most common solution to manage Python versions.
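A typical session (version number arbitrary; substitute whichever Python you need):
# Install a specific interpreter and pin it for this project
pyenv install 3.11.9
pyenv local 3.11.9   # writes .python-version in the current directory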
13 GPU/TPU/etc pain
Users of GPUs must ignore all the other options, no matter how attractive they might seem at first glance. The stupid drudge work of venv is the price of hardware support for now. Only pip and conda support hardware specification in practice.
UPDATE: poetry now supports Pytorch with CUDA. uv has had a crack at it too.
Terminology you need to learn: many packages specify local versions for particular architectures as a part of their functionality. For example, pytorch comes in various flavours, which, when using pip, can be selected in the following fashion:
# CPU flavour
pip install torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
# GPU flavour
pip install torch==1.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
The local version is given by the +cpu or +cu113 bit, and it changes what code will be executed when using these packages. Specifying a GPU version is essential for many machine learning projects (essential, that is, if I don’t want my code to run orders of magnitude slower). The details of how this can be controlled with regard to the Python packaging ecosystem are somewhat contentious and complicated and thus not supported by any of the new wave options like poetry or pipenv. Brian Wilson argues,
During my dive into the open-source abyss that is ML packages and +localVersions I discovered lots of people have strong opinions about what it should not be and like to tell other people they’re wrong. Other people with opinions about what it could be are too afraid of voicing them lest there be some unintended consequence. PSF has asserted what they believe to be the intended state in PEP-440 (no local versions published) but the solution (PEP-459) is not an ML Model friendly solution because the installation providers (pip, pipenv, poetry) don’t have enough standardised hooks into the underlying hardware (cpu vs gpu vs cuda lib stack) to even understand which version to pull, let alone the Herculean effort it would take to get even just pytorch to update their package metadata.
There is no evidence that this logjam will resolve any time soon. However, it turns out that this machine learning thing is not going away, and ML projects use GPUs. It turns out that packaging projects with GPU code is hard. Since I do neural network stuff and thus use GPU/CPU versions of packages, I can effectively ignore most of the Python environment alternatives on this page. The two that work are conda and pip. These support, de facto, a minimum viable local-version package system which does what we want. If you want something fancier, try containerization using a GPU-compatible system such as apptainer.