Python packaging, dependency management and isolation



The actual experience of managing python packages.

How to install the right versions of everything for some python code I am developing? How to deploy that sustainably? How to share it with others? There are two problems here: installing the right package dependencies, and keeping the right dependency versions for this project. In python there are various integrated solutions that solve both problems at once, with varying degrees of success. Not so hard, but confusing and chaotic due to many long-running disputes that are only lately being resolved.

General

primordial time

In times past there were other python packaging systems: distutils and what-not. AFAICT, unless I am migrating extremely old code, I should ignore everything about them.

pip

The default python package installer. It is best spelled

python -m pip install package_name

Pro tip: pipx:

pip is a general-purpose package installer for both libraries and apps with no environment isolation. pipx is made specifically for application installation, as it adds isolation yet still makes the apps available in your shell: pipx creates an isolated environment for each application and its associated packages.

That is, pipx is a global application that installs global applications for you. (There is a bootstrapping problem here: how to install pipx itself?)

pip has a heavy cache overhead. If disk space is at a premium, I invoke it as pip --no-cache-dir.
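Why `python -m pip` rather than bare `pip`: the module form runs pip with exactly the interpreter I name, so packages land in that interpreter's environment instead of whichever `pip` happens to be first on `PATH`. A minimal sketch of the same idea from inside Python, building the command from `sys.executable`; the helper names are hypothetical, and `--no-cache-dir` is pip's real option for skipping its cache.

```python
from __future__ import annotations

import subprocess
import sys


def pip_install_command(*packages: str, no_cache: bool = False) -> list[str]:
    """Build a `python -m pip install ...` command bound to *this* interpreter.

    Using sys.executable rather than a bare `pip` found on PATH means the
    packages are installed into whichever environment is running this code.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_cache:
        # Skip pip's download/wheel cache when disk space is at a premium.
        cmd.append("--no-cache-dir")
    cmd.extend(packages)
    return cmd


def pip_install(*packages: str, no_cache: bool = False) -> None:
    """Actually run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(*packages, no_cache=no_cache), check=True)


if __name__ == "__main__":
    # Print rather than run, so the example has no side effects.
    print(pip_install_command("package_name", no_cache=True))
```

Keeping command construction separate from execution makes it easy to log or inspect what would be run before touching the environment.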

Anaconda

The distribution you use if you want to teach a course in numerical python without dicking around with a 5 hour install process.

Setup

Download, e.g., the Linux x64 Miniconda installer from the download page.

bash Miniconda3-latest-Linux-x86_64.sh
# login/logout here
# or do something like `exec bash -` if you are fancy
# Less aggressive conda
conda config --set auto_activate_base false
# conda for fish users
conda init fish

It is very much worth installing Miniconda rather than the default Anaconda distribution, since default Anaconda is gigantic but nonetheless does not have what I need, so it simply wastes space.

Conda has a slightly different packaging workflow. See, e.g., Tim Hopper’s explanation of this environment.yml malarkey, or the creators’ rationale and manual.

The upshot for the end user is that if I want to install something with tricky dependencies like ViTables, I do this:

conda install pytables=3.2
conda install pyqt=4

Aside: I use fish shell, so I need to do some extra setup. Specifically, I add the line

source (conda info --root)/etc/fish/conf.d/conda.fish

into ~/.config/fish/config.fish. These days this is automated by

conda init fish

For jupyter to see conda environments as kernels, one needs

conda install nb_conda_kernels

Save space

NB: conda will fill up my hard disk if not regularly disciplined via conda clean:

conda clean -pt

If I have limited space in my home dir, I might need to move the package cache by configuring pkgs_dirs in ~/.condarc:

conda config --add pkgs_dirs /SOME/OTHER/PATH/.conda

Possibly also required?

chmod a-rwx ~/.conda
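To see whether the cleanup or the move was worth it, I can measure how big the package cache is. A small sketch using only the standard library; the default location `~/miniconda3/pkgs` in the usage line is an assumption about a stock Miniconda install, so substitute whatever `pkgs_dirs` points at. Conda hard-links package files into environments where it can, so this apparent size may overstate the disk space that `conda clean` would actually free.

```python
from __future__ import annotations

import os
from pathlib import Path


def dir_size_bytes(root: str | os.PathLike) -> int:
    """Total apparent size of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # don't count link targets twice
                total += os.path.getsize(path)
    return total


def human(n: float) -> str:
    """Format a byte count with binary units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"


if __name__ == "__main__":
    # Assumed default Miniconda cache location; a missing directory reports 0.
    cache = Path.home() / "miniconda3" / "pkgs"
    print(cache, human(dir_size_bytes(cache)))
```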

Dependencies

One exports the current conda environment config, by convention, into environment.yml.

conda env export > environment.yml
conda env create --file environment.yml

Which to use, conda env create or conda create? If it involves .yml environment configs, then conda env create. The differences in capabilities between these two are a quagmire which leads to opaque errors, bad documentation and sadness.

One annoying point of friction that I rapidly encountered is that these are not terribly generic environments: I might specify from the command line a package that I know will install sanely on any platform (matplotlib, say), but the version as stored in the environment file is specific to where I installed it (macOS, Linux, Windows…). So to share environments with collaborators on different platforms, I need to… be them, I guess? idk, this seems weird; maybe I’m missing something.
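Part of the problem is that `conda env export` writes entries of the form `name=version=build`, and the build string is the most platform-specific part; the export also records a machine-specific `prefix:` path. conda itself offers `conda env export --no-builds`, and `conda env export --from-history` goes further by exporting only the packages I explicitly asked for. For an already-exported file, here is a sketch that strips build strings and the prefix line, assuming the default export layout; the function name is hypothetical, and pinned versions may still be unavailable on some platforms.

```python
import re

# Matches conda-style pins such as "  - numpy=1.21.2=py39h6635163_0".
# pip entries ("requests==2.26.0") and unpinned names are left untouched.
_CONDA_PIN = re.compile(r"^(\s*-\s+)([^=\s]+)=([^=\s]+)=(\S+)\s*$")


def strip_build_strings(environment_yml: str) -> str:
    """Drop build strings and the machine-specific prefix from an exported env file."""
    out = []
    for line in environment_yml.splitlines():
        if line.startswith("prefix:"):
            continue  # absolute path of the env on the exporting machine
        m = _CONDA_PIN.match(line)
        if m:
            lead, name, version, _build = m.groups()
            line = f"{lead}{name}={version}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Usage: read `environment.yml`, pass its text through `strip_build_strings`, and write the result to a separate file for collaborators on other platforms.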

No MKL

I might also want to not have the gigantic MKL library installed, not being a fan. I can usually disable it by request:

conda create -n pynomkl python nomkl

Clearly the packagers do not test this configuration very often, because it sometimes fails even for packages which notionally do not need MKL. Worth trying, however. Between the various versions and installed copies, MKL alone was using about 10GB total on my mac when I last checked. I also try to reduce the number of copies of MKL by starting from miniconda as my base distribution and cautiously adding things as I need them.

Local environment

Keeping the environment in a local folder is more isolated than keeping all environments somewhere global.

conda config --set env_prompt '({name})'
conda env create --prefix ./env/myenv --file environment_linux.yml
conda activate ./env/myenv

Gotcha: in fish shell the first line needs to be

conda config --set env_prompt '\({name}\)'

I am not sure why. AFAIK, fish command substitution does not happen inside strings.

Either way, this will add the line

env_prompt: ({name})

to .condarc.

venv

venv is now the virtual environment system built into python 3. It doesn’t support python 2, but it fixes various problems; e.g. it supports framework python on macOS, which is important for GUIs. It is covered by the python docs in the python virtual environment introduction.

# Create venv
python3 -m venv ~/.virtualenvs/learning_gamelan_keras_2
# Use venv from fish
source ~/.virtualenvs/learning_gamelan_keras_2/bin/activate.fish
# Use venv from bash
source ~/.virtualenvs/learning_gamelan_keras_2/bin/activate
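The same thing from inside Python, as a sketch using the standard-library `venv` module: create a bare environment without pip (so it works offline and quickly), then ask the new interpreter whether `sys.prefix` differs from `sys.base_prefix`, which is how Python reports that it is running inside a virtual environment. The helper names and the `demo_env` directory are hypothetical.

```python
import os
import subprocess
import tempfile
import venv


def env_python(env_dir: str) -> str:
    """Path to the interpreter inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def make_venv(env_dir: str) -> str:
    """Create a venv without pip and return the path to its python."""
    # Symlinking mirrors what `python3 -m venv` does by default on POSIX.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_python(env_dir)


def is_isolated(python: str) -> bool:
    """Ask an interpreter whether it is running inside a virtual environment."""
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        py = make_venv(os.path.join(tmp, "demo_env"))
        print(py, is_isolated(py))
```

Activating a venv in the shell mostly just puts that `bin` directory first on `PATH`; calling the env's interpreter by its full path, as here, needs no activation at all.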

pipenv

venv has a higher-level, er, …wrapper (?) interface (?) called pipenv.

Pipenv is a production-ready tool that aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, pip, and virtualenv into one single command.

Here is an introduction showing these tools used together.

pyenv

pyenv eases and automates switching between all the other pythons and python environments: those created by virtualenv, python.org python, OS python, anaconda python etc. It manages python itself, and thus can implicitly be used as a manager for all the other managers. The new new hipness, at least for platforms other than Windows.

BUT WHO MANAGES THE VIRTUALENV MANAGER MANAGER? Also, what is going on? Logan Jones explains:

  • pyenv manages multiple versions of Python itself.
  • virtualenv/venv manages virtual environments for a specific Python version.
  • pyenv-virtualenv manages virtual environments across varying versions of Python.
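When I lose track of which of these layers I am actually in, a rough heuristic sketch can report it. It relies on three assumptions, none of which is an API of these tools: a pyenv-managed interpreter lives under `$PYENV_ROOT/versions` (defaulting to `~/.pyenv`, as in the fish config further down); a venv or virtualenv shows up as `sys.prefix != sys.base_prefix`; and `conda activate` sets `CONDA_PREFIX`. The function name is hypothetical.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping


def describe_environment(
    prefix: str = sys.prefix,
    base_prefix: str = sys.base_prefix,
    executable: str = sys.executable,
    environ: Mapping[str, str] = os.environ,
) -> list[str]:
    """Heuristically list which management layers the interpreter sits in."""
    layers = []
    # Heuristic: pyenv installs interpreters under $PYENV_ROOT/versions.
    pyenv_root = environ.get("PYENV_ROOT", os.path.expanduser("~/.pyenv"))
    if executable.startswith(os.path.join(pyenv_root, "versions") + os.sep):
        layers.append("pyenv-managed python")
    # venv/virtualenv (including pyenv-virtualenv envs) redirect sys.prefix.
    if prefix != base_prefix:
        layers.append("venv/virtualenv")
    # `conda activate` exports CONDA_PREFIX for the active env.
    if environ.get("CONDA_PREFIX"):
        layers.append(f"conda env at {environ['CONDA_PREFIX']}")
    return layers or ["system/base python"]


if __name__ == "__main__":
    print(describe_environment())
```

The parameters default to the running interpreter's values but can be overridden, which keeps the heuristic testable without actually switching environments.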

Anyway, pyenv compiles a custom version of python and as such is extremely isolated from everything else. Here is an introduction with emphasis on my area: Intro to Pyenv for Machine Learning.

Of course, because this is a python packaging solution, it immediately becomes complicated and confusing when you try to interact with the rest of the ecosystem, e.g.,

pyenv-virtualenvwrapper is different from pyenv-virtualenv, which provides extended commands like pyenv virtualenv 3.4.1 project_name to directly help out with managing virtualenvs. pyenv-virtualenvwrapper helps in interacting with virtualenvwrapper, but pyenv-virtualenv provides more convenient commands, where virtualenvs are first-class pyenv versions, that can be (de)activated. That’s to say, pyenv and virtualenvwrapper are still separated while pyenv-virtualenv is a nice combination.

Huh. I am already too bored to think. However, I did nut out a sequence of commands which installs tensorflow into a pyenv python with an isolated virtualenv:

brew install pyenv pyenv-virtualenv
pyenv install 3.8.6
pyenv virtualenv 3.8.6 tf2.4
pyenv activate tf2.4
pip install --upgrade pip wheel
pip install 'tensorflow-probability>=0.12' 'tensorflow<2.5' jupyter

For fish shell you need to add some special lines to config.fish:

set -x PYENV_ROOT $HOME/.pyenv
set -x PATH $PYENV_ROOT/bin $PATH
## fish <3.1
# status --is-interactive; and . (pyenv init -|psub)
# status --is-interactive; and . (pyenv virtualenv-init -|psub)
## fish >=3.1
status --is-interactive; and pyenv init - | source
status --is-interactive; and pyenv virtualenv-init - | source

Poetry

No! wait! The new new new hipness is poetry. All the other previous hipnesses were not the real eternal ultimate hipness that transcends time. I know we said this every previous time, but this time it’s real and our love will last forever.

This look will be forever.

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.

From the introduction:

Packaging systems and dependency management in Python are rather convoluted and hard to understand for newcomers. Even for seasoned developers it might be cumbersome at times to create all files needed in a Python project: setup.py, requirements.txt, setup.cfg, MANIFEST.in and the newly added Pipfile.

So I wanted a tool that would limit everything to a single configuration file to do: dependency management, packaging and publishing.

It takes inspiration in tools that exist in other languages, like composer (PHP) or cargo (Rust).

And, finally, I started poetry to bring another exhaustive dependency resolver to the Python community apart from Conda’s.

What about Pipenv?

In short: I do not like the CLI it provides, or some of the decisions made, and I think we can make a better and more intuitive one.

curl -sSL https://install.python-poetry.org | python3 -

OK, so poetry could be regarded as similar to pipenv, in that it (by default, but not necessarily) manages the project’s dependencies in a local venv. I list it separately because it takes a much more full-service approach. For example, it has its own dependency resolver, which makes use of modern dependency metadata but will also fall back to brute force on older dependency specifications if needed. It separates the dependencies I specify from the ones it actually resolves in practice, which means that the dependencies seem to transport much better than conda’s. In practice the many small conveniences and the very thoughtful workflow are helpful. For example, by default it installs the current package in development mode, so that imports work as similarly as possible in this local environment and when the package is distributed to users.
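To make the specified-versus-resolved split concrete: in `pyproject.toml` I declare version ranges, typically with Poetry’s caret syntax, while `poetry.lock` records the exact versions the resolver picked. Per Poetry’s documentation, `^1.2.3` means `>=1.2.3,<2.0.0` and `^0.2.3` means `>=0.2.3,<0.3.0`, i.e. the leftmost nonzero component may not change. A sketch of that rule, assuming plain numeric `X.Y.Z` versions with no pre-release tags; the function names are hypothetical and this is not Poetry’s own code.

```python
from __future__ import annotations


def _parts(version: str) -> list[int]:
    """Split a plain numeric version like '1.20.3' into integers."""
    return [int(p) for p in version.split(".")]


def caret_bounds(spec: str) -> tuple[str, str]:
    """Expand a caret requirement like '^1.20' to (inclusive lower, exclusive upper)."""
    parts = _parts(spec.lstrip("^"))
    lower = parts + [0] * (3 - len(parts))
    # Leftmost nonzero component; if every given component is zero, the last one.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]
    upper[idx] += 1
    upper += [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


def satisfies(version: str, spec: str) -> bool:
    """True if a concrete (e.g. locked) version falls inside the caret range."""
    lo, hi = (tuple(_parts(v)) for v in caret_bounds(spec))
    v = _parts(version)
    return lo <= tuple(v + [0] * (3 - len(v))) < hi
```

For example, `caret_bounds("^1.20")` gives `("1.20.0", "2.0.0")`, so a lock file pinning `1.26.4` satisfies the declared range while `2.0.0` would not.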

Generic dependency managers

i.e. non-python-specific. See dependency managers.

