Docker containerized apps (for scientists)

Doing things that previously took 0.5 computers using 0.4 computers

2015-11-05 — 2022-01-18


Assumed audience:

People who want to do containerization for machine learning research

Content warning:

The needs of ML research people are not the usual scaling-web-apps type needs of many containerization users. Obsolete advice danger.

The most popular containerization solution, so common that it is easiest to define the alternatives with reference to it. It is, however, often not the best-suited tool for my particular needs, which are research-oriented.

Docker is well supported but has awful terminology, riven with confusing analogies, and is poorly explained. Fortunately, Julia Evans explains at least the filesystem, overlayfs, by example. The Google best-practice page also has good illustrations that make it clear what is going on. See also the docker cheat sheet, as noted by digithead, who also explains the annoying terminology:

Docker terminology has spawned some confusion. For instance: images vs. containers and registry vs. repository. Luckily, there’s help, for example this stack-overflow post by a brilliant, but under-appreciated, hacker on the difference between images and containers.

Registry - a service that stores image repositories

Repository - a set of Docker images, usually versions of the same application

Image - an immutable snapshot of a running container. An image consists of layers of file system changes stacked up on top of a base image.

Container - a runtime instance of an image
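To make the registry/repository/image distinction concrete: an image reference such as `rocker/r-base:4.1.2` names a repository and a tag, and implicitly a registry. The sketch below splits a reference according to Docker's defaulting rules as I understand them: no registry host means Docker Hub (`docker.io`), a bare name like `ubuntu` lives in the `library/` namespace, and a missing tag means `latest`. It is a simplified illustration, not Docker's actual parser, and `parse_image_ref` is a hypothetical helper.

```shell
#!/bin/sh
# Split a Docker image reference into registry, repository and tag.
# Simplified sketch of Docker's defaulting rules; digest references
# ("name@sha256:...") are NOT handled. parse_image_ref is hypothetical.
parse_image_ref() {
  ref=$1
  # The first path component is a registry host only if the reference has a
  # slash and that component contains "." or ":", or is "localhost".
  case $ref in
    */*) first=${ref%%/*} ;;
    *)   first= ;;
  esac
  case $first in
    *.*|*:*|localhost) registry=$first; rest=${ref#*/} ;;
    *)                 registry=docker.io; rest=$ref ;;
  esac
  # The tag is whatever follows ":" in the last path component; default "latest".
  last=${rest##*/}
  case $last in
    *:*) tag=${last##*:}; rest=${rest%:*} ;;
    *)   tag=latest ;;
  esac
  # Single-name repositories on Docker Hub live in the "library/" namespace.
  if [ "$registry" = docker.io ]; then
    case $rest in
      */*) ;;
      *)   rest=library/$rest ;;
    esac
  fi
  printf 'registry=%s repository=%s tag=%s\n' "$registry" "$rest" "$tag"
}

parse_image_ref rocker/r-base               # -> registry=docker.io repository=rocker/r-base tag=latest
parse_image_ref ubuntu:22.04                # -> registry=docker.io repository=library/ubuntu tag=22.04
parse_image_ref localhost:5000/my-model:v2  # -> registry=localhost:5000 repository=my-model tag=v2
```

So `docker pull ubuntu` fetches the image tagged `latest` from the repository `library/ubuntu` in the Docker Hub registry, and `docker run` then creates a container from it.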

Essentially with Docker, you provide a recipe for building a reproducible execution environment, and the infrastructure will ensure that environment exists for your program. The recipe is ideally encapsulated in the Dockerfile. The cost of this is that it is somewhat more clunky to set things up. The benefit is that setting things up the second time and all subsequent times is in principle effortless and portable.
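A minimal sketch of such a recipe, written out from the shell. Everything project-specific is an assumption for illustration: the base image `python:3.10-slim`, the files `requirements.txt` and `train.py`, and the helper name `write_dockerfile`. The instruction order is the part worth copying: each filesystem-changing instruction (`RUN`, `COPY`) produces a layer, and Docker reuses cached layers up to the first one whose inputs changed, so the slow dependency install goes before the frequently edited code.

```shell
#!/bin/sh
# Write a minimal Dockerfile for a hypothetical Python research project.
# Assumptions: requirements.txt and train.py sit at the project root.
write_dockerfile() {
  target=${1:-Dockerfile}
  cat > "$target" <<'EOF'
# Use a specific version tag so rebuilds start from the same base (a digest would be stricter).
FROM python:3.10-slim
WORKDIR /app
# Dependencies first: this layer stays cached until requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Code last, because it changes most often.
COPY . .
CMD ["python", "train.py"]
EOF
}

write_dockerfile
```

With that file in place, `docker build -t my-experiment .` builds the image (the tag name is arbitrary) and `docker run --rm my-experiment` runs `train.py` in a throwaway container.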

1 Installation

There is a GUI for all this called Dock station which might make some steps easier on some platforms. TBC.

1.1 Linux

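Installing Docker is easy. Do not forget to give yourself permission to actually run docker (membership of the docker group is effectively root access on that machine, so only do this on a box you control):

sudo groupadd docker   # usually already created by the installer; an "already exists" error is harmless
sudo usermod -aG docker "$USER"
# log out and back in (or run `newgrp docker`) for the new group to take effect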
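A quick check of whether the group change has reached your current session; `in_group` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the current user belongs to the named group (default: docker).
# usermod changes only show up in new login sessions, so a fresh shell may be needed.
in_group() {
  group=${1:-docker}
  id -nG | tr ' ' '\n' | grep -qx -- "$group"
}

if in_group docker; then
  echo "$(id -un) is in the docker group; try: docker run --rm hello-world"
else
  echo "not in the docker group yet: log out and back in first"
fi
```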

1.2 macOS

Choose one:

Homebrew install.
Docker for Mac, since renamed Docker Desktop, worked for me. As far as I can tell it is the same product once called Docker Community Edition for Mac.
Kitematic, which provided a GUI for the containers themselves, is discontinued. Ignore it.

1.3 Windows

Docker for Windows.

2 Docker with GPU

Annoying the last time I tried: it required manual patching so intrusive that it was easier not to use Docker at all. Maybe better now? I'm not doing this at the moment, and the terrain is shifting. The currently least-awful hack could be simple. Or not.

special NVIDIA-GPU-happy docker.
Tensorflow-serving-happy docker.
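For what it is worth, the route I understand to be current is NVIDIA's Container Toolkit plus Docker's `--gpus` flag (Docker 19.03 and later). A smoke-test sketch, assuming both are installed; the `run`/`DRY_RUN` helper and the default `nvidia/cuda` tag are my assumptions, so substitute a tag that your driver supports.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# If this prints the usual nvidia-smi table, containers can see the GPU.
# Needs an NVIDIA driver on the host, Docker 19.03+ and the NVIDIA Container
# Toolkit. The default image tag is an ASSUMPTION; pick one from nvidia/cuda.
gpu_smoke_test() {
  image=${1:-nvidia/cuda:11.4.3-base-ubuntu20.04}
  run docker run --rm --gpus all "$image" nvidia-smi
}
```

`DRY_RUN=1 gpu_smoke_test` shows the command without touching Docker; plain `gpu_smoke_test` runs it.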

This might be an advantage of Apptainer.

3 Opaque timeout error

Do you get the following error?

Error response from daemon: Get https://registry-1.docker.io/v2/:
net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)

According to thaJeztah, the solution is to use Google DNS for Docker (or presumably some other non-awful DNS). You can set this by providing a JSON configuration in the preference panel (under daemon -> advanced), e.g.

{ "dns": [ "8.8.8.8", "8.8.4.4" ]}
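On Linux there is no preferences panel; the same JSON goes in the daemon's configuration file, `/etc/docker/daemon.json` on a standard install. A minimal sketch, assuming a systemd host; `write_daemon_dns` is a hypothetical helper, and copying the file into place replaces any settings already in it, so merge by hand if you have one.

```shell
#!/bin/sh
# Write Docker daemon config that sets Google's public DNS servers.
# Writes to the given path (default ./daemon.json) rather than straight into
# /etc/docker, so you can inspect it first.
write_daemon_dns() {
  target=${1:-daemon.json}
  cat > "$target" <<'EOF'
{ "dns": ["8.8.8.8", "8.8.4.4"] }
EOF
}

write_daemon_dns
# Then, as root on the Docker host:
#   sudo install -m 0644 daemon.json /etc/docker/daemon.json
#   sudo systemctl restart docker
```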

4 Orchestrating

Docker Compose: a nice way to set up a dev environment.

5 R Docker

5.1 Rocker

Rocker has recipes for R in Docker.

## command-line R
docker run --rm -ti rocker/r-base
## RStudio Server (log in as user "rstudio" with the password set here)
docker run -e PASSWORD=yourpassword --rm -p 8787:8787 rocker/rstudio
# now browse to localhost:8787
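The rocker commands above start from an empty container; for actual work you probably want your project directory visible inside it. A sketch using bind mounts (`-v`); the function names and the `DRY_RUN`/`RSTUDIO_PASSWORD` variables are my own inventions, not part of Rocker.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Run an R script from the current directory inside rocker/r-base.
# The bind mount means anything the script writes lands on the host.
rscript_docker() {
  run docker run --rm -v "$PWD":/work -w /work rocker/r-base Rscript "$@"
}

# RStudio Server with the current directory visible as ~/work for user rstudio.
# Reads the password from RSTUDIO_PASSWORD (a name chosen for this sketch).
rstudio_docker() {
  if [ -z "${RSTUDIO_PASSWORD:-}" ]; then
    echo "set RSTUDIO_PASSWORD first" >&2
    return 1
  fi
  run docker run --rm -p 8787:8787 -e PASSWORD="$RSTUDIO_PASSWORD" \
    -v "$PWD":/home/rstudio/work rocker/rstudio
}
```

For example, `rscript_docker analysis.R` runs a (hypothetical) `analysis.R` from the current directory without installing R locally.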

5.2 containerit

6 Docker compose

Docker Compose: a nice way to set up a dev environment:

Docker Compose basically lets you run a bunch of Docker containers that can communicate with each other. You configure all your containers in one file called docker-compose.yml.
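A sketch of such a file for a research-flavoured setup, a Jupyter notebook server next to a Postgres database, written out from the shell. The images (`jupyter/scipy-notebook`, `postgres:14`), the `DB_HOST` variable, the password and the helper name `write_compose_file` are assumptions for illustration. The point it shows is that services share a default network and reach each other by service name, so code in the notebook would connect to host `db`.

```shell
#!/bin/sh
# Write a hypothetical two-service docker-compose.yml.
write_compose_file() {
  target=${1:-docker-compose.yml}
  cat > "$target" <<'EOF'
services:
  notebook:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"
    volumes:
      - ./:/home/jovyan/work             # project directory, editable from both sides
    environment:
      DB_HOST: db                        # hypothetical variable your code would read
  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: example         # fine for local dev only
    volumes:
      - dbdata:/var/lib/postgresql/data  # named volume survives restarts
volumes:
  dbdata:
EOF
}

write_compose_file
```

Start both with `docker compose up -d` (or `docker-compose up -d` with the older standalone tool); `docker compose logs notebook` shows the Jupyter login URL with its token, and `docker compose down` stops everything while keeping the named volume.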

7 As package manager

Russell Jones, Whalebrew. Docker Images as ‘Native’ Commands:

As I’ve previously written, containers can be started, perform a task, then stopped in a matter of milliseconds. And that’s exactly what Whalebrew allows you to do in the form of Docker images aliased in your $PATH.

whalebrew/whalebrew: Homebrew, but with Docker images
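The idea can be hand-rolled without Whalebrew: a shell function that runs an image with the current directory mounted, so the containerized tool behaves roughly like a local one. This is my sketch, not how Whalebrew is implemented; `dockerize`, `run` and `DRY_RUN` are hypothetical names, and the `pandoc/core` image is assumed to have `pandoc` as its entrypoint.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Run an image as if it were a local command: mount the current directory,
# run as your own UID/GID so output files are not owned by root, and pass
# stdin (-i) and all arguments through.
dockerize() {
  image=$1
  shift
  run docker run --rm -i --user "$(id -u):$(id -g)" \
    -v "$PWD":/workdir -w /workdir "$image" "$@"
}

# Example alias, ASSUMING the pandoc/core image's entrypoint is pandoc.
pandoc() {
  dockerize pandoc/core "$@"
}
```

With that sourced, `pandoc notes.md -o notes.html` converts a file in the current directory with no local pandoc install; setting `DRY_RUN=1` prints the underlying `docker run` command instead.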

8 Kubernetes

Kubernetes is a large-scale container orchestration system. I don't need Kubernetes, since I am not in a team with 500 engineers.