Containerized apps

Doing things that previously took 1 computer using 0.75 computers


These are rapidly evolving standards. Check the timestamps on any advice.

A lighter, hipper alternative to virtual machines which, AFAICT, attempts to make provisioning services more like installing an app than building a machine. It emphasises containerized apps or services rather than machines, an emphasis which leads to less arsing about, I suppose, but somehow even more webinars. The distinction from virtual machines is fuzzy, and the two share a lot of infrastructure. It is related to sandboxing, in that they use some of the same technology, sometimes even in conflicting ways. Containerization targets quick, lightweight, reproducible execution environments for some program, often a server process; the user is fairly likely to be on the same team as the developer, and fairly likely to run on some shared infrastructure. Sandboxing is usually for apps, usually on the desktop, and is more often used by developers to distribute apps to end-user customers.

The most common hosts for containers are, or were, Linux-ish, but there are also Windows/macOS solutions these days, which I believe run the Linux containers inside a lightweight VM. I do not know how similar the abstractions are across these.

For this you could probably start with Julia Evans’s How Containers work.

When you first start using containers, they seem really weird. Is it a process? A virtual machine? What’s a container image? Why isn’t the networking working? I’m on a Mac but it’s somehow running Linux sort of? And now it’s written a million files to my home directory as root? What’s HAPPENING?

And you’re not wrong. Containers seem weird because they ARE really weird. They’re not just one thing, they’re what you get when you glue together 6 different features that were mostly designed to work together but have a bunch of confusing edge cases.

Go get Julia Evans’s How Containers work for an introduction
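
To get a rough feel for what those glued-together kernel features look like, here is a minimal sketch on a Linux host (assuming the unshare tool from util-linux is available): a fresh PID namespace plus a remounted /proc already gets you a process that believes it is PID 1.

# a container-ish process the hard way: new PID namespace, fresh /proc
# (real runtimes also set up mount/network/user namespaces, cgroups,
# and an overlay filesystem)
sudo unshare --pid --fork --mount-proc bash
ps aux    # run this inside: only bash and ps are visible; bash is PID 1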


Docker

The most common way of doing this; so common that it stands in for all the technologies. It is simple structurally but is riven with confusing analogies, inconsistent terminology and poor explanation that make it seem esoteric.

Fortunately we have Julia Evans, who explains at least the filesystem, overlayfs by example. See also the docker cheat sheet.

Essentially, with Docker you provide a recipe (a Dockerfile) for building a reproducible execution environment, and the infrastructure will ensure that environment exists for your program. The cost of this is that it is somewhat more clunky to set things up the first time. The benefit is that setting things up the second time, and all subsequent times, is in principle effortless and portable.
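
To make that concrete, here is a minimal sketch; the image, file names and command are made up for illustration:

# Dockerfile — the recipe for the environment
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]

Build the image once, then run it anywhere Docker runs:

docker build -t myapp .    # bake the environment into an image
docker run --rm myapp      # run the program inside that environment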

Installation

  • Linux hosts: installing docker is easy. (A quick sanity check of any install is sketched after this list.)
  • macOS has a confusing profusion of toolchain bits and pieces you can try to install to get the Docker experience, all of which try to install various distinct versions of each other, and give little information about which is the recommended way of doing what.

    Choose one:

    • Homebrew install.

    • Docker for Mac worked for me. I think it is the same as Docker Community Edition for Mac?

    • kitematic provides a GUI for the containers themselves, as opposed to the infrastructure.

    • docker toolbox bundles some docker infrastructure plus kitematic. It attempts to run docker properly, but seems to fail in weird ways in the default setup, giving, e.g., permission errors. If you install Docker for Mac and then install this, you get Kitematic, but it can’t see your docker images, because of something boring that I can’t be bothered understanding.

  • Docker for Windows. (i.e. it runs Windows containers? Runs on Windows hosts? IDK.)
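
Whichever of these you choose, a quick sanity check that the client and daemon are talking to each other:

docker --version               # client is installed
docker info                    # daemon is reachable
docker run --rm hello-world    # pulls a tiny test image and runs it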

Docker with GPU

Annoying; last time I tried, it required manual patching so intrusive that it was easier not to use Docker at all. Maybe it is better now? I’m not doing this at the moment, and the terrain is shifting. The currently least-awful hack could be simple. Or not.
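
For what it’s worth, the least-awful path currently seems to be the NVIDIA Container Toolkit; assuming that and a recent-enough driver are installed on the host (and the CUDA image tag below is illustrative), the invocation can be as simple as:

# check that the container can see the GPU
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi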

Secrets

Handling passwords is fiddly – see secrets.
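
One fiddly-but-workable approach, assuming a reasonably recent Docker with BuildKit (the id, file name, and command here are made up for illustration), is to mount secrets only during the build step that needs them, so they never land in an image layer:

# syntax=docker/dockerfile:1
# in the Dockerfile: the secret is available only for this RUN step
RUN --mount=type=secret,id=mytoken \
    do-something-with "$(cat /run/secrets/mytoken)"

# at build time, pass the secret in from a local file
docker build --secret id=mytoken,src=token.txt .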

Opaque timeout error

Do you get the following error?

Error response from daemon: Get https://registry-1.docker.io/v2/:
net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)

According to thaJeztah, the solution is to use Google DNS for Docker (or presumably some other non-awful DNS). You can set this by providing a JSON configuration in the preference panel (under daemon -> advanced), e.g.

{ "dns": [ "8.8.8.8", "8.8.4.4" ]}

Docker for reproducible research

Docker may not be the ultimate tool for reproducible research but it is a start. And it is convenient - see Keunwoo Choi’s guide for researchers by example. (🏗 fact-check the linked article.)

…How do you get your data in?
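
The usual first answer is a bind mount, i.e. exposing a host directory inside the container. A sketch (the paths are illustrative; the rocker image is introduced below):

# mount ./data from the host as /data inside the container, read-only
docker run --rm -v "$PWD/data":/data:ro rocker/r-base \
    Rscript -e 'print(list.files("/data"))'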

Tiffany Timbers gives a brisk run-through for academics.

Jon Zelner goes in-depth with R in a series culminating in continuous integration for science.

Reproducible research tuts has a Docker (and also a VM-backed) tutorial.

Singularity

Singularity promises potentially useful container infrastructure.

Singularity provides a single universal on-ramp from the laptop, to HPC, to cloud.

Users of Singularity can build applications on their desktops and run hundreds or thousands of instances—without change—on any public cloud.

Features include:

  • Support for data-intensive workloads—The elegance of Singularity’s architecture bridges the gap between HPC and AI, deep learning/machine learning, and predictive analytics.
  • A secure, single-file-based container format—Cryptographic signatures ensure trusted, reproducible, and validated software environments during runtime and at rest.
  • Extreme mobility—Use standard file and object copy tools to transport, share, or distribute a Singularity container. Any endpoint with Singularity installed can run the container.
  • Compatibility—Designed to support complex architectures and workflows, Singularity is easily adaptable to almost any environment.
  • Simplicity—If you can use Linux®, you can use Singularity.
  • Security—Singularity blocks privilege escalation inside containers by using an immutable single-file container format that can be cryptographically signed and verified.
  • User groups—Join the knowledgeable communities via GitHub, Google Groups, or in the Slack community channel.
  • Enterprise-grade features—Leverage SingularityPRO’s Container Library, Remote Builder, and expanded ecosystem of resources. […]

Released in 2016, Singularity is an open source-based container platform designed for scientific and high-performance computing (HPC) environments. Used by more than 25,000 top academic, government, and enterprise users, Singularity is installed on more than 3 million cores and trusted to run over a million jobs each day.

In addition to enabling greater control over the IT environment, Singularity also supports Bring Your Own Environment (BYOE)—where entire Singularity environments can be transported between computational resources (e.g., users’ PCs) with reproducibility.
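
In practice, the workflow looks something like the following sketch (subcommands have shifted a little between Singularity versions and its Apptainer fork, and the image names are illustrative):

# build a single-file (SIF) image from an existing Docker image
singularity pull ubuntu.sif docker://ubuntu:22.04
# run a command inside it; your home directory is mounted by default
singularity exec ubuntu.sif cat /etc/os-release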

Rocker

rocker has recipes for R in Docker.

## command-line R
docker run --rm -ti rocker/r-base
## Rstudio
docker run -e PASSWORD=yourpassword --rm -p 8787:8787 rocker/rstudio
# now browse to localhost:8787

Docker GUIs

GUI comparison

  • kitematic, already mentioned, is languishing but works. Windows, macOS.
  • portainer is a docker GUI that runs on docker, and therefore everywhere.

LXC

LXC is another containerization standard. Because Docker is the de facto default, let’s look at this in terms of Docker.

Kubernetes

Kubernetes is a large-scale container automation system. I don’t need Kubernetes since I am not in a team with 500 engineers.