The Bone Yard


Diving Into Docker

The Planning and Execution Around Dockerizing My Blog

In the last post, I detailed how I was struggling a bit with the development process for this blog: moving the individual pieces of the app from server to server was becoming a hassle. After a few weeks of experimentation and proof-of-concept work, I have converted my site to a container-based application structure using Docker. Now is a good time to share some observations on "Dockerization" and outline a few choices I made along the way.

Container Composition

The first choice I needed to make was how to decompose my site into separate containers. There were a few options available. At one end of the spectrum, I could package the whole site, in its entirety, as a single container. This would be the most expedient approach, as it would require less coordination between pieces; everything would exist in one logical space. The issue with that approach is that it doesn't align with Docker best practices: the Docker user guide indicates that a container should really center on a single process. If you think about it, this matches one of the basic Linux tenets of building a system around small, individual units that each do one thing and do it well. The obvious win is that with small building blocks, you can re-arrange or re-order the blocks to change how the system works, or even build entirely different systems, with minimal change to the pieces themselves. So, to keep with the spirit of this rule, my decision was to break my site apart into three chunks:

  • a container housing static HTML/CSS content served by Nginx, which also reverse proxies requests to the blog app.
  • a container holding Gunicorn, the Pyramid framework, as well as Python blog app code.
  • a container in which PostgreSQL is installed to store the blog app data.
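
As a quick orientation, here is how those three pieces map onto images; each image wraps exactly one process. The image names and Dockerfile paths shown are the same ones used by the build script at the end of this post, included here purely for illustration.

# Each piece becomes its own image, built from its own Dockerfile.
docker build --tag site-nginx -f site_www/stage/Dockerfile  site_www    # Nginx: static content + reverse proxy
docker build --tag site-py    -f site_blog/stage/Dockerfile site_blog   # Gunicorn + Pyramid blog app
docker build --tag site-psql  -f site_db/stage/Dockerfile   site_db     # PostgreSQL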

Container Communication

Once the decision was made to break the application into multiple containers, it became clear that I would need to deal with the configuration that lets the containers talk with one another. By default, Docker creates a bridge network on the host machine, similar to how VM solutions like VirtualBox or KVM manage their networking. This bridge shows up as the docker0 interface in commands like ip addr or ifconfig. Using this bridge network, the host machine can communicate with containers. While this is a very convenient state of affairs, we do surrender control of the network structure to Docker with the default bridge: Docker selects the address subnet and gateway for the default bridge network on its own. What's more, if you stick with the default bridge, you can't (as of version 1.12) assign static IP addresses to individual containers you create. To achieve that, you'll need to forgo the default bridge in favor of your own custom network. Setting up your own Docker network isn't difficult at all, and it lets you precisely choose your own subnet, gateway and other network topology, so you can ensure the Docker network doesn't overlap with, say, an auxiliary subnet on your home LAN that Docker on your host machine can't possibly know about (it only sees the networks your host routes to directly).
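
If you're curious what Docker chose for the default bridge on your host, you can inspect it directly:

# See the subnet and gateway Docker picked for the default bridge network.
docker network inspect bridge

# Or look at the docker0 interface on the host itself.
ip addr show docker0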

Controlling the container network topology is done with the docker network command. Using 'docker network', you can set up a basic custom bridge network, and then assign newly created containers to it.

# Define a custom network named 'foobar' for our containers to use
# Note you can use CIDR notation to describe the IP range.
docker network create --subnet=10.11.12.0/24 foobar

Then, later on, when you are creating your container instances from their base image, you can specify the network the container will connect to, as well as assign it a static IP address.

# Create a container instance named HelloWorld from the base image BaseImgName.
# Attach it to the foobar network, using IP 10.11.12.13.
docker create --name HelloWorld --network=foobar --ip=10.11.12.13 BaseImgName
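
Once the container has been started, you can double-check that it attached to the right network and received the address you asked for:

# Start the container, then verify the network's subnet, gateway, and
# which containers are attached (and at which addresses).
docker start HelloWorld
docker network inspect foobar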

Container Storage

Provisioning disk space for your containers is a simple proposition, at least for a development environment. Docker is built to work with a number of different back-end storage technologies, known as storage drivers. Out of the box, you shouldn't need to tweak any settings to get Docker up and running; Docker will examine what's installed on your system and try to pick the best driver for you. Refining your storage strategy for production is a different story, which we'll discuss later.
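
You can see which driver Docker settled on for your host with docker info:

# Report the storage driver the Docker daemon is currently using.
docker info | grep -i 'storage driver'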

Docker's traditionally preferred storage driver is AUFS, a union file system. AUFS incorporates some clever techniques, including copy-on-write, so that Docker can stack layer upon layer of images to form your multi-layered container while only allocating space when needed. Docker only makes a copy of a file from a source image when your container modifies data within that file. This tends to make read performance very snappy (starting your container, for example), at the expense of potentially slower writes, particularly if your container changes larger files lower in the image stack. Docker nicely describes the way it leverages AUFS here.

While AUFS is quite mature within Docker's platform and has a few nice features, it may not be your first choice as a storage driver. The issue is that AUFS is not supported in the mainline Linux kernel. Because of this, a number of distributions don't include AUFS at all, since shipping it would violate their policy of not modifying source code or packaging derived from upstream repos, the Linux kernel being one of those. Red Hat, for its own reasons, shuns "out of tree bits" in its packaging, which means that if you're using Fedora or CentOS, you would need to arrange for installing AUFS on your own; you won't get it through the native package manager. I would advise you to research AUFS and determine whether it's worth it to either a) manage the AUFS installation yourself, or b) choose a different distribution that packages AUFS natively.
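
One quick way to check is to look at the file systems the running kernel has registered (note this won't show a module that exists but hasn't been loaded yet):

# If this prints a line mentioning aufs, the running kernel has AUFS available.
grep aufs /proc/filesystems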

Luckily, as mentioned earlier, there are alternatives. Docker can also run on top of devicemapper, a Linux kernel framework that forms an abstraction layer over physical block devices, enabling very useful features like spanning file systems across different physical drives and taking hot copies of your data (snapshots) without unmounting anything. devicemapper is the foundation for well-known storage technologies like LVM and dm-raid (software RAID). Since certain distributions, like the Red Hat family, don't natively include AUFS, devicemapper is the default back-end storage driver in those environments. Read more about the history behind Docker's introduction of devicemapper support, and how it works, here.

This is where we tie in the earlier mention of production suitability for Docker storage drivers. While it's true that Docker is flexible enough to choose a storage driver based on your host system without requiring intervention, in a production setting you will not want to settle for what Docker gives you. In the case of the Red Hat family, Docker defaults to a variation of devicemapper called loop-lvm. In a nutshell, rather than running devicemapper on top of a dedicated physical drive, Docker uses plain files on the host file system to mimic a separate drive. As you can imagine, this is not as performant as it could be. Running devicemapper on top of an actual separate block device is going to be faster than looping through files inside the host OS, since we eliminate a layer of indirection in storage allocation (roughly speaking, loop-lvm = Docker -> host OS -> kernel -> drive, direct-lvm = Docker -> kernel -> drive). This use of devicemapper is called direct-lvm. The trade-off is, basically, added complexity in exchange for better performance. You now need to take steps on your own to provision a separate block device, install and tune LVM within your host OS, and tell the Docker daemon to use direct-lvm as its back end. If you're running in a VPS context, it shouldn't be hard to spin up another disk dedicated solely to Docker storage. On the software/config side, there is definitely a learning curve to picking up LVM, but once you do, you'll find it's a great tool in your toolbox. I build pretty much all of my Linux installations, from my laptop to my NAS to this VPS, on top of LVM, for the ability to add disks on the fly and take snapshot backups.
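
To give a sense of the moving parts, here is a rough sketch of pointing the daemon at an LVM thin pool. It assumes a spare block device at /dev/sdb (hypothetical), root privileges, and reasonably recent LVM and Docker versions; treat it as an outline rather than a recipe, and see the links in this section for the real procedure.

# Turn the spare device into an LVM volume group with a thin pool for Docker.
pvcreate /dev/sdb
vgcreate docker /dev/sdb
lvcreate --type thin-pool -l 95%VG -n thinpool docker

# Tell the Docker daemon to use devicemapper backed by that thin pool,
# then restart the daemon to pick up the change.
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/docker-thinpool",
    "dm.use_deferred_removal=true"
  ]
}
EOF
systemctl restart docker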

Since I have already built my blog application on CentOS, and given that I had selected a VPS that can support LVM-based OS deployments, I elected to use direct-lvm for my needs. I won't go into detail on the steps to set up, configure and leverage LVM with respect to Docker; I could (and may) devote an entire post to LVM, and besides, the earlier link I provided to the Red Hat developers portal nicely describes how to adapt a Docker deployment to use it.

Development in Containers

The last consideration I'll touch on is how working with containers impacts the development workflow. With a traditional, non-Docker solution, there isn't much you need to manage with respect to staging your development environment. Sure, you will need to keep your dev and prod config files separate when it's time to deploy, but there is nothing preventing you from executing and testing your code, in situ, on your dev machine. Things get a bit more complicated with Docker, as your code is no longer running directly on your host system, per se, but is instead wrapped in a Docker container. To test your code changes, the container's code base needs to be synced with the latest changes as you make them in your dev working copy. There are a few different ways to tackle this, as you can probably guess:

  1. Re-build your Docker images after you make changes, re-creating each container instance.
  2. Perform your actual development, editing, compilation etc inside each respective container.
  3. Copy your latest code changes into the containers as they occur.

There are some significant cons to the first two approaches, in my opinion. For #1, you would need to rebuild the affected Docker images each time you change the code they wrap, which clearly requires extra time and would quickly become tedious over many change-and-test iterations. For #2, assuming you would clone your working copy into a container and later push changes back to the originating repo, you now have to bake knowledge of your source control system into the container. I don't like this idea; it doesn't feel right, and you'd need to make sure you wipe all linkages to your SCS prior to deployment, since they constitute a security concern.

The last approach seems like the lesser of three evils, yet there is a way to do even better. You can ensure that changes in the working directory on the host are reflected instantly in the target container by using a bind mount: you identify file paths in your container that, rather than having data copied in during creation, are mounted directly to file paths on your host machine. Certainly you'd need to copy in the "real" data for your production deployment, but should you forget your pre-prod cleanup, that downside is much easier for me to accept than shipping the intimate details of your SCS out to a hosted VPS environment. To use a bind mount, you leverage the -v option of docker commands such as run or create. It is, then, an init-time setting that you establish when building your container. I wired the choice of bind mount versus copying the source into the script I use to build my Docker containers.
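
As a concrete sketch, this is effectively what the dev-mode branch of my build script (below) runs for the blog app container: the host's working copy is mounted over the application directory baked into the image.

# Dev mode: bind-mount the host working copy over /srv/site inside the
# container, so code edits on the host are visible in the container instantly.
docker create --name siteblog --network=sitenet --ip=10.11.12.2 \
   -v "$(pwd)/site_blog/code:/srv/site" site-py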

To wrap things up, here is a shell script I wrote that glues together the building and creation steps for my customized blog Docker app. Hopefully this discussion has given you some insight into what decisions you will face in moving to the realm of Docker, and some ideas on how to prioritize and select from your options.

#!/bin/bash

if [ $# -lt 1 ] || [ $# -gt 2 ] || ([ $# -eq 2 ] && [ "$2" != "1" ] && [ "$2" != "0" ]); then
      echo -e "create_site usage:"
      echo -e "create_site <ssl_path> [<dev_mode>]"
      echo -e "where:"
      echo -e "<ssl_path> => path to SSL artifacts"
      echo -e "<dev_mode> => if 1, application is built for dev mode (default=0)"
      echo -e "              In dev mode, app src is a bind mount against host."
      echo -e "args=$#"
      exit 16
fi

DEVMODE=0
if [ $# -eq 2 ] && [ "$2" = "1" ]; then
   DEVMODE=1
fi 

# create network, if none exists
if [ -z "$(docker network ls -q -f name=sitenet)" ]; then
   echo "Creating network sitenet."
   docker network create --subnet=10.11.12.0/24 sitenet
else 
   echo "Using existing sitenet."
fi

echo -e "*** Creating Postgres image site-psql. *** "
MNT=""
if [ $DEVMODE -eq 1 ]; then
   MNT="-v $(pwd)/site_db/code:/code"
   echo "Bind mount Postgres code dir to host."
fi 

# create DB container
docker build --tag site-psql -f site_db/stage/Dockerfile $(pwd)/site_db
if [ $? -ne 0 ]; then
   echo -e "Error creating Postgres image; exiting."
   exit 16
fi
echo -e "*** Creating Postgres container sitedb. ***"
docker create --name sitedb   --network=sitenet --ip=10.11.12.1 $MNT site-psql

##
# TO-DO : need to restore blog data from backup.

echo -e "*** Creating Python image site-py. *** "
MNT=""
INI=""
if [ $DEVMODE -eq 1 ]; then
   MNT="-v $(pwd)/site_blog/code:/srv/site"
   echo "Bind mount Python code dir to host."   
   INI="--build-arg ini_file=development.ini"
   echo "Setting Pyramid blog app to use development ini."
fi 

# create Pyramid container
docker build $INI --tag site-py -f site_blog/stage/Dockerfile $(pwd)/site_blog
if [ $? -ne 0 ]; then
   echo -e "Error creating Python image; exiting."
   exit 16
fi
echo -e "*** Creating Python container siteblog. *** "
docker create --name siteblog --network=sitenet --ip=10.11.12.2 $MNT site-py 

echo -e "*** Creating Nginx image site-nginx. ***"
MNT=""
if [ $DEVMODE -eq 1 ]; then
   MNT="-v $(pwd)/site_www/code:/www/data"
  echo "Bind mount Nginx code dir to host."
fi

SSL="-v $1:/www/tls" 

# create Nginx container
docker build --tag site-nginx -f site_www/stage/Dockerfile $(pwd)/site_www
if [ $? -ne 0 ]; then
   echo -e "Error creating Nginx image; exiting."
   exit 16
fi
echo -e "*** Creating Nginx container sitewww. ***"
docker create --name sitewww  --network=sitenet --ip=10.11.12.3 $SSL $MNT site-nginx
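
For reference, invoking the script looks like this (the SSL path here is just a hypothetical example):

# Build images and create all three containers in dev mode, pulling TLS
# artifacts from a hypothetical certificate directory on the host.
./create_site /etc/letsencrypt/live/example.com 1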

