Continuous integration and testing for a Django app with a postgres database

Part 1: Creating a docker container to emulate your system's environment

Background

To be written.



Intro: Docker

1. Create a container that runs python

Here we're using python 3.6 from a miniconda installation. Luckily, someone has already created a container with miniconda in it. We're also going to install the python modules we need whilst we're at it:
FROM continuumio/miniconda3

# Python packages from conda 
RUN conda install -y -c rdkit rdkit 
RUN conda install -y -c anaconda pandas 
RUN conda install -y -c anaconda luigi 
RUN conda install -y -c anaconda numpy 
RUN conda install -y -c anaconda psycopg2 
RUN conda install -y -c anaconda postgresql 
RUN conda install -y -c anaconda django 
RUN conda install -y -c conda-forge django-extensions
Actually, this is a bit of a silly way to install python packages. Usually, we would include a requirements.txt file, as you would with a normal python package. However, conda environments work slightly differently. With conda, we can create a yaml file to specify the environment name, channels to look for, and which packages to install.
What’s even better is we can use one simple command in our current environment to produce this file for us:
conda env export > environment.yml
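As an aside, a full conda env export pins every package, including build strings; if you only want name and version pins, recent versions of conda support a --no-builds flag:

conda env export --no-builds > environment.yml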
Here’s our environment.yml file. We’ve stripped the dependencies down to the bare minimum, to try to keep our docker container small:
name: pipeline
channels:
  - rdkit
  - anaconda
  - conda-forge
dependencies:
  - rdkit==2017.09.02
  - pandas==0.22.0
  - luigi==2.7.5
  - numpy==1.14.2
  - psycopg2==2.7.4
  - postgresql==10.5
  - django==2.0.5
  - django-extensions==2.0.7
It's important to specify which version of each package you want to install. If your continuous integration breaks, you want to be sure that this isn't due to an update in one of your software dependencies. By pinning versions, we ensure that the environment we're running at home is the same environment being tested on Travis.
Next, we add the environment.yml file to our docker container, and create our conda environment:
# Python packages from conda 
RUN echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc 
RUN /bin/bash -c "source ~/.bashrc" 
ADD environment.yml /tmp/environment.yml 
WORKDIR /tmp 
RUN conda env create
Steps from above snippet:
  1. Add the conda profile to our bashrc file
  2. Source our .bashrc file – here we have to specify that we’re using a bash command, as docker will default to sh, which means the source command would fail
  3. Add our environment.yml file to the container
  4. Create our environment
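Incidentally, conda env create reads the environment.yml in the current working directory and names the environment after its name: field (pipeline, in our case). If you want the build to fail early when the environment is broken, you could add a sanity-check step like the one below; this RUN line is our own addition, not part of the original dockerfile:

# Optional sanity check: confirm the pipeline environment activates
RUN /bin/bash -c "source /opt/conda/etc/profile.d/conda.sh && conda activate pipeline && python --version"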

2. Add a postgres database

For our app, we require a database (you may have noticed django and postgresql in the environment.yml file earlier).
To set up a postgres database, we follow the same steps in our docker container as we would on our own computer. First, we create a directory to store our database files:

# mkdir for database files
RUN mkdir database/
RUN mkdir database/db_files
Next, we need to add a user to the container to run the database. We don't want to run our database as root anyway, so let's add a postgres user, and make them the owner of the database directories we've created above:

# Add postgres user
RUN adduser postgres
RUN chown postgres database/
RUN chown postgres database/db_files
Next, we want to run our database within our container. I won't cover in depth how to run a postgres database here; that deserves a separate post, if anything. There's also some excellent documentation on the postgres website.
To run our database, there are a few commands we need to run, which would usually be entered at the command line. To keep our docker container nice and simple, we're going to add a bash script to the container and then run it. This bash script also contains the commands we need to set up our database with Django, and to run a daemon for luigi (the daemon is specific to our app, so feel free to omit that line if you're following this tutorial for your own services):
#!/bin/bash

# Start the luigi daemon in the background, discarding its output
nohup luigid >/dev/null 2>&1 &
# Start the postgres server, logging to ./logfile
pg_ctl -D db_files -l logfile start
# Create the database our Django app expects
createdb test_xchem
# Create and apply the Django migrations for the db app
cd pipeline
python manage.py makemigrations db
python manage.py migrate db
And we add it to our docker container in a similar way to how we added our environment.yml file. We'll also swap ownership over to the postgres user, and allow anyone to read, write, or execute this file:
COPY run_services.sh . 
RUN chown postgres run_services.sh 
RUN chmod 777 run_services.sh
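As an aside, 777 is more permissive than strictly needed here; if you'd rather not make the script world-writable, read-and-execute permissions are enough for the postgres user to run it. A small variation on the line above:

RUN chmod 755 run_services.sh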

3. Set up Django dependencies

Next, we need to consider our settings file for Django. Usually, this would not be uploaded to github, as it contains sensitive details such as usernames and passwords to access the database that the app communicates with.
To solve this problem, we can create a settings file for the docker container, which never goes near your production database.
Here’s an example from the settings file for our docker container, where we set up the user for the database:
# Database
# https://docs.djangoproject.com/en/2.0/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'test_xchem',
        'USER': 'postgres',
        'PASSWORD': '',
        'HOST': 'localhost',
        'PORT': 5432,
    }
}
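Once the database is up and running (from step 4 onwards), a quick way to confirm these settings line up with the real database is a connection test with psql, which the postgresql conda package provides; the flags below simply mirror the settings above:

psql -h localhost -p 5432 -U postgres -d test_xchem -c 'SELECT 1;'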
You can read more about Django settings on their website, but here we're assuming you are already familiar with Django, as you're setting up a test environment for your Django app!
Let’s add the Django settings file to the docker container in the same way we added the run_services.sh file in step 2:
COPY settings_docker_django.py . 
RUN chown postgres settings_docker_django.py 
RUN chmod 777 settings_docker_django.py

4. Initialising our database in the container

Now that we have all of the installation steps done, and we've copied over all of the files and scripts required to run the environment, we can set up our database. First, we switch over to the postgres user we created earlier:
# Run the rest of the commands as the ``postgres`` user 
USER postgres
This means that any commands run below this line are run as the postgres user.
Next, we need to initialise the database cluster. This can be done with the standard postgres command initdb (the server itself is started later, by the run_services.sh script):
# Start postgres 
RUN initdb db_files
The final step in making the database available in the test environment is to expose the port that the database runs on. This also means you can interact with the database from outside the container (or by opening a shell inside it with docker exec). The default port used by postgres is 5432, so let's expose that (we're also exposing port 8082, which is used by luigi, but don't worry about that):
EXPOSE 5432 
EXPOSE 8082
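Bear in mind that EXPOSE only documents which ports the container uses; to actually reach them from the host, you publish them with -p when running the container (see step 7), roughly like this:

docker run -p 5432:5432 -p 8082:8082 pipeline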

5. Pulling our code into the container

Next, we want to pull the code we are testing into our container. The continuous integration relies on a push to github to trigger it (we'll explain that in part 3), so we want to make sure we're using up-to-date code. We do this with the usual command you would use to clone a git repo:
# Git pull pipeline 
RUN git clone https://github.com/xchem/pipeline.git
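One caveat worth flagging (our note, not from the original post): docker caches build layers, so on a rebuild this git clone step may be served from the cache and miss new commits. Forcing a clean build avoids testing stale code:

docker build --no-cache -t pipeline .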

6. Telling the container what to do when we start it

The docker container needs an instruction to tell it what to do when we start it. As we haven’t built our test suite yet (coming up in Part 2), we’ll have it just run the database and luigi daemon for now.
First, we need to change our settings file to have the right name for Django (settings.py):
WORKDIR /database/pipeline/ 
RUN cp ../settings_docker_django.py settings.py 
RUN chmod 777 settings.py
Finally, we can tell the docker container to execute the run_services.sh script we added earlier. This runs the commands to set up the database and the luigi daemon (see step 2):
WORKDIR /database 
CMD ./run_services.sh
And we’re done setting up our container! Next, we can test that our container works.

7. Build your docker container

Now that we have our dockerfile, we can build the image to test it. To do this, make sure you're in the same directory as your dockerfile and run:
docker build -t pipeline .
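If the build succeeds, one way to poke around (a suggestion of ours, not from the original post) is to start an interactive shell in the container, overriding the default CMD, and launch the services by hand:

docker run -it -p 5432:5432 -p 8082:8082 pipeline /bin/bash
# then, at the container's prompt:
./run_services.sh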
