Docker: volumes tutorial – how and when to use

In this post, we will go over how Docker manages data on your host and how you can leverage it in your day-to-day work.

After reading this post, you should be able to:

  • Distinguish between two types of data: volatile and persistent
  • Decide when to choose one over the other
  • Follow real-world examples of when to choose which

This post is a follow-up to our Docker introduction post; we highly recommend reading that one first before proceeding.

Volatile data

During development, it is common to experiment in an isolated environment to test your code.
By default, unless you instruct Docker otherwise, any data written inside a container is deleted once that container is removed (not merely stopped).

Let's confirm that this is indeed the case. Let's start a container named volatile from the alpine:latest image and create a file in it.

When we create a new container without explicitly defining a volume (using the -v flag), Docker creates a writable layer on the host that holds the differences between the original image (in our example, alpine:latest) and the running container.

To see this in action, we will use Docker's inspect command, which provides low-level information about Docker objects.

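Note: docker inspect returns much more information than we need, so we added the following to the command:

-f "{{ json .GraphDriver }}" | python -m json.tool

The -f (format) flag takes a Go template and prints only the GraphDriver section as JSON, and piping the result through python -m json.tool pretty-prints it.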
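If you prefer to stay in Python, the same extraction can be done on the full inspect output. This is a minimal sketch that assumes the overlay2 storage driver seen in this post, where GraphDriver.Data contains an UpperDir key. The helper names upper_dir and container_upper_dir are our own, and container_upper_dir needs the docker CLI on your PATH.

```python
import json
import subprocess
from typing import Optional


def upper_dir(inspect_output: str) -> Optional[str]:
    """Return GraphDriver.Data.UpperDir of the first object in inspect output.

    Returns None when the output is empty or when the storage driver does
    not report an UpperDir (assumption: drivers other than overlay2 may not).
    """
    if not inspect_output.strip():
        return None
    objects = json.loads(inspect_output)  # docker inspect prints a JSON array
    if not objects:
        return None
    graph_driver = objects[0].get("GraphDriver") or {}
    data = graph_driver.get("Data") or {}
    return data.get("UpperDir")


def container_upper_dir(container: str) -> Optional[str]:
    """Inspect a container with the docker CLI and return its UpperDir."""
    result = subprocess.run(
        ["docker", "container", "inspect", container],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the container does not exist
    )
    return upper_dir(result.stdout)
```

For example, container_upper_dir("volatile") should return the same UpperDir path that the -f template prints.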

Now let’s look inside the UpperDir path:


As we can see, Docker keeps any data written inside a container in a special location on the host machine.
Now that we know where it is kept on the host, let's stop our container and see if the data is still there.

As we can see, the data is still there! That is because the container has not been removed yet, and you may want to start it again using docker container start volatile.

Now, let us remove it and see if the data still persists.

What have we learned?
Docker manages any container data at a special location on the host (which you can find using the docker container inspect command).
As long as we do not remove the container, the data will persist.

When should you use volatile data?

Development / Experimental stage.
Although container data survives when the container is stopped, it is removed together with the container, so this type of storage is meant for the development stage.
A good example is the Postgres database from our previous post: we can experiment freely with a newly created Postgres container, or keep reusing its data, until we are happy with the result and ready to commit our code.

Persistent data – volumes

There are two types of volumes: Docker-managed volumes and bind mount volumes.
Both act as a mount point between the host directory tree and the container directory tree, but they differ in where the data lives on the host.
Each has its pros and cons, and we will discuss both in depth so you know when to choose which.

Bind mount volumes

A bind mount maps a user-specified directory or file on the host machine to a corresponding location in the container.
When should we use bind mount volumes?
As a rule of thumb, if you need to share data between the host machine and the container, use a bind mount.

Example: Testing your code in a container

When developing a new application, you often need to test it out.
Let us develop a small Flask application to demonstrate the bind mount option.

First, let us create our environment: create a new folder called “app” and add two files to it:

# app/requirements.txt
Flask

# app/app.py
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)

Now that we have finished writing our code, let us spin up a new container and mount the folder inside it so we can run the app:

Example: developing a Flask application inside a container

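The bind mount syntax is:
-v <absolute path on host>:<absolute path in container>
In the example above, we used:
-v $(pwd)/app:/code
The shell expands $(pwd) to the current working directory, which turns the relative app folder into the absolute host path that the syntax requires.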
Once the container is up and running, any change made on the host side is immediately reflected in the files inside the container. Feel free to experiment with it using your IDE of choice, or even add new files to the host app folder and watch them appear in the container.
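As a sketch of what such a run command can look like, the following Python snippet assembles a docker container run command with the bind mount above and prints it. Only -v $(pwd)/app:/code comes from this post; the container name flask-dev, the python:3-alpine image, the 5000:80 port mapping, the -w working directory and the start command are assumptions for illustration.

```python
import os
import posixpath
import shlex
from typing import List


def bind_mount_flag(host_path: str, container_path: str) -> str:
    """Return a -v value that bind mounts host_path at container_path."""
    if not posixpath.isabs(container_path):
        raise ValueError(f"container path must be absolute: {container_path!r}")
    # Same idea as $(pwd)/app: turn a relative host path into an absolute one.
    return f"{os.path.abspath(host_path)}:{container_path}"


def flask_dev_command(app_dir: str = "app") -> List[str]:
    """Build the docker command for the Flask bind-mount example (assumed values)."""
    return [
        "docker", "container", "run", "--rm", "-it",
        "--name", "flask-dev",            # hypothetical container name
        "-p", "5000:80",                  # host port 5000 -> app.run(port=80)
        "-v", bind_mount_flag(app_dir, "/code"),
        "-w", "/code",                    # run from the mounted folder
        "python:3-alpine",                # assumed base image
        "sh", "-c", "pip install -r requirements.txt && python app.py",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(flask_dev_command()))
```

Running the script from the folder that contains app prints the command; with the port mapping assumed here, the app would then answer on http://localhost:5000.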

Another use case for bind mounts is when you want to share data between your container and other running processes on the host machine.

For example, if you have just started your journey with containers, a good practice is to containerise each service independently. Let us assume your log aggregation is handled by one of the many tools out there (e.g. Filebeat, Logstash, Graylog or Splunk) running on the host machine.

As we just learned, we can bind mount a folder from our host machine like this:
-v /path/where/we/manage/logs:/var/log/hello-world/logs
This way, if our app writes its logs to /var/log/hello-world/logs, we can point any agent or service on the host machine at /path/where/we/manage/logs and offload log management to that process.
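On the application side, writing into the mounted directory needs nothing Docker-specific. This is a minimal standard-library sketch; the container-side path /var/log/hello-world/logs comes from the example above, while the file name app.log, the logger name and the file_logger helper are assumptions.

```python
import logging
import os

# Container-side log directory from the bind mount example above.
LOG_DIR = "/var/log/hello-world/logs"


def file_logger(log_dir: str = LOG_DIR, name: str = "hello-world") -> logging.Logger:
    """Return a logger that appends to <log_dir>/app.log (file name assumed)."""
    # Create the directory if it is missing (e.g. when run outside the container).
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(os.path.join(log_dir, "app.log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

In app.py you could call log = file_logger() once at module level and log.info(...) inside hello(); with the bind mount in place, the lines would then show up in /path/where/we/manage/logs/app.log on the host.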

There are many more use cases for bind mounts but we will leave it to you to explore further.

Docker-managed volumes

Managed volumes are stored in locations created by the Docker daemon in a Docker-managed space (similar to what we saw with volatile data).

This is the recommended approach: managed volumes do not depend on the host's directory structure, and they have several advantages over bind mounts:

  • Easier to back up and/or migrate than bind mounts.
  • Can be managed by Docker CLI commands or the Docker API.
  • Can be safely shared among multiple containers.
  • Volume drivers let you store volumes on remote hosts or cloud providers, or encrypt the contents of volumes.

When should we use it? Whenever possible 🙂

Example: sharing data between containers

In this example, we are going to create a volume and show how two different containers use it, one as a producer and the other as a consumer.
Before we start, we will introduce a new command, docker volume create, which creates a Docker volume. Let us create one named “shared-vol”:

docker volume create shared-vol

To see our new volume, we will use a second command, docker volume ls, which lists all volumes that Docker currently manages.
We can see that Docker has created the volume shared-vol using the local driver (drivers are a topic for a future post; for now, local means a managed volume stored on this host, similar to what we saw earlier).
Let us spin up a new container called test1 and attach our volume to it under /mnt/shared.
Note the subtle difference in how we use the flag this time:
-v <volume name>:<absolute path in container>

Let us create a second container, test2, and mount the same volume, but this time at /mnt/consume.

Note that both containers have been created with -v shared-vol:<path>

Now that everything is ready, from container test1 let us create a new file under our volume mount point: /mnt/shared/hello.txt.

And now, from our second container test2, let us check the (previously empty) folder /mnt/consume:


As we can see, our test2 container can see a new file created by our test1 container.

This type of functionality allows us, for example, to take our log aggregation example and implement it between two different containers: one that produces logs into a volume and a second one that consumes them.
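On the consuming side, the container that mounts the shared volume (or a host process watching a bind-mounted folder) only needs to read what the producer appends. The following Python sketch is a simplified polling consumer, not a replacement for a real log shipper: it tracks a byte offset per *.log file, returns only complete lines, and does not handle log rotation or truncation. The function names and the *.log convention are assumptions.

```python
import os
import time
from typing import Dict, Iterator, List


def read_new_lines(directory: str, offsets: Dict[str, int]) -> List[str]:
    """Return complete lines appended to *.log files since the last call.

    offsets maps each file path to the byte offset already consumed and is
    updated in place.
    """
    new_lines: List[str] = []
    for entry in sorted(os.listdir(directory)):
        if not entry.endswith(".log"):
            continue
        path = os.path.join(directory, entry)
        start = offsets.get(path, 0)
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        # Stop at the last newline; a trailing partial line is read next time.
        end = data.rfind(b"\n") + 1
        if end:
            new_lines.extend(data[:end].decode("utf-8").splitlines())
            offsets[path] = start + end
    return new_lines


def follow(directory: str, interval: float = 1.0) -> Iterator[str]:
    """Yield new log lines forever, polling every `interval` seconds."""
    offsets: Dict[str, int] = {}
    while True:
        yield from read_new_lines(directory, offsets)
        time.sleep(interval)
```

In a consumer container with Python installed, for line in follow("/mnt/consume"): print(line) would print each new line that the producer writes to a *.log file in the shared volume.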

Summary

This post was an introduction to volumes and how one can leverage the different types based on specific needs.
We went over the differences and use cases of volatile data, bind mount volumes and Docker-managed volumes, and showed examples of how and when to use each.

It is important to note that for managed volumes, the more explicit --mount flag is preferred over the --volume / -v flag we demonstrated; we used -v for the sake of simplicity.
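For the forms used in this post, the mapping between the two syntaxes is mechanical. This Python sketch (the function name v_to_mount is ours) converts a -v value into the equivalent --mount value using a simplified version of Docker's rule: a source that is an absolute path becomes a bind mount, and a plain name becomes a named volume. It assumes Linux-style paths, uses a simplified volume-name check, and handles only an optional :ro suffix.

```python
import posixpath
import re

# Simplified volume-name check: letters, digits, "_", "." and "-".
VOLUME_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


def v_to_mount(spec: str) -> str:
    """Convert "<source>:<target>[:ro]" into a --mount value."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected <source>:<target>[:ro], got {spec!r}")
    source, target = parts[0], parts[1]
    option = parts[2] if len(parts) == 3 else ""
    if not posixpath.isabs(target):
        raise ValueError(f"target must be an absolute path: {target!r}")
    if posixpath.isabs(source):
        mount_type = "bind"  # e.g. $(pwd)/app after shell expansion
    elif VOLUME_NAME.fullmatch(source):
        mount_type = "volume"  # e.g. shared-vol
    else:
        raise ValueError(f"not an absolute path or volume name: {source!r}")
    fields = [f"type={mount_type}", f"source={source}", f"target={target}"]
    if option == "ro":
        fields.append("readonly")
    elif option:
        raise ValueError(f"unsupported option: {option!r}")
    return ",".join(fields)
```

For example, v_to_mount("shared-vol:/mnt/shared") returns type=volume,source=shared-vol,target=/mnt/shared, which you would pass as --mount type=volume,source=shared-vol,target=/mnt/shared. One practical difference also favours --mount for bind mounts: if the host path does not exist, -v creates it as a directory, while --mount reports an error instead.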

There are many more things we can do with Docker volumes, especially with managed volumes, but we will save those for a future post on advanced volume patterns.

Stay tuned!
