Kubernetes Storage Made Easy - Kubernetes Volumes & Persistent Volumes
When working with any application, you want a way to store data. In a production environment, you might have to store mission-critical data such as customer information, contact and payment information, or some historical data. Losing this data would be disastrous for the business and the customer.
As you might already be aware, Kubernetes runs its workloads in pods, and pods are ephemeral in nature. When a pod is destroyed, the data within the pod is lost as well. When the pod gets recreated, the data from the older pod does not carry over to the new pod. So how does Kubernetes enable data storage in a way that the data persists even if the pod is deleted?
Within this blog post, we will be taking a deep dive into the mechanics built into Kubernetes for managing persistent data. We will be taking a hands-on approach and creating the different storage resources to ensure that the pod data is persisted.
Container Storage Interface (CSI)
Before we discuss the different storage mechanisms in Kubernetes, it’s important to understand the Container Storage Interface (CSI). CSI is a standardized interface that lets Kubernetes interact with different storage systems in a uniform way. Since Kubernetes is a vendor-agnostic tool, it needs to integrate easily with many different types of storage backends. If the CSI standard did not exist, it would be very difficult to integrate Kubernetes with an S3 bucket, Google Cloud’s storage devices, and storage devices provided by other cloud providers, because a custom integration would have to be built for each of these storage systems.
With the introduction of CSI, it became much easier to integrate with all these different storage devices.
Kubernetes Volumes & VolumeMounts
As you are already aware, Kubernetes runs all of its application containers within pods. By default, any data that the application writes is stored on the container’s own filesystem. Let’s say that you have a pod that generates logs at a set interval. While the pod is running, this data is stored in the pod’s container. However, as soon as the pod gets destroyed, the data is lost as well.
A volume is a storage component that containers use for storing data. Kubernetes provides two main types of volumes: ephemeral volumes and persistent volumes. As the name suggests, an ephemeral volume has the same lifecycle as the pod itself: once the pod is deleted, the data is lost too. A persistent volume, on the other hand, stores data beyond the pod’s lifecycle. The data remains available even after the pod is destroyed, and can be accessed by newer pods as well. We will look at persistent volumes later in this blog. First, we will look at ephemeral volumes.
Ephemeral Kubernetes Volumes
As mentioned earlier, ephemeral volumes in Kubernetes are temporary storage units. The data they contain is lost after the pod is destroyed. This is useful for use cases such as caches, which do not need their data to be persisted. There are a few different types of ephemeral volumes available in Kubernetes, such as:
- emptyDir: A directory that is empty at pod startup. Its storage comes locally from the kubelet’s base directory on the node (usually the root disk) or from memory.
- configMap, downwardAPI, secret: Inject different kinds of Kubernetes data into a pod.
- image: Allows mounting container image files or artifacts directly into a pod.
- CSI ephemeral volumes: Similar to the previous volume kinds, but provided by special CSI drivers.
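As a rough illustration of the configMap kind from the list above, the sketch below mounts a ConfigMap as read-only files inside a container. The pod name, the ConfigMap name app-config, and the mount path are hypothetical, and the example assumes a ConfigMap called app-config already exists in the same namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo              # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox
      # List the injected files, then keep the container alive.
      command: ["sh", "-c", "ls /etc/app-config && sleep 3600"]
      volumeMounts:
        - name: config
          mountPath: /etc/app-config   # each ConfigMap key shows up as a file here
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: app-config            # assumed to exist already
```

The volume disappears together with the pod, while the ConfigMap object itself stays in the cluster.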
Kubernetes volumes are defined in the pod manifest and can be shared by all the containers in the pod. To mount a volume into a container, you use volumeMounts. Let’s take a look at how to create a volume of type emptyDir and mount it within the pod.
Let’s say that we have a pod running a Redis container. Redis is an in-memory database, so we need to provide it with a volume for the data it writes to disk. Let’s create this Redis pod and give it a volume of type emptyDir, mounted at /data/redis.
Before the volume can be mounted into a container, it first has to be declared under the pod’s spec. The YAML below declares a volume called redis-storage and assigns it the emptyDir type.
```yaml
volumes:
  - name: redis-storage
    emptyDir: {}
```
After the volume is declared, we have to actually mount it in the container so that it can be used. To mount a volume into a container, we use the volumeMounts field. A single container can have multiple volumes mounted into it. For the redis-storage volume, we will mount it using:
```yaml
volumeMounts:
  - name: redis-storage
    mountPath: /data/redis
```
Putting it all together, the final YAML manifest that will create the pod with the Kubernetes volume will look as follows:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
    - name: redis
      image: redis
      volumeMounts:
        - name: redis-storage
          mountPath: /data/redis
  volumes:
    - name: redis-storage
      emptyDir: {}
```
Let’s break down what’s happening in the above YAML file by going through the fields under spec:
- name: The name of the container.
- image: The container image that runs inside the pod.
- volumeMounts: The volumes that are mounted into the container.
- mountPath: The path inside the container where the volume is mounted.
- volumes: The volumes that are defined for this pod.
- emptyDir: The type of volume that is being created.
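Because a volume belongs to the pod rather than to a single container, several containers can mount the same volume. The following is a minimal sketch of that idea, not a production setup: the pod name, container names, and paths are made up for illustration, and the sizeLimit value is an arbitrary choice.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo          # hypothetical pod name
spec:
  containers:
    - name: writer
      image: busybox
      # Appends a timestamp to a file on the shared volume every 5 seconds.
      command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /out
    - name: reader
      image: busybox
      # Follows the same file through a different mount path.
      command: ["sh", "-c", "touch /in/log.txt; tail -f /in/log.txt"]
      volumeMounts:
        - name: scratch
          mountPath: /in
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 100Mi            # optional cap on how much the volume may hold
```

Each container chooses its own mountPath, but both paths point at the same underlying directory.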
So far we’ve looked at the Ephemeral Volumes. Now let’s take a look at Persistent Volumes and how we can use them within our pods.
Kubernetes Persistent Volumes (PV)
In many cases, you want to retain the pod's data long after the pod has been destroyed. To persist the data, Kubernetes provides us with two objects: Persistent Volumes (PV) and Persistent Volume Claims (PVC). They are similar to Kubernetes Volumes and VolumeMounts in the sense that a Persistent Volume (PV) represents the provisioned storage resource, and a Persistent Volume Claim (PVC) is a request for that storage which is then used by a particular pod or set of pods.
Let’s take a look at how you can provision a Persistent Volume in Kubernetes, and mount it to a pod’s container using Kubernetes Volumes and Volume Mounts. Let’s say that you want to run a pod with a PostgreSQL container image. Postgres is a database which means that you want to store the data long after the lifecycle of the pod.
While creating a Persistent Volume, there are a number of configurations that we have to keep in mind. Let’s look at them one by one and understand why each of them matters.
Volume Mode
The Volume Mode defines how the Persistent Volume is presented to the pod. There are two Volume Modes that can be used:
- Filesystem: The volume is mounted into the pod as a directory. If the volume is backed by a block device and the device is empty, Kubernetes creates a filesystem on it before mounting it for the first time.
- Block: The volume is presented to the pod as a raw block device, without any filesystem on it.
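To make the Block mode more concrete, here is a hedged sketch of a claim that asks for a raw block volume and a pod that consumes it. Claims are explained in more detail further down. The object names and the device path /dev/xvda are hypothetical, and the example assumes the cluster has storage that supports raw block volumes.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc               # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block                 # request a raw device instead of a filesystem
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: raw-block-demo              # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      # Block volumes are attached with volumeDevices, not volumeMounts.
      volumeDevices:
        - name: data
          devicePath: /dev/xvda     # device node exposed inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: raw-block-pvc
```

Note the use of volumeDevices with a devicePath in place of volumeMounts with a mountPath; the application is then responsible for how it reads and writes the device.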
Access Mode
When creating a persistent volume, you need to define its access modes as well. The access mode defines how nodes and pods are allowed to mount the volume for reading and writing. There are a few different access modes that can be defined:
- ReadWriteOnce: The volume can be mounted as a read-write volume on a single node. If multiple pods are running on the same node, they can still access the volume.
- ReadOnlyMany: The volume can be mounted as read-only by pods running on multiple nodes.
- ReadWriteMany: The volume can be mounted as read-write by pods running on multiple nodes.
- ReadWriteOncePod: The volume can be mounted as read-write only by a single pod.
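Access modes are listed under accessModes in the spec of a volume or a claim. As a small sketch, the claim below asks for a volume that only a single pod may write to. The claim name and size are hypothetical, and the example assumes a CSI-backed storage setup, since ReadWriteOncePod is only supported for CSI volumes.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-pvc           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOncePod              # at most one pod in the whole cluster may use it
  resources:
    requests:
      storage: 1Gi
```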
Reclaim Policy
Every Persistent Volume has a reclaim policy that defines what should happen to the volume after the Persistent Volume Claim bound to it is deleted. Three reclaim policies can be set on the Persistent Volume:
- Retain: The persistent volume still exists after the claim is deleted, and it keeps its data. It cannot be bound to a new claim until an administrator manually reclaims it.
- Recycle: Recycle automatically scrubs the persistent volume by running a basic rm -rf /volume-mount-path/* on it. Once the data has been deleted, the volume can be bound to a new claim. This policy is deprecated in favor of dynamic provisioning.
- Delete: The Delete reclaim policy deletes the persistent volume along with the underlying storage asset in the external infrastructure.
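The reclaim policy is set with the persistentVolumeReclaimPolicy field of the Persistent Volume. The following sketch shows a volume that keeps its data once it is released; the volume name, size, and host path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: retained-pv                 # hypothetical volume name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the volume and its data after release
  hostPath:
    path: /data/retained/           # hypothetical directory on the node
```

With Retain, a volume whose claim has been deleted is reported as Released and stays that way until someone cleans it up by hand.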
Mount Options
When creating a Persistent Volume, we can also specify additional mount options that are applied when the volume is mounted on a node. Not all Persistent Volume types support mount options. Some of the volume types that do support them are:
- azureFile
- iscsi
- nfs
- vsphereVolume
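For a volume type such as nfs from the list above, extra mount options can be passed through the mountOptions field of the Persistent Volume. The sketch below assumes an NFS server reachable at nfs.example.com that exports /exports/data; both values, like the volume name, are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv                      # hypothetical volume name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  mountOptions:                     # passed to the mount command on the node
    - hard
    - nfsvers=4.1
  nfs:
    server: nfs.example.com         # placeholder NFS server
    path: /exports/data             # placeholder export path
```

Kubernetes does not validate these options, so an invalid one simply makes the mount fail.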
Using the above options, we can create a Persistent Volume. The below YAML will create a Persistent Volume, which we will later mount into a pod that runs a PostgreSQL container image.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pgsql-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: /data/postgresql/
```
The above YAML manifest will create a Persistent Volume that stores its data on the node at the path /data/postgresql/ and has a capacity of 5 GiB.
Next, we will see the role of a Persistent Volume Claim.
Persistent Volume Claim (PVC)
To use a Persistent Volume (PV) from a pod, you have to create a Persistent Volume Claim (PVC) whose configuration is compatible with the Persistent Volume. The PVC is what we will actually reference from the pod. Kubernetes will try to find a Persistent Volume that matches the claim on the following configurations:
- Access Mode
- Storage Size (the volume’s capacity must be at least the requested size)
- VolumeMode
If the PVC does not find a persistent volume that matches all the above properties, it will not bind to any persistent volume.
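When a claim should bind to one specific, pre-created volume, it can be pinned to that volume explicitly. The sketch below is one way to do that under a few assumptions: the claim name pinned-pvc is hypothetical, the target volume is the pgsql-pv volume defined earlier, and the empty storageClassName is there because, on clusters that have a default storage class, a claim without one may be provisioned dynamically instead of binding to an existing volume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pinned-pvc                  # hypothetical claim name
spec:
  storageClassName: ""              # opt out of any default storage class
  volumeName: pgsql-pv              # bind to this specific Persistent Volume
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi                  # must not exceed the volume's capacity
```

The access mode, volume mode, and size still have to be compatible with the volume, otherwise the claim stays in the Pending state.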
Let us create a PVC that will get bound with the Persistent Volume that has been created above. The following YAML will create a PVC while ensuring that all the configurations match with the persistent volume that has been created.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
```
Now that we have created the PV and PVC with the correct configurations, let’s mount the Persistent Volume into the PostgreSQL pod. We can use the following YAML manifest to mount the PVC into the pod.
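```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: postgres-container
      image: postgres
      env:
        # The official postgres image does not start without a password.
        # Placeholder value; in practice this would come from a Secret.
        - name: POSTGRES_PASSWORD
          value: change-me
        # Point the Postgres data directory at the mounted volume.
        - name: PGDATA
          value: /data/postgresql/pgdata
      volumeMounts:
        - name: data
          mountPath: /data/postgresql/
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgresql-pvc
```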
In the above YAML file, we use the persistentVolumeClaim field under the volumes section to define which persistent volume claim we want to use for creating a volume. Here, we have defined the claimName as postgresql-pvc. While creating the volume, Kubernetes will look for a PVC with the given name and use it to mount the Persistent Volume into the pod.
With the above configuration, we have created a persistent volume and a persistent volume claim, and mounted the claim into a Postgres pod. Anything Postgres writes under /data/postgresql/ is now stored in the persistent volume.
Storage Class (SC)
In Kubernetes, a StorageClass is a blueprint for how a persistent volume should be created. It is very useful when you want to dynamically provision storage from an external source such as a cloud provider. For example, you may want to store your application’s data on AWS EBS volumes. For this, you can create a Storage Class in Kubernetes that uses the AWS EBS CSI driver as its provisioner. All you then need to do is create a Persistent Volume Claim (PVC) that references the Storage Class and mount it in your pod. The Persistent Volume is created automatically by the provisioner named in the Storage Class.
It simplifies managing storage because you can create different storage classes for different needs, like one for high-speed storage for a database and another for regular storage for file storage. Then, when you deploy your app, you can just pick the StorageClass that best suits your storage needs without worrying about the details of setting it up manually.
Every storage class has the following fields defined in it:
- Provisioner: The volume plugin or CSI driver that provisions the storage. For example, the AWS EBS CSI driver (ebs.csi.aws.com).
- Reclaim Policy: Same as the reclaim policies discussed for Persistent Volumes (PVs).
- Volume Binding Mode: This defines when the Persistent Volume is provisioned and bound. The options are Immediate, which provisions the volume as soon as the PVC is created, and WaitForFirstConsumer, which waits until a pod that uses the PVC is scheduled.
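Put together, the fields above might look like the following sketch, paired with a claim that uses the class. It assumes the AWS EBS CSI driver (ebs.csi.aws.com) is installed in the cluster; any other dynamic provisioner would work the same way. The class name fast-ssd, the claim name db-data, and the requested size are hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                    # hypothetical class name
provisioner: ebs.csi.aws.com        # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                         # driver-specific parameter (EBS volume type)
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                     # hypothetical claim name
spec:
  storageClassName: fast-ssd        # ask this class to provision the volume
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

With WaitForFirstConsumer, the claim stays Pending until a pod that references it is scheduled, at which point the volume is provisioned and bound.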
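Many Kubernetes clusters come with a default storage class, which is used for claims that do not name a storage class themselves. What backs the default storage class depends on the cluster. In local clusters such as minikube, for example, it stores the data on the node’s host path. To learn about all the different provisioners that can be used to create a storage class, please refer to the Storage Classes page in the official Kubernetes documentation.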
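A storage class is marked as the cluster default through an annotation. The sketch below shows the shape of such an object; the class name standard and the provisioner csi.example.com are placeholders and do not refer to a real driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard                    # hypothetical class name
  annotations:
    # Marks this class as the one used when a claim names no storage class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.example.com        # placeholder, not a real driver
```

Running kubectl get storageclass shows which class, if any, is currently marked as the default.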
Conclusion
Many different applications in Kubernetes require data to be persisted across different pods. Since pods are ephemeral, the data within the pods is lost once the pod is deleted. In order to ensure that the data is persisted, Kubernetes provides two resources i.e. Persistent Volumes and Persistent Volume Claims.
These resources can be used to create persistent Kubernetes Volumes and mount them in the pods using Kubernetes Volume and VolumeMounts. They can be configured in a number of different ways to ensure that they fit the requirements of the application running in the pod.