Why Persistent Volume?
The need for persistent storage in Kubernetes arises for two reasons:
- ephemerality of the Pods;
- the ability to share data between different Pods, which are part of the same application, or between the containers of the same Pod.
Kubernetes doesn't provide data persistence out of the box: when a Pod is re-created, the data written inside its containers is gone. You therefore need to create, configure and manage the actual physical storage yourself. Once it is configured, you can use that storage through the Kubernetes storage components.
How does storage work in Kubernetes?
To do this, Kubernetes provides three components that you use to connect the actual physical storage to your Pod, so that the application inside the container can access it (the official guide covers each of them in detail):
- Storage Class (SC): a way for administrators to describe the "classes" of storage they offer. Borrowing from the OOP paradigm, a SC can be seen as a class and a dynamically provisioned PV as an instance of that class, with the PVC playing the role of the constructor call that creates it.
- Persistent Volume (PV): a piece of storage in the cluster that has been provisioned by an administrator (static provisioning) or using Storage Classes (dynamic provisioning). It is a resource in the cluster, like CPU or RAM, but it has a lifecycle independent of any individual Pod that uses the PV.
- Persistent Volume Claim (PVC): a request for storage by a user. It is similar to a Pod: Pods consume node resources, while PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory); claims can request a specific size and specific access modes (more details later).
From the descriptions of the 3 components it is clear that PV and SC are managed by the administrators (backend), while PVC concerns the user (frontend). Furthermore, it is clear that there are two ways of implementing the resources useful for storage, namely PVs:
- static provisioning: a cluster administrator creates a number of PVs. They carry the details of the real storage, which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.
- dynamic provisioning: in this case the cluster administrator only creates a SC manifest, which users then rely on (responsibly) to have the PVs needed by their applications created on demand.
In the static approach, if no PV present on the cluster meets the user's needs, the latter will have to contact the administrator and request the creation of a new PV with the desired specifications. This scenario, although more controlled than the dynamic one, can be inefficient if the requests become numerous.
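To make the relationship between the components concrete, here is a minimal sketch of a claim and a Pod that mounts it. The names demo-claim and demo-pod, the nginx image and the mount path are illustrative assumptions, not part of the original guide. Whether the claim is satisfied statically or dynamically depends on the PVs and SCs available in the cluster.

```yaml
# Hypothetical claim: asks the cluster for 1Gi of storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim              # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# Hypothetical Pod: it references the claim by name, never the PV directly.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod                # illustrative name
spec:
  containers:
    - name: app
      image: nginx              # any image works; nginx is only an example
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html   # illustrative mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-claim   # must match the PVC name above
```

The Pod only knows the claim name, so the storage backend can change without touching the Pod manifest.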
Parameterization
These Kubernetes components can be finely customized through a number of parameters. We briefly present the theoretical aspects of this customization.
Warning
PersistentVolume types are implemented as plugins. Kubernetes currently supports a large number of plugins (GCEPersistentDisk, AWSElasticBlockStore, AzureDisk, NFS, CephFS, Cinder, etc.). A broad overview of the parameters that the different components can adopt is presented below. Not every plugin supports every parameter listed here.
Storage Class
Each SC contains fields which are used when a PV belonging to the class needs to be dynamically provisioned. The name of a SC object is significant, and is how users can request a particular class. Administrators set the name and other parameters of a class when first creating SC objects, and the objects cannot be updated once they are created. The aspects concerning the SC are:
- Provisioner. It determines what volume plugin is used for provisioning PVs (the list is available on the official documentation). This field must be specified.
- Reclaim Policy. PVs that are dynamically created by a SC will have the reclaim policy specified in the reclaimPolicy field of the class. The possible values are the same presented in the section of the PV. PVs that are created manually (static provisioning) and managed via a SC will have whatever reclaim policy they were assigned at creation.
- Allow Volume Expansion. PVs can be configured to be expandable (in some provisioners). This feature, when set to true, allows the users to resize the volume by editing the corresponding PVC object. You can only use this feature to grow a Volume, not to shrink it.
- Mount Options. PVs that are dynamically created by a SC will have the mount options specified in the mountOptions field of the class. If the volume plugin does not support mount options but mount options are specified, provisioning will fail.
- Volume binding Mode. The volumeBindingMode field controls when volume binding and dynamic provisioning should occur. There are two ways:
  - Immediate mode: volume binding and dynamic provisioning occurs once the PVC is created.
  - WaitForFirstConsumer mode: delay the binding and provisioning of a PV until a Pod using the PersistentVolumeClaim is created. Until then, the PVC waits in the pending state.
- Parameters. Storage Classes have parameters that describe volumes belonging to the storage class. Different parameters may be accepted depending on the provisioner. When a parameter is omitted, some default is used.
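The SC fields described above can be put together as in the following sketch. The class name, the provisioner csi.example.com and the type parameter are hypothetical placeholders: a real cluster needs the name of an installed provisioner and the parameters that provisioner actually accepts.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-example                 # illustrative name; users request the class by this name
provisioner: csi.example.com         # hypothetical provisioner, replace with a real one
reclaimPolicy: Retain                # inherited by dynamically created PVs
allowVolumeExpansion: true           # only effective if the provisioner supports expansion
mountOptions:
  - noatime                          # provisioning fails if the plugin lacks mount option support
volumeBindingMode: WaitForFirstConsumer   # bind and provision only when a Pod uses the PVC
parameters:
  type: ssd                          # hypothetical, provisioner-specific parameter
```

With WaitForFirstConsumer, a PVC that references this class stays pending until a Pod using it is created.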
Persistent Volume
The following parameters can be managed only if the static mode is chosen, since in the dynamic case the PVs are generated on the basis of the SC and the PVCs. The aspects concerning the PV are:
- Capacity. A PV will have a specific storage capacity. This is set using the PV's capacity attribute. Examples of accepted values are 500k, 100M, 5G, 800Ki, 350Mi, 1Ti. The letter "i" after a prefix indicates a binary (power-of-two) multiple rather than a decimal one: for example, 1Gi is 2^30 bytes, while 1G is 10^9 bytes.
- Volume mode. Kubernetes supports Filesystem (default) and Block volume modes. In the first case, a volume is mounted into Pods into a directory. In the second, volume is presented into a Pod as a block device, without any filesystem on it. This mode is useful to provide a Pod the fastest possible way to access a volume, without any filesystem layer between the Pod and the volume.
- Access Mode. The ways of accessing the volume are shown below. Not all providers, however, support the 3 modes listed.
  - ReadWriteOnce (RWO): the volume can be mounted as read-write by a single node;
  - ReadOnlyMany (ROX): the volume can be mounted read-only by many nodes;
  - ReadWriteMany (RWX): the volume can be mounted as read-write by many nodes.
- Class. A PV can have a class, which is specified by setting the storageClassName attribute to the name of a SC. A PV of a particular class can only be bound to PVCs requesting that class. A PV with no storageClassName has no class and can only be bound to PVCs that request no particular class. In the static case, therefore, the class merely has the function of a label.
- Reclaim Policy. When a user is done with their volume, they can delete the PVC object from the API, which allows reclamation of the resource. The reclaim policy for a PV tells the cluster what to do with the volume after it has been released of its claim. Current reclaim policies are listed below. As in the case of access mode, policy support depends on the provider used.
  - Retain. This policy allows for manual reclamation of the resource. When the PVC is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume.
  - Recycle. This policy performs a basic scrub (rm -rf /thevolume/*) on the volume and makes it available again for a new claim. It is deprecated in favor of dynamic provisioning.
  - Delete. It removes both the PV object from Kubernetes and the associated storage asset in the external infrastructure. This is the default for dynamically provisioned PVs, which inherit it from their SC; manually created PVs default to Retain.
- Mount Option. A Kubernetes administrator can specify additional mount options for when a PV is mounted on a node, using the mountOptions attribute. Mount options are not validated, so the mount will simply fail if one is invalid. Again, not all persistent volume types support mount options.
- Node Affinity. Kubernetes lets you define a subset of nodes from which the volume can be accessed. Pods that use a PV will only be scheduled to nodes that are selected by the node affinity.
- Phase. Although it is not a parameter, we conclude the part on the PV with an overview of the phases a volume can be in:
  - Available: a free resource that is not yet bound to a claim;
  - Bound: the volume is bound to a claim;
  - Released: the claim has been deleted, but the resource is not yet reclaimed by the cluster;
  - Failed: the volume has failed its automatic reclamation.
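As a sketch of static provisioning, the PV parameters above could be combined as follows for an NFS-backed volume. The PV name, the label, the class name manual, the NFS server address and the export path are all illustrative assumptions.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example               # illustrative name
  labels:
    tier: demo                       # illustrative label, usable by a PVC selector
spec:
  capacity:
    storage: 5Gi                     # binary unit: 5 * 2^30 bytes
  volumeMode: Filesystem             # the default; Block exposes a raw device instead
  accessModes:
    - ReadWriteMany                  # NFS can typically be mounted read-write by many nodes
  storageClassName: manual           # acts as a label in the static case (assumed class name)
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - nfsvers=4.1                    # not validated: an invalid option makes the mount fail
  nfs:
    server: nfs.example.internal     # hypothetical NFS server
    path: /exports/demo              # hypothetical export path
```

Node affinity is omitted here because a network volume is usually reachable from every node; it matters mostly for volumes tied to specific nodes.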
Persistent Volume Claim
Now let's move on to the component that deals with claiming pieces of storage. A PVC can be used in both static and dynamic mode, but it is mainly intended for the user (frontend) side in dynamic mode. The aspects concerning the PVC are:
- Capacity. Claims, like Pods, can request specific quantities of a resource. In this case, the request is for storage and uses the same convention as the PV.
- Volume mode. Same conventions as PV.
- Access Mode. Same conventions as PV.
- Selector. Claims can specify a label selector to further filter the set of volumes. Only the volumes whose labels match the selector can be bound to the claim.
- Class. A claim can request a particular class by specifying the name of a StorageClass using the attribute storageClassName. In static provisioning, only PVs of the requested class, i.e. those with the same storageClassName as the PVC, can be bound to the PVC. As mentioned, this parameter is not mandatory: it can be set to the name of a class, it can be empty (storageClassName set equal to ""), or it can be missing altogether. What happens if the parameter is absent? The system behavior depends on whether the DefaultStorageClass admission plugin is:
  - ON. The administrator may specify a default SC. All PVCs that have no storageClassName can be bound only to PVs of that default. Specifying a default SC is done by setting the annotation storageclass.kubernetes.io/is-default-class: "true" in a SC object. If the administrator does not specify a default, the cluster responds to PVC creation as if the admission plugin were turned off. If more than one default is specified, the admission plugin forbids the creation of all PVCs (this is the behavior of older releases; recent releases instead use the most recently created default SC).
  - OFF. There is no notion of a default SC. All PVCs that have no storageClassName can be bound only to PVs that have no class.
To activate this plugin the administrator needs to edit the /etc/kubernetes/manifests/kube-apiserver.yaml file (the path may be different), inserting DefaultStorageClass in the specifications:

```yaml
spec:
  containers:
  - command:
    - kube-apiserver
    # A few lines down...
    - --enable-admission-plugins=NodeRestriction,DefaultStorageClass
```
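The PVC parameters and the default-class annotation described in this section can be sketched as follows. The example assumes that a PV of class manual carrying the label tier: demo already exists, and that csi.example.com is an installed provisioner; all of these names are hypothetical.

```yaml
# Claim for static provisioning: class, size, access mode and selector
# must all match an existing PV for the binding to happen.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim-static            # illustrative name
spec:
  storageClassName: manual           # assumed class of the target PV
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      tier: demo                     # assumed label on the target PV
---
# Marking a SC as the cluster default through the annotation.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default-example              # illustrative name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.example.com         # hypothetical provisioner
```

With the admission plugin enabled, a PVC that omits storageClassName entirely is assigned the default class, whereas one that sets it to "" explicitly asks for a PV without a class.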
Lifecycle of a volume and claim
We conclude by looking in more detail at the link between PV and PVC. In particular, we examine the concepts of binding and of Storage Object in Use Protection.
Binding
For a PV and a PVC to be bound, two of the parameters described above must agree: storage capacity and access mode.
If the PVC requests more storage than the PV makes available, the binding does not take place, and the claim remains unbound indefinitely if no matching volume exists. Likewise, if the access modes of the two components are not compatible, they remain unconnected. The term "compatible" is deliberate, because the two values don't necessarily have to be the same. In some providers, the binding can still take place if the PVC requires a lower level of permissions (e.g. RWO) than that of the PV (e.g. RWX).
The binding between PV and PVC is one-to-one. Therefore, if the PVC requests, for example, 5Gi and the PV makes 8Gi available, there is a "waste" of 3Gi. Unfortunately, this can happen in the case of static provisioning. Dynamic mode is more "economical", because the PV is tailored exactly to the requirements of the PVC.
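The 5Gi versus 8Gi case can be sketched with the following pair of manifests. The names and the hostPath directory are illustrative assumptions; hostPath is used only because it needs no external storage and is suitable for single-node test clusters.

```yaml
# Statically provisioned PV offering 8Gi.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-8gi-example               # illustrative name
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ""               # no class: binds only to claims that request no class
  hostPath:
    path: /tmp/pv-8gi-example        # hypothetical directory on the node
---
# Claim asking for only 5Gi with a compatible access mode.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-5gi-example            # illustrative name
spec:
  storageClassName: ""               # explicitly request a PV without a class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                   # 5Gi <= 8Gi, so the PV is a valid match
```

Since the binding is one-to-one, the whole 8Gi volume is reserved for this claim and the remaining 3Gi cannot be claimed by anyone else.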
Storage Object in Use Protection
The purpose of the storage object in use protection feature is to ensure that PVCs, in active use by a Pod, and PVs, that are bound to PVCs, are not removed from the system, as this may result in data loss. If a user deletes a PVC in active use by a Pod, the PVC is not removed immediately. PVC removal is postponed until the PVC is no longer actively used by any Pods. Also, if an admin deletes a PV that is bound to a PVC, the PV is not removed immediately. PV removal is postponed until the PV is no longer bound to a PVC.
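As a sketch of how this protection can show up on the object itself, the excerpt below illustrates the metadata of a PVC that was deleted while a Pod was still using it. The claim name and the timestamp are hypothetical; kubernetes.io/pvc-protection is the finalizer Kubernetes uses for claims (PVs use kubernetes.io/pv-protection).

```yaml
# Illustrative excerpt, not real output: a PVC pending deletion.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim                             # illustrative name
  deletionTimestamp: "2024-01-01T00:00:00Z"    # hypothetical; set when deletion was requested
  finalizers:
    - kubernetes.io/pvc-protection             # removed only once no Pod uses the claim
```

The object is actually removed only after the finalizer is cleared, which happens when the last Pod using the claim is gone.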