KAI Road of Kubernetes 06 — Volumes and PersistentVolumes: Pods can move, data should not disappear

The previous chapter separated configuration from the image.

An image should package the program, not the environment decision or the secret.

But workloads do more than read configuration.

They also write things:

uploaded files
cache
log buffers
queue checkpoints
database data

That creates the next problem.

Pods can be recreated. Containers can restart. Nodes can change.

If data only lives in the writable filesystem of one container, it is tied too tightly to the fate of that container.

The sentence I would keep is this:

A Volume is a mountable data location inside a Pod; PVs and PVCs move long-lived data out of the Pod lifecycle.

KAI notebook style diagram showing that a Pod is temporary while data should have its own lifecycle through Volumes, PVs, and PVCs. — Pods can be recreated and containers can restart. Data with business meaning should not live only inside one container's writable layer.

Do not treat the container filesystem as a hard drive

A container image gives the process a clean starting point.

After the process starts, it can write files.

That does not mean the container has become a reliable disk.

When a container restarts, it starts again from the image with a fresh writable layer, so anything the previous instance wrote there is lost. That layer is not a place to trust with durable application data.

When a Pod is deleted, moved, or recreated, I should not expect data written inside the old container filesystem to naturally come back.

That is why Kubernetes has Volumes.

The first question is simple:

Where should containers in this Pod write data through an explicit mount point?

PersistentVolume and PersistentVolumeClaim answer the next question:

If the Pod disappears, should this data still exist, who provides it, and who is allowed to claim it?

Think of Volumes, PVs, and PVCs like rented storage

I remember this chapter with a mini-storage example.

A Pod is like a temporary workshop.

You walk in, do work, use the desk, and maybe tomorrow the whole room is gone and you get a different one.

If you only need temporary scratch paper, the desk is fine.

That is close to emptyDir: it exists while the Pod exists, and it can survive container restarts, but when the Pod is removed from the node, the data is gone.

If the material is important, you do not leave it only in the temporary room.

You rent a storage unit.

The storage unit is like a PersistentVolume: real storage capacity in the cluster.

The rental form is like a PersistentVolumeClaim: the workload does not need to know the exact disk brand, cloud provider, or storage backend. It requests capacity, access mode, and storage class.

The menu of storage plans is like a StorageClass: standard storage, faster storage, zone-aware storage, each backed by a different provisioner or policy.

The only memory model I need is:

The Pod is the workshop. The PVC is the rental form. The PV is the actual storage unit.

A Volume is first a mount point

A Kubernetes Volume is not always a permanent disk.

The first meaning is more basic:

A Volume is a data source declared in the Pod spec, and each container that needs it mounts it at a path of its own.

So the Pod spec has two important places:

.spec.volumes: which volumes exist for this Pod
.spec.containers[*].volumeMounts: which container mounts which volume at which path

The same volume can be used by multiple containers in the same Pod.

But each container declares whether it mounts that volume and where it appears.

This matters.

A Volume is not background magic.

It is a specific path mounted into the container filesystem.
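To make the two places concrete, here is a minimal sketch of a Pod in which two containers mount the same volume at different paths. The Pod name, container names, commands, and paths are illustrative assumptions, and emptyDir is used only because it needs no external storage.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo          # hypothetical name
spec:
  volumes:                          # .spec.volumes: which volumes exist for this Pod
    - name: work
      emptyDir: {}
  containers:
    - name: writer                  # hypothetical container that appends a timestamp every 5s
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
      volumeMounts:                 # this container sees the volume at /out
        - name: work
          mountPath: /out
    - name: reader                  # hypothetical container that follows the same file
      image: busybox:1.36
      command: ["sh", "-c", "until [ -f /in/log.txt ]; do sleep 1; done; tail -f /in/log.txt"]
      volumeMounts:                 # same volume, different path, mounted read-only
        - name: work
          mountPath: /in
          readOnly: true
```

The volume is declared once at the Pod level, but each container opts in separately and picks its own mount path and read-only setting.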

emptyDir is temporary storage inside a Pod

emptyDir is the easiest way to build the first storage intuition.

It is created when the Pod is assigned to a node, and it starts empty.

Containers in the Pod can read and write it.

A container crash does not delete the emptyDir, because the Pod still exists.

But when the Pod is removed from that node, the data in the emptyDir is permanently deleted.

So it is useful for:

scratch files
cache
shared working data between containers
recomputable intermediate results

It is not the right place for:

the only copy of user uploads
database data
non-rebuildable queue state
anything the application still needs tomorrow

The short version:

emptyDir can survive a container restart. It cannot survive the Pod going away.
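As a rough sketch of the cache use case, the Pod below gives one container an emptyDir for rebuildable cache files. The Pod name, placeholder image, and cache path are assumptions made for illustration; `sizeLimit` is an optional field, shown here as one way to cap scratch usage.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo                      # hypothetical name
spec:
  containers:
    - name: app
      image: ghcr.io/example/web:1.0.0  # placeholder image
      volumeMounts:
        - name: cache
          mountPath: /var/cache/web     # assumed cache directory
  volumes:
    - name: cache
      emptyDir:
        sizeLimit: 1Gi                  # optional cap; exceeding it can get the Pod evicted
```

If the `app` container crashes and restarts, files under /var/cache/web are still there. If the Pod itself is deleted or rescheduled, the replacement Pod starts with an empty directory.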

PV and PVC are about data lifecycle

If data must live longer than the Pod, the design moves into PersistentVolume territory.

I do not remember PV and PVC as two similar abbreviations.

I remember the responsibility split:

PersistentVolume: a storage resource in the cluster, either created by an administrator or dynamically provisioned through a StorageClass
PersistentVolumeClaim: a namespaced request for storage, including capacity, access mode, and storage class
StorageClass: the storage plan and provisioner used when dynamically creating PVs

A Pod usually does not say, “give me that exact cloud disk.”

It says, “use this PVC.”

Kubernetes uses the PVC to find the bound PV, then mounts that storage into the Pod.

That abstraction feels very Kubernetes:

The workload describes what it needs.

The cluster binds that need to real infrastructure.
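For the StorageClass side of that split, a minimal sketch might look like the following. The provisioner name is hypothetical (a real cluster would name its installed CSI driver here), and backend-specific `parameters` are left out because they differ per driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: csi.example.com              # hypothetical CSI driver name
reclaimPolicy: Delete                     # applied to PVs this class provisions
volumeBindingMode: WaitForFirstConsumer   # provision only once a Pod using the claim is scheduled
allowVolumeExpansion: true                # allow claims of this class to be resized later
```

A PVC that sets `storageClassName: standard` asks this class to provision a matching PV, without the workload knowing anything about the driver behind it.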

A simplified PVC example

Start with the claim:

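apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-data                # referenced by claimName in the Pod template
spec:
  accessModes:
    - ReadWriteOnce             # read-write from a single node
  storageClassName: standard    # must name a StorageClass that exists in the cluster
  resources:
    requests:
      storage: 10Gi             # requested capacity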

This does not mean “create a folder.”

It means:

I need 10Gi of storage from the standard StorageClass, mounted with ReadWriteOnce semantics.

A Pod template can reference that claim:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: ghcr.io/example/web:1.0.0
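          volumeMounts:
            - name: web-data            # must match a name under volumes
              mountPath: /var/lib/web   # path the application writes to
      volumes:
        - name: web-data                # Pod-local volume name
          persistentVolumeClaim:
            claimName: web-data         # PVC in the same namespace as the Pod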

The point is the responsibility split:

the PVC describes the storage request
the Deployment describes where the workload mounts it
the application writes to /var/lib/web
the storage backend details stay behind the cluster abstraction

KAI notebook style PVC, StorageClass, and PV binding flow: a workload requests storage with a PVC, the StorageClass provisions, and the PV becomes Bound and mounted. — The PVC is the request, the StorageClass is the plan, and the PV is the real storage. The workload asks; the cluster binds.

What Kubernetes actually does

Here is the useful version:

You create a PVC with capacity, access mode, and StorageClass.
The control plane finds a matching PV, or dynamically provisions one through the StorageClass.
Once the claim and volume bind, the PVC becomes Bound.
A Pod references the PVC as a volume.
After the Pod is scheduled to a node, the volume is attached to that node if the backend requires it, and the kubelet, working with the storage driver, mounts it at the requested path.
When the Pod goes away, what happens to the underlying data depends on reclaim policy, the storage backend, and whether you have backups.

There are a few boundaries worth keeping precise.

First, PVC Bound does not guarantee that the application can write.

The system can still fail later on topology, attach, mount permissions, filesystem, driver behavior, or quota.

Second, ReadWriteOnce is not a universal “only one Pod” lock.

It means the volume can be mounted read-write by a single node. If the design needs a cluster-wide single-Pod writer guarantee, look at ReadWriteOncePod and CSI driver support.
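A claim that asks for the stricter mode could be sketched like this. The claim name is hypothetical, and the StorageClass is assumed to be backed by a CSI driver that supports this access mode.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-data      # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod          # at most one Pod in the whole cluster may use the volume
  storageClassName: standard    # assumed to be backed by a CSI driver
  resources:
    requests:
      storage: 10Gi
```

With this mode, a second Pod that references the same claim is not allowed to run alongside the first, even on the same node.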

Third, reclaim policy is not a backup strategy.

Many dynamically provisioned PVs inherit the reclaim policy from their StorageClass; a common default is Delete.

Before deleting a PVC, know whether the backing data will be retained or deleted.
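To show where the reclaim policy lives on the volume itself, here is a sketch of a statically created PV that keeps its data after the claim is gone. The PV name, NFS server, and export path are hypothetical, and NFS is chosen only because its fields are short.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: web-data-pv                       # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  persistentVolumeReclaimPolicy: Retain   # keep the PV and its data when the claim is deleted
  nfs:                                    # backend chosen only for illustration
    server: nfs.example.internal          # hypothetical server
    path: /exports/web-data               # hypothetical export path
```

With Retain, deleting the bound PVC leaves the PV in the Released state with its data intact, and an administrator has to clean it up or reclaim it by hand. With Delete, the PV and the backing storage are removed along with the claim.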

Common beginner mistakes

1. Thinking every Volume is permanent

Volume is an abstraction, not a durability promise.

emptyDir, configMap, secret, and PVC-backed volumes all have different lifecycles.

Check the volume type before assuming the data survives.

2. Treating the PVC as the data itself

A PVC is a claim, not the stored content.

It is the workload’s request and reference point for storage.

The actual data lives behind the PV and storage backend.

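3. Sharing one ReadWriteOnce PVC across multiple replicas

This often turns into a scheduling or mount problem, because replicas that land on different nodes cannot all mount the volume.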

Even when multiple Pods on the same node can touch the same RWO volume, that does not make the application data model safe.

If every replica needs stable identity and its own storage, the next chapter is StatefulSet.

4. Looking only at the Pod

A Pod stuck in ContainerCreating is not always an image problem.

The PVC may still be Pending, or attach / mount may be failing.

5. Deleting PVCs without checking reclaim policy

With a reclaim policy of Delete, deleting the PVC also removes the PV and the backing storage, so the data is really gone.

Kubernetes can manage storage object lifecycles, but it does not replace your backup strategy.

How I would inspect it

If someone says “the Pod will not start” and the workload uses storage, I would not only read logs.

I would walk these layers:

kubectl get pvc -n <ns>
kubectl describe pvc <pvc-name> -n <ns>
kubectl get pv
kubectl describe pv <pv-name>
kubectl get storageclass
kubectl describe pod <pod-name> -n <ns>
kubectl get events -n <ns> --sort-by=.lastTimestamp

The signals I care about:

whether the PVC is Pending or Bound
whether capacity, access mode, and StorageClass match the workload
whether the PV reclaim policy is Delete or Retain
whether Pod Events mention attach, mount, or permission errors
whether the storage backend has topology constraints
whether multiple replicas are sharing a PVC that should not be shared

KAI notebook style Kubernetes storage troubleshooting path: check PVC, PV, StorageClass, Events, and Pod mount or attach errors. — When a Pod is stuck, do not blame the image first. Check whether the PVC is Bound, then walk PV, StorageClass, Events, and mount errors.

Short version:

When storage breaks, do not inspect only the Pod. Walk PVC -> PV -> StorageClass -> Events.

How I remember Volumes

I do not remember this chapter as “Kubernetes has many volume types.”

That turns into a list too quickly.

I remember it like this:

A Volume is the join between a workload and its data; PVs and PVCs move that join out of the Pod’s lifetime.

Pods can be recreated.

Containers can restart.

Nodes can be replaced.

But if data has business meaning, it cannot merely be “whatever happened to be written inside one container.”

It needs its own lifecycle, request model, mount point, reclaim boundary, and backup boundary.

This will keep coming back.

Once you run stateful workloads, the hard part of Kubernetes is not starting a process.

The hard part is that the process can be replaced at any time, while its identity and data have to stay stable.

Three things to keep

A Volume is an explicit data mount inside a Pod; it is not automatically persistent storage.
emptyDir follows the Pod; PV/PVC move data lifecycle outside the Pod.
Debug storage from PVC status first, then PV, StorageClass, Events, and mount errors.

The next chapter should ask: What is a StatefulSet? When every Pod needs a stable name and its own data, Deployment stops being the most comfortable tool.
