CSI VolumeAttachment Stuck: Pods in ContainerCreating and Drains That Never Finish
This is the Kubernetes storage incident that eats hours because the symptoms look like “Kubernetes is slow”:
- Pods sit in ContainerCreating forever.
- kubectl drain hangs on a node because volumes won’t detach.
- Events show AttachVolume.Attach failed or Multi-Attach error.
- The application isn’t even starting, so app-level debugging is useless.
When CSI is in the picture, the real unit of progress is often the VolumeAttachment object and its finalizers — not the Pod.
Tested on: Kubernetes 1.29–1.31, CSI drivers for cloud block storage and on-prem, StatefulSets, managed and self-managed clusters.
Incident narrative (anonymized)
We lost a node (hard failure). A StatefulSet pod rescheduled to a new node, but it never came up.
What I saw:
- Pod stayed in ContainerCreating.
- Events alternated between “waiting for volume attachment” and “multi-attach”.
- VolumeAttachment objects piled up with old node references.
The actual root cause was not “the disk is broken”. The CSI controller path was degraded: the external-attacher was running with a single replica and got evicted during the chaos. That left finalizers stuck, so attachments/detachments didn’t converge.
Constraint: this was a stateful workload. A wrong “force” action can cause data corruption. I needed a runbook that makes “safe vs risky” explicit.
Timeline
- T-0: Node fails; StatefulSet pod reschedules.
- T+10m: Pod stuck in ContainerCreating; I check Pod events.
- T+20m: I identify the PV/PVC and the related VolumeAttachment objects.
- T+30m: VolumeAttachment shows old node attachment state; finalizer not progressing.
- T+45m: Mitigation: restore CSI controller health (external-attacher back up) and wait for clean detach/attach.
- T+90m: Pod starts; volume shows attached to the new node.
- T+1d: Fix: make CSI controllers HA + add alerts on stuck VolumeAttachments.
Mechanism: why VolumeAttachment is the “truth” during CSI incidents
Pods don’t attach volumes; controllers do
For CSI, the attach/detach flow is coordinated by controllers and tracked as objects:
- PVC/PV describe what you want.
- VolumeAttachment represents the attach intent and state for a specific node.
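- CSI sidecars and kube-controller-manager drive the state machine: the attach/detach controller in kube-controller-manager creates and deletes VolumeAttachment objects, and external-attacher watches them and calls the driver’s ControllerPublishVolume and ControllerUnpublishVolume. (external-provisioner creates volumes; it does not attach them.)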
Finalizers exist so Kubernetes doesn’t “forget” about an attachment before the driver confirms detach. That’s good — but when the controller path is unhealthy, finalizers become a wedge.
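One way to see that wedge is to list attachments that have been asked to go away but still exist. The following is a minimal sketch, assuming a POSIX shell with awk and a working kubectl context; the helper name va_terminating and the column layout are my own choices for illustration, not something the driver or Kubernetes provides.

```shell
# Hypothetical helper: list VolumeAttachments that carry a deletionTimestamp
# (deletion was requested) but are still present, which means a finalizer
# has not been removed yet.
# kubectl prints "<none>" in a custom column when the field is absent.
va_terminating() {
  kubectl get volumeattachment --no-headers \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached,DELETING_SINCE:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers' \
  | awk '$5 != "<none>"'
}
```

An entry that stays in this output for many minutes is a detach that is not converging. It says nothing yet about why; that is what the controller checks in the runbook are for.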
Common failure modes
- Multi-Attach
  - Many block volumes support a single attach.
  - If a node dies and the volume is still marked attached, the new node can’t attach.
- CSI controller path degraded
  - external-attacher not running / stuck / no leader
  - RBAC or cloud API errors
  - control-plane congestion
- Node is NotReady but not really gone
  - Kubernetes still thinks the node exists; detach can take a long time.
  - Force detach too early risks two nodes writing the same volume.
Runbook: from Pod symptom to safe recovery
What to check first
1) Pod events (they usually tell you which volume)
kubectl -n <ns> describe pod <pod> | sed -n '/Events:/,$p'
Look for lines like:
- AttachVolume.Attach failed
- Multi-Attach error
- timed out waiting for the condition
2) Identify PVC and PV
kubectl -n <ns> get pod <pod> -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}'
kubectl -n <ns> get pvc <pvc> -o wide
kubectl get pv <pv> -o wide
3) Inspect VolumeAttachment objects (cluster-scoped)
kubectl get volumeattachment
kubectl describe volumeattachment <name>
If you have many, filter by PV name in the spec (or grep output):
kubectl get volumeattachment -o yaml | grep -n "<pv>" | head
What I look for in describe:
- target node name
- attached: true/false
- errors from the CSI driver
- finalizers that aren’t being removed
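When there are many attachments, a structured filter is easier to read than grepping YAML. This is a sketch under the same assumptions as the commands above (POSIX shell, awk, cluster access); va_for_pv is a hypothetical helper name.

```shell
# Hypothetical helper: show only the VolumeAttachments that reference one PV.
# usage: va_for_pv <pv-name>
# Output columns: NAME PV NODE ATTACHED
va_for_pv() {
  kubectl get volumeattachment --no-headers \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached' \
  | awk -v pv="$1" '$2 == pv'
}
```

Two rows for the same PV with different nodes is the shape that usually sits behind a Multi-Attach event: one row for the old node, one for the new.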
4) Check the CSI controller components
The exact names vary by driver, but you’re looking for:
- external-attacher
- external-provisioner
- driver controller pods
kubectl -n kube-system get pods | grep -E 'csi|attacher|provisioner' | head -n 50
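Pod status alone can miss a "no leader" situation, so it can help to look at replicas and leader-election leases together. A rough sketch, assuming the controller runs as a Deployment and that its sidecars use leader election backed by Lease objects; both depend on how your driver is packaged, and csi_controller_status is a hypothetical helper name.

```shell
# Hypothetical helper: summarize CSI controller health in one namespace.
# usage: csi_controller_status [namespace]   (defaults to kube-system)
csi_controller_status() {
  ns="${1:-kube-system}"
  # Desired vs. ready replicas for the controller Deployments.
  kubectl -n "$ns" get deployments -o wide
  # Leader-election leases: who holds each one and when it was last renewed.
  kubectl -n "$ns" get leases \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity,RENEWED:.spec.renewTime'
}
```

A lease whose RENEWED time is far in the past, or whose holder is a pod that no longer exists, points at the controller path rather than at the disk.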
How to confirm the hypothesis
You have a “VolumeAttachment stuck” incident if:
- Pod is blocked on volume attach
- VolumeAttachment references an old node or sits in error
- CSI controller components are unhealthy or the underlying volume is still attached elsewhere
A strong confirmation is when restoring the controller health (or fixing cloud API errors) causes VolumeAttachments to progress without force.
Safe mitigations
1) Make the CSI controller path healthy again
If external-attacher is down or wedged, fix that first:
- restore replicas
- fix RBAC
- fix cloud API rate limits
- restart only the controller components (not the whole cluster)
This is usually safe because it allows the designed state machine to complete.
2) Ensure the old node is truly dead before any “force”
If you suspect multi-attach:
- confirm the old node is NotReady and not coming back
- ensure the filesystem isn’t mounted anywhere else
- only then consider provider-side detach actions
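To see which attachments are even candidates for this discussion, the node list can be joined against the attachment list. The sketch below assumes a POSIX shell with awk; stale_attachments is a hypothetical helper and works on saved command output so it can be reviewed before anyone acts on it.

```shell
# Hypothetical helper: print VolumeAttachments whose target node is missing
# from the node list or is not Ready.
# usage: stale_attachments NODES_FILE VA_FILE
#   NODES_FILE: output of  kubectl get nodes --no-headers   (NAME STATUS ...)
#   VA_FILE:    output of  kubectl get volumeattachment --no-headers \
#     -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached
stale_attachments() {
  awk -v nodes="$1" '
    # First file: remember nodes whose STATUS starts with "Ready"
    # (this includes "Ready,SchedulingDisabled").
    FILENAME == nodes { if ($2 ~ /^Ready/) ready[$1] = 1; next }
    # Second file: keep attachments that target any other node.
    !($3 in ready)    { print }
  ' "$1" "$2"
}
```

This only narrows the list. NotReady in the API is not proof that the machine is powered off, so confirm that out of band before any provider-side detach.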
3) Drain in the correct order
If you’re draining a node:
- cordon first
- delete pods that hold the volume if needed
- wait for detach to complete before proceeding
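The "wait for detach" step can be made explicit with a small polling loop. This is a sketch, assuming a POSIX shell with awk; wait_for_detach is a hypothetical helper, and the default timeout and poll interval are arbitrary.

```shell
# Hypothetical helper: wait until no VolumeAttachment targets NODE.
# usage: wait_for_detach NODE [TIMEOUT_SECONDS] [POLL_SECONDS]
# Returns 0 once nothing is attached, 1 if the timeout is reached first.
#
# Intended order (run by hand, not by this function):
#   kubectl cordon <node>
#   kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
#   wait_for_detach <node> 900
wait_for_detach() {
  node="$1"; timeout="${2:-600}"; poll="${3:-10}"; waited=0
  while :; do
    # Names of attachments that still point at the node.
    left=$(kubectl get volumeattachment --no-headers \
             -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName' \
           | awk -v n="$node" '$2 == n { print $1 }')
    [ -z "$left" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "still attached to $node: $left" >&2
      return 1
    fi
    sleep "$poll"
    waited=$((waited + poll))
  done
}
```

A non-zero exit is the signal to go back to the controller health checks, not to reach for a force detach.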
Risky mitigations (high data-loss potential)
- Deleting VolumeAttachment objects or stripping finalizers by hand
  - you can trick Kubernetes into thinking a volume is safe to reattach while it’s still mounted
- Force detach in the storage provider without ensuring the node is dead
  - you can create split-brain at the filesystem level
- Restarting everything
  - increases chaos; may hide the root cause
What we changed (concrete)
1) We made CSI controllers highly available
Before: one replica, no PDB, vulnerable to eviction during cluster stress.
After: 2 replicas + PDB + priority class (representative example):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: csi-controller
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: csi-controller
And a deployment tweak (sketch):
spec:
  replicas: 2
  template:
    spec:
      priorityClassName: system-cluster-critical
2) We added an alert on stuck VolumeAttachments
Using kube-state-metrics (metric names can vary), the intent is:
- alert if a VolumeAttachment exists for more than 15 minutes and is not progressing
- alert if a volume is attached to a NotReady node for too long
Example query shape:
# VolumeAttachments older than 15 minutes
time() - kube_volumeattachment_created > 900
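The same intent can be cross-checked by hand when metrics are unavailable. The sketch below assumes a POSIX shell with awk and timestamps in the UTC form kubectl prints (YYYY-MM-DDTHH:MM:SSZ); old_unattached is a hypothetical helper, and "not attached" is only a rough stand-in for "not progressing".

```shell
# Hypothetical helper: print VolumeAttachments that are older than LIMIT
# seconds and do not report attached=true, with their age in seconds.
# stdin: lines of "NAME CREATED ATTACHED", for example from
#   kubectl get volumeattachment --no-headers \
#     -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp,ATTACHED:.status.attached
# usage: ... | old_unattached "$(date +%s)" 900
old_unattached() {
  awk -v now="$1" -v limit="$2" '
    # Convert a UTC timestamp (YYYY-MM-DDTHH:MM:SSZ) to epoch seconds
    # using the days-from-civil calculation, so no date(1) extensions are needed.
    function epoch(ts,    y, m, d, days) {
      y = substr(ts, 1, 4) + 0; m = substr(ts, 6, 2) + 0; d = substr(ts, 9, 2) + 0
      if (m <= 2) { y--; m += 12 }   # treat Jan/Feb as months 13/14 of the previous year
      days = 365 * y + int(y / 4) - int(y / 100) + int(y / 400) + int((153 * (m - 3) + 2) / 5) + d - 719469
      return days * 86400 + substr(ts, 12, 2) * 3600 + substr(ts, 15, 2) * 60 + substr(ts, 18, 2)
    }
    $3 != "true" && now - epoch($2) > limit { print $1, now - epoch($2) }
  '
}
```

Attachments that are stuck detaching usually still report attached=true, so this check complements a look at deletion timestamps and finalizers rather than replacing it.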
3) We documented a “safe detach checklist”
We turned “tribal knowledge” into a checklist:
- confirm node state
- confirm mount state
- confirm VolumeAttachment target
- only then consider force detach
How to verify (measurable)
1) VolumeAttachment converges
kubectl get volumeattachment
kubectl describe volumeattachment <name>
Expected:
- attachment errors stop
- attached matches reality
- finalizers get removed when appropriate
2) Pod transitions to Running
kubectl -n <ns> get pod <pod> -w
Expected: ContainerCreating → Running without repeated attach events.
3) Stateful workload passes a basic integrity check
For databases, I always run a lightweight integrity check or a read-only query that touches the data directory. The goal is to ensure we didn’t “recover” by corrupting.
Prevention / guardrails
- Treat CSI controllers like control-plane components
  - HA, PDB, priority class, observability
- Time budgets
  - “volume detach must complete within N minutes” as an SLO
- Alerting
  - stuck VolumeAttachment, repeated attach errors, multi-attach
- Runbooks
  - explicit “safe vs risky” actions for storage incidents
Related reading
- Pods Stuck in Terminating: A Production Decision Tree for Finalizers, Volumes, and Dead Nodes
- Kubernetes Graceful Shutdown as a Contract: Zero 502s During Rollouts (HTTP + gRPC)
- Kubernetes Rollout Without DB Outage: How to Stop PostgreSQL Connection Storm
- Zero-Downtime PostgreSQL Migrations: Expand/Contract, Backfill and Rollback Strategies
- ‘No Space Left on Device’ with 40% Disk Free: The Inode and OverlayFS Death Spiral
- PostgreSQL Idle in Transaction: Emergency Playbook for Stuck Connections