etcd Quota Alarm: When Your Kubernetes Cluster Goes Read-Only

The cluster looked fine until etcd started screaming about its quota. “kubectl apply hangs, new pods stuck in Pending.” The cause: etcd exceeded its storage quota and raised a NOSPACE alarm, which makes the cluster effectively read-only until you compact, defragment, and disarm the alarm.

Environment: Kubernetes with etcd (self-managed or kubeadm), clusters with high churn (frequent deployments, many events), default etcd configuration

The Problem

The Sudden Lockdown

Timeline of cluster freeze:

T+0:00   Normal cluster operation
         etcd DB size: 2GB (quota: 2GB)
         Compaction: running every 5 min

T+1:00   Compaction job fails silently
         DB size starts growing with history

T+24:00  DB size: 2.05GB
         etcd triggers ALARM: NOSPACE
         All write operations rejected!

T+24:01  kubectl apply deployment.yaml
         Error: etcdserver: mvcc: database space exceeded

T+24:02  Pod crashes, can't reschedule
         scheduler: can't create binding: space exceeded

T+24:03  ConfigMap update fails
         Everything is frozen

What the Errors Look Like

# API server logs
E0115 03:42:17.123456 etcdserver: mvcc: database space exceeded

# kubectl errors
$ kubectl apply -f deployment.yaml
Error from server: etcdserver: mvcc: database space exceeded

$ kubectl create namespace test
error: etcdserver: mvcc: database space exceeded

# Even deletions fail!
$ kubectl delete pod stuck-pod
error: etcdserver: mvcc: database space exceeded

Root Cause

How etcd Storage Works

etcd MVCC (Multi-Version Concurrency Control):

┌─────────────────────────────────────────────────────────────┐
│ Every write creates a NEW revision, old versions kept       │
│                                                             │
│ Key: /registry/pods/default/nginx                          │
│                                                             │
│ Rev 1000: {replicas: 1}  ← kept for history                │
│ Rev 1001: {replicas: 2}  ← kept for history                │
│ Rev 1002: {replicas: 3}  ← kept for history                │
│ Rev 1003: {replicas: 5}  ← current                         │
│                                                             │
│ Without compaction:                                         │
│ - All revisions stored forever                              │
│ - DB grows with every write                                 │
│ - watch operations can read old history                     │
│                                                             │
│ Quota (default 2GB) prevents unbounded growth               │
│ When exceeded → ALARM → read-only mode                      │
└─────────────────────────────────────────────────────────────┘
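
To make the diagram concrete, here is a toy model in shell. It is not how etcd stores data; it only assumes a text log with one `REV KEY VALUE` line per write and a hypothetical helper named `compact_log`. Compacting at revision R keeps, for each key, the newest entry at or below R plus every entry written after R. Delete tombstones are ignored for simplicity.

```shell
#!/bin/sh
# Toy model of MVCC compaction (illustrative only, not etcd internals).
# stdin: one "REV KEY VALUE" line per write, in ascending REV order.
# compact_log is a hypothetical helper name.
compact_log() {
  # $1 = revision to compact to
  awk -v rev="$1" '
    $1 <= rev { keep[$2] = $0; next }   # newest entry at or below rev, per key
    { newer[++n] = $0 }                 # entries after rev always survive
    END {
      for (k in keep) print keep[k]
      for (i = 1; i <= n; i++) print newer[i]
    }
  ' | sort -n                           # restore revision order
}
```

Feeding it the four nginx revisions from the diagram and compacting at 1002 leaves only revisions 1002 and 1003: the history is gone, but a read at the compacted revision still works.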

Why Compaction Stops

# Common reasons compaction fails:

# 1. etcd running without auto-compaction
etcd --auto-compaction-retention=0  # Disabled!

# 2. kube-apiserver compaction disabled
# Check apiserver flags:
ps aux | grep kube-apiserver | grep etcd-compaction
# No output means the default applies (--etcd-compaction-interval=5m)
# --etcd-compaction-interval=0 turns apiserver-driven compaction off

# 3. Compaction runs but defrag doesn't
# Compaction marks space as reclaimable
# Defragmentation actually frees it
# DB file stays large without defrag

# 4. High write rate exceeds compaction rate
# Cluster with 1000s of deployments/hour
# Compaction can't keep up
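
A quick way to check reason 1 is to read the flag straight out of the static pod manifest. The sketch below assumes a kubeadm-style manifest with one `- --flag=value` entry per line; `flag_value` and `compaction_state` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Print the value of a flag found in a static pod manifest read from stdin.
# Prints nothing if the flag is absent. flag_value is a hypothetical helper.
flag_value() {
  # $1 = flag name without leading dashes, e.g. auto-compaction-retention
  sed -n "s/^[[:space:]]*-[[:space:]]*--$1=\([^[:space:]#]*\).*/\1/p" | head -n 1
}

# Classify etcd auto-compaction from the retention value.
# An absent flag or a value of 0 means auto-compaction is off.
compaction_state() {
  case "$1" in
    ""|0) echo "disabled" ;;
    *)    echo "enabled" ;;
  esac
}
```

On a control-plane node this could be used as `compaction_state "$(flag_value auto-compaction-retention < /etc/kubernetes/manifests/etcd.yaml)"`. The classification applies to etcd's own flag only; an absent kube-apiserver compaction flag means the default interval is in effect.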

The Quota Math

# Check current etcd status
etcdctl endpoint status --write-out=table

# (columns abridged)
# +----------------+------------------+---------+---------+-----------+
# |    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER |
# +----------------+------------------+---------+---------+-----------+
# | 127.0.0.1:2379 | 8e9e05c52164694d | 3.5.0   | 2.1 GB  | true      |
# +----------------+------------------+---------+---------+-----------+

# Compare on-disk size with space actually in use
etcdctl endpoint status --write-out=json | jq '.[] | .Status.dbSize, .Status.dbSizeInUse'
# 2147483648  (DB size on disk: 2GB)
# 1073741824  (In use: 1GB; the rest is free pages that only defrag returns to disk)

# Check alarms
etcdctl alarm list
# memberID:8e9e05c52164694d alarm:NOSPACE
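
The arithmetic behind those two numbers can be sketched as a small helper. `quota_report` is a hypothetical name; it takes the on-disk size, the in-use size, and the quota (all in bytes) and prints how full the quota is and what share of the file a defrag could give back.

```shell
#!/bin/sh
# quota_report DB_SIZE DB_IN_USE QUOTA   (all values in bytes)
# quota_report is a hypothetical helper name.
quota_report() {
  awk -v size="$1" -v used="$2" -v quota="$3" 'BEGIN {
    if (size <= 0 || quota <= 0) exit 1
    printf "quota_used=%.1f%% reclaimable=%.1f%%\n",
           100 * size / quota, 100 * (size - used) / size
  }'
}
```

With the example values above, `quota_report 2147483648 1073741824 2147483648` prints `quota_used=100.0% reclaimable=50.0%`: the quota is exhausted even though half the file is free space waiting for a defrag.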

Diagnosis

Check etcd Health

# Connect to etcd (find certs in /etc/kubernetes/pki/etcd/)
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# Check health
etcdctl endpoint health

# Check status including DB size
etcdctl endpoint status --write-out=table

# List any alarms
etcdctl alarm list

Analyze Storage Usage

# Count keys per prefix (a proxy for size; a few large values can still dominate)
etcdctl get / --prefix --keys-only | grep -v '^$' | \
  cut -d/ -f1-4 | sort | uniq -c | sort -rn | head -20

# Output shows what's filling your etcd:
# 15234 /registry/events/default
# 8234 /registry/pods/kube-system
# 5123 /registry/configmaps/default

# Events are often the biggest culprit!
# Default retention: 1h (--event-ttl), but a noisy cluster still writes thousands per hour

Check Compaction Status

# Get current revision
etcdctl endpoint status --write-out=json | jq '.[].Status.header.revision'
# 12345678

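# See what revision is compacted to
# (endpoint status does not report it; raft_term is unrelated. Read the metric instead.
#  Assumes the kubeadm default metrics listener on 127.0.0.1:2381.)
curl -s http://127.0.0.1:2381/metrics | grep '^etcd_debugging_mvcc_compact_revision'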

# Check apiserver compaction settings
# (kubectl does not expand wildcards in pod names; select by label instead)
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | \
  grep -A5 etcd-compaction
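
How far compaction lags behind the current revision can be read from etcd's metrics endpoint. The sketch below assumes the two debugging metrics `etcd_debugging_mvcc_current_revision` and `etcd_debugging_mvcc_compact_revision` are exposed by your etcd version (debugging metrics are not guaranteed to be stable), and `revision_gap` is a hypothetical helper name.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print the current
# revision, the compacted revision, and the gap between them.
# revision_gap is a hypothetical helper; metric names are assumptions
# about the etcd version in use.
revision_gap() {
  awk '
    $1 == "etcd_debugging_mvcc_current_revision" { cur = $2 }
    $1 == "etcd_debugging_mvcc_compact_revision" { compacted = $2 }
    END {
      if (cur == "" || compacted == "") exit 1
      printf "current=%d compacted=%d gap=%d\n", cur, compacted, cur - compacted
    }
  '
}
```

On a kubeadm node, something like `curl -s http://127.0.0.1:2381/metrics | revision_gap` would feed it (the port is the kubeadm default and may differ on your cluster). A gap that keeps growing between checks suggests compaction has stopped.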

The Fix

Step 1: Emergency - Clear the Alarm

# First, compact to free up logical space
# Get current revision
REVISION=$(etcdctl endpoint status --write-out=json | \
  jq -r '.[].Status.header.revision')

# Compact to current revision (removes history)
etcdctl compact "$REVISION"

# Defragment to free physical space
# (blocks the member while it runs; repeat for each member, one at a time)
etcdctl defrag --endpoints=https://127.0.0.1:2379

# Clear the alarm
etcdctl alarm disarm

# Verify
etcdctl alarm list
# (should be empty)

etcdctl endpoint status --write-out=table
# DB size should be smaller now
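
For a 3 AM incident it can help to have the sequence in one function. This is a sketch, not a hardened tool: `recover_nospace` is a hypothetical name, the `ETCDCTL` variable is an assumption that lets you substitute a wrapper command, and it expects the `ETCDCTL_*` variables from the Diagnosis section to be exported with a single endpoint configured.

```shell
#!/bin/sh
# Compact, defragment, and disarm in order.
# ETCDCTL can be overridden with a wrapper; recover_nospace is a
# hypothetical helper name.
ETCDCTL=${ETCDCTL:-etcdctl}

recover_nospace() {
  # Extract the current revision without needing jq.
  rev=$($ETCDCTL endpoint status --write-out=json |
        grep -o '"revision":[0-9]*' | head -n 1 | cut -d: -f2)
  [ -n "$rev" ] || { echo "could not read current revision" >&2; return 1; }

  # A failure here often just means the revision was already compacted,
  # so warn and carry on to the defrag.
  $ETCDCTL compact "$rev" || echo "compact failed (already compacted?)" >&2
  $ETCDCTL defrag       || return 1   # return freed pages to the filesystem
  $ETCDCTL alarm disarm || return 1   # clear NOSPACE so writes resume
  $ETCDCTL alarm list                 # prints nothing once the alarm is gone
}
```

Disarming only after a successful defrag is deliberate: clearing the alarm while the file is still over quota just lets etcd raise it again on the next write.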

Step 2: Increase Quota (Temporary)

# If compaction alone isn't enough, increase quota
# Edit etcd static pod manifest
vim /etc/kubernetes/manifests/etcd.yaml

# Add/modify:
spec:
  containers:
  - command:
    - etcd
    - --quota-backend-bytes=4294967296  # 4GB
    # ... other flags

# etcd will restart automatically
# WARNING: This is treating symptom, not cause
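
The flag takes a raw byte count, which is easy to mistype. A one-line helper (hypothetical name `gib_to_bytes`) does the conversion:

```shell
#!/bin/sh
# gib_to_bytes N: convert whole GiB to the byte count that
# --quota-backend-bytes expects. gib_to_bytes is a hypothetical helper.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

`gib_to_bytes 4` prints `4294967296`, matching the value above. The etcd documentation suggests 8GB as the maximum for normal environments, so `gib_to_bytes 8` is the practical ceiling.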

Step 3: Enable Auto-Compaction

# etcd auto-compaction (edit etcd manifest)
spec:
  containers:
  - command:
    - etcd
    - --auto-compaction-mode=periodic
    - --auto-compaction-retention=1h  # Keep 1 hour of history

# For kube-apiserver (edit apiserver manifest)
spec:
  containers:
  - command:
    - kube-apiserver
    - --etcd-compaction-interval=5m0s  # Compact every 5 minutes (this is the default)

Step 4: Clean Up Events

# Events are often the biggest space consumer
# Delete old events
kubectl delete events --all -A

# Or shorten the event TTL
# In apiserver:
--event-ttl=1h  # Default is 1h; make sure yours hasn't been raised

Step 5: Set Up Regular Defragmentation

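# CronJob for regular defragmentation
# Note: with hostNetwork and 127.0.0.1 this defragments only the member on
# the node where the pod is scheduled, and the container user must be able
# to read the etcd key file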
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-defrag
  namespace: kube-system
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          containers:
          - name: etcd-defrag
            image: bitnami/etcd:3.5
            command:
            - /bin/sh
            - -c
            - |
              etcdctl defrag \
                --endpoints=https://127.0.0.1:2379 \
                --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                --cert=/etc/kubernetes/pki/etcd/server.crt \
                --key=/etc/kubernetes/pki/etcd/server.key
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane

Step 6: Reduce Write Volume

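# Reduce event spam from kubelets
# (--event-qps / --event-burst are kubelet flags, not kube-controller-manager flags;
#  in a KubeletConfiguration file the fields are eventRecordQPS / eventBurst)
eventRecordQPS: 5
eventBurst: 20
# Defaults differ between Kubernetes versions; check yours before lowering

# Reduce leader election churn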
# Increase lease duration
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect-lease-duration=30s  # Default 15s
    - --leader-elect-renew-deadline=20s  # Default 10s

Monitoring

groups:
  - name: etcd
    rules:
      - alert: EtcdDatabaseSizeHigh
        expr: |
          etcd_mvcc_db_total_size_in_bytes /
          etcd_server_quota_backend_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "etcd database {{ $value | humanizePercentage }} of quota"

      - alert: EtcdDatabaseSpaceExceeded
        expr: |
          etcd_server_has_leader == 1 and
          etcd_mvcc_db_total_size_in_bytes > etcd_server_quota_backend_bytes
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "etcd quota exceeded - cluster read-only!"

      - alert: EtcdCompactionPaused
        expr: |
          increase(etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_count[1h]) == 0
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "etcd compaction hasn't run in 2 hours"

      - alert: EtcdDefragNeeded
        expr: |
          (etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes) /
          etcd_mvcc_db_total_size_in_bytes > 0.5
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "etcd has >50% reclaimable space - defrag needed"

Checklist

## etcd Quota Alarm Recovery

### Emergency Recovery
- [ ] Check alarm status: etcdctl alarm list
- [ ] Get current revision for compaction
- [ ] Run compaction: etcdctl compact $REVISION
- [ ] Run defragmentation: etcdctl defrag
- [ ] Disarm alarm: etcdctl alarm disarm
- [ ] Verify cluster is writable

### Prevention
- [ ] Enable auto-compaction in etcd (--auto-compaction-retention)
- [ ] Enable compaction in apiserver (--etcd-compaction-interval)
- [ ] Set up scheduled defragmentation
- [ ] Delete old events regularly
- [ ] Monitor DB size vs quota

### Capacity Planning
- [ ] Size quota appropriately (start 2GB, increase as needed; etcd suggests 8GB as the maximum)
- [ ] Estimate write rate and retention needs
- [ ] Consider 3+ node etcd cluster for HA

Conclusion

The lesson: etcd’s MVCC design keeps all history until compaction runs. Without regular compaction and defragmentation, your cluster will hit the quota and become read-only.

Key principles:

  1. Every write adds to DB size - history accumulates fast
  2. Compaction removes history - but doesn’t free disk space
  3. Defragmentation frees space - must run after compaction
  4. Events are often the biggest consumer - clean them regularly

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "etcd Quota Alarm: When Your Kubernetes Cluster Goes Read-Only". https://www.michal-drozd.com/en/blog/etcd-compaction-quota-alarm/ (Published November 27, 2024).