Redis Cluster Slot Migration: Temporary Memory Explosion
Slot migration is usually fine, until it suddenly isn’t.
“Redis pod killed by OOM during cluster reshard.” We were adding a node to our Redis cluster, a routine operation we’d done many times. The rebalance started, slots began migrating, and then node-1 died. Kubernetes restarted it. The migration resumed. Node-1 died again. After the third restart, the cluster went into FAIL state and we had a production incident.
The cause was something we’d never considered: Redis slot migration is a copy operation, not a move. When you migrate a slot from node-0 to node-1, the keys are copied to node-1 first, and only deleted from node-0 after the entire slot migration completes. During that window, both nodes hold copies of all the data. If your nodes are running near their memory limits—as ours were at ~80% capacity—the incoming copy pushes the destination node over its limit.
What made this particularly frustrating was that everything worked in staging. Our staging cluster had plenty of memory headroom. Production, optimized for cost, ran much closer to limits. The migration that took seconds in staging became a death spiral in production.
The fundamental lesson is that Redis cluster operations aren’t just about steady-state capacity. You need to plan for transient states—migrations, failovers, restarts—where memory usage temporarily exceeds normal levels. A cluster that’s “right-sized” for running may be dangerously undersized for maintenance.
Environment: Redis Cluster 6.0+, Kubernetes with memory limits, slot rebalancing operations, large key migrations
The Problem
OOMKill During Rebalancing
Cluster rebalancing timeline:
T+0:00 Cluster has 3 masters, each ~4GB memory (limit 5GB)
node-0: slots 0-5460 (4.1GB)
node-1: slots 5461-10922 (3.9GB)
node-2: slots 10923-16383 (4.0GB)
T+0:05 Start rebalancing: move 1000 slots from node-0 to node-1
T+0:30 node-1 memory: 4.2GB (receiving keys from migration)
T+1:00 node-1 memory: 4.8GB (more keys arriving)
T+1:15 node-1 memory: 5.2GB → OOMKilled!
Cluster goes into FAIL state
Failover confusion, data inconsistency
The Migration Process
How Redis slot migration works:
Source Node (node-0)                   Destination Node (node-1)
┌──────────────────────┐               ┌──────────────────────┐
│ Slot 5460            │               │                      │
│ ├─ key1: 100MB       │ ─── COPY ───> │ key1: 100MB (new)    │
│ ├─ key2: 50MB        │               │                      │
│ └─ key3: 200MB       │               │                      │
│                      │               │                      │
│ (still here!)        │               │ (also here!)         │
└──────────────────────┘               └──────────────────────┘
Memory during migration:
- Source keeps keys until slot fully migrated
- Destination holds new copies
- Total memory temporarily DOUBLED for migrating keys!
Only after slot migration completes:
- Source deletes keys
- But during migration, both nodes hold copies
Root Cause
Migration Phases
Slot migration detailed steps:
1. CLUSTER SETSLOT <slot> IMPORTING <source-id> (sent to the destination)
Destination marks slot as "importing" - ready to receive
2. CLUSTER SETSLOT <slot> MIGRATING <destination-id> (sent to the source)
Source marks slot as "migrating" - still serves reads/writes
3. For EACH key in slot:
MIGRATE <dest-host> <dest-port> "" 0 5000 KEYS <key>
- Key is COPIED to destination (memory allocated)
- Key remains on source until all keys migrated!
4. CLUSTER SETSLOT <slot> NODE <destination-id>
Slot ownership transferred
5. Source deletes migrated keys (finally!)
Memory spike window: Steps 3-5
Duration: Proportional to slot data size and network speed
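The sequence above can be written down as a command plan. The sketch below only builds the list of commands and the node each one would be sent to; it never opens a connection. The names `Step` and `slot_migration_plan` are illustrative and not part of any Redis client. It marks the destination as importing before the source as migrating, which is the order the Redis documentation recommends, and it assumes one MIGRATE per key with the 5000 ms timeout used in the steps above.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Step(NamedTuple):
    target: str               # "source" or "destination" (hypothetical labels)
    command: Tuple[str, ...]  # command name and arguments, in send order


def slot_migration_plan(slot: int, source_id: str, dest_id: str,
                        dest_host: str, dest_port: int,
                        keys: Sequence[str],
                        timeout_ms: int = 5000) -> List[Step]:
    """Build the per-slot command sequence without talking to Redis."""
    s = str(slot)
    plan = [
        # Open the slot on both sides: destination first, then source.
        Step("destination", ("CLUSTER", "SETSLOT", s, "IMPORTING", source_id)),
        Step("source", ("CLUSTER", "SETSLOT", s, "MIGRATING", dest_id)),
    ]
    # One MIGRATE per key; "" and KEYS select the multi-key form, 0 is the DB.
    for key in keys:
        plan.append(Step("source", ("MIGRATE", dest_host, str(dest_port),
                                    "", "0", str(timeout_ms), "KEYS", key)))
    # Hand over ownership once every key has been sent.
    plan.append(Step("destination", ("CLUSTER", "SETSLOT", s, "NODE", dest_id)))
    plan.append(Step("source", ("CLUSTER", "SETSLOT", s, "NODE", dest_id)))
    return plan


if __name__ == "__main__":
    for step in slot_migration_plan(5460, "src-id", "dst-id",
                                    "node-1", 6379, ["key1", "key2"]):
        print(step.target, " ".join(step.command))
```

Printing the plan before a reshard is a cheap way to see how many MIGRATE calls a slot will generate.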
Why OOM Happens
# Simplified calculation (all sizes in GB)
source_memory_before = 4.1
destination_memory_before = 3.9
memory_limit = 5.0
slots_to_migrate = 1000 # out of 5461
# Assumes data is spread evenly across the source's slots
migrating_data = source_memory_before * slots_to_migrate / 5461 # ~0.75 GB
# During migration:
destination_memory_during = destination_memory_before + migrating_data # ~4.65 GB, close to the 5.0 limit
# But if migration is slow or keys are large:
# - More keys accumulate on destination
# - Source hasn't released yet
# - Destination OOMs before source can free memory
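The same arithmetic can be turned around to ask how many slots fit under a chosen ceiling. This is a minimal sketch, assuming data is spread evenly across the source's slots (real clusters with a few very large keys will not be). The function names and the `target_percent` parameter are hypothetical; sizes can be in any unit as long as it is used consistently.

```python
def projected_destination(source_used: int, dest_used: int,
                          slots_to_move: int, source_slot_count: int) -> int:
    """Destination memory once slots_to_move slots have arrived (even spread assumed)."""
    return dest_used + source_used * slots_to_move // source_slot_count


def max_slots_within_budget(source_used: int, dest_used: int, dest_limit: int,
                            source_slot_count: int, target_percent: int = 80) -> int:
    """Largest slot count that keeps the destination at or below target_percent of its limit."""
    budget = dest_limit * target_percent // 100 - dest_used
    if budget <= 0 or source_used <= 0:
        return 0
    return min(source_slot_count, budget * source_slot_count // source_used)


if __name__ == "__main__":
    # Numbers from the timeline above, expressed in MB.
    print(projected_destination(4100, 3900, 1000, 5461))
    print(max_slots_within_budget(4100, 3900, 5000, 5461, target_percent=80))
```

With the numbers from this incident, moving 1000 slots projects the destination to about 4650 MB against a 5000 MB limit, while an 80% ceiling would allow only about 133 slots.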
Diagnosis
Monitor Memory During Migration
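# Watch memory on both nodes during rebalancing (run each watch in its own terminal)
watch -n 1 "redis-cli -h node-0 INFO memory | grep used_memory_human"
watch -n 1 "redis-cli -h node-1 INFO memory | grep used_memory_human"
# Check slot migration status. CLUSTER INFO does not report open slots; CLUSTER NODES does,
# on the "myself" line: [slot->-dest-id] means migrating, [slot-<-source-id] means importing
redis-cli -h node-0 CLUSTER NODES | grep myself | grep -F -- '->-'
redis-cli -h node-1 CLUSTER NODES | grep myself | grep -F -- '-<-'
# List up to 10 keys still held in a slot owned by node-0 (5460 is an example)
redis-cli -h node-0 CLUSTER GETKEYSINSLOT 5460 10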
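For scripted checks, the open-slot markers in `CLUSTER NODES` output can be parsed directly. The sketch below works on the raw text (for example, captured from `redis-cli CLUSTER NODES`) and looks only at the `myself` line. The function name `open_slots` is illustrative; the `[slot->-id]` and `[slot-<-id]` markers are the ones Redis prints for migrating and importing slots.

```python
import re
from typing import Dict

# [93->-<node-id>] = slot 93 migrating to node-id
# [77-<-<node-id>] = slot 77 importing from node-id
_OPEN_SLOT = re.compile(r"^\[(\d+)(->-|-<-)([0-9a-f]+)\]$")


def open_slots(cluster_nodes_output: str) -> Dict[str, Dict[int, str]]:
    """Return {'migrating': {slot: dest_id}, 'importing': {slot: source_id}} for this node."""
    result: Dict[str, Dict[int, str]] = {"migrating": {}, "importing": {}}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        # Fields 0-7 are id, address, flags, master, ping, pong, epoch, link state.
        if len(fields) < 9 or "myself" not in fields[2].split(","):
            continue
        for token in fields[8:]:
            match = _OPEN_SLOT.match(token)
            if match:
                kind = "migrating" if match.group(2) == "->-" else "importing"
                result[kind][int(match.group(1))] = match.group(3)
    return result


if __name__ == "__main__":
    sample = ("07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 "
              "myself,master - 0 1426238317239 4 connected 0-5460 "
              "[93->-292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f]")
    print(open_slots(sample))
```

An empty result on both nodes means no slot is currently open, which is the state to confirm before starting the next batch.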
Check Before Migration
#!/bin/bash
# pre_migration_check.sh <source-host> <dest-host> <slots-to-move>
# Rough estimate only: assumes data is spread evenly across the source's slots
SOURCE_NODE=$1
DEST_NODE=$2
SLOTS_TO_MOVE=$3
# INFO output uses CRLF line endings; strip the \r or the shell arithmetic below fails
source_mem=$(redis-cli -h "$SOURCE_NODE" INFO memory | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
dest_mem=$(redis-cli -h "$DEST_NODE" INFO memory | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
dest_limit=$(redis-cli -h "$DEST_NODE" CONFIG GET maxmemory | tail -1)
# Estimate data size for migrating slots
source_total_slots=5461 # For a 3-master cluster; adjust to the source's actual slot count
estimated_migration_size=$((source_mem * SLOTS_TO_MOVE / source_total_slots))
projected_dest_mem=$((dest_mem + estimated_migration_size))
echo "Source memory: $source_mem"
echo "Destination memory: $dest_mem"
echo "Destination limit: $dest_limit"
echo "Estimated migration size: $estimated_migration_size"
echo "Projected destination memory: $projected_dest_mem"
# maxmemory 0 means "no limit" in Redis, so there is nothing to compare against
if [ "$dest_limit" -eq 0 ]; then
  echo "Destination has no maxmemory set; compare against the container limit manually"
  exit 2
fi
if [ "$projected_dest_mem" -gt "$dest_limit" ]; then
  echo "WARNING: Migration may cause OOM on destination!"
  exit 1
fi
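The fiddliest part of a check like this is reading numbers out of `INFO`, because the reply uses CRLF line endings and includes `# Section` header lines. A small parser sidesteps both. This is a sketch that operates on the raw text of `INFO memory`; the names `parse_info` and `memory_headroom` are hypothetical.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse the text of a Redis INFO reply into a name -> value mapping."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():  # splitlines() also consumes the \r\n endings
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines and "# Memory" style section headers
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def memory_headroom(info_text: str, limit_bytes: int) -> int:
    """Bytes left between used_memory and the given limit (negative if over)."""
    return limit_bytes - int(parse_info(info_text)["used_memory"])


if __name__ == "__main__":
    sample = "# Memory\r\nused_memory:4100000000\r\nused_memory_human:3.82G\r\n"
    print(memory_headroom(sample, 5_000_000_000))
```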
Identify Large Keys
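# Find large keys that will be migrated
# 4461-5460 is an example range of 1000 slots owned by node-0; substitute the slots being moved.
# Only the first 100 keys per slot are checked, and key names containing newlines are not handled.
for slot in $(seq 4461 5460); do
  redis-cli -h node-0 CLUSTER GETKEYSINSLOT "$slot" 100 | while IFS= read -r key; do
    [ -z "$key" ] && continue
    size=$(redis-cli -h node-0 MEMORY USAGE "$key")
    # MEMORY USAGE prints nothing if the key disappeared in the meantime
    if [ -n "$size" ] && [ "$size" -gt 10000000 ]; then # > 10MB
      echo "Large key in slot $slot: $key ($size bytes)"
    fi
  done
done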
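The same scan avoids one process launch per key when it goes through a client library. A minimal sketch, assuming `client` is a redis-py `redis.Redis` instance connected directly to the source node; the function name and the `max_keys_per_slot` cap are illustrative. It asks `CLUSTER COUNTKEYSINSLOT` first so that empty slots are skipped cheaply.

```python
from typing import Iterable, Iterator, Tuple


def find_large_keys(client, slots: Iterable[int],
                    threshold_bytes: int = 10_000_000,
                    max_keys_per_slot: int = 1000) -> Iterator[Tuple[int, bytes, int]]:
    """Yield (slot, key, size) for keys whose MEMORY USAGE exceeds threshold_bytes."""
    for slot in slots:
        count = client.execute_command("CLUSTER", "COUNTKEYSINSLOT", slot)
        if not count:
            continue
        keys = client.execute_command("CLUSTER", "GETKEYSINSLOT", slot,
                                      min(count, max_keys_per_slot))
        for key in keys:
            size = client.memory_usage(key)  # None if the key no longer exists
            if size is not None and size > threshold_bytes:
                yield slot, key, size
```

Usage would look like `for slot, key, size in find_large_keys(redis.Redis(host="node-0"), range(4461, 5461)): print(slot, key, size)`, where the host name and slot range are placeholders.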
The Fix
Option 1: Increase Memory Headroom
# Kubernetes: Give more memory headroom
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  template:
    spec:
      containers:
        - name: redis
          resources:
            limits:
              memory: "8Gi" # Was 5Gi
            requests:
              memory: "6Gi"
          # Configure Redis to leave headroom
          args:
            - redis-server
            - --maxmemory
            - "5gb" # 60% of limit, leave room for migration
            - --maxmemory-policy
            - "noeviction"
Option 2: Migrate Fewer Slots at Once
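#!/bin/bash
# gradual_rebalance.sh - Migrate in small batches
# node-0-id and node-1-id are placeholders for the node IDs shown by CLUSTER NODES
BATCH_SIZE=100 # Slots per batch
PAUSE_SECONDS=30
total_slots_to_move=1000
moved=0
while [ "$moved" -lt "$total_slots_to_move" ]; do
  # The last batch may be smaller than BATCH_SIZE
  remaining=$((total_slots_to_move - moved))
  batch=$((remaining < BATCH_SIZE ? remaining : BATCH_SIZE))
  echo "Migrating $batch slots ($moved of $total_slots_to_move moved so far)"
  redis-cli --cluster reshard node-0:6379 \
    --cluster-from node-0-id \
    --cluster-to node-1-id \
    --cluster-slots "$batch" \
    --cluster-yes || exit 1
  moved=$((moved + batch))
  echo "Waiting for memory to stabilize..."
  sleep "$PAUSE_SECONDS"
  # Check destination memory before continuing (strip the \r from INFO's CRLF line endings)
  dest_mem=$(redis-cli -h node-1 INFO memory | grep '^used_memory_rss:' | cut -d: -f2 | tr -d '\r')
  max_mem=$(redis-cli -h node-1 CONFIG GET maxmemory | tail -1)
  # maxmemory 0 means "no limit"; skip the ratio check rather than divide by zero
  if [ "$max_mem" -gt 0 ] && [ $((dest_mem * 100 / max_mem)) -gt 80 ]; then
    echo "Destination at ${dest_mem}/${max_mem}, waiting for cleanup..."
    sleep 60
  fi
done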
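The batching and the pause check can also be expressed as two small pure functions, which makes the edge cases (a final partial batch, a `maxmemory` of 0) easy to test before anything touches the cluster. This is a sketch; the names `batch_sizes` and `should_pause` and the 80% default are illustrative, the default mirroring the threshold used in the script above.

```python
from typing import Iterator


def batch_sizes(total_slots: int, batch_size: int) -> Iterator[int]:
    """Yield the number of slots to move in each batch, ending with any remainder."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remaining = total_slots
    while remaining > 0:
        step = min(batch_size, remaining)
        yield step
        remaining -= step


def should_pause(rss_bytes: int, limit_bytes: int, threshold_percent: int = 80) -> bool:
    """True when memory use is above threshold_percent of the limit.

    A limit of 0 (Redis's "no maxmemory") never triggers a pause here.
    """
    if limit_bytes <= 0:
        return False
    return rss_bytes * 100 // limit_bytes > threshold_percent


if __name__ == "__main__":
    print(list(batch_sizes(1000, 300)))
    print(should_pause(4_200_000_000, 5_000_000_000))
```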
Option 3: Pre-Delete Cold Keys
#!/usr/bin/env python3
# pre_migration_cleanup.py - Remove cold keys before migration
# Deletes data: only suitable where cold keys are safe to lose (e.g. cache entries)
import redis


def cleanup_cold_keys(source_host, slots_to_migrate, idle_threshold_days=30):
    r = redis.Redis(host=source_host, port=6379)
    threshold_seconds = idle_threshold_days * 86400
    for slot in slots_to_migrate:
        # Only the first 1000 keys of each slot are examined
        keys = r.cluster('GETKEYSINSLOT', slot, 1000)
        for key in keys:
            # Check if key is cold (not accessed recently)
            idle_time = r.object('IDLETIME', key)  # Seconds; None if the key is gone
            if idle_time is not None and idle_time > threshold_seconds:
                # Key hasn't been accessed in N days
                key_size = r.memory_usage(key)
                print(f"Deleting cold key {key} ({key_size} bytes, idle {idle_time}s)")
                r.delete(key)
    # Enable active defragmentation so freed memory can be compacted in the background
    r.config_set('activedefrag', 'yes')
Option 4: Use MIGRATE REPLACE Carefully
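# For very large keys, consider DUMP/RESTORE with deletion
# This is manual but gives more control
# DUMP returns binary data, so pipe it rather than passing it through a file and "$(cat ...)":
# --raw keeps the payload unescaped, head -c -1 (GNU head) drops the newline redis-cli appends,
# and -x makes the second redis-cli read its last argument from stdin
redis-cli -h source-node --raw DUMP large_key | head -c -1 | redis-cli -h dest-node -x RESTORE large_key 0
redis-cli -h source-node DEL large_key
# The source copy is deleted as soon as the destination has the key, limiting double memory
# But requires careful slot transition handling, and writes made to the key
# between DUMP and DEL are lost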
Monitoring
# Assumes your exporter exposes redis_cluster_slots_migrating; confirm the metric name first
groups:
  - name: redis-cluster
    rules:
      - alert: RedisClusterMigrationMemory
        expr: |
          (redis_memory_used_bytes / redis_memory_max_bytes) > 0.85
          and redis_cluster_slots_migrating > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Redis node memory high during slot migration"
      - alert: RedisSlotMigrationStuck
        # The gauge has not changed for 10 minutes while slots are still open
        expr: |
          changes(redis_cluster_slots_migrating[10m]) == 0
          and redis_cluster_slots_migrating > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slot migration appears stuck"
Checklist
## Redis Cluster Slot Migration Memory
### Before Migration
- [ ] Calculate total data size in slots to migrate
- [ ] Verify destination has headroom (data + 50%)
- [ ] Identify and handle large keys (>100MB)
- [ ] Consider deleting cold/expired keys first
- [ ] Set up memory monitoring on both nodes
### During Migration
- [ ] Migrate in small batches (100-500 slots)
- [ ] Pause between batches for memory stabilization
- [ ] Monitor both source and destination memory
- [ ] Watch for migration stuck/timeout
### If OOM Occurs
- [ ] Check cluster state: CLUSTER INFO
- [ ] Identify which slots were mid-migration
- [ ] May need to abort and restart: CLUSTER SETSLOT <slot> STABLE on both nodes, or redis-cli --cluster fix <host:port> to repair open slots
- [ ] Increase memory limits before retry
Conclusion
Redis slot migration is a perfect example of how transient states can break systems that work fine in steady state. Your cluster can run for months at 80% memory utilization with no issues. Then you try to add a node, rebalance starts, and suddenly you’re in an OOM death spiral. The steady-state metrics didn’t predict the transient failure.
The core insight is that slot migration is a copy-then-delete process. The source node keeps its data until migration completes; the destination node accumulates incoming data. For the duration of migration, the data exists in both places. If your nodes are sized for normal operation without migration headroom, you’ll hit memory limits during any rebalancing operation.
The fix requires thinking about capacity differently. Instead of asking “how much memory do I need for my data?” ask “how much memory do I need for my data plus the overhead of any operation I might perform?” For Redis clusters, that means leaving 30-50% headroom for migrations, failovers, and background persistence operations.
Key principles:
- Destination needs headroom for migrating data plus existing data—plan for 150% capacity during migration
- Migrate in batches with pauses for memory stabilization—100-500 slots at a time
- Monitor both source and destination during migration—the destination will spike
- Pre-clean cold keys to reduce migration size before starting
- Size for operations, not just data—a cluster that can’t safely rebalance isn’t properly sized