Redis Memory Fragmentation: When maxmemory Isn't Enough
Redis said 4GB. The kernel said 6GB. The kernel won. That’s how I learned that maxmemory is a data limit, not an RSS limit.
The gap is memory fragmentation, and it is one of the most misunderstood parts of Redis in production. maxmemory controls how much data Redis stores. RSS (Resident Set Size) is how much physical memory the process uses. When allocations and frees are uneven, the allocator (jemalloc) leaves holes that count toward RSS even if Redis thinks it is under its limit.
What makes this particularly insidious is the pattern that causes it: variable-sized keys with TTL. You set a 1KB session with a 5-minute TTL. You set a 10KB cache entry with a 1-minute TTL. They expire at different times, leaving holes scattered across the allocator’s arenas. A hole can only be reused by a later allocation of the same size class, so values of other sizes get fresh memory instead. RSS grows while used_memory stays flat or even shrinks.
The tragic irony is that Redis is often praised for its memory efficiency. And it is—for the data it stores. But the memory allocator’s overhead is separate, and under fragmentation-inducing workloads, that overhead can be 50% or more of your actual data size.
Tested on: Redis 7.2, jemalloc 5.3, Kubernetes with 8GB memory limit
The Hidden Memory
What Redis Reports vs Reality
# Redis INFO memory output
redis-cli INFO memory
used_memory:4294967296 # 4GB - what Redis tracks
used_memory_rss:6442450944 # 6GB - what OS sees!
mem_fragmentation_ratio:1.50 # 50% overhead
# This gap = fragmentation
# OOM killer sees RSS, not used_memory
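To make the gap concrete, here is a minimal sketch that turns the two INFO fields above into a ratio and an overhead figure. The function name `fragmentation_report` is mine, not part of Redis or redis-py; it only assumes a mapping with `used_memory` and `used_memory_rss` in bytes, which is what redis-py's `r.info('memory')` returns.

```python
def fragmentation_report(info):
    """Summarize the gap between what Redis tracks and what the OS sees.

    `info` needs the INFO memory fields used_memory and used_memory_rss,
    both in bytes.
    """
    used = info["used_memory"]
    rss = info["used_memory_rss"]
    return {
        "ratio": rss / used,                       # same formula as mem_fragmentation_ratio
        "overhead_bytes": rss - used,              # memory the OS charges beyond the data
        "overhead_pct": (rss - used) / used * 100,
    }


# The numbers from the INFO output above: 4GB tracked, 6GB resident.
report = fragmentation_report(
    {"used_memory": 4294967296, "used_memory_rss": 6442450944}
)
print(report)
```

With the values above this gives a ratio of 1.5 and 2GB of overhead, which is the part the OOM killer counts and maxmemory does not.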
Why Fragmentation Happens
Memory allocation pattern matters:
Scenario: Variable-sized keys with TTL
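1. Allocate 1KB key (jemalloc serves it from a slab of 1KB regions)
2. Allocate 500B key (rounded up to the 512B size class)
3. Key 1 expires, 1KB slab now has hole
4. New 600B key rounds up to the 640B class, so it can't reuse the 1KB hole or the 512B slab and needs a new allocation
5. 1KB slab stays resident while any other region in it is live, even though it is partially empty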
Over time:
┌─────────────────────────────────────────┐
│ [used][ hole ][used][ hole ] │ ← jemalloc arena
│ [used][used][ hole ] │
│ [ hole ][used][used][ hole ] │
└─────────────────────────────────────────┘
RSS = all allocated slabs
used_memory = actual data
Fragmentation = RSS / used_memory
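The effect can be reproduced without Redis at all. The sketch below is a toy model, not jemalloc: it assumes 4KB pages, a handful of power-of-two size classes, and that a page is returned to the OS only when every region on it is free. `ToyAllocator` and `simulate` are hypothetical names for illustration.

```python
import random

PAGE = 4096
# Simplified power-of-two classes; real jemalloc has many more.
SIZE_CLASSES = [64, 128, 256, 512, 1024, 2048, 4096]


def size_class(nbytes):
    """Round a request up to the smallest class that fits it."""
    return next(c for c in SIZE_CLASSES if c >= nbytes)


class ToyAllocator:
    """Each page serves one size class and stays resident while any
    region on it is still live."""

    def __init__(self):
        self.live = []  # live[i] = number of live regions on page i
        # One entry per free region, holding the id of the page it sits on.
        self.free_regions = {c: [] for c in SIZE_CLASSES}

    def alloc(self, nbytes):
        cls = size_class(nbytes)
        if not self.free_regions[cls]:
            # No hole of this class anywhere: map a fresh page.
            self.live.append(0)
            page = len(self.live) - 1
            self.free_regions[cls].extend([page] * (PAGE // cls))
        page = self.free_regions[cls].pop()
        self.live[page] += 1
        return cls, page

    def free(self, handle):
        cls, page = handle
        self.live[page] -= 1
        self.free_regions[cls].append(page)

    def resident_bytes(self):
        return PAGE * sum(1 for n in self.live if n > 0)


def simulate(n_objects=20_000, free_fraction=0.7, seed=1):
    """Return (ratio_before, ratio_after) of resident bytes / live bytes."""
    rng = random.Random(seed)
    allocator = ToyAllocator()
    objects = []
    for _ in range(n_objects):
        size = rng.randint(50, 4000)
        objects.append((size, allocator.alloc(size)))
    before = allocator.resident_bytes() / sum(s for s, _ in objects)

    rng.shuffle(objects)  # keys expire in an order unrelated to allocation order
    cut = int(n_objects * free_fraction)
    for _, handle in objects[:cut]:
        allocator.free(handle)
    after = allocator.resident_bytes() / sum(s for s, _ in objects[cut:])
    return before, after


before, after = simulate()
print(f"before expiry: {before:.2f}, after expiry: {after:.2f}")
```

In this model the ratio rises after the frees because pages holding several small regions almost never become completely empty, so they stay resident while carrying only a fraction of their original data. The coarse size classes also inflate the starting ratio; the direction of the change is the point, not the absolute numbers.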
Reproducing the Problem
Test Script
# fragment_redis.py
import random
import string
import time

import redis

r = redis.Redis(host='localhost', port=6379)


def random_value(min_size, max_size):
    size = random.randint(min_size, max_size)
    return ''.join(random.choices(string.ascii_letters, k=size))


# Phase 1: Create variable-sized keys with TTL
print("Phase 1: Creating 1M variable-sized keys...")
for i in range(1_000_000):
    key = f"key:{i}"
    value = random_value(100, 10000)  # 100B to 10KB
    ttl = random.randint(60, 300)     # 1-5 min TTL
    r.setex(key, ttl, value)
    # Report after every 100k keys written
    if (i + 1) % 100000 == 0:
        info = r.info('memory')
        ratio = info['mem_fragmentation_ratio']
        print(f"Keys: {i + 1}, Fragmentation: {ratio:.2f}")

# Phase 2: Wait for TTLs and observe fragmentation
print("\nPhase 2: Waiting for expiration...")
for minute in range(1, 11):
    time.sleep(60)
    info = r.info('memory')
    print(f"Minute {minute}: "
          f"used_memory: {info['used_memory_human']}, "
          f"RSS: {info['used_memory_rss_human']}, "
          f"fragmentation: {info['mem_fragmentation_ratio']:.2f}")
Results
Phase 1: Creating 1M variable-sized keys...
Keys: 100000, Fragmentation: 1.05
Keys: 500000, Fragmentation: 1.12
Keys: 1000000, Fragmentation: 1.18
Phase 2: Waiting for expiration...
Minute 1: used_memory: 850MB, RSS: 1.2GB, fragmentation: 1.41
Minute 3: used_memory: 620MB, RSS: 1.1GB, fragmentation: 1.77
Minute 5: used_memory: 380MB, RSS: 980MB, fragmentation: 2.58 # Critical!
# Data shrunk but RSS barely moved
# jemalloc holds onto fragmented memory
Solutions
1. Active Defragmentation (Redis 4.0+)
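# redis.conf
# Note: redis.conf does not accept trailing comments, so every comment sits on its own line
activedefrag yes
# Ignore fragmentation until at least 100MB is wasted
active-defrag-ignore-bytes 100mb
# Start defrag when fragmentation > 10%
active-defrag-threshold-lower 10
# Use maximum effort once fragmentation reaches 100%
active-defrag-threshold-upper 100
# CPU effort (1-100)
# Min CPU% when defragging (used at the lower threshold)
active-defrag-cycle-min 1
# Max CPU% when defragging (used at the upper threshold)
active-defrag-cycle-max 25
# Max set/hash/zset/list fields processed per main dictionary scan step
active-defrag-max-scan-fields 1000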
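How these settings interact is easier to see as a function. The sketch below is a simplified model, assuming the effort scales linearly between the two thresholds as the redis.conf comments describe; the real scheduler has more state than this. `defrag_cpu_percent` is a hypothetical helper, and `frag_pct` here means the allocator-level fragmentation percentage Redis uses for this decision, which is not the same number as mem_fragmentation_ratio.

```python
def defrag_cpu_percent(frag_pct, frag_bytes, *,
                       ignore_bytes=100 * 1024 ** 2,
                       threshold_lower=10, threshold_upper=100,
                       cycle_min=1, cycle_max=25):
    """Approximate defrag CPU effort (percent) for a fragmentation level.

    Returns 0.0 when active defrag would not start at all.
    Defaults mirror the redis.conf values shown above.
    """
    # Both conditions must hold before defrag kicks in.
    if frag_pct < threshold_lower or frag_bytes < ignore_bytes:
        return 0.0
    # Linear interpolation from (lower, cycle_min) to (upper, cycle_max).
    span = threshold_upper - threshold_lower
    effort = cycle_min + (frag_pct - threshold_lower) * (cycle_max - cycle_min) / span
    # Clamp so fragmentation above the upper threshold never exceeds cycle_max.
    return max(cycle_min, min(cycle_max, effort))


for pct in (5, 10, 55, 100, 300):
    print(pct, defrag_cpu_percent(pct, frag_bytes=500 * 1024 ** 2))
```

The practical reading: threshold-lower and ignore-bytes decide whether defrag runs, while threshold-upper and the two cycle values decide how hard it works once it does.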
How Active Defrag Works
Before defrag:
┌─────────────────────────────────────────┐
│ [A][ hole ][B][hole][C][ hole ] │ Arena 1
│ [D][hole][E][ hole ][F] │ Arena 2
└─────────────────────────────────────────┘
Defrag process:
1. Scan for fragmented values
2. Allocate new memory for value
3. Copy data to new location
4. Update pointer atomically
5. Free old memory
After defrag:
┌─────────────────────────────────────────┐
│ [A][B][C][D][E][F] │ Arena 1 (compacted)
│ (returned to OS) │ Arena 2 (freed)
└─────────────────────────────────────────┘
2. Memory Allocator Tuning
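# jemalloc background thread for memory return
# Set in redis.conf or on the command line
# Option 1: Enable jemalloc background threads (Redis 6.0+, on by default)
redis-server --jemalloc-bg-thread yes
# Option 2: Force memory return to OS
# MEMORY PURGE command (Redis 4.0+, jemalloc builds only)
redis-cli MEMORY PURGE
# Option 3: Tune jemalloc decay time
# Lower = faster memory return, higher CPU
# Redis's bundled jemalloc is built with the je_ prefix, so it typically reads JE_MALLOC_CONF;
# a system jemalloc built without a prefix reads MALLOC_CONF instead
export JE_MALLOC_CONF="background_thread:true,dirty_decay_ms:1000,muzzy_decay_ms:1000"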
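Before scheduling MEMORY PURGE anywhere, it is worth measuring what it actually gives back. A minimal sketch, assuming a redis-py client (its `info()` and `memory_purge()` methods are real; `purge_and_measure` and `settle_seconds` are names I made up). The pause is there because Redis samples RSS periodically, so the INFO value can lag the purge.

```python
import time


def purge_and_measure(client, settle_seconds=1.0):
    """Run MEMORY PURGE and report how much RSS it released.

    `client` is anything with redis-py's info() and memory_purge()
    methods, e.g. redis.Redis(host="localhost", port=6379).
    """
    before = client.info("memory")["used_memory_rss"]
    client.memory_purge()
    # Give Redis a moment to refresh its sampled RSS figure.
    time.sleep(settle_seconds)
    after = client.info("memory")["used_memory_rss"]
    return {"rss_before": before, "rss_after": after, "released": before - after}
```

Usage would look like `purge_and_measure(redis.Redis(host="localhost", port=6379))`. If `released` stays near zero while the fragmentation ratio is high, the waste sits in partially used pages that a purge cannot release, and active defrag is the tool that applies.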
3. Uniform Value Sizes
import json

# Bad: Variable sizes cause fragmentation
r.set("user:1", json.dumps(small_user))  # 200B
r.set("user:2", json.dumps(large_user))  # 50KB

# Better: Pad to power-of-2 sizes
def pad_value(value, target_size=None):
    data = json.dumps(value)
    if target_size is None:
        # Round up to nearest power of 2
        size = len(data)
        target_size = 1 << (size - 1).bit_length()
    # Readers must strip the trailing '\0' padding before json.loads
    return data.ljust(target_size, '\0')

# Or use separate Redis instances for different size classes
# small_redis: values < 1KB
# large_redis: values >= 1KB
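Padding is not free: it trades unpredictable fragmentation for guaranteed extra bytes on every value. A rough sketch of that cost, using the same rounding rule as `pad_value`; the helper names `padded_size` and `padding_overhead` are hypothetical, and the sample sizes are made up for illustration.

```python
def padded_size(n):
    """Round n (>= 1) up to the next power of two, as pad_value does."""
    return 1 << (n - 1).bit_length()


def padding_overhead(sizes):
    """Fraction of extra bytes stored when every value is padded."""
    raw = sum(sizes)
    padded = sum(padded_size(n) for n in sizes)
    return (padded - raw) / raw


sample = [200, 600, 1025]  # illustrative value sizes in bytes
print([padded_size(n) for n in sample])
print(f"{padding_overhead(sample):.0%} extra bytes")
```

Values just above a power of two are the worst case, since they nearly double. Running this over a sample of your real value sizes tells you whether padding costs less than the fragmentation it prevents.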
4. Kubernetes Memory Configuration
# Don't set memory limit = maxmemory!
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: redis
      resources:
        requests:
          memory: "4Gi"
        limits:
          memory: "6Gi"  # 50% headroom for fragmentation!
      env:
        - name: REDIS_MAXMEMORY
          value: "4gb"
---
# Redis config
apiVersion: v1
kind: ConfigMap
data:
  redis.conf: |
    maxmemory 4gb
    maxmemory-policy allkeys-lru
    activedefrag yes
    active-defrag-threshold-lower 10
    active-defrag-cycle-max 25
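The sizing arithmetic behind those numbers, as a small sketch. Both helper names are hypothetical, and the model is deliberately crude: it treats RSS as maxmemory times the fragmentation ratio and ignores other consumers such as client buffers or fork copy-on-write during persistence.

```python
GIB = 1024 ** 3


def container_limit(maxmemory_bytes, headroom=0.5):
    """Memory limit that leaves `headroom` (a fraction of maxmemory) for RSS overhead."""
    return int(maxmemory_bytes * (1 + headroom))


def safe_maxmemory(limit_bytes, expected_ratio):
    """Largest maxmemory whose RSS stays under the limit at the given fragmentation ratio."""
    return int(limit_bytes / expected_ratio)


print(container_limit(4 * GIB) / GIB)       # limit for maxmemory 4gb with 50% headroom
print(safe_maxmemory(8 * GIB, 2.0) / GIB)   # maxmemory that survives a 2.0 ratio in 8Gi
```

The second function is the one to use when the container size is fixed: plug in the worst ratio you have observed and work backwards to maxmemory.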
Monitoring
Prometheus Metrics
# Redis exporter metrics
# Metric names depend on the exporter and its version; check them against your /metrics output
- alert: RedisHighFragmentation
  expr: |
    redis_memory_fragmentation_ratio > 1.5
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "Redis fragmentation ratio {{ $value }}"
    description: "Consider enabling activedefrag"
- alert: RedisFragmentationCritical
  expr: |
    redis_memory_fragmentation_ratio > 2.0
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "Redis fragmentation critical: {{ $value }}"
    description: "OOM risk - RSS much higher than used_memory"
- alert: RedisRSSNearLimit
  expr: |
    redis_memory_used_rss_bytes / on(instance)
    (container_spec_memory_limit_bytes) > 0.85
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Redis RSS at {{ $value | humanizePercentage }} of limit"
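To sanity-check the thresholds against numbers you already have, the same three conditions can be written as a plain function. This is only a sketch that mirrors the rules above; it ignores the `for:` durations, and `firing_alerts` is a hypothetical name.

```python
def firing_alerts(ratio, rss_bytes, limit_bytes):
    """Return (alert name, severity) pairs whose threshold is exceeded."""
    alerts = []
    if ratio > 1.5:
        alerts.append(("RedisHighFragmentation", "warning"))
    if ratio > 2.0:
        alerts.append(("RedisFragmentationCritical", "critical"))
    if rss_bytes / limit_bytes > 0.85:
        alerts.append(("RedisRSSNearLimit", "critical"))
    return alerts


GIB = 1024 ** 3
# The opening incident: ratio 1.5 exactly, RSS 6GB in a hypothetical 6Gi container.
print(firing_alerts(1.5, 6 * GIB, 6 * GIB))
```

Note that a ratio of exactly 1.5 does not trip the fragmentation warning; in that scenario only the RSS-versus-limit rule fires, which is why that third alert matters most.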
Grafana Dashboard Queries
# Fragmentation ratio over time
redis_memory_fragmentation_ratio
# Memory breakdown
redis_memory_used_bytes
redis_memory_used_rss_bytes
redis_memory_used_peak_bytes
# Active defrag stats
redis_active_defrag_running
redis_active_defrag_hits
redis_active_defrag_misses
redis_active_defrag_key_hits
Debugging Commands
# Check current fragmentation
redis-cli INFO memory | grep -E "(used_memory|fragmentation)"
# Memory doctor (Redis 4.0+)
redis-cli MEMORY DOCTOR
# Example output (paraphrased; the real wording differs):
# "Sam, I have a few reports for you:
# * Peak memory: 6.2GB, RSS: 8.1GB, Fragmentation: 1.31
# * High fragmentation: Consider enabling activedefrag"
# Check allocator stats
redis-cli MEMORY MALLOC-STATS
redis-cli MEMORY STATS
# Enable active defrag at runtime (no restart needed)
redis-cli CONFIG SET activedefrag yes
# Force memory return (careful in production)
redis-cli MEMORY PURGE
Prevention Strategies
Key Design
# Avoid mixing tiny and huge values
# Bad
r.set("config", "true") # 4 bytes
r.set("user:session:123", huge_json) # 100KB
# Better: Use appropriate data structures
r.hset("config", "feature_x", "true") # Hash for small values
r.set("session:123", compressed_data) # Compress large values
# Use consistent TTLs per key type
SESSION_TTL = 3600 # 1 hour for all sessions
CACHE_TTL = 300 # 5 min for all cache entries
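The `compressed_data` above has to come from somewhere. One minimal way to produce it, assuming JSON-serializable values and the standard library's zlib; `pack` and `unpack` are hypothetical helper names, and other codecs would work equally well.

```python
import json
import zlib


def pack(value, level=6):
    """Serialize to compact JSON, then compress for storage."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def unpack(blob):
    """Reverse of pack(): decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Illustrative payload: repetitive session-like data compresses well.
session = {"items": [{"id": i, "status": "active"} for i in range(500)]}
blob = pack(session)
print(len(json.dumps(session)), "->", len(blob), "bytes")
```

You would store it with `r.set("session:123", pack(session))` and read it back with `unpack(r.get("session:123"))`. Compression narrows the spread between the smallest and largest values, which is what matters for fragmentation; it costs CPU on every read and write, and tiny values may not shrink at all.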
Architecture Patterns
Pattern: Size-based sharding
┌─────────────────────────────────────────┐
│ Application │
└───────────────┬─────────────────────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐
│ Small │ │ Medium│ │ Large │
│ <1KB │ │ 1-10KB│ │ >10KB │
│ Redis │ │ Redis │ │ Redis │
└───────┘ └───────┘ └───────┘
Each instance has uniform value sizes = minimal fragmentation
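The routing decision in the diagram is a few lines of code. A sketch, assuming the three size bands shown above; `pick_instance` and the instance labels are hypothetical.

```python
KB = 1024


def pick_instance(value):
    """Choose an instance label from the serialized value's size in bytes."""
    size = len(value)
    if size < 1 * KB:
        return "small"
    if size <= 10 * KB:
        return "medium"
    return "large"


print(pick_instance(b"x" * 200))        # small
print(pick_instance(b"x" * 5 * KB))     # medium
print(pick_instance(b"x" * 50 * KB))    # large
```

The catch is on the read path: a reader does not know the value's size before fetching it. In practice the band is usually fixed per key type (sessions here, rendered pages there) so both writers and readers can derive the instance from the key alone.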
Checklist
## Redis Memory Fragmentation Prevention
### Configuration
- [ ] Enable activedefrag in redis.conf
- [ ] Set threshold-lower to 10%
- [ ] Set cycle-max to 25% (adjust based on CPU budget)
- [ ] Configure maxmemory with 30-50% headroom to container limit
### Monitoring
- [ ] Alert on fragmentation_ratio > 1.5
- [ ] Alert on RSS approaching container limit
- [ ] Dashboard showing used_memory vs RSS
### Key Design
- [ ] Use consistent value sizes where possible
- [ ] Compress large values before storing
- [ ] Use appropriate data structures (hashes for small values)
### Operations
- [ ] Schedule MEMORY PURGE during low traffic (if not using activedefrag)
- [ ] Monitor activedefrag hits/misses
- [ ] Consider size-based sharding for extreme cases
Conclusion
Redis memory management is a perfect example of how abstractions can be misleading. maxmemory sounds like it controls how much memory Redis uses. It doesn’t. It controls how much data Redis stores. The actual memory usage—what the OOM killer sees—includes the data plus all the overhead from the memory allocator’s internal bookkeeping, fragmentation, and reserved-but-unused space.
The core insight is that mem_fragmentation_ratio is the metric that reveals the truth. A ratio of 1.0 means RSS equals used_memory—perfect efficiency, rarely achieved. A ratio of 1.5 means you’re using 50% more memory than your data size. A ratio of 2.0 or higher is critical—you’re approaching OOM territory even though Redis thinks it has plenty of room.
Active defragmentation is the solution Redis provides, but it has costs. Defragmentation consumes CPU while it copies data to compact the memory layout. For latency-sensitive workloads, you might prefer to simply size your container with enough headroom that fragmentation doesn’t cause OOM. The choice depends on whether you’re optimizing for cost (smaller containers with defrag) or latency (larger containers without).
Key principles:
- Fragmentation ratio reveals real memory overhead—RSS / used_memory tells you the actual cost
- Variable-sized keys with TTL cause the worst fragmentation—different-sized holes left at different times
- Active defragmentation compacts memory automatically—enable it unless you’re latency-sensitive
- Set container limits 30-50% above maxmemory—leave room for the overhead you can’t avoid
- Monitor mem_fragmentation_ratio continuously: it’s your leading indicator of OOM risk
Your next OOM might be hiding in the gap between used_memory and RSS. Check the fragmentation ratio now.