Java OOMKilled With Stable Heap: Native Memory, Direct Buffers, and glibc Arenas
The heap was fine; the pod still died.
“Heap is stable at 2GB, container limit is 4GB, but we keep getting OOMKilled.” This was the ticket that landed on my desk from our payments team. They’d done everything by the book: set -Xmx well below the container limit, monitored GC pauses, kept heap usage under control. Yet their service kept dying every few hours, always killed by the kernel's OOM killer, never with an OutOfMemoryError from the JVM.
The missing memory turned out to be the silent killer of many Java applications in containers: native memory. Direct buffers from Netty, thread stacks from an unbounded executor, Metaspace growth from class loading, and glibc’s tendency to hold onto freed memory were collectively consuming an extra 2GB that never appeared in any heap metric.
This problem is particularly frustrating because Java developers are trained to watch the heap. We set -Xmx, we monitor garbage collection, we profile with VisualVM or JFR. None of those tools prominently show native memory consumption. You can have a perfectly tuned heap while your process silently balloons toward the container limit.
Environment: Java 17, Spring Boot, Kubernetes with 4GB container limit, Netty-based HTTP client
The Problem
Heap Looks Perfect
The symptoms are maddening. Every metric you know to check looks fine:
# JVM metrics say everything is fine
Heap: 2GB / 2.5GB (-Xmx2560m)
GC: G1, no full GCs, pause times <50ms
Metaspace: 150MB (stable)
# But container keeps dying
kubectl describe pod my-app
# Last State: OOMKilled
# dmesg on node shows:
# memory cgroup out of memory: Killed process 12345 (java)
The JVM is happy. GC is happy. Heap usage is within limits. Yet Linux keeps killing your process. This is your first clue that the memory problem isn’t in the heap at all.
Where’s the Memory?
Let me break down where memory actually goes in a typical Java application:
Container memory budget (4GB):

What you configured:
├── -Xmx2560m (Heap max)                 = 2.5 GB
└── Expected buffer                      = 1.5 GB (seems plenty!)

What's actually used:
├── Heap (actual)                        = 2.2 GB
├── Metaspace                            = 150 MB
├── Thread stacks (500 threads × 1MB)    = 500 MB
├── Direct buffers                       = 800 MB ← Hidden!
├── JIT code cache                       = 240 MB
├── Native memory (JNI/Unsafe)           = 200 MB ← Hidden!
└── glibc arena overhead                 = 400 MB ← Hidden!
                                           ─────────
Total:                                   = 4.5 GB
Container limit:                           4 GB → OOMKilled!
The “hidden” items are the killers. They don’t show up in standard heap metrics, yet they consume real memory that counts against your cgroup limit. When the sum exceeds your container’s allocation, Linux’s OOM killer terminates your process—and the JVM never sees it coming because it was never close to an OutOfMemoryError from a Java perspective.
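To make the arithmetic in the breakdown explicit, here is a minimal sketch that adds up the same illustrative figures and compares them with the 4GB limit. The class and method names are hypothetical, and the numbers are the estimates from the breakdown, not measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sums the example breakdown above and compares it with the container limit. */
final class MemoryBudget {

    /** The illustrative per-component figures from the breakdown, in MB (estimates, not measurements). */
    static Map<String, Long> exampleComponentsMb() {
        Map<String, Long> parts = new LinkedHashMap<>();
        parts.put("Heap (actual)", 2200L);
        parts.put("Metaspace", 150L);
        parts.put("Thread stacks", 500L);
        parts.put("Direct buffers", 800L);
        parts.put("JIT code cache", 240L);
        parts.put("Native (JNI/Unsafe)", 200L);
        parts.put("glibc arena overhead", 400L);
        return parts;
    }

    /** Adds up all components. */
    static long totalMb(Map<String, Long> componentsMb) {
        long total = 0;
        for (long mb : componentsMb.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        long limitMb = 4096; // 4GB container limit
        Map<String, Long> parts = exampleComponentsMb();
        parts.forEach((name, mb) -> System.out.printf("%-22s %5d MB%n", name, mb));
        long total = totalMb(parts);
        System.out.printf("%-22s %5d MB (limit %d MB, over by %d MB)%n",
                "Total", total, limitMb, total - limitMb);
    }
}
```

Swapping in figures from your own service turns this into a quick sanity check before you pick a container limit.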
Understanding Native Memory
Before diving into solutions, let’s understand each component of native memory consumption.
Direct Buffers (Off-Heap)
Direct buffers are Java’s mechanism for allocating memory outside the garbage-collected heap. They’re essential for high-performance I/O because they avoid copying data between Java’s heap and native memory when communicating with the operating system.
// Netty, gRPC, and many libraries use direct buffers
ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
// These are NOT counted in heap!
// Default limit: roughly the max heap size (-Xmx), counted in addition to the heap
// Check current direct buffer usage:
// Via JMX: java.nio:type=BufferPool,name=direct
// Common sources:
// - Netty PooledByteBufAllocator
// - gRPC message buffers
// - NIO file operations
// - Compression libraries
The problem is that many libraries use direct buffers aggressively. Netty, the networking library underlying Spring WebFlux, gRPC, and many other frameworks, pools direct buffers for performance. A busy Netty server might hold hundreds of megabytes of direct buffers even when idle, waiting for reuse.
By default, the JVM caps direct buffer allocation at roughly the maximum heap size. So with -Xmx2560m, you could theoretically allocate another 2.5GB of direct buffers on top of the heap, well beyond what a 4GB container can hold. The default is derived from the heap ceiling, not from whatever is left in the container after the heap.
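One way to see that direct buffers live outside the heap is to allocate one and watch the JVM's "direct" buffer pool grow. The following is a small sketch using the standard `BufferPoolMXBean`; the class name, method names, and the 64MB demo size are arbitrary choices for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Shows a direct allocation being accounted in the "direct" buffer pool rather than the heap. */
final class DirectBufferDemo {

    /** Bytes currently held by the "direct" buffer pool, or -1 if the pool is not exposed. */
    static long directPoolBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Allocates one direct buffer and returns how much the direct pool grew. */
    static long growthAfterAllocating(int bytes) {
        long before = directPoolBytes();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes);
        long after = directPoolBytes();
        buffer.put(0, (byte) 1); // use the buffer so it stays reachable until after the second reading
        return after - before;
    }

    public static void main(String[] args) {
        int size = 64 * 1024 * 1024; // 64MB, an arbitrary demo size
        System.out.println("Max heap (bytes):           " + Runtime.getRuntime().maxMemory());
        System.out.println("Direct pool growth (bytes): " + growthAfterAllocating(size));
    }
}
```

The growth shows up in the buffer pool but not in heap usage, which is why heap dashboards stay flat while RSS climbs.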
glibc Memory Arenas
This is perhaps the sneakiest source of memory bloat, and it’s not even Java’s fault. When your JVM allocates native memory (through JNI, Unsafe, or even internal operations), it uses the system’s C library allocator—typically glibc on Linux.
# glibc creates arenas for multi-threaded apps
# Each arena can hold onto freed memory
# Default arenas = 8 × CPU cores
# With 8 cores = 64 arenas
# Each arena can retain megabytes of freed memory
# Memory appears "freed" to Java
# But glibc hasn't returned it to OS
# RSS keeps growing!
# Check arena settings:
cat /proc/$(pgrep java)/environ | tr '\0' '\n' | grep MALLOC
# Often not set, using defaults
glibc’s memory allocator creates separate “arenas” for different threads to reduce lock contention. This is great for performance, but each arena maintains its own free list. When memory is freed, it often stays in the arena rather than being returned to the operating system. With many threads (common in Java applications), you can have dozens of arenas, each holding onto megabytes of “freed” memory.
I’ve seen cases where glibc arena fragmentation accounted for 30-40% of a process’s RSS. The memory was technically freed from Java’s perspective, but glibc was still holding it, and Linux counted it against the cgroup limit.
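As a rough illustration of how the arena count scales, this sketch computes the default cap (8 per core on 64-bit glibc) and the cap in effect when `MALLOC_ARENA_MAX` is set. The class and method names are hypothetical. Note the assumption: it uses the CPU count the JVM reports, while glibc counts CPUs its own way, so the two can differ in a CPU-limited container.

```java
/** Estimates how many malloc arenas glibc may create for this process. */
final class GlibcArenaEstimate {

    /** glibc's default arena cap on 64-bit Linux: 8 per CPU core. */
    static int defaultArenaMax(int cores) {
        return 8 * cores;
    }

    /** The cap in effect: MALLOC_ARENA_MAX if it is a positive integer, otherwise the default. */
    static int effectiveArenaMax(String mallocArenaMaxEnv, int cores) {
        if (mallocArenaMaxEnv != null) {
            try {
                int value = Integer.parseInt(mallocArenaMaxEnv.trim());
                if (value > 0) {
                    return value;
                }
            } catch (NumberFormatException ignored) {
                // Fall through to the default when the variable is not a number.
            }
        }
        return defaultArenaMax(cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors(); // assumption: stands in for glibc's CPU count
        String env = System.getenv("MALLOC_ARENA_MAX");
        System.out.println("CPU cores seen by the JVM: " + cores);
        System.out.println("MALLOC_ARENA_MAX:          " + (env == null ? "(not set)" : env));
        System.out.println("Arena cap in effect:       " + effectiveArenaMax(env, cores));
    }
}
```

Logging this once at startup makes it obvious whether a pod is running with the default or with an explicit cap.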
Thread Stack Accumulation
Every Java thread requires its own stack for method call frames, local variables, and other execution context. The default stack size is typically 1MB per thread.
# Each thread reserves ~1MB of stack by default
# 500 threads = up to 500MB outside heap
# Count threads:
jcmd <pid> Thread.print | grep "^\"" | wc -l
# Common causes of thread explosion:
# - Blocking I/O without proper pools
# - Unbounded executor services
# - One-thread-per-request patterns
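A service with 500 threads reserves up to 500MB just for stacks, half a gigabyte that never appears in heap metrics. Linux only counts the stack pages a thread has actually touched toward RSS, so the resident cost is usually lower than the reservation, but deep call chains push it toward the full figure. Thread count can creep up gradually: a slow dependency causes requests to queue, each waiting request holds a thread, and before you know it you have a thread explosion that eats the remaining headroom.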
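The thread count can also be read from inside the process. Below is a minimal sketch that multiplies the live thread count by an assumed stack size to get an upper bound on stack reservation. The class name and the 1024KB default are assumptions; substitute your actual -Xss value. `ThreadMXBean` counts Java threads only, so JVM-internal threads such as GC and compiler threads are not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Estimates the upper bound of memory reserved for Java thread stacks. */
final class ThreadStackEstimate {

    /** One stack per thread: threads × stack size, converted from KB to MB. */
    static long reservedStackMb(int threadCount, long stackSizeKb) {
        return threadCount * stackSizeKb / 1024;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long stackSizeKb = 1024; // assumption: the common 1MB default; use your -Xss value if set
        int live = threads.getThreadCount();
        int peak = threads.getPeakThreadCount();
        System.out.println("Live Java threads: " + live
                + " (up to " + reservedStackMb(live, stackSizeKb) + "MB of stack)");
        System.out.println("Peak Java threads: " + peak
                + " (up to " + reservedStackMb(peak, stackSizeKb) + "MB of stack)");
    }
}
```

The peak figure is the more useful one for sizing, since a brief thread explosion is what pushes the process over the limit.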
Other Native Memory Consumers
Beyond the big three, several other sources consume native memory:
- Metaspace: Stores class metadata. Usually stable, but can grow with dynamic class loading (e.g., many Groovy scripts, heavy reflection).
- Code Cache: JIT-compiled code lives here. The default reserved size is 240MB with tiered compilation; that is a ceiling, and committed usage is often well below it, though large applications can fill it.
- GC Data Structures: The garbage collector needs its own memory for tracking objects. G1 uses about 5-10% of heap size for metadata.
- Internal JVM Structures: Symbol tables, string intern pools, and other JVM internals.
Diagnosis
Step 1: Native Memory Tracking
Java’s built-in Native Memory Tracking (NMT) is the most valuable tool for understanding where memory goes:
# Enable NMT (requires JVM restart)
java -XX:NativeMemoryTracking=summary -jar app.jar
# Get report:
jcmd <pid> VM.native_memory summary
# Output shows:
# Total: reserved=5GB, committed=4.2GB
#
# - Java Heap: 2560MB
# - Thread: 524MB (500 threads)
# - Code: 245MB
# - GC: 180MB
# - Internal: 156MB
# - Symbol: 32MB
# - Native Memory Tracking: 12MB
# - Arena Chunk: 1MB
# - Other: 820MB ← Here it is! (JDK 11+ reports direct buffers under "Other"; there is no separate "Direct Buffer" category)
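The NMT report breaks down memory by category, making it easy to see which area is consuming more than expected. NMT only covers allocations made by the JVM itself: memory retained by glibc arenas and allocations made directly by native libraries do not appear in it, so a gap between NMT's committed total and the process RSS points at those. NMT adds roughly 5-10% performance overhead, so enable it while debugging rather than leaving it on permanently in production.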
You can also compare memory over time:
# Create baseline
jcmd <pid> VM.native_memory baseline
# Later, compare to baseline
jcmd <pid> VM.native_memory summary.diff
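If you want to chart NMT categories over time, the text report has to be parsed. Here is a rough sketch that extracts the committed size per category from summary output saved to a string. It assumes category lines of the form `-  Name (reserved=NNNKB, committed=NNNKB)`, which is what recent JDKs print at the default KB scale; the exact layout can vary between JDK versions. The class name is hypothetical and the sample figures in `main` are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts committed sizes per category from "jcmd <pid> VM.native_memory summary" text. */
final class NmtSummaryParser {

    // Assumed line shape: "-   Java Heap (reserved=2621440KB, committed=2621440KB)"
    private static final Pattern CATEGORY = Pattern.compile(
            "^-\\s+(.+?) \\(reserved=(\\d+)KB, committed=(\\d+)KB\\)\\s*$", Pattern.MULTILINE);

    /** Returns category name -> committed KB, in report order. */
    static Map<String, Long> committedKbByCategory(String report) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher matcher = CATEGORY.matcher(report);
        while (matcher.find()) {
            result.put(matcher.group(1).trim(), Long.parseLong(matcher.group(3)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample shaped like NMT output, for demonstration only.
        String sample = String.join("\n",
                "Total: reserved=5242880KB, committed=4404019KB",
                "-                 Java Heap (reserved=2621440KB, committed=2621440KB)",
                "-                    Thread (reserved=536576KB, committed=536576KB)",
                "-                     Other (reserved=839680KB, committed=839680KB)");
        committedKbByCategory(sample).forEach((name, kb) ->
                System.out.printf("%-12s %6d MB committed%n", name, kb / 1024));
    }
}
```

Feeding it a report captured every few minutes gives a per-category time series, which makes slow growth much easier to spot than a single snapshot.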
Step 2: Check Direct Buffers
Direct buffer usage can be queried programmatically via JMX:
// Via JMX programmatically
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class BufferPools {
    public static void main(String[] args) {
        // One bean per pool, e.g. "direct" and "mapped"
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ": "
                    + pool.getMemoryUsed() / 1024 / 1024 + "MB used, "
                    + pool.getCount() + " buffers");
        }
    }
}
// Example output from the affected service:
// direct: 820MB used, 12543 buffers
// mapped: 0MB used, 0 buffers
If you see hundreds of megabytes in direct buffers, that’s likely a significant contributor to your memory problem. The buffer count can also reveal leaks—if you’re accumulating thousands of small buffers, something might not be releasing them properly.
Step 3: Monitor RSS vs Heap
The relationship between RSS (actual memory usage) and heap tells you whether you have an off-heap problem:
# Compare container RSS with heap
# RSS = actual memory used by process
# Get RSS (in KB):
cat /proc/$(pgrep java)/status | grep VmRSS
# Get heap via jstat:
jstat -gc <pid>
# If RSS >> Heap, you have off-heap consumption
If your heap is 2GB but RSS is 3.5GB, that 1.5GB difference is native memory. Track this ratio over time—a growing gap indicates a native memory leak.
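The same comparison can be made from inside the JVM, which is handy when you cannot exec into the pod. This is a Linux-only sketch that reads `VmRSS` from `/proc/self/status` and prints it next to the heap figures from `MemoryMXBean`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Compares the process RSS with heap usage to reveal off-heap consumption. */
final class RssVsHeap {

    /** Extracts the VmRSS value in kB from /proc/<pid>/status lines, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // The line looks like "VmRSS:\t  123456 kB"; keep only the digits.
                String digits = line.replaceAll("[^0-9]", "");
                return digits.isEmpty() ? -1 : Long.parseLong(digits);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Path.of("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status not available (not Linux?)");
            return;
        }
        long rssKb = parseVmRssKb(Files.readAllLines(status));
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long heapCommittedKb = heap.getCommitted() / 1024;
        System.out.println("RSS:            " + rssKb / 1024 + " MB");
        System.out.println("Heap used:      " + heap.getUsed() / 1024 / 1024 + " MB");
        System.out.println("Heap committed: " + heapCommittedKb / 1024 + " MB");
        System.out.println("RSS - heap committed: " + (rssKb - heapCommittedKb) / 1024 + " MB");
    }
}
```

Comparing against committed heap rather than used heap gives a more conservative estimate of the off-heap share, because committed heap pages that were touched once stay resident even after a GC.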
The Fix
Option 1: Limit Direct Memory
The most immediate fix is to explicitly cap direct buffer allocation:
# Explicitly limit direct buffer allocation
java -XX:MaxDirectMemorySize=256m -Xmx2560m -jar app.jar
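# Total = Heap + DirectMemory + Metaspace + Threads + Overhead
# 2560m + 256m + 256m + 500m + 500m = 4072m, just under the 4096m limit
# (only ~24m of headroom, so lower -Xmx if any of these estimates is optimistic)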
This prevents runaway direct buffer allocation. If your application tries to allocate more, it will get an OutOfMemoryError with a clear message about direct buffers—much better than a mysterious OOMKill.
Choose the limit based on your application’s needs. Network-heavy services (lots of concurrent connections, large payloads) need more. CPU-bound services with minimal I/O can use less.
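To confirm which cap a running JVM actually uses, you can read the flag back. The sketch below relies on the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`, so it is an assumption that your JVM is HotSpot-based. A flag value of 0 means the option was not set, in which case the JVM falls back to roughly the maximum heap size. The class and method names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Reports the direct-memory cap in effect for this JVM (HotSpot only). */
final class DirectMemoryLimit {

    /** A flag value of 0 means "not set": the cap then defaults to roughly the max heap. */
    static long effectiveLimitBytes(long flagValue, long maxHeapBytes) {
        return flagValue > 0 ? flagValue : maxHeapBytes;
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        long flag = Long.parseLong(hotspot.getVMOption("MaxDirectMemorySize").getValue());
        long limit = effectiveLimitBytes(flag, Runtime.getRuntime().maxMemory());
        System.out.println("MaxDirectMemorySize flag: " + (flag == 0 ? "(not set)" : flag + " bytes"));
        System.out.println("Effective direct cap:     " + limit / 1024 / 1024 + " MB");
    }
}
```

Running it with and without `-XX:MaxDirectMemorySize=256m` shows the difference between an explicit cap and the heap-sized default.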
Option 2: Tame glibc Arenas
Reducing glibc arena count dramatically reduces memory fragmentation:
# Kubernetes deployment
env:
  - name: MALLOC_ARENA_MAX
    value: "2"  # Limit to 2 arenas instead of 8×cores
  # Or use jemalloc/tcmalloc instead
  - name: LD_PRELOAD
    value: "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"
Setting MALLOC_ARENA_MAX=2 is often the single most impactful change for reducing native memory overhead. It may slightly increase lock contention for memory allocation, but in practice, the impact on throughput is negligible for most applications.
Alternatively, use jemalloc or tcmalloc instead of glibc’s allocator. These allocators are designed for multi-threaded applications and have better memory return behavior. Many organizations use jemalloc as their standard for JVM containers.
Option 3: Use Container-Aware JVM Settings
Modern JVMs (Java 10+) are container-aware, but you should verify and configure them properly:
# Java 17+ is container-aware by default
# But verify the limits are detected:
java -XX:+PrintFlagsFinal -version | grep -E "(MaxHeapSize|MaxRAM)"
# If running in container:
java -XX:MaxRAMPercentage=75 -jar app.jar
# Uses 75% of container memory for heap
# Leaves 25% for off-heap
Using MaxRAMPercentage instead of fixed -Xmx values adapts to container size changes. When you resize your container, the heap automatically adjusts.
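The 75% figure is a common starting point for services with little off-heap usage, since it leaves 25% for everything outside the heap. For a Netty-heavy service like the one in this article it is too high: 75% of 4GB is a 3GB heap, more than the 2.5GB heap that was already being OOMKilled. Some organizations use 60% or less for Netty-heavy applications that need significant direct buffer space.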
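It helps to turn a percentage into absolute numbers before settling on one. The following sketch approximates the heap size and the remaining non-heap budget for a given container limit. The class and method names are hypothetical, and the real JVM may round the heap to its own alignment, so treat the result as an estimate.

```java
/** Approximates heap and non-heap budgets for a given -XX:MaxRAMPercentage. */
final class HeapFromPercentage {

    /** Heap ceiling in MB: container limit × percentage (integer arithmetic, rounds down). */
    static long heapMb(long containerLimitMb, int maxRamPercentage) {
        return containerLimitMb * maxRamPercentage / 100;
    }

    /** What is left for direct buffers, stacks, Metaspace, code cache, and allocator overhead. */
    static long nonHeapMb(long containerLimitMb, int maxRamPercentage) {
        return containerLimitMb - heapMb(containerLimitMb, maxRamPercentage);
    }

    public static void main(String[] args) {
        long limitMb = 4096; // the 4GB limit used throughout this article
        for (int pct : new int[] {75, 60, 50}) {
            System.out.printf("MaxRAMPercentage=%d -> heap %d MB, non-heap budget %d MB%n",
                    pct, heapMb(limitMb, pct), nonHeapMb(limitMb, pct));
        }
        // What this JVM actually chose, for comparison:
        System.out.println("Runtime.maxMemory(): "
                + Runtime.getRuntime().maxMemory() / 1024 / 1024 + " MB");
    }
}
```

For the 4GB container, 75% leaves about 1GB for everything outside the heap, which is less than half of the roughly 2.3GB of off-heap usage in the breakdown earlier.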
Option 4: Monitor and Alert
Set up monitoring to catch memory divergence before it causes OOMKills:
# Prometheus alert for memory divergence
groups:
  - name: java-memory
    rules:
      - alert: JavaNativeMemoryLeak
        # Both sides are summed per pod so the series match one-to-one.
        # Assumes both metrics carry a "pod" label; adjust to your relabeling.
        expr: |
          sum by (pod) (container_memory_working_set_bytes{container="my-app"})
            - sum by (pod) (jvm_memory_used_bytes{area="heap"})
            > 1500000000
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Off-heap memory growing: {{ $value | humanize }}"
      - alert: DirectBufferHigh
        expr: |
          jvm_buffer_memory_used_bytes{id="direct"} > 500000000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Direct buffer usage > 500MB"
The first alert catches any significant gap between container memory and heap—a sign of native memory consumption. The second specifically tracks direct buffer usage.
Option 5: Configure Netty Buffer Pools
If Netty is a significant memory consumer, tune its allocator:
# Spring Boot application.properties
spring.netty.leak-detection=paranoid

// Limit Netty's pooled allocator (set before any Netty class loads, or pass as -D flags)
System.setProperty("io.netty.allocator.maxOrder", "9"); // 8KB page << 9 = 4MB chunks; older Netty versions default to 11 (16MB)
// Or use unpooled allocator (slower but no fragmentation)
System.setProperty("io.netty.allocator.type", "unpooled");
The maxOrder setting controls chunk size in Netty’s pooled allocator. Smaller chunks mean less memory waste but slightly more allocation overhead. For memory-constrained environments, this trade-off is usually worth it.
Enabling leak detection (paranoid mode) helps identify buffer leaks in development. Don’t run this in production—it has significant overhead—but it’s invaluable for finding the source of buffer accumulation.
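The relationship between maxOrder and chunk size is a simple shift: Netty's pooled allocator sizes each chunk as the page size shifted left by maxOrder. Here is a small sketch of that arithmetic, assuming Netty's default 8KB page size (`io.netty.allocator.pageSize`). The class name is hypothetical and the code does not depend on Netty itself.

```java
/** Computes the chunk size of Netty's pooled allocator from pageSize and maxOrder. */
final class NettyChunkSize {

    /** chunkSize = pageSize << maxOrder, i.e. pageSize × 2^maxOrder. */
    static long chunkSizeBytes(int pageSizeBytes, int maxOrder) {
        return (long) pageSizeBytes << maxOrder;
    }

    public static void main(String[] args) {
        int pageSize = 8192; // assumption: Netty's default page size
        for (int maxOrder : new int[] {9, 11}) {
            System.out.printf("maxOrder=%d -> %d MB per chunk%n",
                    maxOrder, chunkSizeBytes(pageSize, maxOrder) / 1024 / 1024);
        }
    }
}
```

Each arena allocates whole chunks, so the chunk size multiplied by the number of arenas gives a floor on how much direct memory a lightly loaded pooled allocator can pin.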
Checklist
## Java Native Memory OOM
### Symptoms
- [ ] Container OOMKilled but heap looks stable
- [ ] RSS >> heap size
- [ ] GC metrics look healthy
- [ ] Happens gradually over time
### Diagnosis
- [ ] Enable NMT: -XX:NativeMemoryTracking=summary
- [ ] Check direct buffer usage via JMX
- [ ] Count threads: jcmd <pid> Thread.print
- [ ] Compare RSS vs heap
### Fixes
- [ ] Set -XX:MaxDirectMemorySize
- [ ] Set MALLOC_ARENA_MAX=2
- [ ] Use -XX:MaxRAMPercentage for heap sizing
- [ ] Limit Netty buffer pool sizes
- [ ] Monitor off-heap vs container limit
Conclusion
The lesson after years of debugging these issues: Java memory ≠ heap. Direct buffers, thread stacks, Metaspace, JIT code cache, and native allocator overhead can easily consume 50%+ of your container’s memory without appearing in any heap metric.
Here’s my rule of thumb for sizing Java containers:
| Component | Percentage of Container Limit |
|---|---|
| Heap | 50-60% |
| Direct Memory | 10-15% |
| Thread Stacks | 10-15% |
| Metaspace, Code Cache, GC | 15-20% |
With a 4GB container, that means:
- Heap: 2-2.4GB
- Direct: 400-600MB
- Threads: 400-600MB
- Other: 600-800MB
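Because the table gives ranges, it is worth checking that the shares you pick still fit inside the container. The following sketch converts percentage shares to megabytes and sums them; the class and method names are hypothetical. The low ends of the table add up to 85% and the high ends to 110%, so taking the top of every range at once would overshoot the limit.

```java
/** Converts rule-of-thumb percentage shares into MB and checks that they fit. */
final class ContainerSizing {

    /** Share of the container limit in MB (integer arithmetic, rounds down). */
    static long shareMb(long containerLimitMb, int percent) {
        return containerLimitMb * percent / 100;
    }

    /** Sum of the chosen percentage shares; anything above 100 cannot fit. */
    static int totalPercent(int... shares) {
        int total = 0;
        for (int share : shares) {
            total += share;
        }
        return total;
    }

    public static void main(String[] args) {
        long limitMb = 4096;
        // Order: heap, direct memory, thread stacks, Metaspace + code cache + GC.
        int[] lowEnds = {50, 10, 10, 15};
        int[] highEnds = {60, 15, 15, 20};
        System.out.println("Low ends:  " + totalPercent(lowEnds) + "% of the limit");
        System.out.println("High ends: " + totalPercent(highEnds) + "% of the limit");
        System.out.println("Heap at 50%: " + shareMb(limitMb, 50) + " MB");
        System.out.println("Heap at 60%: " + shareMb(limitMb, 60) + " MB");
    }
}
```

In practice this means a service that needs the high end for one component has to take the low end somewhere else, or get a bigger container.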
Enable NMT in your next debugging session and look at where memory actually goes. The numbers might surprise you—and they’ll definitely help you set more accurate container limits.
Related Articles
- Kubernetes OOM Killer Memory Limits - Container memory accounting
- Java Profiling in Hardened K8s - Debugging in production