Java OOMKilled With Stable Heap: Native Memory, Direct Buffers, and glibc Arenas

The heap was fine; the pod still died. “Heap is stable at 2GB, container limit is 4GB, but we keep getting OOMKilled.” This was the ticket that landed on my desk from our payments team. They’d done everything right by the book—set -Xmx well below the container limit, monitored GC pauses, kept heap usage under control. Yet their service kept dying every few hours, always with an OOM from the kernel, never from the JVM.

The missing memory turned out to be the silent killer of many Java applications in containers: native memory. Direct buffers from Netty, thread stacks from an unbounded executor, Metaspace growth from class loading, and glibc’s tendency to hold onto freed memory were collectively consuming an extra 2GB that never appeared in any heap metric.

This problem is particularly frustrating because Java developers are trained to watch the heap. We set -Xmx, we monitor garbage collection, we profile with VisualVM or JFR. None of those tools prominently show native memory consumption. You can have a perfectly tuned heap while your process silently balloons toward the container limit.

Environment: Java 17, Spring Boot, Kubernetes with 4GB container limit, Netty-based HTTP client

The Problem

Heap Looks Perfect

The symptoms are maddening. Every metric you know to check looks fine:

# JVM metrics say everything is fine
Heap: 2GB / 2.5GB (-Xmx2560m)
GC: G1, no full GCs, pause times <50ms
Metaspace: 150MB (stable)

# But container keeps dying
kubectl describe pod my-app
# Last State: OOMKilled

# dmesg on node shows:
# memory cgroup out of memory: Killed process 12345 (java)

The JVM is happy. GC is happy. Heap usage is within limits. Yet Linux keeps killing your process. This is your first clue that the memory problem isn’t in the heap at all.

Where’s the Memory?

Let me break down where memory actually goes in a typical Java application:

Container memory budget (4GB):

What you configured:
├── -Xmx2560m (Heap max)              = 2.5 GB
└── Expected buffer                   = 1.5 GB (seems plenty!)

What's actually used:
├── Heap (actual)                     = 2.2 GB
├── Metaspace                         = 150 MB
├── Thread stacks (500 threads × 1MB) = 500 MB
├── Direct buffers                    = 800 MB  ← Hidden!
├── JIT code cache                    = 240 MB
├── Native memory (JNI/Unsafe)        = 200 MB  ← Hidden!
└── glibc arena overhead              = 400 MB  ← Hidden!
                                      ─────────
Total:                                = 4.5 GB

Container limit: 4 GB → OOMKilled!

The “hidden” items are the killers. They don’t show up in standard heap metrics, yet they consume real memory that counts against your cgroup limit. When the sum exceeds your container’s allocation, Linux’s OOM killer terminates your process—and the JVM never sees it coming because it was never close to an OutOfMemoryError from a Java perspective.
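The arithmetic behind that breakdown is worth keeping as a small script you can re-run with your own numbers. This is a minimal sketch: the class and method names are made up for illustration, and it treats the 2.2 GB heap figure as 2200 MB and the 4 GB limit as 4096 MB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MemoryBudget {
    /** Sums component sizes given in megabytes. */
    static long totalMb(Map<String, Long> componentsMb) {
        long total = 0;
        for (long mb : componentsMb.values()) {
            total += mb;
        }
        return total;
    }

    /** Figures from the breakdown above (2.2 GB heap treated as 2200 MB). */
    static Map<String, Long> articleExample() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("heap (actual)", 2200L);
        m.put("metaspace", 150L);
        m.put("thread stacks", 500L);
        m.put("direct buffers", 800L);
        m.put("JIT code cache", 240L);
        m.put("native (JNI/Unsafe)", 200L);
        m.put("glibc arena overhead", 400L);
        return m;
    }

    public static void main(String[] args) {
        long limitMb = 4096; // 4 GB container limit, assumed to mean 4096 MB
        long total = totalMb(articleExample());
        System.out.printf("total=%d MB, limit=%d MB, over by %d MB%n",
                total, limitMb, total - limitMb);
    }
}
```

Swapping in the committed values from your own measurements shows quickly whether the process fits under the limit, and by how much.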

Understanding Native Memory

Before diving into solutions, let’s understand each component of native memory consumption.

Direct Buffers (Off-Heap)

Direct buffers are Java’s mechanism for allocating memory outside the garbage-collected heap. They’re essential for high-performance I/O because they avoid copying data between Java’s heap and native memory when communicating with the operating system.

// Netty, gRPC, and many libraries use direct buffers
ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);

// These are NOT counted in heap!
// Default limit: same as -Xmx, but additive

// Check current direct buffer usage:
// Via JMX: java.nio:type=BufferPool,name=direct

// Common sources:
// - Netty PooledByteBufAllocator
// - gRPC message buffers
// - NIO file operations
// - Compression libraries

The problem is that many libraries use direct buffers aggressively. Netty, the networking library underlying Spring WebFlux, gRPC, and many other frameworks, pools direct buffers for performance. A busy Netty server might hold hundreds of megabytes of direct buffers even when idle, waiting for reuse.

By default, the JVM caps direct buffer allocation at roughly the same value as -Xmx, and that cap applies in addition to the heap. So with -Xmx2560m, you could theoretically allocate another 2.5GB of direct buffers, for about 5GB in total against a 4GB container limit. Nothing in the JVM checks that heap plus direct memory fits inside the cgroup.
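To see that direct allocations really do land outside the heap, a small experiment helps. The sketch below allocates one 16MB direct buffer and reads the JVM's "direct" buffer pool before and after; the class name and the 16MB size are arbitrary choices for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectBufferDemo {
    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if not found. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Allocates off-heap memory; only a small wrapper object lives on the heap. */
    static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer buffer = allocate(16 * 1024 * 1024);
        long after = directBytesUsed();
        System.out.printf("direct pool grew by %d KB for a %d KB buffer%n",
                (after - before) / 1024, buffer.capacity() / 1024);
    }
}
```

Heap usage barely moves during this run, which is exactly why heap dashboards miss this kind of growth.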

glibc Memory Arenas

This is perhaps the sneakiest source of memory bloat, and it’s not even Java’s fault. When your JVM allocates native memory (through JNI, Unsafe, or even internal operations), it uses the system’s C library allocator—typically glibc on Linux.

# glibc creates arenas for multi-threaded apps
# Each arena can hold onto freed memory

# Default arenas = 8 × CPU cores
# With 8 cores = 64 arenas
# Each arena can retain megabytes of freed memory

# Memory appears "freed" to Java
# But glibc hasn't returned it to OS
# RSS keeps growing!

# Check arena settings:
cat /proc/$(pgrep java)/environ | tr '\0' '\n' | grep MALLOC

# Often not set, using defaults

glibc’s memory allocator creates separate “arenas” for different threads to reduce lock contention. This is great for performance, but each arena maintains its own free list. When memory is freed, it often stays in the arena rather than being returned to the operating system. With many threads (common in Java applications), you can have dozens of arenas, each holding onto megabytes of “freed” memory.

I’ve seen cases where glibc arena fragmentation accounted for 30-40% of a process’s RSS. The memory was technically freed from Java’s perspective, but glibc was still holding it, and Linux counted it against the cgroup limit.
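A quick way to reason about how many arenas a process may end up with is to compute the cap yourself. This is a rough sketch with hypothetical helper names. It uses the 8 × cores default described above and honours MALLOC_ARENA_MAX when it is set. One caveat: the JVM's core count respects the container CPU limit, while glibc, as far as I know, counts the node's CPUs, so the real cap inside a container can be higher than this prints.

```java
final class ArenaEstimate {
    /** glibc's default arena cap on 64-bit systems: 8 arenas per core. */
    static int defaultArenaMax(int cores) {
        return 8 * cores;
    }

    /** Uses MALLOC_ARENA_MAX when it is a positive integer, otherwise the default cap. */
    static int effectiveArenaMax(String envValue, int cores) {
        if (envValue != null) {
            try {
                int value = Integer.parseInt(envValue.trim());
                if (value > 0) {
                    return value;
                }
            } catch (NumberFormatException ignored) {
                // Not a number: fall through to the default cap.
            }
        }
        return defaultArenaMax(cores);
    }

    public static void main(String[] args) {
        // Container-aware in the JVM; glibc itself may see more cores than this.
        int cores = Runtime.getRuntime().availableProcessors();
        String env = System.getenv("MALLOC_ARENA_MAX");
        System.out.printf("cores=%d, MALLOC_ARENA_MAX=%s, arena cap=%d%n",
                cores, env, effectiveArenaMax(env, cores));
    }
}
```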

Thread Stack Accumulation

Every Java thread requires its own stack for method call frames, local variables, and other execution context. The default stack size is typically 1MB per thread.

# Each thread reserves ~1MB of stack by default
# 500 threads = 500MB outside heap

# Count threads (lines starting with a quote are thread headers):
jcmd <pid> Thread.print | grep -c '^"'

# Common causes of thread explosion:
# - Blocking I/O without proper pools
# - Unbounded executor services
# - One-thread-per-request patterns

A service with 500 threads consumes 500MB just for stacks—half a gigabyte that never appears in heap metrics. Thread count can creep up gradually: a slow dependency causes requests to queue, each waiting request holds a thread, and before you know it you have a thread explosion that consumes all available memory.
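One way to keep thread stacks inside a known budget is to replace unbounded executors with a bounded pool. The following is a minimal sketch, not the payments team's actual fix: the pool size of 50, the queue capacity of 100, and the caller-runs rejection policy are all assumptions you would tune for your workload.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ThreadStackBudget {
    /** Upper-bound estimate of stack memory: threads × stack size (as set by -Xss). */
    static long estimateStackMb(int threads, int stackKb) {
        return (long) threads * stackKb / 1024;
    }

    /** A pool that can never exceed maxThreads; overflow work runs on the caller. */
    static ThreadPoolExecutor boundedPool(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure instead of new threads
        pool.allowCoreThreadTimeOut(true); // let idle threads (and their stacks) go away
        return pool;
    }

    public static void main(String[] args) {
        int live = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.printf("live threads=%d, stack estimate=%d MB%n",
                live, estimateStackMb(live, 1024));

        ThreadPoolExecutor pool = boundedPool(50, 100);
        System.out.printf("pool capped at %d threads, at most %d MB of stacks%n",
                pool.getMaximumPoolSize(), estimateStackMb(pool.getMaximumPoolSize(), 1024));
        pool.shutdown();
    }
}
```

With a hard cap, a slow dependency shows up as queueing and back-pressure rather than as hundreds of new stacks.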

Other Native Memory Consumers

Beyond the big three, several other sources consume native memory:

Metaspace: Stores class metadata. Usually stable, but can grow with dynamic class loading (e.g., many Groovy scripts, heavy reflection).

Code Cache: JIT-compiled code lives here. With tiered compilation the JVM reserves up to 240MB by default (-XX:ReservedCodeCacheSize), and large applications can fill most of that reservation.

GC Data Structures: The garbage collector needs its own memory for tracking objects. G1 uses about 5-10% of heap size for metadata.

Internal JVM Structures: Symbol tables, string intern pools, and other JVM internals.

Diagnosis

Step 1: Native Memory Tracking

Java’s built-in Native Memory Tracking (NMT) is the most valuable tool for understanding where memory goes:

# Enable NMT (requires JVM restart)
java -XX:NativeMemoryTracking=summary -jar app.jar

# Get report:
jcmd <pid> VM.native_memory summary

# Output shows:
# Total: reserved=5GB, committed=4.2GB
#
# - Java Heap: 2560MB
# - Thread: 524MB (500 threads)
# - Code: 245MB
# - GC: 180MB
# - Internal: 156MB
# - Symbol: 32MB
# - Native Memory Tracking: 12MB
# - Arena Chunk: 1MB
# - Other: 820MB  ← Here it is! Direct buffers are reported under "Other" on JDK 11+ ("Internal" on JDK 8)

The NMT report breaks down memory by category, making it easy to identify which area is consuming unexpected amounts. Note that NMT adds roughly 5-10% performance overhead, so enable it while debugging memory issues rather than leaving it on permanently in production.
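If you collect NMT reports regularly, it helps to turn them into numbers you can sort or graph. The sketch below parses the per-category lines of a summary report as printed at the default KB scale. The sample text is illustrative rather than a captured report, the class name is invented, and the regex would need adjusting if you run jcmd with a different scale.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NmtSummaryParser {
    // Matches lines such as: "-   Java Heap (reserved=2621440KB, committed=2621440KB)"
    private static final Pattern CATEGORY = Pattern.compile(
            "^-\\s+(.+?) \\(reserved=(\\d+)KB, committed=(\\d+)KB\\)\\s*$");

    /** Illustrative sample, not real output from the service in this article. */
    static final String SAMPLE = String.join("\n",
            "Total: reserved=3997696KB, committed=3997696KB",
            "-                 Java Heap (reserved=2621440KB, committed=2621440KB)",
            "                            (mmap: reserved=2621440KB, committed=2621440KB)",
            "-                    Thread (reserved=536576KB, committed=536576KB)",
            "-                     Other (reserved=839680KB, committed=839680KB)");

    /** Returns committed KB per NMT category, in report order. */
    static Map<String, Long> committedKbByCategory(String report) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : report.split("\\R")) {
            Matcher m = CATEGORY.matcher(line);
            if (m.matches()) {
                result.put(m.group(1).trim(), Long.parseLong(m.group(3)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print categories from largest to smallest committed size.
        committedKbByCategory(SAMPLE).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.printf("%-12s %6d MB%n", e.getKey(), e.getValue() / 1024));
    }
}
```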

You can also compare memory over time:

# Create baseline
jcmd <pid> VM.native_memory baseline

# Later, compare to baseline
jcmd <pid> VM.native_memory summary.diff

Step 2: Check Direct Buffers

Direct buffer usage can be queried programmatically via JMX:

// Via JMX programmatically
import java.lang.management.ManagementFactory;
import java.lang.management.BufferPoolMXBean;

for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
    System.out.println(pool.getName() + ": " +
        pool.getMemoryUsed() / 1024 / 1024 + "MB used, " +
        pool.getCount() + " buffers");
}

// Output:
// direct: 820MB used, 12543 buffers
// mapped: 0MB used, 0 buffers

If you see hundreds of megabytes in direct buffers, that’s likely a significant contributor to your memory problem. The buffer count can also reveal leaks—if you’re accumulating thousands of small buffers, something might not be releasing them properly.

Step 3: Monitor RSS vs Heap

The relationship between RSS (actual memory usage) and heap tells you whether you have an off-heap problem:

# Compare container RSS with heap
# RSS = actual memory used by process

# Get RSS (in KB):
cat /proc/$(pgrep java)/status | grep VmRSS

# Get heap via jstat:
jstat -gc <pid>

# If RSS >> Heap, you have off-heap consumption

If your heap is 2GB but RSS is 3.5GB, that 1.5GB difference is native memory. Track this ratio over time—a growing gap indicates a native memory leak.
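The same comparison can be made from inside the application, which is handy for logging the gap periodically. This is a rough, Linux-only sketch: it reads VmRSS from /proc/self/status and compares it with the heap figures from the standard MemoryMXBean. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class RssVsHeap {
    /** Extracts the VmRSS value in KB from /proc/<pid>/status lines, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Path.of("/proc/self/status"); // exists on Linux only
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status not available on this platform");
            return;
        }
        long rssMb = parseVmRssKb(Files.readAllLines(status)) / 1024;
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long heapUsedMb = heap.getUsed() / (1024 * 1024);
        long heapCommittedMb = heap.getCommitted() / (1024 * 1024);
        System.out.printf("RSS=%d MB, heap used=%d MB, heap committed=%d MB, gap=%d MB%n",
                rssMb, heapUsedMb, heapCommittedMb, rssMb - heapCommittedMb);
    }
}
```

Comparing against committed heap rather than used heap gives a steadier gap, since used heap swings with every GC cycle.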

The Fix

Option 1: Limit Direct Memory

The most immediate fix is to explicitly cap direct buffer allocation:

# Explicitly limit direct buffer allocation
java -XX:MaxDirectMemorySize=256m -Xmx2560m -jar app.jar

# Total = Heap + DirectMemory + Metaspace + Threads + Overhead
# 2560m + 256m + 256m + 500m + 500m = ~4GB

This prevents runaway direct buffer allocation. If your application tries to allocate more, it will get an OutOfMemoryError with a clear message about direct buffers—much better than a mysterious OOMKill.

Choose the limit based on your application’s needs. Network-heavy services (lots of concurrent connections, large payloads) need more. CPU-bound services with minimal I/O can use less.

Option 2: Tame glibc Arenas

Reducing glibc arena count dramatically reduces memory fragmentation:

# Kubernetes deployment
env:
  - name: MALLOC_ARENA_MAX
    value: "2"  # Limit to 2 arenas instead of 8×cores

  # Or use jemalloc/tcmalloc instead
  - name: LD_PRELOAD
    value: "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"

Setting MALLOC_ARENA_MAX=2 is often the single most impactful change for reducing native memory overhead. It may slightly increase lock contention for memory allocation, but in practice, the impact on throughput is negligible for most applications.

Alternatively, use jemalloc or tcmalloc instead of glibc’s allocator. These allocators are designed for multi-threaded applications and have better memory return behavior. Many organizations use jemalloc as their standard for JVM containers.

Option 3: Use Container-Aware JVM Settings

Modern JVMs (Java 10+) are container-aware, but you should verify and configure them properly:

# Java 17+ is container-aware by default
# But verify the limits are detected:

java -XX:+PrintFlagsFinal -version | grep -E "(MaxHeapSize|MaxRAM)"

# If running in container:
java -XX:MaxRAMPercentage=75 -jar app.jar
# Uses 75% of container memory for heap
# Leaves 25% for off-heap

Using MaxRAMPercentage instead of fixed -Xmx values adapts to container size changes. When you resize your container, the heap automatically adjusts.

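The 75% figure is a common starting point, but it assumes modest off-heap usage: on a 4GB container it yields a 3GB heap and leaves only 1GB for all the non-heap memory we've discussed. Some organizations use 60% for Netty-heavy applications that need significant direct buffer space.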
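To make the percentage concrete, here is a small sketch of what a given MaxRAMPercentage leaves for off-heap memory. The helper names are hypothetical and the 4096MB limit is an assumption matching the 4GB container in this article. The last line prints the max heap the running JVM actually picked, which is a quick check that the container limit was detected.

```java
final class HeapPercentage {
    /** Heap size in MB that a given MaxRAMPercentage yields for a container limit. */
    static long heapMb(long containerLimitMb, int maxRamPercentage) {
        return containerLimitMb * maxRamPercentage / 100;
    }

    /** What remains for direct buffers, stacks, Metaspace, code cache and malloc overhead. */
    static long offHeapBudgetMb(long containerLimitMb, int maxRamPercentage) {
        return containerLimitMb - heapMb(containerLimitMb, maxRamPercentage);
    }

    public static void main(String[] args) {
        long limitMb = 4096; // assumed 4 GB container
        for (int pct : new int[] {75, 60, 50}) {
            System.out.printf("MaxRAMPercentage=%d -> heap=%d MB, off-heap budget=%d MB%n",
                    pct, heapMb(limitMb, pct), offHeapBudgetMb(limitMb, pct));
        }
        long effectiveMaxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.printf("this JVM's effective max heap: %d MB%n", effectiveMaxHeapMb);
    }
}
```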

Option 4: Monitor and Alert

Set up monitoring to catch memory divergence before it causes OOMKills:

# Prometheus alert for memory divergence
groups:
  - name: java-memory
    rules:
      - alert: JavaNativeMemoryLeak
        expr: |
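          # Assumes both metrics carry a "pod" label; adjust to your scrape config.
          # Heap usage is summed because it is reported per memory pool.
          sum by (pod) (container_memory_working_set_bytes{container="my-app"})
          - on (pod)
          sum by (pod) (jvm_memory_used_bytes{area="heap"}) > 1500000000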
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Off-heap memory growing: {{ $value | humanize }}"

      - alert: DirectBufferHigh
        expr: |
          jvm_buffer_memory_used_bytes{id="direct"} > 500000000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Direct buffer usage > 500MB"

The first alert catches any significant gap between container memory and heap—a sign of native memory consumption. The second specifically tracks direct buffer usage.

Option 5: Configure Netty Buffer Pools

If Netty is a significant memory consumer, tune its allocator:

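# Spring Boot application.properties (development only)
spring.netty.leak-detection=paranoid

// Limit Netty's pooled allocator. Set this before any Netty class loads,
// or pass it as -Dio.netty.allocator.maxOrder=9 on the command line.
// 4MB chunks (8KB page << 9) vs 16MB with maxOrder=11, the default in older Netty 4.1 releases
System.setProperty("io.netty.allocator.maxOrder", "9");

// Or use unpooled allocator (slower but no fragmentation)
System.setProperty("io.netty.allocator.type", "unpooled");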

The maxOrder setting controls chunk size in Netty’s pooled allocator. Smaller chunks mean less memory waste but slightly more allocation overhead. For memory-constrained environments, this trade-off is usually worth it.
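The relationship between maxOrder and chunk size is simple enough to compute by hand: the pooled allocator's chunk size is the page size shifted left by maxOrder. The sketch below assumes Netty's default 8KB page size (io.netty.allocator.pageSize) and does not depend on Netty itself; the class name is invented.

```java
final class NettyChunkSize {
    /** Assumed default for io.netty.allocator.pageSize. */
    static final int DEFAULT_PAGE_SIZE = 8192;

    /** Chunk size in bytes: pageSize << maxOrder. */
    static long chunkBytes(int pageSize, int maxOrder) {
        return (long) pageSize << maxOrder;
    }

    public static void main(String[] args) {
        for (int maxOrder : new int[] {8, 9, 11}) {
            System.out.printf("maxOrder=%d -> %d MB chunks%n",
                    maxOrder, chunkBytes(DEFAULT_PAGE_SIZE, maxOrder) / (1024 * 1024));
        }
    }
}
```

Each arena allocates memory a whole chunk at a time, so the chunk size sets the granularity at which pooled direct memory grows.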

Enabling leak detection (paranoid mode) helps identify buffer leaks in development. Don’t run this in production—it has significant overhead—but it’s invaluable for finding the source of buffer accumulation.

Checklist

## Java Native Memory OOM

### Symptoms
- [ ] Container OOMKilled but heap looks stable
- [ ] RSS >> heap size
- [ ] GC metrics look healthy
- [ ] Happens gradually over time

### Diagnosis
- [ ] Enable NMT: -XX:NativeMemoryTracking=summary
- [ ] Check direct buffer usage via JMX
- [ ] Count threads: jcmd <pid> Thread.print
- [ ] Compare RSS vs heap

### Fixes
- [ ] Set -XX:MaxDirectMemorySize
- [ ] Set MALLOC_ARENA_MAX=2
- [ ] Use -XX:MaxRAMPercentage for heap sizing
- [ ] Limit Netty buffer pool sizes
- [ ] Monitor off-heap vs container limit

Conclusion

The lesson after years of debugging these issues: Java memory ≠ heap. Direct buffers, thread stacks, Metaspace, JIT code cache, and native allocator overhead can easily consume 50%+ of your container’s memory without appearing in any heap metric.

Here’s my rule of thumb for sizing Java containers:

| Component                 | Percentage of Container Limit |
|---------------------------|-------------------------------|
| Heap                      | 50-60%                        |
| Direct Memory             | 10-15%                        |
| Thread Stacks             | 10-15%                        |
| Metaspace, Code Cache, GC | 15-20%                        |

With a 4GB container, that means:

  • Heap: 2-2.4GB
  • Direct: 400-600MB
  • Threads: 400-600MB
  • Other: 600-800MB
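The rule of thumb translates directly into a small calculator for other container sizes. This is a sketch with invented names that takes the percentage ranges listed above as given. Keep in mind that the upper ends of the ranges add up to 110%, so you cannot take the top of every range at once.

```java
final class ContainerSizing {
    /** Returns {lowMb, highMb} for a percentage range of the container limit. */
    static long[] rangeMb(long limitMb, int lowPct, int highPct) {
        return new long[] {limitMb * lowPct / 100, limitMb * highPct / 100};
    }

    private static void print(String name, long[] range) {
        System.out.printf("%-26s %4d-%4d MB%n", name, range[0], range[1]);
    }

    public static void main(String[] args) {
        long limitMb = 4096; // 4 GB container; change to match your limit
        print("Heap", rangeMb(limitMb, 50, 60));
        print("Direct Memory", rangeMb(limitMb, 10, 15));
        print("Thread Stacks", rangeMb(limitMb, 10, 15));
        print("Metaspace, Code Cache, GC", rangeMb(limitMb, 15, 20));
    }
}
```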

Enable NMT in your next debugging session and look at where memory actually goes. The numbers might surprise you—and they’ll definitely help you set more accurate container limits.


Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Java OOMKilled With Stable Heap: Native Memory, Direct Buffers, and glibc Arenas". https://www.michal-drozd.com/en/blog/java-native-memory-oomkilled/ (Published January 20, 2025).