RSS Contracts: Stop OOMKilled Java Pods in Kubernetes by Testing RSS as an API
Use cgroup RSS budgets, CI sampling, and runtime headroom to catch JVM memory regressions before they hit production.
4 posts
Queue looks healthy until deployment, then messages_unacknowledged explodes, memory spikes, and redelivery storms start. The culprit: your prefetch is too high and nobody tested actual ack behavior.
All partitions look balanced in testing, then production traffic arrives and one partition melts. The culprit: your partition key has terrible cardinality and nobody noticed until now.
Producer upgraded Protobuf, consumer still on old version. No errors, no warnings—just silent data loss in production. Your schema evolution broke backward compatibility and CI didn't notice.