Kubernetes Graceful Shutdown as a Contract: Zero 502s During Rollouts (HTTP + gRPC)
If you’ve ever rolled out a Deployment and watched:
- a burst of 502/504 from an ingress,
- ECONNRESET / “connection reset by peer” in clients,
- gRPC UNAVAILABLE spikes,
- and then everything “stabilizes”…
…you already know the uncomfortable truth: “graceful shutdown” is not a boolean feature. It’s a contract between:
- the client (keepalive, retries, connection reuse),
- the LB/ingress/sidecar (draining behavior),
- Kubernetes endpoint propagation (EndpointSlice → kube-proxy),
- and your application (SIGTERM handling, refusing new work, finishing in-flight work).
This post is a production-minded, reproducible approach to make rollouts boring by design.
Tested on: Kubernetes 1.27–1.30, NGINX Ingress and Envoy-based proxies, Go HTTP servers and gRPC services.
What “graceful shutdown” must guarantee
Define a Drain Contract with explicit invariants:
- Stop new traffic first (from Kubernetes routing)
- Stop accepting new work (inside the process)
- Finish or cancel in-flight work within a bounded time
- Only then exit, before Kubernetes sends SIGKILL
If any of these is missing, you get rollout errors even if you handle SIGTERM.
How Kubernetes termination actually plays out
When a Pod is terminated, Kubernetes (simplified) does:
- Marks the Pod with a deletion timestamp.
- Runs each container’s preStop hook (if configured).
- Sends SIGTERM to containers.
- Waits until terminationGracePeriodSeconds has elapsed. The clock starts when termination begins, so time spent in preStop counts against it.
- Sends SIGKILL if still running.
Separately (and importantly), the moment traffic actually stops arriving depends on:
- readiness state and controllers updating EndpointSlices,
- and kube-proxy / dataplane propagation delays,
- and client connection reuse (keepalive pools).
This means: your process might still be receiving requests after termination started, unless you intentionally drain.
The core idea: readiness-driven draining
The simplest reliable pattern is:
- Your app exposes a readiness endpoint that returns not ready once draining starts.
- On termination, you flip the app into draining mode before stopping the server.
You can trigger draining via:
- a preStop hook calling a local endpoint (recommended for consistency),
- or handling SIGTERM and toggling a drain flag immediately (also fine).
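If both triggers are wired up, they will usually both fire for the same termination. One way to keep that harmless is an idempotent trigger. This is a sketch, with `drainer` and `Begin` as hypothetical names rather than anything from a library:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// drainer makes the drain trigger idempotent: the preStop call and SIGTERM
// may both arrive, but drain work must start only once.
type drainer struct {
	once     sync.Once
	draining atomic.Bool
	started  chan struct{} // closed when draining begins
}

func newDrainer() *drainer {
	return &drainer{started: make(chan struct{})}
}

// Begin flips the flag and signals waiters; repeated calls are no-ops.
func (d *drainer) Begin() {
	d.once.Do(func() {
		d.draining.Store(true)
		close(d.started)
	})
}

func main() {
	d := newDrainer()
	d.Begin() // e.g. from the /admin/drain handler
	d.Begin() // e.g. from the SIGTERM handler; safe to repeat
	<-d.started
	fmt.Println("draining:", d.draining.Load()) // prints "draining: true"
}
```

The readiness handler reads `d.draining`, and the shutdown goroutine waits on `d.started`, so whichever trigger arrives first starts the clock.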
Drain budget math (don’t guess)
You need a grace period large enough for:
grace >= endpoint_propagation + drain_delay + worst_case_request_time + safety_margin
Where:
- endpoint_propagation: time for EndpointSlice update + dataplane to stop routing
- drain_delay: a small wait after becoming NotReady (to let routing converge)
- worst_case_request_time: your real upper bound (or enforced deadline)
- safety_margin: buffer for jitter
You don’t need perfect numbers. You need measured numbers.
Reference implementation: Kubernetes YAML
Below is a minimal but production-grade Pod contract for HTTP or gRPC services.
1) Readiness probe (must reflect draining)
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 2
  timeoutSeconds: 1
  failureThreshold: 1
Key points:
- Keep the probe quick.
- failureThreshold: 1 makes readiness react fast (don’t do this for liveness unless you want restart storms).
2) preStop hook: trigger draining
lifecycle:
  preStop:
    httpGet:
      path: /admin/drain
      port: 8080
And then set terminationGracePeriodSeconds big enough:
terminationGracePeriodSeconds: 60
Where does the “wait” happen? Prefer the app to own the drain timing:
- /admin/drain flips draining mode and starts a timer.
- The process shuts down only after the drain delay and after attempting graceful stop.
Why not sleep in preStop?
- Sleeping outside the app doesn’t stop the app from accepting new requests.
- You lose observability and control.
Reference implementation: app-level draining (Go examples)
You can implement this in any language/runtime. Here’s the logic.
HTTP server: stop accepting new connections, finish in-flight requests
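// Minimal sketch: readiness depends on a drain flag, and SIGTERM triggers shutdown with a hard deadline.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

var draining atomic.Bool

func readyHandler(w http.ResponseWriter, r *http.Request) {
	if draining.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func drainHandler(w http.ResponseWriter, r *http.Request) {
	draining.Store(true)
	w.WriteHeader(http.StatusOK)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health/ready", readyHandler)
	mux.HandleFunc("/admin/drain", drainHandler)
	srv := &http.Server{Addr: ":8080", Handler: mux}
	sigterm := make(chan os.Signal, 1)
	signal.Notify(sigterm, syscall.SIGTERM)
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-sigterm
		draining.Store(true)
		// Allow time for routing to converge before closing.
		time.Sleep(5 * time.Second)
		ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
		defer cancel()
		_ = srv.Shutdown(ctx)
	}()
	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatalf("listen: %v", err)
	}
	// ListenAndServe returns as soon as Shutdown is called; wait for in-flight requests to finish.
	<-done
}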
Important details:
- Readiness flips first.
- You wait a short “routing convergence” period.
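- You shut down with a bounded timeout.
- main waits for Shutdown to return, because ListenAndServe returns as soon as Shutdown is called.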
gRPC server: GracefulStop with a hard cap
gRPC servers can block indefinitely on long streams. You want:
- attempt graceful stop,
- but enforce a max drain time (then force close).
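// Fragment: assumes grpcServer (*grpc.Server), sigterm (a channel registered with
// signal.Notify), and the draining flag from the HTTP example are in scope.
go func() {
	<-sigterm
	draining.Store(true)
	time.Sleep(5 * time.Second) // routing convergence delay
	done := make(chan struct{})
	go func() {
		grpcServer.GracefulStop() // refuses new RPCs, waits for in-flight ones
		close(done)
	}()
	select {
	case <-done:
		// Graceful stop finished
	case <-time.After(45 * time.Second):
		grpcServer.Stop() // hard stop to avoid SIGKILL
	}
}()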
Repro lab: prove it with numbers (before/after)
Don’t ship this as theory. Prove it.
Step 1: Generate load
hey -z 2m -c 50 http://my-service.default.svc.cluster.local/
For gRPC, a common tool is ghz (run it wherever you normally run load).
Step 2: Roll out repeatedly
kubectl rollout restart deploy/my-service
kubectl rollout status deploy/my-service
Step 3: Watch endpoints propagate
kubectl get endpointslices -l kubernetes.io/service-name=my-service -w
What you’re looking for:
- do endpoints drop quickly after draining starts?
- does your load generator still see errors during the window?
Step 4: Compare error rates
Track:
- HTTP 5xx / resets
- gRPC UNAVAILABLE / CANCELLED
- p95/p99 latency during rollout window
If you can’t graph it, at least log it in the load generator output and in app metrics.
Common failure modes (and how to recognize them)
“We have preStop sleep” but we still drop requests
Symptom:
- errors persist
- readiness stays OK during sleep
Cause:
- app continues accepting traffic while the hook sleeps
Fix:
- use readiness-driven drain, not external sleep
Grace period too short → SIGKILL → partial work
Symptom:
- pods exit with SIGKILL
- in-flight requests fail near the end of grace period
Fix:
- compute your drain budget and increase terminationGracePeriodSeconds
- enforce request deadlines so you have a real upper bound
Long-lived streams (gRPC streaming, WebSockets)
Symptom:
- graceful stop takes forever
- pods hit SIGKILL during rollouts
Fix:
- define a max drain window
- enforce server-side stream limits / keepalive policies
- version your clients so they reconnect cleanly
Retries amplify rollouts
Symptom:
- upstream load spikes during rollout
- error rate triggers a retry storm
Fix:
- align retries with deadlines (retry budget)
- bounded retries + backoff
What I’d do in prod
If I had to set a default “production termination contract” today:
- Make readiness reflect draining (never stay Ready while shutting down)
- preStop triggers draining (HTTP call or exec), not sleeping
- Wait a short, measured routing convergence delay
- Shut down servers with bounded timeouts
- Set terminationGracePeriodSeconds using the drain budget formula
- Add a rollout SLO: “error rate during deploy window” must be near zero
- Rehearse it: run the rollout lab on at least one critical service
This turns graceful shutdown from a belief into an enforced contract.
FAQ
Why do I still see errors even though my app handles SIGTERM?
Because routing does not stop instantly: EndpointSlice updates and kube-proxy propagation take time, and reused client connections can keep sending requests briefly.
Should I fail readiness immediately on SIGTERM?
Usually yes—if your readiness check truly represents “safe to receive new requests.” It should become false during draining.
Is preStop: sleep 10 ever acceptable?
Only as a last resort and only if your app also refuses new work immediately. Otherwise it’s just “wait while still accepting traffic.”
What about sidecars/ingress that keep connections open?
Then you must measure where draining happens (ingress/sidecar vs app). The contract stays the same; the draining hop changes.
How do I pick the drain delay (routing convergence time)?
Measure it: watch EndpointSlice changes and observe when new requests stop hitting the terminating pod. Use that as your baseline.
Related reading
- /en/blog/conntrack-stale-nat-mapping/ (deploy 503s that aren’t graceful shutdown)
- /en/blog/kubernetes-ghost-pod-conntrack/ (why traffic can still hit dead Pods)
- /en/blog/k8s-postgresql-connection-storm/ (rollouts as a system-wide event)
Further reading
- https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
- https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
- https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
- https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/
- https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
- https://pkg.go.dev/net/http#Server.Shutdown
- https://pkg.go.dev/google.golang.org/grpc#Server.GracefulStop
- https://grpc.io/docs/what-is-grpc/core-concepts/