Kubernetes APF Starvation: When One Controller Makes kubectl Hang

The symptom pattern is weird the first time you see it:

  • kubectl get pods hangs or times out.
  • Deployments stop progressing (“waiting for rollout” forever).
  • Controllers start logging errors like “Too Many Requests (429)” and retrying.
  • Nodes are fine, etcd is fine, and CPU on the API server isn’t pegged.

I’ve hit this in real clusters more than once. The root cause wasn’t etcd, and it wasn’t “the API server is slow”. It was API Priority and Fairness (APF) doing exactly what it was designed to do — except our configuration made critical control-plane traffic compete with a noisy controller.

Tested on: Kubernetes 1.29–1.31, managed control planes + self-hosted kube-apiserver, Prometheus scraping apiserver metrics.

Why this matters in 2026

APF is no longer an exotic feature. Multi-tenant clusters, GitOps, operators, and “everything is a controller” mean the API server is constantly under load. When APF isn’t isolating traffic correctly, you get a failure mode that looks like “random Kubernetes flakiness” until you graph APF metrics, at which point it becomes obvious.

Incident narrative (anonymized)

I rolled out a new internal controller that watched a CRD and reconciled related resources. A small bug made it do this:

  • list pods cluster-wide
  • list secrets cluster-wide
  • every reconciliation loop
  • with aggressive retries

At the same time, we had previously introduced a custom FlowSchema for “platform automation” that matched most authenticated service accounts and shoved them into a single priority level with low concurrency shares.

Blast radius:

  • GitOps drifted
  • HPA stopped reacting
  • kubectl became unreliable for everyone (including on-call)
  • a few workloads hit pod restarts because their controllers couldn’t update objects

Constraint:

  • I couldn’t “just scale the API server” (managed control plane).
  • I needed an in-cluster mitigation: isolate the noisy controller and protect system traffic.

Timeline (what I actually did)

  • T-0: On-call report: “kubectl is hanging”, rollouts stuck.
  • T+5m: I see a wave of 429s in controller logs and GitOps retries.
  • T+10m: I graph APF rejections and see a clear step change after my controller rollout.
  • T+20m: I find a FlowSchema that acts like a catch-all for most service accounts.
  • T+30m: I scale down the noisy controller (immediate relief).
  • T+45m: I add a FlowSchema + PriorityLevelConfiguration to isolate that controller permanently.
  • T+60m: APF rejections drop to near-zero, control plane stabilizes, kubectl works again.

Mechanism: how APF starvation happens

APF is a traffic router + concurrency limiter

APF matches each request to a FlowSchema, which maps it to a PriorityLevelConfiguration.

That priority level controls:

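  • how many “seats” (concurrent requests) the priority level can hold, set through its concurrency shares (nominalConcurrencyShares in v1beta3 and v1; assuredConcurrencyShares in older beta versions)
  • whether requests are queued or rejected when the level is full (limitResponse.type: Queue vs Reject)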
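To make "shares" concrete, here is a minimal sketch of how shares turn into seats, following the formula in the Kubernetes APF documentation: NominalCL(i) = ceil(ServerCL * NCS(i) / sum(NCS)). The total of 600 seats assumes the default --max-requests-inflight=400 and --max-mutating-requests-inflight=200. The priority level names and share values are hypothetical, not the cluster's real configuration, and borrowing between levels is ignored.

```python
def nominal_concurrency_limits(server_cl, shares):
    """Split the server's total seats across priority levels by share.

    Implements NominalCL(i) = ceil(ServerCL * NCS(i) / sum(NCS)).
    Borrowing between levels (lendablePercent) is not modelled.
    """
    total = sum(shares.values())
    # Integer ceiling division avoids floating-point rounding surprises.
    return {name: -(-server_cl * ncs // total) for name, ncs in shares.items()}


# Assumption: default inflight flags (400 read-only + 200 mutating).
SERVER_CL = 400 + 200

# Hypothetical priority levels and shares, for illustration only.
shares = {
    "system": 30,
    "workload-low": 100,
    "platform-automation": 20,
    "plc-noisy-controller": 5,
}

limits = nominal_concurrency_limits(SERVER_CL, shares)
for name, seats in limits.items():
    print(f"{name}: {seats} seats")
```

The point of the arithmetic: shares are relative. A level with 5 shares is small only because the other levels hold 150 between them, so adding or resizing any level changes the seats of every other level.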

Starvation happens when “everything matches the same flow”

The common misconfig is not “APF is broken”.

It’s:

  • a FlowSchema matches more traffic than you intended (often because it targets system:authenticated broadly)
  • that FlowSchema has higher precedence than you think
  • it maps critical and bulk traffic into the same priority bucket

Once the bucket saturates, clients see:

  • queued latency (kubectl hangs)
  • 429 rejections (controllers retry → even more load)
  • unstable watch/list behavior (more retries, more LISTs, more API churn)

Why retries make APF incidents spiral

429s are “polite”. Many controllers treat them as transient and retry aggressively. That creates a feedback loop:

  1. APF rejects (429)
  2. controller retries → increases request rate
  3. APF rejects more
  4. the control plane becomes unusable for everything sharing that priority level
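The usual client-side way to break this loop is capped exponential backoff with jitter. The sketch below only computes the retry schedule. The function name and default values are illustrative assumptions, not what any particular controller or client library uses.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, jitter=0.0, rng=None):
    """Return the delay in seconds before each of `attempts` retries.

    Each delay doubles up to `cap`. `jitter` adds a random extra of up to
    that fraction of the delay, so jittered values can slightly exceed the cap.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        # Spread retries out so many clients do not retry in lockstep.
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


plain = backoff_delays(8)
jittered = backoff_delays(8, jitter=0.2, rng=random.Random(42))
print(plain)
```

Compared with retrying immediately, eight retries here are spread over roughly three minutes instead of landing within a second or two, which gives the saturated priority level room to drain.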

Runbook: diagnosing APF starvation fast

What to check first

  1. Confirm 429s and timeouts are real and clustered in time
  • Controller logs (GitOps, operators, custom controllers)
  • API client errors in apps that talk to the API (rare but happens)
  2. Check APF rejection metrics. If you scrape kube-apiserver metrics, these are the fastest signals.

PromQL examples (paste as-is; metric labels may vary slightly by distro):

# Overall APF rejections
sum(rate(apiserver_flowcontrol_rejected_requests_total[5m]))

# Who is getting rejected (by priority level)
sum by (priority_level) (rate(apiserver_flowcontrol_rejected_requests_total[5m]))

# Which FlowSchema is rejecting
topk(10, sum by (flow_schema) (rate(apiserver_flowcontrol_rejected_requests_total[5m])))

If you don’t have these metrics, your “first fix” is to enable scraping — but during an incident, you can still proceed with object inspection + log correlation.

  3. Inspect FlowSchemas and precedence. I start here because I’ve been burned by “catch-all” FlowSchemas more than once.
kubectl get flowschemas.flowcontrol.apiserver.k8s.io
kubectl get prioritylevelconfigurations.flowcontrol.apiserver.k8s.io

Then:

kubectl get flowschemas.flowcontrol.apiserver.k8s.io -o yaml | grep -n "matchingPrecedence"

I look for:

  • unusually low matchingPrecedence (lower numbers are matched first, so these schemas win)
  • rules matching broad groups like system:authenticated
  • rules matching “all namespaces, all resources” unintentionally
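The checklist above can be partly automated. This is a rough sketch, assuming you pass it the parsed output of `kubectl get flowschemas -o json`. It only inspects Group subjects, and the set of groups treated as "broad" is my own choice. The sample data is hypothetical.

```python
# Groups that match very large sets of clients (an assumed, non-exhaustive list).
BROAD_GROUPS = {"system:authenticated", "system:unauthenticated", "system:serviceaccounts"}


def broad_flow_schemas(flow_schema_list):
    """Return (precedence, name, priority_level, groups) tuples, lowest precedence first."""
    findings = []
    for fs in flow_schema_list.get("items", []):
        spec = fs.get("spec", {})
        groups = set()
        for rule in spec.get("rules", []):
            for subject in rule.get("subjects", []):
                if subject.get("kind") == "Group":
                    group = subject.get("group", {}).get("name")
                    if group in BROAD_GROUPS:
                        groups.add(group)
        if groups:
            findings.append((
                spec.get("matchingPrecedence"),
                fs["metadata"]["name"],
                spec.get("priorityLevelConfiguration", {}).get("name"),
                sorted(groups),
            ))
    return sorted(findings)


# Hypothetical sample shaped like `kubectl get flowschemas -o json` output.
sample = {"items": [
    {"metadata": {"name": "platform-automation"},
     "spec": {"matchingPrecedence": 1000,
              "priorityLevelConfiguration": {"name": "platform-automation"},
              "rules": [{"subjects": [
                  {"kind": "Group", "group": {"name": "system:authenticated"}}]}]}},
    {"metadata": {"name": "fs-noisy-controller"},
     "spec": {"matchingPrecedence": 2000,
              "priorityLevelConfiguration": {"name": "plc-noisy-controller"},
              "rules": [{"subjects": [
                  {"kind": "ServiceAccount",
                   "serviceAccount": {"name": "noisy-controller", "namespace": "platform"}}]}]}},
]}

findings = broad_flow_schemas(sample)
for precedence, name, level, groups in findings:
    print(precedence, name, level, groups)
```

Against a real cluster, load the input with `json.load(sys.stdin)` and pipe the kubectl output in. Expect the built-in fallback schemas to show up as well, since they match broad groups by design. The ones worth a second look are custom schemas with a low precedence number.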

How to confirm the hypothesis

A. Identify the FlowSchema / priority level under pressure. From metrics, pick the top flow_schema or priority_level involved.

Then open it:

kubectl get flowschema <name> -o yaml
kubectl get prioritylevelconfiguration <name> -o yaml

You want to answer:

  • Who matches this FlowSchema? (service accounts, groups, namespaces)
  • Does it include your controller’s service account?
  • Is queuing enabled, and what are the queue lengths?
  • How many concurrency shares does it get?

B. Correlate with a known noisy client. In my case it was the controller I had just deployed. If it’s not obvious, the fastest way is:

  • temporarily scale down suspected controllers one by one (GitOps, custom operators)
  • watch APF rejection rate drop
  • stop once you have found the offender

This is crude, but it works when you don’t have audit logs or per-client metrics handy.

Safe mitigations (in order)

  1. Scale down / pause the noisy controller. This is the quickest way to break the retry loop.
kubectl -n <ns> scale deploy/<controller> --replicas=0
  2. Lower its client-side QPS/burst. If you own the controller code, this is usually a one-line client-go change. If you don’t, see if it has env/config for QPS.

  3. Isolate it in APF. Create a FlowSchema matching only that controller’s service account, and map it to a low-share priority level.

  4. Protect critical system traffic. If your APF config accidentally down-prioritized system flows, fix precedence and restore sensible shares for system priority levels.
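The client-side QPS/burst limit mentioned above behaves like a token bucket: burst is the bucket size and QPS is the refill rate. client-go exposes these as the QPS and Burst fields on rest.Config. The Python below only illustrates the semantics under that assumption and is not client-go's implementation. In particular, client-go's limiter makes callers wait, whereas this sketch simply reports whether a request may proceed. The class name and the numbers are hypothetical.

```python
class TokenBucket:
    """Minimal token bucket: `burst` tokens at most, refilled at `qps` per second."""

    def __init__(self, qps, burst, now=0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now):
        """Return True if a request may be sent at time `now` (in seconds)."""
        # Refill for the elapsed time, never beyond the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=5, burst=10)

# A tight reconcile loop tries 20 requests at once: only the burst gets through.
sent_at_start = sum(bucket.allow(now=0.0) for _ in range(20))

# One second later, only the refill (qps) is available.
sent_after_1s = sum(bucket.allow(now=1.0) for _ in range(20))

print(sent_at_start, sent_after_1s)
```

A limit like this caps what a buggy loop can send no matter how the API server is configured, which is why it pairs well with APF isolation rather than replacing it.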

Risky mitigations (avoid unless you have no choice)

  • Disabling APF (often not possible on managed control planes, and it can turn “fair throttling” into “total meltdown”).
  • Blindly raising concurrency shares everywhere
    • You can remove fairness and spike etcd load hard.
  • Restarting controllers to “fix it”
    • A restart forces fresh LISTs and new watches, which usually makes the storm worse.

What we changed (concrete)

1) We isolated the controller with a dedicated PriorityLevelConfiguration

Before, our controller matched a broad FlowSchema that shared capacity with other automation.

After, we created a priority level with explicit low shares:

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: plc-noisy-controller
spec:
  type: Limited
  limited:
    # Named assuredConcurrencyShares only in v1beta2 and earlier.
    nominalConcurrencyShares: 5
    limitResponse:
      type: Queue  # valid values are Queue and Reject
      queuing:
        queues: 16
        handSize: 4
        queueLengthLimit: 50

2) We created a FlowSchema that matches only that service account

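apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: fs-noisy-controller
spec:
  # Must be numerically lower than any broader FlowSchema that also matches
  # this service account; the lowest matching value wins.
  matchingPrecedence: 2000
  priorityLevelConfiguration:
    name: plc-noisy-controller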
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: noisy-controller
        namespace: platform
    resourceRules:
    - apiGroups: ["*"]
      resources: ["*"]
      verbs: ["*"]
      namespaces: ["*"]
      clusterScope: true

3) We fixed the controller bug and added a client-side rate limit

The “real fix” was to stop doing cluster-wide LISTs in a tight loop. But the APF isolation meant one bug could no longer starve the whole cluster.

How to verify (what I look at after the fix)

  1. APF rejections drop
sum(rate(apiserver_flowcontrol_rejected_requests_total[5m]))

Expected: the rate returns to near baseline.

  2. kubectl responsiveness returns. I literally measure it:
time kubectl get ns >/dev/null
time kubectl get pods -A --request-timeout=10s >/dev/null
  3. Controllers stop retry-storming. GitOps and kube-controller-manager logs should stop spamming 429s.

Prevention / guardrails

  • API QPS budgets per controller
    • treat “API requests/sec” as an SLO just like DB connections
  • APF guardrails
    • no FlowSchema allowed to match system:authenticated without explicit review
    • precedence changes require a diff-based review
  • Alerts
    • APF rejections > 0 for more than N minutes
    • queue depth non-zero for critical priority levels
  • Game day
    • intentionally scale a noisy controller in staging and prove the cluster still works

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Kubernetes APF Starvation: When One Controller Makes kubectl Hang". https://www.michal-drozd.com/en/blog/kubernetes-api-priority-fairness-starvation/ (Published November 14, 2025).