Kubernetes DNS: The ndots:5 Latency Tax

The last thing I suspected when our DNS query volume quadrupled was ndots. It started with a complaint: “Our service-to-service calls have 100ms baseline latency.” The team was baffled. Their services were running on the same node, so communication should have been sub-millisecond. Yet every HTTP call carried an unexplained 100ms tax. Network debugging showed the packets were fast. The services themselves were fast. Something else was adding latency.

After hours of tcpdump captures, we found the culprit: DNS. Every external API call triggered not one, not two, but four DNS queries, three of which failed with NXDOMAIN before the fourth finally succeeded. This is the hidden cost of Kubernetes’ default ndots:5 setting.

The ndots problem is one of those Kubernetes gotchas that affects virtually every cluster but rarely makes it into the headlines. Teams blame network latency, tune connection pools, and add timeouts, never suspecting that DNS resolution is silently multiplying their query count by four or more.

Tested on: Kubernetes 1.28, CoreDNS 1.11, Go and Python applications

Understanding ndots

Let me explain exactly what ndots does and why Kubernetes defaults to such a high value.

The DNS Search Path

When you create a pod in Kubernetes, the kubelet automatically configures DNS by writing to /etc/resolv.conf. A typical configuration looks like this:

Default pod DNS config:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

What ndots:5 means:
- If hostname has < 5 dots, append search domains first
- "api.example.com" has 2 dots (< 5)
- Try these in order:
  1. api.example.com.default.svc.cluster.local  → NXDOMAIN
  2. api.example.com.svc.cluster.local          → NXDOMAIN
  3. api.example.com.cluster.local              → NXDOMAIN
  4. api.example.com                             → SUCCESS

Result: 4 DNS queries instead of 1 (8 instead of 2 if the resolver sends separate A and AAAA queries)!

The ndots option controls how the resolver decides whether a name is “absolute” (fully qualified) or “relative” (needs search domains appended). If the name has fewer dots than the ndots value, the resolver tries appending each search domain before falling back to the bare name. If it has at least that many dots, glibc and Go's resolver try the bare name first and only walk the search list if that lookup fails.
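The ordering can be modeled in a few lines. This is a simplified sketch assuming glibc- or Go-style behavior (other resolvers, such as musl, differ in the details); `search_candidates` is a hypothetical helper for illustration, not part of any library, and it counts one query per candidate name.

```python
def search_candidates(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the fully qualified names a stub resolver tries, in order.

    Models glibc/Go behavior:
      - a trailing dot marks the name as absolute, so only that name is tried;
      - with at least `ndots` dots, the bare name is tried first, then the search list;
      - with fewer dots, the search list is tried first, then the bare name.
    """
    if name.endswith("."):
        return [name]
    suffixed = [f"{name}.{domain}." for domain in search]
    bare = f"{name}."
    if name.count(".") >= ndots:
        return [bare] + suffixed
    return suffixed + [bare]


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for host in ("api.example.com", "api.example.com."):
    for ndots in (5, 2):
        print(f"{host!r} ndots:{ndots} -> {search_candidates(host, SEARCH, ndots)}")
```

With ndots:5, `api.example.com` produces four candidates ending with the bare name; with ndots:2 the bare name moves to the front; the trailing-dot form produces exactly one candidate either way.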

Why Kubernetes Uses ndots:5

The default of 5 exists for a good reason: Kubernetes service discovery. Within a cluster, you want to be able to reference services by short names:

# These should all resolve to the same service:
my-service                                    # Just the name
my-service.default                            # Name + namespace
my-service.default.svc                        # Name + namespace + svc
my-service.default.svc.cluster.local          # Full FQDN (4 dots)

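With ndots:5, all of these forms have fewer than 5 dots (even the full name has only 4), so the resolver tries the search domains first and the first match wins: my-service matches on the first search domain, my-service.default on the second, my-service.default.svc on the third. Ironically, the full name without a trailing dot is the slowest of the four, resolving only after all three search domains fail. Still, this is very convenient for cluster-internal communication.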

The problem arises when you call external domains. api.stripe.com has only 2 dots, so the resolver dutifully tries appending all the search domains before finally trying the bare name.

The Latency Impact

The impact on external API calls can be severe:

Single external API call to api.stripe.com:

DNS resolution:
  api.stripe.com.default.svc.cluster.local  → 8ms (NXDOMAIN)
  api.stripe.com.svc.cluster.local          → 7ms (NXDOMAIN)
  api.stripe.com.cluster.local              → 8ms (NXDOMAIN)
  api.stripe.com                             → 9ms (SUCCESS)

Total DNS time: 32ms per request (vs 9ms optimal)
With connection timeout/retry: can be 100ms+

For 100 requests/second = 300 wasted (NXDOMAIN) DNS queries/second

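Each failed NXDOMAIN query takes time, typically 5-15ms depending on your network and CoreDNS load. With 3 failed queries plus 1 successful one, you’re looking at 20-60ms of total DNS time per lookup, 15-45ms of which is pure search-path overhead, and that cost lands on every request that performs a fresh lookup (no connection reuse or DNS caching). If CoreDNS is under load or your network has any congestion, these numbers climb higher.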

I’ve seen production systems where DNS resolution accounted for 80% of total request latency for external API calls. The application code was fast, but every HTTP client was waiting for DNS before it could even open a connection.

The Hidden Load on CoreDNS

Beyond latency, the extra queries multiply load on CoreDNS. If your cluster makes 1000 external API calls per second, each doing a fresh lookup, you’re generating 4000 DNS queries per second, 3000 of which are guaranteed to fail (twice that if A and AAAA records are queried separately).
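The arithmetic behind these numbers fits in a small sketch. It assumes the lookups run sequentially, only the last one succeeds, and every call performs a fresh lookup; if your resolver sends separate A and AAAA queries, double the query counts. `search_path_cost` is a hypothetical helper.

```python
def search_path_cost(per_query_ms: list[float], calls_per_sec: float) -> dict:
    """Latency and query load of one search-path walk per call.

    per_query_ms lists each sequential query's latency; only the last succeeds.
    """
    failed = len(per_query_ms) - 1
    return {
        "total_ms": sum(per_query_ms),        # time spent resolving one name
        "wasted_ms": sum(per_query_ms[:-1]),  # time spent waiting on NXDOMAIN answers
        "queries_per_sec": calls_per_sec * len(per_query_ms),
        "nxdomain_per_sec": calls_per_sec * failed,
    }


# Per-query latencies from the api.stripe.com example
stripe = [8, 7, 8, 9]
print(search_path_cost(stripe, calls_per_sec=100))
# {'total_ms': 32, 'wasted_ms': 23, 'queries_per_sec': 400, 'nxdomain_per_sec': 300}
print(search_path_cost(stripe, calls_per_sec=1000)["nxdomain_per_sec"])
# 3000
```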

CoreDNS handles NXDOMAIN responses efficiently, but they still consume CPU, memory, and network resources. I’ve seen CoreDNS pods become CPU-bound purely from handling search domain queries for external domains.

Measuring the Problem

Before applying fixes, measure your current state:

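# Check current ndots setting in pod
cat /etc/resolv.conf

# Install DNS tools (Debian/Ubuntu-based images only), then time a lookup
apt-get update && apt-get install -y dnsutils
time nslookup api.stripe.com

# dig ignores the search list by default; +search applies it and
# +showsearch prints every intermediate (NXDOMAIN) attempt
dig +search +showsearch api.stripe.com

# CoreDNS CPU and memory usage (requires metrics-server)
kubectl top pods -n kube-system | grep coredns

# CoreDNS response codes, including NXDOMAIN, from its Prometheus endpoint
# (port 9153 in the default Corefile; the Deployment name may differ by distribution)
kubectl -n kube-system port-forward deploy/coredns 9153:9153 &
curl -s localhost:9153/metrics | grep 'coredns_dns_responses_total'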

If you see high NXDOMAIN rates in CoreDNS metrics (often 50-80% of all queries), the ndots setting is likely the cause. These failed queries are the search domain attempts for external names.
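A rough sketch for turning CoreDNS's Prometheus output into an NXDOMAIN share, assuming the metric name `coredns_dns_responses_total` (used by recent CoreDNS releases) and the Prometheus text format. Because the counters are cumulative, this gives the ratio since CoreDNS last restarted, not a current rate; use PromQL `rate()` for live numbers. `nxdomain_ratio` is a hypothetical helper.

```python
import re

# Matches sample lines such as:
# coredns_dns_responses_total{plugin="",rcode="NXDOMAIN",server="dns://:53",zone="."} 1234
_SAMPLE = re.compile(r'^coredns_dns_responses_total\{([^}]*)\}\s+(\S+)')
_RCODE = re.compile(r'rcode="([^"]*)"')


def nxdomain_ratio(metrics_text: str) -> float:
    """Fraction of CoreDNS responses with rcode NXDOMAIN, from cumulative counters."""
    total = 0.0
    nxdomain = 0.0
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match:
            continue  # skips # HELP / # TYPE lines and other metrics
        value = float(match.group(2))
        total += value
        rcode = _RCODE.search(match.group(1))
        if rcode and rcode.group(1) == "NXDOMAIN":
            nxdomain += value
    return nxdomain / total if total else 0.0
```

While the port-forward is running, `nxdomain_ratio(urllib.request.urlopen("http://localhost:9153/metrics").read().decode())` returns the share as a fraction between 0 and 1.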

Solutions

There are several approaches to fix the ndots problem, ranging from application changes to cluster-wide infrastructure improvements.

1. Use FQDN (Trailing Dot)

The simplest fix is adding a trailing dot to external domain names. A trailing dot signals to the DNS resolver that the name is fully qualified and shouldn’t have search domains appended:

// Go - add trailing dot for external domains
resp, err := http.Get("https://api.stripe.com./v1/charges")
//                                         ^ trailing dot = FQDN

// This bypasses search domain logic entirely
// Only 1 DNS query instead of 4

# Python - same approach
import requests

# With trailing dot - single DNS query
response = requests.get("https://api.stripe.com./v1/charges")

The trailing dot is part of the DNS specification: it explicitly marks a name as absolute. Most HTTP libraries and DNS resolvers handle it correctly. However, the dot also ends up in the HTTP Host header (and, depending on the client, in TLS SNI and certificate checks), and some CDNs, load balancers, or strict hostname validators reject or misroute such names, so test thoroughly.

This approach is a surgical fix: apply it to specific external domains in your code without changing cluster configuration.
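One way to apply the dot only where you intend to is a small URL rewriter driven by an explicit allowlist. This is a sketch; `fqdn_url` and the `EXTERNAL` set are hypothetical names, and only the standard library `urllib.parse` is used. Restricting it to listed hosts keeps cluster-internal names such as `my-service.default` on the search list, where they belong.

```python
from urllib.parse import urlsplit, urlunsplit


def fqdn_url(url: str, external_hosts: set[str]) -> str:
    """Add a trailing dot to the URL's host if it is a listed external host.

    Only hosts in `external_hosts` are touched, so cluster-internal names keep
    using the search list. URLs with user:password@ credentials are returned
    unchanged to keep the sketch simple.
    """
    parts = urlsplit(url)
    host = parts.hostname  # lowercased, port stripped
    if host is None or host not in external_hosts or "@" in parts.netloc:
        return url
    netloc = f"{host}." if parts.port is None else f"{host}.:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


EXTERNAL = {"api.stripe.com"}
print(fqdn_url("https://api.stripe.com/v1/charges", EXTERNAL))
# https://api.stripe.com./v1/charges
```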

2. Set Lower ndots in Pod Spec

For pods that primarily call external APIs, reduce ndots in the pod specification:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"  # names with 2+ dots (e.g. api.stripe.com) are tried as-is first
      containers:
        - name: app
          # ...

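With ndots:2, any hostname with 2 or more dots is tried as an absolute name first, and the search list is only used as a fallback if that lookup fails. Many external API hosts, like api.stripe.com, have exactly 2 dots, so they resolve with a single query. Apex domains such as stripe.com (1 dot) still walk the search list first.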

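The tradeoff falls on internal names that now cross the threshold. my-service and my-service.default (0 and 1 dots) behave exactly as before, but my-service.default.svc (2 dots) is now tried as an absolute name first. With glibc or Go's resolver it still resolves through the search-list fallback, after a wasted query that CoreDNS forwards upstream. Some resolvers, such as musl (used in Alpine-based images), do not fall back to the search list in that case, so the name stops resolving. Test your internal service discovery after making this change.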

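ndots:3 is sometimes suggested as a middle ground, but it does not help typical API hostnames: api.stripe.com (2 dots) still walks the full search list. It only skips the search list for names with 3 or more dots, such as s3.us-west-2.amazonaws.com, while keeping names like my-service.default.svc on the search-first path.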
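To compare ndots values for internal and external names side by side, the sketch below counts the queries sent before the first answer. It assumes glibc/Go ordering, one query per candidate name (no separate A/AAAA), and a hypothetical set of existing records; `queries_until_answer` and `RECORDS` are illustrative only.

```python
from typing import Callable

SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# Hypothetical records that exist; every other name returns NXDOMAIN
RECORDS = {
    "my-service.default.svc.cluster.local.",
    "api.stripe.com.",
    "s3.us-west-2.amazonaws.com.",
}


def queries_until_answer(name: str, ndots: int, exists: Callable[[str], bool]) -> int:
    """Number of queries a glibc/Go-style resolver sends before getting an answer."""
    if name.endswith("."):
        candidates = [name]
    else:
        suffixed = [f"{name}.{domain}." for domain in SEARCH]
        bare = [f"{name}."]
        candidates = bare + suffixed if name.count(".") >= ndots else suffixed + bare
    for count, fqdn in enumerate(candidates, start=1):
        if exists(fqdn):
            return count
    return len(candidates)  # every candidate failed


NAMES = ["my-service", "my-service.default", "my-service.default.svc",
         "api.stripe.com", "s3.us-west-2.amazonaws.com"]

for name in NAMES:
    counts = {n: queries_until_answer(name, n, RECORDS.__contains__) for n in (5, 3, 2)}
    print(f"{name:28} " + "  ".join(f"ndots:{n}={c}" for n, c in counts.items()))
```

With these inputs, api.stripe.com needs 4 queries at both ndots:5 and ndots:3 but only 1 at ndots:2, s3.us-west-2.amazonaws.com already drops to 1 at ndots:3, and my-service.default.svc goes from 3 queries to 4 at ndots:2 because its absolute attempt fails first.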

3. Use dnsPolicy: Default

For services that exclusively call external APIs and never use cluster service discovery, you can bypass cluster DNS entirely:

# For services that only call external APIs (no cluster service discovery)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      dnsPolicy: "Default"  # Use node's DNS, not cluster DNS
      containers:
        - name: external-api-caller
          # ...

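With dnsPolicy: Default, the pod inherits the name resolution configuration of the node it runs on (the resolv.conf the kubelet is configured with) instead of using cluster DNS. Despite the name, this is not the default policy; ClusterFirst is. This typically means the node’s own short (or empty) search list, the resolver default of ndots:1, and direct resolution through whatever DNS the node uses (often cloud provider DNS or corporate DNS servers).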

The obvious downside is that Kubernetes service discovery stops working: you can’t use my-service or my-service.namespace to reach other Services. Only use this for pods that genuinely don’t need cluster DNS.
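To confirm which resolver configuration a pod actually received after changing dnsPolicy or dnsConfig, a small parser helps. This is a sketch assuming the resolv.conf(5) conventions glibc follows; `parse_resolv_conf` is a hypothetical helper and ignores options other than ndots.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameservers, search domains and ndots from resolv.conf text.

    Lines starting with '#' or ';' are comments, the last 'search'/'domain'
    line wins, and ndots defaults to 1 (capped at 15) when no
    'options ndots:N' is given.
    """
    conf = {"nameservers": [], "search": [], "ndots": 1}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            conf["nameservers"].append(args[0])
        elif keyword in ("search", "domain"):
            # 'domain' sets a single-entry search list; 'search' sets the full list
            conf["search"] = args[:1] if keyword == "domain" else args
        elif keyword == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    conf["ndots"] = min(int(opt.split(":", 1)[1]), 15)
    return conf
```

Inside the pod, `parse_resolv_conf(open("/etc/resolv.conf").read())` shows whether you got the cluster search list with ndots:5 or the node’s own settings.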

4. NodeLocal DNS Cache

For a cluster-wide improvement, deploy NodeLocal DNSCache. This runs a DNS cache on every node, reducing latency and load on CoreDNS:

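# Install NodeLocal DNSCache. The upstream manifest contains __PILLAR__ placeholders
# that must be filled in before applying it.
curl -sLO https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP})
domain=cluster.local
localdns=169.254.20.10
# Substitutions for kube-proxy in iptables mode; IPVS mode needs different ones
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml
kubectl apply -f nodelocaldns.yaml

# Benefits:
# - Caches DNS responses locally on each node
# - Reduces latency for repeated queries
# - Reduces load on CoreDNS
# - NXDOMAIN responses are cached too!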

NodeLocal DNS is particularly helpful for the ndots problem because it caches NXDOMAIN responses. The pod still sends every search-path query, but after the first query for api.stripe.com.default.svc.cluster.local returns NXDOMAIN, repeats are answered by the cache on the node (for as long as the negative answer’s TTL allows) instead of making a round trip to CoreDNS.

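To point a pod at NodeLocal DNS explicitly:

# 169.254.20.10 is the usual link-local address of NodeLocal DNSCache.
# dnsPolicy "None" is required: with the default ClusterFirst, these nameservers
# are appended after the cluster DNS IP instead of replacing it.
# In kube-proxy iptables mode this is optional, because NodeLocal DNSCache also
# listens on the kube-dns ClusterIP and unmodified pods use it automatically.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 169.254.20.10
    searches:
      - default.svc.cluster.local  # replace "default" with the pod's namespace
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "2"
  containers:
    - name: app
      # ...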

Combining NodeLocal DNS with lower ndots gives you the best of both worlds: fast local caching plus fewer queries to begin with.

5. Application-Level DNS Caching

For maximum control, implement DNS caching in your application:

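// Go - custom resolver with caching
package main

import (
    "context"
    "fmt"
    "net"
    "sync"
    "time"
)

// CachingResolver memoizes successful host lookups for a fixed TTL.
// net.Resolver does not expose record TTLs, so the TTL is a fixed setting.
type CachingResolver struct {
    cache map[string]cachedResult
    mu    sync.RWMutex
    ttl   time.Duration
}

type cachedResult struct {
    addrs   []string
    expires time.Time
}

// NewCachingResolver initializes the map; writing to a nil map would panic.
func NewCachingResolver(ttl time.Duration) *CachingResolver {
    return &CachingResolver{cache: make(map[string]cachedResult), ttl: ttl}
}

func (r *CachingResolver) LookupHost(ctx context.Context, host string) ([]string, error) {
    // Fast path: serve an unexpired entry under the read lock.
    r.mu.RLock()
    if cached, ok := r.cache[host]; ok && time.Now().Before(cached.expires) {
        r.mu.RUnlock()
        return cached.addrs, nil
    }
    r.mu.RUnlock()

    // Slow path: normal resolution (still walks the search list on a miss).
    addrs, err := net.DefaultResolver.LookupHost(ctx, host)
    if err != nil {
        return nil, err // failures are not cached
    }

    r.mu.Lock()
    r.cache[host] = cachedResult{addrs: addrs, expires: time.Now().Add(r.ttl)}
    r.mu.Unlock()

    return addrs, nil
}

func main() {
    r := NewCachingResolver(30 * time.Second)
    // The second lookup is served from the cache.
    for i := 0; i < 2; i++ {
        start := time.Now()
        addrs, err := r.LookupHost(context.Background(), "api.stripe.com.")
        fmt.Println(addrs, err, time.Since(start))
    }
}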

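Application-level caching is the most aggressive optimization. Once you resolve an external domain, you cache the addresses and reuse them for subsequent requests, which eliminates DNS latency for repeated calls to the same domain until the entry expires. The cache only helps if your HTTP client uses it, for example through a custom DialContext on http.Transport that resolves the host with the cache before dialing.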

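Be careful with TTLs: cache too long and you might miss legitimate IP changes. Many external APIs sit behind CDNs or cloud load balancers that rotate addresses, and because net.Resolver does not report record TTLs, keep the fixed TTL short (tens of seconds rather than hours).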
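Python services can apply the same idea by wrapping `socket.getaddrinfo`. This is a sketch under stated assumptions: a fixed TTL (like the Go version, it cannot see record TTLs), only successful lookups cached, and a process-wide monkeypatch you opt into explicitly. urllib3, which requests uses, calls `socket.getaddrinfo` when opening connections, so the patch affects requests too; `make_cached_getaddrinfo` is a hypothetical helper.

```python
import socket
import threading
import time
from typing import Any, Callable


def make_cached_getaddrinfo(
    lookup: Callable[..., Any] = socket.getaddrinfo,  # bound at definition time: no recursion after patching
    ttl: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[..., Any]:
    """Wrap a getaddrinfo-compatible function with a fixed-TTL cache.

    Only successful results are cached; errors (e.g. socket.gaierror) propagate
    and the next call retries the lookup.
    """
    cache: dict = {}
    lock = threading.Lock()

    def cached_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        key = (host, port, family, type, proto, flags)
        now = clock()
        with lock:
            entry = cache.get(key)
            if entry is not None and now < entry[0]:
                return entry[1]  # unexpired: skip DNS entirely
        result = lookup(host, port, family, type, proto, flags)
        with lock:
            cache[key] = (now + ttl, result)
        return result

    return cached_getaddrinfo


# Opt in once at startup:
# socket.getaddrinfo = make_cached_getaddrinfo(ttl=30.0)
```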

CoreDNS Tuning

Regardless of which fixes you apply, tuning CoreDNS improves overall DNS performance:

# CoreDNS ConfigMap optimization (based on the kubeadm default Corefile)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153  # keep: exposes the coredns_* metrics used for alerting
        cache 60  # max cache TTL in seconds (kubeadm ships cache 30)
        forward . /etc/resolv.conf {
            max_concurrent 1000  # kubeadm default; raise if queries get REFUSED
        }
        loop
        reload
        loadbalance
    }

Key tuning options:

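cache: Sets the maximum time, in seconds, that CoreDNS caches a response; an answer is never cached longer than its own record TTL. The kubeadm default Corefile uses 30; raising it to 60-300 can improve hit rates for external domains, at the cost of noticing DNS changes more slowly.
max_concurrent: Caps the number of concurrent queries forwarded upstream; queries over the cap are answered with REFUSED. The kubeadm default Corefile already sets 1000, so raise it only if CoreDNS starts refusing queries under load. A reasonable floor is the upstream query rate multiplied by upstream latency: 4,000 forwarded queries/s at 20ms each is about 80 in flight.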

Monitoring

Set up alerts to catch DNS problems before they affect users:

# Alert on high DNS latency (p99 across all CoreDNS pods)
- alert: HighDNSLatency
  expr: |
    histogram_quantile(0.99,
      sum by (le) (rate(coredns_dns_request_duration_seconds_bucket[5m]))
    ) > 0.1
  for: 5m
  annotations:
    summary: "DNS p99 latency > 100ms"

# Alert on high NXDOMAIN rate (ndots issue)
# sum() both sides: dividing the raw series would match each rcode only with itself
- alert: HighNXDOMAINRate
  expr: |
    sum(rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]))
      /
    sum(rate(coredns_dns_responses_total[5m])) > 0.5
  for: 10m
  annotations:
    summary: "50%+ of DNS queries returning NXDOMAIN"

The NXDOMAIN rate alert is particularly valuable for detecting the ndots problem. If more than half of your DNS queries are returning NXDOMAIN, you’re almost certainly burning queries on search domain lookups.

Checklist

## Kubernetes DNS Optimization

### Diagnosis
- [ ] Check ndots value in pods (cat /etc/resolv.conf)
- [ ] Measure DNS resolution time for external domains
- [ ] Check CoreDNS NXDOMAIN rate
- [ ] Monitor CoreDNS CPU usage

### Quick Fixes
- [ ] Add trailing dot to external domains in code
- [ ] Set ndots: 2 for pods calling external APIs
- [ ] Deploy NodeLocal DNS Cache

### Infrastructure
- [ ] Tune CoreDNS cache settings
- [ ] Increase max_concurrent for forward
- [ ] Monitor DNS latency metrics
- [ ] Alert on NXDOMAIN rate spikes

Conclusion

The Kubernetes default of ndots:5 is great for service discovery but terrible for external API calls. Every external domain with fewer than 5 dots triggers multiple failed queries before succeeding: a hidden latency tax in every cluster that keeps the default.

Key takeaways:

ndots:5 means 3 extra queries (4 in total) for most external domains, and each failed query adds latency
Use trailing dots in URLs (like api.stripe.com.) for immediate, surgical fixes
Set ndots:2 in dnsConfig for pods that primarily call external APIs
Deploy NodeLocal DNS for cluster-wide caching and reduced CoreDNS load
Monitor NXDOMAIN rate—high rates reveal the ndots problem instantly

Check your CoreDNS metrics right now. If you see 50%+ NXDOMAIN responses, the ndots tax is costing you latency on every external call.
