CoreDNS vs NodeLocal DNS Cache: Cutting Kubernetes DNS Latency by 10x
We benchmarked NodeLocal DNSCache after a DNS incident we don’t want to repeat. It started with a question: “Why does every HTTP call add 5ms of latency?” The answer: every service call needs a DNS lookup, and by default your pods reach CoreDNS over the network. With NodeLocal DNSCache, the median lookup in our tests dropped from about 5ms to about 0.2ms.
Tested on: Kubernetes 1.28, CoreDNS 1.11, NodeLocal DNSCache 1.22, 50-node cluster
The DNS Bottleneck
How Kubernetes DNS Works
Without NodeLocal DNS Cache:
Pod → kube-dns Service (ClusterIP) → CoreDNS Pod
      └─ Network hop (5-20ms)        └─ Possibly on a different node
DNS path:
1. Pod makes DNS query (UDP)
2. Query goes to kube-dns ClusterIP (10.96.0.10)
3. kube-proxy/iptables routes to CoreDNS pod
4. CoreDNS resolves (cache hit or upstream query)
5. Response returns through same path
The Problem
Typical web request DNS lookups:
1. Service discovery: api.default.svc.cluster.local
2. Database: postgres.db.svc.cluster.local
3. Cache: redis.cache.svc.cluster.local
4. External API: api.stripe.com
4 DNS lookups × 5ms = 20ms of added latency per request, assuming the lookups run one after another and nothing is cached on the client side.
At 1000 RPS:
- 4000 DNS queries/sec to CoreDNS
- CoreDNS becomes bottleneck
- Tail latency increases
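The arithmetic above can be written down as a small Go sketch. The function names are made up for illustration, and the model assumes sequential lookups with no client-side caching, which is the worst case rather than a measurement.

```go
package main

import "fmt"

// addedLatencyMs returns the DNS latency added to one request when the
// lookups run sequentially and none is answered from a client-side cache.
func addedLatencyMs(lookups int, perLookupMs float64) float64 {
	return float64(lookups) * perLookupMs
}

// dnsQPS returns the query rate that reaches the DNS service for a given
// request rate, under the same no-cache assumption.
func dnsQPS(requestsPerSec, lookupsPerRequest int) int {
	return requestsPerSec * lookupsPerRequest
}

func main() {
	// Figures from the text: 4 lookups at 5ms each, 1000 requests per second.
	fmt.Printf("added latency: %.0f ms per request\n", addedLatencyMs(4, 5))
	fmt.Printf("DNS load: %d queries/sec\n", dnsQPS(1000, 4))
}
```

With the numbers from the text this prints 20 ms per request and 4000 queries/sec.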
NodeLocal DNS Cache
How It Works
With NodeLocal DNS Cache:
Pod → NodeLocal DaemonSet → CoreDNS (only on cache miss)
      └─ Local (0.2ms)      └─ Network (5ms, rare)
NodeLocal runs as DaemonSet:
- One pod per node
- Listens on link-local IP (169.254.20.10)
- Caches responses locally
- Forwards to CoreDNS on a cache miss
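The cache-then-forward behaviour in the list above can be illustrated with a toy Go cache. This is only a sketch of the idea: NodeLocal DNSCache itself is CoreDNS with the cache plugin, and the type names, the `upstream` hook, and the address below are all hypothetical.

```go
package main

import "fmt"

// entry is one cached answer and the time (in Unix seconds) it expires.
type entry struct {
	addr      string
	expiresAt int64
}

// localCache is a toy stand-in for the per-node cache.
type localCache struct {
	ttlSec   int64
	entries  map[string]entry
	upstream func(name string) (string, error) // hypothetical resolver hook
	hits     int
	misses   int
}

func newLocalCache(ttlSec int64, upstream func(string) (string, error)) *localCache {
	return &localCache{ttlSec: ttlSec, entries: map[string]entry{}, upstream: upstream}
}

// lookup answers from the cache while the entry is fresh and forwards to the
// upstream resolver otherwise. Upstream errors are not cached.
func (c *localCache) lookup(name string, nowSec int64) (string, error) {
	if e, ok := c.entries[name]; ok && nowSec < e.expiresAt {
		c.hits++
		return e.addr, nil
	}
	c.misses++
	addr, err := c.upstream(name)
	if err != nil {
		return "", err
	}
	c.entries[name] = entry{addr: addr, expiresAt: nowSec + c.ttlSec}
	return addr, nil
}

func main() {
	upstreamCalls := 0
	cache := newLocalCache(30, func(name string) (string, error) {
		upstreamCalls++
		return "10.0.0.1", nil // made-up address for illustration
	})
	// Ten lookups of the same name, one second apart.
	for now := int64(0); now < 10; now++ {
		if _, err := cache.lookup("api.default.svc.cluster.local", now); err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
	}
	fmt.Printf("hits=%d misses=%d upstream calls=%d\n", cache.hits, cache.misses, upstreamCalls)
}
```

Only the first lookup reaches the upstream; the other nine are served locally until the 30-second TTL runs out.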
Installation
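# Download the NodeLocal DNS manifest
curl -LO https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
# The manifest contains __PILLAR__ placeholders; fill them in before applying
# (values shown for kube-proxy in iptables mode and the default cluster.local domain)
kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
sed -i "s/__PILLAR__LOCAL__DNS__/169.254.20.10/g; s/__PILLAR__DNS__DOMAIN__/cluster.local/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml
kubectl apply -f nodelocaldns.yaml
# Or with Helm (the stable chart repository is deprecated, so the chart name
# and values below depend on the repository you use)
helm install nodelocaldns stable/nodelocaldns \
  --set config.localDNS=169.254.20.10 \
  --set config.clusterDNS=10.96.0.10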
Configuration
# nodelocaldns-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-local-dns
  namespace: kube-system
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30  # Positive answers: up to 9984 entries, max TTL 30 seconds
            denial 9984 5    # Negative answers (NXDOMAIN): max TTL 5 seconds
        }
        reload
        loop
        bind 169.254.20.10
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
        health 169.254.20.10:8080
    }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10
        forward . __PILLAR__UPSTREAM__SERVERS__
        prometheus :9253
    }
Pod Configuration
# Option 1: Modify kubelet to use NodeLocal
# (restart kubelet and recreate pods afterwards; resolv.conf is written at pod creation)
# /var/lib/kubelet/config.yaml
clusterDNS:
  - 169.254.20.10  # NodeLocal first
  - 10.96.0.10     # CoreDNS as second nameserver; when it is used depends on the pod's resolver
# Option 2: Per-pod dnsConfig
apiVersion: v1
kind: Pod
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 169.254.20.10
    searches:
      - default.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"
Benchmark Results
Test Setup
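// dns_benchmark_test.go (go test only picks up files ending in _test.go)
// Run with: go test -bench=DNSLookup
package main

import (
	"net"
	"testing"
)

// BenchmarkDNSLookup resolves two in-cluster names per iteration.
// The unused "time" import was removed; Go refuses to compile unused imports.
func BenchmarkDNSLookup(b *testing.B) {
	hosts := []string{
		"kubernetes.default.svc.cluster.local",
		"kube-dns.kube-system.svc.cluster.local",
	}
	for i := 0; i < b.N; i++ {
		for _, host := range hosts {
			_, err := net.LookupHost(host)
			if err != nil {
				b.Fatal(err)
			}
		}
	}
}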
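Go's benchmark output reports an average (ns/op), not percentiles. The sketch below shows one way percentile figures like p50 and p99 could be collected by timing each lookup individually. It is an illustration, not necessarily the harness behind the published numbers; `percentile` and `measure` are names introduced here, and the nearest-rank method is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"net"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of the
// samples, in the same unit as the input. The input slice is not modified.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// measure calls lookup n times and returns each call's latency in
// milliseconds. It stops at the first error.
func measure(n int, lookup func() error) ([]float64, error) {
	samples := make([]float64, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if err := lookup(); err != nil {
			return samples, err
		}
		samples = append(samples, float64(time.Since(start).Microseconds())/1000)
	}
	return samples, nil
}

func main() {
	host := "kubernetes.default.svc.cluster.local" // resolvable only inside a cluster
	samples, err := measure(1000, func() error {
		_, err := net.LookupHost(host)
		return err
	})
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("p50=%.2fms p99=%.2fms p99.9=%.2fms\n",
		percentile(samples, 50), percentile(samples, 99), percentile(samples, 99.9))
}
```

Run it from a pod once with the default resolver and once with 169.254.20.10 in /etc/resolv.conf to compare the two paths.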
Results
CoreDNS Only (network path):
Latency p50: 5.2ms
Latency p99: 28.4ms
Latency p999: 89.2ms
Queries/sec: 8,500
NodeLocal DNS Cache (local path):
Latency p50: 0.18ms (29x faster)
Latency p99: 0.45ms (63x faster)
Latency p999: 1.2ms (74x faster)
Queries/sec: 45,000 (5x higher)
Cache hit rate: 92% (typical production)
Load Test
# Using dnsperf
dnsperf -s 169.254.20.10 -d queries.txt -l 60 -c 100
# Results with NodeLocal:
# Queries sent: 2,812,456
# Queries completed: 2,812,456
# Queries lost: 0 (0.00%)
# Response codes: NOERROR 2,812,456 (100.00%)
# Average latency: 0.21ms
# Maximum latency: 2.34ms
Monitoring
Prometheus Metrics
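# Cache hit rate
# (with bind, the server label normally carries the bound address rather than
# dns://:53; confirm the exact value in your /metrics output)
sum(rate(coredns_cache_hits_total{server="dns://169.254.20.10:53"}[5m]))
/
sum(rate(coredns_dns_requests_total{server="dns://169.254.20.10:53"}[5m]))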
# DNS latency (NodeLocal)
histogram_quantile(0.99,
sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (le)
)
# Upstream forward latency (CoreDNS)
histogram_quantile(0.99,
sum(rate(coredns_forward_request_duration_seconds_bucket[5m])) by (le)
)
Alert Rules
groups:
  - name: dns
    rules:
      - alert: DNSLatencyHigh
        expr: |
          histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (le))
          > 0.01
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "DNS p99 latency > 10ms"
      - alert: NodeLocalDNSDown
        expr: |
          up{job="nodelocaldns"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "NodeLocal DNS not running on {{ $labels.node }}"
Troubleshooting
DNS Not Using NodeLocal
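# Check resolv.conf in pod
kubectl exec -it mypod -- cat /etc/resolv.conf
# Should show:
# nameserver 169.254.20.10
# If it shows 10.96.0.10, check the kubelet config and recreate the pod
# (resolv.conf is written when the pod is created). With the upstream manifest
# and kube-proxy in iptables mode, 10.96.0.10 is expected: NodeLocal also
# listens on the kube-dns ClusterIP on each node.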
NodeLocal Pod Crashing
# Check logs
kubectl logs -n kube-system -l k8s-app=node-local-dns
# Common issues:
# - Port conflict (another process on 53)
# - Link-local IP already in use
# - Insufficient permissions (needs NET_ADMIN)
Cache Not Working
# Check cache stats
kubectl exec -n kube-system node-local-dns-xxxxx -- \
wget -qO- http://localhost:9253/metrics | grep cache
# Look for:
# coredns_cache_hits_total
# coredns_cache_misses_total
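To turn those two counters into a hit ratio without a Prometheus server, the metrics text can be summed directly. The Go sketch below is a simplified parser for the text exposition format: it ignores optional timestamps and assumes label values contain no `}` characters. The function names and the sample input are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up every sample of the named metric in Prometheus text
// exposition format, across all label sets.
func sumMetric(metricsText, name string) float64 {
	var total float64
	scanner := bufio.NewScanner(strings.NewReader(metricsText))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split "name{labels} value" or "name value" into name and remainder.
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

// hitRatio returns hits / (hits + misses), or 0 when there is no traffic.
func hitRatio(metricsText string) float64 {
	hits := sumMetric(metricsText, "coredns_cache_hits_total")
	misses := sumMetric(metricsText, "coredns_cache_misses_total")
	if hits+misses == 0 {
		return 0
	}
	return hits / (hits + misses)
}

func main() {
	// Made-up sample in the exposition format; real output has more series.
	sample := `# TYPE coredns_cache_hits_total counter
coredns_cache_hits_total{server="dns://169.254.20.10:53",type="success"} 920
coredns_cache_misses_total{server="dns://169.254.20.10:53"} 80
`
	fmt.Printf("cache hit ratio: %.2f\n", hitRatio(sample))
}
```

The counters are cumulative since the process started, so this gives a lifetime ratio; the PromQL `rate()` queries in the Monitoring section are the better tool for a recent window.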
Production Configuration
Optimized Settings
# nodelocaldns-configmap.yaml
# Note: the cache TTLs are caps. A record with a lower TTL is cached for its
# own TTL, and CoreDNS's kubernetes plugin serves cluster records with a
# 5-second TTL by default.
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 60     # Max TTL 60 seconds for positive answers
            denial 9984 10      # Max TTL 10 seconds for NXDOMAIN
            prefetch 10 1m 10%  # Prefetch popular entries
        }
        reload
        loop
        bind 169.254.20.10
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
            max_concurrent 1000  # Higher concurrency
        }
        prometheus :9253
        health 169.254.20.10:8080
        ready 169.254.20.10:8181
    }
Resource Limits
# DaemonSet resource limits
resources:
  requests:
    cpu: 25m
    memory: 32Mi
  limits:
    cpu: 100m
    memory: 128Mi
ndots Optimization
The Problem
# Default ndots=5 in Kubernetes
# Query: api.stripe.com
# DNS search order:
1. api.stripe.com.default.svc.cluster.local (NXDOMAIN)
2. api.stripe.com.svc.cluster.local (NXDOMAIN)
3. api.stripe.com.cluster.local (NXDOMAIN)
4. api.stripe.com. (SUCCESS)
# 4 DNS queries for one external name!
Solution
# Pod spec with reduced ndots
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"  # Reduced from 5
# Or add trailing dot for external names
# api.stripe.com. (absolute name, no search)
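The effect of ndots on query order can be reproduced with a short Go sketch. It models the usual stub-resolver rule: a name with fewer than ndots dots goes through the search list first, a name with at least ndots dots is tried as-is first, and a trailing dot skips the search list. `candidates` is a name introduced here for illustration; it lists the names a resolver would try, not the queries it actually sends, since resolution stops at the first success.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates returns the fully qualified names a stub resolver would try,
// in order, for the given search list and ndots value.
func candidates(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // absolute name: the search list is skipped
	}
	var viaSearch []string
	for _, domain := range search {
		viaSearch = append(viaSearch, name+"."+domain+".")
	}
	absolute := name + "."
	if strings.Count(name, ".") >= ndots {
		// Enough dots: try the name as given first, then the search list.
		return append([]string{absolute}, viaSearch...)
	}
	// Too few dots: walk the search list first, then try the name as given.
	return append(viaSearch, absolute)
}

func main() {
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, ndots := range []int{5, 2} {
		fmt.Printf("ndots=%d:\n", ndots)
		for i, c := range candidates("api.stripe.com", search, ndots) {
			fmt.Printf("  %d. %s\n", i+1, c)
		}
	}
}
```

With ndots=5, `api.stripe.com.` is the fourth candidate, after three NXDOMAIN answers. With ndots=2 it is the first, so the lookup succeeds on the first query.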
Checklist
## NodeLocal DNS Cache Setup
### Installation
- [ ] Deploy NodeLocal DaemonSet
- [ ] Configure kubelet clusterDNS
- [ ] Verify pods use 169.254.20.10
### Configuration
- [ ] Set appropriate cache TTLs
- [ ] Enable prefetch for popular entries
- [ ] Configure resource limits
### Monitoring
- [ ] Dashboard with cache hit rate
- [ ] Alert on DNS latency > 10ms
- [ ] Alert on NodeLocal pod failures
### Optimization
- [ ] Consider reducing ndots
- [ ] Use absolute DNS names for external services
- [ ] Monitor cache hit rates
Conclusion
DNS is a hidden Kubernetes bottleneck:
- Every service call needs a DNS lookup
- CoreDNS over the network adds 5-20ms per query
- NodeLocal DNSCache cuts median lookup latency to about 0.2ms (29x faster)
- 92% cache hit rate in production
Install NodeLocal DNS Cache and cut your tail latency.