Kubernetes Cross-Zone Traffic: The Hidden Cost Eating Your Cloud Bill
We learned cross-zone traffic pricing the expensive way. “Why is our AWS bill so high? We’re not even doing that much.” I opened Cost Explorer: $5,000/month in data transfer, and we weren’t serving external traffic. It was cross-zone traffic between pods.
In multi-AZ Kubernetes clusters, every service call can incur cross-zone charges if you’re not careful.
Tested on: EKS 1.28, AWS us-east-1, 3 AZs, 20 nodes
The Hidden Cost
AWS Cross-Zone Pricing
Same AZ (us-east-1a → us-east-1a): FREE
Cross-AZ (us-east-1a → us-east-1b): $0.01/GB each direction = $0.02/GB total
Example calculation:
- Service A calls Service B: 1KB request, 10KB response
- 1000 RPS = 11KB × 1000 × 86400 = 950GB/day
- If 66% cross-zone: 627GB × $0.02 = $12.54/day
- Monthly: $376 for ONE service pair!
10 chatty service pairs × $376 = $3,760/month in cross-zone traffic
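The arithmetic above can be reproduced with a short script. This is a minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and the $0.02/GB round-trip price quoted above; the function and parameter names are illustrative, not from any AWS tool.

```python
SECONDS_PER_DAY = 86_400
PRICE_PER_GB = 0.02  # $0.01/GB out + $0.01/GB in, as quoted above


def cross_zone_cost_per_day(request_kb, response_kb, rps, cross_zone_fraction,
                            price_per_gb=PRICE_PER_GB):
    """Daily cross-zone cost in dollars for one service pair."""
    kb_per_day = (request_kb + response_kb) * rps * SECONDS_PER_DAY
    gb_per_day = kb_per_day / 1_000_000  # decimal GB, matching the estimate above
    return gb_per_day * cross_zone_fraction * price_per_gb


daily = cross_zone_cost_per_day(request_kb=1, response_kb=10, rps=1000,
                                cross_zone_fraction=0.66)
print(f"${daily:.2f}/day, ${daily * 30:.0f}/month for one service pair")
```

The daily figure can differ from the hand calculation by a cent because the script does not round the intermediate GB value.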
Why It Happens
3-AZ cluster:
- AZ-a: 7 nodes, 30% of pods
- AZ-b: 7 nodes, 35% of pods
- AZ-c: 6 nodes, 35% of pods
Service A pod (AZ-a) → Service B ClusterIP → kube-proxy → random endpoint
                                                                 ↓
                                                     66% chance: different AZ!
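The 66% figure follows from the pod distribution. Here is a small sketch, assuming endpoints are picked uniformly at random and that callers and backends are both spread across zones in the 30/35/35 proportions listed above (that shared distribution is an assumption, not something the cluster guarantees).

```python
def cross_zone_probability(caller_share, endpoint_share):
    """Chance that a request leaves the caller's zone under random endpoint selection."""
    same_zone = sum(share * endpoint_share.get(zone, 0.0)
                    for zone, share in caller_share.items())
    return 1.0 - same_zone


pod_share = {"AZ-a": 0.30, "AZ-b": 0.35, "AZ-c": 0.35}
p = cross_zone_probability(pod_share, pod_share)
print(f"cross-zone probability: {p:.1%}")
```

With a perfectly even three-way split the same function gives two thirds, which is why roughly 66% is a good rule of thumb for 3-AZ clusters.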
Measuring Cross-Zone Traffic
Prometheus Metrics
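# If using Istio - traffic by source/destination zone
# Assumes custom zone labels (source_workload_zone, destination_workload_zone)
# have been added to the metric; they are not part of Istio's standard label set
sum(rate(istio_tcp_sent_bytes_total[5m])) by (source_workload_zone, destination_workload_zone)
# Without Istio - use AWS VPC Flow Logs or CNI metrics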
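For the VPC Flow Logs route, one possible approach is to map source and destination IPs to zones by subnet and total the bytes. The sketch below assumes the default flow log record format and a CNI that gives pods VPC IPs (such as the AWS VPC CNI). The subnet CIDRs are hypothetical placeholders for your own.

```python
import ipaddress

# Hypothetical subnet-to-zone mapping; replace with your VPC's subnets.
SUBNET_ZONES = {
    "10.0.0.0/20": "us-east-1a",
    "10.0.16.0/20": "us-east-1b",
    "10.0.32.0/20": "us-east-1c",
}
NETWORKS = [(ipaddress.ip_network(cidr), zone) for cidr, zone in SUBNET_ZONES.items()]


def zone_of(ip):
    """Return the zone whose subnet contains `ip`, or None if outside the mapping."""
    addr = ipaddress.ip_address(ip)
    for network, zone in NETWORKS:
        if addr in network:
            return zone
    return None


def cross_zone_bytes(lines):
    """Return (cross_zone_bytes, total_in_vpc_bytes) for default-format records.

    Default fields: version account-id interface-id srcaddr dstaddr srcport
    dstport protocol packets bytes start end action log-status
    """
    cross = total = 0
    for line in lines:
        fields = line.split()
        # Skip headers, NODATA/SKIPDATA records, and rejected flows.
        if len(fields) < 14 or fields[12] != "ACCEPT":
            continue
        src, dst = zone_of(fields[3]), zone_of(fields[4])
        if src is None or dst is None:
            continue  # traffic entering or leaving the mapped subnets
        nbytes = int(fields[9])
        total += nbytes
        if src != dst:
            cross += nbytes
    return cross, total
```

A flow between two instances in the same VPC is typically recorded on both network interfaces, so treat the absolute byte counts as an upper bound and rely on the ratio.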
AWS Cost Explorer
1. Go to AWS Cost Explorer
2. Filter: Service = "EC2-Other"
3. Group by: Usage Type
4. Look for: DataTransfer-Regional-Bytes
Monthly cross-zone cost visible immediately
Node Labels
# Check node zones
kubectl get nodes -L topology.kubernetes.io/zone
# Output (abridged):
# NAME     STATUS   ...   ZONE
# node-1   Ready    ...   us-east-1a
# node-2   Ready    ...   us-east-1b
# node-3   Ready    ...   us-east-1c
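Node zones alone do not show how a workload's pods are spread. As a rough sketch, the function below combines the parsed JSON from `kubectl get nodes -o json` and `kubectl get pods -o json` to count pods per zone. The function name is illustrative, and loading the JSON (for example with `json.load`) is left to the caller.

```python
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"


def pods_per_zone(nodes, pods):
    """Count scheduled pods per zone from parsed kubectl JSON output."""
    node_zone = {
        node["metadata"]["name"]: node["metadata"].get("labels", {}).get(ZONE_LABEL, "unknown")
        for node in nodes["items"]
    }
    counts = Counter()
    for pod in pods["items"]:
        node_name = pod["spec"].get("nodeName")
        if node_name:  # pods without a nodeName are not scheduled yet
            counts[node_zone.get(node_name, "unknown")] += 1
    return dict(counts)
```

Running it against `kubectl get pods -l app=backend -o json` shows at a glance whether a backend is missing from a zone entirely.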
Solutions
1. Topology-Aware Routing (Kubernetes 1.21+)
# Service with topology-aware routing
apiVersion: v1
kind: Service
metadata:
  name: backend
  annotations:
    # Kubernetes 1.27+ (including the EKS 1.28 cluster tested here).
    # The older service.kubernetes.io/topology-aware-hints annotation is deprecated as of 1.27.
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: backend
  ports:
    - port: 80
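The annotation only takes effect when the control plane actually writes zone hints onto the Service's EndpointSlices; when it decides not to, routing stays cluster-wide. A quick way to check is to inspect the `hints.forZones` field of each endpoint. The sketch below assumes the parsed output of `kubectl get endpointslices -l kubernetes.io/service-name=backend -o json`; the helper names are illustrative.

```python
def hint_report(endpoint_slices):
    """Return (address, zone, hinted_zones) for every endpoint not marked unready."""
    report = []
    for slice_ in endpoint_slices.get("items", []):
        for endpoint in slice_.get("endpoints") or []:
            if endpoint.get("conditions", {}).get("ready") is False:
                continue
            hinted = [z["name"] for z in (endpoint.get("hints") or {}).get("forZones", [])]
            for address in endpoint.get("addresses", []):
                report.append((address, endpoint.get("zone"), hinted))
    return report


def hints_active(endpoint_slices):
    """True only if there are endpoints and each one carries at least one zone hint."""
    report = hint_report(endpoint_slices)
    return bool(report) and all(hinted for _, _, hinted in report)
```

If `hints_active` returns False, check replica counts per zone before assuming the annotation is saving anything.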
2. Topology Spread Constraints
# Deployment that spreads evenly across zones
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 6 # 2 per zone for 3 AZs
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: backend
      containers:
        - name: backend
          image: example.com/backend:1.0 # placeholder image
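To see what `maxSkew: 1` permits, it helps to model the check the scheduler applies: a pod may land in a zone only if that zone's matching-pod count, after placement, exceeds the lowest zone count by no more than `maxSkew`. The following is a simplified sketch of that rule, assuming every eligible zone appears in `counts` (with 0 if empty); it ignores node selectors, taints, and resource limits.

```python
def can_schedule(counts, zone, max_skew=1):
    """True if placing one more matching pod in `zone` respects max_skew."""
    global_min = min(counts.values())
    return counts[zone] + 1 - global_min <= max_skew


counts = {"us-east-1a": 2, "us-east-1b": 2, "us-east-1c": 1}
for zone in counts:
    print(zone, can_schedule(counts, zone))
```

With `DoNotSchedule`, a pod that fails this check in every zone with free capacity stays Pending, so a zone that is out of capacity can block scale-ups.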
3. Istio Locality Load Balancing
# DestinationRule with locality-weighted distribution
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend
spec:
  host: backend.default.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
    connectionPool:
      tcp:
        maxConnections: 100
    loadBalancer:
      localityLbSetting:
        enabled: true
        # 80% of traffic from us-east-1a stays in-zone;
        # add one "from" entry per source zone
        distribute:
          - from: "us-east-1/us-east-1a/*"
            to:
              "us-east-1/us-east-1a/*": 80
              "us-east-1/us-east-1b/*": 10
              "us-east-1/us-east-1c/*": 10
4. Pod Anti-Affinity + Local Service
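# Ensure pods land in different zones.
# Note: a required anti-affinity rule on the zone key allows at most one
# matching pod per zone, so replicas beyond the zone count stay Pending.
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - backend
              topologyKey: topology.kubernetes.io/zone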
5. Internal Traffic Policy (Kubernetes 1.26+)
# Route to node-local endpoints only
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend
  internalTrafficPolicy: Local # Same-node endpoints only; traffic is dropped if the node has none
  ports:
    - port: 80
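Because `Local` is scoped to the node rather than the zone, any node that runs callers but no backend pod will have its traffic to this Service dropped. A small pre-flight check, sketched here with illustrative names, is to compare the set of nodes hosting callers with the set hosting endpoints before switching the policy on.

```python
def nodes_that_would_drop_traffic(client_nodes, endpoint_nodes):
    """Nodes that run callers but have no local backend endpoint."""
    return sorted(set(client_nodes) - set(endpoint_nodes))


at_risk = nodes_that_would_drop_traffic(
    client_nodes=["node-1", "node-2", "node-3"],
    endpoint_nodes=["node-1", "node-3"],
)
print(at_risk)
```

This is why the policy fits best with backends that run on every node, such as DaemonSets.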
Architecture Patterns
Pattern 1: Zone-Aware Microservices
Before (cross-zone heavy):
┌─────────────────────────────────────────────────┐
│ AZ-a                                            │
│ [API Gateway] ──────────────────────────────────┼──→ [Service B in AZ-b]
│                                                 │
│ [Service A] ────────────────────────────────────┼──→ [Service C in AZ-c]
└─────────────────────────────────────────────────┘
After (zone-local):
┌─────────────────────────────────────────────────────────┐
│ AZ-a                                                    │
│ [API Gateway] → [Service A] → [Service B] → [Service C] │
│ (all local)                                             │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ AZ-b                                                    │
│ [API Gateway] → [Service A] → [Service B] → [Service C] │
│ (all local)                                             │
└─────────────────────────────────────────────────────────┘
Pattern 2: Shard by Zone
# Database read replicas per zone
# Application connects to local replica
# ConfigMap with zone-specific config
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  DB_HOST_AZ_A: "postgres-replica-a.db.svc"
  DB_HOST_AZ_B: "postgres-replica-b.db.svc"
  DB_HOST_AZ_C: "postgres-replica-c.db.svc"
# Init container or sidecar determines zone
# and sets appropriate DB_HOST
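The zone-to-host selection step could look like the sketch below. It assumes the ConfigMap keys above are exposed to the container as environment variables and that the zone name ends in the AZ letter (as in `us-east-1a`). How the pod learns its zone is left open; the function simply takes it as an argument.

```python
import os


def db_host_for_zone(zone, env=None):
    """Map a zone such as 'us-east-1a' to the value of DB_HOST_AZ_A."""
    env = os.environ if env is None else env
    if not zone:
        raise ValueError("zone must be a non-empty string")
    key = f"DB_HOST_AZ_{zone[-1].upper()}"
    if key not in env:
        raise LookupError(f"no replica configured for zone {zone!r} (expected {key})")
    return env[key]


config = {
    "DB_HOST_AZ_A": "postgres-replica-a.db.svc",
    "DB_HOST_AZ_B": "postgres-replica-b.db.svc",
    "DB_HOST_AZ_C": "postgres-replica-c.db.svc",
}
print(db_host_for_zone("us-east-1b", env=config))
```

Failing loudly on an unknown zone is deliberate here: silently falling back to a remote replica would reintroduce the cross-zone traffic this pattern is meant to remove.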
Pattern 3: Cache per Zone
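# Redis per zone, not cross-zone
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis # assumes a headless Service named "redis"
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector: # only pods matching this selector are counted
            matchLabels:
              app: redis
      containers:
        - name: redis
          image: redis:7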
Cost Estimation
Before Optimization
Cluster: 20 nodes, 3 AZs
Services: 15 microservices
Average traffic per service pair: 50GB/day
Without zone awareness:
- 66% cross-zone = 33GB × $0.02 = $0.66/day per service pair
- 15 services × 14 peers / 2 = 105 possible pairs
- BUT not all pairs communicate
Realistic: 30 active pairs × $0.66 = $19.80/day
Monthly: ~$600 in cross-zone traffic
After Optimization
With topology-aware routing:
- 80% same-zone traffic
- 20% cross-zone = 10GB × $0.02 = $0.20/day per pair
- 30 active pairs × $0.20 = $6/day
Monthly: ~$180 in cross-zone traffic
Savings: $420/month = $5040/year
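To plug in your own numbers, the before/after estimate can be expressed as one function. This is a sketch assuming a 30-day month and the $0.02/GB round-trip price used throughout; the parameter names are illustrative.

```python
def monthly_cross_zone_cost(gb_per_day_per_pair, cross_zone_fraction, active_pairs,
                            price_per_gb=0.02, days=30):
    """Estimated monthly cross-zone cost in dollars."""
    return gb_per_day_per_pair * cross_zone_fraction * active_pairs * price_per_gb * days


before = monthly_cross_zone_cost(50, 0.66, 30)
after = monthly_cross_zone_cost(50, 0.20, 30)
print(f"before: ${before:.0f}, after: ${after:.0f}, savings: ${before - after:.0f}/month")
```

The unrounded "before" figure is $594, which the estimate above rounds to ~$600, so the savings land a few dollars below $420/month.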
Monitoring
Prometheus Metrics
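# Cross-zone traffic ratio (with Istio)
# PromQL cannot compare two labels inside a selector, so compute
# 1 - (same-zone share), listing each zone explicitly
1 - (
  sum(
    rate(istio_tcp_sent_bytes_total{source_workload_zone="us-east-1a", destination_workload_zone="us-east-1a"}[5m])
    or rate(istio_tcp_sent_bytes_total{source_workload_zone="us-east-1b", destination_workload_zone="us-east-1b"}[5m])
    or rate(istio_tcp_sent_bytes_total{source_workload_zone="us-east-1c", destination_workload_zone="us-east-1c"}[5m])
  )
  /
  sum(rate(istio_tcp_sent_bytes_total[5m]))
)
# Traffic by zone pair
sum(rate(istio_tcp_sent_bytes_total[5m])) by (
  source_workload_zone,
  destination_workload_zone
)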
Alert Rules
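groups:
  - name: cross_zone_traffic
    rules:
      - alert: HighCrossZoneTrafficRatio
        # PromQL cannot compare two labels inside a selector, so compute
        # 1 - (same-zone share), listing each zone explicitly.
        expr: |
          1 - (
            sum(
              rate(istio_tcp_sent_bytes_total{source_workload_zone="us-east-1a", destination_workload_zone="us-east-1a"}[1h])
              or rate(istio_tcp_sent_bytes_total{source_workload_zone="us-east-1b", destination_workload_zone="us-east-1b"}[1h])
              or rate(istio_tcp_sent_bytes_total{source_workload_zone="us-east-1c", destination_workload_zone="us-east-1c"}[1h])
            )
            /
            sum(rate(istio_tcp_sent_bytes_total[1h]))
          ) > 0.5
        for: 1h
        annotations:
          summary: "Cross-zone traffic >50% of total"
          description: "Consider enabling topology-aware routing"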
Checklist
## Cross-Zone Traffic Optimization
### Measurement
- [ ] Verify topology.kubernetes.io/zone labels on nodes
- [ ] Check AWS Cost Explorer for DataTransfer-Regional
- [ ] Measure cross-zone ratio with Istio/CNI metrics
### Quick Wins
- [ ] Enable topology-aware routing (service.kubernetes.io/topology-mode: Auto) on high-traffic services
- [ ] Set internalTrafficPolicy: Local where applicable
- [ ] Ensure even pod distribution across zones
### Architecture Changes
- [ ] Add read replicas per zone for databases
- [ ] Add cache layer per zone
- [ ] Consider zone-sharded services for heavy traffic
### Monitoring
- [ ] Dashboard with cross-zone traffic ratio
- [ ] Alert on cross-zone ratio > 50%
- [ ] Track monthly cross-zone cost trend
Conclusion
Cross-zone traffic is a hidden Kubernetes cost:
- $0.02/GB adds up fast with chatty microservices
- Topology-aware routing cut cross-zone cost by about 70% in the estimate above (80% of traffic stays same-zone)
- Measure first with Istio metrics or VPC Flow Logs
- Zone-local caching is often the biggest win
Check your AWS bill - you might be surprised.