
The $10k/Month AWS Mistake: NAT Gateway vs VPC Endpoints

Tags: aws, cost-optimization, networking, vpc, nat-gateway, cloud

I only looked into this because our AWS bill suddenly looked like a phone number. The question on the table: “Why is our AWS data transfer bill $15,000/month?” I checked the architecture: private subnets routed all traffic through a NAT Gateway, including traffic to S3 and DynamoDB. That is paying for traffic that should be free.

Tested on: AWS us-east-1, EKS cluster with 50 nodes, 100TB/month S3 traffic

The Problem

Typical Private Subnet Setup

Default architecture (expensive):

┌─────────────────────────────────────────────────────────────┐
│                     Private Subnet                          │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐                     │
│  │   EKS   │  │   EKS   │  │   EKS   │                     │
│  │  Node   │  │  Node   │  │  Node   │                     │
│  └────┬────┘  └────┬────┘  └────┬────┘                     │
│       │            │            │                           │
│       └────────────┼────────────┘                           │
│                    │                                        │
│                    ▼                                        │
│              ┌──────────┐                                   │
│              │   NAT    │  ← $0.045/GB data processing      │
│              │ Gateway  │  ← $0.045/hour per gateway        │
│              └────┬─────┘                                   │
└───────────────────┼─────────────────────────────────────────┘
                    │
                    ▼
            ┌───────────────┐
            │   Internet    │
            │   Gateway     │
            └───────┬───────┘
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
    ┌──────┐   ┌─────────┐  ┌───────┐
    │  S3  │   │DynamoDB │  │  ECR  │
    └──────┘   └─────────┘  └───────┘

All AWS service traffic goes through NAT = paying for free traffic

Cost Breakdown

Scenario: 100TB/month S3 traffic from private subnet

Via NAT Gateway:
  Data processing:  100,000 GB × $0.045 = $4,500/month
  Hourly charge:    720 hours × $0.045 × 3 AZs = $97/month
  Total NAT cost:   $4,597/month

Via VPC Gateway Endpoint (S3):
  Data processing:  $0 (free!)
  Hourly charge:    $0 (free!)
  Total:            $0/month

Monthly savings: $4,597
Annual savings:  $55,164

And that's just S3. Add DynamoDB, ECR, and other services...
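
The arithmetic above is easy to re-run with your own traffic numbers. Here is a minimal sketch, assuming the us-east-1 rates quoted in this post ($0.045/GB and $0.045/hour) and a 720-hour month. The function name `nat_monthly_cost` is made up for illustration, and rates differ by region.

```shell
# Hypothetical helper: estimate the monthly NAT Gateway bill.
# Rates are the us-east-1 figures quoted in this post; adjust for your region.
nat_monthly_cost() {
  local gb="$1"        # GB processed per month
  local gateways="$2"  # number of NAT Gateways (typically one per AZ)
  awk -v gb="$gb" -v n="$gateways" 'BEGIN {
    per_gb = 0.045; per_hour = 0.045; hours = 720
    printf "%.2f\n", gb * per_gb + n * hours * per_hour
  }'
}

# 100 TB/month through three gateways, as in the scenario above
nat_monthly_cost 100000 3   # prints 4597.20
```

With a gateway endpoint in place, the same S3 traffic costs nothing, so the whole figure is the saving.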

VPC Endpoint Types

Gateway Endpoints (Free)

Supported services:
  - S3
  - DynamoDB

Characteristics:
  - Route table entry (no ENI)
  - No hourly or data charges
  - Regional scope
  - Must be in same region as bucket/table

Interface Endpoints (Paid)

Supported services:
  - ECR (ecr.api, ecr.dkr)
  - Secrets Manager
  - SSM
  - CloudWatch
  - SQS, SNS
  - And 100+ more

Characteristics:
  - ENI in your subnet
  - $0.01/hour per AZ
  - $0.01/GB data processed
  - Still cheaper than NAT ($0.045/GB) once an endpoint carries more than roughly 200 GB/month per AZ
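
To see where "heavy traffic" starts, you can compute the break-even point: the fixed hourly charge of the endpoint divided by what you save per GB. This is a rough sketch using the rates listed above. It assumes the NAT Gateway stays in place for external traffic (so its hourly charge is not saved), and `interface_breakeven_gb` is a hypothetical name.

```shell
# Hypothetical helper: GB/month at which one interface endpoint pays for itself.
# Assumes $0.01/hour per AZ and $0.01/GB for the endpoint, $0.045/GB for NAT.
interface_breakeven_gb() {
  local azs="$1"  # number of AZs the endpoint is deployed in
  awk -v azs="$azs" 'BEGIN {
    endpoint_hourly = 0.01; hours = 720
    nat_per_gb = 0.045; endpoint_per_gb = 0.01
    fixed = azs * hours * endpoint_hourly          # monthly hourly charges
    printf "%.0f\n", fixed / (nat_per_gb - endpoint_per_gb)
  }'
}

interface_breakeven_gb 3   # prints 617
```

Below that volume for a given service, leaving its traffic on NAT is the cheaper option.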

Implementation

S3 Gateway Endpoint

# terraform/vpc_endpoints.tf

# S3 Gateway Endpoint (FREE)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"

  route_table_ids = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]

  tags = {
    Name = "s3-gateway-endpoint"
  }
}

# DynamoDB Gateway Endpoint (FREE)
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"

  route_table_ids = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]

  tags = {
    Name = "dynamodb-gateway-endpoint"
  }
}

ECR Interface Endpoints

# ECR needs TWO endpoints: api and dkr

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "ecr-api-endpoint"
  }
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "ecr-dkr-endpoint"
  }
}

# ECR also needs S3 endpoint for image layers!
# (Already created above)

# Security group for interface endpoints
resource "aws_security_group" "vpc_endpoints" {
  name        = "vpc-endpoints"
  description = "Security group for VPC endpoints"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }

  tags = {
    Name = "vpc-endpoints-sg"
  }
}

Common Endpoints for EKS

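# Complete EKS-optimized endpoint setup
# Note: this for_each already covers ecr.api and ecr.dkr, so use it INSTEAD of
# the standalone ecr_api / ecr_dkr resources above. Two interface endpoints for
# the same service in one VPC cannot both have private DNS enabled.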

locals {
  interface_endpoints = [
    "ecr.api",
    "ecr.dkr",
    "logs",           # CloudWatch Logs
    "monitoring",     # CloudWatch Metrics
    "sts",            # STS for IAM roles
    "ssm",            # Systems Manager
    "ssmmessages",    # Session Manager
    "ec2messages",    # EC2 messages
    "autoscaling",    # Auto Scaling
    "elasticloadbalancing",  # ALB/NLB
  ]
}

resource "aws_vpc_endpoint" "interface_endpoints" {
  for_each = toset(local.interface_endpoints)

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${each.value}-endpoint"
  }
}

Cost Comparison

Real-World Scenario

EKS cluster with 50 nodes:
- 100TB/month S3 traffic (logs, artifacts, backups)
- 10TB/month ECR pulls
- 5TB/month DynamoDB
- 2TB/month CloudWatch Logs

WITHOUT VPC Endpoints (all via NAT):
┌─────────────────────────────────────────────────────────┐
│ Service        │ Traffic  │ NAT Cost    │ Monthly      │
├─────────────────────────────────────────────────────────┤
│ S3             │ 100 TB   │ $0.045/GB   │ $4,500       │
│ ECR            │ 10 TB    │ $0.045/GB   │ $450         │
│ DynamoDB       │ 5 TB     │ $0.045/GB   │ $225         │
│ CloudWatch     │ 2 TB     │ $0.045/GB   │ $90          │
│ NAT hourly     │ 3 AZs    │ $0.045/hr   │ $97          │
├─────────────────────────────────────────────────────────┤
│ TOTAL          │          │             │ $5,362/month │
└─────────────────────────────────────────────────────────┘

WITH VPC Endpoints:
┌─────────────────────────────────────────────────────────┐
│ Service        │ Traffic  │ Endpoint Cost│ Monthly     │
├─────────────────────────────────────────────────────────┤
│ S3 (Gateway)   │ 100 TB   │ FREE         │ $0          │
│ DynamoDB (GW)  │ 5 TB     │ FREE         │ $0          │
│ ECR (Interface)│ 10 TB    │ $0.01/GB     │ $100        │
│ CloudWatch (IF)│ 2 TB     │ $0.01/GB     │ $20         │
│ Endpoint hourly│ 10 eps   │ $0.01/hr×3AZ │ $216        │
│ NAT (1 gateway)│ ext only │ $0.045/hr    │ $32         │
├─────────────────────────────────────────────────────────┤
│ TOTAL          │          │              │ $368/month  │
└─────────────────────────────────────────────────────────┘

Monthly savings: $4,994
Annual savings:  $59,928
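
The two tables can be reproduced in a few lines, which makes the assumptions explicit. This sketch assumes the traffic volumes and us-east-1 rates from the scenario, ten interface endpoints across three AZs, and a single remaining NAT Gateway whose external data processing is not counted. `scenario_savings` is a hypothetical name.

```shell
# Hypothetical helper: recompute the before/after totals from the tables above.
scenario_savings() {
  awk 'BEGIN {
    hours = 720
    # Traffic in GB/month
    s3 = 100000; ecr = 10000; ddb = 5000; cw = 2000
    # Everything through NAT: per-GB processing plus three gateways
    without = (s3 + ecr + ddb + cw) * 0.045 + 3 * hours * 0.045
    # Gateway endpoints are free; interface endpoints bill per GB and per hour per AZ;
    # one NAT Gateway remains for external traffic (hourly charge only)
    with_eps = (ecr + cw) * 0.01 + 10 * 3 * hours * 0.01 + 1 * hours * 0.045
    printf "without=%.0f with=%.0f savings=%.0f\n", without, with_eps, without - with_eps
  }'
}

scenario_savings   # prints without=5362 with=368 savings=4994
```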

Verification

Check Traffic Path

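# From an EC2 instance in a private subnet

# Before endpoint: outbound traffic leaves via NAT.
# Instance metadata has no public-ipv4 for a private instance,
# so ask an external service which address it sees:
curl -s https://checkip.amazonaws.com
# Returns the NAT Gateway's public (Elastic) IP

# Test S3 connectivity
aws s3 ls s3://my-bucket --debug 2>&1 | grep -i "endpoint"
# Look for the regional endpoint, e.g. s3.us-east-1.amazonaws.com
# (the hostname is the same with or without a gateway endpoint; only the route changes)

# After S3 Gateway Endpoint (TCP traceroute needs root)
sudo traceroute -n -T -p 443 s3.us-east-1.amazonaws.com
# Should show no NAT hop; intermediate hops typically do not respond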
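
Traceroute output can be ambiguous, so it may be worth checking the route table directly: a gateway endpoint shows up as a route whose destination is a prefix list (`pl-...`) and whose target is the endpoint (`vpce-...`). This is a minimal sketch. The function name and the route table ID are placeholders, and it does not distinguish the S3 prefix list from the DynamoDB one.

```shell
# Hypothetical helper: succeeds if the route table has a route whose
# destination is a prefix list (pl-...) and whose target is a VPC endpoint (vpce-...).
has_gateway_endpoint_route() {
  local route_table_id="$1"
  aws ec2 describe-route-tables \
    --route-table-ids "$route_table_id" \
    --query 'RouteTables[].Routes[].[DestinationPrefixListId,GatewayId]' \
    --output text |
    grep -Eq '^pl-[0-9a-z]+[[:space:]]+vpce-'
}

# Usage (placeholder ID):
#   has_gateway_endpoint_route rtb-0123456789abcdef0 && echo "gateway endpoint route present"
```

Run it for every private route table. A table that was left out of `route_table_ids` keeps sending S3 traffic through NAT.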

Verify Endpoint Usage

# Check VPC Flow Logs for endpoint traffic
# Gateway endpoints: traffic stays within VPC
# Interface endpoints: traffic goes to endpoint ENI

# CloudWatch Insights query
fields @timestamp, srcAddr, dstAddr, dstPort, bytes
| filter dstPort = 443
| filter srcAddr like /^10\./
| stats sum(bytes) as totalBytes by dstAddr
| sort totalBytes desc
| limit 20
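
If your flow logs are delivered to S3 rather than CloudWatch Logs, roughly the same aggregation can be done locally. This sketch assumes the default flow log format (srcaddr in field 4, dstaddr in field 5, dstport in field 7, bytes in field 10) and a 10.x VPC CIDR, as in the query above. `top_https_destinations` is a hypothetical name.

```shell
# Hypothetical helper: sum bytes per destination for HTTPS traffic
# originating from 10.x addresses, reading default-format flow log lines on stdin.
top_https_destinations() {
  awk '$7 == 443 && $4 ~ /^10\./ && $10 ~ /^[0-9]+$/ { bytes[$5] += $10 }
       END { for (dst in bytes) printf "%.0f %s\n", bytes[dst], dst }' |
    sort -rn | head -n 20
}

# Usage (the file name is a placeholder):
#   zcat flow-logs.log.gz | top_https_destinations
```

Header lines and NODATA records are skipped because their port and byte fields are not numeric.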

Common Pitfalls

1. S3 Cross-Region Access

# Gateway endpoint only works for SAME region
# Cross-region S3 access still goes via NAT or internet

# Solution: replicate the data to a bucket in the same region
# (S3 Transfer Acceleration does not help: it uses public endpoints, so traffic still crosses NAT)
# Or accept NAT cost for cross-region (usually small traffic)

2. Missing ECR Layer Endpoint

ECR pull requires THREE endpoints:
1. ecr.api     - ECR API calls
2. ecr.dkr     - Docker registry protocol
3. s3          - Image layers stored in S3!

Missing S3 endpoint = ECR pulls fail or go via NAT
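
A quick way to catch this is to compare the endpoints that exist in the VPC against the three that ECR needs. This is a minimal sketch; `missing_ecr_endpoints` is a hypothetical helper that reads endpoint service names from stdin, and the VPC ID in the usage line is a placeholder.

```shell
# Hypothetical helper: print which of the three ECR-related endpoints are missing.
# Reads existing endpoint service names (whitespace-separated) on stdin.
missing_ecr_endpoints() {
  local region="$1" existing svc
  existing="$(tr -s '[:space:]' '\n')"   # normalise to one service name per line
  for svc in ecr.api ecr.dkr s3; do
    if ! printf '%s\n' "$existing" | grep -qxF "com.amazonaws.${region}.${svc}"; then
      echo "$svc"
    fi
  done
}

# Usage:
#   aws ec2 describe-vpc-endpoints \
#     --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
#     --query 'VpcEndpoints[].ServiceName' --output text |
#     missing_ecr_endpoints us-east-1
```

No output means all three endpoints are present.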

3. Private DNS Not Enabled

# Interface endpoint with private_dns_enabled = false
# Means service URL doesn't resolve to endpoint

# Must use endpoint-specific DNS:
# vpce-xxx.api.ecr.us-east-1.vpce.amazonaws.com

# Better: Enable private DNS
# (the VPC must also have DNS support and DNS hostnames enabled)
private_dns_enabled = true
# Now api.ecr.us-east-1.amazonaws.com resolves to endpoint ENI
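
You can confirm that private DNS is working by resolving the service hostname from inside the VPC and checking that the answer is a private address. This sketch assumes the VPC uses RFC 1918 ranges (it will misreport VPCs on other CIDRs) and a glibc system where `getent ahostsv4` is available. Both function names are hypothetical.

```shell
# Hypothetical helper: true if the IPv4 address is in an RFC 1918 private range.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: true if the hostname's first IPv4 answer is private.
resolves_privately() {
  local host="$1" ip
  ip="$(getent ahostsv4 "$host" | awk 'NR == 1 { print $1 }')"
  [ -n "$ip" ] && is_private_ipv4 "$ip"
}

# Usage, from an instance in the private subnet:
#   resolves_privately api.ecr.us-east-1.amazonaws.com && echo "using the endpoint ENI"
```

A public address here means requests to that service are still leaving through NAT.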

4. Security Group Blocking

# Interface endpoints need HTTPS (443) from VPC CIDR
resource "aws_security_group" "vpc_endpoints" {
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]  # Whole VPC
  }
}

Monitoring

CloudWatch Metrics

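# VPC Endpoint metrics (interface endpoints only; gateway endpoints do not publish them)
# CloudWatch returns data only when the dimension set matches what is published,
# so list the available combinations first:
aws cloudwatch list-metrics \
  --namespace AWS/PrivateLinkEndpoints \
  --metric-name BytesProcessed

# Then query with the full dimension set (IDs and service name are placeholders)
aws cloudwatch get-metric-statistics \
  --namespace AWS/PrivateLinkEndpoints \
  --metric-name BytesProcessed \
  --dimensions Name="VPC Id",Value=vpc-xxx Name="VPC Endpoint Id",Value=vpce-xxx \
    Name="Endpoint Type",Value=Interface Name="Service Name",Value=com.amazonaws.us-east-1.ecr.dkr \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-31T23:59:59Z \
  --period 86400 \
  --statistics Sum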

Cost Explorer

Filter by:
  Service: EC2 - Other (NAT Gateway) and VPC (endpoints)

Group by: Usage Type

Look for (outside us-east-1 these carry a region prefix):
  - NatGateway-Bytes (should decrease)
  - VpcEndpoint-Bytes (new category)

Checklist

## VPC Endpoints Cost Optimization

### Free Gateway Endpoints (Priority 1)
- [ ] Create S3 Gateway Endpoint
- [ ] Create DynamoDB Gateway Endpoint
- [ ] Add to all private route tables
- [ ] Verify S3 traffic bypasses NAT

### High-Traffic Interface Endpoints (Priority 2)
- [ ] ECR endpoints (api + dkr)
- [ ] CloudWatch Logs endpoint
- [ ] Secrets Manager (if used)
- [ ] Enable private DNS

### Verification
- [ ] Check NAT Gateway data processing (should drop)
- [ ] Verify ECR pulls work from private subnets
- [ ] Test S3 access from private subnets

### Monitoring
- [ ] Track endpoint BytesProcessed
- [ ] Compare NAT costs before/after
- [ ] Alert on endpoint errors

Conclusion

Stop paying for free AWS traffic:

  1. S3 and DynamoDB Gateway Endpoints are FREE
  2. Interface Endpoints are cheaper than NAT for heavy traffic
  3. ECR needs three endpoints (api, dkr, s3)
  4. In the 50-node scenario above that is roughly $60k/year; $50k+/year is realistic for medium clusters

Check your NAT Gateway costs today. You’re probably overpaying.



Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "The $10k/Month AWS Mistake: NAT Gateway vs VPC Endpoints". https://www.michal-drozd.com/en/blog/aws-nat-gateway-vs-vpc-endpoints/ (Published July 1, 2025).