TCP TIME_WAIT Port Exhaustion: When Connection Pooling Isn't Enough
The day we ran out of ports, I finally respected TIME_WAIT. “Suddenly can’t connect to database - address already in use.” The error made no sense. The database was healthy. The network was fine. Our connection pool was configured correctly—we checked three times. Yet our service was throwing connection errors during a traffic spike, claiming it couldn’t bind to an address.
The answer was hiding in ss -tan state time-wait: 27,000 sockets. Nearly all of our ephemeral port range was consumed by TCP TIME_WAIT sockets: connections that the application had already closed but that the kernel was still tracking for its fixed 60-second safety period. Each one kept its source port from being reused for a new connection to the same destination. When we ran out of ports, we couldn’t create new connections to the database.
What made this particularly frustrating was that we had a connection pool. We’d done everything right—or so we thought. But somewhere in the codebase, a developer had created a new database client for each request instead of using the shared pool. The pool was configured correctly; it just wasn’t being used. Every request opened a new connection, used it once, and closed it. The connection was gone from the application’s perspective, but the kernel kept the socket in TIME_WAIT for 60 seconds.
TIME_WAIT is a fascinating example of TCP’s conservative design. It exists to prevent old packets from a closed connection from corrupting a new connection that reuses the same 4-tuple (source IP, source port, destination IP, destination port). It’s a safety feature. But at high throughput, it becomes a resource exhaustion vector. The math is simple: if you close 500 connections per second to the same destination, and each connection holds a port for 60 seconds, you need 500 × 60 = 30,000 ports, more than the 28,232 ports in the default Linux ephemeral range (32768-60999).
Environment: High-throughput services, microservices with many outbound connections, connection pool misconfigurations, short-lived HTTP connections
The Problem
The Mysterious Connection Failures
Timeline of port exhaustion:
T+0:00  Service running normally
        Outbound connections to DB, cache, APIs
        Ephemeral port range: 32768-60999 (28,232 ports)
T+0:10  Traffic spike - 1000 req/sec
        Each request makes 3 outbound calls
        3000 connections/sec opened and closed
T+0:30  TIME_WAIT sockets accumulating
        Each stays for 60 seconds
        3000 × 60 = 180,000 sockets needed!
T+0:35  Error: cannot assign requested address
        All ephemeral ports to DB IP:port exhausted
        New connections impossible
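The timeline can be approximated with a small second-by-second simulation. This is a rough sketch, not a model of the real kernel: it assumes every connection goes to one destination, is closed by the client immediately, and then sits in TIME_WAIT for exactly 60 seconds. The function name and its parameters are made up for illustration.

```python
from collections import deque


def seconds_until_exhaustion(rate_per_sec, port_count=28232,
                             time_wait_secs=60, horizon_secs=600):
    """Return the first second at which new connections cannot get a port.

    Returns None if the port range survives the whole simulated horizon.
    """
    free = port_count
    expiring = deque()  # (release_time, number_of_ports), oldest first

    for now in range(horizon_secs):
        # Ports whose TIME_WAIT period has elapsed become usable again.
        while expiring and expiring[0][0] <= now:
            free += expiring.popleft()[1]

        if free < rate_per_sec:
            return now  # not enough free ports for this second's connections

        # Every connection opened this second is closed right away and
        # holds its port in TIME_WAIT for time_wait_secs.
        free -= rate_per_sec
        expiring.append((now + time_wait_secs, rate_per_sec))

    return None


if __name__ == "__main__":
    print(seconds_until_exhaustion(1000))  # 28
    print(seconds_until_exhaustion(470))   # None
    print(seconds_until_exhaustion(471))   # 59
```

Under these assumptions, 1000 connections per second to a single destination use up the default range in under half a minute, while anything at or below 470 per second stays sustainable indefinitely.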
What TIME_WAIT Actually Is
TCP Connection Lifecycle:
Client                        Server
|                                  |
|-------- SYN -------------------->|
|<------- SYN-ACK -----------------|
|-------- ACK -------------------->|
|           ESTABLISHED            |
|<======= DATA ===================>|
|                                  |
|-------- FIN -------------------->|   Client initiates close
|<------- ACK ---------------------|
|<------- FIN ---------------------|
|-------- ACK -------------------->|
|                                  |
| TIME_WAIT (2 × MSL = 60s)        |   ← Socket unusable!
|                                  |
↓ Finally closed                   |
Why TIME_WAIT exists:
1. Ensure final ACK reaches server
2. Let duplicate packets expire
3. Prevent old packets corrupting new connection
Problem: High-throughput = many TIME_WAITs
Root Cause
The 4-Tuple Problem
TCP socket identified by 4-tuple:
(source_ip, source_port, dest_ip, dest_port)
Your service: 10.0.1.50
Database: 10.0.2.100:5432
Available combinations:
10.0.1.50:32768 → 10.0.2.100:5432
10.0.1.50:32769 → 10.0.2.100:5432
...
10.0.1.50:60999 → 10.0.2.100:5432
Only 28,232 unique 4-tuples possible!
At 500 connections/sec with 60s TIME_WAIT:
500 × 60 = 30,000 sockets needed
30,000 > 28,232 available → EXHAUSTION
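To see the 4-tuple in practice, the sketch below opens a few connections to a throwaway listener on the loopback interface and prints each tuple. Only the source port differs, and the kernel picks it from the ephemeral range. The helper name `open_connections` is hypothetical, and the loopback listener merely stands in for the database.

```python
import socket


def open_connections(n):
    """Open n connections to a local listener and return their 4-tuples."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel choose
    listener.listen(n)
    dest_ip, dest_port = listener.getsockname()

    clients = []
    tuples = []
    try:
        for _ in range(n):
            client = socket.create_connection((dest_ip, dest_port))
            clients.append(client)
            src_ip, src_port = client.getsockname()
            tuples.append((src_ip, src_port, dest_ip, dest_port))
    finally:
        for client in clients:
            client.close()
        listener.close()
    return tuples


if __name__ == "__main__":
    for four_tuple in open_connections(3):
        print(four_tuple)
```

Because source IP, destination IP, and destination port are fixed for a given client and database, the source port is the only free variable, which is why the ephemeral range caps the number of connections per destination.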
Check Your Socket State
# Count sockets by state
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
# Output:
# 24567 TIME-WAIT
# 2341 ESTABLISHED
# 234 LISTEN
# 45 FIN-WAIT-2
# TIME_WAIT to specific destination
ss -tan state time-wait | grep "10.0.2.100:5432" | wc -l
# 23456 ← Almost all ports used!
# Check ephemeral port range
cat /proc/sys/net/ipv4/ip_local_port_range
# 32768 60999
The Real Math
# Calculate maximum sustainable rate
# Ephemeral ports: 60999 - 32768 + 1 = 28,232
# TIME_WAIT duration: 60 seconds
# Max new connections/sec: 28,232 / 60 ≈ 470/sec per destination
# If you have 10 unique destinations:
# Max total: 4,700 new connections/sec
# But if all go to ONE destination (your DB):
# Max: 470 connections/sec sustained
# Reality check your workload:
ss -tan state time-wait dst 10.0.2.100:5432 | wc -l
# If near 28,232 → you’re at the limit
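The same arithmetic can be scripted so it follows whatever range the host is actually configured with. The helper names below are illustrative. The input is the text of /proc/sys/net/ipv4/ip_local_port_range, and the 60-second TIME_WAIT default is the figure used throughout this article.

```python
def ephemeral_port_count(range_text):
    """Count ports in a range given as 'low high' (both ends inclusive)."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


def max_sustainable_rate(port_count, time_wait_secs=60):
    """Highest whole number of new connections/sec to ONE destination."""
    return port_count // time_wait_secs


def utilization(time_wait_count, port_count):
    """Fraction of the port range currently held by TIME_WAIT sockets."""
    return time_wait_count / port_count


if __name__ == "__main__":
    # On a Linux host the text would come from:
    #   open("/proc/sys/net/ipv4/ip_local_port_range").read()
    default_range = "32768\t60999"
    ports = ephemeral_port_count(default_range)
    print(ports)                          # 28232
    print(max_sustainable_rate(ports))    # 470
    print(round(utilization(23456, ports), 2))  # 0.83
```

With the 23,456 TIME_WAIT sockets from the earlier output, the default range is about 83% consumed for that one destination.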
Diagnosis
Identify the Bottleneck
# Find top TIME_WAIT destinations
ss -tan state time-wait | awk '{print $4}' | sort | uniq -c | sort -rn | head
# 23456 10.0.2.100:5432 ← Database
# 3421 10.0.3.50:6379 ← Redis
# 1234 10.0.4.100:80 ← API service
# If one destination dominates → that's your bottleneck
# Check for connection pool bypass
netstat -an | grep "10.0.2.100:5432" | grep -c ESTABLISHED
# Should be stable (pool size), not fluctuating
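If ss is not available (minimal containers often lack it), the same per-destination count can be derived from /proc/net/tcp. The sketch below assumes IPv4 and a little-endian host such as x86-64 or typical ARM; the function names are made up for illustration. In that file the state column uses hex codes, and 06 is TIME_WAIT.

```python
import socket
import struct
from collections import Counter

TIME_WAIT = "06"  # hex state code for TIME_WAIT in /proc/net/tcp


def decode_addr(hex_addr):
    """Convert '6402000A:1538' to '10.0.2.100:5432' (little-endian host)."""
    hex_ip, hex_port = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return f"{ip}:{int(hex_port, 16)}"


def time_wait_by_destination(proc_net_tcp_text):
    """Count TIME_WAIT sockets per remote address from /proc/net/tcp text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # fields: [slot, local_address, remote_address, state, ...]
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            counts[decode_addr(fields[2])] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for dest, count in time_wait_by_destination(f.read()).most_common(5):
            print(count, dest)
```

A single destination dominating this output points at the same bottleneck as the ss pipeline above.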
Application-Level Diagnosis
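# Trace outbound connect() calls from the running process (works for any runtime, including the JVM)
strace -f -e trace=connect -p <pid>
# A steady stream of connect() calls to the database address means connections are not being reused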
# Check HikariCP pool stats
curl localhost:8080/actuator/metrics/hikaricp.connections.active
curl localhost:8080/actuator/metrics/hikaricp.connections.idle
# If connections created > pool size
# → Pool is bypassed or misconfigured
The Fix
Option 1: Reuse Connections (Best)
# HikariCP - proper pool sizing
spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      minimum-idle: 10
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
# KEY: Don't create new connections for every query!
// Go - configure connection pool (create once at startup and share the *sql.DB)
db, err := sql.Open("postgres", connStr)
if err != nil {
	log.Fatal(err)
}
db.SetMaxOpenConns(20)
db.SetMaxIdleConns(10)
db.SetConnMaxLifetime(30 * time.Minute)
db.SetConnMaxIdleTime(10 * time.Minute)
# Python - SQLAlchemy pool (create the engine once per process and share it)
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://...",
    pool_size=20,
    max_overflow=10,
    pool_pre_ping=True,
    pool_recycle=1800,
)
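The bug described at the start of this article, a correctly configured pool that nobody uses, is easy to reproduce in miniature. The sketch below is a toy model only: `FakeConnection` and `SimplePool` are hypothetical stand-ins for a real driver and a real pool, and the counter just records how many "sockets" would have been opened.

```python
import queue


class FakeConnection:
    """Stand-in for a database connection; counts how many get opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, sql):
        return f"result of {sql}"

    def close(self):
        pass  # a real close here is what leaves a socket in TIME_WAIT


class SimplePool:
    """Minimal fixed-size pool: connections are borrowed and returned."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    def run(self, sql):
        conn = self._idle.get()
        try:
            return conn.query(sql)
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


def handle_request_bypassing_pool(sql):
    """The accidental pattern: a fresh connection for every request."""
    conn = FakeConnection()
    try:
        return conn.query(sql)
    finally:
        conn.close()


def handle_request_with_pool(pool, sql):
    """The intended pattern: borrow from the shared pool."""
    return pool.run(sql)


if __name__ == "__main__":
    for _ in range(100):
        handle_request_bypassing_pool("SELECT 1")
    print(FakeConnection.opened)  # 100

    FakeConnection.opened = 0
    shared_pool = SimplePool(size=5)
    for _ in range(100):
        handle_request_with_pool(shared_pool, "SELECT 1")
    print(FakeConnection.opened)  # 5
```

Both handlers return the same results, which is why the bypass is hard to spot in code review; the difference only shows up in the number of connections opened.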
Option 2: HTTP Keep-Alive
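// Go HTTP client - reuse connections
client := &http.Client{
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100, // default is 2, which is too low for a hot upstream
		IdleConnTimeout:     90 * time.Second,
		// Reuse TCP connections for multiple requests
	},
}
// WRONG: Creating a new client with its own Transport per request
// (a bare &http.Client{} shares http.DefaultTransport; a new Transport does not)
func badHandler(w http.ResponseWriter, r *http.Request) {
	client := &http.Client{Transport: &http.Transport{}} // New transport = empty pool = new connections!
	resp, err := client.Get("http://api/endpoint")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
}
// CORRECT: Reuse client
var httpClient = &http.Client{...} // Package-level
func goodHandler(w http.ResponseWriter, r *http.Request) {
	resp, err := httpClient.Get("http://api/endpoint")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // Drain the body so the connection returns to the idle pool
}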
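The effect of keep-alive can be measured directly by counting how many TCP connections a server accepts. The sketch below uses only the Python standard library and a local test server; `CountingHandler` and `connections_used` are illustrative names, not part of any framework.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    accepted = 0
    _lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not per request.
        super().setup()
        with CountingHandler._lock:
            CountingHandler.accepted += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def connections_used(requests, reuse):
    """Send `requests` GETs and return how many connections the server saw."""
    CountingHandler.accepted = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        shared = http.client.HTTPConnection("127.0.0.1", port) if reuse else None
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()  # read fully so the socket can be reused
            if not reuse:
                conn.close()
        if reuse:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.accepted


if __name__ == "__main__":
    print(connections_used(5, reuse=True))   # 1
    print(connections_used(5, reuse=False))  # 5
```

Each connection the client closes in the non-reuse case is one more socket left in TIME_WAIT on the client side.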
Option 3: TCP Tuning (Careful!)
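# Expand ephemeral port range
echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range
# Now 64,512 ports instead of 28,232
# Caution: ports that local services listen on may now be picked as source ports;
# protect them with net.ipv4.ip_local_reserved_ports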
# Enable TIME_WAIT reuse (requires timestamps)
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
# Allows reusing TIME_WAIT sockets for new outbound connections
# SAFE for client-side connections
# WARNING: tcp_tw_recycle is DANGEROUS and removed in Linux 4.12
# It breaks connections through NAT
# NEVER use tcp_tw_recycle
# Reduce TIME_WAIT duration (not recommended)
# Linux doesn't support changing this directly
# Would require kernel recompilation
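# Kubernetes sysctl tuning
# ip_local_port_range is in Kubernetes' safe sysctl set; tcp_tw_reuse is not,
# so the kubelet has to allow it with --allowed-unsafe-sysctls
apiVersion: v1
kind: Pod
spec:
  securityContext:
    sysctls:
      - name: net.ipv4.ip_local_port_range
        value: "1024 65535"
      - name: net.ipv4.tcp_tw_reuse
        value: "1"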
Option 4: Multiple Source IPs
# If connecting to single destination, add source IPs
# Each source IP gets its own port range
ip addr add 10.0.1.51/24 dev eth0
ip addr add 10.0.1.52/24 dev eth0
# Configure application to rotate source IPs
# Effective port range multiplied by number of IPs
// Go - bind to specific source IP
dialer := &net.Dialer{
	LocalAddr: &net.TCPAddr{
		IP: net.ParseIP("10.0.1.51"),
	},
}
transport := &http.Transport{
	DialContext: dialer.DialContext,
}
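Rotating across several source IPs could look something like the following sketch. `SourceIPRotator` is a hypothetical helper, the addresses are the example IPs from above, and it assumes those IPs are already configured on the host. It relies on the standard `source_address` argument of `socket.create_connection`.

```python
import itertools
import socket


class SourceIPRotator:
    """Round-robin outbound connections across several local source IPs."""

    def __init__(self, source_ips):
        self._ips = itertools.cycle(source_ips)

    def next_source_ip(self):
        return next(self._ips)

    def connect(self, dest, timeout=5.0):
        source_ip = self.next_source_ip()
        # Port 0 asks the kernel for an ephemeral port on that source IP.
        return socket.create_connection(
            dest, timeout=timeout, source_address=(source_ip, 0)
        )


if __name__ == "__main__":
    # Assumes both addresses exist on this host (see the ip addr commands).
    rotator = SourceIPRotator(["10.0.1.51", "10.0.1.52"])
    conn = rotator.connect(("10.0.2.100", 5432))
    print(conn.getsockname())
    conn.close()
```

Each additional source IP contributes its own set of 4-tuples toward the same destination, so two IPs roughly double the sustainable connection rate.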
Option 5: SO_LINGER (Last Resort)
// Force immediate socket close (dangerous!)
conn, err := net.Dial("tcp", "10.0.2.100:5432")
if err != nil {
	log.Fatal(err)
}
tcpConn := conn.(*net.TCPConn)
// Set linger to 0 = send RST instead of FIN
// Avoids TIME_WAIT but can lose data!
tcpConn.SetLinger(0)
tcpConn.Close()
// WARNING: This can cause:
// - Lost data if send buffer not empty
// - Server receives RST (connection reset)
// - Only use for read-only or non-critical connections
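What the peer sees after an abortive close can be observed on the loopback interface. The sketch below is illustrative only and assumes Linux, where the SO_LINGER option takes two C ints; `close_and_observe` is a made-up helper name. It closes a client socket either normally or with linger set to 0 and reports what the server side reads.

```python
import socket
import struct


def close_and_observe(abortive):
    """Close a loopback client socket; return what the server side sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5.0)
    try:
        if abortive:
            # l_onoff=1, l_linger=0: close() drops unsent data and sends RST.
            client.setsockopt(
                socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
            )
        client.close()
        try:
            data = server_side.recv(1024)
        except ConnectionResetError:
            return "reset"  # peer got RST: abortive close, no TIME_WAIT
        return "eof" if data == b"" else "data"  # b"" means a clean FIN
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print(close_and_observe(abortive=False))  # eof
    print(close_and_observe(abortive=True))   # reset
```

The "reset" result is the cost of this option: the server cannot distinguish a deliberate abort from a crashed client, and anything still in the send buffer is lost.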
Monitoring
groups:
  - name: tcp-exhaustion
    rules:
      - alert: TimeWaitSocketsHigh
        expr: |
          node_sockstat_TCP_tw > 20000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} sockets in TIME_WAIT"
      - alert: EphemeralPortsLow
        # 28232 = size of the default 32768-60999 range; adjust if ip_local_port_range was changed.
        # Node-wide approximation only: the real limit applies per destination.
        expr: |
          (node_sockstat_TCP_tw + node_sockstat_TCP_alloc) / 28232 > 0.8
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Ephemeral port exhaustion imminent"
      - alert: ConnectionPoolBypass
        expr: |
          rate(hikaricp_connections_creation_seconds_count[5m]) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High connection creation rate - pool may be bypassed"
# Quick monitoring script
watch -n 5 'echo "TIME_WAIT:"; ss -tan state time-wait | wc -l;
echo "ESTABLISHED:"; ss -tan state established | wc -l;
echo "Top destinations:"; ss -tan state time-wait | awk "{print \$4}" | sort | uniq -c | sort -rn | head -5'
Checklist
## TCP TIME_WAIT Port Exhaustion
### Diagnosis
- [ ] Count TIME_WAIT sockets: ss -tan state time-wait | wc -l
- [ ] Identify top destinations by TIME_WAIT count
- [ ] Check ephemeral port range: cat /proc/sys/net/ipv4/ip_local_port_range
- [ ] Calculate max sustainable rate per destination
### Application Fixes (Do First)
- [ ] Verify connection pool is configured
- [ ] Check pool is actually being used (not bypassed)
- [ ] Enable HTTP keep-alive for API calls
- [ ] Reuse HTTP clients across requests
### System Tuning (If Needed)
- [ ] Expand ephemeral port range: 1024-65535
- [ ] Enable tcp_tw_reuse (safe for clients)
- [ ] Consider multiple source IPs for single destination
- [ ] NEVER use tcp_tw_recycle
### Monitoring
- [ ] Alert on TIME_WAIT socket count
- [ ] Alert on connection creation rate
- [ ] Monitor port range utilization
Conclusion
TCP TIME_WAIT is one of those features that’s invisible until it breaks you. It exists for good reasons—preventing packet corruption across connection reuse—but at high throughput, those 60 seconds of socket hold time accumulate into resource exhaustion. The symptom is “cannot assign requested address,” which doesn’t obviously point to TIME_WAIT. You have to know to check socket states.
The fundamental fix is almost always connection reuse. If you’re pooling connections properly, you don’t create and destroy thousands of sockets per second. The pool maintains persistent connections, reuses them for multiple requests, and TIME_WAIT never accumulates. The problem only emerges when pooling is misconfigured or bypassed—and it’s remarkably easy to bypass a pool accidentally in code.
The system tuning options—expanding port range, enabling tcp_tw_reuse—are legitimate but secondary. They raise the ceiling but don’t fix the underlying issue. If your application is creating connections faster than it can reuse ports, expanding the port range just delays the exhaustion. Fix the application first; tune the system if you still need headroom.
Key principles:
- Connection pooling is the fix—reuse connections, don’t create new ones for each request
- TIME_WAIT exists for safety—don’t try to eliminate it, work around it with reuse
- tcp_tw_reuse is safe for client-side connections with timestamps enabled
- tcp_tw_recycle is dangerous—removed from modern kernels because it breaks NAT
- Monitor TIME_WAIT counts—they’re your early warning for connection misuse
Check your TIME_WAIT count now. If it’s in the thousands and climbing, your connection pools might not be doing what you think they’re doing.