Linux ARP Cache Stale Entries: Failover Traffic Blackhole

We failed over cleanly, then traffic disappeared into a stale ARP cache. The symptom: “Load balancer failover completed but traffic still going to dead node.” Monitoring showed the database primary had switched to the standby node five minutes earlier. Keepalived reported the VIP was active on the new primary, and the new primary was receiving connections. Yet half our application servers were timing out, unable to connect to the database.

The cause was something that doesn’t appear in any application log or monitoring dashboard: the Linux ARP cache. Each server had cached the MAC address of the old primary and kept sending packets to that MAC address even after the IP moved to a different machine. The packets went to a server that was either powered off or no longer owned that IP: a network blackhole where packets enter and never return.

This incident taught me that network failover isn’t just about IP addresses. It’s about the entire Layer 2 to Layer 3 mapping that every host maintains in its ARP cache. When you move an IP to a new server, you’re changing which MAC address should receive traffic for that IP. And every client that has the old mapping cached will continue sending traffic to the old destination until their cache expires or is forcibly updated.

What made this particularly frustrating was the inconsistency. Some application servers worked fine—they happened to have empty ARP caches or had recently refreshed them. Others failed completely. The “some work, some don’t” pattern initially led us down the wrong debugging path, suspecting application bugs or load balancer misconfiguration.

Environment: Linux servers, VRRP/keepalived/floating IPs, database failovers, load balancer migrations

The Problem

Traffic Blackhole After Failover

Database failover timeline:

T+0:00   Primary (10.0.1.10, MAC aa:bb:cc:11:22:33) is healthy
         App servers have ARP: 10.0.1.10 → aa:bb:cc:11:22:33

T+0:30   Primary fails, keepalived triggers failover
T+0:31   Secondary takes over VIP 10.0.1.10
         Secondary MAC: aa:bb:cc:44:55:66
         Secondary sends gratuitous ARP

T+0:32   Some app servers update ARP cache ✓
T+0:33   Other app servers still have old cache:
         10.0.1.10 → aa:bb:cc:11:22:33 (dead MAC!)

T+0:35   Traffic from stale-cache servers → blackhole
         Packets sent to old MAC, never reach new primary
         Connection timeouts, application errors

T+5:00   ARP cache expires, gets refreshed
         Finally works correctly

Why Gratuitous ARP Doesn’t Always Work

You might be thinking, “Doesn’t gratuitous ARP solve this?” In theory, yes. When a server takes over a VIP, it broadcasts a gratuitous ARP, an unsolicited ARP announcement (sent as either a request or a reply) that says “this IP is now at my MAC address.” Every host on the network should update its cache.

In practice, gratuitous ARP is unreliable for several reasons. First, it’s a broadcast, and broadcasts don’t always reach everywhere you expect. Switches might not flood to all ports. Network firewalls might drop ARP packets. VLANs might not propagate broadcasts correctly. Second, Linux applies its own rules to unsolicited updates: by default (arp_accept=0) it will not create a new entry from a gratuitous ARP, and it does not always replace an entry it considers “confirmed.”

Gratuitous ARP propagation issues:

┌─────────────────────────────────────────────────────────────┐
│ Secondary sends: "10.0.1.10 is at aa:bb:cc:44:55:66"       │
│                                                             │
│ This is a broadcast, but...                                │
│                                                             │
│ ✗ Switches may not flood to all ports                      │
│ ✗ Some hosts ignore unsolicited ARP updates                │
│ ✗ Linux locktime (1s default) can block a quick MAC change │
│ ✗ Network firewalls may drop ARP broadcasts                │
│ ✗ VLANs may not propagate gratuitous ARP correctly         │
└─────────────────────────────────────────────────────────────┘

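Linux ARP cache behavior:
- gc_stale_time: 60s default - an entry unused for this long becomes eligible for garbage collection
- base_reachable_time: 30s default - an entry stays REACHABLE for a random 15-45s after its last confirmation
- Even with gratuitous ARP, may prefer existing "confirmed" entry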

Root Cause

Linux ARP Cache States

Understanding the ARP cache state machine is essential for debugging these issues. Linux doesn’t simply have “cached” and “not cached” entries. Each ARP entry goes through a lifecycle of states, and critically, even “stale” entries are still used for sending traffic.

The key insight is that STALE doesn’t mean “don’t use.” It means “use, but verify soon.” When you send a packet to a STALE entry, Linux transmits it immediately using the cached MAC address and moves the entry to DELAY. If nothing confirms reachability within delay_first_probe_time, the entry moves to PROBE and the kernel sends unicast ARP requests to the cached MAC. Only when those probes go unanswered does the entry move to FAILED. During that transition, which can take 60+ seconds, traffic continues flowing to the old, possibly dead, MAC address.

This design makes sense for normal operation: you don’t want to delay every packet while waiting for ARP verification. But during failover, it means your servers stubbornly continue sending traffic to the old primary for up to two minutes.

ARP entry lifecycle:

INCOMPLETE → REACHABLE → STALE → DELAY → PROBE → FAILED
    ↑                       │
    └───────────────────────┘ (or refresh)

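State descriptions:
- INCOMPLETE: Resolution in progress, no MAC address known yet
- REACHABLE: Recently confirmed, used directly
- STALE: Not confirmed recently, but still used (!)
- DELAY: Still used; waiting briefly for confirmation before probing
- PROBE: Still used; actively probing (unicast ARP)
- FAILED: Probing or resolution failed; the cached MAC is no longer used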

The problem: STALE entries are USED, not refreshed immediately!

Default Timeouts

# Check ARP cache parameters
sysctl net.ipv4.neigh.default.gc_stale_time
# 60 seconds - how long an unused entry may sit before it is garbage collected

sysctl net.ipv4.neigh.default.base_reachable_time_ms
# 30000 ms - base value; the actual REACHABLE time is randomized between 15 and 45 s

sysctl net.ipv4.neigh.default.gc_thresh3
# 1024 - hard maximum number of entries in the cache

# Total time entry can be stale but used: gc_stale_time + probing
# Can be 60-120 seconds of traffic to wrong destination!
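
As a rough sanity check on those numbers, the sketch below adds up the documented timers for one entry: the longest REACHABLE period (1.5 x base_reachable_time), then delay_first_probe_time, then ucast_solicit probes spaced retrans_time apart. The function name stale_window_seconds is hypothetical. The model assumes continuous traffic and a peer that never answers, and it ignores garbage collection and hosts that never see a gratuitous ARP. The incident above lasted longer than this estimate, so treat the result as a floor for planning, not a guarantee.

```bash
#!/bin/sh
# stale_window_seconds DIR
# DIR is a neighbour sysctl directory, e.g. /proc/sys/net/ipv4/neigh/eth0.
# Prints an estimate (in whole seconds, rounded up) of how long an entry
# can keep being used after its last confirmation, from timers alone.
stale_window_seconds() {
    dir=$1
    base_ms=$(cat "$dir/base_reachable_time_ms")
    delay_s=$(cat "$dir/delay_first_probe_time")
    probes=$(cat "$dir/ucast_solicit")
    retrans_ms=$(cat "$dir/retrans_time_ms")
    # Longest REACHABLE period, then DELAY, then the unicast probes.
    total_ms=$(( base_ms * 3 / 2 + delay_s * 1000 + probes * retrans_ms ))
    echo $(( (total_ms + 999) / 1000 ))
}
```

Usage would be `stale_window_seconds /proc/sys/net/ipv4/neigh/eth0`. Lowering base_reachable_time_ms shrinks only the first term, which is one reason tuning timeouts alone does not close the window.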

Diagnosis

Check ARP Cache State

# View ARP cache with state
ip neigh show

# Output:
# 10.0.1.10 dev eth0 lladdr aa:bb:cc:11:22:33 STALE
#                                              ^^^^^ Problem!

# Watch ARP cache changes
ip monitor neigh

# Check specific entry
ip neigh show 10.0.1.10
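
STALE on its own is normal; what matters is whether the cached MAC is the new primary's. A minimal sketch of that comparison follows, assuming the usual `ip neigh show` line layout (IP first, `lladdr <mac>` somewhere in the middle, state last). The function name vip_mac_status is made up for illustration.

```bash
#!/bin/sh
# vip_mac_status EXPECTED_MAC VIP
# Reads `ip neigh show` output on stdin and prints one of:
#   OK <state> | MISMATCH <cached-mac> <state> | NO_MAC <state> | ABSENT
vip_mac_status() {
    expected=$1
    vip=$2
    awk -v vip="$vip" -v want="$expected" '
        $1 == vip {
            found = 1
            mac = ""
            state = $NF
            for (i = 2; i < NF; i++) if ($i == "lladdr") mac = $(i + 1)
            if (mac == "")
                print "NO_MAC " state
            else if (tolower(mac) == tolower(want))
                print "OK " state
            else
                print "MISMATCH " mac " " state
            exit
        }
        END { if (!found) print "ABSENT" }
    '
}
```

On an app server this could be run as `ip neigh show dev eth0 | vip_mac_status aa:bb:cc:44:55:66 10.0.1.10`. A MISMATCH result is the blackhole signature described above.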

Verify Traffic Path

# Check if packets are reaching the right destination
# (-e prints the Ethernet header, so the destination MAC is visible)
tcpdump -e -i eth0 -n host 10.0.1.10

# You'll see traffic going to the WRONG MAC (illustrative, not literal tcpdump output):
# 12:00:01 IP app-server > 10.0.1.10: TCP...
# Frame dst: aa:bb:cc:11:22:33 (old MAC, dead server!)

# Correct after refresh:
# Frame dst: aa:bb:cc:44:55:66 (new MAC, active server)

Check Gratuitous ARP Reception

# On app server, watch for gratuitous ARP
tcpdump -i eth0 -n arp

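# Should see one of these (a gratuitous ARP may be sent as a reply or a request):
# 12:00:30 ARP, Reply 10.0.1.10 is-at aa:bb:cc:44:55:66
# 12:00:30 ARP, Request who-has 10.0.1.10 tell 10.0.1.10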

# But if not seen, check:
# - Network path (VLANs, firewalls)
# - Switch flooding behavior
# - Kernel ignoring unsolicited ARP
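
Watching a scrolling capture during a failover is error-prone, so here is a small sketch that filters `tcpdump -n arp` text for announcements about the VIP. It assumes tcpdump's usual ARP line wording ("Reply <ip> is-at <mac>" and "Request who-has <ip> ... tell <ip>"). The function name garp_events is hypothetical.

```bash
#!/bin/sh
# garp_events VIP
# Reads `tcpdump -n arp` lines on stdin. Prints "reply <mac>" for each
# ARP reply announcing VIP, and "request" for each gratuitous request
# (who-has VIP tell VIP). Ordinary lookups by other hosts are ignored.
garp_events() {
    awk -v vip="$1" '
    {
        for (i = 1; i <= NF; i++) gsub(/,$/, "", $i)   # drop trailing commas
        for (i = 1; i < NF; i++) {
            if ($i == "Reply" && $(i + 1) == vip && $(i + 2) == "is-at") {
                print "reply " $(i + 3)
                next
            }
            if ($i == "who-has" && $(i + 1) == vip) {
                for (j = i + 2; j < NF; j++)
                    if ($j == "tell" && $(j + 1) == vip) { print "request"; next }
            }
        }
    }'
}
```

For example: `tcpdump -l -i eth0 -n arp | garp_events 10.0.1.10`. If nothing is printed during a failover, the announcement never reached this host, which points at the network path rather than the kernel.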

The Fix

Option 1: Reduce ARP Cache Timeouts

# Garbage-collect unused STALE entries sooner
sysctl -w net.ipv4.neigh.default.gc_stale_time=30
sysctl -w net.ipv4.neigh.eth0.gc_stale_time=30

# Shorten the REACHABLE window (the kernel randomizes around this base value)
sysctl -w net.ipv4.neigh.default.base_reachable_time_ms=15000

# Make persistent in /etc/sysctl.d/arp.conf, then load with `sysctl --system`:
net.ipv4.neigh.default.gc_stale_time = 30
net.ipv4.neigh.eth0.gc_stale_time = 30
net.ipv4.neigh.default.base_reachable_time_ms = 15000

Option 2: Accept Gratuitous ARP Updates

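# Enable accepting unsolicited (gratuitous) ARP
sysctl -w net.ipv4.conf.all.arp_accept=1
sysctl -w net.ipv4.conf.eth0.arp_accept=1

# With arp_accept=1, a gratuitous ARP can create a new cache entry.
# Per the kernel's ip-sysctl documentation, an entry that already exists
# is updated by a gratuitous ARP regardless of this setting.

# Persistent:
net.ipv4.conf.all.arp_accept = 1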
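
Settings like this tend to drift between hosts, so a quick audit helps before relying on them. The sketch below lists the arp_accept value per interface by reading the sysctl tree directly. The function name arp_accept_report is made up, and the root directory is a parameter so it can be pointed at a copy of the tree.

```bash
#!/bin/sh
# arp_accept_report ROOT
# ROOT is normally /proc/sys/net/ipv4/conf.
# Prints "<interface> <arp_accept value>" for every interface directory.
arp_accept_report() {
    root=$1
    for f in "$root"/*/arp_accept; do
        [ -r "$f" ] || continue            # skip if the glob matched nothing
        iface=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}
```

Running `arp_accept_report /proc/sys/net/ipv4/conf` on each client shows at a glance which interfaces still have the default of 0.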

Option 3: Flush ARP Cache on Failover

#!/bin/bash
# failover_arp_flush.sh - Run on client servers after failover (needs root)

VIP="10.0.1.10"
DEV="eth0"

# Delete the specific ARP entry; fall back to a flush if the delete fails
ip neigh del "$VIP" dev "$DEV" 2>/dev/null || ip neigh flush to "$VIP" dev "$DEV"

# Verify: the entry should be gone, or re-learned with the new MAC
ip neigh show to "$VIP" dev "$DEV"

# Ansible playbook to flush ARP on failover
---
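- name: Flush stale ARP entries
  hosts: app_servers
  become: true   # ip neigh del needs root
  # vip must be supplied, e.g. as an extra var: -e vip=10.0.1.10
  tasks: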
    - name: Delete VIP ARP entry
      command: ip neigh del {{ vip }} dev eth0
      ignore_errors: yes

    - name: Force ARP refresh with ping
      command: ping -c 1 {{ vip }}
      ignore_errors: yes
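
Flushing unconditionally works, but it is also possible to flush only on hosts that actually hold the wrong MAC. The sketch below does that, assuming the new primary's MAC is known to the caller (for example passed in by the failover tooling). The function name refresh_vip and its arguments are hypothetical, and it needs root to flush.

```bash
#!/bin/sh
# refresh_vip VIP DEV EXPECTED_MAC
# Flushes the neighbour entry for VIP on DEV only if the cached MAC
# differs from EXPECTED_MAC. Does nothing if there is no cached MAC.
refresh_vip() {
    vip=$1
    dev=$2
    want=$(printf '%s' "$3" | tr 'A-Z' 'a-z')
    have=$(ip neigh show to "$vip" dev "$dev" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print tolower($(i + 1)); exit } }')
    if [ -n "$have" ] && [ "$have" != "$want" ]; then
        echo "flushing $vip on $dev: cached $have, expected $want"
        ip neigh flush to "$vip" dev "$dev"
    else
        echo "no flush needed for $vip on $dev"
    fi
}
```

Called as `refresh_vip 10.0.1.10 eth0 aa:bb:cc:44:55:66`, it leaves healthy hosts alone and logs which ones were stale, which is useful evidence when tracing why the gratuitous ARP did not reach them.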

Option 4: Send Multiple Gratuitous ARPs

# In keepalived.conf - send more gratuitous ARPs
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100

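    # Send gratuitous ARP multiple times
    garp_master_delay 1           # Seconds before the second GARP burst after becoming MASTER
    garp_master_repeat 5          # GARPs sent per burst
    garp_master_refresh 30        # While MASTER, re-announce every 30s
    garp_master_refresh_repeat 2  # GARPs sent per periodic re-announcement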

    virtual_ipaddress {
        10.0.1.10/24
    }
}

Option 5: Use arping for Active Refresh

#!/bin/bash
# active_arp_announce.sh - Run on new primary after failover

VIP="10.0.1.10"
INTERFACE="eth0"

# Send a burst of gratuitous ARPs in both forms
for i in {1..10}; do
    arping -c 1 -U -I "$INTERFACE" "$VIP"
    arping -c 1 -A -I "$INTERFACE" "$VIP"
    sleep 0.1
done

# Flags as documented for iputils arping:
# -U: Unsolicited ARP sent as an ARP REQUEST (gratuitous)
# -A: Same as -U, but sent as an ARP REPLY (some systems prefer this)

Monitoring

# NOTE: arp_entry_last_confirmed_seconds and tcp_retransmits_total{dest=...}
# are not stock node_exporter metrics; these rules assume a custom exporter.
groups:
  - name: arp-cache
    rules:
      - alert: ARPCacheStaleVIP
        expr: |
          time() - arp_entry_last_confirmed_seconds{ip=~"10\\.0\\.1\\..*"} > 120
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Stale ARP entry for VIP {{ $labels.ip }}"

      - alert: FailoverTrafficBlackhole
        expr: |
          rate(tcp_retransmits_total{dest=~"10\\.0\\.1\\..*"}[1m]) > 100
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High TCP retransmits to VIP - possible ARP issue"
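
The first rule needs something to export a last-confirmed timestamp per neighbour entry. One possible source is `ip -s neigh show`, which prints `used <used>/<confirmed>/<updated>` ages in seconds. The sketch below turns that into Prometheus text format. The function name is hypothetical, the metric name is taken from the rule above, and the field layout is an assumption about iproute2 output that should be checked on your version.

```bash
#!/bin/sh
# neigh_confirmed_metrics NOW
# Reads `ip -s neigh show` output on stdin; NOW is the current Unix time.
# Emits one arp_entry_last_confirmed_seconds sample per entry with a MAC.
neigh_confirmed_metrics() {
    awk -v now="$1" '
    {
        dev = ""; mac = ""; confirmed = ""
        for (i = 2; i < NF; i++) {
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "lladdr") mac = $(i + 1)
            if ($i == "used")   { split($(i + 1), age, "/"); confirmed = age[2] }
        }
        if (mac != "" && confirmed != "")
            printf "arp_entry_last_confirmed_seconds{ip=\"%s\",dev=\"%s\"} %.0f\n", $1, dev, now - confirmed
    }'
}
```

A cron job could write the output to a file read by node_exporter's textfile collector, for instance `ip -s neigh show | neigh_confirmed_metrics "$(date +%s)" > "$TEXTFILE_DIR/arp.prom"`, where TEXTFILE_DIR is whatever directory your node_exporter is configured to scan.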

Checklist

## Linux ARP Cache Failover

### Before Failover
- [ ] Configure reduced gc_stale_time (30s instead of 60s)
- [ ] Enable arp_accept=1 on client servers
- [ ] Configure keepalived for multiple gratuitous ARPs
- [ ] Test failover with tcpdump monitoring

### During/After Failover
- [ ] Verify gratuitous ARP sent from new primary
- [ ] Check ARP cache state on client servers
- [ ] Flush stale entries if needed
- [ ] Monitor for connection timeouts

### If Traffic Blackhole Occurs
- [ ] Identify affected servers: ip neigh show <VIP> (compare lladdr with the new primary's MAC)
- [ ] Flush ARP cache: ip neigh flush <VIP>
- [ ] Send additional gratuitous ARPs from new primary
- [ ] Check network path for ARP propagation issues

Conclusion

This failure mode is particularly insidious because everything looks correct at the network layer—the VIP is active on the new server, the new server is responding to health checks, and monitoring shows the failover completed successfully. The problem is invisible until you start looking at Layer 2 addresses and ARP cache state on client machines.

The fundamental lesson is that the Linux ARP cache is sticky: a cached entry keeps being used until it is updated, probed out, or flushed. Caution about unsolicited updates is actually good behavior for stability (you don’t want a malicious host to hijack traffic by sending fake gratuitous ARPs), but it creates problems during legitimate failovers.

The defensive strategy is layered: reduce cache timeouts so stale entries expire faster, enable arp_accept so gratuitous ARPs are actually processed, configure your failover tools to send multiple gratuitous ARPs over time, and have automation to flush caches if needed.

Key principles:

  1. STALE entries are still used - traffic continues to old MAC address for up to 60-120 seconds
  2. gc_stale_time=30 for faster cache expiry instead of the default 60 seconds
  3. arp_accept=1 so gratuitous ARPs can also create entries for IPs not yet in the cache
  4. Send multiple gratuitous ARPs from new primary—one might not reach everyone
  5. Monitor TCP retransmits after failover as an indicator of ARP problems

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Linux ARP Cache Stale Entries: Failover Traffic Blackhole". https://www.michal-drozd.com/en/blog/linux-arp-cache-failover-stale/ (Published February 14, 2025).