Split-Brain From a Clock Step Backwards: Wall Time in Lease-Based Systems

Tags: distributed-systems, debugging, time, leader-election, ntp

The weirdest split-brain I have seen started with a clock that went backwards. “Two nodes became leader simultaneously.” That alert woke me at 2 AM. Our job scheduler was in chaos: duplicate jobs running, data being processed twice, race conditions everywhere. Both node A and node B believed they held the leadership lease. Both were processing work. Both were right, according to their local clocks.

The cause turned out to be a backwards NTP clock correction of 2 seconds on node A. The VM's clock had drifted while the VM was suspended for a live migration, and when NTP caught up, it stepped the clock back to correct the drift. Node A, which had acquired a 30-second lease, now believed it had 2 more seconds of lease than it really did, so from its point of view the lease lasted 32 seconds. During those extra 2 seconds, node B acquired a new lease in the database. Both nodes were now leaders.

This incident taught me one of the fundamental truths of distributed systems: wall-clock time is not monotonic, and any code that treats it as monotonic is waiting to fail. Mixing System.currentTimeMillis() (wall-clock) with duration-based timeouts creates a hidden trap that only springs when time goes backwards—which happens more often than you’d expect in virtualized environments.

The fix wasn’t complicated, but it required understanding why the original code was wrong. You can’t measure elapsed time with wall-clock readings, because wall-clock can jump. You need monotonic time for durations and fencing tokens for safety.

Environment: Custom leader election using database-backed leases, NTP-synchronized nodes

The Problem

The Split-Brain Incident

Timeline (times marked * are Node A's local clock after the step; all others are true time):
10:00:00  Node A acquires lease (expires at 10:00:30)
10:00:15  Node A: "I'm leader, lease valid until 10:00:30"
10:00:16  NTP steps Node A's clock BACK 2 seconds
10:00:14* Node A: "Clock says 10:00:14, lease valid until 10:00:30"
                  "I have 16 more seconds!" (actually only 14)

10:00:30  Lease expires in database
10:00:31  Node B acquires lease (expires at 10:01:01)
10:00:31  Node B: "I'm leader!"

10:00:29* Node A's clock reads 10:00:29 (true time 10:00:31)
          Node A: "I'm still leader! 1 second left!"

Result: Both nodes think they're leader!
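
The arithmetic in the timeline can be reproduced with a few lines of Java. This is only an illustrative sketch: the class name ClockStepDemo and the calendar date are made up, and the instants are hard-coded rather than read from a real clock.

```java
import java.time.Duration;
import java.time.Instant;

final class ClockStepDemo {
    // Seconds of lease that a wall-clock based check believes are left.
    static long remainingSeconds(Instant wallNow, Instant leaseExpiry) {
        return Duration.between(wallNow, leaseExpiry).getSeconds();
    }

    public static void main(String[] args) {
        Instant acquired = Instant.parse("2025-01-22T10:00:00Z"); // arbitrary date
        Instant expiry = acquired.plusSeconds(30);                // 30-second lease

        Instant trueNow = acquired.plusSeconds(16);    // true time 10:00:16
        Instant steppedNow = trueNow.minusSeconds(2);  // Node A's clock after the step

        System.out.println(remainingSeconds(trueNow, expiry));    // 14
        System.out.println(remainingSeconds(steppedNow, expiry)); // 16
    }
}
```

The lease did not change, only the reading of "now" did, and that alone is enough to hand Node A two seconds it does not own.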

The Vulnerable Code

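// Common lease-based leader election pattern
import java.time.Duration;
import java.time.Instant;

public class LeaseManager {
    private Instant leaseExpiry;  // null until a lease is acquired

    public boolean isLeader() {
        // BUG: compares against the wall clock, which NTP can step backwards
        return leaseExpiry != null && Instant.now().isBefore(leaseExpiry);
    }

    public void acquireLease(Duration leaseDuration) {
        // Store expiry as wall-clock time
        leaseExpiry = Instant.now().plus(leaseDuration);
        writeToDatabase(leaseExpiry);
    }

    private void writeToDatabase(Instant expiry) {
        // Persist the lease row (storage details omitted)
    }
}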

// If clock steps backwards:
// - leaseExpiry stays at original wall-clock value
// - Instant.now() returns earlier time
// - isLeader() returns true for "extra" time

Root Cause

Wall Clock vs Monotonic Time

This is one of those computer science fundamentals that’s easy to forget in practice. Modern operating systems provide two different time sources, and they serve very different purposes.

Wall-clock time (System.currentTimeMillis(), time.time(), Instant.now()) represents “what time is it?”—the time you’d see on a clock on the wall. This time is synchronized with external sources (NTP servers) and can jump forwards or backwards to stay accurate. It’s great for logging, scheduling, and displaying to users.

Monotonic time (System.nanoTime(), time.monotonic()) represents “how much time has passed?”—a steadily increasing counter that never goes backwards. It’s not synchronized with anything external. It’s purely for measuring durations.

The trap in lease-based systems is using wall-clock time to measure whether a lease has expired. If you acquired a lease at wall-clock 10:00:00 with a 30-second duration, you compute expiry as 10:00:30. Then you check if (now < expiry) to see if you’re still leader. This works perfectly—until wall-clock steps backwards.

Wall Clock (System.currentTimeMillis(), Instant.now()):
├── Can jump forwards (NTP sync)
├── Can jump backwards (NTP correction)
├── Affected by leap seconds and manual clock changes (DST only changes local-time display, not these epoch-based values)
└── NOT suitable for measuring durations!

Monotonic Clock (System.nanoTime()):
├── Only moves forward
├── Not stepped by NTP
├── Rate may vary slightly (NTP slewing, hardware drift)
└── Suitable for measuring durations

The trap:
┌─────────────────────────────────────────────────┐
│ Duration timeout = start_wall_time + 30 seconds│
│                                                 │
│ If wall clock steps back 2 seconds:            │
│ Timeout appears to be 32 seconds!              │
└─────────────────────────────────────────────────┘
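
A minimal sketch of the duration rule, using System.nanoTime(). The class and method names are hypothetical. The one detail worth copying is that the two readings are subtracted first and the difference is compared, which is what the nanoTime() Javadoc recommends because the raw values can be negative or wrap around.

```java
final class ElapsedCheck {
    // True once at least timeoutNanos separate the two nanoTime() readings.
    // Compare the difference, never the raw readings: raw values may overflow.
    static boolean hasElapsed(long startNanos, long nowNanos, long timeoutNanos) {
        return nowNanos - startNanos >= timeoutNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50);
        long now = System.nanoTime();

        System.out.println("elapsed ms: " + (now - start) / 1_000_000L);
        System.out.println("10 ms timeout reached: " + hasElapsed(start, now, 10_000_000L));
    }
}
```

A wall-clock step between the two readings has no effect here, because neither reading comes from the wall clock.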

How NTP Causes Time Steps

# NTP typically slews time (gradual adjustment)
# But if drift is too large, it STEPS (instant jump)

# Check NTP status:
chronyc tracking
# System time: 0.000000002 seconds slow of NTP time
# Last offset: -0.000000814 seconds  # Small slew
# OR
# Last offset: -2.345 seconds  # Large step!

# Force NTP to step (dangerous in production):
# chronyc makestep

# Common causes of large steps:
# - VM suspend/resume
# - Container migration
# - Network partition resolving
# - New node joining cluster

Diagnosis

Check for Time Jumps

#!/bin/bash
# Monitor time jumps with a script
PREV=$(date +%s.%N)
while true; do
  sleep 0.1
  NOW=$(date +%s.%N)
  DIFF=$(echo "$NOW - $PREV - 0.1" | bc)
  if (( $(echo "$DIFF > 0.5 || $DIFF < -0.5" | bc -l) )); then
    echo "TIME JUMP: $DIFF seconds at $(date)"
  fi
  PREV=$NOW
done
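
The same check can run inside the JVM by comparing how far the wall clock moved against how far the monotonic clock moved over the same interval. This is a sketch under assumptions: ClockJumpDetector and stepMillis are hypothetical names, the 500 ms threshold simply mirrors the script, and the loop is bounded so the demo terminates.

```java
final class ClockJumpDetector {
    // Wall-clock delta minus monotonic delta, in milliseconds.
    // Close to zero in normal operation, negative after a backward step.
    static long stepMillis(long wallBeforeMs, long wallAfterMs,
                           long monoBeforeNs, long monoAfterNs) {
        long wallDeltaMs = wallAfterMs - wallBeforeMs;
        long monoDeltaMs = (monoAfterNs - monoBeforeNs) / 1_000_000L;
        return wallDeltaMs - monoDeltaMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long thresholdMs = 500; // same 0.5 s threshold as the shell script
        long wallPrev = System.currentTimeMillis();
        long monoPrev = System.nanoTime();

        for (int i = 0; i < 50; i++) { // bounded for the demo
            Thread.sleep(100);
            long wallNow = System.currentTimeMillis();
            long monoNow = System.nanoTime();
            long step = stepMillis(wallPrev, wallNow, monoPrev, monoNow);
            if (Math.abs(step) > thresholdMs) {
                System.out.println("TIME JUMP: " + step + " ms");
            }
            wallPrev = wallNow;
            monoPrev = monoNow;
        }
    }
}
```

Unlike the shell version, this does not assume that the sleep took exactly 0.1 seconds, since the monotonic delta measures the real interval.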

Check NTP Logs

# chrony logs
journalctl -u chronyd | grep -E "(makestep|System clock)"

# Look for:
# System clock was stepped by -2.345 seconds
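
If you want to feed these log lines into your own tooling, the step size can be pulled out with a regular expression. The sketch below assumes the message format shown above; ChronyStepParser is a hypothetical helper, not part of chrony.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChronyStepParser {
    // Matches the "System clock was stepped by <n> seconds" message shown above.
    private static final Pattern STEP =
            Pattern.compile("System clock was stepped by ([+-]?\\d+(?:\\.\\d+)?) seconds");

    // Returns the step in seconds (negative = backwards), or empty if the line is unrelated.
    static OptionalDouble parseStepSeconds(String logLine) {
        Matcher m = STEP.matcher(logLine);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        String line = "chronyd[812]: System clock was stepped by -2.345 seconds"; // sample line
        OptionalDouble step = parseStepSeconds(line);
        if (step.isPresent() && step.getAsDouble() < 0) {
            System.out.println("Backward step of " + -step.getAsDouble() + " s");
        }
    }
}
```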

Detect Split-Brain

-- If using database-backed leases
SELECT node_id, lease_acquired_at, lease_expires_at
FROM leader_leases
WHERE is_active = true;

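-- Should return exactly 1 row
-- Multiple rows = split-brain!
-- Caveat: a stale leader that trusts its own clock (as in the incident above)
-- does not show up here, because the database still holds only one active lease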

The Fix

Option 1: Use Monotonic Time for Durations

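import java.time.Duration;
import java.time.Instant;

public class SafeLeaseManager {
    private volatile boolean leaseHeld;        // false until a lease is acquired
    private volatile long leaseAcquiredNanos;  // Monotonic reading, not a timestamp
    private volatile long leaseDurationNanos;

    public boolean isLeader() {
        // Uses monotonic time - NTP steps of the wall clock cannot affect it.
        // Caveat: a monotonic clock may not advance while the machine is
        // suspended, so pair this check with fencing tokens.
        long elapsed = System.nanoTime() - leaseAcquiredNanos;
        return leaseHeld && elapsed < leaseDurationNanos;
    }

    public void acquireLease(Duration leaseDuration) {
        // Start the local timer BEFORE the database write so the local
        // view of the lease is the conservative one.
        leaseAcquiredNanos = System.nanoTime();
        leaseDurationNanos = leaseDuration.toNanos();
        // Still write wall-clock expiry for external visibility
        writeToDatabase(Instant.now().plus(leaseDuration));
        leaseHeld = true;
    }

    private void writeToDatabase(Instant expiry) {
        // Persist the lease row (storage details omitted)
    }
}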
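
To test this kind of lease check without waiting for real time to pass, the monotonic clock can be injected. The following is a sketch, not the code we ran in production: MonotonicLease, markAcquired and the LongSupplier-based clock are all hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

final class MonotonicLease {
    private final LongSupplier nanoClock; // System::nanoTime in production, a fake in tests
    private boolean held;
    private long acquiredNanos;
    private long durationNanos;

    MonotonicLease(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    long now() {
        return nanoClock.getAsLong();
    }

    // startedNanos should be read BEFORE the database round trip.
    synchronized void markAcquired(long startedNanos, Duration leaseDuration) {
        acquiredNanos = startedNanos;
        durationNanos = leaseDuration.toNanos();
        held = true;
    }

    synchronized boolean isLeader() {
        return held && nanoClock.getAsLong() - acquiredNanos < durationNanos;
    }

    public static void main(String[] args) {
        MonotonicLease lease = new MonotonicLease(System::nanoTime);
        long started = lease.now();
        // ... write the lease row here and continue only if the write succeeded ...
        lease.markAcquired(started, Duration.ofSeconds(30));
        System.out.println(lease.isLeader());
    }
}
```

With a fake clock in place, a unit test can move time forward by exactly 30 seconds and assert that leadership is lost at that point, which is hard to do reliably against the real clock.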

Option 2: Use Fencing Tokens

// Fencing token: monotonically increasing number
// Even if two nodes think they're leader,
// only the one with higher token can write

public class FencedLeaseManager {
    private long fencingToken;

    public boolean acquireLease() {
        // Atomically increment and read fencing token
        Long newToken = database.incrementAndGet("lease_fencing_token");
        if (newToken != null) {
            this.fencingToken = newToken;
            return true;
        }
        return false;
    }

    public void writeWithFence(String key, Object value) {
        // Database rejects writes with lower fencing token
        database.conditionalWrite(key, value, this.fencingToken);
    }
}
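
The important half of fencing lives on the storage side: the store has to remember the highest token it has seen and refuse anything older. Here is an in-memory sketch of that check, assuming a single store instance. FencedStore is a hypothetical stand-in for whatever database or service actually accepts the writes.

```java
import java.util.HashMap;
import java.util.Map;

final class FencedStore {
    private final Map<String, String> data = new HashMap<>();
    private long highestTokenSeen = 0;

    // Accepts the write only if the token is at least as new as any seen so far.
    synchronized boolean write(String key, String value, long fencingToken) {
        if (fencingToken < highestTokenSeen) {
            return false; // stale leader: reject
        }
        highestTokenSeen = fencingToken;
        data.put(key, value);
        return true;
    }

    synchronized String read(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        System.out.println(store.write("job-42", "claimed by B", 2)); // true
        System.out.println(store.write("job-42", "claimed by A", 1)); // false
        System.out.println(store.read("job-42"));                     // claimed by B
    }
}
```

Even if node A still believes it is leader, its writes carry the older token and bounce off the store, so the split-brain cannot corrupt data.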

Option 3: Server-Side Lease Validation

// Don't trust client-side lease checks
// Always validate lease at the coordination point

// Client requests work as leader
// Server validates lease EVERY TIME

public class LeaseCoordinator {
    public Result executeAsLeader(String nodeId, Work work) {
        // Fetch current lease from database (source of truth)
        Lease currentLease = database.getCurrentLease();

        if (!currentLease.heldBy(nodeId)) {
            throw new NotLeaderException();
        }

        if (currentLease.isExpired()) {  // Server time!
            throw new LeaseExpiredException();
        }

        return work.execute();
    }
}
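
What makes server-side validation work is that every lease decision is made against one clock, the coordinator's. A rough in-memory sketch of that idea follows. LeaseTable is hypothetical, and nowNanos is meant to be supplied by the coordinator itself (for example from its own System.nanoTime()), never by the client.

```java
final class LeaseTable {
    private String holder;        // null until the first lease is granted
    private long expiresAtNanos;  // on the coordinator's monotonic clock

    // Grants or renews the lease if it is free, expired, or already held by nodeId.
    synchronized boolean tryAcquire(String nodeId, long nowNanos, long durationNanos) {
        boolean free = holder == null || nowNanos - expiresAtNanos >= 0;
        if (!free && !holder.equals(nodeId)) {
            return false;
        }
        holder = nodeId;
        expiresAtNanos = nowNanos + durationNanos;
        return true;
    }

    // Checked by the coordinator before every leader-only operation.
    synchronized boolean isValidLeader(String nodeId, long nowNanos) {
        return nodeId.equals(holder) && nowNanos - expiresAtNanos < 0;
    }

    public static void main(String[] args) {
        LeaseTable table = new LeaseTable();
        long now = System.nanoTime();
        System.out.println(table.tryAcquire("node-a", now, 30_000_000_000L)); // true
        System.out.println(table.tryAcquire("node-b", now, 30_000_000_000L)); // false
        System.out.println(table.isValidLeader("node-a", now));               // true
    }
}
```

A clock step on node A changes nothing here, because node A's opinion about the time is never consulted.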

Option 4: Shorten Lease + Heartbeat

// Instead of 30-second lease with one check
// Use 5-second lease with continuous renewal

public class HeartbeatLeaseManager {
    private static final Duration LEASE_DURATION = Duration.ofSeconds(5);
    private static final Duration HEARTBEAT_INTERVAL = Duration.ofSeconds(1);

    public void maintainLeadership() throws InterruptedException {
        while (shouldBeLeader) {
            boolean renewed = renewLease(LEASE_DURATION);
            if (!renewed) {
                stepDown();
                return;
            }
            Thread.sleep(HEARTBEAT_INTERVAL.toMillis());
        }
    }

    // If clock steps back:
    // - Lease expires in database within 5 seconds
    // - Other node can acquire within 5 seconds
    // - Much smaller window for split-brain
}
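
One gap in a plain renewal loop is a renewal call that hangs: the node keeps acting as leader while it waits. A possible guard, sketched below, is to record a monotonic reading just before each renewal is sent and to stop acting once the lease duration minus a safety margin has passed since the last successful one. The safety margin and the names RenewalGuard and mayActAsLeader are my own additions for illustration.

```java
final class RenewalGuard {
    // True while the last successful renewal is recent enough to act on.
    // renewalSentNanos: System.nanoTime() taken just before that renewal was sent.
    static boolean mayActAsLeader(long renewalSentNanos, long nowNanos,
                                  long leaseNanos, long marginNanos) {
        return nowNanos - renewalSentNanos < leaseNanos - marginNanos;
    }

    public static void main(String[] args) {
        long lease = 5_000_000_000L;   // 5-second lease
        long margin = 1_000_000_000L;  // stop 1 second early (assumed margin)
        long sent = System.nanoTime();

        // Immediately after a renewal the node may act as leader.
        System.out.println(mayActAsLeader(sent, System.nanoTime(), lease, margin));
        // Simulate 4.5 seconds with no successful renewal: stop acting.
        System.out.println(mayActAsLeader(sent, sent + 4_500_000_000L, lease, margin));
    }
}
```

Calling this check before each unit of leader-only work means a stalled heartbeat demotes the node on its own schedule instead of relying on the renewal call to return false.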

Monitoring

groups:
  - name: time-sync
    rules:
      - alert: NTPClockStep
        expr: |
          abs(node_ntp_offset_seconds) > 0.5
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Large NTP offset on {{ $labels.instance }}"

      - alert: MultipleLeaders
        expr: |
          count(leader_election_is_leader == 1) > 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Split-brain detected - multiple leaders!"

Checklist

## Clock Step Split-Brain

### Symptoms
- [ ] Two nodes claiming leadership simultaneously
- [ ] Data inconsistency after NTP sync
- [ ] VM resume causing issues
- [ ] Lease-based systems misbehaving

### Diagnosis
- [ ] Check NTP logs for time steps
- [ ] Monitor for multiple leaders
- [ ] Review lease check code for wall-clock usage
- [ ] Check for VM suspend/resume events

### Fixes
- [ ] Use monotonic time for duration checks
- [ ] Implement fencing tokens
- [ ] Server-side lease validation
- [ ] Shorten lease duration + heartbeat
- [ ] Alert on NTP clock steps

Conclusion

This failure mode is particularly insidious because it only manifests under specific conditions: backward clock steps, which are rare but not as rare as you’d think. VMs suspend and resume. Containers get live-migrated. NTP catches up after network partitions. Each of these can cause time to jump backwards.

The fundamental lesson is that “use leases for simplicity” is incomplete advice. Leases are simple conceptually, but implementing them correctly requires understanding the wall-clock vs monotonic-time distinction. Using Instant.now() for lease expiry checks is natural and intuitive—and wrong.

The defense-in-depth approach combines multiple techniques: monotonic time for local lease checks (prevents the clock-step problem), fencing tokens for write operations (prevents stale-leader writes even if split-brain occurs), shorter lease durations with frequent renewal (reduces the window of vulnerability), and monitoring for NTP clock adjustments (alerts you when conditions are ripe for problems).

Key principles:

  1. Never use wall-clock for duration measurement - use monotonic time (System.nanoTime())
  2. Fencing tokens prevent stale-leader writes - even if split-brain occurs, only the holder of the newest token can write
  3. Shorter leases = smaller split-brain window - 5 seconds is better than 30 seconds
  4. Monitor for clock adjustments - alert on NTP steps > 500ms
  5. Server-side validation - don’t trust client-side lease checks for critical operations

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Split-Brain From a Clock Step Backwards: Wall Time in Lease-Based Systems". https://www.michal-drozd.com/en/blog/clock-step-backwards-split-brain/ (Published January 22, 2025).