PostgreSQL Serialization Failures: Beyond 'Just Retry'

“ERROR: could not serialize access due to concurrent update” - every PostgreSQL developer hits this eventually. The standard advice is “just retry the transaction.” But that’s like saying “just restart it” for any error. The real questions are: Why did it happen? Should you even be using that isolation level? And how do you reduce the retry frequency from 30% to 0.1%?

Environment: PostgreSQL 12+, applications using REPEATABLE READ or SERIALIZABLE isolation levels, high-concurrency workloads

Understanding the Error

The Three Isolation Levels

PostgreSQL offers three isolation levels, each with different trade-offs:

┌────────────────────┬─────────────────┬──────────────────┬─────────────────────┐
│ Isolation Level    │ Dirty Reads     │ Non-Repeatable   │ Serialization       │
│                    │                 │ Reads            │ Failures            │
├────────────────────┼─────────────────┼──────────────────┼─────────────────────┤
│ READ COMMITTED     │ No              │ Yes              │ Never               │
│ REPEATABLE READ    │ No              │ No               │ Yes (concurrent     │
│                    │                 │                  │ updates only)       │
│ SERIALIZABLE       │ No              │ No               │ Yes (any conflict)  │
└────────────────────┴─────────────────┴──────────────────┴─────────────────────┘

Key insight: READ COMMITTED (the default) never throws serialization failures. If you’re getting this error, you explicitly chose a stricter isolation level.

When Each Error Occurs

-- REPEATABLE READ: Fails on concurrent UPDATE/DELETE to same row
-- Transaction A                    -- Transaction B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1;
                                    BEGIN;
                                    UPDATE accounts SET balance = 50 WHERE id = 1;
                                    COMMIT;
UPDATE accounts SET balance = 100 WHERE id = 1;
-- ERROR: could not serialize access due to concurrent update

-- SERIALIZABLE: Fails on ANY read-write dependency
-- Transaction A                    -- Transaction B
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(balance) FROM accounts;
                                    BEGIN ISOLATION LEVEL SERIALIZABLE;
                                    INSERT INTO accounts (balance) VALUES (100);
                                    COMMIT;
INSERT INTO accounts (balance) VALUES (200);
COMMIT;
-- ERROR: could not serialize access due to read/write dependencies

The Decision Tree: Which Isolation Level Do You Need?

Do you need to prevent non-repeatable reads?
│
├─ NO → Use READ COMMITTED (default)
│       No serialization failures, best performance
│       Handle concurrent updates with SELECT FOR UPDATE if needed
│
└─ YES → Do you need full serializability (as if transactions ran one-by-one)?
         │
         ├─ NO → Use REPEATABLE READ
         │       Only fails on direct row conflicts
         │       Good for: Reports, read-heavy analytics
         │
         └─ YES → Use SERIALIZABLE
                  Fails on any read-write dependency
                  Good for: Financial calculations, inventory
                  Requires robust retry logic

When READ COMMITTED Is Actually Enough

90% of applications using REPEATABLE READ don’t actually need it:

-- Problem: "I need to read a value and update based on it"
-- Bad: Using REPEATABLE READ
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;  -- Returns 100
-- Application calculates: new_balance = 100 + 50
UPDATE accounts SET balance = 150 WHERE id = 1;
COMMIT;
-- Can fail with serialization error!

-- Good: Using READ COMMITTED with atomic update
BEGIN;  -- Default READ COMMITTED
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
COMMIT;
-- Never fails with serialization error!
-- The UPDATE sees the latest committed value

-- Good: Using SELECT FOR UPDATE when you need the value
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- Now you have a lock, no concurrent updates possible
UPDATE accounts SET balance = 150 WHERE id = 1;
COMMIT;

Implementing Proper Retry Logic

If you genuinely need REPEATABLE READ or SERIALIZABLE, implement proper retry:

Basic Retry Pattern

import psycopg2
from psycopg2 import errors
import time
import random

def execute_with_retry(conn_params, transaction_func, max_retries=5):
    """Execute a transaction with exponential backoff retry."""

    for attempt in range(max_retries):
        conn = psycopg2.connect(**conn_params)
        try:
            conn.set_isolation_level(
                psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ
            )

            with conn.cursor() as cur:
                result = transaction_func(cur)
                conn.commit()
                return result

        except errors.SerializationFailure as e:
            conn.rollback()

            if attempt == max_retries - 1:
                raise  # Final attempt failed

            # Exponential backoff with jitter
            wait_time = (2 ** attempt) * 0.1 + random.uniform(0, 0.1)
            time.sleep(wait_time)

        finally:
            conn.close()

# Usage
def transfer_funds(cur):
    cur.execute("SELECT balance FROM accounts WHERE id = 1")
    balance = cur.fetchone()[0]
    cur.execute("UPDATE accounts SET balance = %s WHERE id = 1", (balance - 100,))
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    return True

execute_with_retry(conn_params, transfer_funds)

Java/Spring Pattern

@Service
public class AccountService {

    @Autowired
    private AccountRepository accountRepository;

    @Retryable(
        value = {SerializationFailureException.class},
        maxAttempts = 5,
        backoff = @Backoff(delay = 100, multiplier = 2, random = true)
    )
    @Transactional(isolation = Isolation.REPEATABLE_READ)
    public void transfer(Long fromId, Long toId, BigDecimal amount) {
        Account from = accountRepository.findById(fromId).orElseThrow();
        Account to = accountRepository.findById(toId).orElseThrow();

        from.setBalance(from.getBalance().subtract(amount));
        to.setBalance(to.getBalance().add(amount));

        accountRepository.save(from);
        accountRepository.save(to);
    }

    @Recover
    public void transferFailed(SerializationFailureException e,
                               Long fromId, Long toId, BigDecimal amount) {
        log.error("Transfer failed after max retries: {} -> {} amount {}",
                  fromId, toId, amount);
        throw new TransferFailedException("Max retries exceeded", e);
    }
}

Go Pattern

func ExecuteSerializable(db *sql.DB, fn func(*sql.Tx) error) error {
    maxRetries := 5

    for attempt := 0; attempt < maxRetries; attempt++ {
        tx, err := db.BeginTx(context.Background(), &sql.TxOptions{
            Isolation: sql.LevelSerializable,
        })
        if err != nil {
            return err
        }

        err = fn(tx)
        if err != nil {
            tx.Rollback()

            // Check if it's a serialization failure (SQLSTATE 40001)
            if pqErr, ok := err.(*pq.Error); ok && pqErr.Code == "40001" {
                // Exponential backoff with jitter
                waitTime := time.Duration(math.Pow(2, float64(attempt))) * 100 * time.Millisecond
                jitter := time.Duration(rand.Int63n(100)) * time.Millisecond
                time.Sleep(waitTime + jitter)
                continue
            }
            return err
        }

        err = tx.Commit()
        if err != nil {
            // Commit can also fail with serialization error
            if pqErr, ok := err.(*pq.Error); ok && pqErr.Code == "40001" {
                continue
            }
            return err
        }

        return nil  // Success
    }

    return fmt.Errorf("max retries exceeded for serializable transaction")
}

Reducing Conflict Frequency

Retry logic is necessary, but reducing the need for retries is better:

1. Keep Transactions Short

-- Bad: Long transaction with idle time
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM products WHERE category = 'electronics';
-- Application processes data for 5 seconds...
UPDATE products SET stock = stock - 1 WHERE id = 123;
COMMIT;
-- High conflict probability!

-- Good: Minimal transaction time
-- Do processing OUTSIDE the transaction
products = fetch_products('electronics')  -- Separate query
process_products(products)  -- Application logic

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE products SET stock = stock - 1 WHERE id = 123;
COMMIT;
-- Low conflict probability

2. Use Indexes to Narrow Scan Scope

-- Without index: Scans all rows, creates wide read dependencies
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- Sequential scan = read dependency on ALL rows

-- With index: Only reads specific rows
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- Now only creates dependencies on rows that match

3. Order Your Operations Consistently

-- Deadlock-prone: Different order in different transactions
-- Transaction A: UPDATE accounts SET ... WHERE id = 1; then id = 2
-- Transaction B: UPDATE accounts SET ... WHERE id = 2; then id = 1

-- Safe: Always same order
-- Sort by ID before updating
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Both transactions use same order = no deadlocks

4. Consider Optimistic Locking Instead

-- Add version column
ALTER TABLE products ADD COLUMN version INTEGER DEFAULT 1;

-- Read with version
SELECT id, name, stock, version FROM products WHERE id = 123;
-- Returns: (123, 'Widget', 10, 5)

-- Update with version check (READ COMMITTED is fine!)
UPDATE products
SET stock = 9, version = version + 1
WHERE id = 123 AND version = 5;

-- If affected rows = 0, someone else updated first → retry in application

Monitoring Serialization Failures

-- Check serialization failure rate
SELECT datname,
       xact_commit,
       xact_rollback,
       conflicts,
       round(100.0 * xact_rollback / nullif(xact_commit + xact_rollback, 0), 2)
         as rollback_pct
FROM pg_stat_database
WHERE datname = current_database();

-- Find queries causing conflicts (requires pg_stat_statements)
SELECT query,
       calls,
       total_time / calls as avg_time_ms
FROM pg_stat_statements
WHERE query ILIKE '%ISOLATION LEVEL%'
ORDER BY calls DESC
LIMIT 20;

Prometheus Alerting

groups:
  - name: postgresql-serialization
    rules:
      - alert: HighSerializationFailureRate
        expr: |
          rate(pg_stat_database_xact_rollback[5m]) /
          (rate(pg_stat_database_xact_commit[5m]) +
           rate(pg_stat_database_xact_rollback[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High transaction rollback rate ({{ $value | humanizePercentage }})"

      - alert: SerializationRetriesExhausted
        expr: |
          rate(app_serialization_retries_exhausted_total[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Transactions failing after max retries"

Managed PostgreSQL Specifics

AWS RDS

# RDS doesn't have special settings for serialization
# But connection limits affect retry storms
max_connections = 100  # Default varies by instance size

# Monitor via Performance Insights
# Look for wait events: Lock:transactionid

Google Cloud SQL

# Same PostgreSQL behavior
# Monitor via Cloud SQL Insights
# Filter by: Serialization failures

Azure Database for PostgreSQL

# Standard PostgreSQL isolation levels
# Use Query Performance Insight for monitoring

Checklist

## Serialization Failure Resolution

### Diagnosis
- [ ] Confirm you actually need REPEATABLE READ/SERIALIZABLE
- [ ] Check if READ COMMITTED with SELECT FOR UPDATE would work
- [ ] Measure current failure rate from pg_stat_database

### If You Need Strict Isolation
- [ ] Implement retry with exponential backoff + jitter
- [ ] Set max retry limit (5 is usually enough)
- [ ] Log retries for monitoring

### Reduce Failure Frequency
- [ ] Keep transactions as short as possible
- [ ] Add indexes to narrow read dependencies
- [ ] Use consistent ordering for multi-row updates
- [ ] Consider optimistic locking instead

### Monitoring
- [ ] Alert on rollback rate > 5%
- [ ] Alert on retry exhaustion
- [ ] Track avg retries per transaction

Conclusion

“Could not serialize access” isn’t really an error - it’s PostgreSQL telling you that concurrent transactions couldn’t be ordered consistently. The question is whether you actually need that guarantee.

Key takeaways:

90% of applications don’t need REPEATABLE READ - atomic UPDATE or SELECT FOR UPDATE with READ COMMITTED solves most use cases
If you need it, implement proper retry - exponential backoff with jitter, not immediate retry
Reduce conflict probability - short transactions, proper indexes, consistent ordering
Monitor your rollback rate - if it’s above 5%, something is wrong with your access patterns

The best serialization failure is the one that never happens because you chose the right isolation level for your use case.

PostgreSQL Read Replica Conflicts - Recovery conflicts on read replicas
PostgreSQL Idle Transaction Playbook - Connection exhaustion from idle transactions

PostgreSQL Serialization Failures: Beyond 'Just Retry'

Understanding the Error

The Three Isolation Levels

When Each Error Occurs

The Decision Tree: Which Isolation Level Do You Need?

When READ COMMITTED Is Actually Enough

Implementing Proper Retry Logic

Basic Retry Pattern

Java/Spring Pattern

Go Pattern

Reducing Conflict Frequency

1. Keep Transactions Short

2. Use Indexes to Narrow Scan Scope

3. Order Your Operations Consistently

4. Consider Optimistic Locking Instead

Monitoring Serialization Failures

Prometheus Alerting

Managed PostgreSQL Specifics

AWS RDS

Google Cloud SQL

Azure Database for PostgreSQL

Checklist

Conclusion

Related posts

Cite this article

Understanding the Error

The Three Isolation Levels

When Each Error Occurs

The Decision Tree: Which Isolation Level Do You Need?

When READ COMMITTED Is Actually Enough

Implementing Proper Retry Logic

Basic Retry Pattern

Java/Spring Pattern

Go Pattern

Reducing Conflict Frequency

1. Keep Transactions Short

2. Use Indexes to Narrow Scan Scope

3. Order Your Operations Consistently

4. Consider Optimistic Locking Instead

Monitoring Serialization Failures

Prometheus Alerting

Managed PostgreSQL Specifics

AWS RDS

Google Cloud SQL

Azure Database for PostgreSQL

Checklist

Conclusion

Related Articles

Related posts

Cite this article