PostgreSQL Idle in Transaction: Emergency Playbook for Stuck Connections

Idle in transaction is the slow leak that sinks stable systems. “Table is 50GB but only has 1M rows.” The disk usage made no sense. I ran VACUUM FULL expecting to reclaim space, but it blocked indefinitely. That’s when I checked pg_stat_activity and found the culprit: one connection had been “idle in transaction” for 6 hours, started by a background job that crashed mid-execution without rolling back.

The six-hour idle transaction was blocking autovacuum from cleaning dead tuples in any table it had touched. Every update since then created dead tuples that couldn’t be removed. The table had bloated to 50× its efficient size—1M rows consuming 50GB because 49GB was dead tuples that vacuum couldn’t clean.

What makes “idle in transaction” particularly insidious is that it looks innocent. The connection isn’t actively doing anything. It’s not consuming CPU or holding locks in the traditional sense. It’s just… sitting there. But in PostgreSQL, an open transaction is a snapshot anchor. Any tuple visible to that transaction cannot be vacuumed away, because the transaction might still read it. So a single forgotten transaction can cause unlimited bloat across all tables.

This is the dark side of MVCC (Multi-Version Concurrency Control). MVCC’s promise is that readers don’t block writers and writers don’t block readers. Each transaction sees a consistent snapshot. But that snapshot has to be maintained, which means old row versions can’t be cleaned up until all transactions that might need them are finished. One old transaction holds back cleanup for the entire database.

Tested on: PostgreSQL 16.1, Spring Boot applications, connection pools

What is Idle in Transaction?

Transaction States

SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- idle                   -- Connection open, no active transaction
-- active                 -- Currently executing query
-- idle in transaction    -- Transaction open, waiting for next command ← PROBLEM
-- idle in transaction (aborted) -- Transaction failed, waiting for rollback

Why It’s Bad

Idle in Transaction:
1. Holds locks on tables/rows
2. Prevents autovacuum from cleaning dead tuples
3. Holds transaction ID (affects wraparound)
4. Blocks other transactions waiting for locks

Timeline:
T+0:    BEGIN; SELECT ... (transaction starts)
T+1:    Application processing... (transaction open)
T+10m:  Still processing... (10 min idle in transaction)
T+10m:  Autovacuum tries table → BLOCKED
T+2h:   Table bloat increasing...
T+6h:   Table 10x normal size

Detection

Find Long-Running Transactions

-- Connections idle in transaction > 5 minutes
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query,
    now() - xact_start AS transaction_duration,
    now() - state_change AS idle_duration
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;

Check Lock Blocking

-- Who is blocking whom
SELECT
    blocked.pid AS blocked_pid,
    blocked.usename AS blocked_user,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.usename AS blocking_user,
    blocking.state AS blocking_state,
    now() - blocking.xact_start AS blocking_duration
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
    AND blocked_locks.relation = blocking_locks.relation
    AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted
AND blocking.state = 'idle in transaction';

Check Autovacuum Impact

-- Tables waiting for vacuum
SELECT
    schemaname,
    relname,
    n_dead_tup,
    last_autovacuum,
    last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;

Emergency Response

Step 1: Identify and Document

-- Get full details of idle transaction
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    backend_start,
    xact_start,
    query_start,
    state_change,
    state,
    query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- Save this information before killing!

Step 2: Terminate Connection

-- Graceful: Cancel current query (won't work for idle)
SELECT pg_cancel_backend(12345);

-- Force: Terminate connection
SELECT pg_terminate_backend(12345);

-- Terminate all idle in transaction > 1 hour
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - xact_start > interval '1 hour';

Step 3: Clean Up

-- Run vacuum on affected tables
VACUUM ANALYZE affected_table;

-- Check if bloat reduced
SELECT
    schemaname,
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'affected_table';

Prevention

1. idle_in_transaction_session_timeout

-- postgresql.conf (PostgreSQL 9.6+)
idle_in_transaction_session_timeout = '10min'

-- Or per-role
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '10min';

-- After timeout: connection terminated automatically
-- Application gets: FATAL: terminating connection due to idle-in-transaction timeout

2. statement_timeout

-- Maximum query execution time
statement_timeout = '30s'

-- Per-role
ALTER ROLE app_user SET statement_timeout = '30s';

-- Prevents long-running queries, but not idle in transaction

3. Application-Level Fixes

// Spring Boot - Connection timeout
@Configuration
public class DataSourceConfig {

    @Bean
    @ConfigurationProperties("spring.datasource.hikari")
    public HikariConfig hikariConfig() {
        HikariConfig config = new HikariConfig();

        // Connection max lifetime
        config.setMaxLifetime(1800000);  // 30 min

        // Leak detection
        config.setLeakDetectionThreshold(60000);  // 1 min warning

        return config;
    }
}

// Proper transaction handling
@Transactional
public void processOrder(Order order) {
    // BAD: External call inside transaction
    // paymentService.charge(order);  // HTTP call holding transaction!

    // GOOD: External call outside transaction
}

public void processOrderCorrect(Order order) {
    Order saved = saveOrder(order);  // Transaction ends here
    paymentService.charge(saved);    // Outside transaction
}

4. PgBouncer Configuration

# pgbouncer.ini
[pgbouncer]
# Kill connections idle in transaction
server_idle_timeout = 600  # 10 min

# Use transaction pooling (releases connection after each transaction)
pool_mode = transaction

# With statement pooling (most aggressive, but breaks some apps)
# pool_mode = statement

Monitoring

Prometheus Alert Rules

groups:
- name: postgresql
  rules:
  - alert: IdleInTransactionLong
    expr: |
      pg_stat_activity_max_tx_duration{state="idle in transaction"}
      > 300
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Connection idle in transaction > 5 minutes"

  - alert: IdleInTransactionCritical
    expr: |
      pg_stat_activity_max_tx_duration{state="idle in transaction"}
      > 3600
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Connection idle in transaction > 1 hour - blocking vacuum"

Custom Metric

-- Export as Prometheus metric
SELECT
    coalesce(max(extract(epoch from (now() - xact_start))), 0) AS max_idle_tx_seconds
FROM pg_stat_activity
WHERE state = 'idle in transaction';

Grafana Dashboard

{
  "panels": [
    {
      "title": "Idle in Transaction Connections",
      "type": "timeseries",
      "targets": [
        {
          "expr": "pg_stat_activity_count{state='idle in transaction'}",
          "legendFormat": "count"
        }
      ]
    },
    {
      "title": "Longest Idle in Transaction",
      "type": "stat",
      "targets": [
        {
          "expr": "pg_stat_activity_max_tx_duration{state='idle in transaction'}"
        }
      ]
    }
  ]
}

Root Cause Patterns

1. External API Calls in Transaction

// BAD: HTTP call holds transaction open
@Transactional
public void process(Order order) {
    orderRepository.save(order);
    externalApi.notify(order);  // 30 second timeout = 30s idle in tx
}

// GOOD: Move external call out
@Transactional
public Order save(Order order) {
    return orderRepository.save(order);
}

public void process(Order order) {
    Order saved = save(order);
    externalApi.notify(saved);
}

2. User Input in Transaction

// BAD: Waiting for user in transaction
@Transactional
public void wizard(WizardRequest request) {
    stepOne();
    waitForUserInput();  // User goes to lunch...
    stepTwo();
}

// GOOD: Separate transactions
public void wizardStepOne() { ... }
public void wizardStepTwo() { ... }

3. Missing Rollback on Error

// BAD: Error doesn't rollback
try {
    connection.setAutoCommit(false);
    doStuff();
    connection.commit();
} catch (Exception e) {
    // Missing: connection.rollback();
    throw e;
}  // Connection returned to pool with open transaction!

// GOOD: Always rollback on error
try {
    connection.setAutoCommit(false);
    doStuff();
    connection.commit();
} catch (Exception e) {
    connection.rollback();
    throw e;
}

Checklist

## Idle in Transaction Response

### Detection
- [ ] Query pg_stat_activity for idle in transaction
- [ ] Check lock blocking chains
- [ ] Identify affected tables (vacuum blocked)

### Immediate Response
- [ ] Document connection details (pid, query, duration)
- [ ] Terminate connection if > 1 hour
- [ ] Run VACUUM on affected tables

### Prevention
- [ ] Set idle_in_transaction_session_timeout
- [ ] Enable HikariCP leak detection
- [ ] Review code for transactions holding external calls

### Monitoring
- [ ] Alert on idle in transaction > 5 min
- [ ] Track max idle transaction duration
- [ ] Monitor table bloat growth

Conclusion

Idle in transaction is one of the most common PostgreSQL performance killers, and also one of the most preventable. The pattern is always the same: an application opens a transaction, does some work, then pauses—waiting for user input, calling an external API, or simply crashing without cleanup. Meanwhile, autovacuum sits blocked, dead tuples accumulate, and tables bloat.

The root cause is almost always application design. Transactions should be short—start, do database work, commit. Any waiting, external calls, or complex processing should happen outside the transaction. The moment you add @Transactional to a method that calls an HTTP endpoint, you’ve created a potential idle-in-transaction bomb.

PostgreSQL’s idle_in_transaction_session_timeout is your safety net. Set it aggressively—10 minutes is generous for most applications. When a connection hits this limit, PostgreSQL terminates it automatically. The application gets an error, which is better than the alternative: tables bloating until the database becomes unusable.

The key insight is that idle in transaction doesn’t look like a problem until it’s a crisis. The connection shows as “idle”—not blocking, not consuming CPU, not doing anything obviously wrong. You have to specifically check pg_stat_activity for the state and transaction start time to see the danger.

Key principles:

Set idle_in_transaction_session_timeout to auto-terminate stuck connections before they cause damage
Never call external APIs inside transactions—this is the single most common root cause
Monitor and alert on idle in transaction > 5 minutes—catch problems before they compound
Enable HikariCP leak detection—catches connections that aren’t properly returned to the pool
Have a kill playbook ready—know how to identify, terminate, and recover

One forgotten transaction can bloat your tables 10×. The connection looks idle, but the damage is active.

PostgreSQL Autovacuum SLO - Vacuum configuration
Connection Pool Sizing with Little’s Law - Pool management

PostgreSQL Idle in Transaction: Emergency Playbook for Stuck Connections

What is Idle in Transaction?

Transaction States

Why It’s Bad

Detection

Find Long-Running Transactions

Check Lock Blocking

Check Autovacuum Impact

Emergency Response

Step 1: Identify and Document

Step 2: Terminate Connection

Step 3: Clean Up

Prevention

1. idle_in_transaction_session_timeout

2. statement_timeout

3. Application-Level Fixes

4. PgBouncer Configuration

Monitoring

Prometheus Alert Rules

Custom Metric

Grafana Dashboard

Root Cause Patterns

1. External API Calls in Transaction

2. User Input in Transaction

3. Missing Rollback on Error

Checklist

Conclusion

Related posts

Cite this article

What is Idle in Transaction?

Transaction States

Why It’s Bad

Detection

Find Long-Running Transactions

Check Lock Blocking

Check Autovacuum Impact

Emergency Response

Step 1: Identify and Document

Step 2: Terminate Connection

Step 3: Clean Up

Prevention

1. idle_in_transaction_session_timeout

2. statement_timeout

3. Application-Level Fixes

4. PgBouncer Configuration

Monitoring

Prometheus Alert Rules

Custom Metric

Grafana Dashboard

Root Cause Patterns

1. External API Calls in Transaction

2. User Input in Transaction

3. Missing Rollback on Error

Checklist

Conclusion

Related Articles

Related posts

Cite this article