Keď Prepared Statements Spravia PostgreSQL 10× Pomalším: Generic Plan Trap

Prepared statements su super, kym ta nezamknu do zleho planu. “Ten istý query, tie isté parametre, ale na produkcii trvá 800ms a na stagingu 15ms.” Toto som počul od kolegu, ktorý strávil dva dni debugovaním. Vinník? PostgreSQL generic plan cache.

Testované na: PostgreSQL 16.1, pgBouncer 1.21, HikariCP 5.1, Go pgx 5.5

Čo je Generic vs Custom Plan

PostgreSQL má dva typy execution planov pre prepared statements:

Custom Plan (pomalšie plánovanie, optimálny plán)

-- Každé volanie: plánovač vidí konkrétne hodnoty
PREPARE get_user AS SELECT * FROM users WHERE status = $1;
EXECUTE get_user('active');  -- Planner vie: status = 'active', 95% riadkov
EXECUTE get_user('banned');  -- Planner vie: status = 'banned', 0.1% riadkov

Generic Plan (rýchlejšie plánovanie, možno neoptimálny)

-- Po 5 executions: PostgreSQL prepne na generic plan
-- Planner NEVIDÍ konkrétne hodnoty, používa štatistiky
EXECUTE get_user('active');  -- Používa priemerný odhad
EXECUTE get_user('banned');  -- Používa TEN ISTÝ plán!

Problém: Skewed Data Distribution

-- Tabuľka s 10M riadkov
CREATE TABLE orders (
    id bigserial PRIMARY KEY,
    status varchar(20),
    customer_id bigint,
    created_at timestamp
);

-- Distribúcia statusov (skewed!)
-- 'completed': 9,500,000 (95%)
-- 'pending':     450,000 (4.5%)
-- 'processing':   49,000 (0.49%)
-- 'failed':        1,000 (0.01%)

CREATE INDEX idx_orders_status ON orders(status);

Custom Plan pre ‘failed’ (0.01%)

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'failed';
-- Index Scan using idx_orders_status
-- Rows: 1,000
-- Time: 2ms

Generic Plan (priemerný odhad)

-- Generic plan odhaduje: status má 4 distinct values
-- Priemerný odhad: 10M / 4 = 2.5M riadkov
-- Pri 2.5M riadkoch je Seq Scan "lepší" než Index Scan

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = $1;
-- Seq Scan on orders
-- Rows: 2,500,000 (odhad) vs 1,000 (realita)
-- Time: 2500ms vs 2ms

Reprodukovateľný Lab

Setup

-- test_prepared_statements.sql
CREATE TABLE test_skewed (
    id bigserial PRIMARY KEY,
    category varchar(10),
    data text
);

-- Skewed distribution
INSERT INTO test_skewed (category, data)
SELECT
    CASE
        WHEN random() < 0.99 THEN 'common'
        WHEN random() < 0.999 THEN 'rare'
        ELSE 'unicorn'
    END,
    md5(random()::text)
FROM generate_series(1, 10000000);

CREATE INDEX idx_category ON test_skewed(category);
ANALYZE test_skewed;

-- Verifikuj distribúciu
SELECT category, count(*), round(100.0 * count(*) / sum(count(*)) over(), 2) as pct
FROM test_skewed
GROUP BY category;
-- common:  9,900,000 (99%)
-- rare:       99,000 (0.99%)
-- unicorn:     1,000 (0.01%)

Test Script (Python)

# test_generic_plan.py
import psycopg2
import time

conn = psycopg2.connect("postgresql://localhost/testdb")
cur = conn.cursor()

# Prepared statement
cur.execute("PREPARE test_stmt AS SELECT * FROM test_skewed WHERE category = $1")

results = []

for i in range(20):
    for category in ['unicorn', 'common', 'rare']:
        start = time.time()
        cur.execute("EXECUTE test_stmt(%s)", (category,))
        rows = cur.fetchall()
        elapsed = (time.time() - start) * 1000

        results.append({
            'iteration': i,
            'category': category,
            'rows': len(rows),
            'time_ms': round(elapsed, 2)
        })

        # Po 5 iteráciách prepne na generic plan
        if i == 5:
            print(f"--- GENERIC PLAN KICKS IN ---")

# Výstup
for r in results:
    print(f"Iter {r['iteration']:2d} | {r['category']:8s} | {r['rows']:8d} rows | {r['time_ms']:8.2f}ms")

Výsledky

Iter  0 | unicorn  |     1000 rows |     2.34ms  # custom plan
Iter  1 | unicorn  |     1000 rows |     1.89ms
...
Iter  4 | unicorn  |     1000 rows |     2.01ms
--- GENERIC PLAN KICKS IN ---
Iter  5 | unicorn  |     1000 rows |   890.45ms  # 400× pomalšie!
Iter  6 | unicorn  |     1000 rows |   912.23ms

pgBouncer Komplikácia

Transaction Pooling Mode

# pgbouncer.ini
[pgbouncer]
pool_mode = transaction  # Problém!

V transaction mode:

Každá transakcia môže ísť na inú connection
Prepared statements sú per-connection
Statement pripravený na conn A neexistuje na conn B
Niektoré drivery “simulujú” prepared statements

Java HikariCP + pgBouncer

// Problem: HikariCP defaultne používa prepared statements
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://pgbouncer:6432/mydb");

// Fix 1: Disable server-side prepared statements
config.addDataSourceProperty("prepareThreshold", "0");

// Fix 2: Alebo použite JDBC parameter
// jdbc:postgresql://host/db?prepareThreshold=0

Go pgx

// pgx defaultne používa prepared statements
config, _ := pgxpool.ParseConfig(connString)

// Fix: Disable prepared statement cache
config.ConnConfig.DefaultQueryExecMode = pgx.QueryExecModeSimpleProtocol

// Alebo per-query
rows, err := pool.Query(ctx, sql, pgx.QueryExecModeSimpleProtocol, args...)

Riešenia

1. Force Custom Plan (PostgreSQL 12+)

-- Session level
SET plan_cache_mode = 'force_custom_plan';

-- Alebo per-query (PostgreSQL 16+)
SELECT /*+ CustomPlan */ * FROM orders WHERE status = $1;

2. Zvýš Threshold pred Generic Plan

-- Default je 5, zvýš na vyššiu hodnotu
-- -1 = nikdy nepoužívaj generic plan
ALTER SYSTEM SET plan_cache_mode = 'force_custom_plan';

3. Partial Indexes pre Skewed Data

-- Namiesto jedného indexu
CREATE INDEX idx_orders_status ON orders(status);

-- Vytvor partial indexes pre rare values
CREATE INDEX idx_orders_failed ON orders(id) WHERE status = 'failed';
CREATE INDEX idx_orders_processing ON orders(id) WHERE status = 'processing';

-- Generic plan teraz použije správny index

4. Application-Level Routing

// Pre kritické queries použite explicit SQL
public List<Order> getFailedOrders() {
    // Tento query vždy použije optimal plan
    return jdbcTemplate.query(
        "SELECT * FROM orders WHERE status = 'failed'",  // nie parameter!
        orderRowMapper
    );
}

// Pre bežné queries je parameter OK
public List<Order> getOrdersByStatus(String status) {
    return jdbcTemplate.query(
        "SELECT * FROM orders WHERE status = ?",
        orderRowMapper,
        status
    );
}

Diagnostika

Zisti či Query Používa Generic Plan

-- pg_stat_statements
SELECT
    query,
    calls,
    mean_exec_time,
    stddev_exec_time,
    -- Vysoký stddev = možný generic plan problém
    stddev_exec_time / mean_exec_time as coefficient_of_variation
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY coefficient_of_variation DESC;

EXPLAIN s Konkrétnymi vs Generic Parameters

-- Custom plan (s hodnotami)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'failed';

-- Generic plan (bez hodnôt)
PREPARE test AS SELECT * FROM orders WHERE status = $1;
EXPLAIN (ANALYZE, BUFFERS, GENERIC_PLAN)  -- PostgreSQL 16+
EXECUTE test('failed');

Benchmark: Custom vs Generic

Query	Custom Plan	Generic Plan	Slowdown
status=‘failed’ (0.01%)	2ms	890ms	445×
status=‘rare’ (1%)	45ms	920ms	20×
status=‘common’ (99%)	1200ms	1150ms	0.96×

Checklist

## Prepared Statement Diagnostika

### Identifikácia
- [ ] Máš skewed data distribution?
- [ ] Používaš pgBouncer v transaction mode?
- [ ] Queries majú vysoký stddev v pg_stat_statements?

### Quick Fixes
- [ ] Nastav plan_cache_mode = 'force_custom_plan' pre session
- [ ] Disable prepared statements v connection poole
- [ ] Použij partial indexes pre rare values

### Dlhodobé Riešenie
- [ ] Implementuj application-level query routing
- [ ] Monitoruj coefficient of variation v pg_stat_statements
- [ ] Pravidelne ANALYZE po bulk operáciách

Záver

Generic plan je optimalizácia, ktorá sa môže stať problémom pri skewed dátach:

PostgreSQL po 5 executions prepne na generic plan
Generic plan používa priemerné štatistiky
Pri skewed dátach (0.01% vs 99%) je to katastrofa
pgBouncer transaction mode to ešte komplikuje

Riešenie: Identifikuj skewed queries a buď force custom plan, alebo použi partial indexes.

Súvisiace články

PostgreSQL Autovacuum SLO - Vacuum tuning
Connection Pool Sizing - Správne dimenzovanie poolov

Čo je Generic vs Custom Plan

Custom Plan (pomalšie plánovanie, optimálny plán)

Generic Plan (rýchlejšie plánovanie, možno neoptimálny)

Problém: Skewed Data Distribution

Custom Plan pre ‘failed’ (0.01%)

Generic Plan (priemerný odhad)

Reprodukovateľný Lab

Setup

Test Script (Python)

Výsledky

pgBouncer Komplikácia

Transaction Pooling Mode

Java HikariCP + pgBouncer

Go pgx

Riešenia

1. Force Custom Plan (PostgreSQL 12+)

2. Zvýš Threshold pred Generic Plan

3. Partial Indexes pre Skewed Data

4. Application-Level Routing

Diagnostika

Zisti či Query Používa Generic Plan

EXPLAIN s Konkrétnymi vs Generic Parameters

Benchmark: Custom vs Generic

Checklist

Záver

Súvisiace články

Súvisiace články

Citujte tento článok