Connection Pool Sizing s Little's Law: Matematický Prístup k HikariCP a PgBouncer

Littleov zakon znie teoreticky, kym ti pool nenarazi do steny o druhej rano. “Nastav pool size na 50.” Prečo 50? “Lebo tak to máme všade.” Toto nie je engineering, toto je cargo cult.

Connection pool sizing je matematický problém, ktorý má presné riešenie. Little’s Law ti povie presne koľko connections potrebuješ.

Testované na: PostgreSQL 16.1, HikariCP 5.1, pgbouncer 1.21, k6 load testing

Little’s Law Základy

Vzorec

L = λ × W

L = priemerný počet položiek v systéme (concurrent connections)
λ = arrival rate (requests per second)
W = priemerný čas v systéme (query duration)

Aplikácia na Connection Pool

Pool Size = Requests/sec × Avg Query Duration

Príklad:
- 500 requests/sec
- Priemerná query trvá 20ms = 0.02s
- Pool Size = 500 × 0.02 = 10 connections

S 20% rezervou: Pool Size = 12

Prečo Väčší Pool ≠ Lepší Výkon

Connection Overhead

Každá PostgreSQL connection spotrebuje:

~10MB RAM na server
Jeden backend process
Lock contention pri vysokom počte connections

Amdahlova Prekážka

Ak máš 8 CPU cores na DB serveri:
- Viac ako 8 aktívnych queries = context switching
- Pool size 100 = 92 queries čaká + overhead

Optimum: pool_size ≈ (2 × CPU cores) + disk spindles
Pre NVMe: pool_size ≈ 2 × CPU cores = 16

Výpočet Pool Size

Krok 1: Zmeraj Query Duration

-- Priemerná a p99 query duration
SELECT
    substring(query, 1, 50) as query_sample,
    calls,
    round(mean_exec_time::numeric, 2) as avg_ms,
    round(max_exec_time::numeric, 2) as max_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

Krok 2: Odhadni Target Throughput

# pool_calculator.py
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PoolCalculation:
    requests_per_sec: int
    avg_query_ms: float
    p99_query_ms: float
    safety_factor: float = 1.5

    def calculate(self) -> Tuple[int, int]:
        """Vracia (minimum, recommended) pool size"""
        # Little's Law: L = λ × W
        avg_duration_sec = self.avg_query_ms / 1000

        # Minimum pre priemerný load
        min_pool = self.requests_per_sec * avg_duration_sec

        # Pre p99 (spike handling)
        p99_duration_sec = self.p99_query_ms / 1000
        p99_pool = self.requests_per_sec * p99_duration_sec

        # S bezpečnostnou rezervou
        recommended = p99_pool * self.safety_factor

        return int(min_pool) + 1, int(recommended) + 1

# Príklad
calc = PoolCalculation(
    requests_per_sec=500,
    avg_query_ms=15,
    p99_query_ms=80
)
min_size, recommended = calc.calculate()
print(f"Minimum: {min_size}, Recommended: {recommended}")
# Minimum: 8, Recommended: 60

Krok 3: Validuj Load Testom

// k6_pool_test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const waitTime = new Trend('db_wait_time');

export const options = {
    scenarios: {
        constant_load: {
            executor: 'constant-arrival-rate',
            rate: 500,           // 500 RPS
            timeUnit: '1s',
            duration: '5m',
            preAllocatedVUs: 100,
            maxVUs: 200,
        },
    },
};

export default function () {
    const start = Date.now();
    const res = http.get('http://api:8080/query');
    const duration = Date.now() - start;

    check(res, {
        'status is 200': (r) => r.status === 200,
        'no pool exhaustion': (r) => !r.body.includes('pool exhausted'),
    });

    if (res.headers['X-Db-Wait-Time']) {
        waitTime.add(parseFloat(res.headers['X-Db-Wait-Time']));
    }
}

Reprodukovateľný Benchmark

Setup

// PoolBenchmark.java
@RestController
public class PoolBenchmark {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @GetMapping("/query")
    public Map<String, Object> query() {
        long start = System.currentTimeMillis();

        // Simuluj typický query
        List<Map<String, Object>> result = jdbcTemplate.queryForList(
            "SELECT pg_sleep(0.015), current_timestamp"  // 15ms query
        );

        long elapsed = System.currentTimeMillis() - start;
        return Map.of("duration_ms", elapsed, "rows", result.size());
    }
}

Test s Rôznymi Pool Sizes

# application-pool5.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 5
      connection-timeout: 1000

# application-pool20.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      connection-timeout: 1000

# application-pool100.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 100
      connection-timeout: 1000

Výsledky

Target: 500 RPS, Query duration: 15ms
Theoretical minimum: 500 × 0.015 = 7.5 = 8 connections

Pool Size  | Achieved RPS | p99 Latency | Timeouts | DB CPU
-----------|--------------|-------------|----------|--------
5          | 320          | 890ms       | 36%      | 25%
10         | 498          | 45ms        | 0.1%     | 55%
20         | 502          | 32ms        | 0%       | 62%
50         | 501          | 35ms        | 0%       | 68%
100        | 499          | 42ms        | 0%       | 75%  ← overhead!

Optimum: 10-20 connections pre 500 RPS s 15ms queries.

HikariCP Konfigurácia

spring:
  datasource:
    hikari:
      # Vypočítané z Little's Law
      maximum-pool-size: 20

      # Timeout keď pool je plný
      connection-timeout: 5000  # 5 sekúnd

      # Idle connection lifetime
      idle-timeout: 600000      # 10 minút

      # Max connection age (force refresh)
      max-lifetime: 1800000     # 30 minút

      # Minimum idle (pre cold start)
      minimum-idle: 5

      # Leak detection
      leak-detection-threshold: 30000  # 30 sekúnd

PgBouncer Sizing

Transaction Mode

# pgbouncer.ini
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000      # Klienti môžu byť veľa
default_pool_size = 20      # Reálne connections k DB
reserve_pool_size = 5       # Pre spike
reserve_pool_timeout = 3    # Kedy použiť reserve

[databases]
mydb = host=postgres port=5432 dbname=mydb

Výpočet pre PgBouncer

Aplikačné pody: 10
HikariCP pool per pod: 20
Total app connections: 200

Ale všetky nie sú aktívne súčasne!
Active ratio: ~10-20%

PgBouncer pool: 200 × 0.15 = 30 connections k DB

Monitoring Pool Health

HikariCP Metrics

// Prometheus metrics sú automatické s Micrometer
// hikaricp_connections_active
// hikaricp_connections_idle
// hikaricp_connections_pending
// hikaricp_connections_timeout_total

# Pool utilization
hikaricp_connections_active / hikaricp_connections_max * 100

# Queue depth (čakajúce requesty)
hikaricp_connections_pending

# Connection wait time
rate(hikaricp_connections_acquire_seconds_sum[5m])
/
rate(hikaricp_connections_acquire_seconds_count[5m])

Alert Rules

groups:
- name: connection_pool
  rules:
  - alert: PoolNearlyExhausted
    expr: |
      hikaricp_connections_active / hikaricp_connections_max > 0.9
    for: 5m
    annotations:
      summary: "Connection pool >90% utilized"

  - alert: HighConnectionWaitTime
    expr: |
      rate(hikaricp_connections_acquire_seconds_sum[5m])
      /
      rate(hikaricp_connections_acquire_seconds_count[5m])
      > 0.1
    for: 5m
    annotations:
      summary: "Average connection wait >100ms"

Gotchas

1. Transaction Duration ≠ Query Duration

@Transactional
public void processOrder(Order order) {
    // Query 1: 5ms
    Order saved = orderRepository.save(order);

    // External API call: 500ms !!!
    paymentService.charge(order);

    // Query 2: 5ms
    auditRepository.log(saved);
}
// Connection držaná 510ms, nie 10ms!

Riešenie: Externú komunikáciu rob mimo transakcie.

2. N+1 Queries

// 1 query + 100 queries = 101 queries
// Connection držaná celý čas
List<User> users = userRepository.findAll();  // 1 query
for (User u : users) {
    u.getOrders().size();  // 100 queries (lazy loading)
}

3. Long-running Reports

// Report query: 30 sekúnd
// Blokuje 1 connection na 30s
// Pri 10 concurrent reportoch = pool exhausted

Riešenie: Samostatný pool pre reports.

spring:
  datasource:
    hikari:
      pool-name: transactional
      maximum-pool-size: 20

  reports-datasource:
    hikari:
      pool-name: reporting
      maximum-pool-size: 5

Checklist

## Connection Pool Sizing

### Analýza
- [ ] Zmeraj avg a p99 query duration (pg_stat_statements)
- [ ] Identifikuj peak RPS z load balancer logs
- [ ] Skontroluj transakčnú vs query duration

### Výpočet
- [ ] L = λ × W (Little's Law)
- [ ] Pridaj 50% safety margin
- [ ] Neprekroč 2 × DB CPU cores

### Validácia
- [ ] Load test s target RPS
- [ ] Monitoruj pool utilization
- [ ] Sleduj connection wait time

### Monitoring
- [ ] Alert na pool > 90%
- [ ] Alert na wait time > 100ms
- [ ] Dashboard s pool metrics

Záver

Connection pool sizing nie je hádanie:

Little’s Law: Pool = RPS × Query Duration
Väčší pool ≠ lepší výkon (overhead!)
Optimum: 10-20 connections pre väčšinu workloadov
Validuj load testom, nie produkciou

Prestať s “pool size 50 lebo tak to bolo vždy” a začni počítať.

Súvisiace články

K8s PostgreSQL Connection Storm - Connection management
PostgreSQL Autovacuum SLO - DB performance

Little’s Law Základy

Vzorec

Aplikácia na Connection Pool

Prečo Väčší Pool ≠ Lepší Výkon

Connection Overhead

Amdahlova Prekážka

Výpočet Pool Size

Krok 1: Zmeraj Query Duration

Krok 2: Odhadni Target Throughput

Krok 3: Validuj Load Testom

Reprodukovateľný Benchmark

Setup

Test s Rôznymi Pool Sizes

Výsledky

HikariCP Konfigurácia

PgBouncer Sizing

Transaction Mode

Výpočet pre PgBouncer

Monitoring Pool Health

HikariCP Metrics

Alert Rules

Gotchas

1. Transaction Duration ≠ Query Duration

2. N+1 Queries

3. Long-running Reports

Checklist

Záver

Súvisiace články

Súvisiace články

Citujte tento článok