Connection Pool Sizing s Little's Law: Matematický Prístup k HikariCP a PgBouncer
Littleov zakon znie teoreticky, kym ti pool nenarazi do steny o druhej rano. “Nastav pool size na 50.” Prečo 50? “Lebo tak to máme všade.” Toto nie je engineering, toto je cargo cult.
Connection pool sizing je matematický problém, ktorý má presné riešenie. Little’s Law ti povie presne koľko connections potrebuješ.
Testované na: PostgreSQL 16.1, HikariCP 5.1, pgbouncer 1.21, k6 load testing
Little’s Law Základy
Vzorec
L = λ × W
L = priemerný počet položiek v systéme (concurrent connections)
λ = arrival rate (requests per second)
W = priemerný čas v systéme (query duration)
Aplikácia na Connection Pool
Pool Size = Requests/sec × Avg Query Duration
Príklad:
- 500 requests/sec
- Priemerná query trvá 20ms = 0.02s
- Pool Size = 500 × 0.02 = 10 connections
S 20% rezervou: Pool Size = 12
Prečo Väčší Pool ≠ Lepší Výkon
Connection Overhead
Každá PostgreSQL connection spotrebuje:
- ~10MB RAM na server
- Jeden backend process
- Lock contention pri vysokom počte connections
Amdahlova Prekážka
Ak máš 8 CPU cores na DB serveri:
- Viac ako 8 aktívnych queries = context switching
- Pool size 100 = 92 queries čaká + overhead
Optimum: pool_size ≈ (2 × CPU cores) + disk spindles
Pre NVMe: pool_size ≈ 2 × CPU cores = 16
Výpočet Pool Size
Krok 1: Zmeraj Query Duration
-- Priemerná a p99 query duration
SELECT
substring(query, 1, 50) as query_sample,
calls,
round(mean_exec_time::numeric, 2) as avg_ms,
round(max_exec_time::numeric, 2) as max_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;
Krok 2: Odhadni Target Throughput
# pool_calculator.py
from dataclasses import dataclass
from typing import Tuple
@dataclass
class PoolCalculation:
requests_per_sec: int
avg_query_ms: float
p99_query_ms: float
safety_factor: float = 1.5
def calculate(self) -> Tuple[int, int]:
"""Vracia (minimum, recommended) pool size"""
# Little's Law: L = λ × W
avg_duration_sec = self.avg_query_ms / 1000
# Minimum pre priemerný load
min_pool = self.requests_per_sec * avg_duration_sec
# Pre p99 (spike handling)
p99_duration_sec = self.p99_query_ms / 1000
p99_pool = self.requests_per_sec * p99_duration_sec
# S bezpečnostnou rezervou
recommended = p99_pool * self.safety_factor
return int(min_pool) + 1, int(recommended) + 1
# Príklad
calc = PoolCalculation(
requests_per_sec=500,
avg_query_ms=15,
p99_query_ms=80
)
min_size, recommended = calc.calculate()
print(f"Minimum: {min_size}, Recommended: {recommended}")
# Minimum: 8, Recommended: 60
Krok 3: Validuj Load Testom
// k6_pool_test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
const errorRate = new Rate('errors');
const waitTime = new Trend('db_wait_time');
export const options = {
scenarios: {
constant_load: {
executor: 'constant-arrival-rate',
rate: 500, // 500 RPS
timeUnit: '1s',
duration: '5m',
preAllocatedVUs: 100,
maxVUs: 200,
},
},
};
export default function () {
const start = Date.now();
const res = http.get('http://api:8080/query');
const duration = Date.now() - start;
check(res, {
'status is 200': (r) => r.status === 200,
'no pool exhaustion': (r) => !r.body.includes('pool exhausted'),
});
if (res.headers['X-Db-Wait-Time']) {
waitTime.add(parseFloat(res.headers['X-Db-Wait-Time']));
}
}
Reprodukovateľný Benchmark
Setup
// PoolBenchmark.java
@RestController
public class PoolBenchmark {
@Autowired
private JdbcTemplate jdbcTemplate;
@GetMapping("/query")
public Map<String, Object> query() {
long start = System.currentTimeMillis();
// Simuluj typický query
List<Map<String, Object>> result = jdbcTemplate.queryForList(
"SELECT pg_sleep(0.015), current_timestamp" // 15ms query
);
long elapsed = System.currentTimeMillis() - start;
return Map.of("duration_ms", elapsed, "rows", result.size());
}
}
Test s Rôznymi Pool Sizes
# application-pool5.yml
spring:
datasource:
hikari:
maximum-pool-size: 5
connection-timeout: 1000
# application-pool20.yml
spring:
datasource:
hikari:
maximum-pool-size: 20
connection-timeout: 1000
# application-pool100.yml
spring:
datasource:
hikari:
maximum-pool-size: 100
connection-timeout: 1000
Výsledky
Target: 500 RPS, Query duration: 15ms
Theoretical minimum: 500 × 0.015 = 7.5 = 8 connections
Pool Size | Achieved RPS | p99 Latency | Timeouts | DB CPU
-----------|--------------|-------------|----------|--------
5 | 320 | 890ms | 36% | 25%
10 | 498 | 45ms | 0.1% | 55%
20 | 502 | 32ms | 0% | 62%
50 | 501 | 35ms | 0% | 68%
100 | 499 | 42ms | 0% | 75% ← overhead!
Optimum: 10-20 connections pre 500 RPS s 15ms queries.
HikariCP Konfigurácia
spring:
datasource:
hikari:
# Vypočítané z Little's Law
maximum-pool-size: 20
# Timeout keď pool je plný
connection-timeout: 5000 # 5 sekúnd
# Idle connection lifetime
idle-timeout: 600000 # 10 minút
# Max connection age (force refresh)
max-lifetime: 1800000 # 30 minút
# Minimum idle (pre cold start)
minimum-idle: 5
# Leak detection
leak-detection-threshold: 30000 # 30 sekúnd
PgBouncer Sizing
Transaction Mode
# pgbouncer.ini
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000 # Klienti môžu byť veľa
default_pool_size = 20 # Reálne connections k DB
reserve_pool_size = 5 # Pre spike
reserve_pool_timeout = 3 # Kedy použiť reserve
[databases]
mydb = host=postgres port=5432 dbname=mydb
Výpočet pre PgBouncer
Aplikačné pody: 10
HikariCP pool per pod: 20
Total app connections: 200
Ale všetky nie sú aktívne súčasne!
Active ratio: ~10-20%
PgBouncer pool: 200 × 0.15 = 30 connections k DB
Monitoring Pool Health
HikariCP Metrics
// Prometheus metrics sú automatické s Micrometer
// hikaricp_connections_active
// hikaricp_connections_idle
// hikaricp_connections_pending
// hikaricp_connections_timeout_total
# Pool utilization
hikaricp_connections_active / hikaricp_connections_max * 100
# Queue depth (čakajúce requesty)
hikaricp_connections_pending
# Connection wait time
rate(hikaricp_connections_acquire_seconds_sum[5m])
/
rate(hikaricp_connections_acquire_seconds_count[5m])
Alert Rules
groups:
- name: connection_pool
rules:
- alert: PoolNearlyExhausted
expr: |
hikaricp_connections_active / hikaricp_connections_max > 0.9
for: 5m
annotations:
summary: "Connection pool >90% utilized"
- alert: HighConnectionWaitTime
expr: |
rate(hikaricp_connections_acquire_seconds_sum[5m])
/
rate(hikaricp_connections_acquire_seconds_count[5m])
> 0.1
for: 5m
annotations:
summary: "Average connection wait >100ms"
Gotchas
1. Transaction Duration ≠ Query Duration
@Transactional
public void processOrder(Order order) {
// Query 1: 5ms
Order saved = orderRepository.save(order);
// External API call: 500ms !!!
paymentService.charge(order);
// Query 2: 5ms
auditRepository.log(saved);
}
// Connection držaná 510ms, nie 10ms!
Riešenie: Externú komunikáciu rob mimo transakcie.
2. N+1 Queries
// 1 query + 100 queries = 101 queries
// Connection držaná celý čas
List<User> users = userRepository.findAll(); // 1 query
for (User u : users) {
u.getOrders().size(); // 100 queries (lazy loading)
}
3. Long-running Reports
// Report query: 30 sekúnd
// Blokuje 1 connection na 30s
// Pri 10 concurrent reportoch = pool exhausted
Riešenie: Samostatný pool pre reports.
spring:
datasource:
hikari:
pool-name: transactional
maximum-pool-size: 20
reports-datasource:
hikari:
pool-name: reporting
maximum-pool-size: 5
Checklist
## Connection Pool Sizing
### Analýza
- [ ] Zmeraj avg a p99 query duration (pg_stat_statements)
- [ ] Identifikuj peak RPS z load balancer logs
- [ ] Skontroluj transakčnú vs query duration
### Výpočet
- [ ] L = λ × W (Little's Law)
- [ ] Pridaj 50% safety margin
- [ ] Neprekroč 2 × DB CPU cores
### Validácia
- [ ] Load test s target RPS
- [ ] Monitoruj pool utilization
- [ ] Sleduj connection wait time
### Monitoring
- [ ] Alert na pool > 90%
- [ ] Alert na wait time > 100ms
- [ ] Dashboard s pool metrics
Záver
Connection pool sizing nie je hádanie:
- Little’s Law: Pool = RPS × Query Duration
- Väčší pool ≠ lepší výkon (overhead!)
- Optimum: 10-20 connections pre väčšinu workloadov
- Validuj load testom, nie produkciou
Prestať s “pool size 50 lebo tak to bolo vždy” a začni počítať.
Súvisiace články
- K8s PostgreSQL Connection Storm - Connection management
- PostgreSQL Autovacuum SLO - DB performance
Súvisiace články
Keď Prepared Statements Spravia PostgreSQL 10× Pomalším: Generic Plan Trap
Rovnaký query, rovnaké parametre, ale prod je pomalý a staging funguje. Ukážem ako reprodukovať generic plan problém s pgBouncer, Java/Go a ako ho fixnúť.
UUIDv4 vs ULID vs TSID: Dopad na PostgreSQL B-Tree Indexy po 100M Záznamoch
Náhodné UUID ako Primary Key spôsobujú index bloat a random I/O. Benchmark s konkrétnymi číslami - veľkosť indexu, cache hit ratio a WAL volume po 100M insertoch.
PostgreSQL HOT Updates + FILLFACTOR: Ako Znížiť Index Bloat o 60%
Vacuum beží úspešne, ale disk rastie a cache hit ratio klesá. Ukážem ako kvantifikovať HOT-update eligibility pomocou pgstattuple a optimalizovať fillfactor.
Vyčerpanie Connection Poolu: Tichý Spúšťač Výpadkov
Aplikácia visí, ale databáza vyzerá zdravo. Najčastejšie je vyčerpaný connection pool. Ukážem detekciu, rozumné dimenzovanie a prevenciu únikov spojení.
Citujte tento článok
Ak na článok odkazujete, pridajte pôvodnú URL a uveďte autora.