Cardinality Contracts: Turn Prometheus Labels into an API with a Budget

I have taken Prometheus down more than once with a single “innocent”-looking label. If you use Prometheus (or OTel metrics that flow into Prometheus/remote_write), you know this incident pattern:

  • “nothing changed”
  • “we only added one label”
  • and then:
    • Prometheus eats RAM
    • the remote_write bill spikes
    • queries slow down
    • dashboards and alerts fall apart

The cause is almost always the same: cardinality explosion. One label creates too many unique time series, and monitoring itself becomes the incident.

This article introduces a practical concept that turns advice into a guardrail:

Cardinality Contracts = explicit cardinality budgets + automated verification in CI + a runtime firewall for when CI is not enough.

This is neither theory nor marketing. It is an operational contract that turns labels into an API.

Tested on: Prometheus 2.47, OTel metrics via remote_write, Kubernetes workloads with 500k+ active series.

Why “don't put user_id in there” is not enough

Everyone knows that user_id should not be a label. It still happens, because:

  • the router stops setting route and you export the raw path
  • a debug label makes it into production
  • a feature flag adds a new dimension
  • middleware changes behavior and starts producing a new set of values

The problem is that labels are an API. If you treat them as an API, you need:

  • an explicit specification
  • compatibility
  • detection of breaking changes

That is exactly what Cardinality Contracts address.

What the contract covers

Two things, both measurable:

  1. Number of time series per metric (e.g. http_server_requests_total must stay under 5k series)
  2. Number of unique values per label (e.g. route max 250, status max 25)

If someone accidentally exports raw paths such as /users/123, the number of route values blows past its budget and the contract fails.

Artifacts in Git

  1. cardinality.budgets.yml - human-readable budgets
  2. tools/cardinality_guard.py - the guard script
  3. A CI job that scrapes /metrics and fails when a budget is exceeded
  4. (optional) cardinality.baseline.json + a diff report in the PR
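
The optional baseline diff from item 4 could look like the following minimal sketch. It assumes both files use the report shape that the guard script later in this article writes ({"metrics": {name: {"series": n, "labels": {...}}}}). The function name diff_reports and the line format are illustrative, not part of the original tooling.

```python
def diff_reports(baseline: dict, current: dict) -> list[str]:
    """Return human-readable lines describing per-metric series changes."""
    base = baseline.get("metrics", {})
    cur = current.get("metrics", {})
    lines = []
    for metric in sorted(set(base) | set(cur)):
        # A metric missing on one side counts as 0 series there.
        before = base.get(metric, {}).get("series", 0)
        after = cur.get(metric, {}).get("series", 0)
        if before == after:
            continue
        lines.append(f"{metric}: {before} -> {after} ({after - before:+d})")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data in the guard's report shape.
    baseline = {"metrics": {
        "http_server_requests_total": {"series": 120, "labels": {"route": 30}},
    }}
    current = {"metrics": {
        "http_server_requests_total": {"series": 180, "labels": {"route": 45}},
        "db_query_duration_seconds": {"series": 12, "labels": {"table": 6}},
    }}
    for line in diff_reports(baseline, current):
        print(line)
```

Posting these lines as a PR comment makes growth visible even while the metric is still under budget.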

1) Define budgets in YAML

Start with the 5 metrics that grow the fastest. Budgets are guardrails, not absolute truth. Start high and tighten gradually.

# cardinality.budgets.yml
budgets:
  http_server_requests_total:
    max_series: 5000
    labels:
      method: 10
      status: 25
      route: 250
  http_server_request_duration_seconds_bucket:
    max_series: 20000
    labels:
      le: 50
      route: 250
      status: 25
  db_query_duration_seconds:
    max_series: 2000
    labels:
      operation: 20
      table: 200
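
One way to sanity-check a budget file: label budgets multiply, so the worst case for a metric is the product of its label budgets. The sketch below compares that product with max_series. The helper name worst_case_series is hypothetical, and the budgets from the YAML above are repeated as a plain dict so the example runs without pyyaml.

```python
from math import prod

BUDGETS = {
    "http_server_requests_total": {
        "max_series": 5000,
        "labels": {"method": 10, "status": 25, "route": 250},
    },
    "http_server_request_duration_seconds_bucket": {
        "max_series": 20000,
        "labels": {"le": 50, "route": 250, "status": 25},
    },
    "db_query_duration_seconds": {
        "max_series": 2000,
        "labels": {"operation": 20, "table": 200},
    },
}


def worst_case_series(budget: dict) -> int:
    """Upper bound on series if every label combination actually occurred."""
    return prod(int(n) for n in budget.get("labels", {}).values())


if __name__ == "__main__":
    for metric, budget in BUDGETS.items():
        bound = worst_case_series(budget)
        print(f"{metric}: worst case {bound}, max_series {budget['max_series']}")
```

For all three metrics the product is larger than max_series, so max_series is the limit that actually binds. The per-label budgets mainly tell you which label caused the growth.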

2) CI smoke: scrape /metrics and count cardinality

Workflow:

  • start the app + dependencies (docker compose / kind / testcontainers)
  • run a short smoke test
  • scrape /metrics
  • run the guard script
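
The scrape step can also be done from Python when the endpoint needs a moment to come up. This is a minimal sketch using only the standard library. The function names, the retry defaults, and the injectable fetch parameter are assumptions made for illustration.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text (stdlib only)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def scrape_with_retry(url: str, attempts: int = 10, delay: float = 2.0,
                      fetch: Callable[[str], str] = http_get) -> str:
    """Retry until the endpoint answers, then return the exposition text."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # URLError and connection errors subclass OSError
            last_error = exc
            time.sleep(delay)
    raise RuntimeError(f"could not scrape {url}: {last_error}")


# Example call (assumed endpoint and file name):
#   text = scrape_with_retry("http://localhost:8080/metrics")
#   with open("metrics.txt", "w", encoding="utf-8") as f:
#       f.write(text)
```

The fetch parameter exists only so the retry logic can be exercised without a running server.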

Minimal Python guard (low dependency)

# tools/cardinality_guard.py
import re
import sys
import json
from collections import defaultdict

try:
    import yaml  # pip install pyyaml
except ImportError:
    print("Missing dependency: pyyaml (pip install pyyaml)", file=sys.stderr)
    sys.exit(2)

# Accept any sample value (including NaN, +Inf and -Inf), not only values that start with a digit.
METRIC_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+\S')

def parse_labels(label_blob: str):
    if not label_blob:
        return {}
    s = label_blob.strip()[1:-1].strip()
    if not s:
        return {}
    labels = {}
    parts = []
    cur = []
    in_q = False
    esc = False
    for ch in s:
        if esc:
            cur.append(ch)
            esc = False
        elif ch == '\\':
            cur.append(ch)
            esc = True
        elif ch == '"':
            cur.append(ch)
            in_q = not in_q
        elif ch == ',' and not in_q:
            parts.append(''.join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if cur:
        parts.append(''.join(cur).strip())

    for p in parts:
        if not p:
            continue
        k, v = p.split("=", 1)
        k = k.strip()
        v = v.strip()
        if v.startswith('"') and v.endswith('"'):
            v = v[1:-1]
        labels[k] = v
    return labels


def load_budgets(path: str):
    with open(path, "r", encoding="utf-8") as f:
        doc = yaml.safe_load(f) or {}  # an empty file loads as None
    return doc.get("budgets") or {}


def main():
    if len(sys.argv) != 4:
        print("Usage: python tools/cardinality_guard.py <budgets.yml> <metrics.txt> <report.json>", file=sys.stderr)
        sys.exit(2)

    budgets_path, metrics_path, report_path = sys.argv[1], sys.argv[2], sys.argv[3]
    budgets = load_budgets(budgets_path)

    series = defaultdict(set)
    label_values = defaultdict(lambda: defaultdict(set))

    with open(metrics_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            m = METRIC_LINE.match(line)
            if not m:
                continue
            metric = m.group(1)
            label_blob = m.group(2)
            labels = parse_labels(label_blob) if label_blob else {}

            fp = "|".join([f"{k}={labels[k]}" for k in sorted(labels.keys())])
            series[metric].add(fp)

            for k, v in labels.items():
                label_values[metric][k].add(v)

    report = {"metrics": {}, "violations": []}

    for metric, labelsets in series.items():
        metric_series = len(labelsets)
        rep = {
            "series": metric_series,
            "labels": {k: len(vs) for k, vs in label_values[metric].items()}
        }
        report["metrics"][metric] = rep

        if metric in budgets:
            b = budgets[metric]
            max_series = b.get("max_series")
            if max_series is not None and metric_series > int(max_series):
                report["violations"].append({
                    "metric": metric,
                    "type": "max_series",
                    "observed": metric_series,
                    "budget": int(max_series)
                })

            label_budgets = b.get("labels", {})
            for label, max_vals in label_budgets.items():
                observed = rep["labels"].get(label, 0)
                if observed > int(max_vals):
                    report["violations"].append({
                        "metric": metric,
                        "type": "label_values",
                        "label": label,
                        "observed": observed,
                        "budget": int(max_vals)
                    })

    with open(report_path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, sort_keys=True)

    if report["violations"]:
        print("CARDINALITY CONTRACT FAILED:")
        for v in report["violations"]:
            if v["type"] == "max_series":
                print(f"- {v['metric']}: series {v['observed']} > budget {v['budget']}")
            else:
                print(f"- {v['metric']}[{v['label']}]: values {v['observed']} > budget {v['budget']}")
        sys.exit(1)

    print("Cardinality contract OK.")
    sys.exit(0)


if __name__ == "__main__":
    main()

3) Minimal GitHub Actions job

# .github/workflows/cardinality.yml
name: Cardinality contracts

on:
  pull_request:

jobs:
  cardinality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start stack
        run: docker compose up -d --build

      - name: Run smoke tests
        run: |
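          # Containers may still be starting, so retry until the app is ready.
          curl -fsS --retry 10 --retry-delay 2 --retry-all-errors http://localhost:8080/healthz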

      - name: Scrape metrics
        run: |
          curl -fsS http://localhost:8080/metrics > metrics.txt

      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install pyyaml

      - name: Enforce contracts
        run: |
          python tools/cardinality_guard.py cardinality.budgets.yml metrics.txt cardinality.report.json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: cardinality-report
          path: cardinality.report.json

4) Runtime Cardinality Firewall (when CI is not enough)

CI catches regressions, but some label values only appear in production (tenants, feature flags, edge-case paths). A runtime guardrail saves you when CI misses something.

Pattern: once a label exceeds its budget, bucket any new value into __other__ and increment an overflow counter.

class BudgetedLabel {
  private readonly max: number;
  private readonly seen = new Set<string>();
  // Number of values collapsed into "__other__"; export it as the overflow counter.
  public overflowCount = 0;

  constructor(max: number) {
    this.max = max;
  }

  normalize(v: string): string {
    if (this.seen.has(v)) return v;
    if (this.seen.size < this.max) {
      this.seen.add(v);
      return v;
    }
    this.overflowCount++;
    return "__other__";
  }
}

It is a circuit breaker for telemetry. Prometheus survives, and you can see who broke the contract.
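
For services written in Python, the same pattern could be sketched as follows. The class name mirrors the TypeScript version. The lock and the overflow attribute are choices made for this illustration, and in a real service the overflow count would be exported as a counter metric.

```python
import threading


class BudgetedLabel:
    """Pass through the first `max_values` distinct values, collapse the rest."""

    OTHER = "__other__"

    def __init__(self, max_values: int) -> None:
        self.max_values = max_values
        self.overflow = 0  # how many observations were collapsed into OTHER
        self._seen: set[str] = set()
        self._lock = threading.Lock()  # normalize() may run on several threads

    def normalize(self, value: str) -> str:
        with self._lock:
            if value in self._seen:
                return value
            if len(self._seen) < self.max_values:
                self._seen.add(value)
                return value
            self.overflow += 1
            return self.OTHER


if __name__ == "__main__":
    route = BudgetedLabel(max_values=2)
    for raw in ["/users/{id}", "/orders/{id}", "/users/123", "/users/{id}"]:
        print(raw, "->", route.normalize(raw))
    print("overflow:", route.overflow)
```

Note the trade-off: whichever values arrive first claim the budget, so a different set of values may be kept after a restart.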

Breaking vs. non-breaking rules

Breaking:

  • the number of series exceeds the budget
  • the number of unique label values exceeds the budget

Non-breaking:

  • cardinality drops
  • a label is removed (as long as no dashboard depends on it)
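
These rules can be expressed as a small check. This is a sketch under assumptions: the function name classify_change and the three result strings are made up for illustration, and the inputs follow the per-metric shape of the guard's report ({"series": n, "labels": {label: count}}).

```python
def classify_change(budget: dict, baseline: dict, current: dict) -> str:
    """Classify a metric's change as 'breaking', 'review' or 'non-breaking'."""
    max_series = budget.get("max_series")
    if max_series is not None and current["series"] > int(max_series):
        return "breaking"
    for label, max_values in budget.get("labels", {}).items():
        if current["labels"].get(label, 0) > int(max_values):
            return "breaking"
    # A removed label is only safe if no dashboard or alert depends on it,
    # which this script cannot know, so flag it for a human.
    removed = set(baseline["labels"]) - set(current["labels"])
    if removed:
        return "review"
    return "non-breaking"


if __name__ == "__main__":
    # Hypothetical budget and report entries for one metric.
    budget = {"max_series": 5000, "labels": {"route": 250}}
    baseline = {"series": 100, "labels": {"route": 20, "status": 5}}
    print(classify_change(budget, baseline, {"series": 90, "labels": {"route": 20, "status": 5}}))
    print(classify_change(budget, baseline, {"series": 90, "labels": {"route": 20}}))
    print(classify_change(budget, baseline, {"series": 6000, "labels": {"route": 300, "status": 5}}))
```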

Anti-patterns that people still ship

  • Raw path as a label (/users/123)
  • PII and per-request identifiers in labels (email, user_id, request_id)
  • A “temporary” debug label (it lives forever)

Use a route template or handler name instead of the raw path.
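
If the framework does not expose the matched route, a last-resort fallback is to normalize the raw path before using it as a label. The sketch below is an illustration only: the function name route_label and the two patterns (numeric IDs and UUIDs) are hypothetical and will not cover every URL scheme. The router's own template is preferable whenever it is available.

```python
import re

_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def route_label(path: str) -> str:
    """Replace ID-like path segments with placeholders."""
    path = path.split("?", 1)[0]  # drop the query string
    segments = []
    for seg in path.split("/"):
        if seg.isdigit():
            segments.append("{id}")
        elif _UUID.match(seg):
            segments.append("{uuid}")
        else:
            segments.append(seg)
    return "/".join(segments)


if __name__ == "__main__":
    print(route_label("/users/123"))
    print(route_label("/users/123/orders/9f1c2d3e-0000-4000-8000-123456789abc?page=2"))
    print(route_label("/healthz"))
```

Pairing this with a label budget still makes sense, because any segment the patterns miss (slugs, tokens) will leak through as a new value.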

FAQ

“Budgets will annoy us.” If they are set sensibly, you will only deal with them when you would otherwise be handling an incident a few days later.

“We are multi-tenant and want a tenant label.” Then either give it a sensible budget or move it to logs/traces. If it does not fit into the budget, it does not belong in Prometheus.

“Histograms are big anyway.” Yes. That is why separate budgets for *_bucket make sense, along with a focus on the main multipliers: route, status, le.

Production checklist

  • Define budgets for the top 5 metrics
  • In CI, fail only when a budget is exceeded
  • Add a runtime firewall for risky labels
  • Watch the overflow counter and alert on spikes

Conclusion

Cardinality Contracts are simple:

  • define budgets
  • verify them in CI
  • add a runtime safety net

But the impact is large: fewer incidents, lower telemetry costs, and monitoring that stays healthy even after a bad label decision.

Cite this article

If you reference this article, include the original URL and credit the author.

Michal Drozd. "Cardinality Contracts: sprav z Prometheus labelov API s budgetom". https://www.michal-drozd.com/sk/blog/cardinality-contracts-prometheus-label-budgety/ (Published December 21, 2025).