Cardinality Contracts: Prometheus Labels as an API with Budgets

I have broken Prometheus with a ‘harmless’ label more than once. If you run Prometheus (or OTel metrics feeding Prometheus/remote_write), you know this incident pattern:

  • “nothing changed”
  • “we only added one label”
  • and then:
    • Prometheus RAM climbs
    • remote_write costs spike
    • queries slow down
    • dashboards and alerts fall apart

The root cause is almost always the same: cardinality explosion. A single label creates too many unique time series, and your monitoring system becomes the incident.

This post introduces a practical concept that turns hand-wavy advice into a guardrail:

Cardinality Contracts = explicit budgets for metric cardinality + automatic verification in CI + a runtime firewall when CI is not enough.

This is not a theoretical best practice. It is an operational contract that makes labels behave like an API.

Tested on: Prometheus 2.47, OTel metrics via remote_write, Kubernetes workloads with 500k+ active series.

Why “just don’t add user_id” is not enough

Everyone knows user_id should not be a label. It still happens because:

  • a router stops setting route and you export raw path
  • a debug label slips in and never leaves
  • a feature flag introduces a new dimension
  • a middleware change starts emitting a new value set

The problem is that labels are an API. If you treat them like an API, you need:

  • explicit spec
  • compatibility rules
  • breaking-change detection

That is exactly what Cardinality Contracts provide.

What the contract covers

Two things, both measurable:

  1. Series count per metric (e.g., http_server_requests_total must stay under 5k series)
  2. Unique values per label (e.g., route max 250, status max 25)

If someone accidentally exposes raw paths like /users/123, the route label gains a new value for every distinct ID, and the contract fails as soon as the count passes the budget.
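
The two limits interact: the per-label budgets multiply into a worst-case series count, and that product can sit well above the series budget. A minimal sketch of the arithmetic, using the example numbers above (the helper name worst_case_series is illustrative, and the bound only holds if every label on the metric has a budget):

```python
from math import prod

def worst_case_series(label_budgets: dict) -> int:
    """Upper bound on series if every combination of label values occurs."""
    return prod(label_budgets.values())

# Example numbers from the text: route max 250, status max 25, 5k series.
label_budgets = {"route": 250, "status": 25}
max_series = 5000

bound = worst_case_series(label_budgets)
print(bound)               # 6250
print(bound > max_series)  # True: the series budget is the tighter limit
```

When the product is larger than max_series, the series budget is what catches a combination blow-up, while the label budgets tell you which dimension caused it.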

Artifacts you keep in Git

  1. cardinality.budgets.yml - readable budgets
  2. tools/cardinality_guard.py - enforcement script
  3. CI job that scrapes /metrics and fails on budget violations
  4. (optional) cardinality.baseline.json + diff output in PRs

1) Define budgets in YAML

Start with the top 5 metrics that drive your bill or memory usage. Budgets are guardrails, not exact forecasts. Start high and tighten later.

# cardinality.budgets.yml
budgets:
  http_server_requests_total:
    max_series: 5000
    labels:
      method: 10
      status: 25
      route: 250
  http_server_request_duration_seconds_bucket:
    max_series: 20000
    labels:
      le: 50
      route: 250
      status: 25
  db_query_duration_seconds:
    max_series: 2000
    labels:
      operation: 20
      table: 200

2) CI smoke: scrape /metrics and count cardinality

Workflow:

  • start app + dependencies (docker compose / kind / testcontainers)
  • run a short smoke test
  • scrape /metrics
  • run the guard script
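
A health check alone barely moves the metrics, so the smoke test has to issue enough distinct requests for a leaked raw path to show up as many label values. A rough sketch of such a traffic generator, assuming route templates with an {id} placeholder; the function names and the routes in the usage note are hypothetical, not part of any real service:

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List

def smoke_urls(base_url: str, templates: Iterable[str], ids_per_template: int) -> List[str]:
    """Expand each route template into concrete URLs with distinct IDs."""
    urls = []
    for template in templates:
        if "{id}" in template:
            for i in range(1, ids_per_template + 1):
                urls.append(base_url + template.replace("{id}", str(i)))
        else:
            urls.append(base_url + template)
    return urls

def http_get(url: str) -> int:
    """Default fetcher: plain GET that returns the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still count as traffic for the metrics.
        return err.code

def run_smoke(urls: Iterable[str], fetch: Callable[[str], int] = http_get) -> Dict[int, int]:
    """Request every URL once and return a status -> count tally."""
    tally: Dict[int, int] = {}
    for url in urls:
        status = fetch(url)
        tally[status] = tally.get(status, 0) + 1
    return tally
```

For example, run_smoke(smoke_urls("http://localhost:8080", ["/healthz", "/users/{id}"], 300)) sends 300 distinct user paths. That is more than a route budget of 250, so a raw-path leak would trip the contract instead of passing unnoticed.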

Minimal Python guard (low dependency)

# tools/cardinality_guard.py
import re
import sys
import json
from collections import defaultdict

try:
    import yaml  # pip install pyyaml
except ImportError:
    print("Missing dependency: pyyaml (pip install pyyaml)", file=sys.stderr)
    sys.exit(2)

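# Sample line: metric name, optional {labels}, then the value.
# Values may be NaN, +Inf or -Inf, so accept any non-space token
# instead of requiring a leading digit.
METRIC_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+\S')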

def parse_labels(label_blob: str):
    if not label_blob:
        return {}
    s = label_blob.strip()[1:-1].strip()
    if not s:
        return {}
    labels = {}
    parts = []
    cur = []
    in_q = False
    esc = False
    for ch in s:
        if esc:
            cur.append(ch)
            esc = False
        elif ch == '\\':
            cur.append(ch)
            esc = True
        elif ch == '"':
            cur.append(ch)
            in_q = not in_q
        elif ch == ',' and not in_q:
            parts.append(''.join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if cur:
        parts.append(''.join(cur).strip())

    for p in parts:
        if not p:
            continue
        k, v = p.split("=", 1)
        k = k.strip()
        v = v.strip()
        if v.startswith('"') and v.endswith('"'):
            v = v[1:-1]
        labels[k] = v
    return labels


def load_budgets(path: str):
    with open(path, "r", encoding="utf-8") as f:
        doc = yaml.safe_load(f) or {}  # an empty file parses to None
    return doc.get("budgets", {})


def main():
    if len(sys.argv) != 4:
        print("Usage: python tools/cardinality_guard.py <budgets.yml> <metrics.txt> <report.json>", file=sys.stderr)
        sys.exit(2)

    budgets_path, metrics_path, report_path = sys.argv[1], sys.argv[2], sys.argv[3]
    budgets = load_budgets(budgets_path)

    series = defaultdict(set)
    label_values = defaultdict(lambda: defaultdict(set))

    with open(metrics_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            m = METRIC_LINE.match(line)
            if not m:
                continue
            metric = m.group(1)
            label_blob = m.group(2)
            labels = parse_labels(label_blob) if label_blob else {}

            # A tuple of sorted pairs is hashable and cannot collide the way
            # a joined string can when values contain "|" or "=".
            fp = tuple(sorted(labels.items()))
            series[metric].add(fp)

            for k, v in labels.items():
                label_values[metric][k].add(v)

    report = {"metrics": {}, "violations": []}

    for metric, labelsets in series.items():
        metric_series = len(labelsets)
        rep = {
            "series": metric_series,
            "labels": {k: len(vs) for k, vs in label_values[metric].items()}
        }
        report["metrics"][metric] = rep

        if metric in budgets:
            b = budgets[metric]
            max_series = b.get("max_series")
            if max_series is not None and metric_series > int(max_series):
                report["violations"].append({
                    "metric": metric,
                    "type": "max_series",
                    "observed": metric_series,
                    "budget": int(max_series)
                })

            label_budgets = b.get("labels", {})
            for label, max_vals in label_budgets.items():
                observed = rep["labels"].get(label, 0)
                if observed > int(max_vals):
                    report["violations"].append({
                        "metric": metric,
                        "type": "label_values",
                        "label": label,
                        "observed": observed,
                        "budget": int(max_vals)
                    })

    with open(report_path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, sort_keys=True)

    if report["violations"]:
        print("CARDINALITY CONTRACT FAILED:")
        for v in report["violations"]:
            if v["type"] == "max_series":
                print(f"- {v['metric']}: series {v['observed']} > budget {v['budget']}")
            else:
                print(f"- {v['metric']}[{v['label']}]: values {v['observed']} > budget {v['budget']}")
        sys.exit(1)

    print("Cardinality contract OK.")
    sys.exit(0)


if __name__ == "__main__":
    main()

3) Minimal GitHub Actions job

# .github/workflows/cardinality.yml
name: Cardinality contracts

on:
  pull_request:

jobs:
  cardinality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start stack
        run: docker compose up -d --build

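      - name: Run smoke tests
        run: |
          # Retry until the app accepts connections; add real traffic after this check.
          curl -fsS --retry 10 --retry-delay 2 --retry-connrefused http://localhost:8080/healthz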

      - name: Scrape metrics
        run: |
          curl -fsS http://localhost:8080/metrics > metrics.txt

      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install pyyaml

      - name: Enforce contracts
        run: |
          python tools/cardinality_guard.py cardinality.budgets.yml metrics.txt cardinality.report.json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: cardinality-report
          path: cardinality.report.json
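
The optional cardinality.baseline.json from the artifact list can be compared against the fresh report to show reviewers what a PR changes. A minimal sketch, assuming the baseline is simply an earlier cardinality.report.json committed to Git (the post does not define the baseline format, and the function names are illustrative):

```python
import json

def load_series_counts(path):
    """Read a guard report and return {metric: series count}."""
    with open(path, "r", encoding="utf-8") as f:
        doc = json.load(f)
    return {name: info["series"] for name, info in doc.get("metrics", {}).items()}

def diff_series(baseline, current):
    """Return (metric, before, after) for every metric whose series count changed."""
    rows = []
    for metric in sorted(set(baseline) | set(current)):
        before = baseline.get(metric, 0)
        after = current.get(metric, 0)
        if before != after:
            rows.append((metric, before, after))
    return rows

def format_diff(rows):
    """Render the diff as plain text suitable for a PR comment or job log."""
    if not rows:
        return "No cardinality changes."
    return "\n".join(
        f"{metric}: {before} -> {after} ({after - before:+d})"
        for metric, before, after in rows
    )
```

Calling format_diff(diff_series(load_series_counts("cardinality.baseline.json"), load_series_counts("cardinality.report.json"))) yields one line per changed metric. The diff is informational only; the budgets remain the pass/fail signal.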

4) Runtime Cardinality Firewall (when CI is not enough)

CI catches regressions, but some labels only appear in prod (tenants, feature flags, edge paths). A runtime guardrail prevents a monitoring outage when CI misses a path.

Pattern: if a label exceeds budget, bucket the value into __other__ and increment an overflow counter.

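class BudgetedLabel {
  private readonly max: number;
  private readonly seen = new Set<string>();
  // Observations rewritten to "__other__"; export this as the overflow counter.
  overflowCount = 0;

  constructor(max: number) {
    this.max = max;
  }

  normalize(v: string): string {
    if (this.seen.has(v)) return v;
    if (this.seen.size < this.max) {
      this.seen.add(v);
      return v;
    }
    // Budget exhausted: collapse every new value into a single bucket.
    this.overflowCount += 1;
    return "__other__";
  }
}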

This is a circuit breaker for telemetry. It keeps Prometheus alive, and the overflow counter tells you which label tried to break the contract.

Breaking vs non-breaking rules

Breaking:

  • series count exceeds budget
  • unique values for a labeled dimension exceed budget

Non-breaking:

  • cardinality goes down
  • a label is removed entirely (if no dashboard, alert, or recording rule depends on it)

Anti-patterns people still ship

  • Raw path as label (/users/123)
  • PII or unbounded IDs in labels (email, user_id, request_id)
  • “Temporary” debug label (it will live forever)

Use route templates or handler names instead.
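
When the router cannot supply the matched template, a fallback normalizer limits the damage. A rough sketch, assuming that only numeric and UUID path segments carry IDs; the :id and :uuid placeholders are arbitrary choices, and free-form segments such as slugs would still leak through:

```python
import re

_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

def route_template(path: str) -> str:
    """Replace ID-like path segments with placeholders to bound the label."""
    path = path.split("?", 1)[0]  # drop the query string
    segments = []
    for seg in path.split("/"):
        if _NUMERIC.match(seg):
            segments.append(":id")
        elif _UUID.match(seg):
            segments.append(":uuid")
        else:
            segments.append(seg)
    return "/".join(segments) or "/"

print(route_template("/users/123"))               # /users/:id
print(route_template("/users/123/orders/9?x=1"))  # /users/:id/orders/:id
```

Prefer the framework's own route name where it exists. This kind of rewrite is a safety net for requests that never matched a route, such as 404s.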

FAQ

“Budgets will annoy us.” If set reasonably, budgets only trigger when you are about to ship a monitoring incident.

“We are multi-tenant and want a tenant label.” Then treat it like a budgeted API. If it does not fit the budget, put the tenant in logs or traces instead of metrics.

“Histograms are huge anyway.” Yes. That is why budgets for *_bucket are separate and focus on the big multipliers: route, status, le.

Production checklist

  • Define budgets for top 5 metrics
  • Enforce in CI (fail only on over-budget)
  • Add runtime firewall for high-risk labels
  • Track overflow counter and page on spikes

Conclusion

Cardinality Contracts are simple:

  • define budgets
  • enforce in CI
  • add a runtime safety net

But the impact is large: fewer incidents, lower telemetry costs, and a monitoring stack that stays healthy even when someone makes a bad label choice.

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Cardinality Contracts: Prometheus Labels as an API with Budgets". https://www.michal-drozd.com/en/blog/cardinality-contracts-prometheus-label-budgets/ (Published December 21, 2025).