Prometheus Cardinality Explosion: Detection, Prevention, and Recovery

Cardinality explosions don’t happen in tests; they happen on your bill. Friday afternoon, quick deploy, and Grafana looked quiet. Not in a good way. Prometheus memory crossed 60GB and still OOMed. We’d already doubled the box once. The TSDB status endpoint told the story in seconds: someone had added user_id to http_requests_total.

On paper it sounded harmless: “I just want latency per user.” In Prometheus, every unique label combination is a new time series. Ten million users meant ten million series. One innocent-looking change turned monitoring into the incident.

What makes cardinality explosions particularly dangerous is how quickly they compound. If you have 5 HTTP methods, 50 status codes, and 100 endpoints, that’s 25,000 series—manageable. Add 10 million users as a label dimension, and you get 250 billion potential series. Even if only a fraction materialize, you’re still looking at millions of series, each consuming memory and disk.

The tragedy is that high-cardinality labels are useless in Prometheus anyway. You can’t meaningfully visualize 10 million user-specific time series. What you actually want—debugging a specific user’s requests—is better served by logs or traces. Prometheus is for aggregate metrics with bounded cardinality. Using it for high-cardinality data doesn’t just break Prometheus; it also doesn’t solve the problem you’re trying to solve.

Tested on: Prometheus 2.47, 50-node Kubernetes cluster, 2M active time series

Understanding Cardinality

What Creates Time Series

Worst-case metric cardinality = product of the number of distinct values of each label

Example:
  http_requests_total{
    method="GET",       # 5 values (GET, POST, PUT, DELETE, PATCH)
    status="200",       # 50 values (200, 201, 400, 401, 404, 500...)
    endpoint="/api/v1"  # 100 values (endpoints)
  }

Cardinality: 5 × 50 × 100 = 25,000 time series

Add user_id label with 1M users:
Cardinality: 5 × 50 × 100 × 1,000,000 = 25,000,000,000 time series
                                        └─ Prometheus dies
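
The multiplication above can be sketched as a small Go helper. This is only an illustration of the arithmetic; the function name and the map of label counts are assumptions made for the example, and the result is an upper bound, since not every combination will necessarily appear.

```go
package main

import "fmt"

// worstCaseSeries returns the upper bound on series for one metric:
// the product of the distinct-value counts of its labels.
func worstCaseSeries(labelValueCounts map[string]uint64) uint64 {
	total := uint64(1)
	for _, n := range labelValueCounts {
		total *= n
	}
	return total
}

func main() {
	counts := map[string]uint64{"method": 5, "status": 50, "endpoint": 100}
	fmt.Println("bounded labels:", worstCaseSeries(counts))

	// Hypothetical: the same metric after adding a label with 1M values.
	counts["user_id"] = 1_000_000
	fmt.Println("with user_id:  ", worstCaseSeries(counts))
}
```

Running the same calculation on a proposed label set before it ships is a cheap way to see whether a new dimension is affordable.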

Memory Impact

Prometheus memory usage:

Per active time series:
  - ~3KB RAM for recent samples (last 2 hours)
  - ~1.5KB for TSDB head chunks

Real-world example:
  Before: 500,000 time series × 3KB = 1.5GB
  After adding user_id: 50,000,000 × 3KB = 150GB

That's a single bad label causing a 100x memory increase.
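
As a back-of-envelope sketch, the estimate above is just series count times a per-series cost. The 3,000-byte constant below is the rough ~3KB figure from this section, not a measured value, and the function name is hypothetical.

```go
package main

import "fmt"

// bytesPerSeries is the rough ~3KB-per-series figure used in this article.
// Treat it as an assumption; real usage depends on labels and churn.
const bytesPerSeries = 3_000

// estimateHeadBytes gives a rough RAM estimate for a number of active series.
func estimateHeadBytes(activeSeries uint64) uint64 {
	return activeSeries * bytesPerSeries
}

func main() {
	for _, n := range []uint64{500_000, 50_000_000} {
		gb := float64(estimateHeadBytes(n)) / 1e9
		fmt.Printf("%d series is roughly %.1f GB\n", n, gb)
	}
}
```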

Detection

TSDB Status Endpoint

# Check current cardinality (the statistics live under the "data" key)
curl -s localhost:9090/api/v1/status/tsdb | jq .data

# Output (abridged and annotated):
{
  "seriesCountByMetricName": [
    {"name": "http_requests_total", "value": 25000000},  # RED FLAG
    {"name": "process_cpu_seconds_total", "value": 500},
    ...
  ],
  "labelValueCountByLabelName": [
    {"name": "user_id", "value": 10000000},  # RED FLAG
    {"name": "instance", "value": 50},
    {"name": "method", "value": 5},
    ...
  ],
  "seriesCountByLabelValuePair": [
    {"name": "job=api-server", "value": 25000000},
    ...
  ]
}

PromQL Queries

# Total active time series
prometheus_tsdb_head_series

# Time series created per second (spike detection)
rate(prometheus_tsdb_head_series_created_total[5m])

# Memory used by TSDB head
prometheus_tsdb_head_chunks_storage_size_bytes

# Cardinality by metric name
topk(10, count by (__name__) ({__name__=~".+"}))

# Number of distinct values for a label (its cardinality)
count(count by (user_id) ({user_id=~".+"}))

Proactive Monitoring

# prometheus-alerts.yaml
groups:
- name: cardinality
  rules:
  - alert: HighCardinalityMetric
    expr: |
      topk(1, count by (__name__) ({__name__=~".+"})) > 100000
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Metric {{ $labels.__name__ }} has >100k series"

  - alert: TimeSeriesExplosion
    expr: |
      rate(prometheus_tsdb_head_series_created_total[5m]) > 1000
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Creating {{ $value }}/sec new time series"

  - alert: HighTotalSeriesCount
    expr: |
      prometheus_tsdb_head_series > 1000000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Total time series exceeds 1M"

Prevention

Relabel Config to Drop High-Cardinality Labels

# prometheus.yml
scrape_configs:
  - job_name: 'api-servers'
    static_configs:
      - targets: ['api:8080']
    metric_relabel_configs:
      # Drop metrics with user_id label entirely
      - source_labels: [user_id]
        regex: .+
        action: drop

      # Or drop just the label, keep the metric.
      # Only safe if the remaining labels still identify each series uniquely.
      - regex: user_id
        action: labeldrop

      # Drop metrics matching pattern
      - source_labels: [__name__]
        regex: "expensive_metric_.*"
        action: drop

      # Hash high-cardinality labels into a fixed number of buckets
      - source_labels: [request_id]
        target_label: request_id_bucket
        modulus: 256  # hashmod yields a value from 0 to 255
        action: hashmod
      - regex: request_id
        action: labeldrop

Recording Rules for Aggregation

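# Recording rules precompute aggregates at rule evaluation time.
# They do not stop the raw series from being ingested, so pair them
# with metric_relabel_configs when the source metric is the problem.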

groups:
- name: aggregations
  rules:
  # Aggregate per-user metrics to per-endpoint
  - record: http_requests:by_endpoint:rate5m
    expr: |
      sum by (endpoint, method, status) (
        rate(http_requests_total[5m])
      )

  # Keep only top N label values
  - record: http_requests:top_endpoints:rate5m
    expr: |
      topk(100,
        sum by (endpoint) (rate(http_requests_total[5m]))
      )

Application-Level Prevention

// Bad: High-cardinality label
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
    },
    []string{"method", "status", "endpoint", "user_id"},  // BAD!
)

// Good: Remove unbounded labels
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
    },
    []string{"method", "status", "endpoint"},  // Bounded cardinality
)

// For latency, use a histogram with the same bounded labels.
// Per-user detail belongs in logs or traces, not in labels.
var requestDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Buckets: prometheus.DefBuckets,
    },
    []string{"method", "endpoint"},  // No user_id!
)
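
One way to enforce this before code review even starts is a small guard that rejects known-unbounded label names. The sketch below uses only the standard library; the denylist and the function name are illustrative assumptions, and a real project might run such a check in a unit test or a wrapper around metric registration.

```go
package main

import (
	"fmt"
	"strings"
)

// unboundedLabels is an illustrative denylist; extend it for your own domain.
var unboundedLabels = map[string]bool{
	"user_id": true, "customer_id": true, "request_id": true,
	"session_id": true, "trace_id": true, "email": true,
}

// checkLabelNames returns an error naming every denied label in names.
func checkLabelNames(metric string, names []string) error {
	var bad []string
	for _, n := range names {
		if unboundedLabels[strings.ToLower(n)] {
			bad = append(bad, n)
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("metric %s uses unbounded labels: %s", metric, strings.Join(bad, ", "))
	}
	return nil
}

func main() {
	fmt.Println(checkLabelNames("http_requests_total", []string{"method", "status", "endpoint", "user_id"}))
	fmt.Println(checkLabelNames("http_requests_total", []string{"method", "status", "endpoint"}))
}
```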

Label Value Bounding

import (
    "regexp"
    "strings"
)

// Compiled once at package init; compiling on every request is wasteful.
var endpointPatterns = []struct {
    regex       *regexp.Regexp
    replacement string
}{
    {regexp.MustCompile(`/users/[^/]+`), "/users/:id"},
    {regexp.MustCompile(`/orders/[^/]+`), "/orders/:id"},
    {regexp.MustCompile(`/\d+`), "/:id"},
}

// normalizeEndpoint bounds endpoint cardinality by collapsing IDs:
//   /users/12345    → /users/:id
//   /orders/abc-def → /orders/:id
func normalizeEndpoint(path string) string {
    result := path
    for _, p := range endpointPatterns {
        result = p.regex.ReplaceAllString(result, p.replacement)
    }

    // Catch-all: very deep paths are treated as unknown patterns
    if strings.Count(result, "/") > 5 {
        return "/other"
    }

    return result
}
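
Pattern-based normalization only covers the routes you anticipated. A complementary sketch is a hard cap on distinct values: once the budget is used up, every new value collapses to "other". The type and method names below are hypothetical, and the limit of 2 in main is only there to make the behaviour visible.

```go
package main

import (
	"fmt"
	"sync"
)

// labelLimiter admits at most limit distinct values; later ones become "other".
type labelLimiter struct {
	mu    sync.Mutex
	limit int
	seen  map[string]struct{}
}

func newLabelLimiter(limit int) *labelLimiter {
	return &labelLimiter{limit: limit, seen: make(map[string]struct{})}
}

// Bound returns v if it is already known or there is still room for it,
// and "other" once the budget is exhausted.
func (l *labelLimiter) Bound(v string) string {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, ok := l.seen[v]; ok {
		return v
	}
	if len(l.seen) >= l.limit {
		return "other"
	}
	l.seen[v] = struct{}{}
	return v
}

func main() {
	lim := newLabelLimiter(2)
	for _, p := range []string{"/users/:id", "/orders/:id", "/wp-admin.php", "/users/:id"} {
		fmt.Println(lim.Bound(p))
	}
}
```

The trade-off is that whichever values arrive first win the budget, so this works best as a safety net behind normalization rather than as a replacement for it.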

Recovery

Emergency Procedures

# 1. Identify the culprit
curl -s localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName[:10]'

# 2. Add drop rule immediately
# Edit prometheus.yml, add to metric_relabel_configs:
# - source_labels: [__name__]
#   regex: "bad_metric_name"
#   action: drop

# 3. Reload Prometheus config (no restart needed;
#    requires the --web.enable-lifecycle flag)
curl -X POST localhost:9090/-/reload

# 4. Wait for the head to shed the dropped series
# Prometheus has no API call that forces head compaction. Series that
# stop receiving samples leave memory at the next head truncation,
# roughly every 2 hours with default settings.

# 5. If still OOMing, delete the bad metric series
# WARNING: This is destructive! Requires the --web.enable-admin-api flag
curl -X POST -g 'localhost:9090/api/v1/admin/tsdb/delete_series?match[]=bad_metric_name'

# 6. Clean tombstones to reclaim disk space
curl -X POST localhost:9090/api/v1/admin/tsdb/clean_tombstones

Preventing Future Incidents

# prometheus.yml
global:
  scrape_interval: 15s

  # Limit samples per scrape
  sample_limit: 50000  # Per target

  # Limit labels per sample
  label_limit: 30
  label_name_length_limit: 200
  label_value_length_limit: 2000

scrape_configs:
  - job_name: 'api'
    sample_limit: 10000  # Override per job
    metric_relabel_configs:
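      # Drop all metrics carrying any suspicious label.
      # source_labels are joined with ";", so a metric with none of them
      # yields ";;;". Match only when some character is not a separator;
      # a plain .+ would match ";;;" and drop every metric.
      - source_labels: [user_id, customer_id, request_id, session_id]
        regex: '.*[^;].*'
        action: drop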
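
Multi-label rules are easy to get wrong because Prometheus joins the source label values with ";" and matches the regex against the whole joined string. The sketch below mimics that matching step so a pattern can be checked before it goes into a config. It is a simplified model written for this example, not Prometheus's own relabeling code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabelMatches mimics how a relabel rule decides whether it applies:
// source label values are joined with ";" and the regex must match the
// whole joined string. Missing labels contribute an empty value.
func relabelMatches(pattern string, labels map[string]string, sourceLabels []string) bool {
	values := make([]string, len(sourceLabels))
	for i, name := range sourceLabels {
		values[i] = labels[name]
	}
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(strings.Join(values, ";"))
}

func main() {
	src := []string{"user_id", "customer_id", "request_id", "session_id"}
	clean := map[string]string{"method": "GET"}
	dirty := map[string]string{"method": "GET", "request_id": "abc"}

	// For the clean series the joined value is ";;;", which .+ still matches.
	fmt.Println(".+ on clean series:       ", relabelMatches(`.+`, clean, src))
	fmt.Println(".*[^;].* on clean series: ", relabelMatches(`.*[^;].*`, clean, src))
	fmt.Println(".*[^;].* on dirty series: ", relabelMatches(`.*[^;].*`, dirty, src))
}
```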

Monitoring Dashboard

Grafana Panels

# Panel 1: Total Time Series
prometheus_tsdb_head_series

# Panel 2: Time Series Growth Rate
rate(prometheus_tsdb_head_series_created_total[5m])

# Panel 3: TSDB Head Chunk Size in GiB (a proxy for memory usage)
prometheus_tsdb_head_chunks_storage_size_bytes / 1024 / 1024 / 1024

# Panel 4: Top 10 Metrics by Cardinality
topk(10, count by (__name__) ({__name__=~".+"}))

# Panel 5: Churn Rate (series created - deleted)
rate(prometheus_tsdb_head_series_created_total[5m])
- rate(prometheus_tsdb_head_series_removed_total[5m])

# Panel 6: Scrape Duration (can indicate cardinality issues)
max by (job) (scrape_duration_seconds)

Cardinality Budget

# Set cardinality budgets per team/service
# Implement via recording rules + alerts

groups:
- name: cardinality_budgets
  rules:
  # Track cardinality per job
  - record: job:prometheus_series:count
    expr: count by (job) ({__name__=~".+"})

  # Alert when job exceeds budget
  - alert: CardinalityBudgetExceeded
    expr: |
      job:prometheus_series:count{job="api-server"} > 50000
    labels:
      severity: warning
    annotations:
      summary: "Job {{ $labels.job }} exceeds 50k series budget"

Best Practices

Label Guidelines

## Safe Labels (bounded cardinality)
✅ method: GET, POST, PUT, DELETE, PATCH (5 values)
✅ status_code: 200, 201, 400, 401, 403, 404, 500, 502, 503 (~20 values)
✅ service_name: bounded by number of services (~100)
✅ environment: dev, staging, prod (3 values)
✅ region: us-east-1, us-west-2, eu-west-1 (~10 values)

## Dangerous Labels (unbounded cardinality)
❌ user_id: millions of users
❌ request_id: infinite
❌ email: millions
❌ ip_address: potentially millions
❌ trace_id: infinite
❌ timestamp: infinite
❌ url_path (raw): unbounded (needs normalization)

## Rule of Thumb
Label cardinality should be < 1000 values
Total cardinality per metric should be < 10,000 series

Architecture for High-Cardinality Data

Need per-user metrics? Don't use Prometheus labels.

Alternative approaches:

1. Logs + Log aggregation
   User activity → Structured logs → Loki/Elasticsearch
   Query: sum by (endpoint) (rate({job="api"} |= "user_id=123" [5m]))

2. Event streaming
   User events → Kafka → ClickHouse/TimescaleDB
   Query: SELECT count(*) FROM events WHERE user_id = 123

3. Exemplars (Prometheus 2.26+)
   Attach trace_id to histogram buckets
   Low cardinality metrics + high cardinality exemplars

4. Remote write to specialized TSDB
   High-cardinality → VictoriaMetrics / M3DB / Thanos
   Better cardinality handling

Checklist

## Prometheus Cardinality Management

### Detection
- [ ] Monitor prometheus_tsdb_head_series
- [ ] Alert on series creation rate > 1000/sec
- [ ] Check /api/v1/status/tsdb regularly
- [ ] Dashboard showing top metrics by cardinality

### Prevention
- [ ] Relabel configs to drop dangerous labels
- [ ] sample_limit per scrape target
- [ ] Application-level label bounding
- [ ] Code review for new metrics

### Recovery Plan
- [ ] Document emergency drop procedures
- [ ] Know how to delete_series
- [ ] Test config reload process
- [ ] Runbook for cardinality incidents

### Best Practices
- [ ] Label cardinality < 1000 values
- [ ] No unbounded labels (user_id, request_id)
- [ ] Use logs for high-cardinality data
- [ ] Recording rules for aggregation

Conclusion

Cardinality explosion is the number one way to kill Prometheus. Unlike CPU or memory pressure that builds gradually, cardinality explosion can take you from healthy to OOMing within hours of deploying a single bad metric. The failure mode is also catastrophic: when Prometheus OOMs, you lose not just the bad metric but all your monitoring.

The root cause is almost always a misunderstanding of what Prometheus is for. Prometheus tracks aggregate metrics with bounded cardinality—things like “how many requests per endpoint” or “what’s the 99th percentile latency by service.” It’s not designed for per-user, per-request, or per-session data. Those use cases belong in logs (for debugging individual events) or traces (for request flows).

Prevention is far easier than recovery. Add relabel configs to drop dangerous labels before they’re ingested. Set sample_limit per scrape target to cap damage from any single target. Review new metrics in code review with cardinality in mind. Monitor prometheus_tsdb_head_series and alert when it grows unexpectedly.

The key insight is that cardinality is multiplicative. Each label dimension multiplies with every other. A metric with labels that each have 10 values creates 10^n series where n is the number of labels. Five labels with 10 values each = 100,000 series. Add one label with 1 million values, and you have 100 billion potential series.

Key principles:

- One bad label can create millions of series, because label cardinality multiplies across all dimensions
- Monitor prometheus_tsdb_head_series constantly; it’s your early warning system
- Use metric_relabel_configs to drop dangerous labels before ingestion, not after
- Bound all label values at the application level: normalize URLs, hash IDs, limit cardinality
- Use logs for high-cardinality data; Prometheus is for aggregates, not individual events

Check your TSDB status now. The explosion might already be happening, and every hour makes recovery harder.

