Dash Contracts in Go: CI Compiler for Grafana Dashboards and Prometheus Alerts

I got tired of "No data" panels appearing after merges that looked clean in CI. Dashboards and alerts are clients of your metrics API. When you rename a metric, a label, or a label value, you can break:

  • Grafana panels (quietly showing “No data”)
  • alert rules (no longer firing)
  • recording rules (whose output series change meaning without notice)

This post turns that implicit dependency into a contract and validates it in CI, without running Prometheus.

Dash Contracts = the set of PromQL selectors used by Grafana dashboards and Prometheus rules, verified against scraped /metrics.

The core check is simple: for every selector, does at least one time series satisfy all label matchers? If the answer is no, the dashboard or alert is already broken or will break after merge.
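
To make the check concrete, here is a minimal stdlib-only sketch. The matcher type is a simplified stand-in for Prometheus's labels.Matcher (an assumption for illustration), and the sample series and label names are made up. A label that is absent from a series is treated as the empty string.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcher is a simplified stand-in for a PromQL label matcher.
// op is one of "=", "!=", "=~", "!~".
type matcher struct {
	name, op, value string
}

// matches reports whether a single label value satisfies the matcher.
// Regex matchers are fully anchored, as in PromQL.
func (m matcher) matches(v string) bool {
	switch m.op {
	case "=":
		return v == m.value
	case "!=":
		return v != m.value
	case "=~", "!~":
		re, err := regexp.Compile("^(?:" + m.value + ")$")
		if err != nil {
			return false // invalid pattern: treat as not matching
		}
		return re.MatchString(v) == (m.op == "=~")
	}
	return false
}

// satisfiable reports whether at least one series satisfies every matcher.
// A label that is absent from a series is read as the empty string.
func satisfiable(ms []matcher, series []map[string]string) bool {
	for _, s := range series {
		ok := true
		for _, m := range ms {
			if !m.matches(s[m.name]) {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical series exposed after a rename from status to status_code.
	series := []map[string]string{
		{"__name__": "http_requests_total", "route": "/api", "status_code": "500"},
	}
	old := []matcher{{"__name__", "=", "http_requests_total"}, {"status", "=~", "5.."}}
	renamed := []matcher{{"__name__", "=", "http_requests_total"}, {"status_code", "=~", "5.."}}
	fmt.Println(satisfiable(old, series))     // false
	fmt.Println(satisfiable(renamed, series)) // true
}
```

The selector that still uses the old label name cannot be satisfied by any series, which is exactly the situation the CI check is meant to flag.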

Tested on: Prometheus 2.47, Grafana 10, OTel metrics via remote_write, Kubernetes workloads with 500k+ active series.

Why this matters

Advice like “do not use user_id as a label” is correct but incomplete. The real failure modes are subtler:

  • a router stops setting route and you export raw path
  • a debug label slips into production and explodes cardinality
  • a label gets renamed (status to status_code) and dashboards go dark

Treating metrics as an API gives you explicit compatibility checks.

How the check works

  1. Scan Grafana dashboards JSON and Prometheus rule YAML
  2. Extract PromQL expressions
  3. Normalize Grafana macros so the parser can read them
  4. Extract all vector selectors from each expression
  5. Scrape /metrics and parse series labels
  6. Verify each selector is satisfiable by at least one series

No Prometheus needed. This is a compile step for observability.
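
Step 3 has a second half that is easy to miss: dashboard template variables. Built-in macros can be swapped for fixed durations, but a user variable inside a matcher value, such as job="$job", parses fine and then never matches a scraped series, which would produce a false alarm. The sketch below shows one possible policy, assuming that any matcher value containing a variable reference should be relaxed to a wildcard. hasTemplateVar and relaxMatcher are hypothetical helper names and are not part of the tool in this post.

```go
package main

import (
	"fmt"
	"regexp"
)

// templateVar matches the Grafana variable syntaxes $name, ${name},
// ${name:format} and the legacy [[name]] form.
var templateVar = regexp.MustCompile(`\$\{\w+(?::\w+)?\}|\$\w+|\[\[\w+\]\]`)

// hasTemplateVar reports whether a matcher value still references a variable.
func hasTemplateVar(value string) bool {
	return templateVar.MatchString(value)
}

// relaxMatcher rewrites one matcher (operator and value) so that an
// unresolved variable cannot make the selector unsatisfiable.
// Positive matchers become =~".+" so the label must still exist;
// negative matchers become =~".*", which accepts anything.
func relaxMatcher(op, value string) (string, string) {
	if !hasTemplateVar(value) {
		return op, value
	}
	if op == "!=" || op == "!~" {
		return "=~", ".*"
	}
	return "=~", ".+"
}

func main() {
	for _, in := range [][2]string{
		{"=", "$job"},
		{"=~", "${namespace:regex}"},
		{"=", "api"},
	} {
		op, v := relaxMatcher(in[0], in[1])
		fmt.Printf("%s%q -> %s%q\n", in[0], in[1], op, v)
	}
}
```

Relaxing to ".+" rather than ".*" is a deliberate choice in this sketch: the check can no longer validate the label value, but it still catches a renamed or dropped label.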

Go implementation: dashcontract

The CLI lives in tools/dashcontract. It is a single binary you can run in CI.

go.mod

module example.com/dashcontract

go 1.22

require (
	github.com/prometheus/common v0.67.4
	github.com/prometheus/prometheus v0.308.1
	gopkg.in/yaml.v3 v3.0.1
)

Prometheus module tags look unusual: repository release v3.y.z is published as Go module v0.3y.z, with y zero-padded to two digits (for example, release v3.8.1 is module v0.308.1). Pin a version for CI stability.

main.go

package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"time"

	"github.com/prometheus/common/expfmt"
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/promql/parser"
	"gopkg.in/yaml.v3"
)

type QuerySource struct {
	Kind     string // grafana|promrule
	File     string
	JSONPath string // grafana
	Group    string // promrule
	Rule     string // promrule (alert or record name)
}

type Query struct {
	Expr     string
	NormExpr string
	Source   QuerySource
}

type Series struct {
	Labels map[string]string // includes __name__
}

type MetricStats struct {
	SeriesCount int
	LabelKeys   map[string]struct{}
	Values      map[string]map[string]struct{} // label -> distinct values (limited)
}

type MetricsIndex struct {
	ByMetric map[string][]Series
	All      []Series
	Stats    map[string]*MetricStats
	Names    []string
}

type Violation struct {
	Reason      string
	Expr        string
	Selector    []string
	MetricHint  string
	Source      QuerySource
	Suggestions []string
}

func main() {
	var (
		dashDir   = flag.String("dashboards", "", "Directory with Grafana dashboards JSON (optional)")
		rulesDir  = flag.String("rules", "", "Directory with Prometheus rules YAML (optional)")
		metrics   = flag.String("metrics-url", "", "URL to scrape /metrics from (required)")
		timeout   = flag.Duration("timeout", 5*time.Second, "HTTP timeout for scraping metrics")
		reportOut = flag.String("report", "dashcontract_report.json", "Path to JSON report output")
	)
	flag.Parse()

	if strings.TrimSpace(*metrics) == "" {
		fatal("missing -metrics-url")
	}
	if strings.TrimSpace(*dashDir) == "" && strings.TrimSpace(*rulesDir) == "" {
		fatal("provide at least one of: -dashboards, -rules")
	}

	var queries []Query
	if *dashDir != "" {
		q, err := ExtractGrafanaQueries(*dashDir)
		if err != nil {
			fatal(err.Error())
		}
		queries = append(queries, q...)
	}
	if *rulesDir != "" {
		q, err := ExtractPromRuleQueries(*rulesDir)
		if err != nil {
			fatal(err.Error())
		}
		queries = append(queries, q...)
	}

	idx, err := ScrapeAndIndexMetrics(*metrics, *timeout)
	if err != nil {
		fatal(fmt.Sprintf("scrape metrics failed: %v", err))
	}

	violations := VerifyQueries(queries, idx)

	fmt.Printf("DashContract: %d queries, %d metrics, %d series\n", len(queries), len(idx.Names), len(idx.All))
	if len(violations) == 0 {
		fmt.Println("OK: all selectors are satisfiable against scraped /metrics")
	} else {
		fmt.Printf("FAIL: %d selector(s) not satisfiable\n", len(violations))
		for i, v := range violations {
			fmt.Printf("\n[%d] %s\n", i+1, v.Reason)
			fmt.Printf("  expr: %s\n", v.Expr)
			fmt.Printf("  selector: %s\n", strings.Join(v.Selector, ", "))
			fmt.Printf("  source: %s (%s)\n", v.Source.File, v.Source.Kind)
			if v.Source.JSONPath != "" {
				fmt.Printf("  jsonPath: %s\n", v.Source.JSONPath)
			}
			if v.Source.Group != "" || v.Source.Rule != "" {
				fmt.Printf("  rule: group=%q name=%q\n", v.Source.Group, v.Source.Rule)
			}
			for _, s := range v.Suggestions {
				fmt.Printf("  hint: %s\n", s)
			}
		}
	}

	if err := WriteJSONReport(*reportOut, violations); err != nil {
		fmt.Fprintf(os.Stderr, "report write failed: %v\n", err)
	}

	if len(violations) > 0 {
		os.Exit(1)
	}
}

func fatal(msg string) {
	fmt.Fprintln(os.Stderr, "error:", msg)
	os.Exit(2)
}

func WriteJSONReport(path string, violations []Violation) error {
	type out struct {
		Violations []Violation `json:"violations"`
	}
	b, err := json.MarshalIndent(out{Violations: violations}, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

func ExtractGrafanaQueries(root string) ([]Query, error) {
	var out []Query
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		if !strings.HasSuffix(strings.ToLower(path), ".json") {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		var doc any
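		if err := json.Unmarshal(b, &doc); err != nil {
			// Fail loudly: an unparsable dashboard would otherwise be skipped silently.
			return fmt.Errorf("parse %s: %w", path, err)
		}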
		out = append(out, ExtractExprFieldsFromJSON(doc, path)...)
		return nil
	})
	return out, err
}

func ExtractExprFieldsFromJSON(v any, file string) []Query {
	var out []Query
	var walk func(node any, jp string)
	walk = func(node any, jp string) {
		switch t := node.(type) {
		case map[string]any:
			for k, vv := range t {
				p := jp
				if p == "" {
					p = "$"
				}
				p2 := p + "." + k
				if k == "expr" {
					if s, ok := vv.(string); ok {
						s = strings.TrimSpace(s)
						if s != "" {
							out = append(out, Query{
								Expr:     s,
								NormExpr: NormalizeGrafanaPromQL(s),
								Source: QuerySource{
									Kind:     "grafana",
									File:     file,
									JSONPath: p2,
								},
							})
						}
					}
				}
				walk(vv, p2)
			}
		case []any:
			for i, vv := range t {
				walk(vv, fmt.Sprintf("%s[%d]", jp, i))
			}
		default:
		}
	}
	walk(v, "$")
	return out
}

func NormalizeGrafanaPromQL(expr string) string {
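	// strings.NewReplacer gives priority to earlier pairs, so the longer names
	// ($__interval_ms, $__range_ms, $__range_s) must come before their prefixes
	// ($__interval, $__range); otherwise $__interval_ms would become "1m_ms".
	r := strings.NewReplacer(
		"${__rate_interval}", "5m",
		"$__rate_interval", "5m",
		"${__interval_ms}", "60000",
		"$__interval_ms", "60000",
		"${__interval}", "1m",
		"$__interval", "1m",
		"${__range_ms}", "300000",
		"$__range_ms", "300000",
		"${__range_s}", "300",
		"$__range_s", "300",
		"${__range}", "5m",
		"$__range", "5m",
	)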
	return r.Replace(expr)
}

type promRuleFile struct {
	Groups []struct {
		Name  string `yaml:"name"`
		Rules []struct {
			Alert  string `yaml:"alert"`
			Record string `yaml:"record"`
			Expr   string `yaml:"expr"`
		} `yaml:"rules"`
	} `yaml:"groups"`
}

func ExtractPromRuleQueries(root string) ([]Query, error) {
	var out []Query
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		low := strings.ToLower(path)
		if !(strings.HasSuffix(low, ".yml") || strings.HasSuffix(low, ".yaml")) {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}

		dec := yaml.NewDecoder(bytes.NewReader(b))
		for {
			var doc promRuleFile
			err := dec.Decode(&doc)
			if errors.Is(err, io.EOF) {
				break
			}
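			if err != nil {
				// Fail loudly: breaking here would silently drop the rest of the file.
				return fmt.Errorf("parse %s: %w", path, err)
			}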
			for _, g := range doc.Groups {
				for _, r := range g.Rules {
					expr := strings.TrimSpace(r.Expr)
					if expr == "" {
						continue
					}
					name := r.Alert
					if name == "" {
						name = r.Record
					}
					out = append(out, Query{
						Expr:     expr,
						NormExpr: expr,
						Source: QuerySource{
							Kind:  "promrule",
							File:  path,
							Group: g.Name,
							Rule:  name,
						},
					})
				}
			}
		}
		return nil
	})
	return out, err
}

func ScrapeAndIndexMetrics(url string, timeout time.Duration) (*MetricsIndex, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url) // #nosec G107 - user-provided URL, intended for internal CI
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("http %d", resp.StatusCode)
	}

	p := expfmt.NewTextParser(model.LegacyValidation)
	fams, err := p.TextToMetricFamilies(resp.Body)
	if err != nil {
		return nil, err
	}

	idx := &MetricsIndex{
		ByMetric: make(map[string][]Series, len(fams)),
		Stats:    make(map[string]*MetricStats, len(fams)),
	}
	for name, mf := range fams {
		for _, m := range mf.Metric {
			lm := map[string]string{
				labels.MetricName: name,
			}
			for _, lp := range m.Label {
				lm[lp.GetName()] = lp.GetValue()
			}
			s := Series{Labels: lm}
			idx.ByMetric[name] = append(idx.ByMetric[name], s)
			idx.All = append(idx.All, s)

			st := idx.Stats[name]
			if st == nil {
				st = &MetricStats{
					LabelKeys: map[string]struct{}{},
					Values:    map[string]map[string]struct{}{},
				}
				idx.Stats[name] = st
			}
			st.SeriesCount++
			for k, v := range lm {
				if k == labels.MetricName {
					continue
				}
				st.LabelKeys[k] = struct{}{}
				if st.Values[k] == nil {
					st.Values[k] = map[string]struct{}{}
				}
				if len(st.Values[k]) < 10 {
					st.Values[k][v] = struct{}{}
				}
			}
		}
	}

	for name := range fams {
		idx.Names = append(idx.Names, name)
	}
	sort.Strings(idx.Names)

	return idx, nil
}

func VerifyQueries(queries []Query, idx *MetricsIndex) []Violation {
	var violations []Violation

	for _, q := range queries {
		ast, err := parser.ParseExpr(q.NormExpr)
		if err != nil {
			violations = append(violations, Violation{
				Reason: "promql_parse_error (often Grafana macro or templating; extend NormalizeGrafanaPromQL)",
				Expr:   q.Expr,
				Source: q.Source,
				Selector: []string{
					err.Error(),
				},
				Suggestions: []string{
					"Check for Grafana macros like $__... and normalize them before parsing.",
				},
			})
			continue
		}

		selectorSets := parser.ExtractSelectors(ast)
		if len(selectorSets) == 0 {
			continue
		}
		for _, mset := range selectorSets {
			if selectorSatisfied(mset, idx) {
				continue
			}
			violations = append(violations, buildViolation(q, mset, idx))
		}
	}

	return violations
}

func selectorSatisfied(matchers []*labels.Matcher, idx *MetricsIndex) bool {
	candidates := candidateSeries(matchers, idx)
	for _, s := range candidates {
		if seriesMatchesAll(matchers, s) {
			return true
		}
	}
	return false
}

func seriesMatchesAll(matchers []*labels.Matcher, s Series) bool {
	for _, m := range matchers {
		v := s.Labels[m.Name]
		if !m.Matches(v) {
			return false
		}
	}
	return true
}

func candidateSeries(matchers []*labels.Matcher, idx *MetricsIndex) []Series {
	var nm *labels.Matcher
	for _, m := range matchers {
		if m.Name == labels.MetricName {
			nm = m
			break
		}
	}

	if nm == nil {
		return idx.All
	}

	if nm.Type == labels.MatchEqual {
		return idx.ByMetric[nm.Value]
	}

	var out []Series
	for metricName, ss := range idx.ByMetric {
		if nm.Matches(metricName) {
			out = append(out, ss...)
		}
	}
	return out
}

func buildViolation(q Query, matchers []*labels.Matcher, idx *MetricsIndex) Violation {
	v := Violation{
		Reason: "selector_not_satisfiable_against_scraped_metrics",
		Expr:   q.Expr,
		Source: q.Source,
	}
	for _, m := range matchers {
		v.Selector = append(v.Selector, m.String())
		if m.Name == labels.MetricName {
			v.MetricHint = m.String()
		}
	}

	metricName := extractExactMetricName(matchers)
	if metricName != "" {
		if _, ok := idx.ByMetric[metricName]; !ok {
			v.Reason = "missing_metric_family_in_/metrics"
			v.Suggestions = append(v.Suggestions,
				fmt.Sprintf("exporter does not expose metric %q", metricName),
				fmt.Sprintf("sample of exposed metrics (alphabetical): %s", strings.Join(sampleStrings(idx.Names, 10), ", ")),
			)
			return v
		}

		st := idx.Stats[metricName]
		if st != nil {
			keys := make([]string, 0, len(st.LabelKeys))
			for k := range st.LabelKeys {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			v.Suggestions = append(v.Suggestions,
				fmt.Sprintf("metric exists (%d series) but no series matches all matchers", st.SeriesCount),
				fmt.Sprintf("available labels for %q: %s", metricName, strings.Join(keys, ", ")),
			)

			for _, m := range matchers {
				if m.Name == labels.MetricName {
					continue
				}
				if vals, ok := st.Values[m.Name]; ok && len(vals) > 0 {
					v.Suggestions = append(v.Suggestions,
						fmt.Sprintf("observed values for %q (sample): %s", m.Name, strings.Join(mapKeysSorted(vals), ", ")),
					)
				} else {
					v.Suggestions = append(v.Suggestions,
						fmt.Sprintf("label %q does not exist on this metric (rename or relabeling)", m.Name),
					)
				}
			}
		}
	} else {
		v.Suggestions = append(v.Suggestions,
			"selector has no exact metric name; verification is broader and may be noisy",
			fmt.Sprintf("sample of exposed metrics (alphabetical): %s", strings.Join(sampleStrings(idx.Names, 10), ", ")),
		)
	}

	return v
}

func extractExactMetricName(matchers []*labels.Matcher) string {
	for _, m := range matchers {
		if m.Name == labels.MetricName && m.Type == labels.MatchEqual {
			return m.Value
		}
	}
	return ""
}

func mapKeysSorted(m map[string]struct{}) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func sampleStrings(xs []string, n int) []string {
	if len(xs) <= n {
		return xs
	}
	return xs[:n]
}

Local usage

go run ./tools/dashcontract \
  -dashboards ./grafana/dashboards \
  -rules ./prometheus/rules \
  -metrics-url http://localhost:8080/metrics \
  -report dashcontract_report.json

If it fails, you get the exact expression, selector, and hints about missing metrics or labels.

Minimal CI gate

name: dashcontract
on: [pull_request]

jobs:
  dashcontract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: "1.22"

      - name: Start app (example)
        run: |
          echo "TODO: start app and expose /metrics"

      - name: DashContract verify
        run: |
          go run ./tools/dashcontract \
            -dashboards ./grafana/dashboards \
            -rules ./prometheus/rules \
            -metrics-url http://localhost:8080/metrics \
            -report dashcontract_report.json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: dashcontract_report
          path: dashcontract_report.json
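
The "Start app" step above is a placeholder, and a freshly started process may not be serving /metrics yet when the verify step runs. One way to avoid that race is a small retry loop before scraping. This is a sketch under assumptions: waitFor, httpProbe, and their parameters are hypothetical names, not flags or functions of the CLI above, and the retry policy (fixed delay, fixed number of attempts) is only one reasonable choice.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errNotReady is returned by a probe when the endpoint answered with a non-2xx status.
var errNotReady = errors.New("metrics endpoint not ready")

// httpProbe returns a probe that performs one GET against url.
func httpProbe(url string) func() error {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode < 200 || resp.StatusCode >= 300 {
			return errNotReady
		}
		return nil
	}
}

// waitFor calls probe up to attempts times, sleeping delay between tries.
// It returns nil on the first success, otherwise the last probe error.
func waitFor(probe func() error, attempts int, delay time.Duration) error {
	err := errNotReady
	for i := 0; i < attempts; i++ {
		if err = probe(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

func main() {
	// The URL mirrors the workflow above; the demo runs even if nothing listens there.
	err := waitFor(httpProbe("http://localhost:8080/metrics"), 3, 200*time.Millisecond)
	fmt.Println("ready:", err == nil)
}
```

Keeping the probe as a plain func() error makes the loop easy to test without a network and easy to reuse if the scrape target later needs authentication or a different readiness signal.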

Notes for accuracy

  • __name__ is the special label that carries the metric name
  • /metrics parsing uses expfmt.NewTextParser and TextToMetricFamilies
  • selector matchers are labels.Matcher and use Matches for evaluation
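
One caveat worth checking against your own exporters: when the exposition declares a family as a histogram or summary, the text parser groups its _bucket, _sum, and _count samples under the family's base name, and the le or quantile label is not part of the per-metric label list. An index keyed only by family name would then report a selector such as http_request_duration_seconds_bucket{le="0.5"} as missing. The sketch below shows the name expansion an index could apply. seriesNames and sampleName are hypothetical, and the type strings are an assumption (the comparison is case-insensitive so that either "histogram" or "HISTOGRAM" works).

```go
package main

import (
	"fmt"
	"strings"
)

// sampleName is one series name a family exposes, plus the extra label
// (if any) that the text format attaches to it.
type sampleName struct {
	Name       string
	ExtraLabel string // "le" for histogram buckets, "quantile" for summaries, else ""
}

// seriesNames lists the series names that a metric family of the given
// type exposes in the Prometheus text format.
func seriesNames(family, typ string) []sampleName {
	switch strings.ToLower(typ) {
	case "histogram":
		return []sampleName{
			{Name: family + "_bucket", ExtraLabel: "le"},
			{Name: family + "_sum"},
			{Name: family + "_count"},
		}
	case "summary":
		return []sampleName{
			{Name: family, ExtraLabel: "quantile"},
			{Name: family + "_sum"},
			{Name: family + "_count"},
		}
	default:
		// Counters, gauges, and untyped families expose a single name.
		return []sampleName{{Name: family}}
	}
}

func main() {
	for _, s := range seriesNames("http_request_duration_seconds", "histogram") {
		fmt.Printf("%s extra=%q\n", s.Name, s.ExtraLabel)
	}
}
```

With an expansion like this, each expanded name would get its own index entry, and the concrete le or quantile values would still have to be filled in from the parsed buckets and quantiles.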

Extensions

  • Check group by (...) labels against available labels
  • Enforce a deprecation policy for metrics in a YAML manifest
  • Output SARIF so PRs show inline annotations
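
As a starting point for the first extension, here is a minimal sketch of the label comparison itself. It assumes the grouping labels have already been pulled out of the parsed expression (in the Prometheus parser they live in the Grouping field of parser.AggregateExpr) and that the available label keys come from an index like MetricStats.LabelKeys. missingGroupLabels is a hypothetical helper, and the check only makes sense for by (...), not for without (...).

```go
package main

import (
	"fmt"
	"sort"
)

// missingGroupLabels returns the grouping labels that no scraped series
// of the metric carries, sorted for stable output.
func missingGroupLabels(grouping []string, available map[string]struct{}) []string {
	var missing []string
	for _, l := range grouping {
		if _, ok := available[l]; !ok {
			missing = append(missing, l)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical label keys observed for one metric.
	available := map[string]struct{}{"route": {}, "status_code": {}, "method": {}}
	// Labels from an expression like: sum by (route, status) (rate(...)).
	fmt.Println(missingGroupLabels([]string{"route", "status"}, available)) // [status]
}
```

A missing grouping label does not make a query return "No data", but it silently collapses the result into a single group, so it is worth reporting as a warning rather than a hard failure.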

Conclusion

Dash Contracts make observability dependencies explicit:

  • extract queries
  • validate selectors
  • fail CI before dashboards go dark

It is a simple check, but it catches a class of failures that most teams only discover in production.

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Dash Contracts in Go: CI Compiler for Grafana Dashboards and Prometheus Alerts". https://www.michal-drozd.com/en/blog/dash-contracts-grafana-alerts-ci/ (Published December 15, 2025).