Dash Contracts in Go: a CI compiler for Grafana dashboards and Prometheus alerts

I got tired of "No data" panels showing up after merges that looked clean in CI. Dashboards and alerts are clients of your metrics API. When you rename a metric, a label, or a label value, you can break:

  • Grafana panels (they silently show “No data”)
  • alert rules (they stop firing)
  • recorded metrics, which change meaning without you noticing

This article turns that dependency into a contract and verifies it in CI without needing a Prometheus server.

Dash Contracts = the set of PromQL selectors used by dashboards and rules, verified against a scraped /metrics endpoint.

The key check is simple: for every selector, at least one series must exist that satisfies all of its label matchers. If none does, the dashboard or alert is either broken already or will break after the merge.
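To make the check concrete, here is a minimal sketch that models only equality matchers over plain maps. The Series type, the satisfiable helper, and the sample metric are illustrative assumptions, not part of the tool; the real implementation further down uses Prometheus label matchers.

```go
package main

import "fmt"

// Series is one scraped time series as a label name -> value map.
// The metric name lives under "__name__".
type Series map[string]string

// satisfiable reports whether at least one series carries every
// label=value pair required by the selector. Only equality matchers
// are modelled here; this is a simplification for illustration.
func satisfiable(selector map[string]string, scraped []Series) bool {
	for _, s := range scraped {
		ok := true
		for name, want := range selector {
			if s[name] != want {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical exporter output after "status" was renamed to "status_code".
	scraped := []Series{
		{"__name__": "http_requests_total", "route": "/orders", "status_code": "500"},
	}
	oldSel := map[string]string{"__name__": "http_requests_total", "status": "500"}
	newSel := map[string]string{"__name__": "http_requests_total", "status_code": "500"}

	fmt.Println(satisfiable(oldSel, scraped)) // false
	fmt.Println(satisfiable(newSel, scraped)) // true
}
```

The old selector is exactly what a dashboard would still contain after the rename, which is why the check fails before anyone opens the panel.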

Tested on: Prometheus 2.47, Grafana 10, OTel metrics via remote_write, Kubernetes workloads with 500k+ active series.

Why it matters

Advice like “don't use user_id as a label” is correct, but it isn't enough. In practice, subtler changes are what break you:

  • the router stops setting route and you export the raw path
  • a debug label stays in production and causes a cardinality explosion
  • a label gets renamed (status to status_code) and dashboards go dark

Metrics are an API. Dash Contracts turn them into explicit compatibility tests.

How it works

  1. Walk the Grafana dashboard JSON files and the Prometheus rules YAML files
  2. Extract the PromQL expressions
  3. Normalize Grafana macros so the PromQL parser can read the expression
  4. Extract all vector selectors from each expression
  5. Scrape /metrics and parse the series labels
  6. Verify that every selector is satisfiable by at least one series

No Prometheus server. Just compiling your observability.

Go implementation: dashcontract

The CLI lives in tools/dashcontract. It is a single binary that you can run in CI.

go.mod

module example.com/dashcontract

go 1.22

require (
	github.com/prometheus/common v0.67.4
	github.com/prometheus/prometheus v0.308.1
	gopkg.in/yaml.v3 v3.0.1
)

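Prometheus modules use an unusual tagging scheme: release v3.y.z is published as Go module version v0.3y.z, with y zero-padded to two digits (v3.8.1 becomes v0.308.1, the version pinned above). It pays to pin the version in CI.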
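The mapping is easy to get wrong by hand, so here is a small sketch of the arithmetic. The moduleVersion helper is hypothetical, and it assumes the scheme as I understand it: 2.x releases map to v0.<minor>.<patch>, and 3.x releases to v0.3<two-digit minor>.<patch>.

```go
package main

import "fmt"

// moduleVersion maps a Prometheus release version to the tag expected in
// go.mod for github.com/prometheus/prometheus.
// Assumption: 2.x -> v0.<minor>.<patch>, 3.x -> v0.3<minor, 2 digits>.<patch>.
func moduleVersion(major, minor, patch int) (string, error) {
	switch {
	case major == 2:
		return fmt.Sprintf("v0.%d.%d", minor, patch), nil
	case major == 3 && minor >= 0 && minor < 100:
		return fmt.Sprintf("v0.3%02d.%d", minor, patch), nil
	default:
		return "", fmt.Errorf("no known mapping for %d.%d.%d", major, minor, patch)
	}
}

func main() {
	for _, v := range [][3]int{{3, 8, 1}, {2, 47, 0}} {
		tag, err := moduleVersion(v[0], v[1], v[2])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d.%d.%d -> %s\n", v[0], v[1], v[2], tag)
	}
	// 3.8.1 -> v0.308.1
	// 2.47.0 -> v0.47.0
}
```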

main.go

package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"time"

	"github.com/prometheus/common/expfmt"
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/promql/parser"
	"gopkg.in/yaml.v3"
)

type QuerySource struct {
	Kind     string // grafana|promrule
	File     string
	JSONPath string // grafana
	Group    string // promrule
	Rule     string // promrule (alert or record name)
}

type Query struct {
	Expr     string
	NormExpr string
	Source   QuerySource
}

type Series struct {
	Labels map[string]string // includes __name__
}

type MetricStats struct {
	SeriesCount int
	LabelKeys   map[string]struct{}
	Values      map[string]map[string]struct{} // label -> distinct values (limited)
}

type MetricsIndex struct {
	ByMetric map[string][]Series
	All      []Series
	Stats    map[string]*MetricStats
	Names    []string
}

type Violation struct {
	Reason      string
	Expr        string
	Selector    []string
	MetricHint  string
	Source      QuerySource
	Suggestions []string
}

func main() {
	var (
		dashDir   = flag.String("dashboards", "", "Directory with Grafana dashboards JSON (optional)")
		rulesDir  = flag.String("rules", "", "Directory with Prometheus rules YAML (optional)")
		metrics   = flag.String("metrics-url", "", "URL to scrape /metrics from (required)")
		timeout   = flag.Duration("timeout", 5*time.Second, "HTTP timeout for scraping metrics")
		reportOut = flag.String("report", "dashcontract_report.json", "Path to JSON report output")
	)
	flag.Parse()

	if strings.TrimSpace(*metrics) == "" {
		fatal("missing -metrics-url")
	}
	if strings.TrimSpace(*dashDir) == "" && strings.TrimSpace(*rulesDir) == "" {
		fatal("provide at least one of: -dashboards, -rules")
	}

	var queries []Query
	if *dashDir != "" {
		q, err := ExtractGrafanaQueries(*dashDir)
		if err != nil {
			fatal(err.Error())
		}
		queries = append(queries, q...)
	}
	if *rulesDir != "" {
		q, err := ExtractPromRuleQueries(*rulesDir)
		if err != nil {
			fatal(err.Error())
		}
		queries = append(queries, q...)
	}

	idx, err := ScrapeAndIndexMetrics(*metrics, *timeout)
	if err != nil {
		fatal(fmt.Sprintf("scrape metrics failed: %v", err))
	}

	violations := VerifyQueries(queries, idx)

	fmt.Printf("DashContract: %d queries, %d metrics, %d series\n", len(queries), len(idx.Names), len(idx.All))
	if len(violations) == 0 {
		fmt.Println("OK: all selectors are satisfiable against scraped /metrics")
	} else {
		fmt.Printf("FAIL: %d selector(s) not satisfiable\n", len(violations))
		for i, v := range violations {
			fmt.Printf("\n[%d] %s\n", i+1, v.Reason)
			fmt.Printf("  expr: %s\n", v.Expr)
			fmt.Printf("  selector: %s\n", strings.Join(v.Selector, ", "))
			fmt.Printf("  source: %s (%s)\n", v.Source.File, v.Source.Kind)
			if v.Source.JSONPath != "" {
				fmt.Printf("  jsonPath: %s\n", v.Source.JSONPath)
			}
			if v.Source.Group != "" || v.Source.Rule != "" {
				fmt.Printf("  rule: group=%q name=%q\n", v.Source.Group, v.Source.Rule)
			}
			for _, s := range v.Suggestions {
				fmt.Printf("  hint: %s\n", s)
			}
		}
	}

	if err := WriteJSONReport(*reportOut, violations); err != nil {
		fmt.Fprintf(os.Stderr, "report write failed: %v\n", err)
	}

	if len(violations) > 0 {
		os.Exit(1)
	}
}

func fatal(msg string) {
	fmt.Fprintln(os.Stderr, "error:", msg)
	os.Exit(2)
}

func WriteJSONReport(path string, violations []Violation) error {
	type out struct {
		Violations []Violation `json:"violations"`
	}
	b, err := json.MarshalIndent(out{Violations: violations}, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

func ExtractGrafanaQueries(root string) ([]Query, error) {
	var out []Query
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		if !strings.HasSuffix(strings.ToLower(path), ".json") {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		var doc any
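		if err := json.Unmarshal(b, &doc); err != nil {
			// Fail loudly: a silently skipped dashboard would never be checked.
			return fmt.Errorf("%s: invalid JSON: %w", path, err)
		}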
		out = append(out, ExtractExprFieldsFromJSON(doc, path)...)
		return nil
	})
	return out, err
}

func ExtractExprFieldsFromJSON(v any, file string) []Query {
	var out []Query
	var walk func(node any, jp string)
	walk = func(node any, jp string) {
		switch t := node.(type) {
		case map[string]any:
			for k, vv := range t {
				p := jp
				if p == "" {
					p = "$"
				}
				p2 := p + "." + k
				if k == "expr" {
					if s, ok := vv.(string); ok {
						s = strings.TrimSpace(s)
						if s != "" {
							out = append(out, Query{
								Expr:     s,
								NormExpr: NormalizeGrafanaPromQL(s),
								Source: QuerySource{
									Kind:     "grafana",
									File:     file,
									JSONPath: p2,
								},
							})
						}
					}
				}
				walk(vv, p2)
			}
		case []any:
			for i, vv := range t {
				walk(vv, fmt.Sprintf("%s[%d]", jp, i))
			}
		default:
		}
	}
	walk(v, "$")
	return out
}

func NormalizeGrafanaPromQL(expr string) string {
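	// strings.NewReplacer compares patterns in argument order, so a longer
	// macro must be listed before any macro that is its prefix. Otherwise
	// "$__interval_ms" is rewritten as "$__interval" + "_ms" and ends up "1m_ms".
	r := strings.NewReplacer(
		"${__rate_interval}", "5m",
		"$__rate_interval", "5m",
		"${__interval_ms}", "60000",
		"$__interval_ms", "60000",
		"${__interval}", "1m",
		"$__interval", "1m",
		"${__range_ms}", "300000",
		"$__range_ms", "300000",
		"${__range_s}", "300",
		"$__range_s", "300",
		"${__range}", "5m",
		"$__range", "5m",
	)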
	return r.Replace(expr)
}

type promRuleFile struct {
	Groups []struct {
		Name  string `yaml:"name"`
		Rules []struct {
			Alert  string `yaml:"alert"`
			Record string `yaml:"record"`
			Expr   string `yaml:"expr"`
		} `yaml:"rules"`
	} `yaml:"groups"`
}

func ExtractPromRuleQueries(root string) ([]Query, error) {
	var out []Query
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		low := strings.ToLower(path)
		if !(strings.HasSuffix(low, ".yml") || strings.HasSuffix(low, ".yaml")) {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}

		dec := yaml.NewDecoder(bytes.NewReader(b))
		for {
			var doc promRuleFile
			err := dec.Decode(&doc)
			if errors.Is(err, io.EOF) {
				break
			}
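			if err != nil {
				// Fail loudly: a rules file that cannot be decoded would never be checked.
				return fmt.Errorf("%s: invalid rules YAML: %w", path, err)
			}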
			for _, g := range doc.Groups {
				for _, r := range g.Rules {
					expr := strings.TrimSpace(r.Expr)
					if expr == "" {
						continue
					}
					name := r.Alert
					if name == "" {
						name = r.Record
					}
					out = append(out, Query{
						Expr:     expr,
						NormExpr: expr,
						Source: QuerySource{
							Kind:  "promrule",
							File:  path,
							Group: g.Name,
							Rule:  name,
						},
					})
				}
			}
		}
		return nil
	})
	return out, err
}

func ScrapeAndIndexMetrics(url string, timeout time.Duration) (*MetricsIndex, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url) // #nosec G107 - user-provided URL, intended for internal CI
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("http %d", resp.StatusCode)
	}

	p := expfmt.NewTextParser(model.LegacyValidation)
	fams, err := p.TextToMetricFamilies(resp.Body)
	if err != nil {
		return nil, err
	}

	idx := &MetricsIndex{
		ByMetric: make(map[string][]Series, len(fams)),
		Stats:    make(map[string]*MetricStats, len(fams)),
	}
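	for name, mf := range fams {
		// GetType().String() yields "COUNTER", "GAUGE", "HISTOGRAM", "SUMMARY", ...
		kind := mf.GetType().String()
		for _, m := range mf.Metric {
			base := make(map[string]string, len(m.Label))
			for _, lp := range m.Label {
				base[lp.GetName()] = lp.GetValue()
			}

			// add indexes one concrete series, optionally with one extra label.
			add := func(seriesName, extraKey, extraVal string) {
				lm := make(map[string]string, len(base)+2)
				for k, v := range base {
					lm[k] = v
				}
				lm[labels.MetricName] = seriesName
				if extraKey != "" {
					lm[extraKey] = extraVal
				}
				s := Series{Labels: lm}
				idx.ByMetric[seriesName] = append(idx.ByMetric[seriesName], s)
				idx.All = append(idx.All, s)

				st := idx.Stats[seriesName]
				if st == nil {
					st = &MetricStats{
						LabelKeys: map[string]struct{}{},
						Values:    map[string]map[string]struct{}{},
					}
					idx.Stats[seriesName] = st
				}
				st.SeriesCount++
				for k, v := range lm {
					if k == labels.MetricName {
						continue
					}
					st.LabelKeys[k] = struct{}{}
					if st.Values[k] == nil {
						st.Values[k] = map[string]struct{}{}
					}
					if len(st.Values[k]) < 10 {
						st.Values[k][v] = struct{}{}
					}
				}
			}

			// The text parser groups histogram and summary samples under the
			// family name, but queries select the exposed series (name_bucket,
			// name_sum, name_count), so index those names instead. The le and
			// quantile values are re-rendered from floats and may differ
			// textually from the exposition.
			switch kind {
			case "HISTOGRAM":
				for _, b := range m.GetHistogram().GetBucket() {
					add(name+"_bucket", "le", fmt.Sprint(b.GetUpperBound()))
				}
				add(name+"_sum", "", "")
				add(name+"_count", "", "")
			case "SUMMARY":
				for _, q := range m.GetSummary().GetQuantile() {
					add(name, "quantile", fmt.Sprint(q.GetQuantile()))
				}
				add(name+"_sum", "", "")
				add(name+"_count", "", "")
			default:
				add(name, "", "")
			}
		}
	}

	for name := range idx.ByMetric {
		idx.Names = append(idx.Names, name)
	}
	sort.Strings(idx.Names)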

	return idx, nil
}

func VerifyQueries(queries []Query, idx *MetricsIndex) []Violation {
	var violations []Violation

	for _, q := range queries {
		ast, err := parser.ParseExpr(q.NormExpr)
		if err != nil {
			violations = append(violations, Violation{
				Reason: "promql_parse_error (often Grafana macro or templating; extend NormalizeGrafanaPromQL)",
				Expr:   q.Expr,
				Source: q.Source,
				Selector: []string{
					err.Error(),
				},
				Suggestions: []string{
					"Check for Grafana macros like $__... and normalize them before parsing.",
				},
			})
			continue
		}

		selectorSets := parser.ExtractSelectors(ast)
		if len(selectorSets) == 0 {
			continue
		}
		for _, mset := range selectorSets {
			if selectorSatisfied(mset, idx) {
				continue
			}
			violations = append(violations, buildViolation(q, mset, idx))
		}
	}

	return violations
}

func selectorSatisfied(matchers []*labels.Matcher, idx *MetricsIndex) bool {
	candidates := candidateSeries(matchers, idx)
	for _, s := range candidates {
		if seriesMatchesAll(matchers, s) {
			return true
		}
	}
	return false
}

func seriesMatchesAll(matchers []*labels.Matcher, s Series) bool {
	for _, m := range matchers {
		v := s.Labels[m.Name]
		if !m.Matches(v) {
			return false
		}
	}
	return true
}

func candidateSeries(matchers []*labels.Matcher, idx *MetricsIndex) []Series {
	var nm *labels.Matcher
	for _, m := range matchers {
		if m.Name == labels.MetricName {
			nm = m
			break
		}
	}

	if nm == nil {
		return idx.All
	}

	if nm.Type == labels.MatchEqual {
		return idx.ByMetric[nm.Value]
	}

	var out []Series
	for metricName, ss := range idx.ByMetric {
		if nm.Matches(metricName) {
			out = append(out, ss...)
		}
	}
	return out
}

func buildViolation(q Query, matchers []*labels.Matcher, idx *MetricsIndex) Violation {
	v := Violation{
		Reason: "selector_not_satisfiable_against_scraped_metrics",
		Expr:   q.Expr,
		Source: q.Source,
	}
	for _, m := range matchers {
		v.Selector = append(v.Selector, m.String())
		if m.Name == labels.MetricName {
			v.MetricHint = m.String()
		}
	}

	metricName := extractExactMetricName(matchers)
	if metricName != "" {
		if _, ok := idx.ByMetric[metricName]; !ok {
			v.Reason = "missing_metric_family_in_/metrics"
			v.Suggestions = append(v.Suggestions,
				fmt.Sprintf("exporter does not expose metric %q", metricName),
				fmt.Sprintf("top metrics: %s", strings.Join(sampleStrings(idx.Names, 10), ", ")),
			)
			return v
		}

		st := idx.Stats[metricName]
		if st != nil {
			keys := make([]string, 0, len(st.LabelKeys))
			for k := range st.LabelKeys {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			v.Suggestions = append(v.Suggestions,
				fmt.Sprintf("metric exists (%d series) but no series matches all matchers", st.SeriesCount),
				fmt.Sprintf("available labels for %q: %s", metricName, strings.Join(keys, ", ")),
			)

			for _, m := range matchers {
				if m.Name == labels.MetricName {
					continue
				}
				if vals, ok := st.Values[m.Name]; ok && len(vals) > 0 {
					v.Suggestions = append(v.Suggestions,
						fmt.Sprintf("observed values for %q (sample): %s", m.Name, strings.Join(mapKeysSorted(vals), ", ")),
					)
				} else {
					v.Suggestions = append(v.Suggestions,
						fmt.Sprintf("label %q does not exist on this metric (rename or relabeling)", m.Name),
					)
				}
			}
		}
	} else {
		v.Suggestions = append(v.Suggestions,
			"selector has no exact metric name; verification is broader and may be noisy",
			fmt.Sprintf("top metrics: %s", strings.Join(sampleStrings(idx.Names, 10), ", ")),
		)
	}

	return v
}

func extractExactMetricName(matchers []*labels.Matcher) string {
	for _, m := range matchers {
		if m.Name == labels.MetricName && m.Type == labels.MatchEqual {
			return m.Value
		}
	}
	return ""
}

func mapKeysSorted(m map[string]struct{}) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func sampleStrings(xs []string, n int) []string {
	if len(xs) <= n {
		return xs
	}
	return xs[:n]
}
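NormalizeGrafanaPromQL only covers the built-in interval and range macros. Dashboard template variables such as namespace="$namespace" still parse as PromQL, but they can never match a real series, so they would be reported as violations. One possible extension, sketched below under assumptions, rewrites a positive matcher whose whole value is a variable into =~".+", which only requires the label to exist. The function name NormalizeTemplateVars and the regular expression are hypothetical. Negative matchers and values that merely contain a variable are left untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// varMatcher finds "=" or "=~" matchers whose entire quoted value is a
// dashboard variable: "$var", "${var}", "${var:format}" or "[[var]]".
// The leading [^!] keeps "!=" matchers out (Go regexp has no lookbehind).
var varMatcher = regexp.MustCompile(
	`([^!])(=~?)\s*"(?:\$\{\w+(?::\w+)?\}|\$\w+|\[\[\w+\]\])"`)

// NormalizeTemplateVars replaces variable-only matcher values with =~".+",
// so the contract check degrades to "this label exists on some series".
func NormalizeTemplateVars(expr string) string {
	return varMatcher.ReplaceAllString(expr, `${1}=~".+"`)
}

func main() {
	exprs := []string{
		`up{namespace="$namespace"}`,
		`rate(http_requests_total{pod=~"${pod:regex}", code!="$code"}[5m])`,
		`up{job=~"[[job]]"}`,
	}
	for _, e := range exprs {
		fmt.Println(NormalizeTemplateVars(e))
	}
	// up{namespace=~".+"}
	// rate(http_requests_total{pod=~".+", code!="$code"}[5m])
	// up{job=~".+"}
}
```

Using ".+" rather than ".*" is deliberate: a regex that matches the empty string would also accept series that lack the label entirely, and a renamed label would slip through.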

Local usage

go run ./tools/dashcontract \
  -dashboards ./grafana/dashboards \
  -rules ./prometheus/rules \
  -metrics-url http://localhost:8080/metrics \
  -report dashcontract_report.json

On failure you get the exact expression, the selector, and hints about what is missing.
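The JSON report is meant for tooling. As a rough sketch, the snippet below counts violations per source file. It assumes the report shape produced by WriteJSONReport above: a top-level "violations" array whose entries use the Go field names, since Violation has no JSON tags. The countByFile helper and the sample data are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// reportViolation mirrors only the report fields this sketch needs.
type reportViolation struct {
	Reason string
	Source struct {
		Kind string
		File string
	}
}

type report struct {
	Violations []reportViolation `json:"violations"`
}

// countByFile parses a report and returns "file: n" lines sorted by file name.
func countByFile(data []byte) ([]string, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	for _, v := range r.Violations {
		counts[v.Source.File]++
	}
	files := make([]string, 0, len(counts))
	for f := range counts {
		files = append(files, f)
	}
	sort.Strings(files)
	out := make([]string, 0, len(files))
	for _, f := range files {
		out = append(out, fmt.Sprintf("%s: %d", f, counts[f]))
	}
	return out, nil
}

func main() {
	// Hypothetical report content.
	sample := []byte(`{"violations":[
		{"Reason":"missing_metric_family_in_/metrics","Source":{"Kind":"grafana","File":"api.json"}},
		{"Reason":"selector_not_satisfiable_against_scraped_metrics","Source":{"Kind":"grafana","File":"api.json"}},
		{"Reason":"missing_metric_family_in_/metrics","Source":{"Kind":"promrule","File":"alerts.yaml"}}
	]}`)
	lines, err := countByFile(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
	// alerts.yaml: 1
	// api.json: 2
}
```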

Minimal CI gate

name: dashcontract
on: [pull_request]

jobs:
  dashcontract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: "1.22"

      - name: Start app (example)
        run: |
          echo "TODO: start app and expose /metrics"

      - name: DashContract verify
        run: |
          go run ./tools/dashcontract \
            -dashboards ./grafana/dashboards \
            -rules ./prometheus/rules \
            -metrics-url http://localhost:8080/metrics \
            -report dashcontract_report.json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: dashcontract_report
          path: dashcontract_report.json

Notes on accuracy

  • __name__ is the special label that holds the metric name
  • /metrics is parsed with expfmt.NewTextParser and TextToMetricFamilies
  • selector matchers are labels.Matcher values, evaluated with Matches
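Two matcher details are worth spelling out, because they decide whether a selector counts as satisfiable: regex matchers are anchored at both ends, and a label that is missing on a series behaves like an empty string (seriesMatchesAll relies on exactly that when it reads from the label map). The standalone sketch below imitates the four operators with the standard regexp package; it illustrates the semantics and does not replace labels.Matcher.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches imitates the four PromQL matcher operators on one label value.
// A label that is absent on a series is passed in as "".
func matches(op, pattern, value string) (bool, error) {
	switch op {
	case "=":
		return value == pattern, nil
	case "!=":
		return value != pattern, nil
	case "=~", "!~":
		// Regex matchers must match the whole value, not a substring.
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return false, err
		}
		m := re.MatchString(value)
		if op == "!~" {
			m = !m
		}
		return m, nil
	}
	return false, fmt.Errorf("unknown operator %q", op)
}

func main() {
	cases := []struct{ op, pattern, value string }{
		{"=~", "5..", "500"}, // full match
		{"=~", "5", "500"},   // anchored: a substring is not enough
		{"!=", "500", ""},    // absent label still satisfies !=
		{"=~", ".+", ""},     // ".+" requires the label to be present
	}
	for _, c := range cases {
		ok, err := matches(c.op, c.pattern, c.value)
		fmt.Println(c.op, c.pattern, fmt.Sprintf("%q", c.value), ok, err)
	}
	// =~ 5.. "500" true <nil>
	// =~ 5 "500" false <nil>
	// != 500 "" true <nil>
	// =~ .+ "" false <nil>
}
```

The third case is the surprising one: a negative matcher on a renamed label keeps matching, so the contract check cannot catch that rename through this selector alone.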

Extensions

  • Verify group by (...) labels against the labels that are actually available
  • A deprecation policy for metrics, kept in a YAML manifest
  • SARIF output for inline PR annotations
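The first extension is the smallest step from the current code. As a rough sketch, the core comparison could look like the snippet below. It assumes the grouping labels have already been pulled out of the expression (in the PromQL parser they are available on aggregation nodes) and that the available label keys come from something like MetricStats.LabelKeys. The helper name missingGroupLabels is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// missingGroupLabels returns the grouping labels that never appear on the
// metric, sorted for stable output. An empty result means the grouping is fine.
func missingGroupLabels(groupBy []string, available map[string]struct{}) []string {
	var missing []string
	for _, l := range groupBy {
		if _, ok := available[l]; !ok {
			missing = append(missing, l)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical label keys observed on http_requests_total.
	available := map[string]struct{}{
		"method":      {},
		"route":       {},
		"status_code": {},
	}
	// Grouping taken from: sum by (route, status) (rate(http_requests_total[5m]))
	fmt.Println(missingGroupLabels([]string{"route", "status"}, available))
	// [status]
}
```

A missing grouping label does not empty the panel; it collapses the result into fewer series, and the plain selector check cannot see that kind of silent breakage.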

Conclusion

Dash Contracts make observability dependencies explicit:

  • extract the queries
  • verify the selectors
  • fail CI before the dashboards go dark

It is a simple check, but it catches a class of problems that most teams only discover in production.

Cite this article

If you reference this article, include the original URL and credit the author.

Michal Drozd. "Dash Contracts in Go: a CI compiler for Grafana dashboards and Prometheus alerts". https://www.michal-drozd.com/sk/blog/dash-contracts-grafana-alerty-ci/ (Published 15 December 2025).