Dash Contracts in Go: CI Compiler for Grafana Dashboards and Prometheus Alerts
I got tired of “No data” panels after merges that looked clean in CI. Dashboards and alerts are clients of your metrics API. When you rename a metric, a label, or a label value, you can break:
- Grafana panels (quietly showing “No data”)
- alert rules (no longer firing)
- recording rules whose output changes meaning without notice
This post turns that implicit dependency into a contract and validates it in CI, without running Prometheus.
Dash Contracts = the set of PromQL selectors used by Grafana dashboards and Prometheus rules, verified against scraped /metrics.
The core check is simple: for every selector, does at least one time series satisfy all label matchers? If the answer is no, the dashboard or alert is already broken or will break after merge.
Tested on: Prometheus 2.47, Grafana 10, OTel metrics via remote_write, Kubernetes workloads with 500k+ active series.
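To make the core check concrete, here is a minimal sketch in plain Go. It handles equality matchers only, and the names matcher and satisfiable are made up for illustration; the tool later in this post relies on labels.Matcher from the Prometheus module, which also covers !=, =~ and !~.

```go
package main

import "fmt"

// matcher is a hypothetical, equality-only stand-in for a PromQL label matcher.
type matcher struct{ Name, Value string }

// satisfiable reports whether at least one series satisfies every matcher.
// A label that is absent from a series reads as the empty string, which
// mirrors how PromQL evaluates matchers against missing labels.
func satisfiable(ms []matcher, series []map[string]string) bool {
	for _, s := range series {
		ok := true
		for _, m := range ms {
			if s[m.Name] != m.Value {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// The exporter now emits status_code instead of status.
	series := []map[string]string{
		{"__name__": "http_requests_total", "status_code": "200"},
	}
	oldSel := []matcher{{"__name__", "http_requests_total"}, {"status", "200"}}
	newSel := []matcher{{"__name__", "http_requests_total"}, {"status_code", "200"}}
	fmt.Println(satisfiable(oldSel, series)) // false: the dashboard would go dark
	fmt.Println(satisfiable(newSel, series)) // true
}
```

The selector written before the rename finds no series, which is exactly the condition the CI gate turns into a failure.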
Why this matters
Advice like “do not use user_id as a label” is correct but incomplete. The real failure modes are subtler:
- a router stops setting the route label and you export the raw path instead
- a debug label slips into production and explodes cardinality
- a label gets renamed (status to status_code) and dashboards go dark
Treating metrics as an API gives you explicit compatibility checks.
How the check works
- Scan Grafana dashboards JSON and Prometheus rule YAML
- Extract PromQL expressions
- Normalize Grafana macros so the parser can read them
- Extract all vector selectors from each expression
- Scrape /metrics and parse series labels
- Verify each selector is satisfiable by at least one series
No Prometheus needed. This is a compile step for observability.
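The normalization step covers built-in macros such as $__rate_interval. Dashboard variables such as $namespace are a separate concern: they parse fine inside a quoted matcher value, but no scraped series will ever carry the literal value "$namespace", so the selector looks unsatisfiable. One possible approach, sketched below under my own assumptions, is to relax such matchers before parsing. The helper name relaxTemplateVars and the rewrite rules are hypothetical and not part of the tool in this post.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varMatcher finds label matchers whose quoted value references a Grafana
// dashboard variable: $name, ${name}, ${name:format} or [[name]].
var varMatcher = regexp.MustCompile(`(=~|!~|!=|=)\s*"[^"]*(\$\{?\w|\[\[\w)[^"]*"`)

// relaxTemplateVars is a hypothetical helper. Positive matchers become
// =~".+" so the label must still exist on some series; negative matchers
// become =~".*", which any series satisfies.
func relaxTemplateVars(expr string) string {
	return varMatcher.ReplaceAllStringFunc(expr, func(m string) string {
		if strings.HasPrefix(m, "!") {
			return `=~".*"`
		}
		return `=~".+"`
	})
}

func main() {
	in := `sum(rate(http_requests_total{namespace="$namespace", job=~"${job:regex}", env!="$env"}[5m]))`
	fmt.Println(relaxTemplateVars(in))
}
```

The trade-off is deliberate: the check stops validating label values that come from variables, but it still catches a renamed or removed label, because =~".+" needs at least one series that carries it.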
Go implementation: dashcontract
The CLI lives in tools/dashcontract. It is a single binary you can run in CI.
go.mod
module example.com/dashcontract

go 1.22

require (
	github.com/prometheus/common v0.67.4
	github.com/prometheus/prometheus v0.308.1
	gopkg.in/yaml.v3 v3.0.1
)
Prometheus module tags look unusual: repository version v3.Y.Z maps to Go module version v0.3YY.Z, with the minor version zero-padded to two digits, so v3.8.1 becomes v0.308.1. Pin a version for CI stability.
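If you script dependency bumps, the mapping is easy to compute. The function below is a small sketch with a hypothetical name, goModuleVersion; it covers only the 2.x scheme (v2.Y.Z becomes v0.Y.Z) and the 3.x scheme, and it is worth confirming the resulting tag exists before pinning it.

```go
package main

import "fmt"

// goModuleVersion maps a Prometheus release (major.minor.patch) to the tag
// used by the Go module. Only the 2.x and 3.x schemes are handled here.
func goModuleVersion(major, minor, patch int) (string, error) {
	switch major {
	case 2:
		return fmt.Sprintf("v0.%d.%d", minor, patch), nil
	case 3:
		// The minor version is zero-padded to two digits after the "3".
		return fmt.Sprintf("v0.3%02d.%d", minor, patch), nil
	default:
		return "", fmt.Errorf("unsupported major version %d", major)
	}
}

func main() {
	v, err := goModuleVersion(3, 8, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // v0.308.1
}
```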
main.go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"time"

	"github.com/prometheus/common/expfmt"
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/promql/parser"
	"gopkg.in/yaml.v3"
)

// QuerySource records where a query was found.
type QuerySource struct {
	Kind     string // grafana|promrule
	File     string
	JSONPath string // grafana
	Group    string // promrule
	Rule     string // promrule (alert or record name)
}

// Query is one PromQL expression, raw and normalized.
type Query struct {
	Expr     string
	NormExpr string
	Source   QuerySource
}

type Series struct {
	Labels map[string]string // includes __name__
}

type MetricStats struct {
	SeriesCount int
	LabelKeys   map[string]struct{}
	Values      map[string]map[string]struct{} // label -> distinct values (limited)
}

type MetricsIndex struct {
	ByMetric map[string][]Series
	All      []Series
	Stats    map[string]*MetricStats
	Names    []string
}

type Violation struct {
	Reason      string
	Expr        string
	Selector    []string
	MetricHint  string
	Source      QuerySource
	Suggestions []string
}

func main() {
	var (
		dashDir   = flag.String("dashboards", "", "Directory with Grafana dashboards JSON (optional)")
		rulesDir  = flag.String("rules", "", "Directory with Prometheus rules YAML (optional)")
		metrics   = flag.String("metrics-url", "", "URL to scrape /metrics from (required)")
		timeout   = flag.Duration("timeout", 5*time.Second, "HTTP timeout for scraping metrics")
		reportOut = flag.String("report", "dashcontract_report.json", "Path to JSON report output")
	)
	flag.Parse()

	if strings.TrimSpace(*metrics) == "" {
		fatal("missing -metrics-url")
	}
	if strings.TrimSpace(*dashDir) == "" && strings.TrimSpace(*rulesDir) == "" {
		fatal("provide at least one of: -dashboards, -rules")
	}

	var queries []Query
	if *dashDir != "" {
		q, err := ExtractGrafanaQueries(*dashDir)
		if err != nil {
			fatal(err.Error())
		}
		queries = append(queries, q...)
	}
	if *rulesDir != "" {
		q, err := ExtractPromRuleQueries(*rulesDir)
		if err != nil {
			fatal(err.Error())
		}
		queries = append(queries, q...)
	}

	idx, err := ScrapeAndIndexMetrics(*metrics, *timeout)
	if err != nil {
		fatal(fmt.Sprintf("scrape metrics failed: %v", err))
	}

	violations := VerifyQueries(queries, idx)
	fmt.Printf("DashContract: %d queries, %d metrics, %d series\n", len(queries), len(idx.Names), len(idx.All))
	if len(violations) == 0 {
		fmt.Println("OK: all selectors are satisfiable against scraped /metrics")
	} else {
		fmt.Printf("FAIL: %d selector(s) not satisfiable\n", len(violations))
		for i, v := range violations {
			fmt.Printf("\n[%d] %s\n", i+1, v.Reason)
			fmt.Printf(" expr: %s\n", v.Expr)
			fmt.Printf(" selector: %s\n", strings.Join(v.Selector, ", "))
			fmt.Printf(" source: %s (%s)\n", v.Source.File, v.Source.Kind)
			if v.Source.JSONPath != "" {
				fmt.Printf(" jsonPath: %s\n", v.Source.JSONPath)
			}
			if v.Source.Group != "" || v.Source.Rule != "" {
				fmt.Printf(" rule: group=%q name=%q\n", v.Source.Group, v.Source.Rule)
			}
			for _, s := range v.Suggestions {
				fmt.Printf(" hint: %s\n", s)
			}
		}
	}

	// The report is written even when everything passes, so CI can always upload it.
	if err := WriteJSONReport(*reportOut, violations); err != nil {
		fmt.Fprintf(os.Stderr, "report write failed: %v\n", err)
	}
	if len(violations) > 0 {
		os.Exit(1)
	}
}

// fatal reports a usage or setup error. Exit code 2 keeps it distinct
// from exit code 1, which means violations were found.
func fatal(msg string) {
	fmt.Fprintln(os.Stderr, "error:", msg)
	os.Exit(2)
}

func WriteJSONReport(path string, violations []Violation) error {
	type out struct {
		Violations []Violation `json:"violations"`
	}
	b, err := json.MarshalIndent(out{Violations: violations}, "", " ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

func ExtractGrafanaQueries(root string) ([]Query, error) {
	var out []Query
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		if !strings.HasSuffix(strings.ToLower(path), ".json") {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		var doc any
		if err := json.Unmarshal(b, &doc); err != nil {
			// Not every .json file is a dashboard; skip it, but say so
			// instead of passing silently.
			fmt.Fprintf(os.Stderr, "warn: skipping %s: invalid JSON: %v\n", path, err)
			return nil
		}
		out = append(out, ExtractExprFieldsFromJSON(doc, path)...)
		return nil
	})
	return out, err
}

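// ExtractExprFieldsFromJSON collects every string field named "expr".
// It does not look at the panel datasource, so "expr" fields of
// non-Prometheus datasources are picked up as well.
func ExtractExprFieldsFromJSON(v any, file string) []Query {
	var out []Query
	var walk func(node any, jp string)
	walk = func(node any, jp string) {
		switch t := node.(type) {
		case map[string]any:
			// Visit keys in sorted order so the report is deterministic
			// (Go map iteration order is random).
			keys := make([]string, 0, len(t))
			for k := range t {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			for _, k := range keys {
				vv := t[k]
				p2 := jp + "." + k
				if k == "expr" {
					if s, ok := vv.(string); ok {
						s = strings.TrimSpace(s)
						if s != "" {
							out = append(out, Query{
								Expr:     s,
								NormExpr: NormalizeGrafanaPromQL(s),
								Source: QuerySource{
									Kind:     "grafana",
									File:     file,
									JSONPath: p2,
								},
							})
						}
					}
				}
				walk(vv, p2)
			}
		case []any:
			for i, vv := range t {
				walk(vv, fmt.Sprintf("%s[%d]", jp, i))
			}
		}
	}
	walk(v, "$")
	return out
}
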
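func NormalizeGrafanaPromQL(expr string) string {
	// strings.NewReplacer compares old strings in argument order, so when
	// several match at the same position the earlier pair wins. Longer
	// macros must therefore come before their prefixes; otherwise
	// $__interval_ms would turn into "1m_ms" and $__range_s into "5m_s".
	r := strings.NewReplacer(
		"${__rate_interval}", "5m",
		"$__rate_interval", "5m",
		"${__interval_ms}", "60000",
		"$__interval_ms", "60000",
		"${__interval}", "1m",
		"$__interval", "1m",
		"${__range_ms}", "300000",
		"$__range_ms", "300000",
		"${__range_s}", "300",
		"$__range_s", "300",
		"${__range}", "5m",
		"$__range", "5m",
	)
	return r.Replace(expr)
}
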
type promRuleFile struct {
	Groups []struct {
		Name  string `yaml:"name"`
		Rules []struct {
			Alert  string `yaml:"alert"`
			Record string `yaml:"record"`
			Expr   string `yaml:"expr"`
		} `yaml:"rules"`
	} `yaml:"groups"`
}

func ExtractPromRuleQueries(root string) ([]Query, error) {
	var out []Query
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		low := strings.ToLower(path)
		if !(strings.HasSuffix(low, ".yml") || strings.HasSuffix(low, ".yaml")) {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		// A file may hold several YAML documents; decode them one by one.
		dec := yaml.NewDecoder(bytes.NewReader(b))
		for {
			var doc promRuleFile
			err := dec.Decode(&doc)
			if errors.Is(err, io.EOF) {
				break
			}
			if err != nil {
				// Not every YAML file is a rule file; skip the rest of it,
				// but say so instead of passing silently.
				fmt.Fprintf(os.Stderr, "warn: skipping %s: %v\n", path, err)
				break
			}
			for _, g := range doc.Groups {
				for _, r := range g.Rules {
					expr := strings.TrimSpace(r.Expr)
					if expr == "" {
						continue
					}
					name := r.Alert
					if name == "" {
						name = r.Record
					}
					out = append(out, Query{
						Expr:     expr,
						NormExpr: expr,
						Source: QuerySource{
							Kind:  "promrule",
							File:  path,
							Group: g.Name,
							Rule:  name,
						},
					})
				}
			}
		}
		return nil
	})
	return out, err
}

func ScrapeAndIndexMetrics(url string, timeout time.Duration) (*MetricsIndex, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url) // #nosec G107 - user-provided URL, intended for internal CI
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("http %d", resp.StatusCode)
	}

	p := expfmt.NewTextParser(model.LegacyValidation)
	fams, err := p.TextToMetricFamilies(resp.Body)
	if err != nil {
		return nil, err
	}

	idx := &MetricsIndex{
		ByMetric: make(map[string][]Series, len(fams)),
		Stats:    make(map[string]*MetricStats, len(fams)),
	}
	// Series are indexed under the family name. Families declared as
	// histogram or summary therefore appear under their base name only,
	// not as separate _bucket, _sum and _count series.
	for name, mf := range fams {
		for _, m := range mf.Metric {
			lm := map[string]string{
				labels.MetricName: name,
			}
			for _, lp := range m.Label {
				lm[lp.GetName()] = lp.GetValue()
			}
			s := Series{Labels: lm}
			idx.ByMetric[name] = append(idx.ByMetric[name], s)
			idx.All = append(idx.All, s)

			st := idx.Stats[name]
			if st == nil {
				st = &MetricStats{
					LabelKeys: map[string]struct{}{},
					Values:    map[string]map[string]struct{}{},
				}
				idx.Stats[name] = st
			}
			st.SeriesCount++
			for k, v := range lm {
				if k == labels.MetricName {
					continue
				}
				st.LabelKeys[k] = struct{}{}
				if st.Values[k] == nil {
					st.Values[k] = map[string]struct{}{}
				}
				// Keep at most 10 sample values per label for hints.
				if len(st.Values[k]) < 10 {
					st.Values[k][v] = struct{}{}
				}
			}
		}
	}
	for name := range fams {
		idx.Names = append(idx.Names, name)
	}
	sort.Strings(idx.Names)
	return idx, nil
}

func VerifyQueries(queries []Query, idx *MetricsIndex) []Violation {
	var violations []Violation
	for _, q := range queries {
		ast, err := parser.ParseExpr(q.NormExpr)
		if err != nil {
			violations = append(violations, Violation{
				Reason: "promql_parse_error (often Grafana macro or templating; extend NormalizeGrafanaPromQL)",
				Expr:   q.Expr,
				Source: q.Source,
				Selector: []string{
					err.Error(),
				},
				Suggestions: []string{
					"Check for Grafana macros like $__... and normalize them before parsing.",
				},
			})
			continue
		}
		selectorSets := parser.ExtractSelectors(ast)
		if len(selectorSets) == 0 {
			continue
		}
		for _, mset := range selectorSets {
			if selectorSatisfied(mset, idx) {
				continue
			}
			violations = append(violations, buildViolation(q, mset, idx))
		}
	}
	return violations
}

func selectorSatisfied(matchers []*labels.Matcher, idx *MetricsIndex) bool {
	candidates := candidateSeries(matchers, idx)
	for _, s := range candidates {
		if seriesMatchesAll(matchers, s) {
			return true
		}
	}
	return false
}

func seriesMatchesAll(matchers []*labels.Matcher, s Series) bool {
	for _, m := range matchers {
		// A missing label reads as "", which is what PromQL matches against.
		v := s.Labels[m.Name]
		if !m.Matches(v) {
			return false
		}
	}
	return true
}

// candidateSeries narrows the search using the __name__ matcher, if any.
func candidateSeries(matchers []*labels.Matcher, idx *MetricsIndex) []Series {
	var nm *labels.Matcher
	for _, m := range matchers {
		if m.Name == labels.MetricName {
			nm = m
			break
		}
	}
	if nm == nil {
		return idx.All
	}
	if nm.Type == labels.MatchEqual {
		return idx.ByMetric[nm.Value]
	}
	var out []Series
	for metricName, ss := range idx.ByMetric {
		if nm.Matches(metricName) {
			out = append(out, ss...)
		}
	}
	return out
}

func buildViolation(q Query, matchers []*labels.Matcher, idx *MetricsIndex) Violation {
	v := Violation{
		Reason: "selector_not_satisfiable_against_scraped_metrics",
		Expr:   q.Expr,
		Source: q.Source,
	}
	for _, m := range matchers {
		v.Selector = append(v.Selector, m.String())
		if m.Name == labels.MetricName {
			v.MetricHint = m.String()
		}
	}

	metricName := extractExactMetricName(matchers)
	if metricName != "" {
		if _, ok := idx.ByMetric[metricName]; !ok {
			v.Reason = "missing_metric_family_in_/metrics"
			v.Suggestions = append(v.Suggestions,
				fmt.Sprintf("exporter does not expose metric %q", metricName),
				fmt.Sprintf("top metrics: %s", strings.Join(sampleStrings(idx.Names, 10), ", ")),
			)
			return v
		}
		st := idx.Stats[metricName]
		if st != nil {
			keys := make([]string, 0, len(st.LabelKeys))
			for k := range st.LabelKeys {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			v.Suggestions = append(v.Suggestions,
				fmt.Sprintf("metric exists (%d series) but no series matches all matchers", st.SeriesCount),
				fmt.Sprintf("available labels for %q: %s", metricName, strings.Join(keys, ", ")),
			)
			for _, m := range matchers {
				if m.Name == labels.MetricName {
					continue
				}
				if vals, ok := st.Values[m.Name]; ok && len(vals) > 0 {
					v.Suggestions = append(v.Suggestions,
						fmt.Sprintf("observed values for %q (sample): %s", m.Name, strings.Join(mapKeysSorted(vals), ", ")),
					)
				} else {
					v.Suggestions = append(v.Suggestions,
						fmt.Sprintf("label %q does not exist on this metric (rename or relabeling)", m.Name),
					)
				}
			}
		}
	} else {
		v.Suggestions = append(v.Suggestions,
			"selector has no exact metric name; verification is broader and may be noisy",
			fmt.Sprintf("top metrics: %s", strings.Join(sampleStrings(idx.Names, 10), ", ")),
		)
	}
	return v
}

func extractExactMetricName(matchers []*labels.Matcher) string {
	for _, m := range matchers {
		if m.Name == labels.MetricName && m.Type == labels.MatchEqual {
			return m.Value
		}
	}
	return ""
}

func mapKeysSorted(m map[string]struct{}) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func sampleStrings(xs []string, n int) []string {
	if len(xs) <= n {
		return xs
	}
	return xs[:n]
}
Local usage
go run ./tools/dashcontract \
  -dashboards ./grafana/dashboards \
  -rules ./prometheus/rules \
  -metrics-url http://localhost:8080/metrics \
  -report dashcontract_report.json
If it fails, you get the exact expression, selector, and hints about missing metrics or labels.
Minimal CI gate
name: dashcontract
on: [pull_request]
jobs:
  dashcontract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.22"
      - name: Start app (example)
        run: |
          echo "TODO: start app and expose /metrics"
      - name: DashContract verify
        run: |
          go run ./tools/dashcontract \
            -dashboards ./grafana/dashboards \
            -rules ./prometheus/rules \
            -metrics-url http://localhost:8080/metrics \
            -report dashcontract_report.json
      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: dashcontract_report
          path: dashcontract_report.json
Notes for accuracy
- __name__ is the special label that carries the metric name
- /metrics parsing uses expfmt.NewTextParser and TextToMetricFamilies
- selector matchers are labels.Matcher and use Matches for evaluation
Extensions
- Check group by (...) labels against available labels
- Enforce a deprecation policy for metrics in a YAML manifest
- Output SARIF so PRs show inline annotations
Related articles
- Prometheus Cardinality Explosion: Detection, Prevention, and Recovery
- Cardinality Contracts: Prometheus Labels as an API with Budgets
Conclusion
Dash Contracts make observability dependencies explicit:
- extract queries
- validate selectors
- fail CI before dashboards go dark
It is a simple check, but it catches a class of failures that most teams only discover in production.