CI/CD for a Monorepo: Speed, Caching, Selective Tests, and Supply-Chain Security

I used to dread monorepo pipelines until I measured where the time actually went. “The pipeline takes 47 minutes, but when I push with [skip ci], nobody notices,” a colleague told me four years ago. A week later, we deployed broken code to production, precisely because of that skipped CI run.

Since then, I’ve optimized CI/CD for teams of 5 to 50 developers. I cut the longest pipeline from 52 minutes down to 8. This article is a blueprint of what actually worked.

My experience covers GitHub Actions, GitLab CI, and Jenkins, on a monorepo with 15+ services and 200+ tests. I have implemented and tuned every example in this article myself.

What Goes Wrong When a Monorepo Grows

Typical symptoms:

Pipeline takes 40+ minutes - developers lose flow
Unnecessary builds - README change triggers all tests
“Works on my machine” - but CI fails
Release chaos - nobody knows what goes to production
Security theater - scans run but nobody reads results

Goals We Must Achieve

Metric	Bad State	Target State
PR pipeline	40+ min	< 10 min
Main pipeline	60+ min	< 20 min
False positives	Daily	Weekly
Security findings	Ignored	Triaged within 24h
Rollback time	Hours	Minutes

Pipeline Architecture

Basic Structure

stages:
  - detect      # What changed?
  - build       # Build only changed
  - test        # Test only affected
  - security    # SAST, DAST, dependencies
  - publish     # Artifacts, images
  - deploy      # Staging, production

Rules for Different Triggers

Trigger	What Runs	Why
PR	Affected services + lint	Fast feedback
Main	All affected + integration	Gatekeeping before release
Tag	Full + security + publish	Release readiness
Nightly	Everything + slow tests	Complete regression

Change Detection: Path Filters

The simplest way to speed up a pipeline: don’t do work that isn’t needed.

GitHub Actions

on:
  pull_request:
    paths:
      - 'services/order-service/**'
      - 'libs/common/**'
      - 'proto/**'

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      order-service: ${{ steps.filter.outputs.order-service }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            order-service:
              - 'services/order-service/**'
              - 'libs/common/**'
              - 'proto/**'

  build-order-service:
    needs: detect-changes
    if: needs.detect-changes.outputs.order-service == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run build --workspace=order-service

GitLab CI

order-service:build:
  stage: build
  rules:
    - changes:
        - services/order-service/**/*
        - libs/common/**/*
        - proto/**/*
  script:
    - npm run build --workspace=order-service

Services Map

For more complex dependencies, create an explicit map:

# .ci/services-map.yml
services:
  order-service:
    path: services/order-service
    depends_on:
      - libs/common
      - libs/database
      - proto/order.proto
    tests:
      - tests/order-service
      - tests/integration/order

  payment-service:
    path: services/payment-service
    depends_on:
      - libs/common
      - libs/payment-sdk
      - proto/payment.proto
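
One way such a map could be consumed, as a minimal Python sketch. It assumes the map has already been loaded into a dict (the inline SERVICES dict mirrors the YAML above), and the function name affected_services is hypothetical. A service counts as affected when a changed file lies under its own path or under any of its depends_on entries.

```python
from pathlib import PurePosixPath

# Mirrors .ci/services-map.yml; inlined here so the sketch has no YAML dependency.
SERVICES = {
    "order-service": {
        "path": "services/order-service",
        "depends_on": ["libs/common", "libs/database", "proto/order.proto"],
    },
    "payment-service": {
        "path": "services/payment-service",
        "depends_on": ["libs/common", "libs/payment-sdk", "proto/payment.proto"],
    },
}


def _is_under(changed: str, root: str) -> bool:
    """True if `changed` is `root` itself or a file somewhere below it."""
    changed_path = PurePosixPath(changed)
    root_path = PurePosixPath(root)
    return changed_path == root_path or root_path in changed_path.parents


def affected_services(changed_files, services=SERVICES):
    """Return the sorted names of services touched by the changed files."""
    affected = set()
    for name, spec in services.items():
        roots = [spec["path"], *spec.get("depends_on", [])]
        if any(_is_under(f, r) for f in changed_files for r in roots):
            affected.add(name)
    return sorted(affected)
```

Comparing path components rather than string prefixes keeps libs/commonx from being mistaken for libs/common.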

Cache Strategy

Caching is the biggest lever for speed, but it has pitfalls: a key that is too broad serves stale artifacts, and a key that is too narrow never hits.

Types of Cache

Type	What We Cache	When to Invalidate
Dependency	node_modules, .m2, pip	lockfile change
Build	compiled artifacts	source change
Docker layers	base images	Dockerfile change
Test	test fixtures	test data change

GitHub Actions Example

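- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      node_modules
    # Include the OS so native modules built on one runner type are not reused on another
    key: deps-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      deps-${{ runner.os }}-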

- name: Cache build
  uses: actions/cache@v4
  with:
    path: dist
    key: build-${{ github.sha }}
    restore-keys: |
      build-${{ github.event.pull_request.base.sha }}
      build-
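
The "invalidate on lockfile change" rule boils down to deriving the key from a hash of the lockfile contents. Here is a rough Python sketch of that idea. It is not the algorithm GitHub's hashFiles actually uses, and the function name cache_key is my own.

```python
import hashlib


def cache_key(prefix: str, lockfiles: dict[str, bytes]) -> str:
    """Build a cache key from a prefix and the contents of the lockfiles.

    `lockfiles` maps a file path to its raw bytes. Paths are processed in
    sorted order so the key does not depend on dict ordering.
    """
    digest = hashlib.sha256()
    for path in sorted(lockfiles):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(lockfiles[path])
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

Same lockfile bytes give the same key (cache hit); any change to a lockfile gives a new key (cache miss, fresh install).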

Remote Cache for Build Tools

For larger projects, a per-runner cache isn’t enough. Use a remote cache shared by CI and developers:

Gradle:

// settings.gradle.kts
buildCache {
    remote<HttpBuildCache> {
        url = uri("https://cache.mycompany.com/cache/")
        isPush = System.getenv("CI") != null
    }
}

Bazel:

# .bazelrc
build --remote_cache=grpcs://cache.mycompany.com
build --remote_upload_local_results=true

Turborepo:

{
  "remoteCache": {
    "teamId": "my-team",
    "signature": true
  }
}

Parallelization Without Cost Explosion

More parallel jobs mean faster feedback, but also a bigger bill. How do you find the balance?

Test Sharding

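test:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    # Each job runs a quarter of the test files (needs a runner with --shard, e.g. Jest 28+)
    - run: npm test -- --shard=${{ matrix.shard }}/4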
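
What sharding does under the hood can be sketched in a few lines of Python. This is an illustration of the principle only, not the algorithm any particular test runner uses, and shard_files is a hypothetical name. The key property is that every shard computes the same split independently, so no coordination between jobs is needed.

```python
def shard_files(test_files, shard_index, shard_total):
    """Return the test files belonging to shard `shard_index` (1-based).

    Files are sorted first so every CI job derives the same partition,
    then dealt out round-robin across the shards.
    """
    if not 1 <= shard_index <= shard_total:
        raise ValueError("shard_index must be between 1 and shard_total")
    ordered = sorted(test_files)
    return ordered[shard_index - 1 :: shard_total]
```

Round-robin over a sorted list balances file counts, not run time. If a few files dominate the duration, splitting by recorded timings is the next step.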

Fail Fast

strategy:
  fail-fast: true  # The GitHub Actions default: cancels the remaining matrix jobs once one fails
  matrix:
    service: [order, payment, inventory]

Limit Concurrency

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # Cancels older runs of the same branch

Selective Testing

Not all tests need to run every time.

Minimal Model

Change in service A → Run:
  - Unit tests A
  - Integration tests A
  - Contract tests A ↔ dependencies

Medium Model (Dependency Graph)

# Script that analyzes imports/dependencies
- name: Detect affected tests
  run: |
    ./scripts/affected-tests.sh > affected.txt

- name: Run affected tests
  run: |
    # -r (GNU xargs): skip the run when the list is empty instead of running every test
    xargs -r npm test -- < affected.txt
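
The article does not show affected-tests.sh itself, so here is a Python sketch of what such a script might compute. It assumes you already have a dependency graph as a dict mapping each module to its direct dependencies (the graph in the usage note is hypothetical). The idea: invert the graph, then walk outward from the changed modules to collect everything that transitively depends on them.

```python
from collections import defaultdict, deque


def affected_modules(depends_on, changed):
    """Return changed modules plus everything that transitively depends on them.

    `depends_on` maps a module name to the list of modules it imports directly.
    """
    # Invert the graph: dependency -> modules that depend on it.
    dependents = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)
```

With a graph where order-service depends on libs/database and libs/database depends on libs/common, a change to libs/common also selects order-service, which a flat path filter on direct dependencies would miss.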

Test Impact Analysis

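For larger projects, dedicated tools exist:

Launchable - ML-based test selection
Gradle Test Retry plugin - reruns failed tests to absorb flakiness (a complement to test selection, not a selector itself)
Jest --changedSince - built-in selection for JavaScript projects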

Quality Gates

What to block, what to just report?

Check	PR Block	Main Block	Comment
Lint	Yes	Yes	Fast, unambiguous
Unit tests	Yes	Yes	Basic functionality
Integration	No (report)	Yes	Can be flaky
Coverage drop	No (report)	Yes	Trend matters more
Security HIGH	Yes	Yes	Critical
Security MED	No (report)	No	Triage

Practical Implementation

- name: Check coverage
  run: |
    # Field 10 of Jest's "All files" summary row is the "% Lines" column
    COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}')
    THRESHOLD=80
    if (( $(echo "$COVERAGE < $THRESHOLD" | bc -l) )); then
      echo "::warning::Coverage $COVERAGE% is below $THRESHOLD%"
      # Don't block, just warning
    fi
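
The block/report table above can also live as data rather than being scattered across workflow conditions. A minimal Python sketch follows. The check names and the evaluate function are my own naming, and the GATES dict simply transcribes the table.

```python
# check name -> (blocks a PR, blocks main); transcribed from the table above
GATES = {
    "lint": (True, True),
    "unit": (True, True),
    "integration": (False, True),
    "coverage_drop": (False, True),
    "security_high": (True, True),
    "security_med": (False, False),
}


def evaluate(failed_checks, context):
    """Split failed checks into blocking and report-only for 'pr' or 'main'."""
    if context not in ("pr", "main"):
        raise ValueError("context must be 'pr' or 'main'")
    column = 0 if context == "pr" else 1
    blocking = sorted(c for c in failed_checks if GATES[c][column])
    reported = sorted(c for c in failed_checks if not GATES[c][column])
    return {"blocked": bool(blocking), "blocking": blocking, "reported": reported}
```

Keeping the policy in one table makes it reviewable: changing what blocks a PR becomes a one-line diff instead of a hunt through YAML.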

Security in Pipeline (Without Theater)

Security scanning is useful only if you do something with the results.

When to Run What

Type	When	Block?
SAST (Semgrep, CodeQL)	Every PR	HIGH = yes
Dependency scan	Every PR + nightly	Critical = yes
DAST	Nightly	No, triage
Container scan	Before push to registry	HIGH = yes

SBOM (Software Bill of Materials)

An SBOM lists every component that ships in your software, including versions. Without one, you cannot quickly answer whether a newly disclosed vulnerability affects you.

- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    format: spdx-json
    output-file: sbom.spdx.json

- name: Upload SBOM
  uses: actions/upload-artifact@v4
  with:
    name: sbom
    path: sbom.spdx.json
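
An SBOM only helps if something reads it. As a minimal Python sketch, assuming the SPDX 2.x JSON layout (a top-level packages array whose entries carry name and an optional versionInfo), the following lists components and checks them against a denylist. The function names and any denylist entries are hypothetical.

```python
import json


def load_sbom(path):
    """Read an SPDX JSON document from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def sbom_components(sbom):
    """Return sorted (name, version) pairs for every package in the SBOM."""
    return sorted(
        (pkg.get("name", "unknown"), pkg.get("versionInfo", "unknown"))
        for pkg in sbom.get("packages", [])
    )


def denied_components(sbom, denylist):
    """Return the components that appear on a (name, version) denylist."""
    denied = set(denylist)
    return [component for component in sbom_components(sbom) if component in denied]
```

In a pipeline this could run right after the SBOM step, for example denied_components(load_sbom("sbom.spdx.json"), denylist), failing the job when the result is non-empty.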

SLSA (Supply-chain Levels for Software Artifacts)

SLSA defines security levels for the build process (Build track, v1.0):

Level 1: Provenance exists - the build is documented
Level 2: Hosted build platform that generates signed provenance
Level 3: Hardened build platform (what you want to achieve)

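provenance:  # reusable workflow: call it as a job (not a step), pinned to a full release tag
  permissions:
    id-token: write   # sign provenance via OIDC
    contents: write   # attach provenance to the release
    actions: read     # read workflow metadata
  uses: slsa-framework/slsa-github-generator/.github/workflows/builder_go_slsa3.yml@v1.9.0
  with:
    go-version: "1.21"  # quoted so YAML keeps it a string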

Artifact Signing

- name: Sign container image
  run: |
    # --yes skips the interactive confirmation prompt (cosign 2.x)
    cosign sign --yes --key env://COSIGN_PRIVATE_KEY \
      ${{ env.REGISTRY }}/${{ env.IMAGE }}:${{ github.sha }}

- name: Verify signature
  run: |
    cosign verify --key env://COSIGN_PUBLIC_KEY \
      ${{ env.REGISTRY }}/${{ env.IMAGE }}:${{ github.sha }}

Release Strategy

Semantic Versioning

- name: Determine version
  id: version
  uses: paulhatch/semantic-version@v5
  with:
    major_pattern: "BREAKING CHANGE:"
    minor_pattern: "feat:"

- name: Create release
  env:
    GH_TOKEN: ${{ github.token }}  # the gh CLI needs a token in CI
  run: |
    gh release create v${{ steps.version.outputs.version }} \
      --generate-notes
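
The bump logic behind the two patterns is simple enough to sketch in Python. This illustrates the rule (major pattern wins over minor, anything else is a patch), not the action's actual implementation, and next_version is a hypothetical name.

```python
def next_version(
    current,
    commit_messages,
    major_pattern="BREAKING CHANGE:",
    minor_pattern="feat:",
):
    """Compute the next semantic version from commit messages since `current`."""
    major, minor, patch = (int(part) for part in current.split("."))
    if any(major_pattern in message for message in commit_messages):
        return f"{major + 1}.0.0"
    if any(minor_pattern in message for message in commit_messages):
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

The scheme only works if commit messages follow the convention, so it pays to lint commit messages on PRs before relying on automated versioning.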

Changelog Automation

- name: Generate changelog
  uses: orhun/git-cliff-action@v2
  with:
    config: cliff.toml
    args: --latest
  env:
    OUTPUT: CHANGELOG.md

Pipeline Anti-patterns

What to avoid:

“Build everything always” - 90% of time wasted
Flaky tests without quarantine - erode CI trust
Secrets in logs - set +x isn’t enough, use masking
Mega-jobs - one 30-minute job instead of ten parallel 3-minute jobs
No job artifacts - without logs and reports from the failed job, debugging is guesswork

Reference Implementation

Minimal Pipeline (Starter)

# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      services: ${{ steps.detect.outputs.services }}
    steps:
      - uses: actions/checkout@v4
      - id: detect
        # The script must write a JSON array to $GITHUB_OUTPUT, e.g. services=["order-service"]
        run: ./scripts/detect-changes.sh

  build-test:
    needs: detect
    if: needs.detect.outputs.services != '[]'  # an empty matrix fails the workflow
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: ${{ fromJson(needs.detect.outputs.services) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm  # npm ci deletes node_modules, so cache npm's download cache instead
          key: deps-${{ hashFiles('package-lock.json') }}
      - run: npm ci
      - run: npm run build --workspace=${{ matrix.service }}
      - run: npm test --workspace=${{ matrix.service }}

Metrics and Observability

What to measure:

Metric	Source	Target
Lead time	CI timestamps	< 15 min
Pipeline duration	CI metrics	Decreasing trend
Flakiness rate	Test reruns	< 1%
Change failure rate	Rollback count	< 5%

Practical Dashboard

- name: Report metrics
  run: |
    curl -X POST https://metrics.mycompany.com/ci \
      -d "pipeline_duration=${{ steps.timer.outputs.duration }}" \
      -d "tests_passed=${{ steps.tests.outputs.passed }}" \
      -d "tests_failed=${{ steps.tests.outputs.failed }}"
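
Before posting numbers anywhere, you need to compute them. A minimal Python sketch, assuming each pipeline run is exported as a dict with the hypothetical fields duration_min, passed_on_rerun, deployed, and rolled_back:

```python
from statistics import median


def pipeline_metrics(runs):
    """Summarize pipeline runs into the metrics from the table above.

    Flakiness rate: share of runs that only passed after a rerun.
    Change failure rate: share of deployments that were rolled back.
    """
    if not runs:
        raise ValueError("need at least one run")
    deployments = [run for run in runs if run["deployed"]]
    rollbacks = [run for run in deployments if run["rolled_back"]]
    flaky = [run for run in runs if run["passed_on_rerun"]]
    return {
        "median_duration_min": median(run["duration_min"] for run in runs),
        "flakiness_rate": len(flaky) / len(runs),
        "change_failure_rate": len(rollbacks) / len(deployments) if deployments else 0.0,
    }
```

The median is used instead of the mean so a single stuck runner does not distort the trend.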

Conclusion: 10 Things to Improve by Tomorrow

Add path filters - fastest speedup
Enable dependency cache
Keep fail-fast: true on matrix jobs (it is the GitHub Actions default, so just don’t disable it)
Add concurrency with cancel-in-progress
Split mega-jobs into smaller ones
Enable SAST at least for HIGH findings
Generate SBOM
Set up secrets masking
Add timing metrics
Document pipeline in README

Your next step: Measure your current pipeline time. Implement path filters. Measure again. You’ll see the difference within an hour.

Frequently Asked Questions (FAQ)

Is monorepo the right choice for our team?

A monorepo works best when services share libraries or when you need atomic changes across multiple components. If your services are completely independent, a polyrepo setup might be simpler.

How much does remote build cache cost?

It depends on the provider. Self-hosted commercial products (e.g., Gradle Enterprise, since renamed Develocity) can cost around $50-100k USD per year. Hosted options (Turborepo’s remote cache on Vercel, Nx Cloud) have free tiers, and you pay for usage beyond them.

How to handle flaky tests?

Implement quarantine - automatically move flaky tests to a “quarantine” suite that runs nightly but doesn’t block PRs. Set up alerts for new flaky tests and fix them within 48 hours.
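
Automatic quarantine needs a definition of "flaky" that a script can check. One common definition: a test that both passed and failed on the same commit. A minimal Python sketch under that assumption (the find_flaky name and the tuple format are hypothetical):

```python
from collections import defaultdict


def find_flaky(results):
    """Return tests that both passed and failed on the same commit.

    `results` is an iterable of (test_name, commit_sha, passed) tuples,
    e.g. collected from CI reruns.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in results:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    return sorted(
        {test_name for (test_name, _sha), seen in outcomes.items() if seen == {True, False}}
    )
```

A test that fails consistently on a commit is a real failure, not a flaky one, so it stays out of the quarantine list and keeps blocking.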

What is SLSA and do I need it?

SLSA is a framework for supply-chain security. Level 3 is recommended for production software. Yes, you need it - supply-chain attacks are becoming increasingly common.
