
CI/CD for Monorepo: Speed, Caching, Selective Tests and Supply-Chain Security

Tags: cicd, monorepo, devops, security, kubernetes

I used to dread monorepo pipelines until I measured where the time actually went. “Pipeline takes 47 minutes, but when I run it with [skip ci], nobody notices.” A colleague told me this 4 years ago. A week later, we deployed broken code to production - precisely because of that skipped CI.

Since then, I’ve optimized CI/CD for teams from 5 to 50 developers. I got the longest pipeline from 52 minutes down to 8. This article is a complete blueprint of what actually works.

My experience: GitHub Actions, GitLab CI, Jenkins. Monorepo with 15+ services, 200+ tests. I have implemented and optimized every example in this article myself.

What Goes Wrong When a Monorepo Grows

Typical symptoms:

  • Pipeline takes 40+ minutes - developers lose flow
  • Unnecessary builds - README change triggers all tests
  • “Works on my machine” - but CI fails
  • Release chaos - nobody knows what goes to production
  • Security theater - scans run but nobody reads results

Goals We Must Achieve

| Metric | Bad State | Target State |
|---|---|---|
| PR pipeline | 40+ min | < 10 min |
| Main pipeline | 60+ min | < 20 min |
| False positives | Daily | Weekly |
| Security findings | Ignored | Triaged within 24h |
| Rollback time | Hours | Minutes |

Pipeline Architecture

Basic Structure

stages:
  - detect      # What changed?
  - build       # Build only changed
  - test        # Test only affected
  - security    # SAST, DAST, dependencies
  - publish     # Artifacts, images
  - deploy      # Staging, production

Rules for Different Triggers

| Trigger | What Runs | Why |
|---|---|---|
| PR | Affected services + lint | Fast feedback |
| Main | All affected + integration | Gatekeeping before release |
| Tag | Full + security + publish | Release readiness |
| Nightly | Everything + slow tests | Complete regression |

Change Detection: Path Filters

The simplest way to speed up a pipeline: don’t do work that isn’t needed.

GitHub Actions

on:
  pull_request:
    paths:
      - 'services/order-service/**'
      - 'libs/common/**'
      - 'proto/**'

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      order-service: ${{ steps.filter.outputs.order-service }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            order-service:
              - 'services/order-service/**'
              - 'libs/common/**'

  build-order-service:
    needs: detect-changes
    if: needs.detect-changes.outputs.order-service == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run build --workspace=order-service

GitLab CI

order-service:build:
  rules:
    - changes:
        - services/order-service/**/*
        - libs/common/**/*
        - proto/**/*

Services Map

For more complex dependencies, create an explicit map:

# .ci/services-map.yml
services:
  order-service:
    path: services/order-service
    depends_on:
      - libs/common
      - libs/database
      - proto/order.proto
    tests:
      - tests/order-service
      - tests/integration/order

  payment-service:
    path: services/payment-service
    depends_on:
      - libs/common
      - libs/payment-sdk
      - proto/payment.proto
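
A minimal sketch of how such a map could be consumed, assuming the map is copied into a Python dictionary (to avoid a YAML dependency) and that the list of changed files comes from something like `git diff --name-only`. The function names `is_under` and `affected_services` are hypothetical, not part of any tool.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory copy of .ci/services-map.yml (entries taken from the map above).
SERVICES_MAP = {
    "order-service": {
        "path": "services/order-service",
        "depends_on": ["libs/common", "libs/database", "proto/order.proto"],
    },
    "payment-service": {
        "path": "services/payment-service",
        "depends_on": ["libs/common", "libs/payment-sdk", "proto/payment.proto"],
    },
}


def is_under(changed_file: str, root: str) -> bool:
    """True if changed_file equals root or lies inside the directory root."""
    changed = PurePosixPath(changed_file)
    base = PurePosixPath(root)
    return changed == base or base in changed.parents


def affected_services(changed_files: list[str]) -> list[str]:
    """Return the sorted names of services touched by any changed file."""
    hits = set()
    for name, entry in SERVICES_MAP.items():
        roots = [entry["path"], *entry["depends_on"]]
        if any(is_under(f, root) for f in changed_files for root in roots):
            hits.add(name)
    return sorted(hits)
```

Matching on path components rather than on string prefixes keeps a change in a sibling directory such as `libs/commonx` from triggering services that depend on `libs/common`.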

Cache Strategy

Cache is the biggest lever for speed. But it has its pitfalls.

Types of Cache

| Type | What We Cache | When to Invalidate |
|---|---|---|
| Dependency | node_modules, .m2, pip | lockfile change |
| Build | compiled artifacts | source change |
| Docker layers | base images | Dockerfile change |
| Test | test fixtures | test data change |

GitHub Actions Example

- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      node_modules
    key: deps-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      deps-

- name: Cache build
  uses: actions/cache@v4
  with:
    path: dist
    key: build-${{ github.sha }}
    restore-keys: |
      build-${{ github.event.pull_request.base.sha }}
      build-
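
To make the invalidation rule concrete, here is a rough sketch of a content-derived cache key. It follows the same idea as `hashFiles` (hash each file, then hash the hashes) but is not meant to reproduce its exact output. The function name `cache_key` is hypothetical.

```python
import hashlib


def cache_key(prefix: str, lockfile_contents: list[bytes]) -> str:
    """Build a key like 'deps-<sha256>' from the contents of every lockfile."""
    digest = hashlib.sha256()
    for content in lockfile_contents:
        # Hash a per-file digest so file boundaries affect the result.
        digest.update(hashlib.sha256(content).digest())
    return f"{prefix}-{digest.hexdigest()}"
```

The key stays identical as long as the lockfiles are byte-for-byte unchanged, so source-only commits keep hitting the same dependency cache.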

Remote Cache for Build Tools

For larger projects, local cache isn’t enough. Use remote cache:

Gradle:

// settings.gradle.kts
buildCache {
    remote<HttpBuildCache> {
        url = uri("https://cache.mycompany.com/cache/")
        isPush = System.getenv("CI") != null
    }
}

Bazel:

# .bazelrc
build --remote_cache=grpcs://cache.mycompany.com
build --remote_upload_local_results=true

Turborepo:

{
  "remoteCache": {
    "teamId": "my-team",
    "signature": true
  }
}

Parallelization Without Cost Explosion

More parallel jobs = faster. But also more expensive. How do you find the balance?

Test Sharding

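test:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    # --shard=N/M is supported by Jest 28+
    - run: npm test -- --shard=${{ matrix.shard }}/4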
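
For test runners without built-in sharding, the split can be approximated with a stable hash of the file name. This is an illustrative sketch only; it is not how Jest assigns shards, and `shard_for` and `select_shard` are hypothetical names.

```python
import zlib


def shard_for(test_file: str, total_shards: int) -> int:
    """Map a test file to a 1-based shard index that is stable across runs."""
    # crc32 is deterministic, unlike the built-in hash() for strings.
    return zlib.crc32(test_file.encode("utf-8")) % total_shards + 1


def select_shard(test_files: list[str], shard: int, total_shards: int) -> list[str]:
    """Return the files that belong to the given shard."""
    return [f for f in test_files if shard_for(f, total_shards) == shard]
```

Hash-based assignment balances by file count, not by runtime. If a few slow files dominate, a timing-aware split would likely work better.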

Fail Fast

strategy:
  fail-fast: true  # Cancels the remaining matrix jobs on first failure (the default)
  matrix:
    service: [order, payment, inventory]

Limit Concurrency

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # Cancels older runs of the same branch

Selective Testing

Not all tests need to run every time.

Minimal Model

Change in service A → Run:
  - Unit tests A
  - Integration tests A
  - Contract tests A ↔ dependencies

Medium Model (Dependency Graph)

# Script that analyzes imports/dependencies
- name: Detect affected tests
  run: |
    ./scripts/affected-tests.sh > affected.txt

- name: Run affected tests
  run: |
    # GNU xargs: skip the run when affected.txt is empty instead of running every test
    xargs --no-run-if-empty npm test -- < affected.txt
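
What a script like `affected-tests.sh` has to compute can be sketched as a walk over the reversed dependency graph. The graph below is a hypothetical hand-written example; in practice it would be derived from imports or build metadata.

```python
from collections import deque

# Hypothetical import graph: module -> modules it imports.
IMPORTS = {
    "tests/order-service": ["services/order-service"],
    "tests/integration/order": ["services/order-service", "libs/database"],
    "services/order-service": ["libs/common", "libs/database"],
    "services/payment-service": ["libs/common"],
    "libs/common": [],
    "libs/database": [],
}


def affected_tests(changed: list[str]) -> list[str]:
    """Return test modules that depend, directly or transitively, on a changed module."""
    # Invert the graph: module -> modules that import it.
    dependents: dict[str, list[str]] = {}
    for module, deps in IMPORTS.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(m for m in seen if m.startswith("tests/"))
```

A change that matches nothing in the graph yields an empty list, which is why the pipeline step should handle the "no affected tests" case explicitly.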

Test Impact Analysis

For enterprise projects, dedicated tools exist:

  • Launchable - ML-based test selection
  • Gradle Test Retry - automatic retry of failed tests (mitigates flakiness)
  • Jest --changedSince - built-in for JS

Quality Gates

What to block, what to just report?

| Check | PR Block | Main Block | Comment |
|---|---|---|---|
| Lint | Yes | Yes | Fast, unambiguous |
| Unit tests | Yes | Yes | Basic functionality |
| Integration | No (report) | Yes | Can be flaky |
| Coverage drop | No (report) | Yes | Trend matters more |
| Security HIGH | Yes | Yes | Critical |
| Security MED | No (report) | No | Triage |
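
The policy above can be kept as data so that PR and main pipelines share one source of truth. A small sketch, with the hypothetical names `GATES` and `gate_result`:

```python
# (blocks on PR, blocks on main) for each check, transcribed from the table above.
GATES = {
    "lint": (True, True),
    "unit": (True, True),
    "integration": (False, True),
    "coverage_drop": (False, True),
    "security_high": (True, True),
    "security_med": (False, False),
}


def gate_result(check: str, passed: bool, target: str) -> str:
    """Return 'pass', 'report', or 'block' for a check on 'pr' or 'main'."""
    if passed:
        return "pass"
    blocks_pr, blocks_main = GATES[check]
    blocking = blocks_pr if target == "pr" else blocks_main
    return "block" if blocking else "report"
```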

Practical Implementation

- name: Check coverage
  run: |
    COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}')
    THRESHOLD=80
    if (( $(echo "$COVERAGE < $THRESHOLD" | bc -l) )); then
      echo "::warning::Coverage $COVERAGE% is below $THRESHOLD%"
      # Don't block, just warning
    fi
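
Parsing the console table is fragile. An alternative sketch reads the machine-readable summary instead, assuming the test run is configured with the `json-summary` coverage reporter so that a `coverage-summary.json` file exists. The function name `coverage_message` is hypothetical.

```python
import json
from typing import Optional


def coverage_message(summary_json: str, threshold: float = 80.0) -> Optional[str]:
    """Return a GitHub Actions warning line if line coverage is below threshold."""
    pct = json.loads(summary_json)["total"]["lines"]["pct"]
    if pct < threshold:
        return f"::warning::Coverage {pct}% is below {threshold}%"
    return None
```

Printing the returned line from a step produces the same non-blocking warning annotation as the shell version.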

Security in Pipeline (Without Theater)

Security scanning is useful only if you do something with the results.

When to Run What

| Type | When | Block? |
|---|---|---|
| SAST (Semgrep, CodeQL) | Every PR | HIGH = yes |
| Dependency scan | Every PR + nightly | Critical = yes |
| DAST | Nightly | No, triage |
| Container scan | Before push to registry | HIGH = yes |

SBOM (Software Bill of Materials)

SBOM is a list of all components in your software. Required for supply-chain security.

- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    format: spdx-json
    output-file: sbom.spdx.json

- name: Upload SBOM
  uses: actions/upload-artifact@v4
  with:
    name: sbom
    path: sbom.spdx.json
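
An SBOM only helps if something reads it. As a sketch, the generated file can be turned into a component list for review or diffing between releases. This assumes an SPDX 2.x JSON document with a top-level `packages` array; `list_components` is a hypothetical helper.

```python
import json


def list_components(spdx_json: str) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from an SPDX 2.x JSON document."""
    doc = json.loads(spdx_json)
    return sorted(
        (pkg["name"], pkg.get("versionInfo", "unknown"))
        for pkg in doc.get("packages", [])
    )
```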

SLSA (Supply-chain Levels for Software Artifacts)

SLSA defines security levels for the build process:

  • Level 1: Documented build
  • Level 2: Hosted build service
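  • Level 3: Hardened build (what you want to achieve)

provenance:  # reusable workflow: call it as a job, not as a step
  permissions:
    id-token: write   # sign the provenance
    contents: write   # upload release assets
    actions: read     # read the workflow path
  uses: slsa-framework/slsa-github-generator/.github/workflows/builder_go_slsa3.yml@v2.0.0  # pin a full release tag
  with:
    go-version: "1.21"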

Artifact Signing

- name: Sign container image
  run: |
    # --yes skips the interactive confirmation prompt (cosign 2.x)
    cosign sign --yes --key env://COSIGN_PRIVATE_KEY \
      ${{ env.REGISTRY }}/${{ env.IMAGE }}:${{ github.sha }}

- name: Verify signature
  run: |
    cosign verify --key env://COSIGN_PUBLIC_KEY \
      ${{ env.REGISTRY }}/${{ env.IMAGE }}:${{ github.sha }}

Release Strategy

Semantic Versioning

- name: Determine version
  id: version
  uses: paulhatch/semantic-version@v5
  with:
    major_pattern: "BREAKING CHANGE:"
    minor_pattern: "feat:"

- name: Create release
  env:
    GH_TOKEN: ${{ github.token }}  # the gh CLI needs a token inside Actions
  run: |
    gh release create v${{ steps.version.outputs.version }} \
      --generate-notes
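
The bump rule behind the two patterns can be sketched in a few lines. This is a simplified illustration that assumes plain substring matching and a `major.minor.patch` input; the action's real logic may differ. `next_version` is a hypothetical name.

```python
def next_version(current: str, commit_messages: list[str]) -> str:
    """Bump major, minor, or patch based on markers found in commit messages."""
    major, minor, patch = (int(part) for part in current.split("."))
    if any("BREAKING CHANGE:" in msg for msg in commit_messages):
        return f"{major + 1}.0.0"
    if any("feat:" in msg for msg in commit_messages):
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```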

Changelog Automation

- name: Generate changelog
  uses: orhun/git-cliff-action@v2
  with:
    config: cliff.toml
    args: --latest
  env:
    OUTPUT: CHANGELOG.md

Pipeline Anti-patterns

What to avoid:

  1. “Build everything always” - 90% of time wasted
  2. Flaky tests without quarantine - erode CI trust
  3. Secrets in logs - set +x isn’t enough, use masking
  4. Mega-jobs - one 30 min job vs. 10 jobs of 3 min
  5. No job artifacts - debugging impossible
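
For point 3, GitHub Actions masks registered secrets automatically, but values derived at runtime (for example a token fetched from a vault) have to be registered with the `::add-mask::` workflow command. A small sketch, with hypothetical helper names; `redact` only illustrates the effect masking has on a log line.

```python
def mask_command(value: str) -> str:
    """Return the workflow command that asks the runner to mask a single-line value."""
    return f"::add-mask::{value}"


def redact(line: str, secrets: list[str]) -> str:
    """Replace every known secret in a log line with '***' (longest first)."""
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:
            line = line.replace(secret, "***")
    return line
```

In a workflow step, printing `mask_command(token)` before any other output that might contain the token is the important part.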

Reference Implementation

Minimal Pipeline (Starter)

# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      services: ${{ steps.detect.outputs.services }}
    steps:
      - uses: actions/checkout@v4
      - id: detect
        run: ./scripts/detect-changes.sh

  build-test:
    needs: detect
    if: needs.detect.outputs.services != '[]'  # an empty matrix fails the job
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: ${{ fromJson(needs.detect.outputs.services) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm  # npm ci deletes node_modules, so cache the download cache
          key: deps-${{ hashFiles('package-lock.json') }}
      - run: npm ci
      - run: npm run build --workspace=${{ matrix.service }}
      - run: npm test --workspace=${{ matrix.service }}
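
The `detect` job relies on the script publishing a `services` output as a JSON array. A sketch of that last step, assuming the script already knows the affected services and that the output file path comes from the `GITHUB_OUTPUT` environment variable set by the runner. `write_services_output` is a hypothetical name.

```python
import json


def write_services_output(services: list[str], output_path: str) -> str:
    """Append 'services=<json array>' to the step output file and return the line."""
    line = f"services={json.dumps(services)}"
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

An empty list is written as `services=[]`, which downstream jobs can test for before expanding the matrix.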

Metrics and Observability

What to measure:

| Metric | Source | Target |
|---|---|---|
| Lead time | CI timestamps | < 15 min |
| Pipeline duration | CI metrics | Decreasing trend |
| Flakiness rate | Test reruns | < 1% |
| Change failure rate | Rollback count | < 5% |
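
The rates in the table are simple ratios. A sketch of how they might be computed from raw counts, under the assumption that a test counts as flaky when it failed and then passed on rerun. All function names are hypothetical.

```python
from statistics import median


def change_failure_rate(deploys: int, rollbacks: int) -> float:
    """Share of deployments that needed a rollback, as a percentage."""
    return 0.0 if deploys == 0 else 100.0 * rollbacks / deploys


def flakiness_rate(test_runs: int, passed_on_rerun: int) -> float:
    """Share of test runs that only passed after a rerun, as a percentage."""
    return 0.0 if test_runs == 0 else 100.0 * passed_on_rerun / test_runs


def typical_duration(durations_min: list[float]) -> float:
    """Median pipeline duration in minutes; less sensitive to outliers than the mean."""
    return median(durations_min)
```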

Practical Dashboard

- name: Report metrics
  run: |
    curl -X POST https://metrics.mycompany.com/ci \
      -d "pipeline_duration=${{ steps.timer.outputs.duration }}" \
      -d "tests_passed=${{ steps.tests.outputs.passed }}" \
      -d "tests_failed=${{ steps.tests.outputs.failed }}"

Conclusion: 10 Things to Improve by Tomorrow

  1. Add path filters - fastest speedup
  2. Enable dependency cache
  3. Keep fail-fast: true for matrix jobs (it is the GitHub Actions default)
  4. Add concurrency with cancel-in-progress
  5. Split mega-jobs into smaller ones
  6. Enable SAST at least for HIGH findings
  7. Generate SBOM
  8. Set up secrets masking
  9. Add timing metrics
  10. Document pipeline in README

Your next step: Measure your current pipeline time. Implement path filters. Measure again. You’ll see the difference within an hour.

Frequently Asked Questions (FAQ)

Is monorepo the right choice for our team?

A monorepo works best when you have shared libraries between services or need atomic changes across multiple components. If your services are completely independent, a polyrepo might be simpler.

How much does remote build cache cost?

It depends on the provider. Self-hosted solutions (e.g., Gradle Enterprise, now called Develocity) cost around $50-100k USD/year. Cloud solutions (Turborepo Cloud, Nx Cloud) have free tiers and you pay for usage.

How to handle flaky tests?

Implement quarantine - automatically move flaky tests to a “quarantine” suite that runs nightly but doesn’t block PRs. Set up alerts for new flaky tests and fix them within 48 hours.
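
One way to find quarantine candidates automatically is to look for tests that both passed and failed on the same commit, since the code did not change between those runs. A minimal sketch, assuming test results are available as `(test_name, commit_sha, passed)` tuples; `flaky_tests` is a hypothetical name.

```python
from collections import defaultdict


def flaky_tests(results: list[tuple[str, str, bool]]) -> list[str]:
    """Return tests that both passed and failed on the same commit.

    Each result is (test_name, commit_sha, passed).
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, sha, passed in results:
        outcomes[(test, sha)].add(passed)
    return sorted({test for (test, _), seen in outcomes.items() if len(seen) == 2})
```

A test that fails consistently on one commit is not reported, because that looks like a real regression and not flakiness.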

What is SLSA and do I need it?

SLSA is a framework for supply-chain security. Level 3 is recommended for production software. Yes, you need it - supply-chain attacks are becoming increasingly common.



Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "CI/CD for Monorepo: Speed, Caching, Selective Tests and Supply-Chain Security". https://www.michal-drozd.com/en/blog/cicd-monorepo/ (Published October 4, 2025).