Progressive Delivery Testing with Flagger

HelpMeTest

22 May 2026

Progressive delivery is the practice of gradually rolling out changes to users while automatically verifying that the release meets your quality standards at each step. Flagger is a Kubernetes operator that automates this process — it manages canary deployments, traffic mirroring, and A/B testing with automated analysis and rollback.

Unlike Argo Rollouts, which replaces the Deployment with its own Rollout resource, Flagger wraps the Deployment you already have and works with whatever traffic routing layer is already in place: Istio, Linkerd, AWS App Mesh, Contour, or NGINX. This makes it easier to adopt without changing your existing service mesh or ingress setup.

How Flagger Works

Flagger watches Kubernetes Deployments that are referenced by a Canary resource. When you update the Deployment's pod spec, typically the image (the normal Kubernetes way), Flagger takes over the rollout and manages it progressively:

Scales up the updated Deployment as the canary, alongside the stable copy (my-app-primary) that Flagger maintains
Routes a small percentage of traffic to the canary
Runs automated metric analysis at each step
Increases canary traffic if analysis passes, or rolls back if it fails
Promotes the canary to stable and scales the canary pods down to zero when done

You interact with Flagger by updating your Deployment normally — kubectl set image or updating via GitOps. Flagger handles the rest.

Installation

# Install Flagger (with Istio)
helm repo add flagger https://flagger.app
helm repo update

# crd.create=false assumes the Flagger CRDs are already installed in the cluster
helm upgrade -i flagger flagger/flagger \
  --namespace istio-system \
  --set crd.create=false \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus:9090

For other mesh providers:

# Nginx
--set meshProvider=nginx

# Linkerd
--set meshProvider=linkerd

# AWS App Mesh
--set meshProvider=appmesh

Defining a Canary

The Flagger Canary CRD wraps your deployment and defines the progressive delivery strategy:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: default
spec:
  # Deployment to target
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  
  # Ingress/service mesh config
  service:
    port: 80
    targetPort: 8080

  # Progressive rollout config
  analysis:
    # Run analysis every 1 minute
    interval: 1m
    # How many failed checks before rollback
    threshold: 5
    # Max traffic weight for canary
    maxWeight: 50
    # How much to increase weight per step
    stepWeight: 10
    
    # Metrics to validate at each step
    metrics:
      - name: request-success-rate
        # Minimum 99% success rate
        thresholdRange:
          min: 99
        interval: 1m
      
      - name: request-duration
        # Max 500ms p99 latency
        thresholdRange:
          max: 500
        interval: 1m
    
    # Optional: webhooks for event notifications or pre/post rollout hooks
    webhooks:
      - name: send-events
        type: event
        # Placeholder URL: point this at a service that accepts Flagger's event payloads
        url: http://event-receiver.default/

When you deploy a new image, Flagger:

Routes 10% of traffic to the canary
Waits 1 minute, then runs the metric analysis
If the checks pass: raises the canary weight to 20% and repeats
If a check fails: holds the current weight and counts the failure; after 5 failed checks it rolls back to 0%
At 50% with checks passing: promotes the canary to stable
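
The progression above can be modeled in a few lines of Python. This is a simplified sketch of the behavior described in this section, not Flagger's controller code: the function name simulate_rollout and its outcome strings are made up for illustration, and it assumes a failed check holds the current weight rather than reducing it.

```python
from typing import Iterable, List, Tuple


def simulate_rollout(step_weight: int, max_weight: int, threshold: int,
                     checks: Iterable[bool]) -> Tuple[str, List[int]]:
    """Return (outcome, canary weights over time) for a sequence of check results."""
    weight = step_weight      # first step: route stepWeight% to the canary
    failed = 0                # failed checks counted against the threshold
    history = [weight]
    for passed in checks:
        if not passed:
            failed += 1
            if failed >= threshold:
                history.append(0)          # all traffic goes back to stable
                return "rolled_back", history
            continue                       # hold the weight and check again
        if weight >= max_weight:
            return "promoted", history     # passing check at maxWeight
        weight = min(weight + step_weight, max_weight)
        history.append(weight)
    return "in_progress", history


if __name__ == "__main__":
    # Settings from the Canary above: stepWeight 10, maxWeight 50, threshold 5
    print(simulate_rollout(10, 50, 5, [True] * 5))
```

With five passing checks the weights go 10, 20, 30, 40, 50 and the outcome is promoted; five failed checks at the first step end in a rollback.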

Metric Providers

Flagger supports multiple metric sources. Configure the one you use:

Prometheus

The default for most setups. Flagger includes built-in metric templates for success rate and latency:

metrics:
  - name: request-success-rate
    thresholdRange:
      min: 99
    interval: 1m
  - name: request-duration
    thresholdRange:
      max: 500
    interval: 1m

For custom metrics, define a MetricTemplate:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-rate
spec:
  provider:
    type: prometheus
    address: http://prometheus:9090
  query: |
    sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          service="{{ service }}",
          status="404"
        }[{{ interval }}]
      )
    ) /
    sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          service="{{ service }}"
        }[{{ interval }}]
      )
    ) * 100

Reference it in the Canary:

metrics:
  - name: not-found-rate
    templateRef:
      name: not-found-rate
    thresholdRange:
      max: 5  # Under 5% 404 rate
    interval: 1m
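
To make the arithmetic concrete, here is a small Python sketch of what the query and thresholdRange express together. The helper names not_found_rate and within_threshold are hypothetical, and the sketch assumes the min and max bounds are inclusive; it illustrates the check, not Flagger's implementation.

```python
from typing import Optional


def not_found_rate(not_found_rps: float, total_rps: float) -> float:
    """Percentage of requests answered with 404, same arithmetic as the query."""
    if total_rps <= 0:
        raise ValueError("no traffic in the interval, rate is undefined")
    return 100 * not_found_rps / total_rps


def within_threshold(value: float, min_value: Optional[float] = None,
                     max_value: Optional[float] = None) -> bool:
    """True when value respects the optional min and max bounds (inclusive)."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


if __name__ == "__main__":
    rate = not_found_rate(not_found_rps=2.0, total_rps=50.0)
    print(rate, within_threshold(rate, max_value=5))
```

At 2 requests per second returning 404 out of 50 in total, the rate is 4%, which stays under the max of 5 and so passes the check.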

Datadog

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
spec:
  provider:
    type: datadog
    address: https://api.datadoghq.com
    secretRef:
      name: datadog-credentials
  query: |
    sum:trace.web.request.errors{
      env:production,
      service:{{ service }}
    }.as_rate() /
    sum:trace.web.request.hits{
      env:production,
      service:{{ service }}
    }.as_rate() * 100

CloudWatch

spec:
  provider:
    type: cloudwatch
    region: us-east-1
  query: |
    [
      {
        "Id": "errors",
        "Expression": "SELECT SUM(HTTPCode_Target_5XX_Count) FROM SCHEMA(\"AWS/ApplicationELB\", LoadBalancer, TargetGroup)",
        "Period": 60
      }
    ]

Traffic Mirroring with Flagger

Instead of (or before) a canary, use traffic mirroring to test the new version with real traffic without affecting users:

spec:
  analysis:
    mirror: true
    mirrorWeight: 100  # Mirror 100% of traffic (shadow only)
    
    # Analysis still runs — checks shadow service metrics
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m

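In mirror mode, Flagger routes all real traffic to stable and copies it to the canary. The canary's response is discarded. Analysis runs on the canary's metrics (error rate, latency) without users seeing its responses. Two caveats: mirroring needs a routing layer that supports it (Istio does), and the canary still processes every mirrored request, so use it for requests that are idempotent or safe to handle twice.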

After mirroring validates the new version, switch to canary mode for the actual rollout.
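
The mirroring behavior can be pictured with a toy Python model. It is only a sketch of the idea: the names serve_with_mirror and success_rate are hypothetical, handlers are plain functions that return an HTTP status code, and a real mesh performs the copy at the proxy level rather than in application code.

```python
from typing import Callable, List

Handler = Callable[[str], int]  # takes a request path, returns an HTTP status


def serve_with_mirror(path: str, primary: Handler, canary: Handler,
                      canary_statuses: List[int]) -> int:
    """Answer from primary; send a shadow copy to the canary for metrics only."""
    status = primary(path)                     # the user only sees this response
    try:
        canary_statuses.append(canary(path))   # canary response is discarded
    except Exception:
        canary_statuses.append(500)            # a crash counts as a failed request
    return status


def success_rate(statuses: List[int]) -> float:
    """Percentage of non-5xx responses, the figure the analysis looks at."""
    ok = sum(1 for s in statuses if s < 500)
    return 100.0 * ok / len(statuses)


if __name__ == "__main__":
    shadow: List[int] = []
    stable = lambda path: 200
    buggy_canary = lambda path: 500 if path == "/bad" else 200
    seen = [serve_with_mirror(p, stable, buggy_canary, shadow)
            for p in ["/", "/a", "/bad", "/b"]]
    print(seen, success_rate(shadow))
```

Users receive only the stable responses, while the canary's failure on one of four mirrored requests shows up as a 75% success rate in the shadow metrics.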

A/B Testing with Flagger

For product A/B tests (not just deployment safety), use Flagger's A/B testing configuration:

spec:
  analysis:
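    # A/B testing: route based on HTTP headers or cookies (Istio match syntax)
    match:
      - headers:
          x-canary:
            exact: "true"
      # Cookies are matched through the cookie header
      - headers:
          cookie:
            regex: "^(.*?;)?(canary=always)(;.*)?$"
    
    # Number of analysis iterations to run before promotion
    iterations: 10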
    
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m

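Requests that carry the x-canary: true header or the canary=always cookie are routed to the new version; all other requests go to stable. Assignment is only as sticky as the header or cookie itself, so your application or edge has to set it consistently for each user in the test group.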
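
As an illustration of the routing rule, the Python sketch below decides whether a request would go to the canary. The function routes_to_canary is hypothetical and models the intent (header equals true, or a cookie named canary with value always); the mesh evaluates its own match rules, which may treat edge cases such as whitespace differently.

```python
from http.cookies import SimpleCookie
from typing import Mapping


def routes_to_canary(headers: Mapping[str, str]) -> bool:
    """True if either match rule applies: x-canary header or canary cookie."""
    # HTTP header names are case-insensitive, so normalize them first
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("x-canary") == "true":
        return True
    cookies = SimpleCookie()
    cookies.load(normalized.get("cookie", ""))
    morsel = cookies.get("canary")
    return morsel is not None and morsel.value == "always"


if __name__ == "__main__":
    print(routes_to_canary({"X-Canary": "true"}))
    print(routes_to_canary({"Cookie": "session=abc; canary=always"}))
    print(routes_to_canary({"Cookie": "canary=never"}))
```

The two rules are alternatives: a request needs to satisfy only one of them to reach the new version.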

Webhooks for Load Testing and Validation

Before or during analysis, run load tests or acceptance tests automatically:

spec:
  analysis:
    webhooks:
      # Pre-rollout: acceptance test
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.flagger-system/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://my-app-canary.default/api/health | grep OK"
      
      # During rollout: load test
      - name: load-test
        url: http://flagger-loadtester.flagger-system/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.default/"
      
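      # Event notifications, including rollbacks. Do not use type: rollback for this:
      # a rollback hook is a gate, and a 2xx response tells Flagger to roll back.
      - name: notify-events
        type: event
        # Placeholder URL: a service that accepts Flagger's JSON event payload
        url: http://event-receiver.default/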

The flagger-loadtester is a companion service that executes webhook commands. Deploy it:

helm upgrade -i flagger-loadtester flagger/loadtester \
  --namespace flagger-system \
  --create-namespace
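
The loadtester is one possible webhook receiver; any HTTP service can act as a gate, since Flagger judges a webhook by its response status (2xx means pass). The Python sketch below shows the decision logic of a hypothetical "deployment freeze" gate. It assumes the webhook body is JSON with name and namespace fields; the function gate_status and the frozen-namespace idea are illustrative, not part of Flagger.

```python
import json
from typing import Set


def gate_status(body: bytes, frozen_namespaces: Set[str]) -> int:
    """Return the HTTP status a pre-rollout gate would answer with."""
    try:
        payload = json.loads(body)
        namespace = payload["namespace"]
        payload["name"]                 # require the canary name as well
    except (ValueError, KeyError, TypeError):
        return 400                      # malformed payload: non-2xx, check fails
    if namespace in frozen_namespaces:
        return 403                      # rollout blocked while the freeze lasts
    return 200                          # 2xx: the rollout may proceed


if __name__ == "__main__":
    body = json.dumps({"name": "my-app", "namespace": "default",
                       "phase": "Progressing"}).encode()
    print(gate_status(body, frozen_namespaces={"payments"}))
```

Wrapping this function in an HTTP handler and pointing a pre-rollout webhook at it would hold back rollouts in the listed namespaces while letting others through.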

Monitoring Flagger Rollouts

Kubernetes Events

kubectl describe canary my-app
kubectl get events --field-selector reason=Synced --namespace default

Flagger Dashboard (Grafana)

Import the Flagger Grafana dashboard (ID 15828) to visualize:

Current canary weight
Success rate comparison (stable vs canary)
Latency comparison
Analysis pass/fail history

CLI Monitoring

# Watch rollout status
watch kubectl get canary my-app

# View Flagger logs (use the namespace Flagger was installed into;
# istio-system in the Istio install example above)
kubectl logs -n istio-system deploy/flagger -f

GitOps Integration

Flagger integrates naturally with GitOps workflows (Flux, ArgoCD). The typical flow:

Developer updates image tag in Git
Flux/ArgoCD applies the Deployment change to Kubernetes
Flagger detects the image change and initiates the canary rollout
Analysis runs automatically
Flagger records the outcome in Kubernetes events and in the Canary status, where GitOps tooling and alerts can observe it

With Flux specifically, use ImageUpdateAutomation to automatically commit new image tags to Git, which then triggers Flagger via the normal GitOps apply.

Rollback and Manual Promotion

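Manual Rollback

Flagger's manual rollback goes through a rollback webhook: a gate that Flagger polls during analysis and that triggers a rollback once it returns a 2xx status. With a type: rollback webhook pointing at http://flagger-loadtester.flagger-system/rollback/check, open the gate from inside the loadtester:

# Abort the current canary and roll back to stable
kubectl -n flagger-system exec deploy/flagger-loadtester -- \
  curl -s -d '{"name": "my-app", "namespace": "default"}' http://localhost:8080/rollback/open

Avoid deleting and recreating the Canary resource as a rollback mechanism: Flagger owns the primary Deployment and removes it along with the Canary, which can interrupt traffic unless revertOnDeletion is enabled.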

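Manual Promotion

If you want to skip automated analysis and promote immediately, set skipAnalysis on the Canary:

kubectl patch canary my-app --type merge -p '{"spec":{"skipAnalysis":true}}'

This is an escape hatch, so use it carefully: it bypasses the safety checks, and it stays in effect for later rollouts until you set skipAnalysis back to false.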

Key Differences from Argo Rollouts

Feature	Flagger	Argo Rollouts
CRD design	Wraps standard Deployments	Replaces Deployment with Rollout
Mesh support	Istio, Linkerd, App Mesh, NGINX, Contour, and others	NGINX, Istio, AWS ALB, and others
Trigger	Pod spec change in the Deployment	Pod spec change in the Rollout (applied manually or via ArgoCD sync)
GitOps	Native Flux integration	Native ArgoCD integration
Traffic mirroring	Built-in	Via service mesh
A/B testing	Built-in (header/cookie routing)	Limited
Learning curve	Lower (keeps Deployment)	Higher (new resource type)

If your team is on Flux + Istio/Linkerd, Flagger is the natural choice. If you're on ArgoCD with Nginx, Argo Rollouts fits better.

Getting Started

The fastest path to progressive delivery with Flagger:

Install Flagger for your mesh provider
Create a Canary resource for your highest-traffic, highest-risk service
Configure success-rate and latency metrics from your existing Prometheus
Make a small deployment change and watch Flagger manage the rollout

Once you see it automatically roll back a bad deployment, the investment in setup pays for itself.