Progressive Delivery Testing: Argo Rollouts, Flagger & Canary Analysis Templates

Progressive delivery routes a small percentage of traffic to a new version, analyzes metrics, and then either advances the rollout or rolls back automatically. Because the automation takes over decisions that a human would otherwise make at deploy time, the testing focus shifts to validating that the automation itself is configured correctly.

Progressive Delivery vs Blue-Green

Blue-green is binary: old or new, 0% or 100%. Progressive delivery (canary) is continuous: 5% → 20% → 50% → 100%, with automated analysis at each step.

The testing challenge for progressive delivery is distinct: you're not just testing the application, you're testing the delivery system's decision-making. Does it advance when metrics are good? Does it roll back when error rates spike? Do the analysis templates correctly interpret your observability data?

Argo Rollouts

Argo Rollouts provides a Rollout resource, a drop-in replacement for the Kubernetes Deployment that adds progressive delivery strategies such as canary and blue-green.

Installation

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts \
  -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install kubectl plugin
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

Canary Rollout Definition

# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi

  strategy:
    canary:
      # Traffic routing (requires Istio or similar)
      trafficRouting:
        istio:
          virtualService:
            name: api-service-vsvc
          destinationRule:
            name: api-service-destrule
            canarySubsetName: canary
            stableSubsetName: stable

      # Background analysis: runs continuously alongside the steps below
      analysis:
        templates:
          - templateName: error-rate-analysis
          - templateName: latency-analysis
        args:
          - name: service-name
            value: api-service

      steps:
        - setWeight: 5          # 5% to canary
        - pause: {duration: 5m} # Wait 5 minutes
        - analysis:             # Inline analysis blocks the rollout until it finishes
            templates:
              - templateName: error-rate-analysis
            args:               # Inline analyses take their own args
              - name: service-name
                value: api-service
        - setWeight: 20
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: error-rate-analysis
              - templateName: latency-analysis
            args:
              - name: service-name
                value: api-service
        - setWeight: 50
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: error-rate-analysis
              - templateName: latency-analysis
            args:
              - name: service-name
                value: api-service
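
One way to sanity-check a step list before applying it is to reduce it to the traffic schedule it implies. The following is a minimal sketch and not part of Argo Rollouts: `parseDuration` and `summarizeSchedule` are hypothetical helpers, the steps array mirrors the YAML above as a plain object, and inline analysis steps are skipped because their run time depends on the templates.

```javascript
// Hypothetical helper: converts a simple duration such as "30s", "5m" or "1h"
// (or a bare number of seconds) into seconds.
function parseDuration(value) {
  if (typeof value === 'number') return value;
  const match = /^(\d+)([smh])$/.exec(String(value));
  if (!match) throw new Error(`Unsupported duration: ${value}`);
  const unitSeconds = { s: 1, m: 60, h: 3600 };
  return Number(match[1]) * unitSeconds[match[2]];
}

// Hypothetical helper: lists each canary weight together with the total
// pause time spent at that weight. Inline analysis steps are skipped.
function summarizeSchedule(steps) {
  const schedule = [];
  for (const step of steps) {
    if (step.setWeight !== undefined) {
      schedule.push({ weight: step.setWeight, pauseSeconds: 0 });
    } else if (step.pause && step.pause.duration !== undefined && schedule.length > 0) {
      schedule[schedule.length - 1].pauseSeconds += parseDuration(step.pause.duration);
    }
  }
  return schedule;
}

// The steps from the Rollout above, written as a plain object.
const steps = [
  { setWeight: 5 }, { pause: { duration: '5m' } }, { analysis: {} },
  { setWeight: 20 }, { pause: { duration: '10m' } }, { analysis: {} },
  { setWeight: 50 }, { pause: { duration: '10m' } }, { analysis: {} },
];

for (const { weight, pauseSeconds } of summarizeSchedule(steps)) {
  console.log(`${weight}% canary, paused for ${pauseSeconds / 60} minutes`);
}
```

Printing the schedule next to the YAML makes it easier to spot a missing pause or a weight that jumps further than intended.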

Analysis Templates

Analysis templates define the metrics that determine whether a rollout advances or rolls back.

# analysis-template-error-rate.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-analysis
  namespace: production
spec:
  args:
    - name: service-name

  metrics:
    - name: error-rate
      interval: 1m            # Query every minute
      count: 5                # Run 5 times total
      successCondition: result[0] < 0.05   # Pass if error rate < 5%
      failureLimit: 2         # Allow 2 failures before aborting rollout
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"5.."
            }[2m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))

    - name: canary-error-rate   # Measures the canary only; stable is not queried
      interval: 1m
      count: 5
      successCondition: result[0] < 0.02  # Canary error rate < 2%
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              version="canary",
              status=~"5.."
            }[2m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              version="canary"
            }[2m]))

# analysis-template-latency.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-analysis
  namespace: production
spec:
  args:
    - name: service-name

  metrics:
    - name: p99-latency
      interval: 1m
      count: 5
      successCondition: result[0] < 0.5   # P99 latency < 500ms
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{
                service="{{args.service-name}}",
                version="canary"
              }[2m])) by (le)
            )
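
The interaction between count, successCondition, and failureLimit is easy to misread, so it can help to model it before trusting a template. The sketch below is a simplified model based on the comments in the templates above, not the controller's code: `evaluateMetric` is a hypothetical function, it only handles a `result < threshold` condition, and it ignores error and inconclusive measurements.

```javascript
// Hypothetical, simplified model of one metric in an analysis run.
// values: one measured value per interval, in order.
function evaluateMetric(values, { threshold, count, failureLimit = 0 }) {
  let failed = 0;
  const taken = values.slice(0, count);
  for (let i = 0; i < taken.length; i++) {
    // NaN < threshold is false, so a NaN measurement counts as a failure here.
    if (!(taken[i] < threshold)) failed += 1;
    // The metric fails as soon as the failures exceed the limit.
    if (failed > failureLimit) {
      return { phase: 'Failed', measurements: i + 1, failed };
    }
  }
  const phase = taken.length === count ? 'Successful' : 'Running';
  return { phase, measurements: taken.length, failed };
}

const errorRate = { threshold: 0.05, count: 5, failureLimit: 2 };

// Two bad measurements are tolerated with failureLimit: 2.
console.log(evaluateMetric([0.01, 0.06, 0.07, 0.01, 0.01], errorRate));

// A third bad measurement fails the metric before all five have run.
console.log(evaluateMetric([0.06, 0.07, 0.08, 0.01, 0.01], errorRate));
```

Feeding the model a few hand-written series is a cheap way to confirm that the limits chosen in a template match the tolerance you actually intend.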

Testing Analysis Templates

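Analysis templates must be tested independently. A misconfigured PromQL query does not fail loudly: it returns an empty result or NaN, and the outcome then depends on how the condition is written rather than on the canary's health. Depending on the condition and the controller version, that can surface as an error, an inconclusive measurement, or a pass.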

#!/bin/bash
# test-analysis-templates.sh

PROMETHEUS_URL="http://prometheus.monitoring.svc.cluster.local:9090"
SERVICE="api-service"

echo "Testing analysis template PromQL queries..."

# Test error rate query
ERROR_RATE=$(curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode "query=sum(rate(http_requests_total{service=\"$SERVICE\",status=~\"5..\"}[2m]))/sum(rate(http_requests_total{service=\"$SERVICE\"}[2m]))" \
  | jq -r '.data.result[0].value[1]')

echo "Current error rate: $ERROR_RATE"
[ "$ERROR_RATE" = "null" ] && echo "WARNING: Query returned no data; check metric labels"

# Test latency query
P99=$(curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode "query=histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service=\"$SERVICE\"}[2m])) by (le))" \
  | jq -r '.data.result[0].value[1]')

echo "Current P99 latency: ${P99}s"
[ "$P99" = "null" ] && echo "WARNING: Latency query returned no data; histogram_quantile requires _bucket metrics"

echo ""
echo "Test the success conditions manually:"
echo "  Error rate < 0.05: $(echo "$ERROR_RATE < 0.05" | bc 2>/dev/null)"
echo "  P99 latency < 0.5s: $(echo "$P99 < 0.5" | bc 2>/dev/null)"
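
The script above only flags the empty-result case. A ratio query whose denominator is zero returns the string "NaN" instead, which is worth telling apart from both a real value and no data. The following sketch classifies a response body from the Prometheus `/api/v1/query` endpoint; `classifyInstantQuery` is a hypothetical helper and the sample bodies are made up for illustration.

```javascript
// Hypothetical helper: classifies the JSON body of an instant query.
function classifyInstantQuery(body) {
  if (!body || body.status !== 'success') return { kind: 'error' };
  const series = body.data && body.data.result;
  if (!Array.isArray(series) || series.length === 0) return { kind: 'no-data' };
  // Instant vectors carry [timestamp, "value"]; the value is a string.
  const value = Number(series[0].value[1]);
  // "NaN", "+Inf" and "-Inf" all convert to a non-finite number here.
  if (!Number.isFinite(value)) return { kind: 'not-finite' };
  return { kind: 'ok', value };
}

// Illustrative bodies, not captured from a real server.
const emptyBody = { status: 'success', data: { resultType: 'vector', result: [] } };
const nanBody = {
  status: 'success',
  data: { resultType: 'vector', result: [{ metric: {}, value: [1700000000, 'NaN'] }] },
};
const okBody = {
  status: 'success',
  data: { resultType: 'vector', result: [{ metric: {}, value: [1700000000, '0.012'] }] },
};

for (const body of [emptyBody, nanBody, okBody]) {
  console.log(classifyInstantQuery(body));
}
```

Treating "no-data" and "not-finite" as test failures in a pre-deployment check keeps a broken query from reaching a live rollout.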

Canary Weight Step Tests

Test that traffic routing correctly shifts at each canary step:

#!/bin/bash
# test-canary-weights.sh
# Assumes a new revision has already been applied and the rollout is progressing

STABLE_URL="http://api-service-stable.production.svc.cluster.local"
CANARY_URL="http://api-service-canary.production.svc.cluster.local"
PROXY_URL="http://api-service.production.svc.cluster.local"

# After setWeight: 5, expect ~5% of requests hit canary
# Sample 1000 requests and count distribution
test_traffic_weight() {
  local expected_canary_pct="${1:-5}"
  local sample_size="${2:-1000}"
  local canary_count=0
  local canary_version version actual_pct expected_low expected_high

  echo "Testing traffic split (expected ~${expected_canary_pct}% canary)..."

  # Look up the canary's version once instead of on every iteration
  canary_version=$(curl -s "$CANARY_URL/version" | jq -r '.version')

  for i in $(seq 1 "$sample_size"); do
    version=$(curl -s "$PROXY_URL/version" | jq -r '.version')
    [ "$version" = "$canary_version" ] && canary_count=$((canary_count + 1))
  done

  actual_pct=$(echo "scale=1; $canary_count * 100 / $sample_size" | bc)
  expected_low=$(echo "$expected_canary_pct - 3" | bc)
  expected_high=$(echo "$expected_canary_pct + 3" | bc)

  echo "Canary traffic: $actual_pct% ($canary_count/$sample_size)"

  if (( $(echo "$actual_pct >= $expected_low" | bc -l) )) && \
     (( $(echo "$actual_pct <= $expected_high" | bc -l) )); then
    echo "PASS: Within ±3% of expected ${expected_canary_pct}%"
  else
    echo "FAIL: Expected ${expected_canary_pct}% ±3%, got ${actual_pct}%"
    return 1
  fi
}

# Block until the rollout reaches the given 0-based step index
# (.status.currentStepIndex) so traffic is sampled at a known weight
wait_for_step() {
  local step="$1"
  until [ "$(kubectl get rollout api-service -n production \
      -o jsonpath='{.status.currentStepIndex}')" = "$step" ]; do
    sleep 5
  done
}

# Watch rollout progress in the background while the tests run
kubectl argo rollouts get rollout api-service -n production --watch &
WATCH_PID=$!

# Test at the 5% step (step 1 is the pause after setWeight: 5)
wait_for_step 1
test_traffic_weight 5

# Test at the 20% step (step 4 is the pause after setWeight: 20)
wait_for_step 4
test_traffic_weight 20

kill "$WATCH_PID"
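
The fixed ±3% band in the script does not account for how sampling noise changes with the target weight. As a rough sketch, assuming each request is routed independently, the expected spread follows the binomial standard error. `samplingTolerancePct` is a hypothetical helper, and accepting three standard errors is an assumption rather than a recommendation from either tool.

```javascript
// Hypothetical helper: half-width of the acceptance band, in percentage
// points, for a traffic split measured over sampleSize requests.
function samplingTolerancePct(expectedPct, sampleSize, z = 3) {
  const p = expectedPct / 100;
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  return z * standardError * 100;
}

for (const weight of [5, 20, 50]) {
  const tolerance = samplingTolerancePct(weight, 1000);
  console.log(`${weight}% weight: accept ${weight} ± ${tolerance.toFixed(1)} points`);
}
```

With 1000 samples the band is about 2 points at 5% weight and close to 5 points at 50%, so a fixed ±3% is loose for early steps and prone to false failures at later ones.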

Flagger

Flagger is an alternative progressive delivery controller that integrates with service meshes (Istio, Linkerd, App Mesh) and ingress controllers.

Installation with Istio

helm repo add flagger https://flagger.app
helm repo update

helm install flagger flagger/flagger \
  --namespace istio-system \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus.monitoring.svc.cluster.local:9090

Canary Resource

# canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: api-service
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service

  service:
    port: 80
    targetPort: 8080
    istio:
      timeout: 30s
      retries:
        attempts: 3
        perTryTimeout: 10s

  analysis:
    interval: 1m            # Analyze every minute
    threshold: 5            # Roll back after 5 failed checks
    maxWeight: 50           # Max 50% canary traffic
    stepWeight: 10          # Increase by 10% each step

    metrics:
      # Built-in metrics
      - name: request-success-rate
        thresholdRange:
          min: 99             # Check fails if success rate drops below 99%
        interval: 1m

      - name: request-duration
        thresholdRange:
          max: 500            # Check fails if P99 exceeds 500ms
        interval: 1m

      # Custom metric from Prometheus
      - name: custom-error-rate
        thresholdRange:
          max: 2              # Check fails if error rate exceeds 2%
        interval: 30s
        templateRef:
          name: error-rate
          namespace: flagger-system

    # Load testing during analysis
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://api-service.production/"
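
The analysis settings determine how long a promotion or a rollback takes, which matters when choosing timeouts for test scripts. The Flagger documentation gives the minimum promotion time as interval × (maxWeight / stepWeight) and the rollback time as interval × threshold. A minimal sketch of that arithmetic follows, with `canaryTiming` as a hypothetical helper:

```javascript
// Hypothetical helper: applies the two timing formulas to an analysis block.
// intervalSeconds is the analysis interval converted to seconds.
function canaryTiming({ intervalSeconds, threshold, maxWeight, stepWeight }) {
  return {
    minPromotionSeconds: intervalSeconds * (maxWeight / stepWeight),
    rollbackSeconds: intervalSeconds * threshold,
  };
}

// Values from the Canary resource above: interval 1m, threshold 5,
// maxWeight 50, stepWeight 10.
const timing = canaryTiming({
  intervalSeconds: 60,
  threshold: 5,
  maxWeight: 50,
  stepWeight: 10,
});

console.log(`Minimum time to promote: ${timing.minPromotionSeconds / 60} minutes`);
console.log(`Time to roll back: ${timing.rollbackSeconds / 60} minutes`);
```

Any script that waits for a rollback should allow more than the computed rollback time, since checks only run once per interval.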

Flagger Metric Templates

# metric-template.yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring.svc.cluster.local:9090
  query: |
    100 - sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          service="{{ target }}",
          status!~"5.."
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          service="{{ target }}"
        }[{{ interval }}]
      )
    )
    * 100
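
Because the query contains template variables, it cannot be pasted into Prometheus as written. One option is to substitute the variables first and then run the rendered query by hand. The sketch below does a plain string substitution; it is not Flagger's template engine, `renderQuery` is a hypothetical helper, and the values passed in are examples.

```javascript
// Hypothetical helper: replaces {{ name }} placeholders with supplied values
// and throws on any placeholder that has no value.
function renderQuery(template, vars) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => {
    if (!(name in vars)) {
      throw new Error(`No value for template variable: ${name}`);
    }
    return String(vars[name]);
  });
}

// A shortened version of the denominator from the template above.
const queryTemplate =
  'sum(rate(http_requests_total{namespace="{{ namespace }}",service="{{ target }}"}[{{ interval }}]))';

console.log(
  renderQuery(queryTemplate, {
    namespace: 'production',
    target: 'api-service',
    interval: '1m',
  })
);
```

Throwing on an unknown placeholder catches typos such as `{{ taget }}` before the query is ever sent.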

Testing Flagger Rollback Triggers

To verify Flagger correctly detects failures and rolls back:

#!/bin/bash
# test-flagger-rollback.sh

NAMESPACE="production"
CANARY_NAME="api-service"
PROXY_URL="http://api-service.production.svc.cluster.local"

# 1. Trigger a canary deployment
kubectl set image deployment/api-service \
  api-service=my-registry/api-service:faulty-version \
  -n $NAMESPACE

# 2. Wait for Flagger to detect the new revision and start the analysis
#    (--for=jsonpath requires kubectl 1.23 or newer)
echo "Waiting for canary to initialize..."
kubectl wait --for=jsonpath='{.status.phase}'=Progressing \
  canary/$CANARY_NAME -n $NAMESPACE \
  --timeout=120s

# 3. Inject errors to trigger rollback
#    This rule only matches requests that carry the x-version: canary header,
#    so the traffic generator must send that header for the errors to show up
echo "Injecting errors into canary..."
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-service-fault-injection
  namespace: production
spec:
  hosts:
    - api-service
  http:
    - match:
        - headers:
            x-version:
              exact: canary
      fault:
        abort:
          percentage:
            value: 50
          httpStatus: 500
      route:
        - destination:
            host: api-service
EOF

# 4. Watch Flagger detect failures and roll back
#    With interval 1m and threshold 5, rollback takes at least 5 minutes,
#    so allow 10 minutes before giving up
echo "Monitoring Flagger canary status..."
timeout 600 bash -c '
while true; do
  STATUS=$(kubectl get canary api-service -n production \
    -o jsonpath="{.status.phase}")
  echo "$(date +%H:%M:%S) - Canary status: $STATUS"
  [ "$STATUS" = "Failed" ] && { echo "Rollback triggered!"; break; }
  [ "$STATUS" = "Succeeded" ] && { echo "WARNING: Should have failed but succeeded"; break; }
  sleep 15
done
'

# 5. Verify the primary (stable) workload still runs the previous image
ACTIVE_VERSION=$(kubectl get deployment "${CANARY_NAME}-primary" -n $NAMESPACE \
  -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "Active version after rollback: $ACTIVE_VERSION"

# 6. Cleanup fault injection
kubectl delete virtualservice api-service-fault-injection -n production

Integration Testing Canary Traffic

When running integration tests during a canary deployment, explicitly target either stable or canary:

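// test/canary-integration.test.js
const axios = require('axios');
const { matchers } = require('jest-json-schema');
// Assumption: the product list JSON schema is exported from a local module
const { productListSchema } = require('./schemas');

expect.extend(matchers);  // Registers toMatchSchema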

// Test stable version
describe('Stable version', () => {
  const stableClient = axios.create({
    baseURL: process.env.STABLE_URL,
    headers: { 'x-version': 'stable' }  // Istio header-based routing
  });

  it('returns expected response schema', async () => {
    const response = await stableClient.get('/api/products');
    expect(response.data).toMatchSchema(productListSchema);
  });
});

// Test canary version
describe('Canary version', () => {
  const canaryClient = axios.create({
    baseURL: process.env.CANARY_URL,
    headers: { 'x-version': 'canary' }
  });

  it('returns compatible response schema', async () => {
    const response = await canaryClient.get('/api/products');
    // Canary must be backwards-compatible with existing schema
    expect(response.data).toMatchSchema(productListSchema);
  });

  it('canary error rate is below threshold', async () => {
    const results = await Promise.all(
      Array(100).fill(null).map(() =>
        canaryClient.get('/api/products')
          .then(r => r.status)
          // No response at all (timeout, connection reset) is recorded as 0
          .catch(e => e.response?.status ?? 0)
      )
    );
    // Count 5xx responses and requests that never got a response
    const errors = results.filter(s => s === 0 || s >= 500).length;
    expect(errors / results.length).toBeLessThan(0.02);  // < 2% error rate
  });
});

Common Progressive Delivery Testing Mistakes

Testing only the happy path: verify that analysis templates actually block bad deployments by intentionally deploying a version that returns 5xx errors and confirming that rollback occurs.

Unreachable Prometheus: when a query cannot be evaluated or returns no data, the controller does not necessarily treat it as a failed check, so a broken metrics pipeline can go unnoticed. Always test queries directly against Prometheus before relying on them.

Metric label mismatches: if your service emits app="api-service" but the analysis template queries service="api-service", the analysis returns no data. Verify that labels match exactly.

Missing version labels: canary analysis that compares canary against stable requires the two sets of pods to carry distinguishable labels. Verify that canary pods have version: canary or an equivalent.

Low traffic volume: error rate analysis is unreliable below roughly 10 RPS because the percentage is computed from a tiny sample. Either run load tests during analysis or use absolute counts instead of rates for low-traffic services.

Analysis too aggressive: a single 500 error at 5% weight shouldn't abort a rollout. Set failureLimit appropriately and use intervals longer than 30 seconds to avoid noise.
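
To make the low-traffic point concrete, it helps to count how many canary requests actually fall inside one query window. The following is a rough sketch that assumes traffic is spread evenly over time; `canaryWindowStats` is a hypothetical helper.

```javascript
// Hypothetical helper: estimates how many canary requests land in one
// rate window and how far a single failed request moves the error rate.
function canaryWindowStats(totalRps, canaryWeightPct, windowSeconds) {
  const requests = totalRps * (canaryWeightPct / 100) * windowSeconds;
  return { requests, errorRatePerFailure: 1 / requests };
}

// 10 RPS overall, 5% canary weight, and the 2-minute window used by the
// [2m] range in the analysis queries.
const stats = canaryWindowStats(10, 5, 120);

console.log(`${stats.requests.toFixed(0)} canary requests per window`);
console.log(`One failure adds ${(stats.errorRatePerFailure * 100).toFixed(2)} points of error rate`);
```

Under these assumptions the canary sees about 60 requests per window, so two failed requests are enough to cross a 2% threshold.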
