Progressive Delivery Testing: Argo Rollouts, Flagger & Canary Analysis Templates
Progressive delivery automatically routes a small percentage of traffic to a new version, analyzes metrics, and either advances the rollout or triggers an automatic rollback. The system replaces human judgment in the deployment pipeline — which means the testing focus shifts to validating that the automation itself is correctly configured.
Progressive Delivery vs Blue-Green
Blue-green is binary: traffic goes entirely to the old version or entirely to the new one. Progressive delivery (canary) is incremental: 5% → 20% → 50% → 100%, with automated analysis at each step.
The testing challenge for progressive delivery is distinct: you're not just testing the application, you're testing the delivery system's decision-making. Does it advance when metrics are good? Does it roll back when error rates spike? Do the analysis templates correctly interpret your observability data?
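One way to make those questions testable is to write down the decision you expect for a given series of metric readings, then compare it with what the controller actually does. The sketch below is a deliberately simplified model, not the logic of any particular controller: `expectedDecision`, `maxErrorRate`, and `failureLimit` are illustrative names, and the rule assumed here is that a rollout is rolled back once the number of failed readings exceeds the limit.

```javascript
// Simplified oracle for rollout tests (illustrative model, not a controller's real logic).
// Assumption: the rollout is rolled back once failed readings exceed failureLimit.
function expectedDecision(errorRates, { maxErrorRate, failureLimit }) {
  let failures = 0;
  for (const rate of errorRates) {
    // A reading fails when it is not strictly below the threshold.
    // NaN (for example 0/0 with no traffic) also fails this comparison.
    if (!(rate < maxErrorRate)) {
      failures += 1;
    }
    if (failures > failureLimit) {
      return 'rollback';
    }
  }
  return 'advance';
}

// Scenarios a rollout test suite would want to cover.
const scenarios = [
  { name: 'healthy canary', rates: [0.01, 0.0, 0.02, 0.01, 0.0] },
  { name: 'single blip', rates: [0.01, 0.2, 0.01, 0.0, 0.01] },
  { name: 'sustained errors', rates: [0.2, 0.3, 0.25, 0.4, 0.3] },
];

for (const { name, rates } of scenarios) {
  const decision = expectedDecision(rates, { maxErrorRate: 0.05, failureLimit: 2 });
  console.log(`${name}: ${decision}`);
}
```

With these inputs the first two scenarios are expected to advance and the third to roll back. A real test would drive the cluster into each scenario and assert that the controller reaches the same verdict.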
Argo Rollouts
Argo Rollouts provides a Rollout custom resource that replaces the Kubernetes Deployment and adds progressive delivery strategies such as canary and blue-green.
Installation
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts \
-f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
# Install kubectl plugin
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
Canary Rollout Definition
# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
  strategy:
    canary:
      # Traffic routing (requires Istio or similar)
      trafficRouting:
        istio:
          virtualService:
            name: api-service-vsvc
          destinationRule:
            name: api-service-destrule
            canarySubsetName: canary
            stableSubsetName: stable
      # Background analysis: runs alongside the steps for the duration of the rollout
      analysis:
        templates:
          - templateName: error-rate-analysis
          - templateName: latency-analysis
        args:
          - name: service-name
            value: api-service
      steps:
        - setWeight: 5              # 5% to canary
        - pause: {duration: 5m}     # Wait 5 minutes
        - analysis:                 # Inline analysis: the step blocks until it completes
            templates:
              - templateName: error-rate-analysis
            args:                   # The templates declare service-name without a default
              - name: service-name
                value: api-service
        - setWeight: 20
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: error-rate-analysis
              - templateName: latency-analysis
            args:
              - name: service-name
                value: api-service
        - setWeight: 50
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: error-rate-analysis
              - templateName: latency-analysis
            args:
              - name: service-name
                value: api-service
Analysis Templates
Analysis templates define the metrics that determine whether a rollout advances or rolls back.
# analysis-template-error-rate.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-analysis
  namespace: production
spec:
  args:
    - name: service-name
  metrics:
    - name: error-rate
      interval: 1m                          # Query every minute
      count: 5                              # Run 5 times total
      successCondition: result[0] < 0.05    # Pass if error rate < 5%
      failureLimit: 2                       # Allow 2 failures before aborting rollout
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"5.."
            }[2m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))
    # Note: this metric checks the canary's own error rate against a fixed limit
    - name: canary-error-rate-vs-stable
      interval: 1m
      count: 5
      successCondition: result[0] < 0.02    # Canary error rate < 2%
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              version="canary",
              status=~"5.."
            }[2m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              version="canary"
            }[2m]))
# analysis-template-latency.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-analysis
  namespace: production
spec:
  args:
    - name: service-name
  metrics:
    - name: p99-latency
      interval: 1m
      count: 5
      successCondition: result[0] < 0.5     # P99 latency < 500ms
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{
                service="{{args.service-name}}",
                version="canary"
              }[2m])) by (le)
            )
Testing Analysis Templates
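Analysis templates must be tested independently. A misconfigured PromQL query does not fail loudly: a wrong metric name or label selector simply returns an empty result. With a condition such as result[0] < 0.05, Argo Rollouts cannot evaluate an empty result and typically records the measurement as an error rather than a metric failure, while a condition written to tolerate empty results (for example len(result) == 0 || result[0] < 0.05) passes with no data at all. Either way, the analysis is no longer measuring what you think it is.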
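One way to catch an empty or non-numeric result before it reaches a rollout is to classify the raw response from the Prometheus query API. The sketch below assumes the standard instant-query response shape (status, data.resultType, data.result); `classifyQueryResult` and the state names are illustrative and not part of any tool.

```javascript
// Classify the JSON body returned by the Prometheus /api/v1/query endpoint.
// classifyQueryResult is a hypothetical helper name used for illustration.
function classifyQueryResult(body) {
  if (!body || body.status !== 'success' || !body.data) {
    return { state: 'query-error' };
  }
  if (body.data.resultType !== 'vector' || !Array.isArray(body.data.result)) {
    return { state: 'unexpected-type' };
  }
  if (body.data.result.length === 0) {
    // No matching series: usually a wrong metric name or label selector.
    return { state: 'no-data' };
  }
  // Instant-vector samples arrive as [timestamp, "value-as-string"].
  const value = Number(body.data.result[0].value[1]);
  if (Number.isNaN(value)) {
    // "NaN" (for example 0/0 with no traffic) or another non-numeric string.
    return { state: 'not-a-number' };
  }
  return { state: 'ok', value };
}

// Hand-written sample payloads in the shape of the Prometheus HTTP API response.
const emptyResponse = { status: 'success', data: { resultType: 'vector', result: [] } };
const healthyResponse = {
  status: 'success',
  data: { resultType: 'vector', result: [{ metric: {}, value: [1700000000, '0.012'] }] },
};

console.log(classifyQueryResult(emptyResponse));
console.log(classifyQueryResult(healthyResponse));
```

A pre-deployment check could fail the pipeline for any state other than ok, which turns a silent label mismatch into an explicit error.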
#!/bin/bash
# test-analysis-templates.sh
PROMETHEUS_URL="http://prometheus.monitoring.svc.cluster.local:9090"
SERVICE="api-service"

echo "Testing analysis template PromQL queries..."

# Test error rate query
ERROR_RATE=$(curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode "query=sum(rate(http_requests_total{service=\"$SERVICE\",status=~\"5..\"}[2m]))/sum(rate(http_requests_total{service=\"$SERVICE\"}[2m]))" \
  | jq -r '.data.result[0].value[1]')
echo "Current error rate: $ERROR_RATE"
[ "$ERROR_RATE" = "null" ] && echo "WARNING: Query returned no data; check metric labels"

# Test latency query
P99=$(curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode "query=histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service=\"$SERVICE\"}[2m])) by (le))" \
  | jq -r '.data.result[0].value[1]')
echo "Current P99 latency: ${P99}s"
[ "$P99" = "null" ] && echo "WARNING: Latency query returned no data; histogram_quantile requires _bucket metrics"

# bc prints 1 when the comparison holds and 0 when it does not
echo ""
echo "Test the success conditions manually (1 = pass, 0 = fail):"
echo "  Error rate < 0.05: $(echo "$ERROR_RATE < 0.05" | bc 2>/dev/null)"
echo "  P99 latency < 0.5s: $(echo "$P99 < 0.5" | bc 2>/dev/null)"
Canary Weight Step Tests
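A sampled traffic split never matches the configured weight exactly, so a weight test needs a tolerance band, and a reasonable band depends on both the weight and the number of requests sampled. The sketch below derives one from the normal approximation to the binomial distribution. `weightTolerance`, `isWithinTolerance`, and the default of three standard errors are assumptions for illustration, and the approximation becomes rough when the expected number of canary hits is small.

```javascript
// Tolerance band for a sampled traffic split, using the normal approximation
// to the binomial distribution. weightTolerance is an illustrative helper name.
function weightTolerance(expectedPct, sampleSize, z = 3) {
  const p = expectedPct / 100;
  // Standard error of a sampled proportion: sqrt(p * (1 - p) / n).
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  const margin = z * standardError * 100;
  return {
    low: Math.max(0, expectedPct - margin),
    high: Math.min(100, expectedPct + margin),
  };
}

// True when the observed canary share falls inside the band for the expected weight.
function isWithinTolerance(canaryCount, sampleSize, expectedPct) {
  const { low, high } = weightTolerance(expectedPct, sampleSize);
  const actualPct = (canaryCount / sampleSize) * 100;
  return actualPct >= low && actualPct <= high;
}

for (const pct of [5, 20, 50]) {
  const { low, high } = weightTolerance(pct, 1000);
  console.log(`${pct}% weight, 1000 requests: accept ${low.toFixed(1)}% to ${high.toFixed(1)}%`);
}
```

With 1000 requests the band works out to roughly ±2 percentage points at a 5% weight and roughly ±5 points at 50%, so a single fixed tolerance is looser than necessary at low weights and tighter than necessary at high ones.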
Test that traffic routing correctly shifts at each canary step:
#!/bin/bash
# test-canary-weights.sh
STABLE_URL="http://api-service-stable.production.svc.cluster.local"
CANARY_URL="http://api-service-canary.production.svc.cluster.local"
PROXY_URL="http://api-service.production.svc.cluster.local"

# After setWeight: 5, expect ~5% of requests hit canary
# Sample 1000 requests and count distribution
test_traffic_weight() {
  local expected_canary_pct="${1:-5}"
  local sample_size="${2:-1000}"
  local canary_count=0
  local canary_version version actual_pct expected_low expected_high

  echo "Testing traffic split (expected ~${expected_canary_pct}% canary)..."
  # The canary's version does not change during the test, so fetch it once
  canary_version=$(curl -s "$CANARY_URL/version" | jq -r '.version')

  for i in $(seq 1 "$sample_size"); do
    version=$(curl -s "$PROXY_URL/version" | jq -r '.version')
    [ "$version" = "$canary_version" ] && canary_count=$((canary_count + 1))
  done

  actual_pct=$(echo "scale=1; $canary_count * 100 / $sample_size" | bc)
  expected_low=$(echo "$expected_canary_pct - 3" | bc)
  expected_high=$(echo "$expected_canary_pct + 3" | bc)
  echo "Canary traffic: $actual_pct% ($canary_count/$sample_size)"

  if (( $(echo "$actual_pct >= $expected_low" | bc -l) )) && \
     (( $(echo "$actual_pct <= $expected_high" | bc -l) )); then
    echo "PASS: Within ±3% of expected ${expected_canary_pct}%"
  else
    echo "FAIL: Expected ${expected_canary_pct}% ±3%, got ${actual_pct}%"
    return 1
  fi
}

# Wait until the rollout reports the given step index (0-based) as current.
# The kubectl plugin has no command for setting the weight directly; the weight
# is driven by the steps in rollout.yaml, so the test waits for each step.
wait_for_step() {
  local step="$1"
  until [ "$(kubectl get rollout api-service -n production \
      -o jsonpath='{.status.currentStepIndex}')" = "$step" ]; do
    sleep 5
  done
}

# Watch rollout progress in the background and stop watching on exit
kubectl argo rollouts get rollout api-service -n production --watch &
WATCH_PID=$!
trap 'kill "$WATCH_PID" 2>/dev/null' EXIT

# Test at 5% step (step index 1 is the pause that follows setWeight: 5)
wait_for_step 1
test_traffic_weight 5

# Test at 20% step (step index 4 is the pause that follows setWeight: 20)
wait_for_step 4
test_traffic_weight 20
Flagger
Flagger is an alternative progressive delivery controller that integrates with service meshes (Istio, Linkerd, App Mesh) and ingress controllers.
Installation with Istio
helm repo add flagger https://flagger.app
helm repo update
helm install flagger flagger/flagger \
--namespace istio-system \
--set meshProvider=istio \
--set metricsServer=http://prometheus.monitoring.svc.cluster.local:9090
Canary Resource
# canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: api-service
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  service:
    port: 80
    targetPort: 8080
    # Timeout and retry policy for the generated Istio routes
    # (these fields sit directly under service)
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
  analysis:
    interval: 1m      # Analyze every minute
    threshold: 5      # Allow 5 failed checks before rollback
    maxWeight: 50     # Max 50% canary traffic
    stepWeight: 10    # Increase by 10% each step
    # Built-in and custom metrics share a single metrics list
    metrics:
      - name: request-success-rate
        threshold: 99     # Check fails if success rate drops below 99%
        interval: 1m
      - name: request-duration
        threshold: 500    # Check fails if P99 exceeds 500ms
        interval: 1m
      # Custom metric from Prometheus
      - name: custom-error-rate
        threshold: 2
        interval: 30s
        templateRef:
          name: error-rate
          namespace: flagger-system
    # Load testing during analysis
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://api-service.production/"
Flagger Metric Templates
# metric-template.yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring.svc.cluster.local:9090
  query: |
    100 - sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          service="{{ target }}",
          status!~"5.."
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          service="{{ target }}"
        }[{{ interval }}]
      )
    )
    * 100
Testing Flagger Rollback Triggers
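Before polling for a rollback, it helps to estimate how long Flagger should take to reach a verdict. According to the Flagger documentation, the minimum time to validate and promote a canary is interval × (maxWeight / stepWeight), and the time to roll back when checks keep failing is interval × threshold. The sketch below applies those two formulas to the analysis values from the Canary resource above; the function names and the headroom factor are assumptions for illustration.

```javascript
// Timing estimates for a Flagger canary analysis.
// The two formulas follow the Flagger documentation; the helper names are illustrative.
function flaggerTimings({ intervalSeconds, threshold, maxWeight, stepWeight }) {
  return {
    // Minimum time to validate and promote: interval * (maxWeight / stepWeight)
    minPromotionSeconds: intervalSeconds * (maxWeight / stepWeight),
    // Time until rollback when checks keep failing: interval * threshold
    rollbackSeconds: intervalSeconds * threshold,
  };
}

// Suggest a polling timeout with headroom so a rollback test does not give up too early.
// headroomFactor is an arbitrary safety margin, not a Flagger setting.
function suggestedTimeoutSeconds(config, headroomFactor = 2) {
  return Math.ceil(flaggerTimings(config).rollbackSeconds * headroomFactor);
}

// Values from the Canary resource: interval 1m, threshold 5, maxWeight 50, stepWeight 10.
const canaryAnalysis = { intervalSeconds: 60, threshold: 5, maxWeight: 50, stepWeight: 10 };
console.log(flaggerTimings(canaryAnalysis));
console.log(suggestedTimeoutSeconds(canaryAnalysis));
```

With these values both estimates come to 300 seconds, so a test that waits for a rollback needs a timeout comfortably longer than five minutes.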
To verify Flagger correctly detects failures and rolls back:
#!/bin/bash
# test-flagger-rollback.sh
NAMESPACE="production"
CANARY_NAME="api-service"
PROXY_URL="http://api-service.production.svc.cluster.local"

# 1. Trigger a canary deployment
kubectl set image deployment/api-service \
  api-service=my-registry/api-service:faulty-version \
  -n "$NAMESPACE"

# 2. Wait for Flagger to detect the canary
# The Promoted condition stays Unknown while analysis runs, so wait on the phase
# instead (--for=jsonpath requires kubectl 1.23 or later)
echo "Waiting for canary to initialize..."
kubectl wait --for=jsonpath='{.status.phase}'=Progressing \
  "canary/$CANARY_NAME" -n "$NAMESPACE" \
  --timeout=120s

# 3. Inject errors to trigger rollback
# The fault only affects requests that carry the x-version: canary header
echo "Injecting errors into canary..."
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-service-fault-injection
  namespace: production
spec:
  hosts:
    - api-service
  http:
    - match:
        - headers:
            x-version:
              exact: canary
      fault:
        abort:
          percentage:
            value: 50
          httpStatus: 500
      route:
        - destination:
            host: api-service
EOF

# 4. Watch Flagger detect failures and roll back
# With interval: 1m and threshold: 5, rollback takes about 5 minutes, so allow headroom
echo "Monitoring Flagger canary status..."
timeout 600 bash -c '
while true; do
  STATUS=$(kubectl get canary api-service -n production \
    -o jsonpath="{.status.phase}")
  echo "$(date +%H:%M:%S) - Canary status: $STATUS"
  [ "$STATUS" = "Failed" ] && { echo "Rollback triggered!"; break; }
  [ "$STATUS" = "Succeeded" ] && { echo "WARNING: Should have failed but succeeded"; break; }
  sleep 15
done
'

# 5. Verify rolled back to stable
# status.lastAppliedSpec is a hash, not a pod spec, so read the image from the
# api-service-primary Deployment that Flagger uses to serve stable traffic
ACTIVE_VERSION=$(kubectl get deployment api-service-primary -n production \
  -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "Active version after rollback: $ACTIVE_VERSION"

# 6. Cleanup fault injection
kubectl delete virtualservice api-service-fault-injection -n production
Integration Testing Canary Traffic in Tests
When running integration tests during a canary deployment, explicitly target either stable or canary:
// test/canary-integration.test.js
const axios = require('axios');
const { matchers } = require('jest-json-schema'); // provides toMatchSchema
// Assumed location of the JSON schema for the product list response
const productListSchema = require('./schemas/product-list.schema.json');

expect.extend(matchers);

// Test stable version
describe('Stable version', () => {
  const stableClient = axios.create({
    baseURL: process.env.STABLE_URL,
    headers: { 'x-version': 'stable' } // Istio header-based routing
  });

  it('returns expected response schema', async () => {
    const response = await stableClient.get('/api/products');
    expect(response.data).toMatchSchema(productListSchema);
  });
});

// Test canary version
describe('Canary version', () => {
  const canaryClient = axios.create({
    baseURL: process.env.CANARY_URL,
    headers: { 'x-version': 'canary' }
  });

  it('returns compatible response schema', async () => {
    const response = await canaryClient.get('/api/products');
    // Canary must be backwards-compatible with existing schema
    expect(response.data).toMatchSchema(productListSchema);
  });

  it('canary error rate is below threshold', async () => {
    const results = await Promise.all(
      Array(100).fill(null).map(() =>
        canaryClient.get('/api/products')
          .then(r => r.status)
          // 0 marks a request with no response (network error or timeout)
          .catch(e => e.response?.status ?? 0)
      )
    );
    // Count 5xx responses and requests that got no response at all
    const errors = results.filter(s => s === 0 || s >= 500).length;
    expect(errors / results.length).toBeLessThan(0.02); // < 2% error rate
  });
});
Common Progressive Delivery Testing Mistakes
Testing only the happy path: verify that analysis templates actually block bad deployments by intentionally deploying a version that returns 5xx errors and confirming rollback occurs.
Unreachable Prometheus or empty results: a failed or empty query does not look like a failing metric. Depending on the controller and how the condition is written, the check may be recorded as an error, the rollout may stall, or a condition that tolerates empty results may pass with no data. Always test queries directly against Prometheus before relying on them.
Metric label mismatches: if your service emits app="api-service" but the analysis template queries service="api-service", the query returns no data. Verify labels match exactly.
Missing version labels: the canary queries above filter on version="canary", so the metrics must carry a label that distinguishes canary from stable. Verify canary pods have version: canary or equivalent.
Low traffic volume: error rate analysis is unreliable below roughly 10 RPS because the percentage is computed from a tiny sample. Either run load tests during analysis or use absolute error counts instead of rates for low-traffic services.
Analysis too aggressive: a single 500 error at 5% weight shouldn't abort a rollout. Set failureLimit appropriately and use intervals longer than 30 seconds to avoid noise.
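The low-traffic point can be made concrete by estimating how many requests actually back a canary error-rate reading. The sketch below is a back-of-the-envelope calculation: `canarySample` is an illustrative helper name, and it assumes traffic is split exactly according to the canary weight.

```javascript
// How many requests back a canary error-rate reading, and how coarse that reading is.
// canarySample is an illustrative helper; it assumes traffic is split exactly by weight.
function canarySample({ totalRps, canaryWeightPct, windowSeconds }) {
  const requests = totalRps * (canaryWeightPct / 100) * windowSeconds;
  return {
    requests,
    // Error rate contributed by a single failed request in the window.
    errorRatePerFailure: requests > 0 ? 1 / requests : Infinity,
  };
}

// 10 RPS overall, 5% canary weight, 2-minute rate window (as in the [2m] queries).
const sample = canarySample({ totalRps: 10, canaryWeightPct: 5, windowSeconds: 120 });
console.log(`${sample.requests} canary requests per window`);
console.log(`${(sample.errorRatePerFailure * 100).toFixed(1)}% error rate per failed request`);
```

Under these assumptions the canary sees about 60 requests per window, so each failed request moves the error rate by about 1.7 percentage points and two failures are enough to cross a 2% threshold.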