GitOps 2.0: Progressive Delivery with Flagger
GitOps handles deployments. But how do you deploy safely? Flagger adds progressive delivery (canary releases, A/B testing, and automated rollbacks) to your GitOps workflow.
The Problem
Traditional deployments are all-or-nothing:
Deploy v2 → 100% traffic → Discover bug → All users affected
Progressive delivery:
Deploy v2 → 5% traffic → Monitor → 20% → Monitor → 50% → 100%
                            ↓
                       Metrics bad?
                            ↓
                      Auto-rollback
What is Flagger?
Flagger is a Kubernetes operator for progressive delivery. It works with:
- Istio
- Linkerd
- AWS App Mesh
- NGINX Ingress
- Contour
- Gloo Edge
┌─────────────────────────────────────────────┐
│                   Flagger                   │
├─────────────────────────────────────────────┤
│  • Canary releases                          │
│  • A/B testing                              │
│  • Blue/Green deployments                   │
│  • Automated metrics analysis               │
│  • Automatic rollback                       │
└─────────────────────────────────────────────┘
Installation
# Install the Flagger Helm chart (the flagger-system namespace must already exist)
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
  --namespace=flagger-system \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus:9090
Canary Deployment
Define Canary Resource
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
    targetPort: 8080
  analysis:
    # Run analysis every 1 minute
    interval: 1m
    # Max number of failed checks before rollback
    threshold: 5
    # Max traffic percentage routed to the canary
    maxWeight: 50
    # Traffic step weight (percentage added per step)
    stepWeight: 10
    metrics:
      - name: request-success-rate
        # Minimum success rate, in percent
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        # Maximum request duration, in milliseconds
        thresholdRange:
          max: 500
        interval: 1m
How It Works
1. You update the Deployment → Flagger detects the change
2. Flagger scales up the canary Deployment (v2)
3. Flagger routes 10% of traffic to the canary
4. Flagger checks metrics (success rate, latency) every interval
5. If good → increase to 20%, 30%...
6. If bad → after 5 failed checks, roll back to v1
7. At 50% → promote v2 as primary
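The steps above can be sketched as a small simulation. This is a simplified model, not Flagger's actual controller code: `run_canary` is a hypothetical helper, each boolean stands for the combined result of one analysis interval, and failed checks are assumed to be counted cumulatively across the rollout.

```python
from typing import Iterable


def run_canary(
    checks: Iterable[bool],
    step_weight: int = 10,
    max_weight: int = 50,
    threshold: int = 5,
) -> tuple[str, list[int]]:
    """Simulate a canary rollout (simplified model, not Flagger's code).

    Each item in `checks` is the outcome of one analysis interval.
    Returns the final phase and the canary weight after each interval.
    """
    weight = 0
    failed = 0
    history: list[int] = []
    for ok in checks:
        if not ok:
            failed += 1
            if failed >= threshold:
                history.append(0)  # all traffic goes back to the primary
                return "Failed", history
            history.append(weight)  # halt advancement, keep current weight
            continue
        if weight >= max_weight:
            # A passing check at max weight triggers promotion.
            return "Succeeded", history
        weight = min(weight + step_weight, max_weight)
        history.append(weight)
    return "Progressing", history


if __name__ == "__main__":
    print(run_canary([True] * 6))
    print(run_canary([True, True] + [False] * 5))
```

With the defaults taken from the Canary resource above, six passing checks end in promotion, while five failed checks send all traffic back to the primary.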
Traffic Progression
Time    Primary (v1)    Canary (v2)
0m      100%            0%
1m      90%             10%     ← First analysis
2m      80%             20%     ← Metrics OK, step up
3m      70%             30%     ← Metrics OK, step up
...
5m      50%             50%     ← Max weight reached
6m      0%              100%    ← Promotion
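The timeline follows directly from `interval`, `stepWeight`, and `maxWeight`. As a rough sketch (the helper name `traffic_schedule` is made up for illustration, and every check is assumed to pass), the rows can be derived like this:

```python
def traffic_schedule(
    step_weight: int = 10,
    max_weight: int = 50,
    interval_min: int = 1,
) -> list[tuple[int, int, int]]:
    """Return (minute, primary %, canary %) rows for an all-green rollout."""
    rows = [(0, 100, 0)]
    minute = 0
    weight = 0
    while weight < max_weight:
        minute += interval_min
        weight = min(weight + step_weight, max_weight)
        rows.append((minute, 100 - weight, weight))
    # One more interval after max weight: v2 takes all traffic.
    rows.append((minute + interval_min, 0, 100))
    return rows


if __name__ == "__main__":
    for minute, primary, canary in traffic_schedule():
        print(f"{minute}m  {primary:>3}%  {canary:>3}%")
```

Changing `step_weight` to 5 doubles the number of steps, which is the main trade-off: smaller steps limit the blast radius but lengthen the rollout.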
Custom Metrics
Prometheus Query
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    100 - (
      sum(rate(http_requests_total{
        namespace="{{ namespace }}",
        app="{{ target }}",
        status!~"5.*"
      }[{{ interval }}]))
      /
      sum(rate(http_requests_total{
        namespace="{{ namespace }}",
        app="{{ target }}"
      }[{{ interval }}]))
      * 100
    )
Using Custom Metric
spec:
  analysis:
    metrics:
      - name: error-rate
        templateRef:
          name: error-rate
          namespace: flagger-system
        thresholdRange:
          max: 1 # Max 1% error rate
        interval: 1m
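To make the arithmetic concrete, here is a minimal sketch of what the query computes and how a `thresholdRange` is applied to the result. Both function names are illustrative and not part of Flagger, and the request counts in the usage lines are made-up numbers.

```python
from typing import Optional


def error_rate_percent(non_5xx: float, total: float) -> float:
    """Mirror the query: 100 - (non-5xx requests / all requests * 100)."""
    if total == 0:
        raise ValueError("no traffic in the interval, the rate is undefined")
    return 100 - (non_5xx / total) * 100


def within_range(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> bool:
    """Return True when the value satisfies the min/max bounds that are set."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


if __name__ == "__main__":
    rate = error_rate_percent(non_5xx=995, total=1000)
    print(rate, within_range(rate, max_value=1))   # passes the max: 1 check
    print(within_range(95.0, min_value=99))        # fails a min: 99 check
```

The division by zero case matters in practice: a canary that receives no traffic produces no usable metric, which is one reason to pair the analysis with a load-test webhook.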
Webhooks
Load Testing
spec:
  analysis:
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary:80/"
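It helps to know how much traffic that `hey` command generates, because the metrics only mean something if the canary sees enough requests. In `hey`, `-z` is the duration, `-c` is the number of concurrent workers, and `-q` is the rate limit per worker. A quick back-of-the-envelope sketch (the helper `hey_load` is hypothetical, and the result is an upper bound since slow responses reduce the real rate):

```python
def hey_load(duration_s: int, qps_per_worker: int, workers: int) -> tuple[int, int]:
    """Return (max requests per second, max total requests) for a hey run."""
    total_qps = qps_per_worker * workers
    return total_qps, total_qps * duration_s


if __name__ == "__main__":
    # hey -z 1m -q 10 -c 2
    qps, total = hey_load(duration_s=60, qps_per_worker=10, workers=2)
    print(f"up to {qps} req/s, up to {total} requests per run")
```

At 10% canary weight, organic traffic alone may be too thin for a stable success-rate reading, so a steady synthetic load of this size gives each one-minute analysis window something to measure.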
Smoke Tests
webhooks:
  - name: acceptance-test
    type: pre-rollout
    url: http://flagger-loadtester/
    timeout: 30s
    metadata:
      # Run the command through bash so the pipe works and the result gates the rollout
      type: bash
      cmd: "curl -s http://myapp-canary:80/health | grep ok"
Slack Notifications
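webhooks:
  - name: notify
    type: event
    url: https://hooks.slack.com/services/xxx/xxx/xxx
    # Check that your Flagger version supports a templated payload here;
    # Slack alerts are commonly configured through an AlertProvider resource instead.
    metadata:
      payload: |
        {
          "text": "Canary {{ .Name }}: {{ .Phase }}",
          "blocks": [
            {
              "type": "section",
              "text": {
                "type": "mrkdwn",
                "text": "{{ .Message }}"
              }
            }
          ]
        }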
A/B Testing
Route based on headers/cookies:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  analysis:
    interval: 1m
    threshold: 10
    iterations: 10
    match:
      - headers:
          x-canary:
            exact: "true"
      - headers:
          cookie:
            regex: "^(.*?;)?(canary=true)(;.*)?$"
    # No stepWeight: traffic is routed by header match
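The two match rules are alternatives: a request goes to the canary if either one matches. The sketch below imitates that decision with Python's `re` module. It is only an approximation, because the mesh's regex engine and header handling may differ, and `routes_to_canary` is a hypothetical helper that assumes lowercase header names.

```python
import re

# Same pattern as the cookie rule in the Canary spec.
COOKIE_RE = re.compile(r"^(.*?;)?(canary=true)(;.*)?$")


def routes_to_canary(headers: dict[str, str]) -> bool:
    """Return True if either match rule applies (rules are OR-ed)."""
    if headers.get("x-canary") == "true":
        return True
    cookie = headers.get("cookie")
    return cookie is not None and COOKIE_RE.fullmatch(cookie) is not None


if __name__ == "__main__":
    print(routes_to_canary({"x-canary": "true"}))
    print(routes_to_canary({"cookie": "session=abc;canary=true;theme=dark"}))
    print(routes_to_canary({"cookie": "session=abc; canary=true"}))
```

Under Python's semantics, the last call returns False: the pattern expects `canary=true` directly after a semicolon, while browsers usually separate cookies with a semicolon and a space. It is worth testing the rule against real request headers before relying on cookie-based routing.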
Blue/Green Deployment
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  analysis:
    interval: 1m
    threshold: 5
    iterations: 5
    # No stepWeight = Blue/Green
    # Traffic switches 0% → 100% after analysis passes
GitOps Integration
With Flux
# fluxcd/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: myapp
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: production
Workflow:
- Push image to registry
- Update image tag in Git
- Flux syncs the change
- Flagger detects update
- Progressive rollout begins
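The "update image tag in Git" step is the only change a pipeline has to make; everything after it is driven by Flux and Flagger. A minimal sketch of that step, assuming a plain-text manifest and a hypothetical image name (in practice this is often handled by Flux image automation or `kustomize edit set image` instead):

```python
import re


def set_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Replace the tag of `image` in a manifest's `image:` lines."""
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):\S+")
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"image {image!r} not found in manifest")
    return updated


if __name__ == "__main__":
    # The image name and tags here are made up for illustration.
    manifest = (
        "containers:\n"
        "  - name: myapp\n"
        "    image: ghcr.io/example/myapp:1.0.0\n"
    )
    print(set_image_tag(manifest, "ghcr.io/example/myapp", "1.0.1"))
```

Committing the result to the repository that the Flux `GitRepository` tracks is what kicks off the rest of the workflow.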
Monitoring the Rollout
kubectl
# Watch canary progress (the Canary lives in the production namespace)
kubectl -n production get canary myapp -w
# NAME    STATUS        WEIGHT   LASTTRANSITIONTIME
# myapp   Progressing   30       2022-05-27T10:30:00Z
# myapp   Progressing   40       2022-05-27T10:31:00Z
# myapp   Succeeded     0        2022-05-27T10:35:00Z
Events
kubectl -n production describe canary myapp
# Events:
#   Normal  Synced  2m   flagger  New revision detected
#   Normal  Synced  1m   flagger  Starting canary analysis
#   Normal  Synced  1m   flagger  Advance myapp weight 10
#   Normal  Synced  30s  flagger  Advance myapp weight 20
Rollback Scenarios
Automatic Rollback
Events:
  Warning  Synced  1m   flagger  Halt advancement request-success-rate 95.00 < 99
  Warning  Synced  30s  flagger  Halt advancement request-success-rate 94.50 < 99
  Warning  Synced  10s  flagger  Rolling back myapp.production failed checks threshold reached
Manual Rollback
# Pause analysis
kubectl -n production annotate canary myapp flagger.app/pause=true
# Resume
kubectl -n production annotate canary myapp flagger.app/pause-
Final Thoughts
Progressive delivery with Flagger:
- Reduces deployment risk
- Enables data-driven rollouts
- Automates what humans forget
Combine with GitOps for a complete deployment pipeline. Deploy with confidence.
Ship fast, fail small, recover automatically.