GitOps 2.0: Progressive Delivery with Flagger

May 27, 2022

devops kubernetes

GitOps handles deployments. But how do you deploy safely? Flagger adds progressive delivery (canary releases, A/B testing, and automated rollbacks) to your GitOps workflow.

The Problem

Traditional deployments are all-or-nothing:

Deploy v2 → 100% traffic → Discover bug → All users affected

Progressive delivery:

Deploy v2 → 5% traffic → Monitor → 20% → Monitor → 50% → 100%
               ↓
          Metrics bad?
               ↓
          Auto-rollback

What is Flagger?

Flagger is a Kubernetes operator for progressive delivery. It works with these service meshes and ingress controllers:

Istio
Linkerd
AWS App Mesh
NGINX Ingress
Contour
Gloo Edge

┌─────────────────────────────────────────────┐
│                 Flagger                      │
├─────────────────────────────────────────────┤
│  • Canary releases                          │
│  • A/B testing                              │
│  • Blue/Green deployments                   │
│  • Automated metrics analysis               │
│  • Automatic rollback                       │
└─────────────────────────────────────────────┘

Installation

# With Helm
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
  --namespace=flagger-system \
  --create-namespace \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus.monitoring:9090

Canary Deployment

Define Canary Resource

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  
  service:
    port: 80
    targetPort: 8080
  
  analysis:
    # Run a check every minute
    interval: 1m
    # Failed checks allowed before rollback
    threshold: 5
    # Max traffic percentage routed to the canary
    maxWeight: 50
    # Traffic increase per successful check
    stepWeight: 10
    
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99    # percent of non-5xx responses
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500   # P99 latency in milliseconds
      interval: 1m

How It Works

1. You update Deployment → Flagger detects change
2. Flagger scales up the canary Deployment (v2); myapp-primary keeps serving v1
3. Routes 10% traffic to canary
4. Checks metrics (success rate, latency)
5. If good → increase to 20%, 30%...
6. If bad → hold the weight; after 5 failed checks, roll back to v1
7. At 50% → promote v2 as primary

Traffic Progression

Time    Primary (v1)    Canary (v2)
  0m        100%            0%
  1m         90%           10%      ← First analysis
  2m         80%           20%      ← Metrics OK, step up
  3m         70%           30%      ← Metrics OK, step up
  ...
  5m         50%           50%      ← Max weight reached
  6m          0%          100%      ← Promotion
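The steps and the table above can be replayed with a small simulation. This is a minimal sketch of the weight progression only, not Flagger's controller logic: `simulate_canary` and its `checks` argument (one pass/fail result per analysis interval) are hypothetical names, and the defaults mirror the `stepWeight`, `maxWeight`, and `threshold` values from the Canary resource.

```python
from typing import Iterable, List, Tuple


def simulate_canary(
    checks: Iterable[bool],
    step_weight: int = 10,
    max_weight: int = 50,
    threshold: int = 5,
) -> Tuple[str, List[int]]:
    """Replay one metric-check result per analysis interval.

    Returns the outcome and the canary weight after each interval.
    Simplified model for illustration, not Flagger's implementation.
    """
    weight = 0
    failed = 0
    history: List[int] = []
    for ok in checks:
        if ok:
            # Healthy interval: shift another step of traffic to the canary.
            weight = min(weight + step_weight, max_weight)
        else:
            # Unhealthy interval: hold the weight and count the failure.
            failed += 1
            if failed >= threshold:
                history.append(0)  # all traffic back on the primary
                return "rolled back", history
        history.append(weight)
        if weight >= max_weight:
            return "promoted", history
    return "progressing", history


print(simulate_canary([True] * 5))
# ('promoted', [10, 20, 30, 40, 50])
```

Feeding it two good checks followed by five bad ones returns "rolled back", with the weight held at 20 until the fifth failure.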

Custom Metrics

Prometheus Query

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    100 - (
      sum(rate(http_requests_total{
        namespace="{{ namespace }}",
        app="{{ target }}",
        status!~"5.*"
      }[{{ interval }}])) 
      / 
      sum(rate(http_requests_total{
        namespace="{{ namespace }}",
        app="{{ target }}"
      }[{{ interval }}])) 
      * 100
    )

Using Custom Metric

spec:
  analysis:
    metrics:
    - name: error-rate
      templateRef:
        name: error-rate
        namespace: flagger-system
      thresholdRange:
        max: 1  # Max 1% error rate
      interval: 1m
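To sanity-check the query and the threshold, the arithmetic can be worked through by hand. The sketch below mirrors the PromQL expression with plain numbers; the function names are hypothetical, and it assumes the `thresholdRange.max` bound is inclusive (a value equal to the bound still passes).

```python
def error_rate_percent(non_5xx_rate: float, total_rate: float) -> float:
    """Mirror the query: 100 - (non-5xx rate / total rate * 100)."""
    if total_rate == 0:
        # Choice made for this sketch: with no traffic the ratio is undefined.
        raise ValueError("no traffic in the interval, so the rate is undefined")
    return 100 - (non_5xx_rate / total_rate * 100)


def within_threshold(value: float, max_value: float = 1.0) -> bool:
    """Assumed semantics of thresholdRange.max: pass while value <= max."""
    return value <= max_value


# 995 of 1000 requests per second returned a non-5xx status.
rate = error_rate_percent(995, 1000)
print(round(rate, 2), within_threshold(rate))  # 0.5 True
```

With 970 of 1000 requests succeeding, the error rate is 3%, which exceeds the 1% bound and would count as a failed check.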

Webhooks

Load Testing

spec:
  analysis:
    webhooks:
    - name: load-test
      type: rollout
      url: http://flagger-loadtester/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.production:80/"
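It helps to know how much traffic this command generates, because the metric checks need enough requests to be meaningful. In `hey`, `-z` is the test duration, `-q` is the rate limit per worker in queries per second, and `-c` is the number of concurrent workers. A quick upper bound, using a hypothetical helper:

```python
def max_requests(duration_s: int, qps_per_worker: int, workers: int) -> int:
    """Upper bound on requests sent by `hey -z <duration> -q <qps> -c <workers>`."""
    return duration_s * qps_per_worker * workers


# hey -z 1m -q 10 -c 2
print(max_requests(60, 10, 2))  # 1200
```

Because the command targets the canary service directly, all of these requests reach the canary regardless of the current traffic weight.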

Smoke Tests

webhooks:
- name: acceptance-test
  type: pre-rollout
  url: http://flagger-loadtester/
  timeout: 30s
  metadata:
    # Run through bash so the pipe works and the exit code is reported back
    type: bash
    cmd: "curl -s http://myapp-canary.production:80/health | grep ok"

Slack Notifications

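# Slack alerts go through an AlertProvider
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: slack
  namespace: flagger-system
spec:
  type: slack
  channel: deployments   # example channel name
  username: flagger
  address: https://hooks.slack.com/services/xxx/xxx/xxx
---
# Reference it from the Canary
spec:
  analysis:
    alerts:
    - name: notify
      severity: info
      providerRef:
        name: slack
        namespace: flagger-system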

A/B Testing

Route based on headers/cookies:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  analysis:
    interval: 1m
    threshold: 10
    iterations: 10
    
    match:
    - headers:
        x-canary:
          exact: "true"
    - headers:
        cookie:
          regex: "^(.*?;)?(canary=true)(;.*)?$"
  
  # No stepWeight - traffic is based on header match
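The two match rules are alternatives: a request goes to the canary if either one applies. The sketch below checks sample headers against them. It is an approximation, assuming lowercase header names and that Python's `re` behaves like the mesh's regex engine for this particular pattern; `routes_to_canary` is a hypothetical helper.

```python
import re

# Same pattern as the cookie rule in the Canary resource.
COOKIE_PATTERN = re.compile(r"^(.*?;)?(canary=true)(;.*)?$")


def routes_to_canary(headers: dict) -> bool:
    """Return True if either match rule from the Canary applies."""
    if headers.get("x-canary") == "true":
        return True
    return COOKIE_PATTERN.match(headers.get("cookie", "")) is not None


print(routes_to_canary({"x-canary": "true"}))                    # True
print(routes_to_canary({"cookie": "session=abc;canary=true"}))   # True
print(routes_to_canary({"cookie": "session=abc; canary=true"}))  # False
```

The last case is worth testing against your own clients: the pattern expects `canary=true` to follow a semicolon directly, so a cookie header that separates cookies with "; " only matches when the canary cookie comes first.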

Blue/Green Deployment

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  analysis:
    interval: 1m
    threshold: 5
    iterations: 5
    
    # iterations with no stepWeight and no match = Blue/Green
    # Traffic switches 0% → 100% after analysis passes
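Unlike the weighted canary, nothing shifts gradually here: the new version has to pass a fixed number of checks before traffic switches in one step. A simplified model of that decision, assuming a failed check does not count toward `iterations` (the function name and return strings are hypothetical):

```python
from typing import Iterable


def blue_green(checks: Iterable[bool], iterations: int = 5, threshold: int = 5) -> str:
    """Decide the outcome from one metric-check result per interval."""
    passed = 0
    failed = 0
    for ok in checks:
        if ok:
            passed += 1
            if passed >= iterations:
                return "switched to green"  # 0% -> 100% in one step
        else:
            failed += 1
            if failed >= threshold:
                return "rolled back"  # blue keeps all traffic
    return "analysing"


print(blue_green([True] * 5))   # switched to green
print(blue_green([False] * 5))  # rolled back
```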

GitOps Integration

With Flux

# fluxcd/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: myapp
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: production

Workflow:

1. Push image to registry
2. Update image tag in Git
3. Flux syncs the change
4. Flagger detects update
5. Progressive rollout begins

Monitoring the Rollout

kubectl

# Watch canary progress
kubectl -n production get canary myapp -w

# NAME    STATUS      WEIGHT   LASTTRANSITIONTIME
# myapp   Progressing 30       2022-05-27T10:30:00Z
# myapp   Progressing 40       2022-05-27T10:31:00Z
# myapp   Succeeded   0        2022-05-27T10:35:00Z
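In a CI job you may want to block until the rollout finishes rather than watch it by eye. One possible approach is to scan the watch output for a terminal status. The sketch below only does the parsing, assuming the four-column layout shown above and treating `Succeeded` and `Failed` as the terminal statuses; `final_status` is a hypothetical helper.

```python
from typing import Iterable, Optional

TERMINAL = {"Succeeded", "Failed"}


def final_status(lines: Iterable[str], name: str = "myapp") -> Optional[str]:
    """Return the first terminal STATUS seen for `name`, or None if still running."""
    for line in lines:
        fields = line.split()
        # Expected columns: NAME STATUS WEIGHT LASTTRANSITIONTIME
        if len(fields) >= 2 and fields[0] == name and fields[1] in TERMINAL:
            return fields[1]
    return None


sample = [
    "NAME    STATUS      WEIGHT   LASTTRANSITIONTIME",
    "myapp   Progressing 30       2022-05-27T10:30:00Z",
    "myapp   Progressing 40       2022-05-27T10:31:00Z",
    "myapp   Succeeded   0        2022-05-27T10:35:00Z",
]
print(final_status(sample))  # Succeeded
```

The `lines` argument can be any iterable of strings, so the same function would work on the standard output of a `kubectl get canary -w` subprocess read line by line.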

Events

kubectl -n production describe canary myapp

# Events:
#   Normal   Synced  2m    flagger  New revision detected
#   Normal   Synced  1m    flagger  Starting canary analysis
#   Normal   Synced  1m    flagger  Advance myapp weight 10
#   Normal   Synced  30s   flagger  Advance myapp weight 20

Rollback Scenarios

Automatic Rollback

Events:
  Warning  Synced  1m   flagger  Halt advancement request-success-rate 95.00 < 99
  Warning  Synced  30s  flagger  Halt advancement request-success-rate 94.50 < 99
  Warning  Synced  10s  flagger  Rolling back myapp.production failed checks threshold reached

Manual Rollback

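# Assumes a rollback gate on the Canary (spec.analysis.webhooks)
# and the loadtester running in flagger-system:
#   - name: rollback-gate
#     type: rollback
#     url: http://flagger-loadtester/rollback/check

# Open the gate: Flagger rolls back the running analysis
kubectl -n flagger-system exec deploy/flagger-loadtester -- curl -s -d '{"name": "myapp", "namespace": "production"}' http://localhost:8080/rollback/open

# Close the gate before the next rollout
kubectl -n flagger-system exec deploy/flagger-loadtester -- curl -s -d '{"name": "myapp", "namespace": "production"}' http://localhost:8080/rollback/close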

Final Thoughts

Progressive delivery with Flagger:

Reduces deployment risk
Enables data-driven rollouts
Automates what humans forget

Combine with GitOps for a complete deployment pipeline. Deploy with confidence.

Ship fast, fail small, recover automatically.