Kubernetes Operators: Automating Complex Apps

devops kubernetes go

Kubernetes manages containers well. But what about databases, message queues, and other stateful applications? Operators encode the operational knowledge these systems need into software.

What is an Operator?

An Operator is a Kubernetes controller that:

  1. Watches custom resources (defined by CustomResourceDefinitions, or CRDs)
  2. Reacts to changes
  3. Manages complex applications automatically

It’s the operations runbook, encoded as software.

Why Operators?

Without Operators

Running PostgreSQL on Kubernetes:

With an Operator

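apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: mydb
spec:
  postgresVersion: 14
  instances:
    - replicas: 3
      dataVolumeClaimSpec:          # required: storage for each instance (size is an example)
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"       # full backup every Sunday at 01:00
          volume:                   # where repo1 keeps its backups (size is an example)
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi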

The operator handles replication, failover, backups, and scaling.

How Operators Work

The Reconciliation Loop:

  1. Watch: Observe desired state ◄───┐
           │                          │
           ▼                          │
  2. Compare: Current vs Desired      │
           │                          │
           ▼                          │
  3. Act: Make changes to match       │
           │                          │
           └───────── Repeat ─────────┘

This is the standard Kubernetes controller pattern. An Operator applies it to the custom resources of one specific application.
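You can see the loop without any Kubernetes machinery. The sketch below is a minimal, hypothetical model in plain Go. State stands in for both the custom resource's spec (desired) and what is really running (actual), and reconcile performs one Compare/Act pass. It moves one replica per pass only to make the repetition visible. A real controller would usually set the target in a single write.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the one thing this toy operator manages.
type State struct {
	Replicas int
}

// reconcile performs one Compare + Act pass: it moves actual one step toward
// desired and reports whether it changed anything.
func reconcile(desired State, actual *State) bool {
	switch {
	case actual.Replicas < desired.Replicas:
		actual.Replicas++ // Act: e.g. start a pod
		return true
	case actual.Replicas > desired.Replicas:
		actual.Replicas-- // Act: e.g. stop a pod
		return true
	default:
		return false // Compare found no difference: nothing to do
	}
}

// converge repeats the loop until a pass makes no change and returns how
// many passes acted.
func converge(desired State, actual *State) int {
	passes := 0
	for reconcile(desired, actual) {
		passes++
	}
	return passes
}

func main() {
	desired := State{Replicas: 3}
	actual := &State{Replicas: 0}

	n := converge(desired, actual)
	fmt.Println("passes:", n, "replicas:", actual.Replicas)

	// Drift: someone deletes a pod by hand; the next pass repairs it.
	actual.Replicas--
	n = converge(desired, actual)
	fmt.Println("passes:", n, "replicas:", actual.Replicas)
}
```

The second converge call shows why the loop is level-triggered rather than a one-shot script. The operator does not need to know how the state drifted, only that current and desired differ.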

Prometheus Operator

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  replicas: 2
  serviceMonitorSelector:
    matchLabels:
      team: frontend

The operator runs two Prometheus replicas and automatically scrapes every target described by a ServiceMonitor labeled team: frontend.
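serviceMonitorSelector is a standard Kubernetes label selector. With matchLabels, a ServiceMonitor is selected only if it carries every listed key with exactly the listed value, and extra labels do not matter. The matchLabels function below is a hypothetical plain-Go sketch of that rule, not the Prometheus Operator's own code:

```go
package main

import "fmt"

// matchLabels reports whether labels satisfies every key/value pair in
// selector. Extra labels on the object are ignored, and an empty selector
// matches everything.
func matchLabels(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"team": "frontend"}

	// Selected: has team=frontend; the extra "tier" label is irrelevant.
	fmt.Println(matchLabels(selector, map[string]string{"team": "frontend", "tier": "edge"}))

	// Not selected: same key, different value.
	fmt.Println(matchLabels(selector, map[string]string{"team": "payments"}))

	// Not selected: the key is missing entirely.
	fmt.Println(matchLabels(selector, map[string]string{}))
}
```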

Kafka Operator (Strimzi)

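apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:            # at least one listener is required
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral     # example only; use persistent-claim for real data
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral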

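Strimzi handles the full Kafka cluster lifecycle. Recent Strimzi releases run Kafka in KRaft mode and no longer support ZooKeeper-based clusters like this one.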

Elasticsearch Operator (ECK)

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logs
spec:
  version: 7.10.0
  nodeSets:
    - name: default
      count: 3

Building an Operator

Operator SDK

The easiest way to build operators is the Operator SDK, which scaffolds a Go project on top of Kubebuilder and controller-runtime:

# Install operator-sdk
brew install operator-sdk

# Create new operator
operator-sdk init --domain example.com --repo github.com/me/myoperator
operator-sdk create api --group cache --version v1 --kind Memcached --resource --controller

Define Your CRD

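// api/v1/memcached_types.go
package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MemcachedSpec is the desired state: how many Memcached pods to run
type MemcachedSpec struct {
    Size int32 `json:"size"`
}

// MemcachedStatus is the observed state: the names of the running pods
type MemcachedStatus struct {
    Nodes []string `json:"nodes,omitempty"`
}

// The status subresource marker is what makes r.Status().Update() work.
// The scaffold also generates MemcachedList and registers both types.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
type Memcached struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   MemcachedSpec   `json:"spec,omitempty"`
    Status MemcachedStatus `json:"status,omitempty"`
}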

Implement Reconciliation

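// controllers/memcached_controller.go
package controllers

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/log"

    cachev1 "github.com/me/myoperator/api/v1"
)

// MemcachedReconciler (which embeds client.Client) is generated by the scaffold.
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx).WithValues("memcached", req.NamespacedName)

    // Fetch the Memcached instance; if it is gone, there is nothing to do
    memcached := &cachev1.Memcached{}
    if err := r.Get(ctx, req.NamespacedName, memcached); err != nil {
        if errors.IsNotFound(err) {
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // Check if the Deployment exists (same name and namespace as the CR)
    found := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: memcached.Name, Namespace: memcached.Namespace}, found)
    if err != nil && !errors.IsNotFound(err) {
        // Any other read error: return it so the request is retried with backoff
        return ctrl.Result{}, err
    }
    if errors.IsNotFound(err) {
        // deploymentForMemcached (not shown) builds the Deployment; it should
        // also set the Memcached as owner so deleting the CR cascades
        dep := r.deploymentForMemcached(memcached)
        logger.Info("Creating Deployment", "name", dep.Name)
        if err := r.Create(ctx, dep); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    }

    // Ensure the replica count matches spec.size (Replicas may be nil)
    size := memcached.Spec.Size
    if found.Spec.Replicas == nil || *found.Spec.Replicas != size {
        found.Spec.Replicas = &size
        logger.Info("Scaling Deployment", "replicas", size)
        if err := r.Update(ctx, found); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    }

    return ctrl.Result{}, nil
}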
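Reconcile logic is easiest to reason about against a fake. This hypothetical sketch keeps the same three branches as Reconcile: create the Deployment if it is missing, fix the replica count if it is wrong, otherwise do nothing. It runs them over an in-memory map instead of the API server. The names store, deployment, result, and reconcileMemcached are illustrative, not controller-runtime APIs. For real tests, controller-runtime provides a fake client (sigs.k8s.io/controller-runtime/pkg/client/fake) and envtest.

```go
package main

import "fmt"

// deployment is a hypothetical stand-in for appsv1.Deployment.
type deployment struct {
	Replicas int32
}

// store is a hypothetical in-memory stand-in for the API server.
type store map[string]*deployment

// result plays the role of ctrl.Result.
type result struct {
	Requeue bool
}

// reconcileMemcached follows the same branches as Reconcile: create the
// Deployment if missing, fix its replica count if wrong, otherwise do nothing.
func reconcileMemcached(s store, name string, size int32) result {
	found, ok := s[name]
	if !ok {
		s[name] = &deployment{Replicas: size}
		return result{Requeue: true}
	}
	if found.Replicas != size {
		found.Replicas = size
		return result{Requeue: true}
	}
	return result{}
}

// runUntilStable calls reconcileMemcached until it stops asking to be
// requeued and returns how many calls that took.
func runUntilStable(s store, name string, size int32) int {
	calls := 1
	for reconcileMemcached(s, name, size).Requeue {
		calls++
	}
	return calls
}

func main() {
	s := store{}
	name := "memcached-sample"

	n := runUntilStable(s, name, 3) // create, then confirm
	fmt.Println(n, s[name].Replicas)

	n = runUntilStable(s, name, 5) // user raised spec.size to 5
	fmt.Println(n, s[name].Replicas)

	n = runUntilStable(s, name, 5) // nothing changed: one no-op call
	fmt.Println(n, s[name].Replicas)
}
```

It prints 2 3, then 2 5, then 1 5. Each change takes one acting call plus one confirming call, and an already-converged object costs a single no-op.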

Operator Maturity Model

From OperatorHub:

Level  Capability
1      Basic install
2      Seamless upgrades
3      Full lifecycle (backup/restore)
4      Deep insights (metrics/alerts)
5      Auto-pilot (auto-scaling, tuning)

Start simple, add capabilities over time.

When to Use an Operator

Good use cases:

Skip operators for:

When to Build Your Own

Build when:

Use existing when:

Finding Operators

Best Practices

Idempotency

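Reconcile can run many times for the same object (on every watch event, periodic resync, and error retry), so each run must be safe to repeat: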

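// Check before create: only create when the read says NotFound,
// and treat AlreadyExists (stale cache or a racing create) as success
if errors.IsNotFound(err) {
    if err := r.Create(ctx, dep); err != nil && !errors.IsAlreadyExists(err) {
        return ctrl.Result{}, err
    }
}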
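A Get that returns NotFound does not guarantee that the Create will succeed. controller-runtime's default client reads from a cache that can lag behind a write the operator just made. Treating "already exists" as success closes that gap. The sketch below models this in plain Go. apiServer, ensure, and errAlreadyExists are hypothetical stand-ins; real code would check errors.IsAlreadyExists from k8s.io/apimachinery.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the API server's
// AlreadyExists error.
var errAlreadyExists = errors.New("already exists")

// apiServer is a hypothetical in-memory object store that counts creates.
type apiServer struct {
	objects map[string]int32
	creates int
}

func (a *apiServer) create(name string, replicas int32) error {
	if _, ok := a.objects[name]; ok {
		return errAlreadyExists
	}
	a.objects[name] = replicas
	a.creates++
	return nil
}

// ensure creates the object when the (possibly stale) cache says it is
// missing, and treats "already exists" as success, so it is safe to repeat.
func ensure(a *apiServer, cache map[string]bool, name string, replicas int32) error {
	if cache[name] {
		return nil // cache already knows about it: nothing to create
	}
	err := a.create(name, replicas)
	if errors.Is(err, errAlreadyExists) {
		return nil // an earlier pass (or another writer) created it
	}
	return err
}

func main() {
	a := &apiServer{objects: map[string]int32{}}
	cache := map[string]bool{} // stale: never learns about the create

	for i := 0; i < 3; i++ {
		if err := ensure(a, cache, "memcached-sample", 3); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("creates:", a.creates, "objects:", len(a.objects))
}
```

Even with a cache that never catches up, three calls leave exactly one object and print creates: 1 objects: 1.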

Status Updates

Report what the operator observed in the status subresource (not the spec), and handle the write error like any other:

memcached.Status.Nodes = getPodNames(podList.Items)
if err := r.Status().Update(ctx, memcached); err != nil {
    return ctrl.Result{}, err
}
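The post does not show getPodNames. Here is one hypothetical way to write it, using a local pod type in place of corev1.Pod so it runs with only the standard library. It sorts the names, because the order of listed pods is not something to rely on. A deterministic status also lets the operator compare before writing and skip Status().Update calls that would change nothing. The equalNames helper is likewise illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical stand-in for corev1.Pod; only the name matters here.
type pod struct {
	Name string
}

// getPodNames returns the pod names in sorted order, so the same set of pods
// always produces the same status regardless of list order.
func getPodNames(pods []pod) []string {
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Name)
	}
	sort.Strings(names)
	return names
}

// equalNames reports whether two name lists are identical, element by element.
func equalNames(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	current := []string{"memcached-a", "memcached-b"}
	listed := []pod{{Name: "memcached-b"}, {Name: "memcached-a"}}

	next := getPodNames(listed)
	if equalNames(current, next) {
		fmt.Println("status unchanged; skipping Status().Update")
	} else {
		fmt.Println("status changed:", next)
	}
}
```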

Finalizers

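When deleting the custom resource must also clean up resources outside the cluster, add a finalizer. Kubernetes keeps the object until your operator removes it: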

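// controllerutil = sigs.k8s.io/controller-runtime/pkg/controller/controllerutil
if memcached.ObjectMeta.DeletionTimestamp.IsZero() {
    // Not being deleted: make sure our finalizer is registered
    if !controllerutil.ContainsFinalizer(memcached, myFinalizer) {
        controllerutil.AddFinalizer(memcached, myFinalizer)
        if err := r.Update(ctx, memcached); err != nil {
            return ctrl.Result{}, err
        }
    }
} else {
    // Being deleted: clean up external resources, then release the object
    if controllerutil.ContainsFinalizer(memcached, myFinalizer) {
        if err := cleanupExternalResources(memcached); err != nil {
            return ctrl.Result{}, err // keep the finalizer so deletion waits and retries
        }
        controllerutil.RemoveFinalizer(memcached, myFinalizer)
        if err := r.Update(ctx, memcached); err != nil {
            return ctrl.Result{}, err
        }
    }
    return ctrl.Result{}, nil
}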
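Finalizers work because of a contract with the API server. Deleting an object that still has finalizers only sets its deletionTimestamp, and the object is removed once its finalizer list is empty. The plain-Go model below walks through that handshake with the same add, clean up, and remove steps as the snippet above. The object and cluster types and the finalizer name are all hypothetical.

```go
package main

import "fmt"

// Hypothetical finalizer name; real ones are usually domain-qualified like this.
const myFinalizer = "cache.example.com/finalizer"

// object is a hypothetical stand-in for an object's metadata.
type object struct {
	finalizers []string
	deleting   bool // stands in for a non-zero deletionTimestamp
}

// cluster models the API server's rule: an object marked for deletion is
// removed only once its finalizer list is empty.
type cluster map[string]*object

func (c cluster) requestDelete(name string) {
	if o, ok := c[name]; ok {
		o.deleting = true
		c.collect(name)
	}
}

func (c cluster) collect(name string) {
	if o, ok := c[name]; ok && o.deleting && len(o.finalizers) == 0 {
		delete(c, name)
	}
}

func hasFinalizer(o *object) bool {
	for _, f := range o.finalizers {
		if f == myFinalizer {
			return true
		}
	}
	return false
}

// reconcile follows the Finalizers snippet: add the finalizer while the object
// is live; once it is being deleted, clean up, then remove the finalizer.
func reconcile(c cluster, name string, cleanups *int) {
	o, ok := c[name]
	if !ok {
		return // already gone: nothing to clean up
	}
	if !o.deleting {
		if !hasFinalizer(o) {
			o.finalizers = append(o.finalizers, myFinalizer)
		}
		return
	}
	if hasFinalizer(o) {
		*cleanups++ // cleanupExternalResources would run here
		kept := o.finalizers[:0]
		for _, f := range o.finalizers {
			if f != myFinalizer {
				kept = append(kept, f)
			}
		}
		o.finalizers = kept
		c.collect(name) // writing the change lets the API server finish the delete
	}
}

func main() {
	c := cluster{"memcached-sample": {}}
	cleanups := 0

	// First reconcile registers the finalizer.
	reconcile(c, "memcached-sample", &cleanups)

	// A delete now only marks the object; the finalizer keeps it around.
	c.requestDelete("memcached-sample")
	_, present := c["memcached-sample"]
	fmt.Println("present after delete request:", present)

	// The next reconcile cleans up and removes the finalizer, so it goes away.
	reconcile(c, "memcached-sample", &cleanups)
	_, present = c["memcached-sample"]
	fmt.Println("present after reconcile:", present, "cleanups:", cleanups)
}
```

The model also shows why the finalizer must be added early. An object deleted before its first reconcile has no finalizer, so the API server removes it at once and cleanup never runs.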

Final Thoughts

Operators bring human operational knowledge into Kubernetes. They’re the difference between “running PostgreSQL on K8s” and “managing PostgreSQL on K8s.”

Use existing operators for common software. Build your own when you have unique operational requirements.


Encode your runbooks. Let the machines operate.
