Kubernetes Operators: Automating Complex Apps
Kubernetes manages stateless containers well. But what about databases, message queues, and other stateful applications? Operators encode the operational knowledge those systems need into software.
What is an Operator?
An Operator is a Kubernetes controller that:
- Watches custom resources, whose types are defined by CustomResourceDefinitions (CRDs)
- Reacts to changes
- Manages complex applications automatically
It’s the operations runbook, encoded as software.
Why Operators?
Without Operators
Running PostgreSQL on Kubernetes:
- Create StatefulSet, PVC, Service
- Configure replication manually
- Handle failover yourself
- Manage backups via cron jobs
- Scale by editing YAML and praying
With an Operator
# Abbreviated: storage settings (volume claim specs) are omitted.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: mydb
spec:
  postgresVersion: 14
  instances:
    - replicas: 3
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
The operator handles replication, failover, backups, and scaling.
How Operators Work
The Reconciliation Loop:
    ┌─────────────────────────────────┐
    │                                 │
    ▼                                 │
1. Watch: Observe desired state       │
    │                                 │
    ▼                                 │
2. Compare: Current vs Desired        │
    │                                 │
    ▼                                 │
3. Act: Make changes to match         │
    │                                 │
    └──────────── (Repeat) ───────────┘
This is the controller pattern.
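The loop above can be sketched in a few lines of Go. This is a minimal illustration using only the standard library, not how a real controller talks to the API server: the `state` type and the `reconcile` function are hypothetical stand-ins, with a map playing the role of the cluster.

```go
package main

import "fmt"

// state is a hypothetical stand-in for cluster state: workload name -> replica count.
type state map[string]int

// reconcile compares desired with actual, changes actual to match, and
// returns a description of each change it made. When the two already
// match, it changes nothing and returns no actions.
func reconcile(desired, actual state) []string {
	var actions []string
	for name, want := range desired {
		have, exists := actual[name]
		switch {
		case !exists:
			actual[name] = want
			actions = append(actions, fmt.Sprintf("create %s with %d replicas", name, want))
		case have != want:
			actual[name] = want
			actions = append(actions, fmt.Sprintf("scale %s from %d to %d", name, have, want))
		}
	}
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			delete(actual, name)
			actions = append(actions, "delete "+name)
		}
	}
	return actions
}

func main() {
	desired := state{"mydb": 3}
	actual := state{"mydb": 1, "old": 2}
	// Keep reconciling until a pass makes no changes.
	for pass := 1; ; pass++ {
		actions := reconcile(desired, actual)
		if len(actions) == 0 {
			fmt.Printf("pass %d: in sync\n", pass)
			break
		}
		for _, a := range actions {
			fmt.Printf("pass %d: %s\n", pass, a)
		}
	}
}
```

The point of the sketch is that `reconcile` never needs to know what changed or why. It only looks at the difference between the two states, which is what makes the pattern tolerant of missed events and restarts.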
Popular Operators
Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  replicas: 2
  serviceMonitorSelector:
    matchLabels:
      team: frontend
The operator finds the ServiceMonitor resources whose labels match the selector and configures Prometheus to scrape the services they describe.
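The `matchLabels` selector in the manifest above works by equality: an object is selected when it carries every key/value pair the selector lists. Here is a minimal sketch of that rule in plain Go. The `matchesLabels` function and the monitor names are hypothetical and only illustrate the matching semantics, not the operator's actual code.

```go
package main

import "fmt"

// matchesLabels reports whether labels contains every key/value pair
// in selector. Extra labels on the object are ignored.
func matchesLabels(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"team": "frontend"}
	// Hypothetical ServiceMonitor names and their labels.
	monitors := map[string]map[string]string{
		"web-monitor":     {"team": "frontend", "tier": "web"},
		"billing-monitor": {"team": "backend"},
	}
	for name, labels := range monitors {
		fmt.Printf("%s selected: %v\n", name, matchesLabels(selector, labels))
	}
}
```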
Kafka Operator (Strimzi)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
  zookeeper:
    replicas: 3
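Strimzi manages the full lifecycle of the Kafka cluster. The manifest above is abbreviated; a working one also needs listener and storage settings.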
Elasticsearch Operator (ECK)
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logs
spec:
  version: 7.10.0
  nodeSets:
    - name: default
      count: 3
Building an Operator
Operator SDK
Operator SDK scaffolds the project, the API types, and the controller for you:
# Install operator-sdk
brew install operator-sdk
# Create new operator
operator-sdk init --domain example.com --repo github.com/me/myoperator
operator-sdk create api --group cache --version v1 --kind Memcached --resource --controller
Define Your CRD
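// api/v1/memcached_types.go

// MemcachedSpec defines the desired state of Memcached.
type MemcachedSpec struct {
	// Size is the number of Memcached replicas to run.
	Size int32 `json:"size"`
}

// MemcachedStatus defines the observed state of Memcached.
type MemcachedStatus struct {
	// Nodes holds the names of the Memcached pods.
	Nodes []string `json:"nodes"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Memcached is the top-level type for the custom resource.
type Memcached struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MemcachedSpec   `json:"spec,omitempty"`
	Status MemcachedStatus `json:"status,omitempty"`
}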
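The `json` struct tags decide which field names users write in their manifests. The sketch below shows that mapping with the standard library only. It assumes simplified stand-in types (the real ones embed `metav1.TypeMeta` and `metav1.ObjectMeta`), and `decodeMemcached` is a hypothetical helper. The API group `cache.example.com/v1` follows from the `--group` and `--domain` flags used earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the generated types; the real ones embed
// metav1.TypeMeta and metav1.ObjectMeta from k8s.io/apimachinery.
type MemcachedSpec struct {
	Size int32 `json:"size"`
}

type Memcached struct {
	APIVersion string        `json:"apiVersion"`
	Kind       string        `json:"kind"`
	Spec       MemcachedSpec `json:"spec,omitempty"`
}

// decodeMemcached (hypothetical helper) parses a JSON manifest into the typed struct.
func decodeMemcached(manifest []byte) (Memcached, error) {
	var m Memcached
	err := json.Unmarshal(manifest, &m)
	return m, err
}

func main() {
	manifest := []byte(`{
	  "apiVersion": "cache.example.com/v1",
	  "kind": "Memcached",
	  "spec": {"size": 3}
	}`)
	m, err := decodeMemcached(manifest)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.Kind, "size:", m.Spec.Size)
}
```

Renaming a tag later changes the manifest format users depend on, so it is worth settling field names before publishing the CRD.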
Implement Reconciliation
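// controllers/memcached_controller.go
import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"

	cachev1 "github.com/me/myoperator/api/v1"
)

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := r.Log.WithValues("memcached", req.NamespacedName)

	// Fetch the Memcached instance
	memcached := &cachev1.Memcached{}
	err := r.Get(ctx, req.NamespacedName, memcached)
	if err != nil {
		if errors.IsNotFound(err) {
			// The resource was deleted; nothing left to reconcile.
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// Check if deployment exists
	found := &appsv1.Deployment{}
	err = r.Get(ctx, types.NamespacedName{Name: memcached.Name, Namespace: memcached.Namespace}, found)
	if errors.IsNotFound(err) {
		// Create deployment
		log.Info("Creating Deployment")
		dep := r.deploymentForMemcached(memcached)
		if err := r.Create(ctx, dep); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		// Any other error leaves found unpopulated, so stop here and retry.
		return ctrl.Result{}, err
	}

	// Ensure correct replicas
	size := memcached.Spec.Size
	if found.Spec.Replicas == nil || *found.Spec.Replicas != size {
		found.Spec.Replicas = &size
		if err := r.Update(ctx, found); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	}
	return ctrl.Result{}, nil
}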
Operator Maturity Model
From the Operator Framework capability levels, as listed on OperatorHub:
| Level | Capability |
|---|---|
| 1 | Basic install |
| 2 | Seamless upgrades |
| 3 | Full lifecycle (backup/restore) |
| 4 | Deep insights (metrics/alerts) |
| 5 | Auto-pilot (auto-scaling, tuning) |
Start simple, add capabilities over time.
When to Use an Operator
Good use cases:
- Stateful applications (databases, message queues)
- Complex deployment patterns
- Operational tasks (backups, upgrades)
- When you’d otherwise write shell scripts
Skip operators for:
- Stateless applications (use Deployments)
- Simple configurations
- One-off deployments
When to Build Your Own
Build when:
- No existing operator meets your needs
- You have significant operational complexity
- The operations are repeatable and automatable
Use existing when:
- Popular software (PostgreSQL, Redis, Kafka)
- Active community support
- Your needs are standard
Finding Operators
- OperatorHub.io: Community operators
- Artifact Hub: CNCF artifact registry
- Vendor sites: Officially supported operators
Best Practices
Idempotency
Reconciliation must be safe to retry:
// Check before create: err is the result of the preceding r.Get
if err != nil && errors.IsNotFound(err) {
	// Safe to create: the object does not exist yet
}
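To make the idea concrete, here is a minimal sketch of an idempotent "ensure" step in plain Go. The `cluster` map and `ensureDeployment` function are hypothetical stand-ins for the API server and a reconcile step. What matters is that calling it again after it has succeeded changes nothing.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory stand-in for the API server:
// deployment name -> replica count.
type cluster map[string]int32

// ensureDeployment converges the named deployment to the wanted size.
// It reports whether it changed anything, so a retry after success is a no-op.
func ensureDeployment(c cluster, name string, size int32) (changed bool) {
	current, exists := c[name]
	if exists && current == size {
		return false // already in the desired state
	}
	c[name] = size // create or resize
	return true
}

func main() {
	c := cluster{}
	fmt.Println(ensureDeployment(c, "memcached", 3)) // first call creates: true
	fmt.Println(ensureDeployment(c, "memcached", 3)) // retry changes nothing: false
	fmt.Println(len(c), c["memcached"])              // still one deployment with 3 replicas
}
```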
Status Updates
Report meaningful status:
memcached.Status.Nodes = getPodNames(podList.Items)
if err := r.Status().Update(ctx, memcached); err != nil {
	return ctrl.Result{}, err
}
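The snippet above relies on a `getPodNames` helper that is not shown. One possible shape for it is sketched below with only the standard library. The `pod` type is a hypothetical stand-in for `corev1.Pod`, and `sameNames` is an assumed extra helper that lets a controller skip the status write when nothing changed.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical stand-in for corev1.Pod with only the field used here.
type pod struct {
	Name string
}

// getPodNames returns the pod names in sorted order, so the result is
// stable regardless of the order in which the pods were listed.
func getPodNames(pods []pod) []string {
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Name)
	}
	sort.Strings(names)
	return names
}

// sameNames reports whether two name lists hold the same entries in the
// same order. A nil slice and an empty slice count as equal.
func sameNames(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	recorded := []string{"memcached-a", "memcached-b"}
	observed := getPodNames([]pod{{Name: "memcached-b"}, {Name: "memcached-a"}})
	if sameNames(recorded, observed) {
		fmt.Println("status unchanged, skipping update")
	} else {
		fmt.Println("status changed, would call r.Status().Update")
	}
}
```

Sorting before comparing avoids rewriting the status just because a list call returned the same pods in a different order.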
Finalizers
Clean up external resources:
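if memcached.ObjectMeta.DeletionTimestamp.IsZero() {
	// Not being deleted: register the finalizer and persist it.
	if !containsString(memcached.Finalizers, myFinalizer) {
		memcached.Finalizers = append(memcached.Finalizers, myFinalizer)
		if err := r.Update(ctx, memcached); err != nil {
			return ctrl.Result{}, err
		}
	}
} else {
	// Object being deleted
	if containsString(memcached.Finalizers, myFinalizer) {
		// Assumes cleanupExternalResources returns an error; on failure the finalizer stays so cleanup is retried.
		if err := cleanupExternalResources(memcached); err != nil {
			return ctrl.Result{}, err
		}
		memcached.Finalizers = removeString(memcached.Finalizers, myFinalizer)
		if err := r.Update(ctx, memcached); err != nil {
			return ctrl.Result{}, err
		}
	}
}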
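The finalizer snippet calls `containsString` and `removeString` without defining them. A minimal sketch of the two helpers follows. The finalizer name in `main` is a hypothetical example, not a value from the original project.

```go
package main

import "fmt"

// containsString reports whether s is present in list.
func containsString(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// removeString returns a new slice with every occurrence of s removed.
// The input slice is not modified.
func removeString(list []string, s string) []string {
	result := make([]string, 0, len(list))
	for _, item := range list {
		if item != s {
			result = append(result, item)
		}
	}
	return result
}

func main() {
	const myFinalizer = "cache.example.com/finalizer" // hypothetical name
	finalizers := []string{"other.example.com/cleanup", myFinalizer}
	fmt.Println(containsString(finalizers, myFinalizer))
	finalizers = removeString(finalizers, myFinalizer)
	fmt.Println(finalizers)
}
```

If the project already depends on controller-runtime, its `controllerutil` package offers `ContainsFinalizer`, `AddFinalizer`, and `RemoveFinalizer`, which may be preferable to hand-written helpers.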
Final Thoughts
Operators bring human operational knowledge into Kubernetes. They’re the difference between “running PostgreSQL on K8s” and “managing PostgreSQL on K8s.”
Use existing operators for common software. Build your own when you have unique operational requirements.
Encode your runbooks. Let the machines operate.