Smørfugl, smoerfugl.dk

Kubernetes Operators: Extending Kubernetes with Custom Controllers

Kubernetes operators represent a powerful paradigm for extending Kubernetes functionality beyond its built-in capabilities. As applications become more complex and stateful, the need for domain-specific knowledge in managing these applications has led to the rise of operators as a crucial tool in the Kubernetes ecosystem. Having worked with various operators in production environments, I’ve seen firsthand how they transform the way we manage complex applications and infrastructure components.

What Are Kubernetes Operators?

At their core, Kubernetes operators are applications that use the Kubernetes API to manage other applications. They extend Kubernetes by implementing custom controllers that watch for changes to custom resources (CRs) and take action to ensure the desired state is maintained. Think of operators as “human operators” encoded in software: they encapsulate the operational knowledge needed to run complex applications.

The Operator Pattern

The operator pattern consists of three main components:

  1. Custom Resource Definition (CRD): Defines the schema for your custom resource
  2. Custom Controller: Watches for changes to your custom resources and takes action
  3. Custom Resource (CR): An instance of your custom resource that represents the desired state
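To make the relationship between these three pieces concrete, here is a minimal, standard-library-only sketch of a reconcile step. It is not how controller-runtime is implemented; the `cluster` type is a hypothetical in-memory stand-in for the Kubernetes API, and `MyAppSpec` plays the role of a custom resource's desired state.

```go
package main

import "fmt"

// MyAppSpec is the desired state a user writes into a custom resource (CR).
type MyAppSpec struct {
	Replicas int
}

// cluster is a hypothetical stand-in for the Kubernetes API: it stores the
// desired spec per resource name and the number of replicas actually running.
type cluster struct {
	desired map[string]MyAppSpec
	running map[string]int
}

// reconcile compares desired and observed state for one resource and takes
// a corrective step if they differ. It returns true when it changed anything.
func (c *cluster) reconcile(name string) bool {
	spec, ok := c.desired[name]
	if !ok {
		// The CR was deleted: remove whatever is still running.
		if _, exists := c.running[name]; exists {
			delete(c.running, name)
			return true
		}
		return false
	}
	if c.running[name] == spec.Replicas {
		return false // already converged, nothing to do
	}
	c.running[name] = spec.Replicas
	return true
}

func main() {
	c := &cluster{desired: map[string]MyAppSpec{}, running: map[string]int{}}
	c.desired["web"] = MyAppSpec{Replicas: 3} // a user creates a CR

	fmt.Println(c.reconcile("web"), c.running["web"]) // true 3
	fmt.Println(c.reconcile("web"), c.running["web"]) // false 3
}
```

The second call is a no-op, which is the property a real controller also needs: reconciling an already-converged resource should change nothing.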

Why Use Operators?

Domain-Specific Knowledge

Operators encapsulate operational expertise that would otherwise require specialized human operators. They know how to deploy, scale, back up, and recover applications based on domain-specific requirements.

Automation of Complex Operations

Instead of manually managing complex stateful applications, operators automate routine tasks like:

  • Rolling updates with proper ordering
  • Backup and restore procedures
  • Disaster recovery processes
  • Configuration management
  • Health monitoring and self-healing

Consistency and Reliability

Operators ensure that applications are managed consistently across environments, reducing human error and improving reliability. They implement best practices and operational procedures that might be difficult to enforce manually.

Building Your First Operator

Let’s walk through creating a simple operator using the Operator SDK, which provides a framework for building operators in Go, Ansible, or Helm.

1. Prerequisites

# Install Operator SDK
curl -L https://github.com/operator-framework/operator-sdk/releases/download/v1.28.0/operator-sdk_linux_amd64 -o operator-sdk
chmod +x operator-sdk
sudo mv operator-sdk /usr/local/bin/

# Install kubebuilder
curl -L -o kubebuilder https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

2. Create a New Operator Project

# Create a new Go operator
operator-sdk init --domain example.com --repo github.com/example/my-operator

# Create an API for your custom resource
operator-sdk create api --group apps --version v1alpha1 --kind MyApp

3. Define Your Custom Resource

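After you add Replicas and Image fields to MyAppSpec in api/v1alpha1/myapp_types.go and run make manifests, the generated CRD will look something like this (abridged):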

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.apps.example.com
spec:
  group: apps.example.com
  names:
    kind: MyApp
    listKind: MyAppList
    plural: myapps
    singular: myapp
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
                minimum: 1
                maximum: 10
              image:
                type: string
            required:
            - replicas
            - image
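A schema like the one above is usually generated from Go types rather than written by hand. The sketch below shows what such a spec type might look like. The `+kubebuilder:validation` markers are comments that controller-gen reads; the `validate` method is a hypothetical helper added here only so the bounds can be exercised without a cluster, since the API server enforces the CRD schema itself.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MyAppSpec mirrors the spec schema above. Fields without "omitempty" are
// treated as required when the CRD is generated.
type MyAppSpec struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	Replicas int32 `json:"replicas"`

	Image string `json:"image"`
}

// validate is a hypothetical helper that re-checks the same bounds in Go,
// for example in unit tests that run without an API server.
func (s MyAppSpec) validate() error {
	if s.Replicas < 1 || s.Replicas > 10 {
		return fmt.Errorf("replicas must be between 1 and 10, got %d", s.Replicas)
	}
	if s.Image == "" {
		return errors.New("image is required")
	}
	return nil
}

func main() {
	var spec MyAppSpec
	if err := json.Unmarshal([]byte(`{"replicas": 3, "image": "nginx:1.25"}`), &spec); err != nil {
		panic(err)
	}
	fmt.Println(spec.validate()) // <nil>

	spec.Replicas = 11
	fmt.Println(spec.validate()) // replicas must be between 1 and 10, got 11
}
```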

4. Implement the Controller Logic

Here’s a simplified example of controller logic:

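import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"

    appsv1alpha1 "github.com/example/my-operator/api/v1alpha1"
)

// Reconcile handles the reconciliation logic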
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the MyApp instance
    myApp := &appsv1alpha1.MyApp{}
    err := r.Get(ctx, req.NamespacedName, myApp)
    if err != nil {
        if errors.IsNotFound(err) {
            // Request object not found, could have been deleted after reconcile request
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // Check if deployment already exists
    deployment := &appsv1.Deployment{}
    err = r.Get(ctx, types.NamespacedName{Name: myApp.Name, Namespace: myApp.Namespace}, deployment)
    if err != nil && errors.IsNotFound(err) {
        // Create deployment (deploymentForMyApp is a helper, not shown here,
        // that builds a Deployment from the MyApp spec)
        dep := r.deploymentForMyApp(myApp)
        if err := r.Create(ctx, dep); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        return ctrl.Result{}, err
    }

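    // Ensure deployment spec matches MyApp spec.
    // Compare the pointed-to value: comparing the pointers themselves would
    // compare addresses and trigger an update on every reconcile.
    if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas != myApp.Spec.Replicas {
        deployment.Spec.Replicas = &myApp.Spec.Replicas
        if err := r.Update(ctx, deployment); err != nil {
            return ctrl.Result{}, err
        }
    }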

    return ctrl.Result{}, nil
}
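One detail in the replica check deserves a closer look: `Deployment.Spec.Replicas` is a `*int32`, so the comparison has to be made on the value it points to, not on the pointer. The following standard-library sketch isolates that check; `replicasDiffer` is a hypothetical helper name, not part of any Kubernetes library.

```go
package main

import "fmt"

// replicasDiffer reports whether an optional replica count (a pointer, as in
// Deployment.Spec.Replicas) differs from the desired value. A nil pointer
// counts as different so that an unset field gets populated.
func replicasDiffer(current *int32, desired int32) bool {
	return current == nil || *current != desired
}

func main() {
	desired := int32(3)
	current := int32(3)

	// Two distinct variables never share an address, so comparing pointers
	// reports a difference even though the values are equal.
	fmt.Println(&current != &desired) // true

	fmt.Println(replicasDiffer(&current, desired)) // false
	fmt.Println(replicasDiffer(nil, desired))      // true
}
```

Getting this wrong does not break anything visibly, but it makes the controller issue an update on every reconcile, which adds load on the API server and can cause needless requeues.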

Operator Frameworks

1. Operator SDK

The Operator SDK is a widely used framework for building operators. It supports Go, Ansible, and Helm based operators and provides scaffolding tools for each.

Go Operators

# Create a Go operator
operator-sdk init --domain example.com --repo github.com/example/go-operator
operator-sdk create api --group apps --version v1alpha1 --kind MyApp

Ansible Operators

# Create an Ansible operator (--repo is specific to Go projects and is omitted here)
operator-sdk init --domain example.com --plugins=ansible
operator-sdk create api --group apps --version v1alpha1 --kind MyApp --generate-role

Helm Operators

# Create a Helm operator (a boilerplate chart is scaffolded by default)
operator-sdk init --domain example.com --plugins=helm
operator-sdk create api --group apps --version v1alpha1 --kind MyApp

2. Kubebuilder

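Kubebuilder is another popular framework that focuses on Go operators and provides excellent tooling for building production-ready operators. The Operator SDK's Go plugin is built on top of Kubebuilder, which is why the commands mirror each other.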

# Create a new project
kubebuilder init --domain example.com --repo github.com/example/my-operator

# Create an API
kubebuilder create api --group apps --version v1alpha1 --kind MyApp

Real-World Operator Examples

1. Prometheus Operator

The Prometheus Operator manages Prometheus monitoring stacks, handling complex configurations and service discovery.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi

2. Elasticsearch Operator

Elastic's operator for Elasticsearch, Elastic Cloud on Kubernetes (ECK), manages Elasticsearch clusters, handling node management, scaling, and data persistence.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 8.11.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.roles: ["master", "data"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 2Gi
              cpu: 0.5

3. PostgreSQL Operator

Several PostgreSQL operators exist. The example below uses CloudNativePG, which manages PostgreSQL clusters with features like automated backups, failover, and scaling.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "100"
  bootstrap:
    initdb:
      database: myapp
      owner: myapp

Best Practices for Operator Development

1. Follow the Operator Maturity Model

The Operator Maturity Model provides guidelines for building production-ready operators:

  • Level 1: Basic Install: Operator can install and uninstall the application
  • Level 2: Seamless Upgrades: Operator handles application upgrades
  • Level 3: Full Lifecycle: Operator manages the complete application lifecycle
  • Level 4: Deep Insights: Operator provides detailed monitoring and alerting
  • Level 5: Auto Pilot: Operator can automatically handle complex scenarios

2. Implement Proper Error Handling

func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Add proper error handling and logging
    logger := log.FromContext(ctx)
    
    myApp := &appsv1alpha1.MyApp{}
    if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
        if errors.IsNotFound(err) {
            logger.Info("MyApp resource not found, ignoring")
            return ctrl.Result{}, nil
        }
        logger.Error(err, "Failed to get MyApp")
        return ctrl.Result{}, err
    }
    
    // Handle reconciliation logic with proper error handling
    return ctrl.Result{}, nil
}

3. Use Finalizers for Cleanup

const myAppFinalizer = "myapp.example.com/finalizer"

func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
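    // Uses "sigs.k8s.io/controller-runtime/pkg/client" and
    // "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    myApp := &appsv1alpha1.MyApp{}
    if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
        // Once the last finalizer is removed the object disappears;
        // a not-found error here is expected and should not be retried
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Handle deletion
    if myApp.DeletionTimestamp != nil {
        if controllerutil.ContainsFinalizer(myApp, myAppFinalizer) {
            // Perform cleanup (cleanupResources is an application-specific helper, not shown)
            if err := r.cleanupResources(myApp); err != nil {
                return ctrl.Result{}, err
            }

            // Remove finalizer so Kubernetes can finish deleting the object
            controllerutil.RemoveFinalizer(myApp, myAppFinalizer)
            if err := r.Update(ctx, myApp); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Add finalizer if not present
    if !controllerutil.ContainsFinalizer(myApp, myAppFinalizer) {
        controllerutil.AddFinalizer(myApp, myAppFinalizer)
        if err := r.Update(ctx, myApp); err != nil {
            return ctrl.Result{}, err
        }
    }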

    return ctrl.Result{}, nil
}
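Finalizer bookkeeping is plain set manipulation on the resource's list of finalizer strings. The sketch below shows those operations using only the standard library, mainly to make the required properties explicit: adding must be idempotent, and removing must leave other controllers' finalizers untouched. The function names are hypothetical.

```go
package main

import "fmt"

// containsString reports whether list includes s.
func containsString(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// addFinalizer returns list with name appended, unless it is already present.
func addFinalizer(list []string, name string) []string {
	if containsString(list, name) {
		return list
	}
	return append(list, name)
}

// removeFinalizer returns a new slice without name; other entries keep their order.
func removeFinalizer(list []string, name string) []string {
	out := make([]string, 0, len(list))
	for _, item := range list {
		if item != name {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	const mine = "myapp.example.com/finalizer"
	finalizers := []string{"other.example.com/finalizer"}

	finalizers = addFinalizer(finalizers, mine)
	finalizers = addFinalizer(finalizers, mine) // second call adds nothing
	fmt.Println(len(finalizers))                // 2

	finalizers = removeFinalizer(finalizers, mine)
	fmt.Println(finalizers) // [other.example.com/finalizer]
}
```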

4. Implement Status Conditions

// Update status with conditions
func (r *MyAppReconciler) updateStatus(ctx context.Context, myApp *appsv1alpha1.MyApp, condition metav1.Condition) {
    meta.SetStatusCondition(&myApp.Status.Conditions, condition)
    if err := r.Status().Update(ctx, myApp); err != nil {
        log.FromContext(ctx).Error(err, "Failed to update status")
    }
}

Testing Your Operator

1. Unit Testing

func TestMyAppReconciler_Reconcile(t *testing.T) {
    // Create a test scheme
    scheme := runtime.NewScheme()
    _ = appsv1alpha1.AddToScheme(scheme)
    _ = appsv1.AddToScheme(scheme)

    // Create test objects
    myApp := &appsv1alpha1.MyApp{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "test-myapp",
            Namespace: "default",
        },
        Spec: appsv1alpha1.MyAppSpec{
            Replicas: 3,
            Image:    "nginx:latest",
        },
    }

    // Create reconciler
    r := &MyAppReconciler{
        Client: fake.NewClientBuilder().WithScheme(scheme).WithObjects(myApp).Build(),
        Scheme: scheme,
    }

    // Test reconciliation
    req := ctrl.Request{
        NamespacedName: types.NamespacedName{
            Name:      "test-myapp",
            Namespace: "default",
        },
    }

    _, err := r.Reconcile(context.Background(), req)
    assert.NoError(t, err)
}

2. Integration Testing with EnvTest

func TestMyAppIntegration(t *testing.T) {
    // Setup test environment
    testEnv := &envtest.Environment{
        CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
    }

    cfg, err := testEnv.Start()
    require.NoError(t, err)
    defer testEnv.Stop()

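    // Register the custom types so the manager's client can decode them
    // (scheme is "k8s.io/client-go/kubernetes/scheme")
    require.NoError(t, appsv1alpha1.AddToScheme(scheme.Scheme))

    // Create manager and start controller
    mgr, err := manager.New(cfg, manager.Options{Scheme: scheme.Scheme})
    require.NoError(t, err)

    // Add controller to manager
    err = (&MyAppReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr)
    require.NoError(t, err)

    // Start manager; cancelling the context stops it when the test ends
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    go func() {
        // Use assert, not require: FailNow must not be called from a non-test goroutine
        assert.NoError(t, mgr.Start(ctx))
    }()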

    // Create test resource and verify
    // ... test implementation
}

Deploying Your Operator

1. Build and Push the Operator Image

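# Build the operator image (operator-sdk build was removed in v1.0;
# the scaffolded Makefile provides the build target instead)
make docker-build IMG=my-registry.com/my-operator:v0.1.0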

# Push to registry
docker push my-registry.com/my-operator:v0.1.0

2. Generate and Apply Manifests

# Generate CRD and RBAC manifests
make manifests

# Install the CRDs and deploy the operator to the cluster
make deploy IMG=my-registry.com/my-operator:v0.1.0

# Create a sample custom resource
kubectl apply -f config/samples/

3. Using Operator Lifecycle Manager (OLM)

# Install OLM
operator-sdk olm install

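# Generate and validate the bundle
make bundle IMG=my-registry.com/my-operator:v0.1.0
operator-sdk bundle validate ./bundle

# Build and push the bundle image
make bundle-build bundle-push BUNDLE_IMG=my-registry.com/my-operator-bundle:v0.1.0

# Deploy using OLM
operator-sdk run bundle my-registry.com/my-operator-bundle:v0.1.0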

Monitoring and Observability

1. Metrics and Prometheus Integration

import (
    "sigs.k8s.io/controller-runtime/pkg/metrics"
    "github.com/prometheus/client_golang/prometheus"
)

var (
    reconcileTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "myapp_reconcile_total",
            Help: "Total number of reconciliations",
        },
        []string{"result"},
    )
)

func init() {
    metrics.Registry.MustRegister(reconcileTotal)
}

2. Structured Logging

func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx).WithValues(
        "myapp", req.NamespacedName,
        "reconcileID", uuid.New().String(),
    )
    
    logger.Info("Starting reconciliation")
    defer logger.Info("Finished reconciliation")
    
    // ... reconciliation logic

    return ctrl.Result{}, nil
}

Conclusion

Kubernetes operators represent a powerful way to extend Kubernetes with domain-specific knowledge and automation. They enable organizations to manage complex applications more effectively by encoding operational expertise into software.

The key to successful operator development lies in understanding the operator pattern, following best practices, and implementing proper testing and monitoring. Whether you’re building operators for internal applications or contributing to the broader Kubernetes ecosystem, operators provide a robust foundation for managing complex, stateful applications in Kubernetes.

As the Kubernetes ecosystem continues to evolve, operators will play an increasingly important role in managing the complexity of modern applications. By investing in operator development, organizations can achieve higher levels of automation, reliability, and operational efficiency in their Kubernetes environments.

The journey from basic CRDs to production-ready operators requires careful planning, proper testing, and adherence to best practices. However, the benefits in terms of operational efficiency and application reliability make it a worthwhile investment for any organization serious about Kubernetes operations.