Kubernetes Operators: Extending Kubernetes with Custom Controllers
Kubernetes operators represent a powerful paradigm for extending Kubernetes functionality beyond its built-in capabilities. As applications become more complex and stateful, the need for domain-specific knowledge in managing these applications has led to the rise of operators as a crucial tool in the Kubernetes ecosystem. Having worked with various operators in production environments, I’ve seen firsthand how they transform the way we manage complex applications and infrastructure components.
What Are Kubernetes Operators?
At their core, Kubernetes operators are applications that use the Kubernetes API to manage other applications. They extend Kubernetes by implementing custom controllers that watch for changes to custom resources (CRs) and take action to ensure the desired state is maintained. Think of operators as “human operators” encoded in software - they encapsulate the operational knowledge needed to run complex applications.
The Operator Pattern
The operator pattern consists of three main components:
- Custom Resource Definition (CRD): Defines the schema for your custom resource
- Custom Controller: Watches for changes to your custom resources and takes action
- Custom Resource (CR): Instances of your custom resource that represent the desired state
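How these three pieces interact can be sketched without any Kubernetes libraries. The toy model below is an illustration under stated assumptions, not controller-runtime code: `Spec` stands in for the desired state declared in a custom resource, `Cluster` for the observed state of the system, and `reconcile` for one pass of a custom controller. All names are hypothetical.

```go
package main

import "fmt"

// Spec is a stand-in for the desired state declared in a custom resource.
type Spec struct {
	Replicas int
}

// Cluster is a stand-in for the observed state of the real system.
type Cluster struct {
	Running int
}

// reconcile performs one pass of the control loop: it compares desired and
// observed state and moves the observed state one step closer.
// It returns true when another pass is needed.
func reconcile(desired Spec, observed *Cluster) bool {
	switch {
	case observed.Running < desired.Replicas:
		observed.Running++ // start one replica
	case observed.Running > desired.Replicas:
		observed.Running-- // stop one replica
	}
	return observed.Running != desired.Replicas
}

// converge calls reconcile until the observed state matches the desired
// state and reports how many passes that took.
func converge(desired Spec, observed *Cluster) int {
	passes := 1
	for reconcile(desired, observed) {
		passes++
	}
	return passes
}

func main() {
	cluster := &Cluster{Running: 0}
	passes := converge(Spec{Replicas: 3}, cluster)
	fmt.Printf("running=%d after %d passes\n", cluster.Running, passes)
}
```

The point of the sketch is that the controller never executes a script of steps; it repeatedly compares the two states and closes the gap, which is why the same loop handles both scale-up and scale-down.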
Why Use Operators?
Domain-Specific Knowledge
Operators encapsulate operational expertise that would otherwise require specialized human operators. They know how to deploy, scale, back up, and recover applications based on domain-specific requirements.
Automation of Complex Operations
Instead of manually managing complex stateful applications, operators automate routine tasks like:
- Rolling updates with proper ordering
- Backup and restore procedures
- Disaster recovery processes
- Configuration management
- Health monitoring and self-healing
Consistency and Reliability
Operators ensure that applications are managed consistently across environments, reducing human error and improving reliability. They implement best practices and operational procedures that might be difficult to enforce manually.
Building Your First Operator
Let’s walk through creating a simple operator using the Operator SDK, which provides a framework for building operators in Go, Ansible, or Helm.
1. Prerequisites
# Install Operator SDK
curl -L https://github.com/operator-framework/operator-sdk/releases/download/v1.28.0/operator-sdk_linux_amd64 -o operator-sdk
chmod +x operator-sdk
sudo mv operator-sdk /usr/local/bin/
# Install kubebuilder
curl -L -o kubebuilder https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/
2. Create a New Operator Project
# Create a new Go operator
operator-sdk init --domain example.com --repo github.com/example/my-operator
# Create an API for your custom resource
operator-sdk create api --group apps --version v1alpha1 --kind MyApp
3. Define Your Custom Resource
The generated CRD will look something like this:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.apps.example.com
spec:
  group: apps.example.com
  names:
    kind: MyApp
    listKind: MyAppList
    plural: myapps
    singular: myapp
  scope: Namespaced
  versions:
  - name: v1alpha1
    # served and storage are required for every version entry
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
                minimum: 1
                maximum: 10
              image:
                type: string
            required:
            - replicas
            - image
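In a Go-based project this schema is normally generated from annotated Go types rather than written by hand. The sketch below shows what such a type might look like, assuming the field names `Replicas` and `Image` and the standard kubebuilder validation markers. The `validate` function is a hypothetical helper that mirrors the schema's rules in plain Go so they can be exercised without a cluster. It is slightly stricter than the schema, which only requires `image` to be present.

```go
package main

import (
	"errors"
	"fmt"
)

// MyAppSpec mirrors the spec section of the CRD schema.
// The +kubebuilder markers are comments read by controller-gen when it
// generates the CRD; they have no effect at runtime.
type MyAppSpec struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	Replicas int32 `json:"replicas"`

	Image string `json:"image"`
}

// validate applies rules equivalent to the schema: replicas in [1, 10] and
// an image value. Rejecting an empty image string is an assumption made
// here; the schema itself only requires the field to be present.
// (Hypothetical helper, for illustration only.)
func validate(spec MyAppSpec) error {
	if spec.Replicas < 1 || spec.Replicas > 10 {
		return fmt.Errorf("replicas must be between 1 and 10, got %d", spec.Replicas)
	}
	if spec.Image == "" {
		return errors.New("image is required")
	}
	return nil
}

func main() {
	fmt.Println(validate(MyAppSpec{Replicas: 3, Image: "nginx:1.25"}))
	fmt.Println(validate(MyAppSpec{Replicas: 0, Image: "nginx:1.25"}))
}
```

In a real cluster the API server enforces the schema at admission time, so the controller only ever sees objects that already satisfy these bounds.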
4. Implement the Controller Logic
Here’s a simplified example of controller logic:
// Reconcile handles the reconciliation logic.
// Here "errors" is k8s.io/apimachinery/pkg/api/errors and "types" is
// k8s.io/apimachinery/pkg/types.
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the MyApp instance
	myApp := &appsv1alpha1.MyApp{}
	err := r.Get(ctx, req.NamespacedName, myApp)
	if err != nil {
		if errors.IsNotFound(err) {
			// Object not found; it may have been deleted after the reconcile request was queued
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// Check if the Deployment already exists
	deployment := &appsv1.Deployment{}
	err = r.Get(ctx, types.NamespacedName{Name: myApp.Name, Namespace: myApp.Namespace}, deployment)
	if err != nil && errors.IsNotFound(err) {
		// Create the Deployment
		dep := r.deploymentForMyApp(myApp)
		if err := r.Create(ctx, dep); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		return ctrl.Result{}, err
	}

	// Ensure the Deployment's replica count matches the MyApp spec.
	// Compare the values, not the pointers: two distinct pointers never
	// compare equal, which would trigger an update on every reconcile.
	// Assumes MyAppSpec.Replicas is an int32.
	if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas != myApp.Spec.Replicas {
		replicas := myApp.Spec.Replicas
		deployment.Spec.Replicas = &replicas
		if err := r.Update(ctx, deployment); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
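One detail in the replica check deserves care: `Deployment.Spec.Replicas` is a pointer (`*int32`), and comparing two pointers in Go tests whether they refer to the same variable, not whether the numbers match. The following self-contained sketch shows the difference. `replicasDiffer` is a hypothetical helper, and `MyAppSpec.Replicas` is assumed to be an `int32`.

```go
package main

import "fmt"

// replicasDiffer reports whether the current replica count (a pointer, as in
// Deployment.Spec.Replicas) differs from the desired count. A nil pointer is
// treated as "differs" so that the caller sets an explicit value.
func replicasDiffer(current *int32, desired int32) bool {
	return current == nil || *current != desired
}

func main() {
	current := int32(3)
	desired := int32(3)

	// Pointer comparison: two different variables, so the addresses differ
	// even though both hold 3.
	fmt.Println("pointers differ:", &current != &desired)

	// Value comparison: the counts are equal, so no update is needed.
	fmt.Println("values differ:", replicasDiffer(&current, desired))
}
```

A pointer comparison in the reconcile loop would report a difference on every pass and issue an unnecessary `Update` each time, so the value comparison is what keeps the controller quiet once the state has converged.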
Popular Operator Frameworks
1. Operator SDK
The Operator SDK, part of the Operator Framework, is a widely used toolkit for building operators. It supports Go, Ansible, and Helm, and provides scaffolding tools for each.
Go Operators
# Create a Go operator
operator-sdk init --domain example.com --repo github.com/example/go-operator
operator-sdk create api --group apps --version v1alpha1 --kind MyApp
Ansible Operators
# Create an Ansible operator
# Run inside an empty project directory; --repo applies only to Go projects
operator-sdk init --domain example.com --plugins=ansible
operator-sdk create api --group apps --version v1alpha1 --kind MyApp --generate-role
Helm Operators
# Create a Helm operator
# Run inside an empty project directory; --repo applies only to Go projects
operator-sdk init --domain example.com --plugins=helm
# Without --helm-chart, a default chart is scaffolded for the new kind
operator-sdk create api --group apps --version v1alpha1 --kind MyApp
2. Kubebuilder
Kubebuilder is another popular framework that focuses on Go operators and provides excellent tooling for building production-ready operators.
# Create a new project
kubebuilder init --domain example.com --repo github.com/example/my-operator
# Create an API
kubebuilder create api --group apps --version v1alpha1 --kind MyApp
Real-World Operator Examples
1. Prometheus Operator
The Prometheus Operator manages Prometheus monitoring stacks, handling complex configurations and service discovery.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
2. Elasticsearch Operator
Elastic Cloud on Kubernetes (ECK), Elastic's operator, manages Elasticsearch clusters, handling node management, scaling, and data persistence.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 8.11.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.roles: ["master", "data"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 2Gi
              cpu: 0.5
3. PostgreSQL Operator
Several PostgreSQL operators exist. The example below uses CloudNativePG, which manages PostgreSQL clusters with features like automated backups, failover, and scaling.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "100"
  bootstrap:
    initdb:
      database: myapp
      owner: myapp
Best Practices for Operator Development
1. Follow the Operator Maturity Model
The Operator Maturity Model provides guidelines for building production-ready operators:
- Level 1: Basic Install: Operator can install and uninstall the application
- Level 2: Seamless Upgrades: Operator handles application upgrades
- Level 3: Full Lifecycle: Operator manages the complete application lifecycle
- Level 4: Deep Insights: Operator provides detailed monitoring and alerting
- Level 5: Auto Pilot: Operator can automatically handle complex scenarios
2. Implement Proper Error Handling
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Add proper error handling and logging
	logger := log.FromContext(ctx)

	myApp := &appsv1alpha1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
		if errors.IsNotFound(err) {
			logger.Info("MyApp resource not found, ignoring")
			return ctrl.Result{}, nil
		}
		logger.Error(err, "Failed to get MyApp")
		return ctrl.Result{}, err
	}

	// Handle reconciliation logic with proper error handling
	return ctrl.Result{}, nil
}
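Returning a non-nil error from `Reconcile` causes controller-runtime to requeue the request with per-item exponential backoff, so transient failures are retried without hammering the API server. The sketch below reproduces that idea in plain Go. The 5 ms base and 1000 s cap reflect my understanding of the default rate limiter settings and should be treated as assumptions; `backoff` is a hypothetical helper, not a library function.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 5 * time.Millisecond // assumed default base delay
	maxDelay  = 1000 * time.Second   // assumed default cap
)

// backoff returns the delay before the next retry after the given number of
// consecutive failures: baseDelay * 2^failures, capped at maxDelay.
func backoff(failures int) time.Duration {
	delay := baseDelay
	for i := 0; i < failures; i++ {
		delay *= 2
		if delay >= maxDelay {
			return maxDelay
		}
	}
	return delay
}

func main() {
	for _, n := range []int{0, 1, 2, 5, 10, 30} {
		fmt.Printf("after %2d failures: wait %v\n", n, backoff(n))
	}
}
```

Because the delay grows with each consecutive failure, it is usually better to return the error and let the work queue schedule the retry than to sleep or loop inside `Reconcile`.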
3. Use Finalizers for Cleanup
const myAppFinalizer = "myapp.example.com/finalizer"

// Uses sigs.k8s.io/controller-runtime/pkg/client and
// sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	myApp := &appsv1alpha1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
		// A not-found error means the object is already gone; nothing to do
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Handle deletion
	if !myApp.DeletionTimestamp.IsZero() {
		if controllerutil.ContainsFinalizer(myApp, myAppFinalizer) {
			// Perform cleanup (cleanupResources is an application-specific helper)
			if err := r.cleanupResources(myApp); err != nil {
				return ctrl.Result{}, err
			}
			// Remove the finalizer so Kubernetes can delete the object
			controllerutil.RemoveFinalizer(myApp, myAppFinalizer)
			if err := r.Update(ctx, myApp); err != nil {
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, nil
	}

	// Add the finalizer if not present
	if !controllerutil.ContainsFinalizer(myApp, myAppFinalizer) {
		controllerutil.AddFinalizer(myApp, myAppFinalizer)
		if err := r.Update(ctx, myApp); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
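Under the hood, finalizer handling comes down to a membership check and a removal on the object's list of finalizer strings. A dependency-free sketch of helpers with that behavior follows. The names `contains` and `remove` are illustrative, and in a real project the `controllerutil` package from controller-runtime provides equivalent functions.

```go
package main

import "fmt"

// contains reports whether list includes s.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// remove returns a new slice with every occurrence of s dropped.
// The input slice is left unmodified.
func remove(list []string, s string) []string {
	result := make([]string, 0, len(list))
	for _, item := range list {
		if item != s {
			result = append(result, item)
		}
	}
	return result
}

func main() {
	finalizers := []string{"example.com/other", "myapp.example.com/finalizer"}
	fmt.Println(contains(finalizers, "myapp.example.com/finalizer"))

	finalizers = remove(finalizers, "myapp.example.com/finalizer")
	fmt.Println(finalizers)
}
```

Removing only the operator's own entry matters: other controllers may have registered their own finalizers on the same object, and those must stay in place until their owners have finished cleaning up.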
4. Implement Status Conditions
// Update status with conditions
func (r *MyAppReconciler) updateStatus(ctx context.Context, myApp *appsv1alpha1.MyApp, condition metav1.Condition) {
	meta.SetStatusCondition(&myApp.Status.Conditions, condition)
	if err := r.Status().Update(ctx, myApp); err != nil {
		log.FromContext(ctx).Error(err, "Failed to update status")
	}
}
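The value of a condition list comes from its update rule: each condition type appears once, and its transition time moves only when the status actually changes. The sketch below is a simplified stand-in for what `meta.SetStatusCondition` does, written in plain Go. The `Condition` type and `setCondition` function are hypothetical and omit fields such as `ObservedGeneration`.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a trimmed-down stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// start is a fixed timestamp used by the example.
var start = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

// setCondition inserts or updates the condition with the same Type.
// LastTransitionTime changes only when Status changes, so it records when
// the resource last flipped state rather than when it was last reconciled.
func setCondition(conditions *[]Condition, c Condition, now time.Time) {
	for i := range *conditions {
		existing := &(*conditions)[i]
		if existing.Type != c.Type {
			continue
		}
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransitionTime = now
		}
		existing.Reason = c.Reason
		existing.Message = c.Message
		return
	}
	c.LastTransitionTime = now
	*conditions = append(*conditions, c)
}

func main() {
	var conditions []Condition

	setCondition(&conditions, Condition{Type: "Ready", Status: "False", Reason: "Creating"}, start)
	setCondition(&conditions, Condition{Type: "Ready", Status: "False", Reason: "Waiting"}, start.Add(time.Minute))
	setCondition(&conditions, Condition{Type: "Ready", Status: "True", Reason: "Available"}, start.Add(2*time.Minute))

	for _, c := range conditions {
		fmt.Printf("%s=%s (%s) since %s\n", c.Type, c.Status, c.Reason, c.LastTransitionTime.Format(time.RFC3339))
	}
}
```

With this rule, users and tooling can answer "how long has this resource been ready?" from the status alone, without digging through controller logs.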
Testing Your Operator
1. Unit Testing
func TestMyAppReconciler_Reconcile(t *testing.T) {
	// Create a test scheme
	scheme := runtime.NewScheme()
	_ = appsv1alpha1.AddToScheme(scheme)
	_ = appsv1.AddToScheme(scheme)

	// Create test objects
	myApp := &appsv1alpha1.MyApp{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "test-myapp",
			Namespace: "default",
		},
		Spec: appsv1alpha1.MyAppSpec{
			Replicas: 3,
			Image:    "nginx:latest",
		},
	}

	// Create reconciler
	r := &MyAppReconciler{
		Client: fake.NewClientBuilder().WithScheme(scheme).WithObjects(myApp).Build(),
		Scheme: scheme,
	}

	// Test reconciliation
	req := ctrl.Request{
		NamespacedName: types.NamespacedName{
			Name:      "test-myapp",
			Namespace: "default",
		},
	}
	_, err := r.Reconcile(context.Background(), req)
	assert.NoError(t, err)
}
2. Integration Testing with EnvTest
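func TestMyAppIntegration(t *testing.T) {
	// Set up the test environment (a local API server and etcd)
	testEnv := &envtest.Environment{
		CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
	}
	cfg, err := testEnv.Start()
	require.NoError(t, err)
	defer testEnv.Stop()

	// Register built-in and custom types so the manager's client can decode them
	// (clientgoscheme is k8s.io/client-go/kubernetes/scheme)
	scheme := runtime.NewScheme()
	require.NoError(t, clientgoscheme.AddToScheme(scheme))
	require.NoError(t, appsv1alpha1.AddToScheme(scheme))

	// Create manager
	mgr, err := manager.New(cfg, manager.Options{Scheme: scheme})
	require.NoError(t, err)

	// Add controller to manager
	err = (&MyAppReconciler{
		Client: mgr.GetClient(),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr)
	require.NoError(t, err)

	// Start the manager with a cancellable context so it stops when the test ends
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go func() {
		// Use t.Errorf here: require calls t.FailNow, which must only run on the test goroutine
		if err := mgr.Start(ctx); err != nil {
			t.Errorf("manager exited with error: %v", err)
		}
	}()

	// Create test resource and verify
	// ... test implementation
}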
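Because the controller reconciles asynchronously, the verification step in an integration test usually has to poll until the expected state appears instead of asserting immediately. Projects scaffolded with Ginkgo and Gomega typically use `Eventually` for this. The sketch below shows the same idea with only the standard library; `waitFor` is a hypothetical helper, and the condition function stands in for a check such as "the Deployment exists with three replicas".

```go
package main

import (
	"fmt"
	"time"
)

// waitFor polls cond every interval until it returns true or the timeout
// elapses. It returns an error if the condition was never met.
func waitFor(timeout, interval time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %v", timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate a controller that needs a few polls before the resource is ready.
	calls := 0
	err := waitFor(time.Second, 10*time.Millisecond, func() bool {
		calls++
		return calls >= 3
	})
	fmt.Println("err:", err, "calls:", calls)
}
```

Inside the test, the condition function would read the object through the manager's client and compare it with the expected spec, and the returned error would go to `require.NoError`.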
Deploying Your Operator
1. Build and Push the Operator Image
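# Build the operator image (the `operator-sdk build` command was removed in Operator SDK 1.0; use the scaffolded Makefile)
make docker-build IMG=my-registry.com/my-operator:v0.1.0
# Push to registry
docker push my-registry.com/my-operator:v0.1.0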
2. Generate and Apply Manifests
# Generate CRD and RBAC manifests
make manifests
# Install the CRDs and deploy the operator to the current cluster
make install
make deploy IMG=my-registry.com/my-operator:v0.1.0
# Create a sample custom resource
kubectl apply -f config/samples/
3. Using Operator Lifecycle Manager (OLM)
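# Install OLM
operator-sdk olm install
# Generate and validate the bundle (`make bundle` runs `operator-sdk generate kustomize manifests` and `operator-sdk generate bundle`)
make bundle IMG=my-registry.com/my-operator:v0.1.0
operator-sdk bundle validate ./bundle
# Build and push the bundle image (the image name is an example), then deploy it using OLM
make bundle-build bundle-push BUNDLE_IMG=my-registry.com/my-operator-bundle:v0.1.0
operator-sdk run bundle my-registry.com/my-operator-bundle:v0.1.0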
Monitoring and Observability
1. Metrics and Prometheus Integration
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	reconcileTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "myapp_reconcile_total",
			Help: "Total number of reconciliations",
		},
		[]string{"result"},
	)
)

func init() {
	// Register with the controller-runtime registry so the counter is
	// exposed on the manager's metrics endpoint
	metrics.Registry.MustRegister(reconcileTotal)
}
2. Structured Logging
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// uuid is github.com/google/uuid
	logger := log.FromContext(ctx).WithValues(
		"myapp", req.NamespacedName,
		"reconcileID", uuid.New().String(),
	)
	logger.Info("Starting reconciliation")
	defer logger.Info("Finished reconciliation")

	// ... reconciliation logic
	return ctrl.Result{}, nil
}
Conclusion
Kubernetes operators represent a powerful way to extend Kubernetes with domain-specific knowledge and automation. They enable organizations to manage complex applications more effectively by encoding operational expertise into software.
The key to successful operator development lies in understanding the operator pattern, following best practices, and implementing proper testing and monitoring. Whether you’re building operators for internal applications or contributing to the broader Kubernetes ecosystem, operators provide a robust foundation for managing complex, stateful applications in Kubernetes.
As the Kubernetes ecosystem continues to evolve, operators will play an increasingly important role in managing the complexity of modern applications. By investing in operator development, organizations can achieve higher levels of automation, reliability, and operational efficiency in their Kubernetes environments.
The journey from basic CRDs to production-ready operators requires careful planning, proper testing, and adherence to best practices. However, the benefits in terms of operational efficiency and application reliability make it a worthwhile investment for any organization serious about Kubernetes operations.