Production Go Patterns (from Kubernetes)

Patterns for building large-scale Go codebases that go beyond what stdlib teaches you.

1. Code Generation Pattern

Source: staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go, staging/src/k8s.io/client-go/informers/apps/v1/deployment.go

What it does

Kubernetes generates massive amounts of boilerplate code from annotations on types:

  • deepcopy-gen → DeepCopy/DeepCopyInto methods
  • informer-gen → typed informers (List/Watch/Lister per resource)
  • client-gen → typed client sets
  • lister-gen → typed lister interfaces
  • conversion-gen → version conversion functions
  • defaulter-gen → defaulting functions

Why

At Kubernetes scale (~50 resource types × multiple versions), hand-writing deep copy, client wrappers, and conversion code is:

  1. Error-prone (forgetting to copy a new field leaves the copy sharing state with the original, a silent bug)
  2. Unmaintainable (thousands of nearly-identical files)
  3. Not verifiable by human review

How it works

Annotations drive generation:

// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type RawExtension struct { ... }

Generated output uses the zz_generated. filename prefix (the convention for "don't edit"):

// staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go:22
// Code generated by deepcopy-gen. DO NOT EDIT.
package runtime

func (in *RawExtension) DeepCopyInto(out *RawExtension) {
    *out = *in
    if in.Raw != nil {
        in, out := &in.Raw, &out.Raw
        *out = make([]byte, len(*in))
        copy(*out, *in)
    }
}

Generated informers (note the header comment):

// staging/src/k8s.io/client-go/informers/apps/v1/deployment.go:20
// Code generated by informer-gen. DO NOT EDIT.

When to Use

Triggers:

  • You have 10+ types that need identical boilerplate methods (DeepCopy, Validate, Marshal)
  • Hand-writing the code is error-prone (forgetting to copy a new field causes silent bugs)
  • The generated output is mechanical and reviewable, not creative

Example — before:

// Hand-written deep copy for every type: 50 types × 30 lines each = 1500 lines of bugs
func (in *Deployment) DeepCopy() *Deployment {
    out := new(Deployment)
    out.Name = in.Name
    out.Labels = make(map[string]string)
    for k, v := range in.Labels { out.Labels[k] = v }
    // Did you remember Annotations? Finalizers? Every nested struct?
    return out
}

Example — after:

// +k8s:deepcopy-gen=true
type Deployment struct {
    Name        string
    Labels      map[string]string
    Annotations map[string]string
}
// Generated: zz_generated.deepcopy.go handles ALL fields correctly, always.
// Adding a new field? Re-run generator. Zero chance of forgetting.

Key Insight

The standard library uses code generation only sparingly and keeps things small enough that hand-writing works. Kubernetes shows that once you cross roughly 20 types with shared behavior, code generation becomes the only sane path.


2. The Scheme / Type Registry Pattern

Source: staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go (lines 38-100), scheme_builder.go

What it does

The Scheme is a runtime type registry that maps:

  • GroupVersionKind → Go type (reflect.Type)
  • Go type → []GroupVersionKind
  • Provides serialization, defaulting, conversion, and validation dispatch

Why

Kubernetes has 50+ resource types across 15+ API groups, each with multiple versions. The Scheme provides:

  • Dynamic dispatch: serialize any Object without knowing its concrete type
  • Version conversion: convert between v1 and v1beta1 transparently
  • Pluggability: third-party resources register into the same system

Structure

// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:38-98
type Scheme struct {
    gvkToType        map[schema.GroupVersionKind]reflect.Type
    typeToGVK        map[reflect.Type][]schema.GroupVersionKind
    unversionedTypes map[reflect.Type]schema.GroupVersionKind
    defaulterFuncs   map[reflect.Type]func(interface{})
    // Parameter types are abbreviated here; see scheme.go for the full signature.
    validationFuncs  map[reflect.Type]func(ctx, op, obj, oldObj) field.ErrorList
    converter        *conversion.Converter
    versionPriority  map[string][]string
}

SchemeBuilder Pattern

// staging/src/k8s.io/apimachinery/pkg/runtime/scheme_builder.go:23-48
type SchemeBuilder []func(*Scheme) error

func (sb *SchemeBuilder) AddToScheme(s *Scheme) error {
    for _, f := range *sb {
        if err := f(s); err != nil {
            return err
        }
    }
    return nil
}

func (sb *SchemeBuilder) Register(funcs ...func(*Scheme) error) {
    for _, f := range funcs {
        *sb = append(*sb, f)
    }
}

How Registration Works

// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:151-160
func (s *Scheme) AddKnownTypes(gv schema.GroupVersion, types ...Object) {
    for _, obj := range types {
        t := reflect.TypeOf(obj)
        if t.Kind() != reflect.Pointer {
            panic("All types must be pointers to structs.")
        }
        t = t.Elem()
        s.AddKnownTypeWithName(gv.WithKind(t.Name()), obj)
    }
}
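The registry and builder can be sketched together in a few dozen lines. This is a simplified illustration, not the apimachinery code: Kind, Registry, Builder, and the Deployment struct below are hypothetical stand-ins, and the sketch returns an error where AddKnownTypes panics.

```go
package main

import (
	"fmt"
	"reflect"
)

// Kind is a simplified stand-in for schema.GroupVersionKind.
type Kind struct{ Group, Version, Kind string }

// Registry is a toy Scheme: it maps kinds to Go types and back.
type Registry struct {
	kindToType map[Kind]reflect.Type
	typeToKind map[reflect.Type][]Kind
}

func NewRegistry() *Registry {
	return &Registry{
		kindToType: map[Kind]reflect.Type{},
		typeToKind: map[reflect.Type][]Kind{},
	}
}

// AddKnownTypes registers each pointer-to-struct under group/version,
// using the struct's type name as the kind.
func (r *Registry) AddKnownTypes(group, version string, objs ...any) error {
	for _, obj := range objs {
		t := reflect.TypeOf(obj)
		if t == nil || t.Kind() != reflect.Pointer || t.Elem().Kind() != reflect.Struct {
			return fmt.Errorf("type %v must be a pointer to a struct", t)
		}
		t = t.Elem()
		k := Kind{group, version, t.Name()}
		r.kindToType[k] = t
		r.typeToKind[t] = append(r.typeToKind[t], k)
	}
	return nil
}

// New returns a pointer to a zero value of the type registered for k.
func (r *Registry) New(k Kind) (any, error) {
	t, ok := r.kindToType[k]
	if !ok {
		return nil, fmt.Errorf("no type registered for %+v", k)
	}
	return reflect.New(t).Interface(), nil
}

// Builder collects registration functions so each package can contribute its own.
type Builder []func(*Registry) error

func (b *Builder) Register(funcs ...func(*Registry) error) { *b = append(*b, funcs...) }

func (b *Builder) AddToRegistry(r *Registry) error {
	for _, f := range *b {
		if err := f(r); err != nil {
			return err
		}
	}
	return nil
}

// Deployment is a hypothetical API type.
type Deployment struct{ Replicas int }

func main() {
	var b Builder
	b.Register(func(r *Registry) error {
		return r.AddKnownTypes("apps", "v1", &Deployment{})
	})

	r := NewRegistry()
	if err := b.AddToRegistry(r); err != nil {
		panic(err)
	}
	obj, err := r.New(Kind{"apps", "v1", "Deployment"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", obj) // *main.Deployment
}
```

The builder is what makes the registry pluggable: a third-party package exports its own builder, and the caller decides which registry it is applied to.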

Key Insight

This is Java's ServiceLoader / dependency injection adapted for Go's type system. Stdlib uses interfaces; Kubernetes needs a runtime type system on top of Go's static type system because API objects must be dynamically dispatched across version boundaries.


3. The runtime.Object Interface

Source: staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go (lines 333-342)

What it does

Every Kubernetes API object must implement this two-method interface:

// staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go:337-341
type Object interface {
    GetObjectKind() schema.ObjectKind
    DeepCopyObject() Object
}

Why

  • GetObjectKind() — allows the serialization layer to determine what type an object is without reflection
  • DeepCopyObject() — enables safe concurrent access (informer cache is shared; mutations must happen on copies)
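A minimal sketch of a type satisfying such an interface follows. ObjectKind, Object, TypeMeta, and Gadget are simplified, hypothetical stand-ins defined locally so the example runs without the Kubernetes modules; the real ObjectKind interface works with GroupVersionKind values rather than a plain string.

```go
package main

import "fmt"

// ObjectKind and Object are simplified stand-ins for
// schema.ObjectKind and runtime.Object.
type ObjectKind interface {
	Kind() string
}

type Object interface {
	GetObjectKind() ObjectKind
	DeepCopyObject() Object
}

// TypeMeta carries the kind. Embedding it gives a type GetObjectKind for free.
type TypeMeta struct{ KindName string }

func (m *TypeMeta) Kind() string              { return m.KindName }
func (m *TypeMeta) GetObjectKind() ObjectKind { return m }

// Gadget is a hypothetical API type.
type Gadget struct {
	TypeMeta
	Tags []string
}

// DeepCopyObject returns an independent copy, including the slice storage.
func (g *Gadget) DeepCopyObject() Object {
	out := *g
	out.Tags = append([]string(nil), g.Tags...)
	return &out
}

// describe works on any Object without knowing its concrete type.
func describe(o Object) string { return "kind=" + o.GetObjectKind().Kind() }

func main() {
	g := &Gadget{TypeMeta: TypeMeta{KindName: "Gadget"}, Tags: []string{"a"}}
	c := g.DeepCopyObject().(*Gadget)
	c.Tags[0] = "changed"
	fmt.Println(describe(g), g.Tags[0]) // kind=Gadget a
}
```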

Key Insight

This is the foundation of Kubernetes' extensibility. Any Go struct that satisfies these two methods can participate in the entire API machinery: serialization, storage, admission, informers, etc. Go types for custom resources typically rely on generated code to implement this interface.


4. Deep Copy Everywhere

Source: Generated code in zz_generated.deepcopy.go files throughout the tree

What it does

Every API type has generated DeepCopy() and DeepCopyInto() methods that create true deep copies including nested slices, maps, and pointer fields.

Why

The informer cache is shared across all controllers in a process. If controller A gets an object from the cache and mutates it, controller B would see corrupted data. Deep copy provides the isolation guarantee.

// Usage pattern in controllers:
deployment := deploymentFromCache.DeepCopy()
deployment.Spec.Replicas = ptr.To[int32](3)
_, err := client.AppsV1().Deployments(ns).Update(ctx, deployment, metav1.UpdateOptions{})
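The isolation guarantee can be demonstrated without a cluster. In the sketch below, Store is a hypothetical stand-in for the informer cache (it hands out shared pointers), and Item is a hypothetical object type with a hand-written DeepCopy.

```go
package main

import (
	"fmt"
	"sync"
)

// Item is a hypothetical cached object with a pointer field.
type Item struct {
	Name     string
	Replicas *int32
}

// DeepCopy returns a copy that shares no memory with the receiver.
func (in *Item) DeepCopy() *Item {
	if in == nil {
		return nil
	}
	out := *in
	if in.Replicas != nil {
		r := *in.Replicas
		out.Replicas = &r
	}
	return &out
}

// Store is a toy read cache. Get returns the cached pointer itself,
// so every caller sees the same object.
type Store struct {
	mu    sync.RWMutex
	items map[string]*Item
}

func (s *Store) Get(name string) *Item {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.items[name]
}

func main() {
	three := int32(3)
	s := &Store{items: map[string]*Item{"web": {Name: "web", Replicas: &three}}}

	// Controller A copies before mutating, so the cached object is untouched.
	mine := s.Get("web").DeepCopy()
	*mine.Replicas = 5

	// Controller B still reads the original value from the cache.
	fmt.Println(*s.Get("web").Replicas, *mine.Replicas) // 3 5
}
```

Dropping the DeepCopy call in main would make both numbers 5, which is exactly the cross-controller corruption described above.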

Key Insight

Stdlib rarely needs deep copy because stdlib objects are typically owned by one goroutine. Kubernetes has a shared read cache (the informer store) that necessitates copy-on-write semantics at the application level.


5. Graceful Shutdown with Priority Classes

Source: pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go (lines 23-100)

What it does

When a node is shutting down, pods are terminated in priority order. Critical pods (for example, those in the system-node-critical priority class) are terminated last, in a portion of the shutdown window reserved for them.

Why

A hard kill of all pods simultaneously would lose important work. Priority-based graceful shutdown preserves the most important workloads longest.

// pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go:66-90
type managerImpl struct {
    logger             klog.Logger
    recorder           record.EventRecorder
    getPods            eviction.ActivePodsFunc
    syncNodeStatus     func(context.Context)
    dbusCon            dbusInhibiter
    inhibitLock        systemd.InhibitLock
    nodeShuttingDownMutex sync.Mutex
    nodeShuttingDownNow   bool
    podManager         *podManager
}

6. Context as Logger Carrier

Source: pkg/controller/deployment/deployment_controller.go (lines 106, 179, 500)

What it does

Kubernetes passes structured loggers through context:

// pkg/controller/deployment/deployment_controller.go:179
logger := klog.FromContext(ctx)
logger.Info("Starting controller", "controller", "deployment")

Why

At scale, you need structured logging with:

  • Consistent key-value pairs (controller name, object reference)
  • Verbosity levels (logger.V(4).Info(...))
  • No global state (context carries the logger configured by the caller)

Key Insight

Stdlib's log package is usually used through its package-level default logger. Kubernetes uses context-based structured logging (klog.FromContext) to allow each call chain to carry its own logger configuration. This enables filtering by controller, verbosity tuning per-component, and correlation.


7. Functional Options for Configuration

Source: staging/src/k8s.io/client-go/informers/factory.go (lines 83-127)

What it does

The SharedInformerFactory uses functional options for configuration:

// staging/src/k8s.io/client-go/informers/factory.go:57
type SharedInformerOption func(*sharedInformerFactory) *sharedInformerFactory

func WithNamespace(namespace string) SharedInformerOption {
    return func(factory *sharedInformerFactory) *sharedInformerFactory {
        factory.namespace = namespace
        return factory
    }
}

func WithTransform(transform cache.TransformFunc) SharedInformerOption {
    return func(factory *sharedInformerFactory) *sharedInformerFactory {
        factory.transform = transform
        return factory
    }
}

func NewSharedInformerFactoryWithOptions(client kubernetes.Interface, defaultResync time.Duration, options ...SharedInformerOption) SharedInformerFactory {
    factory := &sharedInformerFactory{...}
    for _, opt := range options {
        factory = opt(factory)
    }
    return factory
}

Why

APIs evolve. Adding a new configuration option shouldn't break callers. Functional options provide:

  • Backward compatibility (new options don't change existing signatures)
  • Self-documenting (each option is a named function)
  • Composability (options can be collected and applied conditionally)

8. Type-Safe Generics in Critical Paths

Source: staging/src/k8s.io/client-go/util/workqueue/queue.go (lines 33-200), staging/src/k8s.io/client-go/gentype/type.go (lines 33-120)

What it does

Both workqueue and gentype use Go generics (1.18+) to provide type-safe interfaces while maintaining backward compatibility via type aliases:

// Workqueue: type-safe queue
type TypedInterface[T comparable] interface {
    Add(item T)
    Get() (item T, shutdown bool)
    Done(item T)
}

// Type alias for backward compat
type Type = Typed[any]

// Gentype: type-safe client
type Client[T objectWithMeta] struct {
    resource       string
    client         rest.Interface
    namespace      string
    newObject      func() T
}

Why

Before generics, Kubernetes used interface{} everywhere, requiring type assertions at every boundary. Generics eliminate entire classes of runtime panics and make the code self-documenting.
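The alias technique is easy to try in isolation. The queue below is a toy written for this sketch (not goroutine-safe, and far simpler than the real workqueue); it borrows the names Typed and Type to mirror the excerpt above, and it needs Go 1.20 or later so that any satisfies the comparable constraint.

```go
package main

import "fmt"

// Typed is a toy FIFO queue that drops items already waiting in the queue.
type Typed[T comparable] struct {
	items   []T
	pending map[T]struct{}
}

func NewTyped[T comparable]() *Typed[T] {
	return &Typed[T]{pending: map[T]struct{}{}}
}

// Add enqueues item unless an equal item is already pending.
func (q *Typed[T]) Add(item T) {
	if _, ok := q.pending[item]; ok {
		return
	}
	q.pending[item] = struct{}{}
	q.items = append(q.items, item)
}

// Get removes and returns the oldest item; ok is false when the queue is empty.
func (q *Typed[T]) Get() (item T, ok bool) {
	if len(q.items) == 0 {
		return item, false
	}
	item = q.items[0]
	q.items = q.items[1:]
	delete(q.pending, item)
	return item, true
}

// Type keeps pre-generics callers compiling: it is the same type as Typed[any].
// With any, adding a non-comparable value (such as a slice) panics at run time.
type Type = Typed[any]

func New() *Type { return NewTyped[any]() }

func main() {
	// New-style caller: the compiler knows key is a string.
	q := NewTyped[string]()
	q.Add("default/web")
	q.Add("default/web") // duplicate, ignored
	key, _ := q.Get()
	fmt.Println(key) // default/web

	// Legacy caller: still works, still needs a type assertion.
	old := New()
	old.Add("default/db")
	v, _ := old.Get()
	s, ok := v.(string)
	fmt.Println(s, ok) // default/db true
}
```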

Key Insight

This is a migration pattern: introduce the generic version alongside the deprecated interface{} version using type aliases. Callers migrate at their own pace.


9. HandleCrash — Structured Panic Recovery

Source: staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go (lines 30-120)

What it does

A standardized defer HandleCrash() pattern that:

  1. Catches panics
  2. Logs them with proper stack attribution
  3. Invokes registered panic handlers
  4. Optionally re-panics (controlled by ReallyCrash flag)

// staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:78-82
func HandleCrashWithContext(ctx context.Context, additionalHandlers ...func(context.Context, interface{})) {
    if r := recover(); r != nil {
        handleCrash(ctx, r, additionalHandlers...)
    }
}

Why

In a production system with hundreds of goroutines, an unrecovered panic in one kills the entire process. HandleCrash provides a standardized recovery point that:

  • Logs the panic with caller attribution
  • Allows cleanup handlers (shutdown gracefully)
  • In tests, can be configured to not actually crash
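The handler list and the crash switch can be sketched with the standard library alone. The names PanicHandlers, ReallyCrash, and HandleCrash echo the source, but the code below is a simplified illustration and not the apimachinery implementation; logPanic and worker are hypothetical.

```go
package main

import "fmt"

// PanicHandlers run for every recovered panic. ReallyCrash decides whether
// the panic is re-raised afterwards. Both are package-level knobs in this sketch.
var (
	PanicHandlers = []func(any){logPanic}
	ReallyCrash   = true
)

func logPanic(r any) { fmt.Printf("observed a panic: %v\n", r) }

// HandleCrash must be deferred directly ("defer HandleCrash()"):
// recover only has an effect when called by the deferred function itself.
func HandleCrash(additional ...func(any)) {
	r := recover()
	if r == nil {
		return
	}
	for _, h := range PanicHandlers {
		h(r)
	}
	for _, h := range additional {
		h(r)
	}
	if ReallyCrash {
		panic(r) // handlers have run; now crash as usual
	}
}

// worker is a hypothetical unit of work that hits a bug.
func worker() {
	defer HandleCrash()
	var m map[string]int
	m["x"] = 1 // writing to a nil map panics
}

func main() {
	ReallyCrash = false // e.g. in a test: log the panic but keep going
	worker()
	fmt.Println("still running")
}
```

With ReallyCrash left at true, the handlers still run first, so the panic is logged before the process exits.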

When to Use

Triggers:

  • You're running multiple independent subsystems in one process (multiple controllers, background workers)
  • A panic in one subsystem shouldn't kill the entire process
  • You need structured logging of panic stack traces before potential recovery

Example — before:

// One bad nil pointer in workerB kills workerA, workerC, and the whole server
func main() {
    ctx := context.Background()
    go workerA(ctx)
    go workerB(ctx) // panics → entire process dies
    go workerC(ctx)
    select {}
}

Example — after:

func safeGo(ctx context.Context, name string, f func(ctx context.Context)) {
    go func() {
        defer func() {
            if r := recover(); r != nil {
                // Log, alert, increment a metric, but don't kill siblings.
                log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
            }
        }()
        f(ctx)
    }()
}

func main() {
    ctx := context.Background()
    safeGo(ctx, "worker-a", workerA)
    safeGo(ctx, "worker-b", workerB) // panics → logged, other workers continue
    safeGo(ctx, "worker-c", workerC)
    select {}
}

Key Insight

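Go's default is "let it crash": an unrecovered panic in any goroutine ends the process. Kubernetes' approach is "catch it, log it, and (when ReallyCrash is disabled) let the controller retry on the next sync." Continuing after a panic is only safe because the controller pattern is idempotent.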


10. ContextForChannel — Bridge Pattern

Source: staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go (lines 120145)

What it does

Bridges the older <-chan struct{} stop pattern to the modern context.Context pattern:

// staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:120-142
func ContextForChannel(parentCh <-chan struct{}) context.Context {
    return channelContext{stopCh: parentCh}
}

type channelContext struct {
    stopCh <-chan struct{}
}

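func (c channelContext) Done() <-chan struct{} { return c.stopCh }
func (c channelContext) Err() error {
    select {
    case <-c.stopCh:
        return context.Canceled
    default:
        return nil
    }
}
// The remaining context.Context methods: no deadline, no values.
func (c channelContext) Deadline() (time.Time, bool) { return time.Time{}, false }
func (c channelContext) Value(key any) any           { return nil }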

Why

Kubernetes predates context.Context (which arrived in Go 1.7). Millions of lines of code use stopCh <-chan struct{}. Rather than a big-bang rewrite, this adapter allows gradual migration.

Key Insight

Large codebases can't do breaking API changes atomically. This bridge pattern is how you evolve from one idiom to another over years without breaking everything at once.