rodin/kubernetes-conventions

aweiker 5d2e5b43c3 docs: add when/when-not to all Kubernetes patterns

2026-04-30 13:41:33 +00:00

Production Go Patterns (from Kubernetes)

Patterns for building large-scale Go codebases that go beyond what stdlib teaches you.

1. Code Generation Pattern

Source: staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go, staging/src/k8s.io/client-go/informers/apps/v1/deployment.go

What it does

Kubernetes generates massive amounts of boilerplate code from annotations on types:

deepcopy-gen → DeepCopy/DeepCopyInto methods
informer-gen → typed informers (List/Watch/Lister per resource)
client-gen → typed client sets
lister-gen → typed lister interfaces
conversion-gen → version conversion functions
defaulter-gen → defaulting functions

Why

At Kubernetes scale (~50 resource types × multiple versions), hand-writing deep copy, client wrappers, and conversion code is:

Error-prone (forgetting to copy a new field produces copies that silently drop or share data)
Unmaintainable (thousands of nearly-identical files)
Not verifiable by human review

How it works

Annotations drive generation:

// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type RawExtension struct { ... }

Generated output uses the zz_generated. filename prefix (a convention that signals "don't edit"):

// staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go:22
// Code generated by deepcopy-gen. DO NOT EDIT.
package runtime

func (in *RawExtension) DeepCopyInto(out *RawExtension) {
    *out = *in
    if in.Raw != nil {
        in, out := &in.Raw, &out.Raw
        *out = make([]byte, len(*in))
        copy(*out, *in)
    }
}

Generated informers (note the header comment):

// staging/src/k8s.io/client-go/informers/apps/v1/deployment.go:20
// Code generated by informer-gen. DO NOT EDIT.

When to Use

Triggers:

You have 10+ types that need identical boilerplate methods (DeepCopy, Validate, Marshal)
Hand-writing the code is error-prone (forgetting to copy a new field causes silent bugs)
The generated output is mechanical and reviewable, not creative

Example — before:

// Hand-written deep copy for every type: 50 types × 30 lines each = 1500 lines to get wrong
func (in *Deployment) DeepCopy() *Deployment {
    out := new(Deployment)
    out.Name = in.Name
    out.Labels = make(map[string]string, len(in.Labels))
    for k, v := range in.Labels { out.Labels[k] = v }
    // Did you remember Annotations? Finalizers? Every nested struct?
    return out
}

Example — after:

// +k8s:deepcopy-gen=true
type Deployment struct {
    Name        string
    Labels      map[string]string
    Annotations map[string]string
}
// Generated: zz_generated.deepcopy.go copies every field of the type.
// Adding a new field? Re-run the generator; there is nothing to remember by hand.
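
To make the generated shape concrete, here is a hand-written sketch of what deepcopy-gen style output could look like for the simplified Deployment struct above. It is an approximation for illustration, not actual generator output; the struct and the main function are assumptions of this sketch.

```go
package main

import "fmt"

// Deployment mirrors the simplified struct from the example above.
type Deployment struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// DeepCopyInto copies the receiver into out. Maps are reallocated so the
// copy shares no mutable state with the original.
func (in *Deployment) DeepCopyInto(out *Deployment) {
	*out = *in // copies value fields; map fields still alias at this point
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
	if in.Annotations != nil {
		out.Annotations = make(map[string]string, len(in.Annotations))
		for k, v := range in.Annotations {
			out.Annotations[k] = v
		}
	}
}

// DeepCopy returns a newly allocated deep copy, or nil for a nil receiver.
func (in *Deployment) DeepCopy() *Deployment {
	if in == nil {
		return nil
	}
	out := new(Deployment)
	in.DeepCopyInto(out)
	return out
}

func main() {
	orig := &Deployment{Name: "web", Labels: map[string]string{"app": "web"}}
	cp := orig.DeepCopy()
	cp.Labels["app"] = "changed" // only the copy is affected
	fmt.Println(orig.Labels["app"], cp.Labels["app"])
}
```

The nil checks keep a nil map nil in the copy, which the hand-written "before" version does not do.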

When NOT to Use

Don't use this when:

You have fewer than ~5 types (hand-writing is faster and more readable)
The boilerplate varies significantly between types (generators work best for uniform patterns)
The generated code would be harder to debug than hand-written code (e.g., complex business logic)

Over-application example:

// Generating String methods for 3 types: overkill
//go:generate stringer -type=Status,Priority,Phase

// You now have a generated file to keep in sync, a build dependency on stringer,
// and anyone reading the code has to understand the generation pipeline
// for what amounts to a few short hand-written methods.

Better alternative:

// Just write it — 3 types is not "at scale"
func (s Status) String() string {
    switch s {
    case Running: return "Running"
    case Stopped: return "Stopped"
    default: return fmt.Sprintf("Status(%d)", s)
    }
}

Why: Code generation adds build complexity (Makefiles, CI steps, go generate ordering), makes debugging harder (stack traces point to generated code), and obscures logic. It pays off only when the volume of boilerplate is large enough that correctness via automation outweighs these costs.

Key Insight

Stdlib uses code generation sparingly (a handful of go:generate directives, such as stringer for enum names) and otherwise keeps things small enough that hand-writing works. Kubernetes shows that once you cross ~20 types with shared behavior, code gen is the only sane path.

2. The Scheme / Type Registry Pattern

Source: staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go (lines 38–100), scheme_builder.go

What it does

The Scheme is a runtime type registry that maps:

GroupVersionKind → Go type (reflect.Type)
Go type → []GroupVersionKind
Provides serialization, defaulting, conversion, and validation dispatch

Why

Kubernetes has 50+ resource types across 15+ API groups, each with multiple versions. The Scheme provides:

Dynamic dispatch: serialize any Object without knowing its concrete type
Version conversion: convert between v1 and v1beta1 transparently
Pluggability: third-party resources register into the same system

Structure

// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:38-98
type Scheme struct {
    gvkToType        map[schema.GroupVersionKind]reflect.Type
    typeToGVK        map[reflect.Type][]schema.GroupVersionKind
    unversionedTypes map[reflect.Type]schema.GroupVersionKind
    defaulterFuncs   map[reflect.Type]func(interface{})
    validationFuncs  map[reflect.Type]func(ctx, op, obj, oldObj) field.ErrorList
    converter        *conversion.Converter
    versionPriority  map[string][]string
}

SchemeBuilder Pattern

// staging/src/k8s.io/apimachinery/pkg/runtime/scheme_builder.go:23-48
type SchemeBuilder []func(*Scheme) error

func (sb *SchemeBuilder) AddToScheme(s *Scheme) error {
    for _, f := range *sb {
        if err := f(s); err != nil {
            return err
        }
    }
    return nil
}

func (sb *SchemeBuilder) Register(funcs ...func(*Scheme) error) {
    for _, f := range funcs {
        *sb = append(*sb, f)
    }
}

How Registration Works

// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:151-160
func (s *Scheme) AddKnownTypes(gv schema.GroupVersion, types ...Object) {
    for _, obj := range types {
        t := reflect.TypeOf(obj)
        if t.Kind() != reflect.Pointer {
            panic("All types must be pointers to structs.")
        }
        t = t.Elem()
        s.AddKnownTypeWithName(gv.WithKind(t.Name()), obj)
    }
}

When to Use

Triggers:

You have many types that must be serialized/deserialized polymorphically (you read JSON/YAML and don't know the concrete type ahead of time)
You need version conversion between different representations of the same concept
Third parties need to register new types without modifying your core code

Example — before:

// Hard-coded switch statement: violates the open/closed principle
func Deserialize(data []byte) (Object, error) {
    kind := extractKind(data)
    switch kind {
    case "Deployment":
        var d Deployment
        if err := json.Unmarshal(data, &d); err != nil {
            return nil, err
        }
        return &d, nil
    case "Service":
        var s Service
        if err := json.Unmarshal(data, &s); err != nil {
            return nil, err
        }
        return &s, nil
    // ... 50 more cases, each a potential bug
    default:
        return nil, fmt.Errorf("unknown kind: %s", kind)
    }
}

Example — after:

// Type registry — extensible, no switch statements
scheme := runtime.NewScheme()
scheme.AddKnownTypes(appsv1.SchemeGroupVersion, &Deployment{}, &ReplicaSet{})
scheme.AddKnownTypes(corev1.SchemeGroupVersion, &Service{}, &Pod{})

func Deserialize(scheme *runtime.Scheme, data []byte) (runtime.Object, error) {
    gvk := extractGVK(data)
    obj, err := scheme.New(gvk) // creates the correct concrete type
    if err != nil {
        return nil, err
    }
    if err := json.Unmarshal(data, obj); err != nil {
        return nil, err
    }
    return obj, nil
}
// New types register themselves; no core code changes needed
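
The registry idea can be tried without any Kubernetes dependency. The following is a minimal sketch, assuming a plain string kind instead of a GroupVersionKind; Registry, Decode, and the two example types are hypothetical names for this illustration and are not part of apimachinery.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Object is a stand-in for runtime.Object in this sketch.
type Object interface{ Kind() string }

type Deployment struct {
	Replicas int `json:"replicas"`
}

type Service struct {
	Port int `json:"port"`
}

func (*Deployment) Kind() string { return "Deployment" }
func (*Service) Kind() string    { return "Service" }

// Registry maps a kind name to the Go struct type that represents it.
type Registry struct{ kinds map[string]reflect.Type }

func NewRegistry() *Registry { return &Registry{kinds: map[string]reflect.Type{}} }

// AddKnownTypes registers pointer-to-struct values under their struct name.
func (r *Registry) AddKnownTypes(objs ...Object) {
	for _, obj := range objs {
		t := reflect.TypeOf(obj)
		if t.Kind() != reflect.Pointer || t.Elem().Kind() != reflect.Struct {
			panic("all types must be pointers to structs")
		}
		r.kinds[t.Elem().Name()] = t.Elem()
	}
}

// New allocates a zero value of the type registered for kind.
func (r *Registry) New(kind string) (Object, error) {
	t, ok := r.kinds[kind]
	if !ok {
		return nil, fmt.Errorf("no kind %q registered", kind)
	}
	return reflect.New(t).Interface().(Object), nil
}

// Decode reads the "kind" field, allocates the matching type, and
// unmarshals the payload into it.
func (r *Registry) Decode(data []byte) (Object, error) {
	var meta struct {
		Kind string `json:"kind"`
	}
	if err := json.Unmarshal(data, &meta); err != nil {
		return nil, err
	}
	obj, err := r.New(meta.Kind)
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, obj); err != nil {
		return nil, err
	}
	return obj, nil
}

func main() {
	r := NewRegistry()
	r.AddKnownTypes(&Deployment{}, &Service{})
	obj, err := r.Decode([]byte(`{"kind":"Deployment","replicas":3}`))
	fmt.Printf("%T %v\n", obj, err)
}
```

Note the tradeoff the sketch makes visible: an unregistered kind is a runtime error, not a compile error.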

When NOT to Use

Don't use this when:

You have a small, fixed set of types that won't grow (a switch statement is simpler and fully type-safe at compile time)
You don't need dynamic/polymorphic deserialization (you always know the concrete type)
You're not building a plugin system (nobody external needs to register types)

Over-application example:

// Type registry for an internal service with 3 event types — over-engineered
var eventScheme = NewScheme()
func init() {
    eventScheme.Register("OrderCreated", reflect.TypeOf(OrderCreated{}))
    eventScheme.Register("OrderShipped", reflect.TypeOf(OrderShipped{}))
    eventScheme.Register("OrderCanceled", reflect.TypeOf(OrderCanceled{}))
}
// Now debugging requires understanding the registry, reflection, runtime dispatch...

Better alternative:

// Simple interface + switch: readable, debuggable, compile-time safe
type Event interface { EventType() string }

func handleEvent(data []byte) error {
    switch t := extractType(data); t {
    case "OrderCreated":
        var e OrderCreated
        if err := json.Unmarshal(data, &e); err != nil {
            return err
        }
        return processCreated(e)
    case "OrderShipped":
        // ...
        return nil
    default:
        return fmt.Errorf("unknown event type: %s", t)
    }
}

Why: Type registries trade compile-time safety for runtime extensibility. In Kubernetes (50+ types, CRDs, multiple API groups), this tradeoff is essential. In a service with a handful of known types, you're paying the complexity cost (reflection, runtime errors, harder debugging) without the benefit.

Key Insight

This is Java's ServiceLoader / dependency injection adapted for Go's type system. Stdlib uses interfaces; Kubernetes needs a runtime type system on top of Go's static type system because API objects must be dynamically dispatched across version boundaries.

3. The runtime.Object Interface

Source: staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go (lines 333–342)

What it does

Every Kubernetes API object must implement this two-method interface:

// staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go:337-341
type Object interface {
    GetObjectKind() schema.ObjectKind
    DeepCopyObject() Object
}

Why

GetObjectKind() — allows the serialization layer to determine what type an object is without reflection
DeepCopyObject() — enables safe concurrent access (informer cache is shared; mutations must happen on copies)

When to Use

Triggers:

You're building a framework where heterogeneous types flow through a common pipeline (serialization, admission, storage)
You need to identify the "kind" of an object at runtime without type switches
Concurrent access requires copy semantics built into the type contract

Example — before:

// No common interface: every handler needs a type switch
func store(obj interface{}) error {
    switch v := obj.(type) {
    case *Deployment:
        return storeDeployment(v)
    case *Service:
        return storeService(v)
    // Every new type requires changes here
    default:
        return fmt.Errorf("unsupported type %T", obj)
    }
}

Example — after:

// Common interface: generic pipeline handles any type
func store(obj runtime.Object) error {
    gvk := obj.GetObjectKind().GroupVersionKind()
    key := buildKey(gvk, obj)
    data, err := serialize(obj)
    if err != nil {
        return err
    }
    // Works for any registered type, with no switch statement
    return etcd.Put(key, data)
}
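
A dependency-free sketch of the same contract may help show why two methods are enough for a generic pipeline. The Object interface below is a local stand-in (GetKind returns a string instead of schema.ObjectKind), and Pod, ConfigMap, Snapshot, and CountByKind are hypothetical names for this illustration.

```go
package main

import "fmt"

// Object is a local stand-in for runtime.Object: a kind accessor plus a deep copy.
type Object interface {
	GetKind() string
	DeepCopyObject() Object
}

type Pod struct {
	Name   string
	Labels map[string]string
}

type ConfigMap struct {
	Name string
	Data map[string]string
}

// copyMap returns an independent copy of m, preserving nil.
func copyMap(m map[string]string) map[string]string {
	if m == nil {
		return nil
	}
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func (p *Pod) GetKind() string { return "Pod" }
func (p *Pod) DeepCopyObject() Object {
	return &Pod{Name: p.Name, Labels: copyMap(p.Labels)}
}

func (c *ConfigMap) GetKind() string { return "ConfigMap" }
func (c *ConfigMap) DeepCopyObject() Object {
	return &ConfigMap{Name: c.Name, Data: copyMap(c.Data)}
}

// Snapshot copies a mixed list of objects without knowing any concrete type.
func Snapshot(objs []Object) []Object {
	out := make([]Object, len(objs))
	for i, o := range objs {
		out[i] = o.DeepCopyObject()
	}
	return out
}

// CountByKind groups objects by kind using only the interface.
func CountByKind(objs []Object) map[string]int {
	counts := map[string]int{}
	for _, o := range objs {
		counts[o.GetKind()]++
	}
	return counts
}

func main() {
	objs := []Object{
		&Pod{Name: "a", Labels: map[string]string{"tier": "web"}},
		&ConfigMap{Name: "cfg", Data: map[string]string{"k": "v"}},
		&Pod{Name: "b"},
	}
	snap := Snapshot(objs)
	snap[0].(*Pod).Labels["tier"] = "changed" // the original list is untouched
	fmt.Println(CountByKind(snap), objs[0].(*Pod).Labels["tier"])
}
```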

When NOT to Use

Don't use this when:

Your types don't need polymorphic handling (you always know the concrete type at each call site)
Deep copy is irrelevant (no shared caches, no concurrent access)
You're not building framework-level infrastructure (application code rarely needs this)

Over-application example:

// Implementing runtime.Object on application-level domain types
type Invoice struct { ... }
func (i *Invoice) GetObjectKind() schema.ObjectKind { return &i.TypeMeta }
func (i *Invoice) DeepCopyObject() runtime.Object { return i.DeepCopy() }

// Now your business logic imports k8s.io/apimachinery — for what?

Better alternative: Use a domain-specific interface that captures what your code actually needs:

type Storable interface {
    ID() string
    Marshal() ([]byte, error)
}

Why: runtime.Object is designed for Kubernetes' specific needs (polymorphic serialization across API versions). Adopting it in non-Kubernetes code couples you to a heavy dependency for an abstraction that doesn't match your domain.

Key Insight

This is the foundation of Kubernetes' extensibility. Any Go struct that satisfies these two methods can participate in the entire API machinery — serialization, storage, admission, informers, etc. CRDs generate code that implements this interface.

4. Deep Copy Everywhere

Source: Generated code in zz_generated.deepcopy.go files throughout the tree

What it does

Every API type has generated DeepCopy() and DeepCopyInto() methods that create true deep copies including nested slices, maps, and pointer fields.

Why

The informer cache is shared across all controllers in a process. If controller A gets an object from the cache and mutates it, controller B would see corrupted data. Deep copy provides the isolation guarantee.

// Usage pattern in controllers:
deployment := deploymentFromCache.DeepCopy()
deployment.Spec.Replicas = ptr.To[int32](3)
_, err := client.AppsV1().Deployments(ns).Update(ctx, deployment, metav1.UpdateOptions{})

When to Use

Triggers:

You read from a shared data structure (cache, registry, config store) and need to modify the result
Multiple goroutines access the same data and at least one modifies it
You're passing data across ownership boundaries (your function returns data that the caller might mutate)

Example — before:

// Returning a pointer to internal state — caller can corrupt it
func (c *ConfigStore) GetConfig() *Config {
    return c.current // caller mutates this → corrupts store
}

Example — after:

// Return a copy: caller can mutate freely
func (c *ConfigStore) GetConfig() *Config {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.current.DeepCopy()
}
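
A fuller sketch of the same store, extended with a copy-on-write Update, might look like this. Config, its fields, Update, and ErrInvalid are assumptions made for the example; only the copy-on-read idea comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrInvalid is a hypothetical sentinel used to reject an update.
var ErrInvalid = errors.New("invalid config")

type Config struct {
	Replicas int
	Hosts    []string
}

// DeepCopy returns an independent copy, including the Hosts slice.
func (c *Config) DeepCopy() *Config {
	if c == nil {
		return nil
	}
	out := *c
	if c.Hosts != nil {
		out.Hosts = make([]string, len(c.Hosts))
		copy(out.Hosts, c.Hosts)
	}
	return &out
}

type ConfigStore struct {
	mu      sync.RWMutex
	current *Config
}

// NewConfigStore copies initial so the caller keeps no handle on internal state.
func NewConfigStore(initial *Config) *ConfigStore {
	return &ConfigStore{current: initial.DeepCopy()}
}

// Get returns a private copy that the caller may mutate freely.
func (s *ConfigStore) Get() *Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.current.DeepCopy()
}

// Update applies mutate to a copy and swaps it in only if mutate succeeds,
// so a failed update never leaves half-modified state behind.
func (s *ConfigStore) Update(mutate func(*Config) error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	next := s.current.DeepCopy()
	if err := mutate(next); err != nil {
		return err
	}
	s.current = next
	return nil
}

func main() {
	store := NewConfigStore(&Config{Replicas: 1, Hosts: []string{"a"}})
	got := store.Get()
	got.Hosts[0] = "mutated" // only the caller's copy changes
	err := store.Update(func(c *Config) error {
		c.Replicas = -1   // mutation happens on a private copy
		return ErrInvalid // rejected: the store keeps the old value
	})
	fmt.Println(store.Get().Hosts[0], store.Get().Replicas, err)
}
```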

When NOT to Use

Don't use this when:

The data is owned by a single goroutine (no concurrent access, no shared cache)
You only need to read the data, never modify it (deep copy for read-only access is wasted allocation)
Performance is critical and you can prove safety via immutability or other means (e.g., freezing the object after construction)

Over-application example:

// Deep copying on every read, even when only logging
func (c *Controller) logStatus(ctx context.Context, key string) {
    obj, err := c.lister.Get(key)
    if err != nil {
        return
    }
    objCopy := obj.DeepCopy() // Allocates! But we never mutate objCopy
    logger.Info("status", "phase", objCopy.Status.Phase)
}

Better alternative:

// Read directly from cache: no mutation, no copy needed
func (c *Controller) logStatus(ctx context.Context, key string) {
    obj, err := c.lister.Get(key)
    if err != nil {
        return
    }
    logger.Info("status", "phase", obj.Status.Phase) // read-only: safe
}

Why: Deep copy allocates memory and does work proportional to object size. In hot paths that only read data, it's pure waste. Copy only when you intend to mutate.

Key Insight

Stdlib rarely needs deep copy because stdlib objects are typically owned by one goroutine. Kubernetes has a shared read cache (the informer store) that necessitates copy-on-write semantics at the application level.

5. Graceful Shutdown with Priority Classes

Source: pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go (lines 23–100)

What it does

When a node is shutting down, pods are terminated in priority order. Regular pods are stopped first; critical pods (system-node-critical) are stopped last, within a portion of the shutdown grace period reserved for them.

Why

A hard kill of all pods simultaneously would lose important work. Priority-based graceful shutdown preserves the most important workloads longest.

// pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go:66-90
type managerImpl struct {
    logger             klog.Logger
    recorder           record.EventRecorder
    getPods            eviction.ActivePodsFunc
    syncNodeStatus     func(context.Context)
    dbusCon            dbusInhibiter
    inhibitLock        systemd.InhibitLock
    nodeShuttingDownMutex sync.Mutex
    nodeShuttingDownNow   bool
    podManager         *podManager
}

When to Use

Triggers:

Your system runs workloads with different importance levels (critical infrastructure vs. batch jobs)
You receive shutdown signals and need to drain gracefully, not crash
Some work MUST complete (leader lease release, data flush) while other work can be interrupted

Example — before:

// Flat shutdown: everything gets the same 30s, critical work might not finish
func main() {
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGTERM)
    <-sig
    
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    server.Shutdown(ctx) // hope 30s is enough for everything
}

Example — after:

// Priority-based shutdown: critical work gets more time
func main() {
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGTERM)
    <-sig
    
    // Phase 1: stop accepting new work (immediate)
    server.StopAccepting()
    
    // Phase 2: drain low-priority work (5s budget)
    ctx1, cancel1 := context.WithTimeout(context.Background(), 5*time.Second)
    batchWorker.Shutdown(ctx1)
    cancel1()
    
    // Phase 3: drain critical work (25s budget)
    ctx2, cancel2 := context.WithTimeout(context.Background(), 25*time.Second)
    leaderElector.Release(ctx2)
    dataStore.Flush(ctx2)
    cancel2()
}
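
The phased version above can be generalized into a small runner. This is a sketch under stated assumptions: Phase, Result, RunShutdown, and runDemo are hypothetical names, and the runner abandons a phase that overruns its budget so that later phases still get their time.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Phase is one shutdown step with its own time budget.
type Phase struct {
	Name   string
	Budget time.Duration
	Stop   func(ctx context.Context) error
}

// Result records how one phase ended.
type Result struct {
	Name     string
	Err      error
	TimedOut bool
}

// RunShutdown executes phases in order. Each phase gets a fresh context limited
// to its budget; a phase that overruns is abandoned so later phases still run.
func RunShutdown(parent context.Context, phases []Phase) []Result {
	results := make([]Result, 0, len(phases))
	for _, p := range phases {
		ctx, cancel := context.WithTimeout(parent, p.Budget)
		done := make(chan error, 1) // buffered: an abandoned phase can still send and exit
		go func(stop func(context.Context) error) { done <- stop(ctx) }(p.Stop)

		var err error
		select {
		case err = <-done:
		case <-ctx.Done():
			err = ctx.Err()
		}
		cancel()
		results = append(results, Result{
			Name:     p.Name,
			Err:      err,
			TimedOut: errors.Is(err, context.DeadlineExceeded),
		})
	}
	return results
}

// runDemo drains a low-priority phase that never finishes on its own,
// then a critical phase that completes immediately.
func runDemo() []Result {
	phases := []Phase{
		{Name: "batch", Budget: 20 * time.Millisecond, Stop: func(ctx context.Context) error {
			<-ctx.Done() // simulates work that only stops when its budget runs out
			return ctx.Err()
		}},
		{Name: "flush", Budget: time.Second, Stop: func(ctx context.Context) error {
			return nil
		}},
	}
	return RunShutdown(context.Background(), phases)
}

func main() {
	for _, r := range runDemo() {
		fmt.Printf("%s: timedOut=%v err=%v\n", r.Name, r.TimedOut, r.Err)
	}
}
```

Ordering the slice from least to most important keeps a slow low-priority phase from eating into the time reserved for critical work.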

When NOT to Use

Don't use this when:

All your work is equally important (flat shutdown with a single timeout is simpler)
Your process is stateless and restarts are cheap (just die and restart)
Shutdown time is not constrained (you can wait as long as needed for everything to drain)

Over-application example:

// Priority shutdown for a stateless HTTP server with no background work
type ShutdownManager struct {
    priorities []PriorityClass
    workers    map[PriorityClass][]Worker
}
// All this machinery for... stopping an HTTP handler that has no in-flight state?

Better alternative:

// Simple graceful shutdown — stateless server just drains connections
server := &http.Server{}
go server.ListenAndServe()
<-sigCh
server.Shutdown(ctx) // drains in-flight requests, done

Why: Priority-based shutdown adds ordering logic, timeout budgets per phase, and coordination complexity. It's justified when different subsystems have genuinely different importance. For simple services, http.Server.Shutdown() is all you need.

6. Context as Logger Carrier

Source: pkg/controller/deployment/deployment_controller.go (lines 106, 179, 500)

What it does

Kubernetes passes structured loggers through context:

// pkg/controller/deployment/deployment_controller.go:179
logger := klog.FromContext(ctx)
logger.Info("Starting controller", "controller", "deployment")

Why

At scale, you need structured logging with:

Consistent key-value pairs (controller name, object reference)
Verbosity levels (logger.V(4).Info(...))
No global state (context carries the logger configured by the caller)

When to Use

Triggers:

You have deep call stacks where log context (request ID, user, controller name) should propagate without explicit parameters
Multiple subsystems share code but need different logger configurations (verbosity, output format)
You want to enrich logs with contextual metadata without threading a logger through every function signature

Example — before:

// Global logger: no context, hard to filter, pollutes all output
var log = logrus.New()

func processOrder(orderID string) error {
    log.Info("processing order") // which order? which service? which request?
    items := fetchItems(orderID)
    log.Infof("fetched %d items", len(items)) // no correlation possible
    return nil
}

Example — after:

// Context-carried logger: each request has its own enriched logger
func processOrder(ctx context.Context, orderID string) error {
    logger := klog.FromContext(ctx)
    logger = logger.WithValues("orderID", orderID)
    ctx = klog.NewContext(ctx, logger)
    
    logger.Info("processing order") // automatically includes orderID
    items := fetchItems(ctx, orderID)
    // fetchItems logs with the same logger — all lines correlate
    return nil
}

When NOT to Use

Don't use this when:

You have a simple CLI tool with sequential execution (a global logger is fine)
Performance is critical and context allocation is measurable overhead (rare, but possible in hot loops)
Your logging needs are simple and don't require request-scoped enrichment

Over-application example:

// Context logger in a tight computational loop: unnecessary overhead
func computeHash(ctx context.Context, data []byte) []byte {
    logger := klog.FromContext(ctx) // context lookup on every call
    logger.V(5).Info("computing hash", "size", len(data))
    // This is called 1M times/sec, so the context lookup adds up
    sum := sha256.Sum256(data)
    return sum[:]
}

Better alternative:

// Log once at the boundary, not in the hot loop
func processHashes(ctx context.Context, items [][]byte) [][]byte {
    logger := klog.FromContext(ctx)
    logger.Info("processing hashes", "count", len(items))
    results := make([][]byte, len(items))
    for i, data := range items {
        sum := sha256.Sum256(data) // no logging in hot path
        results[i] = sum[:]
    }
    return results
}

Why: Context-based logging is for request/operation scoping, not for instrumenting every function call. In hot paths, the overhead of context lookup and structured log construction matters.

Key Insight

Stdlib's log package is built around a package-level default logger. Kubernetes uses context-based structured logging (klog.FromContext) to allow each call chain to carry its own logger configuration. This enables filtering by controller, verbosity tuning per-component, and correlation.

7. Functional Options for Configuration

Source: staging/src/k8s.io/client-go/informers/factory.go (lines 83–127)

What it does

The SharedInformerFactory uses functional options for configuration:

// staging/src/k8s.io/client-go/informers/factory.go:57
type SharedInformerOption func(*sharedInformerFactory) *sharedInformerFactory

func WithNamespace(namespace string) SharedInformerOption {
    return func(factory *sharedInformerFactory) *sharedInformerFactory {
        factory.namespace = namespace
        return factory
    }
}

func WithTransform(transform cache.TransformFunc) SharedInformerOption {
    return func(factory *sharedInformerFactory) *sharedInformerFactory {
        factory.transform = transform
        return factory
    }
}

func NewSharedInformerFactoryWithOptions(client kubernetes.Interface, defaultResync time.Duration, options ...SharedInformerOption) SharedInformerFactory {
    factory := &sharedInformerFactory{...}
    for _, opt := range options {
        factory = opt(factory)
    }
    return factory
}

Why

APIs evolve. Adding a new configuration option shouldn't break callers. Functional options provide:

Backward compatibility (new options don't change existing signatures)
Self-documenting (each option is a named function)
Composability (options can be collected and applied conditionally)

When to Use

Triggers:

Your constructor has more than 3-4 optional parameters that grow over time
You're building a library/SDK where backward compatibility matters across versions
Different callers need different subsets of configuration (not everyone uses every option)

Example — before:

// Constructor with growing parameter list — breaks on every addition
func NewClient(addr string, timeout time.Duration, retries int, tls bool, cert string) *Client
// v2: added auth
func NewClient(addr string, timeout time.Duration, retries int, tls bool, cert string, token string) *Client
// Every caller must update, even if they don't use the new param

Example — after:

type ClientOption func(*Client)

func NewClient(addr string, opts ...ClientOption) *Client {
    c := &Client{addr: addr, timeout: 30 * time.Second, retries: 3}
    for _, opt := range opts { opt(c) }
    return c
}

func WithTimeout(d time.Duration) ClientOption { return func(c *Client) { c.timeout = d } }
func WithRetries(n int) ClientOption { return func(c *Client) { c.retries = n } }
func WithTLS(cert string) ClientOption { return func(c *Client) { c.tls = true; c.cert = cert } }
// Adding WithAuth doesn't break any existing callers
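
Filled out into a complete program, the example above might look like the sketch below. The Client fields, the defaultTimeout and defaultRetries constants, and the certificate path are assumptions for illustration; the main function shows the composability point, where options are collected conditionally before the constructor is called.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultTimeout = 30 * time.Second
	defaultRetries = 3
)

type Client struct {
	addr    string
	timeout time.Duration
	retries int
	tls     bool
	cert    string
}

// ClientOption mutates a Client during construction.
type ClientOption func(*Client)

func WithTimeout(d time.Duration) ClientOption { return func(c *Client) { c.timeout = d } }
func WithRetries(n int) ClientOption           { return func(c *Client) { c.retries = n } }
func WithTLS(cert string) ClientOption {
	return func(c *Client) { c.tls = true; c.cert = cert }
}

// NewClient applies defaults first, then each option in order,
// so a later option overrides an earlier one.
func NewClient(addr string, opts ...ClientOption) *Client {
	c := &Client{addr: addr, timeout: defaultTimeout, retries: defaultRetries}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	// Options are plain values, so they can be built up conditionally.
	opts := []ClientOption{WithRetries(5)}
	if cert := "/etc/tls/ca.pem"; cert != "" { // hypothetical path
		opts = append(opts, WithTLS(cert))
	}
	c := NewClient("example.internal:443", opts...)
	fmt.Printf("%+v\n", *c)
}
```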

When NOT to Use

Don't use this when:

You have 1-2 required parameters and nothing optional (just use a simple constructor)
The configuration is static and well-known (a config struct is simpler and more discoverable)
You're building an internal-only API that you control all callers of (breaking changes are cheap)

Over-application example:

// Functional options for a struct with one optional field
func NewLogger(opts ...LoggerOption) *Logger {
    l := &Logger{level: InfoLevel}
    for _, opt := range opts { opt(l) }
    return l
}
func WithLevel(level Level) LoggerOption { return func(l *Logger) { l.level = level } }

// Callers write: NewLogger(WithLevel(DebugLevel))
// When they could write: NewLogger(DebugLevel) — simpler, clearer

Better alternative:

// Simple constructor with the one meaningful parameter
func NewLogger(level Level) *Logger {
    return &Logger{level: level}
}

Why: Functional options add indirection (each option is a closure) and reduce discoverability (you need to know which With* functions exist). For simple constructors with few parameters, a direct signature is clearer. Use functional options when the option space is large and growing.

8. Type-Safe Generics in Critical Paths

Source: staging/src/k8s.io/client-go/util/workqueue/queue.go (lines 33–200), staging/src/k8s.io/client-go/gentype/type.go (lines 33–120)

What it does

Both workqueue and gentype use Go generics (1.18+) to provide type-safe interfaces while maintaining backward compatibility via type aliases:

// Workqueue: type-safe queue
type TypedInterface[T comparable] interface {
    Add(item T)
    Get() (item T, shutdown bool)
    Done(item T)
}

// Type alias for backward compat
type Type = Typed[any]

// Gentype: type-safe client
type Client[T objectWithMeta] struct {
    resource       string
    client         rest.Interface
    namespace      string
    newObject      func() T
}

Why

Before generics, Kubernetes used interface{} everywhere, requiring type assertions at every boundary. Generics eliminate entire classes of runtime panics and make the code self-documenting.

When to Use

Triggers:

You have container types (queues, caches, pools) that currently use interface{}/any with type assertions
Type assertion panics have caused production issues or require defensive coding at every call site
You're building a library where callers benefit from compile-time type checking

Example — before:

// interface{} queue: type assertions at every boundary
type Queue struct { items []interface{} }
func (q *Queue) Add(item interface{}) { q.items = append(q.items, item) }
func (q *Queue) Get() interface{} { /* ... */ }

// Caller must assert — runtime panic if wrong type sneaks in
item := queue.Get()
pod := item.(*v1.Pod) // panics if someone added a string

Example — after:

// Generic queue: compile-time safety, no assertions needed
type Queue[T any] struct { items []T }
func (q *Queue[T]) Add(item T) { q.items = append(q.items, item) }
func (q *Queue[T]) Get() T { /* ... */ }

// Caller gets the right type directly — no assertion, no panic
queue := NewQueue[*v1.Pod]()
pod := queue.Get() // already *v1.Pod at compile time
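
A runnable version of the generic queue, with the elided method bodies filled in, could look like this. It is a minimal sketch: the FIFO semantics, the extra ok return value, and the Pod type are assumptions of the example, and unlike the real workqueue it is not safe for concurrent use.

```go
package main

import "fmt"

// Queue is a minimal FIFO queue. It is not safe for concurrent use.
type Queue[T any] struct{ items []T }

func NewQueue[T any]() *Queue[T] { return &Queue[T]{} }

// Add appends item to the back of the queue.
func (q *Queue[T]) Add(item T) { q.items = append(q.items, item) }

// Get removes and returns the oldest item; ok is false when the queue is empty.
func (q *Queue[T]) Get() (item T, ok bool) {
	if len(q.items) == 0 {
		return item, false // item is the zero value of T
	}
	item = q.items[0]
	q.items = q.items[1:]
	return item, true
}

// Len reports the number of queued items.
func (q *Queue[T]) Len() int { return len(q.items) }

// Pod is a stand-in for *v1.Pod in this sketch.
type Pod struct{ Name string }

func main() {
	q := NewQueue[*Pod]()
	q.Add(&Pod{Name: "web-0"})
	// q.Add("oops") would be rejected by the compiler, not at runtime.
	if pod, ok := q.Get(); ok {
		fmt.Println(pod.Name) // pod is already *Pod; no type assertion needed
	}
}
```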

When NOT to Use

Don't use this when:

You genuinely need heterogeneous collections (multiple unrelated types in one container)
The generic constraint would be so broad (any) that you gain no type safety anyway
You're adding generics to existing interfaces where all callers work fine with concrete types (generics for generics' sake)

Over-application example:

// Genericizing a function that only ever handles one type
func processItems[T any](items []T, fn func(T) error) error {
    for _, item := range items {
        if err := fn(item); err != nil { return err }
    }
    return nil
}
// Every call site: processItems[*Pod](pods, handlePod)
// When you could just write: for _, pod := range pods { handlePod(pod) }

Better alternative:

// If the type is always known and the pattern is trivial, just loop
for _, pod := range pods {
    if err := handlePod(pod); err != nil { return err }
}

Why: Generics shine when they eliminate runtime type assertions or enable reusable container/algorithm libraries. They add cognitive overhead (type parameter noise) when the code only ever operates on one concrete type.

Key Insight

This is a migration pattern: introduce the generic version alongside the deprecated interface{} version using type aliases. Callers migrate at their own pace.

9. HandleCrash — Structured Panic Recovery

Source: staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go (lines 30–120)

What it does

A standardized defer HandleCrash() pattern that:

Catches panics
Logs them with proper stack attribution
Invokes registered panic handlers
Optionally re-panics (controlled by ReallyCrash flag)

// staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:78-82
func HandleCrashWithContext(ctx context.Context, additionalHandlers ...func(context.Context, interface{})) {
    if r := recover(); r != nil {
        handleCrash(ctx, r, additionalHandlers...)
    }
}
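
The four steps listed above can be sketched in a few lines of stdlib-only Go. This is an illustration of the idea, not the apimachinery implementation: handleCrash, panicHandlers, reallyCrash, and runIsolated are local names for this example, and logging is reduced to a handler that records the panic value.

```go
package main

import (
	"fmt"
	"sync"
)

// reallyCrash mirrors the idea of the ReallyCrash flag: when true, the panic
// is re-raised after the handlers have run.
var reallyCrash = false

// panicHandlers run for every recovered panic (for example logging or metrics).
var panicHandlers []func(r any)

// handleCrash must be invoked directly by defer so that recover() sees the panic.
func handleCrash(extra ...func(r any)) {
	r := recover()
	if r == nil {
		return
	}
	for _, h := range panicHandlers {
		h(r)
	}
	for _, h := range extra {
		h(r)
	}
	if reallyCrash {
		panic(r)
	}
}

// runIsolated runs each worker in its own goroutine and returns the panic
// values that were recovered, keyed by worker name.
func runIsolated(workers map[string]func()) map[string]any {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	recovered := map[string]any{}
	for name, w := range workers {
		wg.Add(1)
		go func(name string, w func()) {
			defer wg.Done()
			defer handleCrash(func(r any) {
				mu.Lock()
				recovered[name] = r
				mu.Unlock()
			})
			w()
		}(name, w)
	}
	wg.Wait()
	return recovered
}

func main() {
	recovered := runIsolated(map[string]func(){
		"worker-a": func() {},
		"worker-b": func() { panic("nil pointer somewhere") },
		"worker-c": func() {},
	})
	fmt.Println(recovered) // only worker-b appears; its siblings ran to completion
}
```

Deferred calls run last-in, first-out, so handleCrash recovers the panic before wg.Done releases the waiter.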

Why

In a production system with hundreds of goroutines, an unrecovered panic in one kills the entire process. HandleCrash provides a standardized recovery point that:

Logs the panic with caller attribution
Allows cleanup handlers (shutdown gracefully)
In tests, can be configured to not actually crash

When to Use

Triggers:

You're running multiple independent subsystems in one process (multiple controllers, background workers)
A panic in one subsystem shouldn't kill the entire process
You need structured logging of panic stack traces before potential recovery

Example — before:

// One bad nil pointer in workerB kills workerA, workerC, and the whole server
func main() {
    go workerA(ctx)
    go workerB(ctx) // panics → entire process dies
    go workerC(ctx)
    select {}
}

Example — after:

func safeGo(ctx context.Context, name string, f func(ctx context.Context)) {
    go func() {
        defer func() {
            if r := recover(); r != nil {
                log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
                // Log, alert, increment metric — but don't kill siblings
            }
        }()
        f(ctx)
    }()
}

func main() {
    safeGo(ctx, "worker-a", workerA)
    safeGo(ctx, "worker-b", workerB) // panics → logged, other workers continue
    safeGo(ctx, "worker-c", workerC)
    select {}
}

When NOT to Use

Don't use this when:

The panic indicates truly unrecoverable corruption (out of memory, data corruption) — recovering would hide the bug
You're in a single-goroutine CLI tool (let it crash with a stack trace — that IS the right behavior)
The goroutine holds locks or half-modified state that can't be safely abandoned (recovering leaves the system in an inconsistent state)

Over-application example:

// Recovering from panics in code that modifies shared state
func (c *Controller) unsafeSyncWithRecovery(ctx context.Context, key string) {
    defer func() {
        if r := recover(); r != nil {
            log.Printf("recovered: %v", r)
            // Problem: c.internalState might be half-modified
            // The next sync will read corrupted state
        }
    }()
    c.internalState.Phase = "processing"
    riskyOperation() // panics here
    c.internalState.Phase = "done" // never reached — state is now wrong
}

Better alternative:

// If you can't guarantee clean recovery, don't recover: crash and restart
func (c *Controller) syncItem(ctx context.Context, key string) error {
    // Use error returns, not panics, for expected failures
    if err := riskyOperation(); err != nil {
        return err // retry via workqueue; state is clean
    }
    c.internalState.Phase = "done"
    return nil
}
// HandleCrash at the top level (worker loop) is fine: each sync starts fresh

Why: Panic recovery is safe when each iteration is isolated (starts from clean state, reads from cache). It's dangerous when recovery leaves half-mutated state that subsequent operations will read.

Key Insight

Stdlib's approach is "let it crash." Kubernetes' approach is "catch it, log it, let the controller retry on the next sync." This is only safe because the controller pattern is idempotent.

10. ContextForChannel — Bridge Pattern

Source: staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go (lines 120–145)

What it does

Bridges the older <-chan struct{} stop pattern to the modern context.Context pattern:

// staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:120-142
func ContextForChannel(parentCh <-chan struct{}) context.Context {
    return channelContext{stopCh: parentCh}
}

type channelContext struct {
    stopCh <-chan struct{}
}

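func (c channelContext) Done() <-chan struct{} { return c.stopCh }
func (c channelContext) Err() error {
    select {
    case <-c.stopCh:
        return context.Canceled
    default:
        return nil
    }
}

// Deadline and Value complete the context.Context interface:
// the bridge never has a deadline and never carries values.
func (c channelContext) Deadline() (time.Time, bool) { return time.Time{}, false }
func (c channelContext) Value(key any) any           { return nil }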

Why

Kubernetes predates context.Context in the standard library (it arrived in Go 1.7). A large body of existing code still takes a stopCh <-chan struct{}. Rather than a big-bang rewrite, this adapter allows gradual migration.

When to Use

Triggers:

You have legacy code using <-chan struct{} for cancellation that needs to call modern context-aware APIs
You're doing a gradual migration from channels to context (not a big-bang rewrite)
You need to integrate with a library that only accepts context, but your cancellation signal comes from a channel

Example — before:

// Legacy API: only speaks channels
func RunLegacyWorker(stopCh <-chan struct{}) {
    for {
        select {
        case <-stopCh:
            return
        default:
            doWork()
        }
    }
}

// New dependency requires context — how to bridge?
func doWork() {
    ctx := context.TODO() // wrong: doesn't cancel when stopCh closes
    newLibrary.Call(ctx)
}

Example — after:

func RunLegacyWorker(stopCh <-chan struct{}) {
    ctx := ContextForChannel(stopCh) // bridge: closes when stopCh closes
    for {
        select {
        case <-ctx.Done():
            return
        default:
            newLibrary.Call(ctx) // properly cancels when stopCh closes
        }
    }
}

When NOT to Use

Don't use this when:

You're writing new code (just use context.Context from the start — no bridge needed)
The channel carries data, not just a close signal (this only works for <-chan struct{})
You need context features beyond cancellation (timeouts, values): the bridge reports no deadline and carries no values, so use a real context

Over-application example:

// Using ContextForChannel in new code that already has context
func NewController(ctx context.Context) *Controller {
    stopCh := make(chan struct{})
    go func() { <-ctx.Done(); close(stopCh) }() // convert context → channel
    bridgedCtx := ContextForChannel(stopCh)       // convert channel → context
    // You just round-tripped for no reason; use ctx directly
    return &Controller{ctx: bridgedCtx}
}

Better alternative:

func NewController(ctx context.Context) *Controller {
    // Just use ctx — it already does everything you need
    return &Controller{ctx: ctx}
}

Why: The bridge pattern exists for migration. In new code, using context directly is simpler, supports timeouts/values, and avoids the indirection of channel↔context conversion.

Key Insight

Large codebases can't do breaking API changes atomically. This bridge pattern is how you evolve from one idiom to another over years without breaking everything at once.
