Production Go Patterns (from Kubernetes)
Patterns for building large-scale Go codebases that go beyond what stdlib teaches you.
1. Code Generation Pattern
Source: staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go, staging/src/k8s.io/client-go/informers/apps/v1/deployment.go
What it does
Kubernetes generates massive amounts of boilerplate code from annotations on types:
- deepcopy-gen → DeepCopy/DeepCopyInto methods
- informer-gen → typed informers (List/Watch/Lister per resource)
- client-gen → typed client sets
- lister-gen → typed lister interfaces
- conversion-gen → version conversion functions
- defaulter-gen → defaulting functions
Why
At Kubernetes scale (~50 resource types × multiple versions), hand-writing deep copy, client wrappers, and conversion code is:
- Error-prone (forgetting to copy a new field breaks everything)
- Unmaintainable (thousands of nearly-identical files)
- Not verifiable by human review
How it works
Annotations drive generation:
// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type RawExtension struct { ... }
Generated output uses zz_generated. prefix (convention for "don't edit"):
// staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go:22
// Code generated by deepcopy-gen. DO NOT EDIT.
package runtime
func (in *RawExtension) DeepCopyInto(out *RawExtension) {
*out = *in
if in.Raw != nil {
in, out := &in.Raw, &out.Raw
*out = make([]byte, len(*in))
copy(*out, *in)
}
}
Generated informers (note the header comment):
// staging/src/k8s.io/client-go/informers/apps/v1/deployment.go:20
// Code generated by informer-gen. DO NOT EDIT.
When to Use
Triggers:
- You have 10+ types that need identical boilerplate methods (DeepCopy, Validate, Marshal)
- Hand-writing the code is error-prone (forgetting to copy a new field causes silent bugs)
- The generated output is mechanical and reviewable, not creative
Example — before:
// Hand-written deep copy for every type — 50 types × 30 lines each = 1500 lines of bugs
func (in *Deployment) DeepCopy() *Deployment {
out := new(Deployment)
out.Name = in.Name
out.Labels = make(map[string]string)
for k, v := range in.Labels { out.Labels[k] = v }
// Did you remember Annotations? Finalizers? Every nested struct?
return out
}
Example — after:
// +k8s:deepcopy-gen=true
type Deployment struct {
Name string
Labels map[string]string
Annotations map[string]string
}
// Generated: zz_generated.deepcopy.go covers every field the generator sees.
// Adding a new field? Re-run the generator and the copy logic stays in sync.
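To make the "after" case concrete, here is a hand-written sketch of the kind of methods deepcopy-gen would emit for the Deployment type above. It is an approximation for illustration only; the real generator output may differ in detail, and the Deployment type here is the toy struct from the example, not the real apps/v1 type.

```go
package main

import "fmt"

// Deployment mirrors the illustrative struct above; it is not the real apps/v1 type.
type Deployment struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// DeepCopyInto copies the receiver into out, allocating fresh maps so that
// out shares no mutable state with in.
func (in *Deployment) DeepCopyInto(out *Deployment) {
	*out = *in // copies plain value fields such as Name
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
	if in.Annotations != nil {
		out.Annotations = make(map[string]string, len(in.Annotations))
		for k, v := range in.Annotations {
			out.Annotations[k] = v
		}
	}
}

// DeepCopy returns a newly allocated copy, or nil for a nil receiver.
func (in *Deployment) DeepCopy() *Deployment {
	if in == nil {
		return nil
	}
	out := new(Deployment)
	in.DeepCopyInto(out)
	return out
}

func main() {
	orig := &Deployment{Name: "web", Labels: map[string]string{"app": "web"}}
	cp := orig.DeepCopy()
	cp.Labels["app"] = "changed" // only the copy's map is modified
	fmt.Println(orig.Labels["app"], cp.Labels["app"])
}
```

Every map field needs its own allocate-and-copy block, which is exactly the repetitive, easy-to-forget code that a generator is good at keeping in sync.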
When NOT to Use
Don't use this when:
- You have fewer than ~5 types (hand-writing is faster and more readable)
- The boilerplate varies significantly between types (generators work best for uniform patterns)
- The generated code would be harder to debug than hand-written code (e.g., complex business logic)
Over-application example:
// Generating a "ToString" method for 3 types — overkill
//go:generate stringer -type=Status,Priority,Phase
// You now have a generated file, a build dependency on stringer,
// and anyone reading the code has to understand the generation pipeline
// for what amounts to 9 lines of hand-written code.
Better alternative:
// Just write it — 3 types is not "at scale"
func (s Status) String() string {
switch s {
case Running: return "Running"
case Stopped: return "Stopped"
default: return fmt.Sprintf("Status(%d)", s)
}
}
Why: Code generation adds build complexity (Makefiles, CI steps, go generate ordering), makes debugging harder (stack traces point to generated code), and obscures logic. It pays off only when the volume of boilerplate is large enough that correctness via automation outweighs these costs.
Key Insight
Stdlib makes only light use of code generation because it keeps things small enough that hand-writing works. Kubernetes suggests that once you cross roughly 10 to 20 types with shared mechanical behavior, code generation becomes the more maintainable path.
2. The Scheme / Type Registry Pattern
Source: staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go (lines 38–100), scheme_builder.go
What it does
The Scheme is a runtime type registry that maps:
- GroupVersionKind → Go type (reflect.Type)
- Go type → []GroupVersionKind
- Provides serialization, defaulting, conversion, and validation dispatch
Why
Kubernetes has 50+ resource types across 15+ API groups, each with multiple versions. The Scheme provides:
- Dynamic dispatch: serialize any Object without knowing its concrete type
- Version conversion: convert between v1 and v1beta1 transparently
- Pluggability: third-party resources register into the same system
Structure
// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:38-98
type Scheme struct {
gvkToType map[schema.GroupVersionKind]reflect.Type
typeToGVK map[reflect.Type][]schema.GroupVersionKind
unversionedTypes map[reflect.Type]schema.GroupVersionKind
defaulterFuncs map[reflect.Type]func(interface{})
validationFuncs map[reflect.Type]func(ctx, op, obj, oldObj) field.ErrorList
converter *conversion.Converter
versionPriority map[string][]string
}
SchemeBuilder Pattern
// staging/src/k8s.io/apimachinery/pkg/runtime/scheme_builder.go:23-48
type SchemeBuilder []func(*Scheme) error
func (sb *SchemeBuilder) AddToScheme(s *Scheme) error {
for _, f := range *sb {
if err := f(s); err != nil {
return err
}
}
return nil
}
func (sb *SchemeBuilder) Register(funcs ...func(*Scheme) error) {
for _, f := range funcs {
*sb = append(*sb, f)
}
}
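A small usage sketch may help show why the builder is a slice of functions: registration is collected first and applied later, in order, against whichever scheme the caller supplies. The Scheme below is a toy stand-in that only records kind names, and addKind is a hypothetical helper invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Scheme is a toy stand-in for runtime.Scheme: it only records kind names.
type Scheme struct{ kinds []string }

// SchemeBuilder collects registration functions to run later.
type SchemeBuilder []func(*Scheme) error

// Register appends registration functions without running them.
func (sb *SchemeBuilder) Register(funcs ...func(*Scheme) error) {
	*sb = append(*sb, funcs...)
}

// AddToScheme runs every collected function, stopping at the first error.
func (sb *SchemeBuilder) AddToScheme(s *Scheme) error {
	for _, f := range *sb {
		if err := f(s); err != nil {
			return err
		}
	}
	return nil
}

// addKind is a hypothetical helper that builds a registration function.
func addKind(kind string) func(*Scheme) error {
	return func(s *Scheme) error {
		if kind == "" {
			return errors.New("empty kind")
		}
		s.kinds = append(s.kinds, kind)
		return nil
	}
}

func main() {
	var sb SchemeBuilder
	sb.Register(addKind("Deployment"), addKind("ReplicaSet"))

	s := &Scheme{}
	if err := sb.AddToScheme(s); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.kinds)
}
```

Because nothing runs at Register time, the same builder can be applied to several schemes, for example one in production code and a fresh one per test.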
How Registration Works
// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:151-160
func (s *Scheme) AddKnownTypes(gv schema.GroupVersion, types ...Object) {
for _, obj := range types {
t := reflect.TypeOf(obj)
if t.Kind() != reflect.Pointer {
panic("All types must be pointers to structs.")
}
t = t.Elem()
s.AddKnownTypeWithName(gv.WithKind(t.Name()), obj)
}
}
When to Use
Triggers:
- You have many types that must be serialized/deserialized polymorphically (you read JSON/YAML and don't know the concrete type ahead of time)
- You need version conversion between different representations of the same concept
- Third parties need to register new types without modifying your core code
Example — before:
// Hard-coded switch statement — breaks open/closed principle
func Deserialize(data []byte) (Object, error) {
kind := extractKind(data)
switch kind {
case "Deployment":
var d Deployment
json.Unmarshal(data, &d)
return &d, nil
case "Service":
var s Service
json.Unmarshal(data, &s)
return &s, nil
// ... 50 more cases, each a potential bug
default:
return nil, fmt.Errorf("unknown kind: %s", kind)
}
}
Example — after:
// Type registry — extensible, no switch statements
scheme := runtime.NewScheme()
scheme.AddKnownTypes(appsv1.SchemeGroupVersion, &Deployment{}, &ReplicaSet{})
scheme.AddKnownTypes(corev1.SchemeGroupVersion, &Service{}, &Pod{})
func Deserialize(scheme *runtime.Scheme, data []byte) (Object, error) {
gvk := extractGVK(data)
obj, err := scheme.New(gvk) // creates correct concrete type
if err != nil { return nil, err }
if err := json.Unmarshal(data, obj); err != nil { return nil, err }
return obj, nil
}
// New types register themselves — no core code changes needed
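The registry idea can be sketched with the standard library alone. The version below is a deliberately reduced model, not the real Scheme: it keys on a bare kind string instead of a GroupVersionKind, and Registry, Object, and the two resource structs are names assumed for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Object is a hypothetical marker interface for registrable types.
type Object interface{ GetName() string }

type Deployment struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

type Service struct {
	Name string `json:"name"`
	Port int    `json:"port"`
}

func (d *Deployment) GetName() string { return d.Name }
func (s *Service) GetName() string    { return s.Name }

// Registry maps a kind name to the Go struct type that represents it.
type Registry struct{ kindToType map[string]reflect.Type }

func NewRegistry() *Registry {
	return &Registry{kindToType: make(map[string]reflect.Type)}
}

// AddKnownTypes registers each pointer-to-struct under its struct name.
func (r *Registry) AddKnownTypes(objs ...Object) {
	for _, obj := range objs {
		t := reflect.TypeOf(obj)
		if t.Kind() != reflect.Pointer || t.Elem().Kind() != reflect.Struct {
			panic("all types must be pointers to structs")
		}
		r.kindToType[t.Elem().Name()] = t.Elem()
	}
}

// New allocates a zero value of the type registered for kind.
func (r *Registry) New(kind string) (Object, error) {
	t, ok := r.kindToType[kind]
	if !ok {
		return nil, fmt.Errorf("unknown kind %q", kind)
	}
	return reflect.New(t).Interface().(Object), nil
}

// Decode reads the "kind" field first, then unmarshals into the matching type.
func (r *Registry) Decode(data []byte) (Object, error) {
	var meta struct {
		Kind string `json:"kind"`
	}
	if err := json.Unmarshal(data, &meta); err != nil {
		return nil, err
	}
	obj, err := r.New(meta.Kind)
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, obj); err != nil {
		return nil, err
	}
	return obj, nil
}

func main() {
	r := NewRegistry()
	r.AddKnownTypes(&Deployment{}, &Service{})
	obj, err := r.Decode([]byte(`{"kind":"Service","name":"db","port":5432}`))
	fmt.Printf("%T %v\n", obj, err)
}
```

Note where the type safety went: an unknown kind is now a runtime error from New rather than a compile error, which is the tradeoff the rest of this section discusses.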
When NOT to Use
Don't use this when:
- You have a small, fixed set of types that won't grow (a switch statement is simpler and fully type-safe at compile time)
- You don't need dynamic/polymorphic deserialization (you always know the concrete type)
- You're not building a plugin system (nobody external needs to register types)
Over-application example:
// Type registry for an internal service with 3 event types — over-engineered
var eventScheme = NewScheme()
func init() {
eventScheme.Register("OrderCreated", reflect.TypeOf(OrderCreated{}))
eventScheme.Register("OrderShipped", reflect.TypeOf(OrderShipped{}))
eventScheme.Register("OrderCanceled", reflect.TypeOf(OrderCanceled{}))
}
// Now debugging requires understanding the registry, reflection, runtime dispatch...
Better alternative:
// Simple interface + switch — readable, debuggable, compile-time safe
type Event interface { EventType() string }
func handleEvent(data []byte) error {
switch extractType(data) {
case "OrderCreated":
var e OrderCreated
json.Unmarshal(data, &e)
return processCreated(e)
case "OrderShipped":
// ...
}
}
Why: Type registries trade compile-time safety for runtime extensibility. In Kubernetes (50+ types, CRDs, multiple API groups), this tradeoff is essential. In a service with a handful of known types, you're paying the complexity cost (reflection, runtime errors, harder debugging) without the benefit.
Key Insight
This is loosely comparable to Java's ServiceLoader or a dependency-injection registry, adapted for Go's type system. Stdlib uses interfaces; Kubernetes needs a runtime type system on top of Go's static type system because API objects must be dynamically dispatched across version boundaries.
3. The runtime.Object Interface
Source: staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go (lines 333–342)
What it does
Every Kubernetes API object must implement this two-method interface:
// staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go:337-341
type Object interface {
GetObjectKind() schema.ObjectKind
DeepCopyObject() Object
}
Why
- GetObjectKind(): allows the serialization layer to determine what type an object is without reflection
- DeepCopyObject(): enables safe concurrent access (the informer cache is shared; mutations must happen on copies)
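A reduced model of this contract shows how a pipeline stage can handle mixed types through the interface alone. The Object interface below is a local stand-in that returns a plain kind string rather than schema.ObjectKind, and Pod, Node, and groupByKind are illustrative names, not the real API types.

```go
package main

import "fmt"

// Object is a simplified stand-in for runtime.Object.
type Object interface {
	GetKind() string
	DeepCopyObject() Object
}

type Pod struct {
	Kind, Name string
	Labels     map[string]string
}

type Node struct{ Kind, Name string }

func (p *Pod) GetKind() string { return p.Kind }

// DeepCopyObject copies the struct and re-allocates the Labels map.
func (p *Pod) DeepCopyObject() Object {
	out := *p
	if p.Labels != nil {
		out.Labels = make(map[string]string, len(p.Labels))
		for k, v := range p.Labels {
			out.Labels[k] = v
		}
	}
	return &out
}

func (n *Node) GetKind() string { return n.Kind }

// DeepCopyObject is a plain struct copy because Node has no reference fields.
func (n *Node) DeepCopyObject() Object {
	out := *n
	return &out
}

// groupByKind is a generic pipeline stage: it buckets copies of the inputs by
// kind and needs no type switch over concrete types.
func groupByKind(objs []Object) map[string][]Object {
	out := make(map[string][]Object)
	for _, o := range objs {
		out[o.GetKind()] = append(out[o.GetKind()], o.DeepCopyObject())
	}
	return out
}

func main() {
	objs := []Object{
		&Pod{Kind: "Pod", Name: "web-0", Labels: map[string]string{"app": "web"}},
		&Node{Kind: "Node", Name: "n1"},
	}
	for kind, items := range groupByKind(objs) {
		fmt.Println(kind, len(items))
	}
}
```

Adding a third type means implementing the two methods on it; groupByKind does not change.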
When to Use
Triggers:
- You're building a framework where heterogeneous types flow through a common pipeline (serialization, admission, storage)
- You need to identify the "kind" of an object at runtime without type switches
- Concurrent access requires copy semantics built into the type contract
Example — before:
// No common interface — every handler needs type assertions
func store(obj interface{}) error {
switch v := obj.(type) {
case *Deployment:
return storeDeployment(v)
case *Service:
return storeService(v)
// Every new type requires changes here
}
}
Example — after:
// Common interface — generic pipeline handles any type
func store(obj runtime.Object) error {
gvk := obj.GetObjectKind().GroupVersionKind()
key := buildKey(gvk, obj)
data, err := serialize(obj)
if err != nil {
return err
}
return etcd.Put(key, data)
// Works for any registered type — no switch statement
}
When NOT to Use
Don't use this when:
- Your types don't need polymorphic handling (you always know the concrete type at each call site)
- Deep copy is irrelevant (no shared caches, no concurrent access)
- You're not building framework-level infrastructure (application code rarely needs this)
Over-application example:
// Implementing runtime.Object on application-level domain types
type Invoice struct { ... }
func (i *Invoice) GetObjectKind() schema.ObjectKind { return &i.TypeMeta }
func (i *Invoice) DeepCopyObject() runtime.Object { return i.DeepCopy() }
// Now your business logic imports k8s.io/apimachinery — for what?
Better alternative: Use a domain-specific interface that captures what your code actually needs:
type Storable interface {
ID() string
Marshal() ([]byte, error)
}
Why: runtime.Object is designed for Kubernetes' specific needs (polymorphic serialization across API versions). Adopting it in non-Kubernetes code couples you to a heavy dependency for an abstraction that doesn't match your domain.
Key Insight
This is the foundation of Kubernetes' extensibility. Any Go type that satisfies these two methods can participate in the entire API machinery: serialization, storage, admission, informers, and so on. Go types written for custom resources (CRDs), typically with generated deep-copy code, implement this same interface.
4. Deep Copy Everywhere
Source: Generated code in zz_generated.deepcopy.go files throughout the tree
What it does
Every API type has generated DeepCopy() and DeepCopyInto() methods that create true deep copies including nested slices, maps, and pointer fields.
Why
The informer cache is shared across all controllers in a process. If controller A gets an object from the cache and mutates it, controller B would see corrupted data. Deep copy provides the isolation guarantee.
// Usage pattern in controllers:
deployment := deploymentFromCache.DeepCopy()
deployment.Spec.Replicas = ptr.To[int32](3)
_, err := client.AppsV1().Deployments(ns).Update(ctx, deployment, metav1.UpdateOptions{})
When to Use
Triggers:
- You read from a shared data structure (cache, registry, config store) and need to modify the result
- Multiple goroutines access the same data and at least one modifies it
- You're passing data across ownership boundaries (your function returns data that the caller might mutate)
Example — before:
// Returning a pointer to internal state — caller can corrupt it
func (c *ConfigStore) GetConfig() *Config {
return c.current // caller mutates this → corrupts store
}
Example — after:
// Return a copy — caller can mutate freely
func (c *ConfigStore) GetConfig() *Config {
c.mu.RLock()
defer c.mu.RUnlock()
return c.current.DeepCopy()
}
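The difference between a struct copy and a deep copy is easy to miss, so here is a runnable sketch of the store above. Config, its Flags field, and the GetShallow method are assumptions made for the demonstration; GetShallow exists only to show the failure mode.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical configuration type with one reference field.
type Config struct {
	Name  string
	Flags map[string]bool
}

// DeepCopy returns a copy that shares no map with the receiver.
func (c *Config) DeepCopy() *Config {
	if c == nil {
		return nil
	}
	out := &Config{Name: c.Name}
	if c.Flags != nil {
		out.Flags = make(map[string]bool, len(c.Flags))
		for k, v := range c.Flags {
			out.Flags[k] = v
		}
	}
	return out
}

type ConfigStore struct {
	mu      sync.RWMutex
	current *Config
}

// GetShallow copies the struct only; the Flags map is still shared.
func (s *ConfigStore) GetShallow() Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return *s.current
}

// GetConfig returns an isolated copy that callers may mutate freely.
func (s *ConfigStore) GetConfig() *Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.current.DeepCopy()
}

func main() {
	store := &ConfigStore{current: &Config{Name: "prod", Flags: map[string]bool{"beta": false}}}

	deep := store.GetConfig()
	deep.Flags["beta"] = true
	fmt.Println("after deep-copy mutation:", store.GetConfig().Flags["beta"])

	shallow := store.GetShallow()
	shallow.Flags["beta"] = true // writes through to the store's map
	fmt.Println("after shallow-copy mutation:", store.GetConfig().Flags["beta"])
}
```

Returning a struct by value looks like a copy, but any map, slice, or pointer field still aliases the store's data, and the write happens outside the lock.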
When NOT to Use
Don't use this when:
- The data is owned by a single goroutine (no concurrent access, no shared cache)
- You only need to read the data, never modify it (deep copy for read-only access is wasted allocation)
- Performance is critical and you can prove safety via immutability or other means (e.g., freezing the object after construction)
Over-application example:
// Deep copying on every read, even when only logging
func (c *Controller) logStatus(ctx context.Context, key string) {
obj, _ := c.lister.Get(key)
copy := obj.DeepCopy() // Allocates! But we never mutate copy
logger.Info("status", "phase", copy.Status.Phase)
}
Better alternative:
// Read directly from cache — no mutation, no copy needed
func (c *Controller) logStatus(ctx context.Context, key string) {
obj, _ := c.lister.Get(key)
logger.Info("status", "phase", obj.Status.Phase) // read-only: safe
}
Why: Deep copy allocates memory and does work proportional to object size. In hot paths that only read data, it's pure waste. Copy only when you intend to mutate.
Key Insight
Stdlib rarely needs deep copy because stdlib objects are typically owned by one goroutine. Kubernetes has a shared read cache (the informer store) that necessitates copy-on-write semantics at the application level.
5. Graceful Shutdown with Priority Classes
Source: pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go (lines 23–100)
What it does
When a node is shutting down, the kubelet terminates pods in priority order: regular pods first, then critical pods (for example, system-node-critical), with a separately configured share of the shutdown grace period reserved for each group.
Why
A hard kill of all pods simultaneously would lose important work. Priority-based graceful shutdown preserves the most important workloads longest.
// pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go:66-90
type managerImpl struct {
logger klog.Logger
recorder record.EventRecorder
getPods eviction.ActivePodsFunc
syncNodeStatus func(context.Context)
dbusCon dbusInhibiter
inhibitLock systemd.InhibitLock
nodeShuttingDownMutex sync.Mutex
nodeShuttingDownNow bool
podManager *podManager
}
When to Use
Triggers:
- Your system runs workloads with different importance levels (critical infrastructure vs. batch jobs)
- You receive shutdown signals and need to drain gracefully, not crash
- Some work MUST complete (leader lease release, data flush) while other work can be interrupted
Example — before:
// Flat shutdown: everything gets the same 30s, critical work might not finish
func main() {
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGTERM)
<-sig
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
server.Shutdown(ctx) // hope 30s is enough for everything
}
Example — after:
// Priority-based shutdown: critical work gets more time
func main() {
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGTERM)
<-sig
// Phase 1: stop accepting new work (immediate)
server.StopAccepting()
// Phase 2: drain low-priority work (5s budget)
ctx1, cancel1 := context.WithTimeout(context.Background(), 5*time.Second)
batchWorker.Shutdown(ctx1)
cancel1()
// Phase 3: drain critical work (25s budget)
ctx2, cancel2 := context.WithTimeout(context.Background(), 25*time.Second)
leaderElector.Release(ctx2)
dataStore.Flush(ctx2)
cancel2()
}
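The phased version above generalizes to a table of phases, each with a priority and its own time budget. The sketch below assumes hypothetical names (phase, shutdownInOrder, work, demoPhases) and uses very short durations so that it finishes quickly; it is not how the kubelet implements node shutdown.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// phase is a hypothetical unit of shutdown work.
type phase struct {
	name     string
	priority int           // lower values shut down first
	budget   time.Duration // time this phase may take before it is abandoned
	stop     func(ctx context.Context) error
}

// shutdownInOrder runs phases from lowest to highest priority, each under its
// own timeout, and reports one result line per phase.
func shutdownInOrder(phases []phase) []string {
	sorted := append([]phase(nil), phases...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority < sorted[j].priority
	})
	var results []string
	for _, p := range sorted {
		ctx, cancel := context.WithTimeout(context.Background(), p.budget)
		err := p.stop(ctx)
		cancel()
		if err != nil {
			results = append(results, p.name+": "+err.Error())
			continue
		}
		results = append(results, p.name+": ok")
	}
	return results
}

// work simulates draining that takes d but gives up when ctx expires.
func work(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demoPhases lists a slow low-priority worker and a fast critical flush.
func demoPhases() []phase {
	return []phase{
		{name: "flush-datastore", priority: 2, budget: 200 * time.Millisecond, stop: work(10 * time.Millisecond)},
		{name: "batch-worker", priority: 1, budget: 20 * time.Millisecond, stop: work(time.Second)},
	}
}

func main() {
	for _, line := range shutdownInOrder(demoPhases()) {
		fmt.Println(line)
	}
}
```

The batch worker overruns its small budget and is cut off, while the critical flush still gets its full budget afterwards; a slow low-priority phase cannot eat into time reserved for a later one.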
When NOT to Use
Don't use this when:
- All your work is equally important (flat shutdown with a single timeout is simpler)
- Your process is stateless and restarts are cheap (just die and restart)
- Shutdown time is not constrained (you can wait as long as needed for everything to drain)
Over-application example:
// Priority shutdown for a stateless HTTP server with no background work
type ShutdownManager struct {
priorities []PriorityClass
workers map[PriorityClass][]Worker
}
// All this machinery for... stopping an HTTP handler that has no in-flight state?
Better alternative:
// Simple graceful shutdown — stateless server just drains connections
server := &http.Server{}
go server.ListenAndServe()
<-sigCh
server.Shutdown(ctx) // drains in-flight requests, done
Why: Priority-based shutdown adds ordering logic, timeout budgets per phase, and coordination complexity. It's justified when different subsystems have genuinely different importance. For simple services, http.Server.Shutdown() is all you need.
6. Context as Logger Carrier
Source: pkg/controller/deployment/deployment_controller.go (lines 106, 179, 500)
What it does
Kubernetes passes structured loggers through context:
// pkg/controller/deployment/deployment_controller.go:179
logger := klog.FromContext(ctx)
logger.Info("Starting controller", "controller", "deployment")
Why
At scale, you need structured logging with:
- Consistent key-value pairs (controller name, object reference)
- Verbosity levels (logger.V(4).Info(...))
- No global state (context carries the logger configured by the caller)
When to Use
Triggers:
- You have deep call stacks where log context (request ID, user, controller name) should propagate without explicit parameters
- Multiple subsystems share code but need different logger configurations (verbosity, output format)
- You want to enrich logs with contextual metadata without threading a logger through every function signature
Example — before:
// Global logger: no context, hard to filter, pollutes all output
var log = logrus.New()
func processOrder(orderID string) error {
log.Info("processing order") // which order? which service? which request?
items := fetchItems(orderID)
log.Infof("fetched %d items", len(items)) // no correlation possible
return nil
}
Example — after:
// Context-carried logger: each request has its own enriched logger
func processOrder(ctx context.Context, orderID string) error {
logger := klog.FromContext(ctx)
logger = logger.WithValues("orderID", orderID)
ctx = klog.NewContext(ctx, logger)
logger.Info("processing order") // automatically includes orderID
items := fetchItems(ctx, orderID)
// fetchItems logs with the same logger — all lines correlate
return nil
}
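The same propagation can be tried without klog. The sketch below swaps in the standard library's log/slog (Go 1.21 or later) and a private context key; NewContext and FromContext here are local helpers written for this example that imitate the klog functions of the same name, and run and correlated exist only to capture and inspect the output.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// loggerKey is an unexported key type so no other package can collide with it.
type loggerKey struct{}

// NewContext returns a context that carries the given logger.
func NewContext(ctx context.Context, l *slog.Logger) context.Context {
	return context.WithValue(ctx, loggerKey{}, l)
}

// FromContext returns the carried logger, or the default logger if none is set.
func FromContext(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}

// processOrder enriches the logger once and passes it down via the context.
func processOrder(ctx context.Context, orderID string) {
	logger := FromContext(ctx).With("orderID", orderID)
	ctx = NewContext(ctx, logger)
	logger.Info("processing order")
	fetchItems(ctx)
}

// fetchItems never sees orderID directly, yet its log line includes it.
func fetchItems(ctx context.Context) {
	FromContext(ctx).Info("fetched items", "count", 3)
}

// run captures everything processOrder logs for the given order.
func run(orderID string) string {
	var buf bytes.Buffer
	base := slog.New(slog.NewTextHandler(&buf, nil))
	processOrder(NewContext(context.Background(), base), orderID)
	return buf.String()
}

// correlated counts how many times the orderID attribute appears in out.
func correlated(out, orderID string) int {
	return strings.Count(out, "orderID="+orderID)
}

func main() {
	out := run("A-42")
	fmt.Print(out)
	fmt.Println("lines carrying orderID:", correlated(out, "A-42"))
}
```

The callee's signature only needs a context, so new attributes can be added at any layer without touching the functions below it.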
When NOT to Use
Don't use this when:
- You have a simple CLI tool with sequential execution (a global logger is fine)
- Performance is critical and context allocation is measurable overhead (rare, but possible in hot loops)
- Your logging needs are simple and don't require request-scoped enrichment
Over-application example:
// Context logger in a tight computational loop — unnecessary overhead
func computeHash(ctx context.Context, data []byte) []byte {
logger := klog.FromContext(ctx) // context lookup on every call
logger.V(5).Info("computing hash", "size", len(data))
// This is called 1M times/sec — the context lookup adds up
sum := sha256.Sum256(data)
return sum[:]
}
Better alternative:
// Log once at the boundary, not in the hot loop
func processHashes(ctx context.Context, items [][]byte) [][]byte {
logger := klog.FromContext(ctx)
logger.Info("processing hashes", "count", len(items))
results := make([][]byte, len(items))
for i, data := range items {
sum := sha256.Sum256(data) // no logging in hot path
results[i] = sum[:]
}
return results
}
Why: Context-based logging is for request/operation scoping, not for instrumenting every function call. In hot paths, the overhead of context lookup and structured log construction matters.
Key Insight
Stdlib's log package defaults to a single global logger. Kubernetes uses context-based structured logging (klog.FromContext) to allow each call chain to carry its own logger configuration. This enables filtering by controller, verbosity tuning per-component, and correlation.
7. Functional Options for Configuration
Source: staging/src/k8s.io/client-go/informers/factory.go (lines 83–127)
What it does
The SharedInformerFactory uses functional options for configuration:
// staging/src/k8s.io/client-go/informers/factory.go:57
type SharedInformerOption func(*sharedInformerFactory) *sharedInformerFactory
func WithNamespace(namespace string) SharedInformerOption {
return func(factory *sharedInformerFactory) *sharedInformerFactory {
factory.namespace = namespace
return factory
}
}
func WithTransform(transform cache.TransformFunc) SharedInformerOption {
return func(factory *sharedInformerFactory) *sharedInformerFactory {
factory.transform = transform
return factory
}
}
func NewSharedInformerFactoryWithOptions(client kubernetes.Interface, defaultResync time.Duration, options ...SharedInformerOption) SharedInformerFactory {
factory := &sharedInformerFactory{...}
for _, opt := range options {
factory = opt(factory)
}
return factory
}
Why
APIs evolve. Adding a new configuration option shouldn't break callers. Functional options provide:
- Backward compatibility (new options don't change existing signatures)
- Self-documenting (each option is a named function)
- Composability (options can be collected and applied conditionally)
When to Use
Triggers:
- Your constructor has more than 3-4 optional parameters that grow over time
- You're building a library/SDK where backward compatibility matters across versions
- Different callers need different subsets of configuration (not everyone uses every option)
Example — before:
// Constructor with growing parameter list — breaks on every addition
func NewClient(addr string, timeout time.Duration, retries int, tls bool, cert string) *Client
// v2: added auth
func NewClient(addr string, timeout time.Duration, retries int, tls bool, cert string, token string) *Client
// Every caller must update, even if they don't use the new param
Example — after:
type ClientOption func(*Client)
func NewClient(addr string, opts ...ClientOption) *Client {
c := &Client{addr: addr, timeout: 30 * time.Second, retries: 3}
for _, opt := range opts { opt(c) }
return c
}
func WithTimeout(d time.Duration) ClientOption { return func(c *Client) { c.timeout = d } }
func WithRetries(n int) ClientOption { return func(c *Client) { c.retries = n } }
func WithTLS(cert string) ClientOption { return func(c *Client) { c.tls = true; c.cert = cert } }
// Adding WithAuth doesn't break any existing callers
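Filled out into a complete program, the pattern also shows the composability point: options are ordinary values, so they can be collected in a slice and appended conditionally. The Client fields and the address and certificate strings below are placeholders chosen for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// Client is a hypothetical client with defaults for every optional field.
type Client struct {
	addr    string
	timeout time.Duration
	retries int
	tls     bool
	cert    string
}

// ClientOption mutates a Client during construction.
type ClientOption func(*Client)

func WithTimeout(d time.Duration) ClientOption { return func(c *Client) { c.timeout = d } }
func WithRetries(n int) ClientOption           { return func(c *Client) { c.retries = n } }
func WithTLS(cert string) ClientOption {
	return func(c *Client) { c.tls = true; c.cert = cert }
}

// NewClient applies defaults first, then each option in the order given.
func NewClient(addr string, opts ...ClientOption) *Client {
	c := &Client{addr: addr, timeout: 30 * time.Second, retries: 3}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	// Options are plain values, so they can be built up conditionally.
	opts := []ClientOption{WithRetries(5)}
	useTLS := true // placeholder for a real configuration check
	if useTLS {
		opts = append(opts, WithTLS("client.pem"))
	}
	c := NewClient("api.internal:443", opts...)
	fmt.Printf("%+v\n", *c)
}
```

Because options run in order, a later option overrides an earlier one that sets the same field, which is worth documenting for callers who merge option lists.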
When NOT to Use
Don't use this when:
- You have 1-2 required parameters and nothing optional (just use a simple constructor)
- The configuration is static and well-known (a config struct is simpler and more discoverable)
- You're building an internal-only API that you control all callers of (breaking changes are cheap)
Over-application example:
// Functional options for a struct with one optional field
func NewLogger(opts ...LoggerOption) *Logger {
l := &Logger{level: InfoLevel}
for _, opt := range opts { opt(l) }
return l
}
func WithLevel(level Level) LoggerOption { return func(l *Logger) { l.level = level } }
// Callers write: NewLogger(WithLevel(DebugLevel))
// When they could write: NewLogger(DebugLevel) — simpler, clearer
Better alternative:
// Simple constructor with the one meaningful parameter
func NewLogger(level Level) *Logger {
return &Logger{level: level}
}
Why: Functional options add indirection (each option is a closure) and reduce discoverability (you need to know which With* functions exist). For simple constructors with few parameters, a direct signature is clearer. Use functional options when the option space is large and growing.
8. Type-Safe Generics in Critical Paths
Source: staging/src/k8s.io/client-go/util/workqueue/queue.go (lines 33–200), staging/src/k8s.io/client-go/gentype/type.go (lines 33–120)
What it does
Both workqueue and gentype use Go generics (1.18+) to provide type-safe interfaces while maintaining backward compatibility via type aliases:
// Workqueue: type-safe queue
type TypedInterface[T comparable] interface {
Add(item T)
Get() (item T, shutdown bool)
Done(item T)
}
// Type alias for backward compat
type Type = Typed[any]
// Gentype: type-safe client
type Client[T objectWithMeta] struct {
resource string
client rest.Interface
namespace string
newObject func() T
}
Why
Before generics, Kubernetes used interface{} everywhere, requiring type assertions at every boundary. Generics eliminate entire classes of runtime panics and make the code self-documenting.
When to Use
Triggers:
- You have container types (queues, caches, pools) that currently use interface{}/any with type assertions
- Type assertion panics have caused production issues or require defensive coding at every call site
- You're building a library where callers benefit from compile-time type checking
Example — before:
// interface{} queue: type assertions at every boundary
type Queue struct { items []interface{} }
func (q *Queue) Add(item interface{}) { q.items = append(q.items, item) }
func (q *Queue) Get() interface{} { /* ... */ }
// Caller must assert — runtime panic if wrong type sneaks in
item := queue.Get()
pod := item.(*v1.Pod) // panics if someone added a string
Example — after:
// Generic queue: compile-time safety, no assertions needed
type Queue[T any] struct { items []T }
func (q *Queue[T]) Add(item T) { q.items = append(q.items, item) }
func (q *Queue[T]) Get() T { /* ... */ }
// Caller gets the right type directly — no assertion, no panic
queue := NewQueue[*v1.Pod]()
pod := queue.Get() // already *v1.Pod at compile time
When NOT to Use
Don't use this when:
- You genuinely need heterogeneous collections (multiple unrelated types in one container)
- The generic constraint would be so broad (any) that you gain no type safety anyway
- You're adding generics to existing interfaces where all callers work fine with concrete types (generics for generics' sake)
Over-application example:
// Genericizing a function that only ever handles one type
func processItems[T any](items []T, fn func(T) error) error {
for _, item := range items {
if err := fn(item); err != nil { return err }
}
return nil
}
// Every call site: processItems(pods, handlePod)
// When you could just write: for _, pod := range pods { handlePod(pod) }
Better alternative:
// If the type is always known and the pattern is trivial, just loop
for _, pod := range pods {
if err := handlePod(pod); err != nil { return err }
}
Why: Generics shine when they eliminate runtime type assertions or enable reusable container/algorithm libraries. They add cognitive overhead (type parameter noise) when the code only ever operates on one concrete type.
Key Insight
This is a migration pattern: introduce the generic version alongside the deprecated interface{} version using type aliases. Callers migrate at their own pace.
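A minimal sketch of that migration, assuming a toy FIFO queue rather than the real workqueue (no deduplication, locking, or shutdown handling): the generic type is the single implementation, and the old untyped name survives as an alias of it instantiated with any. TypedQueue, Queue, and both constructors are names made up for this example.

```go
package main

import "fmt"

// TypedQueue is a minimal generic FIFO queue.
type TypedQueue[T any] struct{ items []T }

func NewTypedQueue[T any]() *TypedQueue[T] { return &TypedQueue[T]{} }

// Add appends an item to the back of the queue.
func (q *TypedQueue[T]) Add(item T) { q.items = append(q.items, item) }

// Get removes and returns the front item; ok is false when the queue is empty.
func (q *TypedQueue[T]) Get() (item T, ok bool) {
	if len(q.items) == 0 {
		return item, false
	}
	item = q.items[0]
	q.items = q.items[1:]
	return item, true
}

// Queue keeps the pre-generics name alive. It is an alias, not a new type,
// so existing callers compile unchanged and share the same implementation.
type Queue = TypedQueue[any]

// NewQueue is the legacy constructor, now a thin wrapper.
func NewQueue() *Queue { return NewTypedQueue[any]() }

func main() {
	// Legacy caller: still needs a type assertion.
	legacy := NewQueue()
	legacy.Add("pod-a")
	v, _ := legacy.Get()
	fmt.Println(v.(string))

	// Migrated caller: the element type is checked at compile time.
	typed := NewTypedQueue[string]()
	typed.Add("pod-b")
	s, _ := typed.Get()
	fmt.Println(s)
}
```

Both call styles run against one code path, so bug fixes land once and each caller can switch to the typed constructor independently.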
9. HandleCrash — Structured Panic Recovery
Source: staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go (lines 30–120)
What it does
A standardized defer HandleCrash() pattern that:
- Catches panics
- Logs them with proper stack attribution
- Invokes registered panic handlers
- Optionally re-panics (controlled by the ReallyCrash flag)
// staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:78-82
func HandleCrashWithContext(ctx context.Context, additionalHandlers ...func(context.Context, interface{})) {
if r := recover(); r != nil {
handleCrash(ctx, r, additionalHandlers...)
}
}
Why
In a production system with hundreds of goroutines, an unrecovered panic in one kills the entire process. HandleCrash provides a standardized recovery point that:
- Logs the panic with caller attribution
- Allows cleanup handlers (shutdown gracefully)
- In tests, can be configured to not actually crash
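The moving parts can be modeled in a few lines. This is a sketch under stated assumptions, not the apimachinery code: handleCrash, panicHandlers, reallyCrash, and runWorkers are local names, there is no context or klog involved, and reallyCrash is set to false here so the demo keeps running (the real ReallyCrash flag defaults to true).

```go
package main

import (
	"fmt"
	"sync"
)

// reallyCrash imitates the ReallyCrash flag. It is false in this sketch so
// that the process survives; the real flag defaults to true.
var reallyCrash = false

// panicHandlers run for every recovered panic, before any per-call handlers.
var panicHandlers = []func(r any){
	func(r any) { fmt.Println("observed panic:", r) },
}

// handleCrash must be invoked directly via defer so that recover() works.
func handleCrash(extra ...func(r any)) {
	if r := recover(); r != nil {
		for _, h := range panicHandlers {
			h(r)
		}
		for _, h := range extra {
			h(r)
		}
		if reallyCrash {
			panic(r) // re-raise after logging and cleanup
		}
	}
}

// runWorkers runs each worker in its own goroutine and returns how many of
// them panicked and were recovered.
func runWorkers(workers ...func()) (recovered int) {
	var wg sync.WaitGroup
	var mu sync.Mutex
	for _, w := range workers {
		wg.Add(1)
		go func(w func()) {
			defer wg.Done()
			defer handleCrash(func(any) {
				mu.Lock()
				recovered++
				mu.Unlock()
			})
			w()
		}(w)
	}
	wg.Wait()
	return recovered
}

func main() {
	n := runWorkers(
		func() {},
		func() { panic("boom") },
		func() {},
	)
	fmt.Println("recovered panics:", n)
}
```

The deferred calls run in reverse order, so handleCrash recovers first and wg.Done still fires afterwards; without that ordering a panicking worker would leave wg.Wait blocked.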
When to Use
Triggers:
- You're running multiple independent subsystems in one process (multiple controllers, background workers)
- A panic in one subsystem shouldn't kill the entire process
- You need structured logging of panic stack traces before potential recovery
Example — before:
// One bad nil pointer in workerB kills workerA, workerC, and the whole server
func main() {
go workerA(ctx)
go workerB(ctx) // panics → entire process dies
go workerC(ctx)
select {}
}
Example — after:
func safeGo(ctx context.Context, name string, f func(ctx context.Context)) {
go func() {
defer func() {
if r := recover(); r != nil {
log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
// Log, alert, increment metric — but don't kill siblings
}
}()
f(ctx)
}()
}
func main() {
safeGo(ctx, "worker-a", workerA)
safeGo(ctx, "worker-b", workerB) // panics → logged, other workers continue
safeGo(ctx, "worker-c", workerC)
select {}
}
When NOT to Use
Don't use this when:
- The panic indicates truly unrecoverable corruption (out of memory, data corruption) — recovering would hide the bug
- You're in a single-goroutine CLI tool (let it crash with a stack trace — that IS the right behavior)
- The goroutine holds locks or half-modified state that can't be safely abandoned (recovering leaves the system in an inconsistent state)
Over-application example:
// Recovering from panics in code that modifies shared state
func (c *Controller) unsafeSyncWithRecovery(ctx context.Context, key string) {
defer func() {
if r := recover(); r != nil {
log.Printf("recovered: %v", r)
// Problem: c.internalState might be half-modified
// The next sync will read corrupted state
}
}()
c.internalState.Phase = "processing"
riskyOperation() // panics here
c.internalState.Phase = "done" // never reached — state is now wrong
}
Better alternative:
// If you can't guarantee clean recovery, don't recover — crash and restart
func (c *Controller) syncItem(ctx context.Context, key string) error {
// Use error returns, not panics, for expected failures
result, err := riskyOperation()
if err != nil {
return err // retry via workqueue — state is clean
}
c.internalState.Phase = "done"
return nil
}
// HandleCrash at the top level (worker loop) is fine — each sync starts fresh
Why: Panic recovery is safe when each iteration is isolated (starts from clean state, reads from cache). It's dangerous when recovery leaves half-mutated state that subsequent operations will read.
Key Insight
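Stdlib's approach is "let it crash." Kubernetes' approach is "catch it, log it, run the registered handlers," and then decide: with the default ReallyCrash = true the panic is re-raised after logging, and with ReallyCrash = false the goroutine continues and the controller retries on the next sync. Continuing after a panic is only safe because the controller pattern is idempotent.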
10. ContextForChannel — Bridge Pattern
Source: staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go (lines 120–145)
What it does
Bridges the older <-chan struct{} stop pattern to the modern context.Context pattern:
// staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:120-142
func ContextForChannel(parentCh <-chan struct{}) context.Context {
return channelContext{stopCh: parentCh}
}
type channelContext struct {
stopCh <-chan struct{}
}
func (c channelContext) Done() <-chan struct{} { return c.stopCh }
func (c channelContext) Err() error {
select {
case <-c.stopCh:
return context.Canceled
default:
return nil
}
}
Why
Kubernetes predates context.Context in the standard library (which arrived in Go 1.7), so a large body of code uses stopCh <-chan struct{}. Rather than a big-bang rewrite, this adapter allows gradual migration.
When to Use
Triggers:
- You have legacy code using <-chan struct{} for cancellation that needs to call modern context-aware APIs
- You're doing a gradual migration from channels to context (not a big-bang rewrite)
- You need to integrate with a library that only accepts context, but your cancellation signal comes from a channel
Example — before:
// Legacy API: only speaks channels
func RunLegacyWorker(stopCh <-chan struct{}) {
for {
select {
case <-stopCh:
return
default:
doWork()
}
}
}
// New dependency requires context — how to bridge?
func doWork() {
ctx := context.TODO() // wrong: doesn't cancel when stopCh closes
newLibrary.Call(ctx)
}
Example — after:
func RunLegacyWorker(stopCh <-chan struct{}) {
ctx := ContextForChannel(stopCh) // bridge: closes when stopCh closes
for {
select {
case <-ctx.Done():
return
default:
newLibrary.Call(ctx) // properly cancels when stopCh closes
}
}
}
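For completeness, an adapter like this must implement all four methods of context.Context. The sketch below re-creates it locally so it runs without apimachinery; the Deadline and Value bodies are the obvious no-op choices for a channel that carries neither, and waitOrCancel is a hypothetical context-aware call used only to exercise the bridge.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// channelContext adapts a stop channel to the context.Context interface.
type channelContext struct{ stopCh <-chan struct{} }

// ContextForChannel returns a context that is done when stopCh is closed.
func ContextForChannel(stopCh <-chan struct{}) context.Context {
	return channelContext{stopCh: stopCh}
}

func (c channelContext) Done() <-chan struct{} { return c.stopCh }

// Err reports context.Canceled once the channel is closed, nil before that.
func (c channelContext) Err() error {
	select {
	case <-c.stopCh:
		return context.Canceled
	default:
		return nil
	}
}

// Deadline reports that no deadline is set; a stop channel has none.
func (c channelContext) Deadline() (time.Time, bool) { return time.Time{}, false }

// Value always returns nil; a stop channel carries no values.
func (c channelContext) Value(key any) any { return nil }

// waitOrCancel is a hypothetical context-aware call: it waits for d unless
// the context is cancelled first.
func waitOrCancel(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	stopCh := make(chan struct{})
	ctx := ContextForChannel(stopCh)
	fmt.Println("before close:", ctx.Err())

	close(stopCh)
	fmt.Println("after close:", ctx.Err())
	fmt.Println("context-aware call:", waitOrCancel(ctx, time.Hour))
}
```

Because the adapter is a full context.Context, it can also serve as the parent for context.WithTimeout or context.WithValue when a call needs more than plain cancellation.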
When NOT to Use
Don't use this when:
- You're writing new code (just use context.Context from the start; no bridge needed)
- The channel carries data, not just a close signal (this only works for <-chan struct{})
- You need context features beyond cancellation (timeouts, values): add a real context
Over-application example:
// Using ContextForChannel in new code that already has context
func NewController(ctx context.Context) *Controller {
stopCh := make(chan struct{})
go func() { <-ctx.Done(); close(stopCh) }() // convert context → channel
bridgedCtx := ContextForChannel(stopCh) // convert channel → context
// You just round-tripped for no reason — use ctx directly
}
Better alternative:
func NewController(ctx context.Context) *Controller {
// Just use ctx — it already does everything you need
return &Controller{ctx: ctx}
}
Why: The bridge pattern exists for migration. In new code, using context directly is simpler, supports timeouts/values, and avoids the indirection of channel↔context conversion.
Key Insight
Large codebases can't do breaking API changes atomically. This bridge pattern is how you evolve from one idiom to another over years without breaking everything at once.