# Production Go Patterns (from Kubernetes)

Patterns for building large-scale Go codebases that go beyond what stdlib teaches you.

## 1. Code Generation Pattern

**Source:** `staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go`, `staging/src/k8s.io/client-go/informers/apps/v1/deployment.go`

### What it does
Kubernetes generates massive amounts of boilerplate code from annotations on types:
- `deepcopy-gen` → DeepCopy/DeepCopyInto methods
- `informer-gen` → typed informers (List/Watch/Lister per resource)
- `client-gen` → typed client sets
- `lister-gen` → typed lister interfaces
- `conversion-gen` → version conversion functions
- `defaulter-gen` → defaulting functions

### Why
At Kubernetes scale (~50 resource types × multiple versions), hand-writing deep copy, client wrappers, and conversion code is:
1. Error-prone (forgetting to copy a new field breaks everything)
2. Unmaintainable (thousands of nearly-identical files)
3. Not verifiable by human review

### How it works

Annotations drive generation:
```go
// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type RawExtension struct { ... }
```

Generated output uses `zz_generated.` prefix (convention for "don't edit"):
```go
// staging/src/k8s.io/apimachinery/pkg/runtime/zz_generated.deepcopy.go:22
// Code generated by deepcopy-gen. DO NOT EDIT.
package runtime

func (in *RawExtension) DeepCopyInto(out *RawExtension) {
	*out = *in
	if in.Raw != nil {
		in, out := &in.Raw, &out.Raw
		*out = make([]byte, len(*in))
		copy(*out, *in)
	}
}
```

Generated informers (note the header comment):
```go
// staging/src/k8s.io/client-go/informers/apps/v1/deployment.go:20
// Code generated by informer-gen. DO NOT EDIT.
```

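To make the shape of generated deep-copy code concrete, here is a hand-written sketch in the same style. `Widget` and its fields are hypothetical and not a Kubernetes type; the real generator output may differ in detail. The idea is that the struct assignment copies value fields, and every map, slice, and pointer field is then reallocated so the copy shares no memory with the original.

```go
package main

import "fmt"

// Widget is a hypothetical type used only for illustration.
type Widget struct {
	Name   string
	Labels map[string]string
	Ports  []int32
	Owner  *string
}

// DeepCopyInto copies the receiver into out. The struct assignment copies
// value fields; reference fields are then reallocated so out shares no
// memory with in.
func (in *Widget) DeepCopyInto(out *Widget) {
	*out = *in
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
	if in.Ports != nil {
		out.Ports = make([]int32, len(in.Ports))
		copy(out.Ports, in.Ports)
	}
	if in.Owner != nil {
		out.Owner = new(string)
		*out.Owner = *in.Owner
	}
}

// DeepCopy allocates a new Widget and deep-copies the receiver into it.
func (in *Widget) DeepCopy() *Widget {
	if in == nil {
		return nil
	}
	out := new(Widget)
	in.DeepCopyInto(out)
	return out
}

func main() {
	owner := "team-a"
	orig := &Widget{Name: "w", Labels: map[string]string{"env": "prod"}, Ports: []int32{80}, Owner: &owner}

	cp := orig.DeepCopy()
	cp.Labels["env"] = "dev"
	cp.Ports[0] = 8080
	*cp.Owner = "team-b"

	// The original is untouched by mutations of the copy.
	fmt.Println(orig.Labels["env"], orig.Ports[0], *orig.Owner) // prints "prod 80 team-a"
}
```

Writing this once is easy; the argument for generation is that every new reference field needs another block like the ones above, on every type.
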
### When to Use

**Triggers:**
- You have 10+ types that need identical boilerplate methods (DeepCopy, Validate, Marshal)
- Hand-writing the code is error-prone (forgetting to copy a new field causes silent bugs)
- The generated output is mechanical and reviewable, not creative

**Example — before:**
```go
// Hand-written deep copy for every type — 50 types × 30 lines each = 1500 lines of bugs
func (in *Deployment) DeepCopy() *Deployment {
	out := new(Deployment)
	out.Name = in.Name
	out.Labels = make(map[string]string)
	for k, v := range in.Labels {
		out.Labels[k] = v
	}
	// Did you remember Annotations? Finalizers? Every nested struct?
	return out
}
```

**Example — after:**
```go
// +k8s:deepcopy-gen=true
type Deployment struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}
// Generated: zz_generated.deepcopy.go handles ALL fields correctly, always.
// Adding a new field? Re-run generator. Zero chance of forgetting.
```

### When NOT to Use

**Don't use this when:**
- You have fewer than ~5 types (hand-writing is faster and more readable)
- The boilerplate varies significantly between types (generators work best for uniform patterns)
- The generated code would be harder to debug than hand-written code (e.g., complex business logic)

**Over-application example:**
```go
// Generating a "ToString" method for 3 types — overkill
//go:generate stringer -type=Status,Priority,Phase

// You now have a generated file to keep in sync, a build dependency on stringer,
// and anyone reading the code has to understand the generation pipeline
// for what amounts to 9 lines of hand-written code.
```

**Better alternative:**
```go
// Just write it — 3 types is not "at scale"
func (s Status) String() string {
	switch s {
	case Running:
		return "Running"
	case Stopped:
		return "Stopped"
	default:
		return fmt.Sprintf("Status(%d)", s)
	}
}
```

**Why:** Code generation adds build complexity (Makefiles, CI steps, `go generate` ordering), makes debugging harder (stack traces point to generated code), and obscures logic. It pays off only when the volume of boilerplate is large enough that correctness via automation outweighs these costs.

### Key Insight
**Stdlib leans on code generation far less.** Most of stdlib stays small enough that hand-writing works. Kubernetes shows that once you cross ~20 types with shared behavior, code generation becomes the more maintainable path.

---

## 2. The Scheme / Type Registry Pattern

**Source:** `staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go` (lines 38–100), `scheme_builder.go`

### What it does
The Scheme is a runtime type registry that maps:
- `GroupVersionKind` → Go type (`reflect.Type`)
- Go type → `[]GroupVersionKind`
- Provides serialization, defaulting, conversion, and validation dispatch

### Why
Kubernetes has 50+ resource types across 15+ API groups, each with multiple versions. The Scheme provides:
- **Dynamic dispatch**: serialize any Object without knowing its concrete type
- **Version conversion**: convert between v1 and v1beta1 transparently
- **Pluggability**: third-party resources register into the same system

### Structure

```go
// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:38-98
type Scheme struct {
	gvkToType        map[schema.GroupVersionKind]reflect.Type
	typeToGVK        map[reflect.Type][]schema.GroupVersionKind
	unversionedTypes map[reflect.Type]schema.GroupVersionKind
	defaulterFuncs   map[reflect.Type]func(interface{})
	validationFuncs  map[reflect.Type]func(ctx, op, obj, oldObj) field.ErrorList // parameter types abbreviated
	converter        *conversion.Converter
	versionPriority  map[string][]string
}
```

### SchemeBuilder Pattern

```go
// staging/src/k8s.io/apimachinery/pkg/runtime/scheme_builder.go:23-48
type SchemeBuilder []func(*Scheme) error

func (sb *SchemeBuilder) AddToScheme(s *Scheme) error {
	for _, f := range *sb {
		if err := f(s); err != nil {
			return err
		}
	}
	return nil
}

func (sb *SchemeBuilder) Register(funcs ...func(*Scheme) error) {
	for _, f := range funcs {
		*sb = append(*sb, f)
	}
}
```

### How Registration Works

```go
// staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go:151-160
func (s *Scheme) AddKnownTypes(gv schema.GroupVersion, types ...Object) {
	for _, obj := range types {
		t := reflect.TypeOf(obj)
		if t.Kind() != reflect.Pointer {
			panic("All types must be pointers to structs.")
		}
		t = t.Elem()
		s.AddKnownTypeWithName(gv.WithKind(t.Name()), obj)
	}
}
```

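The registration and lookup steps above can be sketched without any Kubernetes dependency. This is a much-reduced stand-in, not the real `runtime.Scheme`: `Registry`, `Decode`, and the two sample structs are hypothetical names, kinds are plain strings instead of `GroupVersionKind`, and there is no versioning or conversion. It only shows the core trick of using `reflect` to allocate the right concrete type from a name found in the payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Registry is a hypothetical, reduced stand-in for a type registry:
// it maps a kind name to a Go struct type.
type Registry struct {
	kindToType map[string]reflect.Type
}

func NewRegistry() *Registry {
	return &Registry{kindToType: map[string]reflect.Type{}}
}

// AddKnownTypes registers pointer-to-struct values under their struct name.
func (r *Registry) AddKnownTypes(objs ...any) error {
	for _, obj := range objs {
		t := reflect.TypeOf(obj)
		if t == nil || t.Kind() != reflect.Pointer || t.Elem().Kind() != reflect.Struct {
			return fmt.Errorf("%T is not a pointer to a struct", obj)
		}
		r.kindToType[t.Elem().Name()] = t.Elem()
	}
	return nil
}

// New allocates a zero value of the type registered for kind.
func (r *Registry) New(kind string) (any, error) {
	t, ok := r.kindToType[kind]
	if !ok {
		return nil, fmt.Errorf("no kind %q is registered", kind)
	}
	return reflect.New(t).Interface(), nil
}

// Decode reads the "kind" field, allocates the matching type, and
// unmarshals the full payload into it.
func (r *Registry) Decode(data []byte) (any, error) {
	var meta struct {
		Kind string `json:"kind"`
	}
	if err := json.Unmarshal(data, &meta); err != nil {
		return nil, err
	}
	obj, err := r.New(meta.Kind)
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, obj); err != nil {
		return nil, err
	}
	return obj, nil
}

// Deployment and Service are toy types, not the Kubernetes API types.
type Deployment struct {
	Kind     string `json:"kind"`
	Replicas int    `json:"replicas"`
}

type Service struct {
	Kind string `json:"kind"`
	Port int    `json:"port"`
}

func main() {
	reg := NewRegistry()
	if err := reg.AddKnownTypes(&Deployment{}, &Service{}); err != nil {
		panic(err)
	}
	obj, err := reg.Decode([]byte(`{"kind":"Deployment","replicas":3}`))
	fmt.Printf("%T %v\n", obj, err) // prints "*main.Deployment <nil>"
}
```

Note that the sketch returns an error for a non-pointer argument where the excerpt above panics; either choice works, as long as misregistration fails loudly at startup.
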
### When to Use

**Triggers:**
- You have many types that must be serialized/deserialized polymorphically (you read JSON/YAML and don't know the concrete type ahead of time)
- You need version conversion between different representations of the same concept
- Third parties need to register new types without modifying your core code

**Example — before:**
```go
// Hard-coded switch statement — breaks open/closed principle
func Deserialize(data []byte) (Object, error) {
	kind := extractKind(data)
	switch kind {
	case "Deployment":
		var d Deployment
		json.Unmarshal(data, &d)
		return &d, nil
	case "Service":
		var s Service
		json.Unmarshal(data, &s)
		return &s, nil
	// ... 50 more cases, each a potential bug
	default:
		return nil, fmt.Errorf("unknown kind: %s", kind)
	}
}
```

**Example — after:**
```go
// Type registry — extensible, no switch statements
scheme := runtime.NewScheme()
scheme.AddKnownTypes(appsv1.SchemeGroupVersion, &Deployment{}, &ReplicaSet{})
scheme.AddKnownTypes(corev1.SchemeGroupVersion, &Service{}, &Pod{})

func Deserialize(scheme *runtime.Scheme, data []byte) (runtime.Object, error) {
	gvk := extractGVK(data)
	obj, err := scheme.New(gvk) // creates correct concrete type
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, obj); err != nil {
		return nil, err
	}
	return obj, nil
}
// New types register themselves — no core code changes needed
```

### When NOT to Use

**Don't use this when:**
- You have a small, fixed set of types that won't grow (a switch statement is simpler and fully type-safe at compile time)
- You don't need dynamic/polymorphic deserialization (you always know the concrete type)
- You're not building a plugin system (nobody external needs to register types)

**Over-application example:**
```go
// Type registry for an internal service with 3 event types — over-engineered
var eventScheme = NewScheme()
func init() {
	eventScheme.Register("OrderCreated", reflect.TypeOf(OrderCreated{}))
	eventScheme.Register("OrderShipped", reflect.TypeOf(OrderShipped{}))
	eventScheme.Register("OrderCanceled", reflect.TypeOf(OrderCanceled{}))
}
// Now debugging requires understanding the registry, reflection, runtime dispatch...
```

**Better alternative:**
```go
// Simple interface + switch — readable, debuggable, compile-time safe
type Event interface { EventType() string }

func handleEvent(data []byte) error {
	switch t := extractType(data); t {
	case "OrderCreated":
		var e OrderCreated
		if err := json.Unmarshal(data, &e); err != nil {
			return err
		}
		return processCreated(e)
	case "OrderShipped":
		// ...
	default:
		return fmt.Errorf("unknown event type: %s", t)
	}
}
```

**Why:** Type registries trade compile-time safety for runtime extensibility. In Kubernetes (50+ types, CRDs, multiple API groups), this tradeoff is essential. In a service with a handful of known types, you're paying the complexity cost (reflection, runtime errors, harder debugging) without the benefit.

### Key Insight
This is Java's ServiceLoader / dependency injection adapted for Go's type system. Stdlib uses interfaces; Kubernetes needs a **runtime type system on top of Go's static type system** because API objects must be dynamically dispatched across version boundaries.

---

## 3. The runtime.Object Interface

**Source:** `staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go` (lines 333–342)

### What it does
Every Kubernetes API object must implement this two-method interface:

```go
// staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go:337-341
type Object interface {
	GetObjectKind() schema.ObjectKind
	DeepCopyObject() Object
}
```

### Why
- `GetObjectKind()` — allows the serialization layer to determine what type an object is without reflection
- `DeepCopyObject()` — enables safe concurrent access (informer cache is shared; mutations must happen on copies)

### When to Use

**Triggers:**
- You're building a framework where heterogeneous types flow through a common pipeline (serialization, admission, storage)
- You need to identify the "kind" of an object at runtime without type switches
- Concurrent access requires copy semantics built into the type contract

**Example — before:**
```go
// No common interface — every handler needs type assertions
func store(obj interface{}) error {
	switch v := obj.(type) {
	case *Deployment:
		return storeDeployment(v)
	case *Service:
		return storeService(v)
	// Every new type requires changes here
	default:
		return fmt.Errorf("unsupported type %T", obj)
	}
}
```

**Example — after:**
```go
// Common interface — generic pipeline handles any type
func store(obj runtime.Object) error {
	gvk := obj.GetObjectKind().GroupVersionKind()
	key := buildKey(gvk, obj)
	data, err := serialize(obj)
	if err != nil {
		return err
	}
	// Works for any registered type — no switch statement
	return etcd.Put(key, data)
}
```

### When NOT to Use

**Don't use this when:**
- Your types don't need polymorphic handling (you always know the concrete type at each call site)
- Deep copy is irrelevant (no shared caches, no concurrent access)
- You're not building framework-level infrastructure (application code rarely needs this)

**Over-application example:**
```go
// Implementing runtime.Object on application-level domain types
type Invoice struct { ... }
func (i *Invoice) GetObjectKind() schema.ObjectKind { return &i.TypeMeta }
func (i *Invoice) DeepCopyObject() runtime.Object { return i.DeepCopy() }

// Now your business logic imports k8s.io/apimachinery — for what?
```

**Better alternative:** Use a domain-specific interface that captures what your code actually needs:
```go
type Storable interface {
	ID() string
	Marshal() ([]byte, error)
}
```

**Why:** `runtime.Object` is designed for Kubernetes' specific needs (polymorphic serialization across API versions). Adopting it in non-Kubernetes code couples you to a heavy dependency for an abstraction that doesn't match your domain.

### Key Insight
**This is the foundation of Kubernetes' extensibility.** Any Go struct that implements these two methods can participate in the entire API machinery — serialization, storage, admission, informers, etc. Go types written for custom resources satisfy it the same way: `GetObjectKind` comes from the embedded `TypeMeta`, and `DeepCopyObject` from generated code.

---

## 4. Deep Copy Everywhere

**Source:** Generated code in `zz_generated.deepcopy.go` files throughout the tree

### What it does
Every API type has generated `DeepCopy()` and `DeepCopyInto()` methods that create true deep copies including nested slices, maps, and pointer fields.

### Why
The informer cache is shared across all controllers in a process. If controller A gets an object from the cache and mutates it, controller B would see corrupted data. Deep copy provides the isolation guarantee.

```go
// Usage pattern in controllers:
deployment := deploymentFromCache.DeepCopy()
deployment.Spec.Replicas = ptr.To[int32](3)
_, err := client.AppsV1().Deployments(ns).Update(ctx, deployment, metav1.UpdateOptions{})
```

### When to Use

**Triggers:**
- You read from a shared data structure (cache, registry, config store) and need to modify the result
- Multiple goroutines access the same data and at least one modifies it
- You're passing data across ownership boundaries (your function returns data that the caller might mutate)

**Example — before:**
```go
// Returning a pointer to internal state — caller can corrupt it
func (c *ConfigStore) GetConfig() *Config {
	return c.current // caller mutates this → corrupts store
}
```

**Example — after:**
```go
// Return a copy — caller can mutate freely
func (c *ConfigStore) GetConfig() *Config {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.current.DeepCopy()
}
```

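A complete, runnable version of the copy-on-read store might look like the sketch below. `Config`, `ConfigStore`, and the `Endpoints` field are hypothetical; the point is only that both `Get` and `Set` copy at the ownership boundary, so neither side can reach the other's memory through a shared slice.

```go
package main

import (
	"fmt"
	"sync"
)

// Config and ConfigStore are hypothetical types used for illustration.
type Config struct {
	Endpoints []string
}

// DeepCopy returns a Config that shares no slice memory with c.
func (c *Config) DeepCopy() *Config {
	if c == nil {
		return nil
	}
	out := &Config{}
	if c.Endpoints != nil {
		out.Endpoints = make([]string, len(c.Endpoints))
		copy(out.Endpoints, c.Endpoints)
	}
	return out
}

type ConfigStore struct {
	mu      sync.RWMutex
	current *Config
}

// Get returns a private copy; callers may mutate it freely.
func (s *ConfigStore) Get() *Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.current.DeepCopy()
}

// Set stores a copy, so later changes by the caller do not leak into the store.
func (s *ConfigStore) Set(c *Config) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.current = c.DeepCopy()
}

func main() {
	store := &ConfigStore{}
	store.Set(&Config{Endpoints: []string{"a:80"}})

	cfg := store.Get()
	cfg.Endpoints[0] = "evil:80" // mutates only the caller's copy

	fmt.Println(store.Get().Endpoints[0]) // prints "a:80"
}
```
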
### When NOT to Use

**Don't use this when:**
- The data is owned by a single goroutine (no concurrent access, no shared cache)
- You only need to read the data, never modify it (deep copy for read-only access is wasted allocation)
- Performance is critical and you can prove safety via immutability or other means (e.g., freezing the object after construction)

**Over-application example:**
```go
// Deep copying on every read, even when only logging
func (c *Controller) logStatus(ctx context.Context, key string) {
	logger := klog.FromContext(ctx)
	obj, err := c.lister.Get(key)
	if err != nil {
		return
	}
	objCopy := obj.DeepCopy() // Allocates! But we never mutate objCopy
	logger.Info("status", "phase", objCopy.Status.Phase)
}
```

**Better alternative:**
```go
// Read directly from cache — no mutation, no copy needed
func (c *Controller) logStatus(ctx context.Context, key string) {
	logger := klog.FromContext(ctx)
	obj, err := c.lister.Get(key)
	if err != nil {
		return
	}
	logger.Info("status", "phase", obj.Status.Phase) // read-only: safe
}
```

**Why:** Deep copy allocates memory and does work proportional to object size. In hot paths that only read data, it's pure waste. Copy only when you intend to mutate.

### Key Insight
Stdlib rarely needs deep copy because stdlib objects are typically owned by one goroutine. Kubernetes has a **shared read cache** (the informer store) that necessitates copy-on-write semantics at the application level.

---

## 5. Graceful Shutdown with Priority Classes

**Source:** `pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go` (lines 23–100)

### What it does
When a node is shutting down, pods are terminated in priority order: regular pods first, then critical pods (such as those in the `system-node-critical` priority class), each group within its own configured grace period.

### Why
A hard kill of all pods simultaneously would lose important work. Priority-based graceful shutdown preserves the most important workloads longest.

```go
// pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go:66-90
type managerImpl struct {
	logger                klog.Logger
	recorder              record.EventRecorder
	getPods               eviction.ActivePodsFunc
	syncNodeStatus        func(context.Context)
	dbusCon               dbusInhibiter
	inhibitLock           systemd.InhibitLock
	nodeShuttingDownMutex sync.Mutex
	nodeShuttingDownNow   bool
	podManager            *podManager
}
```

### When to Use

**Triggers:**
- Your system runs workloads with different importance levels (critical infrastructure vs. batch jobs)
- You receive shutdown signals and need to drain gracefully, not crash
- Some work MUST complete (leader lease release, data flush) while other work can be interrupted

**Example — before:**
```go
// Flat shutdown: everything gets the same 30s, critical work might not finish
func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)
	<-sig

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	server.Shutdown(ctx) // hope 30s is enough for everything
}
```

**Example — after:**
```go
// Priority-based shutdown: critical work gets more time
func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)
	<-sig

	// Phase 1: stop accepting new work (immediate)
	server.StopAccepting()

	// Phase 2: drain low-priority work (5s budget)
	ctx1, cancel1 := context.WithTimeout(context.Background(), 5*time.Second)
	batchWorker.Shutdown(ctx1)
	cancel1()

	// Phase 3: drain critical work (25s budget)
	ctx2, cancel2 := context.WithTimeout(context.Background(), 25*time.Second)
	leaderElector.Release(ctx2)
	dataStore.Flush(ctx2)
	cancel2()
}
```

### When NOT to Use

**Don't use this when:**
- All your work is equally important (flat shutdown with a single timeout is simpler)
- Your process is stateless and restarts are cheap (just die and restart)
- Shutdown time is not constrained (you can wait as long as needed for everything to drain)

**Over-application example:**
```go
// Priority shutdown for a stateless HTTP server with no background work
type ShutdownManager struct {
	priorities []PriorityClass
	workers    map[PriorityClass][]Worker
}
// All this machinery for... stopping an HTTP handler that has no in-flight state?
```

**Better alternative:**
```go
// Simple graceful shutdown — stateless server just drains connections
server := &http.Server{}
go server.ListenAndServe()
<-sigCh
server.Shutdown(ctx) // drains in-flight requests, done
```

**Why:** Priority-based shutdown adds ordering logic, timeout budgets per phase, and coordination complexity. It's justified when different subsystems have genuinely different importance. For simple services, `http.Server.Shutdown()` is all you need.

---

## 6. Context as Logger Carrier

**Source:** `pkg/controller/deployment/deployment_controller.go` (lines 106, 179, 500)

### What it does
Kubernetes passes structured loggers through context:

```go
// pkg/controller/deployment/deployment_controller.go:179
logger := klog.FromContext(ctx)
logger.Info("Starting controller", "controller", "deployment")
```

### Why
At scale, you need structured logging with:
- Consistent key-value pairs (controller name, object reference)
- Verbosity levels (`logger.V(4).Info(...)`)
- No global state (context carries the logger configured by the caller)

### When to Use

**Triggers:**
- You have deep call stacks where log context (request ID, user, controller name) should propagate without explicit parameters
- Multiple subsystems share code but need different logger configurations (verbosity, output format)
- You want to enrich logs with contextual metadata without threading a logger through every function signature

**Example — before:**
```go
// Global logger: no context, hard to filter, pollutes all output
var log = logrus.New()

func processOrder(orderID string) error {
	log.Info("processing order") // which order? which service? which request?
	items := fetchItems(orderID)
	log.Infof("fetched %d items", len(items)) // no correlation possible
	return nil
}
```

**Example — after:**
```go
// Context-carried logger: each request has its own enriched logger
func processOrder(ctx context.Context, orderID string) error {
	logger := klog.FromContext(ctx)
	logger = logger.WithValues("orderID", orderID)
	ctx = klog.NewContext(ctx, logger)

	logger.Info("processing order") // automatically includes orderID
	items := fetchItems(ctx, orderID)
	// fetchItems logs with the same logger — all lines correlate
	return nil
}
```

### When NOT to Use

**Don't use this when:**
- You have a simple CLI tool with sequential execution (a global logger is fine)
- Performance is critical and context allocation is measurable overhead (rare, but possible in hot loops)
- Your logging needs are simple and don't require request-scoped enrichment

**Over-application example:**
```go
// Context logger in a tight computational loop — unnecessary overhead
func computeHash(ctx context.Context, data []byte) []byte {
	logger := klog.FromContext(ctx) // context lookup on every call
	logger.V(5).Info("computing hash", "size", len(data))
	// This is called 1M times/sec — the context lookup adds up
	sum := sha256.Sum256(data)
	return sum[:]
}
```

**Better alternative:**
```go
// Log once at the boundary, not in the hot loop
func processHashes(ctx context.Context, items [][]byte) [][]byte {
	logger := klog.FromContext(ctx)
	logger.Info("processing hashes", "count", len(items))
	results := make([][]byte, len(items))
	for i, data := range items {
		sum := sha256.Sum256(data) // no logging in hot path
		results[i] = sum[:]
	}
	return results
}
```

**Why:** Context-based logging is for request/operation scoping, not for instrumenting every function call. In hot paths, the overhead of context lookup and structured log construction matters.

### Key Insight
Stdlib's `log` package defaults to a single global logger. Kubernetes uses context-based structured logging (`klog.FromContext`) to allow each call chain to carry its own logger configuration. This enables filtering by controller, verbosity tuning per-component, and correlation.

---

## 7. Functional Options for Configuration

**Source:** `staging/src/k8s.io/client-go/informers/factory.go` (lines 83–127)

### What it does
The SharedInformerFactory uses functional options for configuration:

```go
// staging/src/k8s.io/client-go/informers/factory.go:57
type SharedInformerOption func(*sharedInformerFactory) *sharedInformerFactory

func WithNamespace(namespace string) SharedInformerOption {
	return func(factory *sharedInformerFactory) *sharedInformerFactory {
		factory.namespace = namespace
		return factory
	}
}

func WithTransform(transform cache.TransformFunc) SharedInformerOption {
	return func(factory *sharedInformerFactory) *sharedInformerFactory {
		factory.transform = transform
		return factory
	}
}

func NewSharedInformerFactoryWithOptions(client kubernetes.Interface, defaultResync time.Duration, options ...SharedInformerOption) SharedInformerFactory {
	factory := &sharedInformerFactory{...}
	for _, opt := range options {
		factory = opt(factory)
	}
	return factory
}
```

### Why
APIs evolve. Adding a new configuration option shouldn't break callers. Functional options provide:
- Backward compatibility (new options don't change existing signatures)
- Self-documenting call sites (each option is a named function)
- Composability (options can be collected and applied conditionally)

### When to Use

**Triggers:**
- Your constructor has more than 3-4 optional parameters that grow over time
- You're building a library/SDK where backward compatibility matters across versions
- Different callers need different subsets of configuration (not everyone uses every option)

**Example — before:**
```go
// Constructor with growing parameter list — breaks on every addition
func NewClient(addr string, timeout time.Duration, retries int, tls bool, cert string) *Client
// v2: added auth
func NewClient(addr string, timeout time.Duration, retries int, tls bool, cert string, token string) *Client
// Every caller must update, even if they don't use the new param
```

**Example — after:**
```go
type ClientOption func(*Client)

func NewClient(addr string, opts ...ClientOption) *Client {
	c := &Client{addr: addr, timeout: 30 * time.Second, retries: 3}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func WithTimeout(d time.Duration) ClientOption { return func(c *Client) { c.timeout = d } }
func WithRetries(n int) ClientOption { return func(c *Client) { c.retries = n } }
func WithTLS(cert string) ClientOption { return func(c *Client) { c.tls = true; c.cert = cert } }
// Adding WithAuth doesn't break any existing callers
```

### When NOT to Use

**Don't use this when:**
- You have 1-2 required parameters and nothing optional (just use a simple constructor)
- The configuration is static and well-known (a config struct is simpler and more discoverable)
- You're building an internal-only API and control every caller (breaking changes are cheap)

**Over-application example:**
```go
// Functional options for a struct with one optional field
func NewLogger(opts ...LoggerOption) *Logger {
	l := &Logger{level: InfoLevel}
	for _, opt := range opts {
		opt(l)
	}
	return l
}
func WithLevel(level Level) LoggerOption { return func(l *Logger) { l.level = level } }

// Callers write: NewLogger(WithLevel(DebugLevel))
// When they could write: NewLogger(DebugLevel) — simpler, clearer
```

**Better alternative:**
```go
// Simple constructor with the one meaningful parameter
func NewLogger(level Level) *Logger {
	return &Logger{level: level}
}
```

**Why:** Functional options add indirection (each option is a closure) and reduce discoverability (you need to know which `With*` functions exist). For simple constructors with few parameters, a direct signature is clearer. Use functional options when the option space is large and growing.

---

## 8. Type-Safe Generics in Critical Paths

**Source:** `staging/src/k8s.io/client-go/util/workqueue/queue.go` (lines 33–200), `staging/src/k8s.io/client-go/gentype/type.go` (lines 33–120)

### What it does
Both workqueue and gentype use Go generics (1.18+) to provide type-safe interfaces while maintaining backward compatibility via type aliases:

```go
// Workqueue: type-safe queue
type TypedInterface[T comparable] interface {
	Add(item T)
	Get() (item T, shutdown bool)
	Done(item T)
}

// Type alias for backward compat
type Type = Typed[any]

// Gentype: type-safe client
type Client[T objectWithMeta] struct {
	resource  string
	client    rest.Interface
	namespace string
	newObject func() T
}
```

### Why
Before generics, Kubernetes used `interface{}` everywhere, requiring type assertions at every boundary. Generics eliminate entire classes of runtime panics and make the code self-documenting.

### When to Use

**Triggers:**
- You have container types (queues, caches, pools) that currently use `interface{}`/`any` with type assertions
- Type assertion panics have caused production issues or require defensive coding at every call site
- You're building a library where callers benefit from compile-time type checking

**Example — before:**
```go
// interface{} queue: type assertions at every boundary
type Queue struct { items []interface{} }
func (q *Queue) Add(item interface{}) { q.items = append(q.items, item) }
func (q *Queue) Get() interface{} { /* ... */ }

// Caller must assert — runtime panic if wrong type sneaks in
item := queue.Get()
pod := item.(*v1.Pod) // panics if someone added a string
```

**Example — after:**
```go
// Generic queue: compile-time safety, no assertions needed
type Queue[T any] struct { items []T }
func (q *Queue[T]) Add(item T) { q.items = append(q.items, item) }
func (q *Queue[T]) Get() T { /* ... */ }

// Caller gets the right type directly — no assertion, no panic
queue := NewQueue[*v1.Pod]()
pod := queue.Get() // already *v1.Pod at compile time
```

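Filling in the elided method bodies gives a queue that can actually be run. This is a deliberately minimal FIFO under simplifying assumptions: unlike the client-go workqueue it is not safe for concurrent use and does no de-duplication, and `Pod` here is a local toy type rather than `v1.Pod`. `Get` returns an `ok` flag so that an empty queue yields the zero value of `T` instead of panicking.

```go
package main

import "fmt"

// Queue is a minimal generic FIFO. It is not safe for concurrent use.
type Queue[T any] struct {
	items []T
}

func NewQueue[T any]() *Queue[T] { return &Queue[T]{} }

// Add appends item to the back of the queue.
func (q *Queue[T]) Add(item T) { q.items = append(q.items, item) }

// Get removes and returns the oldest item. ok is false when the queue is empty,
// in which case item is the zero value of T.
func (q *Queue[T]) Get() (item T, ok bool) {
	if len(q.items) == 0 {
		return item, false
	}
	item = q.items[0]
	q.items = q.items[1:]
	return item, true
}

// Len reports how many items are waiting.
func (q *Queue[T]) Len() int { return len(q.items) }

// Pod is a toy stand-in for an API type.
type Pod struct{ Name string }

func main() {
	q := NewQueue[*Pod]()
	q.Add(&Pod{Name: "web-0"})
	// q.Add("oops") // would not compile: a string is not a *Pod

	pod, ok := q.Get()
	fmt.Println(pod.Name, ok, q.Len()) // prints "web-0 true 0"
}
```
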
### When NOT to Use

**Don't use this when:**
- You genuinely need heterogeneous collections (multiple unrelated types in one container)
- The only workable constraint is `any` and the body would still need type assertions or type switches, so you gain no type safety
- You're adding generics to existing interfaces where all callers work fine with concrete types (generics for generics' sake)

**Over-application example:**
```go
// Genericizing a function that only ever handles one type
func processItems[T any](items []T, fn func(T) error) error {
	for _, item := range items {
		if err := fn(item); err != nil {
			return err
		}
	}
	return nil
}
// Every call site: processItems[*Pod](pods, handlePod)
// When you could just write: for _, pod := range pods { handlePod(pod) }
```

**Better alternative:**
```go
// If the type is always known and the pattern is trivial, just loop
for _, pod := range pods {
	if err := handlePod(pod); err != nil {
		return err
	}
}
```

**Why:** Generics shine when they eliminate runtime type assertions or enable reusable container/algorithm libraries. They add cognitive overhead (type parameter noise) when the code only ever operates on one concrete type.

### Key Insight
This is a migration pattern: introduce the generic version alongside the deprecated `interface{}` version using type aliases. Callers migrate at their own pace.

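The alias-based migration described in the Key Insight can be sketched as follows. This is a minimal illustration, not the actual client-go code: `TypedQueue`, `NewTypedQueue`, and the two-value `Get` are names assumed for this example. The legacy name survives as an alias for the generic type instantiated with `any`, so existing callers compile unchanged while new callers pick a concrete element type.

```go
package main

import "fmt"

// TypedQueue is the new generic API (hypothetical name for this sketch).
type TypedQueue[T any] struct{ items []T }

// NewTypedQueue returns an empty queue for elements of type T.
func NewTypedQueue[T any]() *TypedQueue[T] { return &TypedQueue[T]{} }

// Add appends an item to the back of the queue.
func (q *TypedQueue[T]) Add(item T) { q.items = append(q.items, item) }

// Get removes and returns the oldest item; ok is false when the queue is empty.
func (q *TypedQueue[T]) Get() (item T, ok bool) {
	if len(q.items) == 0 {
		return item, false
	}
	item, q.items = q.items[0], q.items[1:]
	return item, true
}

// Queue is the legacy untyped name, kept as an alias so old callers still compile.
//
// Deprecated: use TypedQueue[T] instead.
type Queue = TypedQueue[any]

// NewQueue preserves the legacy constructor.
//
// Deprecated: use NewTypedQueue instead.
func NewQueue() *Queue { return NewTypedQueue[any]() }

func main() {
	// Legacy caller: still needs a type assertion.
	legacy := NewQueue()
	legacy.Add("pod-a")
	v, _ := legacy.Get()
	fmt.Println(v.(string))

	// Migrated caller: the element type is checked at compile time.
	typed := NewTypedQueue[string]()
	typed.Add("pod-b")
	s, _ := typed.Get()
	fmt.Println(s)
}
```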
---

## 9. HandleCrash — Structured Panic Recovery

**Source:** `staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go` (lines 30–120)

### What it does
A standardized `defer HandleCrash()` pattern that:
1. Catches panics
2. Logs them with proper stack attribution
3. Invokes registered panic handlers
4. Optionally re-panics (controlled by the `ReallyCrash` flag, which defaults to `true`)

```go
// staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:78-82
func HandleCrashWithContext(ctx context.Context, additionalHandlers ...func(context.Context, interface{})) {
	if r := recover(); r != nil {
		handleCrash(ctx, r, additionalHandlers...)
	}
}
```

### Why
In a production system with hundreds of goroutines, an unrecovered panic in one kills the entire process. HandleCrash provides a standardized recovery point that:
- Logs the panic with caller attribution
- Allows cleanup handlers (shutdown gracefully)
- In tests, can be configured to not actually crash

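The mechanism described above (catch, run handlers, optionally re-panic) can be sketched in a few lines. This is a simplified stand-in, not the Kubernetes implementation: `panicHandlers`, `reallyCrash`, `handleCrash`, and `runOnce` are hypothetical names for this example, and the real package adds context plumbing and richer logging. Note that `recover` only works when called directly by the deferred function, which is why `handleCrash` itself is what gets deferred.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

var (
	// panicHandlers run for every recovered panic (hypothetical registry).
	panicHandlers = []func(r any){logPanic}
	// reallyCrash controls whether the panic is re-raised after the handlers ran.
	reallyCrash = true
)

// logPanic prints the panic value together with the current stack.
func logPanic(r any) {
	fmt.Printf("observed a panic: %v\n%s", r, debug.Stack())
}

// handleCrash must be invoked as `defer handleCrash()` so that recover()
// is called directly by the deferred function.
func handleCrash(additional ...func(r any)) {
	r := recover()
	if r == nil {
		return
	}
	for _, h := range panicHandlers {
		h(r)
	}
	for _, h := range additional {
		h(r)
	}
	if reallyCrash {
		panic(r) // re-raise so the process still fails loudly
	}
}

// runOnce executes f with the standard recovery point installed.
func runOnce(f func()) {
	defer handleCrash()
	f()
}

func main() {
	reallyCrash = false // e.g. in a test: observe the panic without crashing
	runOnce(func() { panic("boom") })
	fmt.Println("still running")
}
```

Keeping the re-panic behind a single switch means production binaries can still crash loudly, while tests flip the switch to assert that the handlers ran.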
### When to Use

**Triggers:**
- You're running multiple independent subsystems in one process (multiple controllers, background workers)
- A panic in one subsystem shouldn't kill the entire process
- You need structured logging of panic stack traces before potential recovery

**Example — before:**
```go
// One bad nil pointer in workerB kills workerA, workerC, and the whole server
func main() {
	ctx := context.Background()
	go workerA(ctx)
	go workerB(ctx) // panics → entire process dies
	go workerC(ctx)
	select {}
}
```

**Example — after:**
```go
import (
	"context"
	"log"
	"runtime/debug"
)

func safeGo(ctx context.Context, name string, f func(ctx context.Context)) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
				// Log, alert, increment metric — but don't kill siblings
			}
		}()
		f(ctx)
	}()
}

func main() {
	ctx := context.Background()
	safeGo(ctx, "worker-a", workerA)
	safeGo(ctx, "worker-b", workerB) // panics → logged, other workers continue
	safeGo(ctx, "worker-c", workerC)
	select {}
}
```

### When NOT to Use

**Don't use this when:**
- The panic indicates truly unrecoverable corruption (violated invariants, corrupted data); recovering would hide the bug. Go runtime fatal errors such as out-of-memory or concurrent map writes cannot be recovered at all.
- You're in a single-goroutine CLI tool (let it crash with a stack trace — that IS the right behavior)
- The goroutine holds locks or half-modified state that can't be safely abandoned (recovering leaves the system in an inconsistent state)

**Over-application example:**
```go
// Recovering from panics in code that modifies shared state
func (c *Controller) unsafeSyncWithRecovery(ctx context.Context, key string) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered: %v", r)
			// Problem: c.internalState might be half-modified
			// The next sync will read corrupted state
		}
	}()
	c.internalState.Phase = "processing"
	riskyOperation() // panics here
	c.internalState.Phase = "done" // never reached — state is now wrong
}
```

**Better alternative:**
```go
// If you can't guarantee clean recovery, don't recover — crash and restart
func (c *Controller) syncItem(ctx context.Context, key string) error {
	// Use error returns, not panics, for expected failures
	if err := riskyOperation(); err != nil {
		return err // retry via workqueue — state is clean
	}
	c.internalState.Phase = "done"
	return nil
}
// HandleCrash at the top level (worker loop) is fine — each sync starts fresh
```

**Why:** Panic recovery is safe when each iteration is isolated (starts from clean state, reads from cache). It's dangerous when recovery leaves half-mutated state that subsequent operations will read.

### Key Insight
Go's default is "let it crash." Kubernetes' approach is "catch it, log it, run the registered handlers," and, when re-panicking is disabled, let the controller retry on the next sync. Swallowing a panic this way is only safe because the controller pattern is idempotent.

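The isolation argument can be made concrete with a small worker loop in which every item is synced inside its own recovery scope and failed keys are collected for another attempt. This is a sketch under stated assumptions: `syncFunc`, `processOne`, and `drain` are hypothetical names, and a real controller would use a rate-limited workqueue rather than a slice.

```go
package main

import "fmt"

// syncFunc is a hypothetical idempotent reconcile function for one key.
type syncFunc func(key string) error

// processOne runs a single sync and converts a panic into an ordinary error,
// so one bad item cannot take down the loop.
func processOne(key string, sync syncFunc) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic while syncing %q: %v", key, r)
		}
	}()
	return sync(key)
}

// drain processes every key once and returns the keys that should be retried.
func drain(keys []string, sync syncFunc) (retry []string) {
	for _, key := range keys {
		if err := processOne(key, sync); err != nil {
			fmt.Println("requeue:", err)
			retry = append(retry, key)
		}
	}
	return retry
}

func main() {
	attempts := map[string]int{}
	reconcile := func(key string) error {
		attempts[key]++
		if key == "b" && attempts[key] == 1 {
			panic("transient bug") // only the first attempt for "b" panics
		}
		return nil
	}

	retry := drain([]string{"a", "b", "c"}, reconcile)
	fmt.Println("to retry:", retry)
	retry = drain(retry, reconcile)
	fmt.Println("to retry:", retry)
}
```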
---

## 10. ContextForChannel — Bridge Pattern

**Source:** `staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go` (lines 120–145)

### What it does
Bridges the older `<-chan struct{}` stop pattern to the modern `context.Context` pattern:

```go
// staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:120-142
func ContextForChannel(parentCh <-chan struct{}) context.Context {
	return channelContext{stopCh: parentCh}
}

type channelContext struct {
	stopCh <-chan struct{}
}

func (c channelContext) Done() <-chan struct{} { return c.stopCh }
func (c channelContext) Err() error {
	select {
	case <-c.stopCh:
		return context.Canceled
	default:
		return nil
	}
}
```

### Why
Kubernetes predates `context.Context` in the standard library (it arrived in Go 1.7). A large body of existing code uses `stopCh <-chan struct{}`. Rather than a big-bang rewrite, this adapter allows gradual migration.

### When to Use

**Triggers:**
- You have legacy code using `<-chan struct{}` for cancellation that needs to call modern context-aware APIs
- You're doing a gradual migration from channels to context (not a big-bang rewrite)
- You need to integrate with a library that only accepts context, but your cancellation signal comes from a channel

**Example — before:**
```go
// Legacy API: only speaks channels
func RunLegacyWorker(stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
			doWork()
		}
	}
}

// New dependency requires context — how to bridge?
func doWork() {
	ctx := context.TODO() // wrong: doesn't cancel when stopCh closes
	newLibrary.Call(ctx)
}
```

**Example — after:**
```go
func RunLegacyWorker(stopCh <-chan struct{}) {
	ctx := ContextForChannel(stopCh) // bridge: closes when stopCh closes
	for {
		select {
		case <-ctx.Done():
			return
		default:
			newLibrary.Call(ctx) // properly cancels when stopCh closes
		}
	}
}
```

### When NOT to Use

**Don't use this when:**
- You're writing new code (just use `context.Context` from the start — no bridge needed)
- The channel carries data, not just a close signal (this only works for `<-chan struct{}`)
- You need context features beyond cancellation (timeouts, values): the bridge carries neither, so derive a standard context from it (for example with `context.WithTimeout`) or migrate the caller to a real context

**Over-application example:**
```go
// Using ContextForChannel in new code that already has context
func NewController(ctx context.Context) *Controller {
	stopCh := make(chan struct{})
	go func() { <-ctx.Done(); close(stopCh) }() // convert context → channel
	bridgedCtx := ContextForChannel(stopCh)     // convert channel → context
	// You just round-tripped for no reason — use ctx directly
	return &Controller{ctx: bridgedCtx}
}
```

**Better alternative:**
```go
func NewController(ctx context.Context) *Controller {
	// Just use ctx — it already does everything you need
	return &Controller{ctx: ctx}
}
```

**Why:** The bridge pattern exists for migration. In new code, using context directly is simpler, supports timeouts/values, and avoids the indirection of channel↔context conversion.

### Key Insight
**Large codebases can't do breaking API changes atomically.** This bridge pattern is how you evolve from one idiom to another over years without breaking everything at once.
|