docs: add patterns extracted from ecto and oban

Ecto: 6 patterns (protocol dispatch, changeset separation, Multi pipelines).
Oban: 9 patterns (plugin behaviour, telemetry spans, engine abstraction).
2026-04-30 09:03:17 -07:00
parent 9a94765ea2
commit 44c61840df
2 changed files with 901 additions and 0 deletions
@@ -0,0 +1,359 @@
+# Patterns Extracted from oban-bg/oban
+
+## Pattern: Plugin as Behaviour + GenServer
+
+**Source:** `lib/oban/plugin.ex`
+**Category:** plugin
+
+**What:** Define a plugin interface as a behaviour with
+`start_link/1` and `validate/1` callbacks. Plugins must be
+OTP-compliant (GenServer/Agent). The host supervises them.
+
+**Why:** Extensibility without coupling. Oban can start any
+module that satisfies the behaviour — pruning, cron,
+lifeline — without knowing implementation details. The
+`validate/1` callback ensures misconfigured plugins fail at
+startup, not at runtime.
+
+**Example:**
+
+```elixir
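+# Excerpt from the Oban.Plugin behaviour (option() is a module type)
+@callback start_link([option()]) :: GenServer.on_start()
+@callback validate([option()]) :: :ok | {:error, String.t()}
+
+# An optional callback must still be declared with @callback
+@callback format_logger_output(Oban.Config.t(), map()) :: map()
+@optional_callbacks [format_logger_output: 2]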
+```
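+Go has neither behaviours nor OTP supervision, so the sketch
+below mirrors only the contract. A hypothetical `Plugin`
+interface provides `Validate` and `Start` methods, which stand
+in for `validate/1` and `start_link/1`. A host validates every
+plugin before starting any of them. `Plugin`, `PluginSpec`,
+`StartPlugins`, and `Pruner` are illustrative names, not Oban
+APIs, and the sketch does not model restart-on-crash.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+)
+
+// Plugin is a hypothetical Go analogue of the Oban.Plugin behaviour:
+// Validate plays the role of validate/1 and Start of start_link/1.
+type Plugin interface {
+    Name() string
+    Validate(opts map[string]any) error
+    Start(opts map[string]any) error
+}
+
+// PluginSpec pairs a plugin with the options the user configured for it.
+type PluginSpec struct {
+    Plugin Plugin
+    Opts   map[string]any
+}
+
+// StartPlugins validates every plugin before starting any of them, so one
+// misconfigured plugin aborts startup instead of failing later at runtime.
+func StartPlugins(specs []PluginSpec) error {
+    for _, s := range specs {
+        if err := s.Plugin.Validate(s.Opts); err != nil {
+            return fmt.Errorf("invalid config for %s: %w", s.Plugin.Name(), err)
+        }
+    }
+    for _, s := range specs {
+        if err := s.Plugin.Start(s.Opts); err != nil {
+            return fmt.Errorf("starting %s: %w", s.Plugin.Name(), err)
+        }
+    }
+    return nil
+}
+
+// Pruner is an example plugin that requires a positive integer "max_age".
+type Pruner struct{ Started bool }
+
+func (p *Pruner) Name() string { return "pruner" }
+
+func (p *Pruner) Validate(opts map[string]any) error {
+    if n, ok := opts["max_age"].(int); !ok || n <= 0 {
+        return errors.New("max_age must be a positive integer")
+    }
+    return nil
+}
+
+// Start would normally launch a goroutine; here it only records the call.
+func (p *Pruner) Start(opts map[string]any) error {
+    p.Started = true
+    return nil
+}
+```
+
+The validate-all-then-start-all ordering matches the goal stated
+above: misconfiguration surfaces at boot, before any plugin runs.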
+
+**When to use:** When your application needs a plugin
+system where third parties add behavior. The behaviour
+makes the compiler warn about missing callbacks;
+supervision ensures fault isolation.
+
+**When NOT to use:** Internal modules that you control.
+Behaviours add ceremony — if there is only one
+implementation, use a module directly.
+
+---
+
+## Pattern: Structured Telemetry Spans
+
+**Source:** `lib/oban/telemetry.ex`
+**Category:** telemetry
+
+**What:** Emit telemetry events as spans with
+start/stop/exception structure. Every operation (job
+execution, engine calls, plugin work) follows the same
+three-event pattern with consistent metadata shapes.
+
+**Why:** Uniform observability. Any monitoring tool
+(AppSignal, Datadog, custom logger) can hook into the same
+event structure. The span pattern (start → stop|exception)
+enables latency tracking, error rates, and resource usage
+measurement without custom instrumentation per feature.
+
+**Example:**
+
+```elixir
+# Event names: [:oban, component, (action,) phase]
+[:oban, :job, :start]
+[:oban, :job, :stop]      # measurements include duration, memory
+[:oban, :job, :exception] # metadata adds kind, reason, stacktrace
+
+[:oban, :engine, :fetch_jobs, :start]
+[:oban, :engine, :fetch_jobs, :stop]
+[:oban, :engine, :fetch_jobs, :exception]
+```
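+Go has no built-in `:telemetry` equivalent. The following is a
+hypothetical, minimal in-process emitter that shows the span
+shape: one `start` event, then exactly one `stop` or
+`exception`, both with a `duration` measurement. `Event`,
+`Attach`, and `Span` are illustrative names, not a real
+library's API.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+    "time"
+)
+
+// Event is a hypothetical telemetry event named [prefix..., phase].
+type Event struct {
+    Name         []string
+    Measurements map[string]any
+    Metadata     map[string]any
+}
+
+var (
+    mu       sync.RWMutex
+    handlers []func(Event)
+)
+
+// Attach registers a handler that receives every emitted event.
+func Attach(h func(Event)) {
+    mu.Lock()
+    defer mu.Unlock()
+    handlers = append(handlers, h)
+}
+
+func emit(prefix []string, phase string, measurements, metadata map[string]any) {
+    name := append(append([]string{}, prefix...), phase)
+    mu.RLock()
+    defer mu.RUnlock()
+    for _, h := range handlers {
+        h(Event{Name: name, Measurements: measurements, Metadata: metadata})
+    }
+}
+
+// Span emits prefix+"start", runs fn, then emits exactly one of
+// prefix+"stop" or prefix+"exception", both carrying the elapsed duration.
+// A panic in fn is recovered and reported as an exception (a choice made
+// for this sketch; Erlang's :telemetry.span/3 re-raises instead).
+func Span(prefix []string, meta map[string]any, fn func() error) (err error) {
+    start := time.Now()
+    emit(prefix, "start", map[string]any{"system_time": start}, meta)
+    defer func() {
+        if r := recover(); r != nil {
+            err = fmt.Errorf("panic: %v", r)
+        }
+        measurements := map[string]any{"duration": time.Since(start)}
+        if err == nil {
+            emit(prefix, "stop", measurements, meta)
+            return
+        }
+        errMeta := map[string]any{}
+        for k, v := range meta {
+            errMeta[k] = v
+        }
+        errMeta["reason"] = err
+        emit(prefix, "exception", measurements, errMeta)
+    }()
+    return fn()
+}
+```
+
+A handler attached once can then compute latency and error rates
+for every span prefix, without per-feature instrumentation.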
+
+**When to use:** Any library or application that wants
+observability without coupling to a specific monitoring
+backend. The pattern works for database queries, HTTP
+requests, background jobs, cache operations.
+
+**When NOT to use:** Ultra-hot paths where telemetry
+overhead matters (millions of events/second). Use sampling
+or skip entirely.
+
+---
+
+## Pattern: Engine Abstraction for Backend Swap
+
+**Source:** `lib/oban/engine.ex`
+**Category:** engine
+
+**What:** Define a behaviour (`Engine`) with callbacks for
+all database operations (insert, fetch, complete, etc.).
+Ship multiple implementations (Basic/Inline/Lite) that swap
+at config time.
+
+**Why:** Different deployments need different backends:
+Postgres (Basic), SQLite (Lite), or no database at all
+(Inline, which runs jobs immediately in the caller and
+backs `testing: :inline`). The engine abstraction lets
+you swap without changing application code.
+
+**Example:**
+
+```elixir
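+# Simplified excerpt; conf(), meta(), opts() are module types
+@callback init(conf(), opts()) :: {:ok, meta()} | {:error, term()}
+@callback insert_job(conf(), Ecto.Changeset.t(), opts()) ::
+            {:ok, Job.t()} | {:error, term()}
+@callback fetch_jobs(conf(), meta(), opts()) ::
+            {:ok, {meta(), [Job.t()]}} | {:error, term()}
+@callback complete_job(conf(), Job.t()) :: :ok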
+```
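+To show the swap in a language without behaviours, here is a
+hypothetical Go sketch with three parts. An `Engine` interface
+defines the storage operations. An in-memory implementation
+stands in for a database-backed one. An inline implementation
+runs jobs on insert. The names and method shapes are
+assumptions and are much narrower than Oban's real callbacks.
+
+```go
+package main
+
+import (
+    "errors"
+    "sync"
+)
+
+// Job is a minimal stand-in for a persisted job row.
+type Job struct {
+    ID     int
+    Worker string
+    State  string // "available", "executing" or "completed"
+}
+
+// Engine is a hypothetical Go analogue of the Oban.Engine behaviour:
+// application code only ever talks to this interface.
+type Engine interface {
+    Insert(worker string) (Job, error)
+    Fetch(limit int) ([]Job, error)
+    Complete(id int) error
+}
+
+// MemoryEngine keeps jobs in a slice. A Postgres or SQLite engine would
+// implement the same three methods with SQL.
+type MemoryEngine struct {
+    mu     sync.Mutex
+    nextID int
+    jobs   []Job
+}
+
+func (m *MemoryEngine) Insert(worker string) (Job, error) {
+    m.mu.Lock()
+    defer m.mu.Unlock()
+    m.nextID++
+    j := Job{ID: m.nextID, Worker: worker, State: "available"}
+    m.jobs = append(m.jobs, j)
+    return j, nil
+}
+
+// Fetch claims up to limit available jobs by marking them executing.
+func (m *MemoryEngine) Fetch(limit int) ([]Job, error) {
+    m.mu.Lock()
+    defer m.mu.Unlock()
+    var out []Job
+    for i := range m.jobs {
+        if len(out) == limit {
+            break
+        }
+        if m.jobs[i].State == "available" {
+            m.jobs[i].State = "executing"
+            out = append(out, m.jobs[i])
+        }
+    }
+    return out, nil
+}
+
+func (m *MemoryEngine) Complete(id int) error {
+    m.mu.Lock()
+    defer m.mu.Unlock()
+    for i := range m.jobs {
+        if m.jobs[i].ID == id {
+            m.jobs[i].State = "completed"
+            return nil
+        }
+    }
+    return errors.New("job not found")
+}
+
+// InlineEngine stores nothing and runs each job as soon as it is inserted,
+// similar in spirit to Oban's Inline engine.
+type InlineEngine struct{ Run func(Job) error }
+
+func (e *InlineEngine) Insert(worker string) (Job, error) {
+    j := Job{Worker: worker, State: "executing"}
+    if err := e.Run(j); err != nil {
+        return j, err
+    }
+    j.State = "completed"
+    return j, nil
+}
+
+func (e *InlineEngine) Fetch(int) ([]Job, error) { return nil, nil }
+func (e *InlineEngine) Complete(int) error       { return nil }
+
+// NewEngine selects an implementation from configuration.
+func NewEngine(kind string, run func(Job) error) (Engine, error) {
+    switch kind {
+    case "memory":
+        return &MemoryEngine{}, nil
+    case "inline":
+        return &InlineEngine{Run: run}, nil
+    }
+    return nil, errors.New("unknown engine: " + kind)
+}
+```
+
+Only `NewEngine` knows which implementation is in use; every
+other caller depends on the `Engine` interface alone.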
+
+**When to use:** When your system needs to support multiple
+storage backends, or when testing requires a fundamentally
+different execution model (synchronous vs async).
+
+**When NOT to use:** Single-backend applications. The
+abstraction layer adds complexity that is only justified
+when you actually swap implementations.
+
+---
+
+## Pattern: Keyword Validation with Reduce-While
+
+**Source:** `lib/oban/validation.ex`
+**Category:** config
+
+**What:** Validate keyword options by iterating with
+`Enum.reduce_while/3` and a validator function. Stop at
+first error. Return `:ok` or `{:error, reason}`.
+
+**Why:** Keyword lists are the standard Elixir config
+format. Validating them procedurally (nested if/case) gets
+messy. The reduce-while + validator pattern is composable:
+each option validates independently, errors short-circuit,
+and the validator function can be swapped or extended.
+
+**Example:**
+
+```elixir
+def validate(opts, validator) when is_list(opts) do
+  Enum.reduce_while(opts, :ok, fn opt, acc ->
+    case validator.(opt) do
+      :ok -> {:cont, acc}
+      {:error, _} = error -> {:halt, error}
+    end
+  end)
+end
+```
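+Go has no `reduce_while`, but the same short-circuiting
+validation is a plain loop with an early return.
+`Option`, `Validate`, and the `queueValidator` rules below are
+hypothetical and chosen only to illustrate the shape.
+
+```go
+package main
+
+import "fmt"
+
+// Option is one key/value pair, like one entry of an Elixir keyword list.
+type Option struct {
+    Key   string
+    Value any
+}
+
+// Validate runs validator over each option in order and stops at the
+// first error, the same short-circuit Enum.reduce_while/3 provides.
+func Validate(opts []Option, validator func(Option) error) error {
+    for _, opt := range opts {
+        if err := validator(opt); err != nil {
+            return err
+        }
+    }
+    return nil
+}
+
+// queueValidator is a hypothetical validator for queue options.
+func queueValidator(opt Option) error {
+    switch opt.Key {
+    case "limit":
+        if n, ok := opt.Value.(int); !ok || n < 1 {
+            return fmt.Errorf("expected limit to be a positive integer, got: %v", opt.Value)
+        }
+    case "paused":
+        if _, ok := opt.Value.(bool); !ok {
+            return fmt.Errorf("expected paused to be a boolean, got: %v", opt.Value)
+        }
+    default:
+        return fmt.Errorf("unknown option %q", opt.Key)
+    }
+    return nil
+}
+```
+
+Swapping in a different validator function changes the rules
+without touching the iteration logic.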
+
+**When to use:** Any public API that accepts keyword
+options from users. Libraries, GenServer init, plugin
+configs.
+
+**When NOT to use:** Internal functions where the caller
+is trusted. Also avoid for deeply nested configs — use
+schema-based validation (NimbleOptions, Ecto embedded
+schemas) instead.
+
+---
+
+## Pattern: Testing Mode Toggle
+
+**Source:** `lib/oban/testing.ex`, `lib/oban/config.ex`
+**Category:** testing
+
+**What:** Support a `testing:` config option that switches
+execution mode: `:disabled` (production), `:inline`
+(execute immediately in caller process), `:manual` (enqueue
+but don't execute — assert on DB state).
+
+**Why:** Background job systems are inherently async, which
+makes testing hard. The mode toggle gives you: (1) inline
+for unit tests that need synchronous execution, (2) manual
+for integration tests that verify enqueueing without
+side effects.
+
+**Example:**
+
+```elixir
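+# In config/test.exs:
+config :my_app, Oban, testing: :manual
+
+# In a test module (Ecto sandbox setup omitted):
+use Oban.Testing, repo: MyApp.Repo
+
+# Unit-test the worker: perform_job calls perform/1 directly
+# and inserts nothing (assumes perform/1 returns :ok)
+assert :ok = perform_job(MyWorker, %{id: 1})
+
+# Verify enqueueing: in :manual mode the job is inserted
+# but never executed, so it can be asserted on
+{:ok, _job} = Oban.insert(MyWorker.new(%{id: 1}))
+assert_enqueued worker: MyWorker, args: %{id: 1}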
+```
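+Below is a language-neutral version of the toggle, sketched in
+Go around a hypothetical `Queue` type. The mode is fixed when
+the queue is built from config, and `Insert` branches on it in
+one place rather than in business code. `Mode`, `Queue`, and
+`IsEnqueued` are illustrative names.
+
+```go
+package main
+
+import "fmt"
+
+// Mode is a hypothetical analogue of Oban's testing: option.
+type Mode int
+
+const (
+    Disabled Mode = iota // production: persist the job for background workers
+    Inline               // run the job immediately in the caller
+    Manual               // record the job but never run it
+)
+
+// Queue picks its behaviour from Mode, which is set once from config;
+// business code calls Insert identically in every mode.
+type Queue struct {
+    Mode     Mode
+    Perform  func(args map[string]any) error
+    Enqueued []map[string]any
+}
+
+func (q *Queue) Insert(args map[string]any) error {
+    switch q.Mode {
+    case Inline:
+        return q.Perform(args)
+    case Manual, Disabled:
+        // A real Disabled mode would persist the job and let a background
+        // worker run it; this sketch only records it.
+        q.Enqueued = append(q.Enqueued, args)
+        return nil
+    default:
+        return fmt.Errorf("unknown mode %d", q.Mode)
+    }
+}
+
+// IsEnqueued reports whether a job with the given "id" arg was recorded.
+func (q *Queue) IsEnqueued(id any) bool {
+    for _, args := range q.Enqueued {
+        if args["id"] == id {
+            return true
+        }
+    }
+    return false
+}
+```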
+
+**When to use:** Any async system that needs deterministic
+testing — job queues, event buses, notification systems.
+The testing mode replaces "sleep and hope" with explicit
+control.
+
+**When NOT to use:** Synchronous systems that are already
+deterministic. Also avoid if the mode toggle leaks into
+production code paths (keep it config-only, not conditional
+logic scattered through business code).
+
+---
+
+## Pattern: Stopper for Goroutine Lifecycle (CockroachDB)
+
+**Source:** `pkg/util/stop/stopper.go` (cockroachdb)
+**Category:** concurrency
+
+**What:** A dedicated struct that manages the lifecycle of
+all goroutines in a component: tracks active tasks, refuses
+new work during shutdown (quiesce), waits for completion,
+then runs closers.
+
+**Why:** In distributed systems, clean shutdown is critical.
+You need to: (1) stop accepting new work, (2) finish
+in-flight work, (3) release resources in order. The Stopper
+centralizes this instead of scattering shutdown logic across
+every goroutine.
+
+**Example:**
+
+```go
+type Stopper struct {
+    quiescer chan struct{} // closed when quiescing
+    stopped  chan struct{} // closed when fully stopped
+    mu struct {
+        syncutil.RWMutex
+        _numTasks int32
+        quiescing, stopping bool
+        closers []Closer
+    }
+}
+
+// RunAsyncTask refuses new work during quiesce (simplified
+// excerpt; the addTask/decTask helpers are not shown)
+func (s *Stopper) RunAsyncTask(ctx context.Context,
+    taskName string, f func(context.Context)) error {
+    if !s.addTask() {
+        return ErrUnavailable
+    }
+    go func() {
+        defer s.decTask()
+        f(ctx)
+    }()
+    return nil
+}
+```
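+The excerpt above leans on CockroachDB internals (`syncutil`,
+`addTask`/`decTask`). A self-contained sketch of the same
+lifecycle using only the standard library might look like this.
+The names here are illustrative and should not be read as
+CockroachDB's API. Running closers in reverse registration
+order is a choice made for this sketch.
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "sync"
+)
+
+// ErrUnavailable is returned once the stopper has begun quiescing.
+var ErrUnavailable = errors.New("stopper: unavailable, quiescing")
+
+// Stopper tracks in-flight tasks and refuses new ones once Stop begins.
+type Stopper struct {
+    mu        sync.Mutex
+    quiescing bool
+    tasks     sync.WaitGroup
+    closers   []func()
+    quiesce   chan struct{} // closed when quiescing starts
+}
+
+func NewStopper() *Stopper {
+    return &Stopper{quiesce: make(chan struct{})}
+}
+
+// ShouldQuiesce lets long-running tasks notice that shutdown has begun.
+func (s *Stopper) ShouldQuiesce() <-chan struct{} { return s.quiesce }
+
+// AddCloser registers cleanup that runs after all tasks have finished.
+func (s *Stopper) AddCloser(c func()) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    s.closers = append(s.closers, c)
+}
+
+// RunAsyncTask starts f in a goroutine unless the stopper is quiescing.
+func (s *Stopper) RunAsyncTask(ctx context.Context, f func(context.Context)) error {
+    s.mu.Lock()
+    if s.quiescing {
+        s.mu.Unlock()
+        return ErrUnavailable
+    }
+    // Add under the lock so Stop cannot begin Wait between check and Add.
+    s.tasks.Add(1)
+    s.mu.Unlock()
+    go func() {
+        defer s.tasks.Done()
+        f(ctx)
+    }()
+    return nil
+}
+
+// Stop refuses new tasks, signals quiesce, waits for in-flight tasks, then
+// runs closers in reverse order. Only the first call does any work.
+func (s *Stopper) Stop() {
+    s.mu.Lock()
+    if s.quiescing {
+        s.mu.Unlock()
+        return
+    }
+    s.quiescing = true
+    close(s.quiesce)
+    closers := s.closers
+    s.mu.Unlock()
+
+    s.tasks.Wait()
+    for i := len(closers) - 1; i >= 0; i-- {
+        closers[i]()
+    }
+}
+```
+
+Long-running tasks should select on `ShouldQuiesce()` so that
+`Stop` does not block forever waiting for them.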
+
+**When to use:** Any server or subsystem that spawns
+goroutines and needs graceful shutdown. Especially in
+long-running services where leaked goroutines cause
+resource exhaustion.
+
+**When NOT to use:** Simple programs with a single main
+goroutine. Or when `errgroup` with context cancellation
+suffices for the shutdown coordination.
+
+---
+
+## Pattern: Atomic File Operations with Suffix Convention
+
+**Source:** `tsdb/db.go` (prometheus)
+**Category:** storage
+
+**What:** Use directory suffixes (`.tmp-for-creation`,
+`.tmp-for-deletion`) to make multi-step file operations
+crash-safe. On startup, clean up any dirs with these
+suffixes (they represent incomplete operations).
+
+**Why:** Database storage needs atomicity. If the process
+crashes between creating a block and finalizing it, you
+need to know the block is incomplete. The suffix convention
+makes incomplete state visible at the filesystem level
+without requiring a separate journal.
+
+**Example:**
+
+```go
+const (
+    tmpForDeletionBlockDirSuffix = ".tmp-for-deletion"
+    tmpForCreationBlockDirSuffix = ".tmp-for-creation"
+)
+
+// On startup: remove any .tmp-* dirs (incomplete ops)
+// On create: write to dir.tmp-for-creation, then rename
+// On delete: rename to dir.tmp-for-deletion, then remove
+```
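+A minimal Go sketch of the convention follows. It assumes a
+flat directory of block dirs and relies on `os.Rename` being
+atomic within one filesystem, which holds on POSIX systems.
+`CreateBlock`, `DeleteBlock`, and `CleanupIncomplete` are
+hypothetical helpers, not Prometheus functions. The fsync calls
+a durable implementation needs are omitted.
+
+```go
+package main
+
+import (
+    "os"
+    "path/filepath"
+    "strings"
+)
+
+const (
+    tmpForCreationSuffix = ".tmp-for-creation"
+    tmpForDeletionSuffix = ".tmp-for-deletion"
+)
+
+// CreateBlock writes every file into name+".tmp-for-creation" and renames
+// the directory to its final name only after all writes succeed.
+func CreateBlock(parent, name string, files map[string][]byte) error {
+    final := filepath.Join(parent, name)
+    tmp := final + tmpForCreationSuffix
+    if err := os.MkdirAll(tmp, 0o755); err != nil {
+        return err
+    }
+    for fname, data := range files {
+        if err := os.WriteFile(filepath.Join(tmp, fname), data, 0o644); err != nil {
+            return err
+        }
+    }
+    return os.Rename(tmp, final) // the block becomes visible atomically
+}
+
+// DeleteBlock renames the block out of the way first, so a crash during
+// RemoveAll leaves a suffixed dir, never a half-deleted live block.
+func DeleteBlock(parent, name string) error {
+    final := filepath.Join(parent, name)
+    tmp := final + tmpForDeletionSuffix
+    if err := os.Rename(final, tmp); err != nil {
+        return err
+    }
+    return os.RemoveAll(tmp)
+}
+
+// CleanupIncomplete runs at startup and removes leftovers of interrupted
+// creates and deletes.
+func CleanupIncomplete(parent string) error {
+    entries, err := os.ReadDir(parent)
+    if err != nil {
+        return err
+    }
+    for _, e := range entries {
+        n := e.Name()
+        if e.IsDir() && (strings.HasSuffix(n, tmpForCreationSuffix) ||
+            strings.HasSuffix(n, tmpForDeletionSuffix)) {
+            if err := os.RemoveAll(filepath.Join(parent, n)); err != nil {
+                return err
+            }
+        }
+    }
+    return nil
+}
+```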
+
+**When to use:** Any system that manages files/directories
+and needs crash consistency without a full WAL. Simpler
+than a write-ahead log for coarse-grained operations.
+
+**When NOT to use:** When you already have a WAL or
+transaction log. Or for fine-grained operations where
+rename semantics are insufficient.
+
+---
+
+## Pattern: Options as DefaultOptions() + Override
+
+**Source:** `tsdb/db.go` (prometheus)
+**Category:** configuration
+
+**What:** Provide a `DefaultOptions()` function returning a
+fully-populated config struct. Users copy and override only
+what they need. No nil-means-default ambiguity.
+
+**Why:** Large config structs (20+ fields) are unwieldy.
+Providing defaults from a function rather than a
+package-level var means every call returns a fresh struct,
+so one caller's overrides cannot leak into another's. It
+also documents what "normal" looks like, and users only
+specify deviations.
+
+**Example:**
+
+```go
+func DefaultOptions() *Options {
+    return &Options{
+        WALSegmentSize:   wlog.DefaultSegmentSize,
+        RetentionDuration: int64(15 * 24 * time.Hour / ...),
+        MinBlockDuration:  DefaultBlockDuration,
+        MaxBlockDuration:  DefaultBlockDuration,
+        SamplesPerChunk:   DefaultSamplesPerChunk,
+        // ... 20 more fields with sane defaults
+    }
+}
+
+// Usage:
+opts := tsdb.DefaultOptions()
+// RetentionDuration is an int64 in milliseconds, not a time.Duration
+opts.RetentionDuration = int64(30 * 24 * time.Hour / time.Millisecond)
+db, err := tsdb.Open(dir, nil, nil, opts, nil)
+```
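+The Prometheus excerpt elides most fields, so here is a small,
+self-contained sketch of the same idea built around a
+hypothetical `Options` struct. The `Compress` field shows the
+zero-value problem: with a plain struct literal, omitting it
+would silently mean `false`.
+
+```go
+package main
+
+import "time"
+
+// Options is a hypothetical config struct, not Prometheus's tsdb.Options.
+type Options struct {
+    Retention     time.Duration
+    BlockDuration time.Duration
+    MaxSegmentMB  int
+    Compress      bool // zero value would mean "off", so default it explicitly
+}
+
+// DefaultOptions returns a new, fully populated struct on every call.
+func DefaultOptions() *Options {
+    return &Options{
+        Retention:     15 * 24 * time.Hour,
+        BlockDuration: 2 * time.Hour,
+        MaxSegmentMB:  128,
+        Compress:      true,
+    }
+}
+```
+
+Callers write `opts := DefaultOptions()` and then change only the
+fields they care about, as in the Prometheus usage above.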
+
+**When to use:** Config structs with many fields where most
+users want defaults. Especially when zero-value semantics
+would be confusing (e.g., 0 retention = infinite? or off?).
+
+**When NOT to use:** Small configs (3-4 fields) where a
+struct literal with zero-means-default is clear enough.
+
+---
+
+## Pattern: Scrape Loop with Aligned Timestamps
+
+**Source:** `scrape/scrape.go` (prometheus)
+**Category:** concurrency
+
+**What:** Periodic scrape loops that align timestamps to
+intervals with a small tolerance, enabling better storage
+compression downstream.
+
+**Why:** Time-series databases compress better when
+timestamps are regular: Prometheus stores timestamps as
+delta-of-deltas, and a zero delta-of-delta takes a single
+bit. The default 2 ms tolerance snaps scrapes onto the
+expected grid while still accommodating real-world jitter.
+
+**Example:**
+
+```go
+var ScrapeTimestampTolerance = 2 * time.Millisecond
+var AlignScrapeTimestamps = true
+
+// In the scrape loop: if a scrape starts within tolerance
+// of the expected grid time, record the grid time instead
+```
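+The alignment rule can be sketched as a pure function. This is
+a hypothetical re-implementation, not Prometheus's code.
+`alignScrapeTime` takes the actual start time and the previous
+grid point. The 1% cap on tolerance is an assumption of this
+sketch.
+
+```go
+package main
+
+import "time"
+
+// alignScrapeTime returns the timestamp to record and the updated grid
+// point. The grid is aligned + k*interval; a scrape that starts at most
+// tolerance after a grid point is snapped back onto it.
+func alignScrapeTime(scrapeTime, aligned time.Time, interval, tolerance time.Duration) (time.Time, time.Time) {
+    // Cap tolerance at 1% of the interval so short intervals stay accurate.
+    if maxTol := interval / 100; tolerance > maxTol {
+        tolerance = maxTol
+    }
+    // Advance the grid point past any skipped ticks.
+    for scrapeTime.Sub(aligned) >= interval {
+        aligned = aligned.Add(interval)
+    }
+    if d := scrapeTime.Sub(aligned); d >= 0 && d <= tolerance {
+        return aligned, aligned
+    }
+    return scrapeTime, aligned
+}
+```
+
+The loop keeps the returned grid point between iterations and
+passes it back on the next tick.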
+
+**When to use:** Any periodic data collection where
+downstream storage benefits from timestamp regularity.
+Metrics, heartbeats, polling loops.
+
+**When NOT to use:** Event-driven data where timestamps
+must reflect actual occurrence time. Audit logs, user
+actions, financial transactions.
+
+<!-- PATTERN_COMPLETE -->