# Patterns Extracted from oban-bg/oban

## Pattern: Plugin as Behaviour + GenServer

**Source:** `lib/oban/plugin.ex`

**Category:** plugin

**What:** Define a plugin interface as a behaviour with `start_link/1` and `validate/1` callbacks. Plugins must be OTP-compliant (GenServer/Agent). The host supervises them.

**Why:** Extensibility without coupling. Oban can start any module that satisfies the behaviour (pruning, cron, lifeline) without knowing implementation details. The `validate/1` callback ensures misconfigured plugins fail at startup, not at runtime.

**Example:**

```elixir
@callback start_link([option()]) :: GenServer.on_start()
@callback validate([option()]) :: :ok | {:error, String.t()}
@optional_callbacks [format_logger_output: 2]
```

**When to use:** When your application needs a plugin system where third parties add behavior. The behaviour gives a compile-time warning when a required callback is missing; supervision ensures fault isolation.

**When NOT to use:** Internal modules that you control. Behaviours add ceremony: if there is only one implementation, use a module directly.

---

## Pattern: Structured Telemetry Spans

**Source:** `lib/oban/telemetry.ex`

**Category:** telemetry

**What:** Emit telemetry events as spans with start/stop/exception structure. Every operation (job execution, engine calls, plugin work) follows the same three-event pattern with consistent metadata shapes.

**Why:** Uniform observability. Any monitoring tool (AppSignal, Datadog, custom logger) can hook into the same event structure. The span pattern (start → stop|exception) enables latency tracking, error rates, and resource usage measurement without custom instrumentation per feature.

**Example:**

```elixir
# Event names follow: [:oban, :component, :action, :phase]
[:oban, :job, :start]
[:oban, :job, :stop]       # measurements: duration, memory
[:oban, :job, :exception]  # + kind, reason, stacktrace

[:oban, :engine, :fetch_jobs, :start]
[:oban, :engine, :fetch_jobs, :stop]
[:oban, :engine, :fetch_jobs, :exception]
```

**When to use:** Any library or application that wants observability without coupling to a specific monitoring backend. The pattern works for database queries, HTTP requests, background jobs, cache operations.

**When NOT to use:** Ultra-hot paths where telemetry overhead matters (millions of events/second). Use sampling or skip entirely.

---

## Pattern: Engine Abstraction for Backend Swap

**Source:** `lib/oban/engine.ex`

**Category:** engine

**What:** Define a behaviour (`Engine`) with callbacks for all database operations (insert, fetch, complete, etc.). Ship multiple implementations (Basic/Inline/Lite) that swap at config time.

**Why:** Different environments need different backends: Postgres for production, SQLite for development, inline (in-memory) for testing. The engine abstraction lets you swap without changing application code.

**Example:**

```elixir
@callback init(conf, opts) :: {:ok, meta} | {:error, term}
@callback insert_job(conf, changeset, opts) :: {:ok, Job.t()}
@callback fetch_jobs(conf, meta, opts) :: {:ok, {meta, [Job.t()]}}
@callback complete_job(conf, Job.t()) :: :ok
```

**When to use:** When your system needs to support multiple storage backends, or when testing requires a fundamentally different execution model (synchronous vs async).

**When NOT to use:** Single-backend applications. The abstraction layer adds complexity that is only justified when you actually swap implementations.

---

## Pattern: Keyword Validation with Reduce-While

**Source:** `lib/oban/validation.ex`

**Category:** config

**What:** Validate keyword options by iterating with `Enum.reduce_while/3` and a validator function. Stop at first error.
Return `:ok` or `{:error, reason}`.

**Why:** Keyword lists are the standard Elixir config format. Validating them procedurally (nested if/case) gets messy. The reduce-while + validator pattern is composable: each option validates independently, errors short-circuit, and the validator function can be swapped or extended.

**Example:**

```elixir
def validate(opts, validator) when is_list(opts) do
  Enum.reduce_while(opts, :ok, fn opt, acc ->
    case validator.(opt) do
      :ok -> {:cont, acc}
      {:error, _} = error -> {:halt, error}
    end
  end)
end
```

**When to use:** Any public API that accepts keyword options from users. Libraries, GenServer init, plugin configs.

**When NOT to use:** Internal functions where the caller is trusted. Also avoid for deeply nested configs; use schema-based validation (NimbleOptions, Ecto embedded schemas) instead.

---

## Pattern: Testing Mode Toggle

**Source:** `lib/oban/testing.ex`, `lib/oban/config.ex`

**Category:** testing

**What:** Support a `testing:` config option that switches execution mode: `:disabled` (production), `:inline` (execute immediately in caller process), `:manual` (enqueue but don't execute, then assert on DB state).

**Why:** Background job systems are inherently async, which makes testing hard. The mode toggle gives you: (1) inline for unit tests that need synchronous execution, (2) manual for integration tests that verify enqueueing without side effects.

**Example:**

```elixir
# In test config:
config :my_app, Oban, testing: :manual

# In tests:
use Oban.Testing, repo: MyApp.Repo

perform_job(MyWorker, %{id: 1})
assert_enqueued worker: MyWorker, args: %{id: 1}
```

**When to use:** Any async system that needs deterministic testing: job queues, event buses, notification systems. The testing mode replaces "sleep and hope" with explicit control.

**When NOT to use:** Synchronous systems that are already deterministic.
Also avoid if the mode toggle leaks into production code paths (keep it config-only, not conditional logic scattered through business code).

---

## Pattern: Stopper for Goroutine Lifecycle (CockroachDB)

**Source:** `pkg/util/stop/stopper.go` (cockroachdb)

**Category:** concurrency

**What:** A dedicated struct that manages the lifecycle of all goroutines in a component: tracks active tasks, refuses new work during shutdown (quiesce), waits for completion, then runs closers.

**Why:** In distributed systems, clean shutdown is critical. You need to: (1) stop accepting new work, (2) finish in-flight work, (3) release resources in order. The Stopper centralizes this instead of scattering shutdown logic across every goroutine.

**Example:**

```go
type Stopper struct {
	quiescer chan struct{} // closed when quiescing
	stopped  chan struct{} // closed when fully stopped
	mu       struct {
		syncutil.RWMutex
		_numTasks           int32
		quiescing, stopping bool
		closers             []Closer
	}
}

// RunAsyncTask refuses new work during quiesce
func (s *Stopper) RunAsyncTask(ctx context.Context, taskName string, f func(context.Context)) error {
	if !s.addTask() {
		return ErrUnavailable
	}
	go func() {
		defer s.decTask()
		f(ctx)
	}()
	return nil
}
```

**When to use:** Any server or subsystem that spawns goroutines and needs graceful shutdown. Especially in long-running services where leaked goroutines cause resource exhaustion.

**When NOT to use:** Simple programs with a single main goroutine. Or when `errgroup` with context cancellation suffices for the shutdown coordination.

---

## Pattern: Atomic File Operations with Suffix Convention

**Source:** `tsdb/db.go` (prometheus)

**Category:** storage

**What:** Use directory suffixes (`.tmp-for-creation`, `.tmp-for-deletion`) to make multi-step file operations crash-safe. On startup, clean up any dirs with these suffixes (they represent incomplete operations).

**Why:** Database storage needs atomicity.
If the process crashes between creating a block and finalizing it, you need to know the block is incomplete. The suffix convention makes incomplete state visible at the filesystem level without requiring a separate journal.

**Example:**

```go
const (
	tmpForDeletionBlockDirSuffix = ".tmp-for-deletion"
	tmpForCreationBlockDirSuffix = ".tmp-for-creation"
)

// On startup: remove any .tmp-* dirs (incomplete ops)
// On create: write to dir.tmp-for-creation, then rename
// On delete: rename to dir.tmp-for-deletion, then remove
```

**When to use:** Any system that manages files/directories and needs crash consistency without a full WAL. Simpler than a write-ahead log for coarse-grained operations.

**When NOT to use:** When you already have a WAL or transaction log. Or for fine-grained operations where rename semantics are insufficient.

---

## Pattern: Options as DefaultOptions() + Override

**Source:** `tsdb/db.go` (prometheus)

**Category:** configuration

**What:** Provide a `DefaultOptions()` function returning a fully-populated config struct. Users copy and override only what they need. No nil-means-default ambiguity.

**Why:** Large config structs (20+ fields) are unwieldy. By providing sane defaults as a function (not a package-level var), you avoid mutation bugs and make it clear what "normal" looks like. Users only specify deviations.

**Example:**

```go
func DefaultOptions() *Options {
	return &Options{
		WALSegmentSize:    wlog.DefaultSegmentSize,
		RetentionDuration: int64(15 * 24 * time.Hour / ...),
		MinBlockDuration:  DefaultBlockDuration,
		MaxBlockDuration:  DefaultBlockDuration,
		SamplesPerChunk:   DefaultSamplesPerChunk,
		// ... 20 more fields with sane defaults
	}
}

// Usage:
opts := tsdb.DefaultOptions()
// RetentionDuration is an int64 (typically milliseconds), not a
// time.Duration, so the override needs an explicit conversion.
opts.RetentionDuration = int64(30 * 24 * time.Hour / time.Millisecond)
db, err := tsdb.Open(dir, nil, nil, opts, nil)
```

**When to use:** Config structs with many fields where most users want defaults. Especially when zero-value semantics would be confusing (e.g., 0 retention = infinite? or off?).

**When NOT to use:** Small configs (3-4 fields) where struct literal with zero-means-default is clear enough.

---

## Pattern: Scrape Loop with Aligned Timestamps

**Source:** `scrape/scrape.go` (prometheus)

**Category:** concurrency

**What:** Periodic scrape loops that align timestamps to intervals with a small tolerance, enabling better storage compression downstream.

**Why:** Time-series databases compress better when timestamps are regular. A 2ms tolerance on alignment means scraped data aligns to the expected grid while accommodating real-world jitter.

**Example:**

```go
var ScrapeTimestampTolerance = 2 * time.Millisecond
var AlignScrapeTimestamps = true

// In scrape loop: if the actual scrape time is within tolerance
// of the expected timestamp, snap to the grid
```

**When to use:** Any periodic data collection where downstream storage benefits from timestamp regularity. Metrics, heartbeats, polling loops.

**When NOT to use:** Event-driven data where timestamps must reflect actual occurrence time. Audit logs, user actions, financial transactions.