elixir-patterns/sources/oban.md
Rodin 74101b513c chore: merge elixir-conventions and oban-conventions into sources/
Absorbed content from rodin/elixir-conventions and rodin/oban-conventions
into a sources/ directory. These are reference material — descriptive,
not prescriptive. Patterns that prove broadly applicable get promoted
into patterns/.

Part of taxonomy cleanup (issue #4):
- Pattern = prescriptive, follow these
- Convention/Source = reference, study for ideas

The original repos can now be archived.
2026-05-07 18:01:42 -07:00

Patterns Extracted from oban-bg/oban

Pattern: Plugin as Behaviour + GenServer

Source: lib/oban/plugin.ex Category: plugin

What: Define a plugin interface as a behaviour with start_link/1 and validate/1 callbacks. Plugins must be OTP-compliant (GenServer/Agent). The host supervises them.

Why: Extensibility without coupling. Oban can start any module that satisfies the behaviour (pruning, cron, lifeline) without knowing implementation details. The validate/1 callback makes a misconfigured plugin fail fast at startup instead of later, in the middle of processing jobs.

Example:

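@callback start_link([option()]) :: GenServer.on_start()
@callback validate([option()]) :: :ok | {:error, String.t()}
# An optional callback must still be declared before it is listed as optional
@callback format_logger_output(Config.t(), map()) :: map()
@optional_callbacks [format_logger_output: 2]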

When to use: When your application needs a plugin system where third parties add behavior. The behaviour gives every plugin the same compiler-checked contract (a missing callback produces a compile-time warning); supervision provides fault isolation.

When NOT to use: Internal modules that you control. Behaviours add ceremony — if there is only one implementation, use a module directly.


Pattern: Structured Telemetry Spans

Source: lib/oban/telemetry.ex Category: telemetry

What: Emit telemetry events as spans with start/stop/exception structure. Every operation (job execution, engine calls, plugin work) follows the same three-event pattern with consistent metadata shapes.

Why: Uniform observability. Any monitoring tool (AppSignal, Datadog, custom logger) can hook into the same event structure. The span pattern (start → stop|exception) enables latency tracking, error rates, and resource usage measurement without custom instrumentation per feature.

Example:

# Event names follow [:oban, component, phase], or
# [:oban, component, action, phase] when a component has several actions
[:oban, :job, :start]
[:oban, :job, :stop]      # measurements: duration, memory
[:oban, :job, :exception] # + kind, reason, stacktrace

[:oban, :engine, :fetch_jobs, :start]
[:oban, :engine, :fetch_jobs, :stop]
[:oban, :engine, :fetch_jobs, :exception]

When to use: Any library or application that wants observability without coupling to a specific monitoring backend. The pattern works for database queries, HTTP requests, background jobs, cache operations.

When NOT to use: Ultra-hot paths where telemetry overhead matters (millions of events/second). Use sampling or skip entirely.


Pattern: Engine Abstraction for Backend Swap

Source: lib/oban/engine.ex Category: engine

What: Define a behaviour (Engine) with callbacks for all database operations (insert, fetch, complete, etc.). Ship multiple implementations (Basic/Inline/Lite) that swap at config time.

Why: Different environments need different backends: Postgres for production, SQLite for development, inline (jobs run immediately in the calling process, with no database round trip) for testing. The engine abstraction lets you swap without changing application code.

Example:

# conf(), opts(), and meta() are types defined alongside the behaviour
@callback init(conf(), opts()) :: {:ok, meta()} | {:error, term()}
@callback insert_job(conf(), Job.changeset(), opts()) :: {:ok, Job.t()}
@callback fetch_jobs(conf(), meta(), opts()) :: {:ok, {meta(), [Job.t()]}}
@callback complete_job(conf(), Job.t()) :: :ok

When to use: When your system needs to support multiple storage backends, or when testing requires a fundamentally different execution model (synchronous vs async).

When NOT to use: Single-backend applications. The abstraction layer adds complexity that is only justified when you actually swap implementations.


Pattern: Keyword Validation with Reduce-While

Source: lib/oban/validation.ex Category: config

What: Validate keyword options by iterating with Enum.reduce_while/3 and a validator function. Stop at first error. Return :ok or {:error, reason}.

Why: Keyword lists are the standard Elixir config format. Validating them procedurally (nested if/case) gets messy. The reduce-while + validator pattern is composable: each option validates independently, errors short-circuit, and the validator function can be swapped or extended.

Example:

def validate(opts, validator) when is_list(opts) do
  Enum.reduce_while(opts, :ok, fn opt, acc ->
    case validator.(opt) do
      :ok -> {:cont, acc}
      {:error, _} = error -> {:halt, error}
    end
  end)
end

When to use: Any public API that accepts keyword options from users. Libraries, GenServer init, plugin configs.

When NOT to use: Internal functions where the caller is trusted. Also avoid for deeply nested configs — use schema-based validation (NimbleOptions, Ecto embedded schemas) instead.


Pattern: Testing Mode Toggle

Source: lib/oban/testing.ex, lib/oban/config.ex Category: testing

What: Support a testing: config option that switches execution mode: :disabled (production), :inline (execute immediately in caller process), :manual (enqueue but don't execute — assert on DB state).

Why: Background job systems are inherently async, which makes testing hard. The mode toggle gives you: (1) inline for unit tests that need synchronous execution, (2) manual for integration tests that verify enqueueing without side effects.

Example:

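# In test config (e.g. config/test.exs):
config :my_app, Oban, testing: :manual

# In tests:
use Oban.Testing, repo: MyApp.Repo

# Run a worker directly in the test process, without enqueueing anything
perform_job(MyWorker, %{id: 1})

# Separately: assert that the code under test enqueued a job (it is not executed)
assert_enqueued worker: MyWorker, args: %{id: 1}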

When to use: Any async system that needs deterministic testing — job queues, event buses, notification systems. The testing mode replaces "sleep and hope" with explicit control.

When NOT to use: Synchronous systems that are already deterministic. Also avoid if the mode toggle leaks into production code paths (keep it config-only, not conditional logic scattered through business code).


Pattern: Stopper for Goroutine Lifecycle (CockroachDB)

Source: pkg/util/stop/stopper.go (cockroachdb) Category: concurrency

What: A dedicated struct that manages the lifecycle of all goroutines in a component: tracks active tasks, refuses new work during shutdown (quiesce), waits for completion, then runs closers.

Why: In distributed systems, clean shutdown is critical. You need to: (1) stop accepting new work, (2) finish in-flight work, (3) release resources in order. The Stopper centralizes this instead of scattering shutdown logic across every goroutine.

Example:

type Stopper struct {
    quiescer chan struct{} // closed when quiescing
    stopped  chan struct{} // closed when fully stopped
    mu struct {
        syncutil.RWMutex
        _numTasks int32
        quiescing, stopping bool
        closers []Closer
    }
}

// RunAsyncTask refuses new work during quiesce
func (s *Stopper) RunAsyncTask(ctx context.Context,
    taskName string, f func(context.Context)) error {
    if !s.addTask() {
        return ErrUnavailable
    }
    go func() {
        defer s.decTask()
        f(ctx)
    }()
    return nil
}
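
The excerpt above leaves out the bookkeeping helpers (addTask, decTask) and the shutdown path. As a rough, self-contained sketch of the same three phases (refuse new work, wait for in-flight work, run closers), something like the following could work. MiniStopper, AddCloser, Stop, and exampleShutdown are hypothetical names for illustration, not CockroachDB's API, and the sketch uses a sync.WaitGroup rather than the task counter shown in the struct above.

```go
package main

import (
	"context"
	"errors"
	"sync"
)

// ErrUnavailable is returned once shutdown has begun.
var ErrUnavailable = errors.New("stopper: quiescing, no new tasks accepted")

// MiniStopper is a hypothetical, stripped-down stand-in for the Stopper idea.
type MiniStopper struct {
	mu        sync.Mutex
	quiescing bool
	closers   []func()
	tasks     sync.WaitGroup
}

// AddCloser registers a cleanup function to run after all tasks have finished.
func (s *MiniStopper) AddCloser(c func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.closers = append(s.closers, c)
}

// RunAsyncTask starts f in a goroutine unless shutdown has begun.
func (s *MiniStopper) RunAsyncTask(ctx context.Context, f func(context.Context)) error {
	s.mu.Lock()
	if s.quiescing {
		s.mu.Unlock()
		return ErrUnavailable
	}
	s.tasks.Add(1) // registered under the lock, so Stop cannot miss this task
	s.mu.Unlock()

	go func() {
		defer s.tasks.Done()
		f(ctx)
	}()
	return nil
}

// Stop refuses new work, waits for in-flight tasks, then runs the closers
// in registration order.
func (s *MiniStopper) Stop() {
	s.mu.Lock()
	s.quiescing = true
	closers := s.closers
	s.closers = nil
	s.mu.Unlock()

	s.tasks.Wait()
	for _, c := range closers {
		c()
	}
}

// exampleShutdown runs one task, stops, and then tries to submit late work.
func exampleShutdown() (taskRan, closerRan bool, lateErr error) {
	s := &MiniStopper{}
	s.AddCloser(func() { closerRan = true })

	ctx := context.Background()
	if err := s.RunAsyncTask(ctx, func(context.Context) { taskRan = true }); err != nil {
		return false, false, err
	}
	s.Stop() // waits for the task, then runs the closer

	lateErr = s.RunAsyncTask(ctx, func(context.Context) {})
	return taskRan, closerRan, lateErr
}
```

Registering the task while holding the mutex is the important detail: once Stop has set quiescing, no further task can be added, so the wait that follows is guaranteed to cover every accepted task.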

When to use: Any server or subsystem that spawns goroutines and needs graceful shutdown. Especially in long-running services where leaked goroutines cause resource exhaustion.

When NOT to use: Simple programs with a single main goroutine. Or when errgroup with context cancellation suffices for the shutdown coordination.


Pattern: Atomic File Operations with Suffix Convention

Source: tsdb/db.go (prometheus) Category: storage

What: Use directory suffixes (.tmp-for-creation, .tmp-for-deletion) to make multi-step file operations crash-safe. On startup, clean up any dirs with these suffixes (they represent incomplete operations).

Why: Database storage needs atomicity. If the process crashes between creating a block and finalizing it, you need to know the block is incomplete. The suffix convention makes incomplete state visible at the filesystem level without requiring a separate journal.

Example:

const (
    tmpForDeletionBlockDirSuffix = ".tmp-for-deletion"
    tmpForCreationBlockDirSuffix = ".tmp-for-creation"
)

// On startup: remove any .tmp-* dirs (incomplete ops)
// On create: write to dir.tmp-for-creation, then rename
// On delete: rename to dir.tmp-for-deletion, then remove
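
The three comment lines above describe the procedure only in outline. A minimal sketch of how it might look in practice is below. The function names (createDirAtomically, deleteDirAtomically, cleanupIncomplete, exampleCrashRecovery) are hypothetical and this is not Prometheus's implementation; in particular, it skips the fsync calls a real storage engine would need to make the rename durable.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

const (
	tmpForCreationSuffix = ".tmp-for-creation"
	tmpForDeletionSuffix = ".tmp-for-deletion"
)

// createDirAtomically fills a temporary directory and renames it to dir only
// after populate succeeds. If populate fails or the process dies, the
// suffixed directory stays behind for cleanupIncomplete to remove.
func createDirAtomically(dir string, populate func(tmpDir string) error) error {
	tmp := dir + tmpForCreationSuffix
	if err := os.RemoveAll(tmp); err != nil { // stale leftover, if any
		return err
	}
	if err := os.MkdirAll(tmp, 0o777); err != nil {
		return err
	}
	if err := populate(tmp); err != nil {
		return err
	}
	return os.Rename(tmp, dir)
}

// deleteDirAtomically marks dir as doomed with a rename, then removes it.
func deleteDirAtomically(dir string) error {
	tmp := dir + tmpForDeletionSuffix
	if err := os.Rename(dir, tmp); err != nil {
		return err
	}
	return os.RemoveAll(tmp)
}

// cleanupIncomplete removes directories under root that carry either suffix.
func cleanupIncomplete(root string) error {
	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	for _, e := range entries {
		name := e.Name()
		incomplete := strings.HasSuffix(name, tmpForCreationSuffix) ||
			strings.HasSuffix(name, tmpForDeletionSuffix)
		if e.IsDir() && incomplete {
			if err := os.RemoveAll(filepath.Join(root, name)); err != nil {
				return err
			}
		}
	}
	return nil
}

// exampleCrashRecovery creates one block, fakes a crashed creation, runs the
// startup cleanup, and reports which directories survived.
func exampleCrashRecovery() (bool, bool, error) {
	root, err := os.MkdirTemp("", "blocks")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	block := filepath.Join(root, "block-1")
	err = createDirAtomically(block, func(tmp string) error {
		return os.WriteFile(filepath.Join(tmp, "chunks"), []byte("data"), 0o666)
	})
	if err != nil {
		return false, false, err
	}

	// Simulate a crash: a directory that was created but never renamed.
	leftover := filepath.Join(root, "block-2"+tmpForCreationSuffix)
	if err := os.MkdirAll(leftover, 0o777); err != nil {
		return false, false, err
	}

	if err := cleanupIncomplete(root); err != nil {
		return false, false, err
	}
	_, blockErr := os.Stat(filepath.Join(block, "chunks"))
	_, leftoverErr := os.Stat(leftover)
	return blockErr == nil, leftoverErr == nil, nil
}
```

The finished block keeps its data while the half-created directory disappears at startup, which is the whole point of the convention: the directory name alone says whether its contents can be trusted.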

When to use: Any system that manages files/directories and needs crash consistency without a full WAL. Simpler than a write-ahead log for coarse-grained operations.

When NOT to use: When you already have a WAL or transaction log. Or for fine-grained operations where rename semantics are insufficient.


Pattern: Options as DefaultOptions() + Override

Source: tsdb/db.go (prometheus) Category: configuration

What: Provide a DefaultOptions() function returning a fully-populated config struct. Users copy and override only what they need. No nil-means-default ambiguity.

Why: Large config structs (20+ fields) are unwieldy. By providing sane defaults as a function (not a package-level var), you avoid mutation bugs and make it clear what "normal" looks like. Users only specify deviations.

Example:

func DefaultOptions() *Options {
    return &Options{
        WALSegmentSize:   wlog.DefaultSegmentSize,
        RetentionDuration: int64(15 * 24 * time.Hour / ...),
        MinBlockDuration:  DefaultBlockDuration,
        MaxBlockDuration:  DefaultBlockDuration,
        SamplesPerChunk:   DefaultSamplesPerChunk,
        // ... 20 more fields with sane defaults
    }
}

// Usage:
opts := tsdb.DefaultOptions()
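// RetentionDuration is an int64 (milliseconds), so convert the time.Duration
opts.RetentionDuration = int64(30 * 24 * time.Hour / time.Millisecond)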
db, err := tsdb.Open(dir, nil, nil, opts, nil)
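
To make the "function, not a package-level var" point concrete, here is a small sketch that contrasts the two. The Options struct, its fields, sharedDefaults, and exampleOverrides are hypothetical and much smaller than the Prometheus struct; they exist only to show where the mutation bug comes from.

```go
package main

import "time"

// Options is a hypothetical config struct for illustration only.
type Options struct {
	Retention    time.Duration
	MaxOpenFiles int
	Compress     bool
}

// DefaultOptions builds a new, fully populated value on every call, so one
// caller's overrides cannot leak into another caller's defaults.
func DefaultOptions() *Options {
	return &Options{
		Retention:    15 * 24 * time.Hour,
		MaxOpenFiles: 256,
		Compress:     true,
	}
}

// sharedDefaults is the alternative the pattern avoids: one mutable value
// that every caller aliases.
var sharedDefaults = &Options{
	Retention:    15 * 24 * time.Hour,
	MaxOpenFiles: 256,
	Compress:     true,
}

// exampleOverrides changes the retention through both routes and returns the
// defaults that the next caller would see from each.
func exampleOverrides() (fromFunc, fromVar time.Duration) {
	a := DefaultOptions()
	a.Retention = 30 * 24 * time.Hour // only this copy changes

	b := sharedDefaults
	b.Retention = time.Hour // changes the default for everyone

	return DefaultOptions().Retention, sharedDefaults.Retention
}
```

After both overrides, a fresh DefaultOptions() call still reports the original 15 days, while the shared variable now reports one hour to every later reader.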

When to use: Config structs with many fields where most users want defaults. Especially when zero-value semantics would be confusing (e.g., 0 retention = infinite? or off?).

When NOT to use: Small configs (3-4 fields) where struct literal with zero-means-default is clear enough.


Pattern: Scrape Loop with Aligned Timestamps

Source: scrape/scrape.go (prometheus) Category: concurrency

What: Periodic scrape loops that align timestamps to intervals with a small tolerance, enabling better storage compression downstream.

Why: Time-series databases compress better when timestamps are regular. A 2ms tolerance on alignment means scraped data aligns to the expected grid while accommodating real-world jitter.

Example:

var ScrapeTimestampTolerance = 2 * time.Millisecond
var AlignScrapeTimestamps = true

// In the scrape loop: if a scrape starts within the tolerance
// of its expected timestamp, snap the timestamp to the grid
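
The snapping rule described in the comment can be sketched as a small pure function. This is an illustrative reduction, not the Prometheus scrape loop: alignTimestamp and exampleAlign are hypothetical names, and the sketch treats early and late drift symmetrically, which is an assumption.

```go
package main

import "time"

// alignTimestamp returns expected when actual lies within tolerance of it,
// and actual otherwise.
func alignTimestamp(actual, expected time.Time, tolerance time.Duration) time.Time {
	drift := actual.Sub(expected)
	if drift < 0 {
		drift = -drift
	}
	if drift <= tolerance {
		return expected
	}
	return actual
}

// exampleAlign applies the rule to a slightly late and a very late scrape and
// returns the recorded timestamps as offsets from the loop start.
func exampleAlign() (snapped, kept time.Duration) {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	interval := 15 * time.Second
	tolerance := 2 * time.Millisecond
	expected := start.Add(interval)

	slightlyLate := expected.Add(1500 * time.Microsecond) // inside the tolerance
	veryLate := expected.Add(40 * time.Millisecond)       // outside the tolerance

	snapped = alignTimestamp(slightlyLate, expected, tolerance).Sub(start)
	kept = alignTimestamp(veryLate, expected, tolerance).Sub(start)
	return snapped, kept
}
```

The slightly late scrape is recorded at exactly 15s, so consecutive samples keep a constant delta, while the very late scrape keeps its real timestamp rather than misreporting when the data was collected.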

When to use: Any periodic data collection where downstream storage benefits from timestamp regularity. Metrics, heartbeats, polling loops.

When NOT to use: Event-driven data where timestamps must reflect actual occurrence time. Audit logs, user actions, financial transactions.