elixir-patterns/patterns/process-design.md
Aaron Weiker 4ea9a884aa docs: idiomatic Elixir and Phoenix patterns with source citations
Extracted patterns, conventions, and code smells directly from the
Elixir and Phoenix source code with file path and line number citations.

Covers: GenServer, error handling, data transforms, process design,
testing, documentation, typespecs, macros, behaviours, module organization,
Phoenix-specific patterns, framework deviations, and anti-patterns.
2026-04-29 22:50:12 -07:00


Process Design Patterns — From the Elixir Source

Analysis of lib/elixir/lib/supervisor.ex, lib/elixir/lib/dynamic_supervisor.ex, lib/elixir/lib/task.ex, lib/elixir/lib/task/supervisor.ex, lib/elixir/lib/process.ex, and lib/elixir/lib/registry.ex.


Pattern 1: Static vs Dynamic Supervision — Choose the Right Tool

Source: lib/elixir/lib/supervisor.ex:1-20 vs lib/elixir/lib/dynamic_supervisor.ex:1-20

What it does: Elixir provides two distinct supervisor types:

  • Supervisor: for mostly static children listed when the supervisor starts, started in the given order
  • DynamicSupervisor: for children started on demand at runtime, with no ordering between them

Why: Static supervisors guarantee startup order (critical for dependencies like "DB pool must start before web server"). Dynamic supervisors optimize for scale — they can hold millions of children using efficient data structures and shut down concurrently.

Anti-pattern: Using a Supervisor when children are created dynamically (e.g., one process per WebSocket connection). You'll hit performance issues and ordering semantics you don't need. Conversely, using DynamicSupervisor for fixed infrastructure (DB pool, PubSub) loses startup order guarantees.

Code example from source (dynamic_supervisor.ex:1-15):

# DynamicSupervisor docs explain the distinction:
# "The Supervisor module was designed to handle mostly static children
#  that are started in the given order when the supervisor starts. A
#  DynamicSupervisor starts with no children. Instead, children are
#  started on demand via start_child/2 and there is no ordering between
#  children."
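The difference can be seen side by side in a short script. This is a minimal sketch, not code from the Elixir source: the name MyApp.DynSup is hypothetical, and Agent is used only as a stand-in child.

```elixir
# Static: the child list is fixed when the supervisor starts,
# and children are started in list order.
static_children = [
  Supervisor.child_spec({Agent, fn -> :first end}, id: :first),
  Supervisor.child_spec({Agent, fn -> :second end}, id: :second)
]

{:ok, static_sup} = Supervisor.start_link(static_children, strategy: :one_for_one)

# Dynamic: the supervisor starts empty and children are added on demand.
# MyApp.DynSup is a hypothetical name chosen for this sketch.
{:ok, _dyn_sup} = DynamicSupervisor.start_link(strategy: :one_for_one, name: MyApp.DynSup)
{:ok, _agent} = DynamicSupervisor.start_child(MyApp.DynSup, {Agent, fn -> 0 end})

static_counts = Supervisor.count_children(static_sup)
dynamic_counts = DynamicSupervisor.count_children(MyApp.DynSup)
```

Both count_children functions return a map with :specs, :active, :supervisors, and :workers keys, which is a convenient way to confirm what each supervisor is currently running.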

Pattern 2: PartitionSupervisor for Scalability

Source: lib/elixir/lib/dynamic_supervisor.ex:60-95 and lib/elixir/lib/task/supervisor.ex:35-65

What it does: Both DynamicSupervisor and Task.Supervisor document the same scalability pattern: when a single supervisor becomes a bottleneck, wrap it in a PartitionSupervisor which starts N instances (one per core by default) and routes via a key.

Why: A supervisor is a single process. Under heavy start_child load, it serializes all spawn operations. PartitionSupervisor distributes the load across multiple supervisor processes, using self() as the routing key to ensure each caller consistently hits the same partition.

Anti-pattern: Creating your own load-balancing logic for supervisors, or just accepting the bottleneck. The standard library provides this pattern explicitly.

Code example from source (dynamic_supervisor.ex):

# Instead of a single DynamicSupervisor:
children = [
  {PartitionSupervisor,
   child_spec: DynamicSupervisor,
   name: MyApp.DynamicSupervisors}
]

# Start children through the partition supervisor:
DynamicSupervisor.start_child(
  {:via, PartitionSupervisor, {MyApp.DynamicSupervisors, self()}},
  {Counter, 0}
)

Pattern 3: Supervision Strategies — Choosing the Right Restart Behavior

Source: lib/elixir/lib/supervisor.ex:315-345 (Strategies section)

What it does: Three strategies model three dependency patterns:

  • :one_for_one — independent children (crash of A doesn't affect B)
  • :one_for_all — tightly coupled children (if one fails, all state is inconsistent)
  • :rest_for_one — sequential dependencies (children started after the crashed one depend on it)

Why: These map directly to runtime dependency graphs. A connection pool and its consumers are :rest_for_one — consumers can't work without the pool. Multiple independent request handlers are :one_for_one. Workers sharing a cache are :one_for_all — stale cache state after a crash could cause inconsistency.

Anti-pattern: Defaulting to :one_for_one everywhere without thinking about dependencies. If process B depends on process A's state and A crashes, B will be working with stale assumptions.

Code example from source (supervisor.ex docs):

# Independent workers — one crash doesn't affect others
Supervisor.start_link(children, strategy: :one_for_one)

# Tightly coupled — all must restart together for consistency
Supervisor.start_link(children, strategy: :one_for_all)

# Sequential dependency — later children depend on earlier ones
Supervisor.start_link(children, strategy: :rest_for_one)
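To see :rest_for_one in action, the sketch below kills the first child and checks that the child started after it is replaced as well. The child ids :pool and :consumer are illustrative assumptions, and Agent stands in for real workers.

```elixir
children = [
  Supervisor.child_spec({Agent, fn -> :pool end}, id: :pool),
  Supervisor.child_spec({Agent, fn -> :consumer end}, id: :consumer)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :rest_for_one)

# Helper: look up the current pid of a child by its id.
pid_of = fn id ->
  {^id, pid, _type, _modules} = List.keyfind(Supervisor.which_children(sup), id, 0)
  pid
end

old_consumer = pid_of.(:consumer)
ref = Process.monitor(old_consumer)

# Crash the first child. With :rest_for_one, every child started
# after it is terminated and restarted too.
Process.exit(pid_of.(:pool), :kill)

receive do
  {:DOWN, ^ref, :process, ^old_consumer, _reason} -> :ok
after
  1_000 -> raise "consumer was not restarted"
end

new_consumer = pid_of.(:consumer)
```

With :one_for_one the consumer would have kept its original pid, which is exactly the stale-assumption risk described above.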

Pattern 4: Restart Intensity (max_restarts / max_seconds)

Source: lib/elixir/lib/supervisor.ex:309-313, lib/elixir/lib/dynamic_supervisor.ex:730-758 (implementation)

What it does: Supervisors track restart frequency. If more than max_restarts restarts occur within max_seconds (counted across all of the supervisor's children, not per child), the supervisor itself shuts down, escalating the failure to its parent. Defaults: 3 restarts in 5 seconds.

Why: Prevents infinite restart loops that waste CPU and mask bugs. If a child keeps crashing within seconds, it's a systemic problem that the current supervisor level can't fix. Escalating to the parent allows a higher-level strategy to respond (perhaps restarting the entire subsystem with fresh state).

Anti-pattern: Setting max_restarts extremely high to "prevent crashes." This hides bugs and wastes resources. Let supervisors escalate — that's the point of the hierarchy.

Code example from source (dynamic_supervisor.ex internal logic):

defp add_restart(state) do
  %{max_seconds: max_seconds, max_restarts: max_restarts, restarts: restarts} = state

  now = :erlang.monotonic_time(1)
  restarts = add_restart([now | restarts], now, max_seconds)
  state = %{state | restarts: restarts}

  if length(restarts) <= max_restarts do
    {:ok, state}
  else
    {:shutdown, state}
  end
end

defp add_restart(restarts, now, period) do
  for then <- restarts, now <= then + period, do: then
end
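The sliding-window check is easy to exercise in isolation. The following is a sketch under stated assumptions: RestartWindow is a hypothetical module, and timestamps are passed in explicitly instead of being read from the monotonic clock, so the behavior is deterministic.

```elixir
defmodule RestartWindow do
  # Records a restart at `now` and reports whether the restart
  # intensity is still within limits.
  def add_restart(restarts, now, max_restarts, max_seconds) do
    # Keep only timestamps that still fall inside the window ending at `now`.
    restarts = for then <- [now | restarts], now <= then + max_seconds, do: then

    if length(restarts) <= max_restarts do
      {:ok, restarts}
    else
      {:shutdown, restarts}
    end
  end
end

# With the defaults (3 restarts in 5 seconds), a fourth restart
# inside the window escalates.
{:ok, r} = RestartWindow.add_restart([], 0, 3, 5)
{:ok, r} = RestartWindow.add_restart(r, 1, 3, 5)
{:ok, r} = RestartWindow.add_restart(r, 2, 3, 5)
burst = RestartWindow.add_restart(r, 3, 3, 5)

# A restart long after the earlier ones finds an empty window.
spread_out = RestartWindow.add_restart(r, 100, 3, 5)
```

Because old timestamps are filtered out on every restart, the list never grows beyond the number of restarts that fit in one window.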

Pattern 5: Restart Values — :permanent vs :transient vs :temporary

Source: lib/elixir/lib/supervisor.ex:130-152 (Restart values section)

What it does: Three restart policies control what happens when a child terminates:

  • :permanent — always restart (default for GenServer/Agent/Supervisor)
  • :transient — restart only on abnormal exit (not :normal, :shutdown, {:shutdown, term})
  • :temporary — never restart (default for Task)

Why: Different processes have different lifecycle expectations. A database pool should always be running (:permanent). A task that computes a value and exits is done when it's done (:temporary). A connection process should restart on crash but not on graceful disconnect (:transient).

Anti-pattern: Making everything :permanent. If a one-shot task keeps restarting, it'll trigger restart intensity limits and take down the supervisor.

Code example from source:

# Task defaults to :temporary — intentional one-shot work
# (from task.ex:282)
def child_spec(arg) do
  %{
    id: Task,
    start: {Task, :start_link, [arg]},
    restart: :temporary
  }
end

# Customize via use:
use GenServer, restart: :transient
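The restart value ends up in the child specification map, so it can be inspected without starting anything. A minimal sketch, assuming a hypothetical MyApp.Connection module:

```elixir
defmodule MyApp.Connection do
  # Restart on crash, but not after a normal or :shutdown exit.
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The option passed to `use` becomes part of the generated child spec.
connection_spec = MyApp.Connection.child_spec([])

# Task defaults to :temporary.
task_spec = Task.child_spec(fn -> :ok end)

# A parent supervisor can override the default per child.
overridden_spec = Supervisor.child_spec({Task, fn -> :ok end}, restart: :transient)
```

Overriding with Supervisor.child_spec/2 is useful when the module's default is right in most places but one supervisor needs different lifecycle semantics.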

Pattern 6: Automatic Shutdown for Pipeline Supervisors

Source: lib/elixir/lib/supervisor.ex:349-375 (Automatic shutdown section)

What it does: Supervisors support :auto_shutdown which terminates the supervisor when significant children exit. Options: :any_significant (first significant child exits → shutdown) or :all_significant (all significant children must exit → shutdown).

Why: Models pipeline/workflow patterns where a supervisor's purpose is tied to its children's work. If all significant workers finish, the supervisor should clean up. This is useful for batch processing supervisors or connection-scoped process groups.

Anti-pattern: Manually monitoring children and calling Supervisor.stop/1. The automatic shutdown mechanism handles this cleanly within OTP semantics.

Code example (from docs):

# Only :transient and :temporary children can be marked significant
children = [
  Supervisor.child_spec({BatchWorker, args}, significant: true, restart: :transient)
]

Supervisor.start_link(children,
  strategy: :one_for_one,
  auto_shutdown: :all_significant
)
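A short runnable sketch of the same idea, using a Task as a stand-in for a batch worker. It assumes Erlang/OTP 24 or later (where :auto_shutdown is available) and traps exits in the calling process, because the supervisor is linked to it and is expected to exit with reason :shutdown.

```elixir
# The supervisor is linked to this process, so trap exits to observe
# its shutdown instead of being taken down with it.
Process.flag(:trap_exit, true)

children = [
  Supervisor.child_spec(
    {Task, fn -> Process.sleep(50) end},
    restart: :transient,
    significant: true
  )
]

{:ok, sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    auto_shutdown: :all_significant
  )

# When the only significant child finishes normally, the supervisor
# shuts itself down.
result =
  receive do
    {:EXIT, ^sup, reason} -> reason
  after
    1_000 -> :still_running
  end
```

In an application the parent would normally be another supervisor, which decides through the child's own restart value whether the finished subtree should be started again.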

Pattern 7: Task.async/await for Concurrent Value Computation

Source: lib/elixir/lib/task.ex:1-20 and lib/elixir/lib/task.ex:300-340

What it does: Task.async spawns a linked, monitored process and returns a %Task{} struct. Task.await blocks until the result arrives or times out. This is the canonical pattern for "compute a value concurrently."

Why: Tasks provide structured concurrency: the caller is linked to the task, so failures propagate naturally. The monitor reference enables a safe await with a timeout. This is explicit and composable, unlike raw spawn_link + receive.

Anti-pattern: Using spawn_link + send + receive for one-shot concurrent computation. You lose: error propagation, structured monitoring, timeout handling, and the caller-tracking metadata that tasks provide.

Code example from source:

task = Task.async(fn -> do_some_work() end)
res = do_some_other_work()
res + Task.await(task)

Key constraint from docs: "If you start an async, you must await. This is either done by calling Task.await/2 or Task.yield/2 followed by Task.shutdown/2."
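Both halves of that constraint can be sketched briefly: awaiting several tasks started in parallel, and giving up on a slow task with Task.yield/2 followed by Task.shutdown/2. The ranges and timeout values are arbitrary choices for illustration.

```elixir
# Fan out: start all tasks first, then await each one.
sums =
  [1..10, 11..20]
  |> Enum.map(fn range -> Task.async(fn -> Enum.sum(range) end) end)
  |> Enum.map(&Task.await/1)

# Bounded wait: yield for up to 100 ms, then shut the task down.
slow = Task.async(fn -> Process.sleep(:infinity) end)

result =
  case Task.yield(slow, 100) || Task.shutdown(slow) do
    {:ok, value} -> {:ok, value}
    {:exit, reason} -> {:error, reason}
    nil -> {:error, :timeout}
  end
```

Task.shutdown/2 unlinks the task before terminating it, so the caller survives, and it still returns the reply if one arrived while shutting down.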


Pattern 8: Task.Supervisor.async_nolink for Fault-Isolated Tasks

Source: lib/elixir/lib/task/supervisor.ex:240-320 (async_nolink docs with GenServer example)

What it does: Unlike Task.async, async_nolink spawns a task that is NOT linked to the caller. The caller monitors it and handles success/failure via handle_info. This prevents a task crash from killing the caller.

Why: In a GenServer, you often want to spawn work that might fail without taking down the server. The pattern: spawn with async_nolink, receive the result as {ref, answer}, and handle failure as {:DOWN, ref, :process, _pid, reason}.

Anti-pattern: Using Task.async inside a GenServer when the task might fail. The link means the GenServer crashes too. Use async_nolink + handle_info for resilient concurrent work.

Code example from source (task/supervisor.ex):

defmodule MyApp.Server do
  use GenServer

  def handle_call(:start_task, _from, %{ref: nil} = state) do
    task =
      Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
        # potentially failing work
      end)

    {:reply, :ok, %{state | ref: task.ref}}
  end

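  # Task completed successfully
  def handle_info({ref, _answer}, %{ref: ref} = state) do
    # The result has arrived, so the pending :DOWN message is no longer needed
    Process.demonitor(ref, [:flush])
    # Use the answer here (bind it as `answer`) before clearing the ref
    {:noreply, %{state | ref: nil}}
  end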

  # Task failed
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    {:noreply, %{state | ref: nil}}
  end
end

Pattern 9: Task Supervisor as DynamicSupervisor Specialization

Source: lib/elixir/lib/task/supervisor.ex:151-165 (start_link implementation)

What it does: Task.Supervisor is implemented directly on top of DynamicSupervisor. It stores default restart/shutdown settings in the process dictionary and delegates init to DynamicSupervisor.init.

Why: Specialization without duplication. Task.Supervisor adds task-specific behavior (caller tracking, async/nolink patterns, stream support) on top of the generic dynamic supervision infrastructure. It's a compositional pattern — build specialized supervisors by wrapping the generic one.

Anti-pattern: Re-implementing task supervision from scratch with a plain DynamicSupervisor + custom start logic. Use Task.Supervisor — it handles caller tracking, owner propagation, and proper shutdown.

Code example from source:

# Task.Supervisor.start_link delegates to DynamicSupervisor
def start_link(options \\ []) do
  {restart, options} = Keyword.pop(options, :restart)
  {shutdown, options} = Keyword.pop(options, :shutdown)
  keys = [:max_children, :max_seconds, :max_restarts]
  {sup_opts, start_opts} = Keyword.split(options, keys)
  restart_and_shutdown = {restart || :temporary, shutdown || 5000}
  DynamicSupervisor.start_link(__MODULE__, {restart_and_shutdown, sup_opts}, start_opts)
end

def init({{_restart, _shutdown} = arg, options}) do
  Process.put(__MODULE__, arg)
  DynamicSupervisor.init([strategy: :one_for_one] ++ options)
end

Pattern 10: Registry for Dynamic Process Naming and PubSub

Source: lib/elixir/lib/registry.ex:1-70 (module docs), lib/elixir/lib/registry.ex:250-270 (whereis_name via callbacks)

What it does: Registry provides two modes:

  • :unique keys — each key maps to exactly one process (name registry, process lookup)
  • :duplicate keys — each key maps to many processes (PubSub topics, event dispatch)

Processes are automatically unregistered on death. Registry integrates with GenServer naming via {:via, Registry, {registry, key}}.

Why: Solves the dynamic naming problem without atom leaks. Also provides local PubSub without external dependencies. The registry is partitioned for scalability and uses ETS for O(1) lookups.

Anti-pattern: Building custom ETS-based process registries with manual cleanup on process death. Registry handles monitor-based cleanup automatically.

Code example from source (registry.ex :via callbacks):

# :via integration — GenServer uses these callbacks
def whereis_name({registry, key}), do: whereis_name(registry, key)
def whereis_name({registry, key, _value}), do: whereis_name(registry, key)

defp whereis_name(registry, key) do
  case key_info!(registry) do
    {:unique, partitions, key_ets} ->
      key_ets = key_ets || key_ets!(registry, key, partitions)
      case lookup_second(:unique, key_ets, key) do
        {pid, _} ->
          if Process.alive?(pid), do: pid, else: :undefined
        _ ->
          :undefined
      end
  end
end
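From the caller's side, both registry modes look like the sketch below. The registry names MyApp.Names and MyApp.PubSub and the {:room, 42} key are hypothetical.

```elixir
# Unique keys: name a process with an arbitrary term instead of an atom.
{:ok, _} = Registry.start_link(keys: :unique, name: MyApp.Names)

name = {:via, Registry, {MyApp.Names, {:room, 42}}}
{:ok, agent} = Agent.start_link(fn -> :room_state end, name: name)

# The same :via tuple works anywhere a server name is accepted.
state = Agent.get(name, & &1)
entries = Registry.lookup(MyApp.Names, {:room, 42})

# Duplicate keys: many processes under one key gives local PubSub.
{:ok, _} = Registry.start_link(keys: :duplicate, name: MyApp.PubSub)
{:ok, _} = Registry.register(MyApp.PubSub, "topic", [])

Registry.dispatch(MyApp.PubSub, "topic", fn subscribers ->
  for {pid, _value} <- subscribers, do: send(pid, {:broadcast, "hello"})
end)

message =
  receive do
    {:broadcast, text} -> text
  after
    1_000 -> :none
  end
```

Registry.register/3 always registers the calling process, which is why the broadcast above arrives in the script's own mailbox.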

Pattern 11: Shutdown Semantics — Graceful Termination

Source: lib/elixir/lib/supervisor.ex:156-192 (Shutdown values section)

What it does: Three shutdown modes:

  • :brutal_kill — immediate Process.exit(child, :kill), no cleanup
  • integer (ms) — send :shutdown signal, wait N ms, then :kill
  • :infinity — wait forever (default for supervisor children)

Workers default to 5000ms. Supervisors default to :infinity (to give their children time).

Why: Graceful shutdown enables cleanup (closing connections, flushing buffers, deregistering from services). The timeout prevents hung processes from blocking system shutdown indefinitely. The hierarchy matters: supervisors need infinite time because they're waiting for their own children to shut down.

Anti-pattern: Setting :brutal_kill on processes that hold external resources (DB connections, file handles). They'll leak. Also: setting :infinity on worker processes — a bug in terminate/2 will hang your entire shutdown.

Code example from source:

# From supervisor.ex docs:
# :brutal_kill - unconditional and immediate termination
# integer >= 0 - wait that many ms after :shutdown signal
# :infinity - wait forever (recommended for supervisors)

# Worker default: 5000ms
%{shutdown: 5_000, type: :worker}

# Supervisor default: :infinity
%{shutdown: :infinity, type: :supervisor}
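The shutdown timeout only buys cleanup time if the child traps exits, because otherwise the :shutdown signal terminates it before terminate/2 can run. A minimal sketch, assuming a hypothetical MyApp.Flusher worker that reports its cleanup to a pid passed in as its argument:

```elixir
defmodule MyApp.Flusher do
  # Allow up to 10 seconds for cleanup instead of the 5-second default.
  use GenServer, shutdown: 10_000

  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  @impl true
  def init(notify) do
    # Without trapping exits, the supervisor's :shutdown signal kills
    # the process and terminate/2 is never invoked.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(reason, notify) do
    # Stand-in for flushing buffers or closing connections.
    send(notify, {:flushed, reason})
    :ok
  end
end

spec = MyApp.Flusher.child_spec(self())

{:ok, sup} = Supervisor.start_link([{MyApp.Flusher, self()}], strategy: :one_for_one)
:ok = Supervisor.stop(sup)

result =
  receive do
    {:flushed, reason} -> reason
  after
    1_000 -> :no_cleanup
  end
```

If terminate/2 takes longer than the configured timeout, the supervisor falls back to an unconditional kill, so the cleanup should be sized to fit well inside it.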

Pattern 12: DynamicSupervisor Internal State — Struct with Restart Tracking

Source: lib/elixir/lib/dynamic_supervisor.ex:165-178 (defstruct)

What it does: The DynamicSupervisor uses a struct for its GenServer state with explicit fields: children (map of pid → child spec), restarts (list of timestamps for rate limiting), and configuration fields.

Why: This shows the Elixir team's state design philosophy: use a struct with named fields, not a bare map or tuple. The children field uses a %{} map keyed by PID for O(1) lookup/deletion on child exit. The restarts list uses a simple sliding-window approach for restart intensity.

Anti-pattern: Using a list for children lookup (O(n) on every EXIT message), or using a tuple-based state that requires positional knowledge.

Code example from source:

defstruct [
  :args,
  :extra_arguments,
  :mod,
  :name,
  :strategy,
  :max_children,
  :max_restarts,
  :max_seconds,
  children: %{},
  restarts: []
]

Pattern 13: Asynchronous Restart Retries via :try_again

Source: lib/elixir/lib/dynamic_supervisor.ex:710-758 (restart_child and related functions)

What it does: When a child fails to restart (its start function returns an error), DynamicSupervisor doesn't give up immediately. It stores the child as {:restarting, child}, sends itself a {:"$gen_restart", pid} message, and retries when that message is processed. There is no delay or backoff between attempts; routing the retry through the mailbox simply keeps the supervisor from blocking on a transiently failing child.

Why: During network partitions or resource exhaustion, a child might fail to start immediately but succeed moments later. Retrying via a message lets the supervisor handle other requests (start_child calls, exits of other children) between attempts. Each retry still passes through add_restart (see Pattern 4), so it counts toward restart intensity: a child that keeps failing to start eventually exhausts max_restarts and the supervisor shuts down.

Anti-pattern: Retrying a failing start in a synchronous loop inside the supervisor, or expecting the supervisor to back off for you. If a dependency can be down for longer than the restart window, handle reconnection with backoff inside the child rather than failing in its start function.

Code example from source:

defp restart_child(:one_for_one, current_pid, child, state) do
  {{m, f, args} = mfa, restart, shutdown, type, modules} = child
  %{extra_arguments: extra} = state

  case start_child(m, f, extra ++ args) do
    {:ok, pid, _} ->
      state = delete_child(current_pid, state)
      {:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)}

    {:ok, pid} ->
      state = delete_child(current_pid, state)
      {:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)}

    :ignore ->
      {:ok, delete_child(current_pid, state)}

    {:error, reason} ->
      report_error(:start_error, reason, {:restarting, current_pid}, child, state)
      state = put_in(state.children[current_pid], {:restarting, child})
      {:try_again, state}
  end
end

Pattern 14: $ancestors and $callers — Process Lineage Tracking

Source: lib/elixir/lib/task.ex:227-268 (Ancestor and Caller Tracking section)

What it does: Elixir uses two process dictionary keys for lineage:

  • $ancestors — the supervision hierarchy (who spawned/supervises this process)
  • $callers — the logical call chain (who requested this work)

These are different! A task's ancestor is its supervisor, but its caller is the process that initiated the async operation.

Why: Debugging and tracing. When a task crashes, the log includes both its supervisor (for restart context) and its caller (for business logic context). This dual tracking is essential for understanding failures in systems where the spawner and supervisor are different processes.

Anti-pattern: Ignoring caller tracking when building custom process spawning. If you build something like Task.Supervisor, propagate $callers so crash logs are meaningful.

Code example from source (task.ex):

defp get_callers(owner) do
  case :erlang.get(:"$callers") do
    [_ | _] = list -> [owner | list]
    _ -> [owner]
  end
end

# Task.start_link propagates both owner and callers
def start_link(module, function, args)
    when is_atom(module) and is_atom(function) and is_list(args) do
  mfa = {module, function, args}
  Task.Supervised.start_link(get_owner(self()), get_callers(self()), mfa)
end
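The caller chain can be observed directly by reading the process dictionary inside nested tasks. This is a small illustrative sketch; it only asserts the head of the list, since the calling process may already carry callers of its own.

```elixir
parent = self()

outer =
  Task.async(fn ->
    # A task started from inside another task extends the caller chain.
    inner = Task.async(fn -> Process.get(:"$callers") end)
    {self(), Process.get(:"$callers"), Task.await(inner)}
  end)

{outer_pid, outer_callers, inner_callers} = Task.await(outer)

# outer_callers starts with the process that called Task.async/1;
# inner_callers starts with the outer task, followed by that process.
```

Libraries that keep per-process state, such as test sandboxes and mocks, commonly walk this list to find the process on whose behalf the work is running.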

Pattern 15: GenServer.reply/2 for Deferred Responses

Source: lib/elixir/lib/gen_server.ex:620-640 (callback docs), lib/elixir/lib/gen_server.ex:1328-1346 (reply/2 function)

What it does: A handle_call can return {:noreply, state} without replying, then later call GenServer.reply(from, response) from any process. This decouples request receipt from response delivery.

Why: Three use cases (from the source):

  1. Reply before returning (response known, but need to do cleanup after)
  2. Reply after returning (response not yet available, computed asynchronously)
  3. Reply from another process (delegate work to a task)

This enables non-blocking request handling in GenServers that would otherwise be bottlenecked.

Anti-pattern: Spawning a task to do work and then having the GenServer block on Task.await inside handle_call. This defeats the purpose — use reply/2 from the task instead.

Code example from source:

def handle_call(:reply_in_one_second, from, state) do
  Process.send_after(self(), {:reply, from}, 1_000)
  {:noreply, state}
end

def handle_info({:reply, from}, state) do
  GenServer.reply(from, :one_second_has_passed)
  {:noreply, state}
end
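The third use case, replying from another process, might look like the sketch below. MyApp.SlowLookup is a hypothetical module, and Task.start/1 is used only to keep the example self-contained; under a supervision tree, a Task.Supervisor would be the more robust choice.

```elixir
defmodule MyApp.SlowLookup do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def lookup(server, key), do: GenServer.call(server, {:lookup, key})

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:lookup, key}, from, state) do
    # Hand the slow work to another process and return immediately,
    # so the server can keep handling other messages.
    Task.start(fn ->
      Process.sleep(50)
      GenServer.reply(from, {:ok, String.upcase(key)})
    end)

    {:noreply, state}
  end
end

{:ok, server} = MyApp.SlowLookup.start_link()

# Two concurrent callers are served in parallel rather than one after the other.
tasks = for key <- ["a", "b"], do: Task.async(fn -> MyApp.SlowLookup.lookup(server, key) end)
replies = Enum.map(tasks, &Task.await/1)
```

If the spawned process crashes before replying, the caller simply hits the GenServer.call timeout, so this pattern works best with a timeout that reflects the expected work.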

Pattern 16: Process.alias for Safe Request/Response

Source: lib/elixir/lib/process.ex:32-95 (Aliases section)

What it does: Process aliases (Erlang/OTP 24+) provide a deactivatable reference for receiving replies. After sending a request with an alias as the reply address, you can deactivate the alias if you no longer want the response — any messages sent to a deactivated alias are silently dropped.

Why: Solves the "late reply" problem. In request/response patterns, if the requester times out and moves on, a late reply to its PID could confuse future receive blocks. With aliases, you deactivate after timeout and the late reply harmlessly vanishes.

Anti-pattern: Using bare PIDs for reply addresses in protocols where timeouts are possible. Late messages pollute the mailbox.

Code example from source:

server = spawn(&server/0)

source_alias = Process.alias()
send(server, {:ping, source_alias})

receive do
  :pong -> :pong
end

# Deactivate — late replies to this alias are silently dropped
Process.unalias(source_alias)
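The late-reply case only shows up once a timeout is involved. The sketch below uses a deliberately slow server (an assumption made for this example) that replies after the client has given up; the sleep durations are arbitrary.

```elixir
# A slow server: it answers 100 ms after receiving the request.
server =
  spawn(fn ->
    receive do
      {:ping, reply_to} ->
        Process.sleep(100)
        send(reply_to, :pong)
    end
  end)

reply_alias = Process.alias()
send(server, {:ping, reply_alias})

first =
  receive do
    :pong -> :pong
  after
    10 ->
      # Give up and deactivate the alias so a late reply cannot reach us.
      Process.unalias(reply_alias)
      :timeout
  end

# Wait long enough for the server to send its late reply.
Process.sleep(200)

late =
  receive do
    :pong -> :leaked
  after
    0 -> :dropped
  end
```

Had the server been given self() instead of the alias, the second receive would have found the stale :pong in the mailbox.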

Pattern 17: Registry Partitioning Strategies

Source: lib/elixir/lib/registry.ex:310-350 (start_link partitioning docs)

What it does: Duplicate registries support two partitioning strategies:

  • {:duplicate, :pid} (default) — groups entries by the registering process's PID. Good for few keys with many entries (e.g., one PubSub topic with many subscribers).
  • {:duplicate, :key} — groups entries by key. Good for many keys with few entries each (e.g., many topics with few subscribers).

Why: The partitioning strategy determines which partition(s) need to be scanned during lookup. With :key partitioning, a key lookup hits exactly one partition (O(1) partitions). With :pid partitioning, key lookups must scan all partitions but process-based operations (unregister on death) are localized.

Anti-pattern: Using default :pid partitioning with millions of unique keys and frequent lookups. Each lookup scans all partitions. Switch to {:duplicate, :key}.

Code example from source:

# Many topics, few subscribers each — use key partitioning
Registry.start_link(
  keys: {:duplicate, :key},
  name: MyApp.TopicRegistry,
  partitions: System.schedulers_online()
)

# Few topics, many subscribers — use pid partitioning (default)
Registry.start_link(
  keys: :duplicate,
  name: MyApp.BroadcastRegistry,
  partitions: System.schedulers_online()
)

Pattern 18: init/1 Return Values — The Full Spectrum

Source: lib/elixir/lib/gen_server.ex:498-545 (init callback spec)

What it does: init/1 supports six return values:

  • {:ok, state} — normal start
  • {:ok, state, timeout} — start with idle timeout
  • {:ok, state, :hibernate} — start and immediately hibernate (GC + compact heap)
  • {:ok, state, {:continue, arg}} — start then immediately invoke handle_continue
  • :ignore — don't start, supervisor treats as successful (child can be restarted later)
  • {:stop, reason} — initialization failed

Why: Each covers a real scenario:

  • :ignore — process is disabled by configuration but might be enabled later via Supervisor.restart_child/2
  • {:stop, reason} — unrecoverable initialization failure
  • :hibernate — process will be idle for a long time, minimize memory
  • {:continue, _} — split fast init from slow setup

Anti-pattern: Using {:stop, reason} when :ignore is appropriate. If a feature is disabled by config, :ignore keeps the child spec in the supervisor for later activation. {:stop, reason} signals a real failure.
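The three most commonly confused return values can be compared in one module. This is a sketch with a hypothetical MyApp.Feature module and made-up :enabled and :config options; GenServer.start/2 is used instead of start_link so that the {:stop, reason} case does not send an exit signal to the calling process.

```elixir
defmodule MyApp.Feature do
  use GenServer

  @impl true
  def init(opts) do
    cond do
      # Disabled by configuration: not an error, just don't start.
      not Keyword.get(opts, :enabled, true) ->
        :ignore

      # Required configuration is missing: a real failure.
      Keyword.get(opts, :config) == nil ->
        {:stop, :missing_config}

      # Return quickly and finish the slow setup in handle_continue/2.
      true ->
        {:ok, %{config: opts[:config], loaded: false}, {:continue, :load}}
    end
  end

  @impl true
  def handle_continue(:load, state) do
    {:noreply, %{state | loaded: true}}
  end
end

disabled = GenServer.start(MyApp.Feature, enabled: false)
broken = GenServer.start(MyApp.Feature, [])
{:ok, pid} = GenServer.start(MyApp.Feature, config: %{})

# handle_continue/2 runs before any other message is processed.
state = :sys.get_state(pid)
```

Under a supervisor, the :ignore result leaves the child unstarted without counting as a failure, while {:stop, reason} makes the supervisor's own start fail.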