Process Design Patterns — From the Elixir Source
Analysis of lib/elixir/lib/supervisor.ex, lib/elixir/lib/dynamic_supervisor.ex, lib/elixir/lib/task.ex, lib/elixir/lib/task/supervisor.ex, lib/elixir/lib/process.ex, and lib/elixir/lib/registry.ex.
Contents
- Pattern 1: Static vs Dynamic Supervision — Choose the Right Tool
- Pattern 2: PartitionSupervisor for Scalability
- Pattern 3: Supervision Strategies — Choosing the Right Restart Behavior
- Pattern 4: Restart Intensity (max_restarts/max_seconds)
- Pattern 5: Restart Values — :permanent vs :transient vs :temporary
- Pattern 6: Automatic Shutdown for Pipeline Supervisors
- Pattern 7: Task.async/await for Concurrent Value Computation
- Pattern 8: Task.Supervisor.async_nolink for Fault-Tolerant Task Execution
- Pattern 9: Task Supervisor as DynamicSupervisor Specialization
- Pattern 10: Registry for Dynamic Process Naming and PubSub
- Pattern 11: Shutdown Semantics — Graceful Termination
- Pattern 12: DynamicSupervisor Internal State — Struct with Restart Tracking
- Pattern 13: Restart Logic with Exponential Backoff via :try_again
- Pattern 14: $ancestors and $callers — Process Lineage Tracking
- Pattern 15: GenServer.reply/2 for Deferred Responses
- Pattern 16: Process.alias for Safe Request/Response
- Pattern 17: Registry Partitioning Strategies
- Pattern 18: init/1 Return Values — The Full Spectrum
Pattern 1: Static vs Dynamic Supervision — Choose the Right Tool
Source: lib/elixir/lib/supervisor.ex#L1 vs lib/elixir/lib/dynamic_supervisor.ex#L1
What it does: Elixir provides two distinct supervisor types:
- Supervisor — for static children that are known when the supervisor starts and are started in a defined order
- DynamicSupervisor — for children started on demand at runtime, with no ordering guarantees
Why: Static supervisors guarantee startup order (critical for dependencies like "DB pool must start before web server"). Dynamic supervisors optimize for scale — they can hold millions of children using efficient data structures and shut down concurrently.
Anti-pattern: Using a Supervisor when children are created dynamically (e.g., one process per WebSocket connection). You'll hit performance issues and ordering semantics you don't need. Conversely, using DynamicSupervisor for fixed infrastructure (DB pool, PubSub) loses startup order guarantees.
Code example from source (dynamic_supervisor.ex:1-15):
# DynamicSupervisor docs explain the distinction:
# "The Supervisor module was designed to handle mostly static children
# that are started in the given order when the supervisor starts. A
# DynamicSupervisor starts with no children. Instead, children are
# started on demand via start_child/2 and there is no ordering between
# children."
When to Use
Triggers:
- You know at startup exactly which children need to run (DB pool, PubSub, caches)
- Children have ordering dependencies (pool must start before consumers)
- You're building application-level infrastructure in your supervision tree
Example — before:
# Using DynamicSupervisor for fixed infrastructure — wrong tool
defmodule MyApp.Application do
def start(_type, _args) do
children = [{DynamicSupervisor, name: MyApp.InfraSupervisor}]
Supervisor.start_link(children, strategy: :one_for_one)
end
end
# Manually starting fixed children after supervisor boots
DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.Repo)
DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.PubSub)
DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.Endpoint)
# No startup ordering guarantee!
Example — after:
defmodule MyApp.Application do
def start(_type, _args) do
children = [
MyApp.Repo, # DB pool starts first
MyApp.PubSub, # PubSub starts after DB is ready
MyApp.Endpoint # Web server starts last
]
Supervisor.start_link(children, strategy: :rest_for_one)
end
end
When NOT to Use
Don't use this when:
- Children are created on-demand (per-connection, per-request, per-user)
- The number of children is unbounded or varies significantly at runtime
- You don't need ordering guarantees between children
Over-application example:
# Static supervisor for per-WebSocket connections — will be clunky
defmodule MyApp.ConnectionSupervisor do
use Supervisor
def init(_) do
# Children can't be listed here: they arrive at runtime!
Supervisor.init([], strategy: :one_for_one)
end
# Awkward: using Supervisor for dynamic children
def add_connection(socket) do
spec = {ConnectionHandler, socket}
Supervisor.start_child(__MODULE__, spec)
end
end
Better alternative:
defmodule MyApp.ConnectionSupervisor do
use DynamicSupervisor
def start_link(_), do: DynamicSupervisor.start_link(__MODULE__, [], name: __MODULE__)
def init(_), do: DynamicSupervisor.init(strategy: :one_for_one)
def add_connection(socket) do
DynamicSupervisor.start_child(__MODULE__, {ConnectionHandler, socket})
end
end
Why: Static supervisors excel at ordered, fixed infrastructure. DynamicSupervisor excels at scale with runtime-determined children. Pick based on whether the children are known when the supervisor starts.
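The contrast can be tried without any application modules. The following is a minimal sketch, assuming plain Agent processes as stand-ins for real children; the ids :config and :cache are hypothetical names chosen for illustration.

```elixir
# Static supervisor: the child list is fixed when the supervisor starts,
# and children are started in list order.
{:ok, static_sup} =
  Supervisor.start_link(
    [
      Supervisor.child_spec({Agent, fn -> :config end}, id: :config),
      Supervisor.child_spec({Agent, fn -> :cache end}, id: :cache)
    ],
    strategy: :one_for_one
  )

# Dynamic supervisor: starts empty, children are added on demand.
{:ok, dyn_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, _pid} = DynamicSupervisor.start_child(dyn_sup, {Agent, fn -> 0 end})
{:ok, _pid} = DynamicSupervisor.start_child(dyn_sup, {Agent, fn -> 1 end})

static_count = Supervisor.count_children(static_sup).active
dynamic_count = DynamicSupervisor.count_children(dyn_sup).active
```

Note that the dynamic children need no unique ids, while the static ones do; that difference alone is often a hint about which supervisor fits.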
Pattern 2: PartitionSupervisor for Scalability
Source: lib/elixir/lib/dynamic_supervisor.ex#L60 and lib/elixir/lib/task/supervisor.ex#L35
What it does: Both DynamicSupervisor and Task.Supervisor document the same scalability pattern: when a single supervisor becomes a bottleneck, wrap it in a PartitionSupervisor, which starts N instances (System.schedulers_online() by default, typically one per core) and routes each call via a key.
Why: A supervisor is a single process. Under heavy start_child load, it serializes all spawn operations. PartitionSupervisor distributes the load across multiple supervisor processes, using self() as the routing key to ensure each caller consistently hits the same partition.
Anti-pattern: Creating your own load-balancing logic for supervisors, or just accepting the bottleneck. The standard library provides this pattern explicitly.
Code example from source (dynamic_supervisor.ex):
# Instead of a single DynamicSupervisor:
children = [
{PartitionSupervisor,
child_spec: DynamicSupervisor,
name: MyApp.DynamicSupervisors}
]
# Start children through the partition supervisor:
DynamicSupervisor.start_child(
{:via, PartitionSupervisor, {MyApp.DynamicSupervisors, self()}},
{Counter, 0}
)
When to Use
Triggers:
- A single DynamicSupervisor or Task.Supervisor is a bottleneck under high start_child load
- You're seeing latency spikes when spawning tasks/children under concurrency
- Profiling shows the supervisor process has a large message queue
Example — before:
# Single supervisor — serializes all spawn operations
defmodule MyApp.TaskRunner do
def run_async(fun) do
Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fun)
end
end
# Under 10k concurrent requests, this single process becomes a bottleneck
Example — after:
# Partitioned — distributes load across N supervisor processes
defmodule MyApp.Application do
def start(_type, _args) do
children = [
{PartitionSupervisor,
child_spec: Task.Supervisor,
name: MyApp.TaskSupervisors}
]
Supervisor.start_link(children, strategy: :one_for_one)
end
end
defmodule MyApp.TaskRunner do
def run_async(fun) do
Task.Supervisor.async_nolink(
{:via, PartitionSupervisor, {MyApp.TaskSupervisors, self()}},
fun
)
end
end
When NOT to Use
Don't use this when:
- You have low spawn rates and profiling shows no mailbox buildup; a single supervisor is fine
- You need ordering guarantees between children (partitioning breaks ordering)
- The supervisor has few children total (partitioning adds overhead for no gain)
Over-application example:
# Partitioning a supervisor that starts 5 children at boot — pointless
{PartitionSupervisor,
child_spec: DynamicSupervisor,
name: MyApp.ConfigSupervisors,
partitions: System.schedulers_online()}
# With 16 schedulers online that is 16 supervisors for 5 children: extra processes, zero benefit
Better alternative:
# Just use a plain DynamicSupervisor
{DynamicSupervisor, name: MyApp.ConfigSupervisor}
Why: PartitionSupervisor exists for high-throughput spawn scenarios. If you're not hitting supervisor mailbox limits, the extra processes and routing logic add complexity without benefit.
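To see the routing in action, here is a small sketch that assumes Elixir 1.14 or later (where PartitionSupervisor is available). The name Demo.DynamicSupervisors, the fixed partition count of 4, and the integer routing keys are illustrative choices, not taken from the source.

```elixir
# Start 4 DynamicSupervisor partitions under one PartitionSupervisor.
{:ok, _sup} =
  PartitionSupervisor.start_link(
    child_spec: DynamicSupervisor,
    name: Demo.DynamicSupervisors,
    partitions: 4
  )

# Build the :via tuple that routes a call to the partition for `key`.
via = fn key -> {:via, PartitionSupervisor, {Demo.DynamicSupervisors, key}} end

# Start 8 children, each routed by its own key.
for key <- 1..8 do
  {:ok, _pid} = DynamicSupervisor.start_child(via.(key), {Agent, fn -> key end})
end

partition_count = PartitionSupervisor.partitions(Demo.DynamicSupervisors)

# Count the active children held by each partition.
per_partition =
  for {_id, pid, _type, _mods} <- PartitionSupervisor.which_children(Demo.DynamicSupervisors) do
    DynamicSupervisor.count_children(pid).active
  end

total_children = Enum.sum(per_partition)
```

In real code the routing key is usually self(), as in the examples above, so that one caller always talks to the same partition.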
Pattern 3: Supervision Strategies — Choosing the Right Restart Behavior
Source: lib/elixir/lib/supervisor.ex#L315 (Strategies section)
What it does: Three strategies model three dependency patterns:
- :one_for_one — independent children (crash of A doesn't affect B)
- :one_for_all — tightly coupled children (if one fails, all state is inconsistent)
- :rest_for_one — sequential dependencies (children started after the crashed one depend on it)
Why: These map directly to runtime dependency graphs. A connection pool and its consumers are :rest_for_one — consumers can't work without the pool. Multiple independent request handlers are :one_for_one. Workers sharing a cache are :one_for_all — stale cache state after a crash could cause inconsistency.
Anti-pattern: Defaulting to :one_for_one everywhere without thinking about dependencies. If process B depends on process A's state and A crashes, B will be working with stale assumptions.
Code example from source (supervisor.ex docs):
# Independent workers — one crash doesn't affect others
Supervisor.start_link(children, strategy: :one_for_one)
# Tightly coupled — all must restart together for consistency
Supervisor.start_link(children, strategy: :one_for_all)
# Sequential dependency — later children depend on earlier ones
Supervisor.start_link(children, strategy: :rest_for_one)
When to Use
Triggers:
- You're deciding how a supervisor should react when one child fails
- Children share state or resources that become inconsistent if one crashes
- You have a pipeline: A feeds B feeds C
Example — before:
# Defaulting to :one_for_one without thinking about dependencies
defmodule MyApp.DataPipeline do
use Supervisor
def init(_) do
children = [
MyApp.DataSource, # Produces data
MyApp.Transformer, # Transforms data (holds reference to DataSource)
MyApp.Sink # Writes transformed data
]
# If DataSource crashes, Transformer has a stale reference!
Supervisor.init(children, strategy: :one_for_one)
end
end
Example — after:
defmodule MyApp.DataPipeline do
use Supervisor
def init(_) do
children = [
MyApp.DataSource, # If this crashes...
MyApp.Transformer, # ...these must restart too (stale refs)
MyApp.Sink
]
# rest_for_one: crash of DataSource restarts Transformer and Sink
Supervisor.init(children, strategy: :rest_for_one)
end
end
When NOT to Use
Don't use this when:
- Children are truly independent (HTTP request handlers, job workers)
- You're using :one_for_all because you're unsure — analyze dependencies first
- The restart strategy masks a design problem (maybe use separate supervision subtrees)
Over-application example:
# one_for_all when children are actually independent
defmodule MyApp.Workers do
use Supervisor
def init(_) do
children = [
{MyApp.EmailWorker, []},
{MyApp.SMSWorker, []},
{MyApp.PushWorker, []}
]
# If email crashes, why restart SMS and Push? They're independent!
Supervisor.init(children, strategy: :one_for_all)
end
end
Better alternative:
defmodule MyApp.Workers do
use Supervisor
def init(_) do
children = [
{MyApp.EmailWorker, []},
{MyApp.SMSWorker, []},
{MyApp.PushWorker, []}
]
# Independent workers — one crash doesn't affect others
Supervisor.init(children, strategy: :one_for_one)
end
end
Why: Restart strategies model dependency graphs. Using :one_for_all for independent workers causes unnecessary restarts, losing in-progress work for no benefit.
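The effect of :rest_for_one can be observed directly by comparing pids before and after a crash. This is a sketch under stated assumptions: three Agent processes with the hypothetical ids :source, :transformer and :sink stand in for a real pipeline.

```elixir
children = [
  Supervisor.child_spec({Agent, fn -> :source end}, id: :source),
  Supervisor.child_spec({Agent, fn -> :transformer end}, id: :transformer),
  Supervisor.child_spec({Agent, fn -> :sink end}, id: :sink)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :rest_for_one)

# Snapshot of child id => pid.
snapshot = fn ->
  for {id, pid, _type, _mods} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
end

before_crash = snapshot.()

# Kill the middle child. The supervisor then terminates :sink (started after it)
# and restarts both. Waiting for the old :sink to go down tells us the
# supervisor is already handling the restart when we query it again.
sink_ref = Process.monitor(before_crash.sink)
Process.exit(before_crash.transformer, :kill)

receive do
  {:DOWN, ^sink_ref, :process, _pid, _reason} -> :ok
after
  1_000 -> raise "old sink process was never terminated"
end

after_crash = snapshot.()

source_kept? = before_crash.source == after_crash.source
transformer_replaced? = before_crash.transformer != after_crash.transformer
sink_replaced? = before_crash.sink != after_crash.sink
```

With strategy: :one_for_one only the transformer pid would change, and with :one_for_all all three would.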
Pattern 4: Restart Intensity (max_restarts / max_seconds)
Source: lib/elixir/lib/supervisor.ex#L309, lib/elixir/lib/dynamic_supervisor.ex#L730 (implementation)
What it does: Supervisors track restart frequency. If more than max_restarts restarts occur within max_seconds, counted across all of the supervisor's children, the supervisor itself shuts down (escalating the failure to its parent). Defaults: 3 restarts in 5 seconds.
Why: Prevents infinite restart loops that waste CPU and mask bugs. If a child keeps crashing within seconds, it's a systemic problem that the current supervisor level can't fix. Escalating to the parent allows a higher-level strategy to respond (perhaps restarting the entire subsystem with fresh state).
Anti-pattern: Setting max_restarts extremely high to "prevent crashes." This hides bugs and wastes resources. Let supervisors escalate — that's the point of the hierarchy.
Code example from source (dynamic_supervisor.ex internal logic):
defp add_restart(state) do
  %{max_seconds: max_seconds, max_restarts: max_restarts, restarts: restarts} = state
  now = :erlang.monotonic_time(1)
  restarts = add_restart([now | restarts], now, max_seconds)
  state = %{state | restarts: restarts}
  if length(restarts) <= max_restarts do
    {:ok, state}
  else
    {:shutdown, state}
  end
end
defp add_restart(restarts, now, period) do
  for then <- restarts, now <= then + period, do: then
end
When to Use
Triggers:
- You want to prevent infinite restart loops from burning CPU
- You're tuning a supervisor for a child that occasionally crashes under load
- You need the supervisor to escalate when a systemic problem prevents recovery
Example — before:
# Default: 3 restarts in 5 seconds — might be too aggressive for flaky networks
defmodule MyApp.ExternalAPISupervisor do
use Supervisor
def init(_) do
children = [{MyApp.APIClient, []}]
# Default max_restarts: 3, max_seconds: 5
# A network blip that causes a 4th crash within 5 seconds exceeds the limit → supervisor exits → failure escalates to its parent
Supervisor.init(children, strategy: :one_for_one)
end
end
Example — after:
defmodule MyApp.ExternalAPISupervisor do
use Supervisor
def init(_) do
children = [{MyApp.APIClient, []}]
# Allow more restarts for transient network issues
Supervisor.init(children,
strategy: :one_for_one,
max_restarts: 10,
max_seconds: 60
)
end
end
When NOT to Use
Don't use this when:
- You're setting max_restarts very high to "prevent crashes" — you're hiding bugs
- The child crash is deterministic (same input = same crash) — fix the bug instead
- You're relying on restart intensity as a backoff mechanism (use explicit backoff)
Over-application example:
# Setting absurdly high restart limits to "never crash"
Supervisor.init(children,
strategy: :one_for_one,
max_restarts: 1000,
max_seconds: 1
)
# This allows 1000 crashes per second — you'll burn CPU and hide bugs
Better alternative:
# Reasonable limits + fix the underlying crash
Supervisor.init(children,
strategy: :one_for_one,
max_restarts: 5,
max_seconds: 30
)
# If 5 crashes in 30 seconds isn't enough, the problem is the child, not the limit
Why: Restart intensity is a circuit breaker, not a throttle. It should escalate systemic failures, not suppress them. If you need aggressive restarts, your child has a bug.
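The sliding-window check is easy to reason about in isolation. The module below is a hypothetical stand-alone helper (RestartWindow is not part of Elixir) that mirrors the logic quoted from dynamic_supervisor.ex, with timestamps passed in as plain integers (seconds) so the behavior is deterministic.

```elixir
defmodule RestartWindow do
  # Records a restart at time `now` and decides whether the supervisor
  # would keep going (:ok) or give up (:shutdown).
  def record(restarts, now, max_restarts, max_seconds) do
    # Keep only restarts that are still inside the window.
    recent = for t <- [now | restarts], now <= t + max_seconds, do: t

    if length(recent) <= max_restarts do
      {:ok, recent}
    else
      {:shutdown, recent}
    end
  end
end

# Defaults: max_restarts: 3, max_seconds: 5.
{:ok, r1} = RestartWindow.record([], 0, 3, 5)
{:ok, r2} = RestartWindow.record(r1, 1, 3, 5)
{:ok, r3} = RestartWindow.record(r2, 2, 3, 5)

# A 4th restart inside the same 5-second window exceeds the limit.
burst = RestartWindow.record(r3, 3, 3, 5)

# The same 4th restart arriving later is fine: old entries have expired.
spaced = RestartWindow.record(r3, 10, 3, 5)
```

The point of the sketch is that three restarts are tolerated and the fourth within the window triggers shutdown, which matches the `length(restarts) <= max_restarts` comparison in the source.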
Pattern 5: Restart Values — :permanent vs :transient vs :temporary
Source: lib/elixir/lib/supervisor.ex#L130 (Restart values section)
What it does: Three restart policies control what happens when a child terminates:
- :permanent — always restart (default for GenServer/Agent/Supervisor)
- :transient — restart only on abnormal exit (not :normal, :shutdown, or {:shutdown, term})
- :temporary — never restart (default for Task)
Why: Different processes have different lifecycle expectations. A database pool should always be running (:permanent). A task that computes a value and exits is done when it's done (:temporary). A connection process should restart on crash but not on graceful disconnect (:transient).
Anti-pattern: Making everything :permanent. If a one-shot task keeps restarting, it'll trigger restart intensity limits and take down the supervisor.
Code example from source:
# Task defaults to :temporary — intentional one-shot work
# (from task.ex:327)
def child_spec(arg) do
%{
id: Task,
start: {Task, :start_link, [arg]},
restart: :temporary
}
end
# Customize via use:
use GenServer, restart: :transient
When to Use
Triggers:
- You have different process types with different lifecycle expectations
- One-shot tasks keep restarting and wasting resources
- A connection should gracefully disconnect without triggering restart
Example — before:
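# One-shot task declared :permanent: the supervisor restarts it every time it finishes
defmodule MyApp.BatchProcessor do
  use Supervisor
  def start_link(batch), do: Supervisor.start_link(__MODULE__, batch)
  def init(batch) do
    children = [
      Supervisor.child_spec({Task, fn -> process_batch(batch) end}, restart: :permanent)
    ]
    # The task exits :normal, gets restarted, and soon exceeds max_restarts
    Supervisor.init(children, strategy: :one_for_one)
  end
end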
Example — after:
defmodule MyApp.BatchTask do
use Task, restart: :temporary # Don't restart completed tasks
def start_link(batch) do
Task.start_link(__MODULE__, :run, [batch])
end
def run(batch), do: process_batch(batch)
end
defmodule MyApp.ConnectionWorker do
use GenServer, restart: :transient # Restart on crash, not graceful disconnect
def disconnect(pid) do
GenServer.stop(pid, :normal) # Won't trigger restart
end
end
When NOT to Use
Don't use this when:
- You're using :temporary to avoid fixing a crash (the child just stays dead)
- You set everything to :transient without thinking — :permanent is usually right for services
Over-application example:
# Making a critical service :temporary so it "doesn't bother the supervisor"
defmodule MyApp.PaymentProcessor do
use GenServer, restart: :temporary
# If this crashes, it stays dead! Payments stop working silently!
end
Better alternative:
defmodule MyApp.PaymentProcessor do
use GenServer, restart: :permanent
# Critical services should always restart — that's the whole point
end
Why: :permanent is the safe default for anything that should "always be running." Only use :transient for processes that have a valid "done" state, and :temporary for truly one-shot work.
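The defaults described above can be checked by inspecting child specs directly. In this sketch Demo.Connection is a hypothetical module used only to show the `use GenServer, restart: ...` option; nothing is started.

```elixir
defmodule Demo.Connection do
  # The option given to `use` ends up in the generated child_spec/1.
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# Task declares :temporary explicitly in its child spec.
task_restart = Task.child_spec(fn -> :ok end).restart

# The module above overrides the default through `use`.
connection_restart = Demo.Connection.child_spec([]).restart

# Agent's spec leaves :restart out, so the supervisor falls back to :permanent.
agent_restart = Map.get(Agent.child_spec(fn -> 0 end), :restart, :permanent)

# Any spec can also be overridden at the call site.
overridden = Supervisor.child_spec({Agent, fn -> 0 end}, restart: :temporary).restart
```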
Pattern 6: Automatic Shutdown for Pipeline Supervisors
Source: lib/elixir/lib/supervisor.ex#L349 (Automatic shutdown section)
What it does: Supervisors support :auto_shutdown which terminates the supervisor when significant children exit. Options: :any_significant (first significant child exits → shutdown) or :all_significant (all significant children must exit → shutdown).
Why: Models pipeline/workflow patterns where a supervisor's purpose is tied to its children's work. If all significant workers finish, the supervisor should clean up. This is useful for batch processing supervisors or connection-scoped process groups.
Anti-pattern: Manually monitoring children and calling Supervisor.stop/1. The automatic shutdown mechanism handles this cleanly within OTP semantics.
Code example (from docs):
# Only :transient and :temporary children can be marked significant
children = [
Supervisor.child_spec({BatchWorker, args}, significant: true, restart: :transient)
]
Supervisor.start_link(children,
strategy: :one_for_one,
auto_shutdown: :all_significant
)
When to Use
Triggers:
- You have a supervisor managing a batch/pipeline where completion means "job done"
- A supervisor's existence only makes sense while its children are doing work
- You're building a workflow that should self-terminate
Example — before:
# Manual cleanup when batch workers finish
defmodule MyApp.BatchSupervisor do
use DynamicSupervisor
def all_done?(supervisor) do
# Polling... ugly
DynamicSupervisor.count_children(supervisor).active == 0
end
end
# Somewhere else, a monitor process watches and cleans up
defmodule MyApp.BatchMonitor do
use GenServer
def handle_info(:check, state) do
if MyApp.BatchSupervisor.all_done?(state.sup) do
Supervisor.stop(state.sup)
end
{:noreply, state}
end
end
Example — after:
defmodule MyApp.BatchSupervisor do
use Supervisor
def start_link(tasks) do
Supervisor.start_link(__MODULE__, tasks)
end
def init(tasks) do
children =
Enum.map(tasks, fn task ->
Supervisor.child_spec({MyApp.BatchWorker, task},
id: task.id, restart: :transient, significant: true)
end)
Supervisor.init(children,
strategy: :one_for_one,
auto_shutdown: :all_significant
)
end
end
# When all workers complete normally, supervisor shuts down automatically
When NOT to Use
Don't use this when:
- Children are long-lived services that should never "complete"
- You want the supervisor to keep running even after children exit (for later restarts)
- Children have :permanent restart — they can't be :significant
Over-application example:
# auto_shutdown on an infrastructure supervisor: it shuts down as soon as a significant child exits!
children = [
Supervisor.child_spec(MyApp.Cache, significant: true, restart: :transient),
MyApp.WebServer
]
Supervisor.init(children,
strategy: :one_for_one,
auto_shutdown: :any_significant
)
# If Cache (a :transient child) exits :normal even once, the WHOLE supervisor shuts down
Better alternative:
# Infrastructure supervisors should NOT auto-shutdown
Supervisor.init(children, strategy: :one_for_one)
# Only use auto_shutdown for workflow/batch supervisors with finite lifetimes
Why: auto_shutdown models "this supervisor's job is done when its children finish." It's for finite work, not long-lived services.
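A compact way to watch the mechanism is to trap exits in the calling process and wait for the supervisor to stop. This is a sketch only: it assumes an Elixir and OTP release recent enough to accept :auto_shutdown and :significant, and the child id :batch_worker is hypothetical.

```elixir
# The supervisor is linked to us, so trap exits to observe its shutdown
# as a message instead of being taken down with it.
Process.flag(:trap_exit, true)

children = [
  Supervisor.child_spec(
    {Task, fn -> :work_done end},
    id: :batch_worker,
    restart: :transient,
    significant: true
  )
]

{:ok, sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    auto_shutdown: :all_significant
  )

# The task returns immediately and exits :normal. As the only significant
# child it triggers the automatic shutdown of the supervisor.
exit_reason =
  receive do
    {:EXIT, ^sup, reason} -> reason
  after
    1_000 -> :still_running
  end

supervisor_alive? = Process.alive?(sup)
```

No polling process or explicit Supervisor.stop call is involved; the supervisor's lifetime follows the lifetime of its significant children.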
Pattern 7: Task.async/await for Concurrent Value Computation
Source: lib/elixir/lib/task.ex#L1 and lib/elixir/lib/task.ex#L300
What it does: Task.async spawns a linked, monitored process and returns a %Task{} struct. Task.await blocks until the result arrives or times out. This is the canonical pattern for "compute a value concurrently."
Why: Tasks provide structured concurrency — the caller is linked to the task, so failures propagate naturally. The monitor reference enables safe await with timeout. This is explicit and composable unlike raw spawn_link + receive.
Anti-pattern: Using spawn_link + send + receive for one-shot concurrent computation. You lose: error propagation, structured monitoring, timeout handling, and the caller-tracking metadata that tasks provide.
Code example from source:
task = Task.async(fn -> do_some_work() end)
res = do_some_other_work()
res + Task.await(task)
Key constraint from docs: "If you start an async, you must await. This is either done by calling Task.await/2 or Task.yield/2 followed by Task.shutdown/2."
When to Use
Triggers:
- You need to compute a value concurrently and use it in the current flow
- Multiple independent computations can run in parallel to reduce latency
- The current process should crash if the computation fails (linked failure)
Example — before:
# Sequential — total time = sum of all operations
def build_dashboard(user_id) do
profile = fetch_profile(user_id) # 200ms
orders = fetch_recent_orders(user_id) # 300ms
recommendations = compute_recs(user_id) # 500ms
# Total: 1000ms
%{profile: profile, orders: orders, recommendations: recommendations}
end
Example — after:
# Concurrent — total time = max of all operations
def build_dashboard(user_id) do
profile_task = Task.async(fn -> fetch_profile(user_id) end)
orders_task = Task.async(fn -> fetch_recent_orders(user_id) end)
recs_task = Task.async(fn -> compute_recs(user_id) end)
%{
profile: Task.await(profile_task),
orders: Task.await(orders_task),
recommendations: Task.await(recs_task)
}
# Total: ~500ms (limited by slowest task)
end
When NOT to Use
Don't use this when:
- The task might fail and you don't want the caller to crash (use async_nolink)
- You're inside a GenServer and can't block on await (use async_nolink + handle_info)
- The computation is trivial (< 1ms) — spawning a process adds overhead
Over-application example:
# Spawning a task for trivial work — overhead exceeds benefit
def format_name(user) do
task = Task.async(fn -> String.upcase(user.name) end)
Task.await(task)
# Process spawn + message passing for a microsecond operation!
end
Better alternative:
def format_name(user) do
String.upcase(user.name)
end
Why: Tasks add process spawn and message-passing overhead (on the order of microseconds, depending on the system and the size of the data being copied). Only use them when the work is expensive enough to justify parallelism, typically >1ms, or when running multiple operations concurrently.
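For a runnable illustration of the latency argument, the sketch below fans out four artificially slow computations and collects them with Task.await_many/2. The function slow_double and the 50 ms sleep are assumptions made up for the demo.

```elixir
# Stand-in for an expensive call (hypothetical): sleeps, then doubles.
slow_double = fn n ->
  Process.sleep(50)
  n * 2
end

# :timer.tc returns {elapsed_microseconds, result}.
{elapsed_us, results} =
  :timer.tc(fn ->
    1..4
    |> Enum.map(fn n -> Task.async(fn -> slow_double.(n) end) end)
    |> Task.await_many(1_000)
  end)

elapsed_ms = div(elapsed_us, 1_000)
```

Results come back in the order the tasks were created, not the order they finished. Run sequentially the four calls would need at least 200 ms; run as tasks the elapsed time should stay close to that of a single call, though the exact figure depends on the machine.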
Pattern 8: Task.Supervisor.async_nolink for Fault-Tolerant Task Execution
Source: lib/elixir/lib/task/supervisor.ex#L240 (async_nolink docs with GenServer example)
What it does: Unlike Task.async, async_nolink spawns a task that is NOT linked to the caller. The caller monitors it and handles success/failure via handle_info. This prevents a task crash from killing the caller.
Why: In a GenServer, you often want to spawn work that might fail without taking down the server. The pattern: spawn with async_nolink, receive the result as {ref, answer}, and handle failure as {:DOWN, ref, :process, _pid, reason}.
Anti-pattern: Using Task.async inside a GenServer when the task might fail. The link means the GenServer crashes too. Use async_nolink + handle_info for resilient concurrent work.
Code example from source (task/supervisor.ex):
defmodule MyApp.Server do
use GenServer
def handle_call(:start_task, _from, %{ref: nil} = state) do
task =
Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
# potentially failing work
end)
{:reply, :ok, %{state | ref: task.ref}}
end
# Task completed successfully
def handle_info({ref, answer}, %{ref: ref} = state) do
Process.demonitor(ref, [:flush])
{:noreply, %{state | ref: nil}}
end
# Task failed
def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
{:noreply, %{state | ref: nil}}
end
end
When to Use
Triggers:
- A GenServer needs to spawn work that might fail without crashing the server
- You're building a "request/response with timeout" pattern inside a GenServer
- External calls (HTTP, DB) from a GenServer should be non-blocking and resilient
Example — before:
defmodule MyApp.Enricher do
use GenServer
@impl true
def handle_call({:enrich, data}, _from, state) do
# If this HTTP call crashes, the entire GenServer dies!
result = Task.async(fn -> HTTPClient.post!("/api/enrich", data) end)
enriched = Task.await(result, 5_000)
{:reply, {:ok, enriched}, state}
end
end
Example — after:
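defmodule MyApp.Enricher do
  use GenServer
  @impl true
  def handle_call({:enrich, data}, from, state) do
    task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
      HTTPClient.post!("/api/enrich", data)
    end)
    # Reply later from handle_info; state maps task refs to waiting callers
    {:noreply, Map.put(state, task.ref, from)}
  end
  # Task completed: match only refs we are tracking, not arbitrary 2-tuples
  @impl true
  def handle_info({ref, result}, state) when is_map_key(state, ref) do
    Process.demonitor(ref, [:flush])
    {from, state} = Map.pop(state, ref)
    GenServer.reply(from, {:ok, result})
    {:noreply, state}
  end
  # Task crashed before sending a result
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) when is_map_key(state, ref) do
    {from, state} = Map.pop(state, ref)
    GenServer.reply(from, {:error, reason})
    {:noreply, state}
  end
  # Ignore anything else instead of crashing on an unmatched message
  def handle_info(_msg, state), do: {:noreply, state}
end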
When NOT to Use
Don't use this when:
- Task failure should crash the caller (you WANT linked failure propagation)
- You're not inside a long-lived process, and letting the caller exit together with the task is acceptable (a linked crash cannot be caught with try/rescue)
- The work is fast and synchronous is acceptable
Over-application example:
# Using async_nolink for work that SHOULD crash the caller on failure
defmodule MyApp.CriticalPayment do
use GenServer
def handle_call({:charge, card}, _from, state) do
task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
PaymentGateway.charge!(card)
end)
# Now you have to manually handle the failure...
# But if payment fails, maybe this GenServer SHOULD crash
# to trigger a supervisor restart with clean state
end
end
Better alternative:
# If failure should crash the GenServer, use Task.async (linked)
def handle_call({:charge, card}, _from, state) do
result = Task.async(fn -> PaymentGateway.charge!(card) end)
{:reply, Task.await(result), state}
end
Why: async_nolink is for resilient, non-critical work. If the task's failure means your GenServer's state is invalid, you want the link — let it crash and restart clean.
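The deferred-reply flow can be run end to end with an anonymous Task.Supervisor. In this sketch Demo.Runner and its run/2 function are hypothetical names; the failing task uses exit(:boom) so that the :DOWN reason is a plain atom (it will also produce a crash log entry, which is expected).

```elixir
defmodule Demo.Runner do
  use GenServer

  def start_link(task_sup), do: GenServer.start_link(__MODULE__, task_sup)
  def run(server, fun), do: GenServer.call(server, {:run, fun})

  @impl true
  def init(task_sup), do: {:ok, %{task_sup: task_sup, pending: %{}}}

  @impl true
  def handle_call({:run, fun}, from, state) do
    # Not linked: a crash in `fun` reaches us only as a :DOWN message.
    task = Task.Supervisor.async_nolink(state.task_sup, fun)
    {:noreply, %{state | pending: Map.put(state.pending, task.ref, from)}}
  end

  @impl true
  def handle_info({ref, result}, %{pending: pending} = state)
      when is_map_key(pending, ref) do
    # Success: drop the monitor and flush its :DOWN message.
    Process.demonitor(ref, [:flush])
    reply_and_forget(ref, {:ok, result}, state)
  end

  def handle_info({:DOWN, ref, :process, _pid, reason}, %{pending: pending} = state)
      when is_map_key(pending, ref) do
    reply_and_forget(ref, {:error, reason}, state)
  end

  def handle_info(_other, state), do: {:noreply, state}

  defp reply_and_forget(ref, reply, state) do
    {from, pending} = Map.pop(state.pending, ref)
    GenServer.reply(from, reply)
    {:noreply, %{state | pending: pending}}
  end
end

{:ok, task_sup} = Task.Supervisor.start_link()
{:ok, runner} = Demo.Runner.start_link(task_sup)

ok_reply = Demo.Runner.run(runner, fn -> 1 + 1 end)
error_reply = Demo.Runner.run(runner, fn -> exit(:boom) end)
runner_alive? = Process.alive?(runner)
```

The server stays responsive while tasks run because handle_call returns {:noreply, ...} and the answer is delivered later with GenServer.reply/2.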
Pattern 9: Task Supervisor as DynamicSupervisor Specialization
Source: lib/elixir/lib/task/supervisor.ex#L151 (start_link implementation)
What it does: Task.Supervisor is implemented directly on top of DynamicSupervisor. It stores default restart/shutdown settings in the process dictionary and delegates init to DynamicSupervisor.init.
Why: Specialization without duplication. Task.Supervisor adds task-specific behavior (caller tracking, async/nolink patterns, stream support) on top of the generic dynamic supervision infrastructure. It's a compositional pattern — build specialized supervisors by wrapping the generic one.
Anti-pattern: Re-implementing task supervision from scratch with a plain DynamicSupervisor + custom start logic. Use Task.Supervisor — it handles caller tracking, owner propagation, and proper shutdown.
Code example from source:
# Task.Supervisor.start_link delegates to DynamicSupervisor
def start_link(options \\ []) do
{restart, options} = Keyword.pop(options, :restart)
{shutdown, options} = Keyword.pop(options, :shutdown)
keys = [:max_children, :max_seconds, :max_restarts]
{sup_opts, start_opts} = Keyword.split(options, keys)
restart_and_shutdown = {restart || :temporary, shutdown || 5000}
DynamicSupervisor.start_link(__MODULE__, {restart_and_shutdown, sup_opts}, start_opts)
end
def init({{_restart, _shutdown} = arg, options}) do
Process.put(__MODULE__, arg)
DynamicSupervisor.init([strategy: :one_for_one] ++ options)
end
When to Use
Triggers:
- You're building task infrastructure that needs proper shutdown and caller tracking
- You want async_nolink + streaming + concurrency limiting for tasks
- You need tasks to be supervised (restarted, tracked, shut down gracefully)
Example — before:
# Rolling your own task management on DynamicSupervisor
defmodule MyApp.Workers do
def start_task(fun) do
spec = %{id: make_ref(), start: {Task, :start_link, [fun]}, restart: :temporary}
DynamicSupervisor.start_child(MyApp.WorkerSup, spec)
end
# No caller tracking, no async_nolink, no stream support
# Must manually build all of that
end
Example — after:
# Task.Supervisor gives you all of this for free
defmodule MyApp.Application do
def start(_type, _args) do
children = [
{Task.Supervisor, name: MyApp.TaskSupervisor}
]
Supervisor.start_link(children, strategy: :one_for_one)
end
end
# Now you get: async, async_nolink, async_stream, start_child, etc.
Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> work() end)
When NOT to Use
Don't use this when:
- Tasks are fire-and-forget and you don't need supervision (just Task.start/1)
- You need custom child specs with complex init logic (use DynamicSupervisor directly)
- You're spawning non-Task children (GenServers, Agents)
Over-application example:
# Using Task.Supervisor to start GenServers — wrong tool
Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
# This spawns a Task that starts a GenServer... awkward
{:ok, _} = MyApp.Worker.start_link(args)
Process.sleep(:infinity) # Keep the task alive??
end)
Better alternative:
# Use DynamicSupervisor for non-Task children
DynamicSupervisor.start_child(MyApp.WorkerSupervisor, {MyApp.Worker, args})
Why: Task.Supervisor is purpose-built for Task processes. It adds caller tracking, $callers propagation, and task-specific APIs. For anything that isn't a Task, use DynamicSupervisor.
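Because start_link forwards :max_children to the underlying DynamicSupervisor, a Task.Supervisor can cap concurrency without extra code. The following sketch assumes an anonymous supervisor and tasks that simply sleep; the limit of 2 is arbitrary.

```elixir
# :max_children is one of the options passed through to DynamicSupervisor.init.
{:ok, sup} = Task.Supervisor.start_link(max_children: 2)

# Two long-running tasks fill the supervisor.
{:ok, first} = Task.Supervisor.start_child(sup, fn -> Process.sleep(:infinity) end)
{:ok, second} = Task.Supervisor.start_child(sup, fn -> Process.sleep(:infinity) end)

# A third task is rejected instead of being queued.
third = Task.Supervisor.start_child(sup, fn -> :ok end)

running = Task.Supervisor.children(sup)
same_pids? = Enum.sort(running) == Enum.sort([first, second])
```

Callers have to decide what to do with the rejection (retry, shed load, or report an error); the supervisor itself does no queuing.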
Pattern 10: Registry for Dynamic Process Naming and PubSub
Source: lib/elixir/lib/registry.ex#L1 (module docs), lib/elixir/lib/registry.ex#L250 (whereis_name via callbacks)
What it does: Registry provides two modes:
- :unique keys — each key maps to exactly one process (name registry, process lookup)
- :duplicate keys — each key maps to many processes (PubSub topics, event dispatch)
Processes are automatically unregistered on death. Registry integrates with GenServer naming via {:via, Registry, {registry, key}}.
Why: Solves the dynamic naming problem without atom leaks. Also provides local PubSub without external dependencies. The registry is partitioned for scalability and uses ETS for O(1) lookups.
Anti-pattern: Building custom ETS-based process registries with manual cleanup on process death. Registry handles monitor-based cleanup automatically.
Code example from source (registry.ex :via callbacks):
# :via integration — GenServer uses these callbacks
def whereis_name({registry, key}), do: whereis_name(registry, key)
def whereis_name({registry, key, _value}), do: whereis_name(registry, key)
defp whereis_name(registry, key) do
case key_info!(registry) do
{:unique, partitions, key_ets} ->
key_ets = key_ets || key_ets!(registry, key, partitions)
case lookup_second(:unique, key_ets, key) do
{pid, _} ->
if Process.alive?(pid), do: pid, else: :undefined
_ ->
:undefined
end
end
end
When to Use
Triggers:
- You need to look up processes by a dynamic key without atom leaks
- You want local PubSub (subscribe/dispatch to topics) without external deps
- You're building per-entity process pools (per-user, per-room, per-device)
Example — before:
# Custom ETS-based registry with manual cleanup
defmodule MyApp.ProcessRegistry do
def register(key, pid) do
ref = Process.monitor(pid)
:ets.insert(:registry, {key, pid, ref})
end
def lookup(key) do
case :ets.lookup(:registry, key) do
[{^key, pid, _ref}] -> {:ok, pid}
[] -> :error
end
end
# Must handle :DOWN manually to clean up dead entries
def handle_info({:DOWN, ref, :process, pid, _reason}, state) do
:ets.match_delete(:registry, {:_, pid, ref})
{:noreply, state}
end
end
Example — after:
# Registry handles all of this automatically
# In supervision tree:
{Registry, keys: :unique, name: MyApp.GameRegistry}
# Registration happens via :via tuple — automatic cleanup on death
defmodule MyApp.GameSession do
use GenServer
def start_link(game_id) do
GenServer.start_link(__MODULE__, game_id,
name: {:via, Registry, {MyApp.GameRegistry, game_id}})
end
end
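The example above covers :unique keys. For the :duplicate mode, a minimal sketch of local PubSub might look like the following. The registry name Demo.PubSub, the topic string, and the message shape are assumptions made for illustration; in an application the registry would be started in the supervision tree rather than inline.

```elixir
# Hypothetical registry name, started inline only to keep the sketch self-contained.
{:ok, _registry} = Registry.start_link(keys: :duplicate, name: Demo.PubSub)

# Subscribe the current process to a topic. The value (:subscriber) is arbitrary
# metadata stored next to the pid.
{:ok, _owner} = Registry.register(Demo.PubSub, "room:lobby", :subscriber)

# dispatch/3 calls the function with every {pid, value} registered under the key.
Registry.dispatch(Demo.PubSub, "room:lobby", fn entries ->
  for {pid, _value} <- entries, do: send(pid, {:broadcast, "hello"})
end)

message =
  receive do
    {:broadcast, text} -> text
  after
    1_000 -> :timeout
  end
```

Because the subscriber is the calling process here, the broadcast ends up in its own mailbox. When a subscriber exits, its entries are removed without any extra code.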
When NOT to Use
Don't use this when:
- You need distributed/cluster-wide process registration (use Horde, :global, or pg)
- Process lookup is the hot path and you need sub-microsecond latency (direct PID passing)
- You have a fixed set of processes known at compile time (atom names are simpler)
Over-application example:
# Using Registry when you could just pass the PID directly
defmodule MyApp.Pipeline do
def process(data) do
# Register a process just to look it up one line later...
{:ok, pid} = MyApp.Worker.start_link(data)
# Why not just use `pid` directly?
worker = Registry.lookup(MyApp.Registry, data.id) |> List.first()
GenServer.call(elem(worker, 0), :process)
end
end
Better alternative:
defmodule MyApp.Pipeline do
def process(data) do
{:ok, pid} = MyApp.Worker.start_link(data)
GenServer.call(pid, :process)
end
end
Why: Registry shines when the looker-upper doesn't know the PID (arrived in a different request, different process tree). If you already have the PID, just use it directly.
Pattern 11: Shutdown Semantics — Graceful Termination
Source: lib/elixir/lib/supervisor.ex#L156 (Shutdown values section)
What it does: Three shutdown modes:
- :brutal_kill — immediate Process.exit(child, :kill), no cleanup
- integer (ms) — send :shutdown signal, wait N ms, then :kill
- :infinity — wait forever (default for supervisor children)
Workers default to 5000ms. Supervisors default to :infinity (to give their children time).
Why: Graceful shutdown enables cleanup (closing connections, flushing buffers, deregistering from services). The timeout prevents hung processes from blocking system shutdown indefinitely. The hierarchy matters: supervisors need infinite time because they're waiting for their own children to shut down.
Anti-pattern: Setting :brutal_kill on processes that hold external resources (DB connections, file handles). They'll leak. Also: setting :infinity on worker processes — a bug in terminate/2 will hang your entire shutdown.
Code example from source:
# From supervisor.ex docs:
# :brutal_kill - unconditional and immediate termination
# integer >= 0 - wait that many ms after :shutdown signal
# :infinity - wait forever (recommended for supervisors)
# Worker default: 5000ms
%{shutdown: 5_000, type: :worker}
# Supervisor default: :infinity
%{shutdown: :infinity, type: :supervisor}
When to Use
Triggers:
- You're deploying and need processes to flush buffers, close connections, or deregister
- Child processes hold external resources that leak if killed immediately
- Your system has a clean shutdown requirement (compliance, data integrity)
Example — before:
# :brutal_kill on a process that writes to disk — data loss
defmodule MyApp.WriteAheadLog do
use GenServer, shutdown: :brutal_kill # BAD: loses buffered writes
@impl true
def terminate(_reason, state) do
# This never runs with :brutal_kill!
flush_buffer_to_disk(state.buffer)
end
end
Example — after:
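defmodule MyApp.WriteAheadLog do
  use GenServer, shutdown: 10_000 # 10 seconds to flush
  @impl true
  def init(state) do
    # terminate/2 only runs on a supervisor :shutdown if the process traps exits
    Process.flag(:trap_exit, true)
    {:ok, state}
  end
  @impl true
  def terminate(_reason, state) do
    flush_buffer_to_disk(state.buffer)
    close_file_handle(state.fd)
  end
end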
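To check that cleanup really runs during a supervisor-driven shutdown, a small experiment along these lines can help. It is a sketch under assumptions: Demo.Flusher and the {:flushed, reason} message are hypothetical names, and the worker reports to the calling process instead of flushing anything.

```elixir
defmodule Demo.Flusher do
  # Hypothetical worker with a bounded shutdown window.
  use GenServer, shutdown: 2_000

  def start_link(owner), do: GenServer.start_link(__MODULE__, owner)

  @impl true
  def init(owner) do
    # Without trapping exits, the :shutdown signal kills the process
    # before terminate/2 gets a chance to run.
    Process.flag(:trap_exit, true)
    {:ok, owner}
  end

  @impl true
  def terminate(reason, owner) do
    # Stand-in for real cleanup such as flushing a buffer.
    send(owner, {:flushed, reason})
  end
end

{:ok, sup} = Supervisor.start_link([{Demo.Flusher, self()}], strategy: :one_for_one)
:ok = Supervisor.stop(sup)

result =
  receive do
    {:flushed, reason} -> reason
  after
    1_000 -> :terminate_not_called
  end
```

Removing the Process.flag/2 call is a quick way to see the difference: the receive then falls through to the after clause.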
When NOT to Use
Don't use this when:
- Setting :infinity on worker processes — a bug in terminate/2 hangs your entire shutdown
- The process holds no external resources (default 5000ms is fine)
- You're using :brutal_kill on supervisors (they need time to stop their children)
Over-application example:
# :infinity shutdown on a worker — if terminate hangs, deployment hangs
defmodule MyApp.Worker do
use GenServer, shutdown: :infinity
@impl true
def terminate(_reason, state) do
# If this HTTP call hangs forever, your entire app can't shut down
HTTPClient.post!("/api/deregister", %{id: state.id})
end
end
Better alternative:
defmodule MyApp.Worker do
  use GenServer, shutdown: 15_000 # Generous but bounded
  @impl true
  def terminate(_reason, state) do
    # Bound the cleanup call too, and kill the task if it overruns
    task = Task.async(fn -> HTTPClient.post("/api/deregister", %{id: state.id}) end)
    Task.yield(task, 10_000) || Task.shutdown(task, :brutal_kill)
  end
end
Why: :infinity is safe for supervisors (they're waiting for children) but dangerous for workers. A hung terminate/2 with infinite shutdown blocks your entire deployment pipeline.
Pattern 12: DynamicSupervisor Internal State — Struct with Restart Tracking
Source: lib/elixir/lib/dynamic_supervisor.ex#L165 (defstruct)
What it does: The DynamicSupervisor uses a struct for its GenServer state with explicit fields: children (map of pid → child spec), restarts (list of timestamps for rate limiting), and configuration fields.
Why: This shows the Elixir team's state design philosophy: use a struct with named fields, not a bare map or tuple. The children field uses a %{} map keyed by PID for O(1) lookup/deletion on child exit. The restarts list uses a simple sliding-window approach for restart intensity.
Anti-pattern: Using a list for children lookup (O(n) on every EXIT message), or using a tuple-based state that requires positional knowledge.
Code example from source:
defstruct [
:args,
:extra_arguments,
:mod,
:name,
:strategy,
:max_children,
:max_restarts,
:max_seconds,
children: %{},
restarts: []
]
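The sliding-window idea behind the restarts field can be sketched in a few lines. This is an illustration under assumptions, not the DynamicSupervisor implementation: Demo.RestartWindow is a hypothetical module, and timestamps are plain integers in seconds passed in by the caller.

```elixir
defmodule Demo.RestartWindow do
  # Records a restart at time `now` and drops entries older than `max_seconds`.
  # Returns {:shutdown, _} once more than `max_restarts` remain in the window.
  def add_restart(restarts, now, max_restarts, max_seconds) do
    recent = Enum.filter([now | restarts], fn at -> now - at <= max_seconds end)

    if length(recent) > max_restarts do
      {:shutdown, recent}
    else
      {:ok, recent}
    end
  end
end

{:ok, r1} = Demo.RestartWindow.add_restart([], 0, 3, 5)
{:ok, r2} = Demo.RestartWindow.add_restart(r1, 1, 3, 5)
{:ok, r3} = Demo.RestartWindow.add_restart(r2, 2, 3, 5)

# A fourth restart inside the 5-second window exceeds max_restarts = 3.
too_many = Demo.RestartWindow.add_restart(r3, 3, 3, 5)

# The same restart 10 seconds later is fine: the earlier entries have aged out.
later = Demo.RestartWindow.add_restart(r3, 12, 3, 5)
```

The list stays short because every call prunes it, which is why a plain list is enough for this field while children needs a map.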
When to Use
Triggers:
- You need to understand the internal implementation of a supervisor for debugging
- You're building a custom supervisor-like process
- You want to understand why DynamicSupervisor uses a map keyed by PID
Example — before:
# Using a list to track children — O(n) on every EXIT message
defmodule MyApp.CustomSupervisor do
use GenServer
@impl true
def init(_) do
{:ok, %{children: []}} # List! Every EXIT scans the whole thing
end
def handle_info({:EXIT, pid, _reason}, state) do
# O(n) scan to find and remove the dead child
children = Enum.reject(state.children, fn {p, _spec} -> p == pid end)
{:noreply, %{state | children: children}}
end
end
Example — after:
defmodule MyApp.CustomSupervisor do
use GenServer
defstruct children: %{}, restarts: [], max_restarts: 3, max_seconds: 5
@impl true
def init(_) do
Process.flag(:trap_exit, true)
{:ok, %__MODULE__{}}
end
def handle_info({:EXIT, pid, _reason}, state) do
# O(1) lookup and delete
{_spec, children} = Map.pop(state.children, pid)
{:noreply, %{state | children: children}}
end
end
When NOT to Use
Don't use this when:
- You're building a production supervisor (use Supervisor/DynamicSupervisor)
- You don't need custom supervision logic (the standard supervisors cover 99% of cases)
- You're optimizing prematurely — most apps never have enough children for data structure choice to matter
Over-application example:
# Building a custom supervisor because "I want more control"
defmodule MyApp.FancySupervisor do
use GenServer
# 200 lines of restart logic, child tracking, shutdown handling...
# Congratulations, you've reimplemented DynamicSupervisor with more bugs
end
Better alternative:
# Just use the standard one
{DynamicSupervisor, name: MyApp.FancySupervisor, strategy: :one_for_one}
Why: The standard supervisors are battle-tested over decades. Build custom only when you need semantics they don't provide (e.g., priority-based restart, custom backoff).
Pattern 13: Restart Logic with Exponential Backoff via :try_again
Source: lib/elixir/lib/dynamic_supervisor.ex#L710 (restart_child and related functions)
What it does: When a child fails to restart (its start function returns an error), DynamicSupervisor doesn't give up. It stores the child as {:restarting, child}, sends itself a :"$gen_restart" message, and retries when that message is handled. The supervisor keeps processing other messages between attempts instead of blocking on a transiently failing child. The message is sent immediately; there is no built-in delay or backoff between attempts.
Why: During network partitions or resource exhaustion, a child might fail to start immediately but succeed moments later, and retrying through the mailbox keeps the supervisor responsive in the meantime. Each attempt still goes through the restart-intensity check, so a start function that keeps failing exhausts max_restarts quickly. Delays and backoff have to live inside the child.
Anti-pattern: Doing fallible work (connecting to a database, binding a port) synchronously in init/1 and relying on supervisor retries to ride out the outage. The retries burn through max_restarts during transient failures like port conflicts.
Code example from source:
defp restart_child(:one_for_one, current_pid, child, state) do
{{m, f, args} = mfa, restart, shutdown, type, modules} = child
%{extra_arguments: extra} = state
case start_child(m, f, extra ++ args) do
{:ok, pid, _} ->
state = delete_child(current_pid, state)
{:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)}
{:ok, pid} ->
state = delete_child(current_pid, state)
{:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)}
:ignore ->
{:ok, delete_child(current_pid, state)}
{:error, reason} ->
report_error(:start_error, reason, {:restarting, current_pid}, child, state)
state = put_in(state.children[current_pid], {:restarting, child})
{:try_again, state}
end
end
When to Use
Triggers:
- A child fails to start due to transient conditions (port conflict, network partition)
- You're seeing restart intensity limits hit because start failures count as restarts
- You need a supervisor that tolerates temporary resource unavailability
Example — before:
# Every start failure counts against restart intensity
# 3 failures in 5 seconds → supervisor crashes → cascading failure
defmodule MyApp.ConnectionPool do
use GenServer
@impl true
def init(config) do
# If DB is temporarily unreachable, this crashes...
# ...which counts as a restart...
# ...which can exhaust restart intensity
{:ok, conn} = DBConnection.start_link(config)
{:ok, %{conn: conn}}
end
end
Example — after:
defmodule MyApp.ConnectionPool do
use GenServer
@impl true
def init(config) do
# Use handle_continue for the connection attempt
{:ok, %{config: config, conn: nil}, {:continue, :connect}}
end
@impl true
def handle_continue(:connect, state) do
case DBConnection.start_link(state.config) do
{:ok, conn} ->
{:noreply, %{state | conn: conn}}
{:error, _reason} ->
# Retry after delay without counting against restart intensity
Process.send_after(self(), :retry_connect, 5_000)
{:noreply, state}
end
end
@impl true
def handle_info(:retry_connect, state) do
{:noreply, state, {:continue, :connect}}
end
end
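The fixed 5_000 ms delay in the example above could be replaced with a capped exponential backoff. The following is a minimal sketch; Demo.Backoff, the 500 ms base, and the 30-second cap are assumptions chosen for illustration, and the attempt counter would have to be kept in the GenServer state.

```elixir
defmodule Demo.Backoff do
  @base_ms 500
  @max_ms 30_000

  # Delay before retry number `attempt` (0-based): doubles each time, capped at @max_ms.
  def delay(attempt) when is_integer(attempt) and attempt >= 0 do
    min(@base_ms * Integer.pow(2, attempt), @max_ms)
  end
end

delays = Enum.map(0..7, &Demo.Backoff.delay/1)
```

In the retry clause this would become Process.send_after(self(), :retry_connect, Demo.Backoff.delay(state.attempt)), with the counter reset to 0 after a successful connection. Adding random jitter is a common refinement when many processes retry at once.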
When NOT to Use
Don't use this when:
- The start failure is deterministic (config error, missing module) — fix the bug
- You're relying on automatic retry to avoid proper health checking
- The child should NOT start at all if dependencies are unavailable (use :ignore)
Over-application example:
# Retrying forever when the error is permanent
defmodule MyApp.MisconfiguredWorker do
  use GenServer
  @impl true
def init(%{api_key: nil}) do
# This will NEVER succeed — the key is nil!
# Infinite retry just wastes resources
Process.send_after(self(), :retry, 5_000)
{:ok, %{}}
end
end
Better alternative:
defmodule MyApp.MisconfiguredWorker do
  use GenServer
  @impl true
def init(%{api_key: nil}) do
# Fail fast on permanent configuration errors
{:stop, {:error, :missing_api_key}}
end
end
Why: Retry logic is for transient failures (network, resource contention). For permanent errors (bad config, missing deps), fail fast so the operator can fix the actual problem.
Pattern 14: $ancestors and $callers — Process Lineage Tracking
Source: lib/elixir/lib/task.ex#L227 (Ancestor and Caller Tracking section)
What it does: Elixir uses two process dictionary keys for lineage:
- $ancestors — the supervision hierarchy (who spawned/supervises this process)
- $callers — the logical call chain (who requested this work)
These are different! A task's ancestor is its supervisor, but its caller is the process that initiated the async operation.
Why: Debugging and tracing. When a task crashes, the log includes both its supervisor (for restart context) and its caller (for business logic context). This dual tracking is essential for understanding failures in systems where the spawner and supervisor are different processes.
Anti-pattern: Ignoring caller tracking when building custom process spawning. If you build something like Task.Supervisor, propagate $callers so crash logs are meaningful.
Code example from source (task.ex):
defp get_callers(owner) do
case :erlang.get(:"$callers") do
[_ | _] = list -> [owner | list]
_ -> [owner]
end
end
# Task.start_link propagates both owner and callers
def start_link(module, function, args)
when is_atom(module) and is_atom(function) and is_list(args) do
mfa = {module, function, args}
Task.Supervised.start_link(get_owner(self()), get_callers(self()), mfa)
end
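The effect of this propagation can be observed from inside a task. A small sketch, assuming it runs in a plain script or IEx session:

```elixir
parent = self()

# Task.async/1 stores the caller chain in the task's process dictionary
# under the :"$callers" key before running the function.
callers =
  Task.async(fn -> Process.get(:"$callers") end)
  |> Task.await()

# The process that requested the work is at the head of the list.
first_caller = hd(callers)
```

If the parent itself had been started by a task, its own callers would follow in the tail of the list.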
When to Use
Triggers:
- You're debugging crashes and need to understand where a task was spawned from
- You're building custom process spawning and want crash logs to show the call chain
- You need to trace a request through multiple spawned processes
Example — before:
# Custom spawner that loses caller context
defmodule MyApp.BackgroundJob do
def run_async(fun) do
spawn_link(fn ->
# When this crashes, the log shows no context about WHO spawned it
fun.()
end)
end
end
# Crash log:
# [error] Process #PID<0.234.0> raised an exception
# ** (RuntimeError) something went wrong
# No idea who called run_async or why!
Example — after:
defmodule MyApp.BackgroundJob do
def run_async(fun) do
owner = self()
callers = case Process.get(:"$callers") do
[_ | _] = list -> [owner | list]
_ -> [owner]
end
spawn_link(fn ->
Process.put(:"$callers", callers)
fun.()
end)
end
end
# The spawned process now carries the caller chain in its process dictionary:
# Process.get(:"$callers") #=> [#PID<0.200.0>, #PID<0.150.0>]
# Code that reads $callers can attribute this work to the process that requested it
When NOT to Use
Don't use this when:
- You're using Task/Task.Supervisor (they propagate callers automatically)
- The process is long-lived and the original caller is irrelevant after startup
- You're spawning processes that outlive their callers (callers list becomes stale)
Over-application example:
# Tracking callers for a permanent GenServer — pointless after init
defmodule MyApp.Cache do
use GenServer
def start_link(_) do
# The "caller" of start_link is the supervisor — not useful for debugging
# After boot, the cache serves many callers — the original spawner is irrelevant
GenServer.start_link(__MODULE__, [], name: __MODULE__)
end
end
Better alternative:
# For long-lived processes, use Logger.metadata or OpenTelemetry spans
# to track per-request context, not process lineage
def handle_call({:get, key, request_id}, _from, state) do
  # Logger metadata is per-process, so the caller passes its request_id in the message
  Logger.metadata(request_id: request_id)
  {:reply, Map.get(state, key), state}
end
Why: $callers is useful for short-lived spawned work (tasks, one-shot processes). For long-lived services, per-request tracing (metadata, spans) is more appropriate than process lineage.
Pattern 15: GenServer.reply/2 for Deferred Responses
Source: lib/elixir/lib/gen_server.ex#L620 (callback docs), lib/elixir/lib/gen_server.ex#L1328 (reply/2 function)
What it does: A handle_call can return {:noreply, state} without replying, then later call GenServer.reply(from, response) from any process. This decouples request receipt from response delivery.
Why: Three use cases (from the source):
- Reply before returning (response known, but need to do cleanup after)
- Reply after returning (response not yet available, computed asynchronously)
- Reply from another process (delegate work to a task)
This enables non-blocking request handling in GenServers that would otherwise be bottlenecked.
Anti-pattern: Spawning a task to do work and then having the GenServer block on Task.await inside handle_call. This defeats the purpose — use reply/2 from the task instead.
Code example from source:
def handle_call(:reply_in_one_second, from, state) do
Process.send_after(self(), {:reply, from}, 1_000)
{:noreply, state}
end
def handle_info({:reply, from}, state) do
GenServer.reply(from, :one_second_has_passed)
{:noreply, state}
end
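The source example shows replying after returning. The first use case, replying before returning, can be sketched as follows. Demo.EarlyReply and its messages are hypothetical names used only for illustration.

```elixir
defmodule Demo.EarlyReply do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [])

  @impl true
  def init(log), do: {:ok, log}

  @impl true
  def handle_call({:put, item}, from, log) do
    # Answer the caller first...
    GenServer.reply(from, :ok)
    # ...then do follow-up work the caller does not need to wait for.
    {:noreply, [item | log]}
  end

  def handle_call(:log, _from, log), do: {:reply, Enum.reverse(log), log}
end

{:ok, pid} = Demo.EarlyReply.start_link([])
:ok = GenServer.call(pid, {:put, :a})
:ok = GenServer.call(pid, {:put, :b})
log = GenServer.call(pid, :log)
```

The callback returns {:noreply, state} because the reply has already been sent. Each from should be answered once.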
When to Use
Triggers:
- A GenServer needs to do async work before replying (DB query, HTTP call, aggregation)
- You want to reply from a different process than the one that received the request
- You need to hold a caller until a later event (a timer, a message from another process) produces the answer; each call can be answered only once
Example — before:
defmodule MyApp.Aggregator do
use GenServer
@impl true
def handle_call(:aggregate, _from, state) do
# Blocks the GenServer for potentially seconds
# No other calls can be processed during this time
result = Enum.reduce(state.sources, %{}, fn source, acc ->
data = HTTPClient.get!(source.url).body
Map.merge(acc, Jason.decode!(data))
end)
{:reply, result, state}
end
end
Example — after:
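defmodule MyApp.Aggregator do
  use GenServer
  @impl true
  def handle_call(:aggregate, from, state) do
    # Don't block: hand the work to a supervised task and reply from there.
    # start_child rather than async_nolink, because async_nolink would also send
    # the task result and a :DOWN message back to this GenServer's mailbox.
    sources = state.sources
    Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
      result = Enum.reduce(sources, %{}, fn source, acc ->
        data = HTTPClient.get!(source.url).body
        Map.merge(acc, Jason.decode!(data))
      end)
      # If the task crashes before this line, the caller's GenServer.call times out
      GenServer.reply(from, result)
    end)
    {:noreply, state}
  end
end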
When NOT to Use
Don't use this when:
- The work is fast (< 1ms) — just reply inline
- You need the reply to be ordered with respect to other calls (deferred replies break ordering)
- The from reference escapes to a long-lived process (it holds a monitor that should be cleaned up)
Over-application example:
# Deferring reply for trivial work — unnecessary complexity
defmodule MyApp.Counter do
use GenServer
@impl true
def handle_call(:get, from, state) do
# This is instant! Why defer?
Task.start(fn -> GenServer.reply(from, state.count) end)
{:noreply, state}
end
end
Better alternative:
defmodule MyApp.Counter do
use GenServer
@impl true
def handle_call(:get, _from, state) do
{:reply, state.count, state}
end
end
Why: reply/2 enables non-blocking GenServers for expensive operations. For cheap operations, it adds process spawn overhead, potential ordering issues, and code complexity for no benefit.
Pattern 16: Process.alias for Safe Request/Response
Source: lib/elixir/lib/process.ex#L32 (Aliases section)
What it does: Process aliases (Erlang/OTP 24+) provide a deactivatable reference for receiving replies. After sending a request with an alias as the reply address, you can deactivate the alias if you no longer want the response — any messages sent to a deactivated alias are silently dropped.
Why: Solves the "late reply" problem. In request/response patterns, if the requester times out and moves on, a late reply to its PID could confuse future receive blocks. With aliases, you deactivate after timeout and the late reply harmlessly vanishes.
Anti-pattern: Using bare PIDs for reply addresses in protocols where timeouts are possible. Late messages pollute the mailbox.
Code example from source:
server = spawn(&server/0)
source_alias = Process.alias()
send(server, {:ping, source_alias})
receive do
:pong -> :pong
end
# Deactivate — late replies to this alias are silently dropped
Process.unalias(source_alias)
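The late-reply case can be reproduced in a short script. This is a sketch with assumed names and timings: the responder process, the {:late_reply, _} message, and the sleep durations are all made up for the demonstration.

```elixir
alias_ref = Process.alias()

# Hypothetical slow responder: answers 100 ms after receiving the request.
responder =
  spawn(fn ->
    receive do
      {:request, reply_to} ->
        Process.sleep(100)
        send(reply_to, {:late_reply, :data})
    end
  end)

send(responder, {:request, alias_ref})

# Give up after 10 ms, well before the responder answers.
first =
  receive do
    {:late_reply, _} -> :got_reply
  after
    10 -> :timeout
  end

# Deactivate the alias, then wait long enough for the late reply to be sent.
Process.unalias(alias_ref)
Process.sleep(200)

second =
  receive do
    {:late_reply, _} -> :got_reply
  after
    0 -> :nothing_in_mailbox
  end
```

Had the request carried self() instead of the alias, the second receive would have found the stale reply in the mailbox.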
When to Use
Triggers:
- You're building request/response patterns with timeouts where late replies pollute the mailbox
- A GenServer sends a request and moves on after timeout, but the response arrives later
- You need safe cancellation of pending responses
Example — before:
defmodule MyApp.RequestRouter do
use GenServer
@impl true
def handle_call({:request, payload}, _from, state) do
send(state.backend, {:request, self(), payload})
receive do
{:response, result} -> {:reply, result, state}
after
5_000 ->
# Timeout... but the response might still arrive later!
# It'll sit in our mailbox and confuse future receives
{:reply, {:error, :timeout}, state}
end
end
end
Example — after:
defmodule MyApp.RequestRouter do
use GenServer
@impl true
def handle_call({:request, payload}, _from, state) do
alias_ref = Process.alias([:reply])
send(state.backend, {:request, alias_ref, payload})
receive do
{^alias_ref, result} -> {:reply, result, state}
after
5_000 ->
# Deactivate the alias — late replies are silently dropped
Process.unalias(alias_ref)
{:reply, {:error, :timeout}, state}
end
end
end
When NOT to Use
Don't use this when:
- You're using GenServer.call (it already handles this with its own ref-based protocol)
- The response will always arrive (no timeout scenario)
- You're on OTP < 24 (aliases aren't available)
Over-application example:
# Using aliases for GenServer.call — it already handles late replies
defmodule MyApp.Client do
def get_data(server) do
alias_ref = Process.alias([:reply])
# Pointless — GenServer.call already uses monitor-based protocol
# that handles late replies correctly
GenServer.call(server, {:get, alias_ref})
end
end
Better alternative:
defmodule MyApp.Client do
def get_data(server) do
# GenServer.call already handles timeouts and late replies correctly
GenServer.call(server, :get, 5_000)
end
end
Why: Aliases solve the problem for custom protocols where you build your own request/response. GenServer.call already has equivalent protections built in. Use aliases when you're implementing raw message-based protocols.
Pattern 17: Registry Partitioning Strategies
Source: lib/elixir/lib/registry.ex#L310 (start_link partitioning docs)
What it does: Duplicate registries support two partitioning strategies:
- {:duplicate, :pid} (default) — groups entries by the registering process's PID. Good for few keys with many entries (e.g., one PubSub topic with many subscribers).
- {:duplicate, :key} — groups entries by key. Good for many keys with few entries each (e.g., many topics with few subscribers).
Why: The partitioning strategy determines which partition(s) need to be scanned during lookup. With :key partitioning, a key lookup hits exactly one partition (O(1) partitions). With :pid partitioning, key lookups must scan all partitions but process-based operations (unregister on death) are localized.
Anti-pattern: Using default :pid partitioning with millions of unique keys and frequent lookups. Each lookup scans all partitions. Switch to {:duplicate, :key}.
Code example from source:
# Many topics, few subscribers each — use key partitioning
Registry.start_link(
keys: {:duplicate, :key},
name: MyApp.TopicRegistry,
partitions: System.schedulers_online()
)
# Few topics, many subscribers — use pid partitioning (default)
Registry.start_link(
keys: :duplicate,
name: MyApp.BroadcastRegistry,
partitions: System.schedulers_online()
)
When to Use
Triggers:
- You have a PubSub with many topics and few subscribers per topic — key lookups are slow
- Profiling shows Registry.dispatch scanning many partitions for key-based lookups
- You're choosing between "optimize for subscribe/unsubscribe" vs "optimize for dispatch"
Example — before:
# Default :pid partitioning with many unique keys
# Each dispatch must scan ALL partitions to find subscribers for a key
Registry.start_link(keys: :duplicate, name: MyApp.Events)
# With 16 partitions and 100k unique event types,
# every dispatch scans 16 ETS tables
Registry.dispatch(MyApp.Events, "order.created", fn entries ->
for {pid, _} <- entries, do: send(pid, :notify)
end)
Example — after:
# Key partitioning — dispatch hits exactly ONE partition per key
Registry.start_link(
keys: {:duplicate, :key},
name: MyApp.Events,
partitions: System.schedulers_online()
)
# Now dispatch only scans one ETS table — O(1) partitions
Registry.dispatch(MyApp.Events, "order.created", fn entries ->
for {pid, _} <- entries, do: send(pid, :notify)
end)
When NOT to Use
Don't use this when:
- You have few keys with many subscribers (:pid partitioning is better for cleanup)
- Process death cleanup is the hot path (:key partitioning must scan all partitions on death)
- You're not hitting performance issues with the default (premature optimization)
Over-application example:
# Key partitioning for a "presence" system where processes die frequently
# Each death must scan ALL partitions to unregister
Registry.start_link(
keys: {:duplicate, :key},
name: MyApp.Presence,
partitions: 16
)
# With 50k users connecting/disconnecting per second,
# each disconnect scans 16 partitions — worse than default!
Better alternative:
# Pid partitioning — death cleanup is localized to one partition
Registry.start_link(
keys: :duplicate,
name: MyApp.Presence,
partitions: System.schedulers_online()
)
Why: Partitioning is a tradeoff. :key optimizes dispatch (one partition per lookup) at the cost of death cleanup (scan all). :pid optimizes death cleanup (one partition) at the cost of dispatch (scan all). Pick based on which operation is hotter.
Pattern 18: init/1 Return Values — The Full Spectrum
Source: lib/elixir/lib/gen_server.ex#L498 (init callback spec)
What it does: init/1 supports six return values:
- {:ok, state} — normal start
- {:ok, state, timeout} — start with idle timeout
- {:ok, state, :hibernate} — start and immediately hibernate (GC + compact heap)
- {:ok, state, {:continue, arg}} — start then immediately invoke handle_continue
- :ignore — don't start, supervisor treats as successful (child can be restarted later)
- {:stop, reason} — initialization failed
Why: Each covers a real scenario:
- :ignore — process is disabled by configuration but might be enabled later via Supervisor.restart_child/2
- {:stop, reason} — unrecoverable initialization failure
- :hibernate — process will be idle for a long time, minimize memory
- {:continue, _} — split fast init from slow setup
Anti-pattern: Using {:stop, reason} when :ignore is appropriate. If a feature is disabled by config, :ignore keeps the child spec in the supervisor for later activation. {:stop, reason} signals a real failure.
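How a static Supervisor records an ignored child can be inspected directly. A minimal sketch, assuming a hypothetical Demo.Optional worker whose init argument stands in for a configuration flag:

```elixir
defmodule Demo.Optional do
  use GenServer

  def start_link(enabled?), do: GenServer.start_link(__MODULE__, enabled?)

  @impl true
  def init(true), do: {:ok, %{}}
  # Disabled: tell the supervisor not to run this child for now.
  def init(false), do: :ignore
end

{:ok, sup} = Supervisor.start_link([{Demo.Optional, false}], strategy: :one_for_one)

# The child spec is kept, but no process is running for it.
children = Supervisor.which_children(sup)
counts = Supervisor.count_children(sup)
```

The :undefined pid in which_children/1 is what distinguishes a child that was ignored from one that is running; the spec stays available for Supervisor.restart_child/2.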
When to Use
Triggers:
- You need to communicate "don't start this child" without the supervisor treating it as failure
- A feature is disabled by config but the child spec should remain for hot-enabling
- A process discovers during init that it's a duplicate and should yield to the existing one
Example — before:
defmodule MyApp.OptionalFeature do
use GenServer
@impl true
def init(_) do
if Application.get_env(:my_app, :feature_enabled) do
{:ok, %{}}
else
# {:stop, :disabled} causes supervisor to count it as a failure!
{:stop, :disabled}
end
end
end
Example — after:
defmodule MyApp.OptionalFeature do
use GenServer
@impl true
def init(_) do
if Application.get_env(:my_app, :feature_enabled) do
{:ok, %{}}
else
# :ignore — supervisor is happy, child spec stays for later activation
:ignore
end
end
end
# Later, to enable:
# Update config, then:
# Supervisor.restart_child(MyApp.Supervisor, MyApp.OptionalFeature)
When NOT to Use
Don't use this when:
- The failure is real and should count toward restart intensity (use {:stop, reason})
- You want the supervisor to NOT have a child spec for this module (just don't add it)
- The process should retry starting later automatically (use {:stop, _} + transient restart)
Over-application example:
# Using :ignore for a real failure — hides the problem
defmodule MyApp.DBConnection do
  use GenServer
  @impl true
def init(config) do
case connect(config) do
{:ok, conn} -> {:ok, conn}
{:error, _} -> :ignore # BAD: DB is down but we pretend everything is fine
end
end
end
Better alternative:
defmodule MyApp.DBConnection do
  use GenServer
  @impl true
def init(config) do
case connect(config) do
{:ok, conn} -> {:ok, conn}
{:error, reason} -> {:stop, reason} # Let supervisor handle the failure
end
end
end
Why: :ignore means "this child intentionally should not run right now." {:stop, reason} means "this child tried to start and failed." Conflating the two hides real failures from your supervision tree.
Decision Tree
- If you have children known at compile time with ordering dependencies → Pattern 1: Static vs Dynamic Supervision
- If a single DynamicSupervisor or Task.Supervisor is a bottleneck under high spawn load → Pattern 2: PartitionSupervisor for Scalability
- If you need to decide how a supervisor reacts when children share state or have dependencies → Pattern 3: Supervision Strategies
- If you want to tune how many restarts are tolerated before escalation → Pattern 4: Restart Intensity
- If different processes have different lifecycle expectations (one-shot vs permanent) → Pattern 5: Restart Values
- If a supervisor should self-terminate when its children finish their work → Pattern 6: Automatic Shutdown
- If you need to compute values concurrently and the caller should crash on failure → Pattern 7: Task.async/await
- If a GenServer needs to spawn work that might fail without taking down the server → Pattern 8: Task.Supervisor.async_nolink
- If you need supervised tasks with caller tracking, async_nolink, and streaming → Pattern 9: Task Supervisor
- If you need to look up processes by a dynamic key without atom leaks → Pattern 10: Registry
- If processes hold external resources that need cleanup on shutdown → Pattern 11: Shutdown Semantics
- If you are building a custom supervisor-like process and need efficient child tracking → Pattern 12: DynamicSupervisor Internal State
- If a child fails to start due to transient conditions and you want non-blocking retry → Pattern 13: Restart Logic with Backoff
- If you need to trace which process initiated spawned work for debugging → Pattern 14: Process Lineage Tracking
- If a GenServer needs to do async work before replying to a caller → Pattern 15: GenServer.reply/2
- If you build a custom request/response protocol with timeouts and need to prevent late replies → Pattern 16: Process.alias
- If your Registry dispatch is slow because of wrong partitioning strategy → Pattern 17: Registry Partitioning
- If you need to communicate "don't start this child" or split init into fast/slow phases → Pattern 18: init/1 Return Values