# Process Design Patterns — From the Elixir Source Analysis of `lib/elixir/lib/supervisor.ex`, `lib/elixir/lib/dynamic_supervisor.ex`, `lib/elixir/lib/task.ex`, `lib/elixir/lib/task/supervisor.ex`, `lib/elixir/lib/process.ex`, and `lib/elixir/lib/registry.ex`. --- ## Pattern 1: Static vs Dynamic Supervision — Choose the Right Tool **Source:** [lib/elixir/lib/supervisor.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L1) vs [lib/elixir/lib/dynamic_supervisor.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L1) **What it does:** Elixir provides two distinct supervisor types: - `Supervisor` — for **static** children known at compile time, started in a defined order - `DynamicSupervisor` — for children started **on demand** at runtime, with no ordering guarantees **Why:** Static supervisors guarantee startup order (critical for dependencies like "DB pool must start before web server"). Dynamic supervisors optimize for scale — they can hold millions of children using efficient data structures and shut down concurrently. **Anti-pattern:** Using a `Supervisor` when children are created dynamically (e.g., one process per WebSocket connection). You'll hit performance issues and ordering semantics you don't need. Conversely, using `DynamicSupervisor` for fixed infrastructure (DB pool, PubSub) loses startup order guarantees. **Code example from source (dynamic_supervisor.ex:1-15):** ```elixir # DynamicSupervisor docs explain the distinction: # "The Supervisor module was designed to handle mostly static children # that are started in the given order when the supervisor starts. A # DynamicSupervisor starts with no children. Instead, children are # started on demand via start_child/2 and there is no ordering between # children." ``` ### When to Use **Triggers:** - You know at startup exactly which children need to run (DB pool, PubSub, caches) - Children have ordering dependencies (pool must start before consumers) - You're building application-level infrastructure in your supervision tree **Example — before:** ```elixir # Using DynamicSupervisor for fixed infrastructure — wrong tool defmodule MyApp.Application do def start(_type, _args) do children = [{DynamicSupervisor, name: MyApp.InfraSupervisor}] Supervisor.start_link(children, strategy: :one_for_one) end end # Manually starting fixed children after supervisor boots DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.Repo) DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.PubSub) DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.Endpoint) # No startup ordering guarantee! ``` **Example — after:** ```elixir defmodule MyApp.Application do def start(_type, _args) do children = [ MyApp.Repo, # DB pool starts first MyApp.PubSub, # PubSub starts after DB is ready MyApp.Endpoint # Web server starts last ] Supervisor.start_link(children, strategy: :rest_for_one) end end ``` ### When NOT to Use **Don't use this when:** - Children are created on-demand (per-connection, per-request, per-user) - The number of children is unbounded or varies significantly at runtime - You don't need ordering guarantees between children **Over-application example:** ```elixir # Static supervisor for per-WebSocket connections — will be clunky defmodule MyApp.ConnectionSupervisor do use Supervisor def init(_) do # Can't define children at compile time — they arrive at runtime! Supervisor.init([], strategy: :one_for_one) end # Awkward: using Supervisor for dynamic children def add_connection(socket) do spec = {ConnectionHandler, socket} Supervisor.start_child(__MODULE__, spec) end end ``` **Better alternative:** ```elixir defmodule MyApp.ConnectionSupervisor do use DynamicSupervisor def start_link(_), do: DynamicSupervisor.start_link(__MODULE__, [], name: __MODULE__) def init(_), do: DynamicSupervisor.init(strategy: :one_for_one) def add_connection(socket) do DynamicSupervisor.start_child(__MODULE__, {ConnectionHandler, socket}) end end ``` **Why:** Static supervisors excel at ordered, fixed infrastructure. DynamicSupervisor excels at scale with runtime-determined children. Pick based on whether children are known at compile time. --- ## Pattern 2: PartitionSupervisor for Scalability **Source:** [lib/elixir/lib/dynamic_supervisor.ex#L60](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L60) and [lib/elixir/lib/task/supervisor.ex#L35](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task/supervisor.ex#L35) **What it does:** Both `DynamicSupervisor` and `Task.Supervisor` document the same scalability pattern: when a single supervisor becomes a bottleneck, wrap it in a `PartitionSupervisor` which starts N instances (one per core by default) and routes via a key. **Why:** A supervisor is a single process. Under heavy `start_child` load, it serializes all spawn operations. PartitionSupervisor distributes the load across multiple supervisor processes, using `self()` as the routing key to ensure each caller consistently hits the same partition. **Anti-pattern:** Creating your own load-balancing logic for supervisors, or just accepting the bottleneck. The standard library provides this pattern explicitly. **Code example from source (dynamic_supervisor.ex):** ```elixir # Instead of a single DynamicSupervisor: children = [ {PartitionSupervisor, child_spec: DynamicSupervisor, name: MyApp.DynamicSupervisors} ] # Start children through the partition supervisor: DynamicSupervisor.start_child( {:via, PartitionSupervisor, {MyApp.DynamicSupervisors, self()}}, {Counter, 0} ) ``` ### When to Use **Triggers:** - A single DynamicSupervisor or Task.Supervisor is a bottleneck under high `start_child` load - You're seeing latency spikes when spawning tasks/children under concurrency - Profiling shows the supervisor process has a large message queue **Example — before:** ```elixir # Single supervisor — serializes all spawn operations defmodule MyApp.TaskRunner do def run_async(fun) do Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fun) end end # Under 10k concurrent requests, this single process becomes a bottleneck ``` **Example — after:** ```elixir # Partitioned — distributes load across N supervisor processes defmodule MyApp.Application do def start(_type, _args) do children = [ {PartitionSupervisor, child_spec: Task.Supervisor, name: MyApp.TaskSupervisors} ] Supervisor.start_link(children, strategy: :one_for_one) end end defmodule MyApp.TaskRunner do def run_async(fun) do Task.Supervisor.async_nolink( {:via, PartitionSupervisor, {MyApp.TaskSupervisors, self()}}, fun ) end end ``` ### When NOT to Use **Don't use this when:** - You have low spawn rates (< 1000/sec) — a single supervisor is fine - You need ordering guarantees between children (partitioning breaks ordering) - The supervisor has few children total (partitioning adds overhead for no gain) **Over-application example:** ```elixir # Partitioning a supervisor that starts 5 children at boot — pointless {PartitionSupervisor, child_spec: DynamicSupervisor, name: MyApp.ConfigSupervisors, partitions: System.schedulers_online()} # 16 partitions for 5 children = massive overhead, zero benefit ``` **Better alternative:** ```elixir # Just use a plain DynamicSupervisor {DynamicSupervisor, name: MyApp.ConfigSupervisor} ``` **Why:** PartitionSupervisor exists for high-throughput spawn scenarios. If you're not hitting supervisor mailbox limits, the extra processes and routing logic add complexity without benefit. --- ## Pattern 3: Supervision Strategies — Choosing the Right Restart Behavior **Source:** [lib/elixir/lib/supervisor.ex#L315](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L315) (Strategies section) **What it does:** Three strategies model three dependency patterns: - `:one_for_one` — independent children (crash of A doesn't affect B) - `:one_for_all` — tightly coupled children (if one fails, all state is inconsistent) - `:rest_for_one` — sequential dependencies (children started after the crashed one depend on it) **Why:** These map directly to runtime dependency graphs. A connection pool and its consumers are `:rest_for_one` — consumers can't work without the pool. Multiple independent request handlers are `:one_for_one`. Workers sharing a cache are `:one_for_all` — stale cache state after a crash could cause inconsistency. **Anti-pattern:** Defaulting to `:one_for_one` everywhere without thinking about dependencies. If process B depends on process A's state and A crashes, B will be working with stale assumptions. **Code example from source (supervisor.ex docs):** ```elixir # Independent workers — one crash doesn't affect others Supervisor.start_link(children, strategy: :one_for_one) # Tightly coupled — all must restart together for consistency Supervisor.start_link(children, strategy: :one_for_all) # Sequential dependency — later children depend on earlier ones Supervisor.start_link(children, strategy: :rest_for_one) ``` ### When to Use **Triggers:** - You're deciding how a supervisor should react when one child fails - Children share state or resources that become inconsistent if one crashes - You have a pipeline: A feeds B feeds C **Example — before:** ```elixir # Defaulting to :one_for_one without thinking about dependencies defmodule MyApp.DataPipeline do use Supervisor def init(_) do children = [ MyApp.DataSource, # Produces data MyApp.Transformer, # Transforms data (holds reference to DataSource) MyApp.Sink # Writes transformed data ] # If DataSource crashes, Transformer has a stale reference! Supervisor.init(children, strategy: :one_for_one) end end ``` **Example — after:** ```elixir defmodule MyApp.DataPipeline do use Supervisor def init(_) do children = [ MyApp.DataSource, # If this crashes... MyApp.Transformer, # ...these must restart too (stale refs) MyApp.Sink ] # rest_for_one: crash of DataSource restarts Transformer and Sink Supervisor.init(children, strategy: :rest_for_one) end end ``` ### When NOT to Use **Don't use this when:** - Children are truly independent (HTTP request handlers, job workers) - You're using `:one_for_all` because you're unsure — analyze dependencies first - The restart strategy masks a design problem (maybe use separate supervision subtrees) **Over-application example:** ```elixir # one_for_all when children are actually independent defmodule MyApp.Workers do use Supervisor def init(_) do children = [ {MyApp.EmailWorker, []}, {MyApp.SMSWorker, []}, {MyApp.PushWorker, []} ] # If email crashes, why restart SMS and Push? They're independent! Supervisor.init(children, strategy: :one_for_all) end end ``` **Better alternative:** ```elixir defmodule MyApp.Workers do use Supervisor def init(_) do children = [ {MyApp.EmailWorker, []}, {MyApp.SMSWorker, []}, {MyApp.PushWorker, []} ] # Independent workers — one crash doesn't affect others Supervisor.init(children, strategy: :one_for_one) end end ``` **Why:** Restart strategies model dependency graphs. Using `:one_for_all` for independent workers causes unnecessary restarts, losing in-progress work for no benefit. --- ## Pattern 4: Restart Intensity (`max_restarts` / `max_seconds`) **Source:** [lib/elixir/lib/supervisor.ex#L309](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L309), [lib/elixir/lib/dynamic_supervisor.ex#L730](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L730) (implementation) **What it does:** Supervisors track restart frequency. If a child exceeds `max_restarts` within `max_seconds`, the supervisor itself shuts down (escalating the failure to its parent). Defaults: 3 restarts in 5 seconds. **Why:** Prevents infinite restart loops that waste CPU and mask bugs. If a child keeps crashing within seconds, it's a systemic problem that the current supervisor level can't fix. Escalating to the parent allows a higher-level strategy to respond (perhaps restarting the entire subsystem with fresh state). **Anti-pattern:** Setting `max_restarts` extremely high to "prevent crashes." This hides bugs and wastes resources. Let supervisors escalate — that's the point of the hierarchy. **Code example from source (dynamic_supervisor.ex internal logic):** ```elixir defp add_restart(state) do %{max_seconds: max_seconds, max_restarts: max_restarts, restarts: restarts} = state now = :erlang.monotonic_time(1) restarts = add_restart([now | restarts], now, max_seconds) state = %{state | restarts: restarts} if length(restarts) <= max_restarts do {:ok, state} else {:shutdown, state} end end defp add_restart(restarts, now, period) do for then <- restarts, now <= then + period, do: then end ``` ### When to Use **Triggers:** - You want to prevent infinite restart loops from burning CPU - You're tuning a supervisor for a child that occasionally crashes under load - You need the supervisor to escalate when a systemic problem prevents recovery **Example — before:** ```elixir # Default: 3 restarts in 5 seconds — might be too aggressive for flaky networks defmodule MyApp.ExternalAPISupervisor do use Supervisor def init(_) do children = [{MyApp.APIClient, []}] # Default max_restarts: 3, max_seconds: 5 # Network blip causes 3 crashes in 2 seconds → supervisor dies → app crashes Supervisor.init(children, strategy: :one_for_one) end end ``` **Example — after:** ```elixir defmodule MyApp.ExternalAPISupervisor do use Supervisor def init(_) do children = [{MyApp.APIClient, []}] # Allow more restarts for transient network issues Supervisor.init(children, strategy: :one_for_one, max_restarts: 10, max_seconds: 60 ) end end ``` ### When NOT to Use **Don't use this when:** - You're setting max_restarts very high to "prevent crashes" — you're hiding bugs - The child crash is deterministic (same input = same crash) — fix the bug instead - You're relying on restart intensity as a backoff mechanism (use explicit backoff) **Over-application example:** ```elixir # Setting absurdly high restart limits to "never crash" Supervisor.init(children, strategy: :one_for_one, max_restarts: 1000, max_seconds: 1 ) # This allows 1000 crashes per second — you'll burn CPU and hide bugs ``` **Better alternative:** ```elixir # Reasonable limits + fix the underlying crash Supervisor.init(children, strategy: :one_for_one, max_restarts: 5, max_seconds: 30 ) # If 5 crashes in 30 seconds isn't enough, the problem is the child, not the limit ``` **Why:** Restart intensity is a circuit breaker, not a throttle. It should escalate systemic failures, not suppress them. If you need aggressive restarts, your child has a bug. --- ## Pattern 5: Restart Values — `:permanent` vs `:transient` vs `:temporary` **Source:** [lib/elixir/lib/supervisor.ex#L130](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L130) (Restart values section) **What it does:** Three restart policies control what happens when a child terminates: - `:permanent` — always restart (default for GenServer/Agent/Supervisor) - `:transient` — restart only on abnormal exit (not `:normal`, `:shutdown`, `{:shutdown, term}`) - `:temporary` — never restart (default for Task) **Why:** Different processes have different lifecycle expectations. A database pool should always be running (`:permanent`). A task that computes a value and exits is done when it's done (`:temporary`). A connection process should restart on crash but not on graceful disconnect (`:transient`). **Anti-pattern:** Making everything `:permanent`. If a one-shot task keeps restarting, it'll trigger restart intensity limits and take down the supervisor. **Code example from source:** ```elixir # Task defaults to :temporary — intentional one-shot work # (from task.ex:282) def child_spec(arg) do %{ id: Task, start: {Task, :start_link, [arg]}, restart: :temporary } end # Customize via use: use GenServer, restart: :transient ``` ### When to Use **Triggers:** - You have different process types with different lifecycle expectations - One-shot tasks keep restarting and wasting resources - A connection should gracefully disconnect without triggering restart **Example — before:** ```elixir # Everything is :permanent (default) — tasks restart forever defmodule MyApp.BatchProcessor do use GenServer def handle_cast({:process, batch}, state) do Task.start_link(fn -> process_batch(batch) # Task exits :normal... and gets restarted by supervisor! end) {:noreply, state} end end ``` **Example — after:** ```elixir defmodule MyApp.BatchTask do use Task, restart: :temporary # Don't restart completed tasks def start_link(batch) do Task.start_link(__MODULE__, :run, [batch]) end def run(batch), do: process_batch(batch) end defmodule MyApp.ConnectionWorker do use GenServer, restart: :transient # Restart on crash, not graceful disconnect def disconnect(pid) do GenServer.stop(pid, :normal) # Won't trigger restart end end ``` ### When NOT to Use **Don't use this when:** - You're using `:temporary` to avoid fixing a crash (the child just stays dead) - You set everything to `:transient` without thinking — `:permanent` is usually right for services **Over-application example:** ```elixir # Making a critical service :temporary so it "doesn't bother the supervisor" defmodule MyApp.PaymentProcessor do use GenServer, restart: :temporary # If this crashes, it stays dead! Payments stop working silently! end ``` **Better alternative:** ```elixir defmodule MyApp.PaymentProcessor do use GenServer, restart: :permanent # Critical services should always restart — that's the whole point end ``` **Why:** `:permanent` is the safe default for anything that should "always be running." Only use `:transient` for processes that have a valid "done" state, and `:temporary` for truly one-shot work. --- ## Pattern 6: Automatic Shutdown for Pipeline Supervisors **Source:** [lib/elixir/lib/supervisor.ex#L349](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L349) (Automatic shutdown section) **What it does:** Supervisors support `:auto_shutdown` which terminates the supervisor when significant children exit. Options: `:any_significant` (first significant child exits → shutdown) or `:all_significant` (all significant children must exit → shutdown). **Why:** Models pipeline/workflow patterns where a supervisor's purpose is tied to its children's work. If all significant workers finish, the supervisor should clean up. This is useful for batch processing supervisors or connection-scoped process groups. **Anti-pattern:** Manually monitoring children and calling `Supervisor.stop/1`. The automatic shutdown mechanism handles this cleanly within OTP semantics. **Code example (from docs):** ```elixir # Only :transient and :temporary children can be marked significant children = [ Supervisor.child_spec({BatchWorker, args}, significant: true, restart: :transient) ] Supervisor.start_link(children, strategy: :one_for_one, auto_shutdown: :all_significant ) ``` ### When to Use **Triggers:** - You have a supervisor managing a batch/pipeline where completion means "job done" - A supervisor's existence only makes sense while its children are doing work - You're building a workflow that should self-terminate **Example — before:** ```elixir # Manual cleanup when batch workers finish defmodule MyApp.BatchSupervisor do use DynamicSupervisor def all_done?(supervisor) do # Polling... ugly DynamicSupervisor.count_children(supervisor).active == 0 end end # Somewhere else, a monitor process watches and cleans up defmodule MyApp.BatchMonitor do use GenServer def handle_info(:check, state) do if MyApp.BatchSupervisor.all_done?(state.sup) do Supervisor.stop(state.sup) end {:noreply, state} end end ``` **Example — after:** ```elixir defmodule MyApp.BatchSupervisor do use Supervisor def start_link(tasks) do Supervisor.start_link(__MODULE__, tasks) end def init(tasks) do children = Enum.map(tasks, fn task -> Supervisor.child_spec({MyApp.BatchWorker, task}, id: task.id, restart: :transient, significant: true) end) Supervisor.init(children, strategy: :one_for_one, auto_shutdown: :all_significant ) end end # When all workers complete normally, supervisor shuts down automatically ``` ### When NOT to Use **Don't use this when:** - Children are long-lived services that should never "complete" - You want the supervisor to keep running even after children exit (for later restarts) - Children have `:permanent` restart — they can't be `:significant` **Over-application example:** ```elixir # auto_shutdown on infrastructure supervisor — it'll die when any child exits! children = [ Supervisor.child_spec(MyApp.Cache, significant: true, restart: :transient), MyApp.WebServer ] Supervisor.init(children, strategy: :one_for_one, auto_shutdown: :any_significant ) # If Cache restarts as :transient and exits :normal once, the WHOLE supervisor dies ``` **Better alternative:** ```elixir # Infrastructure supervisors should NOT auto-shutdown Supervisor.init(children, strategy: :one_for_one) # Only use auto_shutdown for workflow/batch supervisors with finite lifetimes ``` **Why:** `auto_shutdown` models "this supervisor's job is done when its children finish." It's for finite work, not long-lived services. --- ## Pattern 7: Task.async/await for Concurrent Value Computation **Source:** [lib/elixir/lib/task.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task.ex#L1) and [lib/elixir/lib/task.ex#L300](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task.ex#L300) **What it does:** `Task.async` spawns a linked, monitored process and returns a `%Task{}` struct. `Task.await` blocks until the result arrives or times out. This is the canonical pattern for "compute a value concurrently." **Why:** Tasks provide structured concurrency — the caller is linked to the task, so failures propagate naturally. The monitor reference enables safe await with timeout. This is explicit and composable unlike raw `spawn_link` + receive. **Anti-pattern:** Using `spawn_link` + `send` + `receive` for one-shot concurrent computation. You lose: error propagation, structured monitoring, timeout handling, and the caller-tracking metadata that tasks provide. **Code example from source:** ```elixir task = Task.async(fn -> do_some_work() end) res = do_some_other_work() res + Task.await(task) ``` Key constraint from docs: "If you start an async, you **must await**. This is either done by calling `Task.await/2` or `Task.yield/2` followed by `Task.shutdown/2`." ### When to Use **Triggers:** - You need to compute a value concurrently and use it in the current flow - Multiple independent computations can run in parallel to reduce latency - The current process should crash if the computation fails (linked failure) **Example — before:** ```elixir # Sequential — total time = sum of all operations def build_dashboard(user_id) do profile = fetch_profile(user_id) # 200ms orders = fetch_recent_orders(user_id) # 300ms recommendations = compute_recs(user_id) # 500ms # Total: 1000ms %{profile: profile, orders: orders, recommendations: recommendations} end ``` **Example — after:** ```elixir # Concurrent — total time = max of all operations def build_dashboard(user_id) do profile_task = Task.async(fn -> fetch_profile(user_id) end) orders_task = Task.async(fn -> fetch_recent_orders(user_id) end) recs_task = Task.async(fn -> compute_recs(user_id) end) %{ profile: Task.await(profile_task), orders: Task.await(orders_task), recommendations: Task.await(recs_task) } # Total: ~500ms (limited by slowest task) end ``` ### When NOT to Use **Don't use this when:** - The task might fail and you don't want the caller to crash (use `async_nolink`) - You're inside a GenServer and can't block on `await` (use `async_nolink` + `handle_info`) - The computation is trivial (< 1ms) — spawning a process adds overhead **Over-application example:** ```elixir # Spawning a task for trivial work — overhead exceeds benefit def format_name(user) do task = Task.async(fn -> String.upcase(user.name) end) Task.await(task) # Process spawn + message passing for a microsecond operation! end ``` **Better alternative:** ```elixir def format_name(user) do String.upcase(user.name) end ``` **Why:** Tasks add process spawn overhead (~2-5μs) plus message passing. Only use them when the work is expensive enough to justify parallelism — typically >1ms or when running multiple operations concurrently. --- ## Pattern 8: Task.Supervisor.async_nolink for Fault-Tolerant Task Execution **Source:** [lib/elixir/lib/task/supervisor.ex#L240](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task/supervisor.ex#L240) (async_nolink docs with GenServer example) **What it does:** Unlike `Task.async`, `async_nolink` spawns a task that is NOT linked to the caller. The caller monitors it and handles success/failure via `handle_info`. This prevents a task crash from killing the caller. **Why:** In a GenServer, you often want to spawn work that might fail without taking down the server. The pattern: spawn with `async_nolink`, receive the result as `{ref, answer}`, and handle failure as `{:DOWN, ref, :process, _pid, reason}`. **Anti-pattern:** Using `Task.async` inside a GenServer when the task might fail. The link means the GenServer crashes too. Use `async_nolink` + `handle_info` for resilient concurrent work. **Code example from source (task/supervisor.ex):** ```elixir defmodule MyApp.Server do use GenServer def handle_call(:start_task, _from, %{ref: nil} = state) do task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> # potentially failing work end) {:reply, :ok, %{state | ref: task.ref}} end # Task completed successfully def handle_info({ref, answer}, %{ref: ref} = state) do Process.demonitor(ref, [:flush]) {:noreply, %{state | ref: nil}} end # Task failed def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do {:noreply, %{state | ref: nil}} end end ``` ### When to Use **Triggers:** - A GenServer needs to spawn work that might fail without crashing the server - You're building a "request/response with timeout" pattern inside a GenServer - External calls (HTTP, DB) from a GenServer should be non-blocking and resilient **Example — before:** ```elixir defmodule MyApp.Enricher do use GenServer @impl true def handle_call({:enrich, data}, _from, state) do # If this HTTP call crashes, the entire GenServer dies! result = Task.async(fn -> HTTPClient.post!("/api/enrich", data) end) enriched = Task.await(result, 5_000) {:reply, {:ok, enriched}, state} end end ``` **Example — after:** ```elixir defmodule MyApp.Enricher do use GenServer @impl true def handle_call({:enrich, data}, from, state) do task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> HTTPClient.post!("/api/enrich", data) end) {:noreply, Map.put(state, task.ref, from)} end @impl true def handle_info({ref, result}, state) do Process.demonitor(ref, [:flush]) {from, state} = Map.pop(state, ref) GenServer.reply(from, {:ok, result}) {:noreply, state} end @impl true def handle_info({:DOWN, ref, :process, _pid, reason}, state) do {from, state} = Map.pop(state, ref) GenServer.reply(from, {:error, reason}) {:noreply, state} end end ``` ### When NOT to Use **Don't use this when:** - Task failure should crash the caller (you WANT linked failure propagation) - You're not inside a GenServer and can handle the crash in a try/rescue - The work is fast and synchronous is acceptable **Over-application example:** ```elixir # Using async_nolink for work that SHOULD crash the caller on failure defmodule MyApp.CriticalPayment do use GenServer def handle_call({:charge, card}, _from, state) do task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> PaymentGateway.charge!(card) end) # Now you have to manually handle the failure... # But if payment fails, maybe this GenServer SHOULD crash # to trigger a supervisor restart with clean state end end ``` **Better alternative:** ```elixir # If failure should crash the GenServer, use Task.async (linked) def handle_call({:charge, card}, _from, state) do result = Task.async(fn -> PaymentGateway.charge!(card) end) {:reply, Task.await(result), state} end ``` **Why:** `async_nolink` is for resilient, non-critical work. If the task's failure means your GenServer's state is invalid, you want the link — let it crash and restart clean. --- ## Pattern 9: Task Supervisor as DynamicSupervisor Specialization **Source:** [lib/elixir/lib/task/supervisor.ex#L151](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task/supervisor.ex#L151) (start_link implementation) **What it does:** `Task.Supervisor` is implemented directly on top of `DynamicSupervisor`. It stores default restart/shutdown settings in the process dictionary and delegates `init` to `DynamicSupervisor.init`. **Why:** Specialization without duplication. Task.Supervisor adds task-specific behavior (caller tracking, async/nolink patterns, stream support) on top of the generic dynamic supervision infrastructure. It's a compositional pattern — build specialized supervisors by wrapping the generic one. **Anti-pattern:** Re-implementing task supervision from scratch with a plain DynamicSupervisor + custom start logic. Use Task.Supervisor — it handles caller tracking, owner propagation, and proper shutdown. **Code example from source:** ```elixir # Task.Supervisor.start_link delegates to DynamicSupervisor def start_link(options \\ []) do {restart, options} = Keyword.pop(options, :restart) {shutdown, options} = Keyword.pop(options, :shutdown) keys = [:max_children, :max_seconds, :max_restarts] {sup_opts, start_opts} = Keyword.split(options, keys) restart_and_shutdown = {restart || :temporary, shutdown || 5000} DynamicSupervisor.start_link(__MODULE__, {restart_and_shutdown, sup_opts}, start_opts) end def init({{_restart, _shutdown} = arg, options}) do Process.put(__MODULE__, arg) DynamicSupervisor.init([strategy: :one_for_one] ++ options) end ``` ### When to Use **Triggers:** - You're building task infrastructure that needs proper shutdown and caller tracking - You want `async_nolink` + streaming + concurrency limiting for tasks - You need tasks to be supervised (restarted, tracked, shut down gracefully) **Example — before:** ```elixir # Rolling your own task management on DynamicSupervisor defmodule MyApp.Workers do def start_task(fun) do spec = %{id: make_ref(), start: {Task, :start_link, [fun]}, restart: :temporary} DynamicSupervisor.start_child(MyApp.WorkerSup, spec) end # No caller tracking, no async_nolink, no stream support # Must manually build all of that end ``` **Example — after:** ```elixir # Task.Supervisor gives you all of this for free defmodule MyApp.Application do def start(_type, _args) do children = [ {Task.Supervisor, name: MyApp.TaskSupervisor} ] Supervisor.start_link(children, strategy: :one_for_one) end end # Now you get: async, async_nolink, async_stream, start_child, etc. Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> work() end) ``` ### When NOT to Use **Don't use this when:** - Tasks are fire-and-forget and you don't need supervision (just `Task.start/1`) - You need custom child specs with complex init logic (use DynamicSupervisor directly) - You're spawning non-Task children (GenServers, Agents) **Over-application example:** ```elixir # Using Task.Supervisor to start GenServers — wrong tool Task.Supervisor.start_child(MyApp.TaskSupervisor, fn -> # This spawns a Task that starts a GenServer... awkward {:ok, _} = MyApp.Worker.start_link(args) Process.sleep(:infinity) # Keep the task alive?? end) ``` **Better alternative:** ```elixir # Use DynamicSupervisor for non-Task children DynamicSupervisor.start_child(MyApp.WorkerSupervisor, {MyApp.Worker, args}) ``` **Why:** Task.Supervisor is purpose-built for Task processes. It adds caller tracking, `$callers` propagation, and task-specific APIs. For anything that isn't a Task, use DynamicSupervisor. --- ## Pattern 10: Registry for Dynamic Process Naming and PubSub **Source:** [lib/elixir/lib/registry.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/registry.ex#L1) (module docs), [lib/elixir/lib/registry.ex#L250](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/registry.ex#L250) (whereis_name via callbacks) **What it does:** Registry provides two modes: - `:unique` keys — each key maps to exactly one process (name registry, process lookup) - `:duplicate` keys — each key maps to many processes (PubSub topics, event dispatch) Processes are automatically unregistered on death. Registry integrates with GenServer naming via `{:via, Registry, {registry, key}}`. **Why:** Solves the dynamic naming problem without atom leaks. Also provides local PubSub without external dependencies. The registry is partitioned for scalability and uses ETS for O(1) lookups. **Anti-pattern:** Building custom ETS-based process registries with manual cleanup on process death. Registry handles monitor-based cleanup automatically. **Code example from source (registry.ex :via callbacks):** ```elixir # :via integration — GenServer uses these callbacks def whereis_name({registry, key}), do: whereis_name(registry, key) def whereis_name({registry, key, _value}), do: whereis_name(registry, key) defp whereis_name(registry, key) do case key_info!(registry) do {:unique, partitions, key_ets} -> key_ets = key_ets || key_ets!(registry, key, partitions) case lookup_second(:unique, key_ets, key) do {pid, _} -> if Process.alive?(pid), do: pid, else: :undefined _ -> :undefined end end end ``` ### When to Use **Triggers:** - You need to look up processes by a dynamic key without atom leaks - You want local PubSub (subscribe/dispatch to topics) without external deps - You're building per-entity process pools (per-user, per-room, per-device) **Example — before:** ```elixir # Custom ETS-based registry with manual cleanup defmodule MyApp.ProcessRegistry do def register(key, pid) do ref = Process.monitor(pid) :ets.insert(:registry, {key, pid, ref}) end def lookup(key) do case :ets.lookup(:registry, key) do [{^key, pid, _ref}] -> {:ok, pid} [] -> :error end end # Must handle :DOWN manually to clean up dead entries def handle_info({:DOWN, ref, :process, pid, _reason}, state) do :ets.match_delete(:registry, {:_, pid, ref}) {:noreply, state} end end ``` **Example — after:** ```elixir # Registry handles all of this automatically # In supervision tree: {Registry, keys: :unique, name: MyApp.GameRegistry} # Registration happens via :via tuple — automatic cleanup on death defmodule MyApp.GameSession do use GenServer def start_link(game_id) do GenServer.start_link(__MODULE__, game_id, name: {:via, Registry, {MyApp.GameRegistry, game_id}}) end end ``` ### When NOT to Use **Don't use this when:** - You need distributed/cluster-wide process registration (use Horde, :global, or pg) - Process lookup is the hot path and you need sub-microsecond latency (direct PID passing) - You have a fixed set of processes known at compile time (atom names are simpler) **Over-application example:** ```elixir # Using Registry when you could just pass the PID directly defmodule MyApp.Pipeline do def process(data) do # Register a process just to look it up one line later... {:ok, pid} = MyApp.Worker.start_link(data) # Why not just use `pid` directly? worker = Registry.lookup(MyApp.Registry, data.id) |> List.first() GenServer.call(elem(worker, 0), :process) end end ``` **Better alternative:** ```elixir defmodule MyApp.Pipeline do def process(data) do {:ok, pid} = MyApp.Worker.start_link(data) GenServer.call(pid, :process) end end ``` **Why:** Registry shines when the looker-upper doesn't know the PID (arrived in a different request, different process tree). If you already have the PID, just use it directly. --- ## Pattern 11: Shutdown Semantics — Graceful Termination **Source:** [lib/elixir/lib/supervisor.ex#L156](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L156) (Shutdown values section) **What it does:** Three shutdown modes: - `:brutal_kill` — immediate `Process.exit(child, :kill)`, no cleanup - integer (ms) — send `:shutdown` signal, wait N ms, then `:kill` - `:infinity` — wait forever (default for supervisor children) Workers default to 5000ms. Supervisors default to `:infinity` (to give their children time). **Why:** Graceful shutdown enables cleanup (closing connections, flushing buffers, deregistering from services). The timeout prevents hung processes from blocking system shutdown indefinitely. The hierarchy matters: supervisors need infinite time because they're waiting for their own children to shut down. **Anti-pattern:** Setting `:brutal_kill` on processes that hold external resources (DB connections, file handles). They'll leak. Also: setting `:infinity` on worker processes — a bug in `terminate/2` will hang your entire shutdown. **Code example from source:** ```elixir # From supervisor.ex docs: # :brutal_kill - unconditional and immediate termination # integer >= 0 - wait that many ms after :shutdown signal # :infinity - wait forever (recommended for supervisors) # Worker default: 5000ms %{shutdown: 5_000, type: :worker} # Supervisor default: :infinity %{shutdown: :infinity, type: :supervisor} ``` ### When to Use **Triggers:** - You're deploying and need processes to flush buffers, close connections, or deregister - Child processes hold external resources that leak if killed immediately - Your system has a clean shutdown requirement (compliance, data integrity) **Example — before:** ```elixir # :brutal_kill on a process that writes to disk — data loss defmodule MyApp.WriteAheadLog do use GenServer, shutdown: :brutal_kill # BAD: loses buffered writes @impl true def terminate(_reason, state) do # This never runs with :brutal_kill! flush_buffer_to_disk(state.buffer) end end ``` **Example — after:** ```elixir defmodule MyApp.WriteAheadLog do use GenServer, shutdown: 10_000 # 10 seconds to flush @impl true def terminate(_reason, state) do flush_buffer_to_disk(state.buffer) close_file_handle(state.fd) end end ``` ### When NOT to Use **Don't use this when:** - Setting `:infinity` on worker processes — a bug in `terminate/2` hangs your entire shutdown - The process holds no external resources (default 5000ms is fine) - You're using `:brutal_kill` on supervisors (they need time to stop their children) **Over-application example:** ```elixir # :infinity shutdown on a worker — if terminate hangs, deployment hangs defmodule MyApp.Worker do use GenServer, shutdown: :infinity @impl true def terminate(_reason, state) do # If this HTTP call hangs forever, your entire app can't shut down HTTPClient.post!("/api/deregister", %{id: state.id}) end end ``` **Better alternative:** ```elixir defmodule MyApp.Worker do use GenServer, shutdown: 15_000 # Generous but bounded @impl true def terminate(_reason, state) do # Use a timeout on the cleanup call too Task.async(fn -> HTTPClient.post("/api/deregister", %{id: state.id}) end) |> Task.yield(10_000) end end ``` **Why:** `:infinity` is safe for supervisors (they're waiting for children) but dangerous for workers. A hung `terminate/2` with infinite shutdown blocks your entire deployment pipeline. --- ## Pattern 12: DynamicSupervisor Internal State — Struct with Restart Tracking **Source:** [lib/elixir/lib/dynamic_supervisor.ex#L165](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L165) (defstruct) **What it does:** The DynamicSupervisor uses a struct for its GenServer state with explicit fields: `children` (map of pid → child spec), `restarts` (list of timestamps for rate limiting), and configuration fields. **Why:** This shows the Elixir team's state design philosophy: use a struct with named fields, not a bare map or tuple. The `children` field uses a `%{}` map keyed by PID for O(1) lookup/deletion on child exit. The `restarts` list uses a simple sliding-window approach for restart intensity. **Anti-pattern:** Using a list for children lookup (O(n) on every EXIT message), or using a tuple-based state that requires positional knowledge. **Code example from source:** ```elixir defstruct [ :args, :extra_arguments, :mod, :name, :strategy, :max_children, :max_restarts, :max_seconds, children: %{}, restarts: [] ] ``` ### When to Use **Triggers:** - You need to understand the internal implementation of a supervisor for debugging - You're building a custom supervisor-like process - You want to understand why DynamicSupervisor uses a map keyed by PID **Example — before:** ```elixir # Using a list to track children — O(n) on every EXIT message defmodule MyApp.CustomSupervisor do use GenServer @impl true def init(_) do {:ok, %{children: []}} # List! Every EXIT scans the whole thing end def handle_info({:EXIT, pid, _reason}, state) do # O(n) scan to find and remove the dead child children = Enum.reject(state.children, fn {p, _spec} -> p == pid end) {:noreply, %{state | children: children}} end end ``` **Example — after:** ```elixir defmodule MyApp.CustomSupervisor do use GenServer defstruct children: %{}, restarts: [], max_restarts: 3, max_seconds: 5 @impl true def init(_) do Process.flag(:trap_exit, true) {:ok, %__MODULE__{}} end def handle_info({:EXIT, pid, _reason}, state) do # O(1) lookup and delete {_spec, children} = Map.pop(state.children, pid) {:noreply, %{state | children: children}} end end ``` ### When NOT to Use **Don't use this when:** - You're building a production supervisor (use Supervisor/DynamicSupervisor) - You don't need custom supervision logic (the standard supervisors cover 99% of cases) - You're optimizing prematurely — most apps never have enough children for data structure choice to matter **Over-application example:** ```elixir # Building a custom supervisor because "I want more control" defmodule MyApp.FancySupervisor do use GenServer # 200 lines of restart logic, child tracking, shutdown handling... # Congratulations, you've reimplemented DynamicSupervisor with more bugs end ``` **Better alternative:** ```elixir # Just use the standard one {DynamicSupervisor, name: MyApp.FancySupervisor, strategy: :one_for_one} ``` **Why:** The standard supervisors are battle-tested over decades. Build custom only when you need semantics they don't provide (e.g., priority-based restart, custom backoff). --- ## Pattern 13: Restart Logic with Exponential Backoff via `:try_again` **Source:** [lib/elixir/lib/dynamic_supervisor.ex#L710](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L710) (restart_child and related functions) **What it does:** When a child fails to restart (start function returns error), DynamicSupervisor doesn't give up. It stores the child as `{:restarting, child}`, sends itself a `:"$gen_restart"` message, and retries later. This prevents the supervisor from blocking on a transiently failing child. **Why:** During network partitions or resource exhaustion, a child might fail to start immediately but succeed seconds later. Instead of counting this as a restart (which would hit intensity limits), the supervisor retries asynchronously. The `:try_again` path is separate from the restart counter. **Anti-pattern:** Treating every start failure as a "restart" — this would exhaust `max_restarts` quickly during transient failures like port conflicts. **Code example from source:** ```elixir defp restart_child(:one_for_one, current_pid, child, state) do {{m, f, args} = mfa, restart, shutdown, type, modules} = child %{extra_arguments: extra} = state case start_child(m, f, extra ++ args) do {:ok, pid, _} -> state = delete_child(current_pid, state) {:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)} {:ok, pid} -> state = delete_child(current_pid, state) {:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)} :ignore -> {:ok, delete_child(current_pid, state)} {:error, reason} -> report_error(:start_error, reason, {:restarting, current_pid}, child, state) state = put_in(state.children[current_pid], {:restarting, child}) {:try_again, state} end end ``` ### When to Use **Triggers:** - A child fails to start due to transient conditions (port conflict, network partition) - You're seeing restart intensity limits hit because start failures count as restarts - You need a supervisor that tolerates temporary resource unavailability **Example — before:** ```elixir # Every start failure counts against restart intensity # 3 failures in 5 seconds → supervisor crashes → cascading failure defmodule MyApp.ConnectionPool do use GenServer @impl true def init(config) do # If DB is temporarily unreachable, this crashes... # ...which counts as a restart... # ...which can exhaust restart intensity {:ok, conn} = DBConnection.start_link(config) {:ok, %{conn: conn}} end end ``` **Example — after:** ```elixir defmodule MyApp.ConnectionPool do use GenServer @impl true def init(config) do # Use handle_continue for the connection attempt {:ok, %{config: config, conn: nil}, {:continue, :connect}} end @impl true def handle_continue(:connect, state) do case DBConnection.start_link(state.config) do {:ok, conn} -> {:noreply, %{state | conn: conn}} {:error, _reason} -> # Retry after delay without counting against restart intensity Process.send_after(self(), :retry_connect, 5_000) {:noreply, state} end end @impl true def handle_info(:retry_connect, state) do {:noreply, state, {:continue, :connect}} end end ``` ### When NOT to Use **Don't use this when:** - The start failure is deterministic (config error, missing module) — fix the bug - You're relying on automatic retry to avoid proper health checking - The child should NOT start at all if dependencies are unavailable (use `:ignore`) **Over-application example:** ```elixir # Retrying forever when the error is permanent defmodule MyApp.MisconfiguredWorker do @impl true def init(%{api_key: nil}) do # This will NEVER succeed — the key is nil! # Infinite retry just wastes resources Process.send_after(self(), :retry, 5_000) {:ok, %{}} end end ``` **Better alternative:** ```elixir defmodule MyApp.MisconfiguredWorker do @impl true def init(%{api_key: nil}) do # Fail fast on permanent configuration errors {:stop, {:error, :missing_api_key}} end end ``` **Why:** Retry logic is for transient failures (network, resource contention). For permanent errors (bad config, missing deps), fail fast so the operator can fix the actual problem. --- ## Pattern 14: `$ancestors` and `$callers` — Process Lineage Tracking **Source:** [lib/elixir/lib/task.ex#L227](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task.ex#L227) (Ancestor and Caller Tracking section) **What it does:** Elixir uses two process dictionary keys for lineage: - `$ancestors` — the supervision hierarchy (who spawned/supervises this process) - `$callers` — the logical call chain (who requested this work) These are different! A task's ancestor is its supervisor, but its caller is the process that initiated the async operation. **Why:** Debugging and tracing. When a task crashes, the log includes both its supervisor (for restart context) and its caller (for business logic context). This dual tracking is essential for understanding failures in systems where the spawner and supervisor are different processes. **Anti-pattern:** Ignoring caller tracking when building custom process spawning. If you build something like `Task.Supervisor`, propagate `$callers` so crash logs are meaningful. **Code example from source (task/supervisor.ex):** ```elixir defp get_callers(owner) do case :erlang.get(:"$callers") do [_ | _] = list -> [owner | list] _ -> [owner] end end # Task.start_link propagates both owner and callers def start_link(module, function, args) when is_atom(module) and is_atom(function) and is_list(args) do mfa = {module, function, args} Task.Supervised.start_link(get_owner(self()), get_callers(self()), mfa) end ``` ### When to Use **Triggers:** - You're debugging crashes and need to understand where a task was spawned from - You're building custom process spawning and want crash logs to show the call chain - You need to trace a request through multiple spawned processes **Example — before:** ```elixir # Custom spawner that loses caller context defmodule MyApp.BackgroundJob do def run_async(fun) do spawn_link(fn -> # When this crashes, the log shows no context about WHO spawned it fun.() end) end end # Crash log: # [error] Process #PID<0.234.0> raised an exception # ** (RuntimeError) something went wrong # No idea who called run_async or why! ``` **Example — after:** ```elixir defmodule MyApp.BackgroundJob do def run_async(fun) do owner = self() callers = case Process.get(:"$callers") do [_ | _] = list -> [owner | list] _ -> [owner] end spawn_link(fn -> Process.put(:"$callers", callers) fun.() end) end end # Crash log now shows the full caller chain: # [error] Process #PID<0.234.0> raised an exception # Callers: [#PID<0.200.0>, #PID<0.150.0>] ← who initiated this work ``` ### When NOT to Use **Don't use this when:** - You're using Task/Task.Supervisor (they propagate callers automatically) - The process is long-lived and the original caller is irrelevant after startup - You're spawning processes that outlive their callers (callers list becomes stale) **Over-application example:** ```elixir # Tracking callers for a permanent GenServer — pointless after init defmodule MyApp.Cache do use GenServer def start_link(_) do # The "caller" of start_link is the supervisor — not useful for debugging # After boot, the cache serves many callers — the original spawner is irrelevant GenServer.start_link(__MODULE__, [], name: __MODULE__) end end ``` **Better alternative:** ```elixir # For long-lived processes, use Logger.metadata or OpenTelemetry spans # to track per-request context, not process lineage def handle_call({:get, key}, _from, state) do Logger.metadata(request_id: Logger.metadata()[:request_id]) {:reply, Map.get(state, key), state} end ``` **Why:** `$callers` is useful for short-lived spawned work (tasks, one-shot processes). For long-lived services, per-request tracing (metadata, spans) is more appropriate than process lineage. --- ## Pattern 15: GenServer.reply/2 for Deferred Responses **Source:** [lib/elixir/lib/gen_server.ex#L620](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/gen_server.ex#L620) (callback docs), [lib/elixir/lib/gen_server.ex#L1328](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/gen_server.ex#L1328) (reply/2 function) **What it does:** A `handle_call` can return `{:noreply, state}` without replying, then later call `GenServer.reply(from, response)` from any process. This decouples request receipt from response delivery. **Why:** Three use cases (from the source): 1. Reply before returning (response known, but need to do cleanup after) 2. Reply after returning (response not yet available, computed asynchronously) 3. Reply from another process (delegate work to a task) This enables non-blocking request handling in GenServers that would otherwise be bottlenecked. **Anti-pattern:** Spawning a task to do work and then having the GenServer block on `Task.await` inside `handle_call`. This defeats the purpose — use `reply/2` from the task instead. **Code example from source:** ```elixir def handle_call(:reply_in_one_second, from, state) do Process.send_after(self(), {:reply, from}, 1_000) {:noreply, state} end def handle_info({:reply, from}, state) do GenServer.reply(from, :one_second_has_passed) {:noreply, state} end ``` ### When to Use **Triggers:** - A GenServer needs to do async work before replying (DB query, HTTP call, aggregation) - You want to reply from a different process than the one that received the request - You need to send intermediate progress and then a final response **Example — before:** ```elixir defmodule MyApp.Aggregator do use GenServer @impl true def handle_call(:aggregate, _from, state) do # Blocks the GenServer for potentially seconds # No other calls can be processed during this time result = Enum.reduce(state.sources, %{}, fn source, acc -> data = HTTPClient.get!(source.url).body Map.merge(acc, Jason.decode!(data)) end) {:reply, result, state} end end ``` **Example — after:** ```elixir defmodule MyApp.Aggregator do use GenServer @impl true def handle_call(:aggregate, from, state) do # Don't block — spawn the work and reply later Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> result = Enum.reduce(state.sources, %{}, fn source, acc -> data = HTTPClient.get!(source.url).body Map.merge(acc, Jason.decode!(data)) end) GenServer.reply(from, result) end) {:noreply, state} end end ``` ### When NOT to Use **Don't use this when:** - The work is fast (< 1ms) — just reply inline - You need the reply to be ordered with respect to other calls (deferred replies break ordering) - The `from` reference escapes to a long-lived process (it holds a monitor that should be cleaned up) **Over-application example:** ```elixir # Deferring reply for trivial work — unnecessary complexity defmodule MyApp.Counter do use GenServer @impl true def handle_call(:get, from, state) do # This is instant! Why defer? Task.start(fn -> GenServer.reply(from, state.count) end) {:noreply, state} end end ``` **Better alternative:** ```elixir defmodule MyApp.Counter do use GenServer @impl true def handle_call(:get, _from, state) do {:reply, state.count, state} end end ``` **Why:** `reply/2` enables non-blocking GenServers for expensive operations. For cheap operations, it adds process spawn overhead, potential ordering issues, and code complexity for no benefit. --- ## Pattern 16: Process.alias for Safe Request/Response **Source:** [lib/elixir/lib/process.ex#L32](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/process.ex#L32) (Aliases section) **What it does:** Process aliases (Erlang/OTP 24+) provide a deactivatable reference for receiving replies. After sending a request with an alias as the reply address, you can deactivate the alias if you no longer want the response — any messages sent to a deactivated alias are silently dropped. **Why:** Solves the "late reply" problem. In request/response patterns, if the requester times out and moves on, a late reply to its PID could confuse future `receive` blocks. With aliases, you deactivate after timeout and the late reply harmlessly vanishes. **Anti-pattern:** Using bare PIDs for reply addresses in protocols where timeouts are possible. Late messages pollute the mailbox. **Code example from source:** ```elixir server = spawn(&server/0) source_alias = Process.alias() send(server, {:ping, source_alias}) receive do :pong -> :pong end # Deactivate — late replies to this alias are silently dropped Process.unalias(source_alias) ``` ### When to Use **Triggers:** - You're building request/response patterns with timeouts where late replies pollute the mailbox - A GenServer sends a request and moves on after timeout, but the response arrives later - You need safe cancellation of pending responses **Example — before:** ```elixir defmodule MyApp.RequestRouter do use GenServer @impl true def handle_call({:request, payload}, _from, state) do send(state.backend, {:request, self(), payload}) receive do {:response, result} -> {:reply, result, state} after 5_000 -> # Timeout... but the response might still arrive later! # It'll sit in our mailbox and confuse future receives {:reply, {:error, :timeout}, state} end end end ``` **Example — after:** ```elixir defmodule MyApp.RequestRouter do use GenServer @impl true def handle_call({:request, payload}, _from, state) do alias_ref = Process.alias([:reply]) send(state.backend, {:request, alias_ref, payload}) receive do {^alias_ref, result} -> {:reply, result, state} after 5_000 -> # Deactivate the alias — late replies are silently dropped Process.unalias(alias_ref) {:reply, {:error, :timeout}, state} end end end ``` ### When NOT to Use **Don't use this when:** - You're using GenServer.call (it already handles this with its own ref-based protocol) - The response will always arrive (no timeout scenario) - You're on OTP < 24 (aliases aren't available) **Over-application example:** ```elixir # Using aliases for GenServer.call — it already handles late replies defmodule MyApp.Client do def get_data(server) do alias_ref = Process.alias([:reply]) # Pointless — GenServer.call already uses monitor-based protocol # that handles late replies correctly GenServer.call(server, {:get, alias_ref}) end end ``` **Better alternative:** ```elixir defmodule MyApp.Client do def get_data(server) do # GenServer.call already handles timeouts and late replies correctly GenServer.call(server, :get, 5_000) end end ``` **Why:** Aliases solve the problem for custom protocols where you build your own request/response. GenServer.call already has equivalent protections built in. Use aliases when you're implementing raw message-based protocols. --- ## Pattern 17: Registry Partitioning Strategies **Source:** [lib/elixir/lib/registry.ex#L310](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/registry.ex#L310) (start_link partitioning docs) **What it does:** Duplicate registries support two partitioning strategies: - `{:duplicate, :pid}` (default) — groups entries by the registering process's PID. Good for few keys with many entries (e.g., one PubSub topic with many subscribers). - `{:duplicate, :key}` — groups entries by key. Good for many keys with few entries each (e.g., many topics with few subscribers). **Why:** The partitioning strategy determines which partition(s) need to be scanned during lookup. With `:key` partitioning, a key lookup hits exactly one partition (O(1) partitions). With `:pid` partitioning, key lookups must scan all partitions but process-based operations (unregister on death) are localized. **Anti-pattern:** Using default `:pid` partitioning with millions of unique keys and frequent lookups. Each lookup scans all partitions. Switch to `{:duplicate, :key}`. **Code example from source:** ```elixir # Many topics, few subscribers each — use key partitioning Registry.start_link( keys: {:duplicate, :key}, name: MyApp.TopicRegistry, partitions: System.schedulers_online() ) # Few topics, many subscribers — use pid partitioning (default) Registry.start_link( keys: :duplicate, name: MyApp.BroadcastRegistry, partitions: System.schedulers_online() ) ``` ### When to Use **Triggers:** - You have a PubSub with many topics and few subscribers per topic — key lookups are slow - Profiling shows Registry.dispatch scanning many partitions for key-based lookups - You're choosing between "optimize for subscribe/unsubscribe" vs "optimize for dispatch" **Example — before:** ```elixir # Default :pid partitioning with many unique keys # Each dispatch must scan ALL partitions to find subscribers for a key Registry.start_link(keys: :duplicate, name: MyApp.Events) # With 16 partitions and 100k unique event types, # every dispatch scans 16 ETS tables Registry.dispatch(MyApp.Events, "order.created", fn entries -> for {pid, _} <- entries, do: send(pid, :notify) end) ``` **Example — after:** ```elixir # Key partitioning — dispatch hits exactly ONE partition per key Registry.start_link( keys: {:duplicate, :key}, name: MyApp.Events, partitions: System.schedulers_online() ) # Now dispatch only scans one ETS table — O(1) partitions Registry.dispatch(MyApp.Events, "order.created", fn entries -> for {pid, _} <- entries, do: send(pid, :notify) end) ``` ### When NOT to Use **Don't use this when:** - You have few keys with many subscribers (`:pid` partitioning is better for cleanup) - Process death cleanup is the hot path (`:key` partitioning must scan all partitions on death) - You're not hitting performance issues with the default (premature optimization) **Over-application example:** ```elixir # Key partitioning for a "presence" system where processes die frequently # Each death must scan ALL partitions to unregister Registry.start_link( keys: {:duplicate, :key}, name: MyApp.Presence, partitions: 16 ) # With 50k users connecting/disconnecting per second, # each disconnect scans 16 partitions — worse than default! ``` **Better alternative:** ```elixir # Pid partitioning — death cleanup is localized to one partition Registry.start_link( keys: :duplicate, name: MyApp.Presence, partitions: System.schedulers_online() ) ``` **Why:** Partitioning is a tradeoff. `:key` optimizes dispatch (one partition per lookup) at the cost of death cleanup (scan all). `:pid` optimizes death cleanup (one partition) at the cost of dispatch (scan all). Pick based on which operation is hotter. --- ## Pattern 18: `init/1` Return Values — The Full Spectrum **Source:** [lib/elixir/lib/gen_server.ex#L498](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/gen_server.ex#L498) (init callback spec) **What it does:** `init/1` supports five return values: - `{:ok, state}` — normal start - `{:ok, state, timeout}` — start with idle timeout - `{:ok, state, :hibernate}` — start and immediately hibernate (GC + compact heap) - `{:ok, state, {:continue, arg}}` — start then immediately invoke `handle_continue` - `:ignore` — don't start, supervisor treats as successful (child can be restarted later) - `{:stop, reason}` — initialization failed **Why:** Each covers a real scenario: - `:ignore` — process is disabled by configuration but might be enabled later via `Supervisor.restart_child/2` - `{:stop, reason}` — unrecoverable initialization failure - `:hibernate` — process will be idle for a long time, minimize memory - `{:continue, _}` — split fast init from slow setup **Anti-pattern:** Using `{:stop, reason}` when `:ignore` is appropriate. If a feature is disabled by config, `:ignore` keeps the child spec in the supervisor for later activation. `{:stop, reason}` signals a real failure. ### When to Use **Triggers:** - You need to communicate "don't start this child" without the supervisor treating it as failure - A feature is disabled by config but the child spec should remain for hot-enabling - A process discovers during init that it's a duplicate and should yield to the existing one **Example — before:** ```elixir defmodule MyApp.OptionalFeature do use GenServer @impl true def init(_) do if Application.get_env(:my_app, :feature_enabled) do {:ok, %{}} else # {:stop, :disabled} causes supervisor to count it as a failure! {:stop, :disabled} end end end ``` **Example — after:** ```elixir defmodule MyApp.OptionalFeature do use GenServer @impl true def init(_) do if Application.get_env(:my_app, :feature_enabled) do {:ok, %{}} else # :ignore — supervisor is happy, child spec stays for later activation :ignore end end end # Later, to enable: # Update config, then: # Supervisor.restart_child(MyApp.Supervisor, MyApp.OptionalFeature) ``` ### When NOT to Use **Don't use this when:** - The failure is real and should count toward restart intensity (use `{:stop, reason}`) - You want the supervisor to NOT have a child spec for this module (just don't add it) - The process should retry starting later automatically (use `{:stop, _}` + transient restart) **Over-application example:** ```elixir # Using :ignore for a real failure — hides the problem defmodule MyApp.DBConnection do @impl true def init(config) do case connect(config) do {:ok, conn} -> {:ok, conn} {:error, _} -> :ignore # BAD: DB is down but we pretend everything is fine end end end ``` **Better alternative:** ```elixir defmodule MyApp.DBConnection do @impl true def init(config) do case connect(config) do {:ok, conn} -> {:ok, conn} {:error, reason} -> {:stop, reason} # Let supervisor handle the failure end end end ``` **Why:** `:ignore` means "this child intentionally should not run right now." `{:stop, reason}` means "this child tried to start and failed." Conflating the two hides real failures from your supervision tree.