# Process Design Patterns — From the Elixir Source

Analysis of `lib/elixir/lib/supervisor.ex`, `lib/elixir/lib/dynamic_supervisor.ex`, `lib/elixir/lib/task.ex`, `lib/elixir/lib/task/supervisor.ex`, `lib/elixir/lib/process.ex`, and `lib/elixir/lib/registry.ex`.

---

## Pattern 1: Static vs Dynamic Supervision — Choose the Right Tool

**Source:** [lib/elixir/lib/supervisor.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L1) vs [lib/elixir/lib/dynamic_supervisor.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L1)

**What it does:** Elixir provides two distinct supervisor types:
- `Supervisor`: for **static** children that are known when the supervisor starts and are started in a defined order
- `DynamicSupervisor`: for children started **on demand** at runtime, with no ordering between them

**Why:** Static supervisors guarantee startup order (critical for dependencies like "DB pool must start before web server"). Dynamic supervisors optimize for scale — they can hold millions of children using efficient data structures and shut down concurrently.

**Anti-pattern:** Using a `Supervisor` when children are created dynamically (e.g., one process per WebSocket connection). You'll hit performance issues and ordering semantics you don't need. Conversely, using `DynamicSupervisor` for fixed infrastructure (DB pool, PubSub) loses startup order guarantees.

**Code example from source (dynamic_supervisor.ex:1-15):**
```elixir
# DynamicSupervisor docs explain the distinction:
# "The Supervisor module was designed to handle mostly static children
#  that are started in the given order when the supervisor starts. A
#  DynamicSupervisor starts with no children. Instead, children are
#  started on demand via start_child/2 and there is no ordering between
#  children."
```

### When to Use

**Triggers:**
- You know at startup exactly which children need to run (DB pool, PubSub, caches)
- Children have ordering dependencies (pool must start before consumers)
- You're building application-level infrastructure in your supervision tree

**Example — before:**
```elixir
# Using DynamicSupervisor for fixed infrastructure — wrong tool
defmodule MyApp.Application do
  def start(_type, _args) do
    children = [{DynamicSupervisor, name: MyApp.InfraSupervisor}]
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

# Manually starting fixed children after the supervisor boots
DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.Repo)
DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.PubSub)
DynamicSupervisor.start_child(MyApp.InfraSupervisor, MyApp.Endpoint)
# These start in call order, but the supervisor records no ordering between them:
# no :rest_for_one restarts and no reverse-order shutdown
```

**Example — after:**
```elixir
defmodule MyApp.Application do
  def start(_type, _args) do
    children = [
      MyApp.Repo,        # DB pool starts first
      MyApp.PubSub,      # PubSub starts after DB is ready
      MyApp.Endpoint     # Web server starts last
    ]
    Supervisor.start_link(children, strategy: :rest_for_one)
  end
end
```

### When NOT to Use

**Don't use this when:**
- Children are created on-demand (per-connection, per-request, per-user)
- The number of children is unbounded or varies significantly at runtime
- You don't need ordering guarantees between children

**Over-application example:**
```elixir
# Static supervisor for per-WebSocket connections — will be clunky
defmodule MyApp.ConnectionSupervisor do
  use Supervisor

  def init(_) do
    # Children aren't known when the supervisor starts: they arrive at runtime!
    Supervisor.init([], strategy: :one_for_one)
  end

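  # Awkward: using Supervisor for dynamic children
  # With a default child_spec every spec has the same id (ConnectionHandler),
  # so the second call returns {:error, {:already_started, pid}}
  def add_connection(socket) do
    spec = {ConnectionHandler, socket}
    Supervisor.start_child(__MODULE__, spec)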
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.ConnectionSupervisor do
  use DynamicSupervisor

  def start_link(_), do: DynamicSupervisor.start_link(__MODULE__, [], name: __MODULE__)
  def init(_), do: DynamicSupervisor.init(strategy: :one_for_one)

  def add_connection(socket) do
    DynamicSupervisor.start_child(__MODULE__, {ConnectionHandler, socket})
  end
end
```
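
A small script can make the difference concrete. This is only a sketch: it uses `Agent` as a stand-in child (nothing the source prescribes) and relies on a tuple spec such as `{Agent, fun}` getting the module name as its default id.

```elixir
# Stand-in child spec; its default id is the module name, Agent
spec = {Agent, fn -> 0 end}

# A plain Supervisor identifies children by id, so the same spec cannot be added twice
{:ok, static} = Supervisor.start_link([], strategy: :one_for_one)
{:ok, first} = Supervisor.start_child(static, spec)
second = Supervisor.start_child(static, spec)

# A DynamicSupervisor ignores ids and starts as many copies as requested
{:ok, dynamic} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, a} = DynamicSupervisor.start_child(dynamic, spec)
{:ok, b} = DynamicSupervisor.start_child(dynamic, spec)

IO.inspect(second, label: "second static start")
IO.inspect(DynamicSupervisor.count_children(dynamic), label: "dynamic children")
```

With a static supervisor you would have to invent a unique id per connection; the dynamic one needs no such bookkeeping.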

**Why:** Static supervisors excel at ordered, fixed infrastructure. DynamicSupervisor excels at scale with runtime-determined children. Pick based on whether the children are known when the supervisor starts.

---

## Pattern 2: PartitionSupervisor for Scalability

**Source:** [lib/elixir/lib/dynamic_supervisor.ex#L60](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L60) and [lib/elixir/lib/task/supervisor.ex#L35](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task/supervisor.ex#L35)

**What it does:** Both `DynamicSupervisor` and `Task.Supervisor` document the same scalability pattern: when a single supervisor becomes a bottleneck, wrap it in a `PartitionSupervisor`, which starts N instances (one per online scheduler by default, which usually means one per core) and routes each request to one of them via a key.

**Why:** A supervisor is a single process. Under heavy `start_child` load, it serializes all spawn operations. PartitionSupervisor distributes the load across multiple supervisor processes, using `self()` as the routing key to ensure each caller consistently hits the same partition.

**Anti-pattern:** Creating your own load-balancing logic for supervisors, or just accepting the bottleneck. The standard library provides this pattern explicitly.

**Code example from source (dynamic_supervisor.ex):**
```elixir
# Instead of a single DynamicSupervisor:
children = [
  {PartitionSupervisor,
   child_spec: DynamicSupervisor,
   name: MyApp.DynamicSupervisors}
]

# Start children through the partition supervisor:
DynamicSupervisor.start_child(
  {:via, PartitionSupervisor, {MyApp.DynamicSupervisors, self()}},
  {Counter, 0}
)
```
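
The routing can be observed directly. The following is a rough sketch, not part of the source: `Demo.DynamicSupervisors` is a made-up name, `Agent` is a stand-in child, and the partition count is fixed at 4 only to keep the output predictable.

```elixir
{:ok, _partition_sup} =
  PartitionSupervisor.start_link(
    child_spec: DynamicSupervisor,
    name: Demo.DynamicSupervisors,
    partitions: 4
  )

# Builds the :via tuple that routes to the partition owning `key`
via = fn key -> {:via, PartitionSupervisor, {Demo.DynamicSupervisors, key}} end

# The same key always resolves to the same partition process
first_lookup = GenServer.whereis(via.(self()))
second_lookup = GenServer.whereis(via.(self()))

# Three children started with the same key all land in that one partition
for _ <- 1..3 do
  {:ok, _pid} = DynamicSupervisor.start_child(via.(self()), {Agent, fn -> 0 end})
end

partition_count = PartitionSupervisor.partitions(Demo.DynamicSupervisors)
IO.inspect({partition_count, first_lookup == second_lookup}, label: "partitions, stable routing")
```

Because the key here is always `self()`, a single caller never spreads its children across partitions; the load is distributed only when many different callers start children.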

### When to Use

**Triggers:**
- A single DynamicSupervisor or Task.Supervisor is a bottleneck under high `start_child` load
- You're seeing latency spikes when spawning tasks/children under concurrency
- Profiling shows the supervisor process has a large message queue

**Example — before:**
```elixir
# Single supervisor — serializes all spawn operations
defmodule MyApp.TaskRunner do
  def run_async(fun) do
    Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fun)
  end
end

# Under heavy concurrent load (say, thousands of simultaneous requests),
# this single process can become a bottleneck
```

**Example — after:**
```elixir
# Partitioned — distributes load across N supervisor processes
defmodule MyApp.Application do
  def start(_type, _args) do
    children = [
      {PartitionSupervisor,
       child_spec: Task.Supervisor,
       name: MyApp.TaskSupervisors}
    ]
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

defmodule MyApp.TaskRunner do
  def run_async(fun) do
    Task.Supervisor.async_nolink(
      {:via, PartitionSupervisor, {MyApp.TaskSupervisors, self()}},
      fun
    )
  end
end
```

### When NOT to Use

**Don't use this when:**
- Spawn rates are low: a single supervisor is fine until measurement shows it is the bottleneck
- You need ordering guarantees between children (partitioning breaks ordering)
- The supervisor has few children total (partitioning adds overhead for no gain)

**Over-application example:**
```elixir
# Partitioning a supervisor that starts 5 children at boot — pointless
{PartitionSupervisor,
 child_spec: DynamicSupervisor,
 name: MyApp.ConfigSupervisors,
 partitions: System.schedulers_online()}
# On a 16-core machine that is 16 partitions for 5 children: extra processes and routing, zero benefit
```

**Better alternative:**
```elixir
# Just use a plain DynamicSupervisor
{DynamicSupervisor, name: MyApp.ConfigSupervisor}
```

**Why:** PartitionSupervisor exists for high-throughput spawn scenarios. If you're not hitting supervisor mailbox limits, the extra processes and routing logic add complexity without benefit.

---

## Pattern 3: Supervision Strategies — Choosing the Right Restart Behavior

**Source:** [lib/elixir/lib/supervisor.ex#L315](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L315) (Strategies section)

**What it does:** Three strategies model three dependency patterns:
- `:one_for_one` — independent children (crash of A doesn't affect B)
- `:one_for_all` — tightly coupled children (if one fails, all state is inconsistent)
- `:rest_for_one` — sequential dependencies (children started after the crashed one depend on it)

**Why:** These map directly to runtime dependency graphs. A connection pool and its consumers are `:rest_for_one` — consumers can't work without the pool. Multiple independent request handlers are `:one_for_one`. Workers sharing a cache are `:one_for_all` — stale cache state after a crash could cause inconsistency.

**Anti-pattern:** Defaulting to `:one_for_one` everywhere without thinking about dependencies. If process B depends on process A's state and A crashes, B will be working with stale assumptions.

**Code example from source (supervisor.ex docs):**
```elixir
# Independent workers — one crash doesn't affect others
Supervisor.start_link(children, strategy: :one_for_one)

# Tightly coupled — all must restart together for consistency
Supervisor.start_link(children, strategy: :one_for_all)

# Sequential dependency — later children depend on earlier ones
Supervisor.start_link(children, strategy: :rest_for_one)
```
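
To check which children a strategy actually restarts, one option is to compare pids before and after a crash. The sketch below does this for `:rest_for_one`; `RestForOneDemo` and its helpers are illustrative names, and `Agent` processes stand in for real workers.

```elixir
defmodule RestForOneDemo do
  # Returns a map of child id => pid for a running supervisor
  def pids(sup) do
    Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _mods} -> {id, pid} end)
  end

  # Polls until child `id` is running again under a pid other than `old_pid`
  def await_restart(sup, id, old_pid) do
    case pids(sup) do
      %{^id => pid} when is_pid(pid) and pid != old_pid ->
        :ok

      _not_yet ->
        Process.sleep(10)
        await_restart(sup, id, old_pid)
    end
  end
end

children = for id <- [:a, :b, :c], do: Supervisor.child_spec({Agent, fn -> id end}, id: id)
{:ok, sup} = Supervisor.start_link(children, strategy: :rest_for_one)

pids_before = RestForOneDemo.pids(sup)
# Crash the middle child
Process.exit(pids_before.b, :kill)
:ok = RestForOneDemo.await_restart(sup, :b, pids_before.b)
pids_after = RestForOneDemo.pids(sup)

# :a was started before :b and keeps its pid; :b and :c are replaced
IO.inspect(pids_before.a == pids_after.a, label: "a untouched")
IO.inspect(pids_before.c != pids_after.c, label: "c restarted")
```

Switching the strategy to `:one_for_one` or `:one_for_all` in the same script is a quick way to compare the three behaviours.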

### When to Use

**Triggers:**
- You're deciding how a supervisor should react when one child fails
- Children share state or resources that become inconsistent if one crashes
- You have a pipeline: A feeds B feeds C

**Example — before:**
```elixir
# Defaulting to :one_for_one without thinking about dependencies
defmodule MyApp.DataPipeline do
  use Supervisor

  def init(_) do
    children = [
      MyApp.DataSource,    # Produces data
      MyApp.Transformer,   # Transforms data (holds reference to DataSource)
      MyApp.Sink           # Writes transformed data
    ]
    # If DataSource crashes, Transformer has a stale reference!
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.DataPipeline do
  use Supervisor

  def init(_) do
    children = [
      MyApp.DataSource,    # If this crashes...
      MyApp.Transformer,   # ...these must restart too (stale refs)
      MyApp.Sink
    ]
    # rest_for_one: crash of DataSource restarts Transformer and Sink
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

### When NOT to Use

**Don't use this when:**
- Children are truly independent (HTTP request handlers, job workers)
- You're using `:one_for_all` because you're unsure — analyze dependencies first
- The restart strategy masks a design problem (maybe use separate supervision subtrees)

**Over-application example:**
```elixir
# one_for_all when children are actually independent
defmodule MyApp.Workers do
  use Supervisor

  def init(_) do
    children = [
      {MyApp.EmailWorker, []},
      {MyApp.SMSWorker, []},
      {MyApp.PushWorker, []}
    ]
    # If email crashes, why restart SMS and Push? They're independent!
    Supervisor.init(children, strategy: :one_for_all)
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.Workers do
  use Supervisor

  def init(_) do
    children = [
      {MyApp.EmailWorker, []},
      {MyApp.SMSWorker, []},
      {MyApp.PushWorker, []}
    ]
    # Independent workers — one crash doesn't affect others
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

**Why:** Restart strategies model dependency graphs. Using `:one_for_all` for independent workers causes unnecessary restarts, losing in-progress work for no benefit.

---

## Pattern 4: Restart Intensity (`max_restarts` / `max_seconds`)

**Source:** [lib/elixir/lib/supervisor.ex#L309](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L309), [lib/elixir/lib/dynamic_supervisor.ex#L730](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L730) (implementation)

**What it does:** Supervisors track restart frequency across all of their children. If more than `max_restarts` restarts occur within `max_seconds`, the supervisor itself shuts down (escalating the failure to its parent). Defaults: 3 restarts in 5 seconds.

**Why:** Prevents infinite restart loops that waste CPU and mask bugs. If a child keeps crashing within seconds, it's a systemic problem that the current supervisor level can't fix. Escalating to the parent allows a higher-level strategy to respond (perhaps restarting the entire subsystem with fresh state).

**Anti-pattern:** Setting `max_restarts` extremely high to "prevent crashes." This hides bugs and wastes resources. Let supervisors escalate — that's the point of the hierarchy.

**Code example from source (dynamic_supervisor.ex internal logic):**
```elixir
defp add_restart(state) do
  %{max_seconds: max_seconds, max_restarts: max_restarts, restarts: restarts} = state

  now = :erlang.monotonic_time(1)
  restarts = add_restart([now | restarts], now, max_seconds)
  state = %{state | restarts: restarts}

  if length(restarts) <= max_restarts do
    {:ok, state}
  else
    {:shutdown, state}
  end
end

defp add_restart(restarts, now, period) do
  for then <- restarts, now <= then + period, do: then
end
```
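
To see the sliding window in isolation, here is a rough, self-contained model of the same bookkeeping. `RestartWindow`, `record/4`, and `simulate/3` are illustrative names that do not exist in Elixir, and the timestamps are plain integers standing in for monotonic seconds.

```elixir
defmodule RestartWindow do
  # Adds a restart at time `now`, drops restarts older than the window,
  # and reports whether the supervisor would keep going or shut down
  def record(restarts, now, max_restarts, max_seconds) do
    kept = for then <- [now | restarts], now <= then + max_seconds, do: then

    if length(kept) <= max_restarts do
      {:ok, kept}
    else
      {:shutdown, kept}
    end
  end

  # Feeds a list of crash times through the window; halts at the first shutdown
  def simulate(crash_times, max_restarts, max_seconds) do
    Enum.reduce_while(crash_times, {:ok, []}, fn now, {:ok, restarts} ->
      case record(restarts, now, max_restarts, max_seconds) do
        {:ok, kept} -> {:cont, {:ok, kept}}
        {:shutdown, _kept} -> {:halt, {:shutdown, now}}
      end
    end)
  end
end

# Defaults (3 restarts in 5 seconds): the 4th crash inside the window trips the limit
IO.inspect(RestartWindow.simulate([0, 1, 2, 3], 3, 5))
# Spread out, the same number of crashes is tolerated
IO.inspect(RestartWindow.simulate([0, 2, 4, 6], 3, 5))
```

Note that the limit is exceeded on the restart after `max_restarts`, so the default tolerates three restarts in the window and shuts down on the fourth.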

### When to Use

**Triggers:**
- You want to prevent infinite restart loops from burning CPU
- You're tuning a supervisor for a child that occasionally crashes under load
- You need the supervisor to escalate when a systemic problem prevents recovery

**Example — before:**
```elixir
# Default: 3 restarts in 5 seconds — might be too aggressive for flaky networks
defmodule MyApp.ExternalAPISupervisor do
  use Supervisor

  def init(_) do
    children = [{MyApp.APIClient, []}]
    # Default max_restarts: 3, max_seconds: 5
    # A network blip causing 4 crashes within 5 seconds exceeds the limit:
    # the supervisor shuts down and the failure escalates to its parent
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.ExternalAPISupervisor do
  use Supervisor

  def init(_) do
    children = [{MyApp.APIClient, []}]
    # Allow more restarts for transient network issues
    Supervisor.init(children,
      strategy: :one_for_one,
      max_restarts: 10,
      max_seconds: 60
    )
  end
end
```

### When NOT to Use

**Don't use this when:**
- You're setting max_restarts very high to "prevent crashes" — you're hiding bugs
- The child crash is deterministic (same input = same crash) — fix the bug instead
- You're relying on restart intensity as a backoff mechanism (use explicit backoff)

**Over-application example:**
```elixir
# Setting absurdly high restart limits to "never crash"
Supervisor.init(children,
  strategy: :one_for_one,
  max_restarts: 1000,
  max_seconds: 1
)
# This allows 1000 crashes per second — you'll burn CPU and hide bugs
```

**Better alternative:**
```elixir
# Reasonable limits + fix the underlying crash
Supervisor.init(children,
  strategy: :one_for_one,
  max_restarts: 5,
  max_seconds: 30
)
# If 5 crashes in 30 seconds isn't enough, the problem is the child, not the limit
```

**Why:** Restart intensity is a circuit breaker, not a throttle. It should escalate systemic failures, not suppress them. If you need aggressive restarts, your child has a bug.

---

## Pattern 5: Restart Values — `:permanent` vs `:transient` vs `:temporary`

**Source:** [lib/elixir/lib/supervisor.ex#L130](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L130) (Restart values section)

**What it does:** Three restart policies control what happens when a child terminates:
- `:permanent` — always restart (default for GenServer/Agent/Supervisor)
- `:transient` — restart only on abnormal exit (not `:normal`, `:shutdown`, `{:shutdown, term}`)
- `:temporary` — never restart (default for Task)

**Why:** Different processes have different lifecycle expectations. A database pool should always be running (`:permanent`). A task that computes a value and exits is done when it's done (`:temporary`). A connection process should restart on crash but not on graceful disconnect (`:transient`).

**Anti-pattern:** Making everything `:permanent`. If a one-shot task keeps restarting, it'll trigger restart intensity limits and take down the supervisor.

**Code example from source:**
```elixir
# Task defaults to :temporary — intentional one-shot work
# (from task.ex:282)
def child_spec(arg) do
  %{
    id: Task,
    start: {Task, :start_link, [arg]},
    restart: :temporary
  }
end

# Customize via use:
use GenServer, restart: :transient
```
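
One way to read the three values is as a decision table over the exit reason. `RestartPolicy.restart?/2` below is a hypothetical helper that models the documented behaviour; it is not how OTP implements it.

```elixir
defmodule RestartPolicy do
  # :permanent children are always restarted
  def restart?(:permanent, _reason), do: true

  # :temporary children are never restarted
  def restart?(:temporary, _reason), do: false

  # :transient children are restarted only after an abnormal exit
  def restart?(:transient, reason) when reason in [:normal, :shutdown], do: false
  def restart?(:transient, {:shutdown, _term}), do: false
  def restart?(:transient, _abnormal_reason), do: true
end

for restart <- [:permanent, :transient, :temporary],
    reason <- [:normal, {:shutdown, :done}, :boom] do
  IO.inspect({restart, reason, RestartPolicy.restart?(restart, reason)})
end
```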

### When to Use

**Triggers:**
- You have different process types with different lifecycle expectations
- One-shot tasks keep restarting and wasting resources
- A connection should gracefully disconnect without triggering restart

**Example — before:**
```elixir
# One-shot work declared :permanent: the supervisor restarts it every time it finishes
defmodule MyApp.BatchTask do
  use Task, restart: :permanent

  def start_link(batch) do
    Task.start_link(__MODULE__, :run, [batch])
  end

  def run(batch) do
    process_batch(batch)
    # The task exits :normal... and the supervisor starts it again!
    # A fast-finishing task can exceed the restart intensity and stop the supervisor
  end
end

**Example — after:**
```elixir
defmodule MyApp.BatchTask do
  use Task, restart: :temporary  # The Task default, stated explicitly: finished tasks are not restarted

  def start_link(batch) do
    Task.start_link(__MODULE__, :run, [batch])
  end

  def run(batch), do: process_batch(batch)
end

defmodule MyApp.ConnectionWorker do
  use GenServer, restart: :transient  # Restart on crash, not graceful disconnect

  def disconnect(pid) do
    GenServer.stop(pid, :normal)  # Won't trigger restart
  end
end
```

### When NOT to Use

**Don't use this when:**
- You're using `:temporary` to avoid fixing a crash (the child just stays dead)
- You set everything to `:transient` without thinking — `:permanent` is usually right for services

**Over-application example:**
```elixir
# Making a critical service :temporary so it "doesn't bother the supervisor"
defmodule MyApp.PaymentProcessor do
  use GenServer, restart: :temporary
  # If this crashes, it stays dead! Payments stop working silently!
end
```

**Better alternative:**
```elixir
defmodule MyApp.PaymentProcessor do
  use GenServer, restart: :permanent
  # Critical services should always restart — that's the whole point
end
```

**Why:** `:permanent` is the safe default for anything that should "always be running." Only use `:transient` for processes that have a valid "done" state, and `:temporary` for truly one-shot work.

---

## Pattern 6: Automatic Shutdown for Pipeline Supervisors

**Source:** [lib/elixir/lib/supervisor.ex#L349](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L349) (Automatic shutdown section)

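**What it does:** Supervisors support an `:auto_shutdown` option that terminates the supervisor when significant children exit. Values: `:never` (the default), `:any_significant` (the supervisor shuts down as soon as one significant child exits), or `:all_significant` (it shuts down once every significant child has exited). A `:transient` significant child only counts when it exits normally, since an abnormal exit gets it restarted instead.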

**Why:** Models pipeline/workflow patterns where a supervisor's purpose is tied to its children's work. If all significant workers finish, the supervisor should clean up. This is useful for batch processing supervisors or connection-scoped process groups.

**Anti-pattern:** Manually monitoring children and calling `Supervisor.stop/1`. The automatic shutdown mechanism handles this cleanly within OTP semantics.

**Code example (from docs):**
```elixir
# Only :transient and :temporary children can be marked significant
children = [
  Supervisor.child_spec({BatchWorker, args}, significant: true, restart: :transient)
]

Supervisor.start_link(children,
  strategy: :one_for_one,
  auto_shutdown: :all_significant
)
```
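
The behaviour can be watched from the outside with a short script. This is a sketch under assumptions: it needs an Elixir/OTP version that supports `:auto_shutdown`, and it uses sleeping `Task`s as stand-ins for real batch workers.

```elixir
# Trap exits so the linked supervisor's shutdown arrives as a message
# instead of taking this process down with it
Process.flag(:trap_exit, true)

children =
  for id <- [:first, :second] do
    Supervisor.child_spec({Task, fn -> Process.sleep(50) end},
      id: id,
      restart: :transient,
      significant: true
    )
  end

{:ok, sup} =
  Supervisor.start_link(children, strategy: :one_for_one, auto_shutdown: :all_significant)

# Once both tasks have exited normally, the supervisor stops on its own
stopped =
  receive do
    {:EXIT, ^sup, _reason} -> true
  after
    2_000 -> false
  end

IO.inspect(stopped, label: "supervisor stopped by itself")
```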

### When to Use

**Triggers:**
- You have a supervisor managing a batch/pipeline where completion means "job done"
- A supervisor's existence only makes sense while its children are doing work
- You're building a workflow that should self-terminate

**Example — before:**
```elixir
# Manual cleanup when batch workers finish
defmodule MyApp.BatchSupervisor do
  use DynamicSupervisor

  def all_done?(supervisor) do
    # Polling... ugly
    DynamicSupervisor.count_children(supervisor).active == 0
  end
end

# Somewhere else, a monitor process watches and cleans up
defmodule MyApp.BatchMonitor do
  use GenServer

  def handle_info(:check, state) do
    if MyApp.BatchSupervisor.all_done?(state.sup) do
      Supervisor.stop(state.sup)
    end
    {:noreply, state}
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.BatchSupervisor do
  use Supervisor

  def start_link(tasks) do
    Supervisor.start_link(__MODULE__, tasks)
  end

  def init(tasks) do
    children =
      Enum.map(tasks, fn task ->
        Supervisor.child_spec({MyApp.BatchWorker, task},
          id: task.id, restart: :transient, significant: true)
      end)

    Supervisor.init(children,
      strategy: :one_for_one,
      auto_shutdown: :all_significant
    )
  end
end
# When all workers complete normally, supervisor shuts down automatically
```

### When NOT to Use

**Don't use this when:**
- Children are long-lived services that should never "complete"
- You want the supervisor to keep running even after children exit (for later restarts)
- Children have `:permanent` restart — they can't be `:significant`

**Over-application example:**
```elixir
# auto_shutdown on an infrastructure supervisor: it dies when a significant child exits!
children = [
  Supervisor.child_spec(MyApp.Cache, significant: true, restart: :transient),
  MyApp.WebServer
]
Supervisor.init(children,
  strategy: :one_for_one,
  auto_shutdown: :any_significant
)
# Cache is :transient, so a single :normal exit shuts down the WHOLE supervisor, WebServer included
```

**Better alternative:**
```elixir
# Infrastructure supervisors should NOT auto-shutdown
Supervisor.init(children, strategy: :one_for_one)
# Only use auto_shutdown for workflow/batch supervisors with finite lifetimes
```

**Why:** `auto_shutdown` models "this supervisor's job is done when its children finish." It's for finite work, not long-lived services.

---

## Pattern 7: Task.async/await for Concurrent Value Computation

**Source:** [lib/elixir/lib/task.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task.ex#L1) and [lib/elixir/lib/task.ex#L300](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task.ex#L300)

**What it does:** `Task.async` spawns a linked, monitored process and returns a `%Task{}` struct. `Task.await` blocks until the result arrives or times out. This is the canonical pattern for "compute a value concurrently."

**Why:** Tasks provide structured concurrency: the caller is linked to the task, so failures propagate naturally, and the monitor reference enables a safe await with a timeout. This is explicit and composable, unlike raw `spawn_link` plus `receive`.

**Anti-pattern:** Using `spawn_link` + `send` + `receive` for one-shot concurrent computation. You lose: error propagation, structured monitoring, timeout handling, and the caller-tracking metadata that tasks provide.

**Code example from source:**
```elixir
task = Task.async(fn -> do_some_work() end)
res = do_some_other_work()
res + Task.await(task)
```

Key constraint from docs: "If you start an async, you **must await**. This is either done by calling `Task.await/2` or `Task.yield/2` followed by `Task.shutdown/2`."
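
The second option in that quote, `Task.yield/2` followed by `Task.shutdown/2`, could look roughly like this. `Deadline.run/2` is a hypothetical wrapper, and mapping the outcomes to `{:error, ...}` tuples is a choice made for this sketch.

```elixir
defmodule Deadline do
  # Runs `fun` in a task and gives up after `timeout_ms`.
  # If yield returns nil (no reply yet), shutdown stops the task and
  # returns a reply only if one arrived in the meantime.
  def run(fun, timeout_ms) do
    task = Task.async(fun)

    case Task.yield(task, timeout_ms) || Task.shutdown(task) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end
end

IO.inspect(Deadline.run(fn -> 1 + 1 end, 1_000))
IO.inspect(Deadline.run(fn -> Process.sleep(5_000) end, 50))
```

Unlike `Task.await/2`, which exits the caller on timeout, this shape lets the caller decide what a missed deadline means.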

### When to Use

**Triggers:**
- You need to compute a value concurrently and use it in the current flow
- Multiple independent computations can run in parallel to reduce latency
- The current process should crash if the computation fails (linked failure)

**Example — before:**
```elixir
# Sequential — total time = sum of all operations
def build_dashboard(user_id) do
  profile = fetch_profile(user_id)        # 200ms
  orders = fetch_recent_orders(user_id)   # 300ms
  recommendations = compute_recs(user_id) # 500ms
  # Total: 1000ms

  %{profile: profile, orders: orders, recommendations: recommendations}
end
```

**Example — after:**
```elixir
# Concurrent — total time = max of all operations
def build_dashboard(user_id) do
  profile_task = Task.async(fn -> fetch_profile(user_id) end)
  orders_task = Task.async(fn -> fetch_recent_orders(user_id) end)
  recs_task = Task.async(fn -> compute_recs(user_id) end)

  %{
    profile: Task.await(profile_task),
    orders: Task.await(orders_task),
    recommendations: Task.await(recs_task)
  }
  # Total: ~500ms (limited by slowest task)
end
```
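
When every task shares the same deadline, `Task.await_many/2` may read more cleanly than awaiting one by one. A minimal sketch with stand-in computations in place of the fetch functions above:

```elixir
# Start all tasks first, then collect the results in the same order
tasks = for n <- [1, 2, 3], do: Task.async(fn -> n * n end)
squares = Task.await_many(tasks, 5_000)

IO.inspect(squares, label: "squares")
```

The timeout here applies to the whole batch, not to each task separately.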

### When NOT to Use

**Don't use this when:**
- The task might fail and you don't want the caller to crash (use `async_nolink`)
- You're inside a GenServer and can't block on `await` (use `async_nolink` + `handle_info`)
- The computation is trivial (< 1ms) — spawning a process adds overhead

**Over-application example:**
```elixir
# Spawning a task for trivial work — overhead exceeds benefit
def format_name(user) do
  task = Task.async(fn -> String.upcase(user.name) end)
  Task.await(task)
  # Process spawn + message passing for a microsecond operation!
end
```

**Better alternative:**
```elixir
def format_name(user) do
  String.upcase(user.name)
end
```

**Why:** Tasks add process spawn overhead (on the order of microseconds) plus message passing. Only use them when the work is expensive enough to justify parallelism, typically >1ms, or when several operations can run concurrently.

---

## Pattern 8: Task.Supervisor.async_nolink for Fault-Tolerant Task Execution

**Source:** [lib/elixir/lib/task/supervisor.ex#L240](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task/supervisor.ex#L240) (async_nolink docs with GenServer example)

**What it does:** Unlike `Task.async`, `async_nolink` spawns a task that is NOT linked to the caller. The caller monitors it and handles success/failure via `handle_info`. This prevents a task crash from killing the caller.

**Why:** In a GenServer, you often want to spawn work that might fail without taking down the server. The pattern: spawn with `async_nolink`, receive the result as `{ref, answer}`, and handle failure as `{:DOWN, ref, :process, _pid, reason}`.

**Anti-pattern:** Using `Task.async` inside a GenServer when the task might fail. The link means the GenServer crashes too. Use `async_nolink` + `handle_info` for resilient concurrent work.

**Code example from source (task/supervisor.ex):**
```elixir
defmodule MyApp.Server do
  use GenServer

  def handle_call(:start_task, _from, %{ref: nil} = state) do
    task =
      Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
        # potentially failing work
      end)

    {:reply, :ok, %{state | ref: task.ref}}
  end

  # Task completed successfully
  def handle_info({ref, answer}, %{ref: ref} = state) do
    Process.demonitor(ref, [:flush])
    {:noreply, %{state | ref: nil}}
  end

  # Task failed
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    {:noreply, %{state | ref: nil}}
  end
end
```
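
The same two messages can be observed outside a GenServer, which is handy for experimenting. In this sketch `NolinkDemo.outcome/2` is a made-up helper, and the failing task simply calls `exit(:boom)`.

```elixir
defmodule NolinkDemo do
  # Waits for either the reply or the :DOWN message of an async_nolink task
  def outcome(%Task{ref: ref}, timeout_ms) do
    receive do
      {^ref, result} ->
        # The reply arrived; drop the monitor and any pending :DOWN message
        Process.demonitor(ref, [:flush])
        {:ok, result}

      {:DOWN, ^ref, :process, _pid, reason} ->
        {:error, reason}
    after
      timeout_ms -> {:error, :timeout}
    end
  end
end

{:ok, sup} = Task.Supervisor.start_link()

ok_task = Task.Supervisor.async_nolink(sup, fn -> 21 * 2 end)
bad_task = Task.Supervisor.async_nolink(sup, fn -> exit(:boom) end)

ok_outcome = NolinkDemo.outcome(ok_task, 1_000)
bad_outcome = NolinkDemo.outcome(bad_task, 1_000)

# The caller survives the failed task and sees the failure as data
IO.inspect({ok_outcome, bad_outcome})
```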

### When to Use

**Triggers:**
- A GenServer needs to spawn work that might fail without crashing the server
- You're building a "request/response with timeout" pattern inside a GenServer
- External calls (HTTP, DB) from a GenServer should be non-blocking and resilient

**Example — before:**
```elixir
defmodule MyApp.Enricher do
  use GenServer

  @impl true
  def handle_call({:enrich, data}, _from, state) do
    # If this HTTP call crashes, the entire GenServer dies!
    result = Task.async(fn -> HTTPClient.post!("/api/enrich", data) end)
    enriched = Task.await(result, 5_000)
    {:reply, {:ok, enriched}, state}
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.Enricher do
  use GenServer

  @impl true
  def handle_call({:enrich, data}, from, state) do
    task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
      HTTPClient.post!("/api/enrich", data)
    end)
    {:noreply, Map.put(state, task.ref, from)}
  end

  @impl true
  def handle_info({ref, result}, state) when is_reference(ref) do
    Process.demonitor(ref, [:flush])
    {from, state} = Map.pop(state, ref)
    GenServer.reply(from, {:ok, result})
    {:noreply, state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    {from, state} = Map.pop(state, ref)
    GenServer.reply(from, {:error, reason})
    {:noreply, state}
  end
end
```

### When NOT to Use

**Don't use this when:**
- Task failure should crash the caller (you WANT linked failure propagation)
- The caller is a short-lived process that should simply fail together with the task (a linked crash arrives as an exit signal, which `try/rescue` cannot catch)
- The work is fast and synchronous is acceptable

**Over-application example:**
```elixir
# Using async_nolink for work that SHOULD crash the caller on failure
defmodule MyApp.CriticalPayment do
  use GenServer

  def handle_call({:charge, card}, from, state) do
    task = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
      PaymentGateway.charge!(card)
    end)
    # Now you have to handle the failure manually in handle_info...
    # But if payment fails, maybe this GenServer SHOULD crash
    # to trigger a supervisor restart with clean state
    {:noreply, Map.put(state, task.ref, from)}
  end
end
```

**Better alternative:**
```elixir
# If failure should crash the GenServer, use Task.async (linked)
def handle_call({:charge, card}, _from, state) do
  result = Task.async(fn -> PaymentGateway.charge!(card) end)
  {:reply, Task.await(result), state}
end
```

**Why:** `async_nolink` is for resilient, non-critical work. If the task's failure means your GenServer's state is invalid, you want the link — let it crash and restart clean.

---

## Pattern 9: Task Supervisor as DynamicSupervisor Specialization

**Source:** [lib/elixir/lib/task/supervisor.ex#L151](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task/supervisor.ex#L151) (start_link implementation)

**What it does:** `Task.Supervisor` is implemented directly on top of `DynamicSupervisor`. It stores default restart/shutdown settings in the process dictionary and delegates `init` to `DynamicSupervisor.init`.

**Why:** Specialization without duplication. Task.Supervisor adds task-specific behavior (caller tracking, async/nolink patterns, stream support) on top of the generic dynamic supervision infrastructure. It's a compositional pattern — build specialized supervisors by wrapping the generic one.

**Anti-pattern:** Re-implementing task supervision from scratch with a plain DynamicSupervisor + custom start logic. Use Task.Supervisor — it handles caller tracking, owner propagation, and proper shutdown.

**Code example from source:**
```elixir
# Task.Supervisor.start_link delegates to DynamicSupervisor
def start_link(options \\ []) do
  {restart, options} = Keyword.pop(options, :restart)
  {shutdown, options} = Keyword.pop(options, :shutdown)
  keys = [:max_children, :max_seconds, :max_restarts]
  {sup_opts, start_opts} = Keyword.split(options, keys)
  restart_and_shutdown = {restart || :temporary, shutdown || 5000}
  DynamicSupervisor.start_link(__MODULE__, {restart_and_shutdown, sup_opts}, start_opts)
end

def init({{_restart, _shutdown} = arg, options}) do
  Process.put(__MODULE__, arg)
  DynamicSupervisor.init([strategy: :one_for_one] ++ options)
end
```

### When to Use

**Triggers:**
- You're building task infrastructure that needs proper shutdown and caller tracking
- You want `async_nolink` + streaming + concurrency limiting for tasks
- You need tasks to be supervised (tracked, shut down gracefully, and optionally restarted via the `:restart` option of `start_child/3`)

**Example — before:**
```elixir
# Rolling your own task management on DynamicSupervisor
defmodule MyApp.Workers do
  def start_task(fun) do
    spec = %{id: make_ref(), start: {Task, :start_link, [fun]}, restart: :temporary}
    DynamicSupervisor.start_child(MyApp.WorkerSup, spec)
  end

  # No caller tracking, no async_nolink, no stream support
  # Must manually build all of that
end
```

**Example — after:**
```elixir
# Task.Supervisor gives you all of this for free
defmodule MyApp.Application do
  def start(_type, _args) do
    children = [
      {Task.Supervisor, name: MyApp.TaskSupervisor}
    ]
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

# Now you get: async, async_nolink, async_stream, start_child, etc.
Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> work() end)
```
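
The streaming and concurrency-limiting trigger can be sketched with `Task.Supervisor.async_stream_nolink/4`. This is a minimal illustration, assuming an unnamed supervisor started inline and a trivial squaring function standing in for real work; neither comes from the source file.

```elixir
# Start an unnamed Task.Supervisor just for this sketch
{:ok, sup} = Task.Supervisor.start_link()

squares =
  sup
  # At most 2 tasks run at once; each gets 1 second before timing out
  |> Task.Supervisor.async_stream_nolink(1..5, fn n -> n * n end,
    max_concurrency: 2,
    timeout: 1_000
  )
  # Results arrive as {:ok, value} tuples, in input order by default
  |> Enum.map(fn {:ok, value} -> value end)
```

Because the tasks are not linked to the caller, a crashing task shows up as an `{:exit, reason}` element in the stream instead of taking the caller down. The `Enum.map/2` function above would need a second clause to handle that case.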

### When NOT to Use

**Don't use this when:**
- Tasks are fire-and-forget and you don't need supervision (just `Task.start/1`)
- You need custom child specs with complex init logic (use DynamicSupervisor directly)
- You're spawning non-Task children (GenServers, Agents)

**Over-application example:**
```elixir
# Using Task.Supervisor to start GenServers — wrong tool
Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
  # This spawns a Task that starts a GenServer... awkward
  {:ok, _} = MyApp.Worker.start_link(args)
  Process.sleep(:infinity)  # Keep the task alive??
end)
```

**Better alternative:**
```elixir
# Use DynamicSupervisor for non-Task children
DynamicSupervisor.start_child(MyApp.WorkerSupervisor, {MyApp.Worker, args})
```

**Why:** Task.Supervisor is purpose-built for Task processes. It adds caller tracking (`$callers` propagation) and task-specific APIs such as `async_nolink` and `async_stream`. For anything that isn't a Task, use DynamicSupervisor.

---

## Pattern 10: Registry for Dynamic Process Naming and PubSub

**Source:** [lib/elixir/lib/registry.ex#L1](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/registry.ex#L1) (module docs), [lib/elixir/lib/registry.ex#L250](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/registry.ex#L250) (whereis_name via callbacks)

**What it does:** Registry provides two modes:
- `:unique` keys — each key maps to exactly one process (name registry, process lookup)
- `:duplicate` keys — each key maps to many processes (PubSub topics, event dispatch)

Processes are automatically unregistered on death. Registry integrates with GenServer naming via `{:via, Registry, {registry, key}}`.

**Why:** Solves the dynamic naming problem without atom leaks. Also provides local PubSub without external dependencies. The registry can be partitioned for scalability (via the `:partitions` option) and uses ETS for O(1) lookups.

**Anti-pattern:** Building custom ETS-based process registries with manual cleanup on process death. Registry handles monitor-based cleanup automatically.

**Code example from source (registry.ex :via callbacks):**
```elixir
# :via integration — GenServer uses these callbacks
def whereis_name({registry, key}), do: whereis_name(registry, key)
def whereis_name({registry, key, _value}), do: whereis_name(registry, key)

defp whereis_name(registry, key) do
  case key_info!(registry) do
    {:unique, partitions, key_ets} ->
      key_ets = key_ets || key_ets!(registry, key, partitions)
      case lookup_second(:unique, key_ets, key) do
        {pid, _} ->
          if Process.alive?(pid), do: pid, else: :undefined
        _ ->
          :undefined
      end
  end
end
```

### When to Use

**Triggers:**
- You need to look up processes by a dynamic key without atom leaks
- You want local PubSub (subscribe/dispatch to topics) without external deps
- You're building per-entity process pools (per-user, per-room, per-device)

**Example — before:**
```elixir
# Custom ETS-based registry with manual cleanup
defmodule MyApp.ProcessRegistry do
  def register(key, pid) do
    ref = Process.monitor(pid)
    :ets.insert(:registry, {key, pid, ref})
  end

  def lookup(key) do
    case :ets.lookup(:registry, key) do
      [{^key, pid, _ref}] -> {:ok, pid}
      [] -> :error
    end
  end

  # Must handle :DOWN manually to clean up dead entries
  def handle_info({:DOWN, ref, :process, pid, _reason}, state) do
    :ets.match_delete(:registry, {:_, pid, ref})
    {:noreply, state}
  end
end
```

**Example — after:**
```elixir
# Registry handles all of this automatically
# In supervision tree:
{Registry, keys: :unique, name: MyApp.GameRegistry}

# Registration happens via :via tuple — automatic cleanup on death
defmodule MyApp.GameSession do
  use GenServer

  def start_link(game_id) do
    GenServer.start_link(__MODULE__, game_id,
      name: {:via, Registry, {MyApp.GameRegistry, game_id}})
  end
end
```
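
The `:duplicate` mode covers the local PubSub case. A minimal sketch, assuming a hypothetical registry name `Demo.PubSub` and a made-up topic `"room:42"`; the current process subscribes and then receives its own broadcast:

```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: Demo.PubSub)

# Subscribe the current process to a topic, attaching an arbitrary value
{:ok, _owner} = Registry.register(Demo.PubSub, "room:42", :subscriber)

# Broadcast: the callback receives every {pid, value} entry for the key
Registry.dispatch(Demo.PubSub, "room:42", fn entries ->
  for {pid, _value} <- entries, do: send(pid, {:broadcast, "room:42", "hello"})
end)

received =
  receive do
    {:broadcast, topic, message} -> {topic, message}
  after
    1_000 -> :timeout
  end
```

When a subscriber process exits, its entries disappear from the registry without any explicit unsubscribe call.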

### When NOT to Use

**Don't use this when:**
- You need distributed/cluster-wide process registration (use Horde, :global, or pg)
- Process lookup is the hot path and you need sub-microsecond latency (direct PID passing)
- You have a fixed set of processes known at compile time (atom names are simpler)

**Over-application example:**
```elixir
# Using Registry when you could just pass the PID directly
defmodule MyApp.Pipeline do
  def process(data) do
    # Register a process just to look it up one line later...
    {:ok, pid} = MyApp.Worker.start_link(data)
    # Why not just use `pid` directly?
    worker = Registry.lookup(MyApp.Registry, data.id) |> List.first()
    GenServer.call(elem(worker, 0), :process)
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.Pipeline do
  def process(data) do
    {:ok, pid} = MyApp.Worker.start_link(data)
    GenServer.call(pid, :process)
  end
end
```

**Why:** Registry shines when the looker-upper doesn't know the PID (arrived in a different request, different process tree). If you already have the PID, just use it directly.

---

## Pattern 11: Shutdown Semantics — Graceful Termination

**Source:** [lib/elixir/lib/supervisor.ex#L156](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/supervisor.ex#L156) (Shutdown values section)

**What it does:** Three shutdown modes:
- `:brutal_kill` — immediate `Process.exit(child, :kill)`, no cleanup
- integer (ms) — send `:shutdown` signal, wait N ms, then `:kill`
- `:infinity` — wait forever (default for supervisor children)

Workers default to 5000ms. Supervisors default to `:infinity` (to give their children time).

**Why:** Graceful shutdown enables cleanup (closing connections, flushing buffers, deregistering from services). The timeout prevents hung processes from blocking system shutdown indefinitely. The hierarchy matters: supervisors need infinite time because they're waiting for their own children to shut down.

**Anti-pattern:** Setting `:brutal_kill` on processes that hold external resources (DB connections, file handles). They'll leak. Also: setting `:infinity` on worker processes — a bug in `terminate/2` will hang your entire shutdown.

**Code example from source:**
```elixir
# From supervisor.ex docs:
# :brutal_kill - unconditional and immediate termination
# integer >= 0 - wait that many ms after :shutdown signal
# :infinity - wait forever (recommended for supervisors)

# Worker default: 5000ms
%{shutdown: 5_000, type: :worker}

# Supervisor default: :infinity
%{shutdown: :infinity, type: :supervisor}
```

### When to Use

**Triggers:**
- You're deploying and need processes to flush buffers, close connections, or deregister
- Child processes hold external resources that leak if killed immediately
- Your system has a clean shutdown requirement (compliance, data integrity)

**Example — before:**
```elixir
# :brutal_kill on a process that writes to disk — data loss
defmodule MyApp.WriteAheadLog do
  use GenServer, shutdown: :brutal_kill  # BAD: loses buffered writes

  @impl true
  def terminate(_reason, state) do
    # This never runs with :brutal_kill!
    flush_buffer_to_disk(state.buffer)
  end
end
```

**Example — after:**
```elixir
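defmodule MyApp.WriteAheadLog do
  use GenServer, shutdown: 10_000  # 10 seconds to flush

  @impl true
  def init(state) do
    # terminate/2 runs on supervisor shutdown only if the process traps exits
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  @impl true
  def terminate(_reason, state) do
    flush_buffer_to_disk(state.buffer)
    close_file_handle(state.fd)
  end
end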
```
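
One detail the cleanup path depends on: when a supervisor stops a child, `terminate/2` is only invoked if the child traps exits. The sketch below uses a hypothetical `Demo.Flusher` module that reports back to the starting process instead of flushing a real buffer, and shows the callback firing during supervisor shutdown.

```elixir
defmodule Demo.Flusher do
  use GenServer, shutdown: 2_000

  def start_link(test_pid), do: GenServer.start_link(__MODULE__, test_pid)

  @impl true
  def init(test_pid) do
    # Without this flag the :shutdown signal ends the process immediately
    # and terminate/2 is skipped
    Process.flag(:trap_exit, true)
    {:ok, test_pid}
  end

  @impl true
  def terminate(reason, test_pid) do
    # Stand-in for real cleanup: report that the callback ran, and why
    send(test_pid, {:flushed, reason})
  end
end

{:ok, sup} = Supervisor.start_link([{Demo.Flusher, self()}], strategy: :one_for_one)
:ok = Supervisor.stop(sup)

flushed =
  receive do
    {:flushed, reason} -> reason
  after
    1_000 -> :not_called
  end
```

Here `flushed` is bound to `:shutdown`, the reason the supervisor uses when stopping its children.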

### When NOT to Use

**Don't use this when:**
- The process is a worker and you're tempted by `:infinity`: a bug in `terminate/2` then hangs your entire shutdown
- The process holds no external resources (the default 5000ms is fine)
- The process is a supervisor and you're considering `:brutal_kill` (supervisors need time to stop their children)

**Over-application example:**
```elixir
# :infinity shutdown on a worker — if terminate hangs, deployment hangs
defmodule MyApp.Worker do
  use GenServer, shutdown: :infinity

  @impl true
  def terminate(_reason, state) do
    # If this HTTP call hangs forever, your entire app can't shut down
    HTTPClient.post!("/api/deregister", %{id: state.id})
  end
end
```

**Better alternative:**
```elixir
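defmodule MyApp.Worker do
  use GenServer, shutdown: 15_000  # Generous but bounded

  # Assumes init/1 sets Process.flag(:trap_exit, true) so terminate/2 runs

  @impl true
  def terminate(_reason, state) do
    # Bound the cleanup call itself, and kill the task if it overruns
    task = Task.async(fn -> HTTPClient.post("/api/deregister", %{id: state.id}) end)
    Task.yield(task, 10_000) || Task.shutdown(task, :brutal_kill)
  end
end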
```

**Why:** `:infinity` is safe for supervisors (they're waiting for children) but dangerous for workers. A hung `terminate/2` with infinite shutdown blocks your entire deployment pipeline.

---

## Pattern 12: DynamicSupervisor Internal State — Struct with Restart Tracking

**Source:** [lib/elixir/lib/dynamic_supervisor.ex#L165](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L165) (defstruct)

**What it does:** The DynamicSupervisor uses a struct for its GenServer state with explicit fields: `children` (map of pid → child spec), `restarts` (list of timestamps for rate limiting), and configuration fields.

**Why:** This shows the Elixir team's state design philosophy: use a struct with named fields, not a bare map or tuple. The `children` field uses a `%{}` map keyed by PID for O(1) lookup/deletion on child exit. The `restarts` list uses a simple sliding-window approach for restart intensity.

**Anti-pattern:** Using a list for children lookup (O(n) on every EXIT message), or using a tuple-based state that requires positional knowledge.

**Code example from source:**
```elixir
defstruct [
  :args,
  :extra_arguments,
  :mod,
  :name,
  :strategy,
  :max_children,
  :max_restarts,
  :max_seconds,
  children: %{},
  restarts: []
]
```
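
The sliding-window idea behind the `restarts` field can be sketched as follows. This is an illustration only, not the DynamicSupervisor implementation: `Demo.RestartWindow` and its `add_restart/4` signature are hypothetical, and timestamps are plain integers (seconds) passed in by the caller.

```elixir
defmodule Demo.RestartWindow do
  # `restarts` holds timestamps of recent restarts, newest first.
  # Returns {:ok, restarts} while within limits, {:shutdown, restarts}
  # once more than `max_restarts` fall inside the last `max_seconds`.
  def add_restart(restarts, now, max_restarts, max_seconds) do
    # Record this restart and drop entries that slid out of the window
    recent = Enum.filter([now | restarts], fn t -> now - t <= max_seconds end)

    if length(recent) > max_restarts do
      {:shutdown, recent}
    else
      {:ok, recent}
    end
  end
end

{:ok, r1} = Demo.RestartWindow.add_restart([], 100, 3, 5)
{:ok, r2} = Demo.RestartWindow.add_restart(r1, 101, 3, 5)
{:ok, r3} = Demo.RestartWindow.add_restart(r2, 102, 3, 5)

# A fourth restart inside the 5-second window exceeds the limit of 3
burst = Demo.RestartWindow.add_restart(r3, 103, 3, 5)

# The same restart arriving later finds the window empty again
later = Demo.RestartWindow.add_restart(r3, 110, 3, 5)
```

Keeping only the timestamps still inside the window means the list never grows beyond `max_restarts + 1` entries, which is why a plain list is sufficient here.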

### When to Use

**Triggers:**
- You need to understand the internal implementation of a supervisor for debugging
- You're building a custom supervisor-like process
- You want to understand why DynamicSupervisor uses a map keyed by PID

**Example — before:**
```elixir
# Using a list to track children — O(n) on every EXIT message
defmodule MyApp.CustomSupervisor do
  use GenServer

  @impl true
  def init(_) do
    {:ok, %{children: []}}  # List! Every EXIT scans the whole thing
  end

  def handle_info({:EXIT, pid, _reason}, state) do
    # O(n) scan to find and remove the dead child
    children = Enum.reject(state.children, fn {p, _spec} -> p == pid end)
    {:noreply, %{state | children: children}}
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.CustomSupervisor do
  use GenServer

  defstruct children: %{}, restarts: [], max_restarts: 3, max_seconds: 5

  @impl true
  def init(_) do
    Process.flag(:trap_exit, true)
    {:ok, %__MODULE__{}}
  end

  def handle_info({:EXIT, pid, _reason}, state) do
    # O(1) lookup and delete
    {_spec, children} = Map.pop(state.children, pid)
    {:noreply, %{state | children: children}}
  end
end
```

### When NOT to Use

**Don't use this when:**
- You're building a production supervisor (use Supervisor/DynamicSupervisor)
- You don't need custom supervision logic (the standard supervisors cover 99% of cases)
- You're optimizing prematurely — most apps never have enough children for data structure choice to matter

**Over-application example:**
```elixir
# Building a custom supervisor because "I want more control"
defmodule MyApp.FancySupervisor do
  use GenServer
  # 200 lines of restart logic, child tracking, shutdown handling...
  # Congratulations, you've reimplemented DynamicSupervisor with more bugs
end
```

**Better alternative:**
```elixir
# Just use the standard one
{DynamicSupervisor, name: MyApp.FancySupervisor, strategy: :one_for_one}
```

**Why:** The standard supervisors are battle-tested over decades. Build custom only when you need semantics they don't provide (e.g., priority-based restart, custom backoff).

---

## Pattern 13: Asynchronous Restart Retry via `:try_again`

**Source:** [lib/elixir/lib/dynamic_supervisor.ex#L710](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/dynamic_supervisor.ex#L710) (restart_child and related functions)

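**What it does:** When a child fails to restart (its start function returns `{:error, reason}`), DynamicSupervisor doesn't give up and doesn't block. It stores the child as `{:restarting, child}` under the old PID, sends itself a `:"$gen_restart"` message for that child, and retries when that message is handled. Messages already queued in the supervisor's mailbox are processed in between.

**Why:** During network partitions or resource exhaustion, a child might fail to start immediately but succeed moments later. Retrying through the mailbox keeps the supervisor responsive between attempts. Two caveats: the excerpt below shows no delay or backoff between attempts, and in the upstream implementation each retry appears to go through the same restart-intensity check as any other restart, so a child that keeps failing to start will still exhaust `max_restarts`.

**Anti-pattern:** Doing slow, fallible setup (connecting to a database, binding a port) synchronously in `init/1` and relying on the supervisor's retry to ride out transient failures. Repeated start failures exhaust `max_restarts` quickly; move the retry, and any backoff, into the child.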

**Code example from source:**
```elixir
defp restart_child(:one_for_one, current_pid, child, state) do
  {{m, f, args} = mfa, restart, shutdown, type, modules} = child
  %{extra_arguments: extra} = state

  case start_child(m, f, extra ++ args) do
    {:ok, pid, _} ->
      state = delete_child(current_pid, state)
      {:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)}

    {:ok, pid} ->
      state = delete_child(current_pid, state)
      {:ok, save_child(pid, mfa, restart, shutdown, type, modules, state)}

    :ignore ->
      {:ok, delete_child(current_pid, state)}

    {:error, reason} ->
      report_error(:start_error, reason, {:restarting, current_pid}, child, state)
      state = put_in(state.children[current_pid], {:restarting, child})
      {:try_again, state}
  end
end
```

### When to Use

**Triggers:**
- A child fails to start due to transient conditions (port conflict, network partition)
- You're seeing restart intensity limits hit because start failures count as restarts
- You need a supervisor that tolerates temporary resource unavailability

**Example — before:**
```elixir
# Every start failure counts against restart intensity
# 3 failures in 5 seconds → supervisor crashes → cascading failure
defmodule MyApp.ConnectionPool do
  use GenServer

  @impl true
  def init(config) do
    # If DB is temporarily unreachable, this crashes...
    # ...which counts as a restart...
    # ...which can exhaust restart intensity
    {:ok, conn} = DBConnection.start_link(config)
    {:ok, %{conn: conn}}
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.ConnectionPool do
  use GenServer

  @impl true
  def init(config) do
    # Use handle_continue for the connection attempt
    {:ok, %{config: config, conn: nil}, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    case DBConnection.start_link(state.config) do
      {:ok, conn} ->
        {:noreply, %{state | conn: conn}}
      {:error, _reason} ->
        # Retry after delay without counting against restart intensity
        Process.send_after(self(), :retry_connect, 5_000)
        {:noreply, state}
    end
  end

  @impl true
  def handle_info(:retry_connect, state) do
    {:noreply, state, {:continue, :connect}}
  end
end
```
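
The fixed 5-second retry above can be replaced with a growing delay so that a long outage is not hammered with connection attempts. A minimal sketch, assuming a hypothetical `Demo.Backoff` helper and arbitrary base and cap values; nothing here is taken from the Elixir source.

```elixir
defmodule Demo.Backoff do
  # Hypothetical helper: the delay doubles with each failed attempt
  # (attempt 0 is the first retry) and is capped at `max_ms`.
  def delay(attempt, base_ms \\ 500, max_ms \\ 30_000)
      when is_integer(attempt) and attempt >= 0 do
    min(base_ms * Integer.pow(2, attempt), max_ms)
  end
end

# In the GenServer, keep an attempt counter in state and schedule with it:
#   Process.send_after(self(), :retry_connect, Demo.Backoff.delay(state.attempt))
delays = Enum.map(0..7, &Demo.Backoff.delay/1)
```

Resetting the attempt counter to zero after a successful connection keeps the next outage from starting at the capped delay.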

### When NOT to Use

**Don't use this when:**
- The start failure is deterministic (config error, missing module) — fix the bug
- You're relying on automatic retry to avoid proper health checking
- The child should NOT start at all if dependencies are unavailable (use `:ignore`)

**Over-application example:**
```elixir
# Retrying forever when the error is permanent
defmodule MyApp.MisconfiguredWorker do
  use GenServer

  @impl true
  def init(%{api_key: nil}) do
    # This will NEVER succeed — the key is nil!
    # Infinite retry just wastes resources
    Process.send_after(self(), :retry, 5_000)
    {:ok, %{}}
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.MisconfiguredWorker do
  use GenServer

  @impl true
  def init(%{api_key: nil}) do
    # Fail fast on permanent configuration errors
    {:stop, {:error, :missing_api_key}}
  end
end
```

**Why:** Retry logic is for transient failures (network, resource contention). For permanent errors (bad config, missing deps), fail fast so the operator can fix the actual problem.

---

## Pattern 14: `$ancestors` and `$callers` — Process Lineage Tracking

**Source:** [lib/elixir/lib/task.ex#L227](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/task.ex#L227) (Ancestor and Caller Tracking section)

**What it does:** Elixir uses two process dictionary keys for lineage:
- `$ancestors` — the supervision hierarchy (who spawned/supervises this process)
- `$callers` — the logical call chain (who requested this work)

These are different! A task's ancestor is its supervisor, but its caller is the process that initiated the async operation.
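
The difference is easy to observe from inside a supervised task. A minimal sketch, assuming an unnamed `Task.Supervisor` started inline; the task simply returns its own process dictionary entries:

```elixir
{:ok, sup} = Task.Supervisor.start_link()
caller = self()

{ancestors, callers} =
  sup
  |> Task.Supervisor.async(fn ->
    # Both keys live in the task's own process dictionary
    {Process.get(:"$ancestors"), Process.get(:"$callers")}
  end)
  |> Task.await()

# The supervisor heads the ancestor list; the requesting process heads the callers
supervised_by = hd(ancestors)
requested_by = hd(callers)
```

With a named supervisor, the head of `$ancestors` would be the registered name rather than the PID.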

**Why:** Debugging and tracing. When a task crashes, the log includes both its supervisor (for restart context) and its caller (for business logic context). This dual tracking is essential for understanding failures in systems where the spawner and supervisor are different processes.

**Anti-pattern:** Ignoring caller tracking when building custom process spawning. If you build something like `Task.Supervisor`, propagate `$callers` so crash logs are meaningful.

**Code example from source (task.ex):**
```elixir
defp get_callers(owner) do
  case :erlang.get(:"$callers") do
    [_ | _] = list -> [owner | list]
    _ -> [owner]
  end
end

# Task.start_link propagates both owner and callers
def start_link(module, function, args)
    when is_atom(module) and is_atom(function) and is_list(args) do
  mfa = {module, function, args}
  Task.Supervised.start_link(get_owner(self()), get_callers(self()), mfa)
end
```

### When to Use

**Triggers:**
- You're debugging crashes and need to understand where a task was spawned from
- You're building custom process spawning and want crash logs to show the call chain
- You need to trace a request through multiple spawned processes

**Example — before:**
```elixir
# Custom spawner that loses caller context
defmodule MyApp.BackgroundJob do
  def run_async(fun) do
    spawn_link(fn ->
      # When this crashes, the log shows no context about WHO spawned it
      fun.()
    end)
  end
end

# Crash log:
# [error] Process #PID<0.234.0> raised an exception
# ** (RuntimeError) something went wrong
# No idea who called run_async or why!
```

**Example — after:**
```elixir
defmodule MyApp.BackgroundJob do
  def run_async(fun) do
    owner = self()
    callers = case Process.get(:"$callers") do
      [_ | _] = list -> [owner | list]
      _ -> [owner]
    end

    spawn_link(fn ->
      Process.put(:"$callers", callers)
      fun.()
    end)
  end
end

# Inside the spawned process, `$callers` is now
# [#PID<0.200.0>, #PID<0.150.0>], so tooling and error reports that read
# it can see who initiated this work.
```

### When NOT to Use

**Don't use this when:**
- You're using Task/Task.Supervisor (they propagate callers automatically)
- The process is long-lived and the original caller is irrelevant after startup
- You're spawning processes that outlive their callers (callers list becomes stale)

**Over-application example:**
```elixir
# Tracking callers for a permanent GenServer — pointless after init
defmodule MyApp.Cache do
  use GenServer

  def start_link(_) do
    # The "caller" of start_link is the supervisor — not useful for debugging
    # After boot, the cache serves many callers — the original spawner is irrelevant
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end
end
```

**Better alternative:**
```elixir
# For long-lived processes, use Logger.metadata or OpenTelemetry spans
# to track per-request context, not process lineage.
# Logger metadata is per-process, so the caller passes its request_id in the message.
def handle_call({:get, key, request_id}, _from, state) do
  Logger.metadata(request_id: request_id)
  {:reply, Map.get(state, key), state}
end
```

**Why:** `$callers` is useful for short-lived spawned work (tasks, one-shot processes). For long-lived services, per-request tracing (metadata, spans) is more appropriate than process lineage.

---

## Pattern 15: GenServer.reply/2 for Deferred Responses

**Source:** [lib/elixir/lib/gen_server.ex#L620](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/gen_server.ex#L620) (callback docs), [lib/elixir/lib/gen_server.ex#L1328](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/gen_server.ex#L1328) (reply/2 function)

**What it does:** A `handle_call` can return `{:noreply, state}` without replying, then later call `GenServer.reply(from, response)` from any process. This decouples request receipt from response delivery.

**Why:** Three use cases (from the source):
1. Reply before returning (response known, but need to do cleanup after)
2. Reply after returning (response not yet available, computed asynchronously)
3. Reply from another process (delegate work to a task)

This enables non-blocking request handling in GenServers that would otherwise be bottlenecked.

**Anti-pattern:** Spawning a task to do work and then having the GenServer block on `Task.await` inside `handle_call`. This defeats the purpose — use `reply/2` from the task instead.

**Code example from source:**
```elixir
def handle_call(:reply_in_one_second, from, state) do
  Process.send_after(self(), {:reply, from}, 1_000)
  {:noreply, state}
end

def handle_info({:reply, from}, state) do
  GenServer.reply(from, :one_second_has_passed)
  {:noreply, state}
end
```
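
The source example covers the "reply after returning" case. The "reply before returning" case can be sketched like this, assuming a hypothetical `Demo.ReplyFirst` counter whose slow bookkeeping is simulated with a short sleep:

```elixir
defmodule Demo.ReplyFirst do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{hits: 0}}

  @impl true
  def handle_call(:hit, from, state) do
    new_state = %{state | hits: state.hits + 1}
    # The caller is unblocked as soon as this reply is sent...
    GenServer.reply(from, new_state.hits)
    # ...while slower bookkeeping (simulated here) continues in the server
    Process.sleep(50)
    # :noreply, because the reply has already gone out
    {:noreply, new_state}
  end
end

{:ok, pid} = Demo.ReplyFirst.start_link()
first = GenServer.call(pid, :hit)
second = GenServer.call(pid, :hit)
```

The second call still waits for the first callback to finish, since the server handles one message at a time; only the caller of the first request returns early.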

### When to Use

**Triggers:**
- A GenServer needs to do async work before replying (DB query, HTTP call, aggregation)
- You want to reply from a different process than the one that received the request
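- You want the caller unblocked as soon as the answer is known, while the server finishes bookkeeping afterwards (each call can be replied to only once, so progress updates need separate messages)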

**Example — before:**
```elixir
defmodule MyApp.Aggregator do
  use GenServer

  @impl true
  def handle_call(:aggregate, _from, state) do
    # Blocks the GenServer for potentially seconds
    # No other calls can be processed during this time
    result = Enum.reduce(state.sources, %{}, fn source, acc ->
      data = HTTPClient.get!(source.url).body
      Map.merge(acc, Jason.decode!(data))
    end)
    {:reply, result, state}
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.Aggregator do
  use GenServer

  @impl true
  def handle_call(:aggregate, from, state) do
    # Don't block: hand the work to a supervised task and reply from there.
    # start_child/2 is fire-and-forget, so no task result or :DOWN message
    # comes back to this GenServer's mailbox.
    Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
      result = Enum.reduce(state.sources, %{}, fn source, acc ->
        data = HTTPClient.get!(source.url).body
        Map.merge(acc, Jason.decode!(data))
      end)
      GenServer.reply(from, result)
    end)
    {:noreply, state}
  end
end
```

### When NOT to Use

**Don't use this when:**
- The work is fast (< 1ms) — just reply inline
- You need the reply to be ordered with respect to other calls (deferred replies break ordering)
- The `from` reference escapes to a long-lived process (it holds a monitor that should be cleaned up)

**Over-application example:**
```elixir
# Deferring reply for trivial work — unnecessary complexity
defmodule MyApp.Counter do
  use GenServer

  @impl true
  def handle_call(:get, from, state) do
    # This is instant! Why defer?
    Task.start(fn -> GenServer.reply(from, state.count) end)
    {:noreply, state}
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.Counter do
  use GenServer

  @impl true
  def handle_call(:get, _from, state) do
    {:reply, state.count, state}
  end
end
```

**Why:** `reply/2` enables non-blocking GenServers for expensive operations. For cheap operations, it adds process spawn overhead, potential ordering issues, and code complexity for no benefit.

---

## Pattern 16: Process.alias for Safe Request/Response

**Source:** [lib/elixir/lib/process.ex#L32](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/process.ex#L32) (Aliases section)

**What it does:** Process aliases (Erlang/OTP 24+) provide a deactivatable reference for receiving replies. After sending a request with an alias as the reply address, you can deactivate the alias if you no longer want the response — any messages sent to a deactivated alias are silently dropped.

**Why:** Solves the "late reply" problem. In request/response patterns, if the requester times out and moves on, a late reply to its PID could confuse future `receive` blocks. With aliases, you deactivate after timeout and the late reply harmlessly vanishes.

**Anti-pattern:** Using bare PIDs for reply addresses in protocols where timeouts are possible. Late messages pollute the mailbox.

**Code example from source:**
```elixir
server = spawn(&server/0)

source_alias = Process.alias()
send(server, {:ping, source_alias})

receive do
  :pong -> :pong
end

# Deactivate — late replies to this alias are silently dropped
Process.unalias(source_alias)
```

### When to Use

**Triggers:**
- You're building request/response patterns with timeouts where late replies pollute the mailbox
- A GenServer sends a request and moves on after timeout, but the response arrives later
- You need safe cancellation of pending responses

**Example — before:**
```elixir
defmodule MyApp.RequestRouter do
  use GenServer

  @impl true
  def handle_call({:request, payload}, _from, state) do
    send(state.backend, {:request, self(), payload})
    receive do
      {:response, result} -> {:reply, result, state}
    after
      5_000 ->
        # Timeout... but the response might still arrive later!
        # It'll sit in our mailbox and confuse future receives
        {:reply, {:error, :timeout}, state}
    end
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.RequestRouter do
  use GenServer

  @impl true
  def handle_call({:request, payload}, _from, state) do
    alias_ref = Process.alias([:reply])
    send(state.backend, {:request, alias_ref, payload})

    receive do
      {^alias_ref, result} -> {:reply, result, state}
    after
      5_000 ->
        # Deactivate the alias — late replies are silently dropped
        Process.unalias(alias_ref)
        {:reply, {:error, :timeout}, state}
    end
  end
end
```
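
The late-reply behaviour can be observed end to end in a few lines. A minimal sketch (requires OTP 24+), assuming a deliberately slow one-shot server and made-up message shapes `{:ping, reply_to}` and `{reply_to, :pong}`:

```elixir
# A slow server that replies to whatever address it was given
server =
  spawn(fn ->
    receive do
      {:ping, reply_to} ->
        Process.sleep(200)
        send(reply_to, {reply_to, :pong})
    end
  end)

alias_ref = Process.alias()
send(server, {:ping, alias_ref})

first =
  receive do
    {^alias_ref, :pong} -> :pong
  after
    50 ->
      # Give up and deactivate the alias before the server answers
      Process.unalias(alias_ref)
      :timeout
  end

# Wait past the server's delay: the reply was sent to a deactivated alias
late =
  receive do
    {^alias_ref, :pong} -> :arrived
  after
    400 -> :dropped
  end
```

Had the server been given `self()` instead of the alias, the second `receive` would have found the late `:pong` sitting in the mailbox.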

### When NOT to Use

**Don't use this when:**
- You're using GenServer.call (it already handles this with its own ref-based protocol)
- The response will always arrive (no timeout scenario)
- You're on OTP < 24 (aliases aren't available)

**Over-application example:**
```elixir
# Using aliases for GenServer.call — it already handles late replies
defmodule MyApp.Client do
  def get_data(server) do
    alias_ref = Process.alias([:reply])
    # Pointless — GenServer.call already uses monitor-based protocol
    # that handles late replies correctly
    GenServer.call(server, {:get, alias_ref})
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.Client do
  def get_data(server) do
    # GenServer.call already handles timeouts and late replies correctly
    GenServer.call(server, :get, 5_000)
  end
end
```

**Why:** Aliases solve the problem for custom protocols where you build your own request/response. GenServer.call already has equivalent protections built in. Use aliases when you're implementing raw message-based protocols.

---

## Pattern 17: Registry Partitioning Strategies

**Source:** [lib/elixir/lib/registry.ex#L310](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/registry.ex#L310) (start_link partitioning docs)

**What it does:** Duplicate registries support two partitioning strategies:
- `{:duplicate, :pid}` (default) — groups entries by the registering process's PID. Good for few keys with many entries (e.g., one PubSub topic with many subscribers).
- `{:duplicate, :key}` — groups entries by key. Good for many keys with few entries each (e.g., many topics with few subscribers).

**Why:** The partitioning strategy determines which partition(s) need to be scanned during lookup. With `:key` partitioning, a key lookup hits exactly one partition (O(1) partitions). With `:pid` partitioning, key lookups must scan all partitions but process-based operations (unregister on death) are localized.

**Anti-pattern:** Using default `:pid` partitioning with millions of unique keys and frequent lookups. Each lookup scans all partitions. Switch to `{:duplicate, :key}`.

**Code example from source:**
```elixir
# Many topics, few subscribers each — use key partitioning
Registry.start_link(
  keys: {:duplicate, :key},
  name: MyApp.TopicRegistry,
  partitions: System.schedulers_online()
)

# Few topics, many subscribers — use pid partitioning (default)
Registry.start_link(
  keys: :duplicate,
  name: MyApp.BroadcastRegistry,
  partitions: System.schedulers_online()
)
```

### When to Use

**Triggers:**
- You have a PubSub with many topics and few subscribers per topic — key lookups are slow
- Profiling shows Registry.dispatch scanning many partitions for key-based lookups
- You're choosing between "optimize for subscribe/unsubscribe" vs "optimize for dispatch"

**Example — before:**
```elixir
# Default :pid partitioning with many unique keys
# Each dispatch must scan ALL partitions to find subscribers for a key
Registry.start_link(keys: :duplicate, name: MyApp.Events)

# With 16 partitions and 100k unique event types,
# every dispatch scans 16 ETS tables
Registry.dispatch(MyApp.Events, "order.created", fn entries ->
  for {pid, _} <- entries, do: send(pid, :notify)
end)
```

**Example — after:**
```elixir
# Key partitioning — dispatch hits exactly ONE partition per key
Registry.start_link(
  keys: {:duplicate, :key},
  name: MyApp.Events,
  partitions: System.schedulers_online()
)

# Now dispatch only scans one ETS table — O(1) partitions
Registry.dispatch(MyApp.Events, "order.created", fn entries ->
  for {pid, _} <- entries, do: send(pid, :notify)
end)
```

### When NOT to Use

**Don't use this when:**
- You have few keys with many subscribers (`:pid` partitioning is better for cleanup)
- Process death cleanup is the hot path (`:key` partitioning must scan all partitions on death)
- You're not hitting performance issues with the default (premature optimization)

**Over-application example:**
```elixir
# Key partitioning for a "presence" system where processes die frequently
# Each death must scan ALL partitions to unregister
Registry.start_link(
  keys: {:duplicate, :key},
  name: MyApp.Presence,
  partitions: 16
)
# With 50k users connecting/disconnecting per second,
# each disconnect scans 16 partitions — worse than default!
```

**Better alternative:**
```elixir
# Pid partitioning — death cleanup is localized to one partition
Registry.start_link(
  keys: :duplicate,
  name: MyApp.Presence,
  partitions: System.schedulers_online()
)
```

**Why:** Partitioning is a tradeoff. `:key` optimizes dispatch (one partition per lookup) at the cost of death cleanup (scan all). `:pid` optimizes death cleanup (one partition) at the cost of dispatch (scan all). Pick based on which operation is hotter.

---

## Pattern 18: `init/1` Return Values — The Full Spectrum

**Source:** [lib/elixir/lib/gen_server.ex#L498](https://github.com/elixir-lang/elixir/blob/f4e1b34617ef92052b65781f18eae5b88a490098/lib/elixir/lib/gen_server.ex#L498) (init callback spec)

**What it does:** `init/1` supports six return values:
- `{:ok, state}` — normal start
- `{:ok, state, timeout}` — start with idle timeout
- `{:ok, state, :hibernate}` — start and immediately hibernate (GC + compact heap)
- `{:ok, state, {:continue, arg}}` — start then immediately invoke `handle_continue`
- `:ignore` — don't start, supervisor treats as successful (child can be restarted later)
- `{:stop, reason}` — initialization failed
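
The timeout form is an idle timeout, not a startup deadline: if no message arrives within the given number of milliseconds, the server receives `:timeout` in `handle_info/2`. A minimal sketch, assuming a hypothetical `Sketch.IdleWorker` that reports idleness to the process that started it and then stops:

```elixir
defmodule Sketch.IdleWorker do
  use GenServer

  # Hypothetical value: stop after 50 ms without any message.
  @idle_ms 50

  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  @impl true
  def init(notify_pid), do: {:ok, notify_pid, @idle_ms}

  @impl true
  def handle_info(:timeout, notify_pid) do
    # No message arrived within @idle_ms, so report and shut down cleanly.
    send(notify_pid, :idle)
    {:stop, :normal, notify_pid}
  end
end

{:ok, _pid} = Sketch.IdleWorker.start_link(self())

result =
  receive do
    :idle -> :idle
  after
    1_000 -> :no_timeout_message
  end
```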

**Why:** Each covers a real scenario:
- `:ignore` — process is disabled by configuration but might be enabled later via `Supervisor.restart_child/2`
- `{:stop, reason}` — unrecoverable initialization failure
- `:hibernate` — process will be idle for a long time, minimize memory
- `{:continue, _}` — split fast init from slow setup
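
As a sketch of the `{:continue, _}` case, the hypothetical `Sketch.Cache` below returns from `init/1` immediately and does its (pretend) slow loading in `handle_continue/2`, which runs before any other message is handled. The module name, the `:source` option, and the `{:get, key}` call are assumptions made for illustration.

```elixir
defmodule Sketch.Cache do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Return quickly so the caller (usually a supervisor) is not blocked.
    state = %{source: Keyword.get(opts, :source, []), data: nil}
    {:ok, state, {:continue, :load}}
  end

  @impl true
  def handle_continue(:load, state) do
    # Stand-in for slow setup such as reading a file or warming a cache.
    {:noreply, %{state | data: Map.new(state.source)}}
  end

  @impl true
  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state.data, key), state}
  end
end

{:ok, pid} = Sketch.Cache.start_link(source: [a: 1])
value = GenServer.call(pid, {:get, :a})
```

Because the continue step runs first, the call never observes `data: nil`.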

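**Anti-pattern:** Using `{:stop, reason}` when `:ignore` is appropriate. If a feature is disabled by config, `:ignore` lets a `Supervisor` keep the child spec (with its pid set to `:undefined`) for later activation, unless the child is `:temporary`. A `DynamicSupervisor` does not keep ignored children at all. `{:stop, reason}` signals a real failure.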

### When to Use

**Triggers:**
- You need to communicate "don't start this child" without the supervisor treating it as failure
- A feature is disabled by config but the child spec should remain for hot-enabling
- A process discovers during init that it's a duplicate and should yield to the existing one

**Example — before:**
```elixir
defmodule MyApp.OptionalFeature do
  use GenServer

  @impl true
  def init(_) do
    if Application.get_env(:my_app, :feature_enabled) do
      {:ok, %{}}
    else
      # {:stop, :disabled} makes start_link return {:error, :disabled},
      # so the supervisor sees a failed start, not a disabled feature
      {:stop, :disabled}
    end
  end
end
```

**Example — after:**
```elixir
defmodule MyApp.OptionalFeature do
  use GenServer

  @impl true
  def init(_) do
    if Application.get_env(:my_app, :feature_enabled) do
      {:ok, %{}}
    else
      # :ignore — supervisor is happy, child spec stays for later activation
      :ignore
    end
  end
end

# Later, to enable:
# Update config, then:
# Supervisor.restart_child(MyApp.Supervisor, MyApp.OptionalFeature)
```
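
What the supervisor sees can be checked directly. The sketch below assumes a hypothetical `Sketch.Optional` module and a made-up `:sketch_app` config key; it starts a supervisor with the feature disabled, inspects the ignored child, then enables the feature and restarts it.

```elixir
defmodule Sketch.Optional do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg) do
    # :sketch_app / :feature_enabled are illustrative names only.
    if Application.get_env(:sketch_app, :feature_enabled, false) do
      {:ok, %{}}
    else
      :ignore
    end
  end
end

{:ok, sup} = Supervisor.start_link([Sketch.Optional], strategy: :one_for_one)

# The spec is kept, but no process is running for it.
children_before = Supervisor.which_children(sup)

# Flip the flag, then ask the supervisor to start the ignored child.
Application.put_env(:sketch_app, :feature_enabled, true)
{:ok, pid} = Supervisor.restart_child(sup, Sketch.Optional)

counts_after = Supervisor.count_children(sup)
```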

### When NOT to Use

**Don't use this when:**
- The failure is real and should count toward restart intensity (use `{:stop, reason}`)
- You want the supervisor to NOT have a child spec for this module (just don't add it)
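- The process should retry starting later automatically (an ignored child is only started again by an explicit `Supervisor.restart_child/2`; return `{:stop, reason}` so the supervisor sees the failure, or start successfully and retry from inside the process)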

**Over-application example:**
```elixir
# Using :ignore for a real failure — hides the problem
defmodule MyApp.DBConnection do
  use GenServer

  @impl true
  def init(config) do
    # connect/1 stands in for whatever opens the connection
    case connect(config) do
      {:ok, conn} -> {:ok, conn}
      {:error, _} -> :ignore  # BAD: DB is down but we pretend everything is fine
    end
  end
end
```

**Better alternative:**
```elixir
defmodule MyApp.DBConnection do
  use GenServer

  @impl true
  def init(config) do
    # connect/1 stands in for whatever opens the connection
    case connect(config) do
      {:ok, conn} -> {:ok, conn}
      {:error, reason} -> {:stop, reason}  # Let supervisor handle the failure
    end
  end
end
```

**Why:** `:ignore` means "this child intentionally should not run right now." `{:stop, reason}` means "this child tried to start and failed." Conflating the two hides real failures from your supervision tree.
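
The difference is visible to whoever starts the process. In this sketch (a hypothetical `Sketch.InitDemo` whose argument selects the outcome), `GenServer.start/2` is used instead of `start_link/2` so that the `{:stop, reason}` case does not send an exit signal to the calling process:

```elixir
defmodule Sketch.InitDemo do
  use GenServer

  @impl true
  def init(:disabled), do: :ignore
  def init(:broken), do: {:stop, :db_unreachable}
  def init(:ok), do: {:ok, %{}}
end

# Intentionally not running: the caller gets :ignore, not an error.
ignored = GenServer.start(Sketch.InitDemo, :disabled)

# Tried to start and failed: the caller gets {:error, reason}.
failed = GenServer.start(Sketch.InitDemo, :broken)

# Normal start.
{:ok, pid} = GenServer.start(Sketch.InitDemo, :ok)
```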

<!-- PATTERN_COMPLETE -->