aweiker 10218813d3 docs: backfill TOC + decision trees, fix review findings

- Add ## Contents and ## Decision Tree to all 10 existing pattern files
- Fix embed_as/1 semantics inversion in types.md (:self → :dump)
- Fix fabricated __meta__.changes reference in changesets.md
- Fix default primary key type (:integer → :id) in schemas.md
- Combine @impl subsections into single "Minimal Callback Annotation"

2026-05-01 22:13:35 -07:00

Data Transform & Pipeline Patterns in Elixir Core

Patterns extracted from Elixir's standard library source code.

List-Specialized Clause Before Protocol Dispatch
Build-Then-Reverse (Cons-Cell Accumulation)
Pipeline for Linear Transformations, Bare Calls for Control Flow
Pipeline Ending with |> elem(1) (Protocol Reduce Unwrap)
Private Helper Decomposition: Recursive Workers with Guards
Enum vs Stream Decision Pattern
Map.update vs Map.put Decision Pattern
Pattern Matching on Map Structure for Dispatch
Delegating to Erlang BIFs with defdelegate
Reduce as the Universal Primitive
Keyword Multi-Clause Guard Dispatch (String.split pattern)
Lazy Private Helpers with defp parts_to_index

1. List-Specialized Clause Before Protocol Dispatch

Source: lib/elixir/lib/enum.ex#L1723

What it does: Many public Enum functions, map/2 among them, define a when is_list(enumerable) clause first, then a generic fallback that uses the Enumerable protocol.

def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(first..last//step, fun) do
  map_range(first, last, step, fun)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

Why: Lists are by far the most common enumerable. Matching them first avoids protocol dispatch overhead entirely (a direct call into Erlang's :lists module). The range clause is a further optimization for a common case. The generic clause handles all other enumerables through the protocol.

Anti-pattern: A single clause that always goes through protocol dispatch:

# BAD — forces protocol overhead even for lists
def map(enumerable, fun) do
  Enumerable.reduce(enumerable, {:cont, []}, fn x, acc ->
    {:cont, [fun.(x) | acc]}
  end) |> elem(1) |> :lists.reverse()
end

When to Use

Triggers: You have a public function that accepts "any enumerable" but lists account for the majority of callers. Profiling shows protocol dispatch is a measurable cost. You can call an Erlang BIF or a direct recursive implementation for the list case.

Example — before:

def sum(enumerable) do
  Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc -> {:cont, acc + x} end)
  |> elem(1)
end

Example — after:

def sum(enumerable) when is_list(enumerable) do
  :lists.foldl(fn x, acc -> acc + x end, 0, enumerable)
end

def sum(enumerable) do
  Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc -> {:cont, acc + x} end)
  |> elem(1)
end
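
A quick way to gain confidence in a specialization like this is to check that both clauses agree on the same data. The sketch below wraps the "after" version in a hypothetical FastPathSketch module (the module name is an assumption for illustration) and feeds it a list, a range, and a MapSet; only the list takes the fast path.

```elixir
defmodule FastPathSketch do
  # Hypothetical module; sum/1 mirrors the "after" example above.

  # Fast path: lists skip protocol dispatch entirely.
  def sum(enumerable) when is_list(enumerable) do
    :lists.foldl(fn x, acc -> acc + x end, 0, enumerable)
  end

  # Generic path: anything else goes through the Enumerable protocol.
  def sum(enumerable) do
    enumerable
    |> Enumerable.reduce({:cont, 0}, fn x, acc -> {:cont, acc + x} end)
    |> elem(1)
  end
end

# All three calls return 6; only the first hits the list clause.
FastPathSketch.sum([1, 2, 3])
FastPathSketch.sum(1..3)
FastPathSketch.sum(MapSet.new([1, 2, 3]))
```

If the two clauses ever disagree, the specialization has introduced a bug, which is the maintenance risk discussed in the next subsection.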

When NOT to Use

Don't use this when: The function is rarely called with lists, or the function body is complex enough that maintaining two implementations creates a bug risk. Also avoid when the protocol path is already fast enough (micro-optimization for non-hot paths).

Over-application example:

# Pointless — this function is only ever called with streams
def expensive_transform(enumerable) when is_list(enumerable) do
  # duplicate complex logic just in case a list shows up
  enumerable |> do_phase_1() |> do_phase_2() |> do_phase_3()
end

def expensive_transform(enumerable) do
  enumerable |> do_phase_1() |> do_phase_2() |> do_phase_3()
end

Better alternative: Keep one clause. Add the specialization only when profiling proves the protocol dispatch is a bottleneck for real workloads.

Why: Premature optimization. Two copies of the same logic means two places to fix bugs. Protocol dispatch is already cheap, especially once protocols are consolidated (which Mix does by default when compiling a project), so you need evidence before duplicating.

2. Build-Then-Reverse (Cons-Cell Accumulation)

Source: lib/elixir/lib/enum.ex#L1124, 1733, 2697

What it does: Accumulates results by prepending to a list ([x | acc]), then reverses at the end.

# filter (line 1124)
def filter(enumerable, fun) do
  reduce(enumerable, [], R.filter(fun)) |> :lists.reverse()
end

# map (line 1733)
def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

# reject (line 2697)
def reject(enumerable, fun) do
  reduce(enumerable, [], R.reject(fun)) |> :lists.reverse()
end

Why: Prepending to a linked list is O(1); appending is O(n). Building reversed then flipping once is O(n) total. Appending each element would be O(n²).

Anti-pattern: Appending to the end of a list in a loop:

# BAD — O(n²)
def map(enumerable, fun) do
  reduce(enumerable, [], fn x, acc -> acc ++ [fun.(x)] end)
end

When to Use

Triggers: You're building a result list element-by-element through recursion or reduce, and the output order must match the input order. The collection can be any size.

Example — before:

def keep_positives(list) do
  Enum.reduce(list, [], fn x, acc ->
    if x > 0, do: acc ++ [x], else: acc
  end)
end

Example — after:

def keep_positives(list) do
  Enum.reduce(list, [], fn x, acc ->
    if x > 0, do: [x | acc], else: acc
  end)
  |> :lists.reverse()
end
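
To see the difference rather than take it on faith, the two accumulation styles can be placed side by side and timed. This is a minimal sketch: AccumulationSketch and its function names are hypothetical, and the timings from :timer.tc/1 vary by machine, so no expected numbers are given.

```elixir
defmodule AccumulationSketch do
  # Hypothetical module comparing the two accumulation styles.

  # O(n^2): each ++ copies the accumulator built so far.
  def double_append(list) do
    Enum.reduce(list, [], fn x, acc -> acc ++ [x * 2] end)
  end

  # O(n): constant-time prepend, then a single reverse at the end.
  def double_prepend(list) do
    list
    |> Enum.reduce([], fn x, acc -> [x * 2 | acc] end)
    |> :lists.reverse()
  end

  # Returns elapsed microseconds for each version on a list of n integers.
  def compare(n) do
    input = Enum.to_list(1..n)
    {append_us, appended} = :timer.tc(fn -> double_append(input) end)
    {prepend_us, prepended} = :timer.tc(fn -> double_prepend(input) end)
    # Both styles must produce the same list in the same order.
    true = appended == prepended
    %{append_us: append_us, prepend_us: prepend_us}
  end
end
```

Calling AccumulationSketch.compare/1 with increasing sizes should show the append version falling behind as n grows, while both return identical lists.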

When NOT to Use

Don't use this when: Order doesn't matter (e.g., building a set of unique items, collecting into a MapSet), or when you're only extracting a single value (sum, count, max). Also unnecessary if you're using Enum.map/2 or Enum.filter/2 directly — they already do this internally.

Over-application example:

# Unnecessary — order doesn't matter for uniqueness
def unique_tags(items) do
  Enum.reduce(items, [], fn item, acc ->
    if item.tag in acc, do: acc, else: [item.tag | acc]
  end)
  |> :lists.reverse()  # why reverse if you're just checking membership?
end

Better alternative: Use a MapSet or just don't reverse:

def unique_tags(items) do
  Enum.reduce(items, MapSet.new(), fn item, acc ->
    MapSet.put(acc, item.tag)
  end)
end

Why: The reverse adds O(n) work and a full list traversal. If you don't care about order, skip it. If you're collecting into a non-list structure, this pattern doesn't apply.

3. Pipeline for Linear Transformations, Bare Calls for Control Flow

Source: lib/elixir/lib/enum.ex#L1684, 1551, vs 496–502

What it does: Elixir core uses |> when data flows linearly through 2+ transformations. It does NOT use |> for single-step operations or when the first argument is computed by a case/if/with.

# Pipeline: data flows through multiple transforms (line 1684-1685)
def map_join(enumerable, joiner \\ "", mapper) do
  enumerable
  |> map(&entry_to_string(mapper.(&1)))
  |> intersperse(joiner)
  |> IO.iodata_to_binary()
end

# NO pipeline: single step or control flow (line 496-502)
def at(enumerable, index, default \\ nil) when is_integer(index) do
  case slice_forward(enumerable, index, 1, 1) do
    [value] -> value
    [] -> default
  end
end

Why: Pipelines communicate "data flows through transformations." Using them for a single function call or wrapping around control flow obscures intent rather than clarifying it.

Anti-pattern: Pipelines for single operations or wrapping control flow:

# BAD — single step, no pipeline needed
list |> Enum.reverse()

# BAD — control flow awkwardly forced into a pipe
result
|> case do
  {:ok, x} -> x
  :error -> nil
end

When to Use

Triggers: Data flows through 2 or more transformations in sequence, each taking the previous result as its first argument. The reader should see a "conveyor belt" of operations.

Example — before:

def format_names(users) do
  String.upcase(Enum.join(Enum.map(users, & &1.name), ", "))
end

Example — after:

def format_names(users) do
  users
  |> Enum.map(& &1.name)
  |> Enum.join(", ")
  |> String.upcase()
end

When NOT to Use

Don't use this when: There's only one transformation, the result needs to go into a pattern match (case/with), or the pipe would require anonymous function wrapping (|> then(fn x -> ... end)) to fit.

Over-application example:

# Forced — then/1 wrapper defeats readability
params
|> Map.get(:user_id)
|> then(fn id ->
  case Repo.get(User, id) do
    nil -> {:error, :not_found}
    user -> {:ok, user}
  end
end)

Better alternative:

case Repo.get(User, Map.get(params, :user_id)) do
  nil -> {:error, :not_found}
  user -> {:ok, user}
end

Why: Pipes are for linear data flow. When you need branching (case/cond/with), break out of the pipeline. Forcing control flow through then/1 adds indirection without clarity.
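
The two styles combine naturally: keep the linear part as a pipeline, bind its result, and branch on it with a bare case. The sketch below is an illustration only; PipelineThenCase and parse_port/1 are hypothetical names, and the underscore stripping is just a stand-in for "some linear cleanup".

```elixir
defmodule PipelineThenCase do
  # Hypothetical example: linear cleanup in a pipeline, branching outside it.
  def parse_port(input) when is_binary(input) do
    # Linear data flow: two transformations, so a pipeline fits.
    cleaned =
      input
      |> String.trim()
      |> String.replace("_", "")

    # Control flow: the result feeds a pattern match, so no pipe here.
    case Integer.parse(cleaned) do
      {port, ""} when port in 1..65_535 -> {:ok, port}
      _ -> {:error, :invalid_port}
    end
  end
end
```

The reader sees the conveyor belt first and the decision second, without a then/1 wrapper joining them.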

4. Pipeline Ending with `|> elem(1)` (Protocol Reduce Unwrap)

Source: lib/elixir/lib/enum.ex#L363, 403, 433, 468, 725, 1022, 2676

What it does: When calling Enumerable.reduce/3 directly, the result is a tagged tuple: {:done, acc}, {:halted, acc}, or {:suspended, acc, continuation}. In every case the accumulator sits at index 1, so core extracts it with |> elem(1).

# all?/1 (line 363)
def all?(enumerable) do
  Enumerable.reduce(enumerable, {:cont, true}, fn entry, _ ->
    if entry, do: {:cont, true}, else: {:halt, false}
  end)
  |> elem(1)
end

# reduce/3 (line 2676)
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end

Why: The protocol returns tagged tuples for the state machine (cont/halt/suspend). End users don't need the tag — only the accumulated value. |> elem(1) is the idiomatic unwrap.

Anti-pattern: Using case when you don't care about the tag:

# BAD — unnecessary pattern match when you always want the value
case Enumerable.reduce(enumerable, {:cont, acc}, fun) do
  {:done, result} -> result
  {:halted, result} -> result
end

When to Use

Triggers: You're calling Enumerable.reduce/3 directly (implementing a custom Enum-like function) and you always want the accumulated value regardless of whether iteration completed or halted.

Example — before:

def sum_until(enumerable, limit) do
  result = Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc ->
    new = acc + x
    if new >= limit, do: {:halt, new}, else: {:cont, new}
  end)
  case result do
    {:done, val} -> val
    {:halted, val} -> val
  end
end

Example — after:

def sum_until(enumerable, limit) do
  Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc ->
    new = acc + x
    if new >= limit, do: {:halt, new}, else: {:cont, new}
  end)
  |> elem(1)
end

When NOT to Use

Don't use this when: You need to distinguish between :done and :halted to decide subsequent behavior (e.g., you want to know if iteration was interrupted). Also don't use in application code where you should be using Enum.reduce/3 (which handles unwrapping for you).

Over-application example:

# Pointless — Enum.reduce already unwraps
Enum.reduce([1, 2, 3], 0, &(&1 + &2)) |> elem(1)
# This crashes! Enum.reduce returns the value directly, not a tuple.

Better alternative: Use Enum.reduce/3 in application code. Only use the |> elem(1) pattern when directly calling Enumerable.reduce/3 in library code.

Why: This pattern is for library code that calls the Enumerable protocol directly, not for application code. Using it on already-unwrapped results causes crashes. It's an internal idiom that shouldn't leak into regular code.
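
For the case mentioned above where the tag does matter, a case on the result is the right tool. The following sketch is a variant of sum_until that reports whether the limit stopped iteration; ReduceTagSketch and the :exhausted / :limit_reached atoms are hypothetical names chosen for illustration.

```elixir
defmodule ReduceTagSketch do
  # Hypothetical variant of sum_until/2 that keeps the tag's meaning.
  def sum_until(enumerable, limit) do
    reducer = fn x, acc ->
      new = acc + x
      if new >= limit, do: {:halt, new}, else: {:cont, new}
    end

    case Enumerable.reduce(enumerable, {:cont, 0}, reducer) do
      # Ran out of elements before reaching the limit.
      {:done, total} -> {:exhausted, total}
      # The reducer returned {:halt, _}.
      {:halted, total} -> {:limit_reached, total}
    end
  end
end

# A suspended reduction is a 3-tuple that carries a continuation function.
{:suspended, 0, continuation} =
  Enumerable.reduce([1, 2], {:suspend, 0}, fn x, acc -> {:cont, acc + x} end)

# Resuming with {:cont, acc} runs the rest of the reduction.
{:done, 3} = continuation.({:cont, 0})
```

Because the accumulator is at index 1 in all three shapes, elem(1) works whenever the tag is irrelevant; the case is only worth writing when the caller acts on it.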

5. Private Helper Decomposition: Recursive Workers with Guards

Source: lib/elixir/lib/enum.ex#L4975, 5025–5039

What it does: Complex operations are split into a public entry point (with validation guards) and a private recursive worker function. The worker uses pattern matching on structure (empty list, head|tail) and guards on counters.

# Public entry: validates, delegates (line 890-904)
def drop(enumerable, amount)
    when is_list(enumerable) and is_integer(amount) and amount >= 0 do
  drop_list(enumerable, amount)
end

# Private worker: pattern matches on structure (line 4975-4983)
# Note: this worker backs the sibling split/2 entry point; drop_list is a separate helper
defp split_list([head | tail], counter, acc) when counter > 0 do
  split_list(tail, counter - 1, [head | acc])
end

defp split_list(list, 0, acc) do
  {:lists.reverse(acc), list}
end

defp split_list([], _, acc) do
  {:lists.reverse(acc), []}
end

Why: Separating validation from recursion keeps each clause focused. The compiler turns multi-clause pattern matches into efficient branching code, and each base case reads as its own clause instead of being buried in nested if/cond expressions.
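
Put together, the entry point and the worker form a small self-contained module. This is a sketch modeled on the clauses quoted above, not a copy of the Enum source; SplitSketch is a hypothetical name, and the public function is assumed to seed the accumulator with an empty list.

```elixir
defmodule SplitSketch do
  # Public entry point: validates arguments once, seeds the accumulator.
  def split(list, count) when is_list(list) and is_integer(count) and count >= 0 do
    split_list(list, count, [])
  end

  # Recursive case: move one element into the accumulator.
  defp split_list([head | tail], counter, acc) when counter > 0 do
    split_list(tail, counter - 1, [head | acc])
  end

  # Base case 1: counter exhausted, the rest of the list is the second half.
  defp split_list(list, 0, acc), do: {:lists.reverse(acc), list}

  # Base case 2: list exhausted before the counter reached zero.
  defp split_list([], _counter, acc), do: {:lists.reverse(acc), []}
end
```

SplitSketch.split([1, 2, 3, 4], 2) returns {[1, 2], [3, 4]}, and a negative count fails at the entry point with a FunctionClauseError before any recursion starts.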

Anti-pattern: Mixing validation, edge cases, and recursion in a single function with internal conditionals:

# BAD — one big function with nested ifs
defp split_list(list, counter, acc) do
  if counter > 0 and list != [] do
    [head | tail] = list
    split_list(tail, counter - 1, [head | acc])
  else
    {:lists.reverse(acc), list}
  end
end

When to Use

Triggers: You're writing a recursive function that processes a list element-by-element with termination conditions (counter hits zero, list becomes empty, accumulator reaches a threshold). Multiple base cases exist.

Example — before:

defp take_while_impl(list, fun, acc) do
  case list do
    [] -> :lists.reverse(acc)
    [head | tail] ->
      if fun.(head) do
        take_while_impl(tail, fun, [head | acc])
      else
        :lists.reverse(acc)
      end
  end
end

Example — after:

defp take_while_impl([], _fun, acc) do
  :lists.reverse(acc)
end

defp take_while_impl([head | tail], fun, acc) do
  if fun.(head) do
    take_while_impl(tail, fun, [head | acc])
  else
    :lists.reverse(acc)
  end
end

When NOT to Use

Don't use this when: The logic doesn't recurse (a simple one-shot transformation), or when Enum functions already express the operation clearly. Don't decompose for decomposition's sake.

Over-application example:

# Over-engineered — this is just Enum.take/2
defp my_take_list([], _n, acc), do: :lists.reverse(acc)
defp my_take_list(_list, 0, acc), do: :lists.reverse(acc)
defp my_take_list([h | t], n, acc), do: my_take_list(t, n - 1, [h | acc])

def my_take(list, n), do: my_take_list(list, n, [])

Better alternative:

def my_take(list, n), do: Enum.take(list, n)

Why: Elixir's standard library already provides optimized implementations of common list operations. Writing your own recursive versions adds maintenance burden and is unlikely to perform better (Enum already ships list-specialized clauses). Reserve this pattern for genuinely novel recursion.

6. Enum vs Stream Decision Pattern

Source: lib/elixir/lib/stream.ex#L1 (module docs), lib/elixir/lib/enum.ex

What it does: Enum functions are eager (materialize intermediate lists). Stream functions are lazy (build computation recipes). Core uses Stream for:

Infinite sequences (cycle, iterate, repeatedly)
Resource management (resource/3)
Composing transformations to execute in a single pass

# Stream: builds a recipe, zero computation until consumed (stream.ex ~line 490)
def map(enum, fun) when is_function(fun, 1) do
  lazy(enum, fn f1 -> R.map(fun, f1) end)
end

# Enum: immediately materializes the result (enum.ex line 1723)
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

Why: From Stream docs (lines 37–41): "When chaining many operations with Enum, intermediate lists are created, while Stream creates a recipe of computations that are executed at a later moment."

Use Enum when:

You need the full result now
The collection is small/bounded
You only chain 1–2 operations

Use Stream when:

The collection is large or infinite
You chain many transformations
You need resource cleanup (file handles, network)
You want single-pass processing
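
The "infinite" and "single-pass" points in the lists above can be seen in a few lines. In this sketch (LazinessSketch and first_even_squares/1 are hypothetical names), the source never ends, so the pipeline only terminates because each element is pulled through all stages on demand and Enum.take/2 stops pulling once it has enough.

```elixir
defmodule LazinessSketch do
  # Hypothetical example: an unbounded source consumed lazily.
  def first_even_squares(n) do
    # 1, 2, 3, ... with no upper bound.
    Stream.iterate(1, &(&1 + 1))
    # Still lazy: nothing is computed yet.
    |> Stream.map(&(&1 * &1))
    |> Stream.filter(&(rem(&1, 2) == 0))
    # The only eager step: pulls elements until n results exist.
    |> Enum.take(n)
  end
end

LazinessSketch.first_even_squares(3)
# returns [4, 16, 36]
```

Swapping the Stream calls for their Enum counterparts here would never return, since Enum.map/2 would try to materialize the whole infinite sequence first.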

Anti-pattern: Using Stream for small bounded collections (overhead of the lazy machinery exceeds any benefit):

# BAD — Stream overhead for trivial transform
[1, 2, 3] |> Stream.map(&(&1 * 2)) |> Enum.to_list()

# GOOD — just use Enum
[1, 2, 3] |> Enum.map(&(&1 * 2))

When to Use

Triggers: You're chaining 3+ transformations on a large (or unbounded) collection. You're reading from a file/network where you want backpressure. You need Stream.resource/3 for cleanup guarantees.

Example — before:

# Reads the whole 1M-line file into memory, then builds a full intermediate list at every step
File.read!("large.csv")
|> String.split("\n")
|> Enum.map(&String.trim/1)
|> Enum.filter(&(&1 != ""))
|> Enum.map(&parse_row/1)
|> Enum.take(100)

Example — after:

# Single pass, constant memory, stops after 100
File.stream!("large.csv")
|> Stream.map(&String.trim/1)
|> Stream.reject(&(&1 == ""))
|> Stream.map(&parse_row/1)
|> Enum.take(100)

When NOT to Use

Don't use this when: The collection is small and bounded (roughly under ~1000 elements, as a rule of thumb rather than a measured threshold), you only apply 1–2 transformations, or you need random access to the full result. Stream's lazy machinery has overhead that tends to exceed the savings for small data.

Over-application example:

# Stream overhead exceeds any benefit for 5 items
config = [:a, :b, :c, :d, :e]

config
|> Stream.map(&Atom.to_string/1)
|> Stream.map(&String.upcase/1)
|> Enum.to_list()

Better alternative:

config
|> Enum.map(&(&1 |> Atom.to_string() |> String.upcase()))

Why: Stream wraps each step in a closure and creates a lazy struct. For small collections, the allocation and indirection can cost more than just building the intermediate list. As a rule of thumb, Stream starts to pay off once collections reach hundreds to thousands of elements AND you chain 3+ operations; benchmark when the difference matters.

7. Map.update vs Map.put Decision Pattern

Source: lib/elixir/lib/map.ex#L670

What it does: Map.update/4 transforms an existing value based on its current state; when the key is missing, it inserts default as-is without calling the function. Map.put/3 unconditionally sets a value regardless of current state.

# Map.update/4 (line 682-693): transform based on current value
def update(map, key, default, fun) when is_function(fun, 1) do
  case map do
    %{^key => value} ->
      %{map | key => fun.(value)}
    %{} ->
      put(map, key, default)
    other ->
      :erlang.error({:badmap, other}, [map, key, default, fun])
  end
end

# Map.put/3 (line 636): unconditional set
def put(map, key, value) do
  :maps.put(key, value, map)
end

Why: update/4 is for when the new value depends on the old value (counters, appending to nested lists). put/3 is for when you know the exact new value regardless of what was there.

Anti-pattern: Using get + put when update expresses intent:

# BAD — two lookups, unclear intent
count = Map.get(map, :count, 0)
Map.put(map, :count, count + 1)

# GOOD — single lookup, clear intent
Map.update(map, :count, 1, &(&1 + 1))

When to Use

Triggers: The new value is computed FROM the old value — incrementing counters, appending to lists, toggling booleans, merging nested maps. You also need a sensible default for the "key doesn't exist yet" case.

Example — before:

def add_tag(state, tag) do
  existing = Map.get(state, :tags, [])
  Map.put(state, :tags, [tag | existing])
end

Example — after:

def add_tag(state, tag) do
  Map.update(state, :tags, [tag], fn tags -> [tag | tags] end)
end
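
One detail of Map.update/4 is easy to miss: when the key is absent, the default is stored as-is and the function is not applied to it. The sketch below shows that behavior and a typical counter built on it; WordCountSketch is a hypothetical module name.

```elixir
defmodule WordCountSketch do
  # Hypothetical example: count occurrences with Map.update/4.
  def count(words) do
    Enum.reduce(words, %{}, fn word, acc ->
      # First sighting stores 1 directly; later sightings increment.
      Map.update(acc, word, 1, &(&1 + 1))
    end)
  end
end

# Missing key: the default 10 is inserted untouched (not doubled).
Map.update(%{}, :n, 10, &(&1 * 2))
# returns %{n: 10}

# Present key: the function transforms the existing value.
Map.update(%{n: 10}, :n, 0, &(&1 * 2))
# returns %{n: 20}
```

This is why the counter's default is 1 rather than 0. For plain frequency counting, Enum.frequencies/1 already covers the case; the reduce is shown only to make the update semantics visible.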

When NOT to Use

Don't use this when: The new value is independent of the old value (you're replacing, not transforming). Also avoid it when a missing key needs more than a fixed default, such as returning an error or computing the initial value (use Map.get_and_update/3 or explicit case instead).

Over-application example:

# Awkward — the "update" function ignores the old value entirely
Map.update(user, :name, new_name, fn _old -> new_name end)

Better alternative:

Map.put(user, :name, new_name)

Why: Map.update/4 communicates "the new value depends on the old one." When you ignore the old value in the update function, you're lying to the reader. Use put/3 for unconditional replacement — it's simpler and signals intent correctly.

8. Pattern Matching on Map Structure for Dispatch

Source: lib/elixir/lib/map.ex#L398, 509, 586

What it does: Map functions use case map do %{^key => value} -> ... to dispatch on whether a key exists, rather than calling has_key? + conditional.

# Map.get/3 (line 586-594)
def get(map, key, default \\ nil) do
  case map do
    %{^key => value} ->
      value
    %{} ->
      default
    other ->
      :erlang.error({:badmap, other}, [map, key, default])
  end
end

# Map.put_new/3 (line 398-407)
def put_new(map, key, value) do
  case map do
    %{^key => _value} ->
      map
    %{} ->
      put(map, key, value)
    other ->
      :erlang.error({:badmap, other})
  end
end

Why: Pattern matching with %{^key => value} does the lookup AND extraction in one step. The %{} pattern matches any map, not only an empty one; because it comes second, it is only reached when the key is NOT present. The other clause provides a clear error for non-maps. This is both more efficient and more readable than if Map.has_key?(map, key).
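
Clause order carries the meaning in this idiom, which a short experiment makes concrete. In the sketch below, MapMatchSketch and the {:not_a_map, other} error shape are hypothetical; the point is that %{} on its own matches every map, and that a key holding nil still counts as present.

```elixir
defmodule MapMatchSketch do
  # Hypothetical example of the three-clause map dispatch.
  def fetch_config(config, key) do
    case config do
      # Key present (even if its value is nil): extract in one step.
      %{^key => value} -> {:ok, value}
      # Any other map: reached only because the clause above failed.
      %{} -> {:error, :missing}
      # Not a map at all.
      other -> {:error, {:not_a_map, other}}
    end
  end
end

# %{} is not an "empty map" pattern; it matches a populated map too.
match?(%{}, %{a: 1})
# returns true
```

Swapping the first two clauses would make every map return {:error, :missing}, so the specific pattern has to come first.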

Anti-pattern:

# BAD — double lookup, less clear
def get(map, key, default) do
  if Map.has_key?(map, key) do
    :maps.get(key, map)
  else
    default
  end
end

When to Use

Triggers: You need to branch based on whether a key exists in a map, especially when you also want the value if it does exist. You want a single lookup that both checks existence and extracts the value.

Example — before:

def fetch_config(config, key) do
  if Map.has_key?(config, key) do
    {:ok, Map.get(config, key)}
  else
    {:error, :missing}
  end
end

Example — after:

def fetch_config(config, key) do
  case config do
    %{^key => value} -> {:ok, value}
    %{} -> {:error, :missing}
  end
end

When NOT to Use

Don't use this when: You're checking for multiple keys simultaneously (use a multi-key pattern match instead), or when Map.get/3 with a default already expresses what you need. Don't use case dispatch for simple "get with fallback" scenarios.

Over-application example:

# Over-engineered — Map.get/3 already does this
def get_name(user) do
  case user do
    %{:name => name} -> name
    %{} -> "Anonymous"
  end
end

Better alternative:

def get_name(user) do
  Map.get(user, :name, "Anonymous")
end

Why: Map.get/3 already implements this exact pattern internally. Rewriting it as an explicit case adds visual noise without any semantic or performance benefit. Use the case pattern when you're doing something Map.get can't — like returning different tagged tuples or triggering side effects.

9. Delegating to Erlang BIFs with `defdelegate`

Source: lib/elixir/lib/map.ex#L127, 143, 159, 173

What it does: When an Erlang function already does exactly what's needed, Elixir delegates directly rather than wrapping.

@spec keys(map) :: [key]
defdelegate keys(map), to: :maps

@spec values(map) :: [value]
defdelegate values(map), to: :maps

@spec merge(map, map) :: map
defdelegate merge(map1, map2), to: :maps

Why: Little to no overhead. defdelegate only generates a thin forwarding function, so there is no point hand-writing an Elixir wrapper around an Erlang function when the semantics are identical. Map additionally marks some of its small functions for inlining with the @compile {:inline, ...} annotation on line 115.

Anti-pattern: Wrapping without adding value:

# BAD — pointless wrapper
def keys(map) do
  :maps.keys(map)
end

When to Use

Triggers: An Erlang module already exports a function with the exact semantics you need. The argument order matches. You want to expose it under an Elixir-idiomatic name or in your module's namespace for discoverability.

Example — before:

defmodule MyQueue do
  def new, do: :queue.new()
  def push(q, item), do: :queue.in(item, q)
  def pop(q), do: :queue.out(q)
end

Example — after:

defmodule MyQueue do
  defdelegate new(), to: :queue
  # Can't delegate push — argument order differs, needs wrapper
  def push(q, item), do: :queue.in(item, q)
  defdelegate pop(q), to: :queue, as: :out
end

When NOT to Use

Don't use this when: You need to validate inputs, transform arguments, change argument order, add defaults, or adapt the return value. Also avoid when the Erlang function has unclear semantics that benefit from a documenting wrapper.

Over-application example:

# Broken — Erlang arg order is (key, map), Elixir convention is (map, key)
defdelegate get(map, key), to: :maps
# This compiles, but get(map, key) forwards to :maps.get(map, key) and fails at runtime!

Better alternative:

def get(map, key) do
  :maps.get(key, map)
end

Why: defdelegate is a transparent pass-through. If argument order, defaults, validation, or error handling differ between your desired API and the Erlang function, you need a real wrapper. Delegating with a semantic mismatch creates subtle bugs.
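
The mismatch described above is easy to reproduce. This sketch defines a safe delegate, a deliberately broken one, and a real wrapper in a hypothetical DelegateSketch module; broken_get/2 is named that way only to make the failure obvious.

```elixir
defmodule DelegateSketch do
  # Same argument order on both sides, so delegation is safe.
  defdelegate keys(map), to: :maps

  # Mismatch: forwards to :maps.get(map, key), but :maps.get/2
  # expects (key, map), so the key ends up in the map position.
  defdelegate broken_get(map, key), to: :maps, as: :get

  # A real wrapper swaps the arguments explicitly.
  def get(map, key), do: :maps.get(key, map)
end

DelegateSketch.keys(%{a: 1})
# returns [:a]

DelegateSketch.get(%{a: 1}, :a)
# returns 1
```

Calling DelegateSketch.broken_get(%{a: 1}, :a) compiles without warnings and then raises at runtime, because :a is passed where a map is expected.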

10. Reduce as the Universal Primitive

Source: lib/elixir/lib/enum.ex#L19, 2660–2676

What it does: Nearly every Enum operation is built on top of reduce. The Enumerable protocol's core function is reduce/3. Everything else (count, member?, slice) is an optimization hint.

# From the protocol docs (line 19-21):
def map(enumerable, fun) do
  reducer = fn x, acc -> {:cont, [fun.(x) | acc]} end
  Enumerable.reduce(enumerable, {:cont, []}, reducer) |> elem(1) |> :lists.reverse()
end

# The actual reduce/3 (line 2676):
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end

Why: Reduce is the most general iteration primitive. By building all operations on reduce, any data structure that implements Enumerable.reduce/3 automatically gets the full Enum API. This is the protocol + reduce = universal composability pattern.

Anti-pattern: Implementing each Enum function independently for each data structure:

# BAD — reimplementing map for each type
def map(%MyStruct{items: items}, fun), do: ...
def filter(%MyStruct{items: items}, fun), do: ...
# Instead: implement Enumerable.reduce/3 once and get everything

When to Use

Triggers: You're implementing a custom data structure that should be iterable. You want the full Enum API without implementing each function. You're designing a protocol where one function provides maximum leverage.

Example — before:

defmodule RingBuffer do
  def map(%RingBuffer{} = rb, fun), do: ...
  def filter(%RingBuffer{} = rb, fun), do: ...
  def reduce(%RingBuffer{} = rb, acc, fun), do: ...
  def count(%RingBuffer{} = rb), do: ...
  # 70+ functions to implement...
end

Example — after:

defimpl Enumerable, for: RingBuffer do
  def reduce(%RingBuffer{data: data, head: h, size: s}, acc, fun) do
    # One function — yields elements in order
    do_reduce(data, h, s, acc, fun)
  end

  def count(%RingBuffer{size: s}), do: {:ok, s}
  def member?(_, _), do: {:error, __MODULE__}
  def slice(_), do: {:error, __MODULE__}
end
# Now Enum.map/filter/take/etc. all work automatically
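
Since the RingBuffer example leaves do_reduce undefined, here is a complete, smaller implementation of the same idea. It is a sketch under stated assumptions: Countdown is a hypothetical struct (not from the source) that yields from, from - 1, ..., 1, and from is assumed to be a non-negative integer. The worker handles the three accumulator commands the protocol can send.

```elixir
defmodule Countdown do
  # Hypothetical struct for illustration: yields from, from - 1, ..., 1.
  defstruct [:from]
end

defimpl Enumerable, for: Countdown do
  def reduce(%Countdown{from: n}, acc, fun), do: do_reduce(n, acc, fun)

  # The consumer asked to stop early.
  defp do_reduce(_n, {:halt, acc}, _fun), do: {:halted, acc}
  # The consumer asked to pause; hand back a continuation.
  defp do_reduce(n, {:suspend, acc}, fun), do: {:suspended, acc, &do_reduce(n, &1, fun)}
  # Nothing left to yield.
  defp do_reduce(0, {:cont, acc}, _fun), do: {:done, acc}
  # Yield n, then continue with n - 1.
  defp do_reduce(n, {:cont, acc}, fun), do: do_reduce(n - 1, fun.(n, acc), fun)

  # Fast path: the size is known without traversal.
  def count(%Countdown{from: n}), do: {:ok, n}
  # No fast path: Enum falls back to reduce for these.
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}
end
```

With only this in place, Enum.map(%Countdown{from: 3}, &(&1 * 2)) returns [6, 4, 2], and Enum.take/2, Enum.filter/2, and Enum.member?/2 work without any further code.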

When NOT to Use

Don't use this when: Your data structure has specialized algorithms that are significantly faster than the generic reduce-based approach (e.g., binary search on a sorted structure for member?). In that case, implement the specific protocol callbacks.

Over-application example:

# Wasteful — reduce traverses all elements to count a structure with O(1) size
defimpl Enumerable, for: SizedCollection do
  def count(_), do: {:error, __MODULE__}
  # This forces Enum.count to use reduce: O(n)
  # when the size is stored in a field: O(1)
end

Better alternative:

defimpl Enumerable, for: SizedCollection do
  def count(%{size: s}), do: {:ok, s}
  # Now Enum.count is O(1)
end

Why: The optimization callbacks (count, member?, slice) exist precisely because reduce is O(n) for operations that some structures can do faster. Use reduce as the universal fallback, but implement the fast paths when your structure supports them.

11. Keyword Multi-Clause Guard Dispatch (String.split pattern)

Source: lib/elixir/lib/string.ex#L516

What it does: Functions with many input shapes use multiple def clauses with guards to dispatch, handling each case distinctly rather than using internal cond/case.

def split(string, %Regex{} = pattern, options) when is_binary(string) and is_list(options) do
  Regex.split(pattern, string, options)
end

def split(string, "", options) when is_binary(string) and is_list(options) do
  # special case: split by empty string (grapheme-by-grapheme)
  ...
end

def split(string, [], options) when is_binary(string) and is_list(options) do
  # empty pattern list: no splitting
  ...
end

def split(string, pattern, options) when is_binary(string) and is_list(options) do
  # general binary pattern case
  ...
end

Why: Each clause has a single responsibility. The BEAM compiler generates efficient dispatch for these patterns. Adding a new case is additive (new clause) rather than modifying existing logic.

Anti-pattern: One function with nested conditionals:

# BAD — all cases mashed into one body
def split(string, pattern, options) do
  cond do
    is_struct(pattern, Regex) -> ...
    pattern == "" -> ...
    pattern == [] -> ...
    true -> ...
  end
end

When to Use

Triggers: A function accepts multiple distinct input shapes (different types, specific sentinel values, structural patterns). Each shape requires substantially different handling. The shapes are distinguishable via guards or pattern matching.

Example — before:

def parse(input, format) do
  cond do
    format == :json -> Jason.decode!(input)
    format == :yaml -> YamlElixir.read_from_string!(input)
    is_binary(format) -> custom_parse(input, format)
    true -> raise "unknown format"
  end
end

Example — after:

def parse(input, :json) when is_binary(input), do: Jason.decode!(input)
def parse(input, :yaml) when is_binary(input), do: YamlElixir.read_from_string!(input)
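def parse(input, format) when is_binary(input) and is_binary(format), do: custom_parse(input, format)
# Note: an unknown format now raises FunctionClauseError instead of the explicit "unknown format" error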

When NOT to Use

Don't use this when: The differences between cases are minor (a single flag toggles a small behavior), or when you'd end up with 10+ nearly-identical clauses that differ by one line. Also avoid when the distinguishing condition can't be expressed in a guard (e.g., requires a database lookup).

Over-application example:

# Absurd — 5 clauses that differ only in a multiplier
def convert(value, :mm), do: value * 1.0
def convert(value, :cm), do: value * 10.0
def convert(value, :m), do: value * 1000.0
def convert(value, :km), do: value * 1_000_000.0
def convert(value, :in), do: value * 25.4

Better alternative:

@multipliers %{mm: 1.0, cm: 10.0, m: 1000.0, km: 1_000_000.0, in: 25.4}

def convert(value, unit) when is_map_key(@multipliers, unit) do
  value * @multipliers[unit]
end

Why: When clauses share identical structure and differ only by data, a lookup table is cleaner and more maintainable. Multi-clause dispatch shines when each case has genuinely different logic, not just different constants.

12. Lazy Private Helpers with `defp parts_to_index`

Source: lib/elixir/lib/string.ex#L562

What it does: Tiny private helpers that convert between API-level concepts and implementation-level values use single-line defp with guards.

defp parts_to_index(:infinity), do: 0
defp parts_to_index(n) when is_integer(n) and n > 0, do: n

Why: Clear, self-documenting dispatch. Each case is one line. No branching logic in the caller. The function name explains the conversion.

Anti-pattern: Inline conditional in the caller:

# BAD — logic scattered in caller
index = if parts == :infinity, do: 0, else: parts

When to Use

Triggers: You have a small, well-defined mapping between API-level values and internal representations. The conversion appears in multiple places, or the mapping is non-obvious enough to deserve a name.

Example — before:

def fetch(resource, timeout) do
  ms = if timeout == :infinity, do: :infinity, else: timeout * 1000
  do_fetch(resource, ms)
end

Example — after:

def fetch(resource, timeout) do
  do_fetch(resource, timeout_to_ms(timeout))
end

defp timeout_to_ms(:infinity), do: :infinity
defp timeout_to_ms(seconds) when is_number(seconds) and seconds >= 0, do: round(seconds * 1000)

When NOT to Use

Don't use this when: The conversion is trivial and only used once (a single if is clearer than a named function for x + 1), or when the mapping has many entries that would be better served by a lookup map.

Over-application example:

# Over-engineered — named function for trivial identity-like conversion
defp ensure_string(s) when is_binary(s), do: s
defp ensure_string(a) when is_atom(a), do: Atom.to_string(a)

# Used exactly once:
def log(msg), do: IO.puts(ensure_string(msg))

Better alternative:

def log(msg) when is_binary(msg), do: IO.puts(msg)
def log(msg) when is_atom(msg), do: IO.puts(Atom.to_string(msg))

Why: When a conversion is used exactly once and the calling function already dispatches on clauses, folding the conversion into the caller's clauses reduces indirection. Named helpers shine when reused or when they name a non-obvious transformation.

Decision Tree

If you accept "any enumerable" but lists are the common case → add a when is_list clause before protocol dispatch (Pattern 1)
If you are building a result list element-by-element and order matters → prepend with [x | acc] then reverse at the end (Pattern 2)
If data flows through 2+ sequential transformations → use the pipe operator (Pattern 3)
If you call Enumerable.reduce/3 directly and always want the accumulated value → unwrap with |> elem(1) (Pattern 4)
If you need a recursive function with multiple termination conditions → decompose into public entry + private multi-clause worker (Pattern 5)
If the collection is large/infinite or you chain 3+ transforms → use Stream; otherwise use Enum (Pattern 6)
If the new value depends on the old value (increment, append) → use Map.update/4; if replacing unconditionally → use Map.put/3 (Pattern 7)
If you need to branch on whether a key exists and extract the value → pattern-match with %{^key => value} in a case (Pattern 8)
If an Erlang function has identical semantics and argument order → use defdelegate (Pattern 9)
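If you are implementing a custom iterable data structure → implement Enumerable.reduce/3 to get the full Enum API (Pattern 10)
If a function accepts several distinct input shapes that each need different handling → use one guarded clause per shape (Pattern 11)
If a small API-level value must be mapped to an internal representation → name the conversion in a one-line guarded defp (Pattern 12)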
