elixir-patterns/patterns/data-transforms.md
Aaron Weiker 4ea9a884aa docs: idiomatic Elixir and Phoenix patterns with source citations
Extracted patterns, conventions, and code smells directly from the
Elixir and Phoenix source code with file path and line number citations.

Covers: GenServer, error handling, data transforms, process design,
testing, documentation, typespecs, macros, behaviours, module organization,
Phoenix-specific patterns, framework deviations, and anti-patterns.
2026-04-29 22:50:12 -07:00


Data Transform & Pipeline Patterns in Elixir Core

Patterns extracted from Elixir's standard library source code.


1. List-Specialized Clause Before Protocol Dispatch

Source: lib/elixir/lib/enum.ex lines 1723-1733

What it does: Many public Enum functions, map/2 among them, define a when is_list(enumerable) clause first, then a generic fallback that uses the Enumerable protocol.

def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(first..last//step, fun) do
  map_range(first, last, step, fun)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

Why: Lists are by far the most common enumerable. Matching them first avoids protocol dispatch overhead entirely by calling Erlang's :lists.map/2 directly. The range clause is a further optimization for a common case. The generic clause handles all other enumerables through the protocol.

Anti-pattern: A single clause that always goes through protocol dispatch:

# BAD — forces protocol overhead even for lists
def map(enumerable, fun) do
  Enumerable.reduce(enumerable, {:cont, []}, fn x, acc ->
    {:cont, [fun.(x) | acc]}
  end) |> elem(1) |> :lists.reverse()
end
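
The same clause ordering can be applied in application code. The following is a minimal sketch, not taken from the Elixir source: `MyColl` and `sum_squares/1` are hypothetical names chosen for illustration.

```elixir
defmodule MyColl do
  # Hypothetical helper, not part of Elixir.

  # Fast path: plain lists are handled by direct recursion, with no protocol dispatch.
  def sum_squares(list) when is_list(list), do: sum_squares_list(list, 0)

  # Generic fallback: any other enumerable goes through Enum.reduce/3.
  def sum_squares(enumerable) do
    Enum.reduce(enumerable, 0, fn x, acc -> acc + x * x end)
  end

  defp sum_squares_list([head | tail], acc), do: sum_squares_list(tail, acc + head * head)
  defp sum_squares_list([], acc), do: acc
end
```

Both `MyColl.sum_squares([1, 2, 3])` and `MyColl.sum_squares(1..3)` produce the same result; only the route taken differs.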

2. Build-Then-Reverse (Cons-Cell Accumulation)

Source: lib/elixir/lib/enum.ex lines 1124, 1733, 2697

What it does: Accumulates results by prepending to a list ([x | acc]), then reverses at the end.

# filter (line 1124)
def filter(enumerable, fun) do
  reduce(enumerable, [], R.filter(fun)) |> :lists.reverse()
end

# map (line 1733)
def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

# reject (line 2697)
def reject(enumerable, fun) do
  reduce(enumerable, [], R.reject(fun)) |> :lists.reverse()
end

Why: Prepending to a linked list is O(1); appending is O(n). Building reversed then flipping once is O(n) total. Appending each element would be O(n²).

Anti-pattern: Appending to the end of a list in a loop:

# BAD — O(n²)
def map(enumerable, fun) do
  reduce(enumerable, [], fn x, acc -> acc ++ [fun.(x)] end)
end
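
A hand-written recursive function can follow the same prepend-then-reverse shape. This is an illustrative sketch; `Acc.double_evens/1` is a hypothetical function, not something from enum.ex.

```elixir
defmodule Acc do
  # Hypothetical example: keep the even integers and double them.
  def double_evens(list) when is_list(list), do: do_double_evens(list, [])

  # Prepend each result to the accumulator: O(1) per element.
  defp do_double_evens([x | rest], acc) when rem(x, 2) == 0 do
    do_double_evens(rest, [x * 2 | acc])
  end

  # Skip anything that is not an even integer.
  defp do_double_evens([_ | rest], acc), do: do_double_evens(rest, acc)

  # Reverse once at the end to restore the input order.
  defp do_double_evens([], acc), do: :lists.reverse(acc)
end
```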

3. Pipeline for Linear Transformations, Bare Calls for Control Flow

Source: lib/elixir/lib/enum.ex lines 1684-1685, 1551, vs 496-502

What it does: Elixir core uses |> when data flows linearly through 2+ transformations. It does NOT use |> for single-step operations or when the first argument is computed by a case/if/with.

# Pipeline: data flows through multiple transforms (line 1684-1685)
def map_join(enumerable, joiner \\ "", mapper) do
  enumerable
  |> map(&entry_to_string(mapper.(&1)))
  |> intersperse(joiner)
  |> IO.iodata_to_binary()
end

# NO pipeline: single step or control flow (line 496-502)
def at(enumerable, index, default \\ nil) when is_integer(index) do
  case slice_forward(enumerable, index, 1, 1) do
    [value] -> value
    [] -> default
  end
end

Why: Pipelines communicate "data flows through transformations." Using them for a single function call or wrapping around control flow obscures intent rather than clarifying it.

Anti-pattern: Pipelines for single operations or wrapping control flow:

# BAD — single step, no pipeline needed
list |> Enum.reverse()

# BAD — control flow awkwardly forced into a pipe
result
|> case do
  {:ok, x} -> x
  :error -> nil
end
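
One way to apply this split in application code is sketched below. The `Unwrap` module and both of its functions are hypothetical and exist only to contrast the two styles.

```elixir
defmodule Unwrap do
  # Control flow: pass the expression to `case` directly instead of piping into it.
  def value_or_nil(map, key) do
    case Map.fetch(map, key) do
      {:ok, x} -> x
      :error -> nil
    end
  end

  # Linear flow through two transformations: a pipeline reads well here.
  def shout(words) do
    words
    |> Enum.map(&String.upcase/1)
    |> Enum.join(" ")
  end
end
```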

4. Pipeline Ending with |> elem(1) (Protocol Reduce Unwrap)

Source: lib/elixir/lib/enum.ex lines 363, 403, 433, 468, 725, 1022, 2676

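What it does: When calling Enumerable.reduce/3 directly, the result is a tagged tuple: {:done, acc} or {:halted, acc} (a suspended reduction returns {:suspended, acc, continuation}). In every case the accumulator sits at index 1, so core extracts it with |> elem(1).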

# all?/1 (line 363)
def all?(enumerable) do
  Enumerable.reduce(enumerable, {:cont, true}, fn entry, _ ->
    if entry, do: {:cont, true}, else: {:halt, false}
  end)
  |> elem(1)
end

# reduce/3 (line 2676)
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end

Why: The protocol returns tagged tuples for the state machine (cont/halt/suspend). End users don't need the tag — only the accumulated value. |> elem(1) is the idiomatic unwrap.

Anti-pattern: Using case when you don't care about the tag:

# BAD — unnecessary pattern match when you always want the value
case Enumerable.reduce(enumerable, {:cont, acc}, fun) do
  {:done, result} -> result
  {:halted, result} -> result
end
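
The unwrap is most useful when the reducer may halt early, because the caller then does not need to distinguish {:done, acc} from {:halted, acc}. A small sketch, assuming a hypothetical `FirstMatch.first_over/2` helper:

```elixir
defmodule FirstMatch do
  # Hypothetical helper: return the first element greater than `limit`, or nil.
  def first_over(enumerable, limit) do
    Enumerable.reduce(enumerable, {:cont, nil}, fn x, acc ->
      # Halt as soon as a match is found; otherwise keep the current accumulator.
      if x > limit, do: {:halt, x}, else: {:cont, acc}
    end)
    |> elem(1)
  end
end
```

A match produces a {:halted, value} tuple and a miss produces {:done, nil}; elem(1) handles both.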

5. Private Helper Decomposition: Recursive Workers with Guards

Source: lib/elixir/lib/enum.ex lines 4975-4995, 5025-5039

What it does: Complex operations are split into a public entry point (with validation guards) and a private recursive worker function. The worker uses pattern matching on structure (empty list, head|tail) and guards on counters.

# Public entry: validates, delegates (line 890-904)
def drop(enumerable, amount)
    when is_list(enumerable) and is_integer(amount) and amount >= 0 do
  drop_list(enumerable, amount)
end

# Private worker (split_list, the worker behind split/2): pattern matches on structure (line 4975-4983)
defp split_list([head | tail], counter, acc) when counter > 0 do
  split_list(tail, counter - 1, [head | acc])
end

defp split_list(list, 0, acc) do
  {:lists.reverse(acc), list}
end

defp split_list([], _, acc) do
  {:lists.reverse(acc), []}
end

Why: Separating validation from recursion keeps each clause focused. Pattern matches and guards in function heads let the compiler generate efficient clause selection, so no runtime if/cond is needed in the body.

Anti-pattern: Mixing validation, edge cases, and recursion in a single function with internal conditionals:

# BAD — one big function with nested ifs
defp split_list(list, counter, acc) do
  if counter > 0 and list != [] do
    [head | tail] = list
    split_list(tail, counter - 1, [head | acc])
  else
    {:lists.reverse(acc), list}
  end
end
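
Putting the two halves together, a complete entry-point-plus-worker pair might look like the sketch below. `Counted.take/2` and `take_list/3` are hypothetical names; they mirror the shape described above rather than reproduce the core implementation.

```elixir
defmodule Counted do
  # Public entry: guards validate the arguments, then delegate to the worker.
  def take(list, count) when is_list(list) and is_integer(count) and count >= 0 do
    take_list(list, count, [])
  end

  # Worker: elements remain and the counter is positive, so keep collecting.
  defp take_list([head | tail], counter, acc) when counter > 0 do
    take_list(tail, counter - 1, [head | acc])
  end

  # Worker: the counter hit zero or the list ran out, so return what was collected.
  defp take_list(_list, _counter, acc), do: :lists.reverse(acc)
end
```

Invalid input (a negative count, a non-list) never reaches the worker; it fails at the public clause with a FunctionClauseError.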

6. Enum vs Stream Decision Pattern

Source: lib/elixir/lib/stream.ex lines 1-80 (module docs), lib/elixir/lib/enum.ex

What it does: Enum functions are eager (materialize intermediate lists). Stream functions are lazy (build computation recipes). Core uses Stream for:

  • Infinite sequences (cycle, iterate, repeatedly)
  • Resource management (resource/3)
  • Composing transformations to execute in a single pass

# Stream: builds a recipe, zero computation until consumed (stream.ex ~line 490)
def map(enum, fun) when is_function(fun, 1) do
  lazy(enum, fn f1 -> R.map(fun, f1) end)
end

# Enum: immediately materializes the result (enum.ex line 1723)
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

Why: From Stream docs (lines 37-41): "When chaining many operations with Enum, intermediate lists are created, while Stream creates a recipe of computations that are executed at a later moment."

Use Enum when:

  • You need the full result now
  • The collection is small/bounded
  • You only chain 1-2 operations

Use Stream when:

  • The collection is large or infinite
  • You chain many transformations
  • You need resource cleanup (file handles, network)
  • You want single-pass processing

Anti-pattern: Using Stream for small bounded collections (overhead of the lazy machinery exceeds any benefit):

# BAD — Stream overhead for trivial transform
[1, 2, 3] |> Stream.map(&(&1 * 2)) |> Enum.to_list()

# GOOD — just use Enum
[1, 2, 3] |> Enum.map(&(&1 * 2))
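
For contrast, here is a case where Stream is the appropriate tool: the source is infinite, so an eager Enum.map would never return. The module and function names are hypothetical; Stream.iterate/2, Stream.filter/2 and Enum.take/2 are standard library functions.

```elixir
defmodule LazyDemo do
  # Hypothetical helper: the first `count` powers of two greater than `min`.
  def powers_of_two_over(min, count) do
    1
    |> Stream.iterate(&(&1 * 2))    # infinite: 1, 2, 4, 8, ...
    |> Stream.filter(&(&1 > min))   # still lazy, nothing computed yet
    |> Enum.take(count)             # pulls only as many elements as needed
  end
end
```

Only Enum.take/2 forces evaluation, and it stops pulling from the stream once it has `count` elements.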

7. Map.update vs Map.put Decision Pattern

Source: lib/elixir/lib/map.ex lines 670-700

What it does: Map.update/4 transforms an existing value based on its current state. Map.put/3 unconditionally sets a value regardless of current state.

# Map.update/4 (line 682-693): transform based on current value
def update(map, key, default, fun) when is_function(fun, 1) do
  case map do
    %{^key => value} ->
      %{map | key => fun.(value)}
    %{} ->
      put(map, key, default)
    other ->
      :erlang.error({:badmap, other}, [map, key, default, fun])
  end
end

# Map.put/3 (line 636): unconditional set
def put(map, key, value) do
  :maps.put(key, value, map)
end

Why: update/4 is for when the new value depends on the old value (counters, appending to nested lists). put/3 is for when you know the exact new value regardless of what was there.

Anti-pattern: Using get + put when update expresses intent:

# BAD — two lookups, unclear intent
count = Map.get(map, :count, 0)
Map.put(map, :count, count + 1)

# GOOD — single lookup, clear intent
Map.update(map, :count, 1, &(&1 + 1))
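
The counter case generalizes to counting many keys inside a reduce. The sketch below uses a hypothetical `WordCount.frequencies/1` for illustration; the standard library already provides Enum.frequencies/1 for this exact job.

```elixir
defmodule WordCount do
  # Hypothetical helper: count how often each word occurs.
  def frequencies(words) do
    Enum.reduce(words, %{}, fn word, counts ->
      # First sighting stores 1; later sightings increment the stored value.
      Map.update(counts, word, 1, &(&1 + 1))
    end)
  end
end
```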

8. Pattern Matching on Map Structure for Dispatch

Source: lib/elixir/lib/map.ex lines 398, 509, 586

What it does: Map functions use case map do %{^key => value} -> ... to dispatch on whether a key exists, rather than calling has_key? + conditional.

# Map.get/3 (line 586-594)
def get(map, key, default \\ nil) do
  case map do
    %{^key => value} ->
      value
    %{} ->
      default
    other ->
      :erlang.error({:badmap, other}, [map, key, default])
  end
end

# Map.put_new/3 (line 398-407)
def put_new(map, key, value) do
  case map do
    %{^key => _value} ->
      map
    %{} ->
      put(map, key, value)
    other ->
      :erlang.error({:badmap, other})
  end
end

Why: Pattern matching with %{^key => value} does the lookup AND extraction in one step. The %{} pattern matches any map, not only the empty one; because it comes second, it is reached only when the key is NOT present. The other clause provides a clear error for non-maps. This is both more efficient and more readable than if Map.has_key?(map, key).

Anti-pattern:

# BAD — double lookup, less clear
def get(map, key, default) do
  if Map.has_key?(map, key) do
    :maps.get(key, map)
  else
    default
  end
end
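
The same two-clause shape works in application code. In this sketch `Lookup.fetch_or/3` is a hypothetical helper; note that the `%{}` clause matches a non-empty map as well.

```elixir
defmodule Lookup do
  # Hypothetical helper: return the value under `key`, or `default` if absent.
  def fetch_or(map, key, default) do
    case map do
      # Key present: lookup and extraction happen in a single match.
      %{^key => value} -> value
      # Any other map (empty or not): the key is missing.
      %{} -> default
    end
  end
end
```

Passing a non-map raises a CaseClauseError here, because the sketch omits the explicit `other` clause that the core functions include.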

9. Delegating to Erlang BIFs with defdelegate

Source: lib/elixir/lib/map.ex lines 127, 143, 159, 173

What it does: When an Erlang function already does exactly what's needed, Elixir delegates directly rather than wrapping.

@spec keys(map) :: [key]
defdelegate keys(map), to: :maps

@spec values(map) :: [value]
defdelegate values(map), to: :maps

@spec merge(map, map) :: map
defdelegate merge(map1, map2), to: :maps

Why: Zero overhead — the compiler inlines these. No point wrapping an Erlang BIF just to have an Elixir wrapper when the semantics are identical. The @compile {:inline, ...} annotation on line 115 makes this explicit.

Anti-pattern: Wrapping without adding value:

# BAD — pointless wrapper
def keys(map) do
  :maps.keys(map)
end
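
defdelegate is equally available in application modules, and its `as:` option covers the case where the local name should differ from the target's name. A minimal sketch with a hypothetical `MyList` module:

```elixir
defmodule MyList do
  # Same name and arity as the target: MyList.reverse/1 calls :lists.reverse/1.
  defdelegate reverse(list), to: :lists

  # Different local name: MyList.size/1 calls Kernel.length/1.
  defdelegate size(list), to: Kernel, as: :length
end
```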

10. Reduce as the Universal Primitive

Source: lib/elixir/lib/enum.ex lines 19-21, 2660-2676

What it does: Nearly every Enum operation is built on top of reduce. The Enumerable protocol's core function is reduce/3. Everything else (count, member?, slice) is an optimization hint.

# From the protocol docs (line 19-21):
def map(enumerable, fun) do
  reducer = fn x, acc -> {:cont, [fun.(x) | acc]} end
  Enumerable.reduce(enumerable, {:cont, []}, reducer) |> elem(1) |> :lists.reverse()
end

# The actual reduce/3 (line 2676):
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end

Why: Reduce is the most general iteration primitive. By building all operations on reduce, any data structure that implements Enumerable.reduce/3 automatically gets the full Enum API. This is the protocol + reduce = universal composability pattern.

Anti-pattern: Implementing each Enum function independently for each data structure:

# BAD — reimplementing map for each type
def map(%MyStruct{items: items}, fun), do: ...
def filter(%MyStruct{items: items}, fun), do: ...
# Instead: implement Enumerable.reduce/3 once and get everything
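
As a sketch of what "implement Enumerable.reduce/3 once" can look like, the example below defines a hypothetical `Countdown` struct (not part of Elixir) and implements the protocol for it. The other three callbacks return {:error, __MODULE__}, which tells Enum to fall back to reduce/3.

```elixir
defmodule Countdown do
  # Hypothetical struct: enumerates from, from - 1, ..., 1.
  defstruct [:from]
end

defimpl Enumerable, for: Countdown do
  # Honor the halt and suspend instructions before doing any work.
  def reduce(_countdown, {:halt, acc}, _fun), do: {:halted, acc}

  def reduce(countdown, {:suspend, acc}, fun) do
    {:suspended, acc, &reduce(countdown, &1, fun)}
  end

  # Nothing left to emit.
  def reduce(%Countdown{from: n}, {:cont, acc}, _fun) when n <= 0, do: {:done, acc}

  # Emit the current value, then continue with the next lower one.
  def reduce(%Countdown{from: n} = countdown, {:cont, acc}, fun) do
    reduce(%{countdown | from: n - 1}, fun.(n, acc), fun)
  end

  # No optimized implementations: Enum falls back to reduce/3 for these.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}
end
```

With only this in place, calls such as `Enum.map(%Countdown{from: 3}, &(&1 * 10))`, `Enum.count/1`, `Enum.member?/2` and `Enum.take/2` all work on the struct.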

11. Keyword Multi-Clause Guard Dispatch (String.split pattern)

Source: lib/elixir/lib/string.ex lines 516-563

What it does: Functions with many input shapes use multiple def clauses with guards to dispatch, handling each case distinctly rather than using internal cond/case.

def split(string, %Regex{} = pattern, options) when is_binary(string) and is_list(options) do
  Regex.split(pattern, string, options)
end

def split(string, "", options) when is_binary(string) and is_list(options) do
  # special case: split by empty string (grapheme-by-grapheme)
  ...
end

def split(string, [], options) when is_binary(string) and is_list(options) do
  # empty pattern list: no splitting
  ...
end

def split(string, pattern, options) when is_binary(string) and is_list(options) do
  # general binary pattern case
  ...
end

Why: Each clause has a single responsibility. The BEAM compiler generates efficient dispatch for these patterns. Adding a new case is additive (new clause) rather than modifying existing logic.

Anti-pattern: One function with nested conditionals:

# BAD — all cases mashed into one body
def split(string, pattern, options) do
  cond do
    is_struct(pattern, Regex) -> ...
    pattern == "" -> ...
    pattern == [] -> ...
    true -> ...
  end
end
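
Because the String.split bodies above are elided, here is a small complete instance of the same dispatch style. `Duration.to_ms/1` is a hypothetical function invented for this sketch; each accepted input shape gets its own guarded clause.

```elixir
defmodule Duration do
  # Hypothetical helper: normalize several input shapes to milliseconds.

  # Sentinel value passes through unchanged.
  def to_ms(:infinity), do: :infinity

  # A bare non-negative integer is already in milliseconds.
  def to_ms(ms) when is_integer(ms) and ms >= 0, do: ms

  # Tagged tuples carry their unit.
  def to_ms({n, :second}) when is_integer(n) and n >= 0, do: n * 1_000
  def to_ms({n, :minute}) when is_integer(n) and n >= 0, do: n * 60_000
end
```

Supporting a new unit means adding one clause; the existing clauses stay untouched, and unsupported input fails with a FunctionClauseError.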

12. Lazy Private Helpers with defp parts_to_index

Source: lib/elixir/lib/string.ex lines 562-563

What it does: Tiny private helpers that convert between API-level concepts and implementation-level values use single-line defp with guards.

defp parts_to_index(:infinity), do: 0
defp parts_to_index(n) when is_integer(n) and n > 0, do: n

Why: Clear, self-documenting dispatch. Each case is one line. No branching logic in the caller. The function name explains the conversion.

Anti-pattern: Inline conditional in the caller:

# BAD — logic scattered in caller
index = if parts == :infinity, do: 0, else: parts