Aaron Weiker 4ea9a884aa docs: idiomatic Elixir and Phoenix patterns with source citations
Extracted patterns, conventions, and code smells directly from the
Elixir and Phoenix source code with file path and line number citations.

Covers: GenServer, error handling, data transforms, process design,
testing, documentation, typespecs, macros, behaviours, module organization,
Phoenix-specific patterns, framework deviations, and anti-patterns.
2026-04-29 22:50:12 -07:00

# Data Transform & Pipeline Patterns in Elixir Core
Patterns extracted from Elixir's standard library source code.

---
## 1. List-Specialized Clause Before Protocol Dispatch
**Source:** `lib/elixir/lib/enum.ex` lines 1723-1733
**What it does:** Many public `Enum` functions, `map/2` among them, define a `when is_list(enumerable)` clause first, then a generic fallback that goes through the `Enumerable` protocol.
```elixir
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(first..last//step, fun) do
  map_range(first, last, step, fun)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end
```
**Why:** Lists are by far the most common enumerable. Matching them first skips protocol dispatch entirely and calls straight into Erlang's `:lists.map/2`. The range clause is a further fast path for another common case, and the generic clause handles every other enumerable through the protocol.
**Anti-pattern:** A single clause that always goes through protocol dispatch:
```elixir
# BAD: forces protocol overhead even for lists
def map(enumerable, fun) do
  Enumerable.reduce(enumerable, {:cont, []}, fn x, acc ->
    {:cont, [fun.(x) | acc]}
  end)
  |> elem(1)
  |> :lists.reverse()
end
```
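The same layering works in application code. A minimal sketch, assuming a hypothetical `MyEnum.sum_squares/1`: the list clause recurses directly over the list, and the fallback accepts any other enumerable through `Enum.reduce/3`. Whether the fast path pays off depends on the workload, so measure before specializing.

```elixir
defmodule MyEnum do
  # Hypothetical example module; not part of Elixir's standard library.

  # Fast path: plain lists are walked directly, with no protocol dispatch.
  def sum_squares(list) when is_list(list), do: sum_squares_list(list, 0)

  # Fallback: ranges, maps, streams, etc. go through the Enumerable
  # protocol via Enum.reduce/3.
  def sum_squares(enumerable) do
    Enum.reduce(enumerable, 0, fn x, acc -> acc + x * x end)
  end

  # Tail-recursive worker used only by the list clause.
  defp sum_squares_list([], acc), do: acc
  defp sum_squares_list([x | rest], acc), do: sum_squares_list(rest, acc + x * x)
end
```

Both `MyEnum.sum_squares([1, 2, 3])` and `MyEnum.sum_squares(1..3)` return `14`; only the second one goes through the protocol.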
---
## 2. Build-Then-Reverse (Cons-Cell Accumulation)
**Source:** `lib/elixir/lib/enum.ex` lines 1124, 1733, 2697
**What it does:** Accumulates results by prepending to a list (`[x | acc]`), then reverses at the end.
```elixir
# Generic (non-list) clauses: each R.* reducer builds the [x | acc] step

# filter (line 1124)
def filter(enumerable, fun) do
  reduce(enumerable, [], R.filter(fun)) |> :lists.reverse()
end

# map (line 1733)
def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

# reject (line 2697)
def reject(enumerable, fun) do
  reduce(enumerable, [], R.reject(fun)) |> :lists.reverse()
end
```
**Why:** Prepending to a linked list is O(1); appending is O(n). Building reversed then flipping once is O(n) total. Appending each element would be O(n²).
**Anti-pattern:** Appending to the end of a list in a loop:
```elixir
# BAD: O(n²), since each ++ copies the whole accumulator
def map(enumerable, fun) do
  reduce(enumerable, [], fn x, acc -> acc ++ [fun.(x)] end)
end
```
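Hand-written recursion follows the same shape. A minimal sketch, assuming a hypothetical `MyList.running_total/1` that returns cumulative sums: each partial sum is prepended in O(1), and `:lists.reverse/1` runs exactly once. The standard library's `Enum.scan/2` already does this; the sketch only illustrates the accumulation pattern.

```elixir
defmodule MyList do
  # Hypothetical helper: cumulative sums, e.g. [1, 2, 3] -> [1, 3, 6].
  def running_total(list) when is_list(list), do: running_total(list, 0, [])

  # Prepend each new partial sum onto the accumulator (O(1) per element)...
  defp running_total([x | rest], sum, acc) do
    new_sum = sum + x
    running_total(rest, new_sum, [new_sum | acc])
  end

  # ...then reverse once when the input is exhausted (O(n) total).
  defp running_total([], _sum, acc), do: :lists.reverse(acc)
end
```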
---
## 3. Pipeline for Linear Transformations, Bare Calls for Control Flow
**Source:** `lib/elixir/lib/enum.ex` lines 1684-1685, 1551, vs 496-502
**What it does:** Elixir core uses `|>` when data flows linearly through 2+ transformations. It does not use `|>` for single-step calls, and it does not pipe values into control flow such as `case`, `if`, or `with`.
```elixir
# Pipeline: data flows through multiple transforms (line 1684-1685)
def map_join(enumerable, joiner \\ "", mapper) do
  enumerable
  |> map(&entry_to_string(mapper.(&1)))
  |> intersperse(joiner)
  |> IO.iodata_to_binary()
end

# NO pipeline: single step or control flow (line 496-502)
def at(enumerable, index, default \\ nil) when is_integer(index) do
  case slice_forward(enumerable, index, 1, 1) do
    [value] -> value
    [] -> default
  end
end
```
**Why:** Pipelines communicate "data flows through transformations." Piping a single call, or piping into control flow, obscures intent rather than clarifying it.
**Anti-pattern:** Pipelines for single operations or wrapping control flow:
```elixir
# BAD: single step, no pipeline needed
list |> Enum.reverse()

# BAD: control flow awkwardly forced into a pipe
result
|> case do
  {:ok, x} -> x
  :error -> nil
end
```
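A sketch of the convention applied to application code. `Report.summarize/1` and `Report.fetch_total/2` are hypothetical: the linear transform is written as a pipeline, while the branch on a result tuple is a bare `case` on the call's result.

```elixir
defmodule Report do
  # Hypothetical module illustrating the pipe-vs-control-flow convention.

  # Linear: the list flows through several transforms, so a pipeline reads well.
  def summarize(words) when is_list(words) do
    words
    |> Enum.map(&String.downcase/1)
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.join(", ")
  end

  # Control flow: branch with a bare `case` on the call's result
  # instead of piping into `case do`.
  def fetch_total(totals, key) when is_map(totals) do
    case Map.fetch(totals, key) do
      {:ok, total} -> total
      :error -> 0
    end
  end
end
```

For example, `Report.summarize(["B", "a", "C", "b"])` returns `"a, b, c"`.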
---
## 4. Pipeline Ending with `|> elem(1)` (Protocol Reduce Unwrap)
**Source:** `lib/elixir/lib/enum.ex` lines 363, 403, 433, 468, 725, 1022, 2676
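**What it does:** When `Enumerable.reduce/3` is called directly with a reducer that only returns `{:cont, acc}` or `{:halt, acc}`, the result is either `{:done, acc}` or `{:halted, acc}` (only a `{:suspend, acc}` instruction yields the three-element `{:suspended, acc, continuation}`). Core extracts the accumulator with `|> elem(1)`.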
```elixir
# all?/1 (line 363)
def all?(enumerable) do
  Enumerable.reduce(enumerable, {:cont, true}, fn entry, _ ->
    if entry, do: {:cont, true}, else: {:halt, false}
  end)
  |> elem(1)
end

# reduce/3 (line 2676)
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end
```
**Why:** The tagged tuples drive the protocol's state machine (`:cont`/`:halt`/`:suspend` going in, `:done`/`:halted`/`:suspended` coming out). Callers like these need only the accumulated value, not the tag, and `|> elem(1)` is the idiomatic unwrap.
**Anti-pattern:** Using `case` when you don't care about the tag:
```elixir
# BAD: unnecessary pattern match when you always want the value
case Enumerable.reduce(enumerable, {:cont, acc}, fun) do
  {:done, result} -> result
  {:halted, result} -> result
end
```
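To make both result tags visible, here is a minimal sketch of a hypothetical `Search.first_over/2` that calls `Enumerable.reduce/3` directly and halts at the first element above a threshold. Whether the reduction finishes (`{:done, nil}`) or stops early (`{:halted, x}`), the accumulator sits at position 1, so `elem(1)` unwraps either. In practice `Enum.find/2` covers this case.

```elixir
defmodule Search do
  # Hypothetical helper: first element greater than `limit`, or nil.
  def first_over(enumerable, limit) do
    Enumerable.reduce(enumerable, {:cont, nil}, fn x, acc ->
      # Halt on the first match; otherwise keep the nil accumulator.
      if x > limit, do: {:halt, x}, else: {:cont, acc}
    end)
    # {:done, nil} or {:halted, x}: keep only the accumulator.
    |> elem(1)
  end
end
```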
---
## 5. Private Helper Decomposition: Recursive Workers with Guards
**Source:** `lib/elixir/lib/enum.ex` lines 4975-4995, 5025-5039
**What it does:** Complex operations are split into a public entry point (with validation guards) and a private recursive worker function. The worker uses pattern matching on structure (empty list, head|tail) and guards on counters.
```elixir
# Public entry for drop/2: validates, then delegates to drop_list/2 (line 890-904)
def drop(enumerable, amount)
    when is_list(enumerable) and is_integer(amount) and amount >= 0 do
  drop_list(enumerable, amount)
end

# A separate private worker, used by split/2: pattern matches on
# structure and guards on the counter (line 4975-4983)
defp split_list([head | tail], counter, acc) when counter > 0 do
  split_list(tail, counter - 1, [head | acc])
end

defp split_list(list, 0, acc) do
  {:lists.reverse(acc), list}
end

defp split_list([], _, acc) do
  {:lists.reverse(acc), []}
end
```
**Why:** Separating validation from recursion keeps each clause focused. Patterns and guards in the function heads do the branching, so the body needs no runtime `if`/`cond` and each recursive call stays a simple tail call.
**Anti-pattern:** Mixing validation, edge cases, and recursion in a single function with internal conditionals:
```elixir
# BAD: one big function with nested ifs
defp split_list(list, counter, acc) do
  if counter > 0 and list != [] do
    [head | tail] = list
    split_list(tail, counter - 1, [head | acc])
  else
    {:lists.reverse(acc), list}
  end
end
```
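A minimal sketch of the same split in application code, assuming a hypothetical `Prefix.take/2` (`Enum.take/2` already provides this behaviour). The public clause validates once; the private clauses match structure and counter in their heads.

```elixir
defmodule Prefix do
  # Hypothetical module; Enum.take/2 already does this in the stdlib.

  # Public entry: guards validate the input once, then delegate.
  def take(list, amount) when is_list(list) and is_integer(amount) and amount >= 0 do
    take_list(list, amount, [])
  end

  # Worker clauses: structure and counter are matched in the heads,
  # so no body needs an if/cond.
  defp take_list(_list, 0, acc), do: :lists.reverse(acc)
  defp take_list([], _counter, acc), do: :lists.reverse(acc)

  defp take_list([head | tail], counter, acc) when counter > 0 do
    take_list(tail, counter - 1, [head | acc])
  end
end
```

Invalid input such as `Prefix.take([1], -1)` fails fast with a `FunctionClauseError` at the entry clause, so the worker never sees it.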
---
## 6. Enum vs Stream Decision Pattern
**Source:** `lib/elixir/lib/stream.ex` lines 1-80 (module docs), `lib/elixir/lib/enum.ex`
**What it does:** Enum functions are eager (materialize intermediate lists). Stream functions are lazy (build computation recipes). Core uses Stream for:
- Infinite sequences (`cycle`, `iterate`, `repeatedly`)
- Resource management (`resource/3`)
- Composing transformations to execute in a single pass
```elixir
# Stream: builds a recipe, zero computation until consumed (stream.ex ~line 490)
def map(enum, fun) when is_function(fun, 1) do
  lazy(enum, fn f1 -> R.map(fun, f1) end)
end

# Enum: immediately materializes the result (enum.ex line 1723)
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end
```
**Why:** From the Stream docs (lines 37-41): "When chaining many operations with `Enum`, intermediate lists are created, while `Stream` creates a recipe of computations that are executed at a later moment."

Use Enum when:
- You need the full result now
- The collection is small/bounded
- You only chain 1-2 operations

Use Stream when:
- The collection is large or infinite
- You chain many transformations
- You need resource cleanup (file handles, network)
- You want single-pass processing

**Anti-pattern:** Using Stream for small bounded collections (overhead of the lazy machinery exceeds any benefit):
```elixir
# BAD — Stream overhead for trivial transform
[1, 2, 3] |> Stream.map(&(&1 * 2)) |> Enum.to_list()
# GOOD — just use Enum
[1, 2, 3] |> Enum.map(&(&1 * 2))
```
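For contrast, a sketch of a case where Stream does pay off: an infinite source plus several chained steps, consumed in a single pass. `Squares.first_odd_squares/1` is hypothetical; `Stream.iterate/2`, `Stream.map/2`, `Stream.filter/2`, and `Enum.take/2` are standard library functions.

```elixir
defmodule Squares do
  # Hypothetical helper: the first `n` odd perfect squares.
  def first_odd_squares(n) when is_integer(n) and n >= 0 do
    # Infinite source: 1, 2, 3, ...
    Stream.iterate(1, &(&1 + 1))
    # Lazy steps: nothing runs yet; they only extend the recipe.
    |> Stream.map(&(&1 * &1))
    |> Stream.filter(&(rem(&1, 2) == 1))
    # Enum.take/2 drives the stream and stops after n matches.
    |> Enum.take(n)
  end
end
```

`Squares.first_odd_squares(3)` returns `[1, 9, 25]`. The same chain written with `Enum.map/2` would never return, because it would try to materialize the infinite sequence first.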
---
## 7. Map.update vs Map.put Decision Pattern
**Source:** `lib/elixir/lib/map.ex` lines 670-700
**What it does:** `Map.update/4` transforms an existing value based on its current state. `Map.put/3` unconditionally sets a value regardless of current state.
```elixir
# Map.update/4 (line 682-693): transform based on current value
def update(map, key, default, fun) when is_function(fun, 1) do
  case map do
    %{^key => value} ->
      %{map | key => fun.(value)}

    %{} ->
      put(map, key, default)

    other ->
      :erlang.error({:badmap, other}, [map, key, default, fun])
  end
end

# Map.put/3 (line 636): unconditional set
def put(map, key, value) do
  :maps.put(key, value, map)
end
```
**Why:** `update/4` is for when the new value depends on the old value (counters, appending to nested lists). `put/3` is for when you know the exact new value regardless of what was there. Note that when the key is absent, `update/4` stores `default` as-is; `fun` is not applied to it.
**Anti-pattern:** Using `get` + `put` when `update` expresses intent:
```elixir
# BAD: one read-modify-write split across two calls; intent is implicit
count = Map.get(map, :count, 0)
Map.put(map, :count, count + 1)

# GOOD: one call that states the intent
Map.update(map, :count, 1, &(&1 + 1))
```
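A minimal sketch of the counter case, assuming a hypothetical `WordCount.count/1`: `Map.update/4` stores the default `1` on a word's first occurrence and increments it afterwards. The standard library's `Enum.frequencies/1` covers this exact job; the sketch only shows the `update/4` idiom.

```elixir
defmodule WordCount do
  # Hypothetical helper: word frequencies for a list of words.
  def count(words) when is_list(words) do
    Enum.reduce(words, %{}, fn word, counts ->
      # New key: 1 is stored as-is (fun is not called). Existing key: add 1.
      Map.update(counts, word, 1, &(&1 + 1))
    end)
  end
end
```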
---
## 8. Pattern Matching on Map Structure for Dispatch
**Source:** `lib/elixir/lib/map.ex` lines 398, 509, 586
**What it does:** Map functions use `case map do %{^key => value} -> ...` to dispatch on whether a key exists, rather than calling `has_key?` + conditional.
```elixir
# Map.get/3 (line 586-594)
def get(map, key, default \\ nil) do
  case map do
    %{^key => value} ->
      value

    %{} ->
      default

    other ->
      :erlang.error({:badmap, other}, [map, key, default])
  end
end

# Map.put_new/3 (line 398-407)
def put_new(map, key, value) do
  case map do
    %{^key => _value} ->
      map

    %{} ->
      put(map, key, value)

    other ->
      :erlang.error({:badmap, other})
  end
end
```
**Why:** Pattern matching with `%{^key => value}` does the lookup AND extraction in one step. As a pattern, `%{}` matches any map, not only the empty one; because it comes after the `^key` clause, it is reached only when the key is absent. The `other` clause raises a clear `{:badmap, other}` error for non-maps. This is both more efficient and more readable than `if Map.has_key?(map, key)` followed by a second lookup.
**Anti-pattern:**
```elixir
# BAD: double lookup, less clear
def get(map, key, default) do
  if Map.has_key?(map, key) do
    :maps.get(key, map)
  else
    default
  end
end
```
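The same `case`-on-a-map shape works in application code. A minimal sketch, assuming a hypothetical `Cache.get_or_compute/3` that returns a cached value or computes and stores it; one match both decides presence and extracts the value. The `is_map/1` guard replaces the `other` clause here.

```elixir
defmodule Cache do
  # Hypothetical helper: returns {value, cache}.
  def get_or_compute(cache, key, compute) when is_map(cache) and is_function(compute, 0) do
    case cache do
      # Key present: lookup and extraction happen in one match.
      %{^key => value} ->
        {value, cache}

      # Any other map (key absent): compute, store, and return.
      %{} ->
        value = compute.()
        {value, Map.put(cache, key, value)}
    end
  end
end
```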
---
## 9. Delegating to Erlang BIFs with `defdelegate`
**Source:** `lib/elixir/lib/map.ex` lines 127, 143, 159, 173
**What it does:** When an Erlang function already does exactly what's needed, Elixir delegates directly rather than wrapping.
```elixir
@spec keys(map) :: [key]
defdelegate keys(map), to: :maps

@spec values(map) :: [value]
defdelegate values(map), to: :maps

@spec merge(map, map) :: map
defdelegate merge(map1, map2), to: :maps
```
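**Why:** `defdelegate` declares that the Elixir function is exactly the Erlang one, so there is no body to read, review, or let drift out of sync. At runtime it compiles to a function whose body is just the remote call, the same cost as a hand-written wrapper. Inlining is a separate mechanism: the `@compile {:inline, ...}` annotation on line 115 is how `map.ex` asks the compiler to inline selected functions.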
**Anti-pattern:** Wrapping without adding value:
```elixir
# BAD: pointless wrapper
def keys(map) do
  :maps.keys(map)
end
```
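A minimal sketch of `defdelegate` in application code, assuming a hypothetical `MyLists` module. `to:` names the target module and `as:` renames the target function; both are standard `defdelegate` options.

```elixir
defmodule MyLists do
  # Hypothetical module re-exporting Erlang functions under Elixir names.

  @doc "Reverses a list; same semantics as :lists.reverse/1."
  defdelegate reverse(list), to: :lists

  @doc "Flattens a deep list; delegates to :lists.flatten/1."
  defdelegate flatten_deep(list), to: :lists, as: :flatten
end
```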
---
## 10. Reduce as the Universal Primitive
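**Source:** `lib/elixir/lib/enum.ex` lines 19-21, 2660-2676
**What it does:** Nearly every `Enum` operation on a non-list enumerable is built on top of `reduce`. The `Enumerable` protocol's core callback is `reduce/3`; the other callbacks (`count/1`, `member?/2`, `slice/1`) are optimization hooks, and an implementation may return `{:error, __MODULE__}` from them to fall back to a reduce-based default.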
```elixir
# From the protocol docs (line 19-21):
def map(enumerable, fun) do
  reducer = fn x, acc -> {:cont, [fun.(x) | acc]} end
  Enumerable.reduce(enumerable, {:cont, []}, reducer) |> elem(1) |> :lists.reverse()
end

# The actual reduce/3 (line 2676):
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end
```
**Why:** Reduce is the most general iteration primitive. Because the generic clauses are built on it, any data structure that implements the `Enumerable` protocol, with `reduce/3` doing the real work, automatically gets the full `Enum` API. One protocol callback buys composability with the whole library.
**Anti-pattern:** Implementing each Enum function independently for each data structure:
```elixir
# BAD — reimplementing map for each type
def map(%MyStruct{items: items}, fun), do: ...
def filter(%MyStruct{items: items}, fun), do: ...
# Instead: implement Enumerable.reduce/3 once and get everything
```
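A minimal sketch of the "implement `reduce/3` once" approach, assuming a hypothetical `Countdown` struct that yields `from, from - 1, ..., 1`. `reduce/3` handles all three instructions the protocol can send (`:halt`, `:suspend`, `:cont`), while `count/1`, `member?/2`, and `slice/1` return `{:error, __MODULE__}` so `Enum` falls back to reduce-based defaults.

```elixir
defmodule Countdown do
  # Hypothetical struct: enumerates from, from - 1, ..., 1.
  defstruct from: 0
end

defimpl Enumerable, for: Countdown do
  # The one callback that does real work.
  def reduce(_countdown, {:halt, acc}, _fun), do: {:halted, acc}

  def reduce(countdown, {:suspend, acc}, fun),
    do: {:suspended, acc, &reduce(countdown, &1, fun)}

  def reduce(%Countdown{from: n}, {:cont, acc}, _fun) when n <= 0, do: {:done, acc}

  def reduce(%Countdown{from: n}, {:cont, acc}, fun),
    do: reduce(%Countdown{from: n - 1}, fun.(n, acc), fun)

  # Fast-path callbacks: {:error, __MODULE__} tells Enum to fall back
  # to a default built on reduce/3.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}
end
```

With only this, `Enum.map(%Countdown{from: 3}, &(&1 * 10))` returns `[30, 20, 10]`, and `Enum.count/1`, `Enum.member?/2`, and `Enum.take/2` work without further code.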
---
## 11. Multi-Clause Guard Dispatch (String.split pattern)
**Source:** `lib/elixir/lib/string.ex` lines 516-563
**What it does:** Functions with many input shapes use multiple `def` clauses with guards to dispatch, handling each case distinctly rather than using internal `cond`/`case`.
```elixir
def split(string, %Regex{} = pattern, options) when is_binary(string) and is_list(options) do
  Regex.split(pattern, string, options)
end

def split(string, "", options) when is_binary(string) and is_list(options) do
  # special case: split by empty string (grapheme-by-grapheme)
  ...
end

def split(string, [], options) when is_binary(string) and is_list(options) do
  # empty pattern list: no splitting
  ...
end

def split(string, pattern, options) when is_binary(string) and is_list(options) do
  # general binary pattern case
  ...
end
```
**Why:** Each clause has a single responsibility. The BEAM compiler generates efficient dispatch for these patterns. Adding a new case is additive (new clause) rather than modifying existing logic.
**Anti-pattern:** One function with nested conditionals:
```elixir
# BAD: all cases mashed into one body
def split(string, pattern, options) do
  cond do
    is_struct(pattern, Regex) -> ...
    pattern == "" -> ...
    pattern == [] -> ...
    true -> ...
  end
end
```
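A minimal sketch of the same dispatch style, assuming a hypothetical `Tags.normalize/1` that accepts `nil`, a comma-separated string, or a list of strings. Each input shape gets its own clause; the string clause reuses the list clause instead of duplicating it.

```elixir
defmodule Tags do
  # Hypothetical helper: normalize several input shapes into a list of tags.

  # nil: no tags.
  def normalize(nil), do: []

  # Comma-separated string: split it, then reuse the list clause.
  def normalize(string) when is_binary(string) do
    string
    |> String.split(",")
    |> normalize()
  end

  # List of strings: trim each entry and drop blanks.
  def normalize(list) when is_list(list) do
    list
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end
end
```

Supporting a new input shape means adding one clause; unsupported inputs, such as an integer, raise `FunctionClauseError`.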
---
## 12. Tiny Private Helpers with `defp parts_to_index`
**Source:** `lib/elixir/lib/string.ex` lines 562-563
**What it does:** Tiny private helpers that convert between API-level concepts and implementation-level values use single-line `defp` with guards.
```elixir
defp parts_to_index(:infinity), do: 0
defp parts_to_index(n) when is_integer(n) and n > 0, do: n
```
**Why:** Clear, self-documenting dispatch. Each case is one line. No branching logic in the caller. The function name explains the conversion.
**Anti-pattern:** Inline conditional in the caller:
```elixir
# BAD: logic lives in the caller, and the positive-integer check is lost
index = if parts == :infinity, do: 0, else: parts
```
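A minimal sketch of the same helper shape, assuming a hypothetical `Batch.take_batch/2` whose public API accepts `:all` or a positive integer. The one-line `defp` clauses translate the API value into the count `Enum.take/2` expects and reject anything else.

```elixir
defmodule Batch do
  # Hypothetical API: `limit` is :all or a positive integer.
  def take_batch(items, limit) when is_list(items) do
    Enum.take(items, limit_to_count(limit, items))
  end

  # One clause per API value; invalid limits raise FunctionClauseError.
  defp limit_to_count(:all, items), do: length(items)
  defp limit_to_count(n, _items) when is_integer(n) and n > 0, do: n
end
```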