Extracted patterns, conventions, and code smells directly from the Elixir and Phoenix source code with file path and line number citations. Covers: GenServer, error handling, data transforms, process design, testing, documentation, typespecs, macros, behaviours, module organization, Phoenix-specific patterns, framework deviations, and anti-patterns.
# Data Transform & Pipeline Patterns in Elixir Core

Patterns extracted from Elixir's standard library source code.

---

## 1. List-Specialized Clause Before Protocol Dispatch

**Source:** `lib/elixir/lib/enum.ex` lines 1723–1733

**What it does:** Many public `Enum` functions define a `when is_list(enumerable)` clause first, followed by a generic fallback that goes through the `Enumerable` protocol.

```elixir
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(first..last//step, fun) do
  map_range(first, last, step, fun)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end
```

**Why:** Lists are by far the most common enumerable. Matching them first skips protocol dispatch entirely and calls straight into Erlang's `:lists` module. The range clause is a further optimization for another common case. The generic clause handles every other enumerable through the protocol.

**Anti-pattern:** A single clause that always goes through protocol dispatch:

```elixir
# BAD: forces protocol overhead even for lists
def map(enumerable, fun) do
  Enumerable.reduce(enumerable, {:cont, []}, fn x, acc ->
    {:cont, [fun.(x) | acc]}
  end) |> elem(1) |> :lists.reverse()
end
```

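The same clause ordering can be applied in application code. The sketch below is an illustration only: `FastSum` and `sum_squares/1` are hypothetical names, and because `Enum.reduce/3` already has its own list fast path, the gain here is likely small. It shows the shape of the pattern rather than a measured optimization.

```elixir
defmodule FastSum do
  # Hypothetical module; illustrates the clause ordering only.

  # Lists: handled directly by :lists.foldl/3, no protocol dispatch.
  def sum_squares(list) when is_list(list) do
    :lists.foldl(fn x, acc -> acc + x * x end, 0, list)
  end

  # Any other enumerable goes through the Enumerable protocol.
  def sum_squares(enumerable) do
    Enum.reduce(enumerable, 0, fn x, acc -> acc + x * x end)
  end
end
```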
---

## 2. Build-Then-Reverse (Cons-Cell Accumulation)

**Source:** `lib/elixir/lib/enum.ex` lines 1124, 1733, 2697

**What it does:** Accumulates results by prepending to a list (`[x | acc]`), then reverses at the end.

```elixir
# filter (line 1124)
def filter(enumerable, fun) do
  reduce(enumerable, [], R.filter(fun)) |> :lists.reverse()
end

# map (line 1733)
def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

# reject (line 2697)
def reject(enumerable, fun) do
  reduce(enumerable, [], R.reject(fun)) |> :lists.reverse()
end
```

**Why:** Prepending to a linked list is O(1); appending is O(n). Building the list reversed and then flipping it once is O(n) total. Appending each element would be O(n²).

**Anti-pattern:** Appending to the end of a list in a loop:

```elixir
# BAD: O(n²)
def map(enumerable, fun) do
  reduce(enumerable, [], fn x, acc -> acc ++ [fun.(x)] end)
end
```

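The same accumulate-then-reverse shape works in hand-written recursion. This is a minimal sketch with a hypothetical module and function name (`Acc.double_evens/1`), not code from the standard library:

```elixir
defmodule Acc do
  # Hypothetical helper: keeps the even integers and doubles them.
  def double_evens(list) when is_list(list), do: double_evens(list, [])

  # End of input: the accumulator is in reverse order, so flip it once.
  defp double_evens([], acc), do: :lists.reverse(acc)

  # Even element: prepend the doubled value (O(1)).
  defp double_evens([head | tail], acc) when rem(head, 2) == 0 do
    double_evens(tail, [head * 2 | acc])
  end

  # Anything else is skipped.
  defp double_evens([_head | tail], acc), do: double_evens(tail, acc)
end
```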
---

## 3. Pipeline for Linear Transformations, Bare Calls for Control Flow

**Source:** `lib/elixir/lib/enum.ex` lines 1684–1685, 1551, vs 496–502

**What it does:** Elixir core uses `|>` when data flows linearly through 2+ transformations. It does NOT use `|>` for single-step operations or when the first argument is computed by a `case`/`if`/`with`.

```elixir
# Pipeline: data flows through multiple transforms (line 1684-1685)
def map_join(enumerable, joiner \\ "", mapper) do
  enumerable
  |> map(&entry_to_string(mapper.(&1)))
  |> intersperse(joiner)
  |> IO.iodata_to_binary()
end

# NO pipeline: single step or control flow (line 496-502)
def at(enumerable, index, default \\ nil) when is_integer(index) do
  case slice_forward(enumerable, index, 1, 1) do
    [value] -> value
    [] -> default
  end
end
```

**Why:** Pipelines communicate "data flows through transformations." Using them for a single function call, or wrapping them around control flow, obscures intent rather than clarifying it.

**Anti-pattern:** Pipelines for single operations or wrapping control flow:

```elixir
# BAD: single step, no pipeline needed
list |> Enum.reverse()

# BAD: control flow awkwardly forced into a pipe
result
|> case do
  {:ok, x} -> x
  :error -> nil
end
```

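As an illustration in application code, the hypothetical `Slug` module below uses a pipeline for the multi-step transform and a bare `case` where the result of one call drives a branch. The module and function names are assumptions made for this sketch.

```elixir
defmodule Slug do
  # Linear transformation: each step feeds the next, so a pipeline fits.
  def slugify(title) when is_binary(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
  end

  # Control flow: a bare case on the computed value, no pipe.
  def first_word(title) when is_binary(title) do
    case String.split(title) do
      [word | _rest] -> word
      [] -> nil
    end
  end
end
```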
---

## 4. Pipeline Ending with `|> elem(1)` (Protocol Reduce Unwrap)

**Source:** `lib/elixir/lib/enum.ex` lines 363, 403, 433, 468, 725, 1022, 2676

**What it does:** When calling `Enumerable.reduce/3` directly, the result is a tagged tuple: `{:done, acc}`, `{:halted, acc}`, or `{:suspended, acc, continuation}`. In every case the accumulator is the second element, so core extracts it with `|> elem(1)`.

```elixir
# all?/1 (line 363)
def all?(enumerable) do
  Enumerable.reduce(enumerable, {:cont, true}, fn entry, _ ->
    if entry, do: {:cont, true}, else: {:halt, false}
  end)
  |> elem(1)
end

# reduce/3 (line 2676)
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end
```

**Why:** The protocol returns tagged tuples to drive its state machine (cont/halt/suspend). End users don't need the tag, only the accumulated value. `|> elem(1)` is the idiomatic unwrap.

**Anti-pattern:** Using `case` when you don't care about the tag:

```elixir
# BAD: unnecessary pattern match when you always want the value
case Enumerable.reduce(enumerable, {:cont, acc}, fun) do
  {:done, result} -> result
  {:halted, result} -> result
end
```

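A short sketch of the same unwrap with early termination. `Find.first_above/2` is a hypothetical helper, not a standard library function; it halts on the first match and lets `elem(1)` discard the `:done`/`:halted` tag.

```elixir
defmodule Find do
  # Hypothetical helper: first element greater than `limit`, or nil.
  def first_above(enumerable, limit) do
    enumerable
    |> Enumerable.reduce({:cont, nil}, fn
      # Match found: stop iterating and carry the element out.
      x, _acc when x > limit -> {:halt, x}
      # No match yet: keep going with the accumulator unchanged.
      _x, acc -> {:cont, acc}
    end)
    |> elem(1)
  end
end
```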
---

## 5. Private Helper Decomposition: Recursive Workers with Guards

**Source:** `lib/elixir/lib/enum.ex` lines 4975–4995, 5025–5039

**What it does:** Complex operations are split into a public entry point (with validation guards) and a private recursive worker function. The worker pattern matches on structure (empty list, `[head | tail]`) and uses guards on counters.

```elixir
# Public entry: validates, delegates (line 890-904)
def drop(enumerable, amount)
    when is_list(enumerable) and is_integer(amount) and amount >= 0 do
  drop_list(enumerable, amount)
end

# Private worker: pattern matches on structure (line 4975-4983)
# Note: this is the worker behind split/2; drop_list/2 follows the same shape.
defp split_list([head | tail], counter, acc) when counter > 0 do
  split_list(tail, counter - 1, [head | acc])
end

defp split_list(list, 0, acc) do
  {:lists.reverse(acc), list}
end

defp split_list([], _, acc) do
  {:lists.reverse(acc), []}
end
```

**Why:** Separating validation from recursion keeps each clause focused. Patterns and guards in function heads let the compiler generate efficient clause selection, and no runtime `if`/`cond` is needed in the body.

**Anti-pattern:** Mixing validation, edge cases, and recursion in a single function with internal conditionals:

```elixir
# BAD: one big function with nested ifs
defp split_list(list, counter, acc) do
  if counter > 0 and list != [] do
    [head | tail] = list
    split_list(tail, counter - 1, [head | acc])
  else
    {:lists.reverse(acc), list}
  end
end
```

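The split between a guarded entry point and a recursive worker can be sketched in a few lines. `Take.first/2` and `take_list/3` are hypothetical names used for illustration; they are not the standard library's implementation of `Enum.take/2`.

```elixir
defmodule Take do
  # Public entry: validates arguments once, then delegates.
  def first(list, count) when is_list(list) and is_integer(count) and count >= 0 do
    take_list(list, count, [])
  end

  # Worker: consume one element while the counter is positive.
  defp take_list([head | tail], counter, acc) when counter > 0 do
    take_list(tail, counter - 1, [head | acc])
  end

  # Counter exhausted or list empty: reverse the accumulator once.
  defp take_list(_list, _counter, acc), do: :lists.reverse(acc)
end
```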
---

## 6. Enum vs Stream Decision Pattern

**Source:** `lib/elixir/lib/stream.ex` lines 1–80 (module docs), `lib/elixir/lib/enum.ex`

**What it does:** Enum functions are eager (materialize intermediate lists). Stream functions are lazy (build computation recipes). Core uses Stream for:

- Infinite sequences (`cycle`, `iterate`, `repeatedly`)
- Resource management (`resource/3`)
- Composing transformations to execute in a single pass

```elixir
# Stream: builds a recipe, zero computation until consumed (stream.ex ~line 490)
def map(enum, fun) when is_function(fun, 1) do
  lazy(enum, fn f1 -> R.map(fun, f1) end)
end

# Enum: immediately materializes the result (enum.ex line 1723)
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end
```

**Why:** From Stream docs (lines 37–41): "When chaining many operations with `Enum`, intermediate lists are created, while `Stream` creates a recipe of computations that are executed at a later moment."

Use Enum when:

- You need the full result now
- The collection is small/bounded
- You only chain 1–2 operations

Use Stream when:

- The collection is large or infinite
- You chain many transformations
- You need resource cleanup (file handles, network)
- You want single-pass processing

**Anti-pattern:** Using Stream for small bounded collections (overhead of the lazy machinery exceeds any benefit):

```elixir
# BAD: Stream overhead for trivial transform
[1, 2, 3] |> Stream.map(&(&1 * 2)) |> Enum.to_list()

# GOOD: just use Enum
[1, 2, 3] |> Enum.map(&(&1 * 2))
```

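For the infinite-sequence case, a minimal sketch: the variable name `squares_of_odds` is an assumption, and the pipeline only does as much work as the final `Enum.take/2` demands.

```elixir
# Lazy: Stream.iterate/2 describes an infinite sequence 1, 2, 3, ...
# Nothing is computed until Enum.take/2 pulls elements through.
squares_of_odds =
  1
  |> Stream.iterate(&(&1 + 1))
  |> Stream.filter(&(rem(&1, 2) == 1))
  |> Stream.map(&(&1 * &1))
  |> Enum.take(3)
```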
---

## 7. Map.update vs Map.put Decision Pattern

**Source:** `lib/elixir/lib/map.ex` lines 670–700

**What it does:** `Map.update/4` transforms an existing value based on its current state. `Map.put/3` unconditionally sets a value regardless of current state.

```elixir
# Map.update/4 (line 682-693): transform based on current value
def update(map, key, default, fun) when is_function(fun, 1) do
  case map do
    %{^key => value} ->
      %{map | key => fun.(value)}

    %{} ->
      put(map, key, default)

    other ->
      :erlang.error({:badmap, other}, [map, key, default, fun])
  end
end

# Map.put/3 (line 636): unconditional set
def put(map, key, value) do
  :maps.put(key, value, map)
end
```

**Why:** `update/4` is for when the new value depends on the old value (counters, appending to nested lists). `put/3` is for when you know the exact new value regardless of what was there.

**Anti-pattern:** Using `get` + `put` when `update` expresses intent:

```elixir
# BAD: two lookups, unclear intent
count = Map.get(map, :count, 0)
Map.put(map, :count, count + 1)

# GOOD: single lookup, clear intent
Map.update(map, :count, 1, &(&1 + 1))
```

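A typical use of `Map.update/4` is a frequency count. The module below is a hypothetical example (`WordCount.count/1` is not part of any library); it inserts `1` for a word seen for the first time and increments otherwise.

```elixir
defmodule WordCount do
  # Hypothetical helper: counts occurrences of each word in a list.
  def count(words) when is_list(words) do
    Enum.reduce(words, %{}, fn word, counts ->
      # Missing key: store the default 1. Present key: apply the function.
      Map.update(counts, word, 1, &(&1 + 1))
    end)
  end
end
```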
---

## 8. Pattern Matching on Map Structure for Dispatch

**Source:** `lib/elixir/lib/map.ex` lines 398, 509, 586

**What it does:** Map functions use `case map do %{^key => value} -> ...` to dispatch on whether a key exists, rather than calling `has_key?` + conditional.

```elixir
# Map.get/3 (line 586-594)
def get(map, key, default \\ nil) do
  case map do
    %{^key => value} ->
      value

    %{} ->
      default

    other ->
      :erlang.error({:badmap, other}, [map, key, default])
  end
end

# Map.put_new/3 (line 398-407)
def put_new(map, key, value) do
  case map do
    %{^key => _value} ->
      map

    %{} ->
      put(map, key, value)

    other ->
      :erlang.error({:badmap, other})
  end
end
```

**Why:** Pattern matching with `%{^key => value}` does the lookup AND extraction in one step. The `%{}` pattern matches any map, not just an empty one; because it comes second, it only runs when the key is NOT present. The `other` clause provides a clear error for non-maps. This avoids the second lookup and reads more directly than `if Map.has_key?(map, key)`.

**Anti-pattern:**

```elixir
# BAD: double lookup, less clear
def get(map, key, default) do
  if Map.has_key?(map, key) do
    :maps.get(key, map)
  else
    default
  end
end
```

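The same dispatch works in application code. In this sketch, `Settings.fetch_or/3` is a hypothetical helper that computes its default only when the key is absent. Note that a key stored with a `nil` value still counts as present, which an `||`-based fallback would get wrong.

```elixir
defmodule Settings do
  # Hypothetical helper: returns the stored value, or calls `default_fun`
  # only when the key is missing.
  def fetch_or(map, key, default_fun) when is_function(default_fun, 0) do
    case map do
      # Key present (even if the value is nil): lookup and extraction in one step.
      %{^key => value} -> value
      # Any other map: the key is missing.
      %{} -> default_fun.()
    end
  end
end
```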
---

## 9. Delegating to Erlang BIFs with `defdelegate`

**Source:** `lib/elixir/lib/map.ex` lines 127, 143, 159, 173

**What it does:** When an Erlang function already does exactly what's needed, Elixir delegates directly rather than wrapping.

```elixir
@spec keys(map) :: [key]
defdelegate keys(map), to: :maps

@spec values(map) :: [value]
defdelegate values(map), to: :maps

@spec merge(map, map) :: map
defdelegate merge(map1, map2), to: :maps
```

**Why:** Effectively zero overhead, since the compiler inlines these calls. There is no point hand-writing an Elixir wrapper around an Erlang BIF when the semantics are identical. The `@compile {:inline, ...}` annotation on line 115 makes the inlining intent explicit.

**Anti-pattern:** Wrapping without adding value:

```elixir
# BAD: pointless wrapper
def keys(map) do
  :maps.keys(map)
end
```

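`defdelegate` is equally usable outside core. The facade below is a hypothetical example (`Bag`, `size/1`, and `union/2` are illustrative names); it uses the `:as` option to expose `:maps.merge/2` under a different name.

```elixir
defmodule Bag do
  # Hypothetical facade module; the names are for illustration only.

  # Same name and argument order as the target: plain delegation.
  defdelegate size(map), to: :maps

  # Different public name: `as:` selects the target function.
  defdelegate union(left, right), to: :maps, as: :merge
end
```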
---

## 10. Reduce as the Universal Primitive

**Source:** `lib/elixir/lib/enum.ex` lines 19–21, 2660–2676

**What it does:** Nearly every Enum operation is built on top of `reduce`. The Enumerable protocol's core function is `reduce/3`. Everything else (`count`, `member?`, `slice`) is an optimization hint.

```elixir
# From the protocol docs (line 19-21):
def map(enumerable, fun) do
  reducer = fn x, acc -> {:cont, [fun.(x) | acc]} end
  Enumerable.reduce(enumerable, {:cont, []}, reducer) |> elem(1) |> :lists.reverse()
end

# The actual reduce/3 (line 2676):
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end
```

**Why:** Reduce is the most general iteration primitive. By building all operations on reduce, any data structure that implements `Enumerable.reduce/3` automatically gets the full `Enum` API. This is the protocol + reduce = universal composability pattern.

**Anti-pattern:** Implementing each Enum function independently for each data structure:

```elixir
# BAD: reimplementing map for each type
def map(%MyStruct{items: items}, fun), do: ...
def filter(%MyStruct{items: items}, fun), do: ...
# Instead: implement Enumerable.reduce/3 once and get everything
```

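To make "implement `reduce/3` once" concrete, here is a minimal sketch of an `Enumerable` implementation for a hypothetical `Countdown` struct. It assumes `from` is a non-negative integer, and it opts out of the optimization callbacks by returning `{:error, __MODULE__}`, which tells `Enum` to fall back to `reduce/3`.

```elixir
defmodule Countdown do
  # Hypothetical struct: enumerates from `from` down to 1.
  # Assumes `from` is a non-negative integer.
  defstruct [:from]
end

defimpl Enumerable, for: Countdown do
  # No fast paths: Enum falls back to reduce/3 for these.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}

  def reduce(%Countdown{from: n}, acc, fun), do: do_reduce(n, acc, fun)

  # The three accumulator tags drive the protocol's state machine.
  defp do_reduce(_n, {:halt, acc}, _fun), do: {:halted, acc}
  defp do_reduce(n, {:suspend, acc}, fun), do: {:suspended, acc, &do_reduce(n, &1, fun)}
  defp do_reduce(0, {:cont, acc}, _fun), do: {:done, acc}
  defp do_reduce(n, {:cont, acc}, fun), do: do_reduce(n - 1, fun.(n, acc), fun)
end
```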
---

## 11. Keyword Multi-Clause Guard Dispatch (String.split pattern)

**Source:** `lib/elixir/lib/string.ex` lines 516–563

**What it does:** Functions with many input shapes use multiple `def` clauses with patterns and guards to dispatch, handling each case distinctly rather than using internal `cond`/`case`.

```elixir
def split(string, %Regex{} = pattern, options) when is_binary(string) and is_list(options) do
  Regex.split(pattern, string, options)
end

def split(string, "", options) when is_binary(string) and is_list(options) do
  # special case: split by empty string (grapheme-by-grapheme)
  ...
end

def split(string, [], options) when is_binary(string) and is_list(options) do
  # empty pattern list: no splitting
  ...
end

def split(string, pattern, options) when is_binary(string) and is_list(options) do
  # general binary pattern case
  ...
end
```

**Why:** Each clause has a single responsibility. The compiler generates efficient clause selection for these patterns. Supporting a new case is additive (a new clause) rather than a modification of existing logic.

**Anti-pattern:** One function with nested conditionals:

```elixir
# BAD: all cases mashed into one body
def split(string, pattern, options) do
  cond do
    is_struct(pattern, Regex) -> ...
    pattern == "" -> ...
    pattern == [] -> ...
    true -> ...
  end
end
```

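A small runnable sketch of the same idea, with one clause per input shape. `Tags.normalize/1` is a hypothetical function invented for this illustration.

```elixir
defmodule Tags do
  # Hypothetical helper: accepts nil, a single string, or a list of those.

  # Nothing given: no tags.
  def normalize(nil), do: []

  # Empty string: treated as no tag.
  def normalize(""), do: []

  # Single tag: wrap it in a list, lowercased.
  def normalize(tag) when is_binary(tag), do: [String.downcase(tag)]

  # List of tags: normalize each element and flatten the results.
  def normalize(tags) when is_list(tags), do: Enum.flat_map(tags, &normalize/1)
end
```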
---

## 12. Tiny Private Helpers with `defp parts_to_index`

**Source:** `lib/elixir/lib/string.ex` lines 562–563

**What it does:** Tiny private helpers that convert between API-level concepts and implementation-level values use single-line `defp` clauses with patterns and guards.

```elixir
defp parts_to_index(:infinity), do: 0
defp parts_to_index(n) when is_integer(n) and n > 0, do: n
```

**Why:** Clear, self-documenting dispatch. Each case is one line. No branching logic in the caller. The function name explains the conversion, and an invalid value fails loudly with a `FunctionClauseError` instead of slipping through.

**Anti-pattern:** Inline conditional in the caller:

```elixir
# BAD: logic scattered in caller, and invalid values pass through unchecked
index = if parts == :infinity, do: 0, else: parts
```

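The same kind of helper can translate user-facing options in application code. Everything in this sketch is hypothetical: the `Retry` module, the `:backoff` option, and the accepted shapes are assumptions chosen for illustration.

```elixir
defmodule Retry do
  # Hypothetical option handling; names are illustrative only.
  def delay(opts) when is_list(opts) do
    opts |> Keyword.get(:backoff, :none) |> backoff_to_ms()
  end

  # One line per accepted shape; anything else raises FunctionClauseError.
  defp backoff_to_ms(:none), do: 0
  defp backoff_to_ms({:seconds, n}) when is_integer(n) and n > 0, do: n * 1000
  defp backoff_to_ms(ms) when is_integer(ms) and ms > 0, do: ms
end
```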