docs: add when/when-not to data-transforms

Aaron Weiker
2026-04-30 05:38:33 -07:00
parent f23623250e
commit cb94a157a1
7 changed files with 3278 additions and 414 deletions
@@ -36,6 +36,51 @@ def map(enumerable, fun) do
end
```
### When to Use
**Triggers:** You have a public function that accepts "any enumerable" but lists account for the majority of callers. Profiling shows protocol dispatch is a measurable cost. You can call an Erlang BIF or a direct recursive implementation for the list case.
**Example — before:**
```elixir
def sum(enumerable) do
  Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc -> {:cont, acc + x} end)
  |> elem(1)
end
```
**Example — after:**
```elixir
def sum(enumerable) when is_list(enumerable) do
  :lists.foldl(fn x, acc -> acc + x end, 0, enumerable)
end

def sum(enumerable) do
  Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc -> {:cont, acc + x} end)
  |> elem(1)
end
```
### When NOT to Use
**Don't use this when:** The function is rarely called with lists, or the function body is complex enough that maintaining two implementations creates a bug risk. Also avoid it on paths that are not hot: there the protocol path is already fast enough, and the specialization is a micro-optimization.
**Over-application example:**
```elixir
# Pointless: this function is only ever called with streams
def expensive_transform(enumerable) when is_list(enumerable) do
  # duplicate complex logic just in case a list shows up
  enumerable |> do_phase_1() |> do_phase_2() |> do_phase_3()
end

def expensive_transform(enumerable) do
  enumerable |> do_phase_1() |> do_phase_2() |> do_phase_3()
end
```
**Better alternative:** Keep one clause. Add the specialization only when profiling proves the protocol dispatch is a bottleneck for real workloads.
**Why:** This is premature optimization. Two copies of the same logic mean two places to fix bugs. Protocol dispatch on the BEAM is already fast, especially once protocols are consolidated, so you need evidence before duplicating.
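One way to gather that evidence is to time both paths on a representative input before committing to a second clause. The sketch below is a rough illustration only: the `DispatchTiming` module and its function names are hypothetical, and a single `:timer.tc/1` run is noisy, so treat the numbers as a hint rather than a benchmark.

```elixir
defmodule DispatchTiming do
  # Hypothetical helper: times the protocol path and the list path on the same input.
  def compare(list) when is_list(list) do
    {protocol_us, protocol_sum} = :timer.tc(fn -> sum_via_protocol(list) end)
    {list_us, list_sum} = :timer.tc(fn -> sum_via_lists(list) end)

    %{protocol_us: protocol_us, list_us: list_us, same_result?: protocol_sum == list_sum}
  end

  # Generic path: goes through Enumerable protocol dispatch.
  def sum_via_protocol(enumerable) do
    enumerable
    |> Enumerable.reduce({:cont, 0}, fn x, acc -> {:cont, acc + x} end)
    |> elem(1)
  end

  # Specialized path: calls Erlang's :lists.foldl/3 directly.
  def sum_via_lists(list), do: :lists.foldl(fn x, acc -> acc + x end, 0, list)
end
```

If the two timings are within noise of each other for your real inputs, the extra clause is probably not worth its maintenance cost.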
---
## 2. Build-Then-Reverse (Cons-Cell Accumulation)
@@ -71,6 +116,55 @@ def map(enumerable, fun) do
end
```
### When to Use
**Triggers:** You're building a result list element-by-element through recursion or reduce, and the output order must match the input order. The collection can be any size.
**Example — before:**
```elixir
def keep_positives(list) do
  Enum.reduce(list, [], fn x, acc ->
    if x > 0, do: acc ++ [x], else: acc
  end)
end
```
**Example — after:**
```elixir
def keep_positives(list) do
  Enum.reduce(list, [], fn x, acc ->
    if x > 0, do: [x | acc], else: acc
  end)
  |> :lists.reverse()
end
```
### When NOT to Use
**Don't use this when:** Order doesn't matter (e.g., building a set of unique items, collecting into a MapSet), or when you're only extracting a single value (sum, count, max). It is also unnecessary if you're using `Enum.map/2` or `Enum.filter/2` directly, because they already return results in input order.
**Over-application example:**
```elixir
# Unnecessary: order doesn't matter for uniqueness
def unique_tags(items) do
  Enum.reduce(items, [], fn item, acc ->
    if item.tag in acc, do: acc, else: [item.tag | acc]
  end)
  |> :lists.reverse() # why reverse if you're just checking membership?
end
```
**Better alternative:** Use a MapSet or just don't reverse:
```elixir
def unique_tags(items) do
  Enum.reduce(items, MapSet.new(), fn item, acc ->
    MapSet.put(acc, item.tag)
  end)
end
```
**Why:** The reverse is an extra O(n) pass over the list. If you don't care about order, skip it. If you're collecting into a non-list structure, this pattern doesn't apply.
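When first-seen order does matter, the two ideas can be combined: a MapSet answers the membership question while a cons-built list preserves order. The following is a minimal sketch under that assumption; the `TagSketch` module and `ordered_unique_tags/1` are hypothetical names, and items are assumed to be maps with a `:tag` key.

```elixir
defmodule TagSketch do
  # Returns each tag once, in the order it first appears.
  def ordered_unique_tags(items) do
    {tags, _seen} =
      Enum.reduce(items, {[], MapSet.new()}, fn %{tag: tag}, {acc, seen} ->
        if MapSet.member?(seen, tag) do
          {acc, seen}
        else
          # Prepend to the list; record the tag in the set for fast membership checks.
          {[tag | acc], MapSet.put(seen, tag)}
        end
      end)

    # The accumulator was built back to front, so reverse once at the end.
    :lists.reverse(tags)
  end
end
```

In everyday code, `items |> Enum.map(& &1.tag) |> Enum.uniq()` gives the same result; the sketch only shows where the reverse earns its cost.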
---
## 3. Pipeline for Linear Transformations, Bare Calls for Control Flow
@@ -112,6 +206,54 @@ result
end
```
### When to Use
**Triggers:** Data flows through 2 or more transformations in sequence, each taking the previous result as its first argument. The reader should see a "conveyor belt" of operations.
**Example — before:**
```elixir
def format_names(users) do
  String.upcase(Enum.join(Enum.map(users, & &1.name), ", "))
end
```
**Example — after:**
```elixir
def format_names(users) do
  users
  |> Enum.map(& &1.name)
  |> Enum.join(", ")
  |> String.upcase()
end
```
### When NOT to Use
**Don't use this when:** There's only one transformation, the result needs to go into a pattern match (`case`/`with`), or the pipe would require anonymous function wrapping (`|> then(fn x -> ... end)`) to fit.
**Over-application example:**
```elixir
# Forced: the then/2 wrapper defeats readability
params
|> Map.get(:user_id)
|> then(fn id ->
  case Repo.get(User, id) do
    nil -> {:error, :not_found}
    user -> {:ok, user}
  end
end)
```
**Better alternative:**
```elixir
case Repo.get(User, Map.get(params, :user_id)) do
  nil -> {:error, :not_found}
  user -> {:ok, user}
end
```
**Why:** Pipes are for linear data flow. When you need branching (`case`/`cond`/`with`), break out of the pipeline. Forcing control flow through `then/2` adds indirection without clarity.
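The two styles combine naturally: pipe the linear part, bind the result, then branch with a bare `case`. A small self-contained sketch, in which the `PipeSketch` module, the `"email"` key, and the error atom are all hypothetical:

```elixir
defmodule PipeSketch do
  def normalize_email(params) do
    # Linear transformations: a pipeline reads well here.
    email =
      params
      |> Map.get("email", "")
      |> String.trim()
      |> String.downcase()

    # Control flow: a bare case on the bound value, outside the pipeline.
    case email do
      "" -> {:error, :missing_email}
      _ -> {:ok, email}
    end
  end
end
```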
---
## 4. Pipeline Ending with `|> elem(1)` (Protocol Reduce Unwrap)
@@ -146,6 +288,50 @@ case Enumerable.reduce(enumerable, {:cont, acc}, fun) do
end
```
### When to Use
**Triggers:** You're calling `Enumerable.reduce/3` directly (implementing a custom Enum-like function) and you always want the accumulated value regardless of whether iteration completed or halted.
**Example — before:**
```elixir
def sum_until(enumerable, limit) do
  result =
    Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc ->
      new = acc + x
      if new >= limit, do: {:halt, new}, else: {:cont, new}
    end)

  case result do
    {:done, val} -> val
    {:halted, val} -> val
  end
end
```
**Example — after:**
```elixir
def sum_until(enumerable, limit) do
  Enumerable.reduce(enumerable, {:cont, 0}, fn x, acc ->
    new = acc + x
    if new >= limit, do: {:halt, new}, else: {:cont, new}
  end)
  |> elem(1)
end
```
### When NOT to Use
**Don't use this when:** You need to distinguish between `:done` and `:halted` to decide subsequent behavior (e.g., you want to know if iteration was interrupted). Also don't use it in application code, where `Enum.reduce/3` (or `Enum.reduce_while/3` when you need to stop early) already unwraps the result for you.
**Over-application example:**
```elixir
# Broken: Enum.reduce/3 already returns the bare accumulator
Enum.reduce([1, 2, 3], 0, &(&1 + &2)) |> elem(1)
# Raises ArgumentError: elem/2 is called on the integer 6, not on a tuple.
```
**Better alternative:** Use `Enum.reduce/3` in application code. Only use the `|> elem(1)` pattern when directly calling `Enumerable.reduce/3` in library code.
**Why:** This pattern is for library code that calls the `Enumerable` protocol directly, not for application code. Using it on an already-unwrapped result raises an error. It's an internal idiom that shouldn't leak into regular code.
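For comparison, the early-halting sum from the example above can be written in application code with `Enum.reduce_while/3`, which returns the accumulator directly whether the reduction halted or ran to completion. A minimal sketch (the `SumSketch` module name is hypothetical):

```elixir
defmodule SumSketch do
  # Sums elements until the running total reaches `limit`, then stops.
  def sum_until(enumerable, limit) do
    Enum.reduce_while(enumerable, 0, fn x, acc ->
      new = acc + x
      if new >= limit, do: {:halt, new}, else: {:cont, new}
    end)
  end
end
```

No `elem(1)` is needed because there is no `{:done, _}` or `{:halted, _}` tuple to unwrap.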
---
## 5. Private Helper Decomposition: Recursive Workers with Guards
@@ -190,6 +376,61 @@ defp split_list(list, counter, acc) do
end
```
### When to Use
**Triggers:** You're writing a recursive function that processes a list element-by-element with termination conditions (counter hits zero, list becomes empty, accumulator reaches a threshold). Multiple base cases exist.
**Example — before:**
```elixir
defp take_while_impl(list, fun, acc) do
  case list do
    [] ->
      :lists.reverse(acc)

    [head | tail] ->
      if fun.(head) do
        take_while_impl(tail, fun, [head | acc])
      else
        :lists.reverse(acc)
      end
  end
end
```
**Example — after:**
```elixir
defp take_while_impl([], _fun, acc) do
  :lists.reverse(acc)
end

defp take_while_impl([head | tail], fun, acc) do
  if fun.(head) do
    take_while_impl(tail, fun, [head | acc])
  else
    :lists.reverse(acc)
  end
end
```
### When NOT to Use
**Don't use this when:** The logic doesn't recurse (a simple one-shot transformation), or when `Enum` functions already express the operation clearly. Don't decompose for decomposition's sake.
**Over-application example:**
```elixir
# Over-engineered — this is just Enum.take/2
defp my_take_list([], _n, acc), do: :lists.reverse(acc)
defp my_take_list(_list, 0, acc), do: :lists.reverse(acc)
defp my_take_list([h | t], n, acc), do: my_take_list(t, n - 1, [h | acc])
def my_take(list, n), do: my_take_list(list, n, [])
```
**Better alternative:**
```elixir
def my_take(list, n), do: Enum.take(list, n)
```
**Why:** Elixir's standard library already provides well-tested, optimized implementations of common list operations. Writing your own recursive versions adds maintenance burden and is unlikely to be faster (several of `Enum`'s list clauses delegate to Erlang's `:lists` functions). Reserve this pattern for genuinely novel recursion.
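As an illustration of recursion with several termination conditions, the sketch below takes items from the front of a list while a numeric budget lasts. The `BudgetSketch` module and its function names are hypothetical, and the list is assumed to hold non-negative numbers.

```elixir
defmodule BudgetSketch do
  # Takes leading items whose cumulative total fits within `budget`.
  def take_within(items, budget) when is_list(items) and is_number(budget) do
    do_take(items, budget, [])
  end

  # Base case 1: the list is exhausted.
  defp do_take([], _remaining, acc), do: :lists.reverse(acc)

  # Base case 2: the next item does not fit in the remaining budget.
  defp do_take([head | _tail], remaining, acc) when head > remaining,
    do: :lists.reverse(acc)

  # Recursive case: consume the item and shrink the budget.
  defp do_take([head | tail], remaining, acc),
    do: do_take(tail, remaining - head, [head | acc])
end
```

Each clause head states one stopping rule, so the termination conditions can be read off without tracing nested `if`/`case` branches.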
---
## 6. Enum vs Stream Decision Pattern
@@ -235,6 +476,54 @@ Use Stream when:
[1, 2, 3] |> Enum.map(&(&1 * 2))
```
### When to Use
**Triggers:** You're chaining 3+ transformations on a large (or unbounded) collection. You're reading from a file/network where you want backpressure. You need `Stream.resource/3` for cleanup guarantees.
**Example — before:**
```elixir
# Reads the whole 1M-line file into memory, then builds a full list at every step
File.read!("large.csv")
|> String.split("\n")
|> Enum.map(&String.trim/1)
|> Enum.filter(&(&1 != ""))
|> Enum.map(&parse_row/1)
|> Enum.take(100)
```
**Example — after:**
```elixir
# Single pass, constant memory, stops after 100
File.stream!("large.csv")
|> Stream.map(&String.trim/1)
|> Stream.reject(&(&1 == ""))
|> Stream.map(&parse_row/1)
|> Enum.take(100)
```
### When NOT to Use
**Don't use this when:** The collection is small and bounded (under ~1000 elements), you only apply 1-2 transformations, or you need random access to the full result. Stream's lazy machinery has overhead that exceeds the savings for small data.
**Over-application example:**
```elixir
# Stream overhead exceeds any benefit for 5 items
config = [:a, :b, :c, :d, :e]
config
|> Stream.map(&Atom.to_string/1)
|> Stream.map(&String.upcase/1)
|> Enum.to_list()
```
**Better alternative:**
```elixir
config
|> Enum.map(&(&1 |> Atom.to_string() |> String.upcase()))
```
**Why:** Stream wraps each step in a closure and creates a lazy struct. For small collections, the allocation and indirection cost more than just building the intermediate list. The break-even point depends on the workload; as a rough guide, it sits where collections reach hundreds to thousands of elements AND you chain 3+ operations. Benchmark when it matters.
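The unbounded case is where laziness is not just faster but required. A minimal sketch (the `LazySketch` module and function name are hypothetical): the source below never ends, so an eager `Enum.map/2` over it would never return, while the stream version only computes as many elements as `Enum.take/2` asks for.

```elixir
defmodule LazySketch do
  # Doubles the integers 1, 2, 3, ... and keeps the doubled values divisible by 3.
  def doubled_multiples_of_three(count) do
    Stream.iterate(1, &(&1 + 1))
    |> Stream.map(&(&1 * 2))
    |> Stream.filter(&(rem(&1, 3) == 0))
    |> Enum.take(count)
  end
end

LazySketch.doubled_multiples_of_three(3)
# => [6, 12, 18]
```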
---
## 7. Map.update vs Map.put Decision Pattern
@@ -274,6 +563,42 @@ Map.put(map, :count, count + 1)
Map.update(map, :count, 1, &(&1 + 1))
```
### When to Use
**Triggers:** The new value is computed FROM the old value — incrementing counters, appending to lists, toggling booleans, merging nested maps. You also need a sensible default for the "key doesn't exist yet" case.
**Example — before:**
```elixir
def add_tag(state, tag) do
  existing = Map.get(state, :tags, [])
  Map.put(state, :tags, [tag | existing])
end
```
**Example — after:**
```elixir
def add_tag(state, tag) do
  Map.update(state, :tags, [tag], fn tags -> [tag | tags] end)
end
```
### When NOT to Use
**Don't use this when:** The new value is independent of the old value (you're replacing, not transforming). Also avoid when you need to handle the "missing key" case differently from "present key" (use `Map.get_and_update/3` or explicit `case` instead).
**Over-application example:**
```elixir
# Awkward — the "update" function ignores the old value entirely
Map.update(user, :name, new_name, fn _old -> new_name end)
```
**Better alternative:**
```elixir
Map.put(user, :name, new_name)
```
**Why:** `Map.update/4` communicates "the new value depends on the old one." When the update function ignores the old value, the call misleads the reader. Use `Map.put/3` for unconditional replacement: it's simpler and signals intent correctly.
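Counting occurrences is a typical case where the new value really is derived from the old one and the default covers the first sighting. A minimal sketch (the `WordCount` module is a hypothetical name):

```elixir
defmodule WordCount do
  # Builds a map of word => number of occurrences.
  def count(words) do
    Enum.reduce(words, %{}, fn word, acc ->
      # The default 1 handles a new word; the function increments an existing count.
      Map.update(acc, word, 1, &(&1 + 1))
    end)
  end
end

WordCount.count(["a", "b", "a"])
# => %{"a" => 2, "b" => 1}
```

For this exact task the standard library also offers `Enum.frequencies/1`; the sketch is only meant to show the shape of a value-dependent update.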
---
## 8. Pattern Matching on Map Structure for Dispatch
@@ -322,6 +647,55 @@ def get(map, key, default) do
end
```
### When to Use
**Triggers:** You need to branch based on whether a key exists in a map, especially when you also want the value if it does exist. You want a single lookup that both checks existence and extracts the value.
**Example — before:**
```elixir
def fetch_config(config, key) do
  if Map.has_key?(config, key) do
    {:ok, Map.get(config, key)}
  else
    {:error, :missing}
  end
end
```
**Example — after:**
```elixir
def fetch_config(config, key) do
  case config do
    %{^key => value} -> {:ok, value}
    %{} -> {:error, :missing}
  end
end
```
### When NOT to Use
**Don't use this when:** You're checking for multiple keys simultaneously (use a multi-key pattern match instead), or when `Map.get/3` with a default already expresses what you need. Don't use `case` dispatch for simple "get with fallback" scenarios.
**Over-application example:**
```elixir
# Over-engineered: Map.get/3 already does this
def get_name(user) do
  case user do
    %{name: name} -> name
    %{} -> "Anonymous"
  end
end
```
**Better alternative:**
```elixir
def get_name(user) do
  Map.get(user, :name, "Anonymous")
end
```
**Why:** `Map.get/3` already implements this exact pattern internally. Rewriting it as an explicit `case` adds visual noise without any semantic or performance benefit. Use the case pattern when you're doing something `Map.get` can't — like returning different tagged tuples or triggering side effects.
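One thing `Map.get/3` cannot express is the difference between a key that is absent and a key that is present but set to `nil`. The sketch below shows a `case` on map structure handling both; the `ConfigSketch` module and the `:unset` / `:missing` atoms are hypothetical.

```elixir
defmodule ConfigSketch do
  # Distinguishes "present but nil" from "not present at all".
  def lookup(config, key) do
    case config do
      %{^key => nil} -> {:error, :unset}
      %{^key => value} -> {:ok, value}
      %{} -> {:error, :missing}
    end
  end
end
```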
---
## 9. Delegating to Erlang BIFs with `defdelegate`
@@ -351,6 +725,49 @@ def keys(map) do
end
```
### When to Use
**Triggers:** An Erlang module already exports a function with the exact semantics you need. The argument order matches. You want to expose it under an Elixir-idiomatic name or in your module's namespace for discoverability.
**Example — before:**
```elixir
defmodule MyQueue do
  def new, do: :queue.new()
  def push(q, item), do: :queue.in(item, q)
  def pop(q), do: :queue.out(q)
end
```
**Example — after:**
```elixir
defmodule MyQueue do
  defdelegate new(), to: :queue
  # Can't delegate push: argument order differs, so it needs a wrapper
  def push(q, item), do: :queue.in(item, q)
  defdelegate pop(q), to: :queue, as: :out
end
```
### When NOT to Use
**Don't use this when:** You need to validate inputs, transform arguments, change argument order, add defaults, or adapt the return value. Also avoid when the Erlang function has unclear semantics that benefit from a documenting wrapper.
**Over-application example:**
```elixir
# Broken: :maps.get/2 takes (key, map), but the Elixir convention is (map, key)
defdelegate get(map, key), to: :maps
# This compiles, but get(map, key) calls :maps.get(map, key), which raises BadMapError at runtime
```
**Better alternative:**
```elixir
def get(map, key) do
  :maps.get(key, map)
end
```
**Why:** `defdelegate` is a transparent pass-through. If argument order, defaults, validation, or error handling differ between your desired API and the Erlang function, you need a real wrapper. Delegating with a semantic mismatch creates subtle bugs.
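A small sketch of the dividing line, using Erlang's `:lists` module (the `ListSketch` module name and the choice of functions are illustrative assumptions): functions whose arguments already line up are delegated, and the one with flipped arguments gets an explicit wrapper.

```elixir
defmodule ListSketch do
  # Same arity and argument order as the Erlang functions: safe to delegate.
  defdelegate reverse(list), to: :lists
  defdelegate flatten(list), to: :lists

  # :lists.member/2 takes (element, list), so flip the arguments in a wrapper.
  def member?(list, item), do: :lists.member(item, list)
end
```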
---
## 10. Reduce as the Universal Primitive
@@ -382,6 +799,60 @@ def filter(%MyStruct{items: items}, fun), do: ...
# Instead: implement Enumerable.reduce/3 once and get everything
```
### When to Use
**Triggers:** You're implementing a custom data structure that should be iterable. You want the full `Enum` API without implementing each function. You're designing a protocol where one function provides maximum leverage.
**Example — before:**
```elixir
defmodule RingBuffer do
  def map(%RingBuffer{} = rb, fun), do: ...
  def filter(%RingBuffer{} = rb, fun), do: ...
  def reduce(%RingBuffer{} = rb, acc, fun), do: ...
  def count(%RingBuffer{} = rb), do: ...
  # 70+ functions to implement...
end
```
**Example — after:**
```elixir
defimpl Enumerable, for: RingBuffer do
  def reduce(%RingBuffer{data: data, head: h, size: s}, acc, fun) do
    # One function: yields elements in order (do_reduce/5 is not shown here)
    do_reduce(data, h, s, acc, fun)
  end

  def count(%RingBuffer{size: s}), do: {:ok, s}
  def member?(_, _), do: {:error, __MODULE__}
  def slice(_), do: {:error, __MODULE__}
end

# Now Enum.map/filter/take/etc. all work automatically
```
### When NOT to Use
**Don't use this when:** Your data structure has specialized algorithms that are significantly faster than the generic reduce-based approach (e.g., binary search on a sorted structure for `member?`). In that case, implement the specific protocol callbacks.
**Over-application example:**
```elixir
# Wasteful: reduce traverses all elements to count a structure with O(1) size
defimpl Enumerable, for: SizedCollection do
  def count(_), do: {:error, __MODULE__}
  # This forces Enum.count to use reduce: O(n)
  # when the size is stored in a field: O(1)
end
```
**Better alternative:**
```elixir
defimpl Enumerable, for: SizedCollection do
  def count(%{size: s}), do: {:ok, s}
  # Now Enum.count is O(1)
end
```
**Why:** The optimization callbacks (`count`, `member?`, `slice`) exist precisely because reduce is O(n) for operations that some structures can do faster. Use reduce as the universal fallback, but implement the fast paths when your structure supports them.
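To make the reduce contract concrete, here is a complete, minimal `Enumerable` implementation for a toy struct. The `Countdown` struct is hypothetical (it is not the `RingBuffer` from the example above); it yields `n, n-1, ..., 1`. The sketch handles all three accumulator tags (`:cont`, `:halt`, `:suspend`) and implements `count/1` as a fast path because the size is known up front.

```elixir
defmodule Countdown do
  # Hypothetical struct: enumerates from, from - 1, ..., 1.
  defstruct from: 0
end

defimpl Enumerable, for: Countdown do
  def reduce(%Countdown{from: n}, acc, fun), do: do_reduce(n, acc, fun)

  # The caller asked to stop early.
  defp do_reduce(_n, {:halt, acc}, _fun), do: {:halted, acc}
  # The caller asked to pause; hand back a continuation.
  defp do_reduce(n, {:suspend, acc}, fun), do: {:suspended, acc, &do_reduce(n, &1, fun)}
  # Nothing left to yield.
  defp do_reduce(n, {:cont, acc}, _fun) when n <= 0, do: {:done, acc}
  # Yield the current element and continue with the next one.
  defp do_reduce(n, {:cont, acc}, fun), do: do_reduce(n - 1, fun.(n, acc), fun)

  # Fast path: the size is known without traversing.
  def count(%Countdown{from: n}), do: {:ok, max(n, 0)}
  # No fast paths for these: fall back to the reduce-based defaults.
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}
end
```

With only this in place, calls such as `Enum.map(%Countdown{from: 3}, &(&1 * 2))` and `Enum.take(%Countdown{from: 5}, 2)` work without any further code.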
---
## 11. Keyword Multi-Clause Guard Dispatch (String.split pattern)
@@ -426,6 +897,54 @@ def split(string, pattern, options) do
end
```
### When to Use
**Triggers:** A function accepts multiple distinct input shapes (different types, specific sentinel values, structural patterns). Each shape requires substantially different handling. The shapes are distinguishable via guards or pattern matching.
**Example — before:**
```elixir
def parse(input, format) do
  cond do
    format == :json -> Jason.decode!(input)
    format == :yaml -> YamlElixir.read_from_string!(input)
    is_binary(format) -> custom_parse(input, format)
    true -> raise "unknown format"
  end
end
```
**Example — after:**
```elixir
def parse(input, :json) when is_binary(input), do: Jason.decode!(input)
def parse(input, :yaml) when is_binary(input), do: YamlElixir.read_from_string!(input)
def parse(input, format) when is_binary(input) and is_binary(format), do: custom_parse(input, format)
```
### When NOT to Use
**Don't use this when:** The differences between cases are minor (a single flag toggles a small behavior), or when you'd end up with 10+ nearly-identical clauses that differ by one line. Also avoid when the distinguishing condition can't be expressed in a guard (e.g., requires a database lookup).
**Over-application example:**
```elixir
# Absurd — 5 clauses that differ only in a multiplier
def convert(value, :mm), do: value * 1.0
def convert(value, :cm), do: value * 10.0
def convert(value, :m), do: value * 1000.0
def convert(value, :km), do: value * 1_000_000.0
def convert(value, :in), do: value * 25.4
```
**Better alternative:**
```elixir
@multipliers %{mm: 1.0, cm: 10.0, m: 1000.0, km: 1_000_000.0, in: 25.4}

def convert(value, unit) when is_map_key(@multipliers, unit) do
  value * @multipliers[unit]
end
```
**Why:** When clauses share identical structure and differ only by data, a lookup table is cleaner and more maintainable. Multi-clause dispatch shines when each case has genuinely different logic, not just different constants.
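By contrast, the sketch below shows a case where each clause does different work, so separate clauses carry real information. The `ShapeSketch` module and its `size/1` function are hypothetical names chosen for illustration.

```elixir
defmodule ShapeSketch do
  # Each input shape needs a different built-in to measure it.
  def size(value) when is_binary(value), do: byte_size(value)
  def size(value) when is_list(value), do: length(value)
  def size(%{} = value), do: map_size(value)
  def size(nil), do: 0
end
```

A lookup table could not replace these clauses, because the difference between them is the function being called, not a constant.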
---
## 12. Lazy Private Helpers with `defp parts_to_index`
@@ -446,3 +965,47 @@ defp parts_to_index(n) when is_integer(n) and n > 0, do: n
# BAD — logic scattered in caller
index = if parts == :infinity, do: 0, else: parts
```
### When to Use
**Triggers:** You have a small, well-defined mapping between API-level values and internal representations. The conversion appears in multiple places, or the mapping is non-obvious enough to deserve a name.
**Example — before:**
```elixir
def fetch(resource, timeout) do
  ms = if timeout == :infinity, do: :infinity, else: round(timeout * 1000)
  do_fetch(resource, ms)
end
```
**Example — after:**
```elixir
def fetch(resource, timeout) do
  do_fetch(resource, timeout_to_ms(timeout))
end

defp timeout_to_ms(:infinity), do: :infinity
defp timeout_to_ms(seconds) when is_number(seconds) and seconds >= 0, do: round(seconds * 1000)
```
### When NOT to Use
**Don't use this when:** The conversion is trivial and used only once (for something like `x + 1`, an inline expression is clearer than a named function), or when the mapping has many entries and would be better served by a lookup map.
**Over-application example:**
```elixir
# Over-engineered — named function for trivial identity-like conversion
defp ensure_string(s) when is_binary(s), do: s
defp ensure_string(a) when is_atom(a), do: Atom.to_string(a)
# Used exactly once:
def log(msg), do: IO.puts(ensure_string(msg))
```
**Better alternative:**
```elixir
def log(msg) when is_binary(msg), do: IO.puts(msg)
def log(msg) when is_atom(msg), do: IO.puts(Atom.to_string(msg))
```
**Why:** When a conversion is used exactly once and the calling function already dispatches on clauses, folding the conversion into the caller's clauses reduces indirection. Named helpers shine when reused or when they name a non-obvious transformation.
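The reuse case looks like the sketch below, where two public functions share one conversion helper. Everything here is hypothetical (the `TimeoutSketch` module, the function names, and the option keys); the point is only that the `:infinity` rule and the seconds-to-milliseconds rule live in one place.

```elixir
defmodule TimeoutSketch do
  # Two callers share the same API-value-to-internal-value mapping.
  def connect_opts(timeout), do: [timeout: timeout_to_ms(timeout)]
  def receive_opts(timeout), do: [recv_timeout: timeout_to_ms(timeout)]

  # :infinity passes through unchanged; seconds become whole milliseconds.
  defp timeout_to_ms(:infinity), do: :infinity

  defp timeout_to_ms(seconds) when is_number(seconds) and seconds >= 0,
    do: round(seconds * 1000)
end
```

Because there is no clause for invalid input, a negative or non-numeric timeout fails with a `FunctionClauseError` in the helper rather than producing a bad option value downstream.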