# Data Transform & Pipeline Patterns in Elixir Core

Patterns extracted from Elixir's standard library source code.

---

## 1. List-Specialized Clause Before Protocol Dispatch

**Source:** `lib/elixir/lib/enum.ex` lines 1723–1733

**What it does:** Many public Enum functions, `map/2` among them, define a `when is_list(enumerable)` clause first, then a generic fallback that uses the Enumerable protocol.

```elixir
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(first..last//step, fun) do
  map_range(first, last, step, fun)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end
```

**Why:** Lists are by far the most common enumerable. Matching them first avoids protocol dispatch overhead entirely by calling straight into Erlang's `:lists` module. The range clause is a further optimization for a common case. The generic clause handles all other enumerables through the protocol.

**Anti-pattern:** A single clause that always goes through protocol dispatch:

```elixir
# BAD: forces protocol overhead even for lists
def map(enumerable, fun) do
  Enumerable.reduce(enumerable, {:cont, []}, fn x, acc ->
    {:cont, [fun.(x) | acc]}
  end)
  |> elem(1)
  |> :lists.reverse()
end
```

---

## 2. Build-Then-Reverse (Cons-Cell Accumulation)

**Source:** `lib/elixir/lib/enum.ex` lines 1124, 1733, 2697

**What it does:** Accumulates results by prepending to a list (`[x | acc]`), then reverses at the end.

```elixir
# filter (line 1124)
def filter(enumerable, fun) do
  reduce(enumerable, [], R.filter(fun)) |> :lists.reverse()
end

# map (line 1733)
def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end

# reject (line 2697)
def reject(enumerable, fun) do
  reduce(enumerable, [], R.reject(fun)) |> :lists.reverse()
end
```

**Why:** Prepending to a linked list is O(1); appending is O(n). Building reversed then flipping once is O(n) total. Appending each element would be O(n²).

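The cost difference can be sketched outside of `Enum`. The module and function names below (`BuildThenReverse`, `map_prepend/2`, `map_append/2`) are hypothetical and exist only for illustration; both functions return the same list, but only the first stays linear.

```elixir
defmodule BuildThenReverse do
  # Hypothetical module, for illustration only; not part of Elixir.

  # Prepend each mapped element (O(1) per step), then reverse once (O(n)).
  def map_prepend(list, fun) when is_list(list) do
    list
    |> List.foldl([], fn x, acc -> [fun.(x) | acc] end)
    |> :lists.reverse()
  end

  # Append each mapped element. Every `++` copies the whole accumulator,
  # so the total work grows quadratically with the list length.
  def map_append(list, fun) when is_list(list) do
    List.foldl(list, [], fn x, acc -> acc ++ [fun.(x)] end)
  end
end

BuildThenReverse.map_prepend([1, 2, 3], &(&1 * 10))
# => [10, 20, 30]
```
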

**Anti-pattern:** Appending to the end of a list in a loop:

```elixir
# BAD: O(n²)
def map(enumerable, fun) do
  reduce(enumerable, [], fn x, acc -> acc ++ [fun.(x)] end)
end
```

---

## 3. Pipeline for Linear Transformations, Bare Calls for Control Flow

**Source:** `lib/elixir/lib/enum.ex` lines 1684–1685, 1551, vs 496–502

**What it does:** Elixir core uses `|>` when data flows linearly through 2+ transformations. It does NOT use `|>` for single-step operations or when the first argument is computed by a `case`/`if`/`with`.

```elixir
# Pipeline: data flows through multiple transforms (line 1684-1685)
def map_join(enumerable, joiner \\ "", mapper) do
  enumerable
  |> map(&entry_to_string(mapper.(&1)))
  |> intersperse(joiner)
  |> IO.iodata_to_binary()
end

# NO pipeline: single step or control flow (line 496-502)
def at(enumerable, index, default \\ nil) when is_integer(index) do
  case slice_forward(enumerable, index, 1, 1) do
    [value] -> value
    [] -> default
  end
end
```

**Why:** Pipelines communicate "data flows through transformations." Using them for a single function call, or wrapping them around control flow, obscures intent rather than clarifying it.

**Anti-pattern:** Pipelines for single operations or wrapping control flow:

```elixir
# BAD: single step, no pipeline needed
list |> Enum.reverse()

# BAD: control flow awkwardly forced into a pipe
result
|> case do
  {:ok, x} -> x
  :error -> nil
end
```

---

## 4. Pipeline Ending with `|> elem(1)` (Protocol Reduce Unwrap)

**Source:** `lib/elixir/lib/enum.ex` lines 363, 403, 433, 468, 725, 1022, 2676

**What it does:** When calling `Enumerable.reduce/3` directly, the result is always a tagged tuple: `{:done, acc}`, `{:halted, acc}`, or `{:suspended, acc, continuation}`. In every case the accumulator sits at index 1, so core extracts it with `|> elem(1)`.

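A quick way to see the tags is to call the protocol on a plain list. This is only a sketch; the variable names `done` and `halted` are illustrative.

```elixir
# Runs to completion: the tag is :done.
done = Enumerable.reduce([1, 2, 3], {:cont, 0}, fn x, acc -> {:cont, acc + x} end)
# => {:done, 6}

# Stops early: the tag is :halted.
halted =
  Enumerable.reduce([1, 2, 3], {:cont, 0}, fn x, acc ->
    if x > 1, do: {:halt, acc}, else: {:cont, acc + x}
  end)
# => {:halted, 1}

# Either way, the accumulator is the second element.
{elem(done, 1), elem(halted, 1)}
# => {6, 1}
```
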
```elixir
# all?/1 (line 363)
def all?(enumerable) do
  Enumerable.reduce(enumerable, {:cont, true}, fn entry, _ ->
    if entry, do: {:cont, true}, else: {:halt, false}
  end)
  |> elem(1)
end

# reduce/3 (line 2676)
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end
```

**Why:** The protocol returns tagged tuples for the state machine (cont/halt/suspend). End users don't need the tag, only the accumulated value. `|> elem(1)` is the idiomatic unwrap.

**Anti-pattern:** Using `case` when you don't care about the tag:

```elixir
# BAD: unnecessary pattern match when you always want the value
case Enumerable.reduce(enumerable, {:cont, acc}, fun) do
  {:done, result} -> result
  {:halted, result} -> result
end
```

---

## 5. Private Helper Decomposition: Recursive Workers with Guards

**Source:** `lib/elixir/lib/enum.ex` lines 4975–4995, 5025–5039

**What it does:** Complex operations are split into a public entry point (with validation guards) and a private recursive worker function. The worker uses pattern matching on structure (empty list, head|tail) and guards on counters.

```elixir
# Public entry: validates, delegates (line 890-904)
def drop(enumerable, amount)
    when is_list(enumerable) and is_integer(amount) and amount >= 0 do
  drop_list(enumerable, amount)
end

# Private worker in the same style: split_list/3 pattern matches on
# structure and guards on the counter (line 4975-4983)
defp split_list([head | tail], counter, acc) when counter > 0 do
  split_list(tail, counter - 1, [head | acc])
end

defp split_list(list, 0, acc) do
  {:lists.reverse(acc), list}
end

defp split_list([], _, acc) do
  {:lists.reverse(acc), []}
end
```

**Why:** Separating validation from recursion keeps each clause focused. Patterns and guards in the function heads let the compiler build the clause dispatch, so the bodies need no runtime `if`/`cond`.

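The entry-point/worker split can be sketched end to end in a few lines. `GuardedTake` and `take_list/3` are hypothetical names chosen for this example; this is not how `Enum.take/2` is implemented.

```elixir
defmodule GuardedTake do
  # Hypothetical module, for illustration only; not the Enum implementation.

  # Public entry: guards validate the arguments, then delegate to the worker.
  def take(list, count) when is_list(list) and is_integer(count) and count >= 0 do
    take_list(list, count, [])
  end

  # Private worker: keep going while there is a head and the counter is positive.
  defp take_list([head | tail], counter, acc) when counter > 0 do
    take_list(tail, counter - 1, [head | acc])
  end

  # Counter exhausted or list empty: reverse what was collected.
  defp take_list(_list, _counter, acc), do: :lists.reverse(acc)
end

GuardedTake.take([1, 2, 3], 2)
# => [1, 2]
```

Invalid input such as `GuardedTake.take([1], -1)` never reaches the worker: no public clause matches, so the call raises `FunctionClauseError` at the boundary.
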

**Anti-pattern:** Mixing validation, edge cases, and recursion in a single function with internal conditionals:

```elixir
# BAD: one big function with nested ifs
defp split_list(list, counter, acc) do
  if counter > 0 and list != [] do
    [head | tail] = list
    split_list(tail, counter - 1, [head | acc])
  else
    {:lists.reverse(acc), list}
  end
end
```

---

## 6. Enum vs Stream Decision Pattern

**Source:** `lib/elixir/lib/stream.ex` lines 1–80 (module docs), `lib/elixir/lib/enum.ex`

**What it does:** Enum functions are eager (materialize intermediate lists). Stream functions are lazy (build computation recipes). Core uses Stream for:

- Infinite sequences (`cycle`, `iterate`, `repeatedly`)
- Resource management (`resource/3`)
- Composing transformations to execute in a single pass

```elixir
# Stream: builds a recipe, zero computation until consumed (stream.ex ~line 490)
def map(enum, fun) when is_function(fun, 1) do
  lazy(enum, fn f1 -> R.map(fun, f1) end)
end

# Enum: immediately materializes the result (enum.ex line 1723)
def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end
```

**Why:** From Stream docs (lines 37–41): "When chaining many operations with `Enum`, intermediate lists are created, while `Stream` creates a recipe of computations that are executed at a later moment."

Use Enum when:

- You need the full result now
- The collection is small/bounded
- You only chain 1–2 operations

Use Stream when:

- The collection is large or infinite
- You chain many transformations
- You need resource cleanup (file handles, network)
- You want single-pass processing

**Anti-pattern:** Using Stream for small bounded collections (overhead of the lazy machinery exceeds any benefit):

```elixir
# BAD: Stream overhead for trivial transform
[1, 2, 3] |> Stream.map(&(&1 * 2)) |> Enum.to_list()

# GOOD: just use Enum
[1, 2, 3] |> Enum.map(&(&1 * 2))
```

---

## 7. Map.update vs Map.put Decision Pattern

**Source:** `lib/elixir/lib/map.ex` lines 670–700

**What it does:** `Map.update/4` transforms an existing value based on its current state. `Map.put/3` unconditionally sets a value regardless of current state.

```elixir
# Map.update/4 (line 682-693): transform based on current value
def update(map, key, default, fun) when is_function(fun, 1) do
  case map do
    %{^key => value} -> %{map | key => fun.(value)}
    %{} -> put(map, key, default)
    other -> :erlang.error({:badmap, other}, [map, key, default, fun])
  end
end

# Map.put/3 (line 636): unconditional set
def put(map, key, value) do
  :maps.put(key, value, map)
end
```

**Why:** `update/4` is for when the new value depends on the old value (counters, appending to nested lists). `put/3` is for when you know the exact new value regardless of what was there. Note that `default` is stored as-is when the key is missing; it is not passed through `fun`.

**Anti-pattern:** Using `get` + `put` when `update` expresses intent:

```elixir
# BAD: separate read and write, unclear intent
count = Map.get(map, :count, 0)
Map.put(map, :count, count + 1)

# GOOD: one call, clear intent
Map.update(map, :count, 1, &(&1 + 1))
```

---

## 8. Pattern Matching on Map Structure for Dispatch

**Source:** `lib/elixir/lib/map.ex` lines 398, 509, 586

**What it does:** Map functions use `case map do %{^key => value} -> ...` to dispatch on whether a key exists, rather than calling `has_key?` + conditional.

```elixir
# Map.get/3 (line 586-594)
def get(map, key, default \\ nil) do
  case map do
    %{^key => value} -> value
    %{} -> default
    other -> :erlang.error({:badmap, other}, [map, key, default])
  end
end

# Map.put_new/3 (line 398-407)
def put_new(map, key, value) do
  case map do
    %{^key => _value} -> map
    %{} -> put(map, key, value)
    other -> :erlang.error({:badmap, other})
  end
end
```

**Why:** Pattern matching with `%{^key => value}` does the lookup AND extraction in one step. The `%{}` pattern matches any map, not just an empty one, so that clause is reached only when the key is NOT present. The `other` clause provides a clear error for non-maps.
This is both more efficient and more readable than `if Map.has_key?(map, key)`.

**Anti-pattern:**

```elixir
# BAD: double lookup, less clear
def get(map, key, default) do
  if Map.has_key?(map, key) do
    :maps.get(key, map)
  else
    default
  end
end
```

---

## 9. Delegating to Erlang BIFs with `defdelegate`

**Source:** `lib/elixir/lib/map.ex` lines 127, 143, 159, 173

**What it does:** When an Erlang function already does exactly what's needed, Elixir delegates directly rather than wrapping.

```elixir
@spec keys(map) :: [key]
defdelegate keys(map), to: :maps

@spec values(map) :: [value]
defdelegate values(map), to: :maps

@spec merge(map, map) :: map
defdelegate merge(map1, map2), to: :maps
```

**Why:** Zero overhead: the compiler inlines these. There is no point wrapping an Erlang BIF just to have an Elixir wrapper when the semantics are identical. The `@compile {:inline, ...}` annotation on line 115 makes this explicit.

**Anti-pattern:** Wrapping without adding value:

```elixir
# BAD: pointless wrapper
def keys(map) do
  :maps.keys(map)
end
```

---

## 10. Reduce as the Universal Primitive

**Source:** `lib/elixir/lib/enum.ex` lines 19–21, 2660–2676

**What it does:** Nearly every Enum operation is built on top of `reduce`. The Enumerable protocol's core function is `reduce/3`. Everything else (`count`, `member?`, `slice`) is an optimization hint.

```elixir
# From the protocol docs (line 19-21):
def map(enumerable, fun) do
  reducer = fn x, acc -> {:cont, [fun.(x) | acc]} end
  Enumerable.reduce(enumerable, {:cont, []}, reducer) |> elem(1) |> :lists.reverse()
end

# The actual reduce/3 (line 2676):
def reduce(enumerable, acc, fun) do
  Enumerable.reduce(enumerable, {:cont, acc}, fun) |> elem(1)
end
```

**Why:** Reduce is the most general iteration primitive. By building all operations on reduce, any data structure that implements `Enumerable.reduce/3` automatically gets the full `Enum` API. This is the protocol + reduce = universal composability pattern.

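As a minimal sketch of that payoff, assume a hypothetical `Countdown` struct (not part of Elixir) that yields `from`, `from - 1`, ..., `1`. Only `reduce/3` carries real logic; the other callbacks return `{:error, __MODULE__}`, which tells `Enum` to fall back to its reduce-based defaults.

```elixir
defmodule Countdown do
  # Hypothetical struct, for illustration only.
  defstruct [:from]
end

defimpl Enumerable, for: Countdown do
  # No fast paths: let Enum fall back to reduce/3 for these.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}

  # The caller asked to stop early.
  def reduce(_countdown, {:halt, acc}, _fun), do: {:halted, acc}

  # The caller asked to pause; hand back a continuation.
  def reduce(countdown, {:suspend, acc}, fun) do
    {:suspended, acc, &reduce(countdown, &1, fun)}
  end

  # Nothing left to emit.
  def reduce(%Countdown{from: n}, {:cont, acc}, _fun) when n <= 0, do: {:done, acc}

  # Emit n, then continue from n - 1 with whatever command fun returned.
  def reduce(%Countdown{from: n}, {:cont, acc}, fun) do
    reduce(%Countdown{from: n - 1}, fun.(n, acc), fun)
  end
end

Enum.map(%Countdown{from: 3}, &(&1 * 2))
# => [6, 4, 2]
Enum.filter(%Countdown{from: 5}, &(rem(&1, 2) == 1))
# => [5, 3, 1]
```
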

**Anti-pattern:** Implementing each Enum function independently for each data structure:

```elixir
# BAD: reimplementing map for each type
def map(%MyStruct{items: items}, fun), do: ...
def filter(%MyStruct{items: items}, fun), do: ...

# Instead: implement Enumerable.reduce/3 once and get everything
```

---

## 11. Keyword Multi-Clause Guard Dispatch (String.split pattern)

**Source:** `lib/elixir/lib/string.ex` lines 516–563

**What it does:** Functions with many input shapes use multiple `def` clauses with guards to dispatch, handling each case distinctly rather than using internal `cond`/`case`.

```elixir
def split(string, %Regex{} = pattern, options)
    when is_binary(string) and is_list(options) do
  Regex.split(pattern, string, options)
end

def split(string, "", options) when is_binary(string) and is_list(options) do
  # special case: split by empty string (grapheme-by-grapheme)
  ...
end

def split(string, [], options) when is_binary(string) and is_list(options) do
  # empty pattern list: no splitting
  ...
end

def split(string, pattern, options) when is_binary(string) and is_list(options) do
  # general binary pattern case
  ...
end
```

**Why:** Each clause has a single responsibility. The BEAM compiler generates efficient dispatch for these patterns. Adding a new case is additive (new clause) rather than modifying existing logic.

**Anti-pattern:** One function with nested conditionals:

```elixir
# BAD: all cases mashed into one body
def split(string, pattern, options) do
  cond do
    is_struct(pattern, Regex) -> ...
    pattern == "" -> ...
    pattern == [] -> ...
    true -> ...
  end
end
```

---

## 12. Lazy Private Helpers with `defp parts_to_index`

**Source:** `lib/elixir/lib/string.ex` lines 562–563

**What it does:** Tiny private helpers that convert between API-level concepts and implementation-level values use single-line `defp` with guards.

```elixir
defp parts_to_index(:infinity), do: 0
defp parts_to_index(n) when is_integer(n) and n > 0, do: n
```

**Why:** Clear, self-documenting dispatch. Each case is one line. No branching logic in the caller. The function name explains the conversion.

**Anti-pattern:** Inline conditional in the caller:

```elixir
# BAD: logic scattered in caller
index = if parts == :infinity, do: 0, else: parts
```
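
One further difference is worth sketching: the guarded helper rejects invalid input, while the inline `if` silently passes anything that is not `:infinity`. The `PartsDemo` module and its `split_words/2` function below are hypothetical and only mimic the helper's shape; they are not how `String.split/3` uses it.

```elixir
defmodule PartsDemo do
  # Hypothetical caller, for illustration only.
  # Treating 0 as "no limit" is this sketch's own convention.
  def split_words(string, parts \\ :infinity) when is_binary(string) do
    case parts_to_index(parts) do
      0 -> String.split(string, " ")
      n -> String.split(string, " ", parts: n)
    end
  end

  defp parts_to_index(:infinity), do: 0
  defp parts_to_index(n) when is_integer(n) and n > 0, do: n
end

PartsDemo.split_words("a b c")
# => ["a", "b", "c"]
PartsDemo.split_words("a b c", 2)
# => ["a", "b c"]
```

A call such as `PartsDemo.split_words("a b c", 0)` raises `FunctionClauseError` inside `parts_to_index/1` instead of carrying a bad value further into the function.
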