# Elixir Language Source: Architectural Conventions

How do José Valim and the Elixir core team build Elixir itself? What does the language source reveal about conventions that aren't documented anywhere else?

**Repo:** [elixir-lang/elixir](https://github.com/elixir-lang/elixir)

---

## 1. Repo Shape

| Metric | Value |
|--------|-------|
| Size | 92M |
| Source files | 567 .ex/.exs |
| Erlang bootstrap | 33 .erl files |
| Commits | 22,032 |
| Contributors | 1,578 |
| Test files | 208 |
| Production files | 248 |
| Test ratio | 1:1.2 |
| TODOs (non-test) | 127 (all version-gated) |

### Organizational Philosophy

```
lib/
├── elixir/      # The language core (compiler + stdlib)
│   ├── src/     # 33 Erlang files (bootstrap)
│   └── lib/     # Elixir stdlib + compiler
├── eex/         # Templating (independent OTP app)
├── ex_unit/     # Testing framework (independent OTP app)
├── iex/         # Interactive shell (independent OTP app)
├── logger/      # Logging (independent OTP app)
└── mix/         # Build tool (independent OTP app)
```

Each component is a separate OTP application. They could theoretically be released independently. This is Elixir eating its own dog food: the umbrella project convention that Phoenix apps use comes directly from how the language itself is organized.

---

## 2. What the Codebase Values

### By size (what gets the most lines)

| Module | Lines | Role |
|--------|-------|------|
| `Kernel` | 7,102 | The implicit language surface |
| `Module.Types.Descr` | 6,301 | Set-theoretic type descriptions |
| `Enum` | 5,242 | Collection operations |
| `String` | 3,263 | First-class string concept |
| `Macro` | 3,102 | Metaprogramming foundation |
| `Exception` | 2,720 | Error taxonomy |
| `Code.Formatter` | 2,605 | Code formatting as library |

**The surprise:** The type system (`types/descr.ex` at 6,301 lines) is nearly as large as Kernel (7,102 lines). It's the newest and fastest-growing module: 504 commits, 396 of them (about 79%) by José Valim. This is where the investment is going.

### By authorship (who shapes the language)

Type system: 396/504 commits from José, 32 from Eric Meadows-Jönsson, 31 from Guillaume Duboc. This is auteur-driven development: one person holds the architectural vision for the most complex subsystem.

---

## 3. The Bootstrap Problem

**How does Elixir compile itself?** The answer is 33 Erlang files in `lib/elixir/src/`:

```
elixir_bootstrap.erl  - minimal Kernel for self-compilation
elixir_compiler.erl   - the compiler entry point
elixir_tokenizer.erl  - lexer (in Erlang for speed)
elixir_expand.erl     - macro expansion
elixir_erl.erl        - Elixir AST → Erlang AST
elixir_erl_pass.erl   - code generation pass
elixir_env.erl        - compilation environment
elixir_clauses.erl    - pattern matching compilation
```

**Convention:** The tokenizer and core compiler remain in Erlang permanently. This isn't technical debt; it's a deliberate choice. The tokenizer benefits from Erlang's binary pattern matching performance. The compiler needs to exist before Elixir does.

**Origin:** The bootstrap file dates to Nov 22, 2013 (commit `260be7c8e`: "Start porting elixir_macros to pure elixir"). Before this, more of the compiler was in Erlang. The trajectory is clear: minimize Erlang over time, but keep it where it provides genuine value.

---

## 4. TODO Culture: Version-Gated Deadlines

```elixir
# TODO: Remove me on v2.0                 (16 occurrences)
# TODO: Deprecate me on Elixir v1.23      (6 occurrences)
# TODO: Remove this clause on Elixir v2.0 once single-quoted charlists are removed
# TODO: Make an error on Elixir v2.0      (3 occurrences)
# TODO: Deprecate on Elixir v1.22         (3 occurrences)
```

**Convention:** Every TODO has a version target. No "someday" TODOs exist. When a version ships, grep for that version's TODOs and resolve them all.

**127 total TODOs** across 567 files. Contrast with Go's 3,428 TODOs across 11K files: the Elixir team treats TODOs as time-bombs, not documentation.

---

## 5. Unique Patterns

### 5.1 Protocol Consolidation

Protocols dispatch dynamically at runtime by default (checking each struct's implementation). **Protocol consolidation** compiles all known implementations into a single dispatch module at build time.

From `lib/elixir/lib/protocol.ex`:

> "Consolidation directly links the protocol to its implementations.
> Invoking a consolidated protocol is equivalent to invoking two remote
> functions."

**Convention:** Mix consolidates protocols by default when it compiles a project. The `__protocol__(:consolidated?)` callback exists so code can check at runtime whether fast-path dispatch is active.

**When NOT to use:** Tests often disable consolidation (`consolidate_protocols: false`) so new protocol implementations added during tests are discoverable without recompilation.

### 5.2 Parallel Type Checker

`Module.ParallelChecker` (introduced July 2019, PR #9203 by Eric Meadows-Jönsson as "Add ExCk chunk") enables concurrent type checking across modules.

The type system itself (13,034 lines across 7 files in `lib/elixir/lib/module/types/`) is set-theoretic: types are sets, and operations are set operations (union, intersection, difference).

**Key files:**

- `descr.ex` (6,301 lines): type descriptions and set operations
- `apply.ex`: function application typing
- `expr.ex`: expression typing
- `pattern.ex`: pattern match typing
- `of.ex`: type inference
- `helpers.ex`: shared utilities
- `traverse.ex`: AST traversal

### 5.3 Code.Formatter as Library Function

The code formatter (2,605 lines) is a library function, not a CLI tool. You can call `Code.format_string!/2` from any Elixir code.

**Introduced:** Oct 7, 2017 (PR #6639 by José Valim). **Zero review comments. Merged in 1 hour.** José opened and merged his own formatter with no external review. This is the BDFL model: the language author ships foundational infrastructure by authority.
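
Calling the formatter as a library function, as described above, can be sketched as follows. This is a minimal illustration, not code from the Elixir repository: the `format` helper and the sample inputs are made up for the example, while `Code.format_string!/2`, its `:line_length` option, and `IO.iodata_to_binary/1` are standard library calls.

```elixir
# Code.format_string!/2 returns iodata, so convert it to a binary
# before printing or comparing. `format` is a hypothetical helper.
format = fn source, opts ->
  source
  |> Code.format_string!(opts)
  |> IO.iodata_to_binary()
end

# Default options: only spacing around the operator is normalized.
IO.puts(format.("sum = 1+2", []))
# prints: sum = 1 + 2

# A narrow :line_length asks the formatter to break a call that no longer fits.
long_call = "some_function(first_argument, second_argument, third_argument)"
IO.puts(format.(long_call, line_length: 30))
```

Because the result is plain data, the same call can back editor integrations, code generators, or tests without shelling out to `mix format`.
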

**Convention:** The formatter uses `Inspect.Algebra` (Wadler-Lindig pretty-printing) for layout decisions. It defines all operators and their associativity as module attributes:

```elixir
@pipeline_operators [:|>, :~>>, :<<~, :~>, :<~, :<~>, :"<|>"]
@right_new_line_before_binary_operators [:|, :when]
@required_parens_logical_binary_operands [:|||, :||, :or, :&&&, :&&, :and]
```

### 5.4 Mix Tasks as Single-File Modules

55 Mix tasks, each in its own file. Convention:

- One task = one file
- Module name determines task name: `Mix.Tasks.Deps.Clean` → `deps.clean`
- `@shortdoc` for brief help, `@moduledoc` for full docs
- `@recursive true` for umbrella traversal

### 5.5 ExUnit CaseTemplate (Extension Pattern)

The `ExUnit.CaseTemplate` is how Elixir's test framework supports extension: you define a module that `use`s `CaseTemplate`, and test modules `use YourModule` to inherit setup callbacks and helpers.

This is the same pattern Phoenix uses for `ConnCase` and `DataCase`. It originates from ExUnit itself; the framework demonstrates its own extension point.

### 5.6 Logger: Erlang Integration Done Right

PR #9333 (Sep 2019, merged Nov 2019): "Use Erlang's logger as main logging implementation." The Elixir Logger was rewritten to sit on top of Erlang's `:logger` module rather than reimplementing log dispatch.

**Convention:** When OTP provides infrastructure, wrap it rather than replace it. The compatibility layer translates Erlang log messages to Elixir format, but dispatch/filtering/handlers are OTP's.

---

## 6. PR Discussion Patterns

### JSON.Encoder (PR #14021, Dec 2024)

38 review comments, 13 days to merge. Key debate:

**sabiwara** asked: "What is the reason we went with a different API than Jason?", questioning why the stdlib JSON module doesn't mirror the dominant community library.

**michalmuskala** (Jason author): "Once 1.18 is released with the new JSON module, I plan to make a new release of Jason with some small fixes and then effectively deprecate it."

**Lesson:** When stdlib absorbs community library functionality, the community library author participates in the review. Jason's author blessed the replacement and planned deprecation. This is how healthy ecosystem evolution works.

### Duration (PR #13385, Mar-Apr 2024)

75 comments + 116 review comments. The most debated PR in recent Elixir history.

**Pattern:** Community contributor (@tfiedlerdejanze) opened a PR adding `Date.shift/2`. José redirected to a broader `Duration` type. The contributor iterated through multiple designs.

**José's key intervention:** "I would rather prefer to pass a Duration to Calendar.ISO.shift_date, if we ever have such a type, rather than a keyword list." He refused a simpler PR because it would lock in a suboptimal API before the full design was clear.

**Lesson:** The BDFL model means one person can say "this is the wrong abstraction" and redirect weeks of work. The PR took 33 days and several complete rewrites. The result was better because someone held the line on "solve the whole problem, not just the immediate pain."

### Formatter (PR #6639, Oct 2017)

Zero comments. Merged in 1 hour. 2,605 lines of new code.

**Lesson:** BDFL-driven projects can ship massive foundational changes with no review. José was both the author and the authority. This is the opposite of CockroachDB's Handle PR (2.5 months, extensive debate). Neither model is wrong; it depends on team structure and trust level.

---

## 7. Cross-Ecosystem Comparisons

| Aspect | Elixir | Go |
|--------|--------|-----|
| TODOs | 127, all version-gated | 3,428, all owner-attributed |
| Formatter origin | BDFL ships in 1hr, no review | `gofmt` shipped with language |
| Bootstrap | Erlang (33 files, permanent) | Assembly + Go (self-hosting since 1.5) |
| Extension | 6 protocols + CaseTemplate | `internal/` packages (61 of them) |
| Type system | Set-theoretic, 13K lines, growing | Static, mature, compile-time only |
| Test ratio | 1:1.2 (file per file) | 1:3.3 (package-level tests) |
| Governance | BDFL (José) | Committee (Russ Cox + team) |

---

## 8. What This Teaches

1. **BDFL projects can move faster on foundational infrastructure.** The formatter, type system, and JSON module all shipped because one person had authority. But Duration took 33 days because community contribution required iteration with the BDFL's vision.

2. **Version-gated TODOs are a superior cleanup strategy** for projects with regular release cycles. You never have to decide "is this worth fixing?" because the version bump forces the question.

3. **Keep the minimum viable bootstrap in the host language.** 33 Erlang files is the floor, not a ceiling. The trajectory is always toward more Elixir, less Erlang, but the tokenizer stays in Erlang because binary matching is genuinely faster there.

4. **The type system's growth rate predicts the language's future.** 504 commits, 396 of them from José, nearly as large as Kernel. Elixir's next 5 years will be defined by gradual typing.

5. **Community library authors should bless stdlib absorption.** The Jason → JSON.Encoder transition worked because michalmuskala participated in the review and planned deprecation.

6. **Each OTP app is an independent unit.** This convention flows directly into how Phoenix projects are organized. The language teaches its own architectural pattern by example.
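
The version-gated TODO convention (lesson 2) lends itself to a mechanical check. The sketch below is an assumption about how such a check could look, not tooling from the Elixir repository: the `TodoScan` module, its function names, and the regex are hypothetical, and it only recognizes targets written as `vMAJOR.MINOR`.

```elixir
defmodule TodoScan do
  # Hypothetical helper for illustration; not part of Elixir or Mix.

  # Matches "TODO: ... v1.23" and captures the version number.
  defp version_regex, do: ~r/TODO:.*?\bv(\d+\.\d+)/

  # Groups TODO lines by their target version,
  # e.g. %{"2.0" => ["# TODO: Remove me on v2.0"]}.
  def group_by_version(lines) do
    lines
    |> Enum.flat_map(fn line ->
      case Regex.run(version_regex(), line) do
        [_full, version] -> [{version, String.trim(line)}]
        nil -> []
      end
    end)
    |> Enum.group_by(fn {version, _} -> version end, fn {_, line} -> line end)
  end

  # Returns TODO lines that carry no version target ("someday" TODOs).
  def unversioned(lines) do
    Enum.filter(lines, fn line ->
      String.contains?(line, "TODO:") and not Regex.match?(version_regex(), line)
    end)
  end
end

sample = [
  "# TODO: Remove me on v2.0",
  "# TODO: Deprecate me on Elixir v1.23",
  "# TODO: clean this up someday"
]

IO.inspect(TodoScan.group_by_version(sample))
IO.inspect(TodoScan.unversioned(sample))
```

Run against the lines of a source tree at release time, the first function lists what is due for a given version and the second flags any TODO that breaks the convention.
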