diff --git a/analysis/language-source-comparison.md b/analysis/language-source-comparison.md new file mode 100644 index 0000000..2c3422d --- /dev/null +++ b/analysis/language-source-comparison.md @@ -0,0 +1,281 @@ +# Language Source Analysis: Go vs Elixir + +How do the language teams build their own languages? What +does the source reveal about conventions that users should +follow? + +--- + +## Shape Comparison + +| Metric | Go (golang/go) | Elixir (elixir-lang/elixir) | +|--------|---------------|---------------------------| +| Size | 632M | 92M | +| Source files | 11,245 .go | 567 .ex/.exs | +| Commits | 66,142 | 22,032 | +| Contributors | 2,842 | 1,578 | +| Test files | 1,811 | 208 | +| Non-test files | 6,065 | 248 | +| Test ratio | 1:3.3 | 1:1.2 | +| TODOs (non-test) | 3,428 | 127 | + +**Key observation:** Elixir has almost a 1:1 test-to-production +file ratio. Go has roughly 1 test file per 3.3 production files. +Go has about 27x more TODOs overall, but per non-test file the +rates are close (roughly 0.57 vs 0.51), so size explains most of it. + +--- + +## Organizational Philosophy + +### Go: Deep internal/ + flat stdlib + +``` +src/ +├── cmd/ # toolchain (compile, go, gofmt, etc.) +├── internal/ # 61 hidden packages (NOT user-visible) +├── io/ # flat stdlib packages +├── fmt/ +├── net/ +├── runtime/ +└── ... +``` + +**What this reveals:** +- The Go team uses `internal/` extensively (61 packages) to + hide implementation details from users. This is Go's answer + to "how do you share code without committing to an API." +- Stdlib packages are flat — no `pkg/` wrapper, no nesting + beyond one level (with exceptions like `net/http`). +- The compiler alone is 562,727 lines. The largest files are + generated (SSA rewrite rules: 97K lines). 
+ +### Elixir: Nested libraries as independent apps + +``` +lib/ +├── elixir/ # the language itself +├── eex/ # templating +├── ex_unit/ # testing framework +├── iex/ # interactive shell +├── logger/ # logging +└── mix/ # build tool +``` + +**What this reveals:** +- Each component is a separate OTP application — they could + theoretically be released independently. +- 55 Mix tasks, each in its own file — one task = one file + is a hard convention. +- The type system (`lib/elixir/lib/module/types/`) is 13,034 + lines, the newest and fastest-growing module. + +--- + +## TODO Culture + +### Go: Owned TODOs as a permanent layer + +```go +// TODO(gri) — 320 occurrences +// TODO(mdempsky) — 198 occurrences +// TODO(adonovan) — 170 occurrences +// TODO(mknyszek) — 98 occurrences +// TODO(rsc) — 96 occurrences +``` + +**3,428 TODOs** in non-test code. Every TODO has an owner. +The top TODO authors are core team members. These aren't +aspirational — they're load-bearing markers for known +limitations that specific people are expected to address. + +**Convention:** `// TODO(username): description` + +### Elixir: Version-gated TODOs as deprecation roadmap + +```elixir +# TODO: Remove me on v2.0 — 16 occurrences +# TODO: Deprecate me on Elixir v1.23 — 6 occurrences +# TODO: Remove this clause on Elixir v2.0 once single-quoted charlists are removed +``` + +**127 TODOs** total. Almost all are version-gated — they +explicitly state WHEN the TODO should be resolved. This is +systematic cleanup culture: bump the version, grep for that +version's TODOs, resolve them all. + +**Convention:** `# TODO: Action on version X.Y` + +**The lesson:** Go accepts permanent TODOs as documentation of +known limitations. Elixir treats TODOs as time-bombs with +deadlines. Both are disciplined — just different philosophies. 
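The two TODO conventions described above are regular enough to classify mechanically. The Go sketch below is illustrative only: the `classify` function, its regular expressions, and the category names are assumptions made for this example, not tooling used by either team.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ownedTODO matches the Go convention: TODO(username).
var ownedTODO = regexp.MustCompile(`TODO\(([^)]+)\)`)

// versionedTODO matches the Elixir convention:
// "TODO: <action> on vX.Y" or "TODO: <action> on Elixir vX.Y".
var versionedTODO = regexp.MustCompile(`TODO:.*\bon (?:Elixir )?v(\d+\.\d+)`)

// classify is a hypothetical helper. It returns the kind of TODO found on
// a line ("owned", "versioned", "plain", or "" when there is none) plus the
// owner or the target version where one applies.
func classify(line string) (kind, detail string) {
	if m := ownedTODO.FindStringSubmatch(line); m != nil {
		return "owned", m[1]
	}
	if m := versionedTODO.FindStringSubmatch(line); m != nil {
		return "versioned", m[1]
	}
	if strings.Contains(line, "TODO") {
		return "plain", ""
	}
	return "", ""
}

func main() {
	samples := []string{
		"// TODO(gri): handle this case",
		"# TODO: Remove me on v2.0",
		"# TODO: Deprecate me on Elixir v1.23",
		"// TODO: clean up",
	}
	for _, s := range samples {
		kind, detail := classify(s)
		fmt.Printf("%-10s %-5s %s\n", kind, detail, s)
	}
}
```

Grouping the "versioned" results by `detail` would approximate the "grep for that version's TODOs" step, under the assumption that the comments follow the phrasing quoted in this section.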
+ +--- + +## What Each Language Values (Import Hierarchy) + +### Go's foundation + +| Package | Imports | Role | +|---------|---------|------| +| `fmt` | 2,031 | Formatting everywhere | +| `testing` | 1,658 | Tests are first-class | +| `strings` | 1,454 | String manipulation | +| `os` | 1,306 | System interaction | +| `unsafe` | 1,304 | Low-level access (surprising frequency) | +| `runtime` | 970 | Runtime introspection | +| `io` | 924 | Stream abstraction | + +**Surprising:** `unsafe` is the 5th most-imported package in +Go's own source. The language that preaches safety uses unsafe +extensively in its own implementation. This is the same pattern +as Prometheus's global vars — the authors know the rules and +know where to break them safely. + +### Elixir's foundation + +The Elixir source doesn't use `alias`/`import` heavily — it +relies on the module system's implicit availability. The key +modules by size tell the story: + +| Module | Lines | Role | +|--------|-------|------| +| `Kernel` | 7,102 | The implicit language | +| `types/descr.ex` | 6,301 | Type descriptions (set-theoretic) | +| `Enum` | 5,242 | Collection operations | +| `String` | 3,263 | String as first-class concept | +| `Macro` | 3,102 | Metaprogramming foundation | +| `Exception` | 2,720 | Error taxonomy | + +**Surprising:** The type system's description module (6,301 +lines) is nearly as large as Kernel itself. This is the newest +addition and already dominates the codebase — showing where +the team's investment is going. 
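The Go import table above does not say how its counts were produced. One plausible way, sketched here as an assumption, is to parse only the import block of each file with the standard `go/parser` package and tally the paths. The `countImports` helper and the sample sources are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
)

// countImports parses each source (imports only) and tallies how many
// files import each path. Keys of sources are file names used in errors.
func countImports(sources map[string]string) (map[string]int, error) {
	fset := token.NewFileSet()
	counts := make(map[string]int)
	for name, src := range sources {
		f, err := parser.ParseFile(fset, name, src, parser.ImportsOnly)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", name, err)
		}
		for _, imp := range f.Imports {
			// Path.Value is a quoted string literal such as "\"fmt\"".
			path, err := strconv.Unquote(imp.Path.Value)
			if err != nil {
				return nil, fmt.Errorf("unquote import in %s: %w", name, err)
			}
			counts[path]++
		}
	}
	return counts, nil
}

func main() {
	// Tiny in-memory stand-ins for real files (assumed for illustration).
	sources := map[string]string{
		"a.go": "package a\n\nimport (\n\t\"fmt\"\n\t\"unsafe\"\n)\n",
		"b.go": "package b\n\nimport \"fmt\"\n",
	}
	counts, err := countImports(sources)
	if err != nil {
		panic(err)
	}
	paths := make([]string, 0, len(counts))
	for p := range counts {
		paths = append(paths, p)
	}
	// Most-imported first, ties broken alphabetically.
	sort.Slice(paths, func(i, j int) bool {
		if counts[paths[i]] != counts[paths[j]] {
			return counts[paths[i]] > counts[paths[j]]
		}
		return paths[i] < paths[j]
	})
	for _, p := range paths {
		fmt.Printf("%-8s %d\n", p, counts[p])
	}
}
```

A real run would walk the `src/` tree and read files from disk. Whether test files and `testdata` directories are included changes the totals, so figures gathered this way may not match the table exactly.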
+ +--- + +## Interface/Protocol Philosophy + +### Go: Composition of single-method interfaces + +```go +type Reader interface { Read(p []byte) (n int, err error) } +type Writer interface { Write(p []byte) (n int, err error) } +type Closer interface { Close() error } +type ReadWriter interface { Reader; Writer } +type ReadCloser interface { Reader; Closer } +type ReadWriteCloser interface { Reader; Writer; Closer } +``` + +15 interfaces in `io/io.go` alone, all composed from 4 +primitives. This IS Go's philosophy: small interfaces composed +into larger ones. The source practices what the documentation +preaches. + +### Elixir: 6 protocols + 24 behaviours + +Core protocols (the extensibility points): +- `Enumerable` — collection contract +- `Collectable` — inverse of Enumerable +- `Inspect` — debug representation +- `String.Chars` — string conversion +- `List.Chars` — charlist conversion +- `JSON.Encoder` — JSON serialization (newest) + +**Only 6 protocols** in the stdlib. Elixir is conservative +about adding extension points. Compare to Go's dozens of +interfaces — Elixir prefers fewer, more powerful abstractions. + +24 files define `@callback` — these are behaviours (Go's +interface equivalent for OTP patterns). Used for GenServer, +Supervisor, Application, etc. + +--- + +## Unique Infrastructure + +### Go: internal/ as API firewall + +61 `internal/` packages implement things users need but +shouldn't depend on: +- `internal/singleflight` — dedup concurrent calls (too + specialized for stdlib, too useful to not have) +- `internal/godebug` — runtime feature flags via `$GODEBUG` +- `internal/goexperiment` — compile-time experiment flags +- `internal/poll` — OS-level I/O polling (used by net, os) +- `internal/cpu` — CPU feature detection + +**The pattern:** Code that's shared between stdlib packages but +isn't a public API lives in `internal/`. This is Go's answer to +the "shared utility" problem that other languages solve with +package-private visibility. 
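The visibility rule behind `internal/` can be stated in a few lines. The sketch below is a simplified model, not the `go` command's implementation: `canImport` is a hypothetical helper that works purely on import-path strings and assumes both paths belong to the same tree, so a top-level `internal/` directory is treated as visible to everything in that tree (as in the standard library).

```go
package main

import (
	"fmt"
	"strings"
)

// canImport reports whether a package at path importer may import the
// package at path imported under the internal-directory rule: code may
// import a path containing an "internal" element only if it lives in the
// tree rooted at the parent of that "internal" directory.
func canImport(importer, imported string) bool {
	elems := strings.Split(imported, "/")
	idx := -1
	for i, e := range elems {
		if e == "internal" {
			idx = i // keep the last "internal" element
		}
	}
	if idx < 0 {
		return true // not an internal package
	}
	parent := strings.Join(elems[:idx], "/")
	if parent == "" {
		// Simplification: top-level internal/ is visible to the whole tree.
		return true
	}
	return importer == parent || strings.HasPrefix(importer, parent+"/")
}

func main() {
	fmt.Println(canImport("net/http/httputil", "net/http/internal")) // inside net/http
	fmt.Println(canImport("net", "net/http/internal"))               // outside net/http
	fmt.Println(canImport("os", "internal/poll"))                    // stdlib-wide internal
}
```

The design point is that the boundary comes from directory layout rather than from a keyword in the source, which is why stdlib packages such as `net` and `os` can share `internal/poll` without exposing it to users.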
+ +### Elixir: The compiler as a library + +The Elixir compiler is structured as a library you could +theoretically call: +- `Module.Types` — the type checker +- `Module.ParallelChecker` — concurrent type checking +- `Code.Formatter` — the formatter is a library function + +Mix tasks as single-file modules (55 of them) enforce one +responsibility per task. The build tool's extension point is +"write a module that uses `Mix.Task`." + +--- + +## Generated Code + +### Go: Heavy generation, clearly marked + +The compiler's SSA rewrite rules are generated: +- `opGen.go` — 97,135 lines +- `rewriteAMD64.go` — 79,703 lines +- `rewritegeneric.go` — 38,337 lines + +Convention: Generated files contain a `// Code generated` +header comment. Go's tooling (`go generate`) is designed around +this pattern — the source is explicit about what's human-written +vs machine-written. + +### Elixir: Minimal generation + +No significant generated code. The Elixir source is almost +entirely human-written. The compilation model (AST macros) +means code generation happens at compile time rather than as +checked-in artifacts. + +--- + +## Lessons for Convention Extraction + +### What the language source teaches that stdlib docs don't: + +1. **Go's `unsafe` usage in its own source** — the safety rules + are for users, not for the runtime team. 1,304 unsafe imports + in stdlib code. Know the rules so you know where they don't + apply. + +2. **Elixir's TODO discipline is version-gated** — not "clean + up someday" but "remove on v2.0." This is why + `elixir-patterns` documents zero-TODO culture as achievable. + +3. **Go accepts permanent TODOs** — 3,428 of them, owned by + specific people. This isn't sloppiness; it's documentation + of known limitations. The Go team would rather have an honest + TODO than a half-baked fix. + +4. **Elixir's test ratio (1:1.2) vs Go's (1:3.3)** — Elixir's + smaller, more focused files mean each one has a direct test + counterpart. 
Go's larger files and package-level tests mean + more production code per test file. + +5. **Both use generated code** — but Go checks it in (97K-line + files) while Elixir generates at compile time. Neither is + wrong; it reflects the language's compilation model. + +6. **`internal/` is Go's most distinctive structural pattern** + — 61 packages that solve "shared but not public." Few other + languages build this into the toolchain via a directory name. + +
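Lesson 5 depends on being able to tell checked-in generated files from hand-written ones. The sketch below checks for the header line that `go generate` documents (`^// Code generated .* DO NOT EDIT\.$`). The `isGenerated` helper is hypothetical, and as a simplification it treats only `//` line comments and blank lines as file preamble.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// generatedRE is the header pattern documented for generated Go files.
var generatedRE = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// isGenerated reports whether src carries the generated-code header before
// its first line of code. Block comments are not handled in this sketch.
func isGenerated(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if generatedRE.MatchString(line) {
			return true
		}
		trimmed := strings.TrimSpace(line)
		if trimmed != "" && !strings.HasPrefix(trimmed, "//") {
			return false // reached code without seeing the header
		}
	}
	return false
}

func main() {
	generated := "// Code generated by example-gen; DO NOT EDIT.\n\npackage ssa\n"
	handwritten := "// Package io provides basic I/O interfaces.\npackage io\n"
	fmt.Println(isGenerated(generated), isGenerated(handwritten))
}
```

Filtering files with a check like this before counting lines or TODOs would keep machine-written output from skewing per-file statistics. Whether the figures in this document were filtered that way is not stated.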