diff --git a/sources/ecosystem-analysis.md b/sources/ecosystem-analysis.md new file mode 100644 index 0000000..e8ce749 --- /dev/null +++ b/sources/ecosystem-analysis.md @@ -0,0 +1,371 @@ +# Ecosystem-Level Patterns: How Codebases Present to Consumers + +## The Three Questions + +For each codebase, ask: +1. How do consumers **extend** it? (What interfaces/behaviours + do they implement?) +2. How do consumers **compose** with it? (What does day-to-day + usage look like?) +3. What does it deliberately **NOT do**? (What forces shaped + those refusals?) + +--- + +## CockroachDB: Errors as First-Class Distributed Data + +### Extension Points + +CockroachDB is not a library — it is a system. Consumers +extend it through: +- **SQL builtins** (function registration) +- **Storage engines** (via pebble interface) +- **Service discovery** (not user-extensible — closed) + +The interesting pattern is how errors flow from storage +through KV through SQL to the client. + +### Error Architecture (ecosystem-level idiom) + +``` +Storage error → encoded via cockroachdb/errors → + KV wraps with context → serialized across gRPC → + SQL decodes → maps to pgcode → wire protocol to client +``` + +**Key design decisions:** + +1. **Errors have priority.** `ErrPriority()` ranks errors so + the system knows which to surface when multiple things + fail simultaneously. Transaction abort > restart > + unambiguous error > non-retriable. + +2. **Errors survive serialization.** `EncodeError` / + `DecodeError` serialize errors across RPC boundaries. + The error that originated on node 3 arrives at node 1 + with its full cause chain intact. + +3. **Errors map to pg codes.** Every internal error maps to + a Postgres error code that clients understand. This is + the *ecosystem contract* — clients write + `if pgcode == '40001' { retry }`. + +**What this teaches:** In a distributed system, an error +isn't a string — it's a data object with identity, +priority, serializability, and a consumer-facing code. +Design your error types for the *consumer*, not the +*producer*. + +### Deliberate Absences + +- **No dependency injection framework.** Config structs + passed explicitly. 1178-line `StoreConfig` struct, but + it's all data — no framework magic. +- **No context.Background() on hot paths.** 144 uses in + kvserver, but auditable — each justified in comments. +- **No functional options.** CockroachDB uses config + structs universally. The Option interface in stopper is + the exception, not the rule. + +### Test Architecture + +- **TestMain in every package.** Sets up security certs, + random seeds, and test server factories. +- **Goroutine leak detection.** `leaktest.AfterTest(t)()` + at the start of every test. Detects leaked goroutines + by diffing goroutine stacks before/after. +- **Stopper leak detection.** Every Stopper is tracked + globally; `PrintLeakedStoppers(t)` in TestMain catches + forgot-to-stop bugs. +- **`//go:generate` for test setup.** Codegen tool + (`add-leaktest.sh`) auto-adds leak checks to every + test file. + +**What this teaches:** At scale, the most important test +infrastructure isn't assertions — it's resource leak +detection. Every goroutine, every connection, every +Stopper is tracked and verified to be cleaned up. + +--- + +## Prometheus: The One-Method Interface Contract + +### Extension Points + +Prometheus is extended through: +- **Service discovery** (30 implementations, 1 interface) +- **Storage** (remote read/write adapters) +- **Exporters** (client_golang metrics) + +### The Discoverer Pattern (ecosystem-level idiom) + +```go +type Discoverer interface { + Run(ctx context.Context, up chan<- []*targetgroup.Group) +} +``` + +This is **one method**. Thirty implementations. The +channel-based push model means: +- The discoverer controls timing (not polled) +- The manager multiplexes without knowing implementations +- Adding a new discovery source = implement Run, register + +**Registration via init():** +```go +func init() { + discovery.RegisterConfig(&SDConfig{}) +} +``` + +This is the classic Go plugin pattern. Import the package +→ init registers it → the system discovers it at startup. + +**What this teaches:** The smallest possible interface +creates the largest possible ecosystem. One method + one +channel = 30 implementations without coordination. + +### Storage Contract (15 interfaces, 1 file) + +All of Prometheus's storage contract lives in +`storage/interface.go`. This is the: +- Read path: `Queryable → Querier → SeriesSet → Series` +- Write path: `Appendable → Appender` +- Extension: `ExemplarAppender`, `MetadataUpdater` + +**Key:** Every implementation proves satisfaction at +compile time with `var _ storage.Searcher = &type{}`. +When the contract evolves, the compiler finds every +broken implementation. + +### Deliberate Absences + +- **No generics in storage interfaces.** Despite Go 1.20+ + support. The interfaces predate generics and adding them + would break all existing implementations. +- **No dependency injection.** Direct struct construction + everywhere. Testability through interface satisfaction, + not framework wiring. +- **Almost no functional options.** Only in leaf packages + (chunk writer, parser). Core APIs use config structs. +- **No goroutine leak in production code.** `goleak` in + tests, `TolerantVerifyLeak` with explicit allowlist for + known third-party leaks. + +### Test Architecture + +- **`TolerantVerifyLeak`** — goroutine leak detection with + allowlist for known third-party leaks (opencensus, klog) +- **Mock implementations of every interface** — defined + right in `storage/interface.go` next to the real ones +- **Golden file tests** in PromQL evaluation + +--- + +## Ecto: Composability as Architectural Principle + +### Extension Points + +Consumers extend Ecto through: +- **Custom types** (7 callbacks: cast, load, dump, equal?, + embed_as, autogenerate, type) +- **Adapters** (Queryable, Schema, Transaction, Storage — + 4 behaviour modules) +- **Protocols** (`Ecto.Queryable` — anything can become a + query) + +### The NotLoaded Sentinel (ecosystem-level idiom) + +```elixir +defmodule Ecto.Association.NotLoaded do + defstruct [:__field__, :__owner__, :__cardinality__] +end +``` + +Ecto **refuses to lazy-load associations**. If you access +`user.posts` without preloading, you get a `NotLoaded` +struct — not nil, not an empty list, not a database query. + +**Why this is an ecosystem decision:** +- Forces consumers to be explicit about data needs +- Prevents N+1 queries by making them impossible +- Makes the data boundary visible in code + +This is a *consumer-hostile* decision that makes +*systems built on Ecto* dramatically better. The library +optimizes for the 1000th user, not the first-day +experience. + +### Query Composition (ecosystem-level idiom) + +Every query clause appends to a list in the Query struct. +Nothing executes. The Query is pure data that accumulates +intent. + +**Consumer impact:** You can build queries across module +boundaries: + +```elixir +# Module A builds the base +def active_users, do: from(u in User, where: u.active) + +# Module B adds pagination +def paginate(query, page, size) do + query + |> limit(^size) + |> offset(^((page - 1) * size)) +end + +# Module C adds authorization +def visible_to(query, role) do + where(query, [u], u.role in ^roles_for(role)) +end +``` + +Each module is independent. They compose because queries +are data, not effects. + +### Adapter Architecture + +``` +Ecto.Repo.all(query) + → Planner resolves types, bindings + → Adapter.prepare/2 produces {cache, prepared} + → Adapter.execute/5 runs against DB + → Adapter.loaders/2 converts back to Elixir types +``` + +The adapter is the ONLY part that knows SQL. Ecto core +is database-agnostic. This is why the same code works on +Postgres, MySQL, SQLite, and custom stores. + +### Deliberate Absences + +- **No lazy loading.** `NotLoaded` struct instead. +- **No global state.** Per-repo config, per-repo process. +- **No query caching at library level.** The adapter + caches prepared statements; Ecto doesn't. +- **No connection to schema naming.** `schema "legacy_tbl"` + is independent of `defmodule NewUser`. + +--- + +## Oban: Designing for Testability First + +### Extension Points + +Consumers extend Oban through: +- **Workers** (`perform/1` — the job logic) +- **Plugins** (GenServer + validate callback) +- **Engines** (entire backend swap) +- **Notifiers** (pub/sub mechanism) +- **Peers** (leader election) + +### The Worker Result Type (ecosystem-level idiom) + +```elixir +@type result :: + :ok + | {:ok, ignored :: term()} + | {:error, reason :: term()} + | {:cancel, reason :: term()} + | {:snooze, period :: Period.t()} +``` + +Five possible outcomes, each with distinct semantics: +- `:ok` → success, remove from queue +- `{:error, reason}` → retry (respects max_attempts) +- `{:cancel, reason}` → permanent failure, don't retry +- `{:snooze, period}` → reschedule for later + +**Ecosystem impact:** Every worker author makes an +explicit decision about failure semantics. "What should +happen when this fails?" is answered in the type system, +not in configuration. + +### Contextual Backoff (ecosystem-level idiom) + +```elixir +def backoff(%Job{attempt: attempt, unsaved_error: err}) do + case err.reason do + %RateLimitError{retry_after: ms} -> ms + _ -> trunc(:math.pow(attempt, 4) + jitter()) + end +end +``` + +The error that caused the failure is available to the +backoff calculation. Different errors → different retry +strategies. This is impossible in systems where backoff +is configured globally. + +### Testing Design (ecosystem-level idiom) + +Three testing modes via config: +- **`:inline`** — execute jobs synchronously in tests +- **`:manual`** — enqueue but don't execute +- **`:disabled`** — production behavior + +Plus `use Oban.Testing` which provides: +- `assert_enqueued/1` — verify job was queued +- `refute_enqueued/1` — verify job was NOT queued +- `perform_job/2` — execute a job manually in tests +- `all_enqueued/1` — list all matching jobs + +**Ecosystem impact:** Every Oban consumer gets +deterministic, fast, isolated tests for free. No sleep, +no polling, no flaky async assertions. + +### Deliberate Absences + +- **No global process names.** Registry.via everywhere — + multiple Oban instances can coexist. +- **No direct DB coupling in workers.** Workers receive a + Job struct; they don't import Repo. +- **No implicit retries.** max_attempts is explicit per + worker. No "retry forever" default. +- **No built-in rate limiting in OSS.** That is a Pro + feature — deliberate business boundary. + +--- + +## Cross-Cutting: What "Idiomatic" Means at Ecosystem Level + +### 1. The Consumer Contract is the API + +Not the functions you export — the *experience* of +building on your system: +- CockroachDB: "Your errors will be pg-codes, always" +- Prometheus: "Implement Run(), get discovery for free" +- Ecto: "Queries are data; loading is always explicit" +- Oban: "Return a result type; testing is built in" + +### 2. Deliberate Absences Define Character + +What a system refuses to do is as important as what it +does: +- Ecto refuses lazy loading → forces explicit data needs +- Oban refuses global names → enables multi-instance +- Prometheus refuses DI frameworks → keeps simplicity +- CockroachDB refuses context.Background on hot paths → + forces timeout discipline + +### 3. Testability is Never Retrofitted + +Every system that tests well designed testing in from the +start: +- CockroachDB: leak detection, stopper tracking +- Prometheus: goroutine leak verification, mock interfaces +- Ecto: adapter abstraction, embedded schemas for testing +- Oban: engine swap, testing modes, assertion helpers + +### 4. Extension Points Define the Ecosystem Size + +- Prometheus: 1 interface, 30 discoverers +- Ecto: 7 type callbacks, hundreds of custom types +- Oban: Worker behaviour + 5 engine callbacks + +**Smaller interface → larger ecosystem.** The less you +demand from implementors, the more you get. + +