Files
Rodin 76f4bcc33e docs: architectural analysis of top repos (CockroachDB, Prometheus, Ecto, Oban)
Four documents examining codebases at module and ecosystem levels:
- architectural-analysis.md — internal structure, dependency flow
- ecosystem-analysis.md — consumer extension points, deliberate absences
- crosscutting-analysis.md — logging, config, retry, lifecycle
- testing-evolution-analysis.md — proof models, API evolution strategies
2026-04-30 10:50:54 -07:00

11 KiB

Ecosystem-Level Patterns: How Codebases Present to Consumers

The Three Questions

For each codebase, ask:

  1. How do consumers extend it? (What interfaces/behaviours do they implement?)
  2. How do consumers compose with it? (What does day-to-day usage look like?)
  3. What does it deliberately NOT do? (What forces shaped those refusals?)

CockroachDB: Errors as First-Class Distributed Data

Extension Points

CockroachDB is not a library — it is a system. Consumers extend it through:

  • SQL builtins (function registration)
  • Storage engines (via pebble interface)
  • Service discovery (not user-extensible — closed)

The interesting pattern is how errors flow from storage through KV through SQL to the client.

Error Architecture (ecosystem-level idiom)

Storage error → encoded via cockroachdb/errors →
  KV wraps with context → serialized across gRPC →
  SQL decodes → maps to pgcode → wire protocol to client

Key design decisions:

  1. Errors have priority. ErrPriority() ranks errors so the system knows which to surface when multiple things fail simultaneously. Transaction abort > restart > unambiguous error > non-retriable.

  2. Errors survive serialization. EncodeError / DecodeError serialize errors across RPC boundaries. The error that originated on node 3 arrives at node 1 with its full cause chain intact.

  3. Errors map to pg codes. Every internal error maps to a Postgres error code that clients understand. This is the ecosystem contract — clients write if pgcode == '40001' { retry }.

What this teaches: In a distributed system, an error isn't a string — it's a data object with identity, priority, serializability, and a consumer-facing code. Design your error types for the consumer, not the producer.

Deliberate Absences

  • No dependency injection framework. Config structs passed explicitly. 1178-line StoreConfig struct, but it's all data — no framework magic.
  • No context.Background() on hot paths. 144 uses in kvserver, but auditable — each justified in comments.
  • No functional options. CockroachDB uses config structs universally. The Option interface in stopper is the exception, not the rule.

Test Architecture

  • TestMain in every package. Sets up security certs, random seeds, and test server factories.
  • Goroutine leak detection. leaktest.AfterTest(t)() at the start of every test. Detects leaked goroutines by diffing goroutine stacks before/after.
  • Stopper leak detection. Every Stopper is tracked globally; PrintLeakedStoppers(t) in TestMain catches forgot-to-stop bugs.
  • //go:generate for test setup. Codegen tool (add-leaktest.sh) auto-adds leak checks to every test file.

What this teaches: At scale, the most important test infrastructure isn't assertions — it's resource leak detection. Every goroutine, every connection, every Stopper is tracked and verified to be cleaned up.


Prometheus: The One-Method Interface Contract

Extension Points

Prometheus is extended through:

  • Service discovery (30 implementations, 1 interface)
  • Storage (remote read/write adapters)
  • Exporters (client_golang metrics)

The Discoverer Pattern (ecosystem-level idiom)

type Discoverer interface {
    Run(ctx context.Context, up chan<- []*targetgroup.Group)
}

This is one method. Thirty implementations. The channel-based push model means:

  • The discoverer controls timing (not polled)
  • The manager multiplexes without knowing implementations
  • Adding a new discovery source = implement Run, register

Registration via init():

func init() {
    discovery.RegisterConfig(&SDConfig{})
}

This is the classic Go plugin pattern. Import the package → init registers it → the system discovers it at startup.

What this teaches: The smallest possible interface creates the largest possible ecosystem. One method + one channel = 30 implementations without coordination.

Storage Contract (15 interfaces, 1 file)

All of Prometheus's storage contract lives in storage/interface.go. This is the:

  • Read path: Queryable → Querier → SeriesSet → Series
  • Write path: Appendable → Appender
  • Extension: ExemplarAppender, MetadataUpdater

Key: Every implementation proves satisfaction at compile time with var _ storage.Searcher = &type{}. When the contract evolves, the compiler finds every broken implementation.

Deliberate Absences

  • No generics in storage interfaces. Despite Go 1.20+ support. The interfaces predate generics and adding them would break all existing implementations.
  • No dependency injection. Direct struct construction everywhere. Testability through interface satisfaction, not framework wiring.
  • Almost no functional options. Only in leaf packages (chunk writer, parser). Core APIs use config structs.
  • No goroutine leak in production code. goleak in tests, TolerantVerifyLeak with explicit allowlist for known third-party leaks.

Test Architecture

  • TolerantVerifyLeak — goroutine leak detection with allowlist for known third-party leaks (opencensus, klog)
  • Mock implementations of every interface — defined right in storage/interface.go next to the real ones
  • Golden file tests in PromQL evaluation

Ecto: Composability as Architectural Principle

Extension Points

Consumers extend Ecto through:

  • Custom types (7 callbacks: cast, load, dump, equal?, embed_as, autogenerate, type)
  • Adapters (Queryable, Schema, Transaction, Storage — 4 behaviour modules)
  • Protocols (Ecto.Queryable — anything can become a query)

The NotLoaded Sentinel (ecosystem-level idiom)

defmodule Ecto.Association.NotLoaded do
  defstruct [:__field__, :__owner__, :__cardinality__]
end

Ecto refuses to lazy-load associations. If you access user.posts without preloading, you get a NotLoaded struct — not nil, not an empty list, not a database query.

Why this is an ecosystem decision:

  • Forces consumers to be explicit about data needs
  • Prevents N+1 queries by making them impossible
  • Makes the data boundary visible in code

This is a consumer-hostile decision that makes systems built on Ecto dramatically better. The library optimizes for the 1000th user, not the first-day experience.

Query Composition (ecosystem-level idiom)

Every query clause appends to a list in the Query struct. Nothing executes. The Query is pure data that accumulates intent.

Consumer impact: You can build queries across module boundaries:

# Module A builds the base
def active_users, do: from(u in User, where: u.active)

# Module B adds pagination
def paginate(query, page, size) do
  query
  |> limit(^size)
  |> offset(^((page - 1) * size))
end

# Module C adds authorization
def visible_to(query, role) do
  where(query, [u], u.role in ^roles_for(role))
end

Each module is independent. They compose because queries are data, not effects.

Adapter Architecture

Ecto.Repo.all(query)
  → Planner resolves types, bindings
  → Adapter.prepare/2 produces {cache, prepared}
  → Adapter.execute/5 runs against DB
  → Adapter.loaders/2 converts back to Elixir types

The adapter is the ONLY part that knows SQL. Ecto core is database-agnostic. This is why the same code works on Postgres, MySQL, SQLite, and custom stores.

Deliberate Absences

  • No lazy loading. NotLoaded struct instead.
  • No global state. Per-repo config, per-repo process.
  • No query caching at library level. The adapter caches prepared statements; Ecto doesn't.
  • No connection to schema naming. schema "legacy_tbl" is independent of defmodule NewUser.

Oban: Designing for Testability First

Extension Points

Consumers extend Oban through:

  • Workers (perform/1 — the job logic)
  • Plugins (GenServer + validate callback)
  • Engines (entire backend swap)
  • Notifiers (pub/sub mechanism)
  • Peers (leader election)

The Worker Result Type (ecosystem-level idiom)

@type result ::
  :ok
  | {:ok, ignored :: term()}
  | {:error, reason :: term()}
  | {:cancel, reason :: term()}
  | {:snooze, period :: Period.t()}

Five possible outcomes, each with distinct semantics:

  • :ok → success, remove from queue
  • {:error, reason} → retry (respects max_attempts)
  • {:cancel, reason} → permanent failure, don't retry
  • {:snooze, period} → reschedule for later

Ecosystem impact: Every worker author makes an explicit decision about failure semantics. "What should happen when this fails?" is answered in the type system, not in configuration.

Contextual Backoff (ecosystem-level idiom)

def backoff(%Job{attempt: attempt, unsaved_error: err}) do
  case err.reason do
    %RateLimitError{retry_after: ms} -> ms
    _ -> trunc(:math.pow(attempt, 4) + jitter())
  end
end

The error that caused the failure is available to the backoff calculation. Different errors → different retry strategies. This is impossible in systems where backoff is configured globally.

Testing Design (ecosystem-level idiom)

Three testing modes via config:

  • :inline — execute jobs synchronously in tests
  • :manual — enqueue but don't execute
  • :disabled — production behavior

Plus use Oban.Testing which provides:

  • assert_enqueued/1 — verify job was queued
  • refute_enqueued/1 — verify job was NOT queued
  • perform_job/2 — execute a job manually in tests
  • all_enqueued/1 — list all matching jobs

Ecosystem impact: Every Oban consumer gets deterministic, fast, isolated tests for free. No sleep, no polling, no flaky async assertions.

Deliberate Absences

  • No global process names. Registry.via everywhere — multiple Oban instances can coexist.
  • No direct DB coupling in workers. Workers receive a Job struct; they don't import Repo.
  • No implicit retries. max_attempts is explicit per worker. No "retry forever" default.
  • No built-in rate limiting in OSS. That is a Pro feature — deliberate business boundary.

Cross-Cutting: What "Idiomatic" Means at Ecosystem Level

1. The Consumer Contract is the API

Not the functions you export — the experience of building on your system:

  • CockroachDB: "Your errors will be pg-codes, always"
  • Prometheus: "Implement Run(), get discovery for free"
  • Ecto: "Queries are data; loading is always explicit"
  • Oban: "Return a result type; testing is built in"

2. Deliberate Absences Define Character

What a system refuses to do is as important as what it does:

  • Ecto refuses lazy loading → forces explicit data needs
  • Oban refuses global names → enables multi-instance
  • Prometheus refuses DI frameworks → keeps simplicity
  • CockroachDB refuses context.Background on hot paths → forces timeout discipline

3. Testability is Never Retrofitted

Every system that tests well designed testing in from the start:

  • CockroachDB: leak detection, stopper tracking
  • Prometheus: goroutine leak verification, mock interfaces
  • Ecto: adapter abstraction, embedded schemas for testing
  • Oban: engine swap, testing modes, assertion helpers

4. Extension Points Define the Ecosystem Size

  • Prometheus: 1 interface, 30 discoverers
  • Ecto: 7 type callbacks, hundreds of custom types
  • Oban: Worker behaviour + 5 engine callbacks

Smaller interface → larger ecosystem. The less you demand from implementors, the more you get.