76f4bcc33e
Four documents examining codebases at module and ecosystem levels: - architectural-analysis.md — internal structure, dependency flow - ecosystem-analysis.md — consumer extension points, deliberate absences - crosscutting-analysis.md — logging, config, retry, lifecycle - testing-evolution-analysis.md — proof models, API evolution strategies
372 lines
11 KiB
Markdown
372 lines
11 KiB
Markdown
# Ecosystem-Level Patterns: How Codebases Present to Consumers
|
|
|
|
## The Three Questions
|
|
|
|
For each codebase, ask:
|
|
1. How do consumers **extend** it? (What interfaces/behaviours
|
|
do they implement?)
|
|
2. How do consumers **compose** with it? (What does day-to-day
|
|
usage look like?)
|
|
3. What does it deliberately **NOT do**? (What forces shaped
|
|
those refusals?)
|
|
|
|
---
|
|
|
|
## CockroachDB: Errors as First-Class Distributed Data
|
|
|
|
### Extension Points
|
|
|
|
CockroachDB is not a library — it is a system. Consumers
|
|
extend it through:
|
|
- **SQL builtins** (function registration)
|
|
- **Storage engines** (via pebble interface)
|
|
- **Service discovery** (not user-extensible — closed)
|
|
|
|
The interesting pattern is how errors flow from storage
|
|
through KV through SQL to the client.
|
|
|
|
### Error Architecture (ecosystem-level idiom)
|
|
|
|
```
|
|
Storage error → encoded via cockroachdb/errors →
|
|
KV wraps with context → serialized across gRPC →
|
|
SQL decodes → maps to pgcode → wire protocol to client
|
|
```
|
|
|
|
**Key design decisions:**
|
|
|
|
1. **Errors have priority.** `ErrPriority()` ranks errors so
|
|
the system knows which to surface when multiple things
|
|
fail simultaneously. Transaction abort > restart >
|
|
unambiguous error > non-retriable.
|
|
|
|
2. **Errors survive serialization.** `EncodeError` /
|
|
`DecodeError` serialize errors across RPC boundaries.
|
|
The error that originated on node 3 arrives at node 1
|
|
with its full cause chain intact.
|
|
|
|
3. **Errors map to pg codes.** Every internal error maps to
|
|
a Postgres error code that clients understand. This is
|
|
the *ecosystem contract* — clients write
|
|
`if pgcode == '40001' { retry }`.
|
|
|
|
**What this teaches:** In a distributed system, an error
|
|
isn't a string — it's a data object with identity,
|
|
priority, serializability, and a consumer-facing code.
|
|
Design your error types for the *consumer*, not the
|
|
*producer*.
|
|
|
|
### Deliberate Absences
|
|
|
|
- **No dependency injection framework.** Config structs
|
|
passed explicitly. 1178-line `StoreConfig` struct, but
|
|
it's all data — no framework magic.
|
|
- **No context.Background() on hot paths.** 144 uses in
|
|
kvserver, but auditable — each justified in comments.
|
|
- **No functional options.** CockroachDB uses config
|
|
structs universally. The Option interface in stopper is
|
|
the exception, not the rule.
|
|
|
|
### Test Architecture
|
|
|
|
- **TestMain in every package.** Sets up security certs,
|
|
random seeds, and test server factories.
|
|
- **Goroutine leak detection.** `leaktest.AfterTest(t)()`
|
|
at the start of every test. Detects leaked goroutines
|
|
by diffing goroutine stacks before/after.
|
|
- **Stopper leak detection.** Every Stopper is tracked
|
|
globally; `PrintLeakedStoppers(t)` in TestMain catches
|
|
forgot-to-stop bugs.
|
|
- **`//go:generate` for test setup.** Codegen tool
|
|
(`add-leaktest.sh`) auto-adds leak checks to every
|
|
test file.
|
|
|
|
**What this teaches:** At scale, the most important test
|
|
infrastructure isn't assertions — it's resource leak
|
|
detection. Every goroutine, every connection, every
|
|
Stopper is tracked and verified to be cleaned up.
|
|
|
|
---
|
|
|
|
## Prometheus: The One-Method Interface Contract
|
|
|
|
### Extension Points
|
|
|
|
Prometheus is extended through:
|
|
- **Service discovery** (30 implementations, 1 interface)
|
|
- **Storage** (remote read/write adapters)
|
|
- **Exporters** (client_golang metrics)
|
|
|
|
### The Discoverer Pattern (ecosystem-level idiom)
|
|
|
|
```go
|
|
type Discoverer interface {
|
|
Run(ctx context.Context, up chan<- []*targetgroup.Group)
|
|
}
|
|
```
|
|
|
|
This is **one method**. Thirty implementations. The
|
|
channel-based push model means:
|
|
- The discoverer controls timing (not polled)
|
|
- The manager multiplexes without knowing implementations
|
|
- Adding a new discovery source = implement Run, register
|
|
|
|
**Registration via init():**
|
|
```go
|
|
func init() {
|
|
discovery.RegisterConfig(&SDConfig{})
|
|
}
|
|
```
|
|
|
|
This is the classic Go plugin pattern. Import the package
|
|
→ init registers it → the system discovers it at startup.
|
|
|
|
**What this teaches:** The smallest possible interface
|
|
creates the largest possible ecosystem. One method + one
|
|
channel = 30 implementations without coordination.
|
|
|
|
### Storage Contract (15 interfaces, 1 file)
|
|
|
|
All of Prometheus's storage contract lives in
|
|
`storage/interface.go`. This is the:
|
|
- Read path: `Queryable → Querier → SeriesSet → Series`
|
|
- Write path: `Appendable → Appender`
|
|
- Extension: `ExemplarAppender`, `MetadataUpdater`
|
|
|
|
**Key:** Every implementation proves satisfaction at
|
|
compile time with `var _ storage.Searcher = &type{}`.
|
|
When the contract evolves, the compiler finds every
|
|
broken implementation.
|
|
|
|
### Deliberate Absences
|
|
|
|
- **No generics in storage interfaces.** Despite Go 1.20+
|
|
support. The interfaces predate generics and adding them
|
|
would break all existing implementations.
|
|
- **No dependency injection.** Direct struct construction
|
|
everywhere. Testability through interface satisfaction,
|
|
not framework wiring.
|
|
- **Almost no functional options.** Only in leaf packages
|
|
(chunk writer, parser). Core APIs use config structs.
|
|
- **No goroutine leak in production code.** `goleak` in
|
|
tests, `TolerantVerifyLeak` with explicit allowlist for
|
|
known third-party leaks.
|
|
|
|
### Test Architecture
|
|
|
|
- **`TolerantVerifyLeak`** — goroutine leak detection with
|
|
allowlist for known third-party leaks (opencensus, klog)
|
|
- **Mock implementations of every interface** — defined
|
|
right in `storage/interface.go` next to the real ones
|
|
- **Golden file tests** in PromQL evaluation
|
|
|
|
---
|
|
|
|
## Ecto: Composability as Architectural Principle
|
|
|
|
### Extension Points
|
|
|
|
Consumers extend Ecto through:
|
|
- **Custom types** (7 callbacks: cast, load, dump, equal?,
|
|
embed_as, autogenerate, type)
|
|
- **Adapters** (Queryable, Schema, Transaction, Storage —
|
|
4 behaviour modules)
|
|
- **Protocols** (`Ecto.Queryable` — anything can become a
|
|
query)
|
|
|
|
### The NotLoaded Sentinel (ecosystem-level idiom)
|
|
|
|
```elixir
|
|
defmodule Ecto.Association.NotLoaded do
|
|
defstruct [:__field__, :__owner__, :__cardinality__]
|
|
end
|
|
```
|
|
|
|
Ecto **refuses to lazy-load associations**. If you access
|
|
`user.posts` without preloading, you get a `NotLoaded`
|
|
struct — not nil, not an empty list, not a database query.
|
|
|
|
**Why this is an ecosystem decision:**
|
|
- Forces consumers to be explicit about data needs
|
|
- Prevents N+1 queries by making them impossible
|
|
- Makes the data boundary visible in code
|
|
|
|
This is a *consumer-hostile* decision that makes
|
|
*systems built on Ecto* dramatically better. The library
|
|
optimizes for the 1000th user, not the first-day
|
|
experience.
|
|
|
|
### Query Composition (ecosystem-level idiom)
|
|
|
|
Every query clause appends to a list in the Query struct.
|
|
Nothing executes. The Query is pure data that accumulates
|
|
intent.
|
|
|
|
**Consumer impact:** You can build queries across module
|
|
boundaries:
|
|
|
|
```elixir
|
|
# Module A builds the base
|
|
def active_users, do: from(u in User, where: u.active)
|
|
|
|
# Module B adds pagination
|
|
def paginate(query, page, size) do
|
|
query
|
|
|> limit(^size)
|
|
|> offset(^((page - 1) * size))
|
|
end
|
|
|
|
# Module C adds authorization
|
|
def visible_to(query, role) do
|
|
where(query, [u], u.role in ^roles_for(role))
|
|
end
|
|
```
|
|
|
|
Each module is independent. They compose because queries
|
|
are data, not effects.
|
|
|
|
### Adapter Architecture
|
|
|
|
```
|
|
Ecto.Repo.all(query)
|
|
→ Planner resolves types, bindings
|
|
→ Adapter.prepare/2 produces {cache, prepared}
|
|
→ Adapter.execute/5 runs against DB
|
|
→ Adapter.loaders/2 converts back to Elixir types
|
|
```
|
|
|
|
The adapter is the ONLY part that knows SQL. Ecto core
|
|
is database-agnostic. This is why the same code works on
|
|
Postgres, MySQL, SQLite, and custom stores.
|
|
|
|
### Deliberate Absences
|
|
|
|
- **No lazy loading.** `NotLoaded` struct instead.
|
|
- **No global state.** Per-repo config, per-repo process.
|
|
- **No query caching at library level.** The adapter
|
|
caches prepared statements; Ecto doesn't.
|
|
- **No connection to schema naming.** `schema "legacy_tbl"`
|
|
is independent of `defmodule NewUser`.
|
|
|
|
---
|
|
|
|
## Oban: Designing for Testability First
|
|
|
|
### Extension Points
|
|
|
|
Consumers extend Oban through:
|
|
- **Workers** (`perform/1` — the job logic)
|
|
- **Plugins** (GenServer + validate callback)
|
|
- **Engines** (entire backend swap)
|
|
- **Notifiers** (pub/sub mechanism)
|
|
- **Peers** (leader election)
|
|
|
|
### The Worker Result Type (ecosystem-level idiom)
|
|
|
|
```elixir
|
|
@type result ::
|
|
:ok
|
|
| {:ok, ignored :: term()}
|
|
| {:error, reason :: term()}
|
|
| {:cancel, reason :: term()}
|
|
| {:snooze, period :: Period.t()}
|
|
```
|
|
|
|
Five possible outcomes, each with distinct semantics:
|
|
- `:ok` → success, remove from queue
|
|
- `{:error, reason}` → retry (respects max_attempts)
|
|
- `{:cancel, reason}` → permanent failure, don't retry
|
|
- `{:snooze, period}` → reschedule for later
|
|
|
|
**Ecosystem impact:** Every worker author makes an
|
|
explicit decision about failure semantics. "What should
|
|
happen when this fails?" is answered in the type system,
|
|
not in configuration.
|
|
|
|
### Contextual Backoff (ecosystem-level idiom)
|
|
|
|
```elixir
|
|
def backoff(%Job{attempt: attempt, unsaved_error: err}) do
|
|
case err.reason do
|
|
%RateLimitError{retry_after: ms} -> ms
|
|
_ -> trunc(:math.pow(attempt, 4) + jitter())
|
|
end
|
|
end
|
|
```
|
|
|
|
The error that caused the failure is available to the
|
|
backoff calculation. Different errors → different retry
|
|
strategies. This is impossible in systems where backoff
|
|
is configured globally.
|
|
|
|
### Testing Design (ecosystem-level idiom)
|
|
|
|
Three testing modes via config:
|
|
- **`:inline`** — execute jobs synchronously in tests
|
|
- **`:manual`** — enqueue but don't execute
|
|
- **`:disabled`** — production behavior
|
|
|
|
Plus `use Oban.Testing` which provides:
|
|
- `assert_enqueued/1` — verify job was queued
|
|
- `refute_enqueued/1` — verify job was NOT queued
|
|
- `perform_job/2` — execute a job manually in tests
|
|
- `all_enqueued/1` — list all matching jobs
|
|
|
|
**Ecosystem impact:** Every Oban consumer gets
|
|
deterministic, fast, isolated tests for free. No sleep,
|
|
no polling, no flaky async assertions.
|
|
|
|
### Deliberate Absences
|
|
|
|
- **No global process names.** Registry.via everywhere —
|
|
multiple Oban instances can coexist.
|
|
- **No direct DB coupling in workers.** Workers receive a
|
|
Job struct; they don't import Repo.
|
|
- **No implicit retries.** max_attempts is explicit per
|
|
worker. No "retry forever" default.
|
|
- **No built-in rate limiting in OSS.** That is a Pro
|
|
feature — deliberate business boundary.
|
|
|
|
---
|
|
|
|
## Cross-Cutting: What "Idiomatic" Means at Ecosystem Level
|
|
|
|
### 1. The Consumer Contract is the API
|
|
|
|
Not the functions you export — the *experience* of
|
|
building on your system:
|
|
- CockroachDB: "Your errors will be pg-codes, always"
|
|
- Prometheus: "Implement Run(), get discovery for free"
|
|
- Ecto: "Queries are data; loading is always explicit"
|
|
- Oban: "Return a result type; testing is built in"
|
|
|
|
### 2. Deliberate Absences Define Character
|
|
|
|
What a system refuses to do is as important as what it
|
|
does:
|
|
- Ecto refuses lazy loading → forces explicit data needs
|
|
- Oban refuses global names → enables multi-instance
|
|
- Prometheus refuses DI frameworks → keeps simplicity
|
|
- CockroachDB refuses context.Background on hot paths →
|
|
forces timeout discipline
|
|
|
|
### 3. Testability is Never Retrofitted
|
|
|
|
Every system that tests well designed testing in from the
|
|
start:
|
|
- CockroachDB: leak detection, stopper tracking
|
|
- Prometheus: goroutine leak verification, mock interfaces
|
|
- Ecto: adapter abstraction, embedded schemas for testing
|
|
- Oban: engine swap, testing modes, assertion helpers
|
|
|
|
### 4. Extension Points Define the Ecosystem Size
|
|
|
|
- Prometheus: 1 interface, 30 discoverers
|
|
- Ecto: 7 type callbacks, hundreds of custom types
|
|
- Oban: Worker behaviour + 5 engine callbacks
|
|
|
|
**Smaller interface → larger ecosystem.** The less you
|
|
demand from implementors, the more you get.
|
|
|
|
<!-- PATTERN_COMPLETE -->
|