patterns-vs-guidelines/analysis/testing-evolution-analysis.md
Rodin 76f4bcc33e docs: architectural analysis of top repos (CockroachDB, Prometheus, Ecto, Oban)
Four documents examining codebases at module and ecosystem levels:
- architectural-analysis.md — internal structure, dependency flow
- ecosystem-analysis.md — consumer extension points, deliberate absences
- crosscutting-analysis.md — logging, config, retry, lifecycle
- testing-evolution-analysis.md — proof models, API evolution strategies
2026-04-30 10:50:54 -07:00


Testing Philosophy & API Evolution

How codebases prove correctness and manage change over time reveals their deepest architectural commitments.


Testing Philosophy: Four Models of Proof

CockroachDB: Defense in Depth

Levels of proof:

  1. Unit tests: co-located in the same package
  2. Echotest/golden files: snapshot expected output (209 testdata directories, auto-rewrite with the -rewrite flag)
  3. Data-driven tests: declarative test specs in .txt files
  4. KVNemesis: chaos/fuzz testing that generates random KV operations and checks that the observed results are serializable
  5. Leak detection: goroutines and stoppers tracked globally

The echotest pattern:

echotest.Require(t, output, filepath.Join("testdata", name+".txt"))

Golden file says:

echo
----
result is ambiguous: boom with a secret
result is ambiguous: boom with a secret

The test produces output and compares it against the golden file; running with -rewrite regenerates the file instead of comparing. This means:

  • Tests are self-documenting (the golden file IS the spec)
  • Regressions are visible in diffs (the golden file changes)
  • No manual expected-value maintenance

KVNemesis (chaos testing at the system level): generates random sequences of KV operations (puts, gets, splits, merges, transfers) against a real cluster, then validates that the observed results are consistent with serializable isolation.

This isn't unit testing. It checks the correctness of the system as a whole, not of individual functions.
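KVNemesis runs concurrent traffic against a real cluster and checks the resulting history for serializability, which is far beyond a short example. The sketch below is a deliberately simplified, single-threaded cousin of the idea: a seeded random sequence of puts, gets, and deletes is applied to a store under test, and every read is checked against a reference model. `KV`, `mapKV`, `forgetfulKV`, and `runNemesis` are hypothetical names, and this checks plain read-your-writes consistency, not serializability.

```go
package main

import (
	"fmt"
	"math/rand"
)

// KV is the hypothetical system under test.
type KV interface {
	Put(k, v string)
	Get(k string) (string, bool)
	Delete(k string)
}

// mapKV is a trivially correct implementation.
type mapKV map[string]string

func (m mapKV) Put(k, v string) { m[k] = v }

func (m mapKV) Get(k string) (string, bool) {
	v, ok := m[k]
	return v, ok
}

func (m mapKV) Delete(k string) { delete(m, k) }

// forgetfulKV is deliberately buggy: Delete is a no-op.
type forgetfulKV struct{ mapKV }

func (forgetfulKV) Delete(string) {}

// runNemesis applies n random operations to kv and compares every Get with a
// reference model. The seed makes any failure replayable.
func runNemesis(kv KV, seed int64, n int) error {
	rng := rand.New(rand.NewSource(seed))
	model := map[string]string{}
	keys := []string{"a", "b", "c"} // few keys, so operations collide often
	for i := 0; i < n; i++ {
		k := keys[rng.Intn(len(keys))]
		switch rng.Intn(3) {
		case 0:
			v := fmt.Sprintf("v%d", i)
			kv.Put(k, v)
			model[k] = v
		case 1:
			kv.Delete(k)
			delete(model, k)
		default:
			got, gotOK := kv.Get(k)
			want, wantOK := model[k]
			if got != want || gotOK != wantOK {
				return fmt.Errorf("seed %d op %d: Get(%q) = (%q, %v), want (%q, %v)",
					seed, i, k, got, gotOK, want, wantOK)
			}
		}
	}
	return nil
}
```

The correct store passes any seed; the buggy one is caught as soon as a read follows a delete of a previously written key.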

Resource leak detection as CI gate:

// At the top of every test function
defer leaktest.AfterTest(t)()

// Registered once in an init func (in the stop package), so leaked Stoppers are reported too
func init() {
    leaktest.PrintLeakedStoppers = PrintLeakedStoppers
}

If a test leaks a goroutine or Stopper, it fails. Not a warning. A failure. This means resource correctness is as enforceable as logic correctness.

Prometheus: Golden Files + Goroutine Verification

Testing DSL for PromQL:

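load 5m
  http_requests{job="api-server", instance="0", group="production"} 0+10x10
  http_requests{job="api-server", instance="1", group="production"} 0+20x10
  http_requests{job="api-server", instance="0", group="canary"} 0+30x10
  http_requests{job="api-server", instance="1", group="canary"} 0+40x10

eval instant at 50m SUM BY (group) (http_requests)
  {group="canary"} 700
  {group="production"} 300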

This is a custom test language: load data, evaluate expressions, assert results. Fixture-driven testing runs through the rest of the repository too; config/testdata/ alone holds 205 test config files.

Force: PromQL is complex enough that hand-writing every case as Go test code would not scale. The DSL lets you write hundreds of test cases concisely, covering edge cases that would otherwise require dozens of Go test functions.
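Part of that concision is the series notation in `load` blocks: `a+bxn` expands to the n+1 samples a, a+b, ..., a+n*b, one per load interval, so `0+10x10` is eleven samples. A hypothetical Go helper (`expandSeries`, not Prometheus's parser) covering only this form:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandSeries expands the "a+bxn" shorthand into a, a+b, ..., a+n*b.
// Sketch only: the real test language supports more forms than this.
func expandSeries(s string) ([]float64, error) {
	plus := strings.Index(s, "+")
	x := strings.LastIndex(s, "x")
	if plus <= 0 || x < plus {
		return nil, fmt.Errorf("unsupported series %q", s)
	}
	start, err := strconv.ParseFloat(s[:plus], 64)
	if err != nil {
		return nil, err
	}
	step, err := strconv.ParseFloat(s[plus+1:x], 64)
	if err != nil {
		return nil, err
	}
	n, err := strconv.Atoi(s[x+1:])
	if err != nil {
		return nil, err
	}
	out := make([]float64, 0, n+1)
	for i := 0; i <= n; i++ {
		out = append(out, start+float64(i)*step)
	}
	return out, nil
}
```

With `load 5m`, the eleventh sample lands at 50m, which is why `eval instant at 50m` sees each series at its final value.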

Goroutine leak detection:

func TolerantVerifyLeak(m *testing.M) {
    goleak.VerifyTestMain(m,
        goleak.IgnoreTopFunction("go.opencensus.io/..."),
        goleak.IgnoreTopFunction("k8s.io/klog/..."),
    )
}

Explicit allowlist for known third-party leaks. Everything else is a test failure. Zero-tolerance with escape hatches for unfixable external dependencies.
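The allowlist idea can be sketched by parsing a full goroutine dump and ignoring goroutines whose top frame is on the list, matching fully qualified function names exactly. `leakedTops` and `currentDump` are hypothetical helpers working on the text format produced by `runtime.Stack(buf, true)`; goleak's own parsing and options are more thorough.

```go
package main

import (
	"runtime"
	"strings"
)

// leakedTops returns the top-of-stack function of every goroutine in dump,
// skipping the first goroutine (the caller) and any top function in ignore.
func leakedTops(dump string, ignore []string) []string {
	skip := map[string]bool{}
	for _, fn := range ignore {
		skip[fn] = true
	}
	var leaked []string
	// Goroutines in a dump are separated by blank lines.
	for i, block := range strings.Split(strings.TrimSpace(dump), "\n\n") {
		lines := strings.Split(block, "\n")
		if i == 0 || len(lines) < 2 {
			continue // runtime.Stack lists the calling goroutine first
		}
		top := lines[1]
		if paren := strings.LastIndex(top, "("); paren > 0 {
			top = top[:paren] // drop the argument list, e.g. "(0xc000010000)"
		}
		if !skip[top] {
			leaked = append(leaked, top)
		}
	}
	return leaked
}

// currentDump captures all goroutine stacks, growing the buffer until they fit.
func currentDump() string {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}
```

Anything left after filtering is a failure; the allowlist stays short and explicit, so every exemption is visible in review.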

Ecto: Fake Adapter + Process Mailbox Assertions

defmodule Ecto.TestAdapter do
  @behaviour Ecto.Adapter
  @behaviour Ecto.Adapter.Queryable
  @behaviour Ecto.Adapter.Schema
  @behaviour Ecto.Adapter.Transaction

  def execute(_, _, {:nocache, {:all, query}}, _, _) do
    send(self(), {:all, query})
    Process.get(:test_repo_all_results) || results_for_all_query(query)
  end
end

Ecto tests the entire query pipeline without a database. The fake adapter:

  • Sends a message to self() (the calling test process) on every operation
  • Tests assert on the received messages ({:insert, meta} etc.)
  • No network, no shared state, pure message-passing verification

48 test files, 43 with async: true. Those tests run in parallel because there's no shared state: every test talks to its own process mailbox.

Force: Ecto is a library, not a service. It can't require Postgres in CI for every contributor. The fake adapter makes the entire query compilation + planning pipeline testable without external dependencies.
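The same shape translates to Go: library code depends on a small adapter interface, and tests substitute a fake that records each call and returns canned results, so assertions inspect recorded calls instead of database state. Everything here (`Adapter`, `UserRepo`, `fakeAdapter`, `call`) is hypothetical, and a per-test slice stands in for Ecto's per-process mailbox.

```go
package main

import "errors"

// Adapter is the hypothetical storage boundary the library depends on.
type Adapter interface {
	All(query string) ([]string, error)
	Insert(table string, row map[string]any) error
}

// UserRepo is the code under test: it builds queries but never opens a connection.
type UserRepo struct{ db Adapter }

func (r UserRepo) ActiveNames() ([]string, error) {
	return r.db.All("SELECT name FROM users WHERE active = true")
}

func (r UserRepo) Create(name string) error {
	if name == "" {
		return errors.New("name required") // validated before touching the adapter
	}
	return r.db.Insert("users", map[string]any{"name": name, "active": true})
}

// call is one recorded adapter operation, the analogue of {:all, query}.
type call struct {
	Op, Arg string
}

// fakeAdapter records calls and returns canned results. Each test builds its
// own instance, so tests share no state and can run in parallel.
type fakeAdapter struct {
	calls      []call
	allResults []string
}

func (f *fakeAdapter) All(query string) ([]string, error) {
	f.calls = append(f.calls, call{"all", query})
	return f.allResults, nil
}

func (f *fakeAdapter) Insert(table string, row map[string]any) error {
	f.calls = append(f.calls, call{"insert", table})
	return nil
}
```

A test then asserts both on the return values and on `fake.calls`, which is the Go equivalent of asserting on received messages.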

Oban: Testing Modes as First-Class Feature

# In test config (:manual, so jobs are recorded and assert_enqueued can find them)
config :my_app, Oban, testing: :manual

# In test
use Oban.Testing, repo: MyApp.Repo

test "job was enqueued" do
  assert_enqueued worker: MyWorker, args: %{id: 1}
end

test "job executes correctly" do
  assert :ok = perform_job(MyWorker, %{id: 1})
end

Three modes:

  • :inline — jobs execute synchronously in the test process. No GenServers, no queues, no async.
  • :manual — jobs are enqueued but not executed. Use assert_enqueued to verify they were created.
  • :disabled — production behavior in tests.

Force: Background jobs are a common source of test flakiness. Oban removes it by making the execution model configurable, so tests never need to poll, sleep, or race a background worker.
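A Go sketch of the same idea: a job queue whose execution model is chosen by configuration. The `Mode` values mirror Oban's `:inline`, `:manual`, and `:disabled`, but `Queue`, `Enqueue`, and `Enqueued` are hypothetical names, and the `Disabled` path here just starts a goroutine where a real system would persist the job for background workers.

```go
package main

import (
	"fmt"
	"sync"
)

type Mode int

const (
	Disabled Mode = iota // normal asynchronous execution
	Inline               // run synchronously in the caller
	Manual               // record only; tests assert on what was enqueued
)

type Job struct {
	Worker string
	Args   map[string]int
}

type Queue struct {
	mode    Mode
	workers map[string]func(Job) error
	mu      sync.Mutex
	pending []Job
	wg      sync.WaitGroup
}

func NewQueue(mode Mode, workers map[string]func(Job) error) *Queue {
	return &Queue{mode: mode, workers: workers}
}

// Enqueue dispatches the job according to the configured mode.
func (q *Queue) Enqueue(j Job) error {
	run, ok := q.workers[j.Worker]
	if !ok {
		return fmt.Errorf("unknown worker %q", j.Worker)
	}
	switch q.mode {
	case Inline:
		return run(j) // the job's effects are visible before Enqueue returns
	case Manual:
		q.mu.Lock()
		defer q.mu.Unlock()
		q.pending = append(q.pending, j)
		return nil
	default:
		q.wg.Add(1)
		go func() {
			defer q.wg.Done()
			_ = run(j) // a real system would record, retry, or log the error
		}()
		return nil
	}
}

// Wait blocks until asynchronously started jobs finish (Disabled mode).
func (q *Queue) Wait() { q.wg.Wait() }

// Enqueued reports whether a job for worker with the given "id" arg was recorded.
func (q *Queue) Enqueued(worker string, id int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, j := range q.pending {
		if j.Worker == worker && j.Args["id"] == id {
			return true
		}
	}
	return false
}
```

Inline mode makes "did the job work?" a synchronous assertion, and Manual mode makes "was the job created?" a lookup; neither involves timing.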


API Evolution: Three Strategies

CockroachDB: Version Gates (Distributed Migration)

const (
    V26_2_AddStatementStatisticsComputedColumns Key = iota
    V26_2_ChangefeedsStopReadingSpanLevelCheckpoints
    V26_2_ChangefeedsStopWritingSpanLevelCheckpoints
)

// In code:
if settings.Version.IsActive(ctx, clusterversion.V26_2_ChangefeedsStopReadingSpanLevelCheckpoints) {
    // use new behavior
}

The pattern: every change to observable behavior gets a version constant. The new behavior is enabled only once the cluster version reaches that constant, which requires every node in the cluster to be running a binary that knows about it.
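The activation rule can be sketched with integer keys. This is a deliberate simplification, assuming the effective cluster version is just the newest key that every node's binary understands; CockroachDB itself finalizes the cluster version through a separate upgrade process. `Key`, `Cluster`, and `IsActive` here are hypothetical stand-ins, not the clusterversion package.

```go
package main

// Key is a version gate; later changes get larger keys, like the iota-based constants above.
type Key int

const (
	AddStatementStatisticsComputedColumns Key = iota + 1
	ChangefeedsStopReadingSpanLevelCheckpoints
	ChangefeedsStopWritingSpanLevelCheckpoints
)

// Cluster tracks the newest key each node's binary understands.
type Cluster struct {
	nodeBinary map[string]Key
}

// activeVersion is the newest key that every node understands.
func (c *Cluster) activeVersion() Key {
	first := true
	var v Key
	for _, k := range c.nodeBinary {
		if first || k < v {
			v, first = k, false
		}
	}
	return v
}

// IsActive reports whether behavior gated on k may be used anywhere in the cluster.
func (c *Cluster) IsActive(k Key) bool {
	return len(c.nodeBinary) > 0 && c.activeVersion() >= k
}
```

During a rolling upgrade a single old node holds every newer gate closed, which is exactly the property that makes mixed-version clusters safe.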

Phased deprecation for distributed changes:

V26_2_ChangefeedsStopReadingSpanLevelCheckpoints
V26_2_ChangefeedsStopWritingSpanLevelCheckpoints
V26_2_ChangefeedsNoLongerHaveSpanLevelCheckpoints

Three versions for one removal:

  1. Stop reading (new code doesn't depend on old format)
  2. Stop writing (old format no longer produced)
  3. Clean up (safe to remove the old code)
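Why "stop reading" comes before "stop writing" can be shown with a small hypothetical sketch, using plain integers for the cluster's active version: between the first two gates, nodes keep writing the old format for peers that may still read it, but no longer read it themselves. The constant and function names are illustrative, not CockroachDB code.

```go
package main

// Hypothetical gate ordering for removing one data format.
const (
	StopReading = iota + 1 // new code no longer depends on the old format
	StopWriting            // old format no longer produced
	Cleanup                // old code paths can be deleted
)

// formats reports what a node does with the old format when the
// cluster's active version is v.
func formats(v int) (readsOld, writesOld bool) {
	readsOld = v < StopReading
	writesOld = v < StopWriting
	return readsOld, writesOld
}
```

Because StopReading precedes StopWriting, there is no version at which some node reads a format that nobody writes anymore; swapping the two gates would create exactly that window.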

Force: In a distributed database, you can't change behavior atomically. Some nodes will be old, some new. The version gate ensures new behavior only activates when it's safe — when all nodes understand it.

Pruning: once MinSupported advances past a version constant, the constant is deleted. Every cluster the binary can join already has that version active, so the IsActive check always returns true and becomes dead code. Regular pruning keeps gates from accumulating in the codebase.

Oban: Numbered Migrations (Schema Evolution)

lib/oban/migrations/postgres/
├── v01.ex  # Initial schema (job table, state enum)
├── v02.ex  # Add columns
├── v03.ex  # Index optimization
...
└── v14.ex  # Latest

Each migration is:

  • Idempotent (safe to run twice)
  • Prefix-aware (multi-tenant schemas)
  • Bidirectional (up + down)
  • Database-specific (postgres/, sqlite/, myxql/)

Consumer usage:

defmodule MyApp.Repo.Migrations.AddOban do
  use Ecto.Migration
  def up, do: Oban.Migrations.up(version: 14)
  # version: 1 rolls back every Oban migration, not just the v14 step
  def down, do: Oban.Migrations.down(version: 1)
end

Force: Oban owns a database table but lives inside the consumer's migration system. Numbered versions let consumers upgrade incrementally without knowing Oban internals.
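The bookkeeping that makes numbered migrations safe to rerun can be sketched in Go: record the current version, run only the steps between it and the target, and roll back newest-first. `Migrator` is hypothetical and keeps its state in memory instead of the database; here `Down(to)` rolls back every applied step from the current version down to and including `to`.

```go
package main

import "fmt"

// Migrator tracks which numbered migrations have been applied.
type Migrator struct {
	current int      // highest applied version; 0 means none
	applied []string // log of executed steps, for illustration
}

// Up runs every migration in (current, to]; calling it again is a no-op.
func (m *Migrator) Up(to int) {
	for v := m.current + 1; v <= to; v++ {
		m.applied = append(m.applied, fmt.Sprintf("up v%02d", v))
		m.current = v
	}
}

// Down rolls back every applied migration in [to, current], newest first.
func (m *Migrator) Down(to int) {
	for v := m.current; v >= to && v > 0; v-- {
		m.applied = append(m.applied, fmt.Sprintf("down v%02d", v))
		m.current = v - 1
	}
}
```

A consumer who last migrated to v11 and now calls `Up(14)` runs only v12 through v14, which is what lets upgrades proceed incrementally without knowledge of the schema internals.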

Ecto: Compile-Time Deprecation + Semver

# In changeset.ex
IO.warn(
  "passing a list of binaries to cast/3 is deprecated..."
)

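Ecto deprecates by warning rather than breaking. Functions marked with Elixir's @deprecated attribute are reported by the compiler wherever calling code uses them; an IO.warn call in a function body, like the one above, fires at runtime when the deprecated path is taken. Either way, the old behavior keeps working.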
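Go's closest equivalent is a convention rather than a compiler feature: a doc-comment paragraph beginning `Deprecated:` is recognized by tools such as gopls, staticcheck, and pkg.go.dev, which flag callers while the old function keeps working. A hypothetical sketch pairing that with a one-time runtime warning in the spirit of `IO.warn` (`CastFields` and `CastKeys` are invented names):

```go
package main

import (
	"log"
	"sync"
)

// CastKeys copies the permitted keys from params.
func CastKeys(params map[string]string, permitted []string) map[string]string {
	out := make(map[string]string, len(permitted))
	for _, k := range permitted {
		if v, ok := params[k]; ok {
			out[k] = v
		}
	}
	return out
}

var warnCastFields sync.Once

// CastFields is the old name for CastKeys.
//
// Deprecated: use CastKeys instead.
func CastFields(params map[string]string, permitted []string) map[string]string {
	// Warn once per process so logs are not flooded; behavior is unchanged.
	warnCastFields.Do(func() {
		log.Println("CastFields is deprecated; use CastKeys")
	})
	return CastKeys(params, permitted)
}
```

The doc comment surfaces the deprecation in editors and linters before code runs; the runtime warning catches callers that never look at either.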

CHANGELOG as contract:

## v3.14.0-dev
### Enhancements
### Bug fixes

## v3.13.5 (2025-11-09)
### Enhancements

The changelog is the API evolution document. Breaking changes require a major version bump, which hasn't happened in years because the adapter pattern provides extensibility without breakage.


What This Teaches for Code Review

Testing Questions:

  1. Is this testable without standing up the system? (Ecto's fake adapter, Oban's inline engine)
  2. Are resources tracked and leak-detected? (CockroachDB's stopper/goroutine tracking)
  3. Are test assertions deterministic? No sleep, no poll, no "eventually consistent" in unit tests.
  4. Could this be a golden file test? If the output is deterministic, snapshot it. Regression = visible diff.
  5. Is there chaos/property testing for invariants? (KVNemesis for serializability)

Evolution Questions:

  1. Can this change be deployed gradually? Or does it require all consumers to upgrade atomically?
  2. Is there a phased path? (Stop reading → stop writing → remove)
  3. Is the deprecation visible at compile time? Or will consumers only discover it at runtime?
  4. Is the migration idempotent? Can it be run twice safely?

Red Flags:

  • Tests that require a running database for unit-level logic
  • No resource leak detection in concurrent code
  • time.Sleep / Process.sleep in tests instead of deterministic signals
  • Breaking changes without version gates or migration path
  • Deprecation that only appears in docs, not in tooling