Files
patterns-vs-guidelines/analysis/purity-addendum-archaeology.md
Rodin ea9fa61d1e docs: pattern break archaeology — WHY (not just WHAT) is impure
Git blame + commit messages reveal four categories:
1. Ship behavior, fix plumbing later (time pressure)
2. Better tooling exposed limitations (observability)
3. Removal cost > carrying cost (zero-interest debt)
4. Context needs different patterns (not actually a break)

Specific findings:
- CockroachDB Handle: introduced for profiling, not correctness
- Stale 22.2 TODO: struct field costs 0, removal touches many files
- Prometheus globals: startup constants read in hot path = correct call
- AppenderV2: unifies 4 type-asserted interfaces into 1
- Ecto zero TODOs: version-gated, cleaned on version bumps by José
2026-04-30 11:07:12 -07:00

7.5 KiB

Addendum: The Archaeology — WHY Patterns Break

The previous doc showed WHAT is impure. This digs into WHY — based on git blame, commit messages, and PR discussions.


CockroachDB: The Bare Goroutine in kvadmission

The code:

// TODO(irfansharif): Use a stopper here instead.
go func() {
    ticker := time.NewTicker(10 * time.Minute)
    for { select { case <-ticker.C: ... } }
}()

The history:

  • 2022-08-22: Irfan Sharif introduces the elastic CPU limiter (admission,kvserver: introduce an elastic CPU limiter). This was a complex new system for preventing long-running elastic work (backups) from starving foreground traffic.
  • The goroutine was part of a weight calculation loop. Irfan marked it with a TODO in the same commit.

Why the shortcut: The commit was already touching 20+ files introducing a novel admission control system. Getting the Stopper integration right would have been one more thing to debug in an already complex change. Ship the behavior first, fix the plumbing later.

Why it survived 3+ years: The function receives a stopper as a parameter. It could be fixed in 5 lines. But nobody touched this function for any other reason, and the goroutine leaks nowhere (it lives for the process lifetime). The TODO is correct but not urgent.


CockroachDB: The Handle Pattern's Origin

The code (new way):

ctx, hdl, err := s.GetHandle(ctx, opts)
go func(ctx context.Context) {
    defer hdl.Activate(ctx).Release(ctx)
    taskFn(ctx)
}(ctx)

Why it replaced RunAsyncTask (Feb 2025):

Tobias Grieger's commit message explains it perfectly:

Prior to this change, the only option to launch async tasks was the stopper's RunAsyncTask{,Ex} method. This method would accept a closure and invoke it in a goroutine.

This has the undesirable property of recording a location in Stopper.RunAsyncTaskEx as the async task's creating goroutine plus its lowermost call frame. Anyone who's ever looked at any profile will be familiar with this fact.

In other words, it becomes an ordeal to search for a specific goroutine, since all goroutines started by the Stopper get assigned the same searchable identity.

The force: Profiling. When every goroutine has the same root frame (RunAsyncTaskEx), CPU profiles and execution traces become unreadable. The new API makes callers launch goroutines directly so each has a unique stack identity in profiling tools.

The migration strategy: Old API stays. New code uses Handle. Conversion happens "as follow-up work" (the commit's words). Today: 295 Handle vs 63 RunAsyncTask. The ratio will shift naturally as old code is touched.


CockroachDB: The Stale "Remove in 22.2" TODO

The code:

// TODO(ajwerner): Remove in 22.2.
SystemConfigProvider config.SystemConfigProvider

The history:

  • 2022-02-09: Andrew Werner lands "stop gossiping the system config." The span configuration infrastructure replaces the old gossip-based system config.
  • SystemConfigProvider was the bridge for mixed-version clusters (some nodes on old gossip, some on new spans).
  • The TODO expected this to be removable in 22.2 (the next major release), when all clusters would be past the migration.

Why it survived: The field is in a config struct. It's set but only used in the "mixed-version state." Once all clusters upgraded, it became dead weight — but it doesn't break anything. It's a struct field that consumes one pointer of memory. The cost of carrying it is near zero. The cost of removing it is touching StoreConfig initialization across many call sites.

Lesson: TODOs survive when removal cost > carrying cost. The more a struct is embedded in initialization paths, the harder it is to remove fields.


Prometheus: Why Global Vars Instead of Config

The code:

var ScrapeTimestampTolerance = 2 * time.Millisecond
var AlignScrapeTimestamps = true

The history:

  • 2020-10-07: Julien Pivotto adds "experimental, hidden flag" for scrape timestamp tolerance.
  • 2021-09-08: Made public after production data showed 10ms tolerance saves significant disk space. "For users who can afford a 10ms difference, there is a lot of resources and disk space to win."

Why a global var:

  1. Set once at startup from a command-line flag
  2. Read thousands of times per second (every scrape)
  3. Never changes after startup
  4. Threading a config struct would require adding a parameter to scrapeLoop, scrapeLoopAppender, and every function in the hot path — for a value that's effectively a constant.

When this IS the right call: A startup-configured value that never changes and is read in a hot path is semantically a constant. Making it a config struct field adds indirection for no benefit. The global is honest about what this value IS: a process-lifetime constant that happens to be configurable at init.


Prometheus: WHY AppenderV2

The problem: V1 Appender has separate interfaces for different data types:

  • Append() — float64 samples
  • AppendExemplar() — exemplar data
  • AppendHistogram() — native histograms

Each requires type-assertion to check if the appender supports it. The scrape loop has 4 different code paths depending on which interfaces the underlying storage implements.

V2 unifies them: Single Append() method that accepts all types. No type assertions needed. The scrape loop becomes one code path.

The migration pattern: An adapter interface (scrapeLoopAppendAdapter) makes both V1 and V2 callable through the same code. The selection happens once at construction time. ETA for V1 removal: Q2 2026 (it's in the source comments).


Ecto: Zero TODOs Is Discipline, Not Accident

Evidence from git history:

  • TODOs ARE added. Example: "TODO: Remove changeset.empty_values field" (added 2023-10-01)
  • They include deadlines: "in Ecto v3.14"
  • They get cleaned in dedicated "Update TODOs" commits by José Valim, triggered by version bumps

The pattern: TODOs are conditional on Elixir/Erlang version requirements. When Ecto bumps its minimum supported Elixir version, all TODOs gated on that version become actionable. José runs through them in a single commit.

Oban (same pattern): Parker Selbert removes TODOs within ~2 weeks of adding them. The solo-maintainer model means one person's standards apply uniformly.


The Meta-Lesson: WHY Patterns Break

After archaeology, pattern breaks fall into exactly four categories:

  1. "Ship the behavior, fix the plumbing later" (kvadmission goroutine). Time pressure on complex features. The author KNOWS it's wrong. Tagged with TODO immediately. Survives because removal isn't urgent.

  2. "Better tooling exposed the limitation" (Handle replacing RunAsyncTask). The old pattern worked correctly but was invisible to profiling. The new pattern was motivated by observability, not correctness.

  3. "The removal cost exceeds the carrying cost" (stale 22.2 TODO, global vars). Technically debt, but the interest rate is zero. Nobody will fix what doesn't hurt.

  4. "Different context needs different patterns" (functional options in chunk writer, fmt.Errorf in CLI tools). Not actually a break — it's the right pattern for that specific context.

For Code Review:

When you see a pattern break, ask: which category?

  • Category 1 → worth filing a TODO, not blocking the PR
  • Category 2 → this is how progress happens; track it
  • Category 3 → leave it alone
  • Category 4 → approve it; context matters