docs: pattern break archaeology — WHY (not just WHAT) is impure

Git blame + commit messages reveal four categories:
1. Ship behavior, fix plumbing later (time pressure)
2. Better tooling exposed limitations (observability)
3. Removal cost > carrying cost (zero-interest debt)
4. Context needs different patterns (not actually a break)

Specific findings:
- CockroachDB Handle: introduced for profiling, not correctness
- Stale 22.2 TODO: struct field costs 0, removal touches many files
- Prometheus globals: startup constants read in hot path = correct call
- AppenderV2: unifies 4 type-asserted interfaces into 1
- Ecto zero TODOs: version-gated, cleaned on version bumps by José
This commit is contained in:
Rodin
2026-04-30 11:07:12 -07:00
parent 9961f939c5
commit ea9fa61d1e
+234
View File
@@ -0,0 +1,234 @@
# Addendum: The Archaeology — WHY Patterns Break
The previous doc showed WHAT is impure. This digs into
WHY — based on git blame, commit messages, and PR
discussions.
---
## CockroachDB: The Bare Goroutine in kvadmission
**The code:**
```go
// TODO(irfansharif): Use a stopper here instead.
go func() {
ticker := time.NewTicker(10 * time.Minute)
for { select { case <-ticker.C: ... } }
}()
```
**The history:**
- **2022-08-22:** Irfan Sharif introduces the elastic CPU
limiter (`admission,kvserver: introduce an elastic CPU
limiter`). This was a complex new system for preventing
long-running elastic work (backups) from starving
foreground traffic.
- The goroutine was part of a weight calculation loop.
Irfan marked it with a TODO *in the same commit*.
**Why the shortcut:** The commit was already touching
20+ files introducing a novel admission control system.
Getting the Stopper integration right would have been
one more thing to debug in an already complex change.
Ship the behavior first, fix the plumbing later.
**Why it survived 3+ years:** The function *receives* a
stopper as a parameter. It could be fixed in 5 lines.
But nobody touched this function for any other reason,
and the goroutine leaks nowhere (it lives for the
process lifetime). The TODO is correct but not urgent.
---
## CockroachDB: The Handle Pattern's Origin
**The code (new way):**
```go
ctx, hdl, err := s.GetHandle(ctx, opts)
go func(ctx context.Context) {
defer hdl.Activate(ctx).Release(ctx)
taskFn(ctx)
}(ctx)
```
**Why it replaced RunAsyncTask (Feb 2025):**
Tobias Grieger's commit message explains it perfectly:
> Prior to this change, the only option to launch async
> tasks was the stopper's `RunAsyncTask{,Ex}` method.
> This method would accept a closure and invoke it in a
> goroutine.
>
> This has the undesirable property of recording a
> location in `Stopper.RunAsyncTaskEx` as the async
> task's creating goroutine plus its lowermost call
> frame. Anyone who's ever looked at any profile will be
> familiar with this fact.
>
> In other words, it becomes an ordeal to search for a
> specific goroutine, since all goroutines started by the
> Stopper get assigned the same searchable identity.
**The force:** Profiling. When every goroutine has the
same root frame (`RunAsyncTaskEx`), CPU profiles and
execution traces become unreadable. The new API makes
callers launch goroutines directly so each has a unique
stack identity in profiling tools.
**The migration strategy:** Old API stays. New code uses
Handle. Conversion happens "as follow-up work" (the
commit's words). Today: 295 Handle vs 63 RunAsyncTask.
The ratio will shift naturally as old code is touched.
---
## CockroachDB: The Stale "Remove in 22.2" TODO
**The code:**
```go
// TODO(ajwerner): Remove in 22.2.
SystemConfigProvider config.SystemConfigProvider
```
**The history:**
- **2022-02-09:** Andrew Werner lands "stop gossiping the
system config." The span configuration infrastructure
replaces the old gossip-based system config.
- `SystemConfigProvider` was the bridge for mixed-version
clusters (some nodes on old gossip, some on new spans).
- The TODO expected this to be removable in 22.2 (the
next major release), when all clusters would be past
the migration.
**Why it survived:** The field is in a config struct.
It's set but only used in the "mixed-version state."
Once all clusters upgraded, it became dead weight — but
it doesn't *break* anything. It's a struct field that
consumes one pointer of memory. The cost of carrying it
is near zero. The cost of removing it is touching
`StoreConfig` initialization across many call sites.
**Lesson:** TODOs survive when removal cost > carrying
cost. The more a struct is embedded in initialization
paths, the harder it is to remove fields.
---
## Prometheus: Why Global Vars Instead of Config
**The code:**
```go
var ScrapeTimestampTolerance = 2 * time.Millisecond
var AlignScrapeTimestamps = true
```
**The history:**
- **2020-10-07:** Julien Pivotto adds "experimental,
hidden flag" for scrape timestamp tolerance.
- **2021-09-08:** Made public after production data
showed 10ms tolerance saves significant disk space.
"For users who can afford a 10ms difference, there is
a lot of resources and disk space to win."
**Why a global var:**
1. Set once at startup from a command-line flag
2. Read thousands of times per second (every scrape)
3. Never changes after startup
4. Threading a config struct would require adding a
parameter to `scrapeLoop`, `scrapeLoopAppender`, and
every function in the hot path — for a value that's
effectively a constant.
**When this IS the right call:** A startup-configured
value that never changes and is read in a hot path is
semantically a constant. Making it a config struct field
adds indirection for no benefit. The global is honest
about what this value IS: a process-lifetime constant
that happens to be configurable at init.
---
## Prometheus: WHY AppenderV2
**The problem:** V1 Appender has separate interfaces for
different data types:
- `Append()` — float64 samples
- `AppendExemplar()` — exemplar data
- `AppendHistogram()` — native histograms
Each requires type-assertion to check if the appender
supports it. The scrape loop has 4 different code paths
depending on which interfaces the underlying storage
implements.
**V2 unifies them:** Single `Append()` method that
accepts all types. No type assertions needed. The
scrape loop becomes one code path.
**The migration pattern:** An adapter interface
(`scrapeLoopAppendAdapter`) makes both V1 and V2
callable through the same code. The selection happens
once at construction time. ETA for V1 removal: Q2 2026
(it's in the source comments).
---
## Ecto: Zero TODOs Is Discipline, Not Accident
**Evidence from git history:**
- TODOs ARE added. Example: "TODO: Remove
changeset.empty_values field" (added 2023-10-01)
- They include *deadlines*: "in Ecto v3.14"
- They get cleaned in dedicated "Update TODOs" commits
by José Valim, triggered by version bumps
**The pattern:** TODOs are conditional on Elixir/Erlang
version requirements. When Ecto bumps its minimum
supported Elixir version, all TODOs gated on that
version become actionable. José runs through them in a
single commit.
**Oban (same pattern):** Parker Selbert removes TODOs
within ~2 weeks of adding them. The solo-maintainer
model means one person's standards apply uniformly.
---
## The Meta-Lesson: WHY Patterns Break
After archaeology, pattern breaks fall into exactly
four categories:
1. **"Ship the behavior, fix the plumbing later"**
(kvadmission goroutine). Time pressure on complex
features. The author KNOWS it's wrong. Tagged with
TODO immediately. Survives because removal isn't
urgent.
2. **"Better tooling exposed the limitation"**
(Handle replacing RunAsyncTask). The old pattern
worked correctly but was invisible to profiling. The
new pattern was motivated by observability, not
correctness.
3. **"The removal cost exceeds the carrying cost"**
(stale 22.2 TODO, global vars). Technically debt, but
the interest rate is zero. Nobody will fix what
doesn't hurt.
4. **"Different context needs different patterns"**
(functional options in chunk writer, fmt.Errorf in CLI
tools). Not actually a break — it's the right pattern
for that specific context.
### For Code Review:
When you see a pattern break, ask: *which category?*
- Category 1 → worth filing a TODO, not blocking the PR
- Category 2 → this is how progress happens; track it
- Category 3 → leave it alone
- Category 4 → approve it; context matters
<!-- PATTERN_COMPLETE -->