Git blame + commit messages reveal four categories: 1. Ship behavior, fix plumbing later (time pressure) 2. Better tooling exposed limitations (observability) 3. Removal cost > carrying cost (zero-interest debt) 4. Context needs different patterns (not actually a break) Specific findings: - CockroachDB Handle: introduced for profiling, not correctness - Stale 22.2 TODO: struct field costs 0, removal touches many files - Prometheus globals: startup constants read in hot path = correct call - AppenderV2: unifies 4 type-asserted interfaces into 1 - Ecto zero TODOs: version-gated, cleaned on version bumps by José
7.5 KiB
Addendum: The Archaeology — WHY Patterns Break
The previous doc showed WHAT is impure. This digs into WHY — based on git blame, commit messages, and PR discussions.
CockroachDB: The Bare Goroutine in kvadmission
The code:
// TODO(irfansharif): Use a stopper here instead.
go func() {
ticker := time.NewTicker(10 * time.Minute)
for { select { case <-ticker.C: ... } }
}()
The history:
- 2022-08-22: Irfan Sharif introduces the elastic CPU
limiter (
admission,kvserver: introduce an elastic CPU limiter). This was a complex new system for preventing long-running elastic work (backups) from starving foreground traffic. - The goroutine was part of a weight calculation loop. Irfan marked it with a TODO in the same commit.
Why the shortcut: The commit was already touching 20+ files introducing a novel admission control system. Getting the Stopper integration right would have been one more thing to debug in an already complex change. Ship the behavior first, fix the plumbing later.
Why it survived 3+ years: The function receives a stopper as a parameter. It could be fixed in 5 lines. But nobody touched this function for any other reason, and the goroutine leaks nowhere (it lives for the process lifetime). The TODO is correct but not urgent.
CockroachDB: The Handle Pattern's Origin
The code (new way):
ctx, hdl, err := s.GetHandle(ctx, opts)
go func(ctx context.Context) {
defer hdl.Activate(ctx).Release(ctx)
taskFn(ctx)
}(ctx)
Why it replaced RunAsyncTask (Feb 2025):
Tobias Grieger's commit message explains it perfectly:
Prior to this change, the only option to launch async tasks was the stopper's
RunAsyncTask{,Ex}method. This method would accept a closure and invoke it in a goroutine.This has the undesirable property of recording a location in
Stopper.RunAsyncTaskExas the async task's creating goroutine plus its lowermost call frame. Anyone who's ever looked at any profile will be familiar with this fact.In other words, it becomes an ordeal to search for a specific goroutine, since all goroutines started by the Stopper get assigned the same searchable identity.
The force: Profiling. When every goroutine has the
same root frame (RunAsyncTaskEx), CPU profiles and
execution traces become unreadable. The new API makes
callers launch goroutines directly so each has a unique
stack identity in profiling tools.
The migration strategy: Old API stays. New code uses Handle. Conversion happens "as follow-up work" (the commit's words). Today: 295 Handle vs 63 RunAsyncTask. The ratio will shift naturally as old code is touched.
CockroachDB: The Stale "Remove in 22.2" TODO
The code:
// TODO(ajwerner): Remove in 22.2.
SystemConfigProvider config.SystemConfigProvider
The history:
- 2022-02-09: Andrew Werner lands "stop gossiping the system config." The span configuration infrastructure replaces the old gossip-based system config.
SystemConfigProviderwas the bridge for mixed-version clusters (some nodes on old gossip, some on new spans).- The TODO expected this to be removable in 22.2 (the next major release), when all clusters would be past the migration.
Why it survived: The field is in a config struct.
It's set but only used in the "mixed-version state."
Once all clusters upgraded, it became dead weight — but
it doesn't break anything. It's a struct field that
consumes one pointer of memory. The cost of carrying it
is near zero. The cost of removing it is touching
StoreConfig initialization across many call sites.
Lesson: TODOs survive when removal cost > carrying cost. The more a struct is embedded in initialization paths, the harder it is to remove fields.
Prometheus: Why Global Vars Instead of Config
The code:
var ScrapeTimestampTolerance = 2 * time.Millisecond
var AlignScrapeTimestamps = true
The history:
- 2020-10-07: Julien Pivotto adds "experimental, hidden flag" for scrape timestamp tolerance.
- 2021-09-08: Made public after production data showed 10ms tolerance saves significant disk space. "For users who can afford a 10ms difference, there is a lot of resources and disk space to win."
Why a global var:
- Set once at startup from a command-line flag
- Read thousands of times per second (every scrape)
- Never changes after startup
- Threading a config struct would require adding a
parameter to
scrapeLoop,scrapeLoopAppender, and every function in the hot path — for a value that's effectively a constant.
When this IS the right call: A startup-configured value that never changes and is read in a hot path is semantically a constant. Making it a config struct field adds indirection for no benefit. The global is honest about what this value IS: a process-lifetime constant that happens to be configurable at init.
Prometheus: WHY AppenderV2
The problem: V1 Appender has separate interfaces for different data types:
Append()— float64 samplesAppendExemplar()— exemplar dataAppendHistogram()— native histograms
Each requires type-assertion to check if the appender supports it. The scrape loop has 4 different code paths depending on which interfaces the underlying storage implements.
V2 unifies them: Single Append() method that
accepts all types. No type assertions needed. The
scrape loop becomes one code path.
The migration pattern: An adapter interface
(scrapeLoopAppendAdapter) makes both V1 and V2
callable through the same code. The selection happens
once at construction time. ETA for V1 removal: Q2 2026
(it's in the source comments).
Ecto: Zero TODOs Is Discipline, Not Accident
Evidence from git history:
- TODOs ARE added. Example: "TODO: Remove changeset.empty_values field" (added 2023-10-01)
- They include deadlines: "in Ecto v3.14"
- They get cleaned in dedicated "Update TODOs" commits by José Valim, triggered by version bumps
The pattern: TODOs are conditional on Elixir/Erlang version requirements. When Ecto bumps its minimum supported Elixir version, all TODOs gated on that version become actionable. José runs through them in a single commit.
Oban (same pattern): Parker Selbert removes TODOs within ~2 weeks of adding them. The solo-maintainer model means one person's standards apply uniformly.
The Meta-Lesson: WHY Patterns Break
After archaeology, pattern breaks fall into exactly four categories:
-
"Ship the behavior, fix the plumbing later" (kvadmission goroutine). Time pressure on complex features. The author KNOWS it's wrong. Tagged with TODO immediately. Survives because removal isn't urgent.
-
"Better tooling exposed the limitation" (Handle replacing RunAsyncTask). The old pattern worked correctly but was invisible to profiling. The new pattern was motivated by observability, not correctness.
-
"The removal cost exceeds the carrying cost" (stale 22.2 TODO, global vars). Technically debt, but the interest rate is zero. Nobody will fix what doesn't hurt.
-
"Different context needs different patterns" (functional options in chunk writer, fmt.Errorf in CLI tools). Not actually a break — it's the right pattern for that specific context.
For Code Review:
When you see a pattern break, ask: which category?
- Category 1 → worth filing a TODO, not blocking the PR
- Category 2 → this is how progress happens; track it
- Category 3 → leave it alone
- Category 4 → approve it; context matters