From ea9fa61d1e56bb6c74cc388eb759d695614a95d7 Mon Sep 17 00:00:00 2001 From: Rodin Date: Thu, 30 Apr 2026 11:07:12 -0700 Subject: [PATCH] =?UTF-8?q?docs:=20pattern=20break=20archaeology=20?= =?UTF-8?q?=E2=80=94=20WHY=20(not=20just=20WHAT)=20is=20impure?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Git blame + commit messages reveal four categories: 1. Ship behavior, fix plumbing later (time pressure) 2. Better tooling exposed limitations (observability) 3. Removal cost > carrying cost (zero-interest debt) 4. Context needs different patterns (not actually a break) Specific findings: - CockroachDB Handle: introduced for profiling, not correctness - Stale 22.2 TODO: struct field costs 0, removal touches many files - Prometheus globals: startup constants read in hot path = correct call - AppenderV2: unifies 4 type-asserted interfaces into 1 - Ecto zero TODOs: version-gated, cleaned on version bumps by José --- analysis/purity-addendum-archaeology.md | 234 ++++++++++++++++++++++++ 1 file changed, 234 insertions(+) create mode 100644 analysis/purity-addendum-archaeology.md diff --git a/analysis/purity-addendum-archaeology.md b/analysis/purity-addendum-archaeology.md new file mode 100644 index 0000000..9320baa --- /dev/null +++ b/analysis/purity-addendum-archaeology.md @@ -0,0 +1,234 @@ +# Addendum: The Archaeology — WHY Patterns Break + +The previous doc showed WHAT is impure. This digs into +WHY — based on git blame, commit messages, and PR +discussions. + +--- + +## CockroachDB: The Bare Goroutine in kvadmission + +**The code:** +```go +// TODO(irfansharif): Use a stopper here instead. +go func() { + ticker := time.NewTicker(10 * time.Minute) + for { select { case <-ticker.C: ... } } +}() +``` + +**The history:** +- **2022-08-22:** Irfan Sharif introduces the elastic CPU + limiter (`admission,kvserver: introduce an elastic CPU + limiter`). This was a complex new system for preventing + long-running elastic work (backups) from starving + foreground traffic. +- The goroutine was part of a weight calculation loop. + Irfan marked it with a TODO *in the same commit*. + +**Why the shortcut:** The commit was already touching +20+ files introducing a novel admission control system. +Getting the Stopper integration right would have been +one more thing to debug in an already complex change. +Ship the behavior first, fix the plumbing later. + +**Why it survived 3+ years:** The function *receives* a +stopper as a parameter. It could be fixed in 5 lines. +But nobody touched this function for any other reason, +and the goroutine leaks nowhere (it lives for the +process lifetime). The TODO is correct but not urgent. + +--- + +## CockroachDB: The Handle Pattern's Origin + +**The code (new way):** +```go +ctx, hdl, err := s.GetHandle(ctx, opts) +go func(ctx context.Context) { + defer hdl.Activate(ctx).Release(ctx) + taskFn(ctx) +}(ctx) +``` + +**Why it replaced RunAsyncTask (Feb 2025):** + +Tobias Grieger's commit message explains it perfectly: + +> Prior to this change, the only option to launch async +> tasks was the stopper's `RunAsyncTask{,Ex}` method. +> This method would accept a closure and invoke it in a +> goroutine. +> +> This has the undesirable property of recording a +> location in `Stopper.RunAsyncTaskEx` as the async +> task's creating goroutine plus its lowermost call +> frame. Anyone who's ever looked at any profile will be +> familiar with this fact. +> +> In other words, it becomes an ordeal to search for a +> specific goroutine, since all goroutines started by the +> Stopper get assigned the same searchable identity. + +**The force:** Profiling. When every goroutine has the +same root frame (`RunAsyncTaskEx`), CPU profiles and +execution traces become unreadable. The new API makes +callers launch goroutines directly so each has a unique +stack identity in profiling tools. + +**The migration strategy:** Old API stays. New code uses +Handle. Conversion happens "as follow-up work" (the +commit's words). Today: 295 Handle vs 63 RunAsyncTask. +The ratio will shift naturally as old code is touched. + +--- + +## CockroachDB: The Stale "Remove in 22.2" TODO + +**The code:** +```go +// TODO(ajwerner): Remove in 22.2. +SystemConfigProvider config.SystemConfigProvider +``` + +**The history:** +- **2022-02-09:** Andrew Werner lands "stop gossiping the + system config." The span configuration infrastructure + replaces the old gossip-based system config. +- `SystemConfigProvider` was the bridge for mixed-version + clusters (some nodes on old gossip, some on new spans). +- The TODO expected this to be removable in 22.2 (the + next major release), when all clusters would be past + the migration. + +**Why it survived:** The field is in a config struct. +It's set but only used in the "mixed-version state." +Once all clusters upgraded, it became dead weight — but +it doesn't *break* anything. It's a struct field that +consumes one pointer of memory. The cost of carrying it +is near zero. The cost of removing it is touching +`StoreConfig` initialization across many call sites. + +**Lesson:** TODOs survive when removal cost > carrying +cost. The more a struct is embedded in initialization +paths, the harder it is to remove fields. + +--- + +## Prometheus: Why Global Vars Instead of Config + +**The code:** +```go +var ScrapeTimestampTolerance = 2 * time.Millisecond +var AlignScrapeTimestamps = true +``` + +**The history:** +- **2020-10-07:** Julien Pivotto adds "experimental, + hidden flag" for scrape timestamp tolerance. +- **2021-09-08:** Made public after production data + showed 10ms tolerance saves significant disk space. + "For users who can afford a 10ms difference, there is + a lot of resources and disk space to win." + +**Why a global var:** +1. Set once at startup from a command-line flag +2. Read thousands of times per second (every scrape) +3. Never changes after startup +4. Threading a config struct would require adding a + parameter to `scrapeLoop`, `scrapeLoopAppender`, and + every function in the hot path — for a value that's + effectively a constant. + +**When this IS the right call:** A startup-configured +value that never changes and is read in a hot path is +semantically a constant. Making it a config struct field +adds indirection for no benefit. The global is honest +about what this value IS: a process-lifetime constant +that happens to be configurable at init. + +--- + +## Prometheus: WHY AppenderV2 + +**The problem:** V1 Appender has separate interfaces for +different data types: +- `Append()` — float64 samples +- `AppendExemplar()` — exemplar data +- `AppendHistogram()` — native histograms + +Each requires type-assertion to check if the appender +supports it. The scrape loop has 4 different code paths +depending on which interfaces the underlying storage +implements. + +**V2 unifies them:** Single `Append()` method that +accepts all types. No type assertions needed. The +scrape loop becomes one code path. + +**The migration pattern:** An adapter interface +(`scrapeLoopAppendAdapter`) makes both V1 and V2 +callable through the same code. The selection happens +once at construction time. ETA for V1 removal: Q2 2026 +(it's in the source comments). + +--- + +## Ecto: Zero TODOs Is Discipline, Not Accident + +**Evidence from git history:** +- TODOs ARE added. Example: "TODO: Remove + changeset.empty_values field" (added 2023-10-01) +- They include *deadlines*: "in Ecto v3.14" +- They get cleaned in dedicated "Update TODOs" commits + by José Valim, triggered by version bumps + +**The pattern:** TODOs are conditional on Elixir/Erlang +version requirements. When Ecto bumps its minimum +supported Elixir version, all TODOs gated on that +version become actionable. José runs through them in a +single commit. + +**Oban (same pattern):** Parker Selbert removes TODOs +within ~2 weeks of adding them. The solo-maintainer +model means one person's standards apply uniformly. + +--- + +## The Meta-Lesson: WHY Patterns Break + +After archaeology, pattern breaks fall into exactly +four categories: + +1. **"Ship the behavior, fix the plumbing later"** + (kvadmission goroutine). Time pressure on complex + features. The author KNOWS it's wrong. Tagged with + TODO immediately. Survives because removal isn't + urgent. + +2. **"Better tooling exposed the limitation"** + (Handle replacing RunAsyncTask). The old pattern + worked correctly but was invisible to profiling. The + new pattern was motivated by observability, not + correctness. + +3. **"The removal cost exceeds the carrying cost"** + (stale 22.2 TODO, global vars). Technically debt, but + the interest rate is zero. Nobody will fix what + doesn't hurt. + +4. **"Different context needs different patterns"** + (functional options in chunk writer, fmt.Errorf in CLI + tools). Not actually a break — it's the right pattern + for that specific context. + +### For Code Review: + +When you see a pattern break, ask: *which category?* + +- Category 1 → worth filing a TODO, not blocking the PR +- Category 2 → this is how progress happens; track it +- Category 3 → leave it alone +- Category 4 → approve it; context matters + +