# Addendum: The Archaeology — WHY Patterns Break The previous doc showed WHAT is impure. This digs into WHY — based on git blame, commit messages, and PR discussions. --- ## CockroachDB: The Bare Goroutine in kvadmission **The code:** ```go // TODO(irfansharif): Use a stopper here instead. go func() { ticker := time.NewTicker(10 * time.Minute) for { select { case <-ticker.C: ... } } }() ``` **The history:** - **2022-08-22:** Irfan Sharif introduces the elastic CPU limiter (`admission,kvserver: introduce an elastic CPU limiter`). This was a complex new system for preventing long-running elastic work (backups) from starving foreground traffic. - The goroutine was part of a weight calculation loop. Irfan marked it with a TODO *in the same commit*. **Why the shortcut:** The commit was already touching 20+ files introducing a novel admission control system. Getting the Stopper integration right would have been one more thing to debug in an already complex change. Ship the behavior first, fix the plumbing later. **Why it survived 3+ years:** The function *receives* a stopper as a parameter. It could be fixed in 5 lines. But nobody touched this function for any other reason, and the goroutine leaks nowhere (it lives for the process lifetime). The TODO is correct but not urgent. --- ## CockroachDB: The Handle Pattern's Origin **The code (new way):** ```go ctx, hdl, err := s.GetHandle(ctx, opts) go func(ctx context.Context) { defer hdl.Activate(ctx).Release(ctx) taskFn(ctx) }(ctx) ``` **Why it replaced RunAsyncTask (Feb 2025):** Tobias Grieger's commit message explains it perfectly: > Prior to this change, the only option to launch async > tasks was the stopper's `RunAsyncTask{,Ex}` method. > This method would accept a closure and invoke it in a > goroutine. > > This has the undesirable property of recording a > location in `Stopper.RunAsyncTaskEx` as the async > task's creating goroutine plus its lowermost call > frame. Anyone who's ever looked at any profile will be > familiar with this fact. > > In other words, it becomes an ordeal to search for a > specific goroutine, since all goroutines started by the > Stopper get assigned the same searchable identity. **The force:** Profiling. When every goroutine has the same root frame (`RunAsyncTaskEx`), CPU profiles and execution traces become unreadable. The new API makes callers launch goroutines directly so each has a unique stack identity in profiling tools. **The migration strategy:** Old API stays. New code uses Handle. Conversion happens "as follow-up work" (the commit's words). Today: 295 Handle vs 63 RunAsyncTask. The ratio will shift naturally as old code is touched. --- ## CockroachDB: The Stale "Remove in 22.2" TODO **The code:** ```go // TODO(ajwerner): Remove in 22.2. SystemConfigProvider config.SystemConfigProvider ``` **The history:** - **2022-02-09:** Andrew Werner lands "stop gossiping the system config." The span configuration infrastructure replaces the old gossip-based system config. - `SystemConfigProvider` was the bridge for mixed-version clusters (some nodes on old gossip, some on new spans). - The TODO expected this to be removable in 22.2 (the next major release), when all clusters would be past the migration. **Why it survived:** The field is in a config struct. It's set but only used in the "mixed-version state." Once all clusters upgraded, it became dead weight — but it doesn't *break* anything. It's a struct field that consumes one pointer of memory. The cost of carrying it is near zero. The cost of removing it is touching `StoreConfig` initialization across many call sites. **Lesson:** TODOs survive when removal cost > carrying cost. The more a struct is embedded in initialization paths, the harder it is to remove fields. --- ## Prometheus: Why Global Vars Instead of Config **The code:** ```go var ScrapeTimestampTolerance = 2 * time.Millisecond var AlignScrapeTimestamps = true ``` **The history:** - **2020-10-07:** Julien Pivotto adds "experimental, hidden flag" for scrape timestamp tolerance. - **2021-09-08:** Made public after production data showed 10ms tolerance saves significant disk space. "For users who can afford a 10ms difference, there is a lot of resources and disk space to win." **Why a global var:** 1. Set once at startup from a command-line flag 2. Read thousands of times per second (every scrape) 3. Never changes after startup 4. Threading a config struct would require adding a parameter to `scrapeLoop`, `scrapeLoopAppender`, and every function in the hot path — for a value that's effectively a constant. **When this IS the right call:** A startup-configured value that never changes and is read in a hot path is semantically a constant. Making it a config struct field adds indirection for no benefit. The global is honest about what this value IS: a process-lifetime constant that happens to be configurable at init. --- ## Prometheus: WHY AppenderV2 **The problem:** V1 Appender has separate interfaces for different data types: - `Append()` — float64 samples - `AppendExemplar()` — exemplar data - `AppendHistogram()` — native histograms Each requires type-assertion to check if the appender supports it. The scrape loop has 4 different code paths depending on which interfaces the underlying storage implements. **V2 unifies them:** Single `Append()` method that accepts all types. No type assertions needed. The scrape loop becomes one code path. **The migration pattern:** An adapter interface (`scrapeLoopAppendAdapter`) makes both V1 and V2 callable through the same code. The selection happens once at construction time. ETA for V1 removal: Q2 2026 (it's in the source comments). --- ## Ecto: Zero TODOs Is Discipline, Not Accident **Evidence from git history:** - TODOs ARE added. Example: "TODO: Remove changeset.empty_values field" (added 2023-10-01) - They include *deadlines*: "in Ecto v3.14" - They get cleaned in dedicated "Update TODOs" commits by José Valim, triggered by version bumps **The pattern:** TODOs are conditional on Elixir/Erlang version requirements. When Ecto bumps its minimum supported Elixir version, all TODOs gated on that version become actionable. José runs through them in a single commit. **Oban (same pattern):** Parker Selbert removes TODOs within ~2 weeks of adding them. The solo-maintainer model means one person's standards apply uniformly. --- ## The Meta-Lesson: WHY Patterns Break After archaeology, pattern breaks fall into exactly four categories: 1. **"Ship the behavior, fix the plumbing later"** (kvadmission goroutine). Time pressure on complex features. The author KNOWS it's wrong. Tagged with TODO immediately. Survives because removal isn't urgent. 2. **"Better tooling exposed the limitation"** (Handle replacing RunAsyncTask). The old pattern worked correctly but was invisible to profiling. The new pattern was motivated by observability, not correctness. 3. **"The removal cost exceeds the carrying cost"** (stale 22.2 TODO, global vars). Technically debt, but the interest rate is zero. Nobody will fix what doesn't hurt. 4. **"Different context needs different patterns"** (functional options in chunk writer, fmt.Errorf in CLI tools). Not actually a break — it's the right pattern for that specific context. ### For Code Review: When you see a pattern break, ask: *which category?* - Category 1 → worth filing a TODO, not blocking the PR - Category 2 → this is how progress happens; track it - Category 3 → leave it alone - Category 4 → approve it; context matters