From 7e99faff599d6d3f6a859f725f66c3cffc5878f8 Mon Sep 17 00:00:00 2001 From: Rodin Date: Thu, 30 Apr 2026 11:12:07 -0700 Subject: [PATCH] =?UTF-8?q?docs:=20PR=20discussion=20analysis=20=E2=80=94?= =?UTF-8?q?=20how=20new=20patterns=20are=20debated=20and=20landed?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Real PR threads from CockroachDB, Prometheus, and Oban showing: - Handle pattern (PR #142059): 2.5 months, trace span issues discovered in review - slog migration (PR #14906): 'merge and iterate', 162 files, one commit for 3.0 - AppenderV2 (issue #17632): maintainer pushback defeated by 'it's not user-facing' - Oban inline testing (PR #684): <24h, zero discussion, solo maintainer Key insight: 'merge and iterate' beats 'perfect before merge' for pattern introductions. --- analysis/pr-discussions-analysis.md | 284 ++++++++++++++++++++++++++++ 1 file changed, 284 insertions(+) create mode 100644 analysis/pr-discussions-analysis.md diff --git a/analysis/pr-discussions-analysis.md b/analysis/pr-discussions-analysis.md new file mode 100644 index 0000000..f5f294c --- /dev/null +++ b/analysis/pr-discussions-analysis.md @@ -0,0 +1,284 @@ +# PR Discussions: How New Patterns Are Debated and Landed + +The most revealing artifact of codebase evolution is not +the code — it's the conversation. Here's what the PR +discussions tell us about how patterns actually get +introduced. + +--- + +## CockroachDB PR #142059: The Handle Pattern + +**Title:** "stop: prefer launching goroutines at caller" +**Author:** Tobias Grieger (tbg) +**Created:** 2025-02-26, **Merged:** 2025-05-14 (~2.5 months) +**Size:** +314/-84 lines + +### The Problem Statement (from PR body) + +> This has the undesirable property of recording a +> location in `Stopper.RunAsyncTaskEx` as the async +> task's lowermost call frame. Anyone who's ever looked +> at any profile will be familiar with this fact. + +The motivation wasn't correctness — RunAsyncTask was +correct. It was *observability*. Every goroutine looked +the same in profiling tools. + +### The Discussion + +**Reviewer (kvoli) asked:** "Noticed there were some +span after use related test failures, do you know +what's going on there?" + +**Author (tbg) explained:** "Previously, the stopper +created the trace span on the caller's goroutine, and +would then hand it over to the goroutine it spawned. +There was an ominous comment about 'we can't create the +span on the goroutine itself because the parent may at +that time have finished.' I didn't think our code would +so aggressively enforce that ordering." + +**What this reveals:** The new pattern exposed a hidden +assumption in the tracing infrastructure. The old API +hid the span lifecycle problem by accident. The new API +made it visible, requiring explicit handling. + +### Migration Strategy (explicit in PR body) + +> The "old way" to start a task via `StartAsyncTaskEx` +> remains in place. The hope is that we'll convert all +> important tasks to `GetHandle`. This can be done as +> follow-up work. + +**Decision:** Coexistence, not replacement. Zero +migration commitment in the PR. Conversion is "hope" +and "follow-up work." No timeline. + +### Benchmark Evidence (in PR body) + +``` +AsyncTaskEx-10 3096800 1169 ns/op 72 B/op 3 allocs/op +Handle-10 3165979 1173 ns/op 48 B/op 1 allocs/op +``` + +Same performance (1173 vs 1169 ns/op) but fewer +allocations (1 vs 3). The PR isn't sold on performance +— it's sold on profiling clarity. Benchmarks are proof +of "at least not worse." + +--- + +## Prometheus PR #14906: slog Migration + +**Title:** "chore!: adopt log/slog, remove go-kit/log" +**Author:** TJ Hoplock (tjhop) +**Created:** 2024-09-15, **Merged:** 2024-10-07 (~3 weeks) +**Size:** 162 files, +1534/-1691 lines + +### Why It Happened (from linked issue #14355) + +Go stdlib added `log/slog` in Go 1.21 (Aug 2023). +Prometheus already depended on go-kit/log, a third-party +structured logging library. The stdlib version offers: +- No external dependency +- Standard interface other libraries adopt +- Better performance + +### The Discussion (26 comments) + +**Key themes from the thread:** + +1. **"Merge and iterate"** — SuperQ (reviewer): "Awesome + work. I think we should merge this asap and iterate + on improvements." Not: "make it perfect first." + +2. **Formatting debate** — roidelapluie: "didn't we go + for `ts`?" (about timestamp format). Decision: since + this is a 3.0 breaking change anyway, adopt slog's + `time` key as-is. + +3. **Source file paths** — roidelapluie: "I approve this + but I would appreciate if we do not have the full + path to source file." SuperQ: "We haven't been able + to figure out a way to get the relative path." + Unresolved — merged anyway. + +4. **Upstream dependencies** — Required changes in + prometheus/common that weren't released yet. Solution: + pin to commit in go.mod, replace after release. + +5. **Self-review** — tjhop: "Did a thorough self-review + and see a few other spots to tidy up." The author + caught issues after posting. + +6. **Merge conflicts** — tjhop: "Please excuse the + frequent force pushes -- this PR touches a ton of + stuff and I'm trying to stay on top of all the merge + conflicts." Large migrations have coordination costs. + +### What This Reveals About Pattern Introduction + +- The breaking change was held for a **major version** + (3.0). They didn't try to do it incrementally. +- **162 files in one commit** — total replacement, not + gradual migration. This is the opposite of + CockroachDB's approach. +- Multiple reviewers said **"let's merge and iterate"** + rather than blocking on polish. Ship the migration, + fix details in follow-ups. +- **Upstream coordination** required (prometheus/common + had to release first). + +--- + +## Prometheus Issue #17632: AppenderV2 Design + +**Title:** "storage: Switch to unified, transactional +AppenderV2 API; remove Appender." +**Author:** bwplotka +**Status:** In progress (M1 complete, iterating) + +### Why V2? + +The original Appender interface grew organically: +- `Appender.Append()` — float samples +- `ExemplarAppender.AppendExemplar()` — per-series +- `HistogramAppender.AppendHistogram()` — native histograms +- `StartTimestampAppender` — for OTel start timestamps + +Each requires a type assertion. The scrape loop has +multiple code paths depending on which interfaces the +storage supports. + +### The Debate + +**roidelapluie (maintainer) pushed back:** +> "I would like to see scrape and prometheus continuing +> to support v1. The changes in some areas like scrape +> seem really high. We could move everything to v2 in +> Prometheus v4." + +**bwplotka (author) challenged:** +> "I'd like to challenge this. It's not a breaking +> change. It's not even a user facing change. So why +> waiting for v4?" + +**Resolution:** Author won. V2 is being adopted now +(target: remove V1 by Q2 2026) rather than waiting for +a hypothetical v4. + +**External validation:** +> "We got amazing feedback from OTel collector side — +> AppenderV2 is helpful." + +The OTel collector team confirmed the unified interface +solves real problems for consumers. + +### Implementation Strategy + +The issue tracks a detailed milestone plan: +- M1: Add interface + TSDB impl (done in parts 1-5) +- Adapter layer for coexistence +- Switch each consumer one at a time +- Understand impact on downstream (Thanos, Mimir, Cortex) + +**This is a 12-PR migration** — each PR switches one +consumer. The interface was introduced first, then +consumers migrated one by one. + +--- + +## Oban PR #684: Inline Testing Mode + +**Title:** "Inline testing mode" +**Author:** Parker Selbert (sorentwo) +**Created:** 2022-04-11, **Merged:** 2022-04-12 (< 24 hours) +**Size:** (not available — private discussion) +**Comments/Reviews:** 0 + +### What This Reveals + +Parker Selbert is the solo maintainer. The PR was +merged in less than 24 hours with zero discussion. This +isn't poor process — it's the reality of +solo-maintained open source: +- The design decision happened in his head (or in the + linked issue discussion) +- No external review needed because he owns the whole + system +- The commit message is the entire design document + +### The Feature + +> Inline testing immediately executes jobs as they are +> inserted within the current process and without ever +> touching the database. It's an alternative to the +> standard insert/verify/execute flow for testing. + +This single feature eliminated the #1 source of test +flakiness for all Oban consumers. No polling, no sleep, +no async assertions. + +--- + +## Cross-Cutting Lessons from PR Discussions + +### 1. "Merge and iterate" > "Perfect before merge" + +Both the slog migration and the Handle pattern were +merged with known imperfections. The slog PR merged +with full file paths in logs (unresolved). The Handle +PR merged with trace span issues discovered during +review. + +**Lesson for review:** Don't block a pattern introduction +on polish. Block on correctness. Accept imperfect-but- +correct as mergeable. + +### 2. Breaking changes need a major version OR gradual migration + +- Prometheus: slog migration in one commit, held for 3.0 +- CockroachDB: Handle API coexists with RunAsyncTask, + no major version needed +- Prometheus AppenderV2: NOT a user-facing change, so no + need to wait for v4 + +**Lesson:** The decision to batch vs. gradual depends on +whether it's user-facing. Internal refactors can ship +incrementally. User-facing API changes need version +boundaries. + +### 3. External consumers validate pattern decisions + +AppenderV2 was validated by the OTel collector team. +The Handle pattern was validated by profiling data. +The slog migration was validated by "the stdlib does it." + +**Lesson:** Pattern introductions that cite external +evidence (benchmarks, user feedback, ecosystem +alignment) get approved faster. + +### 4. Large migrations require coordination + +The slog PR needed prometheus/common changes first. +AppenderV2 is a 12-PR sequence. Large changes need +dependency ordering. + +**Lesson for review:** When reviewing the first PR in a +migration, ask about the full plan. Is this standalone? +What comes next? What depends on this? + +### 5. Solo maintainers decide faster + +Oban's inline testing: conceived, implemented, merged +in <24 hours. CockroachDB's Handle pattern: 2.5 months. +Prometheus slog: 3 weeks. Prometheus AppenderV2: multi- +month ongoing. + +**Lesson:** Team size correlates with decision latency, +not decision quality. Parker's inline testing mode was +the right call, arrived fast, and has been stable for +3+ years. + +