docs: PR discussion analysis — how new patterns are debated and landed

Real PR threads from CockroachDB, Prometheus, and Oban showing:
- Handle pattern (PR #142059): 2.5 months, trace span issues discovered in review
- slog migration (PR #14906): 'merge and iterate', 162 files, one commit for 3.0
- AppenderV2 (issue #17632): maintainer pushback defeated by 'it's not user-facing'
- Oban inline testing (PR #684): <24h, zero discussion, solo maintainer

Key insight: 'merge and iterate' beats 'perfect before merge' for
pattern introductions.
Rodin
2026-04-30 11:12:07 -07:00
parent ea9fa61d1e
commit 7e99faff59
@@ -0,0 +1,284 @@
# PR Discussions: How New Patterns Are Debated and Landed
The most revealing artifact of codebase evolution is not
the code — it's the conversation. Here's what the PR
discussions tell us about how patterns actually get
introduced.
---
## CockroachDB PR #142059: The Handle Pattern
**Title:** "stop: prefer launching goroutines at caller"
**Author:** Tobias Grieger (tbg)
**Created:** 2025-02-26, **Merged:** 2025-05-14 (~2.5 months)
**Size:** +314/-84 lines
### The Problem Statement (from PR body)
> This has the undesirable property of recording a
> location in `Stopper.RunAsyncTaskEx` as the async
> task's lowermost call frame. Anyone who's ever looked
> at any profile will be familiar with this fact.
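The motivation wasn't correctness: `RunAsyncTaskEx` was
correct. It was *observability*. Because the stopper
launched every goroutine, every async task showed the
same `Stopper.RunAsyncTaskEx` frame at the bottom of
its stack in profiling tools.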
### The Discussion
**Reviewer (kvoli) asked:** "Noticed there were some
span after use related test failures, do you know
what's going on there?"
**Author (tbg) explained:** "Previously, the stopper
created the trace span on the caller's goroutine, and
would then hand it over to the goroutine it spawned.
There was an ominous comment about 'we can't create the
span on the goroutine itself because the parent may at
that time have finished.' I didn't think our code would
so aggressively enforce that ordering."
**What this reveals:** The new pattern exposed a hidden
assumption in the tracing infrastructure. The old API
hid the span lifecycle problem by accident. The new API
made it visible, requiring explicit handling.
### Migration Strategy (explicit in PR body)
> The "old way" to start a task via `StartAsyncTaskEx`
> remains in place. The hope is that we'll convert all
> important tasks to `GetHandle`. This can be done as
> follow-up work.
**Decision:** Coexistence, not replacement. Zero
migration commitment in the PR. Conversion is "hope"
and "follow-up work." No timeline.
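The coexistence described above can be pictured with a toy stopper. This is a minimal sketch, not CockroachDB's code: `Stopper`, `RunAsync`, `GetHandle`, `Handle`, and `Release` are hypothetical stand-ins built on `sync.WaitGroup`, and the real `stop` package has different signatures. It only shows where the `go` statement lives in each style.

```go
package main

import (
	"fmt"
	"sync"
)

// Stopper is a toy task tracker. It is NOT CockroachDB's stop.Stopper;
// every name in this file is hypothetical.
type Stopper struct {
	wg sync.WaitGroup
}

// RunAsync is the "old" style: the stopper owns the go statement, so every
// task's goroutine starts in a closure of this method.
func (s *Stopper) RunAsync(task func()) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		task()
	}()
}

// Handle is a token for one registered task.
type Handle struct{ s *Stopper }

// GetHandle registers a task without starting a goroutine.
func (s *Stopper) GetHandle() Handle {
	s.wg.Add(1)
	return Handle{s: s}
}

// Release marks the task as finished; call it exactly once per handle.
func (h Handle) Release() { h.s.wg.Done() }

// Wait blocks until every registered task has finished.
func (s *Stopper) Wait() { s.wg.Wait() }

// runBoth starts one task in each style against the same stopper.
func runBoth() [2]string {
	var s Stopper
	var out [2]string

	// Old style: the goroutine is launched inside the stopper.
	s.RunAsync(func() { out[0] = "old" })

	// New style: the caller launches the goroutine, so a closure of
	// runBoth is the goroutine's starting frame in stack dumps.
	h := s.GetHandle()
	go func() {
		defer h.Release()
		out[1] = "new"
	}()

	s.Wait() // both styles are tracked by the same WaitGroup
	return out
}

func main() {
	fmt.Println(runBoth())
}
```

Both entry points feed the same tracker, which is why the old call sites can stay untouched while new ones opt in.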
### Benchmark Evidence (in PR body)
```
AsyncTaskEx-10    3096800    1169 ns/op    72 B/op    3 allocs/op
Handle-10         3165979    1173 ns/op    48 B/op    1 allocs/op
```
Latency is effectively unchanged (1173 vs 1169 ns/op,
about 0.3% apart), while allocations drop from 3 to 1
per op and from 72 to 48 B/op. The PR isn't sold on
performance; it's sold on profiling clarity. The
benchmarks are proof of "at least not worse."
---
## Prometheus PR #14906: slog Migration
**Title:** "chore!: adopt log/slog, remove go-kit/log"
**Author:** TJ Hoplock (tjhop)
**Created:** 2024-09-15, **Merged:** 2024-10-07 (~3 weeks)
**Size:** 162 files, +1534/-1691 lines
### Why It Happened (from linked issue #14355)
Go stdlib added `log/slog` in Go 1.21 (Aug 2023).
Prometheus already depended on go-kit/log, a third-party
structured logging library. The stdlib version offers:
- No external dependency
- Standard interface other libraries adopt
- Better performance
### The Discussion (26 comments)
**Key themes from the thread:**
1. **"Merge and iterate"** — SuperQ (reviewer): "Awesome
work. I think we should merge this asap and iterate
on improvements." Not: "make it perfect first."
2. **Formatting debate** — roidelapluie: "didn't we go
for `ts`?" (about timestamp format). Decision: since
this is a 3.0 breaking change anyway, adopt slog's
`time` key as-is.
3. **Source file paths** — roidelapluie: "I approve this
but I would appreciate if we do not have the full
path to source file." SuperQ: "We haven't been able
to figure out a way to get the relative path."
Unresolved — merged anyway.
4. **Upstream dependencies** — Required changes in
prometheus/common that weren't released yet. Solution:
pin to commit in go.mod, replace after release.
5. **Self-review** — tjhop: "Did a thorough self-review
and see a few other spots to tidy up." The author
caught issues after posting.
6. **Merge conflicts** — tjhop: "Please excuse the
frequent force pushes -- this PR touches a ton of
stuff and I'm trying to stay on top of all the merge
conflicts." Large migrations have coordination costs.
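The timestamp-key and source-path questions in items 2 and 3 both map onto `slog.HandlerOptions` in the standard library. The following is a minimal sketch of that mechanism, not what the Prometheus PR did: `replaceAttr` and `logLine` are hypothetical helper names, and the sketch trims paths to the base file name rather than producing the relative path the reviewers wanted.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"path/filepath"
)

// replaceAttr rewrites slog's built-in attributes before they are rendered.
// (Hypothetical helper; not taken from Prometheus.)
func replaceAttr(groups []string, a slog.Attr) slog.Attr {
	if len(groups) > 0 {
		return a // leave attributes inside groups alone
	}
	switch a.Key {
	case slog.TimeKey:
		// Rename the default "time" key to "ts".
		a.Key = "ts"
	case slog.SourceKey:
		// Keep only the file name, dropping the directory part.
		if src, ok := a.Value.Any().(*slog.Source); ok {
			src.File = filepath.Base(src.File)
		}
	}
	return a
}

// logLine logs one message through a text handler and returns the line.
func logLine(msg string) string {
	var buf bytes.Buffer
	h := slog.NewTextHandler(&buf, &slog.HandlerOptions{
		AddSource:   true,
		ReplaceAttr: replaceAttr,
	})
	slog.New(h).Info(msg, "component", "scrape")
	return buf.String()
}

func main() {
	fmt.Print(logLine("target added"))
}
```

Because both tweaks sit in handler options rather than at call sites, they are the kind of detail that can be adjusted in a follow-up without touching the 162 migrated files again.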
### What This Reveals About Pattern Introduction
- The breaking change was held for a **major version**
(3.0). They didn't try to do it incrementally.
- **162 files in one commit** — total replacement, not
gradual migration. This is the opposite of
CockroachDB's approach.
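- Reviewers chose **"merge and iterate"** over blocking
  on polish: SuperQ asked to merge ASAP, and
  roidelapluie approved with the source-path request
  still open. Ship the migration, fix details in
  follow-ups.
- **Upstream coordination** was required: the PR
  depended on unreleased prometheus/common changes,
  pinned by commit in go.mod until the release.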
---
## Prometheus Issue #17632: AppenderV2 Design
**Title:** "storage: Switch to unified, transactional
AppenderV2 API; remove Appender."
**Author:** bwplotka
**Status:** In progress (M1 complete, iterating)
### Why V2?
The original Appender interface grew organically:
- `Appender.Append()` — float samples
- `ExemplarAppender.AppendExemplar()` — per-series
- `HistogramAppender.AppendHistogram()` — native histograms
- `StartTimestampAppender` — for OTel start timestamps
Each requires a type assertion. The scrape loop has
multiple code paths depending on which interfaces the
storage supports.
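The type-assertion fan-out can be shown with toy interfaces. This is a sketch under loud assumptions: `Sample`, `Appender`, `HistogramAppender`, `appendV1`, and `floatOnly` are simplified stand-ins, and the real Prometheus interfaces take more arguments (series references, labels, timestamps) than shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// Sample is a toy sample; a non-nil Histogram marks a histogram sample.
type Sample struct {
	Series    string
	Value     float64
	Histogram []float64
}

// Appender is a simplified base interface: float samples only.
type Appender interface {
	Append(series string, v float64) error
}

// HistogramAppender is a simplified optional extension interface.
type HistogramAppender interface {
	AppendHistogram(series string, buckets []float64) error
}

// appendV1 shows the caller-side branching: each optional capability
// needs its own type assertion and its own failure path.
func appendV1(app Appender, s Sample) error {
	if s.Histogram == nil {
		return app.Append(s.Series, s.Value)
	}
	ha, ok := app.(HistogramAppender)
	if !ok {
		return errors.New("storage does not support histograms")
	}
	return ha.AppendHistogram(s.Series, s.Histogram)
}

// floatOnly implements Appender but not HistogramAppender.
type floatOnly struct{ n int }

func (f *floatOnly) Append(string, float64) error {
	f.n++
	return nil
}

func main() {
	app := &floatOnly{}
	fmt.Println(appendV1(app, Sample{Series: "up", Value: 1}))
	fmt.Println(appendV1(app, Sample{Series: "latency", Histogram: []float64{1, 2}}))
}
```

With four extension interfaces instead of one, the caller carries four such branches, which is the growth the issue describes.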
### The Debate
**roidelapluie (maintainer) pushed back:**
> "I would like to see scrape and prometheus continuing
> to support v1. The changes in some areas like scrape
> seem really high. We could move everything to v2 in
> Prometheus v4."
**bwplotka (author) challenged:**
> "I'd like to challenge this. It's not a breaking
> change. It's not even a user facing change. So why
> waiting for v4?"
**Resolution:** Author won. V2 is being adopted now
(target: remove V1 by Q2 2026) rather than waiting for
a hypothetical v4.
**External validation:**
> "We got amazing feedback from OTel collector side —
> AppenderV2 is helpful."
The OTel collector team confirmed the unified interface
solves real problems for consumers.
### Implementation Strategy
The issue tracks a detailed milestone plan:
- M1: Add interface + TSDB impl (done in parts 1-5)
- Adapter layer for coexistence
- Switch each consumer one at a time
- Understand impact on downstream (Thanos, Mimir, Cortex)
**This is a 12-PR migration** — each PR switches one
consumer. The interface was introduced first, then
consumers migrated one by one.
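An adapter layer is what lets consumers switch one at a time. The sketch below assumes a direction (wrapping unmigrated V1 storage so a migrated V2 consumer can use it); the issue summary above does not say which way the real adapter goes, and `AppenderV1`, `AppenderV2`, `v1Adapter`, `memV1`, and `scrape` are hypothetical, simplified names.

```go
package main

import (
	"errors"
	"fmt"
)

// Sample is a toy sample; a non-nil Histogram marks a histogram sample.
type Sample struct {
	Series    string
	Value     float64
	Histogram []float64
}

// AppenderV1 and HistogramAppenderV1 are simplified old-style interfaces.
type AppenderV1 interface {
	Append(series string, v float64) error
}

type HistogramAppenderV1 interface {
	AppendHistogram(series string, buckets []float64) error
}

// AppenderV2 is a simplified unified interface: one call for every kind.
type AppenderV2 interface {
	Append(s Sample) error
}

// v1Adapter lets unmigrated V1 storage serve V2 callers. The type
// assertion moves out of every consumer and into this one place.
type v1Adapter struct{ v1 AppenderV1 }

func (a v1Adapter) Append(s Sample) error {
	if s.Histogram == nil {
		return a.v1.Append(s.Series, s.Value)
	}
	ha, ok := a.v1.(HistogramAppenderV1)
	if !ok {
		return errors.New("wrapped storage does not support histograms")
	}
	return ha.AppendHistogram(s.Series, s.Histogram)
}

// memV1 is an unmigrated, float-only storage.
type memV1 struct{ floats int }

func (m *memV1) Append(string, float64) error {
	m.floats++
	return nil
}

// scrape is a consumer already switched to V2; it returns the number
// of samples that failed to append.
func scrape(app AppenderV2, samples []Sample) int {
	failed := 0
	for _, s := range samples {
		if err := app.Append(s); err != nil {
			failed++
		}
	}
	return failed
}

func main() {
	store := &memV1{}
	failed := scrape(v1Adapter{v1: store}, []Sample{
		{Series: "up", Value: 1},
		{Series: "latency", Histogram: []float64{1, 2}},
	})
	fmt.Println(store.floats, failed)
}
```

Each consumer PR then only changes which interface it accepts; the adapter absorbs the mismatch until the storage side migrates too.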
---
## Oban PR #684: Inline Testing Mode
**Title:** "Inline testing mode"
**Author:** Parker Selbert (sorentwo)
**Created:** 2022-04-11, **Merged:** 2022-04-12 (< 24 hours)
**Size:** (not available — private discussion)
**Comments/Reviews:** 0
### What This Reveals
Parker Selbert is the solo maintainer. The PR was
merged in less than 24 hours with zero discussion. This
isn't poor process — it's the reality of
solo-maintained open source:
- The design decision happened in his head (or in the
linked issue discussion)
- No external review needed because he owns the whole
system
- The commit message is the entire design document
### The Feature
> Inline testing immediately executes jobs as they are
> inserted within the current process and without ever
> touching the database. It's an alternative to the
> standard insert/verify/execute flow for testing.
For tests that use it, this mode removes a common
source of flakiness in job tests: no polling, no sleep,
no async assertions.
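Oban is an Elixir library, so the following is only an analogy in Go, not Oban's API: `Job`, `Queue`, `inlineQueue`, and `signup` are hypothetical names. It sketches the idea quoted above, a queue implementation that runs each job inside the insert call so a test can assert on the result right away.

```go
package main

import "fmt"

// Job is a unit of background work.
type Job func() error

// Queue is what application code depends on to enqueue work.
type Queue interface {
	Insert(j Job) error
}

// inlineQueue runs each job synchronously inside Insert: no storage,
// no background worker, nothing to poll for.
type inlineQueue struct{}

func (inlineQueue) Insert(j Job) error { return j() }

// signup is application code under test; it enqueues a welcome email.
// The sent slice stands in for a real mailer.
func signup(q Queue, email string, sent *[]string) error {
	return q.Insert(func() error {
		*sent = append(*sent, email)
		return nil
	})
}

func main() {
	var sent []string
	err := signup(inlineQueue{}, "a@example.com", &sent)
	// The job has already run by the time Insert returns.
	fmt.Println(err, sent)
}
```

The trade-off is that an inline queue exercises the job's logic but not the persistence or scheduling path, which is why it is described as an alternative to the insert/verify/execute flow.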
---
## Cross-Cutting Lessons from PR Discussions
### 1. "Merge and iterate" > "Perfect before merge"
Both the slog migration and the Handle pattern were
merged with known gaps. The slog PR merged with full
source file paths still in the logs (unresolved). The
Handle PR merged with the old API still in place and
conversion left to follow-up work, after review had
already surfaced the trace span issues.
**Lesson for review:** Don't block a pattern introduction
on polish. Block on correctness. Accept imperfect-but-
correct as mergeable.
### 2. Breaking changes need a major version OR gradual migration
- Prometheus: slog migration in one commit, held for 3.0
- CockroachDB: Handle API coexists with RunAsyncTask,
no major version needed
- Prometheus AppenderV2: NOT a user-facing change, so no
need to wait for v4
**Lesson:** The decision to batch vs. gradual depends on
whether it's user-facing. Internal refactors can ship
incrementally. User-facing API changes need version
boundaries.
### 3. External consumers validate pattern decisions
AppenderV2 was validated by the OTel collector team.
The Handle pattern was backed by benchmarks showing no regression.
The slog migration was validated by "the stdlib does it."
**Lesson:** Pattern introductions that cite external
evidence (benchmarks, user feedback, ecosystem
alignment) get approved faster.
### 4. Large migrations require coordination
The slog PR depended on prometheus/common changes that
had not been released yet. AppenderV2 is a 12-PR
sequence. Large changes need dependency ordering.
**Lesson for review:** When reviewing the first PR in a
migration, ask about the full plan. Is this standalone?
What comes next? What depends on this?
### 5. Solo maintainers decide faster
Oban's inline testing: conceived, implemented, merged
in <24 hours. CockroachDB's Handle pattern: 2.5 months.
Prometheus slog: 3 weeks. Prometheus AppenderV2: multi-
month ongoing.
**Lesson:** In these four cases, team size tracks
decision latency, not decision quality. Parker's inline
testing mode was the right call, arrived fast, and has
been stable for 3+ years.