docs: PR discussion analysis — how new patterns are debated and landed
Real PR threads from CockroachDB, Prometheus, and Oban showing: - Handle pattern (PR #142059): 2.5 months, trace span issues discovered in review - slog migration (PR #14906): 'merge and iterate', 162 files, one commit for 3.0 - AppenderV2 (issue #17632): maintainer pushback defeated by 'it's not user-facing' - Oban inline testing (PR #684): <24h, zero discussion, solo maintainer Key insight: 'merge and iterate' beats 'perfect before merge' for pattern introductions.
This commit is contained in:
@@ -0,0 +1,284 @@
|
||||
# PR Discussions: How New Patterns Are Debated and Landed
|
||||
|
||||
The most revealing artifact of codebase evolution is not
|
||||
the code — it's the conversation. Here's what the PR
|
||||
discussions tell us about how patterns actually get
|
||||
introduced.
|
||||
|
||||
---
|
||||
|
||||
## CockroachDB PR #142059: The Handle Pattern
|
||||
|
||||
**Title:** "stop: prefer launching goroutines at caller"
|
||||
**Author:** Tobias Grieger (tbg)
|
||||
**Created:** 2025-02-26, **Merged:** 2025-05-14 (~2.5 months)
|
||||
**Size:** +314/-84 lines
|
||||
|
||||
### The Problem Statement (from PR body)
|
||||
|
||||
> This has the undesirable property of recording a
|
||||
> location in `Stopper.RunAsyncTaskEx` as the async
|
||||
> task's lowermost call frame. Anyone who's ever looked
|
||||
> at any profile will be familiar with this fact.
|
||||
|
||||
The motivation wasn't correctness — RunAsyncTask was
|
||||
correct. It was *observability*. Every goroutine looked
|
||||
the same in profiling tools.
|
||||
|
||||
### The Discussion
|
||||
|
||||
**Reviewer (kvoli) asked:** "Noticed there were some
|
||||
span after use related test failures, do you know
|
||||
what's going on there?"
|
||||
|
||||
**Author (tbg) explained:** "Previously, the stopper
|
||||
created the trace span on the caller's goroutine, and
|
||||
would then hand it over to the goroutine it spawned.
|
||||
There was an ominous comment about 'we can't create the
|
||||
span on the goroutine itself because the parent may at
|
||||
that time have finished.' I didn't think our code would
|
||||
so aggressively enforce that ordering."
|
||||
|
||||
**What this reveals:** The new pattern exposed a hidden
|
||||
assumption in the tracing infrastructure. The old API
|
||||
hid the span lifecycle problem by accident. The new API
|
||||
made it visible, requiring explicit handling.
|
||||
|
||||
### Migration Strategy (explicit in PR body)
|
||||
|
||||
> The "old way" to start a task via `StartAsyncTaskEx`
|
||||
> remains in place. The hope is that we'll convert all
|
||||
> important tasks to `GetHandle`. This can be done as
|
||||
> follow-up work.
|
||||
|
||||
**Decision:** Coexistence, not replacement. Zero
|
||||
migration commitment in the PR. Conversion is "hope"
|
||||
and "follow-up work." No timeline.
|
||||
|
||||
### Benchmark Evidence (in PR body)
|
||||
|
||||
```
|
||||
AsyncTaskEx-10 3096800 1169 ns/op 72 B/op 3 allocs/op
|
||||
Handle-10 3165979 1173 ns/op 48 B/op 1 allocs/op
|
||||
```
|
||||
|
||||
Same performance (1173 vs 1169 ns/op) but fewer
|
||||
allocations (1 vs 3). The PR isn't sold on performance
|
||||
— it's sold on profiling clarity. Benchmarks are proof
|
||||
of "at least not worse."
|
||||
|
||||
---
|
||||
|
||||
## Prometheus PR #14906: slog Migration
|
||||
|
||||
**Title:** "chore!: adopt log/slog, remove go-kit/log"
|
||||
**Author:** TJ Hoplock (tjhop)
|
||||
**Created:** 2024-09-15, **Merged:** 2024-10-07 (~3 weeks)
|
||||
**Size:** 162 files, +1534/-1691 lines
|
||||
|
||||
### Why It Happened (from linked issue #14355)
|
||||
|
||||
Go stdlib added `log/slog` in Go 1.21 (Aug 2023).
|
||||
Prometheus already depended on go-kit/log, a third-party
|
||||
structured logging library. The stdlib version offers:
|
||||
- No external dependency
|
||||
- Standard interface other libraries adopt
|
||||
- Better performance
|
||||
|
||||
### The Discussion (26 comments)
|
||||
|
||||
**Key themes from the thread:**
|
||||
|
||||
1. **"Merge and iterate"** — SuperQ (reviewer): "Awesome
|
||||
work. I think we should merge this asap and iterate
|
||||
on improvements." Not: "make it perfect first."
|
||||
|
||||
2. **Formatting debate** — roidelapluie: "didn't we go
|
||||
for `ts`?" (about timestamp format). Decision: since
|
||||
this is a 3.0 breaking change anyway, adopt slog's
|
||||
`time` key as-is.
|
||||
|
||||
3. **Source file paths** — roidelapluie: "I approve this
|
||||
but I would appreciate if we do not have the full
|
||||
path to source file." SuperQ: "We haven't been able
|
||||
to figure out a way to get the relative path."
|
||||
Unresolved — merged anyway.
|
||||
|
||||
4. **Upstream dependencies** — Required changes in
|
||||
prometheus/common that weren't released yet. Solution:
|
||||
pin to commit in go.mod, replace after release.
|
||||
|
||||
5. **Self-review** — tjhop: "Did a thorough self-review
|
||||
and see a few other spots to tidy up." The author
|
||||
caught issues after posting.
|
||||
|
||||
6. **Merge conflicts** — tjhop: "Please excuse the
|
||||
frequent force pushes -- this PR touches a ton of
|
||||
stuff and I'm trying to stay on top of all the merge
|
||||
conflicts." Large migrations have coordination costs.
|
||||
|
||||
### What This Reveals About Pattern Introduction
|
||||
|
||||
- The breaking change was held for a **major version**
|
||||
(3.0). They didn't try to do it incrementally.
|
||||
- **162 files in one commit** — total replacement, not
|
||||
gradual migration. This is the opposite of
|
||||
CockroachDB's approach.
|
||||
- Multiple reviewers said **"let's merge and iterate"**
|
||||
rather than blocking on polish. Ship the migration,
|
||||
fix details in follow-ups.
|
||||
- **Upstream coordination** required (prometheus/common
|
||||
had to release first).
|
||||
|
||||
---
|
||||
|
||||
## Prometheus Issue #17632: AppenderV2 Design
|
||||
|
||||
**Title:** "storage: Switch to unified, transactional
|
||||
AppenderV2 API; remove Appender."
|
||||
**Author:** bwplotka
|
||||
**Status:** In progress (M1 complete, iterating)
|
||||
|
||||
### Why V2?
|
||||
|
||||
The original Appender interface grew organically:
|
||||
- `Appender.Append()` — float samples
|
||||
- `ExemplarAppender.AppendExemplar()` — per-series
|
||||
- `HistogramAppender.AppendHistogram()` — native histograms
|
||||
- `StartTimestampAppender` — for OTel start timestamps
|
||||
|
||||
Each requires a type assertion. The scrape loop has
|
||||
multiple code paths depending on which interfaces the
|
||||
storage supports.
|
||||
|
||||
### The Debate
|
||||
|
||||
**roidelapluie (maintainer) pushed back:**
|
||||
> "I would like to see scrape and prometheus continuing
|
||||
> to support v1. The changes in some areas like scrape
|
||||
> seem really high. We could move everything to v2 in
|
||||
> Prometheus v4."
|
||||
|
||||
**bwplotka (author) challenged:**
|
||||
> "I'd like to challenge this. It's not a breaking
|
||||
> change. It's not even a user facing change. So why
|
||||
> waiting for v4?"
|
||||
|
||||
**Resolution:** Author won. V2 is being adopted now
|
||||
(target: remove V1 by Q2 2026) rather than waiting for
|
||||
a hypothetical v4.
|
||||
|
||||
**External validation:**
|
||||
> "We got amazing feedback from OTel collector side —
|
||||
> AppenderV2 is helpful."
|
||||
|
||||
The OTel collector team confirmed the unified interface
|
||||
solves real problems for consumers.
|
||||
|
||||
### Implementation Strategy
|
||||
|
||||
The issue tracks a detailed milestone plan:
|
||||
- M1: Add interface + TSDB impl (done in parts 1-5)
|
||||
- Adapter layer for coexistence
|
||||
- Switch each consumer one at a time
|
||||
- Understand impact on downstream (Thanos, Mimir, Cortex)
|
||||
|
||||
**This is a 12-PR migration** — each PR switches one
|
||||
consumer. The interface was introduced first, then
|
||||
consumers migrated one by one.
|
||||
|
||||
---
|
||||
|
||||
## Oban PR #684: Inline Testing Mode
|
||||
|
||||
**Title:** "Inline testing mode"
|
||||
**Author:** Parker Selbert (sorentwo)
|
||||
**Created:** 2022-04-11, **Merged:** 2022-04-12 (< 24 hours)
|
||||
**Size:** (not available — private discussion)
|
||||
**Comments/Reviews:** 0
|
||||
|
||||
### What This Reveals
|
||||
|
||||
Parker Selbert is the solo maintainer. The PR was
|
||||
merged in less than 24 hours with zero discussion. This
|
||||
isn't poor process — it's the reality of
|
||||
solo-maintained open source:
|
||||
- The design decision happened in his head (or in the
|
||||
linked issue discussion)
|
||||
- No external review needed because he owns the whole
|
||||
system
|
||||
- The commit message is the entire design document
|
||||
|
||||
### The Feature
|
||||
|
||||
> Inline testing immediately executes jobs as they are
|
||||
> inserted within the current process and without ever
|
||||
> touching the database. It's an alternative to the
|
||||
> standard insert/verify/execute flow for testing.
|
||||
|
||||
This single feature eliminated the #1 source of test
|
||||
flakiness for all Oban consumers. No polling, no sleep,
|
||||
no async assertions.
|
||||
|
||||
---
|
||||
|
||||
## Cross-Cutting Lessons from PR Discussions
|
||||
|
||||
### 1. "Merge and iterate" > "Perfect before merge"
|
||||
|
||||
Both the slog migration and the Handle pattern were
|
||||
merged with known imperfections. The slog PR merged
|
||||
with full file paths in logs (unresolved). The Handle
|
||||
PR merged with trace span issues discovered during
|
||||
review.
|
||||
|
||||
**Lesson for review:** Don't block a pattern introduction
|
||||
on polish. Block on correctness. Accept imperfect-but-
|
||||
correct as mergeable.
|
||||
|
||||
### 2. Breaking changes need a major version OR gradual migration
|
||||
|
||||
- Prometheus: slog migration in one commit, held for 3.0
|
||||
- CockroachDB: Handle API coexists with RunAsyncTask,
|
||||
no major version needed
|
||||
- Prometheus AppenderV2: NOT a user-facing change, so no
|
||||
need to wait for v4
|
||||
|
||||
**Lesson:** The decision to batch vs. gradual depends on
|
||||
whether it's user-facing. Internal refactors can ship
|
||||
incrementally. User-facing API changes need version
|
||||
boundaries.
|
||||
|
||||
### 3. External consumers validate pattern decisions
|
||||
|
||||
AppenderV2 was validated by the OTel collector team.
|
||||
The Handle pattern was validated by profiling data.
|
||||
The slog migration was validated by "the stdlib does it."
|
||||
|
||||
**Lesson:** Pattern introductions that cite external
|
||||
evidence (benchmarks, user feedback, ecosystem
|
||||
alignment) get approved faster.
|
||||
|
||||
### 4. Large migrations require coordination
|
||||
|
||||
The slog PR needed prometheus/common changes first.
|
||||
AppenderV2 is a 12-PR sequence. Large changes need
|
||||
dependency ordering.
|
||||
|
||||
**Lesson for review:** When reviewing the first PR in a
|
||||
migration, ask about the full plan. Is this standalone?
|
||||
What comes next? What depends on this?
|
||||
|
||||
### 5. Solo maintainers decide faster
|
||||
|
||||
Oban's inline testing: conceived, implemented, merged
|
||||
in <24 hours. CockroachDB's Handle pattern: 2.5 months.
|
||||
Prometheus slog: 3 weeks. Prometheus AppenderV2: multi-
|
||||
month ongoing.
|
||||
|
||||
**Lesson:** Team size correlates with decision latency,
|
||||
not decision quality. Parker's inline testing mode was
|
||||
the right call, arrived fast, and has been stable for
|
||||
3+ years.
|
||||
|
||||
<!-- PATTERN_COMPLETE -->
|
||||
Reference in New Issue
Block a user