docs: PR discussion analysis — how new patterns are debated and landed
Real PR threads from CockroachDB, Prometheus, and Oban showing: - Handle pattern (PR #142059): 2.5 months, trace span issues discovered in review - slog migration (PR #14906): 'merge and iterate', 162 files, one commit for 3.0 - AppenderV2 (issue #17632): maintainer pushback defeated by 'it's not user-facing' - Oban inline testing (PR #684): <24h, zero discussion, solo maintainer Key insight: 'merge and iterate' beats 'perfect before merge' for pattern introductions.
This commit is contained in:
@@ -0,0 +1,284 @@
|
|||||||
|
# PR Discussions: How New Patterns Are Debated and Landed
|
||||||
|
|
||||||
|
The most revealing artifact of codebase evolution is not
|
||||||
|
the code — it's the conversation. Here's what the PR
|
||||||
|
discussions tell us about how patterns actually get
|
||||||
|
introduced.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CockroachDB PR #142059: The Handle Pattern
|
||||||
|
|
||||||
|
**Title:** "stop: prefer launching goroutines at caller"
|
||||||
|
**Author:** Tobias Grieger (tbg)
|
||||||
|
**Created:** 2025-02-26, **Merged:** 2025-05-14 (~2.5 months)
|
||||||
|
**Size:** +314/-84 lines
|
||||||
|
|
||||||
|
### The Problem Statement (from PR body)
|
||||||
|
|
||||||
|
> This has the undesirable property of recording a
|
||||||
|
> location in `Stopper.RunAsyncTaskEx` as the async
|
||||||
|
> task's lowermost call frame. Anyone who's ever looked
|
||||||
|
> at any profile will be familiar with this fact.
|
||||||
|
|
||||||
|
The motivation wasn't correctness — RunAsyncTask was
|
||||||
|
correct. It was *observability*. Every goroutine looked
|
||||||
|
the same in profiling tools.
|
||||||
|
|
||||||
|
### The Discussion
|
||||||
|
|
||||||
|
**Reviewer (kvoli) asked:** "Noticed there were some
|
||||||
|
span after use related test failures, do you know
|
||||||
|
what's going on there?"
|
||||||
|
|
||||||
|
**Author (tbg) explained:** "Previously, the stopper
|
||||||
|
created the trace span on the caller's goroutine, and
|
||||||
|
would then hand it over to the goroutine it spawned.
|
||||||
|
There was an ominous comment about 'we can't create the
|
||||||
|
span on the goroutine itself because the parent may at
|
||||||
|
that time have finished.' I didn't think our code would
|
||||||
|
so aggressively enforce that ordering."
|
||||||
|
|
||||||
|
**What this reveals:** The new pattern exposed a hidden
|
||||||
|
assumption in the tracing infrastructure. The old API
|
||||||
|
hid the span lifecycle problem by accident. The new API
|
||||||
|
made it visible, requiring explicit handling.
|
||||||
|
|
||||||
|
### Migration Strategy (explicit in PR body)
|
||||||
|
|
||||||
|
> The "old way" to start a task via `StartAsyncTaskEx`
|
||||||
|
> remains in place. The hope is that we'll convert all
|
||||||
|
> important tasks to `GetHandle`. This can be done as
|
||||||
|
> follow-up work.
|
||||||
|
|
||||||
|
**Decision:** Coexistence, not replacement. Zero
|
||||||
|
migration commitment in the PR. Conversion is "hope"
|
||||||
|
and "follow-up work." No timeline.
|
||||||
|
|
||||||
|
### Benchmark Evidence (in PR body)
|
||||||
|
|
||||||
|
```
|
||||||
|
AsyncTaskEx-10 3096800 1169 ns/op 72 B/op 3 allocs/op
|
||||||
|
Handle-10 3165979 1173 ns/op 48 B/op 1 allocs/op
|
||||||
|
```
|
||||||
|
|
||||||
|
Same performance (1173 vs 1169 ns/op) but fewer
|
||||||
|
allocations (1 vs 3). The PR isn't sold on performance
|
||||||
|
— it's sold on profiling clarity. Benchmarks are proof
|
||||||
|
of "at least not worse."
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prometheus PR #14906: slog Migration
|
||||||
|
|
||||||
|
**Title:** "chore!: adopt log/slog, remove go-kit/log"
|
||||||
|
**Author:** TJ Hoplock (tjhop)
|
||||||
|
**Created:** 2024-09-15, **Merged:** 2024-10-07 (~3 weeks)
|
||||||
|
**Size:** 162 files, +1534/-1691 lines
|
||||||
|
|
||||||
|
### Why It Happened (from linked issue #14355)
|
||||||
|
|
||||||
|
Go stdlib added `log/slog` in Go 1.21 (Aug 2023).
|
||||||
|
Prometheus already depended on go-kit/log, a third-party
|
||||||
|
structured logging library. The stdlib version offers:
|
||||||
|
- No external dependency
|
||||||
|
- Standard interface other libraries adopt
|
||||||
|
- Better performance
|
||||||
|
|
||||||
|
### The Discussion (26 comments)
|
||||||
|
|
||||||
|
**Key themes from the thread:**
|
||||||
|
|
||||||
|
1. **"Merge and iterate"** — SuperQ (reviewer): "Awesome
|
||||||
|
work. I think we should merge this asap and iterate
|
||||||
|
on improvements." Not: "make it perfect first."
|
||||||
|
|
||||||
|
2. **Formatting debate** — roidelapluie: "didn't we go
|
||||||
|
for `ts`?" (about timestamp format). Decision: since
|
||||||
|
this is a 3.0 breaking change anyway, adopt slog's
|
||||||
|
`time` key as-is.
|
||||||
|
|
||||||
|
3. **Source file paths** — roidelapluie: "I approve this
|
||||||
|
but I would appreciate if we do not have the full
|
||||||
|
path to source file." SuperQ: "We haven't been able
|
||||||
|
to figure out a way to get the relative path."
|
||||||
|
Unresolved — merged anyway.
|
||||||
|
|
||||||
|
4. **Upstream dependencies** — Required changes in
|
||||||
|
prometheus/common that weren't released yet. Solution:
|
||||||
|
pin to commit in go.mod, replace after release.
|
||||||
|
|
||||||
|
5. **Self-review** — tjhop: "Did a thorough self-review
|
||||||
|
and see a few other spots to tidy up." The author
|
||||||
|
caught issues after posting.
|
||||||
|
|
||||||
|
6. **Merge conflicts** — tjhop: "Please excuse the
|
||||||
|
frequent force pushes -- this PR touches a ton of
|
||||||
|
stuff and I'm trying to stay on top of all the merge
|
||||||
|
conflicts." Large migrations have coordination costs.
|
||||||
|
|
||||||
|
### What This Reveals About Pattern Introduction
|
||||||
|
|
||||||
|
- The breaking change was held for a **major version**
|
||||||
|
(3.0). They didn't try to do it incrementally.
|
||||||
|
- **162 files in one commit** — total replacement, not
|
||||||
|
gradual migration. This is the opposite of
|
||||||
|
CockroachDB's approach.
|
||||||
|
- Multiple reviewers said **"let's merge and iterate"**
|
||||||
|
rather than blocking on polish. Ship the migration,
|
||||||
|
fix details in follow-ups.
|
||||||
|
- **Upstream coordination** required (prometheus/common
|
||||||
|
had to release first).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prometheus Issue #17632: AppenderV2 Design
|
||||||
|
|
||||||
|
**Title:** "storage: Switch to unified, transactional
|
||||||
|
AppenderV2 API; remove Appender."
|
||||||
|
**Author:** bwplotka
|
||||||
|
**Status:** In progress (M1 complete, iterating)
|
||||||
|
|
||||||
|
### Why V2?
|
||||||
|
|
||||||
|
The original Appender interface grew organically:
|
||||||
|
- `Appender.Append()` — float samples
|
||||||
|
- `ExemplarAppender.AppendExemplar()` — per-series
|
||||||
|
- `HistogramAppender.AppendHistogram()` — native histograms
|
||||||
|
- `StartTimestampAppender` — for OTel start timestamps
|
||||||
|
|
||||||
|
Each requires a type assertion. The scrape loop has
|
||||||
|
multiple code paths depending on which interfaces the
|
||||||
|
storage supports.
|
||||||
|
|
||||||
|
### The Debate
|
||||||
|
|
||||||
|
**roidelapluie (maintainer) pushed back:**
|
||||||
|
> "I would like to see scrape and prometheus continuing
|
||||||
|
> to support v1. The changes in some areas like scrape
|
||||||
|
> seem really high. We could move everything to v2 in
|
||||||
|
> Prometheus v4."
|
||||||
|
|
||||||
|
**bwplotka (author) challenged:**
|
||||||
|
> "I'd like to challenge this. It's not a breaking
|
||||||
|
> change. It's not even a user facing change. So why
|
||||||
|
> waiting for v4?"
|
||||||
|
|
||||||
|
**Resolution:** Author won. V2 is being adopted now
|
||||||
|
(target: remove V1 by Q2 2026) rather than waiting for
|
||||||
|
a hypothetical v4.
|
||||||
|
|
||||||
|
**External validation:**
|
||||||
|
> "We got amazing feedback from OTel collector side —
|
||||||
|
> AppenderV2 is helpful."
|
||||||
|
|
||||||
|
The OTel collector team confirmed the unified interface
|
||||||
|
solves real problems for consumers.
|
||||||
|
|
||||||
|
### Implementation Strategy
|
||||||
|
|
||||||
|
The issue tracks a detailed milestone plan:
|
||||||
|
- M1: Add interface + TSDB impl (done in parts 1-5)
|
||||||
|
- Adapter layer for coexistence
|
||||||
|
- Switch each consumer one at a time
|
||||||
|
- Understand impact on downstream (Thanos, Mimir, Cortex)
|
||||||
|
|
||||||
|
**This is a 12-PR migration** — each PR switches one
|
||||||
|
consumer. The interface was introduced first, then
|
||||||
|
consumers migrated one by one.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Oban PR #684: Inline Testing Mode
|
||||||
|
|
||||||
|
**Title:** "Inline testing mode"
|
||||||
|
**Author:** Parker Selbert (sorentwo)
|
||||||
|
**Created:** 2022-04-11, **Merged:** 2022-04-12 (< 24 hours)
|
||||||
|
**Size:** (not available — private discussion)
|
||||||
|
**Comments/Reviews:** 0
|
||||||
|
|
||||||
|
### What This Reveals
|
||||||
|
|
||||||
|
Parker Selbert is the solo maintainer. The PR was
|
||||||
|
merged in less than 24 hours with zero discussion. This
|
||||||
|
isn't poor process — it's the reality of
|
||||||
|
solo-maintained open source:
|
||||||
|
- The design decision happened in his head (or in the
|
||||||
|
linked issue discussion)
|
||||||
|
- No external review needed because he owns the whole
|
||||||
|
system
|
||||||
|
- The commit message is the entire design document
|
||||||
|
|
||||||
|
### The Feature
|
||||||
|
|
||||||
|
> Inline testing immediately executes jobs as they are
|
||||||
|
> inserted within the current process and without ever
|
||||||
|
> touching the database. It's an alternative to the
|
||||||
|
> standard insert/verify/execute flow for testing.
|
||||||
|
|
||||||
|
This single feature eliminated the #1 source of test
|
||||||
|
flakiness for all Oban consumers. No polling, no sleep,
|
||||||
|
no async assertions.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cross-Cutting Lessons from PR Discussions
|
||||||
|
|
||||||
|
### 1. "Merge and iterate" > "Perfect before merge"
|
||||||
|
|
||||||
|
Both the slog migration and the Handle pattern were
|
||||||
|
merged with known imperfections. The slog PR merged
|
||||||
|
with full file paths in logs (unresolved). The Handle
|
||||||
|
PR merged with trace span issues discovered during
|
||||||
|
review.
|
||||||
|
|
||||||
|
**Lesson for review:** Don't block a pattern introduction
|
||||||
|
on polish. Block on correctness. Accept imperfect-but-
|
||||||
|
correct as mergeable.
|
||||||
|
|
||||||
|
### 2. Breaking changes need a major version OR gradual migration
|
||||||
|
|
||||||
|
- Prometheus: slog migration in one commit, held for 3.0
|
||||||
|
- CockroachDB: Handle API coexists with RunAsyncTask,
|
||||||
|
no major version needed
|
||||||
|
- Prometheus AppenderV2: NOT a user-facing change, so no
|
||||||
|
need to wait for v4
|
||||||
|
|
||||||
|
**Lesson:** The decision to batch vs. gradual depends on
|
||||||
|
whether it's user-facing. Internal refactors can ship
|
||||||
|
incrementally. User-facing API changes need version
|
||||||
|
boundaries.
|
||||||
|
|
||||||
|
### 3. External consumers validate pattern decisions
|
||||||
|
|
||||||
|
AppenderV2 was validated by the OTel collector team.
|
||||||
|
The Handle pattern was validated by profiling data.
|
||||||
|
The slog migration was validated by "the stdlib does it."
|
||||||
|
|
||||||
|
**Lesson:** Pattern introductions that cite external
|
||||||
|
evidence (benchmarks, user feedback, ecosystem
|
||||||
|
alignment) get approved faster.
|
||||||
|
|
||||||
|
### 4. Large migrations require coordination
|
||||||
|
|
||||||
|
The slog PR needed prometheus/common changes first.
|
||||||
|
AppenderV2 is a 12-PR sequence. Large changes need
|
||||||
|
dependency ordering.
|
||||||
|
|
||||||
|
**Lesson for review:** When reviewing the first PR in a
|
||||||
|
migration, ask about the full plan. Is this standalone?
|
||||||
|
What comes next? What depends on this?
|
||||||
|
|
||||||
|
### 5. Solo maintainers decide faster
|
||||||
|
|
||||||
|
Oban's inline testing: conceived, implemented, merged
|
||||||
|
in <24 hours. CockroachDB's Handle pattern: 2.5 months.
|
||||||
|
Prometheus slog: 3 weeks. Prometheus AppenderV2: multi-
|
||||||
|
month ongoing.
|
||||||
|
|
||||||
|
**Lesson:** Team size correlates with decision latency,
|
||||||
|
not decision quality. Parker's inline testing mode was
|
||||||
|
the right call, arrived fast, and has been stable for
|
||||||
|
3+ years.
|
||||||
|
|
||||||
|
<!-- PATTERN_COMPLETE -->
|
||||||
Reference in New Issue
Block a user