Files

T

Rodin 7e99faff59 docs: PR discussion analysis — how new patterns are debated and landed

Real PR threads from CockroachDB, Prometheus, and Oban showing:
- Handle pattern (PR #142059): 2.5 months, trace span issues discovered in review
- slog migration (PR #14906): 'merge and iterate', 162 files, one commit for 3.0
- AppenderV2 (issue #17632): maintainer pushback defeated by 'it's not user-facing'
- Oban inline testing (PR #684): <24h, zero discussion, solo maintainer

Key insight: 'merge and iterate' beats 'perfect before merge' for
pattern introductions.

2026-04-30 11:12:07 -07:00

9.1 KiB

Raw Permalink Blame History

PR Discussions: How New Patterns Are Debated and Landed

The most revealing artifact of codebase evolution is not the code — it's the conversation. Here's what the PR discussions tell us about how patterns actually get introduced.

CockroachDB PR #142059: The Handle Pattern

Title: "stop: prefer launching goroutines at caller" Author: Tobias Grieger (tbg) Created: 2025-02-26, Merged: 2025-05-14 (~2.5 months) Size: +314/-84 lines

The Problem Statement (from PR body)

This has the undesirable property of recording a location in Stopper.RunAsyncTaskEx as the async task's lowermost call frame. Anyone who's ever looked at any profile will be familiar with this fact.

The motivation wasn't correctness — RunAsyncTask was correct. It was observability. Every goroutine looked the same in profiling tools.

The Discussion

Reviewer (kvoli) asked: "Noticed there were some span after use related test failures, do you know what's going on there?"

Author (tbg) explained: "Previously, the stopper created the trace span on the caller's goroutine, and would then hand it over to the goroutine it spawned. There was an ominous comment about 'we can't create the span on the goroutine itself because the parent may at that time have finished.' I didn't think our code would so aggressively enforce that ordering."

What this reveals: The new pattern exposed a hidden assumption in the tracing infrastructure. The old API hid the span lifecycle problem by accident. The new API made it visible, requiring explicit handling.

Migration Strategy (explicit in PR body)

The "old way" to start a task via StartAsyncTaskEx remains in place. The hope is that we'll convert all important tasks to GetHandle. This can be done as follow-up work.

Decision: Coexistence, not replacement. Zero migration commitment in the PR. Conversion is "hope" and "follow-up work." No timeline.

Benchmark Evidence (in PR body)

AsyncTaskEx-10   3096800   1169 ns/op   72 B/op   3 allocs/op
Handle-10        3165979   1173 ns/op   48 B/op   1 allocs/op

Same performance (1173 vs 1169 ns/op) but fewer allocations (1 vs 3). The PR isn't sold on performance — it's sold on profiling clarity. Benchmarks are proof of "at least not worse."

Prometheus PR #14906: slog Migration

Title: "chore!: adopt log/slog, remove go-kit/log" Author: TJ Hoplock (tjhop) Created: 2024-09-15, Merged: 2024-10-07 (~3 weeks) Size: 162 files, +1534/-1691 lines

Why It Happened (from linked issue #14355)

Go stdlib added log/slog in Go 1.21 (Aug 2023). Prometheus already depended on go-kit/log, a third-party structured logging library. The stdlib version offers:

No external dependency
Standard interface other libraries adopt
Better performance

The Discussion (26 comments)

Key themes from the thread:

"Merge and iterate" — SuperQ (reviewer): "Awesome work. I think we should merge this asap and iterate on improvements." Not: "make it perfect first."
Formatting debate — roidelapluie: "didn't we go for ts?" (about timestamp format). Decision: since this is a 3.0 breaking change anyway, adopt slog's time key as-is.
Source file paths — roidelapluie: "I approve this but I would appreciate if we do not have the full path to source file." SuperQ: "We haven't been able to figure out a way to get the relative path." Unresolved — merged anyway.
Upstream dependencies — Required changes in prometheus/common that weren't released yet. Solution: pin to commit in go.mod, replace after release.
Self-review — tjhop: "Did a thorough self-review and see a few other spots to tidy up." The author caught issues after posting.
Merge conflicts — tjhop: "Please excuse the frequent force pushes -- this PR touches a ton of stuff and I'm trying to stay on top of all the merge conflicts." Large migrations have coordination costs.

What This Reveals About Pattern Introduction

The breaking change was held for a major version (3.0). They didn't try to do it incrementally.
162 files in one commit — total replacement, not gradual migration. This is the opposite of CockroachDB's approach.
Multiple reviewers said "let's merge and iterate" rather than blocking on polish. Ship the migration, fix details in follow-ups.
Upstream coordination required (prometheus/common had to release first).

Prometheus Issue #17632: AppenderV2 Design

Title: "storage: Switch to unified, transactional AppenderV2 API; remove Appender." Author: bwplotka Status: In progress (M1 complete, iterating)

Why V2?

The original Appender interface grew organically:

Appender.Append() — float samples
ExemplarAppender.AppendExemplar() — per-series
HistogramAppender.AppendHistogram() — native histograms
StartTimestampAppender — for OTel start timestamps

Each requires a type assertion. The scrape loop has multiple code paths depending on which interfaces the storage supports.

The Debate

roidelapluie (maintainer) pushed back:

"I would like to see scrape and prometheus continuing to support v1. The changes in some areas like scrape seem really high. We could move everything to v2 in Prometheus v4."

bwplotka (author) challenged:

"I'd like to challenge this. It's not a breaking change. It's not even a user facing change. So why waiting for v4?"

Resolution: Author won. V2 is being adopted now (target: remove V1 by Q2 2026) rather than waiting for a hypothetical v4.

External validation:

"We got amazing feedback from OTel collector side — AppenderV2 is helpful."

The OTel collector team confirmed the unified interface solves real problems for consumers.

Implementation Strategy

The issue tracks a detailed milestone plan:

M1: Add interface + TSDB impl (done in parts 1-5)
Adapter layer for coexistence
Switch each consumer one at a time
Understand impact on downstream (Thanos, Mimir, Cortex)

This is a 12-PR migration — each PR switches one consumer. The interface was introduced first, then consumers migrated one by one.

Oban PR #684: Inline Testing Mode

Title: "Inline testing mode" Author: Parker Selbert (sorentwo) Created: 2022-04-11, Merged: 2022-04-12 (< 24 hours) Size: (not available — private discussion) Comments/Reviews: 0

What This Reveals

Parker Selbert is the solo maintainer. The PR was merged in less than 24 hours with zero discussion. This isn't poor process — it's the reality of solo-maintained open source:

The design decision happened in his head (or in the linked issue discussion)
No external review needed because he owns the whole system
The commit message is the entire design document

The Feature

Inline testing immediately executes jobs as they are inserted within the current process and without ever touching the database. It's an alternative to the standard insert/verify/execute flow for testing.

This single feature eliminated the #1 source of test flakiness for all Oban consumers. No polling, no sleep, no async assertions.

Cross-Cutting Lessons from PR Discussions

1. "Merge and iterate" > "Perfect before merge"

Both the slog migration and the Handle pattern were merged with known imperfections. The slog PR merged with full file paths in logs (unresolved). The Handle PR merged with trace span issues discovered during review.

Lesson for review: Don't block a pattern introduction on polish. Block on correctness. Accept imperfect-but- correct as mergeable.

2. Breaking changes need a major version OR gradual migration

Prometheus: slog migration in one commit, held for 3.0
CockroachDB: Handle API coexists with RunAsyncTask, no major version needed
Prometheus AppenderV2: NOT a user-facing change, so no need to wait for v4

Lesson: The decision to batch vs. gradual depends on whether it's user-facing. Internal refactors can ship incrementally. User-facing API changes need version boundaries.

3. External consumers validate pattern decisions

AppenderV2 was validated by the OTel collector team. The Handle pattern was validated by profiling data. The slog migration was validated by "the stdlib does it."

Lesson: Pattern introductions that cite external evidence (benchmarks, user feedback, ecosystem alignment) get approved faster.

4. Large migrations require coordination

The slog PR needed prometheus/common changes first. AppenderV2 is a 12-PR sequence. Large changes need dependency ordering.

Lesson for review: When reviewing the first PR in a migration, ask about the full plan. Is this standalone? What comes next? What depends on this?

5. Solo maintainers decide faster

Oban's inline testing: conceived, implemented, merged in <24 hours. CockroachDB's Handle pattern: 2.5 months. Prometheus slog: 3 weeks. Prometheus AppenderV2: multi- month ongoing.

Lesson: Team size correlates with decision latency, not decision quality. Parker's inline testing mode was the right call, arrived fast, and has been stable for 3+ years.

9.1 KiB Raw Permalink Blame History