model-research/findings/2026-04-27-08-bias-detection-all-models-catch.md
Rodin 6af8a6ee10 refactor(findings): split ALL-FINDINGS.md into per-experiment files
Break the monolithic 3249-line findings file into 29 individual files,
one per experiment. Each file is named YYYY-MM-DD-NN-slug.md for easy
chronological sorting and discovery.

No content changes — purely structural reorganization.
2026-05-06 07:15:50 -07:00


# Finding 8: Bias detection: all models catch it with any framing — when the signal isn't buried
**Date:** 2026-04-27

**Task:** Detect directional bias in 8 deliberately biased hypotheses about
microservices vs monolith architecture for fintech startups.

**How we used them:** Created fresh test material: 8 hypotheses with a
pro-microservices bias carried by absolutes like "inevitably," "necessary,"
"must," and "requires," one of which (H8) contains a factually inverted
claim about consistency guarantees. Ran 4 conditions in parallel sub-agents:

| Condition | Model | Framing | Context |
|---|---|---|---|
| A | GPT-4.1 Mini | Narrow: "Do any lead toward a predetermined conclusion?" | Hypotheses only |
| B | Sonnet | Same narrow question | Hypotheses only |
| C | GPT-5 | Same narrow question | Hypotheses only |
| D | Sonnet | Broad: "Review quality, clarity, testability, and issues" | Hypotheses only |
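
The condition matrix above can also be written down as data, which makes it easier to see that only one factor changes between B and D. The following Python sketch is illustrative only: `Condition`, `CONDITIONS`, and `build_prompt` are hypothetical names, the question strings are copied from the table and may be shortened versions of the real prompts, and no model client is shown because the source does not say how the sub-agents were invoked.

```python
from dataclasses import dataclass

# Question strings as quoted in the table; the full prompt wording is an assumption.
NARROW = "Do any lead toward a predetermined conclusion?"
BROAD = "Review quality, clarity, testability, and issues"


@dataclass(frozen=True)
class Condition:
    label: str
    model: str
    framing: str  # "narrow" or "broad"
    question: str


CONDITIONS = [
    Condition("A", "GPT-4.1 Mini", "narrow", NARROW),
    Condition("B", "Sonnet", "narrow", NARROW),
    Condition("C", "GPT-5", "narrow", NARROW),
    Condition("D", "Sonnet", "broad", BROAD),
]


def build_prompt(condition: Condition, hypotheses: list[str]) -> str:
    """Hypotheses-only context: the question followed by the numbered hypotheses."""
    numbered = "\n".join(f"H{i}: {text}" for i, text in enumerate(hypotheses, start=1))
    return f"{condition.question}\n\n{numbered}"
```

Conditions B and D share a model and differ only in framing, so that pair isolates the framing effect; A, B, and C share a framing and differ only in model.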

**Results:**
- **All 4 conditions detected 8/8 biased hypotheses.** No misses.
- All 3 narrow-framing models (Mini, Sonnet, GPT-5) produced structurally
similar output: per-hypothesis verdict, biasing words, neutral version,
severity assessment.
- All 3 narrow-framing models flagged H8's factual inversion (distributed
transactions DON'T provide stronger consistency than monolithic ACID).
- GPT-5 added specific counterexamples (LMAX Disruptor, Shopify, Stack
Overflow, Basecamp), which made its analysis marginally richer.
- Sonnet with the broad mandate (condition D) also caught the bias, framing
it as one of three "systemic problems" (deterministic language,
pro-microservices framing bias, underspecified constructs). It additionally
provided testability and operationalization analysis that the narrow
framing didn't ask for.
- The broad Sonnet condition took ~72s vs ~39s for the narrow conditions,
reflecting its longer output.
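
A score like "8/8" can be computed mechanically once each model's output has been reduced to a set of flagged hypothesis IDs. The sketch below assumes that reduction has already happened; `detection_score` and the `flagged` set are hypothetical and stand in for parsed output, not for real model responses.

```python
def detection_score(flagged: set[str], biased: set[str]) -> tuple[int, int]:
    """Return (hits, total) for the known-biased hypotheses."""
    return len(flagged & biased), len(biased)


# All 8 hypotheses in this experiment were deliberately biased.
BIASED = {f"H{i}" for i in range(1, 9)}

# Hypothetical parsed verdicts for one condition (illustration, not real output).
flagged = {"H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"}

hits, total = detection_score(flagged, BIASED)
print(f"{hits}/{total}")  # prints 8/8
```

Because every hypothesis in the test set is biased, a score of this kind only counts misses; it cannot show whether a model would also flag a neutral hypothesis.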

**Takeaway:** When the biased text is the ONLY input (no surrounding noise),
all tested models, including the cheapest (GPT-4.1 Mini), detect bias
regardless of whether the question is narrow or broad. This appears to
**contradict** original finding #2 ("cheap model + narrow lens > expensive
model + broad review"), but the key difference is context noise:
- **Original experiment (2026-04-26):** Sonnet and GPT-5 missed bias during
FULL PR REVIEW with rich project context (diff, file content, issue text,
acceptance criteria, project conventions). The hypotheses were buried in
layers of review mechanics.
- **This experiment (2026-04-27):** Even the "broad" condition gave ONLY the
hypothesis text: no diff, no PR structure, no project context noise.

**Refined hypothesis:** The original finding #2 was about **signal-to-noise
ratio**, not about model capability or framing precision. When biased text
is presented in isolation, every model tested here catches it. When biased
text is buried in a large PR review with many other things to check, the
bias signal gets lost in the noise unless you explicitly ask about it. The
"narrow lens" worked because it eliminated the noise, not because smaller
models are better at bias detection.
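
One rough way to make "signal-to-noise" concrete is the share of the prompt taken up by the text under test. The source does not define such a metric, so the sketch below is an assumption: `signal_fraction` is a hypothetical helper, whitespace-separated words are a crude stand-in for tokens, and the sizes in the example are placeholders rather than measurements.

```python
def signal_fraction(signal_text: str, context_parts: list[str]) -> float:
    """Share of the prompt's words that belong to the text under test."""
    signal_words = len(signal_text.split())
    noise_words = sum(len(part.split()) for part in context_parts)
    total = signal_words + noise_words
    return signal_words / total if total else 0.0


# Placeholder sizes only: 200 words of hypotheses, 1800 words of PR context.
hypotheses = "word " * 200
pr_context = ["word " * 1800]

isolated = signal_fraction(hypotheses, [])        # hypotheses-only conditions
buried = signal_fraction(hypotheses, pr_context)  # full PR review
print(isolated, buried)  # prints 1.0 0.1
```

Under this proxy, all four conditions in this experiment sit at 1.0, which is why they cannot distinguish framing effects from noise effects.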

**Next experiment to confirm:** Give a model the FULL PR review context
(diff, files, issue, acceptance criteria) but add the narrow bias question
as an explicit review checklist item. If the model catches bias despite the
rich context, that supports the signal-to-noise hypothesis. If it misses,
something else is likely at play (attention allocation, task-switching cost).
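
The proposed prompt could be assembled along these lines. This is a sketch under stated assumptions: `build_review_prompt`, the section names, the placeholder bodies, and the "Check correctness" item are all hypothetical, and the bias question is the narrow wording quoted in the conditions table.

```python
# Narrow question as quoted in the conditions table.
NARROW_BIAS_QUESTION = "Do any lead toward a predetermined conclusion?"


def build_review_prompt(sections: dict[str, str], checklist: list[str]) -> str:
    """Join the labeled context sections, then append the review checklist."""
    parts = [f"## {name}\n{body}" for name, body in sections.items()]
    items = "\n".join(f"- [ ] {item}" for item in checklist)
    parts.append(f"## Review checklist\n{items}")
    return "\n\n".join(parts)


# Placeholder bodies; the real diff, files, issue, and criteria are not in the source.
sections = {
    "Diff": "<diff>",
    "Files": "<file content>",
    "Issue": "<issue text>",
    "Acceptance criteria": "<acceptance criteria>",
    "Hypotheses": "<hypotheses under review>",
}

with_item = build_review_prompt(sections, ["Check correctness", NARROW_BIAS_QUESTION])
without_item = build_review_prompt(sections, ["Check correctness"])
```

Running both variants against the same context would give a baseline for how much the checklist item alone changes the outcome.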