refactor(findings): split ALL-FINDINGS.md into per-experiment files
Break the monolithic 3249-line findings file into 29 individual files, one per experiment. Each file is named YYYY-MM-DD-NN-slug.md for easy chronological sorting and discovery. No content changes — purely structural reorganization.
@@ -0,0 +1,18 @@
# Finding 2: Cheap model + narrow lens > expensive model + broad review (one data point)

**Date:** 2026-04-26
**Task:** Check 12 rewritten hypotheses for directional bias
**How we used them:**
- Sonnet & GPT-5: full PR review context (diff, file content, issue, AC).
  Broad mandate: "review this PR." Rich context but unfocused task.
- GPT-4.1 Mini: given ONLY the 12 hypothesis texts + one focused question:
  "Do any of these hypotheses lead toward a predetermined conclusion?"
  Minimal context, laser-focused task. No diff, no project docs, no issue.

- Acting as PR reviewers, both Sonnet and GPT-5 approved the hypotheses
- GPT-4.1 Mini flagged ALL 12 as pushing toward predetermined conclusions
- Words like "requires," "necessary," and "must be" were flagged as directional
- **Takeaway:** In this test, task framing mattered more than model size. Rich
  context + broad mandate = missed the forest for the trees. Minimal context +
  precise question = found exactly what mattered. This needs more testing: was
  it the narrow framing, the lack of surrounding context, or both?
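The two setups under **How we used them** differ mainly in what goes into the prompt. The Python below is a minimal sketch of that difference and assumes plain-text prompts.

- `ReviewContext`, `build_broad_prompt`, `build_narrow_prompt`, and the section headings are hypothetical.
- Reading "AC" as acceptance criteria is an assumption.
- Only the focused question is quoted from this finding.
- Sending the prompts to Sonnet, GPT-5, or GPT-4.1 Mini is left out.

```python
from dataclasses import dataclass

# Quoted from the finding. All other names and headings here are hypothetical.
FOCUSED_QUESTION = (
    "Do any of these hypotheses lead toward a predetermined conclusion?"
)


@dataclass
class ReviewContext:
    """Hypothetical bundle of the full PR context the broad reviewers received."""
    diff: str
    file_content: str
    issue: str
    acceptance_criteria: str  # assumes "AC" means acceptance criteria


def build_broad_prompt(ctx: ReviewContext) -> str:
    """Broad mandate: rich context, unfocused task."""
    return "\n\n".join([
        "Review this PR.",
        f"## Diff\n{ctx.diff}",
        f"## File content\n{ctx.file_content}",
        f"## Issue\n{ctx.issue}",
        f"## Acceptance criteria\n{ctx.acceptance_criteria}",
    ])


def build_narrow_prompt(hypotheses: list[str]) -> str:
    """Narrow lens: only the hypothesis texts plus one focused question.

    Deliberately accepts no diff, project docs, or issue text.
    """
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))
    return f"{numbered}\n\n{FOCUSED_QUESTION}"


if __name__ == "__main__":
    # Placeholder texts: the real 12 hypotheses are not reproduced in this finding.
    placeholder = [f"Hypothesis {n}" for n in range(1, 13)]
    print(build_narrow_prompt(placeholder))
```

`build_narrow_prompt` takes no context argument at all, so the narrow condition cannot pick up a diff or issue text by accident. Pairing `FOCUSED_QUESTION` with the full `ReviewContext` might be one way to probe the open question about framing versus context.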