refactor(findings): split ALL-FINDINGS.md into per-experiment files

Break the monolithic 3249-line findings file into 29 individual files, one per experiment. Each file is named YYYY-MM-DD-NN-slug.md for easy chronological sorting and discovery. No content changes — purely structural reorganization.
2026-05-06 07:15:50 -07:00
parent 1b108ff66e
commit 6af8a6ee10
32 changed files with 3232 additions and 3254 deletions
@@ -0,0 +1,17 @@
+# Finding 5: Sonnet is fast and catches structural issues; GPT-5 is slow and catches semantic issues
+
+**Date:** 2026-04-26
+**Task:** Dual review across PRs #372, #375, #378, #380, #382
+**How we used them:** Same pr-review skill, same rich context (diff + files +
+issue + AC), same sub-agent pattern. Both models ran the full 7-phase review
+skill; the model was the only variable.
+
+- Sonnet consistently finishes first and catches formatting issues, broken
+  links, and structural problems (missing sections, dangling refs)
+- GPT-5 takes longer and catches meaning-level problems (verdict mismatches,
+  classification inconsistencies, logical gaps)
+- **Takeaway:** With identical rich context and identical instructions, the
+  models gravitate toward different classes of issue: Sonnet acts as the
+  structural reviewer, GPT-5 as the semantic reviewer. Both roles matter.
+  Open question: would Sonnet catch semantic issues if given a narrower
+  "check for logical consistency" framing instead of a broad review?