model-research/findings/2026-04-26-05-sonnet-is-fast-and-catches.md
Rodin 6af8a6ee10 refactor(findings): split ALL-FINDINGS.md into per-experiment files
Break the monolithic 3249-line findings file into 29 individual files,
one per experiment. Each file is named YYYY-MM-DD-NN-slug.md for easy
chronological sorting and discovery.

No content changes — purely structural reorganization.
2026-05-06 07:15:50 -07:00


Finding 5: Sonnet is fast and catches structural issues; GPT-5 is slow and catches semantic issues

Date: 2026-04-26

Task: Dual review across PRs #372, #375, #378, #380, #382

How we used them: Same pr-review skill, same context (diff + files + issue + AC), same sub-agent pattern. Only variable: model. Both got rich context. Both ran the full 7-phase review skill.

  • Sonnet consistently finishes first and catches formatting errors, broken links, and structural problems (missing sections, dangling refs).
  • GPT-5 takes longer and catches meaning-level problems (verdict mismatches, classification inconsistencies, logical gaps).
  • Takeaway: With identical rich context and identical instructions, the two models gravitate toward different classes of problems. Sonnet acts as the structural reviewer; GPT-5 acts as the semantic reviewer. Both roles matter.
  • Question: Would Sonnet catch semantic issues if given a narrower "check for logical consistency" framing instead of a broad review?
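
The single-variable design used here (and the follow-up raised in the question) can be sketched as below. This is a minimal illustration, not the harness actually used: `ReviewRun`, `vary_one`, and `differing_fields` are hypothetical names, and the model and instruction strings are placeholder labels rather than real API identifiers.

```python
from dataclasses import dataclass, replace

# Hypothetical record of one review run; field names are assumptions for illustration.
@dataclass(frozen=True)
class ReviewRun:
    model: str         # placeholder label, not a real API model id
    context: str       # diff + files + issue + AC, identical across runs
    instructions: str  # the review skill prompt / framing


FIELDS = ("model", "context", "instructions")


def vary_one(base: ReviewRun, field: str, values: list[str]) -> list[ReviewRun]:
    """Build one run per value, changing only `field` and holding the rest fixed."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    return [replace(base, **{field: value}) for value in values]


def differing_fields(a: ReviewRun, b: ReviewRun) -> set[str]:
    """Return the names of the fields on which two runs differ."""
    return {name for name in FIELDS if getattr(a, name) != getattr(b, name)}


base = ReviewRun(
    model="sonnet",
    context="<diff + files + issue + AC>",
    instructions="broad 7-phase review",
)

# The comparison in this finding: same context and instructions, model varies.
model_runs = vary_one(base, "model", ["sonnet", "gpt-5"])

# The open question: same model and context, only the framing varies.
framing_runs = vary_one(
    base, "instructions", ["broad 7-phase review", "check for logical consistency"]
)
```

Checking `differing_fields` on each pair before running a comparison is a cheap guard that any difference in findings can be attributed to the one variable under test.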