model-research/findings/2026-04-26-05-sonnet-is-fast-and-catches.md

# Finding 5: Sonnet is fast and catches structural issues; GPT-5 is slow and catches semantic issues

**Date:** 2026-04-26
**Task:** Dual review across PRs #372, #375, #378, #380, #382
**How we used them:** Both models ran the same full 7-phase pr-review
skill with the same sub-agent pattern and the same rich context (diff +
files + issue + AC). The only variable was the model.

- Sonnet consistently finishes first and catches formatting issues, broken
  links, and structural problems (missing sections, dangling refs)
- GPT-5 takes longer and catches meaning-level problems (verdict
  mismatches, classification inconsistencies, logical gaps)
- **Takeaway:** With identical rich context and identical instructions, the
  models gravitate toward different kinds of issues. Sonnet acts as the
  structural reviewer; GPT-5 acts as the semantic reviewer. Both roles
  matter.
- **Open question:** Would Sonnet catch semantic issues if given a narrower
  "check for logical consistency" framing instead of a broad review?
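One way to make the structural/semantic split measurable across future dual reviews is to tag each finding with a category and tally by model. The sketch below is a minimal illustration, not the tooling used for this finding: the category names are adapted from the wording above, while the `bucket` and `tally` helpers, the model labels, and the sample input are all hypothetical.

```python
from collections import Counter

# Category labels adapted from the finding's wording; the two sets mirror
# the structural/semantic split described above (names are assumptions).
STRUCTURAL = {"formatting", "broken_link", "missing_section", "dangling_ref"}
SEMANTIC = {"verdict_mismatch", "classification_inconsistency", "logical_gap"}


def bucket(category: str) -> str:
    """Map a finding category to 'structural', 'semantic', or 'other'."""
    if category in STRUCTURAL:
        return "structural"
    if category in SEMANTIC:
        return "semantic"
    return "other"


def tally(findings):
    """Count findings per (model, bucket) pair.

    `findings` is an iterable of (model, category) tuples.
    """
    return Counter((model, bucket(category)) for model, category in findings)


# Placeholder input for illustration only; NOT data from the actual reviews.
example = [
    ("sonnet", "broken_link"),
    ("sonnet", "missing_section"),
    ("gpt-5", "verdict_mismatch"),
    ("gpt-5", "logical_gap"),
    ("gpt-5", "formatting"),
]

counts = tally(example)
for (model, kind), n in sorted(counts.items()):
    print(f"{model:8s} {kind:11s} {n}")
```

Running the same tally on a narrower "check for logical consistency" run would give a direct comparison for the open question, assuming findings are categorized the same way in both runs.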