# Finding 1: Different models catch different things (confirmed)

**Date:** 2026-04-26
**Task:** PR reviews on DDD reference docs (~6,600 lines across 18 files)
**How we used them:** Both models got the same task via the pr-review skill:
fetch the diff, fetch the full content of each changed file, and review
against the PR description and the linked issue's acceptance criteria. Each
had rich context: full diff, project CLAUDE.md conventions, issue body. Each
reviewer ran independently in its own sub-agent with its own Gitea token,
with no cross-pollination between them.

- GPT-5 caught SUMMARY.md verdict mismatches (Commanded classification,
  small teams classification) that Sonnet missed entirely (PR #375)
- Sonnet was first to catch a broken cross-reference link, which GPT-5 missed (PR #378)
- **Takeaway:** Different blind spots are real. Neither model is strictly better
  for analytical review — they complement each other. This is why we run two
  independent reviewers from different model families.
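
The setup above (same context, isolated reviewers, findings compared afterwards) can be sketched roughly as follows. This is a minimal illustration, not the pr-review skill itself: `ReviewContext`, `Finding`, `run_independent_reviews`, and `unique_findings` are hypothetical names, each reviewer is assumed to be a plain callable that returns a set of findings, and findings are assumed to match only when they are exactly equal (real review comments would likely need fuzzier matching).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """One review comment (hypothetical shape)."""
    file: str
    summary: str


@dataclass
class ReviewContext:
    """The shared input every reviewer receives (hypothetical shape)."""
    diff: str
    changed_files: dict[str, str]  # path -> full file content
    conventions: str               # e.g. the project's CLAUDE.md text
    issue_body: str


# A reviewer is assumed to be any callable from context to a set of findings.
Reviewer = Callable[[ReviewContext], set[Finding]]


def run_independent_reviews(
    context: ReviewContext, reviewers: dict[str, Reviewer]
) -> dict[str, set[Finding]]:
    """Run each reviewer on the same context; no reviewer sees another's output."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        futures = {name: pool.submit(fn, context) for name, fn in reviewers.items()}
        return {name: future.result() for name, future in futures.items()}


def unique_findings(results: dict[str, set[Finding]]) -> dict[str, set[Finding]]:
    """For each reviewer, keep only the findings no other reviewer reported."""
    unique: dict[str, set[Finding]] = {}
    for name, found in results.items():
        others: set[Finding] = set()
        for other_name, other_found in results.items():
            if other_name != name:
                others |= other_found
        unique[name] = found - others
    return unique
```

A non-empty set for every reviewer in the output of `unique_findings` is the pattern this finding describes: each model contributes something the other did not.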