finding #39: narrow framing does not close Sonnet-GPT-5 gap for semantic consistency

Tested the open question from Finding #5: does narrow framing give Sonnet GPT-5-level semantic analysis? Result: NO.

Narrow framing changes WHAT Sonnet looks for (it redirects from gaps to contradictions) but not HOW WELL it reasons. Sonnet with narrow framing reported 3 contradictions, but only 1 was genuine (the other 2 were analytical errors or misreads). GPT-5 reported 4 findings, all genuine, with precise reasoning.

Key insight: framing controls scope, not reasoning depth. For tasks requiring logical verification (contradictions, race conditions, invariant violations), reasoning tokens are necessary; framing alone is insufficient.

Updated open-questions.md: marked the Sonnet + narrow framing question as answered and added a new question about Opus + narrow framing for contradiction detection.
2026-05-07 09:26:08 -07:00
parent d27ce6f5e1
commit 0c632c255a
2 changed files with 196 additions and 3 deletions
@@ -22,10 +22,22 @@ cross-doc contradictions are easy to verify once spotted (reducing GPT-5's
 verification advantage)? Or because boundary reasoning (Opus's strength)
 is the primary skill needed?

-### Sonnet + narrow framing = GPT-5 level? (from Finding #5)
-Would Sonnet catch semantic issues if given a narrower "check for logical
+### Opus + narrow framing for contradiction detection (from Finding #39)
+Would Opus + narrow framing match GPT-5 for self-contradiction detection?
+Finding #39 showed Sonnet can't do it even with narrow framing (reasoning
+depth issue). Opus has strong cross-boundary reasoning — does its internal
+reasoning depth suffice for the verification step that Sonnet lacks?
+
+### ~~Sonnet + narrow framing = GPT-5 level? (from Finding #5)~~ → ANSWERED (Finding #39)
+~~Would Sonnet catch semantic issues if given a narrower "check for logical
 consistency" framing instead of broad review? The hypothesis: Sonnet's
-"structural reviewer" tendency is a framing artifact, not a capability limit.
+"structural reviewer" tendency is a framing artifact, not a capability limit.~~
+
+**NO.** Narrow framing changes WHAT Sonnet looks for (redirects from gaps to
+contradictions) but not HOW WELL it reasons. Sonnet narrow found 3 contradictions
+but only 1 was genuine (2 were misreadings). GPT-5 found 4 all-genuine findings.
+The gap is reasoning depth, not framing — Sonnet can't reliably verify whether
+two statements actually contradict each other.

 ## Medium Priority