finding #39: narrow framing does not close Sonnet-GPT-5 gap for semantic consistency

Tested the open question from Finding #5: does narrow framing give Sonnet
GPT-5-level semantic analysis?

Result: NO. Narrow framing changes WHAT Sonnet looks for (redirects it from
gaps to contradictions) but not HOW WELL it reasons. With narrow framing,
Sonnet found 3 contradictions, but only 1 was genuine (the other 2 were
analytical errors or misreadings). GPT-5 found 4 findings, all genuine,
with precise reasoning.
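One rough way to put numbers on the gap is to read the counts above as precision (genuine findings divided by reported findings). This is an illustrative sketch, not part of the original evaluation; the `precision` helper and the `results` table are hypothetical names, and the only data used are the counts stated above.

```python
from fractions import Fraction

# Counts from this commit message: (findings reported, findings judged genuine).
results = {
    "Sonnet (narrow framing)": (3, 1),
    "GPT-5": (4, 4),
}


def precision(reported: int, genuine: int) -> Fraction:
    """Share of reported findings that were genuine: genuine / reported."""
    if reported <= 0:
        raise ValueError("precision is undefined when nothing was reported")
    if not 0 <= genuine <= reported:
        raise ValueError("genuine must be between 0 and reported")
    return Fraction(genuine, reported)


for model, (reported, genuine) in results.items():
    p = precision(reported, genuine)
    # Prints e.g. "GPT-5: 4/4 genuine, precision 1.00"
    print(f"{model}: {genuine}/{reported} genuine, precision {float(p):.2f}")
```

Precision alone ignores recall: the message does not say how many contradictions the documents actually contained, so this sketch cannot tell whether either model missed any.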

Key insight: framing controls scope, not reasoning depth. For tasks
requiring logical verification (contradictions, race conditions, invariant
violations), reasoning tokens are necessary; framing alone is insufficient.

Updated open-questions.md: marked Sonnet+narrow as answered, added new
question about Opus+narrow for contradiction detection.
claw
2026-05-07 09:26:08 -07:00
parent d27ce6f5e1
commit 0c632c255a
2 changed files with 196 additions and 3 deletions
+15 -3
@@ -22,10 +22,22 @@ cross-doc contradictions are easy to verify once spotted (reducing GPT-5's
 verification advantage)? Or because boundary reasoning (Opus's strength)
 is the primary skill needed?
-### Sonnet + narrow framing = GPT-5 level? (from Finding #5)
-Would Sonnet catch semantic issues if given a narrower "check for logical
+### Opus + narrow framing for contradiction detection (from Finding #39)
+Would Opus + narrow framing match GPT-5 for self-contradiction detection?
+Finding #39 showed Sonnet can't do it even with narrow framing (reasoning
+depth issue). Opus has strong cross-boundary reasoning — does its internal
+reasoning depth suffice for the verification step that Sonnet lacks?
+### ~~Sonnet + narrow framing = GPT-5 level? (from Finding #5)~~ → ANSWERED (Finding #39)
+~~Would Sonnet catch semantic issues if given a narrower "check for logical
 consistency" framing instead of broad review? The hypothesis: Sonnet's
-"structural reviewer" tendency is a framing artifact, not a capability limit.
+"structural reviewer" tendency is a framing artifact, not a capability limit.~~
+**NO.** Narrow framing changes WHAT Sonnet looks for (redirects from gaps to
+contradictions) but not HOW WELL it reasons. Sonnet narrow found 3 contradictions
+but only 1 was genuine (2 were misreadings). GPT-5 found 4 all-genuine findings.
+The gap is reasoning depth, not framing — Sonnet can't reliably verify whether
+two statements actually contradict each other.
 ## Medium Priority