finding #39: narrow framing does not close Sonnet-GPT-5 gap for semantic consistency
Tested the open question from Finding #5: does narrow framing give Sonnet GPT-5-level semantic analysis? Result: NO.

Narrow framing changes WHAT Sonnet looks for (redirects from gaps to contradictions) but not HOW WELL it reasons. Sonnet narrow found 3 contradictions, but only 1 was genuine (2 were analytical errors/misreads). GPT-5 found 4 findings, all genuine, with precise reasoning.

Key insight: framing controls scope, not reasoning depth. For tasks requiring logical verification (contradictions, race conditions, invariant violations), reasoning tokens are necessary; framing alone is insufficient.

Updated open-questions.md: marked Sonnet+narrow as answered, added new question about Opus+narrow for contradiction detection.
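To make the comparison concrete, the reported counts can be turned into a share of genuine findings per model. This is only a sketch: the `precision` helper and the `results` table are hypothetical names introduced here, and the metric is an assumption about how to summarise the counts, not something the commit defines. It covers only the findings each model reported and says nothing about contradictions either model missed.

```python
from fractions import Fraction


def precision(genuine: int, reported: int) -> Fraction:
    """Share of reported contradictions that held up on verification."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    if not 0 <= genuine <= reported:
        raise ValueError("genuine must be between 0 and reported")
    return Fraction(genuine, reported)


# (genuine, reported) counts as stated in the commit message.
results = {
    "Sonnet + narrow framing": (1, 3),
    "GPT-5": (4, 4),
}

for label, (genuine, reported) in results.items():
    share = precision(genuine, reported)
    print(f"{label}: {genuine}/{reported} genuine ({float(share):.0%})")
```

With samples this small the percentages are illustrative rather than statistically meaningful.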
15 additions, 3 deletions
@@ -22,10 +22,22 @@ cross-doc contradictions are easy to verify once spotted (reducing GPT-5's
 verification advantage)? Or because boundary reasoning (Opus's strength)
 is the primary skill needed?
 
-### Sonnet + narrow framing = GPT-5 level? (from Finding #5)
-Would Sonnet catch semantic issues if given a narrower "check for logical
+### Opus + narrow framing for contradiction detection (from Finding #39)
+Would Opus + narrow framing match GPT-5 for self-contradiction detection?
+Finding #39 showed Sonnet can't do it even with narrow framing (reasoning
+depth issue). Opus has strong cross-boundary reasoning — does its internal
+reasoning depth suffice for the verification step that Sonnet lacks?
+
+### ~~Sonnet + narrow framing = GPT-5 level? (from Finding #5)~~ → ANSWERED (Finding #39)
+~~Would Sonnet catch semantic issues if given a narrower "check for logical
 consistency" framing instead of broad review? The hypothesis: Sonnet's
-"structural reviewer" tendency is a framing artifact, not a capability limit.
+"structural reviewer" tendency is a framing artifact, not a capability limit.~~
+
+**NO.** Narrow framing changes WHAT Sonnet looks for (redirects from gaps to
+contradictions) but not HOW WELL it reasons. Sonnet narrow found 3 contradictions
+but only 1 was genuine (2 were misreadings). GPT-5 found 4 all-genuine findings.
+The gap is reasoning depth, not framing — Sonnet can't reliably verify whether
+two statements actually contradict each other.
 
 ## Medium Priority
 