finding #43: opus + narrow framing for contradiction detection
Tests the open question from Finding #39: does Opus's internal reasoning depth suffice for self-contradiction verification?

Key result: wrong question. Opus finds a different CLASS of contradiction than GPT-5. GPT-5 finds specification conflicts (statement comparison). Opus finds logical impossibilities (deductive rule interaction). Neither dominates; they don't overlap. Sonnet remains unreliable (~33% precision).

Document tested: escalation-policy.md (228 lines)
Models: GPT-5, Claude Opus 4.6, Claude Sonnet 4.6
+10 -3
@@ -22,11 +22,18 @@ cross-doc contradictions are easy to verify once spotted (reducing GPT-5's
 verification advantage)? Or because boundary reasoning (Opus's strength)
 is the primary skill needed?
 
-### Opus + narrow framing for contradiction detection (from Finding #39)
-Would Opus + narrow framing match GPT-5 for self-contradiction detection?
+### ~~Opus + narrow framing for contradiction detection (from Finding #39)~~ → ANSWERED (Finding #43)
+~~Would Opus + narrow framing match GPT-5 for self-contradiction detection?
 Finding #39 showed Sonnet can't do it even with narrow framing (reasoning
 depth issue). Opus has strong cross-boundary reasoning — does its internal
-reasoning depth suffice for the verification step that Sonnet lacks?
+reasoning depth suffice for the verification step that Sonnet lacks?~~
+
+**WRONG QUESTION.** Opus doesn't try to match GPT-5 — it finds a different CLASS
+of contradiction. GPT-5 finds specification conflicts (same scenario, conflicting
+prescriptions via statement comparison). Opus finds logical impossibilities (rules
+whose interaction produces impossible conditions via deductive reasoning). Neither
+dominates — they don't overlap. Run both for complete coverage. Sonnet remains
+unreliable (~33% precision on contradiction detection).
 
 ### ~~Sonnet + narrow framing = GPT-5 level? (from Finding #5)~~ → ANSWERED (Finding #39)
 ~~Would Sonnet catch semantic issues if given a narrower "check for logical
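The "run both for complete coverage" conclusion and the precision figure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Finding` schema, the `merge_findings` and `precision` helpers, and the placeholder data are hypothetical and are not taken from the experiment or its tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reported contradiction (hypothetical schema, not from the source)."""
    model: str       # which reviewer reported it
    kind: str        # "specification_conflict" or "logical_impossibility"
    location: str    # free-form pointer into the document under review
    confirmed: bool  # whether manual verification upheld the finding


def precision(findings):
    """Fraction of reported findings that were confirmed; None if nothing was reported."""
    if not findings:
        return None
    return sum(f.confirmed for f in findings) / len(findings)


def merge_findings(*runs):
    """Union of findings across runs, de-duplicated by (kind, location)."""
    merged = {}
    for run in runs:
        for f in run:
            # Keep the first report of each distinct contradiction.
            merged.setdefault((f.kind, f.location), f)
    return list(merged.values())


# Placeholder data for illustration only; these are NOT results from Finding #43.
run_a = [Finding("model_a", "specification_conflict", "section 2 vs section 5", True)]
run_b = [Finding("model_b", "logical_impossibility", "rules 3 and 7", True)]
run_c = [
    Finding("model_c", "specification_conflict", "section 2 vs section 5", True),
    Finding("model_c", "specification_conflict", "section 4", False),
    Finding("model_c", "logical_impossibility", "rule 9", False),
]

# Two reviewers with disjoint findings: the union covers more than either alone.
combined = merge_findings(run_a, run_b)
overlap = {(f.kind, f.location) for f in run_a} & {(f.kind, f.location) for f in run_b}

print(len(combined), len(overlap))
print(precision(run_c))
```

De-duplicating on `(kind, location)` is one possible design choice; a real pipeline would likely need fuzzier matching, since two models rarely describe the same contradiction in identical words.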