model-research/findings/2026-05-07-43-opus-narrow-contradiction-detection.md
claw d8a030d9e9 finding #43: opus + narrow framing for contradiction detection
Tests the open question from Finding #39: does Opus's internal reasoning
depth suffice for self-contradiction verification?

Key result: wrong question. Opus finds a different CLASS of contradiction
than GPT-5. GPT-5 finds specification conflicts (statement comparison).
Opus finds logical impossibilities (deductive rule interaction). Neither
dominates — they don't overlap. Sonnet remains unreliable (~33% precision).

Document tested: escalation-policy.md (228 lines)
Models: GPT-5, Claude Opus 4.6, Claude Sonnet 4.6
2026-05-07 16:05:14 -07:00


Finding #43: Opus + narrow framing produces qualitatively different contradiction type than GPT-5; neither dominates

Date: 2026-05-07
Document: docs/domain/contexts/risk/escalation-policy.md (228 lines)
Task type: Internal logical consistency / self-contradiction detection
Models: Claude Opus 4.6, GPT-5, Claude Sonnet 4.6 (all narrow framing)
Open question tested: "Would Opus + narrow framing match GPT-5 for self-contradiction detection?" (from Finding #39)

Experiment Design

Finding #39 showed that Sonnet + narrow framing does NOT close the gap with GPT-5 for contradiction detection — Sonnet found 3 contradictions but only 1 was genuine (2 misreadings). The open question: does Opus's deeper internal reasoning suffice for the verification step that Sonnet lacks?

Three conditions, same document, same narrow prompt:

| Condition | Model | Time | Output tokens | Reasoning tokens | Contradictions claimed |
|---|---|---|---|---|---|
| A | GPT-5 | 52s | 6,415 | 6,208 | 1 |
| B | Claude Opus 4.6 | 12s | 468 | (internal) | 1 |
| C | Claude Sonnet 4.6 | 26s | 1,451 | (internal) | 3 |

What They Found

GPT-5 (1 genuine contradiction):

Broker-unavailable timing conflict: The prose says broker unreachability leads to kill switch only after "continued consecutive breaches" (N more evaluations). The table says broker unavailable → "Immediate kill switch escalation." Both describe the same scenario (broker unavailable during liquidation) but prescribe different timing: debounce-gated vs immediate. Severity: High.
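The statement-comparison check behind this finding can be sketched as follows. This is an illustration only, not how GPT-5 works internally: the `Rule` tuple, the `find_spec_conflicts` helper, and the normalized scenario label are all hypothetical, and in practice the hard part is recognizing that two differently worded passages describe the same scenario.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    scenario: str  # hypothetical normalized label for the situation described
    source: str    # where in the document the statement appears
    timing: str    # what the statement prescribes


def find_spec_conflicts(rules):
    """Group statements by scenario and keep groups whose prescriptions differ."""
    by_scenario = defaultdict(list)
    for rule in rules:
        by_scenario[rule.scenario].append(rule)
    return {
        scenario: group
        for scenario, group in by_scenario.items()
        if len({r.timing for r in group}) > 1
    }


# The two statements from the finding, paraphrased.
rules = [
    Rule("broker unavailable during liquidation", "prose",
         "kill switch after continued consecutive breaches"),
    Rule("broker unavailable during liquidation", "table",
         "immediate kill switch escalation"),
]

conflicts = find_spec_conflicts(rules)
for scenario, group in conflicts.items():
    print(scenario, "->", [(r.source, r.timing) for r in group])
```

Both statements land in the same scenario bucket with different prescriptions, which is what makes this a specification conflict rather than a logical one.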

Claude Opus 4.6 (1 genuine contradiction):

Debounce reset paradox: The document states "A single clear evaluation resets the breach counter." But the Liquidation Sizing section says if liquidation is insufficient, "the next evaluation cycle can trigger additional liquidation — but only after the debounce count resets and fires again." If the metric NEVER clears (liquidation was insufficient, metric still breaches), the counter can never reset per the stated rule. Yet the document says additional liquidation requires the counter to reset. These cannot both be true for a continuously breaching metric. Severity: High.
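The paradox can be reproduced with a small simulation. This is a minimal sketch under an assumed literal reading of the two quoted rules: the counter resets only on a clear evaluation, and liquidation can fire again only after such a reset. The function name `simulate_liquidations`, the `armed` flag, and the threshold of 3 (borrowed from the Restrict→Liquidate default mentioned later in this finding) are illustrative and not taken from escalation-policy.md.

```python
def simulate_liquidations(evaluations, threshold=3):
    """Return the 1-based evaluation indices at which liquidation fires.

    evaluations: iterable of booleans, True meaning the metric breached.
    Assumed rules: a clear evaluation resets the counter, and liquidation
    may fire again only after the counter has been reset.
    """
    counter = 0
    armed = True  # True once the counter has reset since the last firing
    fired_at = []
    for i, breached in enumerate(evaluations, start=1):
        if not breached:
            counter = 0   # "A single clear evaluation resets the breach counter."
            armed = True
            continue
        counter += 1
        if armed and counter >= threshold:
            fired_at.append(i)
            armed = False  # further liquidation requires a reset first
    return fired_at


# Metric never clears: liquidation fires once and can never fire again.
always_breaching = simulate_liquidations([True] * 10)
print(always_breaching)  # [3]

# One clear evaluation in between allows a second firing.
with_one_clear = simulate_liquidations([True] * 3 + [False] + [True] * 3)
print(with_one_clear)  # [3, 7]
```

Under this reading, the case the Liquidation Sizing section is meant to handle (insufficient liquidation, metric still breaching) is exactly the case in which additional liquidation can never trigger.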

Claude Sonnet 4.6 (3 claimed, assessment below):

  1. Failure modes "Automatic" vs manual de-escalation — Claims "Automatic" recovery in the failure modes table contradicts "manual only" de-escalation from liquidate. Assessment: MISREAD. The "Automatic" column describes how the system HANDLES the failure scenario (auto-retries, escalates to kill switch), not downward de-escalation. The system's autonomous recovery is escalation UPWARD (to kill switch), which is consistent with manual-only downward de-escalation.

  2. Debounce defaults vs calibration guidance — Restrict→Liquidate defaults to 3 but calibration says volatile metrics need 5-8. Assessment: TENSION, not contradiction. The document explicitly says "These are configurable per metric" — the defaults don't need to match the guidance for specific metric types. The calibration section explains HOW to override defaults, not what the defaults must be. This is advice vs defaults, not statement vs statement.

  3. Kill switch immediate trigger vs "post-liquidation" event description — Same finding as GPT-5's: broker-unavailable immediate escalation conflicts with the event described as "post-liquidation." Assessment: GENUINE. This is the same contradiction GPT-5 found but arrived at via a different evidence path (event description rather than prose/table conflict).

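Sonnet precision: of 3 claimed contradictions, 1 was genuine, 1 was a tension, and 1 was a misread. That is 33% precision counting only the genuine finding (1/3), or 67% if the tension is also counted as a valid flag (2/3).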
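The two ends of that range come from whether the "tension" item counts as a correct flag. A short worked calculation, using the three assessments above (the variable names are illustrative):

```python
from fractions import Fraction

# Assessments of Sonnet's three claims, in the order listed above.
labels = ["misread", "tension", "genuine"]

# Strict: only genuine contradictions count as true positives.
strict = Fraction(labels.count("genuine"), len(labels))

# Lenient: a tension is also treated as a useful flag.
lenient = Fraction(labels.count("genuine") + labels.count("tension"), len(labels))

print(f"strict: {float(strict):.0%}, lenient: {float(lenient):.0%}")
# strict: 33%, lenient: 67%
```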

Analysis

GPT-5's finding vs Opus's finding — different types of contradiction:

GPT-5 found a surface-level specification conflict: two statements about the same scenario (broker unavailable) prescribe different behaviors (wait N breaches vs immediate). This is the type of contradiction you'd find during careful proofreading — it's where the document says "X" in one place and "not-X" in another about the same thing.

Opus found a logical impossibility: the interaction between two stated rules creates a situation that can never resolve. The debounce reset rule (requires a clear evaluation) and the re-triggering mechanism (needs the counter to reset) cannot both work as described when the metric continuously breaches. This is NOT a statement-vs-statement conflict — it's a logical consequence that the author likely didn't reason through.

These are qualitatively different:

  • GPT-5's type: "you said conflicting things about the same scenario" (specification bug)
  • Opus's type: "your rules, when combined, produce an impossible requirement" (logic bug)

Does Opus match GPT-5?

No, but not because it's worse. They find different things. Judging by the findings, GPT-5's 6,208 reasoning tokens appear to have gone toward exhaustively checking statement pairs for direct conflicts, while Opus's internal reasoning (not visible in the output) appears to have gone toward understanding the LOGICAL INTERACTION between rules.

GPT-5 missed the debounce reset paradox (likely because it requires multi-step logical reasoning about rule interactions rather than statement comparison). Opus missed the broker-unavailable timing conflict (likely because it's a more surface-level inconsistency between prose and table that doesn't involve logical deduction).

Sonnet's continued weakness:

Consistent with Finding #39: Sonnet found 3 contradictions but only 1 was genuine (the broker-unavailable one, same as GPT-5). The failure-modes misread shows Sonnet doesn't reliably verify whether two statements ACTUALLY conflict — it pattern-matches on surface similarity ("Automatic" and "manual only" appear to conflict) without reasoning about whether they refer to the same thing. The debounce/calibration "contradiction" confuses advisory guidance with specification (a type confusion that reasoning models avoid).

Key Insight — Two distinct contradiction-finding modes:

| Mode | Best model | What it catches | Cognitive demand |
|---|---|---|---|
| Specification conflicts | GPT-5 | Same scenario, different prescriptions | Statement comparison + verification |
| Logical impossibilities | Opus | Rules that can't coexist under all conditions | Multi-step logical deduction |

This explains why the open question ("does Opus match GPT-5?") has no clean yes/no answer. They're not attempting the same thing. GPT-5 exhaustively compares statement pairs. Opus reasons about what the stated rules IMPLY when combined. Both modes catch real bugs that the other misses.

Practical Implication

For self-contradiction detection in architecture documents:

  • Run BOTH GPT-5 and Opus — they catch fundamentally different types of contradictions
  • GPT-5 catches specification bugs (conflicting statements about the same thing)
  • Opus catches logic bugs (rules whose interactions produce impossible conditions)
  • Sonnet remains unreliable — too many false positives from surface-pattern matching
  • Adding Opus costs little: 12s and 468 output tokens, against 52s and 6,415 output tokens for GPT-5
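The "run both" recommendation amounts to taking the union of the two models' findings. A minimal sketch using only the results reported in this finding; the short labels are hypothetical normalizations, and matching findings across models would in practice need a human or a further review step:

```python
# Genuine findings per model, as reported in this finding.
findings = {
    "GPT-5": {"broker-unavailable timing conflict"},
    "Claude Opus 4.6": {"debounce reset paradox"},
}

# Contradictions found by both models (empty on this document).
overlap = findings["GPT-5"] & findings["Claude Opus 4.6"]

# Everything caught by running both.
combined = set().union(*findings.values())

print("overlap:", sorted(overlap))
print("combined:", sorted(combined))
```

On this document the overlap is empty and the union has two entries, so each model alone would have caught half of the genuine contradictions.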

Updated Answer to Open Question

"Would Opus + narrow framing match GPT-5 for self-contradiction detection?"

Wrong question. Opus doesn't try to match GPT-5 — it finds a different class of contradiction. The right framing: Opus + GPT-5 together catch more than either alone, and the contradictions they find don't overlap. Run both.