Initial publish: 29 findings, 6 prompts, methodology, open questions

Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding). Contents: - findings/ALL-FINDINGS.md — complete 3,249-line research log with all 29 findings, methodology notes, and open questions - prompts/ — 6 exact prompts used across experiments - methodology.md — experimental setup and evaluation criteria - open-questions.md — unanswered questions for future work - README.md — overview and summary table Key findings: - Cross-document consistency: Opus is 2.4x faster with more findings - Gap-finding: GPT-5 reasoning tokens find domain-specific gaps - Race conditions: Opus excels at temporal interaction reasoning - Bias detection: Signal-to-noise ratio > model capability - Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different Signed-off-by: Rodin
2026-05-05 19:13:03 -07:00
parent 4aea0d004b
commit 1b108ff66e
10 changed files with 3831 additions and 2 deletions
@@ -0,0 +1,58 @@
+# Prompt: Contradiction Detection
+
+Used in Finding #25.
+
+## Setup
+
+- Single document (full text)
+- Same prompt to all models
+- No tools, no project context beyond the document
+
+## Prompt
+
+```
+You are analyzing a design document for CONTRADICTIONS — places where
+the document makes two claims that cannot both be true simultaneously.
+
+This is NOT about:
+- Missing information
+- Unclear writing
+- Design tradeoffs
+- Things that MIGHT conflict
+
+This IS about:
+- Statement A says X, Statement B says NOT-X
+- Mechanism A requires condition C, Mechanism B prevents condition C
+- Rule A applies to set S, but S includes elements that violate Rule A
+
+## Categories:
+
+1. **Direct contradictions** — Two statements that are logically incompatible
+2. **Mechanism conflicts** — Two described mechanisms that cannot coexist
+3. **Scope violations** — A rule/invariant that is violated by a specific
+   case described elsewhere in the document
+4. **Temporal impossibilities** — A sequence that requires something to be
+   true before the described mechanism makes it true
+
+## For each contradiction:
+
+- **Category:** (one of the 4 above)
+- **Statement A:** (exact text, with section)
+- **Statement B:** (exact text, with section)
+- **Why contradictory:** (formal reasoning about incompatibility)
+- **Severity:** Critical (system correctness) / High (safety) / Medium (confusion)
+
+Be PRECISE. Only report genuine logical contradictions, not differences
+in emphasis or scope.
+
+## Document:
+
+[FULL TEXT OF DOCUMENT]
+```
+
+## Key Design Decision
+
+The "Be PRECISE" instruction and explicit exclusion list ("NOT about")
+is critical. Without it, models pad findings with style/clarity issues.
+The contradiction prompt naturally favors Opus (self-correcting, withdraws
+false positives) over GPT-5 (exhaustive, includes borderline cases).