Files

T

Rodin 1b108ff66e Initial publish: 29 findings, 6 prompts, methodology, open questions

Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6,
GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md — complete 3,249-line research log with all
  29 findings, methodology notes, and open questions
- prompts/ — 6 exact prompts used across experiments
- methodology.md — experimental setup and evaluation criteria
- open-questions.md — unanswered questions for future work
- README.md — overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin

2026-05-05 19:13:03 -07:00

1.8 KiB

Raw Blame History

Prompt: Contradiction Detection

Used in Finding #25.

Setup

Single document (full text)
Same prompt to all models
No tools, no project context beyond the document

Prompt

You are analyzing a design document for CONTRADICTIONS — places where
the document makes two claims that cannot both be true simultaneously.

This is NOT about:
- Missing information
- Unclear writing
- Design tradeoffs
- Things that MIGHT conflict

This IS about:
- Statement A says X, Statement B says NOT-X
- Mechanism A requires condition C, Mechanism B prevents condition C
- Rule A applies to set S, but S includes elements that violate Rule A

## Categories:

1. **Direct contradictions** — Two statements that are logically incompatible
2. **Mechanism conflicts** — Two described mechanisms that cannot coexist
3. **Scope violations** — A rule/invariant that is violated by a specific
   case described elsewhere in the document
4. **Temporal impossibilities** — A sequence that requires something to be
   true before the described mechanism makes it true

## For each contradiction:

- **Category:** (one of the 4 above)
- **Statement A:** (exact text, with section)
- **Statement B:** (exact text, with section)
- **Why contradictory:** (formal reasoning about incompatibility)
- **Severity:** Critical (system correctness) / High (safety) / Medium (confusion)

Be PRECISE. Only report genuine logical contradictions, not differences
in emphasis or scope.

## Document:

[FULL TEXT OF DOCUMENT]

Key Design Decision

The "Be PRECISE" instruction and explicit exclusion list ("NOT about") is critical. Without it, models pad findings with style/clarity issues. The contradiction prompt naturally favors Opus (self-correcting, withdraws false positives) over GPT-5 (exhaustive, includes borderline cases).

1.8 KiB Raw Blame History

Prompt: Contradiction Detection

Setup

Prompt

Key Design Decision

1.8 KiB

Raw Blame History