Rodin 1b108ff66e Initial publish: 29 findings, 6 prompts, methodology, open questions

Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6,
GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md — complete 3,249-line research log with all
  29 findings, methodology notes, and open questions
- prompts/ — 6 exact prompts used across experiments
- methodology.md — experimental setup and evaluation criteria
- open-questions.md — unanswered questions for future work
- README.md — overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin

2026-05-05 19:13:03 -07:00


Prompt: Cross-Document Consistency Analysis

Used in Finding #28.

Setup

- Two documents provided as full text in a single prompt (~25KB total)
- Document A: system-overview.md (323 lines, narrative overview)
- Document B: architecture.md (213 lines, DDD-focused)
- No tools, no project context beyond the two documents
- Same prompt sent independently to each of the 3 models compared in Results (Opus, GPT-5, Sonnet)
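
The single-prompt setup above can be sketched as a small template-filling step. This is a minimal illustration, not the harness used in the experiment: the function name `build_prompt` is hypothetical, and the only things taken from this document are the two placeholder strings that appear at the end of the prompt template.

```python
# Placeholder strings exactly as they appear in the prompt template.
DOC_A_PLACEHOLDER = "[FULL TEXT OF DOCUMENT A]"
DOC_B_PLACEHOLDER = "[FULL TEXT OF DOCUMENT B]"


def build_prompt(template: str, doc_a: str, doc_b: str) -> str:
    """Return one prompt string with both documents inlined (hypothetical helper)."""
    for placeholder in (DOC_A_PLACEHOLDER, DOC_B_PLACEHOLDER):
        if template.count(placeholder) != 1:
            raise ValueError(f"expected exactly one {placeholder!r} in template")
    if template.index(DOC_A_PLACEHOLDER) > template.index(DOC_B_PLACEHOLDER):
        raise ValueError("Document A placeholder must come before Document B")

    # Split the template first and join afterwards, so placeholder-like text
    # inside either document is never substituted a second time.
    head, rest = template.split(DOC_A_PLACEHOLDER)
    middle, tail = rest.split(DOC_B_PLACEHOLDER)
    return head + doc_a + middle + doc_b + tail
```

Splitting before joining keeps the documents verbatim even if one of them happens to quote a placeholder, which matters when the task is to compare exact wording.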

Prompt

You are analyzing two architecture documents that describe the SAME system.
Your task is to identify places where these documents CONTRADICT each other
— not where they differ in scope or detail level, but where they make
incompatible claims about the same concept.

## Categories of inconsistency to check:

1. **Terminology conflicts** — Same concept called different names in ways
   that imply different meanings (not just abbreviation)
2. **Structural contradictions** — Documents disagree about what is inside
   vs outside a component boundary
3. **Flow/sequence conflicts** — Documents describe incompatible orderings
   or data flows for the same process
4. **Ownership/authority conflicts** — Documents disagree about which
   component owns, writes, or is authoritative for a concept
5. **Philosophical contradictions** — Documents state incompatible
   foundational assumptions (e.g., event sourcing vs CRUD)

## What to EXCLUDE:

- Omissions (one doc covers something the other doesn't)
- Detail-level differences (one is more detailed than the other)
- Naming differences that are clearly just abbreviations
- Scope differences (one covers more topics)

## Output format per finding:

For each inconsistency found:
- **Category:** (one of the 5 above)
- **Severity:** Critical / High / Medium
- **Document A says:** (exact quote or precise paraphrase with section ref)
- **Document B says:** (exact quote or precise paraphrase with section ref)
- **Why these are incompatible:** (explain why both cannot be correct)
- **Impact:** (what would go wrong if an implementer followed both)

## Document A: [system-overview.md]

[FULL TEXT OF DOCUMENT A]

## Document B: [architecture.md]

[FULL TEXT OF DOCUMENT B]

Key Design Decisions

- Explicit exclusion of omissions: prevents models from padding findings with "Doc A mentions X but Doc B doesn't"
- Five specific categories: focuses attention without being so restrictive that models miss novel inconsistency types
- Required "why incompatible" explanation: forces models to reason about WHY differences matter, not just list differences
- Impact field: grounds findings in practical consequences
- Both documents in single prompt: enables cross-referencing without tool calls or context fragmentation
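
Because the prompt fixes the fields of every finding, a response can be checked mechanically before it is counted. The sketch below is one possible way to do that and is not part of the original experiment: `Finding`, `parse_findings`, and the regular expression are hypothetical, and it assumes the model reproduces the `- **Label:** value` lines exactly, one field per line.

```python
import re
from dataclasses import dataclass

# Matches lines such as "- **Severity:** High" (assumed output shape).
FIELD_RE = re.compile(r"^\s*-\s*\*\*(?P<label>[^*]+?):\*\*\s*(?P<value>.*)$")

# Field labels from the prompt's output format, mapped to attribute names.
LABEL_TO_ATTR = {
    "Category": "category",
    "Severity": "severity",
    "Document A says": "doc_a_says",
    "Document B says": "doc_b_says",
    "Why these are incompatible": "why_incompatible",
    "Impact": "impact",
}
SEVERITIES = {"Critical", "High", "Medium"}


@dataclass
class Finding:
    category: str = ""
    severity: str = ""
    doc_a_says: str = ""
    doc_b_says: str = ""
    why_incompatible: str = ""
    impact: str = ""

    def problems(self) -> list[str]:
        """List missing fields and severities outside the prompt's three levels."""
        issues = [f"missing {attr}" for attr in LABEL_TO_ATTR.values()
                  if not getattr(self, attr)]
        if self.severity and self.severity not in SEVERITIES:
            issues.append(f"unknown severity {self.severity!r}")
        return issues


def parse_findings(text: str) -> list[Finding]:
    """Group field lines into findings; a Category line starts a new finding."""
    findings: list[Finding] = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # continuation lines and free prose are ignored here
        attr = LABEL_TO_ATTR.get(match["label"].strip())
        if attr is None:
            continue
        if attr == "category" or not findings:
            findings.append(Finding())
        setattr(findings[-1], attr, match["value"].strip())
    return findings
```

A finding whose `problems()` list is non-empty, for example one with no "Why these are incompatible" text, is a candidate for exclusion from the count, since that field is what separates a contradiction from a mere difference.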

Results

| Model  | Time | Findings | Tokens/finding |
|--------|------|----------|----------------|
| Opus   | 52s  | 7        | 336            |
| GPT-5  | 125s | 6        | 2,967          |
| Sonnet | 14s  | 4        | 194            |

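Opus is recommended for this task type: it returned the most findings (7) and finished about 2.4x faster than GPT-5 (52s vs 125s). Sonnet was the fastest (14s) but returned only 4 findings.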
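
The comparison can be reproduced from the results table with a few lines of arithmetic. The figures below are copied from the table; the derived columns are illustrative only, and `approx_total_tokens` assumes that "Tokens/finding" means total tokens divided by the number of findings, which this document does not state explicitly.

```python
# Values copied from the results table above.
results = {
    "Opus": {"seconds": 52, "findings": 7, "tokens_per_finding": 336},
    "GPT-5": {"seconds": 125, "findings": 6, "tokens_per_finding": 2967},
    "Sonnet": {"seconds": 14, "findings": 4, "tokens_per_finding": 194},
}


def derive(row: dict) -> dict:
    """Compute per-finding latency and an approximate total token count."""
    return {
        "seconds_per_finding": row["seconds"] / row["findings"],
        # Assumption: tokens_per_finding = total tokens / findings, so the
        # product only approximates the total (the table value is rounded).
        "approx_total_tokens": row["findings"] * row["tokens_per_finding"],
    }


derived = {model: derive(row) for model, row in results.items()}

# How many times longer GPT-5 took than Opus on the same prompt.
gpt5_vs_opus = results["GPT-5"]["seconds"] / results["Opus"]["seconds"]

for model, metrics in derived.items():
    print(f"{model}: {metrics['seconds_per_finding']:.1f} s/finding, "
          f"~{metrics['approx_total_tokens']} tokens")
print(f"GPT-5 time / Opus time = {gpt5_vs_opus:.1f}x")
```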
