model-research/prompts/cross-document-consistency.md
Rodin 1b108ff66e Initial publish: 29 findings, 6 prompts, methodology, open questions
Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6,
GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md — complete 3,249-line research log with all
  29 findings, methodology notes, and open questions
- prompts/ — 6 exact prompts used across experiments
- methodology.md — experimental setup and evaluation criteria
- open-questions.md — unanswered questions for future work
- README.md — overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
2026-05-05 19:13:03 -07:00


# Prompt: Cross-Document Consistency Analysis
Used in Finding #28.
## Setup
- Two documents provided as full text in a single prompt (~25KB total)
- Document A: `system-overview.md` (323 lines, narrative overview)
- Document B: `architecture.md` (213 lines, DDD-focused)
- No tools, no project context beyond the two documents
- Same prompt sent independently to all 3 models (Opus, GPT-5, Sonnet)
## Prompt
```
You are analyzing two architecture documents that describe the SAME system.
Your task is to identify places where these documents CONTRADICT each other
— not where they differ in scope or detail level, but where they make
incompatible claims about the same concept.
## Categories of inconsistency to check:
1. **Terminology conflicts** — Same concept called different names in ways
that imply different meanings (not just abbreviation)
2. **Structural contradictions** — Documents disagree about what is inside
vs outside a component boundary
3. **Flow/sequence conflicts** — Documents describe incompatible orderings
or data flows for the same process
4. **Ownership/authority conflicts** — Documents disagree about which
component owns, writes, or is authoritative for a concept
5. **Philosophical contradictions** — Documents state incompatible
foundational assumptions (e.g., event sourcing vs CRUD)
## What to EXCLUDE:
- Omissions (one doc covers something the other doesn't)
- Detail-level differences (one is more detailed than the other)
- Naming differences that are clearly just abbreviations
- Scope differences (one covers more topics)
## Output format per finding:
For each inconsistency found:
- **Category:** (one of the 5 above)
- **Severity:** Critical / High / Medium
- **Document A says:** (exact quote or precise paraphrase with section ref)
- **Document B says:** (exact quote or precise paraphrase with section ref)
- **Why these are incompatible:** (explain why both cannot be correct)
- **Impact:** (what would go wrong if an implementer followed both)
## Document A: [system-overview.md]
[FULL TEXT OF DOCUMENT A]
## Document B: [architecture.md]
[FULL TEXT OF DOCUMENT B]
```
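The two bracketed placeholders at the end of the prompt are where the full document texts go. A minimal sketch of that substitution follows, assuming the template is stored as a plain string; the function name `build_prompt` and the constants are hypothetical and not part of the original experiment tooling.

```python
# Placeholder strings copied verbatim from the prompt template.
PLACEHOLDER_A = "[FULL TEXT OF DOCUMENT A]"
PLACEHOLDER_B = "[FULL TEXT OF DOCUMENT B]"


def build_prompt(template: str, doc_a: str, doc_b: str) -> str:
    """Insert both documents into the template (hypothetical helper).

    Splitting on the placeholders, rather than chaining str.replace, keeps
    a document that happens to contain placeholder text from being altered.
    """
    for placeholder in (PLACEHOLDER_A, PLACEHOLDER_B):
        if template.count(placeholder) != 1:
            raise ValueError(f"expected exactly one {placeholder!r} in template")
    if template.index(PLACEHOLDER_A) > template.index(PLACEHOLDER_B):
        raise ValueError("Document A placeholder must come before Document B")

    head, rest = template.split(PLACEHOLDER_A)
    middle, tail = rest.split(PLACEHOLDER_B)
    return head + doc_a + middle + doc_b + tail
```

In practice the two arguments would be the contents of `system-overview.md` and `architecture.md`, read as text and passed through unchanged so that every model receives an identical prompt.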
## Key Design Decisions
1. **Explicit exclusion of omissions** — prevents models from padding
findings with "Doc A mentions X but Doc B doesn't"
2. **Five specific categories** — focuses attention without being
so restrictive that models miss novel inconsistency types
3. **Required "why incompatible" explanation** — forces models to reason
about WHY differences matter, not just list differences
4. **Impact field** — grounds findings in practical consequences
5. **Both documents in single prompt** — enables cross-referencing
without tool calls or context fragmentation
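Because the prompt fixes six fields per finding, responses can be checked mechanically before findings are counted. The sketch below is one possible way to do that, assuming models reproduce the `**Field:**` labels exactly as written in the prompt (real responses may deviate); `parse_findings` and `validate` are hypothetical names, not part of the original methodology.

```python
import re

# Field names and severity levels taken from the prompt's output format.
REQUIRED_FIELDS = (
    "Category",
    "Severity",
    "Document A says",
    "Document B says",
    "Why these are incompatible",
    "Impact",
)
ALLOWED_SEVERITIES = {"Critical", "High", "Medium"}

# Matches lines such as "- **Severity:** High" (the leading bullet is optional).
FIELD_LINE = re.compile(r"^\s*(?:[-*]\s+)?\*\*(?P<name>[^*]+?):\*\*\s*(?P<value>.*)$")


def parse_findings(response: str) -> list[dict[str, str]]:
    """Split a response into findings; each Category field starts a new one."""
    findings: list[dict[str, str]] = []
    current = None
    last_field = None
    for line in response.splitlines():
        match = FIELD_LINE.match(line)
        if match and match["name"].strip() in REQUIRED_FIELDS:
            name = match["name"].strip()
            if name == "Category":
                current = {}
                findings.append(current)
            if current is not None:
                current[name] = match["value"].strip()
                last_field = name
        elif current is not None and last_field and line.strip():
            if line.lstrip().startswith("#"):
                continue  # skip headings between findings
            # Continuation line: append to the most recent field.
            current[last_field] += " " + line.strip()
    return findings


def validate(finding: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the finding is well-formed."""
    problems = [
        f"missing or empty field: {field}"
        for field in REQUIRED_FIELDS
        if not finding.get(field)
    ]
    severity = finding.get("Severity", "")
    if severity and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unexpected severity: {severity}")
    return problems
```

A finding that lacks the "Why these are incompatible" or "Impact" field is then flagged rather than silently counted, which matches the intent of design decisions 3 and 4.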
## Results
| Model | Time | Findings | Tokens/finding |
|-------|------|----------|----------------|
| Opus | 52s | 7 | 336 |
| GPT-5 | 125s | 6 | 2,967 |
| Sonnet | 14s | 4 | 194 |
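
Opus is recommended for this task type: it produced the most findings (7) and finished 2.4x faster than GPT-5 (52s vs 125s). Sonnet was the fastest overall (14s) but returned only 4 findings.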
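The derived figures can be recomputed from the table. The sketch below copies the table rows and assumes that "Tokens/finding" means total tokens divided by the number of findings, so the totals it prints are approximations under that assumption rather than measured values.

```python
# Rows copied from the results table: (model, seconds, findings, tokens per finding).
RESULTS = [
    ("Opus", 52, 7, 336),
    ("GPT-5", 125, 6, 2967),
    ("Sonnet", 14, 4, 194),
]


def derive(rows):
    """Compute per-model seconds per finding and an approximate token total."""
    derived = {}
    for model, seconds, findings, tokens_per_finding in rows:
        derived[model] = {
            "seconds_per_finding": round(seconds / findings, 1),
            # Assumption: tokens/finding = total tokens / findings.
            "approx_total_tokens": findings * tokens_per_finding,
        }
    return derived


metrics = derive(RESULTS)
# Wall-clock ratio of GPT-5 to Opus, the source of the "2.4x faster" figure.
opus_vs_gpt5_speedup = round(125 / 52, 1)

for model, values in metrics.items():
    print(model, values)
print("Opus vs GPT-5 speedup:", opus_vs_gpt5_speedup)
```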