Initial publish: 29 findings, 6 prompts, methodology, open questions

Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md: complete 3,249-line research log with all 29 findings, methodology notes, and open questions
- prompts/: 6 exact prompts used across experiments
- methodology.md: experimental setup and evaluation criteria
- open-questions.md: unanswered questions for future work
- README.md: overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster than GPT-5 with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
@@ -0,0 +1,80 @@
# Prompt: Cross-Document Consistency Analysis

Used in Finding #28.

## Setup

- Two documents provided as full text in a single prompt (~25KB total)
- Document A: `system-overview.md` (323 lines, narrative overview)
- Document B: `architecture.md` (213 lines, DDD-focused)
- No tools, no project context beyond the two documents
- Same prompt sent to all 3 models independently

## Prompt

```
You are analyzing two architecture documents that describe the SAME system.
Your task is to identify places where these documents CONTRADICT each other
— not where they differ in scope or detail level, but where they make
incompatible claims about the same concept.

## Categories of inconsistency to check:

1. **Terminology conflicts** — Same concept called different names in ways
that imply different meanings (not just abbreviation)
2. **Structural contradictions** — Documents disagree about what is inside
vs outside a component boundary
3. **Flow/sequence conflicts** — Documents describe incompatible orderings
or data flows for the same process
4. **Ownership/authority conflicts** — Documents disagree about which
component owns, writes, or is authoritative for a concept
5. **Philosophical contradictions** — Documents state incompatible
foundational assumptions (e.g., event sourcing vs CRUD)

## What to EXCLUDE:

- Omissions (one doc covers something the other doesn't)
- Detail-level differences (one is more detailed than the other)
- Naming differences that are clearly just abbreviations
- Scope differences (one covers more topics)

## Output format per finding:

For each inconsistency found:
- **Category:** (one of the 5 above)
- **Severity:** Critical / High / Medium
- **Document A says:** (exact quote or precise paraphrase with section ref)
- **Document B says:** (exact quote or precise paraphrase with section ref)
- **Why these are incompatible:** (explain why both cannot be correct)
- **Impact:** (what would go wrong if an implementer followed both)

## Document A: [system-overview.md]

[FULL TEXT OF DOCUMENT A]

## Document B: [architecture.md]

[FULL TEXT OF DOCUMENT B]
```
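
The template ends with two bracketed placeholders, so building the final prompt comes down to substituting each document's full text exactly once. The sketch below shows one way to do that. The function name `build_prompt` and the strict "exactly once, A before B" checks are assumptions made for illustration and are not taken from the original experiment tooling.

```python
# Placeholder markers copied verbatim from the prompt template.
PLACEHOLDER_A = "[FULL TEXT OF DOCUMENT A]"
PLACEHOLDER_B = "[FULL TEXT OF DOCUMENT B]"


def build_prompt(template: str, doc_a: str, doc_b: str) -> str:
    """Insert both documents into the template (hypothetical helper).

    The template is split on the markers instead of chaining str.replace,
    so a document that happens to contain a marker string is never
    rewritten by the second substitution.
    """
    for marker in (PLACEHOLDER_A, PLACEHOLDER_B):
        if template.count(marker) != 1:
            raise ValueError(f"template must contain {marker!r} exactly once")
    head, _, rest = template.partition(PLACEHOLDER_A)
    if PLACEHOLDER_B not in rest:
        raise ValueError("Document A placeholder must come before Document B")
    middle, _, tail = rest.partition(PLACEHOLDER_B)
    return head + doc_a + middle + doc_b + tail


if __name__ == "__main__":
    demo_template = (
        "## Document A: [system-overview.md]\n\n[FULL TEXT OF DOCUMENT A]\n\n"
        "## Document B: [architecture.md]\n\n[FULL TEXT OF DOCUMENT B]\n"
    )
    print(build_prompt(demo_template, "overview text", "architecture text"))
```

In a real run the two document arguments would be the contents of `system-overview.md` and `architecture.md`, and the same resulting string would be sent unchanged to each model.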

## Key Design Decisions

1. **Explicit exclusion of omissions**: prevents models from padding
   findings with "Doc A mentions X but Doc B doesn't"
2. **Five specific categories**: focuses attention without being
   so restrictive that models miss novel inconsistency types
3. **Required "why incompatible" explanation**: forces models to reason
   about WHY differences matter, not just list differences
4. **Impact field**: grounds findings in practical consequences
5. **Both documents in single prompt**: enables cross-referencing
   without tool calls or context fragmentation
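
Because the prompt requires six fields for every finding, model output can be checked mechanically before findings are counted. The sketch below assumes each finding arrives as a block of `- **Field:** value` lines, exactly as the prompt requests. The names `parse_finding` and `validation_errors` are hypothetical, and real model output may need looser matching.

```python
import re

# Field names and severity levels copied from the prompt's output format.
REQUIRED_FIELDS = (
    "Category",
    "Severity",
    "Document A says",
    "Document B says",
    "Why these are incompatible",
    "Impact",
)
SEVERITIES = {"Critical", "High", "Medium"}

# Matches lines such as "- **Severity:** High".
FIELD_LINE = re.compile(r"^\s*-\s*\*\*(?P<name>[^*]+?):\*\*\s*(?P<value>.*)$")


def parse_finding(block: str) -> dict[str, str]:
    """Collect the `- **Name:** value` lines of one finding into a dict."""
    fields: dict[str, str] = {}
    for line in block.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields[match.group("name").strip()] = match.group("value").strip()
    return fields


def validation_errors(fields: dict[str, str]) -> list[str]:
    """Return the problems found; an empty list means the finding is complete."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    severity = fields.get("Severity")
    if severity and severity not in SEVERITIES:
        errors.append(f"unknown severity: {severity}")
    return errors
```

A finding with an empty "Why these are incompatible" field is reported as missing that field, which is the case design decision 3 is meant to prevent.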

## Results

| Model | Time | Findings | Tokens/finding |
|-------|------|----------|----------------|
| Opus | 52s | 7 | 336 |
| GPT-5 | 125s | 6 | 2,967 |
| Sonnet | 14s | 4 | 194 |

Opus is recommended for this task type: it produced the most findings (7) in less than half of GPT-5's time.
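
The comparisons implied by the table can be re-derived with a few lines of arithmetic. The sketch below copies the figures from the table and assumes that "Tokens/finding" is total tokens divided by the number of findings. The table does not say which tokens were counted, so the implied totals are estimates, not reported data.

```python
# Figures copied from the results table: (seconds, findings, tokens per finding).
RESULTS = {
    "Opus": (52, 7, 336),
    "GPT-5": (125, 6, 2967),
    "Sonnet": (14, 4, 194),
}


def derived_metrics(seconds: int, findings: int, tokens_per_finding: int) -> dict[str, float]:
    """Compute per-finding latency and the token total implied by the table."""
    return {
        "seconds_per_finding": seconds / findings,
        # Assumption: tokens/finding = total tokens / findings.
        "implied_total_tokens": findings * tokens_per_finding,
    }


# How much faster Opus finished than GPT-5 on the same prompt.
speedup_vs_gpt5 = RESULTS["GPT-5"][0] / RESULTS["Opus"][0]

if __name__ == "__main__":
    print(f"Opus vs GPT-5 speedup: {speedup_vs_gpt5:.1f}x")  # 2.4x
    for model, row in RESULTS.items():
        metrics = derived_metrics(*row)
        print(
            f"{model}: {metrics['seconds_per_finding']:.2f} s/finding, "
            f"{metrics['implied_total_tokens']} implied tokens"
        )
```

Under that assumption the implied totals are 2,352 tokens for Opus, 17,802 for GPT-5, and 776 for Sonnet. Sonnet is the fastest per finding but returns the fewest findings.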