model-research/prompts/gap-finding.md

# Prompt: Gap-Finding in Architecture Documents

Used in Finding #9.

## Setup

- Single document (full text, no truncation)
- Same focused analytical question to all models
- No tools, no project context beyond the document
- Temperature 0.3 for GPT-4.1/Mini, default for GPT-5

## Prompt

```
You are a systems reliability engineer reviewing a failure modes document
for a trading platform. Your task is to identify MISSING failure scenarios
— things that COULD go wrong in this architecture but are NOT covered in
the document.

Focus on:
1. Scenarios specific to THIS architecture (not generic "server could crash")
2. Interactions between components that could produce unexpected states
3. External dependency failures not covered
4. Timing/ordering issues in the described sequences
5. Recovery procedures that have gaps

For each missing scenario:
- **Scenario:** What goes wrong
- **Why it's specific to this system:** Why generic monitoring wouldn't catch it
- **Impact:** What state the system ends up in
- **Why the document misses it:** What assumption makes this invisible

## Document:

[FULL TEXT OF failure-modes.md, 383 lines]
```

## Results

| Model | Time | Output tokens | Reasoning tokens | Scenarios found |
|-------|------|---------------|------------------|-----------------|
| GPT-4.1 Mini | 16s | 2,003 | 0 | 10 |
| GPT-4.1 | 24s | 2,575 | 0 | 15 |
| GPT-5 | 45s | 8,565 | 6,656 | 14 |

GPT-5 found the most domain-specific and actionable gaps despite finding
fewer total scenarios than GPT-4.1. Quality > quantity.