Tested on signal-lifecycle.md (111 lines). Results:
- GPT-5: 17 gaps (7,744 reasoning tokens)
- Opus: 11 gaps (design-level focus)
- Sonnet: 8 gaps (fastest, protocol-level)
Key insight: the union across all three models (~26 gaps) far exceeds
the best single model (17). Only 5 gaps were found by all three, so the
outputs are highly differentiated, which makes multi-model runs
valuable for interface documents.
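
The union/overlap comparison can be sketched as below, assuming each model's gaps have already been normalized to shared IDs. The function name `overlap_summary`, the model labels, and the gap IDs are hypothetical toy values for illustration, not the actual results from this run.

```python
def overlap_summary(gaps_by_model):
    """Summarize how much per-model gap sets overlap.

    gaps_by_model maps a model label to an iterable of gap IDs.
    Assumes equivalent gaps were already deduplicated to the same ID.
    """
    sets = {model: set(gaps) for model, gaps in gaps_by_model.items()}
    all_sets = list(sets.values())

    # Every gap reported by at least one model.
    union = set().union(*all_sets)
    # Gaps reported by every model.
    common = set.intersection(*all_sets) if all_sets else set()
    # Gaps reported by exactly one model, keyed by that model.
    unique = {
        model: found - set().union(*(s for m, s in sets.items() if m != model))
        for model, found in sets.items()
    }
    return {
        "union": union,
        "common": common,
        "unique": unique,
        "largest_single": max((len(s) for s in all_sets), default=0),
    }


# Toy input (hypothetical IDs, not the real gap lists).
toy_gaps = {
    "model_a": {"G1", "G2", "G3", "G4"},
    "model_b": {"G1", "G2", "G5"},
    "model_c": {"G1", "G6"},
}
summary = overlap_summary(toy_gaps)
print(len(summary["union"]), len(summary["common"]), summary["largest_single"])
```

A union much larger than `largest_single`, combined with a small `common` set, is the pattern described above: each model contributes gaps the others miss.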