Tests GPT-5 → Opus critique+extend pipeline on dtbp-margin-call.md.
Key results:
- Ensemble produces 56 unique findings vs 43 (GPT-5) or 28 (Opus) alone
- Zero full disagreements — GPT-5's coverage is reliable signal
- Critique phase (severity calibration) more valuable than extension phase
- 28% more tokens for 30% more coverage + structured prioritization
- Answers open question about adversarial ensemble value