mark adversarial ensemble question as answered (finding #35)

claw
2026-05-06 21:29:35 -07:00
parent 8338ae3019
commit d8ddbc9861
+9 -3
@@ -29,10 +29,16 @@ consistency" framing instead of broad review? The hypothesis: Sonnet's
## Medium Priority
### Adversarial analysis ensemble (from Finding #29)
Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
### ~~Adversarial analysis ensemble (from Finding #29)~~ → ANSWERED (Finding #35)
~~Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
and ask it to critique and extend. Does the ensemble find more than either
alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?
alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?~~
**YES.** The ensemble produces 56 findings vs 43 (GPT-5 alone) or 28 (Opus
alone), a 30% improvement over the better single model. Zero full
disagreements: the critique phase calibrates severity without discarding
findings. The extension phase adds 13 genuinely new findings (4 High).
The critique's structured assessment is more valuable than the raw extensions.
Cost: ~28% more tokens for 30% more coverage plus prioritization.
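As a quick sanity check on the counts quoted above, here is a minimal sketch of the arithmetic. The variable and function names are illustrative, not taken from the experiment code. It assumes the ensemble total is GPT-5's findings plus the extension-phase additions, which the numbers are consistent with.

```python
# Counts quoted in the answer above; names are hypothetical.
gpt5_alone = 43
opus_alone = 28
ensemble = 56
new_from_extension = 13


def relative_gain(combined: int, baseline: int) -> float:
    """Fractional increase of `combined` over `baseline`."""
    return (combined - baseline) / baseline


# Assumption: ensemble total = GPT-5 findings + extension-phase additions.
assert gpt5_alone + new_from_extension == ensemble

gain_vs_gpt5 = relative_gain(ensemble, gpt5_alone)
gain_vs_opus = relative_gain(ensemble, opus_alone)

print(f"vs GPT-5 alone: {gain_vs_gpt5:.1%}")  # prints "vs GPT-5 alone: 30.2%"
print(f"vs Opus alone: {gain_vs_opus:.1%}")   # prints "vs Opus alone: 100.0%"
```

The 30% figure is therefore relative to GPT-5 alone. Relative to Opus alone, the ensemble doubles the finding count.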
### Reasoning effort parameter (from Finding #21)
Reasoning effort (low/medium/high) had negligible effect on GPT-5's