mark adversarial ensemble question as answered (finding #35)

claw
2026-05-06 21:29:35 -07:00
parent 8338ae3019
commit d8ddbc9861
+9 -3
@@ -29,10 +29,16 @@ consistency" framing instead of broad review? The hypothesis: Sonnet's
 ## Medium Priority
-### Adversarial analysis ensemble (from Finding #29)
+### ~~Adversarial analysis ensemble (from Finding #29)~~ → ANSWERED (Finding #35)
-Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
+~~Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
 and ask it to critique and extend. Does the ensemble find more than either
-alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?
+alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?~~
+**YES.** Ensemble produces 56 findings vs 43 (GPT-5) or 28 (Opus) alone (30%
+improvement). Zero full disagreements — critique phase calibrates severity
+without discarding. Extension phase adds 13 genuinely new findings (4 High).
+The critique's structured assessment is more valuable than raw extensions.
+Cost: ~28% more tokens for 30% more coverage + prioritization.
 ### Reasoning effort parameter (from Finding #21)
 Reasoning effort (low/medium/high) had negligible effect on GPT-5's