From d8ddbc98611f4c8b0095252728a05e6a7f4647ed Mon Sep 17 00:00:00 2001
From: claw
Date: Wed, 6 May 2026 21:29:35 -0700
Subject: [PATCH] mark adversarial ensemble question as answered (finding #35)

---
 open-questions.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/open-questions.md b/open-questions.md
index 8f32a8f..cf7ed11 100644
--- a/open-questions.md
+++ b/open-questions.md
@@ -29,10 +29,16 @@ consistency" framing instead of broad review? The hypothesis: Sonnet's
 
 ## Medium Priority
 
-### Adversarial analysis ensemble (from Finding #29)
-Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
+### ~~Adversarial analysis ensemble (from Finding #29)~~ → ANSWERED (Finding #35)
+~~Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
 and ask it to critique and extend. Does the ensemble find more than either
-alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?
+alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?~~
+
+**YES.** Ensemble produces 56 findings vs 43 (GPT-5) or 28 (Opus) alone (30%
+improvement). Zero full disagreements — critique phase calibrates severity
+without discarding. Extension phase adds 13 genuinely new findings (4 High).
+The critique's structured assessment is more valuable than raw extensions.
+Cost: ~28% more tokens for 30% more coverage + prioritization.
 
 ### Reasoning effort parameter (from Finding #21)
 Reasoning effort (low/medium/high) had negligible effect on GPT-5's