mark adversarial ensemble question as answered (finding #35)
+9 -3
@@ -29,10 +29,16 @@ consistency" framing instead of broad review? The hypothesis: Sonnet's
 
 ## Medium Priority
 
-### Adversarial analysis ensemble (from Finding #29)
-Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
+### ~~Adversarial analysis ensemble (from Finding #29)~~ → ANSWERED (Finding #35)
+~~Run GPT-5 and Opus sequentially — give Opus access to GPT-5's findings
 and ask it to critique and extend. Does the ensemble find more than either
-alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?
+alone? Does Opus's system-level thinking complement GPT-5's exhaustiveness?~~
+
+**YES.** Ensemble produces 56 findings vs 43 (GPT-5) or 28 (Opus) alone (30%
+improvement). Zero full disagreements — critique phase calibrates severity
+without discarding. Extension phase adds 13 genuinely new findings (4 High).
+The critique's structured assessment is more valuable than raw extensions.
+Cost: ~28% more tokens for 30% more coverage + prioritization.
 
 ### Reasoning effort parameter (from Finding #21)
 Reasoning effort (low/medium/high) had negligible effect on GPT-5's
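The notes do not say how the critique and extension outputs are merged into the ensemble result. The following is a minimal sketch under stated assumptions. Each finding is treated as a (title, severity) record. The critique phase is assumed to return only severity changes, matching the "zero full disagreements" result. Extension findings are assumed to be deduplicated by exact title. `Finding`, `merge_ensemble`, and `coverage_gain` are hypothetical names, not part of any existing tool. With the reported counts, 43 GPT-5 findings plus 13 extension findings account for the 56-finding total, and 13/43 ≈ 30.2% is the quoted 30% improvement.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str  # assumed labels, e.g. "High" / "Medium" / "Low"


def merge_ensemble(first_pass, severity_overrides, extensions):
    """Hypothetical merge of a first-pass review with a critique + extension pass.

    first_pass: findings from the first model (GPT-5 in the notes).
    severity_overrides: title -> recalibrated severity from the critique phase.
        Assumption: the critique only recalibrates severity and discards nothing.
    extensions: findings proposed by the second model (Opus in the notes).
    """
    # Keep every first-pass finding, applying any severity recalibration.
    merged = [
        replace(f, severity=severity_overrides.get(f.title, f.severity))
        for f in first_pass
    ]
    seen = {f.title for f in merged}
    for f in extensions:
        # Assumption: exact title match marks an extension as a duplicate.
        if f.title not in seen:
            merged.append(f)
            seen.add(f.title)
    return merged


def coverage_gain(ensemble_count, baseline_count):
    """Relative increase in finding count over a single-model baseline."""
    return (ensemble_count - baseline_count) / baseline_count


if __name__ == "__main__":
    # Placeholder findings that only mirror the reported counts (43 and 13).
    gpt5 = [Finding(f"gpt5-{i}", "Medium") for i in range(43)]
    opus_ext = [Finding(f"opus-{i}", "Medium") for i in range(13)]
    merged = merge_ensemble(gpt5, {"gpt5-0": "High"}, opus_ext)
    print(len(merged))  # 56
    print(f"{coverage_gain(len(merged), 43):.1%}")  # 30.2%
```

Exact-title matching is the weakest assumption in this sketch. Deciding whether an extension is "genuinely new" would probably need fuzzy or model-judged deduplication in practice.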