Initial publish: 29 findings, 6 prompts, methodology, open questions

Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md: complete 3,249-line research log with all 29 findings, methodology notes, and open questions
- prompts/: 6 exact prompts used across experiments
- methodology.md: experimental setup and evaluation criteria
- open-questions.md: unanswered questions for future work
- README.md: overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
2026-05-05 19:13:03 -07:00
parent 4aea0d004b
commit 1b108ff66e
10 changed files with 3831 additions and 2 deletions
@@ -0,0 +1,59 @@
+# Prompt: Adversarial Manipulation Analysis
+
+Used in Finding #29.
+
+## Setup
+
+- Single document (full text)
+- Same prompt to all models
+- No tools, no project context beyond the document
+
+## Prompt
+
+```
+You are a red-team security analyst reviewing a trading system's
+aggregation component. Your task is to identify how a MISBEHAVING,
+COMPROMISED, or BUGGY upstream component could exploit this design
+to produce harmful trading outcomes that bypass downstream safety controls.
+
+## Categories of adversarial manipulation:
+
+1. **Signal injection** — How could a compromised strategy inject signals
+   that exploit the aggregator's logic to produce dangerous decisions?
+2. **Timing manipulation** — How could an attacker manipulate timing
+   (delays, bursts, clock skew) to exploit the aggregator's temporal logic?
+3. **Capacity weaponization** — How could the max_signals bound or group
+   completion logic be exploited to force premature or delayed decisions?
+4. **State corruption via crash** — How could deliberate crashes be used
+   to put the aggregator in an exploitable state?
+5. **Audit evasion** — How could an attacker cause the aggregator to make
+   decisions that don't appear in the audit log, or appear differently
+   than what actually happened?
+
+## For each attack vector:
+
+- **Category:** (one of the 5 above)
+- **Attack vector:** Name of the attack
+- **Mechanism:** How the attacker exploits the design
+- **Exploit:** Step-by-step attack sequence
+- **Why downstream controls miss it:** Why PortfolioRisk, BuyingPower,
+  or other downstream checks don't catch this
+- **Severity:** Critical / High / Medium
+- **Mitigation:** What the design could add to prevent it
+
+## Document:
+
+[FULL TEXT OF aggregation.md, 193 lines]
+```
+
+## Results
+
+| Model | Time | Findings | Unique vectors |
+|-------|------|----------|----------------|
+| GPT-5 | ~150s | 8 | 3 (most exhaustive) |
+| Opus | ~65s | 6 | 2 (qualitatively different) |
+| Sonnet | ~20s | 4 | 0 (subset of others) |
+
+GPT-5 was the most exhaustive and systematic (8 findings, 3 unique). Opus found
+qualitatively different attack vectors through system-level reasoning (e.g.,
+exploiting supervision tree restart semantics). Sonnet found no unique vectors.
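
The "Unique vectors" column in the results table can be read as a set difference: a vector counts as unique to a model when no other model reported it. The sketch below shows one way such a tally could be computed, assuming each finding has first been reduced by hand to a short label. The function name `unique_vectors`, the model keys, and the labels are hypothetical placeholders, not the vectors the models actually reported, and matching by exact label is an assumption, since the source does not say how overlap was judged.

```python
from typing import Dict, Set


def unique_vectors(findings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per model, the labels that no other model reported."""
    unique = {}
    for model, labels in findings.items():
        # Union of everything reported by the other models.
        others: Set[str] = set()
        for other_model, other_labels in findings.items():
            if other_model != model:
                others |= other_labels
        unique[model] = labels - others
    return unique


# Hypothetical labels for illustration only (not the real findings).
example = {
    "model_a": {"v1", "v2", "v3"},
    "model_b": {"v2", "v4"},
    "model_c": {"v2"},
}

counts = {model: len(labels) for model, labels in unique_vectors(example).items()}
print(counts)  # {'model_a': 2, 'model_b': 1, 'model_c': 0}
```

A model whose findings are all covered by the others ends up with an empty set, which corresponds to the "0 (subset of others)" row in the table.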