Initial publish: 29 findings, 6 prompts, methodology, open questions

Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6,
GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md — complete 3,249-line research log with all
  29 findings, methodology notes, and open questions
- prompts/ — 6 exact prompts used across experiments
- methodology.md — experimental setup and evaluation criteria
- open-questions.md — unanswered questions for future work
- README.md — overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: signal-to-noise ratio matters more than model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
Rodin
2026-05-05 19:13:03 -07:00
parent 4aea0d004b
commit 1b108ff66e
10 changed files with 3831 additions and 2 deletions
@@ -0,0 +1,59 @@
# Prompt: Adversarial Manipulation Analysis
Used in Finding #29.
## Setup
- Single document (full text)
- Same prompt to all models
- No tools, no project context beyond the document
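
The setup above can be sketched as a small timing harness. This is a minimal sketch, not the harness actually used: the source does not say how the models were invoked, so each entry in `models` is a hypothetical callable that takes the prompt text and returns the response text.

```python
import time
from typing import Callable, Dict


def run_same_prompt(
    prompt: str, models: Dict[str, Callable[[str], str]]
) -> Dict[str, dict]:
    """Send one identical prompt to every model and record wall-clock time.

    `models` maps a display name to a hypothetical client function
    (prompt text in, response text out). No tools or extra context are
    passed, mirroring the setup described above.
    """
    results: Dict[str, dict] = {}
    for name, call in models.items():
        start = time.perf_counter()
        response = call(prompt)  # assumption: a blocking, single-turn call
        elapsed = time.perf_counter() - start
        results[name] = {"response": response, "seconds": elapsed}
    return results
```

Passing the same `prompt` string to every callable keeps the comparison limited to model behavior rather than prompt differences.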
## Prompt
```
You are a red-team security analyst reviewing a trading system's
aggregation component. Your task is to identify how a MISBEHAVING,
COMPROMISED, or BUGGY upstream component could exploit this design
to produce harmful trading outcomes that bypass downstream safety controls.
## Categories of adversarial manipulation:
1. **Signal injection** — How could a compromised strategy inject signals
that exploit the aggregator's logic to produce dangerous decisions?
2. **Timing manipulation** — How could an attacker manipulate timing
(delays, bursts, clock skew) to exploit the aggregator's temporal logic?
3. **Capacity weaponization** — How could the max_signals bound or group
completion logic be exploited to force premature or delayed decisions?
4. **State corruption via crash** — How could deliberate crashes be used
to put the aggregator in an exploitable state?
5. **Audit evasion** — How could an attacker cause the aggregator to make
decisions that don't appear in the audit log, or appear differently
than what actually happened?
## For each attack vector:
- **Category:** (one of the 5 above)
- **Attack vector:** Name of the attack
- **Mechanism:** How the attacker exploits the design
- **Exploit:** Step-by-step attack sequence
- **Why downstream controls miss it:** Why PortfolioRisk, BuyingPower,
or other downstream checks don't catch this
- **Severity:** Critical / High / Medium
- **Mitigation:** What the design could add to prevent it
## Document:
[FULL TEXT OF aggregation.md, 193 lines]
```
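
The per-vector format requested in the prompt lends itself to mechanical parsing when counting findings. The sketch below assumes a model echoes the field labels as Markdown bullets (`- **Category:** ...`) and starts each attack vector with a `Category` field; real responses may deviate, and `parse_attack_vectors` is an illustrative helper, not part of the original experiment.

```python
import re
from typing import Dict, List, Optional

# Matches bullets such as "- **Severity:** High" (assumed response format).
FIELD_RE = re.compile(r"^\s*[-*]\s*\*\*(?P<label>[^*]+?):\*\*\s*(?P<value>.*)$")


def parse_attack_vectors(text: str) -> List[Dict[str, str]]:
    """Split a response into one dict per attack vector, keyed by field label."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_label: Optional[str] = None
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            label = match.group("label").strip()
            # A new "Category" field marks the start of the next attack vector.
            if label == "Category" and current:
                records.append(current)
                current = {}
            current[label] = match.group("value").strip()
            last_label = label
        elif not line.strip():
            # A blank line ends any multi-line field value.
            last_label = None
        elif last_label is not None:
            # Continuation line: append it to the most recent field.
            current[last_label] = (current[last_label] + " " + line.strip()).strip()
    if current:
        records.append(current)
    return records
```

With this shape, `len(parse_attack_vectors(response))` gives a finding count per model, provided the response follows the requested format closely enough.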
## Results
| Model | Time | Findings | Unique vectors |
|-------|------|----------|----------------|
| GPT-5 | ~150s | 8 | 3 (most exhaustive) |
| Opus | ~65s | 6 | 2 (qualitatively different) |
| Sonnet | ~20s | 4 | 0 (subset of others) |

GPT-5 was the most exhaustive and systematic. Opus found qualitatively different
attack vectors through system-level reasoning (e.g., exploiting supervision tree
restart semantics).
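
One way to read the "Unique vectors" column is as the vectors a model reported that no other model did. The sketch below works under that assumption; it also assumes each model's findings have already been normalized by hand into comparable labels, which the source does not describe. The function name and labels are hypothetical.

```python
from typing import Dict, Set


def unique_vectors(findings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each model, return the vectors that no other model reported."""
    unique: Dict[str, Set[str]] = {}
    for model, vectors in findings.items():
        # Union of every other model's vectors (empty if there is only one model).
        others = set().union(*(v for m, v in findings.items() if m != model))
        unique[model] = vectors - others
    return unique
```

For example, with placeholder labels `{"model_a": {"v1", "v2"}, "model_b": {"v2"}}`, `model_a` keeps `{"v1"}` and `model_b` ends up with an empty set, matching the "subset of others" case in the table.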