1b108ff66e
Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md: complete 3,249-line research log with all 29 findings, methodology notes, and open questions
- prompts/: 6 exact prompts used across experiments
- methodology.md: experimental setup and evaluation criteria
- open-questions.md: unanswered questions for future work
- README.md: overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
# Prompt: Adversarial Manipulation Analysis

Used in Finding #29.

## Setup

- Single document (full text)
- Same prompt to all models
- No tools, no project context beyond the document
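
The setup above can be sketched as a small harness: build one prompt string from the instructions and the full document text, then pass that identical string to every model and record wall-clock time. This is a minimal sketch, not the harness actually used for Finding #29. The names `build_prompt` and `run_all`, and the idea of representing each model as a plain `str -> str` callable, are assumptions made for illustration; no real vendor client is shown.

```python
import time
from typing import Callable, Dict

# Hypothetical template: instructions first, then the full document text.
PROMPT_TEMPLATE = "{instructions}\n\n## Document:\n\n{document}"


def build_prompt(instructions: str, document: str) -> str:
    """Combine the fixed instructions with the single document under review."""
    return PROMPT_TEMPLATE.format(
        instructions=instructions.strip(), document=document.strip()
    )


def run_all(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, dict]:
    """Send the same prompt to every model and time each call.

    `models` maps a label to any callable taking the prompt and returning
    text. Wiring a callable to a real API client is left to the reader.
    """
    results = {}
    for name, call in models.items():
        start = time.perf_counter()
        output = call(prompt)  # no tools, no extra context: just the prompt
        results[name] = {
            "output": output,
            "seconds": time.perf_counter() - start,
        }
    return results


if __name__ == "__main__":
    # Stub standing in for a real model call.
    def stub(prompt: str) -> str:
        return f"received {len(prompt)} characters"

    prompt = build_prompt(
        "You are a red-team security analyst ...", "full document text"
    )
    for name, result in run_all(prompt, {"model-a": stub, "model-b": stub}).items():
        print(name, result["output"], f"{result['seconds']:.3f}s")
```

Building the prompt once and reusing the same string is what keeps the comparison fair: any difference in output then comes from the model, not from the input.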

## Prompt

```
You are a red-team security analyst reviewing a trading system's
aggregation component. Your task is to identify how a MISBEHAVING,
COMPROMISED, or BUGGY upstream component could exploit this design
to produce harmful trading outcomes that bypass downstream safety controls.

## Categories of adversarial manipulation:

1. **Signal injection** — How could a compromised strategy inject signals
that exploit the aggregator's logic to produce dangerous decisions?
2. **Timing manipulation** — How could an attacker manipulate timing
(delays, bursts, clock skew) to exploit the aggregator's temporal logic?
3. **Capacity weaponization** — How could the max_signals bound or group
completion logic be exploited to force premature or delayed decisions?
4. **State corruption via crash** — How could deliberate crashes be used
to put the aggregator in an exploitable state?
5. **Audit evasion** — How could an attacker cause the aggregator to make
decisions that don't appear in the audit log, or appear differently
than what actually happened?

## For each attack vector:

- **Category:** (one of the 5 above)
- **Attack vector:** Name of the attack
- **Mechanism:** How the attacker exploits the design
- **Exploit:** Step-by-step attack sequence
- **Why downstream controls miss it:** Why PortfolioRisk, BuyingPower,
or other downstream checks don't catch this
- **Severity:** Critical / High / Medium
- **Mitigation:** What the design could add to prevent it

## Document:

[FULL TEXT OF aggregation.md, 193 lines]
```
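
Because the prompt asks for a fixed set of bulleted fields per attack vector, responses can be checked mechanically before findings are counted. The following is a minimal sketch of such a check, assuming models echo the `- **Field:** value` layout verbatim. The `Finding` class, the `parse_finding` helper, and the sample attack name in the usage section are hypothetical and not part of the original experiment. Only the first line of each field is captured; wrapped continuation lines are ignored.

```python
import re
from dataclasses import dataclass

# Allowed values, taken from the prompt text (compared case-insensitively).
CATEGORIES = {
    "signal injection",
    "timing manipulation",
    "capacity weaponization",
    "state corruption via crash",
    "audit evasion",
}
SEVERITIES = {"critical", "high", "medium"}

# Matches lines such as "- **Severity:** High".
FIELD_RE = re.compile(r"^\s*-\s*\*\*(?P<name>[^*]+?):\*\*\s*(?P<value>.*)$")


@dataclass
class Finding:
    category: str
    attack_vector: str
    severity: str


def parse_finding(block: str) -> Finding:
    """Parse one attack-vector block and validate category and severity."""
    fields = {}
    for line in block.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("name").strip().lower()] = match.group("value").strip()

    finding = Finding(
        category=fields.get("category", ""),
        attack_vector=fields.get("attack vector", ""),
        severity=fields.get("severity", ""),
    )
    if finding.category.lower() not in CATEGORIES:
        raise ValueError(f"unknown category: {finding.category!r}")
    if finding.severity.lower() not in SEVERITIES:
        raise ValueError(f"unknown severity: {finding.severity!r}")
    return finding


if __name__ == "__main__":
    # Illustrative input only; not an actual model response.
    sample = (
        "- **Category:** Timing manipulation\n"
        "- **Attack vector:** Clock skew replay\n"
        "- **Severity:** High\n"
    )
    print(parse_finding(sample))
```

Rejecting blocks with an unknown category or severity makes it easier to compare finding counts across models on the same terms.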

## Results

| Model | Time | Findings | Unique vectors |
|-------|------|----------|----------------|
| GPT-5 | ~150s | 8 | 3 (most exhaustive) |
| Opus | ~65s | 6 | 2 (qualitatively different) |
| Sonnet | ~20s | 4 | 0 (subset of others) |

GPT-5 was the most exhaustive and systematic. Opus found qualitatively
different attack vectors through system-level reasoning (e.g., exploiting
supervision-tree restart semantics). Sonnet's findings were a subset of the
other two models' findings.
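
To put the table's time and count columns on a common scale, one can derive findings per minute and the share of each model's findings that were unique. The sketch below uses only the numbers in the table. The two derived metrics are illustrative and not part of the original analysis, and because the times are approximate, the rates are rough. It also assumes each model's unique vectors are counted among its findings.

```python
# Values copied from the results table; times are approximate ("~").
RESULTS = {
    "GPT-5": {"seconds": 150, "findings": 8, "unique": 3},
    "Opus": {"seconds": 65, "findings": 6, "unique": 2},
    "Sonnet": {"seconds": 20, "findings": 4, "unique": 0},
}


def derived_metrics(row: dict) -> dict:
    """Return throughput and unique-share figures for one table row."""
    return {
        "findings_per_minute": 60 * row["findings"] / row["seconds"],
        # Assumes unique vectors are a subset of that model's findings.
        "unique_share": row["unique"] / row["findings"],
    }


if __name__ == "__main__":
    for model, row in RESULTS.items():
        metrics = derived_metrics(row)
        print(
            f"{model:<7}"
            f"{metrics['findings_per_minute']:5.1f} findings/min  "
            f"{metrics['unique_share']:.0%} unique"
        )
```

On these figures Sonnet has the highest raw throughput but contributes no unique vectors, so speed alone does not capture which model added the most to the analysis.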