Initial publish: 29 findings, 6 prompts, methodology, open questions
Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md: complete 3,249-line research log with all 29 findings, methodology notes, and open questions
- prompts/: 6 exact prompts used across experiments
- methodology.md: experimental setup and evaluation criteria
- open-questions.md: unanswered questions for future work
- README.md: overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
@@ -0,0 +1,59 @@
# Prompt: Adversarial Manipulation Analysis

Used in Finding #29.

## Setup

- Single document (full text)
- Same prompt to all models
- No tools, no project context beyond the document
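
The setup above can be sketched as a small harness that sends one identical prompt string to every model and times each call. This is a minimal sketch, not the harness used for these experiments: `run_same_prompt`, `RunResult`, and the stub callables are hypothetical names, and a real run would replace the stubs with actual API clients.

```python
import time
from typing import Callable, Dict, List, NamedTuple


class RunResult(NamedTuple):
    model: str
    seconds: float
    response: str


def run_same_prompt(
    prompt: str, models: Dict[str, Callable[[str], str]]
) -> List[RunResult]:
    """Send the identical prompt to every model and record wall-clock time."""
    results = []
    for name, call_model in models.items():
        start = time.perf_counter()
        # Only the prompt is passed: no tools, no extra project context.
        response = call_model(prompt)
        elapsed = time.perf_counter() - start
        results.append(RunResult(name, elapsed, response))
    return results


# Hypothetical stand-ins for real model clients.
stub_models = {
    "model-a": lambda p: f"model-a saw {len(p)} chars",
    "model-b": lambda p: f"model-b saw {len(p)} chars",
}
results = run_same_prompt("example prompt", stub_models)
```

Passing the models as plain callables keeps the prompt construction in one place, so any difference in output comes from the model and not from the harness.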

## Prompt

```
You are a red-team security analyst reviewing a trading system's
aggregation component. Your task is to identify how a MISBEHAVING,
COMPROMISED, or BUGGY upstream component could exploit this design
to produce harmful trading outcomes that bypass downstream safety controls.

## Categories of adversarial manipulation:

1. **Signal injection** — How could a compromised strategy inject signals
that exploit the aggregator's logic to produce dangerous decisions?
2. **Timing manipulation** — How could an attacker manipulate timing
(delays, bursts, clock skew) to exploit the aggregator's temporal logic?
3. **Capacity weaponization** — How could the max_signals bound or group
completion logic be exploited to force premature or delayed decisions?
4. **State corruption via crash** — How could deliberate crashes be used
to put the aggregator in an exploitable state?
5. **Audit evasion** — How could an attacker cause the aggregator to make
decisions that don't appear in the audit log, or appear differently
than what actually happened?

## For each attack vector:

- **Category:** (one of the 5 above)
- **Attack vector:** Name of the attack
- **Mechanism:** How the attacker exploits the design
- **Exploit:** Step-by-step attack sequence
- **Why downstream controls miss it:** Why PortfolioRisk, BuyingPower,
or other downstream checks don't catch this
- **Severity:** Critical / High / Medium
- **Mitigation:** What the design could add to prevent it

## Document:

[FULL TEXT OF aggregation.md, 193 lines]
```
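
The per-vector fields requested in the prompt map naturally onto a typed record, which makes it easier to compare responses across models. The sketch below is one possible representation and not part of the original experiments: `AttackVector`, `Category`, `Severity`, and `parse_attack_vector` are hypothetical names, and the example record uses placeholder text, not a real finding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Category(Enum):
    SIGNAL_INJECTION = "Signal injection"
    TIMING_MANIPULATION = "Timing manipulation"
    CAPACITY_WEAPONIZATION = "Capacity weaponization"
    STATE_CORRUPTION_VIA_CRASH = "State corruption via crash"
    AUDIT_EVASION = "Audit evasion"


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"


@dataclass(frozen=True)
class AttackVector:
    category: Category
    attack_vector: str
    mechanism: str
    exploit: Tuple[str, ...]  # step-by-step attack sequence
    why_downstream_misses: str
    severity: Severity
    mitigation: str


def parse_attack_vector(raw: dict) -> AttackVector:
    """Build a record from a dict; raises ValueError on an unknown
    category or severity, KeyError on a missing field."""
    return AttackVector(
        category=Category(raw["category"]),
        attack_vector=raw["attack_vector"],
        mechanism=raw["mechanism"],
        exploit=tuple(raw["exploit"]),
        why_downstream_misses=raw["why_downstream_misses"],
        severity=Severity(raw["severity"]),
        mitigation=raw["mitigation"],
    )


# Placeholder record for illustration only (not an actual model finding).
example = parse_attack_vector({
    "category": "Timing manipulation",
    "attack_vector": "example-vector",
    "mechanism": "placeholder mechanism",
    "exploit": ["step 1", "step 2"],
    "why_downstream_misses": "placeholder reason",
    "severity": "High",
    "mitigation": "placeholder mitigation",
})
```

Restricting `category` and `severity` to enums means a response that invents a sixth category or a "Low" severity is rejected at parse time instead of skewing the counts.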

## Results

| Model | Time | Findings | Unique vectors |
|-------|------|----------|----------------|
| GPT-5 | ~150s | 8 | 3 (most exhaustive) |
| Opus | ~65s | 6 | 2 (qualitatively different) |
| Sonnet | ~20s | 4 | 0 (subset of others) |

GPT-5 was most exhaustive and systematic. Opus found qualitatively different
attack vectors with system-level thinking (e.g., exploiting supervision tree
restart semantics).
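
One way to read the "Unique vectors" column is as a set difference: the vectors a model reported that no other model reported. The sketch below assumes that definition and assumes findings have already been normalized to comparable labels; the source does not say how vectors were matched across models. `unique_vectors` and the labels are hypothetical placeholders, not the actual findings.

```python
from typing import Dict, Set


def unique_vectors(findings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each model, keep only the vectors no other model reported."""
    unique = {}
    for model, vectors in findings.items():
        # Union of every other model's vectors.
        others = set().union(
            *(v for m, v in findings.items() if m != model)
        )
        unique[model] = vectors - others
    return unique


# Placeholder labels for illustration only.
findings = {
    "model-a": {"v1", "v2", "v3"},
    "model-b": {"v2", "v4"},
    "model-c": {"v2"},
}
unique = unique_vectors(findings)
```

Under this definition a model whose findings are a subset of the others' ends up with an empty set, which corresponds to a 0 in the table.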