model-research/prompts/adversarial-manipulation.md

# Prompt: Adversarial Manipulation Analysis

Used in Finding #29.

## Setup

- Single document (full text)
- Same prompt to all models
- No tools, no project context beyond the document

## Prompt

```
You are a red-team security analyst reviewing a trading system's
aggregation component. Your task is to identify how a MISBEHAVING,
COMPROMISED, or BUGGY upstream component could exploit this design
to produce harmful trading outcomes that bypass downstream safety controls.

## Categories of adversarial manipulation:

1. **Signal injection** — How could a compromised strategy inject signals
   that exploit the aggregator's logic to produce dangerous decisions?
2. **Timing manipulation** — How could an attacker manipulate timing
   (delays, bursts, clock skew) to exploit the aggregator's temporal logic?
3. **Capacity weaponization** — How could the max_signals bound or group
   completion logic be exploited to force premature or delayed decisions?
4. **State corruption via crash** — How could deliberate crashes be used
   to put the aggregator in an exploitable state?
5. **Audit evasion** — How could an attacker cause the aggregator to make
   decisions that don't appear in the audit log, or appear differently
   than what actually happened?

## For each attack vector:

- **Category:** (one of the 5 above)
- **Attack vector:** Name of the attack
- **Mechanism:** How the attacker exploits the design
- **Exploit:** Step-by-step attack sequence
- **Why downstream controls miss it:** Why PortfolioRisk, BuyingPower,
  or other downstream checks don't catch this
- **Severity:** Critical / High / Medium
- **Mitigation:** What the design could add to prevent it

## Document:

[FULL TEXT OF aggregation.md, 193 lines]
```
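Comparing models is easier when every finding is captured in the same seven fields the prompt asks for. Below is a minimal Python sketch of such a record. It assumes each finding has already been split into label/text pairs (that parsing step is not shown). `Finding`, `parse_finding`, and the enum names are hypothetical and not part of the original harness. The enums accept only the five categories and three severity levels the prompt allows, so an out-of-schema answer such as `Severity: Low` is rejected.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # The five categories enumerated in the prompt (hypothetical enum).
    SIGNAL_INJECTION = "Signal injection"
    TIMING_MANIPULATION = "Timing manipulation"
    CAPACITY_WEAPONIZATION = "Capacity weaponization"
    STATE_CORRUPTION_VIA_CRASH = "State corruption via crash"
    AUDIT_EVASION = "Audit evasion"


class Severity(Enum):
    # The prompt permits only these three levels.
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"


@dataclass(frozen=True)
class Finding:
    category: Category
    attack_vector: str
    mechanism: str
    exploit: list[str]  # step-by-step attack sequence, one entry per step
    why_downstream_misses: str
    severity: Severity
    mitigation: str


def parse_finding(fields: dict[str, str]) -> Finding:
    """Build a Finding from label -> text pairs.

    Raises ValueError if Category or Severity is outside the prompt's schema.
    """
    # Split the exploit text into non-empty steps.
    steps = [line.strip() for line in fields["Exploit"].splitlines() if line.strip()]
    return Finding(
        category=Category(fields["Category"].strip()),
        attack_vector=fields["Attack vector"].strip(),
        mechanism=fields["Mechanism"].strip(),
        exploit=steps,
        why_downstream_misses=fields["Why downstream controls miss it"].strip(),
        severity=Severity(fields["Severity"].strip()),
        mitigation=fields["Mitigation"].strip(),
    )
```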

## Results

| Model | Time (approx.) | Findings | Unique vectors | Notes |
|-------|----------------|----------|----------------|-------|
| GPT-5 | ~150 s | 8 | 3 | Most exhaustive |
| Opus | ~65 s | 6 | 2 | Qualitatively different |
| Sonnet | ~20 s | 4 | 0 | Subset of the other models' findings |

GPT-5 was the most exhaustive and systematic. Opus found qualitatively different
attack vectors that reflected system-level thinking (e.g., exploiting
supervision-tree restart semantics).
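The "Unique vectors" count is a set difference: the vectors a model reported that no other model reported. A count of 0 is the same as "subset of others". The Python sketch below illustrates this. It assumes findings have already been matched to canonical vector names, which is the hard, manual part and is not shown. The function name and the toy labels (`model_a`, `v1`, …) are hypothetical and are not the actual vectors from Finding #29.

```python
def unique_vectors(by_model: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each model, return the attack vectors no other model reported."""
    result: dict[str, set[str]] = {}
    for model, vectors in by_model.items():
        # Union of every other model's vectors (empty set if there are none).
        others = set().union(*(v for m, v in by_model.items() if m != model))
        result[model] = vectors - others
    return result


# Toy data with hypothetical labels, for illustration only.
toy = {
    "model_a": {"v1", "v2", "v3"},
    "model_b": {"v2", "v4"},
    "model_c": {"v1", "v2"},
}
counts = {m: len(u) for m, u in unique_vectors(toy).items()}
print(counts)  # {'model_a': 1, 'model_b': 1, 'model_c': 0}
```

Here `model_c` plays the role Sonnet plays in the table: every vector it found also appears in another model's output.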