finding 49: adversarial evasion/tampering analysis on audit-log.md
New analytical lens (adversarial/offensive security) tested on gargoyle's signal audit log spec. GPT-5 was the most exhaustive (25 findings), Opus gave the deepest individual attack narratives (14), and Sonnet the most creative meta-attacks (11). The adversarial lens yielded roughly 2.2x as many findings (50 vs. 23) as the defense-in-depth lens on a comparable doc. All three models converged on the same root cause (the trust model).
@@ -0,0 +1,122 @@
# Finding 49: Adversarial Evasion and Tampering Path Analysis

**Date:** 2026-05-08
**Document:** gargoyle `audit-log.md` (170 lines), Signal Audit Log specification
**Lens:** Adversarial evasion and tampering paths (NEW)
**Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4

## Summary

This is the first experiment using an explicitly adversarial/offensive security
lens: models were asked to identify ways a malicious insider or compromised
component could manipulate, evade, or undermine the audit system. The adversarial
lens produced substantially more findings than defensive lenses (defense-in-depth,
gap analysis) on comparable documents: 50 in total here, versus 23 for the
defense-in-depth run in Finding #48.

## Results

| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
|---|---|---|---|---|---|---|---|
| GPT-5 | ~97s | 9,508 | 5,376 | 25 | 5 | 12 | 8 |
| Claude Opus 4.6 | ~105s | 5,477 | (internal) | 14 | 6 | 6 | 2 |
| Claude Sonnet 4 | ~fast | 1,732 | (internal) | 11 | 4 | 4 | 3 |

## Common Ground (all 3 identified)

- Deliberate storage unavailability creates unaudited trading windows
- Signal Risk write-only-on-rejection as silent bypass channel
- signal_data snapshot falsification by trusted components
- Correction entries as effective mutation mechanism
- Retention policy weaponization (DE controls both retention and the log)
- Signal ID reuse/collision for forward trace contamination
- Multi-aggregator fan-out abuse for correlation dilution
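
The write-only-on-rejection item can be illustrated with a minimal sketch. It assumes a hypothetical `seen` list (an independent record of which signals reached Signal Risk, which the spec may not provide) and a hypothetical `Entry` shape; neither is taken from `audit-log.md`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    signal_id: str
    outcome: str  # "rejected" or "approved" (hypothetical values)


def unaccounted(seen_signals, log):
    """Return signals that were evaluated but have no audit entry at all."""
    logged = {e.signal_id for e in log}
    return sorted(set(seen_signals) - logged)


seen = ["s1", "s2", "s3"]  # assumed independent record of evaluated signals

# Write-only-on-rejection: s2 was approved, s3's rejection write was suppressed.
rejection_only_log = [Entry("s1", "rejected")]
# Both s2 and s3 are silent, so "approved" and "write suppressed" look the same.
ambiguous = unaccounted(seen, rejection_only_log)

# Hypothetical alternative: every evaluation writes an explicit outcome.
explicit_log = [Entry("s1", "rejected"), Entry("s2", "approved")]
# Now only the suppressed write stands out.
missing = unaccounted(seen, explicit_log)
```

Under the rejection-only policy silence carries two meanings; with explicit outcomes, silence becomes a signal in its own right.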

## GPT-5 Unique Findings

- **Lossy aggregation via forced re-formation:** Disrupt atomic write → decision
  "lost" → re-form with different signals. Original decision vanishes.
- **Decision content vs executed order mismatch:** Audit records conservative
  decision but OM receives larger order.
- **Decision_id collision/merging:** Reuse IDs to co-mingle unrelated entries.
- **Flood-and-hide with rejection storms:** Generate noise to overwhelm query views.
- **Backdated corrections + timestamp smearing:** Writer-supplied timestamps
  alter perceived event ordering.
- **Decision_id ghost approvals:** Append late "approved" entries to justify
  unauthorized executions.
- **Correction recursion:** Chain corrections-of-corrections for irresolvable ambiguity.
- **Forward-trace truncation via selective W1 gaps:** Block only rejection writes
  while passing signals downstream.
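
The timestamp-smearing and ghost-approval items share one mechanism, sketched below. The `seq` field (a counter assigned by the log on append) and the `AppendOnlyLog` class are assumptions for illustration; the spec is not known to define either.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Entry:
    seq: int          # assigned by the log on append (hypothetical field)
    writer_ts: float  # supplied by the writer, so attacker-controlled
    kind: str


class AppendOnlyLog:
    def __init__(self):
        self._entries = []
        self._seq = count(1)

    def append(self, writer_ts, kind):
        entry = Entry(next(self._seq), writer_ts, kind)
        self._entries.append(entry)
        return entry

    def by_writer_time(self):
        """Order events by the timestamp the writer claimed."""
        return [e.kind for e in sorted(self._entries, key=lambda e: e.writer_ts)]

    def by_arrival(self):
        """Order events by the sequence number the log assigned."""
        return [e.kind for e in sorted(self._entries, key=lambda e: e.seq)]


log = AppendOnlyLog()
log.append(100.0, "decision")
log.append(105.0, "execution")
log.append(99.0, "approved")  # appended last, but backdated before the decision

claimed_order = log.by_writer_time()
arrival_order = log.by_arrival()
```

Sorting by the writer-supplied timestamp makes the late approval appear to precede the execution; sorting by the log-assigned sequence shows it arrived afterwards.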

## Opus Unique Findings

- **Signal Risk write failure as indistinguishable stealth bypass:** The attack
  looks IDENTICAL to documented operational failures, which gives it built-in
  plausible deniability.
- **Atomic write temporal gap exploitation:** Decision forwarded to PR BEFORE
  audit write completes, so the trade executes without aggregation entries.
- **Outcome mislabeling "accomplice" framing:** Audit system becomes an active
  participant in concealment, not just failing to detect.
- **Timing-based entry injection:** No "sealing" of decision_ids after atomic
  write, so other writers can inflate evidence retroactively.
- **Portfolio Risk entry preemption:** Write false "approved" BEFORE PR evaluates,
  creating irresolvable contradictions.
- **Weaponized immutability via flooding:** Inject noise that can NEVER be cleaned
  up. The system's strength becomes the attack vector.
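
The missing "sealing" step named in the entry-injection item could look roughly like the sketch below. `DecisionLog`, `write_atomic`, and `SealedError` are hypothetical names, and the sketch leaves open how later legitimate writers (for example Portfolio Risk) would attach entries to a sealed decision.

```python
class SealedError(Exception):
    """Raised when a write targets a decision_id that is already sealed."""


class DecisionLog:
    def __init__(self):
        self._entries = {}    # decision_id -> list of entries
        self._sealed = set()  # decision_ids closed to further writes

    def write_atomic(self, decision_id, entries):
        """Append all entries for one decision, then seal the decision_id."""
        self._check_open(decision_id)
        self._entries.setdefault(decision_id, []).extend(entries)
        self._sealed.add(decision_id)

    def append(self, decision_id, entry):
        """Append a single entry; refused once the decision_id is sealed."""
        self._check_open(decision_id)
        self._entries.setdefault(decision_id, []).append(entry)

    def entries(self, decision_id):
        return list(self._entries.get(decision_id, []))

    def _check_open(self, decision_id):
        if decision_id in self._sealed:
            raise SealedError(decision_id)


log = DecisionLog()
log.write_atomic("d1", ["signal-a", "signal-b"])

try:
    log.append("d1", "signal-c (injected later)")
    late_write_blocked = False
except SealedError:
    late_write_blocked = True
```

Without the `_sealed` check, the late `append` would succeed and the injected entry would sit alongside the original evidence.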

## Sonnet Unique Findings

- **Schema evolution attack:** Introduce version field, write new entries with
  same IDs + higher versions, modify queries to return "latest", and history is
  effectively overwritten through schema change. Uniquely creative meta-attack.
- **Query semantic manipulation:** Modify query logic to filter out entries, so
  data exists but becomes invisible to operations.
- **Decision ID fragmentation:** Split decisions so each fragment falls below
  risk thresholds individually.
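
The schema evolution attack can be shown in a few lines. The `version` field and both query helpers are hypothetical; the point is only that an append-only store plus a "latest wins" query behaves like an overwrite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: str
    outcome: str
    version: int = 1  # field introduced by the hypothetical schema change


def query_all(log, entry_id):
    """Return every stored entry for an ID, in append order."""
    return [e for e in log if e.entry_id == entry_id]


def query_latest(log, entry_id):
    """Return only the highest-version entry for an ID, or None."""
    matches = query_all(log, entry_id)
    return max(matches, key=lambda e: e.version) if matches else None


log = [Entry("e1", "rejected")]
log.append(Entry("e1", "approved", version=2))  # nothing is mutated or deleted

visible = query_latest(log, "e1").outcome
history = [e.outcome for e in query_all(log, "e1")]
```

The original "rejected" entry is still stored, but any consumer using `query_latest` sees only "approved".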

## Key Insights

### Adversarial analysis as a new task type

Compared to defensive lenses, adversarial analysis requires:
1. Attacker perspective ("how would I exploit this?")
2. Plausible deniability reasoning ("how would this look legitimate?")
3. Multi-step attack chain construction
4. Impact assessed from attacker goals, not system goals

### Model strengths on adversarial tasks

- **GPT-5:** Strongest at attack surface ENUMERATION. Systematically covers
  every exploitable gap: 25 findings covering essentially every section.
- **Opus:** Strongest at attack PLAUSIBILITY: how attacks would be perceived,
  what provides cover, second-order effects of design decisions.
- **Sonnet:** Occasionally produces the most creative META-ATTACKS, which exploit
  governance/authority rather than mechanisms. Fast and efficient.

### Productivity of adversarial lens

Finding #48 (defense-in-depth) on a comparable doc (209 lines):
GPT-5: 10, Opus: 7, Sonnet: 6.

Finding #49 (adversarial) on this doc (170 lines):
GPT-5: 25, Opus: 14, Sonnet: 11.

That is roughly 2.2x as many findings in total (50 vs. 23), ranging from 1.8x
for Sonnet to 2.5x for GPT-5, on a shorter document. A single design decision
can be exploited in multiple ways from an attacker's perspective, while
defense-in-depth identifies one gap per missing layer.
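
The productivity ratio can be checked directly from the counts above. The sketch below uses only the numbers reported in Findings #48 and #49; the per-line normalization is an added assumption about how to compare documents of different lengths.

```python
# Finding counts per model, as reported above.
defensive = {"GPT-5": 10, "Opus": 7, "Sonnet": 6}      # Finding #48, 209-line doc
adversarial = {"GPT-5": 25, "Opus": 14, "Sonnet": 11}  # Finding #49, 170-line doc

# Ratio of adversarial to defensive findings for each model.
per_model = {m: adversarial[m] / defensive[m] for m in defensive}

# Ratio of total findings across all three models.
total_ratio = sum(adversarial.values()) / sum(defensive.values())

# Same ratio after normalizing by document length (findings per line).
density_ratio = (sum(adversarial.values()) / 170) / (sum(defensive.values()) / 209)

for model, ratio in per_model.items():
    print(f"{model}: {ratio:.2f}x")
print(f"total: {total_ratio:.2f}x, per line: {density_ratio:.2f}x")
```

The per-model ratios span 1.83x to 2.50x, the total is about 2.17x, and the length-normalized figure is about 2.67x, so a single headline multiplier depends on which comparison is meant.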

### Root cause convergence

All three models independently identified the same root cause: the audit log's
trust model. The system records CLAIMS from trusted components, not independently
observed facts. Any compromised writer can fabricate schema-valid entries
indistinguishable from truth.
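
A minimal sketch of the claims-versus-facts gap follows. The field names, the `REQUIRED_FIELDS` schema, and the independently observed order quantity are all hypothetical; the spec's actual schema is not reproduced here.

```python
# Hypothetical entry schema: field name -> required type.
REQUIRED_FIELDS = {"decision_id": str, "writer": str, "outcome": str, "quantity": int}


def is_schema_valid(entry):
    """Check field names and types only; says nothing about truthfulness."""
    return entry.keys() == REQUIRED_FIELDS.keys() and all(
        isinstance(entry[name], kind) for name, kind in REQUIRED_FIELDS.items()
    )


def matches_observation(entry, observed_quantity):
    """Compare the writer's claim with an independently observed value."""
    return entry["quantity"] == observed_quantity


# Entry written by a compromised component: well-formed but false.
claimed = {
    "decision_id": "d1",
    "writer": "aggregator",
    "outcome": "approved",
    "quantity": 100,
}
observed_order_quantity = 5000  # assumed independent observation downstream

passes_schema = is_schema_valid(claimed)
passes_cross_check = matches_observation(claimed, observed_order_quantity)
```

Schema validation accepts the fabricated entry; only a comparison against something the writer does not control exposes the mismatch.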

## Practical Implication

For architecture review rotation:
- **GPT-5** for exhaustive attack surface (what COULD happen)
- **Opus** for realistic threat modeling (how it WOULD play out)
- **Sonnet** for creative lateral attacks (meta-level exploitation)

The adversarial lens is the most productive new lens since cross-document
consistency analysis. It generates more findings and produces directly
actionable security improvements.