diff --git a/findings/2026-05-08-49-adversarial-evasion-tampering-audit-log.md b/findings/2026-05-08-49-adversarial-evasion-tampering-audit-log.md
new file mode 100644
index 0000000..08d15c3
--- /dev/null
+++ b/findings/2026-05-08-49-adversarial-evasion-tampering-audit-log.md
@@ -0,0 +1,122 @@
+# Finding 49: Adversarial Evasion and Tampering Path Analysis
+
+**Date:** 2026-05-08
+**Document:** gargoyle `audit-log.md` (170 lines), the Signal Audit Log specification
+**Lens:** Adversarial evasion and tampering paths (NEW)
+**Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4
+
+## Summary
+
+This is the first experiment to use an explicitly adversarial/offensive security
+lens: models were asked to identify ways a malicious insider or compromised
+component could manipulate, evade, or undermine the audit system. The adversarial
+lens produced significantly MORE findings than the defensive lenses
+(defense-in-depth, gap analysis) did on comparable documents.
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
+|---|---|---|---|---|---|---|---|
+| GPT-5 | ~97s | 9,508 | 5,376 | 25 | 5 | 12 | 8 |
+| Claude Opus 4.6 | ~105s | 5,477 | (internal) | 14 | 6 | 6 | 2 |
+| Claude Sonnet 4 | ~fast | 1,732 | (internal) | 11 | 4 | 4 | 3 |
+
+## Common Ground (all 3 identified)
+
+- Deliberate storage unavailability creates unaudited trading windows
+- Signal Risk write-only-on-rejection as silent bypass channel
+- signal_data snapshot falsification by trusted components
+- Correction entries as effective mutation mechanism
+- Retention policy weaponization (DE controls both retention and the log)
+- Signal ID reuse/collision for forward trace contamination
+- Multi-aggregator fan-out abuse for correlation dilution
+
+## GPT-5 Unique Findings
+
+- **Lossy aggregation via forced re-formation:** Disrupt atomic write → decision
+  "lost" → re-form with different signals. Original decision vanishes.
+- **Decision content vs executed order mismatch:** Audit records conservative
+  decision but OM receives larger order.
+- **Decision_id collision/merging:** Reuse IDs to co-mingle unrelated entries.
+- **Flood-and-hide with rejection storms:** Generate noise to overwhelm query views.
+- **Backdated corrections + timestamp smearing:** Writer-supplied timestamps
+  alter perceived event ordering.
+- **Decision_id ghost approvals:** Append late "approved" entries to justify
+  unauthorized executions.
+- **Correction recursion:** Chain corrections-of-corrections for irresolvable ambiguity.
+- **Forward-trace truncation via selective W1 gaps:** Block only rejection writes
+  while passing signals downstream.
+
+## Opus Unique Findings
+
+- **Signal Risk write failure as indistinguishable stealth bypass:** Attack looks
+  IDENTICAL to documented operational failures, giving built-in plausible deniability.
+- **Atomic write temporal gap exploitation:** Decision forwarded to PR BEFORE
+  audit write completes, so the trade executes without aggregation entries.
+- **Outcome mislabeling "accomplice" framing:** Audit system becomes an active
+  participant in concealment, not just failing to detect.
+- **Timing-based entry injection:** No "sealing" of decision_ids after atomic
+  write, so other writers can inflate evidence retroactively.
+- **Portfolio Risk entry preemption:** Write false "approved" BEFORE PR evaluates,
+  creating irresolvable contradictions.
+- **Weaponized immutability via flooding:** Inject noise that can NEVER be cleaned
+  up: the system's strength becomes the attack vector.
+
+## Sonnet Unique Findings
+
+- **Schema evolution attack:** Introduce version field, write new entries with
+  same IDs + higher versions, modify queries to return "latest", effectively
+  overwriting history through a schema change. Uniquely creative meta-attack.
+- **Query semantic manipulation:** Modify query logic to filter out entries:
+  data exists but becomes invisible to operations.
+- **Decision ID fragmentation:** Split decisions so each fragment falls below
+  risk thresholds individually.
+
+## Key Insights
+
+### Adversarial analysis as a new task type
+
+Compared to defensive lenses, adversarial analysis requires:
+1. Attacker perspective ("how would I exploit this?")
+2. Plausible deniability reasoning ("how would this look legitimate?")
+3. Multi-step attack chain construction
+4. Impact from attacker goals, not system goals
+
+### Model strengths on adversarial tasks
+
+- **GPT-5:** Strongest at attack surface ENUMERATION: systematically covers
+  every exploitable gap. 25 findings covering essentially every section.
+- **Opus:** Strongest at attack PLAUSIBILITY: how attacks would be perceived,
+  what provides cover, second-order effects of design decisions.
+- **Sonnet:** Occasionally produces the most creative META-ATTACKS that exploit
+  governance/authority rather than mechanisms. Fast and efficient.
+
+### Productivity of adversarial lens
+
+Finding #48 (defense-in-depth) on a comparable doc (209 lines):
+GPT-5: 10, Opus: 7, Sonnet: 6.
+
+Finding #49 (adversarial) on this doc (170 lines):
+GPT-5: 25, Opus: 14, Sonnet: 11.
+
+Roughly 2.2x more findings in total (50 vs. 23; 2.5x for GPT-5, 2.0x for Opus,
+~1.8x for Sonnet). A single design decision can be exploited in multiple ways from
+an attacker's perspective, while defense-in-depth identifies one gap per missing layer.
+
+### Root cause convergence
+
+All three models independently identified the same root cause: the audit log's
+trust model. The system records CLAIMS from trusted components, not independently
+observed facts. Any compromised writer can fabricate schema-valid entries
+indistinguishable from truth.
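The root cause above can be illustrated with a minimal sketch. It assumes a hypothetical entry shape (`decision_id`, `writer`, `outcome`, `signal_data`) and a hypothetical `is_schema_valid` check; none of these are taken from `audit-log.md`, whose actual schema is not reproduced in this finding. The point is only that a shape check accepts a fabricated claim as readily as a truthful one.

```python
# Hypothetical field names and types; the real audit-log.md schema may differ.
REQUIRED_FIELDS = {
    "decision_id": str,
    "writer": str,
    "outcome": str,
    "signal_data": dict,
}
ALLOWED_OUTCOMES = {"approved", "rejected"}


def is_schema_valid(entry: dict) -> bool:
    """Return True if the entry has exactly the required fields, with the
    expected types and an allowed outcome. Nothing here checks truthfulness."""
    if set(entry) != set(REQUIRED_FIELDS):
        return False
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(entry[name], expected_type):
            return False
    return entry["outcome"] in ALLOWED_OUTCOMES


# What an honest writer would record after actually evaluating the decision.
honest_entry = {
    "decision_id": "d-1001",
    "writer": "portfolio_risk",
    "outcome": "rejected",
    "signal_data": {"exposure": 1.8},
}

# What a compromised writer claims instead: same shape, different "facts".
fabricated_entry = {
    "decision_id": "d-1001",
    "writer": "portfolio_risk",
    "outcome": "approved",
    "signal_data": {"exposure": 0.4},
}

# Both entries pass; the schema alone cannot tell them apart.
print(is_schema_valid(honest_entry), is_schema_valid(fabricated_entry))
```

In this sketch the validator constrains only the form of a claim. Telling the two entries apart would require evidence that does not originate from the writer itself, which is exactly what the trust model described above lacks.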
+
+## Practical Implication
+
+For architecture review rotation:
+- **GPT-5** for exhaustive attack surface (what COULD happen)
+- **Opus** for realistic threat modeling (how it WOULD play out)
+- **Sonnet** for creative lateral attacks (meta-level exploitation)
+
+The adversarial lens is the most productive new lens since cross-document
+consistency analysis. It generates more findings and produces directly actionable
+security improvements.