model-research/findings/2026-05-08-49-adversarial-evasion-tampering-audit-log.md
claw 7ca01f0cbf finding 49: adversarial evasion/tampering analysis on audit-log.md
New analytical lens (adversarial/offensive security) tested on gargoyle's
signal audit log spec. GPT-5 most exhaustive (25), Opus deepest individual
attack narratives (14), Sonnet most creative meta-attacks (11).

Adversarial lens is ~2.5x more productive than defensive lenses on
comparable docs. All three models converged on same root cause (trust model).
2026-05-08 09:09:58 -07:00


Finding 49: Adversarial Evasion and Tampering Path Analysis

Date: 2026-05-08
Document: gargoyle audit-log.md (170 lines), Signal Audit Log specification
Lens: Adversarial evasion and tampering paths (NEW)
Models: GPT-5, Claude Opus 4.6, Claude Sonnet 4

Summary

First experiment using an explicitly adversarial (offensive security) lens: models were asked to identify ways a malicious insider or compromised component could manipulate, evade, or undermine the audit system. The adversarial lens produced significantly more findings than defensive lenses (defense-in-depth, gap analysis) on comparable documents: 50 here versus 23 for the defense-in-depth run in Finding #48.

Results

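| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
|---|---|---|---|---|---|---|---|
| GPT-5 | ~97s | 9,508 | 5,376 | 25 | 5 | 12 | 8 |
| Claude Opus 4.6 | ~105s | 5,477 | (internal) | 14 | 6 | 6 | 2 |
| Claude Sonnet 4 | ~fast | 1,732 | (internal) | 11 | 4 | 4 | 3 |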

Common Ground (all 3 identified)

  • Deliberate storage unavailability creates unaudited trading windows
  • Signal Risk write-only-on-rejection as silent bypass channel
  • signal_data snapshot falsification by trusted components
  • Correction entries as effective mutation mechanism
  • Retention policy weaponization (DE controls both retention and the log)
  • Signal ID reuse/collision for forward trace contamination
  • Multi-aggregator fan-out abuse for correlation dilution
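The write-only-on-rejection item can be illustrated with a toy model. This is a minimal sketch, assuming a log where Signal Risk writes an entry only when it rejects a signal; the function name, scenario labels, and field names are hypothetical and not taken from audit-log.md.

```python
def risk_log_for(scenario):
    """Entries Signal Risk would leave in the log under each scenario (hypothetical model)."""
    if scenario == "rejected":
        return [{"signal_id": "s-1", "stage": "signal_risk", "outcome": "rejected"}]
    # "passed": nothing is written by design.
    # "write_failed": a rejection happened but its audit write was lost.
    # "check_bypassed": the risk check never ran.
    return []

silent = ["passed", "write_failed", "check_bypassed"]
logs = {s: risk_log_for(s) for s in silent}

# All three scenarios leave the same (empty) trace, so a reader of the log
# cannot tell a legitimate pass from a lost rejection or a skipped check.
indistinguishable = all(entries == [] for entries in logs.values())
print(indistinguishable)
```

Under these assumptions, absence of an entry carries three different meanings, which is what makes the channel silent.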

GPT-5 Unique Findings

  • Lossy aggregation via forced re-formation: Disrupt atomic write → decision "lost" → re-form with different signals. Original decision vanishes.
  • Decision content vs executed order mismatch: Audit records conservative decision but OM receives larger order.
  • Decision_id collision/merging: Reuse IDs to co-mingle unrelated entries.
  • Flood-and-hide with rejection storms: Generate noise to overwhelm query views.
  • Backdated corrections + timestamp smearing: Writer-supplied timestamps alter perceived event ordering.
  • Decision_id ghost approvals: Append late "approved" entries to justify unauthorized executions.
  • Correction recursion: Chain corrections-of-corrections for irresolvable ambiguity.
  • Forward-trace truncation via selective W1 gaps: Block only rejection writes while passing signals downstream.
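The timestamp-smearing and ghost-approval items can be sketched together. The example below assumes each entry carries a writer-supplied timestamp and that the store also assigns its own ingest sequence number; the source does not confirm the second assumption, and the `Entry` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    ingest_seq: int    # hypothetical: order in which the store received the entry
    writer_ts: float   # timestamp supplied by the writing component
    decision_id: str
    kind: str

# Hypothetical sequence: the order executes, then a late "approved" entry
# is appended with a backdated writer timestamp.
log = [
    Entry(1, 100.0, "d-1", "aggregation"),
    Entry(2, 105.0, "d-1", "executed"),
    Entry(3, 101.0, "d-1", "approved"),  # written last, claims to be earlier
]

def order_by_writer_ts(entries):
    """Event order as seen by a query that trusts writer timestamps."""
    return [e.kind for e in sorted(entries, key=lambda e: e.writer_ts)]

def order_by_ingest(entries):
    """Event order as seen by the store itself."""
    return [e.kind for e in sorted(entries, key=lambda e: e.ingest_seq)]

def backdated(entries):
    """Entries whose writer timestamp precedes that of an earlier-ingested entry."""
    flagged, max_ts = [], float("-inf")
    for e in sorted(entries, key=lambda e: e.ingest_seq):
        if e.writer_ts < max_ts:
            flagged.append(e)
        max_ts = max(max_ts, e.writer_ts)
    return flagged

print(order_by_writer_ts(log))  # approval appears to precede execution
print(order_by_ingest(log))     # approval actually arrived after execution
print([e.kind for e in backdated(log)])
```

Sorted by writer timestamp, the approval appears to precede the execution. The discrepancy is visible only if an ordering exists that the writer does not control.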

Opus Unique Findings

  • Signal Risk write failure as indistinguishable stealth bypass: Attack looks IDENTICAL to documented operational failures — built-in plausible deniability.
  • Atomic write temporal gap exploitation: Decision forwarded to PR BEFORE audit write completes — trade executes without aggregation entries.
  • Outcome mislabeling "accomplice" framing: Audit system becomes an active participant in concealment, not just failing to detect.
  • Timing-based entry injection: No "sealing" of decision_ids after atomic write — other writers can inflate evidence retroactively.
  • Portfolio Risk entry preemption: Write false "approved" BEFORE PR evaluates, creating irresolvable contradictions.
  • Weaponized immutability via flooding: Inject noise that can NEVER be cleaned up — the system's strength becomes the attack vector.

Sonnet Unique Findings

  • Schema evolution attack: Introduce version field, write new entries with same IDs + higher versions, modify queries to return "latest" — effectively overwrite history through schema change. Uniquely creative meta-attack.
  • Query semantic manipulation: Modify query logic to filter out entries — data exists but becomes invisible to operations.
  • Decision ID fragmentation: Split decisions so each fragment falls below risk thresholds individually.
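The schema evolution attack can be sketched as follows. This is an illustration under assumed names: `entry_id`, `version`, and both query functions are hypothetical, and the log is modeled as a plain append-only list.

```python
# Append-only log: nothing is ever deleted or updated in place.
log = [
    {"entry_id": "e-1", "decision_id": "d-1", "outcome": "rejected"},
]

def query_original(entries, entry_id):
    """Pre-attack query: return every entry recorded under the ID."""
    return [e for e in entries if e["entry_id"] == entry_id]

# Step 1: a schema change introduces an optional "version" field.
# Step 2: a new entry reuses the ID with a higher version.
log.append(
    {"entry_id": "e-1", "decision_id": "d-1", "outcome": "approved", "version": 2}
)

def query_latest(entries, entry_id):
    """Post-attack query: return only the highest version per ID."""
    matches = [e for e in entries if e["entry_id"] == entry_id]
    return max(matches, key=lambda e: e.get("version", 1))

# Step 3: operational views switch to query_latest.
print(query_latest(log, "e-1")["outcome"])  # the view now reports "approved"
print(len(query_original(log, "e-1")))      # both entries still exist in storage
```

No entry was mutated, so immutability checks on storage still pass; the history changed only at the query layer.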

Key Insights

Adversarial analysis as a new task type

Compared to defensive lenses, adversarial analysis requires:

  1. Attacker perspective ("how would I exploit this?")
  2. Plausible deniability reasoning ("how would this look legitimate?")
  3. Multi-step attack chain construction
  4. Impact from attacker goals, not system goals

Model strengths on adversarial tasks

  • GPT-5: Strongest at attack surface ENUMERATION — systematically covers every exploitable gap. 25 findings covering essentially every section.
  • Opus: Strongest at attack PLAUSIBILITY — how attacks would be perceived, what provides cover, second-order effects of design decisions.
  • Sonnet: Occasionally produces the most creative META-ATTACKS, which exploit governance/authority rather than mechanisms. Fast and efficient.

Productivity of adversarial lens

Finding #48 (defense-in-depth) on a comparable doc (209 lines): GPT-5: 10, Opus: 7, Sonnet: 6.

Finding #49 (adversarial) on this doc (170 lines): GPT-5: 25, Opus: 14, Sonnet: 11.

Between 1.8x and 2.5x as many findings per model (50 vs. 23 in total, about 2.2x), despite the shorter document. A single design decision can be exploited in multiple ways from an attacker's perspective, while defense-in-depth identifies one gap per missing layer.
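The ratios can be recomputed from the counts reported for Findings #48 and #49. The per-line density figure is an added normalization, not a metric used in the original analysis.

```python
# Finding counts as reported above.
defensive = {"GPT-5": 10, "Opus": 7, "Sonnet": 6}      # Finding #48, 209-line doc
adversarial = {"GPT-5": 25, "Opus": 14, "Sonnet": 11}  # Finding #49, 170-line doc

# Per-model ratio of adversarial to defensive findings.
per_model = {m: adversarial[m] / defensive[m] for m in defensive}

# Ratio of totals across all three models.
total_ratio = sum(adversarial.values()) / sum(defensive.values())

# Findings per document line, to account for the different document lengths.
density_ratio = (sum(adversarial.values()) / 170) / (sum(defensive.values()) / 209)

for model, ratio in per_model.items():
    print(f"{model}: {ratio:.2f}x")
print(f"total: {total_ratio:.2f}x, per line: {density_ratio:.2f}x")
```

The per-model ratios range from about 1.8x (Sonnet) to 2.5x (GPT-5). The total ratio is about 2.2x, or about 2.7x when normalized by document length.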

Root cause convergence

All three models independently identified the same root cause: the audit log's trust model. The system records CLAIMS from trusted components, not independently observed facts. Any compromised writer can fabricate schema-valid entries indistinguishable from truth.
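A minimal sketch of why schema validation cannot help here. The field set and validator are hypothetical stand-ins for whatever audit-log.md actually specifies; only `signal_data` and `decision_id` are names that appear in the findings.

```python
# Hypothetical schema: field name -> expected type.
REQUIRED_FIELDS = {
    "decision_id": str,
    "writer": str,
    "outcome": str,
    "signal_data": dict,
}
ALLOWED_OUTCOMES = {"approved", "rejected"}

def is_schema_valid(entry):
    """Check shape and types only; the validator has no access to ground truth."""
    if set(entry) != set(REQUIRED_FIELDS):
        return False
    if not all(isinstance(entry[k], t) for k, t in REQUIRED_FIELDS.items()):
        return False
    return entry["outcome"] in ALLOWED_OUTCOMES

# What the component actually observed.
honest = {
    "decision_id": "d-1",
    "writer": "aggregator",
    "outcome": "approved",
    "signal_data": {"size": 100},
}

# What a compromised writer claims it observed: same shape, different facts.
fabricated = dict(honest, signal_data={"size": 10})

print(is_schema_valid(honest), is_schema_valid(fabricated))
```

Both entries pass validation, because the check covers only the form of the claim and not whether it matches what happened.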

Practical Implication

For architecture review rotation:

  • GPT-5 for exhaustive attack surface (what COULD happen)
  • Opus for realistic threat modeling (how it WOULD play out)
  • Sonnet for creative lateral attacks (meta-level exploitation)

The adversarial lens is the most productive new lens since cross-document consistency analysis. It generates more findings and produces directly actionable security improvements.