model-research/findings/2026-05-07-42-failure-propagation-chain-analysis.md
claw 296bb21eb7 finding #42: failure propagation chain analysis on system-overview.md
New analytical lens: failure propagation chains. Opus matched GPT-5's count
(10 findings each) while using 2.2x fewer tokens. Overview docs are ideal
for this lens. Sonnet produced zero unique insights.
2026-05-07 14:28:26 -07:00

Finding #42: Failure Propagation Chain Analysis on system-overview.md

Date: 2026-05-07
Analytical lens: Failure propagation chain analysis (NEW)
Document: gargoyle's system-overview.md (323 lines), a high-level architecture overview
Models: GPT-5, Claude Opus 4.6, Claude Sonnet 4.6

Summary

New analytical lens: identify failure propagation chains — sequences where a failure in one component silently corrupts, degrades, or destabilizes another component's behavior WITHOUT triggering explicit error handling or alarms.

Results

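| Model             | Time | Output tokens | Reasoning tokens | Findings |
|-------------------|------|---------------|------------------|----------|
| GPT-5             | 88s  | 9,027         | 6,720            | 10       |
| Claude Opus 4.6   | 97s  | 4,044         | (internal)       | 10       |
| Claude Sonnet 4.6 | 35s  | 1,605         | (internal)       | 8        |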

Key Findings

Common Ground (all 3 identified)

  • Shared tick event bus (EVT) as cross-user failure propagation, violating claimed user isolation (Invariant 12)
  • BrokerAdapter fill misattribution/cross-user contamination through the shared port
  • Stale/incorrect instrument_id resolution propagating silently through the pipeline
  • Exact arithmetic boundary violation at float-to-decimal conversion at ingestion
  • Recovery ordering hazards where reconciliation completes but derived state is inconsistent
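The float-to-decimal boundary item can be illustrated with Python's standard `decimal` module. This is a minimal sketch, not gargoyle's ingestion code: the price value and variable names are made up, and it only shows why converting an already-parsed float carries the binary representation error into "exact" arithmetic without raising anything.

```python
from decimal import Decimal

# Hypothetical tick price, already parsed into a binary float by a feed handler.
raw_price = 0.1

# Converting the float directly preserves its binary representation error.
from_float = Decimal(raw_price)

# Converting from the original text keeps the exact decimal value.
from_text = Decimal("0.1")

# The two values differ, yet no exception or warning is raised at the boundary.
values_differ = from_float != from_text

# Downstream exact arithmetic now disagrees with the intended result.
exact_total = from_text * 3      # Decimal("0.3")
drifted_total = from_float * 3   # slightly more than 0.3
```

Nothing in this path signals an error, which is what makes the boundary a propagation chain rather than a plain bug.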

GPT-5 Unique Findings

  • Duplicate fills after reconnect: BrokerAdapter replays fills on reconnect with no idempotency key → duplicate lots, inflated positions. Reconciliation only helps at startup, not steady-state reconnection.
  • Dual feed ingestion: Live + replay adapters simultaneously connected (port substitutability permits this) → duplicate ticks → double decisions → double exposure. No "single active feed" mutual exclusion.
  • Missing live fills during steady state: Dropped fills undetected until next restart. No continuous reconciliation specified. Positions silently drift.
  • PortfolioMonitor close-only outliving its trigger: No documented lifecycle for clearing → OrderManager blocks new orders indefinitely after trigger resolves.
  • Instrument identity drift between market data and broker: Corporate action causes disagreement between ingestion and adapter → fills recorded against wrong instrument lineage.
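The duplicate-fills chain can be sketched in a few lines. Everything here is hypothetical: the `Fill` type, the `fill_id` field, and both ledger classes are illustrative names, not part of gargoyle. The finding says the document specifies no idempotency key, so the `fill_id` below is an assumed mitigation, shown only to contrast the two behaviors.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fill:
    fill_id: str        # assumed broker-assigned identifier (hypothetical)
    instrument_id: str
    quantity: int


@dataclass
class NaiveLedger:
    """Applies every fill it receives, as in the failure chain described above."""
    position: int = 0

    def apply(self, fill: Fill) -> None:
        self.position += fill.quantity


@dataclass
class IdempotentLedger:
    """Skips fills whose fill_id has already been applied."""
    position: int = 0
    seen: set = field(default_factory=set)

    def apply(self, fill: Fill) -> bool:
        if fill.fill_id in self.seen:
            return False  # duplicate from a replay; ignore it
        self.seen.add(fill.fill_id)
        self.position += fill.quantity
        return True


live = [Fill("f1", "XYZ", 100), Fill("f2", "XYZ", 50)]
replayed_on_reconnect = list(live)  # the adapter replays the same fills

naive, idempotent = NaiveLedger(), IdempotentLedger()
for f in live + replayed_on_reconnect:
    naive.apply(f)
    idempotent.apply(f)
```

After the replay, `naive.position` is 300 while `idempotent.position` stays at 150; the naive path inflates the position without any error being raised.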

Claude Opus Unique Findings

  • PortfolioMonitor/Ledger divergence: PM runs as background process with own fill feed, NO reconciliation against authoritative Ledger lot state. PM's position view can drift → spurious close-only or missed close-only. Most architecturally significant: identifies PM has a PARALLEL position model with no convergence mechanism.
  • Signal rejection asymmetry: SignalRisk rejections invisible to Aggregator (only approvals flow downstream). Aggregator forms decisions on systematically biased subset. Identifies this as design-level information asymmetry.
  • Kill switch + fill precedence invariant deadlock: Kill switch engages while order partially filled → remaining fills forced by Invariant 6 → position grows during kill switch → PortfolioMonitor's close-only blocked by Invariant 8 → UNMANAGEABLE POSITION during crisis. Genuine deadlock between two stated invariants.
  • Corporate action lot adjustment bypasses risk pipeline: Split doubles quantity → exceeds limits → no re-evaluation because risk pipeline only validates decisions, not external state changes.
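The corporate-action finding can be made concrete with a small sketch. The limit value, the `Ledger` class, and the `risk_check` function are all hypothetical stand-ins; the only point is that a check attached to the order path never runs when state changes through a different path.

```python
MAX_POSITION = 1_000  # hypothetical per-instrument limit


def risk_check(current_qty: int, order_qty: int) -> bool:
    """Validate a trading decision against the limit (the only checked path)."""
    return current_qty + order_qty <= MAX_POSITION


class Ledger:
    def __init__(self) -> None:
        self.quantity = 0

    def apply_order(self, order_qty: int) -> bool:
        # Orders pass through the risk check before changing state.
        if not risk_check(self.quantity, order_qty):
            return False
        self.quantity += order_qty
        return True

    def apply_split(self, ratio: int) -> None:
        # External state change: adjusts lots without calling risk_check.
        self.quantity *= ratio


ledger = Ledger()
accepted = ledger.apply_order(800)   # within the limit, accepted
ledger.apply_split(2)                # 2-for-1 split doubles the quantity
over_limit = ledger.quantity > MAX_POSITION
```

The position ends at 1,600 against a limit of 1,000, and the breach only becomes visible when the next order is rejected.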

Claude Sonnet Findings

  • 8 findings, each of which GPT-5 or Opus also identified and explored in more depth. Zero unique insights.
  • One finding (audit log corruption) was based on a misunderstanding of the architecture.

Analysis

Opus's Token Efficiency

Opus produced 10 findings in 4,044 tokens — roughly 2.2x more token-efficient than GPT-5 (10 findings in 9,027 tokens). This is the first experiment where Opus MATCHED GPT-5's finding count while using significantly fewer tokens. Previous experiments showed Opus finding fewer issues with higher insight density. Here: equal count AND higher density.
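The 2.2x figure follows directly from the Results table. The sketch below recomputes it from the output-token column only; whether GPT-5's output tokens include its reasoning tokens is not stated in this note, so the code makes no adjustment for that.

```python
# Figures copied from the Results table.
results = {
    "GPT-5": {"output_tokens": 9027, "findings": 10},
    "Claude Opus 4.6": {"output_tokens": 4044, "findings": 10},
    "Claude Sonnet 4.6": {"output_tokens": 1605, "findings": 8},
}


def tokens_per_finding(row: dict) -> float:
    """Output tokens spent per reported finding."""
    return row["output_tokens"] / row["findings"]


per_finding = {model: tokens_per_finding(row) for model, row in results.items()}

# Ratio of GPT-5's cost per finding to Opus's cost per finding.
gpt5_vs_opus = per_finding["GPT-5"] / per_finding["Claude Opus 4.6"]
```

Sonnet has the lowest tokens per finding of the three (about 200), but since none of its findings were unique, this metric alone does not capture the value of a run.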

Document Level Matters

Overview/architecture documents are IDEAL for failure propagation analysis because they show boundaries and shared resources that component-level docs hide. Suggested document-level → lens matching:

  • Overview docs → failure propagation, blast radius, isolation verification
  • Component specs → race conditions, invariant violations, hidden assumptions
  • Cross-cutting docs → temporal ordering, recovery hazards

Dominant Failure Vector

The shared infrastructure contradiction is the single most important finding: the tick event bus (EVT) and the BrokerAdapter (BA) are single shared nodes, yet the document claims per-user isolation. All three models caught it, each exploring different consequences:

  • GPT-5: backpressure propagation, duplicate feed ingestion
  • Opus: fill misattribution, PortfolioMonitor parallel state
  • Sonnet: tick corruption (most obvious variant)
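One way a shared bus can undermine per-user isolation is sketched below. This is an assumption-heavy illustration: `SharedTickBus`, its synchronous delivery loop, and the user names are hypothetical and say nothing about how gargoyle's EVT is actually implemented.

```python
class SharedTickBus:
    """Hypothetical synchronous bus with one subscriber list for all users."""

    def __init__(self) -> None:
        self.handlers = []

    def subscribe(self, user: str, handler) -> None:
        self.handlers.append((user, handler))

    def publish(self, tick: dict) -> None:
        for _user, handler in self.handlers:
            handler(tick)  # an exception here aborts delivery to later users


received = {"alice": [], "bob": []}


def alice_handler(tick: dict) -> None:
    raise ValueError("alice's strategy crashed")


def bob_handler(tick: dict) -> None:
    received["bob"].append(tick)


bus = SharedTickBus()
bus.subscribe("alice", alice_handler)
bus.subscribe("bob", bob_handler)

try:
    bus.publish({"instrument_id": "XYZ", "price": "10.00"})
except ValueError:
    pass  # the publisher sees an error, but bob never learns a tick was skipped
```

In this sketch bob's handler receives nothing, and from bob's side the missing tick is indistinguishable from a quiet market.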

Practical Implications

  • Run Opus for highest insight density and design tension identification (10 findings, 97s, 4K tokens)
  • Run GPT-5 for operational/runtime hazards the architecture doesn't consider (10 findings, 88s, 9K tokens)
  • Sonnet is redundant for this task — provides no unique value over the other two
  • Total unique findings after deduplication: ~14 distinct propagation chains (5 common, 5 GPT-5-only, 4 Opus-only) from a 323-line document