finding #42: failure propagation chain analysis on system-overview.md
New analytical lens: failure propagation chains. Opus matched GPT-5's count (10 findings each) while using 2.2x fewer tokens. Overview docs are ideal for this lens. Sonnet produced zero unique insights.
@@ -0,0 +1,77 @@
# Finding #42: Failure Propagation Chain Analysis on system-overview.md

**Date:** 2026-05-07
**Analytical lens:** Failure propagation chain analysis (NEW)
**Document:** gargoyle's `system-overview.md` (323 lines), a high-level architecture overview
**Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4.6

## Summary

New analytical lens: identify failure propagation chains, that is, sequences where a failure in one
component silently corrupts, degrades, or destabilizes another component's behavior WITHOUT
triggering explicit error handling or alarms.

## Results

| Model | Time | Output tokens | Reasoning tokens | Findings |
|---|---|---|---|---|
| GPT-5 | 88s | 9,027 | 6,720 | 10 |
| Claude Opus 4.6 | 97s | 4,044 | (internal) | 10 |
| Claude Sonnet 4.6 | 35s | 1,605 | (internal) | 8 |

## Key Findings

### Common Ground (all 3 identified)

- Shared tick event bus (EVT) as cross-user failure propagation, violating claimed user isolation (Invariant 12)
- BrokerAdapter fill misattribution/cross-user contamination through the shared port
- Stale/incorrect instrument_id resolution propagating silently through the pipeline
- Exact arithmetic boundary violation at float-to-decimal conversion at ingestion
- Recovery ordering hazards where reconciliation completes but derived state is inconsistent
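
The float-to-decimal boundary listed above is easy to reproduce in isolation. The following is a minimal Python sketch, not gargoyle's ingestion code: the helper names `to_decimal_unsafe` and `to_decimal_from_text` and the sample price are hypothetical. It illustrates how building a `Decimal` directly from a binary float carries the float's representation error into the exact-arithmetic domain without raising any error, which is what makes the failure silent.

```python
from decimal import Decimal


def to_decimal_unsafe(price: float) -> Decimal:
    # Decimal(float) converts the float's exact binary value,
    # so any representation error is preserved, not rejected.
    return Decimal(price)


def to_decimal_from_text(price_text: str) -> Decimal:
    # Parsing the original textual quote keeps the value exact.
    return Decimal(price_text)


unsafe = to_decimal_unsafe(0.1)
exact = to_decimal_from_text("0.1")

print(unsafe == exact)                # False
print(exact * 3 == Decimal("0.3"))    # True
print(unsafe * 3 == Decimal("0.3"))   # False
```

No exception is raised at any point, so downstream components that assume exact values would have no signal that the boundary was crossed incorrectly.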

### GPT-5 Unique Findings

- **Duplicate fills after reconnect:** BrokerAdapter replays fills on reconnect with no idempotency key → duplicate lots, inflated positions. Reconciliation only helps at startup, not steady-state reconnection.
- **Dual feed ingestion:** Live + replay adapters simultaneously connected (port substitutability permits this) → duplicate ticks → double decisions → double exposure. No "single active feed" mutual exclusion.
- **Missing live fills during steady state:** Dropped fills undetected until next restart. No continuous reconciliation specified. Positions silently drift.
- **PortfolioMonitor close-only outliving its trigger:** No documented lifecycle for clearing → OrderManager blocks new orders indefinitely after trigger resolves.
- **Instrument identity drift between market data and broker:** Corporate action causes disagreement between ingestion and adapter → fills recorded against wrong instrument lineage.
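
To make the duplicate-fill chain concrete, here is a minimal Python sketch of what an idempotency key could look like. Everything in it is an assumption for illustration: the `Fill` and `PositionBook` types, the `fill_id` field, and the idea that the broker supplies a stable execution identifier are not described in the analyzed document.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Fill:
    fill_id: str          # hypothetical broker execution id, used as the idempotency key
    instrument_id: str
    quantity: Decimal


@dataclass
class PositionBook:
    positions: dict = field(default_factory=dict)
    seen_fill_ids: set = field(default_factory=set)

    def apply(self, fill: Fill) -> bool:
        """Apply a fill at most once; return False for a replayed fill."""
        if fill.fill_id in self.seen_fill_ids:
            return False
        self.seen_fill_ids.add(fill.fill_id)
        current = self.positions.get(fill.instrument_id, Decimal("0"))
        self.positions[fill.instrument_id] = current + fill.quantity
        return True


book = PositionBook()
original = Fill("exec-1", "INSTR-A", Decimal("100"))
replayed = Fill("exec-1", "INSTR-A", Decimal("100"))  # same fill replayed after a reconnect

print(book.apply(original))        # True
print(book.apply(replayed))        # False
print(book.positions["INSTR-A"])   # 100
```

Without the `seen_fill_ids` check, the replayed fill would be added a second time and the position would read 200, which is the inflated-position outcome described in the first bullet.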

### Claude Opus Unique Findings

- **PortfolioMonitor/Ledger divergence:** PM runs as background process with own fill feed, NO reconciliation against authoritative Ledger lot state. PM's position view can drift → spurious close-only or missed close-only. Most architecturally significant: identifies PM has a PARALLEL position model with no convergence mechanism.
- **Signal rejection asymmetry:** SignalRisk rejections invisible to Aggregator (only approvals flow downstream). Aggregator forms decisions on systematically biased subset. Identifies this as design-level information asymmetry.
- **Kill switch + fill precedence invariant deadlock:** Kill switch engages while order partially filled → remaining fills forced by Invariant 6 → position grows during kill switch → PortfolioMonitor's close-only blocked by Invariant 8 → UNMANAGEABLE POSITION during crisis. Genuine deadlock between two stated invariants.
- **Corporate action lot adjustment bypasses risk pipeline:** Split doubles quantity → exceeds limits → no re-evaluation because risk pipeline only validates decisions, not external state changes.
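
The corporate-action chain in the last bullet can be sketched in a few lines of Python. This is an illustrative model only: `MAX_QUANTITY`, `within_limit`, `apply_split`, and `apply_split_with_recheck` are hypothetical names, and the limit value is invented. The point is that a lot adjustment is a state change with no trading decision attached, so a limit check that runs only on decisions never sees it.

```python
from decimal import Decimal

MAX_QUANTITY = Decimal("150")  # hypothetical per-instrument position limit


def within_limit(quantity: Decimal) -> bool:
    # The kind of check a risk pipeline might run when validating a decision.
    return abs(quantity) <= MAX_QUANTITY


def apply_split(quantity: Decimal, ratio: Decimal) -> Decimal:
    # External state change: adjusts the lot without any decision being made.
    return quantity * ratio


def apply_split_with_recheck(quantity: Decimal, ratio: Decimal):
    # One possible mitigation: re-run the limit check after the adjustment.
    adjusted = apply_split(quantity, ratio)
    return adjusted, within_limit(adjusted)


position = Decimal("100")
print(within_limit(position))   # True: the position was compliant when it was opened

adjusted, ok = apply_split_with_recheck(position, Decimal("2"))
print(adjusted, ok)             # 200 False
```

Calling `apply_split` alone leaves the position at 200 with no check ever run, which matches the "no re-evaluation" gap Opus identified.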

### Claude Sonnet Findings

- 8 findings, all of which GPT-5 or Opus also identified, in more depth. Zero unique insights.
- One finding (audit log corruption) was based on a misunderstanding of the architecture.

## Analysis

### Opus's Token Efficiency

Opus produced 10 findings in 4,044 output tokens, making it roughly **2.2x more token-efficient** than GPT-5 (10 findings in 9,027 output tokens), or about 404 versus 903 output tokens per finding. The comparison covers reported output tokens only, since Opus's reasoning tokens are not reported separately. This is the first experiment where Opus MATCHED GPT-5's finding count while using significantly fewer tokens. Previous experiments showed Opus finding fewer issues with higher insight density. Here: equal count AND higher density.
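
The efficiency figure can be checked directly from the Results table. The short Python sketch below recomputes output tokens per finding for each model; the dictionary layout and the `tokens_per_finding` helper are just one convenient way to arrange the published numbers.

```python
# Output tokens and finding counts copied from the Results table.
results = {
    "GPT-5": {"output_tokens": 9027, "findings": 10},
    "Claude Opus 4.6": {"output_tokens": 4044, "findings": 10},
    "Claude Sonnet 4.6": {"output_tokens": 1605, "findings": 8},
}


def tokens_per_finding(row: dict) -> float:
    return row["output_tokens"] / row["findings"]


for model, row in results.items():
    print(f"{model}: {tokens_per_finding(row):.1f} output tokens per finding")

ratio = tokens_per_finding(results["GPT-5"]) / tokens_per_finding(results["Claude Opus 4.6"])
print(f"GPT-5 / Opus: {ratio:.2f}x")
```

Raw tokens per finding ignores uniqueness: Sonnet has the lowest figure of the three, yet contributed no unique findings, so this metric is best read alongside the unique-finding lists rather than on its own.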

### Document Level Matters

Overview/architecture documents are IDEAL for failure propagation analysis because they show boundaries and shared resources that component-level docs hide. Suggested document-level → lens matching:
- **Overview docs** → failure propagation, blast radius, isolation verification
- **Component specs** → race conditions, invariant violations, hidden assumptions
- **Cross-cutting docs** → temporal ordering, recovery hazards

### Dominant Failure Vector

The shared infrastructure contradiction (the EVT event bus and BrokerAdapter as single shared nodes despite claimed per-user isolation) is the single most important finding. All models caught it, each exploring different consequences:
- GPT-5: backpressure propagation, duplicate feed ingestion
- Opus: fill misattribution, PortfolioMonitor parallel state
- Sonnet: tick corruption (most obvious variant)

## Practical Implications

- Run **Opus** for highest insight density and design tension identification (10 findings, 97s, 4K tokens)
- Run **GPT-5** for operational/runtime hazards the architecture doesn't consider (10 findings, 88s, 9K tokens)
- **Sonnet is redundant** for this task — provides no unique value over the other two
- Total unique findings after deduplication: ~14 distinct propagation chains from a 323-line document
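
One way to arrive at the ~14 figure is to take the union of the findings listed under Key Findings. The Python sketch below does that with short labels; the labels are hypothetical shorthand invented here, and the grouping assumes each listed bullet is a distinct chain, which the report does not state explicitly.

```python
# Hypothetical short labels for the bullets under Key Findings.
shared = {
    "shared-event-bus-isolation",
    "broker-fill-misattribution",
    "stale-instrument-id",
    "float-to-decimal-boundary",
    "recovery-ordering",
}
gpt5_only = {
    "duplicate-fills-on-reconnect",
    "dual-feed-ingestion",
    "missing-live-fills",
    "close-only-outlives-trigger",
    "instrument-identity-drift",
}
opus_only = {
    "monitor-ledger-divergence",
    "signal-rejection-asymmetry",
    "kill-switch-invariant-deadlock",
    "corporate-action-bypasses-risk",
}

# Set union removes any label that appears in more than one group.
distinct = shared | gpt5_only | opus_only
print(len(distinct))  # 14
```

Sonnet adds nothing to the union because all of its findings were also reported by GPT-5 or Opus.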