New analytical lens: where data propagation creates stale, contradictory, or misleading views for different consumers. Key result: highest model convergence (45% common ground) due to the document's explicit failure mode table. GPT-5 finds event-level provenance gaps; Opus identifies the strategy attribution dimension. Sonnet adds zero unique value. The two-model stack (GPT-5 + Opus) is optimal.
Finding #34: Information Flow Hazard Analysis on lot-accounting.md
Date: 2026-05-06
Document: docs/domain/contexts/ledger/lot-accounting.md (181 lines)
Lens: Information flow hazards (NEW) — where data propagation creates stale,
contradictory, incomplete, or misleading views for different consumers.
Setup
All 3 models received the same document and the same focused analytical prompt via the HAI proxy. The prompt specified 5 categories: staleness propagation, fan-out inconsistency, causal ordering violations, write amplification gaps, provenance opacity. Each finding had to follow the Flow/Category/Scenario/Impact/Severity format. The models had no tools and no project context beyond the document.
Results
| Model | Time | Output tokens | Reasoning tokens | Findings |
|---|---|---|---|---|
| GPT-5 | 94s | 8,246 | 6,016 | 11 |
| Claude Opus 4.6 | 66s | 3,318 | (internal) | 7 |
| Claude Sonnet 4.6 | 77s | 4,163 | (internal) | 6 |
Common Ground (all 3 identified)
- Position aggregate staleness after crash (write amplification gap) — Critical
- Position.realized_pnl accumulator drift from LotClosed sum — Critical
- Multi-lot close walk creating fan-out inconsistency (intermediate states visible) — High
- Corporate action arrival ordering race creating permanently incorrect immutable LotClosed — Critical
- Provenance opacity of Position.average_cost (no freshness metadata) — High
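The accumulator-drift hazard in the list above can be checked mechanically. The following is a minimal sketch, assuming each LotClosed event carries a realized P&L amount and that Position.realized_pnl is meant to equal their sum. The function name and the sample values are hypothetical and are not taken from lot-accounting.md.

```python
from decimal import Decimal


def realized_pnl_drift(position_realized_pnl: Decimal,
                       lot_closed_pnls: list[Decimal]) -> Decimal:
    """Return the accumulator minus the sum re-derived from LotClosed events.

    A non-zero result means the Position accumulator and the lot ledger
    disagree, which is the drift hazard all three models flagged.
    """
    return position_realized_pnl - sum(lot_closed_pnls, Decimal("0"))


# Illustrative values only: the accumulator reflects the first close but
# not the second (for example, a crash between the two writes).
drift = realized_pnl_drift(
    Decimal("150.00"),
    [Decimal("150.00"), Decimal("-40.25")],
)
print(drift)
```

A consumer that reads Position.realized_pnl could run such a check before trusting the value, or a reconciliation job could alert on any non-zero drift.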
GPT-5 Unique Findings
- Lot ledger unavailability as information flow hazard: Position freezes at pre-sell state indefinitely during documented outage. Risk/UI operate on stale exposure for entire outage+recovery window.
- Wash sale timing at closing: The document checks for wash sales only at Opening (buy), not at Closing (sell). If a disqualifying buy already exists when the loss sale executes, the resulting immutable LotClosed records a loss amount that should have been disallowed.
- LotClosed per-fill completeness gap: No end-of-fill marker or expected event count. Consumers cannot distinguish "all events written" from "more coming."
- Reconciliation non-atomicity: Re-derivation writes may not be atomic across Position fields, creating transient internal inconsistency.
- Opening write amplification: Lot exists before Position update, creating underreported-exposure window.
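The per-fill completeness gap above is the easiest of these to illustrate. The sketch below shows one possible mitigation, not something the document specifies: each LotClosed event carries a sequence number and an expected event count, so a consumer can tell a fully written fill from a partial one. The fields `fill_id`, `seq`, and `expected_count` are assumptions introduced for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LotClosed:
    fill_id: str          # hypothetical: the fill that triggered this close
    lot_id: str
    realized_pnl: Decimal
    seq: int              # hypothetical: 1-based position within the fill
    expected_count: int   # hypothetical: total LotClosed events for the fill


def fill_is_complete(events: list[LotClosed]) -> bool:
    """Return True only if every expected LotClosed event for one fill is present."""
    if not events:
        return False
    expected = events[0].expected_count
    # All events of a fill must agree on how many events the fill produces.
    if any(e.expected_count != expected for e in events):
        return False
    return {e.seq for e in events} == set(range(1, expected + 1))


# Illustrative multi-lot close: one fill that closes two lots.
first = LotClosed("fill-1", "lot-A", Decimal("25.00"), seq=1, expected_count=2)
second = LotClosed("fill-1", "lot-B", Decimal("-5.00"), seq=2, expected_count=2)

partial_view = fill_is_complete([first])           # mid-write snapshot
complete_view = fill_is_complete([first, second])  # after the close walk finishes
```

With such a marker, a P&L consumer could defer aggregation until `fill_is_complete` holds, which would also narrow the intermediate-state window described under the multi-lot close finding.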
Opus Unique Findings
- Strategy P&L attribution fan-out: During multi-LotClosed write, per-strategy P&L consumers see inconsistent attribution. Strategy-level stop-losses or rebalancing may fire incorrectly on partial data.
Sonnet Unique Findings
- None truly unique — all findings were variations of what the other models found.
Key Insight: High Model Convergence
This lens produced the highest convergence rate across all experiments:
- 5 of GPT-5's 11 findings (45%) were common ground (vs typical 25-35%)
- Only 6 of GPT-5's 11 findings (55%) fell outside the common ground (vs typical 60-75%)
Why: The document includes an explicit "Failure Modes" table that all models effectively re-derive as information flow hazards. The unique findings come from models going BEYOND the document's own failure analysis.
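The convergence figures follow directly from the counts reported above: 11 GPT-5 findings, 5 of which appear in the common-ground list. A minimal sketch of the arithmetic, where the function name is hypothetical:

```python
def convergence_split(total_findings: int, common_findings: int) -> tuple[float, float]:
    """Return (common-ground %, outside-common-ground %) for one model's findings."""
    if total_findings <= 0:
        raise ValueError("total_findings must be positive")
    if not 0 <= common_findings <= total_findings:
        raise ValueError("common_findings must be between 0 and total_findings")
    common_pct = 100.0 * common_findings / total_findings
    return common_pct, 100.0 - common_pct


# Counts from this experiment: GPT-5 produced 11 findings, 5 were common ground.
common_pct, other_pct = convergence_split(total_findings=11, common_findings=5)
print(f"{common_pct:.1f}% common ground, {other_pct:.1f}% outside it")
```

Applying the same function to the counts from other experiments gives directly comparable convergence rates.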
Practical Implications
- Information flow analysis is most valuable on documents WITHOUT explicit failure mode tables — documents that describe data architecture without self-analyzing their failure properties.
- Two-model stack (GPT-5 + Opus) is optimal for this lens. Sonnet adds zero unique value.
- GPT-5 finds hazards outside the document's own frame (event-level completeness, wash sale timing).
- Opus excels at dimensional analysis (strategy attribution dimension) and produces the most concrete, test-case-ready scenarios.
Model Characterization for This Lens
- GPT-5: Broadest coverage; finds event-level and lifecycle-timing hazards the document doesn't address
- Opus: Most precise scenario construction with concrete dollar amounts; identifies dimensions the document doesn't analyze
- Sonnet: Adequate but redundant; produces elaborated versions of what the other two already find