model-research/findings/2026-05-06-34-information-flow-hazard-analysis.md
Rodin 4a69a99d05 finding #34: information flow hazard analysis on lot-accounting.md
New analytical lens: where data propagation creates stale, contradictory,
or misleading views for different consumers.

Key result: highest model convergence (45% common ground) due to document's
explicit failure mode table. GPT-5 finds event-level provenance gaps; Opus
identifies strategy attribution dimension. Sonnet adds zero unique value.
Two-model stack (GPT-5 + Opus) optimal.
2026-05-06 18:29:06 -07:00


Finding #34: Information Flow Hazard Analysis on lot-accounting.md

Date: 2026-05-06
Document: docs/domain/contexts/ledger/lot-accounting.md (181 lines)
Lens: Information flow hazards (NEW): where data propagation creates stale, contradictory, incomplete, or misleading views for different consumers.

Setup

The same document and the same focused analytical prompt went to all 3 models via HAI proxy. The prompt specified 5 categories: staleness propagation, fan-out inconsistency, causal ordering violations, write amplification gaps, provenance opacity. Each finding had to follow a Flow/Category/Scenario/Impact/Severity format. No tools, no project context beyond the document.
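As a rough illustration of the required per-finding format, the five fields and five categories can be modeled as a small record type. This is a sketch only: the names `Category`, `Finding`, and `parse_finding` are hypothetical and are not taken from the prompt or the project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Category(Enum):
    # The five categories named in the prompt.
    STALENESS_PROPAGATION = "staleness propagation"
    FAN_OUT_INCONSISTENCY = "fan-out inconsistency"
    CAUSAL_ORDERING_VIOLATIONS = "causal ordering violations"
    WRITE_AMPLIFICATION_GAPS = "write amplification gaps"
    PROVENANCE_OPACITY = "provenance opacity"


@dataclass(frozen=True)
class Finding:
    # Field names mirror the Flow/Category/Scenario/Impact/Severity format.
    flow: str
    category: Category
    scenario: str
    impact: str
    severity: str  # e.g. "Critical" or "High"; the full allowed set is an assumption


def parse_finding(fields: Mapping[str, str]) -> Finding:
    """Build a Finding from a mapping keyed by the five required field names.

    Raises KeyError if a field is missing and ValueError if the category
    is not one of the five named in the prompt.
    """
    return Finding(
        flow=fields["Flow"],
        category=Category(fields["Category"].strip().lower()),
        scenario=fields["Scenario"],
        impact=fields["Impact"],
        severity=fields["Severity"],
    )
```

Normalizing every model's output into one record shape like this would make it easier to compare findings across models by category and severity.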

Results

| Model | Time | Output tokens | Reasoning tokens | Findings |
| --- | --- | --- | --- | --- |
| GPT-5 | 94s | 8,246 | 6,016 | 11 |
| Claude Opus 4.6 | 66s | 3,318 | (internal) | 7 |
| Claude Sonnet 4.6 | 77s | 4,163 | (internal) | 6 |

Common Ground (all 3 identified)

  1. Position aggregate staleness after crash (write amplification gap) — Critical
  2. Position.realized_pnl accumulator drift from LotClosed sum — Critical
  3. Multi-lot close walk creating fan-out inconsistency (intermediate states visible) — High
  4. Corporate action arrival ordering race creating permanently incorrect immutable LotClosed — Critical
  5. Provenance opacity of Position.average_cost (no freshness metadata) — High
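Common-ground item 2 (accumulator drift) can be made concrete with a small check that re-derives realized P&L from the LotClosed events and compares it to the stored accumulator. This is a minimal sketch under assumptions: the `LotClosed` fields and the `realized_pnl_drift` helper are hypothetical and do not reflect the actual schema in lot-accounting.md.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Sequence


@dataclass(frozen=True)
class LotClosed:
    # Hypothetical shape: only the fields needed for the drift check.
    position_id: str
    realized_pnl: Decimal


def realized_pnl_drift(accumulator: Decimal, events: Sequence[LotClosed]) -> Decimal:
    """Return accumulator minus the sum of LotClosed.realized_pnl.

    A non-zero result means the Position accumulator has drifted from the
    immutable event record.
    """
    derived = sum((e.realized_pnl for e in events), Decimal("0"))
    return accumulator - derived


events = [
    LotClosed("pos-1", Decimal("10.25")),
    LotClosed("pos-1", Decimal("-3.10")),
]
# Accumulator that missed the second event, e.g. after a crash mid-update.
print(realized_pnl_drift(Decimal("10.25"), events))
```

Decimal is used rather than float so that a zero drift means exact agreement, not agreement within rounding error.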

GPT-5 Unique Findings

  • Lot ledger unavailability as information flow hazard: Position freezes at pre-sell state indefinitely during documented outage. Risk/UI operate on stale exposure for entire outage+recovery window.
  • Wash sale timing at closing: Document only checks at Opening (buy), not Closing (sell). Disqualifying buy that already exists when loss sale executes creates immutable LotClosed with disallowed loss amount.
  • LotClosed per-fill completeness gap: No end-of-fill marker or expected event count. Consumers cannot distinguish "all events written" from "more coming."
  • Reconciliation non-atomicity: Re-derivation writes may not be atomic across Position fields, creating transient internal inconsistency.
  • Opening write amplification: Lot exists before Position update, creating underreported-exposure window.
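The per-fill completeness gap could be closed in more than one way. One possibility, sketched here, is for each LotClosed event to carry a sequence number and an expected event count, so consumers can tell a partial set from a complete one. The fields `fill_id`, `sequence`, and `expected_count` are hypothetical; the source document defines no such fields, which is the gap GPT-5 identified.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class LotClosedEvent:
    fill_id: str
    sequence: int        # 1-based index within the fill (hypothetical field)
    expected_count: int  # total LotClosed events for the fill (hypothetical field)


def fill_is_complete(events: Iterable[LotClosedEvent], fill_id: str) -> bool:
    """Return True only if every expected event for fill_id has been seen."""
    seen: Set[int] = set()
    expected: Optional[int] = None
    for event in events:
        if event.fill_id != fill_id:
            continue
        if expected is None:
            expected = event.expected_count
        elif expected != event.expected_count:
            raise ValueError(f"inconsistent expected_count for fill {fill_id}")
        seen.add(event.sequence)
    # No events at all counts as incomplete, not as vacuously complete.
    return expected is not None and seen == set(range(1, expected + 1))
```

A consumer such as a per-strategy P&L view could hold back a fill's events until `fill_is_complete` returns True, which would also avoid acting on intermediate multi-lot close states.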

Opus Unique Findings

  • Strategy P&L attribution fan-out: During multi-LotClosed write, per-strategy P&L consumers see inconsistent attribution. Strategy-level stop-losses or rebalancing may fire incorrectly on partial data.

Sonnet Unique Findings

  • None truly unique — all findings were variations of what the other models found.

Key Insight: High Model Convergence

This lens produced the highest convergence rate across all experiments:

  • 5 of GPT-5's 11 findings (45%) were common ground (vs typical 25-35%)
  • 6 of GPT-5's 11 findings (55%) fell outside the common ground (vs typical 60-75% unique)
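The percentages follow directly from the counts reported earlier: 5 common-ground findings out of GPT-5's 11. A short arithmetic check, with the helper name `share` chosen purely for illustration:

```python
def share(part: int, total: int) -> float:
    """Percentage of total represented by part, rounded to one decimal."""
    return round(100 * part / total, 1)


gpt5_total = 11    # GPT-5 findings, from the Results table
common_ground = 5  # findings identified by all 3 models

print(share(common_ground, gpt5_total))               # 45.5
print(share(gpt5_total - common_ground, gpt5_total))  # 54.5
```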

Why: The document includes an explicit "Failure Modes" table that all models effectively re-derive as information flow hazards. The unique findings come from models going BEYOND the document's own failure analysis.

Practical Implications

  1. Information flow analysis is most valuable on documents WITHOUT explicit failure mode tables — documents that describe data architecture without self-analyzing their failure properties.
  2. Two-model stack (GPT-5 + Opus) is optimal for this lens. Sonnet adds zero unique value.
  3. GPT-5 finds hazards outside the document's own frame (event-level completeness, wash sale timing).
  4. Opus excels at dimensional analysis (strategy attribution dimension) and produces the most concrete, test-case-ready scenarios.

Model Characterization for This Lens

  • GPT-5: Broadest coverage; finds event-level and lifecycle-timing hazards the document doesn't address
  • Opus: Most precise scenario construction with concrete dollar amounts; identifies dimensions the document doesn't analyze
  • Sonnet: Adequate but redundant; produces elaborated versions of what the other two already find