Rodin 4a69a99d05 finding #34: information flow hazard analysis on lot-accounting.md

New analytical lens: where data propagation creates stale, contradictory,
or misleading views for different consumers.

Key result: highest model convergence (45% common ground) due to the document's
explicit failure mode table. GPT-5 finds event-level provenance gaps; Opus
identifies the strategy attribution dimension. Sonnet adds zero unique value.
Two-model stack (GPT-5 + Opus) optimal.

2026-05-06 18:29:06 -07:00

3.9 KiB

Finding #34: Information Flow Hazard Analysis on lot-accounting.md

Date: 2026-05-06
Document: docs/domain/contexts/ledger/lot-accounting.md (181 lines)
Lens: Information flow hazards (NEW), covering places where data propagation creates stale, contradictory, incomplete, or misleading views for different consumers.

Setup

The same document and the same focused analytical prompt went to all 3 models via the HAI proxy. The prompt specified 5 categories (staleness propagation, fan-out inconsistency, causal ordering violations, write amplification gaps, provenance opacity) and required a Flow/Category/Scenario/Impact/Severity format for each finding. The models had no tools and no project context beyond the document.

Results

Model	Time	Output tokens	Reasoning tokens	Findings
GPT-5	94s	8,246	6,016	11
Claude Opus 4.6	66s	3,318	(internal)	7
Claude Sonnet 4.6	77s	4,163	(internal)	6

Common Ground (identified by all 3 models)

Position aggregate staleness after crash (write amplification gap) — Critical
Position.realized_pnl accumulator drift from LotClosed sum — Critical
Multi-lot close walk creating fan-out inconsistency (intermediate states visible) — High
Corporate action arrival ordering race creating permanently incorrect immutable LotClosed — Critical
Provenance opacity of Position.average_cost (no freshness metadata) — High
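Several of these shared findings (stale aggregates after a crash, realized_pnl drift, no freshness metadata on average_cost) point at the same remedy: re-derive the Position from the lot ledger and compare. The Python sketch below is a minimal illustration under stated assumptions, not the design in lot-accounting.md. The OpenLot, LotClosed and PositionSnapshot shapes, the cost-weighted average over open lots, and the as_of_seq freshness field are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical record shapes; lot-accounting.md's actual types are not shown here.
@dataclass(frozen=True)
class OpenLot:
    lot_id: str
    quantity: Decimal
    cost_per_unit: Decimal


@dataclass(frozen=True)
class LotClosed:
    lot_id: str
    realized_pnl: Decimal


@dataclass(frozen=True)
class PositionSnapshot:
    quantity: Decimal
    average_cost: Decimal
    realized_pnl: Decimal
    as_of_seq: int  # assumed freshness metadata: last ledger event reflected


def rederive(open_lots, closed_events, as_of_seq):
    """Rebuild all Position aggregates from the lot ledger in a single pass."""
    qty = sum((lot.quantity for lot in open_lots), Decimal(0))
    cost = sum((lot.quantity * lot.cost_per_unit for lot in open_lots), Decimal(0))
    pnl = sum((e.realized_pnl for e in closed_events), Decimal(0))
    avg = cost / qty if qty else Decimal(0)
    # Every field comes from the same ledger read, so the snapshot is internally consistent.
    return PositionSnapshot(qty, avg, pnl, as_of_seq)


def accumulator_drift(stored, closed_events):
    """Stored realized_pnl accumulator minus the LotClosed sum (0 means no drift)."""
    return stored.realized_pnl - sum((e.realized_pnl for e in closed_events), Decimal(0))
```

Building a complete frozen snapshot before publishing it means a reader that swaps in the new reference never sees a mix of old and new fields, which is one way to sidestep the reconciliation non-atomicity GPT-5 raises. A consumer can also compare as_of_seq against the ledger head to decide whether average_cost is fresh enough to act on.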

GPT-5 Unique Findings

Lot ledger unavailability as an information flow hazard: During the documented outage, Position stays frozen at its pre-sell state indefinitely. Risk and UI consumers operate on stale exposure for the entire outage-plus-recovery window.
Wash sale timing at closing: The document only checks for wash sales at Opening (buy), not at Closing (sell). If a disqualifying buy already exists when a loss sale executes, the result is an immutable LotClosed that records a loss the wash sale rule should have disallowed.
LotClosed per-fill completeness gap: There is no end-of-fill marker or expected event count, so consumers cannot distinguish "all events written" from "more coming."
Reconciliation non-atomicity: Re-derivation writes may not be atomic across Position fields, creating transient internal inconsistency.
Opening write amplification: The Lot exists before the Position update lands, creating a window of underreported exposure.
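To make the wash sale timing gap concrete, here is a simplified closing-time check in Python. It is an illustrative sketch, not the document's rule: the Buy and Sale records, the same-symbol test standing in for "substantially identical," and the pro-rata disallowance are assumptions, and it ignores basis adjustments and replacement shares already consumed by earlier wash sales. It only looks backward from the sale; later buys would presumably still be caught by the existing Opening check.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal

WASH_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class Buy:
    lot_id: str
    symbol: str
    trade_date: date
    quantity: Decimal


@dataclass(frozen=True)
class Sale:
    lot_id: str  # the lot being closed (hypothetical field)
    symbol: str
    trade_date: date
    quantity: Decimal
    realized_pnl: Decimal  # negative for a loss


def disallowed_loss_at_close(sale, existing_buys):
    """Loss to disallow at Closing time because of buys already on the books.

    Simplifications (assumptions): same symbol stands in for "substantially
    identical", and replacement shares used by earlier wash sales are not tracked.
    """
    if sale.realized_pnl >= 0:
        return Decimal(0)
    window_start = sale.trade_date - WASH_WINDOW
    replacement = sum(
        (
            b.quantity
            for b in existing_buys
            if b.symbol == sale.symbol
            and b.lot_id != sale.lot_id  # the sold lot cannot replace itself
            and window_start <= b.trade_date <= sale.trade_date
        ),
        Decimal(0),
    )
    matched = min(replacement, sale.quantity)
    # Pro-rata: disallow the share of the loss covered by replacement shares.
    return -sale.realized_pnl * matched / sale.quantity
```

For example, closing a 100-share lot at a 500 loss while a 40-share buy of the same symbol from 20 days earlier is still on the books disallows 200 of the loss. Running the check before the LotClosed is written matters because the event is immutable afterward.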

Opus Unique Findings

Strategy P&L attribution fan-out: While a multi-LotClosed write is in progress, per-strategy P&L consumers see inconsistent attribution. Strategy-level stop-losses or rebalancing may fire incorrectly on partial data.
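One possible mitigation for this finding, and for GPT-5's per-fill completeness gap, is to hold a fill's LotClosed events until the whole batch is present and then apply them together. The Python sketch assumes a hypothetical events_in_fill count carried on every event, which the document does not define; duplicate delivery and permanently missing events are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LotClosedEvent:
    fill_id: str
    strategy: str
    realized_pnl: Decimal
    events_in_fill: int  # hypothetical completeness marker: LotClosed count for this fill


class StrategyPnl:
    """Per-strategy realized P&L that only reflects fills whose LotClosed batch is complete."""

    def __init__(self):
        self.pnl = defaultdict(Decimal)  # Decimal() == Decimal("0")
        self._pending = defaultdict(list)

    def on_event(self, event):
        """Buffer the event; apply the whole fill at once when every event has arrived."""
        batch = self._pending[event.fill_id]
        batch.append(event)
        if len(batch) < event.events_in_fill:
            return False  # partial fill: stop-loss and rebalancing logic see nothing yet
        for e in self._pending.pop(event.fill_id):
            self.pnl[e.strategy] += e.realized_pnl
        return True
```

If one fill closes two lots of the same strategy for -300 and +120, a consumer applying events one by one briefly shows -300 and could trip a stop set at, say, -250; the gated consumer only ever exposes the net -180.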

Sonnet Unique Findings

None truly unique — all findings were variations of what the other models found.

Key Insight: High Model Convergence

This lens produced the highest convergence rate across all experiments:

45% of GPT-5's findings (5 of 11) were common ground (vs. a typical 25-35%)
Only 6 of GPT-5's 11 findings fell outside the common ground (55%, vs. a typical 60-75%)
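Both GPT-5 percentages follow from 5 common-ground findings out of 11, rounded to whole percents. A quick check (the convergence helper is only an illustration):

```python
def convergence(common, total):
    """Whole-percent share of one model's findings inside vs. outside the common ground."""
    inside = round(100 * common / total)
    outside = round(100 * (total - common) / total)
    return inside, outside


print(convergence(5, 11))  # (45, 55)
```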

Why: The document includes an explicit "Failure Modes" table that all models effectively re-derive as information flow hazards. The unique findings come from models going BEYOND the document's own failure analysis.

Practical Implications

Information flow analysis is most valuable on documents WITHOUT explicit failure mode tables — documents that describe data architecture without self-analyzing their failure properties.
Two-model stack (GPT-5 + Opus) is optimal for this lens. Sonnet adds zero unique value.
GPT-5 finds hazards outside the document's own frame (event-level completeness, wash sale timing).
Opus excels at dimensional analysis (strategy attribution dimension) and produces the most concrete, test-case-ready scenarios.

Model Characterization for This Lens

GPT-5: Broadest coverage; finds event-level and lifecycle-timing hazards the document doesn't address
Opus: Most precise scenario construction with concrete dollar amounts; identifies dimensions the document doesn't analyze
Sonnet: Adequate but redundant; produces elaborated versions of what the other two already find
