diff --git a/findings/2026-05-09-62-data-integrity-signal-flow.md b/findings/2026-05-09-62-data-integrity-signal-flow.md
new file mode 100644
index 0000000..5bde63e
--- /dev/null
+++ b/findings/2026-05-09-62-data-integrity-signal-flow.md
@@ -0,0 +1,90 @@
+# Finding #62: Data Integrity Analysis on Signal Flow
+
+**Date:** 2026-05-09
+**Lens:** Data integrity analysis: does the design preserve data integrity as data flows through stages?
+**Document:** gargoyle signal-lifecycle.md (102 lines)
+
+## Task
+
+Analyze signal-lifecycle design for data integrity gaps:
+1. Data loss without audit
+2. Data corruption (field mutation)
+3. Inconsistent state across components
+4. Identity violations (correlation broken)
+5. Partial writes (intermediate state visible)
+6. Race conditions causing integrity loss
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings |
+|---|---|---|---|---|
+| GPT-5 | 107s | 8,569 | 6,336 | 13 |
+| Claude Opus | 47s | 2,547 | (internal) | 6+ (cut off) |
+| Claude Sonnet 4 | 16s | 983 | (internal) | 6 |
+
+## Common Ground (all 3 identified)
+
+- **Silent signal loss in aggregator buffer**: signals that pass Signal Risk enter the buffer but aren't audited until decision formation; crash/timeout loses them with no record
+- **Non-atomic decision creation + audit write**: steps 4-6 describe sequence (decision born → ID generated → signals recorded) without atomicity; partial failure leaves incomplete audit
+- **Signal Risk crash loses valid signals**: doc explicitly acknowledges crash = no audit entry, creating blind spots for signals that *would* have passed
+- **Aggregator overflow data loss**: "excess signals are expired" during spikes with no mention of audit for dropped signals
+
+## GPT-5 Unique Findings (not in either Claude model)
+
+- **Mutable signal objects allow in-place corruption**: No immutability guarantee; if stages mutate fields (e.g., clamp quantity), different consumers observe different values for same signal_id
+- **Fan-out to multiple aggregators without per-path provenance**: One signal appearing under many decisions lacks aggregator_id/path tagging; audit can't distinguish intended routing from duplication
+- **Open/close intent drift from position state changes**: Action normalized at order time; position changes between signal emission and normalization cause semantic mismatch
+- **Instrument ticker drift without version context**: ticker is "current symbol" at creation; corporate actions change it without timestamped snapshot in audit
+- **No idempotency guarantees for audit writes**: No unique constraint on [signal_id, stage, event_type]; replay/retry creates duplicate audit rows
+- **Ambiguous ordering from lack of sequence/timestamp metadata**: No stage-relative sequence numbers; can't reliably reconstruct event order across concurrent flows
+- **Partial visibility during decision commit window**: No snapshot isolation specified; signal arriving during commit may be included/excluded non-deterministically
+- **Incomplete signal field snapshot in audit**: Doc doesn't specify if full fields or only IDs are recorded; if only IDs, transient field values lost forever
+- **Unspecified audit for Portfolio Risk rejections**: Doc mentions Signal Risk rejection audit but is silent on Portfolio Risk accept/reject recording
+
+## Opus Unique Findings (not in either other model)
+
+- **Late-arriving signal creates phantom group with semantic identity violation**: Signal S3 delayed, arrives after timeout; starts new decision D2 instead of contributing to D1 as intended. *Technically* signal_id correlation intact, but *semantic* correlation broken. Scale_in signal becomes new position initiation.
+- **Quantity field mutation ambiguity**: Doc says Signal Risk can "reject" but silent on "modify" or "partial approval". If quantity reduced from 1000→500, which value is audited? Original intent lost.
+
+## Sonnet Unique Findings (not in either other model)
+
+- **Stale signal processing after strategy restart race**: Strategy crashes, restarts, produces new signals; old buffered signals still processing. Both reach decision-making → double-counting trading intent, 2x intended position size.
+- **Instrument identity inconsistency during corporate actions**: Between signal creation and validation, ticker/instrument_id relationship may change; signal carries stale reference.
+
+## Quality Assessment
+
+**GPT-5**: Most exhaustive coverage (13 findings). Systematically enumerated all six requested integrity gap categories. Several findings address subtle audit/forensics concerns (ordering ambiguity, idempotency, per-path provenance) that neither Claude model raised. Its 6,336 reasoning tokens enabled a structured walkthrough of each component's failure surface.
+
+**Opus**: Among the fewest findings (6+, cut off by response length; Sonnet also had 6), but the late-arriving-signal semantic identity violation is the most architecturally significant. It's not just "signal lost"; it's "signal creates wrong decision, causing duplicate trading intent." The quantity mutation ambiguity finding also identifies a gap neither other model named. Opus continues its pattern of finding design tensions the document can't see about itself.
+
+**Sonnet**: Efficient (983 tokens for 6 findings, 16s). The stale-signal-after-restart race condition is practically important and shows good reasoning about crash/restart interactions. However, findings were less exhaustive and several overlapped heavily with common ground. Output format was clean but depth was limited.
+
+## Key Insight: GPT-5 Excels at Audit/Forensics Gaps
+
+GPT-5's unique findings cluster around audit concerns: idempotency, ordering metadata, per-path provenance, field snapshots. These are all "how will we debug/audit this later?" concerns: the model reasons systematically about observability and traceability requirements, not just operational failure modes.
+
+Opus's findings cluster around semantic violations: the late-signal phantom group and quantity mutation both concern *meaning* corruption even when data technically flows. This is consistent with Opus's pattern of finding what the design can't see about itself.
+
+Sonnet's findings cluster around operational races: restart scenarios, corporate actions during processing. Practical but surface-level compared to the others.
+
+## Task Taxonomy Update
+
+**Data integrity analysis** → GPT-5 for audit/observability gaps, Opus for semantic integrity violations, Sonnet for operational race overview
+
+This lens is distinct from:
+- **State machine completeness** (#58): tests transition coverage
+- **Convention rule gaps** (#59): tests specification consistency
+- **Event ordering** (#60): tests temporal failure modes
+- **Regulatory completeness** (#61): tests legal/regulatory implementation
+
+Data integrity analysis tests whether data survives flow through the system with correct identity, values, and auditability.
+
+## Practical Implication
+
+For data-flow designs where audit/forensics matter, use GPT-5 to enumerate observability gaps, then Opus to find semantic violations the design can't see. Sonnet provides quick operational race identification but insufficient depth for compliance-critical analysis.
+
+## Efficiency
+
+- GPT-5: 659 tokens/finding (verbose, exhaustive)
+- Opus: ~425 tokens/finding (good insight density)
+- Sonnet: 164 tokens/finding (efficient, surface-level)
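
Reviewer note: the tokens-per-finding figures in the Efficiency section can be re-derived from the Results table. The sketch below assumes the ratio is output tokens divided by findings (reasoning tokens excluded) and treats Opus's "6+" as exactly 6, so its figure is an upper bound. The `results` dictionary and the `tokens_per_finding` helper are illustrative names, not part of the finding.

```python
# Figures copied from the Results table. Opus's finding count is a lower
# bound ("6+"), so its tokens/finding value is an upper bound.
results = {
    "GPT-5": {"output_tokens": 8569, "findings": 13},
    "Claude Opus": {"output_tokens": 2547, "findings": 6},
    "Claude Sonnet 4": {"output_tokens": 983, "findings": 6},
}


def tokens_per_finding(output_tokens: int, findings: int) -> float:
    """Average number of output tokens spent per reported finding."""
    return output_tokens / findings


per_finding = {
    model: tokens_per_finding(row["output_tokens"], row["findings"])
    for model, row in results.items()
}

for model, value in per_finding.items():
    print(f"{model}: {value:.1f} tokens/finding")
```

The unrounded ratios agree with the rounded values listed in the Efficiency section.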