Finding #62: Data integrity analysis on signal-lifecycle.md

New lens: data integrity analysis, testing whether data survives flow through systems with correct identity, values, and auditability.

Key insights:
- GPT-5 excels at audit/forensics gaps (idempotency, ordering, provenance)
- Opus finds semantic violations (phantom group, quantity mutation ambiguity)
- Sonnet identifies operational races (restart scenarios)

Document: gargoyle signal-lifecycle.md (102 lines)
Models: GPT-5 (13 findings), Opus (6+), Sonnet (6)
2026-05-09 22:26:46 -07:00
parent 527e71a1d6
commit 9f15047892
1 changed files with 90 additions and 0 deletions
@@ -0,0 +1,90 @@
+# Finding #62: Data Integrity Analysis on Signal Flow
+
+**Date:** 2026-05-09
+**Lens:** Data integrity analysis — does the design preserve data integrity as data flows through stages?
+**Document:** gargoyle signal-lifecycle.md (102 lines)
+
+## Task
+
+Analyze signal-lifecycle design for data integrity gaps:
+1. Data loss without audit
+2. Data corruption (field mutation)
+3. Inconsistent state across components
+4. Identity violations (correlation broken)
+5. Partial writes (intermediate state visible)
+6. Race conditions causing integrity loss
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings |
+|---|---|---|---|---|
+| GPT-5 | 107s | 8,569 | 6,336 | 13 |
+| Claude Opus | 47s | 2,547 | (internal) | 6+ (cut off) |
+| Claude Sonnet 4 | 16s | 983 | (internal) | 6 |
+
+## Common Ground (all 3 identified)
+
+- **Silent signal loss in aggregator buffer** — signals that pass Signal Risk enter the buffer but aren't audited until decision formation; crash/timeout loses them with no record
+- **Non-atomic decision creation + audit write** — steps 4-6 describe sequence (decision born → ID generated → signals recorded) without atomicity; partial failure leaves incomplete audit
+- **Signal Risk crash loses valid signals** — doc explicitly acknowledges crash = no audit entry, creating blind spots for signals that *would* have passed
+- **Aggregator overflow data loss** — "excess signals are expired" during spikes with no mention of audit for dropped signals
+
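The non-atomic decision write listed above can be illustrated with a minimal sketch. It assumes a relational audit store (SQLite, chosen purely for illustration) and hypothetical table names `decisions` and `decision_signals`; signal-lifecycle.md specifies neither. The only point is that the decision row and its contributing signals commit or roll back together.

```python
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical schema; not taken from signal-lifecycle.md.
    conn.execute("CREATE TABLE decisions (decision_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE decision_signals ("
        "decision_id TEXT NOT NULL REFERENCES decisions(decision_id), "
        "signal_id TEXT NOT NULL, "
        "PRIMARY KEY (decision_id, signal_id))"
    )


def commit_decision(conn, signal_ids):
    """Write the decision row and its contributing signals in one transaction."""
    decision_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO decisions (decision_id) VALUES (?)", (decision_id,)
        )
        conn.executemany(
            "INSERT INTO decision_signals (decision_id, signal_id) VALUES (?, ?)",
            [(decision_id, s) for s in signal_ids],
        )
    return decision_id


def count_rows(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
create_schema(conn)
ok_id = commit_decision(conn, ["S1", "S2"])

# A repeated signal_id violates the primary key partway through the audit
# write, so the decisions row inserted just before it is rolled back as well.
try:
    commit_decision(conn, ["S3", "S3"])
    failed = False
except sqlite3.IntegrityError:
    failed = True

print(failed, count_rows(conn, "decisions"), count_rows(conn, "decision_signals"))
```

After the failed call, no `decisions` row exists without its signal records, which is the property the steps 4-6 sequence leaves unstated.
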
+## GPT-5 Unique Findings (not in either Claude model)
+
+- **Mutable signal objects allow in-place corruption** — No immutability guarantee; if stages mutate fields (e.g., clamp quantity), different consumers observe different values for same signal_id
+- **Fan-out to multiple aggregators without per-path provenance** — One signal appearing under many decisions lacks aggregator_id/path tagging; audit can't distinguish intended routing from duplication
+- **Open/close intent drift from position state changes** — Action normalized at order time; position changes between signal emission and normalization cause semantic mismatch
+- **Instrument ticker drift without version context** — ticker is "current symbol" at creation; corporate actions change it without timestamped snapshot in audit
+- **No idempotency guarantees for audit writes** — No unique constraint on [signal_id, stage, event_type]; replay/retry creates duplicate audit rows
+- **Ambiguous ordering from lack of sequence/timestamp metadata** — No stage-relative sequence numbers; can't reliably reconstruct event order across concurrent flows
+- **Partial visibility during decision commit window** — No snapshot isolation specified; signal arriving during commit may be included/excluded non-deterministically
+- **Incomplete signal field snapshot in audit** — Doc doesn't specify if full fields or only IDs are recorded; if only IDs, transient field values lost forever
+- **Unspecified audit for Portfolio Risk rejections** — Doc mentions Signal Risk rejection audit but is silent on Portfolio Risk accept/reject recording
+
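The idempotency gap GPT-5 names can be made concrete with a short sketch. It assumes a SQLite-backed audit table called `audit_log` (a hypothetical name) and uses the `[signal_id, stage, event_type]` key from the finding as a unique constraint, so a retried or replayed write becomes a no-op. The `seq` column is an added assumption that gives each accepted row an insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, "  # insertion order (assumption)
    "signal_id TEXT NOT NULL, "
    "stage TEXT NOT NULL, "
    "event_type TEXT NOT NULL, "
    "UNIQUE (signal_id, stage, event_type))"
)


def record_event(conn, signal_id, stage, event_type):
    """Insert an audit row; return True if it was new, False if a replay."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO audit_log (signal_id, stage, event_type) "
            "VALUES (?, ?, ?)",
            (signal_id, stage, event_type),
        )
    # rowcount is 0 when the unique constraint caused the insert to be ignored
    return cur.rowcount == 1


first = record_event(conn, "S1", "signal_risk", "accepted")
replay = record_event(conn, "S1", "signal_risk", "accepted")
rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(first, replay, rows)
```

Whether a replay should be silently ignored or surfaced as an error is a design choice the source document does not address; the sketch takes the former only to keep the example short.
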
+## Opus Unique Findings (not in either other model)
+
+- **Late-arriving signal creates phantom group with semantic identity violation** — Signal S3 delayed, arrives after timeout; starts new decision D2 instead of contributing to D1 as intended. *Technically* signal_id correlation intact, but *semantic* correlation broken. Scale_in signal becomes new position initiation.
+- **Quantity field mutation ambiguity** — Doc says Signal Risk can "reject" but silent on "modify" or "partial approval". If quantity reduced from 1000→500, which value is audited? Original intent lost.
+
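One way to picture the quantity mutation ambiguity is a sketch in which signals are immutable and any risk adjustment produces a new value plus an audit record holding both quantities. The class names `Signal` and `RiskAdjustment` and the function `apply_risk_cap` are hypothetical; the source document does not say whether Signal Risk may modify quantity at all.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Signal:
    signal_id: str
    quantity: int


@dataclass(frozen=True)
class RiskAdjustment:
    """Audit record that preserves the original intent next to the outcome."""
    signal_id: str
    requested_quantity: int
    approved_quantity: int


def apply_risk_cap(signal, max_quantity):
    # Hypothetical "partial approval": cap the quantity without mutating the input.
    approved = min(signal.quantity, max_quantity)
    adjusted = replace(signal, quantity=approved)  # new object, same signal_id
    audit = RiskAdjustment(signal.signal_id, signal.quantity, approved)
    return adjusted, audit


original = Signal("S1", 1000)
adjusted, audit = apply_risk_cap(original, 500)
print(original.quantity, adjusted.quantity, audit)
```

With frozen dataclasses the 1000 to 500 reduction cannot overwrite the original value in place, so both the requested and the approved quantity remain available to the audit trail.
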
+## Sonnet Unique Findings (not in either other model)
+
+- **Stale signal processing after strategy restart race** — Strategy crashes, restarts, produces new signals; old buffered signals still processing. Both reach decision-making → double-counting trading intent, 2x intended position size.
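+- **Instrument identity inconsistency during corporate actions** — Between signal creation and validation, ticker/instrument_id relationship may change; signal carries stale reference. (Adjacent to GPT-5's ticker-drift finding, which concerns the missing audit snapshot rather than the stale reference itself.)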
+
+## Quality Assessment
+
+**GPT-5** — Most exhaustive coverage (13 findings). Systematically enumerated all six requested integrity gap categories. Several findings address subtle audit/forensics concerns (ordering ambiguity, idempotency, per-path provenance) that neither Claude model raised. Its 6,336 reasoning tokens plausibly supported a structured walkthrough of each component's failure surface, though this run does not isolate that effect.
+
+**Opus** — Tied with Sonnet for the fewest recorded findings (6+, cut off by response length), but the late-arriving-signal semantic identity violation is the most architecturally significant. It's not just "signal lost"; it's "signal creates wrong decision, causing duplicate trading intent." The quantity mutation ambiguity finding also identifies a gap neither other model named. Opus continues its pattern of finding design tensions the document can't see about itself.
+
+**Sonnet** — Efficient (983 tokens for 6 findings, 16s). The stale-signal-after-restart race condition is practically important and shows good reasoning about crash/restart interactions. However, findings were less exhaustive and several overlapped heavily with common ground. Output format was clean but depth was limited.
+
+## Key Insight — GPT-5 Excels at Audit/Forensics Gaps
+
+GPT-5's unique findings cluster around audit concerns: idempotency, ordering metadata, per-path provenance, field snapshots. These are all "how will we debug/audit this later?" concerns — the model reasons systematically about observability and traceability requirements, not just operational failure modes.
+
+Opus's findings cluster around semantic violations: the late-signal phantom group and quantity mutation both concern *meaning* corruption even when data technically flows. This is consistent with Opus's pattern of finding what the design can't see about itself.
+
+Sonnet's findings cluster around operational races: restart scenarios, corporate actions during processing. Practical but surface-level compared to the others.
+
+## Task Taxonomy Update
+
+**Data integrity analysis** → GPT-5 for audit/observability gaps, Opus for semantic integrity violations, Sonnet for operational race overview
+
+This lens is distinct from:
+- **State machine completeness** (#58) — tests transition coverage
+- **Convention rule gaps** (#59) — tests specification consistency
+- **Event ordering** (#60) — tests temporal failure modes
+- **Regulatory completeness** (#61) — tests legal/regulatory implementation
+
+Data integrity analysis tests whether data survives flow through the system with correct identity, values, and auditability.
+
+## Practical Implication
+
+For data-flow designs where audit/forensics matter, use GPT-5 to enumerate observability gaps, then Opus to find semantic violations the design can't see. Sonnet provides quick operational race identification but insufficient depth for compliance-critical analysis.
+
+## Efficiency
+
+- GPT-5: 659 output tokens/finding (verbose, exhaustive)
+- Opus: ~425 output tokens/finding, counting only the 6 findings recorded before cutoff (good insight density)
+- Sonnet: 164 output tokens/finding (efficient, surface-level)
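
The per-finding figures follow from the Results table: output tokens divided by finding count. A small sketch of that arithmetic, assuming the Opus count is taken as exactly 6 even though its response was cut off:

```python
# (output tokens, findings) copied from the Results table.
# Opus is counted as 6 findings although the table reports "6+".
results = {
    "GPT-5": (8569, 13),
    "Opus": (2547, 6),
    "Sonnet": (983, 6),
}

tokens_per_finding = {
    model: round(tokens / findings, 1)
    for model, (tokens, findings) in results.items()
}

for model, value in tokens_per_finding.items():
    print(f"{model}: {value} output tokens/finding")
```

Rounded to whole numbers, these give the 659, ~425, and 164 reported above.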