Finding #62: Data integrity analysis on signal-lifecycle.md

New lens: data integrity analysis — testing whether data survives flow
through systems with correct identity, values, and auditability.

Key insights:
- GPT-5 excels at audit/forensics gaps (idempotency, ordering, provenance)
- Opus finds semantic violations (phantom group, quantity mutation ambiguity)
- Sonnet identifies operational races (restart scenarios)

Document: gargoyle signal-lifecycle.md (102 lines)
Models: GPT-5 (13 findings), Opus (6+), Sonnet (6)
Rodin
2026-05-09 22:26:46 -07:00
parent 527e71a1d6
commit 9f15047892
@@ -0,0 +1,90 @@
# Finding #62: Data Integrity Analysis on Signal Flow
**Date:** 2026-05-09
**Lens:** Data integrity analysis — does the design preserve data integrity as data flows through stages?
**Document:** gargoyle signal-lifecycle.md (102 lines)
## Task
Analyze signal-lifecycle design for data integrity gaps:
1. Data loss without audit
2. Data corruption (field mutation)
3. Inconsistent state across components
4. Identity violations (correlation broken)
5. Partial writes (intermediate state visible)
6. Race conditions causing integrity loss
## Results
| Model | Time | Output tokens | Reasoning tokens | Findings |
|---|---|---|---|---|
| GPT-5 | 107s | 8,569 | 6,336 | 13 |
| Claude Opus | 47s | 2,547 | (internal) | 6+ (cut off) |
| Claude Sonnet 4 | 16s | 983 | (internal) | 6 |
## Common Ground (all 3 identified)
- **Silent signal loss in aggregator buffer** — signals that pass Signal Risk enter the buffer but aren't audited until decision formation; crash/timeout loses them with no record
- **Non-atomic decision creation + audit write** — steps 4-6 describe a sequence (decision born → ID generated → signals recorded) without atomicity; a partial failure leaves an incomplete audit
- **Signal Risk crash loses valid signals** — doc explicitly acknowledges crash = no audit entry, creating blind spots for signals that *would* have passed
- **Aggregator overflow data loss** — "excess signals are expired" during spikes with no mention of audit for dropped signals
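Two of the shared findings (decision and audit written non-atomically, and overflow expiry with no audit) can be illustrated with a minimal Python sketch using the standard-library `sqlite3` module. It assumes the decision, its signal links, and the audit trail live in one transactional store. The table names, columns, and the functions `form_decision` and `expire_overflow` are hypothetical and are not taken from signal-lifecycle.md.

```python
import sqlite3

# Hypothetical schema for illustration; signal-lifecycle.md does not define one.
SCHEMA = """
CREATE TABLE decisions (decision_id TEXT PRIMARY KEY, created_at REAL NOT NULL);
CREATE TABLE decision_signals (
    decision_id TEXT NOT NULL,
    signal_id   TEXT NOT NULL,
    PRIMARY KEY (decision_id, signal_id)
);
CREATE TABLE signal_audit (
    signal_id  TEXT NOT NULL,
    event_type TEXT NOT NULL,  -- e.g. 'decided', 'expired'
    detail     TEXT
);
"""


def form_decision(conn: sqlite3.Connection, decision_id: str,
                  signal_ids: list[str], now: float) -> None:
    """Create a decision, link its signals, and audit them in one transaction.

    `with conn:` commits on success and rolls back on any exception, so a
    failure midway never leaves a decision row without its signal links
    and audit entries.
    """
    with conn:
        conn.execute("INSERT INTO decisions VALUES (?, ?)", (decision_id, now))
        conn.executemany("INSERT INTO decision_signals VALUES (?, ?)",
                         [(decision_id, s) for s in signal_ids])
        conn.executemany("INSERT INTO signal_audit VALUES (?, 'decided', ?)",
                         [(s, decision_id) for s in signal_ids])


def expire_overflow(conn: sqlite3.Connection, buffer: list[str],
                    capacity: int) -> list[str]:
    """Evict the oldest signals beyond `capacity`, auditing each one first."""
    n_excess = max(0, len(buffer) - capacity)
    excess = buffer[:n_excess]
    with conn:
        conn.executemany(
            "INSERT INTO signal_audit VALUES (?, 'expired', 'aggregator overflow')",
            [(s,) for s in excess])
    del buffer[:n_excess]  # evict only after the audit rows are committed
    return excess
```

Evicting only after the audit insert commits means a crash between the two steps can at worst leave an already-audited signal in the buffer, never a silently dropped one.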
## GPT-5 Unique Findings (not in either Claude model)
- **Mutable signal objects allow in-place corruption** — No immutability guarantee; if stages mutate fields (e.g., clamp quantity), different consumers observe different values for same signal_id
- **Fan-out to multiple aggregators without per-path provenance** — One signal appearing under many decisions lacks aggregator_id/path tagging; audit can't distinguish intended routing from duplication
- **Open/close intent drift from position state changes** — The action is normalized at order time, so a position change between signal emission and normalization leaves the normalized action out of step with the strategy's original intent
- **Instrument ticker drift without version context** — ticker is "current symbol" at creation; corporate actions change it without timestamped snapshot in audit
- **No idempotency guarantees for audit writes** — No unique constraint on [signal_id, stage, event_type]; replay/retry creates duplicate audit rows
- **Ambiguous ordering from lack of sequence/timestamp metadata** — No stage-relative sequence numbers; can't reliably reconstruct event order across concurrent flows
- **Partial visibility during decision commit window** — No snapshot isolation specified; signal arriving during commit may be included/excluded non-deterministically
- **Incomplete signal field snapshot in audit** — Doc doesn't specify if full fields or only IDs are recorded; if only IDs, transient field values lost forever
- **Unspecified audit for Portfolio Risk rejections** — Doc mentions Signal Risk rejection audit but is silent on Portfolio Risk accept/reject recording
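Several of GPT-5's audit findings (mutable signals, non-idempotent audit writes, missing ordering metadata, ID-only snapshots, fan-out without provenance) could be addressed together in one audit table. The Python sketch that follows is one possible approach, not the design's own. It assumes a frozen dataclass for signals and an SQLite audit table, and the names `Signal`, `AUDIT_SCHEMA`, `record`, and the field list are hypothetical. It also extends the finding's `[signal_id, stage, event_type]` uniqueness key with an `aggregator_id` column so each fan-out path gets its own row.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # assigning to a field raises FrozenInstanceError
class Signal:
    # Hypothetical field set; the real signal schema may differ.
    signal_id: str
    instrument_id: str
    ticker: str      # symbol as of creation, captured in every snapshot
    action: str      # e.g. "open_long", "close_long"
    quantity: int


AUDIT_SCHEMA = """
CREATE TABLE audit (
    seq           INTEGER PRIMARY KEY AUTOINCREMENT,  -- order of writes to this store
    signal_id     TEXT NOT NULL,
    stage         TEXT NOT NULL,
    event_type    TEXT NOT NULL,
    aggregator_id TEXT NOT NULL DEFAULT '',  -- '' not NULL: SQLite treats NULLs as distinct in UNIQUE
    snapshot      TEXT NOT NULL,             -- all field values as JSON, not just the ID
    UNIQUE (signal_id, stage, event_type, aggregator_id)
);
"""


def record(conn: sqlite3.Connection, sig: Signal, stage: str,
           event_type: str, aggregator_id: str = "") -> bool:
    """Write one audit row; a retried or replayed write is silently skipped.

    Returns True if a new row was inserted, False if it already existed.
    """
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO audit "
            "(signal_id, stage, event_type, aggregator_id, snapshot) "
            "VALUES (?, ?, ?, ?, ?)",
            (sig.signal_id, stage, event_type, aggregator_id,
             json.dumps(asdict(sig), sort_keys=True)))
    return cur.rowcount == 1
```

`seq` only orders writes within a single store; reconstructing event order across concurrent stages would still need the stage-relative sequence numbers the finding asks for.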
## Opus Unique Findings (not in either other model)
- **Late-arriving signal creates phantom group with semantic identity violation** — Signal S3 is delayed and arrives after the timeout, so it starts a new decision D2 instead of contributing to D1 as intended. *Technically* signal_id correlation is intact, but *semantic* correlation is broken: a scale_in signal becomes a new position initiation.
- **Quantity field mutation ambiguity** — Doc says Signal Risk can "reject" but is silent on "modify" or "partial approval". If quantity is reduced from 1000 to 500, which value is audited? The original intent is lost.
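One way to close both Opus gaps is to keep information the pipeline would otherwise discard: the quantity the strategy asked for, and the time a signal was created. This minimal Python sketch uses hypothetical names (`RiskOutcome`, `apply_limit`, `route`). It assumes the aggregator can look up the group a signal would have joined (for example by strategy and instrument) and that signal creation times and group open/close times come from a synchronized clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskOutcome:
    """Signal Risk result that never overwrites the strategy's request."""
    signal_id: str
    requested_qty: int
    approved_qty: int   # 0 means rejected
    reason: str


def apply_limit(signal_id: str, requested_qty: int, max_qty: int) -> RiskOutcome:
    """Approve up to `max_qty`, recording both the original and approved size."""
    approved = max(0, min(requested_qty, max_qty))
    if approved == requested_qty:
        reason = "accepted"
    elif approved == 0:
        reason = "rejected"
    else:
        reason = f"reduced to limit {max_qty}"
    return RiskOutcome(signal_id, requested_qty, approved, reason)


def route(created_at: float, opened_at: Optional[float],
          closed_at: Optional[float]) -> str:
    """Classify an arriving signal against its group's open/close times.

    'join' -> the group is still open
    'late' -> created before the group closed but delivered afterwards;
              audit it for review instead of opening a new decision
    'new'  -> genuinely starts a new group
    """
    if opened_at is None:
        return "new"
    if closed_at is None:
        return "join"
    if created_at < closed_at:
        return "late"
    return "new"
```

With this split, the 1000 → 500 case audits as `requested_qty=1000, approved_qty=500`, and the delayed S3 surfaces as `late` rather than silently becoming D2.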
## Sonnet Unique Findings (not in either other model)
- **Stale signal processing after strategy restart race** — A strategy crashes and restarts, then produces new signals while its old buffered signals are still being processed. Both sets reach decision-making, double-counting trading intent (2x the intended position size).
- **Instrument identity inconsistency during corporate actions** — Between signal creation and validation, the ticker/instrument_id relationship may change, so the signal carries a stale reference. (Close in substance to GPT-5's ticker-drift finding.)
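The restart race could be closed with epoch fencing, a standard technique that the source document does not describe: each strategy incarnation gets a monotonically increasing epoch number, and the aggregator discards (and audits) signals stamped with an older epoch. In this Python sketch the names `TaggedSignal`, `EpochFence`, `on_restart`, and `admit` are hypothetical, and it assumes some registry bumps the epoch on every restart and notifies the aggregator before the new incarnation emits signals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedSignal:
    signal_id: str
    strategy_id: str
    epoch: int  # hypothetical: incremented each time the strategy (re)starts


@dataclass
class EpochFence:
    """Rejects signals from a strategy's previous incarnation, with an audit trail."""
    latest: dict[str, int] = field(default_factory=dict)  # strategy_id -> newest epoch
    audit: list[tuple[str, str]] = field(default_factory=list)

    def on_restart(self, strategy_id: str, epoch: int) -> None:
        """Record a new incarnation; epochs never move backwards."""
        self.latest[strategy_id] = max(epoch, self.latest.get(strategy_id, epoch))

    def admit(self, sig: TaggedSignal) -> bool:
        """Return True if the signal may proceed to decision-making."""
        newest = self.latest.get(sig.strategy_id, sig.epoch)
        if sig.epoch < newest:
            self.audit.append((sig.signal_id, "discarded_stale_epoch"))
            return False
        self.latest[sig.strategy_id] = sig.epoch  # sig.epoch >= newest here
        return True
```

Discarding rather than merging the stale signals is a deliberate choice in this sketch: the new incarnation is assumed to re-derive its intent from current state, so admitting both sets would reproduce the 2x double-count.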
## Quality Assessment
**GPT-5** — Most exhaustive coverage (13 findings). Systematically enumerated all six requested integrity gap categories. Several findings address subtle audit/forensics concerns (ordering ambiguity, idempotency, per-path provenance) that neither Claude model raised. Its 6,336 reasoning tokens enabled a structured walkthrough of each component's failure surface.
**Opus** — Fewer findings than GPT-5 (6+, with the response cut off by length), but its late-arriving-signal semantic identity violation is the most architecturally significant finding of the run. It's not just "signal lost"; it's "signal creates the wrong decision, causing duplicate trading intent." The quantity mutation ambiguity finding also identifies a gap neither other model named. Opus continues its pattern of finding design tensions the document can't see about itself.
**Sonnet** — Efficient (983 tokens for 6 findings, 16s). The stale-signal-after-restart race condition is practically important and shows good reasoning about crash/restart interactions. However, coverage was less exhaustive: four of its six findings are the common-ground items shared by all three models. Output format was clean but depth was limited.
## Key Insight — GPT-5 Excels at Audit/Forensics Gaps
GPT-5's unique findings cluster around audit concerns: idempotency, ordering metadata, per-path provenance, field snapshots. These are all "how will we debug/audit this later?" concerns — the model reasons systematically about observability and traceability requirements, not just operational failure modes.
Opus's findings cluster around semantic violations: the late-signal phantom group and quantity mutation both concern *meaning* corruption even when data technically flows. This is consistent with Opus's pattern of finding what the design can't see about itself.
Sonnet's findings cluster around operational races: restart scenarios, corporate actions during processing. Practical but surface-level compared to the others.
## Task Taxonomy Update
**Data integrity analysis** → GPT-5 for audit/observability gaps, Opus for semantic integrity violations, Sonnet for operational race overview
This lens is distinct from:
- **State machine completeness** (#58) — tests transition coverage
- **Convention rule gaps** (#59) — tests specification consistency
- **Event ordering** (#60) — tests temporal failure modes
- **Regulatory completeness** (#61) — tests legal/regulatory implementation
Data integrity analysis tests whether data survives flow through the system with correct identity, values, and auditability.
## Practical Implication
For data-flow designs where audit/forensics matter, use GPT-5 to enumerate observability gaps, then Opus to find semantic violations the design can't see. Sonnet provides quick operational race identification but insufficient depth for compliance-critical analysis.
## Efficiency
- GPT-5: 659 tokens/finding (verbose, exhaustive)
- Opus: ~425 tokens/finding (good insight density; an upper bound, since the cutoff makes 6 a minimum finding count)
- Sonnet: 164 tokens/finding (efficient, surface-level)
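The per-finding figures are output tokens divided by finding count, using the Results table. A short Python check reproduces them to one decimal place:

```python
# (output tokens, findings) per model, copied from the Results table.
runs = {
    "GPT-5": (8_569, 13),
    "Claude Opus": (2_547, 6),  # 6 is a lower bound: the response was cut off
    "Claude Sonnet 4": (983, 6),
}

per_finding = {model: round(tokens / findings, 1)
               for model, (tokens, findings) in runs.items()}

for model, ratio in per_finding.items():
    print(f"{model}: {ratio} tokens/finding")
```

Opus's exact quotient is 424.5, which the list above rounds to ~425.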