Finding #62: Data integrity analysis on signal-lifecycle.md
New lens: data integrity analysis — testing whether data survives flow through systems with correct identity, values, and auditability.

Key insights:
- GPT-5 excels at audit/forensics gaps (idempotency, ordering, provenance)
- Opus finds semantic violations (phantom group, quantity mutation ambiguity)
- Sonnet identifies operational races (restart scenarios)

Document: gargoyle signal-lifecycle.md (102 lines)
Models: GPT-5 (13 findings), Opus (6+), Sonnet (6)
@@ -0,0 +1,90 @@
# Finding #62: Data Integrity Analysis on Signal Flow

**Date:** 2026-05-09
**Lens:** Data integrity analysis — does the design preserve data integrity as data flows through stages?
**Document:** gargoyle signal-lifecycle.md (102 lines)

## Task

Analyze signal-lifecycle design for data integrity gaps:
1. Data loss without audit
2. Data corruption (field mutation)
3. Inconsistent state across components
4. Identity violations (correlation broken)
5. Partial writes (intermediate state visible)
6. Race conditions causing integrity loss

## Results

| Model | Time | Output tokens | Reasoning tokens | Findings |
|---|---|---|---|---|
| GPT-5 | 107s | 8,569 | 6,336 | 13 |
| Claude Opus | 47s | 2,547 | (internal) | 6+ (cut off) |
| Claude Sonnet 4 | 16s | 983 | (internal) | 6 |

## Common Ground (identified by all 3 models)

- **Silent signal loss in aggregator buffer** — signals that pass Signal Risk enter the buffer but aren't audited until decision formation; a crash or timeout loses them with no record
- **Non-atomic decision creation + audit write** — steps 4-6 describe a sequence (decision born → ID generated → signals recorded) without atomicity; a partial failure leaves an incomplete audit
- **Signal Risk crash loses valid signals** — doc explicitly acknowledges crash = no audit entry, creating blind spots for signals that *would* have passed
- **Aggregator overflow data loss** — "excess signals are expired" during spikes with no mention of audit for dropped signals
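
The non-atomic decision creation finding above can be illustrated with a minimal sketch. It assumes, purely for illustration, that decisions and their contributing signals live in one SQLite database; the `decisions` and `decision_signals` tables, `create_decision`, and the `fail_after_decision` switch are all hypothetical names, not taken from signal-lifecycle.md. The point is only that wrapping both writes in one transaction means a failure between them leaves no half-written decision.

```python
from __future__ import annotations

import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical decision tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE decisions (decision_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE decision_signals (decision_id TEXT, signal_id TEXT)")
    return conn


def create_decision(
    conn: sqlite3.Connection,
    signal_ids: list[str],
    fail_after_decision: bool = False,  # hypothetical switch to simulate a crash
) -> str:
    """Write the decision row and its per-signal audit rows in one transaction."""
    decision_id = str(uuid.uuid4())
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO decisions (decision_id) VALUES (?)", (decision_id,))
        if fail_after_decision:
            raise RuntimeError("simulated crash between decision and audit write")
        conn.executemany(
            "INSERT INTO decision_signals (decision_id, signal_id) VALUES (?, ?)",
            [(decision_id, s) for s in signal_ids],
        )
    return decision_id
```

With the simulated failure enabled, the decision row is rolled back along with the missing signal rows, so no decision is ever visible without its audit trail. Whether the real system can share one transactional store across these steps is a design question the document leaves open.
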

## GPT-5 Unique Findings (not in either Claude model)

- **Mutable signal objects allow in-place corruption** — No immutability guarantee; if stages mutate fields (e.g., clamp quantity), different consumers observe different values for the same signal_id
- **Fan-out to multiple aggregators without per-path provenance** — One signal appearing under many decisions lacks aggregator_id/path tagging; audit can't distinguish intended routing from duplication
- **Open/close intent drift from position state changes** — Action is normalized at order time; position changes between signal emission and normalization cause a semantic mismatch
- **Instrument ticker drift without version context** — ticker is the "current symbol" at creation; corporate actions change it without a timestamped snapshot in audit
- **No idempotency guarantees for audit writes** — No unique constraint on [signal_id, stage, event_type]; replay/retry creates duplicate audit rows
- **Ambiguous ordering from lack of sequence/timestamp metadata** — No stage-relative sequence numbers; can't reliably reconstruct event order across concurrent flows
- **Partial visibility during decision commit window** — No snapshot isolation specified; a signal arriving during commit may be included or excluded non-deterministically
- **Incomplete signal field snapshot in audit** — Doc doesn't specify whether full fields or only IDs are recorded; if only IDs, transient field values are lost forever
- **Unspecified audit for Portfolio Risk rejections** — Doc mentions Signal Risk rejection audit but is silent on Portfolio Risk accept/reject recording

## Opus Unique Findings (not raised by either other model)

- **Late-arriving signal creates phantom group with semantic identity violation** — Signal S3 is delayed and arrives after the timeout; it starts a new decision D2 instead of contributing to D1 as intended. *Technically* signal_id correlation is intact, but *semantic* correlation is broken. A scale_in signal becomes a new position initiation.
- **Quantity field mutation ambiguity** — Doc says Signal Risk can "reject" but is silent on "modify" or "partial approval". If quantity is reduced from 1000→500, which value is audited? The original intent is lost.

## Sonnet Unique Findings (not raised by either other model)

- **Stale signal processing after strategy restart race** — Strategy crashes, restarts, and produces new signals while old buffered signals are still processing. Both reach decision-making → double-counting of trading intent, 2x intended position size.
- **Instrument identity inconsistency during corporate actions** — Between signal creation and validation, the ticker/instrument_id relationship may change; the signal carries a stale reference.
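
The restart race above is commonly handled with a fencing token. The following sketch assumes each strategy stamps its signals with an `epoch` that increases on every restart; the `BufferedSignal` and `EpochFence` names and the `epoch` field are hypothetical and do not appear in signal-lifecycle.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferedSignal:
    signal_id: str
    strategy_id: str
    epoch: int  # hypothetical: incremented each time the strategy restarts


class EpochFence:
    """Tracks the latest epoch seen per strategy and rejects older ones."""

    def __init__(self) -> None:
        self._latest: dict[str, int] = {}

    def admit(self, signal: BufferedSignal) -> bool:
        """Return True if the signal belongs to the strategy's current epoch or a newer one."""
        latest = self._latest.get(signal.strategy_id, signal.epoch)
        if signal.epoch < latest:
            return False  # stale: produced before the most recent restart
        self._latest[signal.strategy_id] = signal.epoch
        return True
```

A signal buffered before the restart is refused once any signal from the newer epoch has been seen, which prevents the double-counted intent. To stay consistent with the common-ground findings, a refused signal should still get an audit entry rather than being dropped silently.
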

## Quality Assessment

**GPT-5** — Most exhaustive coverage (13 findings). It systematically enumerated all six requested integrity gap categories. Several findings address subtle audit/forensics concerns (ordering ambiguity, idempotency, per-path provenance) that neither Claude model raised. Its 6,336 reasoning tokens appear to have supported a structured walkthrough of each component's failure surface.

**Opus** — Tied with Sonnet for the fewest recorded findings (6+, with output cut off by response length), but the late-arriving-signal semantic identity violation is the most architecturally significant. It's not just "signal lost"; it's "signal creates wrong decision, causing duplicate trading intent." The quantity mutation ambiguity finding also identifies a gap that neither other model named. Opus continues its pattern of finding design tensions the document can't see about itself.

**Sonnet** — Efficient (983 tokens for 6 findings, 16s). The stale-signal-after-restart race condition is practically important and shows good reasoning about crash/restart interactions. However, its findings were less exhaustive and several overlapped heavily with common ground. The output format was clean but depth was limited.

## Key Insight — GPT-5 Excels at Audit/Forensics Gaps

GPT-5's unique findings cluster around audit concerns: idempotency, ordering metadata, per-path provenance, field snapshots. These are all "how will we debug/audit this later?" concerns — the model reasons systematically about observability and traceability requirements, not just operational failure modes.

Opus's findings cluster around semantic violations: the late-signal phantom group and quantity mutation both concern *meaning* corruption even when data technically flows. This is consistent with Opus's pattern of finding what the design can't see about itself.

Sonnet's findings cluster around operational races: restart scenarios, corporate actions during processing. Practical but surface-level compared to the others.

## Task Taxonomy Update

**Data integrity analysis** → GPT-5 for audit/observability gaps, Opus for semantic integrity violations, Sonnet for operational race overview

This lens is distinct from:
- **State machine completeness** (#58) — tests transition coverage
- **Convention rule gaps** (#59) — tests specification consistency
- **Event ordering** (#60) — tests temporal failure modes
- **Regulatory completeness** (#61) — tests legal/regulatory implementation

Data integrity analysis tests whether data survives flow through the system with correct identity, values, and auditability.

## Practical Implication

For data-flow designs where audit/forensics matter, use GPT-5 to enumerate observability gaps, then Opus to find semantic violations the design can't see. Sonnet provides quick operational race identification but offers insufficient depth for compliance-critical analysis.

## Efficiency

- GPT-5: 659 output tokens/finding (verbose, exhaustive)
- Opus: ~425 output tokens/finding (good insight density; an upper bound, since the 6+ count was cut off)
- Sonnet: 164 output tokens/finding (efficient, surface-level)
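
The per-finding figures above follow directly from the Results table (output tokens divided by findings). A short sketch of the arithmetic, using only numbers from that table; the `RUNS` mapping and `tokens_per_finding` helper are illustrative names:

```python
# (output tokens, findings) per model, copied from the Results table.
RUNS = {
    "GPT-5": (8569, 13),
    "Opus": (2547, 6),    # 6 is a lower bound: the response was cut off
    "Sonnet": (983, 6),
}


def tokens_per_finding(output_tokens: int, findings: int) -> float:
    """Output tokens spent per reported finding."""
    return output_tokens / findings


for model, (tokens, findings) in RUNS.items():
    print(f"{model}: {tokens_per_finding(tokens, findings):.1f} output tokens/finding")
```

Reasoning tokens are not included in these ratios; GPT-5 is the only model for which the table reports them.
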