model-research/findings/2026-05-10-65-concurrent-write-hazards-event-sourcing.md
Rodin 7c64712c2f Add finding #65: concurrent write hazards in event sourcing
New analytical lens testing concurrent write hazards against event-catalog.md.
GPT-5 found 19 hazards, Opus 11, Sonnet 12. Union ~27 distinct findings.
Key insight: this lens is high-value for event sourcing docs because replay
correctness depends on ordering invariants that are often implicit.
2026-05-10 11:48:41 -07:00

Finding #65: Concurrent Write Hazards in Event Sourcing

Date: 2026-05-10
Document: gargoyle/docs/impl/event-catalog.md (108 lines)
Analytical Lens: Concurrent write hazards in aggregate reconstruction

Summary

All three models found genuine concurrency hazards, with moderate overlap: six hazards were identified by all three. GPT-5 was the most exhaustive (19 hazards), Opus surfaced design-level flaws, and Sonnet covered the core issues fastest (33s). The union (~27 distinct hazards) exceeds the largest single-model count, GPT-5's 19, by roughly 40%.

Metrics

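| Model           | Time | Output tokens | Reasoning tokens | Hazards | Critical | High | Medium |
|-----------------|------|---------------|------------------|---------|----------|------|--------|
| GPT-5           | 93s  | 2,569         | 4,480            | 19      | 6        | 7    | 5      |
| Claude Opus 4   | 64s  | 3,250         | (internal)       | 11      | 4        | 5    | 2      |
| Claude Sonnet 4 | 33s  | 1,631         | (internal)       | 12      | 4        | 5    | 3      |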

Common Ground (All 3 Identified)

  1. Order fill vs cancellation race (CRITICAL) — fill precedence rule doesn't specify timestamp authority
  2. Position update concurrency (CRITICAL) — no optimistic concurrency control
  3. Cross-stream atomicity (CRITICAL) — OrderFilled/LotOpened not atomic
  4. Kill switch toggle race (CRITICAL) — global singleton without concurrency control
  5. Lot closure idempotency (HIGH) — LotPartiallyClosed can double-apply
  6. Partial fill accumulation (HIGH) — duplicates can double-count fills
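To make the "no optimistic concurrency control" hazard concrete, here is a minimal sketch of an expected-version check on append. The `InMemoryEventStore` class, the `position-AAPL` stream ID, and the event payloads are hypothetical and are not taken from event-catalog.md. A real store would also need to make the check and the append a single atomic operation.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when a writer's expected version is stale."""


@dataclass
class InMemoryEventStore:
    # stream_id -> ordered list of events (hypothetical stand-in for a real store)
    streams: dict = field(default_factory=dict)

    def append(self, stream_id: str, event: dict, expected_version: int) -> int:
        """Append only if the stream is still at expected_version."""
        events = self.streams.setdefault(stream_id, [])
        if len(events) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, found {len(events)}"
            )
        events.append(event)
        return len(events)  # the new stream version


store = InMemoryEventStore()
store.append("position-AAPL", {"type": "PositionUpdated", "qty": 100}, expected_version=0)

# Two writers both read version 1, then race to append.
store.append("position-AAPL", {"type": "PositionUpdated", "qty": 150}, expected_version=1)
try:
    store.append("position-AAPL", {"type": "PositionUpdated", "qty": 120}, expected_version=1)
    conflict = False
except ConcurrencyError:
    conflict = True  # the losing writer must re-read the stream and retry
```

Without the version check, both updates would be appended and the later one would silently overwrite state derived from the earlier one during replay.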

GPT-5 Unique Findings

  • pending_cancel vs pending_replace race — no precedence between competing non-terminal transitions (HIGH)
  • Terminal-to-nonterminal regression — fill precedence omits rejected state (HIGH)
  • Fill events lack unique fill_id — no idempotency key on fill events (CRITICAL)
  • OrderPartiallyFilled + OrderFilled collision — race between partial and terminal (HIGH)
  • PositionUpdated after PositionClosed — no precedence blocking (HIGH)
  • LotPartiallyClosed vs LotFullyClosed race — competing closures (HIGH)
  • Plus 5 more MEDIUM findings on state transitions and event delivery
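Two of the hazards above (missing `fill_id` and terminal-to-nonterminal regression) can be illustrated with a small aggregate sketch. The `OrderAggregate` class, the state names, and the set of terminal states are assumptions made for illustration; the catalog's actual state machine may differ.

```python
from dataclasses import dataclass, field

# Assumed terminal states; the real catalog may define a different set.
TERMINAL_STATES = {"filled", "cancelled", "rejected"}


@dataclass
class OrderAggregate:
    order_qty: int
    state: str = "open"
    filled_qty: int = 0
    seen_fill_ids: set = field(default_factory=set)

    def apply_fill(self, fill_id: str, qty: int) -> bool:
        """Apply a fill exactly once; return False for a duplicate delivery."""
        if fill_id in self.seen_fill_ids:
            return False
        self.seen_fill_ids.add(fill_id)
        self.filled_qty += qty
        self.state = "filled" if self.filled_qty >= self.order_qty else "partially_filled"
        return True

    def transition(self, new_state: str) -> bool:
        """Reject any transition out of a terminal state."""
        if self.state in TERMINAL_STATES:
            return False
        self.state = new_state
        return True


order = OrderAggregate(order_qty=100)
first = order.apply_fill("f1", 40)       # applied
duplicate = order.apply_fill("f1", 40)   # ignored: same fill_id redelivered
order.apply_fill("f2", 60)               # completes the order
regressed = order.transition("pending_cancel")  # ignored: order is terminal
```

The idempotency key is what makes redelivery safe. Without a `fill_id`, the aggregate cannot distinguish a duplicate of the 40-share fill from a second genuine 40-share fill.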

Opus Unique Findings

  • Cost basis non-determinism — concurrent partial fills produce different lot cost bases depending on application order (HIGH) — qualitatively different from quantity accumulation
  • Order state machine transition matrix undefined — "handles out-of-order" insufficient (HIGH)
  • User ID collision in stream ID — multi-tenant collision risk (HIGH/MEDIUM)
  • DecisionFormed references unpersisted signals (MEDIUM)
  • Fill ID uniqueness unspecified (HIGH); overlaps GPT-5's fill_id finding, which GPT-5 rated CRITICAL
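The cost basis finding can be illustrated with a short sketch. It assumes FIFO lot matching and one lot per fill, neither of which is confirmed by the source, and the function name and fill values are hypothetical. The point is that the total quantity is the same in either application order, while the cost basis of the closed shares is not.

```python
from collections import deque


def closed_cost_basis_fifo(fills, close_qty):
    """Open one lot per fill in the given order, then close close_qty FIFO.

    fills: list of (qty, price) tuples in application order.
    Returns the total cost basis of the closed quantity.
    """
    lots = deque(fills)
    cost = 0.0
    remaining = close_qty
    while remaining > 0 and lots:
        qty, price = lots.popleft()
        take = min(qty, remaining)
        cost += take * price
        remaining -= take
        if take < qty:
            # Put the unclosed remainder of the lot back at the front.
            lots.appendleft((qty - take, price))
    return cost


fill_a = (100, 10.0)  # hypothetical partial fill: 100 shares at 10.0
fill_b = (100, 12.0)  # hypothetical partial fill: 100 shares at 12.0

cost_ab = closed_cost_basis_fifo([fill_a, fill_b], close_qty=100)
cost_ba = closed_cost_basis_fifo([fill_b, fill_a], close_qty=100)
total_qty = fill_a[0] + fill_b[0]  # identical regardless of order
```

Here `cost_ab` is 1000.0 and `cost_ba` is 1200.0, so two replicas that apply the same fills in different orders would report different realized P&L for the same close.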

Sonnet Unique Findings

  • Broker vs system ordering mismatch — broker may process in reverse order from system (HIGH)
  • Stream ID generation race — algorithms could generate same order_id (HIGH)
  • Position resurrection — delayed fill updates position after PositionClosed (MEDIUM)

Key Insight

Concurrent write hazard analysis is a high-value lens for event sourcing documents because:

  1. Event sourcing inherently involves concurrent writes (producers, brokers, timers)
  2. Replay correctness depends on ordering invariants that are often implicit
  3. Cross-stream dependencies are common but atomicity is hard to achieve
  4. Idempotency requirements are frequently under-specified

Model Strengths

  • GPT-5: Exhaustive enumeration across all hazard categories. Best for comprehensive audits.
  • Opus: Design-level hazards where the domain model itself is underspecified (for example, cost basis determinism).
  • Sonnet: External consistency (broker ordering) and upstream hazards (ID generation).

Practical Implication

For event sourcing architecture documents, run all three models: GPT-5 for exhaustive coverage, Opus for design gaps, and Sonnet for fast screening. The union provides coverage that no single model achieves.

Efficiency

  • GPT-5: 135 output tokens/hazard (excluding its 4,480 reasoning tokens)
  • Opus: 295 output tokens/hazard (more detailed scenarios)
  • Sonnet: 136 output tokens/hazard (similar to GPT-5, but in about a third of the time)
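These figures follow from the Metrics table as output tokens divided by hazard count, rounded to the nearest integer. A short check, with the dictionary layout chosen here purely for illustration:

```python
# Output tokens and hazard counts copied from the Metrics table.
metrics = {
    "GPT-5": {"output_tokens": 2569, "hazards": 19},
    "Claude Opus 4": {"output_tokens": 3250, "hazards": 11},
    "Claude Sonnet 4": {"output_tokens": 1631, "hazards": 12},
}

tokens_per_hazard = {
    model: round(m["output_tokens"] / m["hazards"]) for model, m in metrics.items()
}

for model, value in tokens_per_hazard.items():
    print(f"{model}: {value} output tokens/hazard")
```

Reasoning tokens are not included because they are only reported for GPT-5.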