Finding #68: Cross-context contract coherence analysis

GPT-5 outperforms Sonnet on cross-context integration analysis:
- GPT-5: 10 findings (4 Critical) in 191s with 7,744 reasoning tokens
- Sonnet: 7 findings (1 Critical) in 23s

Key insight: Cross-context contract verification benefits from extended
reasoning (in contrast to Finding #67, where Sonnet was better at finding
inter-doc contradictions). Flow tracing and subscription gap detection
require systematic verification, which GPT-5's exhaustive style excels at.

Discovered actual spec gaps in gargoyle domain model (FillReceived
missing fields, no liquidation instruction event, Risk not subscribing
to LotOpened for PDT, etc.).
Rodin
2026-05-10 21:47:27 -07:00
parent 0f43934cb8
commit 2b10595bff
# 68. Cross-context contract coherence: GPT-5 finds more integration gaps with exhaustive reasoning
**Date:** 2026-05-10

**Finding number:** 68

**Task type:** Cross-context integration analysis (DDD bounded contexts)

**Documents:** Order Management README, Ledger README, Risk README (~432 lines combined)
## Summary
GPT-5 outperforms Sonnet on cross-context contract coherence analysis, finding missing integration events and subscription gaps that only show up when complete system flows are traced. The roughly 8x time cost (191s vs. 23s) is justified for architecture review.
## Results
| Model | Time | Input tokens | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
|---|---|---|---|---|---|---|---|---|
| Claude Sonnet 4 | 23s | 4,861 | 1,282 | (internal) | 7 | 1 | 3 | 3 |
| GPT-5 | 191s | 4,154 | 10,056 | 7,744 | 10 | 4 | 1 | 5 |
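
The headline ratios can be recomputed from the table. A minimal Python sketch; it assumes, as OpenAI's usage reporting does, that GPT-5's reasoning tokens are counted inside its output-token total, so the reasoning share is taken against output tokens.

```python
# Figures copied from the Results table.
results = {
    "Claude Sonnet 4": {"time_s": 23, "output_tokens": 1282, "findings": 7, "critical": 1},
    "GPT-5": {"time_s": 191, "output_tokens": 10056, "reasoning_tokens": 7744,
              "findings": 10, "critical": 4},
}

sonnet, gpt5 = results["Claude Sonnet 4"], results["GPT-5"]

time_ratio = gpt5["time_s"] / sonnet["time_s"]                       # wall-clock multiple
output_ratio = gpt5["output_tokens"] / sonnet["output_tokens"]       # output-token multiple
reasoning_share = gpt5["reasoning_tokens"] / gpt5["output_tokens"]   # assumes reasoning is part of output
extra_findings = gpt5["findings"] - sonnet["findings"]
extra_critical = gpt5["critical"] - sonnet["critical"]

print(f"time: {time_ratio:.1f}x, output tokens: {output_ratio:.1f}x")
print(f"GPT-5 reasoning share of output: {reasoning_share:.0%}")
print(f"extra findings: {extra_findings} (+{extra_critical} Critical)")
```

The time multiple works out to about 8.3x, which is where the "8x" figure in the summary comes from.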
## Task
Identify contract coherence issues between three bounded context READMEs that must interoperate: Order Management (order lifecycle), Ledger (positions, lots), and Risk (evaluation, monitoring, kill switch). These contexts communicate via domain events — the task was to find mismatches in what one context publishes vs. what another expects to consume.
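
The first analytical category, payload mismatches, reduces to a set difference once each consumer's needs are written down. A minimal Python sketch of that check; the `Contract` type, the `payload_mismatches` helper, and the field sets in the example are hypothetical, chosen to mirror the FillReceived findings rather than copied from the READMEs.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    """Hypothetical model of one event: what the publisher sends, what each consumer needs."""
    event: str
    publisher: str
    published_fields: frozenset[str]
    # consumer context -> fields that consumer needs in order to act on the event
    required_by: dict[str, frozenset[str]] = field(default_factory=dict)


def payload_mismatches(contract: Contract) -> dict[str, set[str]]:
    """Return, per consumer, the required fields that the publisher does not send."""
    gaps: dict[str, set[str]] = {}
    for consumer, needed in contract.required_by.items():
        missing = set(needed - contract.published_fields)
        if missing:
            gaps[consumer] = missing
    return gaps


# Illustrative field sets only; the real payload is defined in the Order Management README.
fill_received = Contract(
    event="FillReceived",
    publisher="OrderManagement",
    published_fields=frozenset({"order_id", "instrument_id", "quantity", "price"}),
    required_by={
        "Ledger": frozenset({"user_id", "instrument_id", "side", "fill_id", "quantity", "price"}),
    },
)

print(payload_mismatches(fill_received))
```

With these illustrative sets the check reports user_id, side, and fill_id missing for Ledger, the same three gaps both models found.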
## Analytical categories
1. Event payload mismatches
2. Ordering/sequencing assumptions
3. Ownership ambiguity
4. Missing integration events
5. Lifecycle misalignment
## Common findings (both models)
1. **FillReceived missing buy/sell direction** — Both correctly identified that the FillReceived payload lacks a side indicator, making it impossible for Ledger to know whether to open or close lots. Both rated it Critical.
2. **FillReceived missing user_id** — Ledger's Position uses (user_id, instrument_id) as identity, but FillReceived doesn't include user_id.
3. **FillReceived missing fill_id** — Order Management states fills have unique identity but doesn't publish fill_id in the event.
4. **LotClosed missing timing for PDT** — Pattern day trader (PDT) counting requires knowing when the position was opened, but LotClosed carries neither an open timestamp nor a reference to the opening lot.
5. **Kill switch level → acceptance policy mapping ambiguous** — Risk publishes levels (clear/alert/restrict/liquidate) but Order Management uses different vocabulary (open/close-only/reject-all).
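
Item 4 is easiest to see in code: deciding whether a closed lot was a day trade needs the time the lot was opened. A minimal sketch, assuming a hypothetical `LotClosedEvent` shape that includes the proposed `open_timestamp` field; it compares calendar dates and counts one day trade per closed lot, which simplifies how day trades are actually aggregated, and it omits the PDT rule's rolling window and thresholds.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LotClosedEvent:
    """Hypothetical LotClosed payload including the proposed open_timestamp field."""
    user_id: str
    instrument_id: str
    open_timestamp: datetime   # proposed addition; absent from the current contract
    close_timestamp: datetime


def is_day_trade(event: LotClosedEvent) -> bool:
    # Simplification: same calendar date. A real check would use exchange trading days and time zones.
    return event.open_timestamp.date() == event.close_timestamp.date()


def count_day_trades(events: list[LotClosedEvent], user_id: str) -> int:
    # Simplification: each closed lot opened and closed the same day counts as one day trade.
    return sum(1 for e in events if e.user_id == user_id and is_day_trade(e))


events = [
    LotClosedEvent("u1", "XYZ", datetime(2026, 5, 11, 9, 45), datetime(2026, 5, 11, 15, 10)),
    LotClosedEvent("u1", "XYZ", datetime(2026, 5, 8, 10, 0), datetime(2026, 5, 11, 11, 0)),
]
print(count_day_trades(events, "u1"))
```

Without `open_timestamp` (or an open-lot reference to look it up), `is_day_trade` has nothing to compare against.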
## GPT-5 unique findings
6. **No liquidation instruction event (Critical)** — Risk's Liquidation Sizing outputs "Specific positions and quantities to close" and Order Management states it "must remain available to execute liquidation orders from Risk." But there is NO event that carries liquidation instructions. This is a genuine Critical-severity gap — the documented liquidation flow has no contract to execute it.
7. **Risk doesn't consume LotOpened (Critical)** — PDT counting needs to know when positions were opened, but Risk only consumes LotClosed from Ledger.
8. **Lifecycle misalignment: MarketOpened vs reconciliation** — Risk starts monitoring at MarketOpened, but Ledger may still be reconciling positions. No cross-context readiness signal exists.
9. **Risk claims wash sale consideration but doesn't consume WashSaleDetected** — Risk's Liquidation Sizing says it "considers... wash sale implications" but doesn't subscribe to WashSaleDetected.
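
Finding 8's missing readiness signal can be sketched as a gate that holds Risk's monitoring until every prerequisite event has arrived, in either order. A minimal Python sketch; `LedgerReconciled` is the event proposed under Actionable findings, and the `RiskMonitoringGate` class is hypothetical, not part of the Risk README.

```python
class RiskMonitoringGate:
    """Hypothetical gate: Risk starts monitoring only after all prerequisite events arrive."""

    REQUIRED = frozenset({"MarketOpened", "LedgerReconciled"})

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.monitoring = False

    def on_event(self, name: str) -> bool:
        """Record a lifecycle event and return whether monitoring is now active."""
        if name in self.REQUIRED:
            self.seen.add(name)
        # Arrival order does not matter; monitoring begins once every prerequisite is seen.
        if not self.monitoring and self.seen >= self.REQUIRED:
            self.monitoring = True
        return self.monitoring


gate = RiskMonitoringGate()
print(gate.on_event("MarketOpened"))      # False: Ledger may still be reconciling
print(gate.on_event("LedgerReconciled"))  # True
```

The current contract corresponds to a gate whose `REQUIRED` set holds only MarketOpened, which is exactly the gap the finding flags.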
## Sonnet unique findings
10. **Race condition in MetricEscalated + KillSwitchEngaged** — Rapid escalation could cause both events to fire, and Order Management's reactions assume specific sequencing.
11. **Missing timestamp in FillReceived** — Explicitly noted that wash sale detection needs a "within 61 days" determination, which requires a fill timestamp.
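
The "within 61 days" determination reads as the wash-sale window: 30 days before through 30 days after a loss sale, inclusive of the sale day. A minimal Python sketch of that date check, showing why both the sale and the replacement purchase need fill timestamps; the function names are hypothetical, and the check ignores the "substantially identical" test, partial quantities, and account scope.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # 30 days before + sale day + 30 days after = 61 days


def in_wash_sale_window(sale_date: date, purchase_date: date) -> bool:
    """True if a purchase falls within 30 days before or after the sale (inclusive)."""
    return abs(purchase_date - sale_date) <= WINDOW


def triggers_wash_sale(sale_date: date, realized_pnl: float, purchase_dates: list[date]) -> bool:
    """Simplified: a loss sale plus any purchase of the same instrument inside the window."""
    return realized_pnl < 0 and any(in_wash_sale_window(sale_date, d) for d in purchase_dates)


sale = date(2026, 3, 31)
print(triggers_wash_sale(sale, -250.0, [date(2026, 4, 20)]))  # True: 20 days after the sale
print(triggers_wash_sale(sale, -250.0, [date(2026, 5, 15)]))  # False: 45 days after
```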
## Key insight
Cross-context contract analysis benefits from extended reasoning (in contrast to Finding #67).
In Finding #67 (inter-document contradiction), GPT-5's reasoning overhead didn't pay off — Sonnet found more contradictions faster. Here, the task is different: not "find statements that conflict" but "find integration contracts that are incomplete."
This requires:
1. **Flow tracing:** Following capability claims through to actual events to verify paths exist
2. **Subscription gap detection:** Checking whether consumers subscribe to what they need
3. **Cross-referencing identity requirements:** Verifying event payloads include fields consumers need
GPT-5's extended reasoning excels at this systematic verification — essentially state machine traversal across documents.
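
The mechanical half of that traversal can be sketched once each context's publications, subscriptions, and capability-derived needs are written as sets; the hard part the model did was deriving those needs from the README prose. A minimal Python sketch; the maps are a partial, hypothetical reconstruction from the findings in this note (for example, that Ledger publishes LotOpened and WashSaleDetected), not the READMEs' actual subscription tables, and `LiquidationInstruction` is the event proposed under Actionable findings.

```python
# Hypothetical, partial reconstruction of the three contexts' event wiring.
PUBLISHES: dict[str, set[str]] = {
    "OrderManagement": {"FillReceived"},
    "Ledger": {"LotOpened", "LotClosed", "WashSaleDetected"},
    "Risk": {"MetricEscalated", "KillSwitchEngaged"},
}
CONSUMES: dict[str, set[str]] = {
    "Ledger": {"FillReceived"},
    "Risk": {"LotClosed"},
    "OrderManagement": {"MetricEscalated", "KillSwitchEngaged"},
}
# Events each context needs in order to honor its documented capabilities.
NEEDS: dict[str, set[str]] = {
    "Risk": {"LotOpened", "LotClosed", "WashSaleDetected"},  # PDT counting, liquidation sizing
    "OrderManagement": {"LiquidationInstruction"},           # executing liquidation orders from Risk
}


def subscription_gaps(consumes: dict[str, set[str]], needs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Events a context needs but does not subscribe to."""
    gaps: dict[str, set[str]] = {}
    for ctx, needed in needs.items():
        missing = needed - consumes.get(ctx, set())
        if missing:
            gaps[ctx] = missing
    return gaps


def unpublished_events(publishes: dict[str, set[str]], consumes: dict[str, set[str]],
                       needs: dict[str, set[str]]) -> set[str]:
    """Events some context consumes or needs that no context publishes: a flow with no contract."""
    published = set().union(*publishes.values())
    wanted = set().union(*consumes.values(), *needs.values())
    return wanted - published


print(subscription_gaps(CONSUMES, NEEDS))
print(unpublished_events(PUBLISHES, CONSUMES, NEEDS))
```

On this reconstruction, the first check surfaces findings 7 and 9 (Risk lacks LotOpened and WashSaleDetected), and the second surfaces finding 6 (nothing publishes a liquidation instruction).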
## Practical implication
For cross-context contract coherence analysis (DDD architecture review):
- Use GPT-5 as primary reviewer — exhaustive reasoning catches missing integration events and subscription gaps
- Sonnet is valuable as a fast first pass for surface-level payload issues, but shouldn't be the sole reviewer for integration completeness
- The 8x time cost is justified for architecture docs where missing contracts can cause runtime failures
## Actionable findings
This experiment revealed actual documentation gaps in gargoyle's domain model:
1. **FillReceived event needs:** user_id, side, fill_id, timestamp, instrument_id
2. **LotClosed event needs:** user_id, open_timestamp (or open_lot_id)
3. **Missing event:** LiquidationInstruction from Risk to Order Management
4. **Missing subscription:** Risk should consume LotOpened for PDT correlation
5. **Missing subscription:** Risk should consume WashSaleDetected if Liquidation Sizing considers wash sale implications
6. **Missing event:** LedgerReconciled to coordinate startup lifecycle
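
The proposed payloads can be written down as typed event classes, so that a missing field becomes a construction-time error rather than a silent omission. A minimal Python sketch; only the fields named in the list are shown (fields the events already carry are omitted), the field types and the `Side` enum are assumptions, and the `LiquidationInstruction` and `LedgerReconciled` shapes are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from enum import Enum
from typing import Optional


class Side(Enum):
    BUY = "buy"
    SELL = "sell"


@dataclass(frozen=True)
class FillReceived:
    # Fields the finding says must be present; other existing fields omitted.
    fill_id: str
    user_id: str
    instrument_id: str
    side: Side
    timestamp: datetime


@dataclass(frozen=True)
class LotClosed:
    user_id: str
    open_timestamp: Optional[datetime] = None
    open_lot_id: Optional[str] = None

    def __post_init__(self) -> None:
        # The finding allows either field; require at least one so PDT correlation is possible.
        if self.open_timestamp is None and self.open_lot_id is None:
            raise ValueError("LotClosed needs open_timestamp or open_lot_id")


@dataclass(frozen=True)
class LiquidationInstruction:
    # Hypothetical shape for Risk's "specific positions and quantities to close".
    user_id: str
    instrument_id: str
    quantity: Decimal


@dataclass(frozen=True)
class LedgerReconciled:
    # Hypothetical readiness signal that Risk would wait for before monitoring.
    as_of: datetime


fill = FillReceived("f-1", "u-1", "XYZ", Side.BUY, datetime(2026, 5, 11, 9, 30))
closed = LotClosed(user_id="u-1", open_lot_id="lot-7")  # either open_* field satisfies the check
```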
## Tags
#cross-context #contract-coherence #ddd #bounded-context #gpt5-excels #architecture-review #integration-analysis