model-research/findings/2026-05-10-boundary-contract-analysis.md
Rodin ce4801e8a3 Add Finding #62: Boundary contract analysis (new analytical lens)
Tested on signal-lifecycle.md (111 lines). Results:
- GPT-5: 17 gaps (7,744 reasoning tokens)
- Opus: 11 gaps (design-level focus)
- Sonnet: 8 gaps (fastest, protocol-level)

Key insight: Union of all models (~26 gaps) far exceeds any single
model (max 17). Only 5 gaps found by all three — highly differentiated
outputs make multi-model runs valuable for interface documents.
2026-05-09 23:35:36 -07:00

Boundary Contract Analysis — Finding #62

Date: 2026-05-10
Lens: Boundary Contract Analysis (NEW)
Document: gargoyle's signal-lifecycle.md (111 lines)
Models: GPT-5, Claude Opus 4, Claude Sonnet 4

Summary

Boundary contract analysis is a new analytical lens that examines implicit contracts at component interfaces: what one component promises or expects that another must deliver or understand. Unlike assumption-finding (what must be true) or race condition analysis (temporal interleavings), it focuses specifically on interface assumptions.

Results

| Model | Time | Output tokens | Reasoning tokens | Gaps found | Critical | High |
|---|---|---|---|---|---|---|
| GPT-5 | 125s | 2,062 | 7,744 | 17 | 5 | 4 |
| Claude Opus 4 | ~74s | 2,243 | (internal) | 11 | 3 | 4 |
| Claude Sonnet 4 | ~40s | 947 | (internal) | 8 | 2 | 3 |

Key Findings

Common Ground (all 3 found)

  • Action normalization responsibility and position state dependency (CRITICAL)
  • Instrument ID resolution timing across corporate actions
  • stop_loss semantic transfer from signal to PortfolioMonitor
  • Quantity/units interpretation for options vs stocks (100x sizing error)
  • Audit log write failure handling
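The quantity/units gap above can be made concrete with a small sketch. It assumes standard US equity options at 100 shares per contract. The names `QuantityUnit`, `SizedSignal`, and `underlying_shares` are hypothetical and do not come from signal-lifecycle.md. The point is only that stating the unit explicitly at the boundary removes the ambiguity behind a 100x sizing error.

```python
from dataclasses import dataclass
from enum import Enum


class QuantityUnit(Enum):
    SHARES = "shares"
    CONTRACTS = "contracts"


# Assumption: standard US equity options, 100 shares per contract.
OPTION_MULTIPLIER = 100


@dataclass(frozen=True)
class SizedSignal:
    """Hypothetical signal shape that carries its quantity unit explicitly."""
    instrument_id: str
    quantity: int
    unit: QuantityUnit


def underlying_shares(signal: SizedSignal) -> int:
    """Return share-equivalent exposure so producer and consumer size identically."""
    if signal.quantity <= 0:
        raise ValueError("quantity must be positive")
    if signal.unit is QuantityUnit.CONTRACTS:
        return signal.quantity * OPTION_MULTIPLIER
    return signal.quantity
```

With the unit stated on the signal itself, a consumer that reads 5 contracts as 5 shares (or the reverse) has to ignore an explicit field rather than guess at an unstated convention.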

GPT-5 Unique (5 most significant)

  1. Signal fan-out double-execution (CRITICAL) — "one signal can appear under many decisions" creates execution-level hazard with no dedupe contract
  2. Signal replay/dedup gap — pipeline processes duplicates normally, only audit-level symptoms
  3. Instrument resolution trust boundary — wrong-but-known instrument_id passes through
  4. Late-arriving signals silently re-grouped — no notification or audit
  5. Ticker vs instrument_id mismatch — misleading observability

Opus Unique

  1. Entry price reconciliation — multiple signals with different entry_prices aggregate; which wins?
  2. Aggregator group identification key — not specified in signal fields
  3. Backpressure expiration criteria — FIFO without priority could drop risk-critical signals

Sonnet Unique

  1. Signal ordering contract — close signal could arrive before buy signal
  2. Signal ID generation entropy — poor entropy could cause collisions
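The ordering gap can be illustrated with a guard that refuses to treat a close as valid until a matching position exists. This is a sketch only: the function name, the `"buy"`/`"close"` action strings, and the `"deferred"` outcome are assumptions for illustration, not behaviour documented in signal-lifecycle.md.

```python
def apply_signal(open_positions: set, instrument_id: str, action: str) -> str:
    """Hypothetical handler that makes the ordering contract explicit."""
    if action == "buy":
        open_positions.add(instrument_id)
        return "opened"
    if action == "close":
        # A close that arrives before its buy has nothing to act on yet.
        if instrument_id not in open_positions:
            return "deferred"
        open_positions.remove(instrument_id)
        return "closed"
    raise ValueError(f"unknown action: {action!r}")
```

Whether an early close should be deferred, rejected, or logged is the design decision the contract has to state. The sketch picks deferral only to show where that decision lives.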

Model Strengths for This Lens

| Model | Strength | Best For |
|---|---|---|
| GPT-5 | Exhaustive validation gap enumeration | Comprehensive boundary audits |
| Opus | Design-level incompleteness | "Model is fundamentally underspecified" |
| Sonnet | Protocol/temporal assumptions | Quick first-pass screening |

Key Insight

The union of all findings (~26 distinct gaps) significantly exceeds any single model's output (17, 11, 8). Only 5 gaps were found by all three models. This lens produces highly differentiated outputs across models — run all three for architecture documents describing component interfaces.
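The ~26 figure can be sanity-checked from the per-model counts alone. Each gap found by all three models is counted three times in the sum 17 + 11 + 8 = 36, so two copies of each must be removed. If no gap is shared by exactly two models, that yields the largest possible union. The helper below is an illustrative sketch (`union_upper_bound` is a hypothetical name), not part of the original analysis.

```python
def union_upper_bound(sizes, common_to_all):
    """Largest possible union of the sets, given their sizes and the
    number of elements present in every set."""
    # Each fully shared element is counted len(sizes) times in the sum;
    # keep one copy and subtract the rest.
    return sum(sizes) - (len(sizes) - 1) * common_to_all


max_union = union_upper_bound([17, 11, 8], common_to_all=5)
gain_over_best_single = max_union - max([17, 11, 8])
```

For these counts the bound is 26, which matches the reported union, and it is 9 more gaps than the best single model. Any gap shared by exactly two models would lower the true union below this bound.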

Practical Application

For documents that describe component interfaces, boundary contract analysis is high-value:

  1. Run Sonnet first for quick temporal/protocol screening (~40s, cheap)
  2. Run GPT-5 for exhaustive validation/semantic gaps (125s, thorough)
  3. Run Opus for design-level coherence gaps (~74s, insightful)

In this test, the combination surfaced roughly 26 distinct gaps, compared with at most 17 from any single pass.