From ce4801e8a370b1d0e75fea10dfa8d86a022b42f5 Mon Sep 17 00:00:00 2001 From: Rodin Date: Sat, 9 May 2026 23:35:36 -0700 Subject: [PATCH] Add Finding #62: Boundary contract analysis (new analytical lens) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Tested on signal-lifecycle.md (111 lines). Results: - GPT-5: 17 gaps (7,744 reasoning tokens) - Opus: 11 gaps (design-level focus) - Sonnet: 8 gaps (fastest, protocol-level) Key insight: Union of all models (~26 gaps) far exceeds any single model (max 17). Only 5 gaps found by all three — highly differentiated outputs make multi-model runs valuable for interface documents. --- .../2026-05-10-boundary-contract-analysis.md | 64 +++++++++++++++++++ 1 file changed, 64 insertions(+) create mode 100644 findings/2026-05-10-boundary-contract-analysis.md diff --git a/findings/2026-05-10-boundary-contract-analysis.md b/findings/2026-05-10-boundary-contract-analysis.md new file mode 100644 index 0000000..fc23676 --- /dev/null +++ b/findings/2026-05-10-boundary-contract-analysis.md @@ -0,0 +1,64 @@ +# Boundary Contract Analysis — Finding #62 + +**Date:** 2026-05-10 +**Lens:** Boundary Contract Analysis (NEW) +**Document:** gargoyle's `signal-lifecycle.md` (111 lines) +**Models:** GPT-5, Claude Opus 4, Claude Sonnet 4 + +## Summary + +New analytical lens that examines implicit contracts at component interfaces — what one component promises/expects that another must deliver/understand. Unlike assumption-finding (what must be true) or race condition analysis (temporal interleavings), this focuses specifically on INTERFACE ASSUMPTIONS. + +## Results + +| Model | Time | Output tokens | Reasoning tokens | Gaps found | Critical | High | +|---|---|---|---|---|---|---| +| GPT-5 | 125s | 2,062 | 7,744 | 17 | 5 | 4 | +| Claude Opus 4 | ~74s | 2,243 | (internal) | 11 | 3 | 4 | +| Claude Sonnet 4 | ~40s | 947 | (internal) | 8 | 2 | 3 | + +## Key Findings + +### Common Ground (all 3 found) +- Action normalization responsibility and position state dependency (CRITICAL) +- Instrument ID resolution timing across corporate actions +- stop_loss semantic transfer from signal to PortfolioMonitor +- Quantity/units interpretation for options vs stocks (100x sizing error) +- Audit log write failure handling + +### GPT-5 Unique (5 most significant) +1. **Signal fan-out double-execution** (CRITICAL) — "one signal can appear under many decisions" creates execution-level hazard with no dedupe contract +2. **Signal replay/dedup gap** — pipeline processes duplicates normally, only audit-level symptoms +3. **Instrument resolution trust boundary** — wrong-but-known instrument_id passes through +4. **Late-arriving signals silently re-grouped** — no notification or audit +5. **Ticker vs instrument_id mismatch** — misleading observability + +### Opus Unique +1. **Entry price reconciliation** — multiple signals with different entry_prices aggregate; which wins? +2. **Aggregator group identification key** — not specified in signal fields +3. **Backpressure expiration criteria** — FIFO without priority could drop risk-critical signals + +### Sonnet Unique +1. **Signal ordering contract** — close signal could arrive before buy signal +2. **Signal ID generation entropy** — poor entropy could cause collisions + +## Model Strengths for This Lens + +| Model | Strength | Best For | +|---|---|---| +| GPT-5 | Exhaustive validation gap enumeration | Comprehensive boundary audits | +| Opus | Design-level incompleteness | "Model is fundamentally underspecified" | +| Sonnet | Protocol/temporal assumptions | Quick first-pass screening | + +## Key Insight + +The union of all findings (~26 distinct gaps) significantly exceeds any single model's output (17, 11, 8). Only 5 gaps were found by all three models. This lens produces highly differentiated outputs across models — run all three for architecture documents describing component interfaces. + +## Practical Application + +For documents that describe component interfaces, boundary contract analysis is high-value: +1. Run Sonnet first for quick temporal/protocol screening (40s, cheap) +2. Run GPT-5 for exhaustive validation/semantic gaps (125s, thorough) +3. Run Opus for design-level coherence gaps (74s, insightful) + +The combination catches significantly more issues than any single pass.