Finding #63: External System Assumptions Analysis

New analytical lens examining implicit assumptions about broker APIs,
market behavior, network conditions, and timing.

Document: gargoyle's feeds-and-instruments.md (115 lines)
Models: GPT-5 (24 findings), Opus (15), Sonnet (15)

Key insight: External system assumptions benefit more from reasoning
depth than internal architecture analysis. GPT-5's exhaustive coverage
of broker implementation details and network failure modes justifies
the token cost for critical external interfaces.

Union of all models finds ~30 distinct assumptions vs 24 for the best single model.
Rodin
2026-05-10 02:27:53 -07:00
parent ce4801e8a3
commit b9036401c2
@@ -0,0 +1,134 @@
# External System Assumptions Analysis — Finding #63
**Date:** 2026-05-10
**Lens:** External System Assumptions (NEW)
**Document:** gargoyle's `feeds-and-instruments.md` (115 lines)
**Models:** GPT-5, Claude Opus 4, Claude Sonnet 4
## Summary
This new analytical lens examines implicit assumptions about external systems (broker APIs, market behavior, network conditions, timing) that the document takes for granted. Unlike boundary contract analysis (Finding #62), which examines internal component interfaces, this lens specifically targets dependencies on systems OUTSIDE the platform's control.
## Results
| Model | Time | Output tokens | Reasoning tokens | Assumptions found | Critical | High |
|---|---|---|---|---|---|---|
| GPT-5 | 89s | 6,557 | 4,032 | 24 | 2 | 12 |
| Claude Opus 4 | 58s | 2,773 | (internal) | 15 | 2 | 6 |
| Claude Sonnet 4 | 17s | 1,297 | (internal) | 15 | 4 | 6 |
## Key Findings by Category
### Broker API Behavior (all models found)
- **Subscription capacity limits:** Broker APIs limit concurrent subscriptions (typically 30-500 symbols). Discovery feeds can easily exceed these limits during volatile markets.
- **Historical backfill rate limits:** The broker throttles historical data requests during peak activity, which is when discovery feeds add many symbols.
- **Bid/ask availability gaps:** Pre-market and after-hours sessions, halted symbols, and illiquid names may lack bid/ask quotes.
- **Reconnection under load:** The broker rate-limits reconnection attempts; exponential backoff and token expiry interact during outages.
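
The reconnection item above can be made concrete with a small sketch. It assumes a capped exponential backoff with full jitter and a token with a fixed lifetime; `plan_reconnect`, `ReconnectPlan`, and all parameter names are hypothetical and do not come from the analyzed document or from any broker SDK.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReconnectPlan:
    delay_s: float       # how long to wait before this attempt
    refresh_token: bool  # whether to re-authenticate before connecting


def plan_reconnect(attempt: int, token_age_s: float, token_ttl_s: float,
                   base_s: float = 1.0, cap_s: float = 60.0,
                   rng: Optional[random.Random] = None) -> ReconnectPlan:
    """Plan reconnect attempt number `attempt` (0-based).

    token_age_s is the time since the token was issued (assumed known).
    """
    rng = rng or random.Random()
    # Exponential growth, capped so the wait never exceeds cap_s.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Full jitter: spread attempts out so they do not hit the broker in bursts.
    delay = rng.uniform(0.0, ceiling)
    # If the token will have expired by the time we reconnect, refresh it first.
    refresh = token_age_s + delay >= token_ttl_s
    return ReconnectPlan(delay_s=delay, refresh_token=refresh)
```

The point of the sketch is the last check: a long backoff can carry the client past token expiry, so the decision to re-authenticate is made together with the delay rather than discovered on a failed connect.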
### GPT-5 Unique Findings (9 not in either Claude model)
1. **Out-of-order delivery across trade/quote channels:** Trades and quotes arrive on separate channels with different latencies, so the cache can hold inconsistent state.
2. **Ticker string stability across broker endpoints:** Corporate actions rename tickers, and the broker uses different IDs for history vs real-time.
3. **Data entitlement/licensing gaps:** The account may lack entitlement for certain exchanges, so delayed data is served without the platform knowing.
4. **Network latency triggering false staleness:** Regional outages add seconds of latency, exceeding the staleness threshold.
5. **Late prints/corrections updating with older timestamps:** The price appears to jump backward.
6. **OHLCV semantic differences across asset classes:** Options/OTC/FX have irregular volume and differ in odd-lot filtering.
7. **Domain event timestamp skew:** Event emission uses the local clock, which is skewed from broker timestamps.
8. **Broker infrastructure throttling during volatility:** The stream lags by minutes, but staleness detection won't catch it if it relies on broker timestamps.
9. **Bar boundary misalignment (DST, pre/post-market):** The broker builds bars in exchange local time, while the platform assumes UTC and RTH-only bars.
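
Several of the timing findings above (4, 7, 8) come down to which clock a staleness check reads. The following is a minimal sketch of keeping both clocks per quote, so that "nothing is arriving" can be told apart from "data is arriving late". The `Quote` type, the thresholds, and the reason strings are illustrative assumptions, and comparing the local clock against broker timestamps remains subject to the skew described in finding 7.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    symbol: str
    price: float
    broker_ts: float    # timestamp assigned by the broker (epoch seconds)
    received_ts: float  # local clock reading when the message arrived


def staleness_reason(q: Quote, now: float, max_age_s: float = 5.0,
                     max_lag_s: float = 2.0) -> Optional[str]:
    """Return None if the quote looks fresh, else a short reason string."""
    # Local-clock check: has anything arrived recently at all?
    if now - q.received_ts > max_age_s:
        return "no_recent_message"
    # Cross-clock check: messages arrive, but well behind the broker's time.
    if q.received_ts - q.broker_ts > max_lag_s:
        return "stream_lagging"
    return None
```

For example, a quote received 90 seconds after its broker timestamp is reported as `stream_lagging` even though a message just arrived, while a quote that was fresh on receipt but is now 10 seconds old is reported as `no_recent_message`.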
### Opus Unique Findings (4 not in GPT-5 or Sonnet)
1. **Deduplication discards corrected bars:** The broker reissues a bar correction for the same timestamp (VWAP adjustments, late trades), and the correction is discarded as a duplicate. This directly impacts Daily P&L accuracy.
2. **Half-open connection detection gap:** The TCP connection remains open but the broker stops sending; no heartbeat mechanism is mentioned. The system believes the connection is live with fresh prices.
3. **TTL misalignment with discovery timing:** A 24-hour TTL from a 3PM discovery means a stale symbol uses quota while a new most-active symbol stays undiscovered until the next feed run.
4. **Discovery criteria lose meaning during a flash crash:** A market-wide event causes all symbols to show "unusual activity", so the criteria add noise symbols.
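
The first finding above, corrections discarded as duplicates, can be illustrated with a store that deduplicates on content rather than on key alone. This is a sketch under the assumption that a later bar with the same symbol and timestamp but different values is a correction (last write wins); `Bar` and `BarStore` are hypothetical names, not the platform's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Bar:
    symbol: str
    start_ts: int  # bar open time, epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: int


class BarStore:
    def __init__(self) -> None:
        self._bars: Dict[Tuple[str, int], Bar] = {}
        self.corrections = 0

    def upsert(self, bar: Bar) -> str:
        """Store a bar and report what happened to it."""
        key = (bar.symbol, bar.start_ts)
        existing = self._bars.get(key)
        if existing is None:
            self._bars[key] = bar
            return "inserted"
        if existing == bar:
            return "duplicate"   # identical payload: safe to drop
        self._bars[key] = bar    # same key, new values: treat as a correction
        self.corrections += 1
        return "corrected"

    def get(self, symbol: str, start_ts: int) -> Optional[Bar]:
        return self._bars.get((symbol, start_ts))
```

Counting corrections separately keeps them visible to downstream consumers such as P&L, which may need to recompute values derived from the original bar.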
### Sonnet Unique Findings (2 not in other models)
1. **Concurrent race on price bar updates** — High-frequency ingestion causes race conditions during deduplication
2. **Volume data accuracy from trade reporting delays** — Exchange cancellations affect volume-based indicators
## Model Analytical Style Comparison
### GPT-5: Exhaustive + Implementation-Specific
GPT-5 produced 24 findings (60% more than either Claude model) with detailed failure scenarios and specific technical details (channel separation, timestamp sources, entitlement systems). Many findings were highly broker-specific and demonstrated reasoning about how real broker systems work.
**Strengths:**
- Identified cross-channel ordering issues (trade vs quote channels)
- Found entitlement/licensing gaps that others missed entirely
- Detailed timestamp source analysis (broker time vs receipt time vs exchange time)
**Weakness:** Some findings were lower-severity operational concerns rather than architectural assumptions.
### Opus: Concise + Design-Level
Opus produced 15 findings in 58s (faster than GPT-5) with a structured format including a severity summary table. Findings were more curated — fewer but more architecturally significant.
**Strengths:**
- Identified the bar correction/deduplication conflict (design flaw with P&L impact)
- Half-open connection gap (mentions specific missing mechanism: heartbeat)
- Connected discovery timing to resource allocation lag
**Weakness:** Missed the cross-channel ordering issues and entitlement gaps that GPT-5 found.
### Sonnet: Fast + Surface-Level
Sonnet produced 15 findings in only 17s — 5x faster than GPT-5. Quality was acceptable but more generic. Several findings restated the document's failure modes table rather than identifying unstated assumptions.
**Strengths:**
- Fast first-pass screening
- Identified concurrency race that others missed (likely due to Elixir familiarity)
**Weakness:** Fewer genuinely novel assumptions; some findings were obvious or redundant with the document's own failure modes.
## Overlap Analysis
| Finding Category | All 3 | GPT-5 + Opus | GPT-5 + Sonnet | Opus + Sonnet | Unique |
|-----------------|-------|--------------|----------------|---------------|--------|
| Subscription limits | ✓ | | | | |
| Backfill rate limits | ✓ | | | | |
| Bid/ask availability | ✓ | | | | |
| Reconnection failures | ✓ | | | | |
| Staleness = halt confusion | ✓ | | | | |
| Data format changes | ✓ | | | | |
| Out-of-order delivery | | | | | GPT-5 |
| Ticker stability | | | | | GPT-5 |
| Entitlement gaps | | | | | GPT-5 |
| Bar correction discard | | | | | Opus |
| Half-open connections | | | | | Opus |
| Concurrency race | | | | | Sonnet |

**Common ground:** 6 findings appeared in all three models. These represent the most obvious external system assumptions.

**Union coverage:** ~30 distinct assumptions across all models. No single model found more than 80% of the total (GPT-5's 24 of ~30).
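
The overlap bookkeeping can be reproduced with plain set operations. The sketch below encodes only the 12 rows listed in the overlap table, not the full ~30 assumptions, so its coverage ratios describe the tabulated subset and differ from the 80% figure for the full union.

```python
# Findings reported by all three models (first six table rows).
COMMON = {
    "subscription limits", "backfill rate limits", "bid/ask availability",
    "reconnection failures", "staleness = halt confusion", "data format changes",
}

# Per-model sets: the common rows plus each model's unique rows from the table.
findings = {
    "GPT-5": COMMON | {"out-of-order delivery", "ticker stability", "entitlement gaps"},
    "Opus": COMMON | {"bar correction discard", "half-open connections"},
    "Sonnet": COMMON | {"concurrency race"},
}

union = set().union(*findings.values())
shared = set.intersection(*findings.values())

# Fraction of the tabulated union that each model found on its own.
coverage = {model: len(found) / len(union) for model, found in findings.items()}

for model, share in coverage.items():
    print(f"{model}: {share:.0%} of {len(union)} tabulated findings")
```

On this subset GPT-5 covers 9 of 12 rows, which shows the same pattern as the full result: the union is strictly larger than any single model's set.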
## Key Insight
**External system assumptions benefit from reasoning depth more than other analytical tasks.**
This document is relatively simple (115 lines, single component) but describes a system boundary with complex external dependencies (broker APIs, network behavior, market conditions). The gap between GPT-5 (24 findings) and Claude models (15 each) is larger here than in Finding #62 (boundary contract analysis on a 111-line doc: 17/11/8).
The difference: external system assumptions require reasoning about systems NOT described in the document — broker implementation details, network failure modes, market microstructure. This benefits from GPT-5's 4,032 reasoning tokens more than internal architecture analysis does.
**Comparison with assumption-finding on internal docs (Finding #10):**
- Internal assumption analysis (cold-start-and-recovery.md): GPT-5 found 26, GPT-4.1 found 14, Mini found 12
- External assumption analysis (feeds-and-instruments.md): GPT-5 found 24, Opus found 15, Sonnet found 15
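Measured against GPT-5's count, the Claude models did relatively better on external assumptions (15/24, ~63%) than GPT-4.1 did on internal assumptions (14/26, ~54%; Mini reached 12/26, ~46%). This suggests Opus/Sonnet have better mental models of external systems (broker APIs, market behavior) than of internal architecture patterns, though the two runs compare different model families on different documents.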
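
The percentages in this comparison are each model's finding count relative to GPT-5's count on the same document. A short check of that arithmetic, using only the counts quoted above (the helper name `relative_pct` is illustrative):

```python
def relative_pct(count: int, reference: int) -> float:
    """Finding count as a percentage of the reference model's count."""
    return round(100 * count / reference, 1)


# External lens (feeds-and-instruments.md): GPT-5 found 24.
external = {"Opus": relative_pct(15, 24), "Sonnet": relative_pct(15, 24)}

# Internal lens (cold-start-and-recovery.md): GPT-5 found 26.
internal = {"GPT-4.1": relative_pct(14, 26), "Mini": relative_pct(12, 26)}

print(external)  # {'Opus': 62.5, 'Sonnet': 62.5}
print(internal)  # {'GPT-4.1': 53.8, 'Mini': 46.2}
```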
## Practical Application
For documents describing system boundaries with external dependencies:
1. **Run GPT-5 first:** Produces exhaustive coverage of external failure modes.
2. **Run Opus second:** Catches design-level gaps that GPT-5 frames as operational concerns.
3. **Run Sonnet for speed:** A 17s screening pass yields roughly 60% as many findings as GPT-5 (15 vs 24); use it when time-constrained.
The union of all three models finds more assumptions than any single model (~30 vs 24 for GPT-5 alone). For critical external interfaces (broker connections, payment processors, auth providers), running all three is worth the cost.
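
One way to combine the runs is to merge findings in the order the models were run and record which models reported each one. The sketch below is hypothetical: `merge_findings` and `unique_to` are illustrative names, and matching on a normalized title is a deliberate simplification, since real findings from different models rarely share wording and would need manual or semantic matching.

```python
from typing import Dict, List, Sequence, Tuple


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())


def merge_findings(runs: Sequence[Tuple[str, Sequence[str]]]) -> Dict[str, List[str]]:
    """Map each normalized finding title to the models that reported it.

    `runs` is an ordered sequence of (model_name, finding_titles) pairs.
    """
    merged: Dict[str, List[str]] = {}
    for model, titles in runs:
        for title in titles:
            reporters = merged.setdefault(normalize(title), [])
            if model not in reporters:
                reporters.append(model)
    return merged


def unique_to(merged: Dict[str, List[str]], model: str) -> List[str]:
    """Findings reported by `model` and by no other model."""
    return [title for title, reporters in merged.items() if reporters == [model]]
```

Findings reported by a single model are the ones most worth a second look, because no other run corroborates or contradicts them.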
## Cost Analysis
| Model | Time | Input | Output | Reasoning | Approximate Cost |
|-------|------|-------|--------|-----------|-----------------|
| GPT-5 | 89s | 1,236 | 6,557 | 4,032 | ~$0.40 |
| Opus | 58s | 1,427 | 2,773 | (internal) | ~$0.12 |
| Sonnet | 17s | 1,427 | 1,297 | (internal) | ~$0.02 |

**Total cost for all three:** ~$0.54 for comprehensive external dependency analysis.

**Value:** Finding a single Critical assumption before production is worth far more than the analysis cost.
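
The cost figures can be turned into a rough cost per reported finding. The sketch uses only the approximate costs and finding counts from the tables above; the per-finding numbers inherit that approximation and say nothing about finding quality.

```python
# (approximate cost in USD, assumptions found) per model, from the tables above.
runs = {
    "GPT-5": (0.40, 24),
    "Opus": (0.12, 15),
    "Sonnet": (0.02, 15),
}

total_cost = round(sum(cost for cost, _ in runs.values()), 2)

# Cost per reported finding, rounded to a hundredth of a cent.
per_finding = {model: round(cost / count, 4) for model, (cost, count) in runs.items()}

print(f"total: ${total_cost:.2f}")
for model, cost in per_finding.items():
    print(f"{model}: ${cost:.4f} per finding")
```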