Boundary Violation Risk Analysis on Context-Level README
- Date: 2026-05-10
- Document: gargoyle Risk context README (149 lines)
- Task Type: Boundary violation risk identification
- Status: Partial experiment (heavy models timed out)
Experiment Design
This is the first experiment to target a bounded context README rather than a mechanism specification. The analytical lens (boundary violations) was chosen to match the document type (architecture/boundaries).
Prompt Categories
- Anti-corruption layer gaps
- Event contract underspecification
- Service responsibility creep
- Invariant enforcement gaps
- Temporal coupling risks
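A minimal sketch of how the five categories above could be assembled into a structured prompt. The prompt wording, the function name `build_prompt`, and the severity labels in the instruction text are assumptions for illustration; the prompt actually used in the experiment is not recorded here.

```python
# Hypothetical prompt builder; only the category names come from this report.
CATEGORIES = [
    "Anti-corruption layer gaps",
    "Event contract underspecification",
    "Service responsibility creep",
    "Invariant enforcement gaps",
    "Temporal coupling risks",
]


def build_prompt(document_text: str, categories: list[str] = CATEGORIES) -> str:
    """Return a review prompt that asks for findings grouped by category."""
    numbered = "\n".join(
        f"{i}. {name}" for i, name in enumerate(categories, start=1)
    )
    return (
        "Identify boundary violation risks in the bounded context README below.\n"
        "Report findings under each of these categories, and assign each "
        "finding a severity (Critical, High, Medium):\n"
        f"{numbered}\n\n"
        "--- README ---\n"
        f"{document_text}"
    )
```

Keeping the categories in one list means the same list can later be used to check that a model's output covers every category.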
Models Tested
| Model | Time | Output tokens | Status |
|---|---|---|---|
| Claude Sonnet 4.6 | 19s | 1,302 | Complete |
| Claude Opus | - | - | TIMEOUT (300s) |
| GPT-5 | - | - | TIMEOUT (300s) |
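The 300s cutoff in the table could be enforced with a small harness along these lines. This is a sketch under assumptions: `run_with_timeout` and `RunResult` are hypothetical names, and `call` stands in for whatever client function actually invokes a model. Note that a timed-out worker thread is abandoned, not cancelled, so a real harness would also want a timeout on the underlying request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    model: str
    status: str                # "Complete" or "TIMEOUT (<n>s)"
    seconds: Optional[float]   # wall-clock time, None on timeout
    output: Optional[str]


def run_with_timeout(
    model: str, call: Callable[[], str], timeout_s: float = 300.0
) -> RunResult:
    """Run one model call and record a TIMEOUT row if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(call)
    try:
        output = future.result(timeout=timeout_s)
        return RunResult(model, "Complete", time.monotonic() - start, output)
    except FutureTimeout:
        return RunResult(model, f"TIMEOUT ({timeout_s:g}s)", None, None)
    finally:
        # Do not block on the abandoned worker; it finishes in the background.
        pool.shutdown(wait=False)
```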
Results (Sonnet)
12 findings across 5 categories:
Critical Severity (2)
- LotClosed for PDT counting: No definition of "same day" across time zones, no partial close handling
- Kill Switch monotonicity: No mechanism to prevent concurrent escalation level updates
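To make the monotonicity finding concrete, the sketch below shows one way an escalation level can be protected against concurrent updates: every write goes through a lock and is rejected unless it raises the level. The class name, the integer levels, and the in-process lock are all assumptions; the README's actual escalation levels and storage mechanism are not described in this report.

```python
import threading


class KillSwitch:
    """Hypothetical escalation level that can only move upward."""

    def __init__(self) -> None:
        self._level = 0
        self._lock = threading.Lock()

    @property
    def level(self) -> int:
        with self._lock:
            return self._level

    def escalate(self, new_level: int) -> bool:
        """Apply new_level only if it is higher than the current level.

        The comparison and the write happen under one lock, so two
        concurrent callers cannot interleave and lower the level.
        """
        with self._lock:
            if new_level <= self._level:
                return False
            self._level = new_level
            return True
```

Whatever order concurrent escalations arrive in, the stored level ends at the maximum requested value, which is the property the finding says the README does not guarantee.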
High Severity (8)
- Tick event translation rules missing (symbol mapping, price normalization)
- FillReceived currency/lot identification mapping undefined
- KillSwitchEngaged/Disengaged idempotency semantics missing
- Liquidation Sizing requires wash sale/liquidity data not provided via events
- Portfolio Evaluation requires correlation data not provided via events
- PDT Counting concurrent update atomicity undefined
- Continuous Monitoring/MarketOpened sequencing unspecified
- Portfolio Evaluation price staleness tolerance unspecified
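As an illustration of the idempotency finding for KillSwitchEngaged/Disengaged, the following sketch deduplicates deliveries by an event identifier and treats a repeated state as a no-op. The `event_id` field and the consumer class are assumptions; the report does not say whether the real events carry an identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KillSwitchEvent:
    event_id: str  # hypothetical unique ID used for deduplication
    kind: str      # "KillSwitchEngaged" or "KillSwitchDisengaged"


@dataclass
class KillSwitchConsumer:
    engaged: bool = False
    transitions: int = 0
    _seen: set = field(default_factory=set)

    def handle(self, event: KillSwitchEvent) -> bool:
        """Process an event once; return False for a redelivered event."""
        if event.kind not in ("KillSwitchEngaged", "KillSwitchDisengaged"):
            raise ValueError(f"unknown event kind: {event.kind}")
        if event.event_id in self._seen:
            return False  # duplicate delivery, ignore
        self._seen.add(event.event_id)
        target = event.kind == "KillSwitchEngaged"
        if target != self.engaged:
            # Only an actual state change counts as a transition.
            self.engaged = target
            self.transitions += 1
        return True
```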
Medium Severity (2)
- MarketDataStale/Fresh staleness threshold undefined
- Kill switch disengage race between automatic and manual
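The staleness finding comes down to a missing number. A sketch of what an explicit threshold check might look like follows; the function name and the 5-second threshold used in the usage note are placeholders, not values from the README.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_tick_at: datetime, now: datetime, threshold: timedelta) -> bool:
    """Return True when the last tick is older than the threshold.

    Both timestamps must be timezone-aware so the comparison is not
    affected by the local zone of whichever service produced them.
    """
    if last_tick_at.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return now - last_tick_at > threshold
```

With a placeholder threshold of `timedelta(seconds=5)`, a tick that is 6 seconds old would be reported stale and one that is exactly 5 seconds old would not; documenting which side of the boundary counts as stale is part of closing the gap.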
Key Insights
Context-level analysis is different from spec-level
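Findings are about RELATIONSHIPS (how contexts communicate) rather than MECHANISMS (how components work internally). Several findings identify where the README CLAIMS a service has responsibilities but doesn't document the DATA FLOW enabling them. For example, Liquidation Sizing needs wash sale and liquidity data, and Portfolio Evaluation needs correlation data, yet neither is provided via events.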
Document type shapes finding character
Spec-level experiments find implementation gaps (ETS ownership, race conditions). Context-level experiments find BOUNDARY gaps (event contracts, cross-context timing).
Sonnet handles context-level analysis well
With structured prompts (5 categories), Sonnet produces well-organized boundary violation analysis in 19s. The feedback loop finding (#12 - pro-forma staleness causing immediate escalation) shows genuine architectural reasoning.
Pending Work
- Retry Opus and GPT-5 during lower-contention period
- Compare whether heavy models find different boundary risks
- Hypothesis (untested): Opus may excel here, on the assumption that boundary/tension reasoning is its strength
Practical Implication
For context-level architecture review, Sonnet appears sufficient for first-pass boundary violation scanning, based on this single completed run. Structured prompts keep the analysis focused. It is still worth running Opus as a second pass for design-tension identification once the timeouts are resolved.