Add Finding #30: Boundary violation analysis on context README

2026-05-10 17:28:54 -07:00
parent 8adf09b3fb
commit bb50188e63
1 changed files with 73 additions and 0 deletions
@@ -0,0 +1,73 @@
 # Boundary Violation Risk Analysis on Context-Level README
 **Date:** 2026-05-10
 **Document:** gargoyle Risk context README (149 lines)
 **Task Type:** Boundary violation risk identification
 **Status:** Partial experiment (heavy models timed out)
 ## Experiment Design
 This is the first experiment to target a bounded context README rather than a mechanism specification.
 The analytical lens (boundary violations) was chosen to match the document type (architecture/boundaries).
 ### Prompt Categories
 1. Anti-corruption layer gaps
 2. Event contract underspecification
 3. Service responsibility creep
 4. Invariant enforcement gaps
 5. Temporal coupling risks
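
The five categories can be assembled into a structured prompt along the lines of the sketch below. The prompt wording and the `build_prompt` helper are illustrative assumptions; the exact prompt used in this experiment is not reproduced in this finding.

```python
# Hypothetical sketch: the prompt text and helper name are assumptions,
# not the prompt that was actually sent to the models.
CATEGORIES = [
    "Anti-corruption layer gaps",
    "Event contract underspecification",
    "Service responsibility creep",
    "Invariant enforcement gaps",
    "Temporal coupling risks",
]


def build_prompt(document: str, categories: list[str] = CATEGORIES) -> str:
    """Build a review prompt that asks for findings grouped by category."""
    numbered = "\n".join(
        f"{i}. {name}" for i, name in enumerate(categories, start=1)
    )
    return (
        "Identify boundary violation risks in the document below.\n"
        "Group findings under these categories and assign each a severity "
        "(Critical, High, Medium):\n"
        f"{numbered}\n\n"
        "--- DOCUMENT ---\n"
        f"{document}"
    )
```

Keeping the category list as data makes it easy to rerun the same document with a different lens without rewriting the prompt body.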
 ### Models Tested
 | Model | Time | Output tokens | Status |
 |---|---|---|---|
 | Claude Sonnet 4.6 | 19s | 1,302 | Complete |
 | Claude Opus | - | - | TIMEOUT (300s) |
 | GPT-5 | - | - | TIMEOUT (300s) |
 ## Results (Sonnet)
 12 findings across 5 categories:
 ### Critical Severity (2)
 - **LotClosed for PDT counting:** No definition of "same day" across time zones, no partial close handling
 - **Kill Switch monotonicity:** No mechanism to prevent concurrent escalation level updates
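
To make the monotonicity finding concrete, the sketch below shows one possible way to guarantee that concurrent updates can only raise the escalation level. The `Level` names and the `KillSwitch` class are hypothetical; the README's actual escalation levels and enforcement mechanism are not assumed here.

```python
import threading
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical escalation levels, ordered from least to most severe.
    NONE = 0
    WARN = 1
    HALT = 2
    LIQUIDATE = 3


class KillSwitch:
    """Escalation level that can only increase through escalate()."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._level = Level.NONE

    def escalate(self, requested: Level) -> Level:
        # The read-compare-write runs under one lock, so two concurrent
        # callers cannot interleave and leave a lower level in place.
        with self._lock:
            if requested > self._level:
                self._level = requested
            return self._level

    @property
    def level(self) -> Level:
        with self._lock:
            return self._level
```

A request for a lower level than the current one is ignored and the current level is returned, so callers can tell whether their escalation took effect.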
 ### High Severity (8)
 - Tick event translation rules missing (symbol mapping, price normalization)
 - FillReceived currency/lot identification mapping undefined
 - KillSwitchEngaged/Disengaged idempotency semantics missing
 - Liquidation Sizing requires wash sale/liquidity data not provided via events
 - Portfolio Evaluation requires correlation data not provided via events
 - PDT Counting concurrent update atomicity undefined
 - Continuous Monitoring/MarketOpened sequencing unspecified
 - Portfolio Evaluation price staleness tolerance unspecified
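
As an illustration of what explicit idempotency semantics for KillSwitchEngaged/Disengaged might look like, the sketch below deduplicates deliveries by event ID. The `event_id` and `kind` fields and both class names are assumptions made for the example; the README does not define this contract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KillSwitchEvent:
    # Hypothetical contract: event_id and kind are assumed fields.
    event_id: str
    kind: str  # "engaged" or "disengaged"


@dataclass
class KillSwitchProjection:
    engaged: bool = False
    _seen: set = field(default_factory=set)

    def apply(self, event: KillSwitchEvent) -> bool:
        """Apply an event once; return False if it was a duplicate delivery."""
        if event.event_id in self._seen:
            return False
        self._seen.add(event.event_id)
        self.engaged = event.kind == "engaged"
        return True
```

With this rule, a redelivered `engaged` event that arrives after a later `disengaged` event is dropped rather than silently re-engaging the switch.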
 ### Medium Severity (2)
 - MarketDataStale/Fresh staleness threshold undefined
 - Kill switch disengage race between automatic and manual
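
The staleness findings come down to a missing number. A minimal sketch of an explicit threshold check follows; the 5-second value and the `is_stale` helper are assumptions for illustration only, since the README defines neither.

```python
from datetime import datetime, timedelta, timezone

# Assumed value for illustration; the README does not define the threshold.
STALENESS_THRESHOLD = timedelta(seconds=5)


def is_stale(last_tick_at: datetime, now: datetime,
             threshold: timedelta = STALENESS_THRESHOLD) -> bool:
    """Return True when the newest tick is strictly older than the threshold."""
    return now - last_tick_at > threshold
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and makes the boundary case (exactly at the threshold) testable.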
 ## Key Insights
 ### Context-level analysis is different from spec-level
 Findings are about RELATIONSHIPS (how contexts communicate) rather than MECHANISMS
 (how components work internally). Several findings identify where the README CLAIMS
 a service has responsibilities but doesn't document the DATA FLOW enabling them.
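
The "claimed responsibility without documented data flow" pattern can be checked mechanically once responsibilities and event payloads are written down. The sketch below assumes a hand-built inventory; the field names and event payloads are hypothetical and are not taken from the README.

```python
# Hypothetical inventory: the field names and event payloads below are
# illustrative, not taken from the README.
REQUIRED_BY_SERVICE = {
    "Liquidation Sizing": {"positions", "wash_sale_windows", "liquidity"},
    "Portfolio Evaluation": {"positions", "prices", "correlations"},
}

PROVIDED_BY_EVENT = {
    "FillReceived": {"positions"},
    "Tick": {"prices"},
}


def undocumented_inputs(required: dict[str, set[str]],
                        provided: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per service, the inputs that no documented event supplies."""
    available = set().union(*provided.values()) if provided else set()
    gaps = {svc: needs - available for svc, needs in required.items()}
    # Keep only services that actually have a gap.
    return {svc: missing for svc, missing in gaps.items() if missing}
```

Under these assumed inputs, the function flags wash sale and liquidity data for Liquidation Sizing and correlation data for Portfolio Evaluation, which mirrors the shape of the two High severity findings above.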
 ### Document type shapes finding character
 Spec-level experiments find implementation gaps (ETS ownership, race conditions).
 Context-level experiments find BOUNDARY gaps (event contracts, cross-context timing).
 ### Sonnet handles context-level analysis well
 With structured prompts (5 categories), Sonnet produced a well-organized boundary
 violation analysis in 19s. The feedback loop finding (#12 - pro-forma staleness
 causing immediate escalation) suggests genuine architectural reasoning rather than
 surface-level pattern matching.
 ## Pending Work
 - Retry Opus and GPT-5 during a lower-contention period
 - Compare whether heavy models find different boundary risks
 - Hypothesis (untested): Opus may excel here, on the assumption that boundary/tension reasoning is its strength
 ## Practical Implication
 For context-level architecture review, Sonnet appears sufficient for first-pass
 boundary violation scanning, based on this single run. Structured prompts keep the analysis focused.
 It is worth running Opus as a second pass for design-tension identification once the timeouts are resolved.