Boundary Violation Risk Analysis on Context-Level README

Date: 2026-05-10
Document: gargoyle Risk context README (149 lines)
Task Type: Boundary violation risk identification
Status: Partial experiment (heavy models timed out)

Experiment Design

This is the first experiment to target a bounded context README rather than a mechanism specification. The analytical lens (boundary violations) was chosen to match the document type (architecture and boundaries).

Prompt Categories

Anti-corruption layer gaps
Event contract underspecification
Service responsibility creep
Invariant enforcement gaps
Temporal coupling risks
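
The five categories above can be turned into a structured prompt along these lines. This is a minimal sketch: the category names come from this report, but the instruction wording and the `build_prompt` helper are hypothetical and are not the prompt actually used in the experiment.

```python
# Category names are taken from this report; everything else is illustrative.
CATEGORIES = [
    "Anti-corruption layer gaps",
    "Event contract underspecification",
    "Service responsibility creep",
    "Invariant enforcement gaps",
    "Temporal coupling risks",
]


def build_prompt(readme_text: str) -> str:
    """Assemble a category-structured review prompt (hypothetical wording)."""
    numbered = "\n".join(
        f"{i}. {name}" for i, name in enumerate(CATEGORIES, start=1)
    )
    return (
        "Identify boundary violation risks in the bounded context README below.\n"
        "Report findings under exactly these categories, with a severity for each:\n"
        f"{numbered}\n\n"
        "README:\n"
        f"{readme_text}"
    )
```

Keeping the categories in a single list means the prompt and any later tallying of findings per category share one source of truth.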

Models Tested

Model	Time	Output tokens	Status
Claude Sonnet 4.6	19s	1,302	Complete
Claude Opus	-	-	TIMEOUT (300s)
GPT-5	-	-	TIMEOUT (300s)

Results (Sonnet)

12 findings across 5 categories:

Critical Severity (2)

LotClosed for PDT counting: No definition of "same day" across time zones, no partial close handling
Kill Switch monotonicity: No mechanism to prevent concurrent escalation level updates
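
To make the two critical gaps concrete, the sketch below shows one way each could be closed. Everything in it is an assumption for illustration: the README does not name a time zone for "same day" (America/New_York is used as a placeholder), and `trading_day` and `KillSwitch` are hypothetical names, not part of the gargoyle codebase.

```python
import threading
from datetime import date, datetime
from zoneinfo import ZoneInfo

# Assumption: "same day" means the same calendar date in the exchange's
# local time zone. The README does not specify which zone applies.
EXCHANGE_TZ = ZoneInfo("America/New_York")


def trading_day(closed_at: datetime) -> date:
    """Map a timezone-aware LotClosed timestamp to an exchange-local date."""
    if closed_at.tzinfo is None:
        raise ValueError("LotClosed timestamp must be timezone-aware")
    return closed_at.astimezone(EXCHANGE_TZ).date()


class KillSwitch:
    """Hypothetical holder for the escalation level; levels only increase."""

    def __init__(self) -> None:
        self._level = 0
        self._lock = threading.Lock()

    def escalate(self, new_level: int) -> bool:
        """Apply new_level only if it is strictly higher; return True if applied."""
        with self._lock:
            if new_level <= self._level:
                return False  # stale or duplicate update, ignored
            self._level = new_level
            return True

    @property
    def level(self) -> int:
        with self._lock:
            return self._level
```

The lock only serializes updates within a single process. If the escalation level lives in shared storage, the same check would need to be a compare-and-set at the store. Partial closes are not handled here and would need their own rule.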

High Severity (8)

Tick event translation rules missing (symbol mapping, price normalization)
FillReceived currency/lot identification mapping undefined
KillSwitchEngaged/Disengaged idempotency semantics missing
Liquidation Sizing requires wash sale/liquidity data not provided via events
Portfolio Evaluation requires correlation data not provided via events
PDT Counting concurrent update atomicity undefined
Continuous Monitoring/MarketOpened sequencing unspecified
Portfolio Evaluation price staleness tolerance unspecified
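
As an illustration of the idempotency gap for KillSwitchEngaged/Disengaged, a consumer could deduplicate on an event identifier. This is a sketch under assumptions: the README does not say whether events carry a unique ID, so `event_id`, `KillSwitchEvent`, and `KillSwitchProjection` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KillSwitchEvent:
    event_id: str  # hypothetical unique ID; not confirmed by the README
    kind: str      # "engaged" or "disengaged"


@dataclass
class KillSwitchProjection:
    """Consumer-side state that tolerates redelivered events."""

    engaged: bool = False
    applied_count: int = 0
    _seen: set = field(default_factory=set)

    def apply(self, event: KillSwitchEvent) -> bool:
        """Apply an event once; return False if it was already applied."""
        if event.kind not in ("engaged", "disengaged"):
            raise ValueError(f"unknown event kind: {event.kind}")
        if event.event_id in self._seen:
            return False  # duplicate delivery, state unchanged
        self._seen.add(event.event_id)
        self.engaged = event.kind == "engaged"
        self.applied_count += 1
        return True
```

Deduplication by ID handles redelivery but not reordering. If events can arrive out of order, a sequence number or version on the event would also be needed.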

Medium Severity (2)

MarketDataStale/Fresh staleness threshold undefined
Kill switch disengage race between automatic and manual
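
The staleness findings (the undefined MarketDataStale/Fresh threshold and the unspecified price staleness tolerance) come down to a missing explicit parameter. A minimal sketch of what stating it could look like, assuming a 5-second threshold chosen purely for illustration; the function names are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: 5 seconds is a placeholder; the README defines no threshold.
MAX_TICK_AGE = timedelta(seconds=5)


def is_stale(
    last_tick_at: datetime, now: datetime, max_age: timedelta = MAX_TICK_AGE
) -> bool:
    """A price is stale when its last tick is older than max_age."""
    return (now - last_tick_at) > max_age


def staleness_event(
    was_stale: bool,
    last_tick_at: datetime,
    now: datetime,
    max_age: timedelta = MAX_TICK_AGE,
) -> Optional[str]:
    """Return the event name to emit on a state change, or None if unchanged."""
    stale_now = is_stale(last_tick_at, now, max_age)
    if stale_now == was_stale:
        return None
    return "MarketDataStale" if stale_now else "MarketDataFresh"
```

Both timestamps must be either timezone-aware or naive; mixing the two raises a TypeError on subtraction. Emitting only on transitions avoids repeating the same event on every check.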

Key Insights

Context-level analysis is different from spec-level

Findings are about RELATIONSHIPS (how contexts communicate) rather than MECHANISMS (how components work internally). Several findings identify where the README CLAIMS a service has responsibilities but doesn't document the DATA FLOW enabling them.

Document type shapes finding character

Spec-level experiments find implementation gaps (ETS ownership, race conditions). Context-level experiments find BOUNDARY gaps (event contracts, cross-context timing).

Sonnet handles context-level analysis well

With structured prompts (5 categories), Sonnet produces well-organized boundary violation analysis in 19s. The feedback loop finding (#12 - pro-forma staleness causing immediate escalation) shows genuine architectural reasoning.

Pending Work

Retry Opus and GPT-5 during a lower-contention period
Compare whether heavy models find different boundary risks
Hypothesis: Opus may excel here (boundary/tension reasoning is its strength)

Practical Implication

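For context-level architecture review, Sonnet appears sufficient for first-pass boundary violation scanning, although no other model completed, so there is no baseline to compare against. Structured prompts keep the analysis focused. Once the timeouts are resolved, it is worth running Opus as a second pass for design-tension identification.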
