Boundary Violation Risk Analysis on Context-Level README

Date: 2026-05-10
Document: gargoyle Risk context README (149 lines)
Task Type: Boundary violation risk identification
Status: Partial experiment (heavy models timed out)

Experiment Design

This is the first experiment to target a bounded-context README rather than a mechanism specification. The analytical lens (boundary violations) was chosen to match the document type (architecture and boundaries).

Prompt Categories

  1. Anti-corruption layer gaps
  2. Event contract underspecification
  3. Service responsibility creep
  4. Invariant enforcement gaps
  5. Temporal coupling risks
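A minimal sketch of how the five categories could be assembled into a structured prompt. The actual prompt wording is not recorded in this finding, so the function name `build_prompt`, the instruction text, and the requested output format are all assumptions for illustration.

```python
# Hypothetical prompt builder; the real prompt text is not recorded in this finding.
CATEGORIES = [
    "Anti-corruption layer gaps",
    "Event contract underspecification",
    "Service responsibility creep",
    "Invariant enforcement gaps",
    "Temporal coupling risks",
]


def build_prompt(document_text: str, categories: list[str] = CATEGORIES) -> str:
    """Return a prompt asking for findings grouped under numbered categories."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(categories, start=1))
    return (
        "Identify boundary violation risks in the README below.\n"
        "Report findings under exactly these categories, with a severity for each:\n"
        f"{numbered}\n\n"
        "README:\n"
        f"{document_text}"
    )


if __name__ == "__main__":
    print(build_prompt("(README contents go here)"))
```

Keeping the category list as data makes it easy to send the identical prompt to every model under test, so differences in findings reflect the model rather than the wording.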

Models Tested

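| Model             | Time | Output tokens | Status         |
|-------------------|------|---------------|----------------|
| Claude Sonnet 4.6 | 19s  | 1,302         | Complete       |
| Claude Opus       | -    | -             | TIMEOUT (300s) |
| GPT-5             | -    | -             | TIMEOUT (300s) |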

Results (Sonnet)

12 findings across 5 categories:

Critical Severity (2)

  • LotClosed for PDT counting: No definition of "same day" across time zones, no partial close handling
  • Kill Switch monotonicity: No mechanism to prevent concurrent escalation level updates
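To make the two critical findings concrete, here is a sketch of what closing each gap might look like. None of this comes from the README: the fixed UTC-4 offset, the names `same_trading_day` and `KillSwitchLevel`, and the integer escalation levels are assumptions. Partial-close handling is not covered.

```python
import threading
from datetime import datetime, timedelta, timezone, tzinfo

# Assumption: a fixed UTC-4 offset stands in for the exchange's local time.
# A real system would more likely use zoneinfo.ZoneInfo("America/New_York")
# so that daylight saving transitions are handled.
EXCHANGE_TZ = timezone(timedelta(hours=-4))


def same_trading_day(a: datetime, b: datetime, tz: tzinfo = EXCHANGE_TZ) -> bool:
    """Compare two timezone-aware timestamps by calendar date in one fixed zone."""
    return a.astimezone(tz).date() == b.astimezone(tz).date()


class KillSwitchLevel:
    """Escalation level that can only increase, even under concurrent updates."""

    def __init__(self) -> None:
        self._level = 0
        self._lock = threading.Lock()

    def escalate(self, new_level: int) -> bool:
        """Apply new_level only if it is higher; return whether it was applied."""
        with self._lock:
            if new_level <= self._level:
                return False
            self._level = new_level
            return True

    @property
    def level(self) -> int:
        with self._lock:
            return self._level


if __name__ == "__main__":
    opened = datetime(2026, 5, 11, 23, 30, tzinfo=timezone.utc)
    closed = datetime(2026, 5, 12, 0, 30, tzinfo=timezone.utc)
    print(opened.date() == closed.date())    # compares UTC calendar dates
    print(same_trading_day(opened, closed))  # compares dates in the exchange zone

    switch = KillSwitchLevel()
    print(switch.escalate(2), switch.escalate(1), switch.level)
```

The two timestamps fall on different UTC dates but the same date in the assumed exchange zone, which is the ambiguity the first finding points at. The lock makes the compare-and-update in `escalate` a single step, so a late-arriving lower level cannot overwrite a higher one.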

High Severity (8)

  • Tick event translation rules missing (symbol mapping, price normalization)
  • FillReceived currency/lot identification mapping undefined
  • KillSwitchEngaged/Disengaged idempotency semantics missing
  • Liquidation Sizing requires wash sale/liquidity data not provided via events
  • Portfolio Evaluation requires correlation data not provided via events
  • PDT Counting concurrent update atomicity undefined
  • Continuous Monitoring/MarketOpened sequencing unspecified
  • Portfolio Evaluation price staleness tolerance unspecified
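As an illustration of the idempotency finding, the sketch below shows one common way a consumer can tolerate redelivered KillSwitchEngaged/Disengaged events: deduplicating on an event ID. The `event_id` field, the class names, and the in-memory set are assumptions; the README does not say whether events carry a unique identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KillSwitchEvent:
    event_id: str  # hypothetical unique ID; the README does not specify one
    kind: str      # "engaged" or "disengaged"


@dataclass
class KillSwitchConsumer:
    engaged: bool = False
    _seen: set[str] = field(default_factory=set)

    def handle(self, event: KillSwitchEvent) -> bool:
        """Apply an event once; return False for a redelivered duplicate."""
        if event.event_id in self._seen:
            return False
        self._seen.add(event.event_id)
        self.engaged = event.kind == "engaged"
        return True


if __name__ == "__main__":
    consumer = KillSwitchConsumer()
    first = KillSwitchEvent("evt-1", "engaged")
    print(consumer.handle(first), consumer.handle(first), consumer.engaged)
```

Without a rule like this in the event contract, a replayed Engaged event arriving after a Disengaged one could silently re-engage the switch.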

Medium Severity (2)

  • MarketDataStale/Fresh staleness threshold undefined
  • Kill switch disengage race between automatic and manual
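The staleness finding comes down to a missing constant. A sketch of the check the README would need to pin down, where the 5-second threshold and the name `is_stale` are placeholders rather than values from the document:

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 seconds is a placeholder; the README defines no threshold.
STALENESS_THRESHOLD = timedelta(seconds=5)


def is_stale(
    last_tick_at: datetime,
    now: datetime,
    threshold: timedelta = STALENESS_THRESHOLD,
) -> bool:
    """Return True when the newest tick is older than the threshold."""
    return now - last_tick_at > threshold


if __name__ == "__main__":
    now = datetime(2026, 5, 10, 14, 0, 10, tzinfo=timezone.utc)
    print(is_stale(now - timedelta(seconds=3), now))
    print(is_stale(now - timedelta(seconds=8), now))
```

Whatever value is chosen, producers of MarketDataStale/Fresh and consumers such as Portfolio Evaluation need to agree on it, which is why the gap sits at the boundary rather than inside one service.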

Key Insights

Context-level analysis is different from spec-level

Findings are about RELATIONSHIPS (how contexts communicate) rather than MECHANISMS (how components work internally). Several findings identify where the README CLAIMS a service has responsibilities but doesn't document the DATA FLOW enabling them.

Document type shapes finding character

Spec-level experiments find implementation gaps (ETS ownership, race conditions). Context-level experiments find BOUNDARY gaps (event contracts, cross-context timing).

Sonnet handles context-level analysis well

With a structured prompt (5 categories), Sonnet produced a well-organized boundary violation analysis in 19s. The feedback loop finding (#12: pro-forma staleness causing immediate escalation) shows genuine architectural reasoning.

Pending Work

  • Retry Opus and GPT-5 during a lower-contention period
  • Compare whether heavy models find different boundary risks
  • Hypothesis: Opus may excel here (boundary/tension reasoning is its strength)
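The retry could be wrapped in a small harness so that a timeout triggers another attempt instead of ending the run. This is a sketch only: `call_with_retries` and its backoff values are assumptions, and `call` stands for whatever async function actually invokes the model. The 300s default mirrors the timeout reported in the table above.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retries(
    call: Callable[[], Awaitable[T]],
    timeout_s: float = 300.0,
    attempts: int = 3,
    backoff_s: float = 60.0,
) -> T:
    """Await call() with a per-attempt timeout, pausing longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise  # out of attempts; surface the timeout to the caller
            await asyncio.sleep(backoff_s * attempt)
    raise ValueError("attempts must be at least 1")
```

A caller would run it with `asyncio.run(call_with_retries(run_model))`, where `run_model` is a hypothetical coroutine function that sends the prompt to one model. Recording the attempt count alongside each result would help separate contention effects from model latency.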

Practical Implication

For context-level architecture review, Sonnet appears sufficient for first-pass boundary violation scanning, though no heavy-model result is available yet for comparison. Structured prompts keep the analysis focused. Opus is worth running as a second pass for design-tension identification.