# Finding #38: Regulatory Compliance Gap Analysis

- **Date:** 2026-05-07
- **Document:** `docs/impl/dtbp-margin-call.md` (363 lines)
- **Task type:** Domain-specific regulatory knowledge test (FINRA/SEC PDT rules)
- **Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4.6

## Experiment Design

This is the first experiment to test **domain-specific regulatory knowledge** rather than pure architectural reasoning. Models were asked to identify where the implementation design might violate, or inadequately handle, actual FINRA/SEC regulatory requirements around Pattern Day Trader (PDT) rules and margin calls.

The prompt specified 5 categories:

1. Regulatory gaps (FINRA/SEC PDT rules, Reg T requirements)
2. Broker semantic mismatches (API field meanings under real conditions)
3. Temporal edge cases (market boundaries, holidays, early closes)
4. State machine incompleteness (missing states/transitions)
5. Calculation correctness (DTBP arithmetic under specific order patterns)

## Results

| Model | Time | Output tokens | Reasoning tokens | Findings |
|---|---|---|---|---|
| GPT-5 | 155s | 11,734 | 9,024 | 13+ (cut off by content filter) |
| Claude Opus 4.6 | 117s | 5,049 | (internal) | 15 |
| Claude Sonnet 4.6 | ~39s | 1,938 | (internal) | 12 |

**NOTE:** GPT-5's response was terminated by the SAP HAI proxy's content safety filter (financial/trading content triggered it), cutting it off partway through finding #13.
## Common Ground (all 3 identified)

- Short sale DTBP consumption not tracked (buy-only accumulation)
- Options assignment creating untracked DTBP consumption
- Market close/open boundary timing issues
- Margin call detection relying solely on DTBP numeric comparison
- 5-day cure period calendar computation edge cases

## GPT-5 Unique Findings

- `account.buying_power` is already 2× from the broker, so the system double-multiplies it to 4× in overnight mode (a concrete implementation bug)
- After-hours trades consuming DTBP that was already reset at 4pm (`dtbp_used_today` is reset too early for the same-day extended session)
- Premarket DTBP enforcement gap (the broker enforces DTBP in extended hours, but the system uses 2× overnight mode pre-open)
- House/concentration surcharges consuming DTBP faster than the notional cost
- GTC orders executing after hours at 4× sizing while the system is in 2× overnight mode
- FIFO/LIFO matching ambiguity when releasing DTBP on partial sells

## Claude Opus Unique Findings

- **PDT designation trigger gap:** The system passively reads PDT status but doesn't preemptively gate the 4th day trade that CAUSES designation; $25k equity is not verified before the triggering trade.
- **90-day freeze allows day trades:** The design restricts the account to 1× buying power, but FINRA actually PROHIBITS the activity entirely during escalation rather than merely restricting leverage (flagged as a genuine regulatory violation).
- **Margin call issuance date recovery:** If the pipeline is down when a call is issued, the system sets `issued_at` to the detection time rather than the actual issuance time, which extends the cure period beyond the regulatory 5 days.
- **Time-and-tick accounting requirement:** FINRA requires tracking the maximum open commitment (high-water mark) for DTBP, not a net basis; the release logic may violate this.
- **Multiple concurrent margin calls:** A second call upserts over the first, losing the earlier deadline (the single-state-per-user model is inadequate).
- **`dtbp_used_today` NOT reset in margin call mode:** The close-sequence guard (`bp_mode != :margin_dtbp`) skips the reset, causing stale accumulation.
- **Cash account free-riding 90-day freeze:** The broader Reg T scope is not modeled.
- **Broker re-query race on rapid fills:** Response ordering creates a stale DTBP window between consecutive fills.

## Claude Sonnet Unique Findings

- PDT designation timing mismatch (Gargoyle vs. broker overnight batch)
- Wash sale impact on maintenance requirements affecting DTBP (IRS interaction)

## Key Insights

### 1. Regulatory domain expertise varies significantly across models

- **Opus has the deepest regulatory knowledge.** It cited specific FINRA Rule 4210 subsections, understood the distinction between restricting leverage and prohibiting activity, and knew about time-and-tick DTBP accounting.
- **GPT-5 has the deepest broker-API semantic knowledge.** It reasoned about what specific broker API fields actually mean versus what the design assumes (`buying_power` already being 2×, DTBP in extended hours, house surcharges).
- **Sonnet is competent but surface-level.** Its coverage is good for a first pass, but it doesn't match the regulatory depth of Opus or the semantic precision of GPT-5.

### 2. Domain-specific lens changes model ranking

In general assumption-finding (previous experiments):

- GPT-5 > Sonnet > Opus (by count)
- Opus > GPT-5 > Sonnet (by insight per finding)

In regulatory compliance analysis:

- Opus > GPT-5 > Sonnet (by regulatory significance)
- GPT-5 > Opus > Sonnet (by broker-semantic precision)

The regulatory lens ELEVATED Opus, apparently because it drew on domain-specific knowledge that Opus holds more deeply than the other models.

### 3. Content filters as a new failure mode

Enterprise AI proxies may filter financial/regulatory analytical content. GPT-5's response was cut off by the content safety filter, a failure mode not seen in architectural analysis. For production regulatory compliance review, use direct API access or configure the filters to allow analytical discourse.
## Practical Implications

For systems with regulatory requirements (finance, healthcare, legal):

- **Run Opus for regulatory compliance analysis:** its domain knowledge surfaced findings the other models missed.
- **Combine with GPT-5 for implementation semantics:** what does this API field actually mean in practice?
- **Use Sonnet for a fast first pass,** but not as the sole reviewer for regulatory matters.
- **Use direct API access for the financial domain:** enterprise proxy content filters may interfere.

## Comparison to Previous Experiments

This extends the observation from Findings #11 and #13 that task type changes model performance; here we show that task DOMAIN also matters. A model's strength on architectural reasoning did not predict its strength on regulatory reasoning. The optimal model assignment depends on both:

- Task type (assumptions vs. races vs. compliance)
- Task domain (architecture vs. regulation vs. security)
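## Appendix: Sketch of Three Arithmetic Findings

Three of the findings above are plain arithmetic and can be made concrete: GPT-5's overnight double-multiplication, Opus's high-water-mark (time-and-tick) point, and the 5-business-day cure deadline computed from issuance versus detection. The following is a minimal Python sketch under stated assumptions, not code from `dtbp-margin-call.md`: `overnight_limit`, `dtbp_exposure`, and `cure_deadline` are hypothetical names, the 2× overnight multiplier is assumed, the day count assumes day 1 is the first business day after issuance, and the holiday set is supplied by the caller rather than taken from a real exchange calendar.

```python
from datetime import date, timedelta

# All names below are hypothetical; none are taken from dtbp-margin-call.md.

OVERNIGHT_MULTIPLIER = 2  # assumed Reg T overnight leverage (2x equity)


def overnight_limit(broker_buying_power: float, already_leveraged: bool) -> float:
    """GPT-5 finding: applying 2x to a broker figure that is already 2x yields 4x."""
    if already_leveraged:
        return broker_buying_power  # broker has already applied the multiplier
    return broker_buying_power * OVERNIGHT_MULTIPLIER


def dtbp_exposure(fills: list[float]) -> tuple[float, float]:
    """Return (net open exposure, intraday high-water mark).

    fills: signed notional per fill, + for opening buys, - for closing sells.
    A net-basis release model ends at the first value; a high-water-mark
    (time-and-tick style) model reports the second.
    """
    open_exposure = 0.0
    high_water = 0.0
    for notional in fills:
        open_exposure += notional
        high_water = max(high_water, open_exposure)
    return open_exposure, high_water


def cure_deadline(issued_on: date, holidays: set[date], business_days: int = 5) -> date:
    """Count business days after issuance, skipping weekends and given holidays.

    Assumes day 1 is the first business day after issuance; in practice the
    holiday set would come from a real exchange calendar.
    """
    current = issued_on
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


if __name__ == "__main__":
    # Illustrative inputs: $25,000 equity, broker reports $50,000 (already 2x).
    print(overnight_limit(50_000, already_leveraged=True))   # 50000 (2x equity)
    print(overnight_limit(50_000, already_leveraged=False))  # 100000 (4x equity: the bug)

    # Round trips of $40k then $30k: net exposure returns to 0, but the peak was $40k.
    print(dtbp_exposure([40_000, -40_000, 30_000, -30_000]))  # (0.0, 40000.0)

    memorial_day = {date(2026, 5, 25)}
    print(cure_deadline(date(2026, 5, 4), memorial_day))   # 2026-05-11 (from issuance)
    print(cure_deadline(date(2026, 5, 6), memorial_day))   # 2026-05-13 (from detection)
    print(cure_deadline(date(2026, 5, 19), memorial_day))  # 2026-05-27 (skips May 25)
```

With these illustrative dates, keying the deadline off a detection date two days late (2026-05-06 instead of 2026-05-04) pushes it from 2026-05-11 to 2026-05-13, which is the cure-period extension described in the issuance-date finding. The Memorial Day case shows why the calendar computation needs a real holiday source.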