From d27ce6f5e13eda5e152c0569a81d333135b4e4b2 Mon Sep 17 00:00:00 2001
From: claw
Date: Thu, 7 May 2026 07:47:11 -0700
Subject: [PATCH] finding #38: regulatory compliance gap analysis (FINRA/PDT domain knowledge test)

First experiment testing domain-specific regulatory knowledge rather
than pure architectural reasoning. Opus demonstrates deepest FINRA Rule
4210 knowledge; GPT-5 finds broker-API semantic mismatches; content
filters are a new failure mode for financial domain analysis via
enterprise proxies.
---
 ...7-38-regulatory-compliance-gap-analysis.md | 131 ++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 findings/2026-05-07-38-regulatory-compliance-gap-analysis.md

diff --git a/findings/2026-05-07-38-regulatory-compliance-gap-analysis.md b/findings/2026-05-07-38-regulatory-compliance-gap-analysis.md
new file mode 100644
index 0000000..2c2383d
--- /dev/null
+++ b/findings/2026-05-07-38-regulatory-compliance-gap-analysis.md
@@ -0,0 +1,131 @@
+# Finding #38: Regulatory Compliance Gap Analysis
+
+**Date:** 2026-05-07
+**Document:** `docs/impl/dtbp-margin-call.md` (363 lines)
+**Task type:** Domain-specific regulatory knowledge test (FINRA/SEC PDT rules)
+**Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4.6
+
+## Experiment Design
+
+First experiment testing **domain-specific regulatory knowledge** rather than pure
+architectural reasoning. Asked models to identify where the implementation design
+might violate or inadequately handle actual FINRA/SEC regulatory requirements
+around Pattern Day Trader (PDT) rules and margin calls.
+
+Prompt specified 5 categories:
+1. Regulatory gaps (FINRA/SEC PDT rules, Reg T requirements)
+2. Broker semantic mismatches (API field meanings under real conditions)
+3. Temporal edge cases (market boundaries, holidays, early closes)
+4. State machine incompleteness (missing states/transitions)
+5. Calculation correctness (DTBP arithmetic under specific order patterns)
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings |
+|---|---|---|---|---|
+| GPT-5 | 155s | 11,734 | 9,024 | 13+ (cut off by content filter) |
+| Claude Opus 4.6 | 117s | 5,049 | (internal) | 15 |
+| Claude Sonnet 4.6 | ~39s | 1,938 | (internal) | 12 |
+
+**NOTE:** GPT-5's response was terminated by the SAP HAI proxy's content safety
+filter (financial/trading content triggered it), cutting off mid-finding #13.
+
+## Common Ground (all 3 identified)
+
+- Short sale DTBP consumption not tracked (buy-only accumulation)
+- Options assignment creating untracked DTBP consumption
+- Market close/open boundary timing issues
+- Margin call detection relying solely on DTBP numeric comparison
+- 5-day cure period calendar computation edge cases
+
+## GPT-5 Unique Findings
+
+- `account.buying_power` already being 2× from broker → system double-multiplies
+  to 4× in overnight mode (concrete implementation bug)
+- After-hours trades consuming DTBP that resets at 4pm (dtbp_used_today reset too
+  early for same-day extended session)
+- Premarket DTBP enforcement gap (broker enforces DTBP in extended hours but system
+  uses 2× overnight mode pre-open)
+- House/concentration surcharges consuming DTBP faster than notional cost
+- GTC orders executing after-hours at 4× sizing while system is in 2× overnight
+- FIFO/LIFO matching ambiguity for partial sell DTBP release
+
+## Claude Opus Unique Findings
+
+- **PDT designation trigger gap:** System passively reads PDT status but doesn't
+  preemptively gate the 4th day trade that CAUSES designation; $25k equity not
+  verified before triggering trade
+- **90-day freeze allows day trades:** Design restricts to 1× buying power but
+  FINRA actually PROHIBITS the activity entirely during escalation (not just
+  restricts leverage), a genuine regulatory violation
+- **Margin call issuance date recovery:** If pipeline is down when call is issued,
+  system sets issued_at to detection time, not actual issuance → extends cure
+  period beyond regulatory 5 days
+- **Time-and-tick accounting requirement:** FINRA requires tracking maximum open
+  commitment (high water mark) for DTBP, not net basis; the release logic may
+  violate this
+- **Multiple concurrent margin calls:** Second call upserts over first, losing the
+  earlier deadline (single-state-per-user model inadequate)
+- **dtbp_used_today NOT reset in margin call mode:** Close sequence guard
+  (`bp_mode != :margin_dtbp`) skips reset, causing stale accumulation
+- **Cash account free-riding 90-day freeze:** Broader Reg T scope not modeled
+- **Broker re-query race on rapid fills:** Response ordering creates stale DTBP
+  window between consecutive fills
+
+## Claude Sonnet Unique Findings
+
+- PDT designation timing mismatch (Gargoyle vs broker overnight batch)
+- Wash sale impact on maintenance requirements affecting DTBP (IRS interaction)
+
+## Key Insights
+
+### 1. Regulatory domain expertise varies significantly across models
+
+- **Opus has deepest regulatory knowledge.** Cited specific FINRA Rule 4210
+  subsections, understood the distinction between restricting leverage vs
+  prohibiting activity, and knew about time-and-tick DTBP accounting.
+- **GPT-5 has deepest broker-API semantic knowledge.** Reasoned about what
+  specific broker API fields actually mean vs what the design assumes
+  (buying_power already being 2×, DTBP in extended hours, house surcharges).
+- **Sonnet is competent but surface-level.** Good coverage for a first pass
+  but doesn't match regulatory depth of Opus or semantic precision of GPT-5.
+
+### 2. Domain-specific lens changes model ranking
+
+In general assumption-finding (previous experiments):
+- GPT-5 > Sonnet > Opus (by count)
+- Opus > GPT-5 > Sonnet (by insight per finding)
+
+In regulatory compliance analysis:
+- Opus > GPT-5 > Sonnet (by regulatory significance)
+- GPT-5 > Opus > Sonnet (by broker-semantic precision)
+
+The regulatory lens ELEVATED Opus because it triggered domain-specific
+knowledge that Opus possesses more deeply than the other models.
+
+### 3. Content filters as a new failure mode
+
+Enterprise AI proxies may filter financial/regulatory analytical content.
+GPT-5's response was cut off by content safety, a failure mode not seen
+in architectural analysis. For production regulatory compliance review,
+use direct API access or configure filters for analytical discourse.
+
+## Practical Implications
+
+For systems with regulatory requirements (finance, healthcare, legal):
+- **Run Opus for regulatory compliance analysis:** its domain knowledge
+  produces findings other models won't surface
+- **Combine with GPT-5 for implementation semantics:** what does this API
+  field actually mean in practice?
+- **Sonnet for fast first-pass** but not sole reviewer for regulatory matters
+- **Direct API access for financial domain:** enterprise proxy content
+  filters may interfere
+
+## Comparison to Previous Experiments
+
+This extends the finding from #11 and #13 that task type changes model
+performance. Here we show that task DOMAIN also matters. A model's strength
+on architectural reasoning doesn't predict its strength on regulatory
+reasoning. The optimal model assignment depends on both:
+- Task type (assumptions vs races vs compliance)
+- Task domain (architecture vs regulation vs security)
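
The time-and-tick item under Claude Opus Unique Findings contrasts two ways of measuring DTBP consumption: a net basis, where sells release what buys consumed, and a high-water mark of the maximum open commitment during the session. The sketch below is only an illustration of that arithmetic, not the design in `docs/impl/dtbp-margin-call.md`. The `Fill` type and both function names are hypothetical, and it assumes a long-only account with dollar notionals.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Fill:
    """Hypothetical fill record; field names are assumptions for illustration."""
    side: str        # "buy" or "sell"
    notional: float  # dollar value of the fill


def _signed(fill: Fill) -> float:
    # Buys add to the open commitment, sells reduce it (long-only assumption).
    return fill.notional if fill.side == "buy" else -fill.notional


def net_usage(fills: Iterable[Fill]) -> float:
    """DTBP used on a net basis: whatever is still open after the last fill."""
    open_commitment = 0.0
    for fill in fills:
        open_commitment = max(open_commitment + _signed(fill), 0.0)
    return open_commitment


def high_water_usage(fills: Iterable[Fill]) -> float:
    """DTBP used on a high-water-mark basis: the largest open commitment seen."""
    open_commitment = 0.0
    peak = 0.0
    for fill in fills:
        open_commitment = max(open_commitment + _signed(fill), 0.0)
        peak = max(peak, open_commitment)
    return peak


session = [Fill("buy", 40_000.0), Fill("sell", 40_000.0), Fill("buy", 30_000.0)]
print(net_usage(session))         # 30000.0
print(high_water_usage(session))  # 40000.0
```

In this session the two measures differ by $10,000: the net basis reports only the position still open, while the high-water mark keeps the earlier $40,000 peak. That gap is the kind of divergence the finding warns the release logic may hide.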