From d27ce6f5e13eda5e152c0569a81d333135b4e4b2 Mon Sep 17 00:00:00 2001
From: claw <claw@weiker.me>
Date: Thu, 7 May 2026 07:47:11 -0700
Subject: [PATCH] finding #38: regulatory compliance gap analysis (FINRA/PDT
 domain knowledge test)

First experiment testing domain-specific regulatory knowledge rather than
pure architectural reasoning. Opus demonstrates deepest FINRA Rule 4210
knowledge; GPT-5 finds broker-API semantic mismatches; content filters
are a new failure mode for financial domain analysis via enterprise proxies.
---
 ...7-38-regulatory-compliance-gap-analysis.md | 131 ++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 findings/2026-05-07-38-regulatory-compliance-gap-analysis.md

diff --git a/findings/2026-05-07-38-regulatory-compliance-gap-analysis.md b/findings/2026-05-07-38-regulatory-compliance-gap-analysis.md
new file mode 100644
index 0000000..2c2383d
--- /dev/null
+++ b/findings/2026-05-07-38-regulatory-compliance-gap-analysis.md
@@ -0,0 +1,131 @@
+# Finding #38: Regulatory Compliance Gap Analysis
+
+**Date:** 2026-05-07
+**Document:** `docs/impl/dtbp-margin-call.md` (363 lines)
+**Task type:** Domain-specific regulatory knowledge test (FINRA/SEC PDT rules)
+**Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4.6
+
+## Experiment Design
+
+This is the first experiment to test **domain-specific regulatory knowledge** rather
+than pure architectural reasoning. The models were asked to identify where the
+implementation design might violate or inadequately handle actual FINRA/SEC
+regulatory requirements around Pattern Day Trader (PDT) rules and margin calls.
+
+The prompt specified five categories:
+1. Regulatory gaps (FINRA/SEC PDT rules, Reg T requirements)
+2. Broker semantic mismatches (API field meanings under real conditions)
+3. Temporal edge cases (market boundaries, holidays, early closes)
+4. State machine incompleteness (missing states/transitions)
+5. Calculation correctness (DTBP arithmetic under specific order patterns)
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings |
+|---|---|---|---|---|
+| GPT-5 | 155s | 11,734 | 9,024 | 13+ (cut off by content filter) |
+| Claude Opus 4.6 | 117s | 5,049 | (internal) | 15 |
+| Claude Sonnet 4.6 | ~39s | 1,938 | (internal) | 12 |
+
+**NOTE:** GPT-5's response was terminated by the SAP HAI proxy's content safety
+filter (financial/trading content triggered it), cutting off mid-finding #13.
+
+## Common Ground (all 3 identified)
+
+- Short sale DTBP consumption not tracked (buy-only accumulation)
+- Options assignment creating untracked DTBP consumption
+- Market close/open boundary timing issues
+- Margin call detection relying solely on DTBP numeric comparison
+- 5-business-day cure period calendar computation edge cases
+
+## GPT-5 Unique Findings
+
+- `account.buying_power` from the broker is already 2×, so the system
+  double-multiplies it to 4× in overnight mode (concrete implementation bug)
+- After-hours trades consuming DTBP that resets at 4pm (`dtbp_used_today` is
+  reset too early for the same-day extended session)
+- Premarket DTBP enforcement gap (broker enforces DTBP in extended hours but system
+  uses 2× overnight mode pre-open)
+- House/concentration surcharges consuming DTBP faster than notional cost
+- GTC orders executing after-hours at 4× sizing while system is in 2× overnight
+- FIFO/LIFO matching ambiguity for partial sell DTBP release
+
+## Claude Opus Unique Findings
+
+- **PDT designation trigger gap:** System passively reads PDT status but doesn't
+  preemptively gate the 4th day trade that CAUSES designation; $25k equity not
+  verified before triggering trade
+- **90-day freeze allows day trades:** Design restricts to 1× buying power but
+  FINRA actually PROHIBITS the activity entirely during escalation (not just
+  restricts leverage) — a genuine regulatory violation
+- **Margin call issuance date recovery:** If the pipeline is down when a call is
+  issued, the system sets `issued_at` to detection time, not actual issuance,
+  which extends the cure period beyond the regulatory five business days
+- **Time-and-tick accounting requirement:** FINRA requires tracking maximum open
+  commitment (high water mark) for DTBP, not net basis — the release logic may
+  violate this
+- **Multiple concurrent margin calls:** Second call upserts over first, losing the
+  earlier deadline (single-state-per-user model inadequate)
+- **`dtbp_used_today` not reset in margin call mode:** The close-sequence guard
+  (`bp_mode != :margin_dtbp`) skips the reset, causing stale accumulation
+- **Cash account free-riding 90-day freeze:** Broader Reg T scope not modeled
+- **Broker re-query race on rapid fills:** Response ordering creates stale DTBP
+  window between consecutive fills
+
+## Claude Sonnet Unique Findings
+
+- PDT designation timing mismatch (Gargoyle vs broker overnight batch)
+- Wash sale impact on maintenance requirements affecting DTBP (IRS interaction)
+
+## Key Insights
+
+### 1. Regulatory domain expertise varies significantly across models
+
+- **Opus has the deepest regulatory knowledge.** It cited specific FINRA Rule 4210
+  subsections, understood the distinction between restricting leverage and
+  prohibiting activity, and knew about time-and-tick DTBP accounting.
+- **GPT-5 has the deepest broker-API semantic knowledge.** It reasoned about what
+  specific broker API fields actually mean versus what the design assumes
+  (`buying_power` already being 2×, DTBP in extended hours, house surcharges).
+- **Sonnet is competent but surface-level.** Good coverage for a first pass, but
+  it matches neither the regulatory depth of Opus nor the semantic precision of GPT-5.
+
+### 2. Domain-specific lens changes model ranking
+
+In general assumption-finding (previous experiments):
+- GPT-5 > Sonnet > Opus (by count)
+- Opus > GPT-5 > Sonnet (by insight per finding)
+
+In regulatory compliance analysis:
+- Opus > GPT-5 > Sonnet (by regulatory significance)
+- GPT-5 > Opus > Sonnet (by broker-semantic precision)
+
+The regulatory lens elevated Opus, apparently because it drew on domain-specific
+knowledge that Opus holds in more depth than the other models.
+
+### 3. Content filters as a new failure mode
+
+Enterprise AI proxies may filter financial/regulatory analytical content.
+GPT-5's response was cut off by the proxy's content safety filter, a failure mode
+not seen in architectural analysis. For production regulatory compliance review,
+use direct API access or configure the filters to allow analytical content.
+
+## Practical Implications
+
+For systems with regulatory requirements (finance, healthcare, legal):
+- **Run Opus for regulatory compliance analysis:** in this experiment its domain
+  knowledge produced findings the other models did not surface
+- **Combine with GPT-5 for implementation semantics:** what does this API
+  field actually mean in practice?
+- **Use Sonnet for a fast first pass**, but not as the sole reviewer for regulatory matters
+- **Use direct API access for the financial domain:** enterprise proxy content
+  filters may interfere
+
+## Comparison to Previous Experiments
+
+This extends the finding from #11 and #13 that task type changes model
+performance. This experiment suggests that task domain also matters: a model's
+strength on architectural reasoning doesn't predict its strength on regulatory
+reasoning. The optimal model assignment depends on both:
+- Task type (assumptions vs races vs compliance)
+- Task domain (architecture vs regulation vs security)