Finding 28: Regulatory compliance analysis on wash sale tracking

- GPT-5 most comprehensive on IRS-specific rules (18 findings, 9600 reasoning tokens)
- Sonnet fast first-pass (14 findings in 25s)
- Opus high-density actionable (11 findings with clear remediation)
- New insight: domain expertise tasks favor GPT-5 reasoning depth
- Updated model assignment for compliance review workflow
2026-05-11 00:29:12 -07:00
parent 2b10595bff
commit ac55ecdb98
1 changed files with 107 additions and 0 deletions
@@ -0,0 +1,107 @@
+# Finding 28: Regulatory Compliance Assumption-Finding on Wash Sale Tracking
+
+**Date:** 2026-05-11
+**Task:** Regulatory compliance assumption-finding on tax law specification
+**Document:** `wash-sale-tracking.md` (127 lines)
+**Topic:** IRC §1091 wash sale detection, disallowed loss computation, holding period tacking
+
+## Summary
+
+GPT-5 is the most comprehensive on domain-specific IRS rules (18 findings); Opus produces high-density, actionable recommendations (11 findings); Sonnet is strong on structural compliance gaps and is the fastest of the three (14 findings in ~25s).
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
+|---|---|---|---|---|---|---|---|
+| GPT-5 | ~160s | 2,832 | 9,600 | 18 | 2 | 11 | 5 |
+| Claude Sonnet 4.6 | ~25s | 1,614 | (internal) | 14 | 3 | 5 | 6 |
+| Claude Opus 4.6 | ~77s | 2,688 | (internal) | 11 | 2 | 4 | 5 |
+
+## Common Ground (All 3 Identified)
+
+- "Substantially identical" definition too narrow (misses options, convertibles, ADRs)
+- No cross-account/cross-entity detection (IRS rule applies to spouse, controlled corp)
+- Trade date vs settlement date ambiguity
+- Short sale wash sale rules not addressed
+- Corporate action interactions (splits, mergers creating replacement lots)
+- DRIP purchases should trigger wash sale detection
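
The trade date vs settlement date ambiguity in the list above can be made concrete with a small window check. This is a minimal sketch, not the reviewed specification's logic: `in_wash_sale_window` is a hypothetical helper, and it assumes the window covers 30 calendar days before and after the loss sale date.

```python
from datetime import date, timedelta

# Assumption: 30 calendar days on either side of the loss sale date.
WINDOW = timedelta(days=30)


def in_wash_sale_window(sale_date: date, acquisition_date: date) -> bool:
    """Return True if the acquisition falls within 30 days of the sale."""
    return abs(acquisition_date - sale_date) <= WINDOW


sale = date(2026, 3, 2)
purchase_trade_date = date(2026, 4, 1)   # 30 days after the sale
purchase_settle_date = date(2026, 4, 2)  # 31 days after the sale

# The same purchase is inside the window by trade date and outside it by
# settlement date, so the document must state which date it uses.
by_trade = in_wash_sale_window(sale, purchase_trade_date)
by_settlement = in_wash_sale_window(sale, purchase_settle_date)
```

A purchase near the edge of the window flips between "replacement" and "not a replacement" depending on the date convention, which is why all three models flagged the ambiguity.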
+
+## GPT-5 Unique Findings
+
+1. **IRA/Roth IRA permanent disallowance** — If replacement is in IRA, loss is PERMANENTLY disallowed (no basis adjustment permitted), not just deferred. The doc's model is incorrect for IRA replacements.
+2. **Per-share vs lot-level granularity** — Doc says "replacement lot's effective cost basis" but IRS requires per-share matching within lots.
+3. **Multiple replacement lot allocation order** — IRS requires FIFO chronological matching; doc's simple min() formula doesn't handle multiple replacements.
+4. **Pre-sale purchase "still held" requirement** — Shares bought 30 days before sale must still be held ON sale date to count as replacement.
+5. **Replacement chain propagation** — Successive wash sales need to layer basis adjustments through multiple lots.
+6. **§475(f) mark-to-market trader carve-out** — MTM traders are exempt from wash sale rules.
+7. **Global caps across multiple replacement lots** — Sum of disallowed losses can't exceed total loss.
+8. **Conversions/exercise/assignment as acquisitions** — Option exercise counts as acquisition.
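
Findings 1, 3, and 7 above describe an allocation procedure that can be sketched in Python. This is an illustration under stated assumptions, not the specification's implementation: `ReplacementLot`, `allocate_disallowed_loss`, and all field names are hypothetical, and the sketch assumes a single loss sale with a uniform loss per share and replacement lots already filtered to the wash sale window.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class ReplacementLot:
    lot_id: str            # hypothetical identifier
    acquired: date         # acquisition date (date convention is an assumption)
    shares: Decimal        # shares not yet matched to an earlier loss sale
    in_ira: bool = False   # replacement acquired inside an IRA/Roth IRA


def allocate_disallowed_loss(loss_per_share, shares_sold, lots):
    """Match loss shares to replacement lots, earliest acquisition first.

    Returns (allocations, allowed_loss). Each allocation is a tuple of
    (lot_id, matched_shares, disallowed_amount, basis_adjustment).
    """
    remaining = shares_sold
    allocations = []
    for lot in sorted(lots, key=lambda item: item.acquired):
        if remaining <= 0:
            break
        matched = min(remaining, lot.shares)  # per-share matching within a lot
        if matched <= 0:
            continue
        disallowed = matched * loss_per_share
        # Finding 1: an IRA replacement receives no basis adjustment,
        # so the disallowed amount is lost rather than deferred.
        basis_adjustment = Decimal("0") if lot.in_ira else disallowed
        allocations.append((lot.lot_id, matched, disallowed, basis_adjustment))
        remaining -= matched
    # Finding 7: matched shares never exceed shares sold, so the total
    # disallowed amount cannot exceed the total loss.
    allowed_loss = remaining * loss_per_share
    return allocations, allowed_loss
```

For example, selling 100 shares at a $5 per-share loss against a 50-share IRA lot acquired first and a 30-share taxable lot acquired later matches 80 shares in acquisition order and leaves the loss on the remaining 20 shares allowed.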
+
+## Sonnet Unique Findings
+
+1. Constructive sales and short-against-the-box interactions
+2. Related party trust/estate transactions (IRC §1091 extends to trusts)
+3. Year-end timing boundary issues (December losses with January replacements span tax years)
+4. Broker 1099-B reconciliation differences creating user confusion
+5. Fractional share rounding errors compounding over many wash sales
+
+## Opus Unique Findings
+
+1. **Holding period tacking should ADD periods, not just backdate** — Doc says "backdated" but correct rule is sum of holding periods
+2. Same-day transaction ordering race condition
+3. (Note: one of Opus's findings referenced the wrong document, a rare analytical error.)
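
The difference between backdating and adding holding periods (Opus finding 1) shows up whenever there is a gap between the loss sale and the replacement purchase. The sketch below is a hedged illustration: `backdated_start` and `tacked_start` are hypothetical names, and it uses plain calendar-day arithmetic, ignoring conventions such as whether the holding period starts on the day after acquisition.

```python
from datetime import date


def backdated_start(original_acquired: date) -> date:
    """Backdating: reuse the original lot's acquisition date as-is."""
    return original_acquired


def tacked_start(original_acquired: date, loss_sale: date,
                 replacement_acquired: date) -> date:
    """Tacking: add the old lot's holding period to the replacement lot."""
    prior_holding = loss_sale - original_acquired
    return replacement_acquired - prior_holding


original = date(2026, 1, 2)      # original purchase
sale = date(2026, 3, 2)          # loss sale, 59 days later
replacement = date(2026, 3, 20)  # replacement purchase, 18 days after the sale

# Backdating counts the 18-day gap in which no shares were held;
# tacking excludes it.
gap_days = (tacked_start(original, sale, replacement)
            - backdated_start(original)).days
```

In this example the two approaches disagree by the 18 days between the sale and the replacement purchase, which can matter when a position is close to the long-term holding threshold.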
+
+## Key Insights
+
+### Domain Expertise Matters
+
+This experiment tested a domain-specific analytical task (IRS tax law) rather than general architectural analysis. The results show:
+
+1. **GPT-5's reasoning tokens excel at regulatory specificity.** It consistently cited correct IRS publication sections, code numbers, and regulatory references. The 9,600 reasoning tokens appear to have been spent on verification.
+
+2. **Opus is more reliable on regulatory tasks than on general architecture review.** 10/11 findings were precise and actionable. Domain specificity constrains Opus's tendency toward exploratory analysis.
+
+3. **Sonnet is well-suited for a compliance first pass.** At 25s for 14 findings, it provides good breadth for identifying areas of concern.
+
+### Severity Distribution
+
+| Model | Critical | High | Medium | Total |
+|---|---|---|---|---|
+| GPT-5 | 2 | 11 | 5 | 18 |
+| Sonnet | 3 | 5 | 6 | 14 |
+| Opus | 2 | 4 | 5 | 11 |
+
+Sonnet rated more findings as Critical (3) than GPT-5 (2) despite reporting fewer findings overall (14 vs 18), and more than Opus (2 of 11), suggesting that the models calibrate severity differently.
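
The calibration difference is easier to compare as the share of each model's findings rated Critical. The short calculation below uses only the counts from the table above; the variable names are illustrative.

```python
# (critical, total) counts taken from the severity table above.
counts = {"GPT-5": (2, 18), "Sonnet": (3, 14), "Opus": (2, 11)}

# Fraction of each model's findings that it rated Critical.
critical_share = {
    model: critical / total for model, (critical, total) in counts.items()
}

for model, share in critical_share.items():
    print(f"{model}: {share:.1%} of findings rated Critical")
```

This works out to roughly 11% for GPT-5, 21% for Sonnet, and 18% for Opus.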
+
+## Practical Implication
+
+### Updated Model Assignment for Compliance Review
+
+1. **First pass: Sonnet** — Fast, broad coverage, identifies areas of concern
+2. **Deep dive: GPT-5** — Regulatory specificity, correct citations, highest precision
+3. **Recommendations: Opus** — Actionable remediation, design-level scope statements
+
+## Prompt Used
+
+```
+Analyze this architecture document for regulatory compliance assumptions.
+
+The document describes wash sale tracking for IRS tax compliance. Your task: identify
+assumptions the design makes about regulatory requirements, edge cases where the
+implementation might diverge from IRS rules, and places where the simplifications
+described could lead to incorrect tax reporting.
+
+Focus on:
+1. IRS rule interpretation gaps
+2. Edge cases the detection model might miss
+3. Calculation edge cases that could produce incorrect disallowed amounts
+4. Cross-account/cross-entity issues
+5. Timing/settlement date issues
+6. Corporate action interactions with wash sale tracking
+
+For each finding, specify:
+- The assumption or gap
+- The relevant IRS rule or section (if known)
+- How it could lead to incorrect tax reporting
+- Severity (Critical/High/Medium/Low)
+```