diff --git a/findings/2026-05-11-wash-sale-regulatory-compliance.md b/findings/2026-05-11-wash-sale-regulatory-compliance.md
new file mode 100644
index 0000000..c7dedd3
--- /dev/null
+++ b/findings/2026-05-11-wash-sale-regulatory-compliance.md
@@ -0,0 +1,107 @@
+# Finding 28: Regulatory Compliance Assumption-Finding on Wash Sale Tracking
+
+**Date:** 2026-05-11
+**Task:** Regulatory compliance assumption-finding on tax law specification
+**Document:** `wash-sale-tracking.md` (127 lines)
+**Topic:** IRC §1091 wash sale detection, disallowed loss computation, holding period tacking
+
+## Summary
+
+GPT-5 is the most comprehensive on domain-specific IRS rules; Opus produces high-density, actionable recommendations; Sonnet is strong on structural compliance gaps.
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
+|---|---|---|---|---|---|---|---|
+| GPT-5 | ~160s | 2,832 | 9,600 | 18 | 2 | 11 | 5 |
+| Claude Sonnet 4.6 | ~25s | 1,614 | (internal) | 14 | 3 | 5 | 6 |
+| Claude Opus 4.6 | ~77s | 2,688 | (internal) | 11 | 2 | 4 | 5 |
+
+## Common Ground (All 3 Identified)
+
+- "Substantially identical" definition too narrow (misses options, convertibles, ADRs)
+- No cross-account/cross-entity detection (IRS rule applies to spouse, controlled corp)
+- Trade date vs settlement date ambiguity
+- Short sale wash sale rules not addressed
+- Corporate action interactions (splits, mergers creating replacement lots)
+- DRIP purchases should trigger wash sale detection
+
+## GPT-5 Unique Findings
+
+1. **IRA/Roth IRA permanent disallowance** — If the replacement is in an IRA, the loss is PERMANENTLY disallowed (no basis adjustment permitted), not just deferred. The doc's model is incorrect for IRA replacements.
+2. **Per-share vs lot-level granularity** — Doc says "replacement lot's effective cost basis" but IRS requires per-share matching within lots.
+3. **Multiple replacement lot allocation order** — IRS requires FIFO chronological matching; doc's simple min() formula doesn't handle multiple replacements.
+4. **Pre-sale purchase "still held" requirement** — Shares bought within the 30 days before the sale must still be held ON the sale date to count as replacement.
+5. **Replacement chain propagation** — Successive wash sales need to layer basis adjustments through multiple lots.
+6. **§475(f) mark-to-market trader carve-out** — MTM traders are exempt from wash sale rules.
+7. **Global caps across multiple replacement lots** — Sum of disallowed losses can't exceed total loss.
+8. **Conversions/exercise/assignment as acquisitions** — Option exercise counts as acquisition.
+
+## Sonnet Unique Findings
+
+1. Constructive sales and short-against-the-box interactions
+2. Related party trust/estate transactions (IRC §1091 extends to trusts)
+3. Year-end timing boundary issues (December losses with January replacements span tax years)
+4. Broker 1099-B reconciliation differences creating user confusion
+5. Fractional share rounding errors compounding over many wash sales
+
+## Opus Unique Findings
+
+1. **Holding period tacking should ADD periods, not just backdate** — Doc says "backdated" but the correct rule is the sum of holding periods
+2. Same-day transaction ordering race condition
+3. (Note: One finding referenced the wrong document, a rare analytical error)
+
+## Key Insights
+
+### Domain Expertise Matters
+
+This experiment tested a domain-specific analytical task (IRS tax law) rather than general architectural analysis. The results show:
+
+1. **GPT-5's reasoning tokens excel at regulatory specificity.** It consistently cited correct IRS publication sections, code numbers, and regulatory references. The 9,600 reasoning tokens appear to have been spent on VERIFICATION.
+
+2. **Opus is MORE reliable on regulatory tasks than general architecture review.** 10/11 findings were precise and actionable. Domain specificity constrains Opus's tendency toward exploratory analysis.
+
+3. **Sonnet is well-suited for compliance first-pass.** At 25s for 14 findings, it provides good breadth for identifying areas of concern.
+
+### Severity Distribution
+
+| Model | Critical | High | Medium | Total |
+|---|---|---|---|---|
+| GPT-5 | 2 | 11 | 5 | 18 |
+| Sonnet | 3 | 5 | 6 | 14 |
+| Opus | 2 | 4 | 5 | 11 |
+
+Sonnet rated MORE findings as Critical (3) than either other model (2 each) despite reporting fewer findings than GPT-5 (14 vs. 18), suggesting different severity calibration.
+
+## Practical Implication
+
+### Updated Model Assignment for Compliance Review
+
+1. **First pass: Sonnet** — Fast, broad coverage, identifies areas of concern
+2. **Deep dive: GPT-5** — Regulatory specificity, correct citations, highest precision
+3. **Recommendations: Opus** — Actionable remediation, design-level scope statements
+
+## Prompt Used
+
+```
+Analyze this architecture document for regulatory compliance assumptions.
+
+The document describes wash sale tracking for IRS tax compliance. Your task: identify
+assumptions the design makes about regulatory requirements, edge cases where the
+implementation might diverge from IRS rules, and places where the simplifications
+described could lead to incorrect tax reporting.
+
+Focus on:
+1. IRS rule interpretation gaps
+2. Edge cases the detection model might miss
+3. Calculation edge cases that could produce incorrect disallowed amounts
+4. Cross-account/cross-entity issues
+5. Timing/settlement date issues
+6. Corporate action interactions with wash sale tracking
+
+For each finding, specify:
+- The assumption or gap
+- The relevant IRS rule or section (if known)
+- How it could lead to incorrect tax reporting
+- Severity (Critical/High/Medium/Low)
+```