Finding 28: Regulatory compliance analysis on wash sale tracking

- GPT-5 most comprehensive on IRS-specific rules (18 findings, 9600 reasoning tokens)
- Sonnet fast first-pass (14 findings in 25s)
- Opus high-density actionable (11 findings with clear remediation)
- New insight: domain expertise tasks favor GPT-5 reasoning depth
- Updated model assignment for compliance review workflow
Rodin
2026-05-11 00:29:12 -07:00
parent 2b10595bff
commit ac55ecdb98
@@ -0,0 +1,107 @@
# Finding 28: Regulatory Compliance Assumption-Finding on Wash Sale Tracking
**Date:** 2026-05-11

**Task:** Regulatory compliance assumption-finding on tax law specification

**Document:** `wash-sale-tracking.md` (127 lines)

**Topic:** IRC §1091 wash sale detection, disallowed loss computation, holding period tacking
## Summary
GPT-5 is most comprehensive on domain-specific IRS rules; Opus produces high-density actionable recommendations; Sonnet is strong on structural compliance gaps.
## Results
| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
|---|---|---|---|---|---|---|---|
| GPT-5 | ~160s | 2,832 | 9,600 | 18 | 2 | 11 | 5 |
| Claude Sonnet 4.6 | ~25s | 1,614 | (internal) | 14 | 3 | 5 | 6 |
| Claude Opus 4.6 | ~77s | 2,688 | (internal) | 11 | 2 | 4 | 5 |
## Common Ground (All 3 Identified)
- "Substantially identical" definition too narrow (misses options, convertibles, ADRs)
- No cross-account/cross-entity detection (IRS rule applies to spouse, controlled corp)
- Trade date vs settlement date ambiguity
- Short sale wash sale rules not addressed
- Corporate action interactions (splits, mergers creating replacement lots)
- DRIP purchases should trigger wash sale detection
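Several of the common-ground gaps (cross-account detection, trade date vs settlement date, DRIP purchases) come down to how replacement candidates are gathered. The Python sketch below is illustrative only. `Trade`, `replacement_candidates`, and the account labels are hypothetical names. The caller is assumed to supply the set of related accounts (taxpayer, spouse, IRAs, controlled corporation). Plain symbol equality stands in for the much broader "substantially identical" test that the first finding calls too narrow.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Trade:                  # hypothetical trade record
    account: str              # e.g. "taxable", "spouse-ira"
    symbol: str               # stand-in for a "substantially identical" grouping key
    trade_date: date          # trade date, not settlement date, drives the window
    shares: float
    is_buy: bool              # DRIP reinvestments are buys too

WINDOW = timedelta(days=30)   # 30 days before through 30 days after the sale

def replacement_candidates(sale: Trade, trades: list[Trade],
                           related_accounts: set[str]) -> list[Trade]:
    """Buys of the same security in any related account inside the 61-day window.

    The caller is assumed to have already confirmed that `sale` is a loss sale.
    """
    start, end = sale.trade_date - WINDOW, sale.trade_date + WINDOW
    return [
        t for t in trades
        if t.is_buy
        and t.account in related_accounts
        and t.symbol == sale.symbol
        and start <= t.trade_date <= end
    ]

sale = Trade("taxable", "XYZ", date(2026, 3, 2), 100, is_buy=False)
trades = [
    sale,
    Trade("spouse-ira", "XYZ", date(2026, 3, 20), 50, is_buy=True),      # inside window
    Trade("taxable", "XYZ", date(2026, 4, 15), 10, is_buy=True),         # 44 days later: outside
    Trade("unrelated-party", "XYZ", date(2026, 3, 5), 20, is_buy=True),  # not in the related set
]
hits = replacement_candidates(sale, trades, {"taxable", "spouse-ira"})
print([(t.account, t.trade_date.isoformat()) for t in hits])
```

Keying the window on `trade_date` and treating every buy, including a DRIP reinvestment, as a candidate is meant to reflect three of the shared findings; options, convertibles, short sales, and corporate actions would still need their own handling.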
## GPT-5 Unique Findings
1. **IRA/Roth IRA permanent disallowance** — If replacement is in IRA, loss is PERMANENTLY disallowed (no basis adjustment permitted), not just deferred. The doc's model is incorrect for IRA replacements.
2. **Per-share vs lot-level granularity** — Doc says "replacement lot's effective cost basis" but IRS requires per-share matching within lots.
3. **Multiple replacement lot allocation order** — IRS requires FIFO chronological matching; doc's simple min() formula doesn't handle multiple replacements.
4. **Pre-sale purchase "still held" requirement** — Shares bought 30 days before sale must still be held ON sale date to count as replacement.
5. **Replacement chain propagation** — Successive wash sales need to layer basis adjustments through multiple lots.
6. **§475(f) mark-to-market trader carve-out** — MTM traders are exempt from wash sale rules.
7. **Global caps across multiple replacement lots** — Sum of disallowed losses can't exceed total loss.
8. **Conversions/exercise/assignment as acquisitions** — Option exercise counts as acquisition.
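GPT-5 findings 1, 2, 3 and 7 interact in a single allocation step. Below is a minimal sketch under several assumptions: hypothetical `ReplacementLot` and `Allocation` types, one loss sale with a uniform per-share loss, and candidates already filtered to the 61-day window. Loss shares are matched to replacement shares in order of acquisition (the FIFO ordering GPT-5 describes), and a lot may be split share by share. IRA matches are recorded as permanently lost rather than added to basis. The total disallowed amount cannot exceed the total loss because no more than the sold share count is ever matched. Replacement chains (finding 5) and the §475(f) carve-out (finding 6) are not modeled.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class ReplacementLot:          # hypothetical lot record
    lot_id: str
    acquired_order: int        # chronological position of the acquisition
    shares: Decimal
    in_ira: bool               # IRA or Roth IRA replacement

@dataclass(frozen=True)
class Allocation:
    lot_id: str
    shares: Decimal            # matched shares; may be only part of the lot
    amount: Decimal            # disallowed loss attributed to those shares
    deferred: bool             # True: add to basis; False: permanently lost (IRA)

def allocate_wash_sale(loss_shares: Decimal, loss_per_share: Decimal,
                       replacements: list[ReplacementLot]) -> list[Allocation]:
    """Match sold loss shares to replacement shares, earliest acquisition first."""
    allocations: list[Allocation] = []
    remaining = loss_shares
    for lot in sorted(replacements, key=lambda r: r.acquired_order):
        if remaining <= 0:
            break                          # cap: never match more shares than were sold
        matched = min(remaining, lot.shares)
        allocations.append(Allocation(
            lot_id=lot.lot_id,
            shares=matched,
            amount=matched * loss_per_share,
            deferred=not lot.in_ira,       # IRA match: no basis adjustment allowed
        ))
        remaining -= matched
    return allocations

# 100 shares sold at a $2/share loss ($200 total); lots listed out of order on purpose.
lots = [
    ReplacementLot("C", 3, Decimal("50"), in_ira=False),
    ReplacementLot("A", 1, Decimal("60"), in_ira=False),
    ReplacementLot("B", 2, Decimal("30"), in_ira=True),
]
allocs = allocate_wash_sale(Decimal("100"), Decimal("2"), lots)
for a in allocs:
    print(a.lot_id, a.shares, a.amount, "deferred" if a.deferred else "permanently lost")
```

Here 60 + 30 + 10 = 100 shares are matched, so the whole $200 loss is disallowed: $140 shifts into taxable replacement basis and $60 is lost to the IRA match. Lot C keeps 40 unmatched shares with no adjustment, which a single lot-level basis figure could not express.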
## Sonnet Unique Findings
1. Constructive sales and short-against-the-box interactions
2. Related party trust/estate transactions (IRC §1091 extends to trusts)
3. Year-end timing boundary issues (December losses with January replacements span tax years)
4. Broker 1099-B reconciliation differences creating user confusion
5. Fractional share rounding errors compounding over many wash sales
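Sonnet's year-end finding follows from the 30-day look-forward window: a loss sale late in December cannot be classified until the window closes in January of the following year. A small sketch under that assumption; the function names are hypothetical, and it assumes calendar-year taxpayers and trade dates only.

```python
from datetime import date, timedelta

LOOK_FORWARD = timedelta(days=30)

def window_close(sale_date: date) -> date:
    """Last trade date on which a buy can still wash this loss sale."""
    return sale_date + LOOK_FORWARD

def crosses_year_end(sale_date: date) -> bool:
    """True when the look-forward window ends in the next calendar (tax) year."""
    return window_close(sale_date).year > sale_date.year

def is_final(sale_date: date, as_of: date) -> bool:
    """Whether the loss is allowed can only be settled after the window has closed."""
    return as_of > window_close(sale_date)

dec_sale = date(2026, 12, 15)
nov_sale = date(2026, 11, 20)
print(window_close(dec_sale))                                  # 2027-01-14
print(crosses_year_end(dec_sale), crosses_year_end(nov_sale))  # True False
print(is_final(dec_sale, as_of=date(2026, 12, 31)))            # False
```

A report generated on December 31 would therefore need to treat such December losses as provisional, since a January purchase could still wash them.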
## Opus Unique Findings
1. **Holding period tacking should ADD periods, not just backdate** — Doc says "backdated" but correct rule is sum of holding periods
2. Same-day transaction ordering race condition
3. (Note: one finding referenced the wrong document, a rare analytical error.)
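Opus's tacking point is easiest to see with dates. This sketch assumes that "backdated" in the spec means resetting the replacement lot's acquisition date to the original lot's acquisition date; that is an interpretation, not something the source confirms. The helper names are hypothetical, and day counts use plain date subtraction.

```python
from datetime import date

def tacked_holding_days(old_acquired: date, old_sold: date,
                        new_acquired: date, as_of: date) -> int:
    """Old lot's holding period plus the replacement's; the sale-to-repurchase gap is excluded."""
    return (old_sold - old_acquired).days + (as_of - new_acquired).days

def backdated_holding_days(old_acquired: date, as_of: date) -> int:
    """Assumed reading of 'backdated': count from the original lot's purchase date."""
    return (as_of - old_acquired).days

old_acquired = date(2025, 1, 10)
old_sold = date(2025, 9, 1)       # loss sale
new_acquired = date(2025, 9, 20)  # replacement, 19 days later
as_of = date(2026, 1, 15)

tacked = tacked_holding_days(old_acquired, old_sold, new_acquired, as_of)
backdated = backdated_holding_days(old_acquired, as_of)
print(tacked, backdated)  # 351 370
```

The 19-day gap between sale and repurchase accounts for the whole difference. Backdating yields 370 days and crosses the one-year mark, while the summed holding period is 351 days and stays short-term.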
## Key Insights
### Domain Expertise Matters
This experiment tested a domain-specific analytical task (IRS tax law) rather than general architectural analysis. The results show:
1. **GPT-5's reasoning tokens excel at regulatory specificity.** It consistently cited correct IRS publication sections, IRC section numbers, and regulatory references. The 9,600 reasoning tokens appear to have been spent on VERIFICATION.
2. **Opus is MORE reliable on regulatory tasks than on general architecture review.** 10 of its 11 findings were precise and actionable. Domain specificity appears to constrain Opus's tendency toward exploratory analysis.
3. **Sonnet is well-suited for a compliance first pass.** At 25s for 14 findings, it provides good breadth for identifying areas of concern.
### Severity Distribution
| Model | Critical | High | Medium | Total |
|---|---|---|---|---|
| GPT-5 | 2 | 11 | 5 | 18 |
| Sonnet | 3 | 5 | 6 | 14 |
| Opus | 2 | 4 | 5 | 11 |

Sonnet rated MORE findings as Critical (3) than GPT-5 or Opus (2 each), despite finding fewer issues overall than GPT-5, suggesting different severity calibration.
## Practical Implication
### Updated Model Assignment for Compliance Review
1. **First pass: Sonnet** — Fast, broad coverage, identifies areas of concern
2. **Deep dive: GPT-5** — Regulatory specificity, correct citations, highest precision
3. **Recommendations: Opus** — Actionable remediation, design-level scope statements
## Prompt Used
```
Analyze this architecture document for regulatory compliance assumptions.
The document describes wash sale tracking for IRS tax compliance. Your task: identify
assumptions the design makes about regulatory requirements, edge cases where the
implementation might diverge from IRS rules, and places where the simplifications
described could lead to incorrect tax reporting.
Focus on:
1. IRS rule interpretation gaps
2. Edge cases the detection model might miss
3. Calculation edge cases that could produce incorrect disallowed amounts
4. Cross-account/cross-entity issues
5. Timing/settlement date issues
6. Corporate action interactions with wash sale tracking
For each finding, specify:
- The assumption or gap
- The relevant IRS rule or section (if known)
- How it could lead to incorrect tax reporting
- Severity (Critical/High/Medium/Low)
```