Finding 28: Regulatory compliance analysis on wash sale tracking
- GPT-5 most comprehensive on IRS-specific rules (18 findings, 9,600 reasoning tokens)
- Sonnet fast first-pass (14 findings in 25s)
- Opus high-density actionable (11 findings with clear remediation)
- New insight: domain expertise tasks favor GPT-5 reasoning depth
- Updated model assignment for compliance review workflow
@@ -0,0 +1,107 @@
# Finding 28: Regulatory Compliance Assumption-Finding on Wash Sale Tracking

**Date:** 2026-05-11
**Task:** Regulatory compliance assumption-finding on tax law specification
**Document:** `wash-sale-tracking.md` (127 lines)
**Topic:** IRC §1091 wash sale detection, disallowed loss computation, holding period tacking

## Summary

GPT-5 is most comprehensive on domain-specific IRS rules; Opus produces high-density actionable recommendations; Sonnet is strong on structural compliance gaps.

## Results

| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
|---|---|---|---|---|---|---|---|
| GPT-5 | ~160s | 2,832 | 9,600 | 18 | 2 | 11 | 5 |
| Claude Sonnet 4.6 | ~25s | 1,614 | (internal) | 14 | 3 | 5 | 6 |
| Claude Opus 4.6 | ~77s | 2,688 | (internal) | 11 | 2 | 4 | 5 |

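
As a rough derived metric (not part of the original measurements), the table can be turned into findings per minute. This is a small sketch that only rearranges the figures above; the times are approximate, so the rates are too.

```python
# Figures copied from the Results table; times are approximate ("~").
results = {
    "GPT-5": {"seconds": 160, "findings": 18},
    "Claude Sonnet 4.6": {"seconds": 25, "findings": 14},
    "Claude Opus 4.6": {"seconds": 77, "findings": 11},
}


def findings_per_minute(row):
    """Findings produced per minute of wall-clock time."""
    return row["findings"] * 60 / row["seconds"]


rates = {model: round(findings_per_minute(row), 2) for model, row in results.items()}

for model, rate in rates.items():
    print(f"{model}: {rate} findings/min")
```

The rate says nothing about finding quality; it only quantifies why Sonnet is attractive as a first pass.
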
## Common Ground (All 3 Identified)

- "Substantially identical" definition is too narrow (misses options, convertibles, ADRs)
- No cross-account/cross-entity detection (the IRS rule also reaches purchases by a spouse or a controlled corporation)
- Trade date vs settlement date ambiguity
- Short sale wash sale rules not addressed
- Corporate action interactions (splits, mergers creating replacement lots) not handled
- DRIP purchases should trigger wash sale detection

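
Two of the shared findings (cross-account detection and the date ambiguity) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Trade` fields, the `owner_group` key that pools accounts reviewed together, and the choice of trade date rather than settlement date. Matching on `symbol` alone is exactly the narrow "substantially identical" definition flagged above, so a real implementation would need a broader test.

```python
from dataclasses import dataclass
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # 30 days before or after the sale


@dataclass(frozen=True)
class Trade:
    account: str       # hypothetical account identifier
    owner_group: str   # hypothetical key pooling related accounts (e.g. spouses)
    symbol: str
    side: str          # "BUY" or "SELL"
    trade_date: date   # assumption: trade date, not settlement date


def candidate_replacements(sale, trades):
    """Return buys in the same owner group within 30 days of the sale's trade date."""
    return [
        t for t in trades
        if t.side == "BUY"
        and t.owner_group == sale.owner_group
        and t.symbol == sale.symbol
        and abs(t.trade_date - sale.trade_date) <= WINDOW
    ]


sale = Trade("acct-A", "household-1", "XYZ", "SELL", date(2025, 12, 20))
trades = [
    Trade("acct-B", "household-1", "XYZ", "BUY", date(2026, 1, 15)),  # 26 days later, other account
    Trade("acct-A", "household-1", "XYZ", "BUY", date(2026, 1, 20)),  # 31 days later, outside window
    Trade("acct-C", "household-2", "XYZ", "BUY", date(2025, 12, 22)), # unrelated owner group
]
matches = candidate_replacements(sale, trades)
```

Only the purchase in `acct-B` matches, and it falls in the following calendar year, which is the year-end boundary case a per-account or per-year scan would miss.
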
## GPT-5 Unique Findings

1. **IRA/Roth IRA permanent disallowance:** If the replacement is purchased in an IRA, the loss is PERMANENTLY disallowed (no basis adjustment permitted), not just deferred. The doc's model is incorrect for IRA replacements.
2. **Per-share vs lot-level granularity:** The doc says "replacement lot's effective cost basis", but the IRS requires per-share matching within lots.
3. **Multiple replacement lot allocation order:** The IRS requires FIFO chronological matching; the doc's simple min() formula doesn't handle multiple replacements.
4. **Pre-sale purchase "still held" requirement:** Shares bought in the 30 days before the sale must still be held ON the sale date to count as replacement shares.
5. **Replacement chain propagation:** Successive wash sales need to layer basis adjustments through multiple lots.
6. **§475(f) mark-to-market trader carve-out:** MTM traders are exempt from wash sale rules.
7. **Global caps across multiple replacement lots:** The sum of disallowed losses can't exceed the total loss.
8. **Conversions/exercise/assignment as acquisitions:** Option exercise counts as an acquisition.

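
Findings 1, 2, 3 and 7 interact, so a sketch of how they might fit together is useful. This is not the reviewed document's algorithm; the `ReplacementLot` type and `allocate_disallowed_loss` function are hypothetical names, and the sketch deliberately ignores the "still held" test and replacement chains. It matches loss shares to replacement shares one-for-one, earliest acquisition first, which keeps the total disallowed amount within the total loss by construction.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class ReplacementLot:
    lot_id: str
    acquired: date
    shares: Decimal
    in_ira: bool = False  # assumption: flag set upstream for IRA/Roth IRA lots


def allocate_disallowed_loss(loss_per_share, shares_sold, lots):
    """Match sold shares to replacement shares, earliest acquisition first.

    Returns (allocations, allowed_loss); each allocation is
    (lot_id, matched_shares, disallowed_amount, basis_adjustment).
    """
    remaining = shares_sold
    allocations = []
    for lot in sorted(lots, key=lambda l: l.acquired):
        if remaining <= 0:
            break
        matched = min(remaining, lot.shares)  # per-share matching within the lot
        disallowed = matched * loss_per_share
        # Per finding 1: no basis adjustment when the replacement sits in an IRA.
        basis_adjustment = Decimal("0") if lot.in_ira else disallowed
        allocations.append((lot.lot_id, matched, disallowed, basis_adjustment))
        remaining -= matched
    allowed_loss = remaining * loss_per_share  # unmatched shares keep their loss
    return allocations, allowed_loss


lots = [
    ReplacementLot("lot-2", date(2026, 1, 10), Decimal("40")),
    ReplacementLot("lot-1", date(2026, 1, 5), Decimal("30")),
    ReplacementLot("ira-1", date(2026, 1, 12), Decimal("50"), in_ira=True),
]
allocations, allowed = allocate_disallowed_loss(Decimal("5"), Decimal("100"), lots)
total_disallowed = sum((a[2] for a in allocations), Decimal("0"))
```

In this example 100 shares are sold at a 5.00 per-share loss. The two taxable lots absorb 70 shares, the IRA lot absorbs the last 30 with no basis adjustment, and the disallowed total equals the 500.00 loss rather than exceeding it.
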
## Sonnet Unique Findings

1. Constructive sales and short-against-the-box interactions
2. Related party trust/estate transactions (IRC §1091 extends to trusts)
3. Year-end timing boundary issues (December losses with January replacements span tax years)
4. Broker 1099-B reconciliation differences creating user confusion
5. Fractional share rounding errors compounding over many wash sales

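
The rounding finding is easy to demonstrate with made-up numbers. The sketch below assumes 100 wash sale events of 0.3333 shares at a 10.01 per-share loss (illustrative values, not taken from the reviewed document) and compares rounding every adjustment to cents against carrying full precision and rounding once.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount):
    """Round a Decimal amount to whole cents, half up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


shares = Decimal("0.3333")         # illustrative fractional share quantity
loss_per_share = Decimal("10.01")  # illustrative per-share loss
events = 100

# Strategy A: round each adjustment as it is booked.
rounded_each_time = sum(
    (to_cents(shares * loss_per_share) for _ in range(events)), Decimal("0")
)
# Strategy B: keep full precision and round the total once.
rounded_once = to_cents(
    sum((shares * loss_per_share for _ in range(events)), Decimal("0"))
)
drift = rounded_each_time - rounded_once
print(rounded_each_time, rounded_once, drift)
```

Each event rounds 3.336333 up to 3.34, so strategy A drifts 0.37 above strategy B after 100 events. Storing unrounded `Decimal` adjustments and rounding only at reporting time is one way to avoid this.
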
## Opus Unique Findings

1. **Holding period tacking should ADD periods, not just backdate:** The doc says "backdated", but the correct rule is the sum of the holding periods.
2. Same-day transaction ordering race condition
3. (Note: One finding referenced the wrong document, a rare analytical error)

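
The difference between adding and backdating only shows up when there is a gap between the loss sale and the replacement purchase. A minimal sketch with illustrative dates, using plain calendar-day differences (actual holding period day-count conventions are simplified away here):

```python
from datetime import date


def holding_days_added(orig_acquired, orig_sold, repl_acquired, as_of):
    """Tack by adding the original lot's holding period to the replacement's."""
    return (orig_sold - orig_acquired).days + (as_of - repl_acquired).days


def holding_days_backdated(orig_acquired, as_of):
    """Tack by treating the replacement as acquired on the original purchase date."""
    return (as_of - orig_acquired).days


orig_acquired = date(2025, 1, 2)
orig_sold = date(2025, 3, 3)       # original lot held 60 days
repl_acquired = date(2025, 3, 20)  # 17-day gap with no position
as_of = date(2025, 12, 31)

added = holding_days_added(orig_acquired, orig_sold, repl_acquired, as_of)
backdated = holding_days_backdated(orig_acquired, as_of)
print(added, backdated, backdated - added)
```

Backdating counts the 17 days during which no shares were held, so it overstates the holding period (363 vs 346 days here) and can flip a lot from short-term to long-term too early.
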
## Key Insights

### Domain Expertise Matters

This experiment tested a domain-specific analytical task (IRS tax law) rather than general architectural analysis. The results show:

1. **GPT-5's reasoning tokens excel at regulatory specificity.** It consistently cited correct IRS publication sections, code numbers, and regulatory references. The 9,600 reasoning tokens appear to have been spent on VERIFICATION.

2. **Opus is MORE reliable on regulatory tasks than on general architecture review.** 10 of its 11 findings were precise and actionable. Domain specificity constrains Opus's tendency toward exploratory analysis.

3. **Sonnet is well-suited for a compliance first pass.** At 25s for 14 findings, it provides good breadth for identifying areas of concern.

### Severity Distribution

| Model | Critical | High | Medium | Total |
|---|---|---|---|---|
| GPT-5 | 2 | 11 | 5 | 18 |
| Sonnet | 3 | 5 | 6 | 14 |
| Opus | 2 | 4 | 5 | 11 |

Sonnet rated MORE findings as Critical (3 of 14) than GPT-5 (2 of 18) or Opus (2 of 11), even though it found fewer issues than GPT-5, suggesting a different severity calibration.

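
The calibration difference is clearer as a share of each model's own findings. A short sketch computing that share from the severity table (the percentages are derived here, not reported in the original run):

```python
# (critical, high, medium) counts copied from the severity table.
severity = {
    "GPT-5": (2, 11, 5),
    "Sonnet": (3, 5, 6),
    "Opus": (2, 4, 5),
}

critical_share = {
    model: round(100 * critical / (critical + high + medium), 1)
    for model, (critical, high, medium) in severity.items()
}

for model, share in critical_share.items():
    print(f"{model}: {share}% of findings rated Critical")
```

By this measure Sonnet (21.4%) and Opus (18.2%) both rate a larger fraction of their findings Critical than GPT-5 (11.1%), which concentrates its findings in High.
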
## Practical Implication

### Updated Model Assignment for Compliance Review

1. **First pass: Sonnet.** Fast, broad coverage, identifies areas of concern
2. **Deep dive: GPT-5.** Regulatory specificity, correct citations, highest precision
3. **Recommendations: Opus.** Actionable remediation, design-level scope statements

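
The three-stage assignment can be sketched as a pipeline of pluggable callables. This is only an outline of the workflow shape: `compliance_review` and the stub stages are hypothetical, and no real model API is called. In practice each stub would be replaced by a call to the corresponding model.

```python
from typing import Callable, List

# Each stage is a plain callable so any model client can be plugged in.
FirstPass = Callable[[str], List[str]]
DeepDive = Callable[[str, List[str]], List[str]]
Recommend = Callable[[List[str]], List[str]]


def compliance_review(
    document: str,
    first_pass: FirstPass,
    deep_dive: DeepDive,
    recommend: Recommend,
) -> List[str]:
    """Run the first pass, then the deep dive on its areas, then remediation."""
    areas = first_pass(document)            # stage 1: broad areas of concern
    findings = deep_dive(document, areas)   # stage 2: rule-specific findings
    return recommend(findings)              # stage 3: actionable recommendations


# Stub stages standing in for the Sonnet, GPT-5, and Opus calls.
def stub_first_pass(document: str) -> List[str]:
    return ["cross-account detection"]


def stub_deep_dive(document: str, areas: List[str]) -> List[str]:
    return [f"finding on {area}" for area in areas]


def stub_recommend(findings: List[str]) -> List[str]:
    return [f"remediate: {finding}" for finding in findings]


plan = compliance_review(
    "wash-sale-tracking.md contents", stub_first_pass, stub_deep_dive, stub_recommend
)
```

Passing the first-pass areas into the deep dive is a design choice of this sketch; the original run gave each model the same prompt independently.
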
## Prompt Used

```
Analyze this architecture document for regulatory compliance assumptions.

The document describes wash sale tracking for IRS tax compliance. Your task: identify
assumptions the design makes about regulatory requirements, edge cases where the
implementation might diverge from IRS rules, and places where the simplifications
described could lead to incorrect tax reporting.

Focus on:
1. IRS rule interpretation gaps
2. Edge cases the detection model might miss
3. Calculation edge cases that could produce incorrect disallowed amounts
4. Cross-account/cross-entity issues
5. Timing/settlement date issues
6. Corporate action interactions with wash sale tracking

For each finding, specify:
- The assumption or gap
- The relevant IRS rule or section (if known)
- How it could lead to incorrect tax reporting
- Severity (Critical/High/Medium/Low)
```
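
The four fields the prompt asks for map naturally onto a small record type, which makes per-model severity tallies like the ones above straightforward to reproduce. The `Finding` and `Severity` names below are hypothetical, and the sample records are placeholders rather than actual model output.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class Finding:
    assumption: str               # the assumption or gap
    irs_reference: Optional[str]  # relevant IRS rule or section, if known
    reporting_impact: str         # how it could lead to incorrect tax reporting
    severity: Severity


def severity_counts(findings):
    """Tally findings by severity level."""
    return Counter(f.severity for f in findings)


# Placeholder records for illustration only.
sample = [
    Finding("placeholder gap A", "IRC §1091", "placeholder impact", Severity.CRITICAL),
    Finding("placeholder gap B", None, "placeholder impact", Severity.HIGH),
    Finding("placeholder gap C", None, "placeholder impact", Severity.HIGH),
]
counts = severity_counts(sample)
```
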